Choosing an Agentic AI Framework: A Practical Guide for Engineering Leaders
Guest post by Sandeep Seshadri, Executive Vice President of Engineering at Kasasa · November 30, 2025
Something we’ve been excited to start: guest posts from practitioners we respect.
Between this Substack and 100k+ followers on LinkedIn, our community has a real audience. When someone in our network ships something worth writing about, we want to give that work a bigger stage. Not a favor, not a vanity feature. A real piece of writing from someone who built something and has opinions about what they learned.
If you’re reading this, you probably work on AI systems or lead teams that do. You know how rare it is to find writing that’s specific enough to be useful. Most framework comparisons read like product brochures. Most “lessons learned” posts skip the parts that actually hurt. The bar for guest posts here is simple: did you build something real, and can you write about it in a way that helps someone else build theirs?
So if you’ve done the work (shipped an agent to production, designed an eval pipeline that actually caught failures, migrated a system and lived to tell the story), we want to hear from you.
First up: Sandeep Seshadri, EVP of Engineering at Kasasa, breaks down how he evaluated LangGraph, Spring AI, and Strands for his team, and what he’d tell another engineering leader making the same call today.
Over the past year, many engineering teams, including ours, focused their AI efforts on agentic systems. And the first question everyone asks is the same: which framework should we pick?
This isn’t another exhaustive comparison of 20 frameworks. This is a decision framework for those who need to make the call, get their team moving, and ship. I’ve spent months evaluating LangGraph, Spring AI, and Strands across real production criteria. Here’s what I learned.
The Three Frameworks, at a Glance
LangGraph gives you maximum control through explicit graph-based orchestration. You define every node, edge, and state transition yourself. It’s the most mature option (21.5K GitHub stars), runs on Python and JavaScript, and has the largest community by a wide margin. Choose it when you need precise control and have strong Python skills.
Spring AI Core takes a “workflows over agents” philosophy, built for enterprise Java teams. It’s backed by VMware/Broadcom, has 7.3K GitHub stars, and ships with 5 explicit workflow orchestration patterns (Chain, Parallelization, Routing, Orchestrator-Workers, Evaluator-Optimizer). If your team lives in Java and Spring Boot, this is the natural fit.
Strands offers what it calls “progressive complexity”: start simple with autonomous agent loops, graduate to explicit graphs when you need more control. It’s AWS-backed (4.1K stars), Python 3.10+ only, and optimized for Bedrock. It’s the only framework that natively supports both autonomous reasoning and explicit graphs in a single architecture.
Other frameworks exist, including CrewAI, AutoGen, Semantic Kernel, and OpenAI Swarm. The evaluation criteria throughout this post apply to all of them.
Framework Choice Is About 20% of Your Project’s Success
It matters when you get it wrong. A wrong abstraction means your team fights the framework instead of building features. A steep learning curve delays delivery by weeks. Poor tooling slows debugging. And if the framework dies or breaks backward compatibility every release, you’re looking at a rewrite.
Production readiness counts too. If you can’t observe what your agent is doing in production, you’re flying blind. Community size determines how fast you get unstuck at 2am. Enterprise support matters if you’re in a regulated industry.
It matters less than most people assume. Agent concepts are framework-agnostic. ReAct, Chain-of-Thought, and tool use patterns are universal. Good engineers adapt quickly. This space is roughly 18 months old, and new frameworks launch monthly. Today’s winner may not be tomorrow’s.
The real question isn’t “which framework is best?” but “which framework can I ship with fastest while maintaining the flexibility to evolve?”
My breakdown of where success actually comes from:
Prompt engineering: ~40%. The quality of your instructions to the LLM matters more than anything else. 10 hours improving prompts beats 10 hours debating frameworks.
Tool design: ~20%. How well your tools work, how they handle errors, whether they compose cleanly.
Evaluation strategy: ~15%. Testing, human review, quality metrics. How do you actually know your agent works?
Framework choice: ~20%. It matters, but it’s not dominant.
Team execution: ~5%. Shipping vs. eternal planning.
I’m not saying framework choice is irrelevant. I’m saying don’t spend three months evaluating when you could be building.
What to Evaluate
When I evaluated these frameworks for our team at Kasasa, I didn’t need to understand every implementation detail. I needed to know what questions to ask. The criteria below are what I landed on. I’ve grouped them by how urgently they’ll bite you in production.
Provider support and ecosystem flexibility
A framework that locks you into one LLM provider creates three risks: cost risk (you can’t shop around), capability risk (when a better model launches elsewhere, you’re stuck), and reliability risk (provider outage = your app is down).
What good looks like: the framework supports 3+ providers behind the same code interface, switching is a config change rather than a code rewrite, and you can use different providers for different tasks (GPT-4 for complex reasoning, Claude for writing, Llama for local dev). Red flag: the framework is tightly coupled to one provider’s API.
LangGraph is provider-agnostic by design through LangChain integrations. Spring AI Core supports OpenAI, Azure OpenAI, Anthropic, Ollama, Hugging Face, and others. Strands is AWS Bedrock-first (optimized for it) but supports Anthropic direct, OpenAI, Ollama, Gemini, and custom providers.
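To make “switching is a config change” concrete, here’s a minimal sketch in the LangChain/LangGraph ecosystem. It assumes the `init_chat_model` helper from LangChain and that the relevant provider packages (langchain-openai, langchain-anthropic, etc.) are installed; the environment variable names are my own illustration, not anything the framework requires.

```python
# Hypothetical config-driven provider selection (LangChain/LangGraph ecosystem).
# Env var names are illustrative; init_chat_model resolves the provider at runtime.
import os
from langchain.chat_models import init_chat_model

MODEL = os.getenv("AGENT_MODEL", "gpt-4o")               # e.g. "claude-3-5-sonnet-latest"
PROVIDER = os.getenv("AGENT_MODEL_PROVIDER", "openai")   # e.g. "anthropic", "ollama"

llm = init_chat_model(MODEL, model_provider=PROVIDER, temperature=0)
print(llm.invoke("Summarize our overdraft policy in one sentence.").content)
```

Swapping GPT-4 for Claude, or pointing local development at Ollama, then becomes a deployment setting rather than a pull request.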
The same logic applies to vector databases and RAG. Most production agent use cases involve RAG (searching your data, not just using the LLM’s knowledge). If the framework makes RAG painful, you’ll spend weeks building infrastructure instead of shipping features. You want built-in integrations with 3+ vector databases, the ability to swap vector DBs without a rewrite, support for custom embedding models, and clear patterns for hybrid search (vector + keyword). Red flag: no vector DB integrations at all, or tight coupling to one proprietary vector DB.
LangGraph has the most extensive RAG story via the LangChain ecosystem (10+ vector DBs, document loaders, text splitters). Spring AI Core has good Spring-style integrations with Pinecone, Weaviate, Redis, and PostgreSQL pgvector. Strands is AWS-focused with native Bedrock Knowledge Bases and S3 as vector store, but has limited support for popular third-party options like Pinecone, Weaviate, and ChromaDB.
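As a sketch of what “swap vector DBs without a rewrite” means in practice, here’s the shared vector-store interface LangGraph inherits from LangChain. The class and package names (Chroma, FAISS, OpenAIEmbeddings) are assumptions about your installed dependencies, and the documents are placeholders.

```python
# Sketch: the retrieval code doesn't change when the vector store does.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma, FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
texts = ["Rewards checking FAQ ...", "Dispute resolution process ..."]  # placeholder docs

# Chroma today, FAISS (or pgvector, Weaviate, ...) tomorrow; same calling convention.
store = Chroma.from_texts(texts, embeddings)   # or: FAISS.from_texts(texts, embeddings)
retriever = store.as_retriever(search_kwargs={"k": 2})

for doc in retriever.invoke("How are disputes resolved?"):
    print(doc.page_content[:80])
```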
Observability determines your debugging speed
I consider this the single most underestimated requirement for agentic AI in production. Without observability, you’re flying blind. You need to be able to trace why the agent gave a particular answer, track how much a query cost in tokens, pinpoint where failures occurred in the workflow, and catch performance degradation before users do.
Without observability, debugging production issues takes hours instead of minutes. You can’t optimize costs because you don’t know where tokens are spent. User complaints like “it gave a wrong answer” become unactionable because you can’t tell where in a 12-step workflow things went wrong.
What good looks like: native OpenTelemetry integration that works with your existing monitoring, automatic trace creation for every agent execution, token usage tracked per LLM call and aggregated per request, visualization tools to see agent execution flow, and integration with LangSmith, Datadog, or similar production monitoring. Red flag: no built-in observability, or it only supports one proprietary monitoring tool.
LangGraph has excellent native LangSmith integration (detailed traces, token tracking, execution graphs) plus OpenTelemetry support. Spring AI Core has excellent Spring Boot Actuator observability out of the box, and it integrates with enterprise monitoring. Strands has good AWS CloudWatch integration (native if you’re using AWS) and likely supports OpenTelemetry.
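For LangGraph specifically, enabling tracing is mostly configuration. A hedged sketch using the LangSmith environment variables as I understand them; verify the variable names against the current LangSmith docs.

```python
# Sketch: LangSmith tracing is switched on through environment variables.
# With these set, subsequent LangGraph runs are logged as traces you can inspect.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"        # enable tracing globally
os.environ["LANGCHAIN_PROJECT"] = "support-agent"  # group traces by project
# LANGCHAIN_API_KEY should come from your secrets manager, not be hard-coded.
```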
State management and control flow
State management is where most agent bugs hide. If your framework treats state as a black box, debugging production issues becomes impossible.
There are two types of control flow. Explicit control flow means you define every step, and the agent is deterministic. That’s good for regulated industries and anything high-stakes. Autonomous control flow means the LLM decides next steps, and the agent adapts dynamically. That’s good for exploration and research tasks. The wrong choice creates friction: explicit frameworks feel heavy for simple tasks, and autonomous frameworks feel unpredictable for anything that needs to be reliable.
What good looks like: state is inspectable at any point (you can see what the agent “knows”), state can be persisted to a database for long-running workflows, and the framework provides debugging tools to trace state changes. Red flag: state is opaque (you can’t inspect it), or control flow is entirely implicit (the LLM decides everything).
LangGraph uses explicit graph-based state machines where you define every node, edge, and state transition (maximum visibility, more boilerplate). Spring AI Core provides explicit workflow control with Spring’s state management patterns. Strands offers dual-mode: autonomous agent loop (LLM decides) or explicit graphs (you control), and you can mix and match.
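Here’s a minimal sketch of what explicit, inspectable state looks like in LangGraph. Node logic is stubbed so the structure stays visible; in a real workflow the classify step would call an LLM.

```python
# Sketch: a typed state, explicit nodes/edges, and state you can inspect after a run.
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class TicketState(TypedDict):
    question: str
    category: Literal["refund", "general"]
    answer: str

def classify(state: TicketState) -> dict:
    # Real code would call an LLM; a keyword match keeps the sketch deterministic.
    return {"category": "refund" if "refund" in state["question"].lower() else "general"}

def handle_refund(state: TicketState) -> dict:
    return {"answer": "Routing you to the refunds workflow."}

def handle_general(state: TicketState) -> dict:
    return {"answer": "Here is our general help article."}

builder = StateGraph(TicketState)
builder.add_node("classify", classify)
builder.add_node("handle_refund", handle_refund)
builder.add_node("handle_general", handle_general)
builder.add_edge(START, "classify")
builder.add_conditional_edges("classify", lambda s: f"handle_{s['category']}")
builder.add_edge("handle_refund", END)
builder.add_edge("handle_general", END)

graph = builder.compile(checkpointer=MemorySaver())  # persists state per thread
config = {"configurable": {"thread_id": "ticket-001"}}

print(graph.invoke({"question": "Where is my refund?"}, config)["answer"])
print(graph.get_state(config).values)  # the state is a plain object you can inspect
```

The point is that state is a plain, typed object the framework persists and lets you open up at any step, rather than something buried inside the agent loop.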
This ties directly into agent reasoning patterns. Different use cases need different patterns. A simple Q&A might need a single LLM call. Multi-step tasks need Chain-of-Thought or ReAct. Complex planning needs Plan-Execute or Tree-of-Thoughts. Self-improving agents need Reflection. What good looks like: the framework supports 3-5 core patterns out of the box, lets you implement custom patterns when needed, and makes pattern choice explicit rather than magical. LangGraph is fully explicit (you build any pattern yourself, maximum flexibility, more code). Spring AI Core ships 5 explicit workflow patterns. Strands is dual-mode with a built-in autonomous ReAct pattern (agent loop) and explicit graphs (progressive complexity).
Tool calling and custom integrations
Agents are only as good as their tools. Your agent needs to call APIs, query databases, run scripts, and integrate with internal systems. If defining custom tools is painful, your team’s velocity tanks.
What good looks like: tools are just Python/Java functions with decorators (minimal boilerplate), the framework handles tool discovery and selection automatically, parallel tool execution works when appropriate (fetch from 3 APIs simultaneously), and error handling and retries are built in. Red flag: complex tool definition process (lots of boilerplate classes), no error handling, sequential-only tool execution.
LangGraph has excellent tooling with Python functions using @tool decorator, full control over execution, and parallel tool support. Spring AI Core has good Spring-style tool definitions with enterprise integration patterns (REST clients, database templates). Strands has excellent decorator-based tools (minimal code), designed for simplicity.
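As a sketch of the decorator pattern (LangChain’s @tool shown here; Strands’ decorator works in a similar spirit), where the account-lookup logic is a stand-in for your real integration:

```python
# Sketch: tools are plain Python functions; the decorator derives the schema
# from the signature and the description from the docstring.
from langchain_core.tools import tool
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent

@tool
def get_account_balance(account_id: str) -> str:
    """Return the current balance for an account."""
    return f"Account {account_id}: $1,204.33"  # real code would call an internal API with retries

@tool
def send_statement(account_id: str, email: str) -> str:
    """Email the latest statement to the customer."""
    return f"Statement for {account_id} sent to {email}"

model = init_chat_model("gpt-4o", model_provider="openai")
agent = create_react_agent(model, tools=[get_account_balance, send_statement])
result = agent.invoke({"messages": [{"role": "user", "content": "Balance on account 42?"}]})
print(result["messages"][-1].content)
```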
Human-in-the-loop for production safety
Human-in-the-loop (HITL) is how you make production agents safe. Without it, you’re forced to choose between fully autonomous (risky, the agent might take wrong action) and fully manual (defeats the purpose, the agent just makes suggestions).
In production, you need approval gates before risky actions (deleting data, sending emails, making purchases), regulatory compliance for industries that require human oversight, quality control to catch agent errors before they impact users, and clarification so the agent can ask the user for missing information rather than guessing.
What good looks like: native support for pausing agent execution (interrupt points), state persistence while waiting for human input (the workflow can pause for hours or days without losing context), configurable approval gates, and the ability to approve, reject, or modify agent actions before execution. Red flag: no built-in HITL support (you build it from scratch), or the workflow can’t persist state while waiting for input.
LangGraph has excellent native interrupt() for pausing execution, checkpointing/state persistence, and can pause at any node. Spring AI Core has limited documentation with no clear built-in HITL patterns, and likely requires custom implementation via Spring State Machine or manual workflow pause points. Strands has good native interrupt support via @interrupt decorator and interrupt points in graph mode, with state persisting during pauses.
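Here’s a hedged sketch of an approval gate built on LangGraph’s interrupt/resume primitives. `interrupt` and `Command` come from recent langgraph releases, so check the names against your installed version, and swap `MemorySaver` for a database-backed checkpointer in production.

```python
# Sketch: pause before a risky action, persist state, resume after human sign-off.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command, interrupt

class EmailState(TypedDict):
    draft: str
    approved: bool

def draft_email(state: EmailState) -> dict:
    return {"draft": "Hi! Your dispute was resolved in your favor."}  # LLM call in real code

def approval_gate(state: EmailState) -> dict:
    decision = interrupt({"draft": state["draft"]})  # pauses; state sits in the checkpointer
    return {"approved": decision == "approve"}

def send_email(state: EmailState) -> dict:
    print(f"SENDING: {state['draft']}")
    return {}

builder = StateGraph(EmailState)
builder.add_node("draft_email", draft_email)
builder.add_node("approval_gate", approval_gate)
builder.add_node("send_email", send_email)
builder.add_edge(START, "draft_email")
builder.add_edge("draft_email", "approval_gate")
builder.add_conditional_edges("approval_gate", lambda s: "send_email" if s["approved"] else END)
builder.add_edge("send_email", END)

graph = builder.compile(checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "dispute-7"}}

graph.invoke({"draft": "", "approved": False}, config)  # runs until the interrupt, then returns
# ...hours or days later, a reviewer signs off and the run resumes where it paused:
graph.invoke(Command(resume="approve"), config)
```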
Multi-agent scaling, community size, and production maturity
As your use cases mature, you’ll hit scenarios where one agent isn’t enough. Customer support might need a routing agent, a research agent, and a response agent working in sequence. Data analysis might chain a retrieval agent into an analysis agent into a visualization agent. You want built-in patterns for multi-agent coordination (supervisor, sequential delegation, parallel execution), clear agent communication mechanisms, and the ability to start with a single agent and scale to multi-agent without a rewrite. LangGraph has excellent graph-based coordination with explicit multi-agent patterns and supervisor/delegation examples. Spring AI Core has good orchestration patterns that support multi-step workflows. Strands has good multi-agent support via agent loop orchestration and graph mode.
And when things break in production (they will), community size determines how fast you recover. Choosing a brand-new framework means you’re the beta tester. Every edge case, every production issue, you’re the first to hit it.
LangGraph: 21,500+ GitHub stars (most mature). 4.2M+ downloads/month via the LangChain ecosystem. Production use at Uber (customer support automation), Klarna (shopping assistant), and LinkedIn (content recommendations). Very large community (LangChain Discord has 100K+ members). Stable releases, enterprise support available via LangChain (paid).
Spring AI Core: 7,300+ GitHub stars (newer but growing). Fortune 100 companies in production (500+ developers in production per VMware). Small but professional community with Spring community backing. VMware/Broadcom backing provides enterprise-grade support. Stable releases following Spring quality standards.
Strands: 4,100+ GitHub stars (emerging). Used internally by AWS (Amazon Q Developer, AWS Glue AI features), 478 dependent projects. AWS-backed community, active development (35 releases, v1.18.0 as of November 2025). Enterprise support available indirectly via Bedrock.
The verdict: LangGraph is the most mature, with the largest community and most production examples. Spring AI Core is enterprise-ready with VMware backing and Fortune 100 production use. Strands is emerging strong with AWS backing, 4.1K stars, and internal AWS use.
The Three Frameworks, Compared in Depth
LangGraph: Maximum Control Through Explicit Graphs
LangGraph is the most mature framework in this space. It offers explicit graph-based orchestration for AI agents. Every decision point, state transition, and control flow is explicitly defined by developers. The framework doesn’t make autonomous decisions for you.
If you want complete control over agent behavior, LangGraph gives you the building blocks. You define nodes (steps), edges (transitions), and state. 21.5K GitHub stars and 4.2M+ downloads/month make it by far the most widely adopted. Uber, Klarna, and LinkedIn use it in production. It’s part of the LangChain family, which means rich integrations with vector stores, LLM providers, and tools.
What this means for your team:
Predictability. You know exactly what your agent will do. No surprises from autonomous reasoning.
Debugging. When something breaks, you can trace through the graph to find the issue.
Community support. Largest developer community means more examples, Stack Overflow answers, and third-party tools.
Steeper learning curve. Building autonomous behavior requires you to implement the reasoning loop yourself.
More code. What Strands does in 50 lines might take 200+ lines in LangGraph.
Choose LangGraph when: you need maximum control and predictability (regulated industries, high-stakes decisions), your team has strong Python skills and wants to build custom orchestration patterns, you’re building complex multi-agent systems where explicit coordination matters, or you value ecosystem maturity and production battle-testing.
Spring AI: Enterprise Java Workflows (Split Architecture)
Spring AI is actually two separate libraries with different philosophies, and many teams miss this distinction.
Spring AI Core follows a “workflows over agents” philosophy. Their official blog post states: “We believe workflows offer better predictability, debugging, and control than autonomous agents.” It provides 5 workflow orchestration patterns (Chain, Parallelization, Routing, Orchestrator-Workers, Evaluator-Optimizer), explicit control flow where you define exactly how LLM calls are composed and sequenced, and native Spring ecosystem integration (Spring Boot, Spring Cloud, Spring Security).
On maturity and adoption: 7.3K+ GitHub stars (newer than LangGraph, but growing), VMware/Broadcom backing with enterprise-grade support and a roadmap, and Fortune 100 production use (500+ developers running Spring AI in production at large enterprises). If you’re already using Spring, it integrates into your existing stack.
Spring AI Agents is a separate experimental library. It provides Java APIs for existing autonomous CLI tools like Claude Agent SDK, Gemini CLI, Amp, and Amazon Q Developer. It does not build autonomous agents from scratch; it’s a wrapper layer that lets Java applications call external agentic tools. While the initial philosophy leaned toward external wrappers, the Spring AI ecosystem is evolving: Alibaba’s Spring AI Alibaba project offers a ReactAgent-based design and a Graph runtime with orchestration patterns inspired by LangGraph, and native autonomous-agent patterns are being actively integrated into the ecosystem.
What this means for your team:
Enterprise Java fit. If your stack is Java/Spring, this is the natural choice.
Explicit workflows. Spring AI Core gives you predictable, debuggable orchestration.
Enterprise support. VMware backing means long-term commitment and professional support.
Production-proven patterns. The 5 workflow patterns cover most real-world use cases.
No native autonomous mode. If you want autonomous reasoning, you need external tools (via Spring AI Agents) or have to build it yourself.
Limited HITL documentation. No clear built-in human-in-the-loop patterns; likely requires custom implementation.
Java-only. No Python or JavaScript support (unlike LangGraph and Strands).
Smaller community. Newer framework means fewer examples and third-party integrations.
Choose Spring AI when: your team is Java-first and already uses the Spring ecosystem, you want explicit workflows over autonomous agents (aligns with their philosophy), you need enterprise support and long-term stability (VMware backing), or you’re building predictable, debuggable AI workflows in regulated industries.
Strands: Unified Dual-Mode
Strands is the only framework that natively supports both autonomous reasoning and explicit graphs in a unified architecture.
The philosophy is “progressive complexity”: start simple with autonomous agent loops, graduate to explicit graphs as your use case demands more control.
Mode 1: Agent Loop (Autonomous). Built-in autonomous reasoning cycle where the LLM decides workflow: Input goes to LLM Reasoning, then Tool Selection, then Tool Execution, then Observation, and repeats until done. This is the ReAct pattern (Reason + Act) built natively into the framework, with no external wrappers needed.
Mode 2: Graph Mode (Explicit). Like LangGraph, you define nodes, edges, and state for explicit control. But it’s in the same framework as the agent loop, so you can mix and match.
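For a feel of Mode 1, here’s a hedged sketch following the shape of the Strands quickstart (an Agent class, a @tool decorator, and calling the agent directly). Treat the exact class and parameter names as assumptions to verify against the current docs, and note that the default model provider is Bedrock, so AWS credentials need to be configured.

```python
# Sketch of the autonomous agent loop: hand the model a tool and let the
# built-in ReAct-style loop decide when to call it. Names follow the Strands
# quickstart as I understand it; verify against the current documentation.
from strands import Agent, tool

@tool
def lookup_rate(product: str) -> str:
    """Return the current rate for a deposit product."""
    return f"{product}: 4.25% APY"  # stubbed; a real tool would query an internal service

agent = Agent(
    tools=[lookup_rate],
    system_prompt="You answer questions about our deposit products.",
)
print(agent("What rate does the rewards savings account pay?"))
```

Graduating to graph mode means keeping the same tools and wiring them into explicit nodes and edges instead of the autonomous loop.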
On maturity and adoption: 4.1K GitHub stars (newer but AWS-backed), and it’s used internally by Amazon Q Developer and AWS Glue AI features. It’s Python 3.10+ only with a modern, decorator-based API. Despite being newer, it’s battle-tested within AWS products.
What this means for your team:
Flexibility. Start with a simple agent loop, evolve to graphs when you need more control.
AWS integration. If you’re using Bedrock or the AWS ecosystem, this is optimized for it.
HITL support. Native interrupt capabilities via @interrupt decorator for human approval gates.
Modern Python. Clean, decorator-based API that feels natural to Python developers.
AWS ecosystem lock-in. RAG/vector DB support is heavily focused on AWS services (Bedrock Knowledge Bases, S3); limited support for third-party vector DBs (Pinecone, Weaviate, ChromaDB).
Smaller community. Newest of the three, with fewer public examples and resources.
Python 3.10+ only. No Java or JavaScript support (though Python-first may be an advantage).
Less documentation. AWS-backed but not as extensively documented as LangGraph.
Choose Strands when: you want both autonomous and explicit modes in one framework (this is unique to Strands), your team values flexibility to start simple and scale complexity as needed, you’re heavily invested in the AWS ecosystem (Bedrock, S3, AWS services) and don’t need third-party vector DBs, you prefer modern Python patterns and are comfortable with a newer framework, or you’re okay with AWS-focused tooling and integrations.
How to Choose
This is the decision process I’d walk through with you.
Filter by must-haves (5 minutes)
Language/Stack Fit:
Java/Spring shop? Spring AI Core.
Python team? LangGraph or Strands.
JavaScript/TypeScript? LangGraph (only option with full JS support).
Ecosystem Lock-in Tolerance:
Already using AWS Bedrock heavily? Strands (optimized for it).
Need multi-cloud flexibility? LangGraph (most agnostic).
Enterprise Java ecosystem? Spring AI Core (Spring integration).
Team Skill Level:
Beginner team, need fast start? Strands (simplest API).
Advanced team, want max control? LangGraph (most flexible).
Enterprise Java developers? Spring AI Core (familiar patterns).
Prioritize your top three concerns (5 minutes)
Pick the three things from the evaluation criteria above that matter most to your context. Here are a few examples of how this works in practice.
Example: Regulated Fintech. Observability (you need audit trails), HITL (compliance requires human approval), and production maturity (you can’t be beta testers). Result: LangGraph (best observability + proven HITL + most mature).
Example: Startup on AWS. Speed to ship (you need to deliver in 2 weeks), AWS Bedrock integration (you’re already using Claude via Bedrock), and simplicity (small team, minimal code). Result: Strands (fastest to prototype + Bedrock-first + simplest API).
Example: Enterprise Java Team. Ecosystem fit (you must use Spring stack), enterprise support (you need VMware backing), and predictability (you’re in a regulated industry and need deterministic workflows). Result: Spring AI Core (only Java option + explicit workflows).
The one-hour decision matrix
Use this checklist to score each framework (30 minutes):
Does it fit our stack?
Can the team learn it in a week?
Does it have 3+ production examples?
Does the community answer questions in under an hour?
Is observability built in?
If 4+ Yes: Shortlist. If 3 Yes: Possible. If less than 3 Yes: Skip.
Score all three frameworks. Pick the highest.
Make the decision (15 minutes)
The 2-Framework Rule: if two frameworks score equally, (1) pick the one that fits your stack better (Python vs. Java); (2) if that still ties, pick the one with the larger community (faster answers when you’re stuck); (3) otherwise just pick one. You’re overthinking it.
The Analysis Paralysis Trap: If you’re still debating after 1 hour, you’re optimizing for the wrong thing. Framework choice is 20% of success. Execution matters more. Pick the one your team is excited about and move forward.
Common scenarios
“We’re not sure about our long-term needs.” Pick LangGraph (most flexible, largest community, proven track record).
“We need to ship in 2 weeks.” Pick Strands if you’re Python + AWS ecosystem and don’t need third-party vector DBs. Otherwise LangGraph (the larger community and example base make the ramp-up manageable).
“We’re building for a regulated industry.” Pick LangGraph (best observability) or Spring AI (enterprise support).
“We’re a Java shop.” Pick Spring AI Core (only Java option, Spring ecosystem fit).
“We need flexibility in vector database choice.” Pick LangGraph (10+ vector DB integrations) or Spring AI Core (good third-party support). Not Strands (AWS-focused).
On Framework Choice Long-Term
Every framework does the same underlying work: calls LLM APIs, manages state and context, executes tools/functions, and orchestrates multi-step workflows. The abstraction differs. The underlying work is identical.
GPT-4 launched in March 2023, and OpenAI’s function-calling API followed that June. Most frameworks launched in 2024. Today’s “winner” could be obsolete by 2027. You’re choosing in a space that changes fast.
And core concepts transfer between frameworks. Agent patterns (ReAct, CoT, Plan-Execute) are universal. State management concepts translate. Tool integration patterns are similar. LLM prompting skills transfer completely. Learn one framework deeply, and switching later is manageable.
When the choice does matter
It matters when there’s a language/stack mismatch (a Java team using a Python framework creates constant friction). It matters at production scale when you need observability. It matters when a junior team picks a complex framework and velocity drops for months. And it matters when your ecosystem investment creates switching costs (heavy AWS investment + AWS-first framework can be a good fit, but it also creates lock-in).
What matters more
Prompt engineering quality. 10 hours improving prompts beats 10 hours debating frameworks. Better prompts work in any framework.
Tool reliability. Agents are only as good as their tools. Well-designed, tested tools matter more than framework features.
Evaluation strategy. How do you know your agent works? Testing, human review, and quality metrics matter more than most teams realize.
Shipping velocity. A team that ships fast learns fast. Learning beats choosing perfectly.
Bottom line: pick one, ship something, and learn from what breaks.
Resources
Framework Documentation:
LangGraph Python: https://python.langchain.com/docs/langgraph
LangGraph JavaScript: https://langchain-ai.github.io/langgraphjs/
Spring AI: https://docs.spring.io/spring-ai/reference/
Strands:
https://strandsagents.com/
Strands Technical Overview (AWS Blog): https://aws.amazon.com/blogs/machine-learning/strands-agents-sdk-a-technical-deep-dive-into-agent-architectures-and-observability/
Community:
LangChain Discord: https://discord.gg/langchain
Spring AI Community: https://spring.io/projects/spring-ai#learn
Strands GitHub: https://github.com/strands-agents/sdk-python
The Short Version
The next phase of AI isn’t experimentation. It’s shipping agentic systems that work reliably, safely, and at scale. The right decision is the one that minimizes your team’s friction: LangGraph for the Python control freak, Spring AI for the predictable Java shop, and Strands for the flexible, AWS-native team.
Now choose your framework and get back to building.

