The State of Agentic Development
What’s Real, What’s Hype, What’s Changing
My experience with software has been pretty magical.
I’m a builder because I enjoy the whole process of bringing a product to life. I like thinking through what it should be, figuring out how it should work, and then doing the work to make it real. I’ve always liked that part. But the frustrating part of building software has also been pretty consistent for most of my career: there are always more things that matter than a team has time to do well.
That is part of why this moment feels important to me.
A lot of the market is talking about agentic development like we’re already at the point where software teams can hand work to fully autonomous AI engineers and just watch the output roll in. That is not what I’m seeing.
What I am seeing is something more interesting, and more grounded. Teams are getting real leverage out of these tools. They’re building faster, exploring more options, and compressing parts of the development cycle that used to take much longer.
But they’re also discovering a new kind of management burden.
As I build with agents in my own work, I’ve noticed that I often end the day exhausted, and not because the agents are slow. It’s almost the opposite. They move so fast that the mental load goes up. Sometimes it feels like I work for the agent instead of the other way around. Instead of handing specs to an engineer and expecting a few check-ins throughout the week, I can hand them to an agent, get a draft back in minutes, and have new work dropped into my queue faster than I can process it.
The hard part is all the switching. You’re thinking about one project, giving it context, then jumping to another tab and trying to pull that whole project back into your brain so you can review what it did or answer what it needs. Some things the agent can do on its own. It can run unit tests. It can check whether code changes are blowing up the codebase. But it still needs your opinion and your direction. It still needs you to decide what good looks like, what correct looks like, and whether the thing it made is actually the thing you meant. And that still has to pass through your brain.
For me, that is exhausting. I already struggle with task switching, and AI has made that worse, not better. On a busy day it feels like jumping from one flying platform to another in the Mario mushroom level. Every jump is another context shift. Every context shift has a cost. That part doesn’t feel magical at all.
That doesn’t mean the leverage is fake. It just means the work is different from what people think. I’ve definitely been able to structure enough work to keep an agent busy for a few hours, but that is not the norm, and getting there takes a lot more thought than people usually admit. It’s not fifteen minutes of brainstorming and then two hours of autonomous execution. At this point, I almost feel like I spend more time thinking through specs than the agent actually spends working. That is still a win, because the agent can do in minutes what might have taken me hours or days. But it also speeds up everything else around the work.
The category is real, but the market is talking about it too cleanly
It is obvious now that AI coding has moved beyond autocomplete and chatbot assistance.
We have systems that can explore a codebase, use tools, generate plans, write code, call services, run checks, and participate in longer-running workflows. That is a meaningful shift. It is not fake, and it is not small.
For me, one of the earliest big shifts was Claude Code. When I first started using it seriously, it changed how I thought about both coding and knowledge work. I could point it at a codebase I barely knew, ask it to analyze the structure, and get up to speed much faster than I could have on my own. But just as important, I started using it like a second brain. Once I had worked something out, I did not have to remember it the same way anymore. I did not have to keep rebuilding the same mental map from scratch. That matters more than people realize.
But the market language is getting ahead of the average team’s reality.
The loudest version of the story is about autonomy. It is about the AI engineer, lights-out software development, or the idea that the whole software lifecycle is about to become self-driving.
Most of the real-world stories I’ve seen do not look like that.
They look more like teams finding narrow places where agents are genuinely useful, then building new habits around those pockets of leverage. Sometimes that means planning. Sometimes implementation. Sometimes review, triage, debugging, or integration work. Sometimes it means one person running several specialized agent sessions with very clear roles. Sometimes it means a product person and an engineer using documents as a bridge because the AI workflow itself is still not naturally shareable.
That is a very different picture from the marketing layer. It is less magical, but it is much more believable.
What I’m hearing in the field
One thing that has stood out as I’ve talked with software engineering teams is that almost nobody is asking for maximum autonomy as the first step.
What they want is leverage without chaos.
A solo founder I spoke with is building a real product with Claude Code, GitHub, Replit, Notion, Apple Notes, and Google Docs stitched together into a workflow that mostly works. The planning takes hours. The coding might take ten to forty minutes. Testing takes hours again. When he gets stuck, tasks can just sit there because there is no escalation path and no second set of eyes. What he asked for was not “make the agent more autonomous.” What he wanted was, basically, a fractional CTO in a box, something that could tell him what he is doing right, what he is doing wrong, and what he does not know is wrong.
That is not really a model request. It is a workflow request.
A more advanced team I spoke with has already built a disciplined workflow around multiple agent roles. One system handles orchestration and tracking. Another helps shape the design. Another focuses only on deployment readiness. Another handles integrations. They pass prompts and receipts between systems, review each other’s work, and keep context separated on purpose. That is not “vibe coding.” It is a real process. But it is also bespoke, personal, and hard for the rest of the team to see.
Again, the issue is not whether the model can produce code. The issue is whether that way of working can be understood, trusted, and repeated by more than one person.
Then there are teams in the middle. One CTO described a workflow where feature requests start as a sentence or two, an engineer explores the codebase with Claude Code, then breaks out into a Google Doc to get product clarification before going back into the coding session. The issue was not that the team lacked AI tools. The issue was that handoffs were still manual and lossy.
And at the far other end, I’ve talked to teams that already have AI tools all over the place but are not using them effectively. Their engineering pace is lagging. Their documentation is weak. Their roadmap is fuzzy. Important work is months late. The question there is not “how do we unlock autonomous coding?” It is “how do we get this team operating coherently at all?”
Those are very different teams. But they are all pointing at the same truth: the tools are here, but the practice is still uneven.
What demos hide
One reason the market gets confused is that demos make the easy part look like the whole thing.
A good agent demo can look incredible. It can make it feel like you can one-shot a prompt, let the system dig through the codebase, make the right changes, test everything, and ship. And for small prototypes, clones, or isolated workflows, that can be close enough to true.
But that is not the same thing as building a good product.
I’ve seen plenty of demos that look polished at first glance. It is the same dynamic we used to have with human engineering work. Someone brings back a feature, it looks great, everyone is excited, and then you start clicking through it. That is when you find the holes, the gaps, the edge cases, the workflow problems, and the places where the builder misunderstood what really mattered.
Agentic development did not invent that problem. It just makes the loop faster.
That is why I still think vibe coding is useful. It is great for prototyping. It is often better than a static mockup, because you can actually interact with the thing and learn from it. But demos usually stop right before the part where software becomes product.
What real teams need before more autonomy
If a real software team wants to lean into agentic development, my first answer is not “buy more autonomy.” It is test coverage.
That is not a glamorous answer, but I think it is the right one.
A lot of teams have spent years knowing they should improve unit test coverage and never finding the time. It is hard to sell that work to a leadership team, a board, or customers when everyone wants feature velocity. So teams compromise. They rely on manual QA, some happy-path automation, and a lot of good intentions.
That gets much riskier when agents start moving faster.
If you do not have unit tests, and you do not have a CI/CD pipeline with meaningful automated checks, then you are just increasing the speed at which untrusted changes can pile up. You will spend more time validating what the agents did, not less.
The good news is that agents are actually very good at writing tests. So for a lot of teams, phase one of adopting agentic development should be using agents to help build the testing discipline they skipped earlier. That is a much better use of time than pretending the foundation does not matter.
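To make "meaningful automated checks" a little more concrete, here is a minimal sketch of a merge gate in Python. It is an illustration under assumptions, not a description of any particular CI system: the check names, the `CheckResult` type, and the `gate` function are all hypothetical. The idea it shows is simply that a change, whether it comes from an agent or a person, is only accepted when every required check actually ran and passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one automated check (hypothetical shape)."""
    name: str
    passed: bool


# Assumption: the checks a team decides must pass before a change is trusted.
REQUIRED_CHECKS = {"unit-tests", "lint", "build"}


def gate(results: list[CheckResult]) -> tuple[bool, list[str]]:
    """Return (accepted, reasons).

    A change is accepted only if every required check is present and passed.
    A check that never ran counts as a failure, so a skipped test suite
    cannot slip through as a silent pass.
    """
    by_name = {r.name: r.passed for r in results}
    reasons = []
    for name in sorted(REQUIRED_CHECKS):
        if name not in by_name:
            reasons.append(f"{name}: missing")
        elif not by_name[name]:
            reasons.append(f"{name}: failed")
    return (not reasons, reasons)
```

The design choice worth noticing is that a missing check is treated the same as a failed one. When changes arrive faster than a person can read them, "nothing ran" should never look like "everything is fine."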
What feels genuinely important underneath the hype
The thing that feels most important to me is not that more people can now build software. That part is obvious.
What matters is that more people can now build enough software to start judging the quality of the software around them.
People who never wrote code before can now vibe code a product in a few weeks and start to feel where the cracks form. They learn where their own limits are, but they also learn how much better things could be than the tools they use every day.
That raises the bar.
For years, software teams had to make brutal tradeoffs. You had a limited team, a long bug list, customer requests piling up, and not enough time to do all the polish work you wanted to do. So you made uncomfortable calls. A nasty edge case might only hit five percent of users. Fixing it might be medium-high effort. There might be no good workaround. But you still could not justify spending the time on it because there were bigger fires and louder roadmap asks.
That logic has shaped software for a long time.
I do not think users are going to be as tolerant of it in an agentic world.
If the software has holes, rough edges, bad workflow design, or obvious bugs, people are going to feel that gap more sharply because they can now see how much is possible. And a lot of those bugs and rough edges are exactly the kind of work agents can help with. They are often smaller, more bounded, and easier to patch than big greenfield feature bets.
So yes, agentic development is coming. I just do not think it is coming in the way people like to pretend.
If you drop agents into a sloppy team with weak processes, fuzzy specs, and no real checks and balances, you are not going to get magic. You are going to get faster chaos.
But if you put in the work, build the harnesses, improve the workflows, tighten the specs, and create the right review and testing loops, then I think this shift is genuinely important.
Not because it removes the need for judgment.
Because it raises the value of judgment, quality, and opinionated product thinking.
The serious builders are starting to talk this way too
One reason I feel confident in that read is that some of the strongest public material from the model companies is starting to sound a lot more grounded than the market around them.
Anthropic has written that the most successful teams they’ve seen are often not using the most complicated frameworks. They are using simpler, composable patterns and only adding complexity when it is actually needed. They also draw a helpful distinction between workflows and agents, which matters because a lot of the industry still blends those together.
They have also been clear that more agent capability brings more risk, which means human control, transparency, secure interactions, and operating boundaries are not optional extras. They are part of the system.
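The workflow-versus-agent distinction is easier to see in code. As I read it, a workflow follows a path that the code fixes in advance, while an agent decides its own next step at run time. The sketch below is a toy under stated assumptions: the three tools are stand-in string functions, and `scripted_chooser` plays the role a model would play in a real system. None of these names come from Anthropic's material.

```python
from typing import Callable, Dict, Optional


# Hypothetical stand-in tools. Real ones would call a model, edit files, run tests.
def plan(text: str) -> str:
    return f"plan({text})"


def implement(text: str) -> str:
    return f"code({text})"


def run_tests(text: str) -> str:
    return f"tested({text})"


TOOLS: Dict[str, Callable[[str], str]] = {
    "plan": plan,
    "implement": implement,
    "run_tests": run_tests,
}


def run_workflow(task: str) -> str:
    """Workflow: the order of steps is written into the code."""
    return run_tests(implement(plan(task)))


def run_agent(
    task: str,
    choose_next: Callable[[str], Optional[str]],
    max_steps: int = 5,
) -> str:
    """Agent: a decision function picks the next tool on every step.

    max_steps is an operating boundary, so the loop cannot run forever
    even if the chooser never decides it is done.
    """
    state = task
    for _ in range(max_steps):
        tool_name = choose_next(state)
        if tool_name is None:  # the chooser says the work is finished
            break
        state = TOOLS[tool_name](state)
    return state


def scripted_chooser(state: str) -> Optional[str]:
    """Stand-in for a model: picks the next tool from the current state."""
    if state.startswith("tested("):
        return None
    if state.startswith("code("):
        return "run_tests"
    if state.startswith("plan("):
        return "implement"
    return "plan"
```

With this scripted chooser, `run_agent("add login", scripted_chooser)` ends up in the same place as `run_workflow("add login")`. The difference is where control lives: in the workflow the path is fixed by the code, and in the agent it is whatever the chooser decides, which is exactly why boundaries like `max_steps` stop being optional.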
OpenAI’s own agent material points in a similar direction. They frame agents as systems that plan, call tools, collaborate across specialists, and keep state, and they make the broader point that 2025 was the year AI got easier to run in production. Both fit much better with what I’m hearing from teams than the noisier autonomy discourse does.
GitHub is also starting to talk about these systems in more operational terms, especially when agents move closer to real repositories and CI/CD. Their writeup on how agentic workflows need isolation, constrained outputs, and comprehensive logging is a good example of where the conversation goes once the question is no longer “can agents help?” but “how do you make them predictable and safe enough to trust?”
That is a very different conversation from the one you get if you only follow the hype layer of social posts and product launches.
The serious builders are increasingly talking about state, review, control, orchestration, and trust. That should tell us something.
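Here is a rough sketch of what "constrained outputs" and "comprehensive logging" can look like at the smallest possible scale. It is my own illustration, not GitHub's implementation: the allowed action kinds, the size limit, and the `review` function are all assumptions I made up for the example. Isolation, meaning where the agent actually runs, is out of scope for a snippet this small.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("agent_guard")

# Assumptions for illustration: what an agent is allowed to propose,
# and how large a single change may be before a human must split it up.
ALLOWED_KINDS = frozenset({"comment", "open_pull_request"})
MAX_CHANGED_LINES = 400


@dataclass(frozen=True)
class ProposedAction:
    """Something the agent wants to do (hypothetical shape)."""
    kind: str
    changed_lines: int = 0


def review(action: ProposedAction, audit_log: list[str]) -> bool:
    """Accept or reject a proposed action, and record the decision either way."""
    if action.kind not in ALLOWED_KINDS:
        accepted, reason = False, "kind not on allowlist"
    elif action.changed_lines > MAX_CHANGED_LINES:
        accepted, reason = False, "change too large"
    else:
        accepted, reason = True, "ok"

    # Log every decision, not just rejections, so the trail is complete.
    entry = (
        f"{action.kind} lines={action.changed_lines} "
        f"accepted={accepted} reason={reason}"
    )
    audit_log.append(entry)
    logger.info(entry)
    return accepted
```

In this sketch the agent can open a pull request but cannot push straight to a main branch, because that kind of action is simply not on the list. Every proposal leaves a record whether it was accepted or not, which is what lets someone reconstruct afterward what the agent tried to do.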