Specs are how experiments become products
A short manifesto on why we built Penling, what we think is broken about how teams use AI today, and why we believe the next era of software development will be spec-driven - together.
The thing that's actually happening
Walk into any engineering team using AI today and you'll see two things at once.
The first is real, undeniable velocity. Features that used to take a week ship in a day. PRs land faster than reviewers can read them. Engineers who would have spent an afternoon stuck on boilerplate are instead three problems deeper into the work. The productivity story is true. We're not here to argue with it.
The second is harder to name. The codebase is moving forward - but the reasoning isn't. Tickets are getting vaguer because the AI “figures it out.” PRs are getting harder to review because nobody quite remembers what was asked for. The team is shipping more, but understanding less of what they shipped. The story of why the product looks the way it does is dissolving in real time, one prompt at a time.
Six months in, you can feel it. You ship a feature. Someone asks “why did we build it like this?” and the only answer is “because the AI did it that way.” That isn't an answer. It's an absence where an answer should be.
We built Penling because we got tired of watching that happen - on our own teams, and on every team we talked to.
The diagnosis
The problem isn't AI. The problem is that the tools teams use to coordinate work were designed for a world where humans wrote the code.
A Jira ticket is a coordination artifact. It exists so a human knows what to do next. The acceptance criteria are loose because the human filling in the gaps has context - they were in the standup, they remember the conversation, they know what the customer actually meant. The ticket doesn't have to be precise because the human is.
The moment you hand that ticket to an AI, every loose phrase becomes a guess. Every unstated assumption becomes a decision the AI makes silently. The ticket isn't a contract anymore - it's a prompt. And prompts are interpretive, not definitive.
Prompts are interpretive, not definitive.
This is the seam where everything is breaking. We didn't change the coordination layer to match what the execution layer became. We just bolted AI onto the side of tools designed for a different era and hoped the seams would hold.
They aren't holding.
What we believe
We believe a few things, plainly:
The quality of AI code is decided before the AI writes it. Not during code review. Not in a verification step. Before. The work you get is a function of the spec you wrote, and if the spec is vague, no amount of cleverness downstream will rescue the output.
Specs are not prompts. A prompt is a single act of communication, optimized for an answer right now. A spec is an artifact - durable, structured, reviewable, shareable. The difference matters. Prompts disappear; specs persist. Prompts are individual; specs are collective.
Specs are not a developer artifact. This is where most current thinking about AI development goes wrong. Spec Kit and similar CLI tools treat the spec as something one engineer writes alone, in a terminal, between themselves and their AI. That works for a hobby project. It does not scale to a team. Designers shape the expected results. PMs define the conditions. Leads guard the boundaries. Engineers drive the build. When the spec lives on one laptop, the team's collective intelligence - which is the whole reason teams exist - never gets into the work.
Reasoning has to outlive the prompt. If the AI made a choice, the team needs to be able to find out why later. If a clarification happened, it has to be on the record. If the plan changed, the change has to be traceable. A codebase without a memory is a codebase the next engineer can't safely change.
A codebase without a memory is a codebase the next engineer can't safely change.
Velocity without intention is a liability. Shipping fast is only valuable if you're shipping the thing you meant to ship. Otherwise you're just generating technical debt at a higher clip than ever before, and you'll meet it again on the other side.
What we're building
Penling is a spec-driven workspace for teams that use AI to build software.
The work flows through five stages: a goal that the team is moving toward; a focus area with a definition, expected results, conditions, and boundaries - the spec, written by the whole team together; a plan drafted by an LLM and refined by humans before any code is written; a build carried out by an AI agent through MCP, with every decision and event reporting back as the work evolves; and a PR that lands with the full chain of reasoning attached.
That's the mechanism. But the mechanism isn't really the point.
The point is that when the spec is written by the team - really written, with the designer and the PM and the lead all in the same artifact at the same time - the AI gets a contract worth executing against. The plan is reviewed by humans because humans should be reviewing plans. The build is observable because the team should be able to see what's happening. The PR carries its reasoning because six months from now, someone is going to ask “why is this the way it is?” and there should be an answer.
We're not trying to make AI more accountable. We're trying to make teams more deliberate.
What we're not
We're not building an AI code reviewer. We're not building a guardrail. We're not trying to slow AI down or wrap it in safety theatre. We have no interest in being the bureaucracy that engineers route around at 4pm on a Friday.
We are building the workspace where the thinking happens - before the typing does.
If that sounds slower, sit with it for a moment. “Slow is smooth, smooth is fast” is something soldiers and surgeons and pilots all believe, and they believe it because the alternative is worse. Vibe coding is fast in the way that running through a forest at night is fast: until it isn't, and then you're hurt and lost and you can't explain how you got there.
Spec-driven development isn't slower. It's deliberate. And deliberate is fast.
Where this goes
We think the next era of software development will look different from the last one. We think the teams that thrive in it will be the ones who figured out how to keep humans deeply in the loop without slowing AI down - by building shared, durable specs that AI can execute against, and by keeping the reasoning attached to the result.
We think Jira and the rest of the ticketing layer will get replaced, not because they're bad tools, but because they were designed for the wrong era. The artifact a team coordinates around is going to look more like a spec and less like a ticket.
We think CLI tools for spec-driven development are a stepping stone, not a destination. The team practice needs a shared surface, and a terminal isn't one.
We think the codebases that age well over the next decade will be the ones whose teams can still answer “why did we build it like this?” in a month, in a year, in five years. The ones who can't will become archaeology projects.
And we think the difference between those two outcomes is decided right now, in the choices teams are making about how to use AI today. Not in the verification step. Not in code review. Now. Upstream.
That's where we're trying to help.
If this sounds familiar
We talk to a lot of leads. Most of them are quietly worried about something they can't quite name. They like the velocity. They don't want to go back. But they can feel that the team's collective grip on the codebase is loosening, and they don't have a vocabulary for it yet.
If that's you, you're not alone, and you're not wrong. The thing you're feeling is real. Penling exists because we felt it too.
We built the tool we wanted to be using. We'd be honored if you tried it.

- The Penling team