Stop Adding AI to Your Product. Start Deleting Everything It Replaces.

Most teams are retrofitting LLMs onto Software 1.0 stacks. Adding a /chat endpoint here, a summarization call there, maybe a Copilot suggestion in the IDE. They’re treating the most powerful computing substrate in history like a plugin.

Andrej Karpathy’s recent Sequoia interview crystallizes what’s wrong with this thinking. He calls it Software 3.0: a paradigm where the LLM is the primary computer, not an accessory bolted onto one. The primary artifact is a prompt and context window, not source code. The primary engineering activity is shaping inputs, tools, and verification loops, not writing control flow.

After spending 18 months building agentic systems and watching my own stack shrink by 60-70% as models got more capable, I think the real insight is simpler: AI-native design means asking “can I delete this?” before asking “can I add AI to this?”

Here are 10 patterns that define what AI-native software actually looks like.

1. Agent-Native Docs Instead of Human-Oriented Docs

Most documentation tells humans what to click. Step 1, open the dashboard. Step 2, navigate to settings. Step 3, configure the webhook. This is useless to an agent.

AI-native documentation ships “agent sections” on every page: copy-pasteable task prompts with input/output schemas designed for LLMs, not humans.

Here’s what this looks like in practice:

## For Agents

Give this block to your coding agent to integrate webhooks:

Task: Configure a webhook endpoint for order events
Input: API key (from env), target URL, event types array
Output: Webhook ID, signing secret
Constraints:
  - Must verify endpoint responds with 200 before activating
  - Retry policy: exponential backoff, max 5 attempts
  - Events: order.created, order.updated, order.cancelled

Compare that to a 15-step tutorial with screenshots. The agent section is under 10 lines, and an LLM can execute it end-to-end.

Why this works: LLMs are context-window computers. They need structured task descriptions, not narrative walkthroughs. Every API doc, every SaaS settings page, every deployment guide should have a “paste this to your agent” block alongside the human instructions.
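One way to keep agent sections consistent across pages is to generate them from structured data rather than writing them by hand. The sketch below is a minimal illustration: the `AgentTask` class and its field names are assumptions made for this example, not part of any existing docs tool. It renders the webhook task from above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTask:
    """A task description meant to be pasted into a coding agent's context.

    Hypothetical structure for illustration only.
    """
    task: str
    inputs: list
    outputs: list
    constraints: list = field(default_factory=list)

    def render(self) -> str:
        """Render the task as the plain-text block shown on the docs page."""
        lines = [
            f"Task: {self.task}",
            f"Input: {', '.join(self.inputs)}",
            f"Output: {', '.join(self.outputs)}",
        ]
        if self.constraints:
            lines.append("Constraints:")
            lines.extend(f"  - {c}" for c in self.constraints)
        return "\n".join(lines)


webhook_task = AgentTask(
    task="Configure a webhook endpoint for order events",
    inputs=["API key (from env)", "target URL", "event types array"],
    outputs=["Webhook ID", "signing secret"],
    constraints=[
        "Must verify endpoint responds with 200 before activating",
        "Retry policy: exponential backoff, max 5 attempts",
        "Events: order.created, order.updated, order.cancelled",
    ],
)
print(webhook_task.render())
```

Because the block is generated, the same data could also feed a machine-readable schema later without the human-facing page drifting out of sync.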

If you maintain a CLAUDE.md or AGENTS.md in your repo, you’re already doing this. I wrote about how this works in practice in “Claude Code in Large Codebases,” where the entire developer experience hinges on agent-readable context files. The pattern scales to every customer-facing doc you ship.

2. Spec-First, Code-Later via Executable Design Docs

The traditional workflow: write a design doc, get it approved, then manually implement it line by line. The design doc becomes stale the moment someone starts coding.

The AI-native workflow: the design doc is the executable spec. You write structured constraints (object model, invariants, UX flows, error budgets), then an agent scaffolds and maintains the codebase directly from that spec.

%%{init: {"layout": "dagre"}}%%
flowchart LR
    A[Structured Spec] --> B[Agent Scaffolds Code]
    B --> C[Tests Verify Invariants]
    C --> D{Spec Changed?}
    D -->|Yes| B
    D -->|No| E[Ship]

I use this pattern daily. My planning documents have structured sections: goal, constraints, file targets, verification criteria. The agent reads those and generates implementation. When the spec changes, the agent re-derives what needs updating.
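The "re-derive when the spec changes" step needs some way to notice that the spec changed. A minimal sketch, assuming the spec is held as a structured object and the last build recorded a content hash: the `PlanSpec` class, its fields, and the file paths are illustrative, and the actual regeneration call to an agent is left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlanSpec:
    """Structured planning document (hypothetical shape)."""
    goal: str
    constraints: Tuple[str, ...]
    file_targets: Tuple[str, ...]
    verification: Tuple[str, ...]

    def fingerprint(self) -> str:
        """Stable hash of the spec contents; changes whenever the spec does."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def needs_rederive(spec: PlanSpec, last_built: Optional[str]) -> bool:
    """True when code was never generated or the spec changed since then."""
    return last_built != spec.fingerprint()


# Example values are made up for illustration.
spec_v1 = PlanSpec(
    goal="Add webhook retries",
    constraints=("max 5 attempts",),
    file_targets=("webhooks/retry.py",),
    verification=("pytest tests/test_retry.py",),
)
built = spec_v1.fingerprint()  # recorded after the agent's last run
spec_v2 = replace(spec_v1, constraints=("max 5 attempts", "exponential backoff"))

print(needs_rederive(spec_v1, built))  # unchanged spec
print(needs_rederive(spec_v2, built))  # edited spec
```

Hashing the whole spec is the bluntest possible trigger. A finer-grained version could hash each section and map it to the file targets it affects.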

The key insight: docs become the OS, not the PDF. Your spec is a living program that agents interpret and execute, not a static artifact humans translate by hand.

This isn’t hypothetical. Tools like Claude Code can already read a PLAN.md file and work from it as an executable spec. I covered the mechanics of this in “Harness Engineering”: the environment (specs, constraints, tool access) determines agent output quality far more than the model itself. The pattern works today.

3. Agent-Driven Onboarding Instead of Setup Wizards

Enterprise SaaS still ships 10-20 screen setup wizards. A human admin clicks through each one, configuring roles, connecting data sources, setting permissions. It takes hours and produces misconfigurations that take weeks to discover.

AI-native onboarding asks one question: “What’s the one paragraph an admin pastes to their org agent so it configures everything end-to-end?”

The admin provides intent:

Mirror our existing Salesforce CRM roles into this platform.
Connect PostgreSQL (prod-replica) and S3 (analytics-bucket) as data sources.
Enforce SOC 2 access policies: no PII access without manager approval.
Audit log everything.

The agent calls the product’s APIs to configure schema, permissions, webhooks, and integrations. It surfaces only genuinely ambiguous decisions back to the human: “Your CRM has 14 custom roles. 11 map cleanly. These 3 need clarification.”
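The "map what is unambiguous, ask about the rest" behavior can be sketched without any model at all. The example below assumes role names are the only signal and uses a deliberately strict rule: a role maps cleanly only if exactly one target role has the same normalized name. The function names and sample roles are illustrative.

```python
def normalize(name: str) -> str:
    """Lowercase and drop separators so 'Sales-Manager' matches 'sales manager'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_roles(crm_roles, platform_roles):
    """Return (clean_mapping, needs_clarification).

    Anything without exactly one normalized-name match is handed back
    to the human instead of being guessed.
    """
    by_key = {}
    for role in platform_roles:
        by_key.setdefault(normalize(role), []).append(role)

    mapping, unclear = {}, []
    for role in crm_roles:
        candidates = by_key.get(normalize(role), [])
        if len(candidates) == 1:
            mapping[role] = candidates[0]
        else:
            unclear.append(role)
    return mapping, unclear


# Sample role names are invented for this example.
mapping, unclear = map_roles(
    ["Sales Manager", "Support-Agent", "Regional VP (EMEA)"],
    ["sales_manager", "support agent", "Regional VP"],
)
print(mapping)
print(unclear)
```

A real agent would use richer signals (permissions attached to each role, usage data), but the contract is the same: return a confident mapping plus an explicit list of open questions.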

Approach	Time to Configure	Error Rate	Handles Edge Cases
Manual wizard	2-4 hours	15-30% misconfig	Only documented ones
Agent-driven	5-10 minutes	2-5% (ambiguity only)	Asks when uncertain

Why this works: Configuration is a translation problem. Humans translate intent into clicks. Agents translate intent into API calls. The second path is faster, more complete, and self-documenting.

4. Continuous Agentic QA Instead of Static CI

Code creation is cheap now. The bottleneck moved to verification. Most CI pipelines still run the same static checks they ran in 2019: lint, type-check, run the test suite, maybe a security scanner.

AI-native CI treats verification as a swarm of agents, not a checklist of scripts.

%%{init: {"layout": "dagre"}}%%
flowchart TB
    PR[Pull Request] --> A1[Test Agent]
    PR --> A2[Security Agent]
    PR --> A3[Performance Agent]
    PR --> A4[Break-It Agent]
    A1 --> |"proposes new tests"| Summary
    A2 --> |"threat models the change"| Summary
    A3 --> |"benchmarks hot paths"| Summary
    A4 --> |"tries to break preview"| Summary
    Summary --> Human[Human Reviewer]

Every PR kicks off:

Test agent: runs existing tests, proposes new ones for uncovered paths
Security agent: threat-models the change against OWASP patterns
Performance agent: benchmarks affected hot paths, flags regressions over 10%
Break-it agent: deploys to preview, then actively tries to break it

The human reviewer gets a risk summary, not raw logs. “This PR touches auth middleware. Security agent found no vulnerabilities but notes the session token now includes PII. Performance agent shows 3ms regression on /api/users. Break-it agent couldn’t exploit the change.”

Why this works: verification is itself highly verifiable (meta, but true). Agents can check correctness, find edge cases, and propose tests because the success criteria are concrete. This is Karpathy’s core thesis: automation accelerates where outputs can be verified. I explored the production resilience side of this in “15 Patterns That Keep Production AI Agents From Burning Down Prod,” where containment and observability make agent autonomy safe enough to trust.
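The fan-out and risk-summary shape can be sketched as plain functions run in parallel. This is a simplified illustration: both "agents" below are rule-based stand-ins for what would be model-backed checks, and the `Finding` type, the change dictionary layout, and the file path are assumptions made for the example. The 10% regression limit comes from the performance agent description above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str
    risk: str  # "low", "medium", or "high"
    note: str


def performance_agent(change: dict) -> Finding:
    """Flag the worst hot-path latency regression; over 10% counts as high risk."""
    ratio, path = max(
        ((after - before) / before, path)
        for path, (before, after) in change["benchmarks_ms"].items()
    )
    risk = "high" if ratio > 0.10 else "low"
    return Finding("performance", risk, f"{path}: {ratio:+.0%} latency")


def security_agent(change: dict) -> Finding:
    """Stand-in check: treat any change touching auth code as medium risk."""
    touches_auth = any("auth" in f for f in change["files"])
    if touches_auth:
        return Finding("security", "medium", "touches auth code")
    return Finding("security", "low", "no auth changes")


def review(change: dict, agents) -> str:
    """Run every agent concurrently and fold the findings into one summary."""
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda agent: agent(change), agents))
    order = {"high": 0, "medium": 1, "low": 2}
    findings.sort(key=lambda f: order[f.risk])  # riskiest first
    return "\n".join(f"[{f.risk}] {f.agent}: {f.note}" for f in findings)


# Hypothetical PR: one file touched, /api/users went from 20 ms to 23 ms.
change = {
    "files": ["middleware/auth.py"],
    "benchmarks_ms": {"/api/users": (20.0, 23.0)},
}
summary = review(change, [performance_agent, security_agent])
print(summary)
```

The point of the structure is that the reviewer reads the sorted summary, not four separate logs. Adding a test agent or break-it agent means adding one more function to the list.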

5. Always-On Background Product Managers

Today you instrument analytics, read dashboards quarterly, maybe run a user study. Insights arrive weeks after the friction happened. By then, you’ve shipped three more features on top of the broken flow.

In an AI-native app, you attach a resident agent to your telemetry and logs. Its job: observe user behavior, detect friction, and continuously propose concrete changes.

This agent doesn’t generate vague reports. It opens pull requests:

“Users drop off 40% at step 3 of onboarding. Proposed fix: collapse steps 3-4 into a single screen. Here’s the diff.”
“The ‘Export’ button gets 2 clicks/week across 500 users. Proposed: move to overflow menu, reclaim the space for ‘Share’ which gets 340 clicks/week.”
“FAQ page gets 60% of traffic from /pricing. Proposed: add pricing-specific FAQ section inline on the pricing page.”

Each PR includes the data justification, the proposed change, and a rollback plan. A human reviews and merges.

The key insight: the feedback loop compresses from weeks to hours. Instead of “instrument, wait, analyze, prioritize, implement,” it becomes “observe, propose, review, ship.”
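The "observe" half of that loop is ordinary arithmetic over funnel counts. A minimal sketch, assuming telemetry has already been aggregated into ordered (step, users) pairs: the step names, the counts, and the 30% threshold are made up for illustration.

```python
def find_friction(funnel, threshold=0.30):
    """Return (step, drop_off) for each step that loses more than `threshold`
    of the users who reached the previous step."""
    flagged = []
    for (_, prev_count), (step, count) in zip(funnel, funnel[1:]):
        if prev_count == 0:
            continue  # nobody reached the previous step; nothing to measure
        drop_off = 1 - count / prev_count
        if drop_off > threshold:
            flagged.append((step, round(drop_off, 2)))
    return flagged


# Hypothetical onboarding funnel: users reaching each step.
onboarding = [
    ("signup", 1000),
    ("profile", 900),
    ("connect_data", 540),
    ("invite_team", 500),
]
friction = find_friction(onboarding)
print(friction)
```

Each flagged step becomes the data justification attached to a proposed change. Drafting the diff and the rollback plan is where the agent takes over.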

6. Agent-Driven Data Migrations

Schema design and migrations are still the most anxiety-inducing part of production engineering. Hand-write a migration, pray in staging, push to prod, hope nothing explodes. Rollback plans are often fiction.

Yet schema correctness is highly verifiable: referential integrity constraints, type checks, test fixtures, and sandbox environments all provide concrete signals. This makes migrations a natural fit for agentic workflows.

The pattern:

You describe conceptual changes in plain language: “Split user into person and account. Preserve referential integrity. Backfill account.owner_id from the existing user.email join on auth_providers.”
Agent generates: migration SQL, rollback SQL, test fixtures for all edge cases, a verification query suite
Agent runs the migration in a sandbox with production-scale data
Agent reports: “Migration succeeded. 2.3M rows backfilled in 47s. 0 orphaned records. 3 edge cases caught and handled (users with multiple auth providers).”
Human approves for production

Traditional:    Write → Review → Stage → Pray → Prod (hours, 10-15% rollback rate)
Agent-driven:   Describe → Generate → Sandbox-verify → Approve → Prod (minutes, <2% issues)

Why this works: migrations have clear success criteria. The agent can verify its own work by running constraint checks and comparing row counts. Verifiability enables autonomy.
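The sandbox-verify step can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration, not the migration described above: the schema is a hypothetical single `user` table split into `person` and `account`, the `auth_providers` join is omitted, and the sandbox is an in-memory database seeded with a few rows rather than production-scale data.

```python
import sqlite3

# Hypothetical migration: split `user` into `person` and `account`.
MIGRATION = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT NOT NULL);
CREATE TABLE account (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES person(id),
    plan TEXT NOT NULL
);
INSERT INTO person (id, name, email) SELECT id, name, email FROM user;
INSERT INTO account (owner_id, plan) SELECT id, plan FROM user;
"""


def verify_in_sandbox(seed_rows):
    """Apply MIGRATION to a throwaway in-memory DB and return check results."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT, plan TEXT)"
    )
    db.executemany("INSERT INTO user VALUES (?, ?, ?, ?)", seed_rows)
    db.executescript(MIGRATION)

    def count(sql):
        return db.execute(sql).fetchone()[0]

    report = {
        "users_before": count("SELECT COUNT(*) FROM user"),
        "persons_after": count("SELECT COUNT(*) FROM person"),
        "accounts_after": count("SELECT COUNT(*) FROM account"),
        "orphaned_accounts": count(
            "SELECT COUNT(*) FROM account a "
            "LEFT JOIN person p ON p.id = a.owner_id WHERE p.id IS NULL"
        ),
    }
    # Success criteria: no rows lost, no account without an owner.
    report["ok"] = (
        report["users_before"] == report["persons_after"] == report["accounts_after"]
        and report["orphaned_accounts"] == 0
    )
    db.close()
    return report


report = verify_in_sandbox([
    (1, "Ada", "ada@example.com", "pro"),
    (2, "Lin", "lin@example.com", "free"),
])
print(report)
```

The report dictionary is what gets summarized for the human approver. A seed row that violates a constraint (for example a missing email) would raise `sqlite3.IntegrityError` during the migration, which is exactly the kind of failure you want in the sandbox rather than in production.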

7. Agent-as-Integration-Layer Instead of ETL Glue

In a 1.0 mindset, you connect SaaS products with bespoke ETL pipelines, webhook handlers, and cron jobs. Every integration is custom code that breaks when either API changes. You spend 30% of engineering time maintaining glue.

In a 3.0 mindset, you define semantic intent and let an agent manage the choreography.

Instead of writing:

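# 200 lines of webhook handlers, retry logic, schema mapping...
def sync_crm_to_billing(crm, billing, support, last_sync):
    """Push CRM customers changed since last_sync into billing and support."""
    # crm, billing, and support are API clients; map_crm_to_billing is the
    # hand-written schema mapping that breaks when either API changes.
    customers = crm.list_customers(modified_since=last_sync)
    for customer in customers:
        billing_account = map_crm_to_billing(customer)
        billing.upsert(billing_account)
        support.update_org(customer.org_id, billing_account.plan)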

You declare:

integration:
  name: customer-state-machine
  systems: [crm, billing, support]
  invariants:
    - "No active subscription without an account owner"
    - "CRM contact.plan always matches billing.subscription.tier"
    - "Support org.priority derived from billing.mrr"
  conflict_resolution: "Surface to human if >$10k MRR affected"

A long-lived agent watches events from all three systems, detects drift, proposes resolutions for conflicts, and executes API calls to converge state. It maintains those invariants continuously rather than running on a cron schedule that misses events between runs.

Why this works: integrations are really just constraint satisfaction problems. Agents are good at constraint satisfaction. You define what “correct” looks like; the agent figures out how to get there and stay there.
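To make "detect drift" concrete, here is a minimal sketch of checking one invariant from the declaration above (CRM plan matches billing tier) and applying the $10k MRR escalation rule. The dictionary layouts, field names, and customer ids are assumptions for this example. A real agent would read these from the systems' APIs.

```python
def find_drift(crm_contacts, subscriptions, escalate_above_mrr=10_000):
    """Compare CRM plan against billing tier for every subscribed customer.

    Returns (auto_fix, escalate): customer ids whose CRM plan disagrees
    with billing, split by whether the affected MRR exceeds the limit.
    """
    auto_fix, escalate = [], []
    for customer_id, sub in subscriptions.items():
        contact = crm_contacts.get(customer_id)
        if contact is not None and contact["plan"] == sub["tier"]:
            continue  # invariant holds for this customer
        if sub["mrr"] > escalate_above_mrr:
            escalate.append(customer_id)  # surface to a human
        else:
            auto_fix.append(customer_id)  # safe for the agent to converge
    return auto_fix, escalate


# Hypothetical snapshots of the two systems.
crm_contacts = {
    "c1": {"plan": "pro"},
    "c2": {"plan": "free"},
    "c3": {"plan": "pro"},
}
subscriptions = {
    "c1": {"tier": "pro", "mrr": 500},
    "c2": {"tier": "pro", "mrr": 300},
    "c3": {"tier": "enterprise", "mrr": 25_000},
}
auto_fix, escalate = find_drift(crm_contacts, subscriptions)
print(auto_fix, escalate)
```

Each natural-language invariant in the declaration would compile down to a check like this one. The agent's job is to write and maintain those checks and to choose the API calls that restore the invariant.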

8. Agent-Native Back-Office Instead of Admin UIs

For every internal workflow, someone builds an admin dashboard. A ticket resolution screen. A bulk operations panel. A reporting tool. Each one is bespoke UI that takes 2-4 weeks to build and maintain forever.

AI-native back-office: expose a clean domain model plus tool APIs. Give internal teams an agent console.

Ops people talk to the agent:

“Close all stale tickets older than 90 days that have no customer response. But confirm any that have SLA implications.”
“Generate a report of all users who signed up in Q1 but never activated. Break down by acquisition channel.”
“Bulk-update pricing for all accounts on the legacy plan. Show me the revenue impact before applying.”

The agent uses tools over your domain model to perform bulk operations, generate reports, and enforce policies. No bespoke screens required.

Traditional Admin UI	Agent Console
2-4 weeks per screen	0 UI to build
Fixed workflows only	Any query or action
Requires developer for changes	Self-service for ops
10+ screens to maintain	1 tool API surface

Why this works: internal tools are CRUD + permissions + business rules. Agents can navigate all three given a well-defined domain model. You build the model once and get infinite interfaces for free.
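The stale-ticket request above shows what a tool behind the agent console might look like: it plans the bulk action and holds back anything that needs confirmation. A minimal sketch, assuming tickets are dictionaries with the fields shown. The field names and sample data are illustrative, not taken from any real ticketing API.

```python
from datetime import datetime, timedelta


def plan_stale_ticket_closure(tickets, now, max_age_days=90):
    """Split stale tickets into those safe to close and those needing a human.

    A ticket is stale when it is open, older than `max_age_days`, and has no
    customer response. Stale tickets with an SLA attached are held for
    confirmation instead of being closed automatically.
    """
    cutoff = now - timedelta(days=max_age_days)
    close_now, confirm_first = [], []
    for t in tickets:
        stale = (
            t["status"] == "open"
            and t["opened_at"] < cutoff
            and not t["customer_replied"]
        )
        if not stale:
            continue
        (confirm_first if t["has_sla"] else close_now).append(t["id"])
    return close_now, confirm_first


# Hypothetical ticket data.
tickets = [
    {"id": 1, "status": "open", "opened_at": datetime(2025, 1, 10),
     "customer_replied": False, "has_sla": False},
    {"id": 2, "status": "open", "opened_at": datetime(2025, 2, 1),
     "customer_replied": False, "has_sla": True},
    {"id": 3, "status": "open", "opened_at": datetime(2025, 5, 20),
     "customer_replied": False, "has_sla": False},
    {"id": 4, "status": "open", "opened_at": datetime(2024, 12, 1),
     "customer_replied": True, "has_sla": False},
]
close_now, confirm_first = plan_stale_ticket_closure(tickets, now=datetime(2025, 6, 1))
print(close_now, confirm_first)
```

Returning a plan instead of executing immediately is a deliberate choice in this sketch: the agent can show the ops person what will happen before anything is changed.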

9. Autonomous Evaluation as a Product Primitive

Karpathy’s thesis: automation accelerates where outputs are verifiable. You can generalize this into a product design principle: ship built-in evaluators as a first-class feature, not a separate MLOps concern.

AI-native products include evaluators that continuously grade their own outputs:

%%{init: {"layout": "dagre"}}%%
flowchart LR
    Input --> Model[AI Model]
    Model --> Output
    Output --> Eval[Evaluator]
    Eval --> |score| Logging
    Eval --> |below threshold| Fallback[Human Review]
    Logging --> Ranking[Improve Ranking]

Every generated answer, every automated action, every piece of content gets scored. Scores feed into logging (for debugging), ranking (for quality improvement), and gating (for safety). Below-threshold outputs get routed to human review automatically.

This turns quality from a quarterly audit into a continuous signal. You know your system’s reliability in real-time, not after a customer complaint.
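The score, log, and gate flow fits in a few lines. The sketch below is illustrative only: `cites_source` is a toy evaluator standing in for whatever grader a product would actually use, and the 0.7 threshold is an arbitrary assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger("evaluator")


def gate(output: str, evaluator: Callable[[str], float], threshold: float = 0.7) -> dict:
    """Score one output, log the score, and decide where it goes.

    Returns a record with the score and a route of "ship" or "human_review".
    """
    score = evaluator(output)
    route = "ship" if score >= threshold else "human_review"
    logger.info("score=%.2f route=%s", score, route)
    return {"output": output, "score": score, "route": route}


def cites_source(output: str) -> float:
    """Toy evaluator: full marks only when the answer cites a source."""
    return 1.0 if "[source:" in output else 0.2


good = gate("Refunds take 5 days [source: policy.md]", cites_source)
weak = gate("Refunds take 5 days", cites_source)
print(good["route"], weak["route"])
```

Because every output passes through `gate`, the logged scores double as the continuous reliability signal, and the same records can feed ranking later.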

10. Multimodal Direct Transforms

Karpathy’s MenuGen story is the purest expression of AI-native thinking: he built a full web app (upload endpoint, OCR, image generation, rendering), then realized the whole thing could be one prompt. Photo of menu in, enhanced photo with dish images out. No app needed.

The generalization: aggressively ask “can this entire flow be one direct multimodal transform?”

Traditional Pipeline	Direct Transform
Whiteboard photo → OCR → parse → wireframe tool → responsive code	Whiteboard photo → responsive UI code
Screen recording → transcribe → analyze → automation builder → script	Screen recording → executable automation script
Stack of contracts → OCR → NER → diff → formatting → redline	Stack of contracts → redlined master agreement

Each row eliminates 3-4 intermediate stages. The model sees the raw input and produces the desired output. No pipeline. No glue code. No intermediate state to debug. I stumbled into this myself when building presentations with Claude Code: every slide-generation tool failed, but “take this content and produce a styled HTML presentation” worked first try.

Why this works: multimodal foundation models already understand images, text, code, and structured data simultaneously. The pipeline stages existed because pre-LLM systems needed explicit decomposition. With a sufficiently capable model, the decomposition is waste.

The Uncomfortable Implication

These 10 patterns share a theme: AI-native design deletes software rather than adding it.

Every setup wizard, admin dashboard, ETL pipeline, migration script, and CI config file is a liability. It needs maintenance, has bugs, and becomes stale. If an agent can replace it with a prompt and verification loop, you’ve traded ongoing maintenance cost for a prompt whose underlying model improves every 6 months without you touching it.

Karpathy frames this as “design for replaceable brains.” If your system’s behavior is encoded in prompts, specs, and tool schemas rather than brittle glue code, swapping in a better model is a config change. Your product gets better automatically as models improve.

The teams that win won’t be the ones who add the most AI features. They’ll be the ones who delete the most code and replace it with well-structured prompts, verification loops, and agent-native interfaces. As I argued in “Pros Don’t Vibe, They Control”: the best engineers aren’t generating more code with AI. They’re specifying outcomes with surgical precision and verifying every result.

The Bottom Line

Software 3.0 isn’t about making your 1.0 stack faster. It’s about discovering that most of your stack shouldn’t exist. The installer becomes a prompt. The admin dashboard becomes a conversation. The integration layer becomes a constraint declaration. The CI pipeline becomes a verification swarm.

The uncomfortable question every engineering team should be asking: “If we built this product from scratch today, assuming a frontier model with tools as our primary compute substrate, how much of our current codebase would survive?”

For most teams, the honest answer is 20-30%. That’s not a problem to solve later. That’s the opportunity sitting right in front of you.

Rethinking your stack for AI-native patterns? I’d love to hear which of these patterns you’re already using or planning to adopt. Reach out on LinkedIn.