Part 2: Foundation - The First 48 Hours
The Story So Far: Day 1 ended with a working credential management system - all AI-generated. But we had no UI, no analysis engine, and no deployment orchestration. Just encrypted database models.
The Multi-AI Planning Session
Before writing more code, I tried something unconventional: cross-validating architecture across three different AI models.
I took the same detailed prompt and fed it to:
- Claude Code (via Anthropic CLI)
- ChatGPT-4 (via web interface)
- Google Gemini (via Google AI Studio)
The prompt (condensed):
Design a platform that:
- Accepts GitHub repo URLs for MCP servers
- Uses Claude API to analyze repos and extract config
- Generates dynamic credential forms
- Deploys to Fly.io with isolated containers
- Implements MCP Streamable HTTP transport
Tech stack: Next.js 15, FastAPI, PostgreSQL, Fly.io
Question: What's the optimal architecture?
The Results
Claude's Response:
3-layer architecture:
- Frontend (Next.js) → Backend API (FastAPI) → MCP Machines (Fly.io)
- Use Claude API with web search plugin for repo analysis
- Fernet encryption for credentials
- PostgreSQL for deployments + credentials + cache
- Streamable HTTP at /api/mcp/{deployment_id}
GPT-4's Response:
Similar structure, but suggested:
- Use asyncpg for PostgreSQL (WRONG - we'll discover this later)
- Redis for caching instead of PostgreSQL
- Separate microservices for analysis vs deployment
Gemini's Response:
Agreed with Claude on most points:
- Monolith is fine for MVP (not microservices)
- Use psycopg3 for PostgreSQL async (CORRECT)
- Consider rate limiting on analysis endpoint
- MCP session management via headers
Consensus Patterns (Good Design Indicators)
All three AIs agreed on:
- ✅ 3-layer architecture (Frontend → Backend → MCP Containers)
- ✅ FastAPI with async (not sync)
- ✅ PostgreSQL for persistence (not in-memory)
- ✅ Fernet encryption (industry standard, simple)
- ✅ Caching analysis results (expensive LLM calls)
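A quick aside on the Fernet pick, since all three models converged on it: the entire credential story rests on a few lines of the `cryptography` package. Here's a minimal sketch of what that looks like; the wrapper class is my own illustration, not the project's actual encryption service.

```python
# Minimal sketch of Fernet credential encryption (cryptography package).
# The CredentialCipher wrapper is illustrative; only the Fernet API is standard.
from cryptography.fernet import Fernet


class CredentialCipher:
    def __init__(self, key: bytes):
        # The key is a 32-byte urlsafe-base64 value, e.g. from Fernet.generate_key()
        self._fernet = Fernet(key)

    def encrypt(self, plaintext: str) -> str:
        """Encrypt a credential value before storing it in PostgreSQL."""
        return self._fernet.encrypt(plaintext.encode()).decode()

    def decrypt(self, ciphertext: str) -> str:
        """Decrypt a stored credential when injecting it into a deployment."""
        return self._fernet.decrypt(ciphertext.encode()).decode()


# Usage: generate the key once, keep it in a secret store, never in git.
# key = Fernet.generate_key()
# cipher = CredentialCipher(key)
# token = cipher.encrypt("TICKTICK_CLIENT_SECRET value")
```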
Discrepancies (Complexity Indicators)
Where they disagreed:
- PostgreSQL driver: asyncpg (GPT-4) vs psycopg3 (Gemini, Claude)
- Caching layer: Redis (GPT-4) vs PostgreSQL JSONB (Claude, Gemini)
- Architecture: Microservices (GPT-4) vs Monolith (Claude, Gemini)
My decision process:
- PostgreSQL driver: I chose psycopg3 (majority vote + better SSL support for Fly.io)
  - This proved critical later when asyncpg failed with Fly.io's `sslmode` parameter
- Caching: chose PostgreSQL JSONB (simpler, fewer dependencies; see the sketch after this list)
  - MVP doesn't need Redis complexity
  - Can migrate later if needed
- Architecture: chose a monolith (faster iteration)
  - Microservices add deployment complexity
  - Can split later if scaling demands it
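To make the driver and caching decisions above concrete, here's a minimal sketch of what they look like in SQLAlchemy terms. This is not the project's actual code: the connection URL is a placeholder, and the column names and TTL helper are assumptions.

```python
# Illustrative sketch of the driver and caching decisions; the URL, column
# names, and the TTL helper are assumptions for this example.
from datetime import datetime, timedelta, timezone

from sqlalchemy import DateTime, String
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

# Driver decision: the psycopg (v3) dialect passes sslmode=require straight
# through to PostgreSQL, which matters on Fly.io (asyncpg handles SSL options
# differently; that's the Part 3 story).
engine = create_async_engine(
    "postgresql+psycopg://app_user:change-me@db.internal:5432/catwalk?sslmode=require",
    pool_pre_ping=True,
)


class Base(DeclarativeBase):
    pass


# Caching decision: a plain JSONB column instead of a separate Redis layer.
class AnalysisCache(Base):
    __tablename__ = "analysis_cache"

    repo_url: Mapped[str] = mapped_column(String, primary_key=True)
    result: Mapped[dict] = mapped_column(JSONB)  # the full AnalysisResult JSON
    fetched_at: Mapped[datetime] = mapped_column(DateTime(timezone=True))

    def is_fresh(self, ttl_hours: int = 24) -> bool:
        """True if the cached analysis is still inside its TTL window."""
        return datetime.now(timezone.utc) - self.fetched_at < timedelta(hours=ttl_hours)
```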
Lesson learned: Different AI training data = different blind spots. Cross-validation catches issues early.
Phase 4: Aurora UI Implementation
With architecture decided, I tasked Claude Code with building the frontend:
commit f5a957a
Date: 2025-12-11
feat(phase-4): Implement Aurora UI (Landing, Dashboard, Forms)
This single commit added:
- Landing page with hero section and feature grid
- Dashboard for viewing deployments
- Dynamic credential form builder
- GitHub URL input with validation
- ~1,200 lines of TypeScript + React
Generated in under an hour.
The Dynamic Form Magic
The most impressive part: dynamic form generation from analysis results.
Here's the flow:
1. User pastes a GitHub URL: https://github.com/alexarevalo9/ticktick-mcp-server
2. Backend analyzes the repo and extracts the config:

   ```json
   {
     "package": "@alexarevalo.ai/mcp-server-ticktick",
     "env_vars": [
       {
         "name": "TICKTICK_CLIENT_ID",
         "description": "Your TickTick OAuth Client ID",
         "required": true,
         "secret": false
       },
       {
         "name": "TICKTICK_CLIENT_SECRET",
         "description": "Your TickTick OAuth Client Secret",
         "required": true,
         "secret": true
       },
       {
         "name": "TICKTICK_ACCESS_TOKEN",
         "description": "OAuth access token from TickTick",
         "required": true,
         "secret": true
       }
     ]
   }
   ```

3. Frontend auto-generates a form with:
   - Text inputs for CLIENT_ID
   - Password inputs for secrets (CLIENT_SECRET, ACCESS_TOKEN)
   - "Required" labels where needed
   - Help text from descriptions
Claude Code generated the FormBuilder.tsx component that does this transformation automatically. No hardcoding. No manual form creation.
This was the first "wow" moment - AI building AI-powered forms that adapt to any MCP server's requirements.
The Type Safety Obsession
I enforced strict TypeScript rules:
- No `any` types allowed
- All props must have explicit interfaces
- Zod schemas for runtime validation
Example from the generated code:
```typescript
// backend/app/models/deployment.py returns this schema
interface AnalysisResult {
  package: string
  name: string
  description: string
  env_vars: EnvVar[]
  tools: Tool[]
  resources: Resource[]
  prompts: Prompt[]
}

interface EnvVar {
  name: string
  description: string
  required: boolean
  secret: boolean
  default?: string
}

// FormBuilder component uses this to generate inputs
const FormBuilder: React.FC<{ analysis: AnalysisResult }> = ({ analysis }) => {
  // AI-generated logic to map env_vars → form fields
  return (
    <form>
      {analysis.env_vars.map(envVar => (
        <Input
          key={envVar.name}
          type={envVar.secret ? 'password' : 'text'}
          required={envVar.required}
          placeholder={envVar.description}
        />
      ))}
    </form>
  )
}
```
Zero type errors. Zero runtime surprises.
Claude Code generated this because I was explicit in my prompt:
Use TypeScript 5+ with strict mode
All components must have typed props
Use Zod for schema validation
Never use 'any' - create types or use 'unknown' with guards
Creating the Context Structure
By the end of Day 2, I noticed AI sessions were becoming inconsistent. Claude Code would:
- Forget architectural decisions from earlier sessions
- Suggest patterns we'd already rejected
- Ask questions I'd already answered
The problem: AI memory is session-bound. When you start a new session, it's a blank slate.
I needed persistent external memory - a knowledge base that survived across sessions.
The context/ Directory
I created a structured documentation system:
context/
├── ARCHITECTURE.md # System design, data flow, tech stack
├── CURRENT_STATUS.md # What works, what doesn't, next steps
├── Project_Overview.md # The problem, solution, user journey
├── TECH_STACK.md # Every dependency and why we chose it
├── API_SPEC.md # Endpoint documentation
└── plans/
├── phase-3-credential-management.md
├── phase-4-frontend-ui.md
└── phase-5-deployment-orchestration.md
Each file served a purpose:
ARCHITECTURE.md - The "how it works" bible:
## Data Flow: Complete Journey
### Flow 1: Analyze Repository
User enters GitHub URL
↓
Frontend validates URL format
↓
POST /api/analyze {repo_url}
↓
Backend checks cache (AnalysisCache table)
↓ (cache miss)
Claude API called with web search plugin
↓
Claude extracts: package, env_vars, tools, resources
↓
Backend validates response schema
↓
Backend caches result (24h TTL)
↓
Returns JSON config to frontend
CURRENT_STATUS.md - The living "where we are" document:
## ✅ Completed Features
### Phase 3: Credential Management
- ✅ Fernet encryption service
- ✅ PostgreSQL models (Deployment, Credential)
- ✅ Dynamic form schema generation
### Phase 4: Frontend UI
- ✅ Landing page
- ✅ Dashboard components
- ✅ Dynamic credential forms
## 🚧 What's NOT Working Yet
- ❌ No GitHub repo analysis yet (Phase 4 WIP)
- ❌ No Fly.io deployment (Phase 5)
- ❌ No MCP Streamable HTTP transport (Phase 6)
AGENTS.md - The AI system prompt:
You are a Senior Full-Stack Engineer for Catwalk Live.
## Current Project Status (READ THIS FIRST)
**Phase**: 4 In Progress - Frontend UI + Repo Analysis
**Critical Context Files**:
1. context/CURRENT_STATUS.md - What works, what doesn't
2. context/ARCHITECTURE.md - System design
3. context/TECH_STACK.md - Every dependency
## Interaction Protocol
1. Read context files before starting any task
2. Implement thoughtful, type-safe solutions
3. Write tests for critical logic
4. Run quality checks (typecheck, lint, test)
5. Update CURRENT_STATUS.md when completing tasks
## Boundaries
### ✅ Always
- Use TypeScript strict mode (no 'any' types)
- Write descriptive variable names
- Add error handling for edge cases
- Pass `ruff check` (Python) or `bun run typecheck` (TypeScript)
### 🚫 Never
- Commit secrets or API keys
- Skip type hints in Python
- Use 'any' in TypeScript
- Modify database schemas without migrations
The Immediate Impact
Next session, I started Claude Code with:
Read context/CURRENT_STATUS.md and context/ARCHITECTURE.md
before proceeding. Implement Phase 4: GitHub repo analysis
using Claude API with web search.
Claude Code:
- ✅ Read both files
- ✅ Understood we were on Phase 4 (not starting from scratch)
- ✅ Generated code consistent with existing architecture
- ✅ Used the exact tech stack documented (OpenRouter API, not direct Claude)
This changed everything. No more "AI amnesia." No more re-explaining architectural decisions.
The context files became the "codebase constitution" that AI had to respect.
The Analysis Engine Emerges
With context structure in place, I tasked Claude Code:
Implement repository analysis service:
- Use OpenRouter API (anthropic/claude-haiku-4.5)
- Enable web search plugin with max_results=2
- Extract: package name, env vars, tools, resources, prompts
- Cache results in AnalysisCache table (24h TTL)
- Return structured JSON matching AnalysisResult schema
The AI generated backend/app/services/analysis.py:
```python
from openai import AsyncOpenAI  # OpenRouter is OpenAI-compatible
import json
import re


class AnalysisService:
    def __init__(self, api_key: str):
        self.client = AsyncOpenAI(
            api_key=api_key,
            base_url="https://openrouter.ai/api/v1"
        )

    async def analyze_repo(self, repo_url: str) -> dict:
        # Use Claude Haiku 4.5 with web search
        response = await self.client.chat.completions.create(
            model="anthropic/claude-haiku-4.5",
            messages=[{
                "role": "user",
                "content": ANALYSIS_PROMPT.format(repo_url=repo_url)
            }],
            extra_body={
                "plugins": [{"id": "web", "max_results": 2}]
            }
        )

        # Extract JSON from response
        content = response.choices[0].message.content
        json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
        if json_match:
            return json.loads(json_match.group(1))
        else:
            # Try to parse entire response as JSON
            return json.loads(content)
```
Critical detail: The AI used regex to extract JSON from markdown code blocks.
Why? Because Claude API often wraps JSON in triple backticks:
```json
{
"package": "@example/mcp-server",
...
}
```
This regex pattern made the analysis service robust to formatting variations.
I didn't tell AI to do this. It inferred the problem from the prompt: "Extract structured JSON from LLM response"
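One thing the excerpt above doesn't show is the caching step from the task prompt (results stored in the AnalysisCache table with a 24h TTL). Here's a rough illustration of how that wiring could look at the route level, reusing the AnalysisCache sketch from earlier; the dependency providers and the model's module path are my assumptions, not the project's code.

```python
# Illustrative wiring of POST /api/analyze: check cache, analyze on miss, store.
from datetime import datetime, timezone

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, HttpUrl
from sqlalchemy.ext.asyncio import AsyncSession

from app.services.analysis import AnalysisService   # the service shown above
from app.models.cache import AnalysisCache           # hypothetical module path

router = APIRouter()


class AnalyzeRequest(BaseModel):
    repo_url: HttpUrl


def get_db() -> AsyncSession:            # hypothetical provider; the real app
    raise NotImplementedError            # would wire an async sessionmaker here


def get_analysis_service() -> AnalysisService:   # hypothetical provider
    raise NotImplementedError


@router.post("/api/analyze")
async def analyze(
    request: AnalyzeRequest,
    db: AsyncSession = Depends(get_db),
    service: AnalysisService = Depends(get_analysis_service),
) -> dict:
    repo_url = str(request.repo_url)

    # 1. Cache check (AnalysisCache table, 24h TTL)
    cached = await db.get(AnalysisCache, repo_url)
    if cached is not None and cached.is_fresh(ttl_hours=24):
        return cached.result

    # 2. Cache miss: call Claude via OpenRouter with the web search plugin
    try:
        result = await service.analyze_repo(repo_url)
    except Exception as exc:  # tightened to specific errors later in this post
        raise HTTPException(status_code=502, detail=f"Analysis failed: {exc}") from exc

    # 3. Store the structured JSON for the next request
    db.add(AnalysisCache(
        repo_url=repo_url,
        result=result,
        fetched_at=datetime.now(timezone.utc),
    ))
    await db.commit()
    return result
```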
What Worked vs What Didn't
✅ What Worked
Multi-AI cross-validation:
- Caught the asyncpg vs psycopg3 debate early
- Majority consensus led to correct decision
- Prevented a painful database driver migration later
Context files as external memory:
- Eliminated "AI amnesia" across sessions
- Kept architecture consistent
- Made onboarding new AI sessions trivial
Structured prompts with explicit constraints:
- Type safety enforcement in prompts → zero `any` types
- Linter requirements in prompts → all code passed `ruff check`
- Explicit tech stack → no dependency surprises
Dynamic form generation:
- Single `FormBuilder` component handles any MCP server
- No hardcoding per-service
- Scales to infinite MCP servers
❌ What Didn't Work
AI tried to build too much at once:
commit af021a1
Date: 2025-12-12
work on front end and backend for attempt to complete flow,
next sttepis working on fly.ioo integration
This commit message (with the typo "sttepis") showed AI was rushing. It tried to:
- Build frontend components
- Implement backend analysis
- Start Fly.io deployment
- All in one session
Result: None of it worked properly. Features half-implemented. Integration bugs everywhere.
Lesson: I should've enforced one phase per session. Keep scope tight.
Variable naming still generic in places: Despite my prompts for descriptive names, AI still generated:
```python
# AI's first pass
data = process_data(input_data)
result = calculate_result(data)
```
I had to manually catch these during code review:
```python
# After feedback
analysis_result = analyze_github_repo(repo_url)
deployment_config = generate_deployment_config(analysis_result)
```
Lesson: Even with explicit prompts, code review is non-negotiable.
Missing error handling in happy-path code: The analysis service didn't handle:
- Invalid GitHub URLs
- API rate limits
- Malformed JSON responses
- Network timeouts
I had to prompt: "Add comprehensive error handling for all failure modes"
Only then did AI add try/except blocks and custom exceptions.
Lesson: AI generates happy paths. You must prompt for error cases explicitly.
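For illustration, here's roughly the shape that second pass can take; the custom exception names, the URL check, and the specific `openai` error classes are my assumptions about a reasonable structure, not the code the AI actually produced.

```python
# Illustrative sketch: wrapping the analysis call with explicit failure modes.
import json
import re

import openai


class AnalysisError(Exception):
    """Base class for repo analysis failures."""


class InvalidRepoURLError(AnalysisError):
    """The provided URL is not a GitHub repository URL."""


class AnalysisRateLimitError(AnalysisError):
    """The upstream LLM API rejected the request due to rate limits."""


GITHUB_REPO_RE = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/?$")


async def analyze_repo_safely(service: "AnalysisService", repo_url: str) -> dict:
    # Failure mode 1: invalid GitHub URLs
    if not GITHUB_REPO_RE.match(repo_url):
        raise InvalidRepoURLError(f"Not a valid GitHub repo URL: {repo_url}")

    try:
        return await service.analyze_repo(repo_url)
    # Failure mode 2: API rate limits
    except openai.RateLimitError as exc:
        raise AnalysisRateLimitError("LLM API rate limit hit; retry later") from exc
    # Failure mode 3: network timeouts and connection errors
    except (openai.APITimeoutError, openai.APIConnectionError) as exc:
        raise AnalysisError(f"Network failure talking to the LLM API: {exc}") from exc
    # Failure mode 4: malformed JSON in the model's response
    except json.JSONDecodeError as exc:
        raise AnalysisError(f"LLM returned malformed JSON: {exc}") from exc
```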
Key Metrics After 48 Hours
Lines of Code: ~2,400
- Backend (Python): ~1,200
- Frontend (TypeScript): ~1,200
Time Spent: ~8 hours
- Multi-AI planning: 1 hour
- Creating context structure: 1 hour
- AI code generation: 3 hours
- Code review & validation: 2 hours
- Debugging integration: 1 hour
Manual Coding: 0 lines
- 100% AI-generated code
- My role: architect, reviewer, validator
Quality:
- ✅ Passes `ruff check` with zero warnings
- ✅ Passes `bun run typecheck` with zero errors
- ✅ Type-safe throughout (no `any` types)
- ✅ Dynamic forms working locally
- ❌ No production deployment yet
The Moment I Knew This Would Work
End of Day 2, I ran the frontend locally:
cd frontend
bun run dev
The landing page loaded. I pasted a GitHub URL into the analysis form. Clicked "Analyze."
The backend hit the Claude API via OpenRouter. The web search plugin fetched the repo. The analysis extracted:
```json
{
  "package": "@alexarevalo.ai/mcp-server-ticktick",
  "env_vars": [
    {"name": "TICKTICK_CLIENT_ID", ...},
    {"name": "TICKTICK_CLIENT_SECRET", ...}
  ],
  "tools": ["list-tasks", "create-task", "update-task"],
  "resources": ["ticktick://tasks"],
  "prompts": []
}
```
The frontend auto-generated a credential form with three inputs (the secrets rendered as password fields), help text, and validation.
I hadn't hardcoded any of this. The form adapted dynamically to the analysis result.
This was the proof: AI orchestration could build adaptive, production-quality systems.
Not just CRUD. Not just boilerplate. But intelligent UX that responds to data.
Coming Next
In Part 3, reality hits hard:
- First Fly.io production deployment
- The PostgreSQL driver nightmare (asyncpg fails, migrate to psycopg3)
- Docker CRLF line ending hell on Windows
- Missing dependencies causing crash loops
- MCP Streamable HTTP implementation
- The moment Claude Desktop successfully connected to a remote MCP server
Spoiler: I spent 6 hours debugging PostgreSQL connections. AI generated the infrastructure code in 20 minutes, but infrastructure debugging required deep manual intervention.
Commit References:
- `f5a957a` - Aurora UI (Landing, Dashboard, Forms)
- `af021a1` - First integration attempt (too much scope)
- `a06d684` - Refactored error handling in encryption service
- `4d6b32b` - Credential management foundation
Tools Used:
- Claude Code (primary implementation)
- ChatGPT-4 (architecture validation)
- Google Gemini (cross-validation)
Code:
This is Part 2 of a 7-part series. The code works locally. Now comes the hard part: production.
Previous: ← Part 1: Genesis | Next: Part 3: Production Baptism →
Jordan Hindo
Full-stack Developer & AI Engineer building in public. Exploring the future of agentic coding and AI-generated assets.