romulus@roma-eterna ~ _

Romulus: Building an AI system that remembers.

From a persistent personal assistant to a five-cohort command structure — a case study in what one engineer can do when context never resets.

uptime: 69 days · cohorts: 5 · morning briefs: 69 delivered · products shipped: 4
Romulus — From Personal AI to Multi-Cohort Command

Who Is Romulus

He came online February 3, 2026. Named after the legend of Rome's founding — not as a gimmick, but to clarify what I was trying to build: something with a name, an identity, and a purpose, rather than another stateless chat window.

The problem was straightforward. I work as a senior PM by day and build my own products by night. The gap between "interesting idea" and "shipped prototype" was filled with re-explained context, lost threads, and morning sessions where I had to brief myself on whatever I'd figured out the week before.

Romulus started as an experiment in persistent AI memory: a system that reads from a shared knowledge base (an Obsidian vault) on every session and writes back after. First use case: a morning brief delivered to a private Discord channel at 8 AM — weather, subway timings, product signals from the market. Nothing glamorous. Just a baseline to verify the system was actually running.
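A minimal sketch of that loop, assuming a Discord webhook and a local vault path; the webhook URL, file layout, and script path here are illustrative, not the production setup:

import datetime
import pathlib

import requests  # pip install requests

VAULT = pathlib.Path.home() / "vault"              # hypothetical vault location
WEBHOOK = "https://discord.com/api/webhooks/XXXX"  # placeholder webhook URL

def load_prior_context(today: datetime.date) -> str:
    """Read yesterday's daily log so the brief starts from prior state."""
    note = VAULT / "daily" / f"{today - datetime.timedelta(days=1)}.md"
    return note.read_text() if note.exists() else ""

def write_back(today: datetime.date, brief: str) -> None:
    """Persist the brief so the next session can read it."""
    (VAULT / "daily" / f"{today}.md").write_text(brief)

def main() -> None:
    today = datetime.date.today()
    prior = load_prior_context(today)
    brief = f"Morning brief, {today} ({len(prior)} chars of prior context loaded)"
    requests.post(WEBHOOK, json={"content": brief}, timeout=10)
    write_back(today, brief)

if __name__ == "__main__":
    main()

A crontab entry such as 0 8 * * * python3 /opt/romulus/morning_brief.py covers the 8 AM schedule; the path is likewise hypothetical.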

By week two, the brief was running every morning without intervention. It stopped feeling like a tool and started feeling like a system.

The Throughput Wall

A decade of shipping between day jobs taught me the same thing at every startup: ideas aren't the bottleneck. It's throughput — the time between seeing a problem clearly and having a working prototype.

The current generation of AI tools has a specific weakness: they're stateless. Every new conversation starts from zero. Context that took a session to build evaporates the moment it's over. For solopreneurs working in stolen hours, this means constant overhead just to stay aligned with their own thinking.

The constraint isn't the model. It's the absence of memory.

Design Decisions

01
Discord as interface
Threaded, logged, pocketable. Every brief, command, and response is timestamped and searchable. The channel history doubles as episodic memory before anything needs a vector database.
02
Obsidian as memory substrate
Durable Markdown. No vendor lock. Human-inspectable. Files survive reboots. Context doesn't disappear when the agent does. 83 files, 29 project notes, 31 daily logs, 11 key decisions — dating back to first boot on February 3, 2026.
03
A dedicated host
Stable cron. Zero cold-start. Separate from the daily laptop. A Mac mini named Roma Eterna means scheduled jobs run reliably and context is always preloaded.
04
Match model to task
Not every task deserves the same model. Running an 8 AM morning brief through the most expensive model costs 400x more with identical output quality. Match capability and cost to the job.
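A sketch of what that match can look like in code; tier names and per-1k-token prices are placeholders, not real rates:

MODEL_TIERS = {
    # Task class -> cheapest model that clears the quality bar.
    "scheduled_brief": {"model": "small-fast", "usd_per_1k_tokens": 0.0005},
    "research":        {"model": "mid-tier",   "usd_per_1k_tokens": 0.003},
    "build_review":    {"model": "frontier",   "usd_per_1k_tokens": 0.02},
}

def pick_model(task_type: str) -> str:
    # Unknown task classes fall back to the cheapest tier.
    return MODEL_TIERS.get(task_type, MODEL_TIERS["scheduled_brief"])["model"]

print(pick_model("scheduled_brief"))  # small-fast: the 8 AM brief never needs frontier pricing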

Memory

83 files. 12 core memory documents. 29 project notes. 31 daily logs. 11 key decisions.

Chat history is ephemeral. Obsidian files are not. The system retrieves relevant past work before every session — comparing context, flagging contradictions, loading project state. After 69 days, the knowledge graph compounds. What was an empty vault is now a living map of 400+ tasks, shipped products, failures, and lessons.

The system stores context in two places, by design. Obsidian holds the human-readable layer: project notes, daily logs, decisions — things I can open, edit, and reason about at any time. SQLite holds the machine-readable layer: 36 entities, 33 relationships, task histories, build reports — structured data that Caesar's workers load as context before they start. The Obsidian vault provides qualitative memory (what happened, decisions made, lessons learned). The SQLite graph provides quantitative memory (relationships, dependencies, outcomes). Together they form a knowledge base that's both human-inspectable and machine-actionable.
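A sketch of the pre-session load under those assumptions; the table names and vault layout are inferred for illustration, not taken from the actual schema:

import pathlib
import sqlite3

VAULT = pathlib.Path.home() / "vault"   # human-readable layer
DB_PATH = "legion.db"                   # machine-readable layer (hypothetical filename)

def load_session_context(project: str) -> dict:
    # Qualitative memory: the project's markdown note, verbatim.
    note = VAULT / "projects" / f"{project}.md"
    qualitative = note.read_text() if note.exists() else ""

    # Quantitative memory: everything the graph links to this project.
    con = sqlite3.connect(DB_PATH)
    rows = con.execute(
        """SELECT r.relation, e2.name
             FROM relationships r
             JOIN entities e1 ON r.source_id = e1.id
             JOIN entities e2 ON r.target_id = e2.id
            WHERE e1.name = ?""",
        (project,),
    ).fetchall()
    con.close()
    return {"note": qualitative, "graph": rows}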

What Broke — Day 11

BUG

The morning brief shipped with stop ID R16. The feed returned plausible-looking data — 12-minute travel time from Prospect Ave to Canal St. It looked right.

R16 is Times Square. We were tracking the wrong station entirely for 11 days.

Fix: Downloaded the full static GTFS dataset, cross-referenced stops.txt, found R34N (Prospect Ave northbound) and R23N (Canal St northbound). Patched the script. Correct travel time: 26 minutes.

Plausible data isn't verified data. Always cross-reference against a second source before declaring production-ready.
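That verification step is cheap to automate. A sketch of the cross-reference against stops.txt (stop_id and stop_name are required GTFS columns; exact stop_name strings vary by feed):

import csv

def find_stops(stops_txt: str, name_fragment: str) -> list[tuple[str, str]]:
    """Return (stop_id, stop_name) pairs whose name contains the fragment.

    stops.txt ships with every static GTFS feed; stop_id and stop_name
    are required columns in the spec.
    """
    with open(stops_txt, newline="", encoding="utf-8-sig") as f:
        return [
            (row["stop_id"], row["stop_name"])
            for row in csv.DictReader(f)
            if name_fragment.lower() in row["stop_name"].lower()
        ]

# find_stops("gtfs/stops.txt", "Prospect Av") surfaces R34N directly,
# and makes it obvious that R16 belongs to a different station.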
romulus@roma-eterna legion pipeline

The Extension: Legion

Running one AI with persistent memory was useful. But it still required one person to initiate each step: research, spec, build, deploy, analyze. The bottleneck shifted from context to bandwidth.

Legion extends the architecture from one system to five — each specialized in a different function, sharing the same memory layer, orchestrated through a single routing system. The model: decompose a prompt, route it to the right cohort, chain the outputs together automatically.

Caesar — intelligence. Research, competitive analysis, market signals.
Augustus — execution. Architecture review, build, and deployment.
Vespasian — revenue. Pricing models, unit economics, exit scenarios.
Trajan — distribution. GTM strategy, content planning, launch sequencing.
Romulus — orchestrator. Task decomposition, routing, quality gates.

A critical mechanic: every transition between cohorts is gated. Caesar's research brief doesn't automatically flow to Augustus — it gets reviewed first. Augustus doesn't deploy without a review cycle. Vespasian doesn't run unless the build succeeded. These gates aren't manual approvals — they're programmed checkpoints that validate outputs against structured criteria before the next phase triggers. This is what distinguishes Legion from a chained prompt: each stage has its own quality bar, and the chain breaks if a stage fails.
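A minimal sketch of that gating logic, with cohorts reduced to stubs and the gate criteria shown as stand-ins for the real structured checks:

from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[str], str]     # the cohort's work
    gate: Callable[[str], bool]   # programmed checkpoint, not a manual approval

def run_pipeline(prompt: str, stages: list[Stage]) -> str:
    payload = prompt
    for stage in stages:
        payload = stage.run(payload)
        if not stage.gate(payload):
            # The chain breaks here; nothing downstream fires.
            raise RuntimeError(f"gate failed after {stage.name}")
    return payload

# Stub cohorts with stand-in gate criteria:
def research(p): return p + "\n[caesar: competitive brief]"
def build(p):    return p + "\nBUILD OK"
def price(p):    return p + "\n[vespasian: margin model]"

stages = [
    Stage("caesar",    research, lambda out: len(out) > 20),
    Stage("augustus",  build,    lambda out: "BUILD OK" in out),
    Stage("vespasian", price,    lambda out: "margin" in out),
]
print(run_pipeline("ship an asteroid shooter", stages))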

Before: you → one tool, sequential.
After: you → five cohorts, parallel.

🐺 ROMULUS · OPERATIONAL · ORCHESTRATOR
"What's the mission, what's the priority, and who runs point right now?"
Workers: task-decomposer · router · gate-enforcer · synthesizer
Output: Task routing, pipeline orchestration, cross-cohort synthesis

🦅 CAESAR · OPERATIONAL
"What information is missing before we decide whether to build?"
Workers: researcher · analyst · competitor · synthesizer · monitor
Output: Competitive landscapes, pricing intelligence, strategic gap analysis

🏛 AUGUSTUS · OPERATIONAL
"What does the spec say, and what needs to be built first?"
Workers: architect · builder · reviewer · deployer
Output: Shipped code, deployed apps, architecture specs

💰 VESPASIAN · OPERATIONAL
"What do the unit economics look like at scale?"
Workers: pricing-analyst · unit-economist · exit-modeler
Output: Pricing models, unit economics, revenue strategy

🛣 TRAJAN · OPERATIONAL
"Where does this audience live, and what's the fastest way to reach them?"
Workers: strategist · content-writer · partnerships
Output: Launch plans, GTM strategy, content assets

The Pipeline

A prompt enters. The system classifies, routes, and chains five cohorts automatically. Each stage has a quality gate. The total cycle time is measured and logged.
⚔️ Caesar 2m 23s → Gate 10s → 🏛 Augustus 5m 42s → Gate 5s → 💰 Vespasian 48s → Gate 2s → 🛣 Trajan 31s → 🔥 Total 7m 45s
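A sketch of how per-stage timing can be captured and summed; the logging format is illustrative:

import time

def timed(stage_name: str, fn, payload: str, log: list) -> str:
    """Run one stage and record how long it took."""
    t0 = time.monotonic()
    out = fn(payload)
    log.append((stage_name, time.monotonic() - t0))
    return out

def fmt(seconds: float) -> str:
    m, s = divmod(int(seconds), 60)
    return f"{m}m {s:02d}s"

log: list[tuple[str, float]] = []
timed("caesar", lambda p: p.upper(), "demo payload", log)  # stand-in stage
total = sum(d for _, d in log)
print(f"Total {fmt(total)}")  # the "Total 7m 45s" line comes from a sum like this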
~/legion/experiments/01-caesar-speed.md

Caesar — Research Speed

Control: ~3 hours of manual competitive research, web searches, pricing page audits, and synthesis.
Result: Five specialized workers produce a ranked competitive landscape with pricing intelligence and strategic gaps in under three minutes.
2.4 min vs ~3 hours manual
In under three minutes, five specialized workers produced a competitive landscape that would have required hours of manual research. This brief directly informed the decision to prioritize Caveat over three other ideas. Caveat shipped three weeks later.
Insight: Specialized agents don't outperform generalists because the underlying models are better. They perform better because the context and output constraints are tuned for one specific question.
~/legion/experiments/02-memory-compound.md

Memory Compounding

Run 1 (Forbidden): 0 entities in the knowledge graph. Every task starts blind.
Run 3 (Alexa Portfolio): 36 entities, 33 relationships. The system loaded prior context from Forbidden, Chronicle, and Legion's own architecture notes before Caesar's workers even started.
36 entities · 33 relationships
The difference between the first task and the third was that the system had 36 prior data points from two completed projects. By the sixth project, Caesar was surfacing competitive context I hadn't explicitly asked for.
Insight: The knowledge graph only becomes useful after a threshold — roughly three projects for our dataset. Before that, it's overhead. After that, it pays for itself.
~/legion/experiments/03-synth-fix.md

The Synthesizer Fix

Problem: The synthesizer worker was taking 21 minutes per run. It was outputting the full raw research from every upstream worker — thousands of words of unfiltered data dump.
Fix: A "KEY FINDINGS" extraction constraint in the system prompt. Output only the 5–8 most important findings. No fluff. No restating the question.
48 seconds, down from 21 min
The fix was a single constraint in the system prompt: extract only the 5–8 most important findings. Output shrank to a few paragraphs; processing time fell from 21 minutes to 48 seconds.
Insight: Most prompt engineering is subtraction. The synthesizer wasn't failing because it lacked instruction — it was failing because it had too much to work with.
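The constraint is small enough to show in full spirit. This is a paraphrase of the idea, not the production prompt:

SYNTHESIZER_SYSTEM_PROMPT = """\
You are a synthesizer. You receive raw research from multiple upstream workers.

OUTPUT RULES:
- Output ONLY a section titled KEY FINDINGS.
- List the 5-8 most important findings, one line per finding.
- Do not restate the question. Do not reproduce raw source material.
- Drop anything that is not decision-relevant.
"""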
~/legion/experiments/04-voidbreaker.md

VOIDBREAKER — One-Shot Build

Prompt: "Build a web-based asteroid shooter with retro CRT aesthetics, mobile touch controls, and a void-break mechanic."
Result: Touch joystick. Particle effects. Game loop. Deployed to GitHub Pages.
1,047 lines · one prompt, one build
One prompt, one build session, deployed. The build went through review before shipping and passed on the first attempt.
Insight: The important part isn't the output size — it's that Caesar's competitive research on the game genre was already loaded as context before Augustus touched a single file. The pipeline works end to end.

What Broke — Legion Era

TIMEOUT
iOS Build Timeout

Augustus uses Claude Code CLI to build native Swift apps. The sandbox has a 15-minute timeout. Scaffolding a real iOS project with SwiftUI, Speech framework, and AI integration takes 20+ minutes. Architectural limitation, not a fixable bug. Workaround: generate key files directly and scaffold manually.

Bot-to-Bot Mentions Blocked

Discord restricts bot-to-bot triggers: the cohort bots can't reliably @mention each other for handoffs. Workaround: database-driven event routing with polling.
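A sketch of that workaround, assuming a simple events table; the filename, schema, and poll interval are illustrative:

import sqlite3
import time

DB_PATH = "legion.db"  # shared event bus (hypothetical filename and schema)

def claim_events(bot_name: str) -> list[tuple[int, str]]:
    """Claim pending events addressed to this cohort bot, then mark them."""
    con = sqlite3.connect(DB_PATH)
    with con:  # read and mark inside one committed transaction
        rows = con.execute(
            "SELECT id, payload FROM events WHERE target = ? AND status = 'pending'",
            (bot_name,),
        ).fetchall()
        con.execute(
            "UPDATE events SET status = 'claimed' WHERE target = ? AND status = 'pending'",
            (bot_name,),
        )
    con.close()
    return rows

def handle(payload: str) -> None:  # stand-in for the cohort's real handler
    print("handling:", payload)

while True:  # each bot polls instead of waiting for an @mention that never arrives
    for _event_id, payload in claim_events("augustus"):
        handle(payload)
    time.sleep(5)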

Alias Matching Contamination

The memory alias matcher for project notes was too loose. "Asteroid game" matched "party game" via the keyword "game," loading Forbidden.md as context for an unrelated build. Fix: word-boundary regex and expanded stop words.
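The fix in regex terms, as a sketch; the alias and stop-word lists shown are examples, not the real ones:

import re

STOP_WORDS = {"game", "app", "project"}  # example list: too generic to match on

def alias_matches(alias: str, text: str) -> bool:
    """Whole-word matching only, and stop words can't carry a match alone."""
    tokens = [t for t in alias.lower().split() if t not in STOP_WORDS]
    if not tokens:
        return False  # "game" by itself no longer matches anything
    return all(re.search(rf"\b{re.escape(t)}\b", text.lower()) for t in tokens)

print(alias_matches("party game", "build an asteroid game"))     # False
print(alias_matches("asteroid game", "build an asteroid game"))  # True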

What's Next

The current version works for research and build workflows. The next phase is connecting the pipeline to live business data and automating distribution.

Revenue Tracking
Vespasian tracking actual Stripe revenue — not modeling hypothetical MRR, but watching real numbers move.
Auto-Posting
Trajan auto-posting to X — the launch assets it produces are already write-ready. The missing piece is the posting API.
Ambient Monitoring
Caesar ambient monitoring — flagging when a competitor changes pricing, ships a feature, or gets mentioned in a relevant community.
LEGIO Dashboard
A LEGIO command dashboard replacing the Discord terminal. Real-time cohort status. Active pipeline tasks. Memory graph visualization.
The right model for AI-assisted product work isn't a single brilliant assistant. It's a disciplined force with clear specializations, shared memory, and a single command structure.

What We've Learned

The system works because it remembers. Everything else — the cohorts, the pipeline, the routing — is infrastructure built on top of persistent memory. Without the Obsidian vault as a shared knowledge base, none of this would be possible. The lesson: if your AI system doesn't have memory, it doesn't have a foundation.

The most important design decisions were about what to restrict, not what to enable. Limiting each cohort to its specific question. Restricting synthesizers to KEY FINDINGS only. Keeping humans in the approval loop for build specs. Constraints make agents productive.

The system has been running for 69 days. It ships on schedule, runs every morning, and knows every project I've touched. The next test is scaling it — not in complexity, but in impact. Can this architecture support multiple concurrent projects? Can it connect directly to business metrics? Those are the questions for the next phase.

What We Shipped

These projects either shipped through the Legion pipeline or were built alongside Romulus. All of them benefited from persistent memory and the research-to-build loop.

LIVE
Chronicle
thischronicle.com
A daily history guessing game. One event per day, three chances to guess the year. Wordle-style digit feedback across five centuries. 90 hand-curated puzzles.
Next.js + Tailwind + Vercel
LIVE
Caveat
trycaveat.com
AI contract analyzer. Upload a PDF, get a risk report in 60 seconds. Built on Next.js 15 + Stripe + GPT-4o. Privacy-first — contracts are never stored.
Next.js + Stripe + OpenAI
LIVE
Forbidden
playforbidden.com
Taboo-style party game with AI-generated themed card decks. Pressure rounds, multiplayer scoring. Designed for groups, played in two minutes.
Next.js + Stripe + Vercel
LIVE
BiblePath
apps.apple.com/us/app/my-bible-path
iOS app for daily scripture reading and habit building. Clean, focused, local-first design.
Swift + SwiftUI