Agency-OS: A White-Label AI Operations Platform for Agencies

The short version

Agency-OS is a deployable AI operations platform that agencies install on their own infrastructure, brand as their own product, and run for their clients.
White-label by default — one branding.json file controls the entire visual surface and editorial voice. No code changes to re-skin.
Managed loops are the core primitive: scheduled workflows with a human approval gate and learning capture. Two ship in v1: Client Narrative and Raw Materials.
Skills are markdown files. Each agency’s playbook lives in /skills/<agency>/*.md — not in code — so adding a workflow is one file, no engineer required.
Per-deploy topology — one process, one SQLite, one branding.json per agency. No multi-tenant SaaS until 5+ paying customers ask for it.
Pricing: £3-5K setup + £1.5-3K/month. Agency owns their Anthropic billing. We provision and maintain.

Most agencies are using AI the way most people use a microwave: heat something up, send it out, hope it’s warm enough. ChatGPT for first drafts. A copy of Notion AI on the side. Maybe Lindy or n8n for an automation or two. The work happens in the cracks between tools, in the heads of two senior people, in spreadsheets nobody updates.

That’s the problem agency-os solves. Not by adding another tool, but by giving an agency a single deployable that is theirs — their brand, their playbooks, their data — and that runs the recurring loops that actually move the needle for clients.

What agency-os actually is

Three things, glued together cleanly:

1. A brand layer

Every visual and editorial decision in the app reads from a single branding.json file at boot. Palette, fonts, logo, favicon, OG image, app name, login copy, tone of voice, banned phrases, signature style. Drop in your file, restart, every page and every email is now the agency’s product, not ours.

For a luxury PR agency this looks like cream + ink + gold, EB Garamond serif headers, no emojis, NYT-Style-section voice. For a perf-marketing shop it’s the opposite: louder palette, sharper voice, tighter copy. Same product. Same code. Different file.

The voice tokens (voice.tone, voice.bannedPhrases, voice.signatureStyle) are read by the AI agents at runtime, so the drafts the platform produces sound like the agency, not like ChatGPT pretending to.
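
As a rough sketch of how those voice tokens might reach an agent, assuming a branding.json shape we have invented for illustration (only voice.tone, voice.bannedPhrases and voice.signatureStyle are named above; the Branding interface, appName key and both function names are hypothetical):

```typescript
// Hypothetical shape: only the three voice.* tokens are named in the post.
interface Branding {
  appName: string;
  voice: {
    tone: string;
    bannedPhrases: string[];
    signatureStyle: string;
  };
}

// Turn the voice tokens into an instruction block an agent prompt could include.
function voiceInstructions(branding: Branding): string {
  const { tone, bannedPhrases, signatureStyle } = branding.voice;
  const lines = [`Write in this tone: ${tone}.`];
  if (bannedPhrases.length > 0) {
    const quoted = bannedPhrases.map((p) => `"${p}"`).join(", ");
    lines.push(`Never use these phrases: ${quoted}.`);
  }
  lines.push(`Sign off in this style: ${signatureStyle}.`);
  return lines.join("\n");
}

// Cheap post-generation check: which banned phrases slipped into a draft?
function findBannedPhrases(draft: string, branding: Branding): string[] {
  const lower = draft.toLowerCase();
  return branding.voice.bannedPhrases.filter((p) => lower.includes(p.toLowerCase()));
}

// Illustrative values only.
const example: Branding = {
  appName: "Example Agency",
  voice: {
    tone: "measured, editorial, no emojis",
    bannedPhrases: ["game-changer", "delve"],
    signatureStyle: "first name only",
  },
};

console.log(voiceInstructions(example));
console.log(findBannedPhrases("This launch is a Game-Changer.", example));
```

The second function is there because instructions alone are not a guarantee: a plain string check on the draft is a cheap backstop before the human gate.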

2. The loops primitive

A loop is a workflow that runs on a cadence, has a human approval gate, and captures learning. That last bit matters — loops compound. Each run feeds learnings into the next.

Concretely:

Inputs (collectors): structured pulls from GA4, HubSpot, Slack, Gmail, Gong, review sources, competitor pages, etc. Per-agency configurable.
Agents: chained LLM calls in roles — analytics, narrative, QA — that turn raw inputs into a draft output.
Human gate: account manager edits and approves before anything ships. Always. No silent send.
Outputs: branded email, Slack post, Notion doc, or Google Sheet update.
Learning: what worked, what didn’t, what was missing — saved into loop_learnings and fed into the next run’s prompt.

The AM’s job becomes editor, not author. The agency keeps its taste; the AI does the prep.
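
The gate and the learning hand-off can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the platform's actual runner: the LoopRun type, the status names and every function name here are hypothetical, and the learning kinds are borrowed from the skill frontmatter.

```typescript
type LearningKind = "winner" | "loser" | "gap";

interface Learning {
  kind: LearningKind;
  note: string;
}

interface LoopRun {
  draft: string;
  status: "awaiting_approval" | "approved" | "delivered";
  approvedBy?: string;
}

// Prior learnings are appended to the next run's prompt.
function buildPrompt(basePrompt: string, learnings: Learning[]): string {
  if (learnings.length === 0) return basePrompt;
  const notes = learnings.map((l) => `- [${l.kind}] ${l.note}`).join("\n");
  return `${basePrompt}\n\nLearnings from previous runs:\n${notes}`;
}

// Agent output always starts life as a draft awaiting a human.
function createRun(draft: string): LoopRun {
  return { draft, status: "awaiting_approval" };
}

// The account manager may edit the draft while approving it.
function approve(run: LoopRun, approver: string, editedDraft?: string): LoopRun {
  return { draft: editedDraft ?? run.draft, status: "approved", approvedBy: approver };
}

// Delivery refuses anything that has not passed the gate: no silent send.
function deliver(run: LoopRun, send: (body: string) => void): LoopRun {
  if (run.status !== "approved") {
    throw new Error(`Cannot deliver a run in status "${run.status}"`);
  }
  send(run.draft);
  return { ...run, status: "delivered" };
}

// Usage with an in-memory "outbox" standing in for a real deliverer.
const sent: string[] = [];
const run = createRun("Bookings up 18% WoW.");
const approved = approve(run, "am@example.com", "Bookings rose 18% week over week.");
const delivered = deliver(approved, (body) => sent.push(body));
console.log(delivered.status, sent);
```

Putting the check inside deliver, rather than trusting callers to remember it, is what makes "no silent send" a property of the code instead of a convention.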

3. The skills directory

Each agency’s playbook lives in /skills/<agency>/*.md as plain markdown files with YAML frontmatter. One file = one workflow. Adding a new workflow means adding one file. No code change. No deploy. The runner picks it up on the next loop tick.

A skill file looks like this:

---
name: weekly-client-narrative
description: Synthesize last week's perf into a client-ready narrative
inputs:
  - { kind: ga4, required: true }
  - { kind: hubspot_pipeline, required: false }
  - { kind: slack_threads, required: false }
agents:
  - { role: analytics, model_slot: extraction }
  - { role: narrative, model_slot: workflow_default }
  - { role: qa, model_slot: auto_title }
human_gate: required
outputs:
  - { kind: email_draft, template: client_narrative }
learning_kinds: [winner, loser, gap]
---

# Weekly Client Narrative

## When to use
Every Monday morning, summarize the prior week for each active client.

## What good output looks like
- Opens with the single most important change ("Bookings up 18% WoW")
- Names the cause if known, names the unknown if not
- One decision the client needs to make this week
- Two paragraphs max

## What needs human approval
Always. Account manager edits the draft, then approves.

## What gets saved as learning
- Which framings the client engaged with (replied to, forwarded)
- Which framings they ignored
- Recurring objections or questions

That’s the entire workflow definition. Prompt, role mix, inputs, outputs, learning hooks — all of it. Codified expertise, version-controlled, editable by anyone on the agency team who knows how to use Git or even just a markdown editor.
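
For a sense of what a runner might do with such a file, here is a hedged sketch that separates the frontmatter from the markdown body and sanity-checks the parsed fields. The interface mirrors the frontmatter keys shown above; the function names and validation rules are our assumptions, and the YAML parsing itself is left to whichever library the runner uses.

```typescript
// Field names mirror the frontmatter above; everything else is hypothetical.
interface SkillFrontmatter {
  name: string;
  description: string;
  inputs: { kind: string; required: boolean }[];
  agents: { role: string; model_slot: string }[];
  human_gate: string;
  outputs: { kind: string; template: string }[];
  learning_kinds: string[];
}

// Split a skill file into its raw frontmatter text and its markdown body.
function splitSkillFile(source: string): { frontmatter: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) throw new Error("Skill file is missing YAML frontmatter");
  return { frontmatter: match[1] ?? "", body: (match[2] ?? "").trim() };
}

// Return a list of problems; an empty list means the skill looks runnable.
function validateSkill(skill: SkillFrontmatter): string[] {
  const problems: string[] = [];
  if (skill.name.trim() === "") problems.push("name is empty");
  if (skill.agents.length === 0) problems.push("no agents defined");
  if (skill.outputs.length === 0) problems.push("no outputs defined");
  if (skill.human_gate !== "required") problems.push("human_gate must be 'required'");
  return problems;
}

const parts = splitSkillFile("---\nname: demo\n---\n\n# Demo\nBody text");
console.log(parts.frontmatter); // the raw YAML, still unparsed
console.log(parts.body);
```

Validating at load time means a typo in a skill file surfaces as a readable message rather than as a loop that quietly never runs.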

The two loops that ship in v1

Loop A — Client Narrative

Every Monday at 8:00 a.m., for every active client, the platform pulls the prior week’s GA4 metrics, HubSpot pipeline, and Slack threads. The analytics agent surfaces anomalies against a 4-week baseline. The narrative agent turns those anomalies into two short paragraphs in the agency’s editorial voice. The QA agent checks every claim has data backing it.

The draft lands in the AM’s inbox by 8:30. They edit, approve, and the platform sends a branded email to the client with the agency’s logo, palette, and signature.

What the agency notices: the AM goes from spending 90 minutes per client per week on the “weekly update” to spending under 10 minutes editing a draft. Across 15 clients, that is roughly 22 hours of writing a week cut to under 3 hours of editing.
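
The baseline comparison in the analytics step is simple to picture. A minimal sketch, assuming weekly totals per metric and a percentage threshold; the 15% cut-off, the data shape and the function name are illustrative assumptions, not the platform's documented behaviour:

```typescript
interface Anomaly {
  metric: string;
  current: number;
  baseline: number;
  changePct: number;
}

// Compare the latest week with the mean of the four weeks before it.
// Each array is ordered oldest first; the last entry is the week being reported.
function findAnomalies(history: Record<string, number[]>, thresholdPct = 15): Anomaly[] {
  const anomalies: Anomaly[] = [];
  for (const metric of Object.keys(history)) {
    const values = history[metric] ?? [];
    if (values.length < 5) continue; // need 4 baseline weeks plus the current one
    const current = values[values.length - 1] as number;
    const baselineWeeks = values.slice(-5, -1);
    const baseline = baselineWeeks.reduce((sum, v) => sum + v, 0) / baselineWeeks.length;
    if (baseline === 0) continue; // percentage change is undefined
    const changePct = ((current - baseline) / baseline) * 100;
    if (Math.abs(changePct) >= thresholdPct) {
      anomalies.push({ metric, current, baseline, changePct });
    }
  }
  return anomalies;
}

// Made-up numbers: bookings jump against a flat baseline, sessions barely move.
const weekly: Record<string, number[]> = {
  bookings: [100, 100, 100, 100, 118],
  sessions: [2000, 2100, 1900, 2000, 2040],
};
console.log(findAnomalies(weekly));
```

With these numbers only bookings is flagged, which is the kind of single headline change the narrative agent is meant to open with.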

Loop B — Raw Materials

This loop continuously updates a per-client knowledge base from sales calls (Gong/Fathom), support emails, public reviews, and competitor pages. Three agents extract pain, language, objections, proof, and competitor claims into structured records, dedupe near-duplicates, and maintain a living “raw materials brief” per client.

That brief becomes the substrate for every other loop — ad copy, email sequences, sales decks, content briefs all start from the same source of customer truth, not from a copywriter’s memory.
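
Near-duplicate removal can be done in many ways. As one illustration, here is a sketch using token-set Jaccard similarity; this is a technique we are swapping in for the example, not a description of what the agents actually do, and the record shape, threshold and function names are all hypothetical.

```typescript
interface RawMaterial {
  kind: "pain" | "language" | "objection" | "proof" | "competitor_claim";
  text: string;
  source: string;
}

// Lowercased alphanumeric tokens, as a set.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Keep the first record of each near-duplicate group within the same kind.
function dedupe(records: RawMaterial[], threshold = 0.7): RawMaterial[] {
  const kept: { record: RawMaterial; tokens: Set<string> }[] = [];
  for (const record of records) {
    const t = tokens(record.text);
    const isDuplicate = kept.some(
      (k) => k.record.kind === record.kind && jaccard(k.tokens, t) >= threshold,
    );
    if (!isDuplicate) kept.push({ record, tokens: t });
  }
  return kept.map((k) => k.record);
}

// The first two records share 4 of 5 distinct tokens, so the second is dropped.
const deduped = dedupe([
  { kind: "pain", text: "Onboarding takes too long", source: "sales call" },
  { kind: "pain", text: "onboarding takes far too long", source: "review" },
  { kind: "pain", text: "Pricing is unclear", source: "support email" },
]);
console.log(deduped.map((r) => r.text));
```

Word overlap misses paraphrases ("setup is slow" versus "onboarding drags"), which is one reason an LLM or embedding step might sit in front of a check like this.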

How it deploys

This part matters because it’s where most “AI for agencies” pitches break.

Agency-os is not multi-tenant SaaS. Each agency gets their own deployment — their own server (or a shared box with isolated dirs), their own subdomain, their own SQLite database, their own branding.json, their own Anthropic billing. We provision the first version. The agency owns the keys after that.

Why this shape:

Data sovereignty. Client data lives on the agency’s box, not in a SaaS vendor’s shared DB. Clients increasingly ask for that — especially anyone in regulated industries.
No vendor lock-in for the agency. The branding, the skills, the data — all of it portable. If the agency wants to bring it in-house in year two, the migration is moving a directory.
Anthropic billing on the agency’s account. They control spend, they own the usage history, and the unit economics make sense from day one. We don’t mark up tokens.
Scales by adding deploys, not by re-architecting. Multi-tenancy gets built when 5+ paying agencies ask for shared resources. Not before.

The architecture, briefly

For the technically curious. Skip if you don’t care.

Backend: Express + better-sqlite3 + TypeScript ESM
Frontend: Vite + React 19 + Tailwind v4
Database: SQLite (one per deploy, encrypted at rest)
LLM: Anthropic Claude (OAuth via Claude CLI subprocess, or direct API key) + Gemini for image generation
Schedule: node-cron v4
Encryption: AES-256-GCM, master key from APP_SECRET
Hosting: Hetzner CAX11 (~€7/month per agency) + Caddy + systemd. Backups to a separate bucket nightly.
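
For the encryption line, a minimal sketch with Node's built-in crypto module shows the general pattern for field-level AES-256-GCM. The scrypt key derivation, the salt and the payload layout are our assumptions for illustration; the list above only says the master key comes from APP_SECRET.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Derive a 32-byte key from the secret (assumed approach, not confirmed by the post).
function deriveKey(appSecret: string, salt: Buffer): Buffer {
  return scryptSync(appSecret, salt, 32);
}

// Payload layout (an illustrative choice): 12-byte IV | 16-byte auth tag | ciphertext, base64.
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit IV, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decrypt(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the payload was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// The fallback secret and salt are placeholders for local experimentation only.
const key = deriveKey(process.env.APP_SECRET ?? "dev-only-secret", Buffer.from("per-deploy-salt"));
const stored = encrypt("hubspot-token-123", key);
console.log(decrypt(stored, key));
```

GCM is authenticated encryption, so a modified payload fails loudly on decrypt instead of returning garbage.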

Three pluggable registries are the “OS” part — collectorRegistry (data sources), agentRegistry (LLM roles), delivererRegistry (outputs). Adding a new collector is one file in the registry; the loop runner doesn’t change.
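
A registry of this kind can be very small. The sketch below is one possible shape, assuming each entry is keyed by a kind string like the ones used in the skill frontmatter; the Registry class, the Collector interface and its collect signature are hypothetical, and the GA4 collector is a stub that makes no real API call.

```typescript
// A collector pulls structured data for one client (hypothetical signature).
interface Collector {
  kind: string;
  collect(clientId: string): Promise<Record<string, unknown>>;
}

// Generic keyed registry; the same class could back agents and deliverers too.
class Registry<T extends { kind: string }> {
  private items = new Map<string, T>();

  register(item: T): void {
    if (this.items.has(item.kind)) {
      throw new Error(`Duplicate registration for kind "${item.kind}"`);
    }
    this.items.set(item.kind, item);
  }

  get(kind: string): T {
    const item = this.items.get(kind);
    if (!item) throw new Error(`No handler registered for kind "${kind}"`);
    return item;
  }

  kinds(): string[] {
    return Array.from(this.items.keys());
  }
}

const collectorRegistry = new Registry<Collector>();

// Adding a data source is one registration; lookup code stays the same.
collectorRegistry.register({
  kind: "ga4",
  collect: async (clientId) => ({ clientId, sessions: 0 }), // stub data only
});

console.log(collectorRegistry.kinds());
```

Because a skill file refers to inputs by kind, a loop runner built this way only ever calls get(kind) and never needs to know which collectors exist.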

Who it’s for (and who it isn’t)

Good fit:

Agencies in the 5-50 person range running 10+ active client accounts.
Account managers spending hours per week on report-writing, client updates, or research that follows the same shape every time.
Agencies that have a real point of view on how their work should be done — opinions encoded as SOPs, style guides, or just senior-people-knowing-things.
Agencies who want to look like a platform to their clients without becoming a SaaS company.

Bad fit:

Solo operators — the £3K setup + monthly is overkill for one person. Use a stack of off-the-shelf tools.
Agencies under 10 active clients — the loops compound on volume; under that you don’t feel the leverage.
Agencies that haven’t codified any playbook — if there’s no agency POV to encode, the platform just gives generic AI output, badly branded.
Anyone wanting a self-serve sign-up. We’re manually onboarding the first 5-10 customers. That’s deliberate.

The bigger pattern (this is for SMB readers)

Most readers of this blog are SMBs: 10-500 person companies trying to figure out custom AI. Agency-os isn’t for you directly. But the shape of it is exactly the shape of what custom AI looks like when it’s done right for any business, not just an agency.

It’s yours. Your brand on the surface, your data in the database, your billing on the API. Not rented from a SaaS vendor at $99/seat.
It encodes your playbook. Not the generic average from training data. Your tone, your decisions, your edge cases — in markdown files anyone on the team can edit.
Humans stay in the loop where it matters. The AI does the prep, your team does the judgment. No silent sends, no AI-mediated client relationships.
It compounds. Every run captures what worked and what didn’t, and the next run is a little smarter. SaaS tools can’t do this — their model is the same for everyone.
It deploys to a box you can move. No lock-in. If you want to take it in-house in year two, you’re moving a directory, not migrating off a SaaS vendor.

Agency-os happens to be for agencies. The same architecture (brand layer, custom workflows, human-gated loops, learning capture) is what we build for SMBs in finance, real estate, e-commerce, and trades. The names change (a loop becomes a “daily AP review” or a “listing assistant”) but the shape is the same.

What’s on the roadmap

Locked in v1: brand layer, loops primitive, skills directory, two shipped loops (Client Narrative + Raw Materials).

Backlog, not in v1, in priority order:

Conversion loop — needs analytics + CRM + form-tracking integrations.
Sales enablement loop — needs deeper Gong/Fathom + CRM coverage.
Creative testing loop — ad account integrations (Meta Ads, Google Ads).
Demand capture loop — SEO data via Search Console + optionally Ahrefs/Semrush API.
Learning meta-loop — cross-loop pattern mining (e.g. winning hooks library). Only worth building when there are 5+ live loops producing signal.
Multi-tenant SaaS — only after 5+ paying agencies on per-deploy.
Skills marketplace — agencies share/sell SKILL.md files. Far future.

The pattern: ship the primitives, prove them with two real loops, then add domain-specific loops as paying customers ask. We’d rather have one agency loving the Client Narrative loop than five agencies half-using ten loops.

Pricing and how to start

Setup: £3-5K. Covers brand config (we work from your existing brand book), deploy on your subdomain, the first two loops authored from your existing SOPs, AM training.

Monthly: £1.5-3K. Covers hosting, ongoing skill iteration, monitoring, support. Anthropic API usage is on your billing — we don’t mark up tokens.

What you provide: a subdomain you control, an Anthropic API key (or Claude Max OAuth), and an initial conversation about your existing SOPs that becomes the seed for your first two SKILL.md files.

What you get in week 1: a deployed instance at your subdomain, branded, with your first loop running on test data. Weeks 2-3: production data, AM trained, weekly cadence locked in. Week 4 onwards: ongoing iteration based on what the loop is teaching you.

Live demo: agency.aimakers.co. The default brand is the editorial-luxury preset (cream + gold + serif). To see what it looks like with your colours, send your brand book and we’ll re-skin it for the demo.

Why we’re shipping this

Two reasons.

One: we’ve built versions of this for ourselves and for clients half a dozen times. Every time, the same architecture re-emerges — brand layer, scheduled loops, human approval, learning capture. It’s the right shape. Time to make it a product.

Two: the agencies we know are stuck between two bad options — SaaS that’s built for the average customer and doesn’t fit their work, or in-house dev that takes 18 months and a team they can’t afford. Agency-os is the third option: a deployable platform that’s opinionated enough to be useful from day one, but flexible enough to encode the agency’s actual point of view.

If you run an agency and any of this resonates, the live demo is at agency.aimakers.co. Log in (we’ll send credentials) and click around. Ten minutes will tell you whether it’s the right shape for you.

If you’re an SMB reader and you’ve got this far — the same architecture is what we build for businesses every day. Different vocabulary, same shape. Talk to us if you want one.

— Mark