Building an Operating Environment for AI Agents

Over the past twenty years I've managed server infrastructure, built WordPress sites, handled data migrations and put together architecture specs for clients across multiple organisations. If you juggle multiple clients, you'll know the feeling. Every project has its own context, its own history, its own set of things that need to happen next. Even with a project management tool, a lot of the detail still lives in your head, and every time you switch projects you have to shift mental gears to recall where that particular project stands right now.

I've since solved this problem, somewhat unexpectedly, by building a coordination system that lets AI agents work across projects, clients, and organisations. It wasn't something I set out to build; it grew organically as I began incorporating agentic coding tools into my everyday workflow. I've started calling it an Agentic Operating Environment, or AOE. Essentially, it's a set of file-based conventions that give AI agents persistent memory and a way to coordinate across projects and vendors.

The Problem That Crept Up on Me

I was already using AI seriously across my projects for a range of tasks: writing code, research and troubleshooting, drafting client documentation, and processing data. The problem was continuity. Every new conversation started from scratch, and I'd spend the first ten minutes of every session re-explaining what the project was about, what had been done, and what needed to happen next. I'd have to prepare and manually upload files to give context. Switch to a different AI tool and the whole thing started from zero. That's fine for a one-off task, but it was a real hassle with a dozen active projects across multiple clients.

Then context files appeared. Claude Code introduced a file (CLAUDE.md) that the agent reads at the start of every session, giving it project background and instructions. Other tools followed with their own versions, and suddenly agents could pick up where the last one left off. I remember thinking this changes everything.

Most people were writing code with these tools, but I was starting to run my business through them. For that, knowing about one project at a time wasn't nearly enough. I needed agents that understood what was happening across projects: what's blocked, who's waiting on what, which client deliverable depends on which internal task finishing first.

That opened up a bigger question. If an agent can read a context file, what else could you put in it? Could you give it genuine working memory: what happened last session, what's blocked, what another project needs from this one? And what if one agent's work needed to inform a different project? Could that context be passed along without me acting as the human messenger, as I'd had to with the web chat versions?

What I ended up with is a few things that work together:

  • Working memory and project context so agents know what happened last time and
    understand the project
  • A memo system so they can talk across projects
  • Vendor-neutral conventions so I'm not locked to one AI provider
  • Specialised agents each handling a different area of work
  • Utility agents that manage the environment itself: a coordinator for cross-project visibility, a sysadmin for machine configuration, a configurator that scaffolds new projects
  • A factory that deploys new agent environments from templates

The individual pieces aren't complicated. Getting them to work together reliably across dozens of projects is where the time went, and what emerged is an architecture that looks fundamentally different from how most people structure their AI assistants.

It Doesn't Look Like Much

[Screenshot: New AOE deployment for our website manager]

It doesn't look revolutionary. It's a terminal, the kind of interface we've used since the earliest days of computing. But in that conversation I described what a project needed in plain English, and an agent built the entire working environment: context files, state tracking, communication channels, vendor-specific instructions. No commands to remember, no forms to fill in, no setup wizard, just a description of the outcome I wanted.

This is what I mean by "operating environment". It's not an operating system that manages hardware and device drivers, but something that sits above all of that. Where your OS gives you a graphical or command line interface to the computer (windows, menus, mouse clicks, command parameters to remember), this gives you a natural language one. You just describe what you need in plain English and an agent works out how to get it done.

What surprised me is how little custom code this requires. Tools like Claude Code, Codex, and Gemini CLI already ship with everything you need to get started. They read and write files, run shell commands, search codebases, and follow instructions from context files. I'm not stitching together third-party tools and hoping they still work next month. Instead, the foundation rests on core features that the vendors build and maintain.

The hard part is the conventions, not the tooling: what goes in each file, where to draw boundaries between projects, how to keep agents following the rules session after session. That's what took the longest to get right.

This started off as a simple improvement for one client but has now completely changed the way I use a computer and manage my projects.

Working Memory

Every project gets a handful of markdown files that serve as the agent's working memory. A project brief (BRIEF.md) captures confirmed knowledge that changes slowly: what the project is, who's involved, and what's been decided about scope and technical direction. A state file (STATE.md) tracks what's happening right now and what needs to happen next. An archive (HISTORY.md) holds older progress that isn't immediately relevant. Other files capture reusable patterns (INSIGHTS.md) and whatever domain context the agent needs. The whole lot sits in a git repository, so nothing falls through the cracks and every change is auditable.

Here's what a typical project looks like on disk:

project-root/
├── BRIEF.md           # Confirmed project knowledge
├── CLAUDE.md          # Agent instructions
├── STATE.md           # Working memory
├── HISTORY.md         # Archived progress
├── INSIGHTS.md        # Reusable learnings
├── memos/
│   ├── incoming/      # Unprocessed messages
│   └── archive/       # Completed messages
└── docs/              # Project-specific reference

If any of that sounds abstract, the files map to things you already know:

| File | Purpose |
|------|---------|
| BRIEF.md | What the project is, who's involved, key decisions |
| STATE.md | What's happening now, what's next |
| HISTORY.md | Archived progress |
| INSIGHTS.md | Reusable patterns and gotchas |
| CLAUDE.md | How the agent should behave |
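Because the layout is just files and directories, it's easy to check mechanically. Below is a minimal, hypothetical Python sketch (`missing_entries` is an illustrative name, not part of any tool mentioned here) that reports which entries from the tree above are absent, assuming CLAUDE.md is the only required instruction file:

```python
from pathlib import Path

# Expected layout, taken from the project tree. Other vendors' instruction
# files (AGENTS.md, GEMINI.md) are treated as optional in this sketch.
REQUIRED_FILES = ["BRIEF.md", "CLAUDE.md", "STATE.md", "HISTORY.md", "INSIGHTS.md"]
REQUIRED_DIRS = ["memos/incoming", "memos/archive", "docs"]


def missing_entries(project_root: str) -> list[str]:
    """Return expected files and directories (suffixed with '/') that are absent."""
    root = Path(project_root).expanduser()
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

On a complete project, `missing_entries("~/Projects/ClientA/web-dev")` would return an empty list.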

In practice, the agent reads STATE.md at the start of a session, does its work, and updates the file when it's done. Here's what a real one looks like. It's anonymised, but the structure is genuine.

## Quick Reference

| Key | Value |
|-----|-------|
| Project Path | ~/Projects/ClientA/web-dev |
| Last Updated | 2025-09-28 |
| Last Agent | Claude Code |
| Current Focus | Header component refactor |

## Current Status

Theme migration 80% complete. Navigation and footer done,
header still needs mobile breakpoints. Blocked on logo
asset from client.

## Recent Progress

### 2025-09-28
- Completed footer template with accessibility fixes
- Agent: Claude Code

### 2025-09-26
- Migrated navigation patterns from legacy theme
- Agent: Gemini CLI

## Next Actions

1. [x] Migrate navigation patterns
2. [x] Rebuild footer template
3. [ ] Refactor header component (blocked: logo asset)
4. [ ] Cross-browser testing

## Handover Notes

Header refactor is ready to go once the client sends the
logo. SVG preferred, PNG fallback. The mobile nav uses a
slide-out pattern, not a dropdown. See docs/nav-spec.md.

Every new session picks up exactly where the last one left off. The agent knows what's done, what's blocked, and why. Older entries get archived periodically to keep the working file lean.
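Because STATE.md follows a fixed structure, a script can read it as easily as an agent can. Here's a minimal Python sketch, assuming the section headings and the "(blocked: reason)" annotation from the example are used consistently; the function names are hypothetical:

```python
import re

# A two-cell table row such as "| Last Agent | Claude Code |".
ROW = re.compile(r"^\|\s*(.+?)\s*\|\s*(.+?)\s*\|$")
# A numbered checklist item such as "3. [ ] Refactor header (blocked: logo)".
TASK = re.compile(r"^\d+\.\s*\[( |x)\]\s*(.+)$")


def sections(text: str) -> dict[str, list[str]]:
    """Split STATE.md into {heading: lines} using its '## ' headings."""
    result: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            result[current] = []
        elif current is not None:
            result[current].append(line)
    return result


def quick_reference(text: str) -> dict[str, str]:
    """Return the Quick Reference table, skipping header and separator rows."""
    table = {}
    for line in sections(text).get("Quick Reference", []):
        match = ROW.match(line.strip())
        if match and match.group(1) != "Key" and set(match.group(1)) != {"-"}:
            table[match.group(1)] = match.group(2)
    return table


def next_actions(text: str) -> list[dict]:
    """Return each Next Actions item with done and blocked flags."""
    items = []
    for line in sections(text).get("Next Actions", []):
        match = TASK.match(line.strip())
        if match:
            task = match.group(2)
            items.append({"task": task, "done": match.group(1) == "x",
                          "blocked": "(blocked" in task})
    return items
```

For example, `[a["task"] for a in next_actions(text) if not a["done"]]` lists the outstanding work.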

Pick the Right Tool for the Job

I use Claude Code, Codex, Gemini CLI, and DeepAgent across different projects, and each has strengths for different types of work (and some genuinely maddening blind spots, but that's a longer conversation). The system doesn't care which one you pick.

Each project carries vendor-specific instruction files: CLAUDE.md for Claude Code, AGENTS.md for Codex, and GEMINI.md for Gemini CLI. DeepAgent handles its configuration through its own rules system rather than a markdown file. These files tell the agent how to work in that particular project, covering things like session protocols, coding standards, and how to handle state updates. The content differs per vendor because each tool loads instructions differently, but the conventions they enforce are identical. If I want to switch providers for a project, nothing breaks.

What those instruction files actually enforce is the same core loop: read STATE.md and check for incoming memos, extract your action list, do the work, update STATE.md with progress and handover notes, and commit. Every agent follows this regardless of vendor. The instruction files also cover git conventions (always use git's -C flag, never commit to another project's repo) and cross-project communication rules (send memos, don't edit other projects' files).
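The agents follow this loop from written instructions rather than code, but the mechanical steps are easy to picture. This Python sketch is illustrative only: the helper names are hypothetical, it assumes the STATE.md layout and memos/incoming/ directory described earlier, and it builds the git commands rather than running them. Passing `-C <path>` makes git act on the named project's repository regardless of the agent's current directory:

```python
from datetime import date
from pathlib import Path


def session_context(project: Path) -> str:
    """Start of session: STATE.md followed by every unprocessed memo."""
    parts = [(project / "STATE.md").read_text()]
    for memo in sorted((project / "memos" / "incoming").glob("MEMO-*.md")):
        parts.append(f"--- incoming: {memo.name} ---\n{memo.read_text()}")
    return "\n\n".join(parts)


def log_progress(project: Path, summary: str, agent: str, today: date) -> None:
    """End of session: add a dated entry at the top of Recent Progress."""
    state = project / "STATE.md"
    lines = state.read_text().splitlines()
    idx = lines.index("## Recent Progress")  # ValueError if the section is missing
    lines[idx + 1:idx + 1] = ["", f"### {today.isoformat()}", f"- {summary}", f"- Agent: {agent}"]
    state.write_text("\n".join(lines) + "\n")


def commit_commands(project: Path, message: str) -> list[list[str]]:
    """Commit only this project's repo; -C pins git to the project path."""
    repo = str(project)
    return [
        ["git", "-C", repo, "add", "-A"],
        ["git", "-C", repo, "commit", "-m", message],
    ]
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`.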

This is a deliberate choice, because the AI vendor market is volatile, with pricing changes, shifting capabilities, and new tools appearing constantly. Last year's best option might not be this year's, and if your operation depends on one provider's way of doing things, a pricing change or service outage becomes a showstopper.

[Screenshot: Claude API Error 500. Claude API errors? Launch another vendor's model and keep working.]

With vendor-neutral conventions, I simply run another vendor's agent and the project keeps running. It's a standard part of my workflow. Also, because the conventions are just files, there's nothing stopping you from running local models if your hardware is powerful enough. The system doesn't care whether the AI is in the cloud or on your own machine.

Cross-project Messaging

The agents needed to communicate across projects just as human team members do. For example, an infrastructure change in one might affect web development in another, or a content decision might depend on input from project management. However, the agents lived in separate project directories with no native way to talk to each other.

Initially I let agents edit files directly in other projects. That proved unworkable very early on, when an agent tidied up what it thought were stale files that another agent still needed. The fix was obvious to me: my university research was in multiprocessor computing, where this is a solved problem. Linux processes don't scribble in each other's memory; they communicate through pipes and message queues. Early bulletin board systems worked the same way: post a message to a board, and the recipient reads it when they next connect. The principle hasn't changed in decades: send a message, and let the receiver deal with it in its own time.

Same principle here. Instead of editing each other's files, agents send structured messages (I call them memos) into each other's memos/incoming/ directories. The receiving project picks them up at its next session and archives each one once its actions are complete. Because memos are plain-text files with naming conventions, they work with any AI tool that can read files. A Claude agent sends a memo, a Gemini agent picks it up, and the vendor boundary becomes invisible.
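Delivering a memo amounts to writing one file into another project's inbox. Here's a minimal Python sketch, assuming the MEMO-<topic>.md naming and the header fields used in the example memo in this section; `send_memo` and `slugify` are hypothetical helpers, and for brevity the memo title and subject are the same string:

```python
import re
from datetime import date
from pathlib import Path


def slugify(topic: str) -> str:
    """'Staging deployment ready' -> 'staging-deployment-ready'."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def send_memo(sender: str, sender_path: str, recipient: str, recipient_path: str,
              subject: str, purpose: str, content: str, actions: list[str],
              today: date) -> Path:
    """Write MEMO-<topic>.md into the recipient's inbox and touch nothing else."""
    inbox = Path(recipient_path).expanduser() / "memos" / "incoming"
    target = inbox / f"MEMO-{slugify(subject)}.md"
    if target.exists():
        raise FileExistsError(f"{target} exists; refusing to overwrite another memo")
    lines = [
        f"# Memo: {subject}", "",
        f"**From:** {sender} @ {sender_path}",
        f"**To:** {recipient} @ {recipient_path}",
        f"**Date:** {today.isoformat()}",
        f"**Subject:** {subject}", "",
        "---", "",
        "## Purpose", "", purpose, "",
        "## Content", "", content, "",
        "## Action Required", "",
        *[f"- [ ] {action}" for action in actions], "",
        "---", "",
        "## Completion Notes",
        "<!-- Receiving agent: complete this section before archiving -->",
    ]
    # Raises FileNotFoundError if the recipient has no memos/incoming/ directory.
    target.write_text("\n".join(lines) + "\n")
    return target
```

The sender never edits the recipient's STATE.md or any other file; the only write is the new memo.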

[Screenshot: Agents sending memos. A sysadmin agent reports a bug to the project's developer agent.]

The files are named MEMO-<topic>.md and follow a standard format. Here's one (anonymised) where a web development agent is notifying an infrastructure agent that a deployment is ready:

# Memo: Staging Deployment Ready

**From:** web-dev @ ~/Projects/ClientA/web-dev
**To:** sysadmin @ ~/Projects/ClientA/sysadmin
**Date:** 2026-02-28
**Subject:** Theme migration ready for staging

---

## Purpose

Header refactor is complete. Ready for staging deployment.

## Content

All templates migrated and tested locally. No new
dependencies. The only change to server config is the
new image optimisation path in the nginx rules
(documented in docs/nginx-changes.md).

## Action Required

- [ ] Deploy to staging environment
- [ ] Verify nginx config change
- [ ] Run smoke tests and report back

---

## Completion Notes
<!-- Receiving agent: complete this section before archiving -->

The receiving agent reads the memo at its next session, works through the checkboxes, fills in the completion notes, and archives it. The checkboxes are the accountability mechanism: a memo can't be archived until every action is ticked off. At least, that's how it's supposed to work. In practice, agents occasionally skip steps or archive memos prematurely, and getting them to follow the conventions reliably is one of the harder problems I've had to solve.
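One way to make the archiving rule harder to skip is to check it mechanically before a memo moves. This is a hypothetical Python sketch, not something the agent tools provide: it assumes the `- [ ]` checkbox syntax and the Completion Notes section from the memo template above, and it refuses to archive a memo with unticked actions or empty notes:

```python
import re
import shutil
from pathlib import Path

UNTICKED = re.compile(r"^\s*- \[ \]", re.MULTILINE)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def completion_notes(text: str) -> str:
    """Text under '## Completion Notes', with the placeholder comment removed."""
    _, found, notes = text.partition("## Completion Notes")
    return COMMENT.sub("", notes).strip() if found else ""


def can_archive(text: str) -> bool:
    """True only if no action is unticked and completion notes are filled in."""
    return UNTICKED.search(text) is None and completion_notes(text) != ""


def archive_memo(memo: Path) -> Path:
    """Move a finished memo from memos/incoming/ to memos/archive/."""
    if not can_archive(memo.read_text()):
        raise ValueError(f"{memo.name}: unticked actions or empty completion notes")
    archive = memo.parent.parent / "archive"
    archive.mkdir(exist_ok=True)
    return Path(shutil.move(str(memo), str(archive / memo.name)))
```

Telling agents to archive only through a helper like this turns the convention into a check that fails loudly, rather than a rule they can quietly skip.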

Putting Agents to Work

I started with one client who had particularly complex needs. Instead of one general-purpose AI assistant, I created specialised agents, each handling a different area like systems administration, web development, content curation, project management, research, document generation. Each agent has its own working memory and domain expertise.

Then I deployed the same approach on a second client, and the cognitive burden genuinely lifted. The agents remembered every detail I used to carry around in my head, across two separate clients, without mixing anything up. I stopped being the person who has to remember everything.

Sitting above the project agents is a coordinator: a dedicated agent whose job is cross-project visibility. It reads the state of every project (44 at last count, across 14 organisations), tracks what's blocked, routes memos between projects that don't know each other's paths, and maintains a dashboard of the whole operation.

The coordinator knows what's happening because each project is registered in a YAML file:

- name: client-a-web
  path: ~/Projects/ClientA/web-dev
  organization: client-a
  description: WordPress theme development and maintenance
  tags:
    - web
    - wordpress
  dependencies:
    - client-a-infra
  registered: 2026-01-15

A scanner reads STATE.md from every registered project and builds a view of what's active, what's blocked, and what's gone quiet. The mechanism is simple, and the visibility comes for free: no one has to update a dashboard or fill in a status report, because the agents' own working files are the source of truth.
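The scanner's job can be sketched in a few lines of Python. This is a hypothetical version, not the coordinator's actual code. It assumes the registry has already been loaded into a list of dicts (for example with PyYAML's `yaml.safe_load`), treats any unticked item containing "(blocked" as a blocker, and uses an arbitrary 14-day threshold for "gone quiet":

```python
from datetime import date
from pathlib import Path
from typing import Optional

QUIET_AFTER_DAYS = 14  # assumption: two weeks without an update counts as "gone quiet"


def read_field(state_text: str, key: str) -> Optional[str]:
    """Return one value from the Quick Reference table, e.g. 'Last Updated'."""
    for line in state_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == key:
            return cells[1]
    return None


def scan(registry: list[dict], today: date) -> dict[str, list[str]]:
    """Sort registered projects into active, blocked, quiet and missing."""
    view: dict[str, list[str]] = {"active": [], "blocked": [], "quiet": [], "missing": []}
    for project in registry:
        state = Path(project["path"]).expanduser() / "STATE.md"
        if not state.is_file():
            view["missing"].append(project["name"])
            continue
        text = state.read_text()
        updated = read_field(text, "Last Updated")
        blocked = any("[ ]" in line and "(blocked" in line for line in text.splitlines())
        if blocked:
            view["blocked"].append(project["name"])
        elif updated and (today - date.fromisoformat(updated)).days > QUIET_AFTER_DAYS:
            view["quiet"].append(project["name"])
        else:
            view["active"].append(project["name"])
    return view
```

The same registry lookup (name to path) is what lets a coordinator route a memo to a project whose path the sender doesn't know.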

An Agent That Builds Agents

When the conventions started working reliably, I wanted a way to replicate them without manually setting up every new project, so I built what amounts to a factory agent. I give the agent a project brief and it scaffolds everything in one step. The output is a complete project directory including:

  • Instruction files for each AI vendor, with session protocols, git conventions, and cross-project communication rules baked in
  • A STATE.md with the Quick Reference table and empty sections ready to fill
  • A memos directory
  • Its own README, HISTORY.md, INSIGHTS.md, and whatever domain-specific extras the project type calls for

Different project types get different scaffolds. A sysadmin project gets server inventory sections and SSH configuration templates. A bookkeeping project gets invoices and reconciliation directories. The conventions are the same across all of them but the domain-specific bits vary. I can point the factory at an existing working project and say "set up a new one like this, but for Client B." It uses the existing project as a reference and adjusts for the new context.

The factory has deployed over forty projects so far, each one inheriting the same coordination conventions. A new project goes from nothing to a working agent environment in one step.
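The factory itself is an agent working from a brief, so its output adapts to each project. For anyone who wants a deterministic starting point, here's a hypothetical Python sketch of the kind of skeleton it produces. The rule text, the `TYPE_EXTRAS` entries (loosely based on the sysadmin and bookkeeping examples), and the file contents are all placeholders, and one shared rule block stands in for the vendor-specific wording:

```python
from pathlib import Path

# Placeholder rules; real vendor files are worded for the tool that loads them.
CORE_RULES = """\
## Session protocol
1. Read STATE.md and check memos/incoming/.
2. Extract the action list and do the work.
3. Update STATE.md with progress and handover notes.
4. Commit with git -C <project path>; never commit to another project's repo.
5. Contact other projects only by sending memos.
"""

VENDOR_FILES = ["CLAUDE.md", "AGENTS.md", "GEMINI.md"]

# Hypothetical per-type extras; a trailing "/" means a directory.
TYPE_EXTRAS = {
    "sysadmin": ["docs/server-inventory.md", "docs/ssh-config.md"],
    "bookkeeping": ["invoices/", "reconciliation/"],
}

STATE_TEMPLATE = """\
## Quick Reference

| Key | Value |
|-----|-------|
| Project Path | {path} |
| Last Updated | |
| Last Agent | |
| Current Focus | |

## Current Status

## Recent Progress

## Next Actions

## Handover Notes
"""


def scaffold(path: str, name: str, brief: str, project_type: str = "") -> Path:
    """Create a new project skeleton; refuses to touch an existing directory."""
    project = Path(path).expanduser()
    project.mkdir(parents=True, exist_ok=False)
    for sub in ["memos/incoming", "memos/archive", "docs"]:
        (project / sub).mkdir(parents=True)
    (project / "BRIEF.md").write_text(f"# {name}\n\n{brief}\n")
    (project / "STATE.md").write_text(STATE_TEMPLATE.format(path=path))
    for filename in ["README.md", "HISTORY.md", "INSIGHTS.md"]:
        (project / filename).write_text(f"# {filename[:-3].title()}\n")
    for filename in VENDOR_FILES:
        (project / filename).write_text(f"# Agent instructions: {name}\n\n{CORE_RULES}")
    for extra in TYPE_EXTRAS.get(project_type, []):
        target = project / extra
        if extra.endswith("/"):
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
    return project
```

Refusing to overwrite an existing directory keeps the factory from clobbering a live project's working memory.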

Star vs Mesh

Now that you've seen all the pieces, it's worth stepping back to look at the structure of what emerged because it's not the shape most people build.

Most AI assistant frameworks use a hub-and-spoke topology. Daniel Miessler's PAI is probably the best-known example, with one central assistant, one identity, one memory system, and skill modules for different domains. Kenny Liao's personal assistant follows a similar pattern, with domain-specific plugins loaded into a single runtime. The skills are more sophisticated than they look: PAI's chain into each other, and Liao's use progressive context loading. But the structural principle is the same: one hub, with everything flowing through it.

[Diagram: star topology with You, a Single Assistant (PAI) at the hub, and skills for Health, Finance and Writing]

My system uses a mesh topology. I first came across mesh architectures in the early 2000s while evaluating mesh networking technologies for a Japanese trading house. The concept isn't new, and applying it to AI agent coordination seemed a natural fit.

Each project is an autonomous node with its own state (STATE.md, HISTORY.md), its own instructions (CLAUDE.md, AGENTS.md, GEMINI.md), and its own agent identity. No single node holds everything, and nothing requires them to sit on the same machine. Projects communicate asynchronously via memos, like Unix processes communicating through pipes.

[Diagram: mesh topology with client-a-web, client-b-web, coordinator, client-b-infra and shared-tools as nodes, linked by memos]

| | Star (PAI) | Mesh (AOE) |
|---|---|---|
| Context | One context for everything | Scoped per project; agent only sees what it needs |
| Boundaries | All domains share memory | Client A's data never enters Client B's context |
| Failure | Centre goes down, everything stops | One project breaks, others unaffected |
| Scaling | Gets heavier as you add domains | Adding a project is just adding a node |
| Vendor | Tied to one agent runtime | Each project picks its own AI vendor |
| Collaboration | Implicit (shared memory) | Explicit (memos with structured actions) |

None of the major agent frameworks frame this as a deliberate choice. Anthropic calls their pattern "orchestrator-workers". LangGraph calls it "supervisor". CrewAI calls it "hierarchical process". They describe what the orchestration does, not what shape it takes. The structural decision goes unnamed, which is probably why it goes unexamined.

The hub model is optimised for one person wearing many hats, like a freelancer who does health tracking, writing, finance, and coding through a single assistant that knows their whole life. My model is optimised for many projects with clear boundaries, where a web development agent for Client A must never see Client B's project data, where cross-project coordination needs to be auditable (every memo is in git), and where different projects might need different AI vendors entirely.

The word "mesh" has started showing up in enterprise AI writing too, but there it means container orchestration, agent registries, and policy enforcement: infrastructure for managing fleets of agents across an organisation. What I'm describing is a much simpler concept: project directories that talk to each other through plain-text files.

It shares an underlying insight with PAI and similar tools: the filesystem is the context system, and markdown files are the medium. But there's a key difference. Those systems use the filesystem for context, while mine uses it for coordination too. Memos route between projects, state files track handoffs, and every change lives in git. The filesystem is both the memory and the message bus.

Where It Stands

That's the full picture. Working memory, memos, vendor-neutral conventions, specialised agents, a factory to set up new projects, and an architectural structure that none of the major frameworks are talking about yet. The building blocks are markdown files, naming conventions, and structured messages. No database (unless you specifically need it for a project), no platform, and no special software. However, the judgement calls about where to draw project boundaries, how to scope agent responsibilities, and how to keep conventions working reliably across a growing number of projects are what took months of trial and error across real client work.

As of writing, there are 44 active projects across 14 organisations. This is real production work involving published websites, live content, architecture specifications, and server infrastructure. None of this is proof of concept or theorising.

A few design principles fell out of the process that I think are worth naming:

  • Vendor-agnostic. Works with any AI that can read markdown. No proprietary formats, no lock-in.
  • Convention-based. No runtime dependencies, no database, no platform to maintain. It's files and naming conventions and that's it.
  • Distributed boundaries. Each project is its own island. Client A's working files never enter Client B's context unless you explicitly tell an agent to share something. Cross-project communication goes through memos as standard.
  • Human-in-the-loop. I approve every consequential action so the agents do the legwork while I make the decisions.

Right now the whole thing is passive infrastructure. I start a session, the agent picks up where the last one left off, then I close the session. The direction is toward semi-automated, where events trigger sessions and memos route themselves. But that's a story for another post.

It's not finished (honestly, I doubt it ever will be as it continues to grow and improve), but it handles real client work across multiple organisations every day. There's more to say about how this compares to other approaches and where it goes next, and I'll get to that.

If you're running multiple projects and spending too much time being the human messenger between them, this approach works. I'm happy to talk through how it might apply to your setup, so drop me a line.


This article is part of an ongoing series on how Another Cup of Coffee is adapting to AI.
