I Run Dozens of Projects with AI. The Hard Part Isn't the AI.

I run a micro-agency which, by nature, is always resource-constrained. We purposely stay small and nimble so we can adjust quickly to changes in a project, or in the industry as a whole, and that means we can't take on lots of team members. I've written about our twenty-year journey elsewhere, but the upshot is that I end up doing a lot of the work myself.

Staying small and specialised is manageable with a handful of projects, but when you're running dozens for multiple clients, the problem becomes keeping track of the context. What's new on this project? What are the priorities on that one? What are the specific requirements for this particular client? Tracking all of that detail across a bunch of projects doesn't scale, even with a project management tool like Teamwork or Basecamp.

This article is about how I fixed this problem of project context. The fix itself started off as a simple improvement for one client, but it grew organically, and ended up completely changing the way I use a computer and manage my projects. I've started calling the result an Agentic Operating Environment.

Too Busy for New Toys

ChatGPT was released on 30 November 2022 and while I followed developments with interest, I was too busy to be an early adopter. When you run a small operation, you don't have the luxury of playing with new tools just because they're novel or fun. You need to know something works before you invest time in it because paying clients demand your attention.

However, by mid-2023, ChatGPT had gone mainstream enough that it was time I gave it a spin in earnest, so I suggested to a client that we try it on a project. The results were astounding. We completed months of work in a few weeks and saved significantly on hiring specialist consultants.

The potential was immediately obvious. One person could now do the work of several, deliver higher quality output, and do it faster. For a small outfit, that's a significant edge.

Trapped in the Browser

I started with the web interfaces everyone else was using: ChatGPT, Claude, and primarily ChatLLM, which gave me access to multiple models at a fraction of the cost. They were genuinely useful, but two problems kept getting worse the more I relied on them.

First, there was no real continuity between sessions, and even after the Projects feature rolled out, keeping the uploaded context updated meant manually preparing and re-uploading documents every time something changed.

Second, the work was trapped inside the browser. AI could help me think through a problem or draft a document, but getting that output into my actual project files meant tedious copy-paste. The download and export features on these platforms were unreliable. Many times ChatGPT or ChatLLM would offer me a download link and the file would be empty or not what was expected. I needed a way to break out of the browser.

Breaking Out

Claude Code was the breakthrough. Instead of talking to an AI in a browser window, I had an agent that worked directly inside my file system's project directories. It could read files, run commands, and do real work on my local machine.

Then there were the context files, giving the agent project background and instructions. Suddenly we had an AI that could pick up where the last session left off so there was no more re-explaining, no more starting from scratch.

The real shift was realising what "runs commands on your machine" actually meant. Yes, this was a coding tool designed for software developers to write and debug code. But it could run any terminal command, which meant it could configure my local environment, install packages, and manage services. And if it could do that locally, it could SSH into a remote server and do the same thing there. This was more than a coding tool. It was a general-purpose operator that could do anything I could reach from a terminal. That's when it became the foundation for everything else.

Claude Code obviously wasn't just for coding. Systems administration, project management, writing, research, document generation, anything that benefits from knowing a project's context could now be handled by an AI. The tool was designed for developers but the principle applied to everything I do day-to-day.

That opened a bigger question. If an agent can read a context file, what else could you put in one? Could you give it genuine working memory, like what happened last session, what's stuck, what another project needs from this one?

One Client, Many Agents

I started with one client who had particularly complex needs. Instead of one general-purpose AI assistant, I created specialised agents as separate Claude Code projects, each handling a different area of the client's work:

  • Systems administration
  • Web development
  • Content curation
  • Project management
  • Requirements analysis
  • Research
  • Document generation

Each agent had its own context, its own working memory, and its own domain expertise. The difference was immediate. Switching between projects no longer meant rebuilding my mental model of where each one stood. For someone juggling many fast-moving and slow-moving projects at the same time, that friction adds up.

Then I deployed the same approach on a second client, one that happened to have team members who overlapped with the first. This is where the cognitive burden genuinely lifted because the agents remembered all the details. Who works on what, what the specific requirements are, what happened last session and the session before that. Each agent would know at the start of every session what I used to carry around in my head, across two separate clients, without mixing anything up.

I stopped being the person who has to remember everything because the agents did that now.

Teaching Agents to Talk

The agents needed to communicate though. An infrastructure change in one project might affect the web development project, or a content decision might depend on input from project management. Plus, these agents weren't all running on the same AI vendor.

I was already using different models for different strengths. Some are better at careful, structured reasoning; others are faster and cheaper for routine tasks; some handle large volumes of context well; others are stronger at creative work or code generation. Picking the right model for each job made sense, but the agents lived on different platforms and had no native way to coordinate.

My first attempt at solving this let agents edit files directly in other projects. That soon broke when one agent tidied up files another agent needed, but the fix was obvious. My university research was in multiprocessor computing, and this is a solved problem. Linux processes don't scribble in each other's memory either; they communicate through pipes and message queues. Same principle, different scale. Instead of agents editing each other's files, they send structured messages, or memos. Each memo is a plain-text markdown file dropped into the receiving project's directory, and the receiving project picks up any waiting memos at its next session. Because the memos are plain text files, they work across any AI tool that can read files. A Claude agent sends a memo, then a ChatGPT agent picks it up next session. The vendor boundary becomes invisible.

This was a bigger deal than it might sound because it meant I could pick the best model for each job without worrying about whether the agents could coordinate afterwards. Cross-project and cross-vendor communication, solved in one move with plain text and file conventions.
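
To make the memo idea concrete, here is a minimal Python sketch of one agent dropping a memo into another project's directory and the receiver reading its inbox. The `memos/inbox` layout, the file naming scheme, and the header fields are all assumptions made for illustration; the only parts taken from the description above are that memos are plain markdown files and that they live in the receiving project's directory.

```python
from datetime import datetime, timezone
from pathlib import Path


def send_memo(sender: str, recipient_dir: Path, subject: str, body: str) -> Path:
    """Write a markdown memo into the receiving project's inbox (hypothetical layout)."""
    inbox = recipient_dir / "memos" / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp prefix keeps memos sortable by the time they were sent.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Reduce the subject to a filesystem-safe slug.
    cleaned = "".join(c if c.isalnum() else " " for c in subject.lower())
    slug = "-".join(cleaned.split())

    path = inbox / f"{stamp}-{sender}-{slug}.md"
    path.write_text(
        f"# Memo: {subject}\n\n"
        f"- From: {sender}\n"
        f"- Sent: {stamp}\n"
        f"- Status: open\n\n"
        f"{body}\n",
        encoding="utf-8",
    )
    return path


def read_inbox(project_dir: Path) -> list[str]:
    """Return the text of every waiting memo, oldest first."""
    inbox = project_dir / "memos" / "inbox"
    if not inbox.is_dir():
        return []
    return [p.read_text(encoding="utf-8") for p in sorted(inbox.glob("*.md"))]
```

Nothing here is specific to one vendor: any tool that can list a directory and read a text file could play either role. The sender only ever adds a new file and never edits the receiver's existing ones, which is the whole point of the convention.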

Working Memory

The memo system solved communication, but agents also needed memory that persisted between sessions.

My first version was a single state file with no size cap. It grew quickly and before long the agent was spending its limited context window reading history it didn't need. Still, the fix was straightforward. A project brief captures confirmed knowledge that changes slowly, like what the project is, who's involved, and what's been decided. A state file tracks what's happening right now and what needs to happen next. An archive holds older progress that isn't immediately relevant. A separate file captures reusable patterns and gotchas so the agent doesn't repeat the same mistakes. Git sits underneath everything, so nothing is ever truly lost and you can always trace what happened and when.

Each layer serves a different purpose. The conventions keep them from bleeding into each other.
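
As a rough illustration of keeping the state file small, the sketch below moves the oldest lines of a state file into an archive once it passes a cap. The file names `STATE.md` and `ARCHIVE.md`, the 40-line cap, and the assumption that new entries are appended at the bottom are all hypothetical; the layering itself (current state versus archive) is the only part taken from the text above.

```python
from pathlib import Path

MAX_STATE_LINES = 40  # assumed cap, chosen only for this example


def rotate_state(project_dir: Path, max_lines: int = MAX_STATE_LINES) -> int:
    """Move the oldest lines of STATE.md into ARCHIVE.md; return how many moved."""
    state = project_dir / "STATE.md"
    archive = project_dir / "ARCHIVE.md"

    lines = state.read_text(encoding="utf-8").splitlines()
    overflow = len(lines) - max_lines
    if overflow <= 0:
        return 0  # still under the cap, nothing to do

    # Assumes newest entries are at the bottom, so the top of the file is oldest.
    old, recent = lines[:overflow], lines[overflow:]
    with archive.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(old) + "\n")
    state.write_text("\n".join(recent) + "\n", encoding="utf-8")
    return overflow
```

With Git underneath, a rotation like this is low-risk: the archived lines are still in the archive file and in the commit history.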

An Agent That Builds Agents

By this point I had a system that worked. I had specialised agents with context files, state management, memo-based communication, and vendor-neutral conventions. However, setting up a new project still meant creating the right files, writing the context, and establishing all the conventions. Repetitive work.

So I created what I think of as a factory agent: an AI agent whose job is to create other AI agents. Give it a project brief and it scaffolds everything, including context files, state tracking, memo conventions, and project-specific instructions.
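
The scaffolding step can be pictured as a small script. This is only a sketch of the idea, not the factory agent itself (which is an AI agent following conventions, not a fixed program). Every file name in `TEMPLATE` is a hypothetical stand-in for one of the layers described earlier.

```python
from pathlib import Path

# Hypothetical file names, one per layer: brief, state, archive, patterns, memo inbox.
TEMPLATE = {
    "BRIEF.md": "# Project brief\n\n{brief}\n",
    "STATE.md": "# Current state\n\n- [ ] First session: review the brief\n",
    "ARCHIVE.md": "# Archive\n",
    "PATTERNS.md": "# Patterns and gotchas\n",
    "memos/inbox/.gitkeep": "",
}


def scaffold_project(root: Path, name: str, brief: str) -> list[Path]:
    """Create the standard files for a new project; return the paths created."""
    project = root / name
    created = []
    for rel, content in TEMPLATE.items():
        path = project / rel
        if path.exists():
            continue  # never overwrite files in an existing project
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content.format(brief=brief), encoding="utf-8")
        created.append(path)
    return created
```

Running it twice against the same project creates nothing the second time, so it is safe to re-run after adding a new file to the template.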

Next was a manager agent that sits above the project agents and keeps the high-level view. It knows what's happening across all projects without holding the details of any single one. Each project agent tracks its own domain in depth but the manager tracks the big picture and coordinates between them.

Three layers:

  • a factory that creates agents;
  • a manager that coordinates them;
  • and project agents that do the actual work.

I started moving all my projects onto this system and jumped from two to dozens in the space of a few weeks.

The Boring Bits That Matter

The system works. But the problems that come with running AI agents on real client work are unglamorous and unavoidable.

I was reviewing an agent's work and noticed it had been claiming tasks as complete without actually doing the work. It reported everything as fine but when I challenged it, the agent admitted taking shortcuts. A convention was needed to require explicit state updates, so I put in place a task checklist system. It's not a trust issue. The models just need clear structures to follow. That's a recurring theme with this work: better conventions lead to better results.
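
One way to picture a task checklist convention is as a markdown list with checkboxes that can be checked mechanically before a session is closed. The sketch below assumes GitHub-style `- [ ]` and `- [x]` items; that syntax and the function names are assumptions for the example, not a description of the actual checklist format.

```python
import re

# Matches "- [ ] task" or "* [x] task"; the checkbox syntax is an assumed convention.
TASK = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def open_tasks(markdown: str) -> list[str]:
    """Return the text of every unchecked task in a markdown checklist."""
    pending = []
    for line in markdown.splitlines():
        match = TASK.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending


def can_close_session(markdown: str) -> bool:
    """A session may only be closed when no task is left unchecked."""
    return not open_tasks(markdown)
```

A check like this only confirms that every box has been ticked, not that the work behind it was done. What it adds is an explicit, reviewable claim per task, which is much easier to audit than a general "everything is fine".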

I also learned which tasks justify the most capable model. Strategic work, writing, and complex decisions benefit from the best model available, but routine file scaffolding and handover updates don't need it. The difference shows in speed. A lighter model handles simple tasks in seconds where a heavier one takes noticeably longer, and if you're paying per use rather than a flat plan, the cost adds up too. It sounds like a small thing until you're running several sessions in parallel and a heavier model is still churning through a handover update while you're waiting to start the next project.

Then there are the invisible failures, the ones that don't produce error messages. An agent confident it's working in the right project directory turns out to be somewhere else entirely, and without conventions this happens a lot. Other agents archive messages before completing the actions in them. Getting them to follow the conventions reliably is one of the harder problems, and it's still ongoing. I now have over twenty documented patterns, each one born from something going wrong in real client work.
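
The wrong-directory failure is the kind of thing a simple guard can turn from silent into loud. The sketch below assumes each project root contains a marker file holding the project's name. The `.project-id` file and both function names are hypothetical and exist only to illustrate the check.

```python
from pathlib import Path
from typing import Optional

MARKER = ".project-id"  # hypothetical marker file containing the project's name


def find_project_root(start: Path) -> Optional[Path]:
    """Walk upwards from start until a directory containing the marker is found."""
    for candidate in [start, *start.parents]:
        if (candidate / MARKER).is_file():
            return candidate
    return None


def assert_in_project(expected: str, cwd: Path) -> Path:
    """Raise if cwd is not inside the expected project; otherwise return its root."""
    root = find_project_root(cwd.resolve())
    if root is None:
        raise RuntimeError(f"{cwd} is not inside any known project")
    actual = (root / MARKER).read_text(encoding="utf-8").strip()
    if actual != expected:
        raise RuntimeError(f"expected project {expected!r}, found {actual!r}")
    return root
```

Calling `assert_in_project("client-a", Path.cwd())` at the start of a session would stop the work with a clear error before any file in the wrong project is touched.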

Why Better AI Won't Fix This

If you're running multiple projects across multiple clients, a smarter model doesn't help much when it still forgets everything between sessions. The AI is already good enough but the coordination problem stays the same. What's missing for most people is the infrastructure and conventions that make it consistently useful across real work.

Most small businesses are still figuring out where AI fits. If you're starting to rely on it across multiple projects, the ceiling comes quickly. There's too much context to carry manually, too many sessions starting from scratch, and you end up being the human messenger between tools that can't talk to each other. The temptation is to stitch together automation workflows to fix this, but that's plumbing, not infrastructure. It breaks when any component changes. Building on the vendors' own tools means they maintain the foundation. You just maintain the conventions and customise them for your needs.

The hard part was never the AI but the conventions, the memory management, and the communication protocols. These are boring problems, but they determine whether AI actually works across real projects or just impresses you in a single session. I've solved them for my own business and I build this infrastructure for others. If any of this sounds familiar, I'm happy to talk through it.

Common Questions

Do I need to be technical to set this up?

To operate it, no. The conventions are plain text files that AI agents read and update. You don't write code or manage infrastructure day to day. Setting the system up does require technical knowledge (which is where I come in), but once it's running, working within it doesn't.

What AI tools does this work with?

Any tool that can read files in a project directory, which is most of the capable ones now. I currently use four different vendor families. Because the conventions are plain text with no vendor-specific dependencies, new tools slot in without changes. Given how quickly things shift in this space, that flexibility has already proved its worth several times over.

What happens if I stop using AI agents?

Everything is in markdown files. Your project state, history, issues, and accumulated knowledge are all human-readable and useful whether or not you're using AI. You'd walk away with better project documentation than most businesses have, which isn't a bad outcome regardless.


This article is part of an ongoing series on how Another Cup of Coffee is adapting to AI. Explore all articles in this series.
