VibeOps: Let AI Do the Prep, Not the Decisions

There's a growing conversation about whether we need "VibeOps", an AI tool that reads your repo and automatically sets up CI/CD, containerisation, scaling, and infrastructure. In my experience the idea addresses a real gap. AI tools can generate frontend and backend code rapidly, but getting code to production safely still requires judgment.

I do get the appeal. But automating deployment decisions is a different problem to automating code generation, and the consequences of getting it wrong are much worse.

At Another Cup of Coffee, I run a setup where AI agents handle most of the software development workflow: writing code, coordinating across projects, drafting documentation, managing communications. We use CI/CD where it fits the project, but the deployment decisions stay human-gated, quite deliberately. So why not automate the lot, and how does our setup actually work in practice?

The Setup: AI Agents That Don't Deploy Themselves

Our development environment runs on a multi-agent architecture. Currently that's Claude Code, Codex, and Gemini CLI across different projects, with a number of specialised agents for each one. A developer agent writes code, a sysadmin agent manages infrastructure, a writer agent handles content, and a project manager agent coordinates timelines. You get the idea. I appreciate that sounds like chatbots bolted onto an IDE, but it really isn't. They're session-based agents with persistent state, each with access to the filesystem, shell commands, and, when required, other agents and projects. The architecture is vendor-neutral: each project is tied to its own AI provider through vendor-specific instruction files, and the coordination conventions work identically regardless of which tool is running.

Agents communicate through a memo system. When the developer agent finishes a build, it doesn't trigger a deployment pipeline. Instead, it writes a memo to the sysadmin agent's inbox describing what was built, what changed, and what infrastructure it needs. The sysadmin agent reads the memo, reviews the requirements, prepares the deployment configuration, and presents the steps to a human operator for execution.

Agents prepare deployments. Humans decide when and how to execute them.

A typical deployment flow looks like this:

  1. The developer agent builds the application and produces artifacts. These might be, for example, a static site, a Docker image, a database migration, or all three.
  2. It writes a deployment memo to the sysadmin project's inbox with full context on what's being deployed, what services it depends on, what environment variables it needs, and what verification steps should follow.
  3. The sysadmin agent reads the memo, creates or updates the Docker Compose configuration and reverse proxy rules, and writes a step-by-step runbook.
  4. The human operator reviews the runbook. Depending on the operation, they either execute it manually via SSH or approve the agent to run it directly.
  5. The sysadmin agent updates its state file and marks the memo as complete.
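
One way to picture the flow above is as a small state machine in which execution is refused until a human has approved the runbook. This is a minimal sketch, not the actual tooling: the `Deployment` and `Stage` names, and the stage list itself, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    MEMO_RECEIVED = "memo_received"
    RUNBOOK_READY = "runbook_ready"
    APPROVED = "approved"
    EXECUTED = "executed"
    COMPLETE = "complete"


@dataclass
class Deployment:
    """Tracks one deployment through the stages (hypothetical model)."""
    name: str
    stage: Stage = Stage.MEMO_RECEIVED
    runbook: list[str] = field(default_factory=list)
    approved_by: Optional[str] = None

    def prepare_runbook(self, steps: list[str]) -> None:
        # Step 3: the sysadmin agent turns the memo into reviewable steps.
        self.runbook = list(steps)
        self.stage = Stage.RUNBOOK_READY

    def approve(self, operator: str) -> None:
        # Step 4: a named human operator reviews and approves the runbook.
        if self.stage is not Stage.RUNBOOK_READY:
            raise RuntimeError("there is no runbook to approve yet")
        self.approved_by = operator
        self.stage = Stage.APPROVED

    def execute(self) -> None:
        # Execution is refused unless a human approved the runbook first.
        if self.stage is not Stage.APPROVED:
            raise PermissionError("runbook has not been approved by a human")
        self.stage = Stage.EXECUTED

    def complete(self) -> None:
        # Step 5: state file updated and memo marked complete.
        if self.stage is not Stage.EXECUTED:
            raise RuntimeError("cannot complete before execution")
        self.stage = Stage.COMPLETE
```

The point of modelling it this way is that skipping the approval step becomes an error rather than a shortcut: calling `execute()` on a deployment that nobody has approved raises `PermissionError`.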

We don't draw a hard line between what agents handle and what humans do. Some operations the agent runs directly, like copying build artifacts to a staging server. Others get a human at the keyboard, particularly anything involving firewall rules on a production box. It depends on how much damage a mistake could do, and that changes from one operation to the next.
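
As a rough illustration of that sliding scale, the sketch below maps an operation to who should execute it. The two inputs (`touches_production`, `reversible`) and the three risk levels are assumptions chosen to keep the example small; a real assessment would weigh more than two yes/no questions.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1     # e.g. copying build artifacts to a staging server
    MEDIUM = 2
    HIGH = 3    # e.g. firewall rules on a production box


def assess(touches_production: bool, reversible: bool) -> Risk:
    """Rough damage estimate from two illustrative questions."""
    if touches_production and not reversible:
        return Risk.HIGH
    if touches_production or not reversible:
        return Risk.MEDIUM
    return Risk.LOW


def who_executes(risk: Risk) -> str:
    """Translate a risk level into an execution mode."""
    if risk is Risk.LOW:
        return "agent runs it directly"
    if risk is Risk.MEDIUM:
        return "agent runs it after human approval"
    return "human at the keyboard"
```

For example, `who_executes(assess(touches_production=False, reversible=True))` gives "agent runs it directly", while an irreversible change on production ends up with a human at the keyboard.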

Why Not Automate Deployment?

For most people, the vision behind VibeOps is an AI that reads your code and decides how it should run. Reading the code is the easy part, but how something should run depends on context that lives nowhere near the code.

A startup with a single VPS has different deployment requirements to an enterprise with a large AWS budget. The code could be identical, but the infrastructure decisions are completely different. An AI that auto-provisions "optimal" infrastructure has no concept of your monthly spend limit unless you tell it, and if you're telling it all your constraints, you're just writing a different kind of configuration file. You haven't automated the decision-making, you've moved it from a hosting control panel to an AI prompt.

It's the same thing with traffic patterns. A project serving a handful of internal users and a project facing the public internet can share the same general structure but need radically different scaling and security configurations. The AI would need to know a whole bunch of things beyond just your current traffic, like your expected traffic, your tolerance for downtime, and your plan for traffic spikes. These are business decisions rather than technical ones.

There's also the cost runaway problem. On cloud infrastructure, uncontrolled automation risks running up your bill. The combination of a modest traffic spike and an auto-scaling rule that's a bit too eager, and suddenly you owe your cloud provider a fortune. On a fixed-capacity machine, the failure is different but the root cause is the same. Instead of a surprise bill, you get a frozen system.
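
To make the cloud half of that concrete, here is a small piece of arithmetic, assuming a made-up rate of 0.10 per instance-hour and roughly 730 hours in a month. Neither figure comes from a real provider; the function names are hypothetical too.

```python
def monthly_cost(instances: int, hourly_rate: float, hours: float = 730.0) -> float:
    """Cost of running `instances` machines for `hours` at `hourly_rate` each."""
    return instances * hourly_rate * hours


def capped_instances(requested: int, hourly_rate: float,
                     monthly_budget: float, hours: float = 730.0) -> int:
    """Largest instance count, up to `requested`, that stays within the budget."""
    affordable = int(monthly_budget // (hourly_rate * hours))
    return max(0, min(requested, affordable))
```

With those assumed numbers, an eager rule that scales to 20 instances costs about 1,460 a month, while a budget of 300 only covers 4. The cap is a business constraint that someone has to supply; nothing in the repo tells the tool what it is.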

I hit the same problem at a smaller scale a few weeks ago. One of my development environments runs on a modest workstation, an i3-6100 with 8GB of RAM, nothing fancy. I'd been a little careless about Docker container management as I was deep into a project. Four separate stacks were running simultaneously, fourteen containers in total, all set to restart: always so they'd come back up after every reboot whether I needed them or not. One afternoon the machine just froze. It was completely unresponsive for about twenty minutes, right in the middle of some urgent work. I couldn't even get a terminal, so I had to hard reboot.

The dangerous failure mode of automated infrastructure is that it works too well. It scales resources you didn't intend to scale, restarts services you meant to stop, provisions capacity you can't afford.
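
A check along the following lines would have flagged the workstation problem before it bit. It is a sketch that assumes each stack's Compose file has already been parsed into a dictionary (for instance with a YAML parser); the function name and the example stack names are hypothetical.

```python
def restart_always_services(stacks: dict[str, dict]) -> list[str]:
    """Return 'stack/service' names whose restart policy is 'always'.

    `stacks` maps a stack name to an already-parsed Compose file.
    """
    flagged = []
    for stack_name, compose in stacks.items():
        for service_name, service in compose.get("services", {}).items():
            if service.get("restart") == "always":
                flagged.append(f"{stack_name}/{service_name}")
    return sorted(flagged)


# Example input: two small stacks, one service set to restart: always.
example_stacks = {
    "blog": {"services": {"web": {"restart": "always"},
                          "db": {"restart": "unless-stopped"}}},
    "tools": {"services": {"cron": {}}},
}
flagged = restart_always_services(example_stacks)
```

Here `flagged` contains only `blog/web`. Compose also accepts `unless-stopped` as a restart policy, which leaves a container down if you stopped it deliberately; whether that is the right choice still depends on the machine and the project.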

Keeping the Infrastructure Simple

The stacks we build for clients tend to follow the same principle: every piece should be something you can fully understand and troubleshoot without too much effort. Containerised applications, a reverse proxy that handles HTTPS automatically, hardened servers, scheduled backups, encrypted secrets. Kubernetes and Terraform solve real problems at scale, but for many of our clients who run a few services, they're way too much overhead for what you get back.

One thing I always care about is rollback. Deployment configurations are version-controlled in git, so if something goes wrong you redeploy the previous version. For data, you restore from your most recent scheduled backup. The simpler the stack, the easier this is. All you do is put the old files back and restart.
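
As an illustration of how short that rollback can be, the sketch below builds the commands as text for a human to review rather than running anything. It assumes the configuration is a file named docker-compose.yml tracked in a git repository, and that you already know the commit you want to return to; restoring data from backup is a separate step and is not covered.

```python
import shlex


def rollback_steps(repo_dir: str, previous_commit: str) -> list[str]:
    """Build (but do not run) the commands for a configuration rollback."""
    return [
        f"cd {shlex.quote(repo_dir)}",
        # Put the old version of the Compose file back in the working tree.
        f"git checkout {shlex.quote(previous_commit)} -- docker-compose.yml",
        # Recreate the containers whose configuration changed.
        "docker compose up -d",
        # Verification: list the stack's containers and their status.
        "docker compose ps",
    ]
```

Returning strings instead of executing them keeps the rollback in runbook form: the operator can read four lines, decide they are right, and run them.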

How AI Agents Coordinate Deployments

The memo system deserves more detail because it's the part that most resembles the VibeOps vision of AI understanding your project and making infrastructure decisions, except that it keeps humans in control.

Each agent project has an inbox (memos/incoming/) and an archive (memos/archived/). Every memo follows the same structure. There's a header with sender, date, and priority, then context, then action items with checkboxes, then a completion section. I know this sounds like bureaucracy, and honestly it sort of is. But any agent can read any other agent's correspondence, and when an agent finishes processing a memo it fills in the completion notes and moves it to the archive. The result is that every infrastructure decision has a paper trail; when something breaks, you can trace back through archived memos to see exactly what was deployed, when, why, and by which agent.
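
The memo structure described above could be modelled roughly like this. The field names, the plain-text layout, and the `archive` helper are assumptions for illustration; only the directory names (memos/incoming/ and memos/archived/) and the four sections come from the description.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class Memo:
    sender: str
    sent: date
    priority: str
    context: str
    action_items: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render header, context, action items with checkboxes, completion."""
        lines = [
            f"From: {self.sender}",
            f"Date: {self.sent.isoformat()}",
            f"Priority: {self.priority}",
            "",
            "Context",
            self.context,
            "",
            "Action items",
            *[f"- [ ] {item}" for item in self.action_items],
            "",
            "Completion",
        ]
        return "\n".join(lines) + "\n"


def archive(memo_path: Path, completion_notes: str) -> Path:
    """Append completion notes, then move the memo from incoming/ to archived/."""
    text = memo_path.read_text(encoding="utf-8")
    memo_path.write_text(text + completion_notes + "\n", encoding="utf-8")
    archived_dir = memo_path.parent.parent / "archived"
    archived_dir.mkdir(parents=True, exist_ok=True)
    return memo_path.rename(archived_dir / memo_path.name)
```

Because a memo is just a text file that moves between two directories, the paper trail needs no database: listing the archive directory is the audit log.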

Agents are constrained to their scope through layered controls. Most can only interact with other projects through the memo system, and any operation that could affect a live server gets presented to a human operator first.

Cross-project coordination happens asynchronously. If a security concern is discovered, the sysadmin agent can write memos to all affected projects describing the new access control policy. Each project's agent processes the memo independently. No central orchestrator, no shared state, no single point of failure.

The agents also maintain state files (STATE.md) that track current status, recent progress, next actions, and handover notes. When a new conversation starts, the agent reads its state file and pending memos to understand where things left off. Agents don't need persistent memory of every past conversation. They reconstruct context from documentation, the same way a human engineer would read a project's README and recent commit history before starting work. I've written about how this scales across dozens of projects in Run Dozens of Projects with AI.
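
In code terms, reconstructing context at the start of a session amounts to reading two things from disk. The sketch below assumes STATE.md sits at the project root and that pending memos are .md files under memos/incoming/; the function name and the returned dictionary shape are hypothetical.

```python
from pathlib import Path


def load_session_context(project_dir: Path) -> dict:
    """Collect what an agent would read at the start of a session."""
    state_file = project_dir / "STATE.md"
    incoming = project_dir / "memos" / "incoming"
    # A missing state file simply means there is no recorded history yet.
    state = state_file.read_text(encoding="utf-8") if state_file.exists() else ""
    pending = sorted(incoming.glob("*.md")) if incoming.is_dir() else []
    return {
        "state": state,
        "pending_memos": {p.name: p.read_text(encoding="utf-8") for p in pending},
    }
```

Nothing here depends on a particular AI vendor or on conversation history, which is the property the state-file convention is meant to provide.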

Why Access Between Projects Is Deliberate

Unlike the hub-and-spoke model most agent frameworks follow, our architecture uses a mesh. Each project is an autonomous node with its own state, its own instructions, and its own agent identity. Further, I can set things up so that agents can't see other projects by deploying to separate machines or instances. When an agent does need broader access (a coordinator that works across multiple projects, for example), that access is granted explicitly.

This matters because a single AI tool managing all your infrastructure at once is a hub model. A single compromise, misconfiguration, or bad judgment call affects everything. A mesh where access is deliberately granted rather than assumed by default limits the damage when something goes wrong.
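
The "granted, not assumed" rule is a default-deny check. The sketch below models it with a lookup table; the agent and project names are invented, and in practice the isolation may come from separate machines or instances rather than from code like this.

```python
# Explicit grants: agent name -> projects it may access besides its own.
# All names here are hypothetical.
GRANTS: dict[str, set[str]] = {
    "coordinator": {"website", "sysadmin"},
}


def can_access(agent: str, home_project: str, target_project: str,
               grants: dict[str, set[str]] = GRANTS) -> bool:
    """Default-deny: an agent sees its own project plus explicit grants only."""
    if target_project == home_project:
        return True
    return target_project in grants.get(agent, set())
```

With this table, a developer agent in the website project gets `False` when it asks for the sysadmin project, while the coordinator gets `True` because someone wrote that grant down.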

I've written in more detail about this architecture in Building an Operating Environment for AI Agents.

Security: The Part VibeOps Would Get Wrong

Automated security scanning, patching, and monitoring are well-established and genuinely useful. The problem here is different: an AI deciding how to containerise your application, configure your deployment pipeline, and scale your infrastructure, all based on reading your repo. Those decisions depend on context the tool either can't see or that would be too time-consuming to keep feeding in.

That's why we keep humans in the loop for those decisions, and layer security controls on top so the agents can't overstep even when they're handling the routine parts. If one layer misses something, the next one hopefully catches it. You can read more about how we approach this in What OpenClaw Teaches Us About AI Agent Security.

The agents are session-based. This means that when I'm not actively working with them, there's no daemon or background service running. At this stage of AI development, I don't think agents are reliable enough to be making infrastructure decisions unsupervised. That will likely change in the future, but right now I'd rather not risk my business or my clients' by having them work while I'm not looking.

What VibeOps Should Actually Be

There is a real gap between AI-generated code and production deployment, and I think it will get filled well eventually. Right now, for most businesses our size, the practical approach is to let AI handle the preparation but keep a human on the decisions that actually matter.

In practice that means AI can generate your deployment configurations, write your runbooks, and tell you what it doesn't know. What it shouldn't be doing yet is executing changes to live infrastructure without someone looking at them first. That boundary will shift as the tools get more reliable, but for now the risk of getting it wrong is too high.

Our agent system works this way. The sysadmin agent prepares configurations, writes runbooks, and flags risks, but it presents everything to a human before anything touches a production server. I'm still figuring out exactly where the boundary sits (it moves as I get more confident in the guardrails), but the principle holds: humans spend their time on decisions, not on remembering technical steps.

Getting from Vibe Coding to Production

If you're currently stuck between AI-generated code and manual deployments with no clear path between them, the good news is that most of the pieces already exist. You probably just haven't connected them yet.

The starting point is to containerise your applications so they run the same way everywhere, put a reverse proxy in front that handles HTTPS automatically, and harden your server before you deploy anything. If you have a developer or technical partner handling this, get them to write deployment runbooks rather than relying on scripts that fail silently. A runbook that says "run this, check this, then run this" is easier to review and safer to hand off than a bash script that does everything at once.
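
The "run this, check this, then run this" shape can be sketched as follows. This is an illustration with hypothetical names (`Step`, `run_runbook`), and the steps are plain Python callables standing in for real commands. The difference from a script that fails silently is that every step carries its own check, and the run stops at the first check that fails.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    run: Callable[[], None]     # the action to perform
    check: Callable[[], bool]   # verification that the action worked


def run_runbook(steps: list[Step]) -> list[str]:
    """Run steps in order and stop loudly at the first failed check."""
    log = []
    for number, step in enumerate(steps, start=1):
        step.run()
        if not step.check():
            log.append(f"{number}. {step.description}: CHECK FAILED, stopping")
            break
        log.append(f"{number}. {step.description}: ok")
    return log
```

The returned log doubles as the record a reviewer reads afterwards: which steps ran, which check failed, and that nothing after it was attempted.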

If you're already using AI coding tools, use them for deployment preparation too. Have them generate your configurations and templates. Make sure you review the output. Let them handle the low-risk steps, and keep your hands on the controls for the rest.


Common Questions

What is VibeOps (vibe DevOps)?

VibeOps is an emerging concept for AI tools that automatically read your codebase and set up deployment infrastructure, CI/CD pipelines, containerisation, and scaling. Sometimes called "vibe DevOps," the idea is to bridge the gap between AI-generated code and production deployment without manual configuration.

Can AI agents safely handle production deployments?

AI agents can safely prepare deployments by generating configurations, writing runbooks, and flagging risks. What they shouldn't be doing yet is executing changes to live infrastructure without a human reviewing them first. As these tools mature this will likely change, but right now a human-in-the-loop approach is the safest way to get the benefits without the risk.

What's the risk of fully automated cloud deployment?

The main risk is that automation can work too well. On cloud infrastructure, a modest traffic spike combined with an eager auto-scaling rule can run up a significant bill. On a fixed-capacity machine, uncontrolled automation can freeze your system entirely. In both cases the problem is the same: automation doing more than anyone intended, with no human checkpoint to catch it.

What's the best approach to deployment for small businesses?

Keep the infrastructure simple enough that you can fully understand and troubleshoot it. Containerise your applications, automate HTTPS, harden your servers, and write deployment runbooks that a human can review. If you're using AI coding tools, use them for deployment preparation too, but keep your hands on the controls for anything that touches a live server.


This article is part of an ongoing series on how Another Cup of Coffee is adapting to AI. Explore all articles in this series.

You may also like

Building an Operating Environment for AI Agents

How markdown files and conventions turned CLI agent tools into a coordination system running 44 projects across 14 organisations.

What OpenClaw Teaches Us About AI Agent Security

What an open-source AI agent framework reveals about the security challenges of giving AI tools access to your filesystem and shell.

Run Dozens of Projects with AI

How a markdown-based working memory system and session protocol lets AI agents coordinate across dozens of active projects.

Featured image photo by Hans Westbeek on Unsplash.