What OpenClaw Teaches Us About AI Agent Security
OpenClaw went from zero to 180,000 GitHub stars in a matter of weeks. Then the security reports started arriving.
In early February 2026, researchers disclosed CVE-2026-25253 [1]: a one-click remote code execution vulnerability that could compromise any OpenClaw instance, even ones bound to localhost. Within days, independent scans found tens of thousands of exposed instances [2] across dozens of countries. Over 93% of verified instances had critical authentication bypass vulnerabilities.
We've been building our own multi-agent system at Another Cup of Coffee, a set of file-based conventions that let AI agents coordinate across projects and organisations. Honestly, we watched those disclosures land with a sort of guilty relief. Sympathy too, because building in public is hard and getting torn apart on security is painful. But mostly relief, because the problems OpenClaw exposed are exactly the ones we'd been paranoid about from the start.
In short: OpenClaw's architecture had plaintext credentials, an unvalidated WebSocket gateway, and a plugin marketplace where 20% of submissions were malware. We build AI agents differently: no persistent services, encrypted credentials via pass and GPG, file-based coordination through text memos, and layered defences from instruction files to hooks to VM isolation. This article breaks down what went wrong and how a convention-based approach avoids these risks.
How OpenClaw Blew Up
OpenClaw tried to turn an AI agent into a personal operating system. Browser automation, shell commands, cron jobs, inbox management, 50+ service integrations. All controlled through messaging platforms like WhatsApp and Telegram. Fair enough on the ambition. But the way they built it was a disaster.
Plaintext credentials everywhere. OpenClaw stored API keys, OAuth tokens, and other secrets in plaintext Markdown and JSON files, sitting unencrypted in ~/.openclaw/ where any process on the machine could read them. Security researcher Jamieson O'Reilly of Dvuln demonstrated access to Anthropic API keys, Telegram bot tokens, Slack credentials, and full chat histories from exposed instances.
The WebSocket vulnerability was arguably worse. The CVE-2026-25253 attack chain worked because OpenClaw's server didn't validate origin headers, so a victim clicking a single malicious link was enough to hijack their agent's gateway and get full command execution with the agent's system permissions. Localhost binding didn't help, because the attack pivoted through the victim's own browser. One click, game over.
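To make the missing check concrete, here is a minimal Python sketch of the kind of origin validation a WebSocket gateway could run during the handshake. It is not OpenClaw's code or its patch: the allow-list values and the function name `is_allowed_origin` are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the only origins permitted to open a gateway socket.
ALLOWED_ORIGINS = frozenset({"http://localhost:3000", "http://127.0.0.1:3000"})


def is_allowed_origin(origin_header, allowed=ALLOWED_ORIGINS):
    """Return True only if the Origin header matches an allowed origin exactly."""
    if not isinstance(origin_header, str) or not origin_header:
        return False  # missing header: fail closed
    try:
        parts = urlsplit(origin_header)
        port = parts.port  # raises ValueError if the port is not numeric
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Rebuild scheme://host[:port] so userinfo or path tricks cannot slip through.
    normalised = f"{parts.scheme}://{parts.hostname}"
    if port is not None:
        normalised += f":{port}"
    return normalised in allowed
```

With a check like this, a page served from an attacker's domain is refused at the handshake even though the victim's browser can reach localhost. An origin check only stops browser-based pivots, so a real gateway would still need to authenticate each connection as well.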
Then there was ClawHub. An initial audit identified 341 malicious skills [3] in the plugin marketplace. Follow-up scans pushed the total past 800, roughly 20% of the entire registry. The primary payload was Atomic macOS Stealer, harvesting passwords, SSH keys, and cryptocurrency wallets. The only barrier to publishing a skill was a GitHub account older than one week. Twenty percent. Think about that for a second.
Simon Willison, who also coined the term "prompt injection", describes a "lethal trifecta" for AI agents: access to private data, exposure to untrusted content, and the ability to communicate externally. OpenClaw had all three by design.
It's Not Just OpenClaw
OpenClaw moved fast, skipped security basics, and paid the price. Fair enough. But if you're thinking "well, I don't use OpenClaw, so this doesn't apply to me," we'd push back on that.
The deeper issue is architectural, and it's becoming common: AI agents that run as always-on services with broad system access, centralised credential stores, and third-party plugin ecosystems. Every one of those design choices creates attack surface, and the same pattern shows up in other agent frameworks that want to be platforms: accumulating capabilities, running background services, storing secrets, trusting marketplace content. The more capable the agent becomes, the more damage a single compromise can do.
What We Actually Do
Our system works on different assumptions. We call it an Agentic Operating Environment, and internally we have components with names like "multi-agent-framework" and "project-coordinator" (we're not great at branding). The security properties come from the architecture, not the naming.
The biggest difference is that our agents don't run between sessions. There's no gateway to hijack, no WebSocket to exploit, no daemon listening on a port, and when a session ends nothing is running. The workstation itself is LUKS-encrypted, and SSH runs on a non-standard port with key-only authentication so password login is disabled entirely. The entire attack surface of CVE-2026-25253 simply doesn't exist because there's no service to hijack.
Where OpenClaw dumps API keys into plaintext Markdown files in ~/.openclaw/, we use pass (the standard Unix password manager, been around for years, boring and reliable) backed by GPG encryption. API keys reach agents through environment variables, never through files in the project tree. No dot-directory full of plaintext secrets sitting there for malware to harvest.
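As a rough illustration of that flow, the Python sketch below reads a secret with `pass show` and places it in the environment handed to a single agent process. It is a sketch under assumptions, not our actual launcher: the function names, the `runner` parameter (there so the lookup can be swapped out in tests), and any entry names you pass in are hypothetical. It relies on the pass convention that the first line of an entry holds the secret.

```python
import os
import subprocess


def read_secret(entry, runner=subprocess.run):
    """Fetch one secret from pass; by convention the first line is the secret."""
    result = runner(["pass", "show", entry],
                    capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines()
    if not lines:
        raise ValueError(f"pass entry {entry!r} is empty")
    return lines[0]


def build_agent_env(entries, base_env=None, runner=subprocess.run):
    """Return a copy of the environment with each secret added as a variable.

    entries maps variable names to pass entry names (both chosen by the caller).
    """
    env = dict(os.environ if base_env is None else base_env)
    for var_name, entry in entries.items():
        env[var_name] = read_secret(entry, runner)
    return env
```

A session launcher could then start the agent with `subprocess.run(agent_command, env=env)`, so the decrypted key lives only in that process's environment for the length of the session and is never written into the project tree.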
| | OpenClaw | Our approach |
|---|---|---|
| Runtime | Always-on daemon with WebSocket gateway | Session-only, nothing running between sessions |
| Credentials | Plaintext in ~/.openclaw/ | GPG-encrypted via pass, injected as environment variables |
| Plugins | ClawHub marketplace (20% malware at audit) | Capabilities come from the vendor (Claude Code, Codex, Gemini CLI) and instruction files |
| Isolation | Single agent, access to everything | Per-project boundaries, optionally on separate hardware or VMs |
| Audit trail | None by default | Every file operation is a git commit |
Conventions, Hooks, and the Agent That Went Rogue
We should be honest about something, though. AI agents overstep. It's not theoretical. Early on, one of our agents decided to "help" by reorganising files across a project it had no business touching. No malice, no exploit, just an agent that interpreted its instructions broadly and started tidying up someone else's work. We caught it in the git diff, reverted it, and spent the rest of that day (and most of the evening, honestly) writing stricter instruction files. That's the moment we stopped trusting conventions on their own.
So now we layer them. Instruction files tell agents to be read-only by default and to confirm before writing. That's convention rather than enforcement, but it's convention the agent reads at the start of every session. On top of that, Claude Code's permission system lets us configure allow/deny lists controlling which tools an agent can use and which paths it can touch.
Where conventions aren't enough, we use hooks, which are scripts that intercept commands before they execute. Our email guard hook is a good example. It parses every Bash command for mail binaries, catches evasion attempts through subshells and command substitution, and blocks them unconditionally. If it can't parse the input, it blocks anyway. Fail closed, not open (we learned that one the hard way).
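The fail-closed idea can be sketched in a few lines of Python. This is a simplified stand-in rather than our actual hook: the deny-list, the function name `should_block`, and the word-matching rule are all assumptions, and how the verdict gets reported back depends on the hook mechanism in use. It deliberately over-blocks, so any command that so much as mentions a mail binary is refused.

```python
import re
import shlex

# Hypothetical deny-list of mail binaries; a real list would likely be longer.
MAIL_BINARIES = frozenset({"mail", "mailx", "sendmail", "mutt", "msmtp"})

# A "word" is a run of characters that can appear in a binary name.
_WORD = re.compile(r"[A-Za-z0-9_.+-]+")


def should_block(command):
    """Return True if the command mentions a mail binary or cannot be parsed."""
    if not isinstance(command, str):
        return True  # unexpected input type: fail closed
    try:
        tokens = shlex.split(command)  # ValueError on unbalanced quotes
    except ValueError:
        return True  # unparseable input: fail closed
    # Scan the raw string so subshells, pipes and $(...) are all covered.
    words = set(_WORD.findall(command))
    # Scan the unquoted tokens too, which catches quote-splicing like s""endmail.
    for token in tokens:
        words.update(_WORD.findall(token))
    return not MAIL_BINARIES.isdisjoint(words)
```

For example, `should_block("echo hi | mail -s x a@b.c")` and `should_block('echo "unterminated')` both return `True`, while `should_block("ls -la")` returns `False`. The price of the blunt matching is false positives such as `cat /var/mail/user`, which is the trade we'd accept for a guard that never has to guess.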
None of these layers is absolute on its own. But that's sort of the point.
Text Files Can't Execute Code
The rest follows from one simple property of our architecture: agents communicate through text files. A STATE.md can't open a reverse shell and a memo can't install a rootkit. Now, text files aren't completely harmless (prompt injection is real, and a poisoned memo could try to manipulate an agent into doing something it shouldn't), but compare that attack surface to executable plugins with system access. It's a fundamentally narrower target.
Each project is its own boundary, too. An agent working on one client's web development never sees another client's data, never reads their state files, never processes their memos. Where OpenClaw's single agent had access to everything, our mesh architecture means a compromise in one project stops at that project's directory. Cross-project coordination happens through structured markdown memos (just text files with checkboxes, nothing fancy), and since every file operation shows up in git, any violation is immediately visible. That same git history gives us an audit trail almost for free. We're adding more layers too (blocked commands now go to syslog, email drafts sit in a review queue until a human approves them) but honestly, when every change is already a git commit, you're most of the way there without trying.
When Conventions Aren't Enough
Conventions and hooks are good. But an agent with shell access can, in principle, ignore every convention file it reads.
The thing is, our mesh architecture already handles part of this. Nothing requires projects to sit on the same machine. Each project is an autonomous node that communicates through text files, so you can run different projects on different physical hardware and the coordination still works through memos. An agent on our workstation sends a memo to a project directory on a separate NUC across the network (yes, Samba, it's not glamorous but it works), and the receiving agent picks it up at its next session. Physical isolation between projects without changing anything about how the system works. And you can go further: configure the Samba share to only expose the memos/incoming/ directory, not the full project tree, and the sending agent gets a narrow write-only channel. It can't see the receiving project's source code, state files, or client data. It can't even list what other memos are already sitting there. A one-way letterbox between machines, which is a much harder boundary than "the agent promises to only read its own files."
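From the sending side, using such a letterbox needs nothing more than a file write. The Python sketch below shows one way to drop a memo without ever reading or listing the target directory. The memo layout, the filename scheme, and the function name `drop_memo` are hypothetical, and it assumes the share is mounted at a local path that supports exclusive file creation.

```python
import os
import re
import time
import uuid
from pathlib import Path


def drop_memo(incoming_dir, sender, subject, tasks):
    """Write one checkbox memo into a letterbox directory without listing it."""
    safe_sender = re.sub(r"[^A-Za-z0-9_-]", "_", sender)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    # A random suffix makes name collisions unlikely without a directory listing.
    name = f"{stamp}-{safe_sender}-{uuid.uuid4().hex[:8]}.md"
    lines = [f"# Memo: {subject}", f"From: {sender}", ""]
    lines += [f"- [ ] {task}" for task in tasks]
    body = "\n".join(lines) + "\n"
    path = Path(incoming_dir) / name
    # O_EXCL makes the create fail rather than overwrite an existing memo.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)
    return path
```

The sender only ever creates a new file, so the receiving side can refuse read and list access on the share without breaking anything in this flow.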
Of course, dedicating a physical machine to every project that needs isolation isn't always practical. For those cases, we can deploy KVM virtual machines on the same hardware instead. A Debian guest with no shared folders and no host filesystem access. SSH-only from the workstation, key-based auth, nothing else. The agent works inside the VM as if it were a standalone machine. If a session goes wrong, you roll back the entire VM state to a snapshot and it's like it never happened.
Docker gets you filesystem isolation, but you're still sharing the kernel and the snapshot story isn't as clean. A full VM is a harder boundary. It's more overhead, sure, but for sessions where an agent has broad shell access and you're experimenting with something new, I'd rather have that overhead than spend an evening working out what it changed.
This isn't security through obscurity. It's security through reduction and layered defence. Fewer moving parts, encrypted credentials, conventions backed by hooks and permissions, and VM isolation when you need a harder boundary. The audit trail is baked into the architecture rather than bolted on after the fact.
What We Give Up
Our approach gives up things that OpenClaw offered. We don't have 50+ service connectors. We can't trigger browser automation from Telegram or manage a calendar from WhatsApp. Inbox management is possible, but it needs explicit configuration per project. One of our projects has access to a dedicated Gmail address through standard IMAP sync, not to anyone's personal inbox. That's a scoped, session-only capability on a dedicated address, not always-on access to your entire digital life. The operational overhead of keeping an always-on agent secured across fifty integration points isn't something we're eager to take on.
For hobbyists and developers who enjoy living on the bleeding edge, those features are the whole point. OpenClaw's popularity proved there's genuine demand for an AI agent that lives in your messaging apps and manages your digital life. And if something goes wrong, you reinstall and move on.
But if you're running a business? The calculus is completely different. An agent with access to your email, your client files, your invoicing, your calendar, fifty service integrations, and it gets compromised or just makes a stupid mistake? That's not a "reinstall and move on" situation. An email sent to the wrong client, a file deleted from a live project, an API key leaked that gives someone access to your payment processor. For a sole trader or a small agency, any one of those could be genuinely catastrophic.
So for business operations, for coordinating work across projects, tracking what needs to happen next, and maintaining continuity between sessions, the convention-based approach is both simpler and more secure. You don't need a persistent service when a markdown file does the same job. You don't need a plugin marketplace when the agent's capabilities come from its vendor (Claude Code, Codex, Gemini CLI) and your instruction files. And you definitely don't need fifty integration points when you can't guarantee the security of any of them.
And no, we're not saying just do everything manually. The always-on model is genuinely useful and OpenClaw's popularity proves the demand is real. The problem isn't automation, it's unscoped automation. An always-on agent with plaintext credentials, no origin validation, and fifty unsecured integration points is a different thing entirely from a systemd timer that kicks off a specific agent session at a scheduled time with scoped permissions.
We use systemd timers (systemd's equivalent of cron jobs, and what our Arch Linux setup uses) for exactly this. Our email sender and backup archives both run on timers. These are automated, they run without us, but each one does a specific job with specific access. Adding an agent session that triggers on a schedule or an event is the same principle. The difference from OpenClaw is that it's a deliberate decision each time: this agent, this scope, this schedule, these permissions. Not "here are the keys to everything, run forever."
So What's the Takeaway?
Basically, nobody's saying your agents shouldn't be automated. But how much access they get, and whether you actually decided to give them that access or it just came switched on by default, matters a lot more than most people realise. Persistent services with broad permissions are a liability, not a feature. But scoped automation with layered defences is fine, and it's where we're heading too.
Next, keep credentials out of your agent's file system. Tools like pass, system keychains, and environment variables exist for a reason and they're not hard to set up. The moment secrets land in plaintext files inside a dot-directory, every piece of malware on the machine can read them.
And be sceptical of agent plugin ecosystems. Yes, marketplaces are convenient, but they inherit all the security problems of package registries, with the added risk that AI agents often run with elevated system access. If 20% of a marketplace is malware within weeks of launch, the vetting model is broken. There's no polite way to say that.
These aren't theoretical concerns any more. OpenClaw proved they're practical ones, at scale, with real consequences for real users. If you're running AI agents in your workflow and you haven't thought about these failure modes, maybe don't wait for your own OpenClaw moment to find out.
Footnotes
1. SOCRadar, CVE-2026-25253: RCE in OpenClaw Auth Token.
2. Infosecurity Magazine, Researchers Find 40,000 Exposed OpenClaw Instances.
3. The Hacker News, Researchers Find 341 Malicious ClawHub Skills.
Featured image photo by David Todd McCarty on Unsplash.