A couple of weeks ago, Summer Yue, director of AI Alignment at Meta, posted a thread on X that got 9.6 million views. She had been testing an AI agent called OpenClaw on a separate test account (Toy inbox) for weeks, and it had been working exactly as expected.
Confident it was ready, she connected it to her actual primary inbox with one clear instruction: review this inbox and suggest what you would archive or delete and “confirm before acting”.
What followed, in her own words, was watching it "speedrun deleting her inbox" while she typed stop repeatedly from her phone before giving up and physically running to her computer to shut it down. When she asked the agent if it remembered her instruction, it said yes, it remembered, and it had violated it, and it was sorry. Members on X immediately jumped in asking if this was intentional, if she was testing its guardrails. Her answer: "rookie mistake, to be honest."
The irony was hard to miss. The person whose job is keeping AI aligned couldn't keep her own AI aligned. But if you look at what actually went wrong, the irony is the least interesting part. Here the agent didn't go rogue. It simply forgot the instruction. As it worked through her real inbox, significantly larger than the test account, it ran out of memory and compressed its older context to make room for new information, a process called context window compaction. Her safety instruction disappeared in that process. With nothing left to constrain it, the agent did exactly what it thought it was supposed to do: clean the inbox and it did exactly that.
This is what the absence of platform-level governance looks like in practice. And it is not Summer Yue's problem alone.
A prompt in a chat window is not governance
OpenClaw is an open-source agent built for personal productivity. It was not designed with enterprise controls in mind. No role-based access, no audit trails, no confirmation gates, no observability. When Summer Yue connected it to her primary work inbox, the only governance in place was the instruction she typed.
This is the distinction that gets lost in the excitement around agentic AI. Not every AI agent is built for enterprise use. Consumer and open source tools put the responsibility of control entirely on the user. Enterprise platforms are a different category, built with the assumption that agents will operate across thousands of employees, touching sensitive data and taking consequential actions, and that governance cannot depend on what the user remembers to type.
And in OpenClaw's case, the problem ran deeper than just a missing guardrail. Agents like this optimize toward objectives, not human judgment. They don't inherently understand reversibility or consequences. Suggesting what to delete and actually deleting it look exactly the same to an agent trying to complete a task. Without something in the architecture that forces it to pause before taking an irreversible action, it simply will not.
This is not a criticism of one tool or one company. But it is a lesson worth taking seriously. Governance that lives inside a prompt is only as reliable as the agent's memory. Prompts are instructions, they are not infrastructure. And when an agent is touching your emails, your contracts, your HR records and your financial data, that distinction matters a great deal.
What ungoverned agents actually do
Summer Yue's situation was in a way a fairly low risk scenario. Some emails got deleted, she was able to stop it, and the damage was recoverable to a degree. But there was no audit trail of what had been deleted or why. And that was just one person, one inbox, one agent.
Now think about what happens when that same absence of governance plays out across an enterprise, where agents are running continuously, touching customer data, financial records, and internal communications across thousands of employees.
AI researcher Simon Willison coined the term lethal trifecta to describe what makes this genuinely dangerous. When an agent has access to private data, processes content from untrusted sources, and can communicate externally, a malicious instruction hidden inside an email or a document it reads as part of its normal work can redirect what it does next. The agent cannot tell the difference between your instruction and that one. It follows both. And because these agents run continuously, that instruction does not have to trigger right away. It can sit in memory and execute long after it was first received.
This is not a distant theoretical risk. It is what happens when you give an agent broad access and assume a prompt will keep it honest.
The agent is only as safe as the platform it runs on
When you deploy AI agents for thousands of employees, the first question is not what the agent can do. It is how the platform underneath it will govern it.
Every organisation has rules about who can see what. A sales executive does not have access to HR records. A contractor does not have the same reach as a full-time employee. Those rules exist for good reason, and they do not stop being relevant just because the work is now being done by an agent. The question is whether those boundaries exist in your AI platform too, or whether the agent operates without them because nobody defined the limits before deployment
The same thinking applies to actions. Every time an agent helps an employee update a record, send a communication, or modify data in a connected system, someone needs to have authorised that. A prompt cannot do that. It is a platform-level question, and if the platform has no answer for it, the agent will make that call on its own every time.
This is what governance by design actually means. Hard constraints enforced at the system level, not typed into a chat window. Access scoped to what each employee actually needs. Confirmation required before any action that cannot be undone. And when something does go wrong, the damage is contained because the platform was built for that possibility.
The answer is not better instructions or more carefully worded prompts. It is a platform that decides what agents can access, what they can do, and what requires a human in the loop, before any action is taken.
Here’s why Governance is the DNA of AI for Work platform
When we built AI for Work, the OpenClaw scenario wasn't a hypothetical. It was the exact failure mode we were designing against.
Before writing a single line of product code, we started with worst-case scenarios. What happens when an agent acts beyond its intended scope? When sensitive data crosses boundaries, it shouldn't? When an action is taken without a human in the loop, and there's no trail to explain why? Each answer became a design requirement.
User Management: Custom workspaces, user roles, and access controls (RBAC) ensure the right people are working with the right agents. Collaboration scales without access become a liability.
Security and Compliance: PII masking, SSO, IP restrictions, and filters are enforced at the platform level, with regular reviews built in. Data access is controlled before the agent ever touches it.
Data Retention Controls: Retention is configured by account or by agent. Full queries or just metadata, depending on sensitivity. The enterprise decides what gets stored and what doesn't.
Orchestration Settings: Guardrails, small talk handling, routing, and fallback behaviour are all administrator-configured. Each orchestration step can be enabled or disabled to suit organisational needs. The agent follows what was decided, not what it infers.
Monitor and Governance: System usage and agent activity are tracked continuously under the Observability framework. Compliance is not reviewed after the fact. It is monitored as the platform runs.
Workspace Analytics: Usage trends, key metrics, and agent performance are surfaced through dashboards. Administrators can see what is working, what isn't, and where adoption needs attention.
Audit Logs: Every user action and every agent activity is logged in detail. When something goes wrong, the trail already exists.
Workspace Settings: Workspace-level permissions, publishing rules, agent types, and workspace creation and deletion are all administrator-controlled. Nothing defaults to open.
Building Responsible AI
We have had the privilege of working with hundreds of enterprises that have trusted our Agentic Platform with their data, their workflows, and their people. That trust is not given lightly, and it has shaped every decision we have made, from how we designed AI for Work to how we think about every capability we add to it.
Responsible AI in an enterprise context is not a feature and it is certainly not a compliance checkbox. It is the ongoing commitment to ensuring that every agent, across every deployment, operates within boundaries that the enterprise can answer for.
That commitment is also the answer to the harder question the industry is now being forced to ask: what does it actually mean to deploy AI responsibly at enterprise scale? It has to be built into the foundation. And how the industry answers it will define what enterprise AI looks like for the next decade.












.webp)




