The reversibility engine: classifying actions before they execute

Every current agentic system is an improvisation. A language model is given tools. It calls them in sequence. Someone writes orchestration code to handle errors. Someone else adds a logging layer. The result is a system that works until it doesn’t — and when it fails, nobody can say exactly why.

The failure is not one of engineering effort. It is architectural: nothing in the system decides, before an action runs, whether that action can be undone.

The classification scheme

We classify every action into one of five classes at runtime — not statically:

Class 0  Pure read, no state change              → Pass through
Class 1  Local reversible write                  → Pass through
Class 2  Remote write, reversible within window  → Log and proceed
Class 3  Irreversible, low blast radius          → Escalate or proceed per policy
Class 4  Irreversible, high blast radius         → Require explicit approval

The critical word is runtime. Classification is not a static property of an action type — it is a property of an action in context. A send_message call starts as Class 2. It becomes Class 4 when the recipient is a distribution list.
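
A minimal sketch of the send_message case, to make the idea concrete. The enum member names, the HANDLING mapping, the classify_send_message function, and the example addresses are all hypothetical names introduced for illustration; only the five classes and their handling come from the table above.

```python
from enum import IntEnum


class ActionClass(IntEnum):
    """The five classes from the table; member names are illustrative."""
    PURE_READ = 0
    LOCAL_REVERSIBLE_WRITE = 1
    REMOTE_REVERSIBLE_WRITE = 2
    IRREVERSIBLE_LOW_BLAST = 3
    IRREVERSIBLE_HIGH_BLAST = 4


# Handling per class, transcribed from the table.
HANDLING = {
    ActionClass.PURE_READ: "pass through",
    ActionClass.LOCAL_REVERSIBLE_WRITE: "pass through",
    ActionClass.REMOTE_REVERSIBLE_WRITE: "log and proceed",
    ActionClass.IRREVERSIBLE_LOW_BLAST: "escalate or proceed per policy",
    ActionClass.IRREVERSIBLE_HIGH_BLAST: "require explicit approval",
}


def classify_send_message(recipient: str, distribution_lists: frozenset) -> ActionClass:
    """Classify one concrete send_message call, not the action type.

    Assumption: the runtime knows which addresses are distribution lists.
    """
    if recipient in distribution_lists:
        # Same tool, same signature, but the context raises the class.
        return ActionClass.IRREVERSIBLE_HIGH_BLAST
    return ActionClass.REMOTE_REVERSIBLE_WRITE


# Hypothetical directory data for the example.
KNOWN_LISTS = frozenset({"all-staff@example.com"})

direct = classify_send_message("alice@example.com", KNOWN_LISTS)
broadcast = classify_send_message("all-staff@example.com", KNOWN_LISTS)
print(direct.name, "->", HANDLING[direct])
print(broadcast.name, "->", HANDLING[broadcast])
```

The two calls differ only in their recipient argument, yet they land in different classes. A lookup keyed on the tool name alone could not tell them apart.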

Why this matters

The alternative — static classification — fails in exactly the cases where it matters most. A model that has learned “send_message is usually safe” will be wrong precisely when the stakes are highest. Runtime classification requires access to execution context: the current undo stack state, the scope of affected recipients, the reversibility window for the target system. This is infrastructure. It cannot live in the model.
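
One way to picture the execution context is as a small record that the runtime, not the model, fills in before each call. The sketch below is an assumption-laden illustration: ExecutionContext, its field names, the HIGH_BLAST_RECIPIENTS threshold of 10, and the decision to let Class 3 proceed are all hypothetical choices, not part of the scheme described above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ActionClass(IntEnum):
    PURE_READ = 0
    LOCAL_REVERSIBLE_WRITE = 1
    REMOTE_REVERSIBLE_WRITE = 2
    IRREVERSIBLE_LOW_BLAST = 3
    IRREVERSIBLE_HIGH_BLAST = 4


@dataclass(frozen=True)
class ExecutionContext:
    """Facts only the runtime can know at call time (hypothetical fields)."""
    undo_entry_available: bool   # current undo stack state
    affected_recipients: int     # scope of affected recipients
    window_remaining_s: float    # seconds left in the target's reversibility window


# Hypothetical cutoff between low and high blast radius.
HIGH_BLAST_RECIPIENTS = 10


def classify_remote_write(ctx: ExecutionContext) -> ActionClass:
    """Classify a remote write from its context rather than its name."""
    if ctx.affected_recipients >= HIGH_BLAST_RECIPIENTS:
        # Wide scope dominates, as in the distribution-list example.
        return ActionClass.IRREVERSIBLE_HIGH_BLAST
    if ctx.undo_entry_available and ctx.window_remaining_s > 0:
        return ActionClass.REMOTE_REVERSIBLE_WRITE
    return ActionClass.IRREVERSIBLE_LOW_BLAST


def may_execute(ctx: ExecutionContext, approved: bool) -> bool:
    """Gate a call: only Class 4 needs explicit approval in this sketch.

    Assumption: the Class 3 policy is 'proceed'; a real policy might escalate.
    """
    return classify_remote_write(ctx) < ActionClass.IRREVERSIBLE_HIGH_BLAST or approved
```

None of these fields appear in the tool call itself. The same call can be Class 2 while the reversibility window is open and Class 3 a minute later, which is why the check has to sit in the layer that tracks that state.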