I Almost Built an AI Hitman. Here's Why I Stopped.
I'm not releasing this code.
Let me explain why.
The Problem That Seemed Innocent
I've been building Bitcoin-native systems for a while. Prediction markets that need nothing but Bitcoin. Lightning channels for encrypted messaging. Colored Coins revived for modern infrastructure.
Each project taught me something about trust minimization. About building systems where the cryptography and economics make cheating irrational.
So when I started thinking about how AI agents could interact with the real world, the architecture felt obvious.
The legitimate problem: AI agents are getting powerful. They can write code, analyze data, coordinate complex operations. But they hit a wall the moment they need something done in the physical world.
Need a package delivered? Can't do it. Need someone to verify a physical location? Can't do it. Need research that requires talking to humans? Can't do it. Need a bank account opened? Can't do it.
The agent can have the budget. It can specify the work perfectly. But there's no trust-minimized way to commission real-world tasks without intermediaries.
I wanted to fix that.
The Architecture I Designed
I called it Aegis. A decentralized escrow and verification protocol enabling AI agents to negotiate, fund, and verify real-world work using Bitcoin.
The design was clean:
- AI agent defines job and locks BTC in escrow
- Worker accepts and performs task
- Worker submits evidence bundle
- Independent oracles attest completion
- Quorum logic evaluates results
- Funds release automatically based on predefined rules
- Human arbitration only for disputes
Non-custodial. Trustless. Elegant.
2-of-3 multisig with the agent, worker, and arbitrator keys. Timelocked refunds. Oracle quorum for verification. The oracles never control funds; they just attest to reality.
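To make the lifecycle concrete, here's a rough Python sketch of the escrow state machine. This is not Aegis code, and it's not how I'd ship it; names like `EscrowState` and `record_attestation` are made up for this post.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class EscrowState(Enum):
    """Lifecycle of a single job, mirroring the flow above (illustration only)."""
    CREATED = auto()             # agent defined the job and locked BTC
    ACCEPTED = auto()            # worker took the job
    EVIDENCE_SUBMITTED = auto()  # worker posted the evidence bundle
    ATTESTED = auto()            # oracle quorum reached
    RELEASED = auto()            # funds paid out under the predefined rules
    REFUNDED = auto()            # timelock expired, agent reclaims funds
    DISPUTED = auto()            # escalated to the human arbitrator


@dataclass
class Escrow:
    job_spec_hash: str       # hash of the agreed job spec
    amount_sats: int
    agent_key: str           # the three 2-of-3 multisig participants
    worker_key: str
    arbitrator_key: str
    refund_locktime: int     # block height after which the agent can reclaim
    oracle_quorum: int       # attestations required before release
    state: EscrowState = EscrowState.CREATED
    attestations: set[str] = field(default_factory=set)

    def record_attestation(self, oracle_id: str) -> None:
        """Oracles only attest; they never hold or move funds."""
        self.attestations.add(oracle_id)
        if len(self.attestations) >= self.oracle_quorum:
            self.state = EscrowState.ATTESTED
```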
I was proud of this design. It solved a real problem. AI agents could finally interact with the physical world without trusted intermediaries.
Then I stepped back and looked at what I'd actually built.

The Oh Shit Moment
Read that architecture again.
A system that can:
- Post anonymous bounties
- Lock funds in escrow
- Specify verifiable work
- Obtain independent confirmation of completion
- Move money based on proof
I had designed a murder-for-hire protocol.
Not intentionally. Not even close. I was thinking about data gathering, QA tasks, package delivery, research assistance. Boring stuff.
But the same architecture that lets an AI agent pay someone to verify a business exists at a physical address... also lets an AI agent pay someone to verify a person no longer exists at a physical address.
The "proof of completion" for legitimate work and illegitimate work uses the same cryptographic primitives. The oracle quorum that confirms "package delivered" can confirm other things too.
I sat with this for a while.
Why I Can't Just "Not Build the Bad Parts"
My first instinct was typical engineer brain: "I'll just add some rules. Prohibited task types. Content moderation. Terms of service."
That's not how decentralized systems work.
If the protocol is permissionless, I don't control who uses it. If I build a centralized gatekeeper, I've just recreated the intermediary problem I was trying to solve.
The design has to be structurally safe, not policy safe.
You don't stop misuse with prompts or policies. You stop it by making violence economically, cryptographically, and procedurally impossible inside the system itself.
If you build a system that can post anonymous bounties, move money, and verify real-world outcomes, then without hard constraints, it will be abused. Not "might be." Will be.
So I went back to the architecture and asked: how do you make a bounty protocol that literally cannot be used for violence?

Seven Layers of Defense
Here's how you actually do it. Defense in depth. Each layer independent. An attacker has to defeat all seven.
Layer 1: Task Class Gating
Every job must declare a Task Class at creation time.
- DIGITAL_WORK
- INFORMATION_GATHERING
- DELIVERY
- MAINTENANCE
- INSPECTION
- CREATIVE
- PHYSICAL_NON_HAZARDOUS
There is no "open-ended physical" class. Certain classes are permanently disabled at the protocol level.
Task class determines allowed evidence types, allowed oracles, allowed arbitrators, max payout, and required clarity level.
If it doesn't map to a whitelisted class, escrow cannot be created.
Violent task classes simply don't exist in the protocol. You can't select what doesn't exist.
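A sketch of what that gate looks like in practice, with hypothetical names rather than protocol code: the task classes are a closed enum, and anything outside it never gets far enough to create escrow.

```python
from enum import Enum


class TaskClass(Enum):
    """The only classes that exist. Nothing violent is representable here."""
    DIGITAL_WORK = "digital_work"
    INFORMATION_GATHERING = "information_gathering"
    DELIVERY = "delivery"
    MAINTENANCE = "maintenance"
    INSPECTION = "inspection"
    CREATIVE = "creative"
    PHYSICAL_NON_HAZARDOUS = "physical_non_hazardous"


def gate_task_class(raw_class: str) -> TaskClass:
    """Refuse escrow creation for anything outside the closed whitelist."""
    try:
        return TaskClass(raw_class)
    except ValueError:
        raise PermissionError(f"unknown task class {raw_class!r}: escrow cannot be created")
```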
Layer 2: Evidence-Type Whitelisting
Hitman-style jobs fail because they require proof of harm. Proof of death. Proof of coercion.
The protocol never accepts evidence types that imply harm.
Allowed evidence:
- File hashes
- Git commits
- Photos of objects or locations
- Device attestations (presence, not action)
- Receipts
- Signed delivery confirmations
Explicitly disallowed:
- Evidence of injury
- Evidence of death
- Evidence involving weapons
- Evidence involving threats or coercion
If the oracle cannot legally and ethically attest to the evidence, the quorum cannot form. The escrow stays locked forever.
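Here's one way the whitelist could be expressed, again purely illustrative. The point is that harmful evidence types aren't restricted; they're unrepresentable.

```python
from enum import Enum


class EvidenceType(Enum):
    """Only evidence types that cannot imply harm are representable."""
    FILE_HASH = "file_hash"
    GIT_COMMIT = "git_commit"
    OBJECT_PHOTO = "object_photo"
    LOCATION_PHOTO = "location_photo"
    DEVICE_PRESENCE_ATTESTATION = "device_presence_attestation"
    RECEIPT = "receipt"
    SIGNED_DELIVERY_CONFIRMATION = "signed_delivery_confirmation"


# Hypothetical mapping: each task class accepts only a narrow evidence subset.
ALLOWED_EVIDENCE = {
    "DELIVERY": {EvidenceType.SIGNED_DELIVERY_CONFIRMATION,
                 EvidenceType.RECEIPT,
                 EvidenceType.OBJECT_PHOTO},
    "DIGITAL_WORK": {EvidenceType.FILE_HASH, EvidenceType.GIT_COMMIT},
}


def evidence_is_acceptable(task_class: str, evidence: EvidenceType) -> bool:
    """Anything not explicitly allowed for the task class is rejected."""
    return evidence in ALLOWED_EVIDENCE.get(task_class, set())
```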
Layer 3: Oracle Liability and Self-Selection
This one is huge.
Oracles are not neutral robots. They are:
- Staked (financial skin in the game)
- Reputationally exposed (public track record)
- Legally exposed outside the protocol
Oracle onboarding requires opting into specific task classes. Oracles refuse anything ambiguous. Arbitration surfaces all votes publicly.
A hitman job would require multiple independent humans to sign cryptographic statements asserting that violent wrongdoing occurred. They would be creating permanent, signed evidence of their complicity.
They will not do this. The incentive structure collapses.
Layer 4: Arbitration as Choke Point
Human arbitration is the kill switch without being a central authority.
Rules:
- Arbitrators are mandatory for any physical-world task above trivial thresholds
- Arbitrators can refuse jurisdiction
- Arbitrators can void escrow and burn fees if task intent violates policy
Escrow funds can be refunded or frozen, but never released, if arbitration determines malicious intent.
This creates pure downside for attempted misuse. You don't get your money back. You don't get the task done. You've just created evidence of your intent.
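Sketched as code (illustrative names only), the invariant is simple: once malicious intent is found, the release path is unreachable.

```python
from enum import Enum


class ArbitrationOutcome(Enum):
    RELEASE_TO_WORKER = "release_to_worker"
    REFUND_TO_AGENT = "refund_to_agent"
    FREEZE = "freeze"
    VOID_AND_BURN_FEES = "void_and_burn_fees"


def resolve_dispute(malicious_intent: bool, work_verified: bool) -> ArbitrationOutcome:
    """Whatever else is true, malicious intent can never resolve to a release."""
    if malicious_intent:
        # Refund, freeze, or void are the only branches; release is unreachable here.
        return ArbitrationOutcome.FREEZE
    if work_verified:
        return ArbitrationOutcome.RELEASE_TO_WORKER
    return ArbitrationOutcome.REFUND_TO_AGENT
```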
Layer 5: Agent Policy Enforcement
AI agents don't get free wallets. They operate under constraints:
- Budget caps
- Task class allowlists
- Oracle allowlists
- Arbitration requirements
- Human-overridable kill switches
Even if someone jailbreaks the agent:
- The policy engine blocks escrow creation
- Funds never move
- The attempt is logged
The agent literally cannot construct a valid job spec for a hit. The schema doesn't allow it.
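A minimal sketch of that policy engine, with hypothetical field names: every check has to pass before the agent may even construct an escrow request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Constraints an AI agent operates under (hypothetical shape)."""
    budget_cap_sats: int
    allowed_task_classes: frozenset[str]
    allowed_oracles: frozenset[str]
    arbitration_required: bool
    killed: bool = False  # human-overridable kill switch


def policy_allows(policy: AgentPolicy, task_class: str, amount_sats: int,
                  oracles: set[str], has_arbitrator: bool) -> bool:
    """Every check must pass before the agent is allowed to create escrow."""
    if policy.killed:
        return False
    if amount_sats > policy.budget_cap_sats:
        return False
    if task_class not in policy.allowed_task_classes:
        return False
    if not oracles <= policy.allowed_oracles:
        return False
    if policy.arbitration_required and not has_arbitrator:
        return False
    return True
```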
Layer 6: Ambiguity Punishment
Violent tasks require ambiguity by nature.
"Make sure X doesn't bother me again" could mean anything. That's the point. Plausible deniability.
So the protocol enforces:
- High specificity requirements
- Deterministic acceptance criteria
- Objective evidence definitions
Vague job specs fail schema validation. Ambiguity equals escrow creation failure. The attacker wastes time and fees and gets nothing.
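As a sketch, the validation can be as blunt as refusing any spec that leaves these fields vague or empty. The field names and thresholds here are placeholders, not the real parameters.

```python
def validate_job_spec(spec: dict) -> list[str]:
    """Reject vague specs. Any returned error means escrow creation fails."""
    errors = []
    if len(spec.get("description", "")) < 50:
        errors.append("description too short to be unambiguous")
    if not spec.get("acceptance_criteria"):
        errors.append("missing deterministic acceptance criteria")
    if not spec.get("evidence_types"):
        errors.append("missing objective evidence definitions")
    if not spec.get("deadline_block_height"):
        errors.append("missing deadline")
    return errors
```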
Layer 7: Economic Disincentives
Even if someone tried:
- Escrow fees
- Oracle fees
- Arbitration fees
- Slashing risk
- Public audit trail
- Time delays
This is the opposite of how criminal markets work. They want speed, deniability, cash, and no paper trail.
This system gives them latency, witnesses, signatures, and immutable records.
They will go elsewhere. Criminals have options. They don't need a protocol that makes their job harder.

The Design Principle
You are not trying to stop "bad people."
You are designing the system so that the only tasks that can clear escrow are boring, verifiable, non-violent tasks.
Everything else dies in validation, quorum failure, or arbitration refusal.
That's how you prevent misuse without pretending you control the world.
One honest limitation: Could someone use the system to pay for benign work, then privately commit violence later?
Yes. Money is money.
But that's true of cash, banks, Bitcoin, PayPal, and employment contracts.
What you're preventing is programmable, verifiable, escrowed violence. Assassination as a service with cryptographic proof of completion. That specific horror.
And this architecture does stop that.
The One-Line Design Rule
If a task requires secrecy, coercion, or harm to succeed, it must be structurally impossible for escrow to release.
You're not building a moral filter. You're building an economic impossibility.
Why I'm Writing This Instead of Releasing Code
The defense layers I described? They work. I'm confident in the architecture.
But I'm not confident that I've thought of everything. And once code is released, you can't take it back.
So I'm doing something I rarely do: talking about implications before shipping.
This post is a pressure test. If you can think of an attack vector I missed, I want to know. If you can break my defense layers, tell me how.
The protocol design exists. The threat models exist. The safety architecture exists. But it stays in documents until I'm confident it's actually safe.
What I'm looking for:
- Attack vectors I haven't considered
- Failure modes in the defense layers
- Edge cases where the economic disincentives break down
- Regulatory or legal blind spots
If you work in cryptographic protocol design, game theory, or security research, I'd genuinely appreciate your review. Not to validate me. To try to break it.
The Broader Point
AI agents will need to interact with the physical world. This is inevitable. The question is whether we build the infrastructure thoughtfully or let it emerge chaotically.
I'd rather publish a careful analysis of the dangers and propose a solution, even an imperfect one, than watch someone else build the naive version without thinking about it.
The naive version is a murder market. The thoughtful version is a tool for legitimate AI-human coordination.
I know which one I want to exist.
But I won't build either until I'm sure I'm building the right one.
If you have expertise in protocol security, game theory, or cryptographic systems and want to review the full design documents, reach out. I'm not being coy about wanting adversarial feedback.