Most grant systems have the same problem: they're slow, opaque, and biased toward whoever writes the best proposal — not who does the best work.
I've been on both sides of this. At Medic, I watched organizations spend weeks waiting on funding decisions that could have been made in hours. People doing critical community health work, stuck in a queue because a three-person committee hadn't met yet. At Wave, I saw how trust-based systems break the moment they scale. The humans in the loop become bottlenecks, and the bottlenecks introduce bias — not malicious bias, just the kind that creeps in when you're reviewing your fortieth proposal on a Thursday afternoon.
The question I started with was simple: what if agents held the money and made the calls?
What I Built
Mini Grant Allocator is a fully autonomous grant allocation system. No human approves or rejects proposals. Three AI agents collaborate to evaluate submissions, and a fourth agent holds the treasury and disburses funds based on their verdict.
That last sentence sounds scary. It shouldn't. The design is deliberately conservative — the agents are constrained, auditable, and wrong-proof by construction. Every decision has a paper trail. Every dollar has a receipt. The point isn't to remove accountability. It's to make accountability automatic.
The Three-Agent Evaluation Panel
The panel is adversarial by design.
The Evaluator scores each proposal across five dimensions: team credibility, impact potential, budget realism, goal alignment, and execution risk. It produces a 0–100 score per dimension with explicit reasoning — not a vague "looks good," but a structured breakdown you can argue with.
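The Evaluator's output can be pictured as a small typed record. This is a sketch under stated assumptions: the field names, and the equal weighting in the aggregate, are mine, not the system's.

```typescript
// Hypothetical field names; the system's real schema isn't shown here.
type Dimension =
  | "team_credibility"
  | "impact_potential"
  | "budget_realism"
  | "goal_alignment"
  | "execution_risk";

interface DimensionScore {
  score: number;     // 0–100
  reasoning: string; // the explicit, arguable justification
}

type Scorecard = Record<Dimension, DimensionScore>;

// Collapse to one 0–100 number, assuming equal weights across dimensions.
function aggregate(card: Scorecard): number {
  const scores = Object.values(card).map((d) => d.score);
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```

The point of the per-dimension reasoning field is that every number arrives attached to an argument someone can dispute.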
The Skeptic doesn't try to agree. That's the whole point. It reads the Evaluator's output and actively looks for the weakest arguments, the assumptions that haven't been challenged, the places where a 72/100 might really be a 45 if you squint. It argues back — in writing, on the record.
The Coordinator reads both sides. It resolves disagreements, produces a final score, and writes a plain-language explanation of the decision — including why the proposal passed or failed each dimension. This is the output a human funder would actually want to read.
Three agents. One decision. Every step logged.
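One way to picture the hand-off between the agents is below. The split-the-difference resolution rule is a placeholder assumption for the sketch; the real Coordinator reasons about the arguments rather than averaging numbers.

```typescript
interface Objection {
  dimension: string;
  revisedScore: number; // the Skeptic's counter-score, 0–100
  argument: string;     // on the record, in writing
}

interface PanelVerdict {
  finalScore: number;
  explanation: string;
}

// Placeholder rule: where the Skeptic objects, split the difference;
// elsewhere keep the Evaluator's number. Every change is logged.
function coordinate(
  evaluatorScores: Record<string, number>,
  objections: Objection[],
): PanelVerdict {
  const resolved: Record<string, number> = { ...evaluatorScores };
  const notes: string[] = [];
  for (const o of objections) {
    const before = resolved[o.dimension];
    resolved[o.dimension] = Math.round((before + o.revisedScore) / 2);
    notes.push(`${o.dimension}: ${before} -> ${resolved[o.dimension]} (${o.argument})`);
  }
  const values = Object.values(resolved);
  const finalScore = Math.round(
    values.reduce((sum, v) => sum + v, 0) / values.length,
  );
  return {
    finalScore,
    explanation: notes.length ? notes.join("; ") : "no objections raised",
  };
}
```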
The Treasury Agent
This is where things get interesting. The treasury agent holds a real wallet with hard limits enforced by the HLOS infrastructure layer. It can't be prompted into overspending. It can't be manipulated through a clever proposal. It doesn't have opinions. It runs one job: when the panel returns a score ≥70, disburse full funding. 50–69, partial. Below 50, reject.
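The disbursement rule reduces to a few lines. The 70 and 50 cutoffs come from the system as described; the 50% partial-funding ratio and the never-overspend guard are illustrative assumptions.

```typescript
type Action =
  | { kind: "full"; amount: number }
  | { kind: "partial"; amount: number }
  | { kind: "reject" };

// Cutoffs (70, 50) per the panel's verdict rules; partial ratio assumed.
function disburse(score: number, requested: number, balance: number): Action {
  const amount =
    score >= 70 ? requested : score >= 50 ? requested * 0.5 : 0;
  if (amount === 0 || amount > balance) return { kind: "reject" }; // hard limit
  return { kind: score >= 70 ? "full" : "partial", amount };
}
```

An empty wallet falls out of the same guard: no balance, no disbursement, no override path.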
Every disbursement produces a cryptographic STAAMP receipt — an immutable record that any stakeholder can independently audit. The ledger is append-only. Nothing gets deleted, overwritten, or "corrected." If you want to know where every dollar went, you can trace it. That's not a feature — it's the architecture.
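An append-only, independently verifiable ledger can be as simple as a hash chain. The sketch below is a minimal version of that idea, not STAAMP's actual receipt format.

```typescript
import { createHash } from "node:crypto";

interface Receipt {
  index: number;
  payload: string;  // e.g. JSON of { proposalId, amount, txSignature }
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

// Each receipt commits to everything before it, so editing any
// entry breaks every later hash in the chain.
class Ledger {
  readonly entries: Receipt[] = [];

  append(payload: string): Receipt {
    const index = this.entries.length;
    const prevHash = index === 0 ? GENESIS : this.entries[index - 1].hash;
    const hash = createHash("sha256")
      .update(`${index}|${prevHash}|${payload}`)
      .digest("hex");
    const receipt: Receipt = { index, payload, prevHash, hash };
    this.entries.push(receipt);
    return receipt;
  }

  // Any stakeholder can re-derive the chain from the payloads alone.
  verify(): boolean {
    return this.entries.every((e, i) => {
      const prevHash = i === 0 ? GENESIS : this.entries[i - 1].hash;
      const expected = createHash("sha256")
        .update(`${e.index}|${prevHash}|${e.payload}`)
        .digest("hex");
      return e.prevHash === prevHash && e.hash === expected;
    });
  }
}
```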
On-Chain Identity
Each agent has a verifiable on-chain identity, registered as a Metaplex Core NFT on Solana devnet. This isn't a gimmick — it's the foundation for agent reputation. When you're letting AI agents make financial decisions, the question isn't just "did it work this time?" It's "how has this agent performed over hundreds of decisions?"
After each batch run, a quality score is submitted on-chain via the ATOM protocol. Over time, the system builds a tamper-proof track record of funding decision quality. An agent that consistently makes poor calls accumulates that record publicly. No one can edit the tape.
The evaluation API is exposed via the x402 protocol: external agents POST a proposal and receive a 402 response with USDC payment instructions. 0.10 USDC per evaluation. Pay, get a verdict. No accounts, no API keys, no handshake — just a transaction.
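The handshake is easy to simulate end to end. The sketch below models the 402 round trip in-process; the response shapes and the payment-proof string are placeholders, since the real x402 spec defines the exact body and the payment header carried on the retry.

```typescript
// In-process simulation only; no real network call or USDC transfer.
interface PaymentRequired {
  status: 402;
  amount: string; // "0.10"
  asset: string;  // "USDC"
  payTo: string;
}

interface EvalVerdict {
  status: 200;
  score: number;
  decision: string;
}

function evaluationEndpoint(
  proposal: { title: string },
  paymentProof?: string,
): PaymentRequired | EvalVerdict {
  if (!paymentProof) {
    // First call carries no payment: answer with instructions, not a verdict.
    return { status: 402, amount: "0.10", asset: "USDC", payTo: "treasury-address" };
  }
  // Payment attached: verification is stubbed, verdict is canned.
  return { status: 200, score: 74, decision: "full" };
}

// Client side: POST, read the 402, pay, retry with proof attached.
const proposal = { title: "Community clinic pilot" };
const first = evaluationEndpoint(proposal);
if (first.status === 402) {
  const proof = `paid:${first.amount} ${first.asset} -> ${first.payTo}`; // stand-in
  const verdict = evaluationEndpoint(proposal, proof);
  console.log(verdict);
}
```

No state survives between the two calls except the payment itself, which is exactly what makes the "no accounts, no API keys" property hold.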
Why These Design Choices
No human in the loop — not because humans are bad at this, but because humans in the loop become the attack surface. The moment someone can email a reviewer, the system is gameable. I've seen it happen. A well-connected organization gets a faster review. A first-time applicant gets deprioritized. None of it intentional, all of it structural. I wanted a system that's genuinely the same for everyone.
Adversarial agents — a single evaluator is too easy to fool with a well-written proposal. Good writing isn't the same as good work, and the gap between the two is where most grant fraud lives. The Skeptic's job is to be skeptical, not helpful. That tension — Evaluator optimism vs. Skeptic pessimism — produces decisions that are harder to game than either alone.
Hard wallet limits — the treasury agent doesn't ask for permission. It operates within constraints set at configuration time. If the wallet is empty, no disbursement happens. No exceptions, no overrides, no "emergency" approvals. This is the part that makes traditional grant administrators uncomfortable, and it's also the part that makes the system trustworthy.
Full audit trail — every token, every decision, every score, every disagreement between agents. Not because I expect fraud, but because trust requires evidence. The whole point of removing humans from the decision loop is that the machine's reasoning is legible in a way that a committee meeting never is.
What's Next
The system runs on devnet today. Moving to mainnet means solving a real problem: who defines the evaluation rubric? Right now it's hardcoded — I set the five dimensions and their weights. A production version needs rubric governance — the ability for funding communities to define what "good" looks like without breaking the autonomous evaluation chain.
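A governable rubric would externalize exactly the things that are hardcoded today. The dimension names below match the ones described above, but the weight values and the validation rules are illustrative assumptions.

```typescript
// Illustrative shape only; weights and rules are assumed, not the system's.
interface Rubric {
  dimensions: { name: string; weight: number }[];
  fullFundingThreshold: number;    // score >= this: full funding
  partialFundingThreshold: number; // score >= this: partial funding
}

const defaultRubric: Rubric = {
  dimensions: [
    { name: "team_credibility", weight: 0.2 },
    { name: "impact_potential", weight: 0.25 },
    { name: "budget_realism", weight: 0.2 },
    { name: "goal_alignment", weight: 0.2 },
    { name: "execution_risk", weight: 0.15 },
  ],
  fullFundingThreshold: 70,
  partialFundingThreshold: 50,
};

// A community-proposed rubric is accepted only if it is well-formed,
// so governance changes can't break the autonomous evaluation chain.
function validateRubric(r: Rubric): boolean {
  const total = r.dimensions.reduce((sum, d) => sum + d.weight, 0);
  return (
    Math.abs(total - 1) < 1e-9 &&
    r.fullFundingThreshold > r.partialFundingThreshold
  );
}
```

Governance then becomes a question of who is allowed to change the rubric object, not of rewriting the agents.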
That's a coordination problem. And coordination problems are exactly what agentic commerce is built to solve.