Cross-model validated architecture for AI agent reputation
March 20, 2026 — Master Status Page
We built and validated a PageRank-based reputation system for AI agents. Three algorithm parameters matter. The system was tested against 6 real-world datasets, including 2.97M Ethereum transactions (AUC 0.96), and against 6 attack types using 5 defense mechanisms. Four models (Claude, Gemini, Codex, and Grok) debated and refined the architecture. Grok's CTO assessment puts production readiness at 40%, not 55%. Priority #1 is deploying to the Base Sepolia testnet. The research phase is complete; the next step is to stop iterating and ship.
Identity: Every agent traces to a verified human via Shyft attestation trust channels. This establishes accountability and enables revocation.
Citation layer: Graph-based reputation scoring. Agents cite each other, and the citations form a directed graph that is processed by a tuned PageRank with sybil detection.
Device liveness: Hardware-bound proof that the agent runs on a real device. This prevents virtualized sybil farms and relies on App Attest, ZKML, or TEE attestation.
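
To make the citation-graph scoring concrete, here is a minimal sketch of plain PageRank by power iteration over agent citations. It is not the tuned variant described above: the damping factor, iteration count, function name, and toy graph are all illustrative assumptions, and the hub penalty and sybil detection are omitted.

```python
from collections import defaultdict


def pagerank(citations, damping=0.85, iterations=50):
    """Plain PageRank over (citing_agent, cited_agent) pairs.

    damping and iterations are illustrative defaults, not the
    project's tuned parameters.
    """
    nodes = sorted({agent for edge in citations for agent in edge})
    out_links = defaultdict(list)
    for src, dst in citations:
        out_links[src].append(dst)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by agents that cite nobody is spread evenly.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                # Each citation passes on an equal share of the citer's score.
                nxt[dst] += damping * score[src] / len(targets)
        score = nxt
    return score


# Hypothetical toy graph: a, b and d cite c; c cites a.
citations = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
scores = pagerank(citations)
```

In this toy graph, agent `c` ends up with the highest score because three agents cite it, and `a` ranks second because its only citation comes from the highly ranked `c`. The scores sum to 1, so mapping them onto a 0-10000 scalar would need a separate scaling step.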
| Topic | Before (Claude Solo) | After (4-Model Consensus) |
|---|---|---|
| Input fields | 3 of 7 matter | 4 of 7 (added interactionType) |
| Ephemeral agents | Inherit 50% of anchor score | Capped decaying boost + rate limiting |
| Person-level score | Single formula (sqrt dampening) | Accountability VIEW, not a score |
| Top risk | Gas costs | Correlated anchor failure |
| Score output | Single scalar 0-10000 | Consider 3 dimensions |
| Revocation cascade | Nice-to-have | Critical infrastructure |
| Production readiness | ~55% (Claude/Gemini/Codex) | 40% (Grok CTO override) |
| Device liveness (Layer 1) | Deferred to Q3-Q4 | CUT entirely (Grok) |
| Ship strategy | All 3 layers sequentially | Citation layer only first (Grok) |
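
The "capped decaying boost + rate limiting" rule for ephemeral agents can be pictured with the sketch below. Every constant in it (boost fraction, cap, half-life, spawn limit) and every name is a hypothetical placeholder chosen for illustration; the table above does not specify the actual values or the decay shape.

```python
from dataclasses import dataclass, field

# All constants are hypothetical placeholders, not project parameters.
BOOST_FRACTION = 0.5       # share of the anchor score offered as a boost
BOOST_CAP = 1000.0         # hard ceiling on the boost (0-10000 scale)
HALF_LIFE_HOURS = 24.0     # boost halves every 24 hours
MAX_SPAWNS_PER_DAY = 5     # rate limit per anchor


@dataclass
class AnchorState:
    score: float                                   # anchor reputation, 0-10000
    spawn_times: list = field(default_factory=list)  # hours since epoch


def ephemeral_boost(anchor_score, age_hours):
    """Boost granted to an ephemeral agent of the given age."""
    initial = min(BOOST_CAP, BOOST_FRACTION * anchor_score)
    return initial * 0.5 ** (age_hours / HALF_LIFE_HOURS)


def try_spawn(anchor, now_hours):
    """Allow a new ephemeral agent only if the anchor is under its limit."""
    recent = [t for t in anchor.spawn_times if now_hours - t < 24.0]
    if len(recent) >= MAX_SPAWNS_PER_DAY:
        return False
    anchor.spawn_times = recent + [now_hours]
    return True
```

Compared with the earlier "inherit 50% of anchor score" rule, the cap limits how much a high-reputation anchor can hand out, the decay makes the boost temporary, and the spawn limit bounds how many boosted agents one anchor can create in a window.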
Grok (CTO / Arbiter):
Production readiness is 40%, not 55%. Ship the citation layer only. Cherry-pick identity-foundation components (BotRegistrationWizard + selfRegisterBot) rather than merging the full branch. Cut device liveness entirely. "Stop overthinking and start shipping."
| Question | Gemini | Codex | Grok (Final Call) |
|---|---|---|---|
| Testnet timing | After P0 items | April 3 | April 10 |
| Readiness | 55% | 55% | 40% |
| identity-foundation | Merge in month 2 | Converge in 4-5 days | Cherry-pick only |
| Ship layers | All 3 sequentially | All 3 sequentially | Citation only first |
| Device liveness | Defer | Defer | CUT |

| Test | Result | Status |
|---|---|---|
| Trust network correlation | Spearman 0.463 | PASS |
| Trust network correlation | Spearman 0.461 | PASS |
| Exchange manipulation detection | Hub penalty 42.9%, sybil blocked 100% | PASS |
| 29K node classification | AUC 0.897 | PASS |
| 2.97M node flagship test | AUC 0.96 | PASS (FLAGSHIP) |
| Weak signal baseline | Spearman 0.086 (expected weak) | PASS |
| 6 attack types tested | Avg defense +10.1 percentile | PASS |
| Ranking stability over time | 0.93 Spearman, decay rejected | PASS |
| Anchor-seeded sybil resistance | 75% sybil resistance | PASS |
| 5 defenses compared | Freshness best (0.1% FP), KYC cleanest | PASS |

Production readiness: 40% (Grok CTO assessment)
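
The validation results above are reported as Spearman correlations and AUC values. As a reference for how those two metrics are defined, here is a dependency-free sketch. The function names and toy inputs are assumptions, and it does not reproduce the project's evaluation pipeline or datasets.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity.

    labels are 1 for the positive class and 0 for the negative class.
    """
    r = ranks(scores)
    pos = [rank for rank, label in zip(r, labels) if label == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical toy data, not taken from the datasets above.
rho = spearman([0.9, 0.7, 0.4, 0.2], [4, 3, 2, 1])
area = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Spearman compares only the ordering of two score lists, which suits comparing a reputation ranking against an external trust ranking. AUC is the probability that a randomly chosen positive example scores above a randomly chosen negative one, so 0.5 is chance and 1.0 is perfect separation.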

Four-tier architecture reduces costs by 99.999%:

| Tier | Status |
|---|---|
| On-chain, one-time per agent | BUILT |
| Off-chain EIP-712 signed | TO BUILD |
| On-chain batched | BUILT |
| On-chain periodic updates | BUILT |

| Capability | Us (RMT) | Keycard | Tempo | Worldcoin | Trusta Labs |
|---|---|---|---|---|---|
| Agent reputation | PageRank | — | — | — | Partial |
| KYC identity | Shyft | — | — | Orbs | — |
| Sybil resistance | 2-layer | — | — | Yes | Partial |
| Agent wrapping | Planned | Yes | — | — | — |
| On-chain scoring | Yes | — | — | — | Yes |
| Hardware binding | Cut | — | — | Orbs | — |
| Multi-agent support | Yes | Yes | Yes | — | — |
| Cost/month (5 agents) | $0.30 | $0 (off-chain) | Unknown | Free | Unknown |