RMT Trust System: Consolidated Research & Roadmap

Cross-model validated architecture for AI agent reputation

March 20, 2026 — Master Status Page

Section 1

Where We Are

Experiments Completed: 10
Datasets Validated (3M+ nodes): 6
Models Debated the Architecture: 4
Production Readiness (Grok CTO): 40%

We built and validated a PageRank-based reputation system for AI agents. Three algorithm parameters drive its behavior. It was tested against 6 real-world datasets, including a 2.97M-node Ethereum transaction graph (AUC 0.96), and against 6 attack types with 5 defense mechanisms. Four models (Claude, Gemini, Codex, Grok) debated and refined the architecture. Grok's CTO assessment puts production readiness at 40%, not 55%. Priority #1 is deploying to Base Sepolia testnet. The research phase is complete; the next step is to stop iterating and ship.

"The single biggest risk is failing to deploy a working testnet soon, leading to endless internal iteration without real-world validation."
— Grok (CTO / Arbiter)
Section 2

The Model — What We're Building

Layer 3 — Identity Anchor

Shyft KYC Trust Channels

Every agent traces to a verified human via Shyft attestation trust channels. Establishes accountability and enables revocation.

Stops: impersonation, anonymous sybils
Status: BUILT
Layer 2 — Citation Reputation

PageRank with 3 Parameters, 4 Input Fields

Graph-based reputation scoring. Agents cite each other; citations form a directed graph processed by tuned PageRank with sybil detection.

Stops: collusion, citation farming, wash trading
Status: BUILT
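
As a rough illustration of the citation layer, the sketch below runs plain power-iteration PageRank over a directed citation graph with the damping factor alpha = 0.85. It is a minimal sketch, not the oracle's actual code: the function names, the edge tuple format, and the mapping onto a 0-10000 scalar are assumptions made for this example, and sybil detection is not included.

```python
from collections import defaultdict


def pagerank(edges, alpha=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a directed, weighted citation graph.

    edges: iterable of (citer, cited, weight) tuples (hypothetical format).
    Returns {agent: rank}, with ranks summing to 1.
    """
    out_weight = defaultdict(float)  # total outgoing citation weight per citer
    incoming = defaultdict(list)     # cited -> [(citer, weight), ...]
    nodes = set()
    for src, dst, w in edges:
        nodes.update((src, dst))
        if w <= 0:
            continue                 # ignore empty or negative citations
        out_weight[src] += w
        incoming[dst].append((src, w))

    if not nodes:
        return {}

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Agents that cite nobody spread their rank uniformly over all agents.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {}
        for v in nodes:
            flow = sum(rank[u] * w / out_weight[u] for u, w in incoming[v])
            new_rank[v] = (1.0 - alpha) / n + alpha * (flow + dangling / n)
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


def to_scalar(rank, scale=10_000):
    """Map ranks onto 0..scale relative to the top agent (assumed mapping)."""
    top = max(rank.values())
    return {v: round(scale * r / top) for v, r in rank.items()}


if __name__ == "__main__":
    citations = [("a", "c", 1.0), ("b", "c", 1.0), ("c", "a", 1.0)]
    print(to_scalar(pagerank(citations)))
```

In this toy graph agent c is cited by both a and b, so it ends up with the highest rank; b is cited by nobody and keeps only the baseline (1 - alpha) / n share.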
Layer 1 — Device Liveness — CUT

TEE + ZK Proofs

Hardware-bound proof that the agent runs on a real device. Prevents virtualized sybil farms. Relies on App Attest, ZKML, or TEE attestation.

Stops: VM farms, cloned agents
Status: CUT (Grok CTO decision)
Section 3

The 3 Parameters — Core Finding

Parameter | Value | Meaning
alpha | 0.85 | Damping factor: how much reputation propagates through the graph
reciprocal_penalty | 0.82 | Discount for mutual endorsements (collusion detection)
diversity_threshold | 0.80 | Penalty for endorsement concentration
These values were validated across 6 datasets, and the cross-model debate left them UNCHANGED. The "4 fields" finding concerns input data, not algorithm parameters.
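
The page gives the parameter values but not the exact formulas behind them, so the following is only one plausible reading, sketched for illustration. It assumes that a mutual pair of citations keeps (1 - reciprocal_penalty) of each edge weight, and that a citer supplying more than diversity_threshold of an agent's incoming weight has its edge scaled back to that share. Both rules, the function name, and the edge format are assumptions, not the validated implementation.

```python
from collections import defaultdict

RECIPROCAL_PENALTY = 0.82   # value listed in this section
DIVERSITY_THRESHOLD = 0.80  # value listed in this section


def adjust_weights(edges,
                   reciprocal_penalty=RECIPROCAL_PENALTY,
                   diversity_threshold=DIVERSITY_THRESHOLD):
    """Return {(citer, cited): adjusted_weight} for a list of citation edges.

    Assumed semantics (not confirmed by the source):
      * a mutual pair A->B and B->A keeps (1 - reciprocal_penalty) of each weight;
      * if one citer supplies more than diversity_threshold of an agent's
        incoming weight, that edge is scaled down to the threshold share of
        the agent's pre-penalty incoming total.
    """
    weights = defaultdict(float)
    for src, dst, w in edges:
        if src != dst and w > 0:  # drop self-citations and empty edges
            weights[(src, dst)] += w

    # Step 1: discount mutual endorsements.
    adjusted = {}
    for (src, dst), w in weights.items():
        mutual = (dst, src) in weights
        adjusted[(src, dst)] = w * (1.0 - reciprocal_penalty) if mutual else w

    # Step 2: penalize endorsement concentration per cited agent.
    incoming_total = defaultdict(float)
    for (_, dst), w in adjusted.items():
        incoming_total[dst] += w
    for (src, dst), w in list(adjusted.items()):
        share = w / incoming_total[dst]
        if share > diversity_threshold:
            adjusted[(src, dst)] = w * diversity_threshold / share
    return adjusted


if __name__ == "__main__":
    demo = [("a", "b", 1.0), ("b", "a", 1.0), ("c", "d", 1.0), ("e", "d", 1.0)]
    for edge, weight in sorted(adjust_weights(demo).items()):
        print(edge, round(weight, 3))
```

One design caveat: PageRank normalizes each citer's outgoing weights, so scaling a citer's only edge has no effect after normalization. A real oracle would have to decide where the discounted mass goes, for example by redistributing it uniformly.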
Section 4

What the Cross-Model Debate Changed

Topic | Before (Claude Solo) | After (4-Model Consensus)
Input fields | 3 of 7 matter | 4 of 7 (added interactionType)
Ephemeral agents | Inherit 50% of anchor score | Capped decaying boost + rate limiting
Person-level score | Single formula (sqrt dampening) | Accountability VIEW, not a score
Top risk | Gas costs | Correlated anchor failure
Score output | Single scalar 0-10000 | Consider 3 dimensions
Revocation cascade | Nice-to-have | Critical infrastructure
Production readiness | ~55% (Claude/Gemini/Codex) | 40% (Grok CTO override)
Device liveness (Layer 1) | Deferred to Q3-Q4 | CUT entirely (Grok)
Ship strategy | All 3 layers sequentially | Citation layer only first (Grok)

Grok (CTO / Arbiter):

Production readiness is 40%, not 55%. Ship the citation layer only. Cherry-pick the identity-foundation components (BotRegistrationWizard + selfRegisterBot) rather than merging the full branch. Cut device liveness entirely. "Stop overthinking and start shipping."

Section 5

Model Disagreements — Where They Differed

Question | Gemini | Codex | Grok (Final Call)
Testnet timing | After P0 items | April 3 | April 10
Readiness | 55% | 55% | 40%
identity-foundation | Merge in month 2 | Converge in 4-5 days | Cherry-pick only
Ship layers | All 3 sequentially | All 3 sequentially | Citation only first
Device liveness | Defer | Defer | CUT
Section 6

Research Validation — All 10 Experiments

# | Experiment | Focus | Result | Status
1 | Bitcoin Alpha | Trust network correlation | Spearman 0.463 | PASS
2 | Bitcoin OTC | Trust network correlation | Spearman 0.461 | PASS
3 | EX-Graph Wash Trading | Exchange manipulation detection | Hub penalty 42.9%, sybil blocked 100% | PASS
4 | XBlock Phishing (subgraph) | 29K node classification | AUC 0.897 | PASS
5 | XBlock Phishing (FULL) | 2.97M node flagship test | AUC 0.96 | PASS (FLAGSHIP)
6 | DAO Governance | Weak signal baseline | Spearman 0.086 (expected weak) | PASS
7 | Adversarial Optimization | 6 attack types tested | Avg defense +10.1 percentile | PASS
8 | Temporal Dynamics | Ranking stability over time | Spearman 0.93, decay rejected | PASS
9 | Personalized PageRank | Anchor-seeded sybil resistance | 75% sybil resistance | PASS
10 | Defense Mechanisms | 5 defenses compared | Freshness best (0.1% FP), KYC cleanest | PASS
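
For readers unfamiliar with the AUC figures quoted for the phishing experiments, the sketch below shows the standard rank-based definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. It illustrates the metric only. How the experiments defined the positive class and oriented the reputation score is not stated here, so the inputs are placeholders.

```python
def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic divided by n_pos * n_neg).

    scores: numeric scores, higher meaning "more likely positive".
    labels: truthy for positive examples, falsy for negative ones.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    ranks = [0.0] * len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    positive_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (positive_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation.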

Production Readiness: 40% (Grok CTO assessment)

Section 7

Attestation Economics

Per-session on-chain (5 agents, mainnet): $31,500/mo
Hybrid batch path (same usage, Base L2): $0.30/mo

Four-tier architecture reduces costs by 99.999%:

Tier | Cost | Mechanism | Status
1. Identity Binding | $0.006 | On-chain, one-time per agent | BUILT
2. Session Attestations | $0.00 | Off-chain EIP-712 signed | TO BUILD
3. Citation Settlement | $0.002/pair | On-chain batched | BUILT
4. Reputation Scores | $0.002/bot | On-chain periodic updates | BUILT
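
The 99.999% figure can be checked directly from the two monthly totals in this section. The short calculation below does that, and also includes a small helper that combines the per-unit tier costs. The helper's name and its volume arguments are hypothetical: the page does not state the citation or score-update volumes behind the $0.30 total, so no breakdown is claimed.

```python
PER_SESSION_MONTHLY_USD = 31_500.00  # per-session on-chain, 5 agents, mainnet
HYBRID_MONTHLY_USD = 0.30            # hybrid batch path, same usage, Base L2

# Per-unit costs from the four tiers (session attestations are off-chain, $0.00).
IDENTITY_BINDING_USD = 0.006   # one-time per agent
CITATION_PAIR_USD = 0.002      # per settled citation pair
SCORE_UPDATE_USD = 0.002       # per bot per score update

reduction_pct = 100.0 * (PER_SESSION_MONTHLY_USD - HYBRID_MONTHLY_USD) / PER_SESSION_MONTHLY_USD
cost_ratio = PER_SESSION_MONTHLY_USD / HYBRID_MONTHLY_USD


def hybrid_cost(new_agents, citation_pairs, score_updates):
    """On-chain cost in USD for caller-supplied volumes (hypothetical helper)."""
    return (new_agents * IDENTITY_BINDING_USD
            + citation_pairs * CITATION_PAIR_USD
            + score_updates * SCORE_UPDATE_USD)


if __name__ == "__main__":
    print(f"reduction: {reduction_pct:.3f}%")  # reduction: 99.999%
    print(f"ratio: {cost_ratio:,.0f}x")        # ratio: 105,000x
```

Put differently, the hybrid path is about 105,000 times cheaper than per-session on-chain attestation for the same usage.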
Section 8

What's Built vs What's Needed

Built

PageRankOracle contract (score storage, batch updates, configurable alpha)
ReputationEngine (bot registration, trust anchor binding, two-tier verified/unverified)
CitationCounters (append-only counters, batch support up to 100)
ShyftGatedResolver (attestation validation, Shyft integration, batch citations)
Oracle service (PageRank computation, sybil detection, citation fetching)
selfRegisterBot (permissionless registration)
batchRecordCitations (up to 100 per call)
Bot Registration Wizard (frontend, multi-step)
RMT SDK (@shyft/rmt-sdk)
ERC-8004 endpoints
Testnet deployment script (Base Sepolia ready)
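
Because batchRecordCitations and CitationCounters accept at most 100 entries per call, a client has to split larger citation sets before submitting them. The helper below is a minimal sketch of that chunking step. The function name is an assumption, and the actual contract or SDK call is deliberately left out because its signature is not shown on this page.

```python
from itertools import islice

MAX_BATCH = 100  # per-call limit stated for batchRecordCitations


def chunk_citations(citations, max_batch=MAX_BATCH):
    """Yield successive lists of at most max_batch citations."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    iterator = iter(citations)
    while True:
        batch = list(islice(iterator, max_batch))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # 250 placeholder citations are split into batches of 100, 100 and 50.
    pending = [("citer-%d" % i, "cited-%d" % i) for i in range(250)]
    print([len(batch) for batch in chunk_citations(pending)])
```

Each yielded batch would then be passed to one on-chain call, so 250 citations cost three transactions instead of 250.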

Not Yet Built

Citation freshness weighting (P0)
Revocation cascade (P0)
Trust anchor accountability view (P1)
Ephemeral agent rate limiting (P1)
Off-chain EAS session attestations (P2)
Merkle root anchoring (P2)
Score API + docs (P2)
trust-wrap CLI (P3)
Device liveness / Layer 1 (CUT)
Multi-dimensional scoring (CUT for 3 months)
Token economics / Stable integration (CUT for 3 months)
Section 9

Roadmap — 4-Model Consensus Plan

Week 1-2: Get Live (Now → April 3)
  • Deploy 9 contracts to Base Sepolia (~2.5 hours)
  • Citation freshness weighting in oracle (~3 days)
  • Cherry-pick BotRegistrationWizard + selfRegisterBot from identity-foundation
Week 3-4: Critical Safety (April 3 – 17)
  • Revocation cascade (~1 week)
  • Ephemeral agent rate limiting (~3 days)
  • 50+ test agents for cold-start validation
Month 2: Integration Ready (April 17 – May 17)
  • Trust anchor accountability view
  • Off-chain EAS session attestations (2-3 weeks)
  • Score API + docs
  • Reconstruct 8 experiment scripts
Month 3: First User (May 17 – June 17)
  • External beta user onboarding
  • trust-wrap CLI prototype
  • Mainnet deployment decision
Not Building (Next 3 Months)
  • Device liveness (CUT — Grok)
  • Multi-dimensional scoring
  • Token economics / Stable integration
  • Full identity-foundation merge

Milestone Targets

April 10: Live on Base Sepolia
April 17: Safety features shipped
May 1: Score API documented
May 15: First beta user
June 15: Mainnet decision
Section 10

Competitive Landscape

Capability Us (RMT) Keycard Tempo Worldcoin Trusta Labs
Agent reputation PageRank Partial
KYC identity Shyft Orbs
Sybil resistance 2-layer Yes Partial
Agent wrapping Planned Yes
On-chain scoring Yes Yes
Hardware binding Cut Orbs
Multi-agent support Yes Yes Yes
Cost/month (5 agents) $0.30 $0 (off-chain) Unknown Free Unknown
Section 11

All Research Documents

Core Research

Previous Cloudflare Pages (now superseded by this page)