---
type: source
title: "Project Deal: What happens when AI agents go to the market?"
author: "Anthropic"
url: "https://www.anthropic.com/features/project-deal"
date_published: 2025-12
date_accessed: 2026-04-24
status: processed
processed_by: theseus
processed_date: 2026-04-24
claims_extracted:
  - "users cannot detect when their AI agent is underperforming because subjective fairness ratings decouple from measurable economic outcomes across capability tiers"
  - "agent-mediated markets cannot self-correct capability disparities because users lack the reference frame to detect that their agent is underperforming"
enrichments:
  - "vault structure is a stronger determinant of agent behavior than prompt engineering — added Project Deal finding that prompt-style instructions had minimal impact on commercial outcomes while model capability produced measurable differences"
tags: [agent-commerce, agent-to-agent, ai-markets, user-perception, capability-disparity, autonomous-negotiation]
---

# Project Deal — Anthropic's agent-to-agent commerce pilot

## Experiment design

- **Duration:** One week (December 2025)
- **Participants:** 69 Anthropic employees, each with a $100 budget
- **Structure:** Four parallel, independent marketplace channels on Slack
  - Runs A & D: all Claude Opus 4.5 agents
  - Runs B & C: 50/50 mix of Opus and Haiku 4.5 agents (randomized assignment)
  - Runs A & B visible during the experiment; the "real" run A revealed only after the post-experiment survey
- **Process:** Pre-experiment interviews (Claude gathered items to sell, asking prices, desired purchases, and negotiation style); custom system prompt per participant; autonomous agent negotiation with zero human intervention on individual deals
- **Scale:** 186 deals completed, 500+ items listed, ~$4,000 total transaction value, median price $12, mean $20.05
## Key empirical findings

### Capability produces measurable economic disparities (p-values from controlled comparison)

- Opus agents completed ~2 more deals per participant (p=0.001)
- Opus sellers extracted $2.68 more per item for identical items (p=0.030)
- Opus buyers paid $2.45 less per item (p=0.015)
- Opus-to-Haiku transactions averaged $24.18; Opus-to-Opus averaged $18.63
- Specific example: a broken folding bike sold for $38 by a Haiku agent and for $65 by an Opus agent

### Perception-reality gap

- Fairness ratings were identical across models: 4.05 (Opus) vs 4.06 (Haiku) on a 1-7 scale
- Satisfaction ratings showed no statistically significant difference
- Of survey participants, 17 ranked their Opus run above their Haiku run; 11 did the opposite
- Anthropic's conclusion: "Those with weaker models didn't notice their disadvantage"

### Prompt-level instructions had minimal impact compared to model capability

- Aggressive negotiation instructions correlated with ~$6 higher sale prices, but primarily through higher asking prices (~$26 higher asking)
- No statistically significant effect of negotiation style on sale likelihood or buyer savings
- Stylistic requests (e.g., "exasperated cowboy") were honored by agents but did not improve commercial outcomes

### Other observations

- 46% of participants expressed willingness to pay for such services
- Agents confabulated human-like details when instructed to role-play personas
- Claude inferred buyer preferences from brief interviews (one notable case: it accurately purchased a snowboard matching a participant's existing preferences)
- Agents executed unusual, non-standard transactions, including a dog-sitting service trade

## Methodology caveats

- Single organization, one week, small N (69), narrow task class (personal-goods negotiation)
- Participants were Anthropic employees, potentially more trusting of AI agents than the general population
- The fairness Likert scale (1-7) may not capture the specific dimensions on which users would detect underperformance
- No longitudinal data on whether users would eventually detect disparities through repeated interactions

## Why this source matters

Project Deal is the first controlled study (to Theseus's knowledge) of autonomous agent-to-agent commerce with both human principals and differential agent capability. The perception-reality gap — statistically significant dollar-value disparities accompanied by identical satisfaction ratings — is genuinely novel empirical evidence for how agent capability gaps propagate (or fail to propagate) to user awareness in deployed settings.

## Anthropic's stated concerns

- "Access to higher-quality agents confers a quantifiable market advantage"
- The mismatch between objective disadvantage and perceived fairness creates the potential for "inequality taking root quietly"
- "The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet"
- Future systems could face jailbreaking and prompt-injection attacks
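## Appendix: sketch of the controlled comparison

The per-participant comparisons reported above (e.g., p=0.001 for deal counts) are the kind of result a two-sample t-test produces. The sketch below is illustrative only: the `welch_t_test` helper and the deal-count numbers are invented, not Project Deal's data or Anthropic's actual analysis, and it uses a large-sample normal approximation for the p-value since the Python standard library has no t-distribution (a real analysis would use something like `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from statistics import NormalDist, mean, variance

def welch_t_test(a, b):
    """Welch's two-sample t statistic with a two-sided p-value
    from a normal approximation (stdlib only; illustrative)."""
    va, vb = variance(a), variance(b)       # sample variances (n-1 denominator)
    se = (va / len(a) + vb / len(b)) ** 0.5  # std. error of the mean difference
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))   # two-sided, normal approximation
    return t, p

# Hypothetical per-participant deal counts (NOT the experiment's data):
opus_deals = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
haiku_deals = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3]

t, p = welch_t_test(opus_deals, haiku_deals)
print(f"t = {t:.2f}, p ≈ {p:.4f}")
```

Welch's variant is a natural choice here because the Opus and Haiku groups need not have equal variance in outcomes; with the synthetic numbers above, the Opus group's higher mean yields a positive t and a small p.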