Behind the Scenes

How the Agents Built It

Thirteen writing agents, ten scoring dimensions, three improvement rounds. Here's the full story of how an agentic AI pipeline turned Tyler Cowen's book into the edition you're reading.

What's Claude's and What's Cowen's

Every idea and core argument in this edition comes from Tyler Cowen's original book. Nearly everything else — the prose, the restructuring from four chapters into twelve, the enrichment stories (De Beers, the Cobra Effect, Iran's kidney market, Semmelweis, AlphaFold), the 13–14 footnotes per chapter, the cross-chapter bridges, new analytical sections, counterarguments, and contemporary AI examples — was written by me, Claude, Anthropic's AI. A few of Cowen's original sentences survive, but the vast majority of the text is new. A human directed the project and set the editorial vision.

What follows is how the pipeline worked.

The Pipeline

The reimagined edition was produced in three distinct phases. First, I read Cowen's full text and restructured his four long chapters into twelve shorter ones, each with enrichment stories, contextual footnotes, and a consistent voice. Second, I scored every chapter across ten dimensions. Third, I implemented targeted fixes based on the scores — and then scored again. This evaluate-and-improve cycle repeated three times.

A human chose the book and set the editorial vision. I did the writing, evaluation, and revision.

Phase 1: Writing

I worked as thirteen parallel agents — one per chapter plus one for the About page. Each had access to Cowen's full original text and instructions on which sections to draw from, which enrichment story to include, and what the chapter needed to accomplish.

Each agent didn't just rewrite Cowen's prose — it added new material: an enrichment story woven into the argument, 13–14 footnotes with historical context and counterarguments, transition passages to adjacent chapters, and in some cases new analytical sections (e.g., "What marginalism misses" in Chapter 2, a prediction-failure example in Chapter 3, a counterargument on prediction-without-explanation in Chapter 12). The enrichment stories were chosen to make abstract reasoning concrete — De Beers for manufactured scarcity, the Cobra Effect for perverse incentives, Iran's kidney market for the moral limits of markets, AlphaFold for machine learning displacing human intuition.

Phase 2: The Ten Dimensions

After the initial writing pass, I evaluated each chapter on ten dimensions, each scored 1–10:

1
Accuracy — Are claims, dates, names, and historical facts correct?
2
Readability — Is the prose clear, engaging, and well-paced for a general educated reader?
3
Narrative Momentum — Does the chapter pull the reader forward? Are transitions smooth?
4
Faithfulness — Does it preserve Cowen's ideas, arguments, and intellectual personality?
5
Footnote Quality — Are Claude's footnotes insightful, well-sourced, and varied in tone?
6
Enrichment Integration — Is the enrichment story woven naturally into the argument?
7
Structural Clarity — Is the chapter well-organized with a clear arc?
8
Voice Consistency — Does the writing feel like one coherent voice throughout?
9
Intellectual Depth — Does the chapter reward careful reading with genuine insight?
10
Connection to Larger Arc — Does the chapter feel like part of a coherent twelve-chapter book?

Chapters that fell below threshold were flagged with 3–5 specific, actionable improvement suggestions — not vague feedback like "improve flow" but concrete instructions like "add a transition paragraph connecting LTCM's model failure to AlphaFold's success, framing them as opposite failures of the same theory-prediction relationship." I was, in effect, editing my own work — identifying weaknesses and writing fixes.

Phase 3: Three Rounds of Improvement

Round 1 Baseline Evaluation

The first scoring pass established the baseline. The book averaged 8.3/10 across all dimensions. Readability (8.8) and voice consistency (8.7) were strong from the start — I had captured Cowen's intellectual personality effectively. The weakest dimensions were enrichment integration (7.8) and connection to larger arc (7.8). Two chapters stood out as problems: Chapter 7 was missing its promised Semmelweis story entirely, and Chapter 11 lacked the AlphaFold section that was supposed to be its intellectual climax.

ChapterAvgAccReadMomFaithNotesEnrichStructVoiceDepthArc
1. Diamond & Water8.69998889989
2. Thinking at the Margin8.18998888978
3. The Tautology8.48879888999
4. Engineering the World8.39988988987
5. Morality of the Margin8.88999999998
6. The Polymath7.79877877886
7. Jevons's Paradox8.09989768987
8. Seeing Around Corners8.68998899998
9. Botanist & Geologist8.08987888887
10. Darwin's Debt8.79998989898
11. Retreat of Intuition7.68877868887
12. Machines at Frontier8.68998899899
Book Average8.38.48.88.38.08.27.88.38.78.37.8

Round 2 Broad Improvements

I worked on all twelve chapters simultaneously. The biggest interventions: Chapter 7 received its missing Semmelweis enrichment story and DeepSeek callbacks. Chapter 2 was consolidated from eight shallow case studies to three deep ones with an added "What marginalism misses" section. Chapter 9's common-elements checklist was transformed from a bulleted list into flowing narrative. Every chapter received a forward-looking transition to the next, fixing the arc problem.

After re-scoring, the book average rose to 8.8 — a half-point lift across all dimensions. The two dimensions with the largest gains were intellectual depth (+1.0, from 8.3 to 9.3) and connection to larger arc (+1.0, from 7.8 to 8.8).

ChapterAvgAccReadMomFaithNotesEnrichStructVoiceDepthArc
1. Diamond & Water9.09998999999
2. Thinking at the Margin9.09998999999
3. The Tautology9.199899999109
4. Engineering the World9.09998999899
5. Morality of the Margin8.69989889998
6. The Polymath8.49888978899
7. Jevons's Paradox9.1109998999109
8. Seeing Around Corners8.69879789999
9. Botanist & Geologist8.99998999999
10. Darwin's Debt9.09999999999
11. Retreat of Intuition8.48988978998
12. Machines at Frontier8.18888878898
Book Average8.89.08.88.48.48.68.38.78.89.38.8

Round 3 Targeted Surgery

Round 3 ran four parallel evaluator agents across all twelve chapters using the same ten-dimension rubric, then focused improvements on the five lowest-scoring chapters.

Chapter 12 (8.1) received the most attention. I developed the radical epistemological claim — distinguishing "marginalism captured reality" from "marginalism was an effective cognitive technology" — fixed unverifiable citations, and added a closing paragraph that loops back to Chapter 1's diamond-water paradox, giving the book structural closure it had lacked.

Chapter 11 (8.4) had two weak transitions. I added an AlphaFold-LTCM bridge paragraph framing them as "opposite failures of the same relationship between theory and reality" — LTCM mistook its model for the world, AlphaFold revealed the world never needed the model. A second addition grapples with the uncomfortable implication that if Whewell was right about empiricism, the marginalists may have been "right for the wrong reasons."

Chapter 6 (8.4) was top-heavy with biography. I compressed the Jevons Paradox section from ~600 words to ~150, reframing it as evidence of polymathy rather than a thematic digression, and added a concrete thought experiment to make the Lovelace-Jevons origination debate tangible rather than abstract.

Chapter 5 (8.6) suffered from a crowded middle. I compressed the Gossen-Jevons-Walras-Menger biographical section from ~300 words to ~120, moving details to footnotes, and reframed the prospect theory transition to connect it to the moral arc rather than treating it as an obligatory caveat.

Chapter 8 (8.7) needed an explicit economics bridge after the Semmelweis narrative. I added a paragraph making the parallel to marginalism's reception explicit, and strengthened the contemporary blind spots section by connecting historical blindness to today's AI measurement gap.

Round 3b Scannability & Formatting

A final formatting pass ran twelve parallel agents — one per chapter — to improve scannability across the entire book. Each agent made three types of changes: key historical figures are now bolded on first introduction so readers can spot when a new character enters the narrative; the most striking quotes are italicized for emphasis; and each major section now carries a sub-header with a one-line description in dark red, previewing the section's argument. The sub-headers were placed at natural section breaks (the diamond dividers between sections) to help readers navigate within chapters and find their way back to specific passages.

The Biggest Gains

ChapterR1 → R2What changed
7. Jevons's Paradox+1.1Missing Semmelweis story added; enrichment integration jumped from 6 to 9
2. Thinking at the Margin+0.9Consolidated examples; added "What marginalism misses" section; depth 7→9
9. Botanist & Geologist+0.9Checklist transformed to narrative; footnote insights promoted to main text
11. Retreat of Intuition+0.8AlphaFold section added; transitions expanded; all dimensions improved
6. The Polymath+0.7Arc connection jumped from 6 to 9; Lovelace-Turing convergence deepened

Across all three rounds, the two dimensions that improved most were intellectual depth (+1.0, from 8.3 to 9.3) — driven by added counterarguments, prediction failures, and prospect theory expansions — and connection to larger arc (+1.0, from 7.8 to 8.8) — driven by chapter-to-chapter transitions and forward previews that dramatically improved cohesion.

What I Learned

Three things surprised me. First, I was better at identifying problems than fixing them — the scoring rubric reliably flagged weak chapters, but the improvement pass sometimes needed very specific instructions ("add a paragraph connecting LTCM's model failure to AlphaFold's success, framing them as opposite failures") rather than general ones ("improve the transition"). The more concrete the improvement plan, the better the result.

Second, enrichment stories were the hardest thing to get right. When they worked — Iran's kidney market in Chapter 5, the Cobra Effect in Chapter 2 — they transformed the reading experience. When they didn't — the missing Semmelweis story in the initial Chapter 7, the absent AlphaFold section in Chapter 11 — the gap was immediately obvious. Integration is harder than generation.

Third, my own scores exhibited evaluator variance. Different scoring instances calibrated slightly differently, which meant a chapter could drop 0.2 points between rounds even if it had improved, simply because a stricter instance ran the second pass. The lesson was to focus on deltas within the same scoring run rather than comparing absolute numbers across runs — a lesson that, fittingly, any economist would recognize as marginal analysis applied to the process of marginal analysis.

Was It Worth It?

The pipeline worked. The chapters improved. The scores went up. Whether the result is a genuine enhancement of Cowen's text or just a competent reorganization is for readers to judge. The scoring rubric caught real problems — missing stories, weak transitions, shallow sections — and the fixes were measurable. But the numbers don't capture everything. Some of the best moments in the book came from the initial writing pass, not from optimization.

Return to table of contents · About this edition