Prologue
I remember the exact moment I stopped trusting the internet to know things.
It was a Tuesday — sometime in mid-2024, nothing dramatic about the day — and I was debugging a fairly obscure issue with a vector database configuration. The kind of problem that, five years earlier, would have been solved in twenty minutes: search the error message, find a Stack Overflow thread where someone had the same problem in 2019, scroll past two wrong answers, find the one from a person who clearly understood the underlying architecture, implement the fix, move on with your life. That was the contract. You put in a well-formed question, the internet gave you someone else’s hard-won experience. Not information — experience. You could tell the difference because the good answers had that texture of someone who’d actually broken things and fixed them, not someone reciting documentation.
On that Tuesday, the contract was already dead. I just hadn’t fully processed it yet.
Every result on the first two pages was AI-generated. Not all of them were obviously so — some had bylines, some were on platforms that used to be reliable — but the tell was always the same: the answers were fluent, structurally coherent, and completely devoid of the kind of specific, experiential detail that distinguishes someone who’s solved this problem from someone (or something) that’s read about it. The syntax was perfect. The knowledge was absent. One article confidently recommended a configuration parameter that doesn’t exist in the version I was running. Another gave a technically plausible solution that would have corrupted the index. A third — and this is the one that really stayed with me — was a near-verbatim rewrite of an article I’d read two years earlier, stripped of the caveats and edge cases that made the original actually useful.
I ended up solving the problem by reading the source code directly. Which is fine — I’ve been writing code since I was a teenager, long before anyone was willing to pay me for it — but the point isn’t that I was inconvenienced. The point is that the mechanism I’d relied on for fifteen years — ask the internet a question, get a knowledge-quality answer from a human who’d been there — had quietly broken. Not because the information disappeared, but because it was buried under an avalanche of synthetic content that looked right but knew nothing.
And I thought: if this is happening to me — someone who can read source code, who’s spent the better part of two decades building AI systems, who can tell the difference between a real answer and a plausible one — what’s happening to everyone else?
Some context on who’s writing this and why, because it matters for what follows.
I studied linguistics. Not computer science, not business administration — linguistics. The science of how humans structure, encode, and transmit meaning. I ended up in AI not because I followed a career plan but because I realized, sometime around 2010, that the intersection of language and computation was going to be the most consequential space in technology. That was not a widely held opinion at the time. NLP was a niche academic discipline, machine learning was something you did in Python scripts that took three days to run, and if you told a recruiter you wanted to work in artificial intelligence, they assumed you meant robotics.
I took the scenic route. A few years at consulting firms where I got my hands dirty with data architecture and analytics. Cassini Consulting, where I started doing serious work in emerging tech. Somewhere in between, a detour into startups — I co-founded Virtuis, which built Germany’s first VR showrooms, back when you still had to explain to people what virtual
reality was and why it mattered. That taught me something no corporate job ever could: the difference between having an idea and making it survive contact with reality. Startups don’t care about your slide deck. They care about whether the thing works and whether someone will pay for it. I’ve carried that instinct into everything since.
Eventually I ended up at Deloitte, where I headed the AI Office and led the Gen AI transformation for about 14,000 practitioners across Germany; later, I became Lead Alliance Partner for the Deloitte x NVIDIA Alliance for Continental Europe. In my free time (mostly at night) I built a multi-agent orchestration system from scratch because the existing ones weren’t good enough, and wrote a custom database that unifies temporal, vector, graph, and full-text search because — again — the existing ones weren’t good enough. I spent more nights and weekends than I care to count coding in the attic of a 220-year-old house near Hamburg while my dogs slept at my feet, which is my preferred working environment and — if I’m being honest — the place where most of my best thinking happens.
I mention all of this not to perform credentials. The part of my career I’m proudest of is the coding, not the titles. But the perspective that produced this book comes from a specific and somewhat unusual combination: boardrooms where AI strategy gets decided by people who’ve never trained a model, server rooms where those strategies get implemented by people who’ve never attended a board meeting, startup garages where the question isn’t “what’s the roadmap” but “does this work,” and — underneath all of it — a background in linguistics that makes me pathologically focused on the question of what it means to
actually know something versus merely having information about it.
Before I get into the substance of what this book argues, I need to tell you what I believe.
Not what I’ve concluded from data — that comes in the chapters. What I believe, as a human being who thinks about this stuff more hours per week than is probably healthy.
I’m a humanist. That word gets thrown around in tech circles the way “synergy” gets thrown around in boardrooms — emptied of meaning through overuse — so let me be specific about what I mean by it. I believe that human beings are extraordinary. Not in the motivational-poster sense, but in the structural sense: our capacity for judgment, creativity, empathy, ethical reasoning, and the kind of messy, intuitive pattern recognition that comes from actually living in the world — these are not bugs in an otherwise automatable system. They’re the point.
I grew up watching Star Trek. Not casually — obsessively, the way certain kinds of minds lock onto a vision and don’t let go. And what stuck with me wasn’t the technology. It was the premise underneath it: a future where machines handle the work that humans only ever did because someone had to, and humans are free to do the things that actually make us human
— explore, create, connect, think, argue about philosophy over dinner. Not because they’re unemployed. Because the question of “how do we survive” has been answered, and the question that remains is “what do we want to be?”
That’s not naïve. It’s a design choice. And the alternative — which is equally plausible and considerably more likely if we sleepwalk into it — is a world where machines do everything, most people do nothing, and a vanishingly small number of people own the machines. The technology doesn’t care which outcome we get. It enables both. The dystopia and the utopia run on the same hardware.
This matters for a book about knowledge because the entire argument comes down to a deceptively simple question: as machines get better at processing information, what’s left for humans to contribute?
My answer — and this is the thesis that runs underneath every chapter — is: knowledge. The real kind. The kind that comes from experience, from judgment, from understanding context and nuance in ways that emerge from actually doing things in the real world. Machines are already better than us at processing information. They’ll keep getting better. That’s not a threat — it’s a liberation, if we’re smart enough to take it. The threat is only there if we confuse information with knowledge, declare humans obsolete, and hand the entire economy to systems that are extraordinarily good at pattern matching and structurally incapable of wisdom.
This book isn’t about fighting machines. I work with these systems every day. I build them. I think they’re remarkable. It’s about a reordering — a fundamental renegotiation of what humans do and what machines do, based on an honest assessment of what each is actually good at. Machines process. Humans know. The Knowledge Economy is what happens when we stop treating that distinction as philosophical and start treating it as economic.
Here’s what I figured out, and it took me embarrassingly long to put it into words clearly — probably because I’m a linguist and should have gotten there faster.
The thing we call “the internet” was never really a knowledge system. It was an information system that, for about twenty years, had enough human-generated content flowing through it that you could extract knowledge from it if you were skilled enough to sift through the noise. That era is over. The humans stopped being the primary authors of the internet sometime around 2024. The models that scraped the internet’s golden age — roughly 1998 to 2022, if I’m being generous — now produce the majority of the new text that goes back onto the internet. And the next generation of models will scrape that text to train on. It’s a photocopier copying its own copies. Each generation a little blurrier, a little less connected to the original, a little more generically plausible and specifically useless.
Meanwhile, the actual knowledge — the kind that makes the difference between a company that’s good at what it does and a company that’s extraordinary at what it does — was never on the internet in the first place. It’s in the heads of people. In the habits of teams. In the undocumented workarounds that keep
production lines running. In the instincts of a sales rep who can feel when a deal is about to go sideways. In the judgment of an engineer who knows that the spec says 45 but the answer is 42 — and can tell you exactly why, if you bother to ask.
Nobody’s asking. That’s the problem. Or more precisely — nobody’s asking systematically, and nobody’s capturing the answers in a form that AI systems can use. We’ve spent the last decade building increasingly sophisticated models and increasingly powerful computing infrastructure, and we’ve neglected the one input that actually determines whether any of it produces something worth having: the knowledge.
This book is about that gap — the gap between the information we’ve digitized and the knowledge we haven’t — and why closing it is the single most important strategic challenge for any organization that intends to remain competitive in the next decade. It’s about why the Information Age is ending (not because information became worthless, but because it was commoditized to the point of indistinguishability), why proprietary knowledge is the defining competitive advantage of what comes next, and what you actually need to do about it — not in theory, but on Monday morning.
I’m going to be direct about a few things throughout this book, because I think honesty is more useful than diplomacy and I’ve never been particularly good at diplomacy anyway. Most of what passes for “AI strategy” in enterprises today is vendor-selection theater. The bottleneck for enterprise AI is almost never the technology — it’s the knowledge. The organizations that figure this out in the next two to three years will build
advantages that are, for all practical purposes, permanent. And the reason most organizations won’t figure it out is that capturing knowledge is hard, messy, political, and doesn’t fit neatly into a quarterly planning cycle.
But here’s the part that keeps me going: if we get this right — if we actually build an economy that values human knowledge, that compensates people for what they know, that uses machines to handle the drudgery and frees humans to do the thinking — then we’re a step closer to the future I’ve been watching on a screen since I was a kid. Not because the technology gets us there. Because the choices we make about how to use it do.
The future of AI depends not on who builds the biggest model, but on who feeds it the best knowledge. This book is about how to be the one doing the feeding — and why it matters for more than just the bottom line.
Let’s get into it.
Part I: The End of the Information Age
The Information Age was built on a lie. Not a malicious one — more the kind of lie that starts as a simplification, gets repeated enough times to become an assumption, and eventually hardens into a worldview that nobody questions because questioning it would mean rethinking about forty years of investment decisions.
The lie is this: more data equals more knowledge.
It doesn’t. It never did. And the fact that an entire economic era was named after the wrong layer of the stack — “information” instead of “knowledge” — tells you something about how deeply the confusion runs. We didn’t call it the Knowledge Age. We called it the Information Age. That wasn’t a branding accident. It was a philosophical tell.
Let me unpack this, because understanding why that distinction matters is the foundation for everything else in this book.
The Hierarchy Nobody Taught the CEOs
There’s a framework that’s been floating around academia since the 1980s — the DIKW hierarchy. Data, Information, Knowledge, Wisdom. It shows up in information science textbooks and gets a polite nod in the occasional management seminar, but I’ve never once seen it on a boardroom slide. Which is remarkable, given that it’s the single most useful mental model for understanding why most enterprise AI initiatives underperform.
Here’s how it works, stripped of the academic scaffolding.
Data is raw signal. Numbers, characters, sensor readings, timestamps. A temperature reading of 87.3°C means nothing by itself. It's a fact without a frame. Machines generate data constantly and in quantities that stopped being meaningfully expressible in human terms somewhere around 2015. Data is plentiful. Data is cheap. Data, by itself, is almost completely useless.
Information is data with context. That temperature reading becomes information when you know it's from Reactor 4, taken at 14:32 on a Thursday, and the normal operating range is 80–85°C. Now you have something structured, something searchable, something a human or a machine can act on. This is what databases store. This is what dashboards display. This is what Google indexed. The entire digital infrastructure of the last four decades was built to create, store, transmit, and retrieve information. And it was extraordinary at it.
Knowledge is information integrated with experience and judgment. It's knowing that Reactor 4 runs hot on Thursdays because the upstream feed rate increases after the Wednesday maintenance cycle, and that 87.3°C is technically out of range but within the acceptable drift pattern that the engineering team validated in 2021 after the thermal modelling update -- so you don't shut down the line, you monitor for two more cycles and only escalate if it hits 89. That knowledge lives in the head of the shift supervisor who's been running this reactor for twelve years. It does not live in any database. It does not show up in any dashboard. If you asked the AI system, it would recommend an immediate shutdown -- because the AI has information, not knowledge.
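If you build systems for a living, the cleanest way to see the gap is in code. What follows is a minimal sketch, with hypothetical names and thresholds throughout (an illustration, not a real control system): the information layer as a static range check, and the supervisor’s knowledge made explicit enough for a machine to use.

from dataclasses import dataclass

@dataclass
class Reading:
    reactor: str    # context is what turns raw data into information
    temp_c: float
    weekday: str

def information_layer(r: Reading) -> str:
    """The static spec: what a dashboard, a database, or a naive AI system has."""
    if 80.0 <= r.temp_c <= 85.0:
        return "OK"
    return "SHUTDOWN: outside operating range"

def knowledge_layer(r: Reading) -> str:
    """The shift supervisor's experience, captured and made machine-usable."""
    if r.temp_c >= 89.0:
        return "ESCALATE: beyond validated drift"
    if r.weekday == "Thursday" and r.temp_c > 85.0:
        # Thursday runs hot after the Wednesday maintenance cycle;
        # drift validated by the engineering team in 2021.
        return "MONITOR: within drift pattern, recheck for two cycles"
    return information_layer(r)

reading = Reading("Reactor 4", 87.3, "Thursday")
print(information_layer(reading))  # SHUTDOWN: outside operating range
print(knowledge_layer(reading))    # MONITOR: within drift pattern, recheck for two cycles

The first function exists in every organization. The second exists only if someone sat down with the supervisor and asked. That, in miniature, is the argument of this book.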
Wisdom is knowledge applied with ethical judgment and long-term perspective. It's the plant manager who knows everything the shift supervisor knows, but also understands that the push to run Reactor 4 harder is
coming from a quarterly target that doesn't account for the accelerated wear on the thermal coupling -- and who makes the call to reduce throughput even though it means missing the number, because the alternative is a $4 million repair in Q3. Wisdom involves accountability. It involves weighing outcomes against values. It involves the kind of judgment that can only come from having been wrong before and having learned what "wrong" costs.
Four layers. Each one built on the previous. Each transition harder to automate than the last.
Here’s the thing that should bother you: virtually all of the technology investment of the last forty years has been concentrated on the bottom two layers. We got spectacularly good at generating data and turning it into information. We built search engines, databases, data warehouses, data lakes, data meshes — an entire vocabulary of increasingly elaborate metaphors for “we store and retrieve information at scale.” And then we declared victory and named an entire era after it.
This, by the way, is why “data is the new oil” was always bullshit.
I’ve sat through more conference keynotes than any one person should have to endure, and that phrase — data is the new oil — was the laziest, most seductive, most fundamentally misleading analogy of the last decade. It sounded profound. It justified billions in investment. And it was wrong at the most basic level,
because it confused the raw material with the refined product.
Data isn’t oil. Data is — at best — the dinosaurs. The organic matter buried under millions of years of sediment that, under enormous pressure and the right geological conditions, might eventually become something you can put in an engine. Oil is what you get after the transformation. Oil is knowledge — contextual, refined, energy-dense, immediately usable. Nobody has ever filled a tank with decomposing biomass and expected their car to start. But that’s essentially what most enterprise AI initiatives are doing: shoveling raw data into systems and wondering why the output is mediocre.
The entire “data is the new oil” era optimized for extraction — collect more, store more, process more. It never asked the question that actually matters: what do we know, and how do we make that knowledge usable? It optimized for volume at the bottom of the hierarchy and completely ignored the top. Which is how you end up with organizations sitting on petabytes of data, terabytes of information, and approximately zero structured, AI-consumable knowledge.
But information was never the hard part. Information was always just the ingredient. The hard part — the part where competitive advantage actually lives — is the transition from information to knowledge. And that transition requires something no technology has automated yet: human experience.
A Very Short History of Getting This Wrong
I’ll keep this brief because the past is only interesting insofar as it explains the present, and the present is more urgent than most people realize.
Wave One: Digitization (1980s–2000s)
The promise was straightforward: take everything that exists on paper and make it digital. Searchable. Accessible. The assumption -- largely correct at the time -- was that enormous amounts of useful knowledge were locked in filing cabinets and that liberating it into digital form would create value. It did. Enormously. Enterprise resource planning systems, digital document management, the early internet -- all of this turned analog information into digital information and made it available at speeds that would have been inconceivable a generation earlier.
What nobody noticed -- or at least nobody said out loud -- was that the digitization process captured the information layer of organizational knowledge and almost completely missed the knowledge layer. The filing cabinet was digitized. The person who knew which documents in the filing cabinet actually mattered, and why, and under what circumstances -- that person's knowledge stayed in their head.
Wave Two: The Social Web (2005–2015)
Something genuinely remarkable happened. Humans started voluntarily publishing their knowledge in digital form -- on forums, blogs, wikis, Q&A sites, and social platforms. Stack Overflow. Wikipedia. Thousands of niche forums where experts in absurdly specific domains shared what they'd learned. For roughly a decade, the internet functioned as an accidental knowledge capture system. Not because anyone designed it that way, but because humans are social creatures who enjoy sharing what they know, especially when other humans respond with recognition and reciprocity.
This was the golden age of "search the internet, get a knowledge-quality answer." And it worked -- imperfectly, with a lot of noise, but it worked -- because the content was human-generated, experience-driven, and maintained by communities that had quality standards (even if those standards were enforced by nothing more sophisticated than downvotes and social pressure).
The mistake -- and it's a big one -- was assuming this would continue. That the internet would always be a reliable source of human-generated knowledge. That the supply was infinite.
Wave Three: AI Consumption (2015–present)
The AI industry did something that, in retrospect, was both brilliant and catastrophically shortsighted. It took the entire accumulated output of Wave One and Wave Two -- every digitized document, every forum post, every Wikipedia article, every book it could
scrape -- and used it as training data for large language models. The results were spectacular. GPT-3 could write essays. GPT-4 could reason. The models got better and better, and the AI industry congratulated itself on having solved intelligence.
What it had actually done was build extraordinarily sophisticated systems for processing information. Which -- as we've established -- is not the same thing as knowledge. But the outputs were fluent enough, and confident enough, and useful enough for a wide enough range of tasks that the distinction got blurred. If the AI can answer my question in natural language and the answer is usually right, then the AI "knows" things -- right?
Wrong. The AI has information. Lots of it, very well-organized, very quickly retrievable. But it has no experience. It has no judgment. It has no understanding of context that wasn't in the training data. And the training data -- the accumulated knowledge of the internet's golden age -- is a fixed stock, not a renewable flow. The AI scraped the library. The library is not restocking.
This is where we are. The paradigm broke not because the technology failed, but because the technology succeeded at exactly the wrong thing. We built systems that are magnificent at processing information and structurally incapable of generating knowledge. And then we ran out of knowledge to feed them.
Where the Paradigm Actually Broke
There’s a concept from information theory that I find useful here — not because I expect anyone to care about Claude Shannon’s mathematics, but because it clarifies something important about why the Information Age ended.
Shannon defined information as the reduction of uncertainty. A message has information content to the extent that it tells you something you didn’t already know. By this definition, the internet was extraordinarily rich in information during its first two decades: every web page, every forum post, every article reduced your uncertainty about something. There was always more to learn.
But Shannon’s definition says nothing about whether reduced uncertainty is useful. It says nothing about whether the information enables you to act effectively in a specific context. It says nothing about judgment, about trade-offs, about the difference between knowing a fact and understanding its implications. That gap — between Shannon-information and actionable knowledge — is where the entire edifice starts to crack.
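For the record, the textbook version (standard information theory notation, nothing specific to this book’s argument): the information content of an outcome x with probability p(x), and the entropy of a source X, measured in bits, are

\[
I(x) = -\log_2 p(x), \qquad H(X) = -\sum_{x} p(x)\,\log_2 p(x).
\]

A certain outcome (p(x) = 1) carries zero information; a rare one carries a lot. Notice what the formula measures and what it leaves out: surprise, not usefulness.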
Here’s the practical version. Google worked — brilliantly, transformatively, historically — because it solved an information retrieval problem. Given a query, find the most relevant documents. PageRank was genius because it used the structure
of human linking behavior as a quality signal: pages that many other pages linked to were probably more relevant. And for about fifteen years, that signal was reliable, because humans were the primary creators and curators of web content.
Then two things happened, roughly simultaneously. First, people started creating content not for other humans but for search algorithms. SEO turned the web from a knowledge commons into a competition for attention, and the quality signal degraded. Second — and more fatally — AI systems started generating content at scale. By 2025, the majority of new web content was synthetic. The signal that PageRank relied on — human judgment expressed through linking behavior — was drowned out by machine-generated noise.
The search engine didn’t break. The thing it was searching broke.
And this isn’t just a Google problem. It’s a civilizational problem. The entire implicit contract of the Information Age — put knowledge into digital form, make it searchable, everyone benefits — depended on humans continuing to be the primary authors of digital content. That contract has been voided. The internet is now a system where machines generate content that other machines process, with humans increasingly unable to distinguish signal from noise.
The Information Age didn’t end because we ran out of information. It ended because we drowned in it — and the information that kept flowing was increasingly disconnected from the human experience that made it
valuable in the first place.
Why This Matters for You, Specifically
I can hear the objection forming. “Interesting history lesson, but what does this have to do with my organization?” Everything.
If you’re running AI initiatives on top of public information — which is what every out-of-the-box AI deployment does — you’re building on a foundation that is actively degrading. The training data your models use is a snapshot of the internet circa 2022, maybe 2023. It’s not getting better. The web content available for future training is increasingly synthetic, increasingly circular, increasingly disconnected from the human knowledge that made early models impressive. Your AI will not get smarter by consuming more public data. It will get more fluent — and more generically useless.
The organizations that pull ahead from this point will be the ones that understood the hierarchy. Not the ones with the best models or the biggest compute budgets — the ones that realized the bottleneck was never at the data or information layer. It was always at the knowledge layer. It was always about the shift supervisor who knows Reactor 4 runs hot on Thursdays. It was always about the sales rep, the engineer, the twenty-year
veteran who can’t tell you how they know what they know but whose decisions are right 90% of the time because they’ve been wrong enough times to have calibrated their intuition against reality.
That’s knowledge. It has always been the scarce resource. We just didn’t notice because the Information Age was loud enough to drown out the signal.
Now that the noise is all that’s left, the signal becomes everything.
The ocean looks full. The fish are gone.
That’s the shortest version of this chapter, and if you’re in a hurry, you can stop here and skip to Chapter 3. But if you want to understand why the most powerful technology ever built is running into a wall that has nothing to do with compute, nothing to do with algorithms, and everything to do with the fact that we’ve eaten through the planet’s supply of useful human-generated knowledge — keep reading.
The Numbers Nobody Wants to Say Out Loud
In 2024, the research firm Epoch AI published an analysis that should have been front-page news everywhere and instead got politely discussed in a few academic circles and promptly ignored by the people making the investment decisions. Their finding: the stock of high-quality public text data — the
stuff that actually makes language models good — would be fully consumed by AI training processes somewhere between 2026 and 2032. The median estimate, with standard training approaches, was 2028. With the kind of aggressive overtraining that labs were already doing — running models five to ten times over the same data — the effective depletion date moved to 2026 or 2027.
By early 2026, we’re not projecting anymore. We’re observing.
The web corpus that powered GPT-3, GPT-4, Llama, DeepSeek, and every other foundation model you’ve heard of is effectively exhausted. Not gone — it still exists — but squeezed dry. Additional scraping yields diminishing returns. The labs know this. They’ve known it for a while. The constraint on AI is no longer bigger models or more compute. It’s better training data. And the easily accessible supply of better training data is running out.
I want to be precise about what “running out” means here, because the objection I always hear is: “But there’s more data being generated every day than ever before!” Which is true. And completely irrelevant. Because the total volume of data continues to grow exponentially while the useful, high-quality, human-generated portion of that data is a finite stock, not an infinite flow. It’s the fishing analogy again — the ocean is bigger than ever, but the fish stocks are depleted. You can cast bigger nets. You can build bigger boats. You can deploy sonar and GPS and satellite tracking. You’ll still catch less and less of what you actually need, because the thing you need isn’t water — it’s fish.
The AI industry spent a decade building bigger boats. It forgot to check whether there were still fish in the water.
When the Internet Started Eating Itself
The depletion story has a darker companion, and it’s this: the data that is being generated is increasingly worthless for training purposes — because it’s being generated by the very systems it’s supposed to train.
By April 2025, analyses showed that approximately 74% of newly created web pages contained detectable AI-generated content. A study of 65,000 URLs found that AI-generated articles first surpassed human-written articles in November 2024 and have since stabilized at roughly a 50/50 split for new publications. Europol had estimated that up to 90% of online content could be synthetically generated by 2026. That extreme scenario hasn’t fully materialized, but the trajectory is clear enough that arguing about the exact percentage misses the point.
The point is this: the open web — the thing that powered the entire AI revolution — is now predominantly authored by AI. Which means that scraping the web in 2026 to train a new model is, to a substantial degree, training AI on AI’s output. And if you don’t see why that’s a problem, let me introduce you to a concept that sounds academic but has consequences that are anything but.
Model Collapse: The Photocopier Problem
In July 2024, a peer-reviewed study in Nature — not a blog post, not a preprint, not a conference talk, but Nature — demonstrated something that the AI research community had been nervously theorizing about for two years. When you train AI models on data that was itself generated by AI models, the output degrades. Not slowly. Not gracefully. Irreversibly.
The technical term is model collapse. The intuitive version is a game of telephone — except each person in the chain also adds their own best guesses about what was said, and after ten rounds, the original message is unrecoverable. What the Nature study showed was more specific and more alarming: the tails of the original data distribution disappear. The rare, unusual, novel, edge-case content — the stuff that makes a model interesting and capable of genuine insight — gets progressively shaved off with each generation of recursive training. What’s left is the middle. The average. The generically plausible.
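You can watch this happen in a toy simulation. To be clear, this is a deliberately crude stand-in for the Nature study’s setup, not a reproduction of it: each generation “trains” by resampling the previous generation’s output, and the count of distinct items (a stand-in for rare, tail-of-the-distribution content) can only go down.

import random

# Generation 0: 1,000 distinct "human-written" items.
population = list(range(1000))

for gen in range(1, 16):
    # Each generation learns only from samples of the previous
    # generation's output (a crude stand-in for training on
    # scraped synthetic text).
    population = [random.choice(population) for _ in range(1000)]
    print(f"generation {gen:2d}: {len(set(population)):4d} distinct items left")

# The count falls every generation and never recovers: once a rare item
# fails to be sampled, no later generation can bring it back.

Run it and roughly a third of the distinct items vanish in the first generation alone; the loss compounds from there.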
If that were the extent of it, you could argue it’s a manageable problem. Train more carefully. Filter the synthetic data. Keep a clean reserve of human-generated content. And to some extent, that’s true — model collapse is not inevitable if you’re disciplined about it. Recent research shows that synthetic data can actually improve model performance when it accumulates alongside original human data rather than replacing it, and when it is curated through verification pipelines.
But then came the ICLR 2025 spotlight paper, and the comfortable narrative got considerably less comfortable.
That paper showed that even the smallest fraction of synthetic data in training sets — as little as one sample in a thousand — can trigger collapse dynamics. One in a thousand. And larger models, which you’d intuitively expect to be more robust, may actually amplify rather than mitigate the effect. The bigger the model, the more efficiently it memorizes and reinforces the patterns of its own output. Scale makes the problem worse, not better.
Think about what that means for a moment. The entire strategic bet of the AI industry — build bigger models, train on more data, scale to AGI — runs headlong into a wall where the data supply is contaminated by the industry’s own output, and the contamination threshold is absurdly low. One in a thousand. That’s not a filter you can engineer away. That’s a fundamental constraint on the methodology.
I’m not saying this to be alarmist. I’m saying it because the practical implication is stark, and it’s the load-bearing argument of this entire book: AI that feeds on AI becomes progressively less capable of generating genuinely new insights. It reproduces variations of existing information — increasingly bland, increasingly average, increasingly divorced from the messy, specific, contextual reality that makes knowledge valuable. The models get more fluent. They get less useful. And the gap between “sounds right” and “is right” widens with every generation of recursive training.
The Three Exhaustion Vectors
Let me consolidate this into something actionable, because the academic nuance matters but the strategic picture is what should be driving decisions. The supply of useful training data is under pressure from three independent vectors simultaneously.
Quantity: it's running out. The high-quality public text that made current models possible was a one-time windfall -- the accumulated output of thirty years of human activity on the internet. That stock has been consumed. New contributions to the stock are being outpaced by synthetic content generation. There is no second internet to scrape.
Quality: it's degrading. Even the data that's being added to the web is increasingly polluted. AI-generated content, SEO-optimized filler, content-farm output designed to game engagement metrics -- all of it dilutes the signal. The ratio of knowledge-quality content to noise has inverted. In 2010, you searched and mostly found signal with some noise. In 2026, you search and mostly find noise with occasional signal -- if you know how to look.
Legality: it's being restricted. This is the vector most people underestimate, and I'll spend a full section on it in Chapter 3. But the short version: the legal frameworks that allowed AI labs to scrape the internet without permission are collapsing. Court rulings, copyright office reports, and regulatory mandates are converging on a simple principle -- if you want to use someone's content to train your AI, you need to license it or own it. The era of "scrape first, worry later" is ending. And the organizations that don't have their own knowledge to fall back on will find themselves dependent on increasingly expensive, increasingly restricted, increasingly commoditized licensed data.
Three vectors. All pointing in the same direction. All accelerating.
The Library We’re Not Burning — We’re Ignoring

There’s a story we like to tell about the Library of Alexandria. The greatest repository of human knowledge in the ancient world, destroyed by fire — sometimes we blame Caesar, sometimes religious fanatics, sometimes it was just neglect and underfunding, which is probably the most historically accurate and definitely the most depressing version. The story works as a cultural touchstone because it represents an irreversible loss. Knowledge, accumulated over centuries, gone.
We tell that story as a tragedy of destruction. What we’re doing now is worse, because it’s a tragedy of indifference.
We’re not burning the library. We’re leaving it standing and filling it with AI-generated copies of copies of copies — fluent, plausible, worthless — while the actual knowledge rots in the basement. Centuries of human expertise, institutional memory, scientific observation, craft knowledge, engineering intuition, medical judgment — sitting in filing cabinets, in the heads of aging experts, in handwritten notebooks, in languages nobody’s funding the translation of, in disciplines nobody’s funding at all — and we’re not digitizing it, not structuring it, not making it available. Not because we can’t. Because it doesn’t show up in a quarterly earnings report.
And here’s the part that should genuinely make you angry, or at least keep you up at night the way it keeps me up at night.
The modern Library of Alexandria could be rebuilt. Not metaphorically — literally. We have the technology to capture, digitize, structure, and make searchable an almost incomprehensible volume of human knowledge. OCR has gotten absurdly good. Speech-to-text is nearly flawless. Knowledge graphs can represent relationships between concepts at a scale no human mind could navigate. And AI — the very technology that’s consuming the existing knowledge supply — is extraordinarily good at finding connections between pieces of knowledge that no human would have thought to connect. That’s not a small thing. That might be the most important thing.
But we’re not doing it. Instead, we’re funding the hundred-thousandth SEO-optimized website about “the next big thing in AI” — written by AI, for algorithms, consumed by no one — while a retired metallurgist in Stuttgart takes thirty years of knowledge about high-temperature alloy behavior to his grave because nobody thought to ask him what he knows. While a monastery in Tuscany sits on manuscripts that haven’t been catalogued since the 18th century. While indigenous communities hold ecological knowledge accumulated over millennia that Western science is only beginning to recognize as relevant — and that knowledge is dying with its last speakers at a rate of roughly one language every two weeks.
The reason this keeps happening is that most people — most executives, most investors, most policymakers — think about knowledge the way they think about a tool: it’s valuable when it’s useful right now, for a specific purpose, with a measurable return. Knowledge that doesn’t obviously and immediately generate revenue gets ignored. Which is a catastrophically narrow way of understanding what knowledge is and how it creates value.
Knowledge doesn’t primarily create value in the first derivative. It creates value when it becomes a variable in an equation you haven’t formulated yet.
Let me make this concrete. In 1928, Alexander Fleming noticed mold contaminating a petri dish and — instead of throwing it away — investigated why the bacteria around the mold had died. That observation, in isolation, was worth nothing. It was a curiosity. A contaminated experiment. No immediate application, no commercial value, no quarterly impact. It took
another thirteen years, two other scientists, a world war, and massive government funding before penicillin became a usable drug. The knowledge existed in 1928. The equation it fit into didn’t exist until 1941.
Now multiply that by everything we’re currently ignoring. Every piece of human knowledge that seems useless, that seems niche, that seems disconnected from anything commercially relevant — is a potential variable in an equation we can’t see yet. And AI, for all its limitations, is genuinely extraordinary at one thing: finding patterns across vast, heterogeneous bodies of knowledge that humans would never have connected. That’s what it’s for. Not generating the hundred-thousandth blog post about productivity hacks. Finding the connection between the metallurgist’s alloy data, the monastery’s 16th-century chemical observations, and a materials science problem that nobody’s solved yet — because nobody had all three pieces in the same system.
Douglas Adams had a joke about this. In The Hitchhiker’s Guide to the Galaxy, a civilization builds the most powerful computer ever created and asks it the Ultimate Question — the answer to life, the universe, and everything. The computer thinks for seven and a half million years and delivers the answer: 42. Useless — not because the answer is wrong, but because nobody understood the question well enough to make sense of it.
We’re in a version of that situation. We have machines that are extraordinarily good at finding answers. What we don’t have — what we’re actively allowing to disappear — is enough of the knowledge to formulate the right questions. Every piece
of undigitized, uncaptured, unstructured human knowledge that vanishes is a variable removed from an equation we might desperately need in twenty years. Or five. Or next Tuesday.
The modern Library of Alexandria doesn’t require a building. It requires a decision — a collective, deliberate, economically irrational-looking decision to preserve and structure human knowledge not because it pays off immediately, but because the compounding value of having it available when the equation finally needs it is incalculable. Literally incalculable. You cannot model the ROI of knowledge you don’t yet know you’ll need.
And yet — because I’m a realist as much as an idealist — I know that “incalculable future value” doesn’t get budget approved. Which is why the rest of this book focuses on the knowledge that does have measurable, near-term, commercially defensible value: proprietary enterprise knowledge. The stuff that makes your AI better than everyone else’s AI right now. That’s the business case. That’s the first derivative.
But I want you to hold the bigger picture in the back of your mind while you read it. Because the business case is the door. What’s behind it is something larger.
What This Means for the Next Five Years
Here’s where I shift registers from analyst to someone who’s genuinely worried — not about AI as a technology, but about the decisions being made right now based on assumptions that are already outdated.
Most enterprise AI strategies I’ve seen — and I’ve seen more than I’d like — are built on an implicit assumption that foundation models will continue to improve at roughly the pace they’ve improved over the last five years. The reasoning goes: GPT-3 was impressive, GPT-4 was remarkable, therefore GPT-5 or GPT-6 or whatever comes next will be transformative, and all we need to do is plug our processes into the API and wait for the magic to happen.
That assumption depends entirely on the continued availability of fresh, high-quality training data. Which — as we’ve just established — is a resource that is simultaneously depleting, degrading, and being legally restricted.
I’m not predicting that AI progress stops. The labs are smart. They’ll find workarounds — synthetic data generation with careful curation, reinforcement learning, reasoning architectures that rely less on raw training data. Some of these will work. Some won’t. The trajectory will be uneven.
But the era of “train on the internet and get reliably better every eighteen months” is over. The easy gains have been made. And the next generation of improvement — the one that actually matters for enterprise value — will come not
from better models trained on public data, but from models augmented with proprietary, curated, high-quality knowledge that no one else has.
This is the structural scarcity that the rest of this book is about. Not scarcity of compute. Not scarcity of algorithms. Scarcity of knowledge — the real, human-generated, experience-refined kind. The kind that was never on the internet. The kind that’s sitting in the heads of your employees, in the habits of your teams, in the undocumented decisions that make your operations work.
The public data ocean is fished out. The question is whether you have a private lake — and whether you’ve been stocking it.
Most organizations haven’t. Which is why the next chapter matters.
Three things are happening at the same time, and the fact that they’re happening at the same time is the entire point.
Individually, any one of them would be significant. Data depletion alone would force a strategic rethink. Regulatory pressure alone would reshape the economics of AI training. The rise of agentic AI alone would create massive new demand for structured knowledge. But these three forces aren’t happening individually. They’re converging — simultaneously, reinforcing each other, and creating a structural shift that is not cyclical, not temporary, and not something you can wait out.
I’ve spent enough years in consulting to know that “convergence” is one of those words that gets thrown around in strategy decks until it means nothing. So let me be specific about what I mean. I mean that the supply of knowledge is contracting, the legal framework for accessing what remains is tightening, and the demand for structured knowledge is exploding — all at once, all irreversibly. That’s not a trend. That’s a phase transition. And the window for positioning yourself on the right side of it is measured in years, not decades.
Force 1: The Supply Squeeze
I covered this in detail in Chapter 2, so I’ll keep the recap short.
The stock of high-quality, human-generated public data is effectively consumed. New web content is predominantly synthetic. Model collapse dynamics make recursive training on AI-generated content a degenerative process. The easily accessible supply of the raw material that powered the AI revolution is running out.
The strategic implication is Econ 101, and I almost feel embarrassed stating it this plainly: when the supply of a resource contracts while demand increases, the resource becomes strategically valuable. For most of the AI era, knowledge was treated as abundant and free — scrape the internet, train the model, ship the product. That assumption is now empirically false. Knowledge — the genuine, human-generated, experience-refined kind — is scarce. And scarcity changes everything about how markets work.
If you’ve internalized the previous chapter, you already know this. I repeat it here because it’s the supply-side foundation for the convergence argument, and the other two forces only make sense when you understand that the baseline has shifted from abundance to scarcity.
Force 2: The Regulators Woke Up
For most of the AI industry’s existence, the legal framework for training data was — to put it charitably — ambiguous. The prevailing assumption in Silicon Valley was that scraping publicly available content for AI training constituted fair use. That assumption was never tested at scale, because nobody bothered to test it. The AI labs scraped. The content creators complained. The lawyers circled. And for about five years, everyone operated in a legal gray zone that benefited the people with the most compute and the least regard for intellectual property.
That era is ending, and it’s ending faster than most people in the industry expected.
In March 2025, a U.S. federal court ruled in Thomson Reuters v. Ross Intelligence that using copyrighted content for AI training does not constitute fair use when the use is commercial and directly competitive with the original. That’s not a niche ruling. That’s a federal court saying: if you’re building a commercial product by training on someone else’s copyrighted work, and your product competes with theirs, you don’t get to do that for free. The implications cascade through every AI lab that trained on web-scraped content — which is all of them.
Two months later, the U.S. Copyright Office published a report that made the position even more explicit: “training is not inherently transformative.” That language matters, because “transformative use” was the primary legal defense the AI industry had been relying on. The Copyright Office — which
is not a court but is enormously influential in shaping how courts think — essentially said: no, running someone’s content through a training pipeline does not magically transform it into something you own. You used it. You need to license it. Or you need to stop.
Across the Atlantic, the EU AI Act — effective since August 2024 — took a different but complementary approach. Rather than litigating fair use, the EU went straight to transparency: if you’re building a general-purpose AI model, you must disclose what data you trained on. That sounds bureaucratic until you realize what it means in practice — it means every foundation model provider operating in Europe must be able to account for the provenance of their training data. Which means they need to know what’s in it. Which means “we scraped the internet and didn’t really look too closely” is no longer a viable strategy.
The UK, which initially flirted with AI-friendly copyright exemptions, is trending toward alignment with the EU position. Japan, which had one of the most permissive AI training frameworks in the world, is reconsidering. The direction is global, and it’s unidirectional: toward requiring permission, licensing, or ownership of training data.
Now — I want to be clear about something, because the reflexive reaction in the tech industry is to treat regulation as an obstacle. It’s not. Or rather — it’s not only an obstacle. It’s a massive economic catalyst.
Here’s why. When scraping is free, data has no market value. Anyone can take it. There’s no incentive to create high-quality
training data, because someone will just scrape it without paying. But when the legal framework requires licensing — when you actually have to buy the knowledge your AI consumes — suddenly knowledge has a price. And when knowledge has a price, there’s an incentive to produce it, curate it, structure it, and sell it. The regulation doesn’t kill the market. It creates the market.
This is exactly what’s happening. The licensing economy has arrived, and it’s arrived fast. News Corp signed a deal with OpenAI worth over $250 million across five years. Reddit licenses its data for somewhere between $60 and $70 million annually. Academic publishers are signing agreements left and right. Cloudflare launched AI bot blocking by default for all new domains in July 2025 and introduced a Pay-Per-Crawl marketplace that lets content owners charge for automated access.
These aren’t isolated deals. They’re the first wave of a systematic market infrastructure for knowledge trading. And they fundamentally change the economics of who wins in AI.
Under the old regime — scrape freely, train cheaply — the advantage went to whoever had the most compute. Under the new regime — license everything, own what you can — the advantage shifts to whoever has the best knowledge. Not whoever can process it fastest. Whoever owns it.
That’s a different game. And most organizations haven’t realized the rules changed.
Force 3: The Agents Are Coming
The third force in the convergence is the one that most people underestimate, partly because it sounds like science fiction and partly because the timeline from announcement to reality has been shorter than anyone predicted.
Agentic AI — AI systems that don’t just answer questions but autonomously perform tasks, make decisions, invoke tools, and interact with other systems — is moving from prototype to production faster than any enterprise technology I’ve seen in twenty years of working in this space. And I’ve seen a lot of things move fast. This is different.
Gartner predicts that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026. Up from less than 5% in 2025. That’s not a growth curve — that’s a step function. And even if Gartner’s estimate is optimistic by half, a move from 5% to 20% in a single year represents a fundamental shift in how enterprise software works.
Why does this matter for the knowledge economy thesis?
Because agents don’t just need training data. They need callable knowledge.
Let me explain the difference. A traditional AI system — a chatbot, a document summarizer, a recommendation engine — consumes knowledge at training time. You feed it data, it learns patterns, and then it applies those patterns at inference time. The knowledge is baked in. If the knowledge changes,
you retrain. It’s a batch process.
An agent operates differently. An agent encounters a situation, determines what it needs to know, goes and gets the knowledge in real time, applies it, takes an action, observes the result, and adjusts. It’s not a batch process — it’s a continuous, dynamic, real-time knowledge consumption loop. And that means the knowledge needs to be available not as training data but as something the agent can call — documented, secured, structured, API-accessible, with proper access controls and audit trails.
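To make “callable” concrete, here is a minimal sketch of the shape such a service takes. Every name in it is hypothetical; what matters is the structure: entries with provenance, per-agent permissions, an audit trail, and a query interface an agent can invoke mid-task.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KnowledgeEntry:
    topic: str
    content: str         # the captured expertise, in structured form
    source_expert: str   # provenance: who knew this
    validated: str       # when and how it was last checked against reality

@dataclass
class KnowledgeService:
    entries: dict[str, KnowledgeEntry]
    permissions: dict[str, set[str]]   # agent id -> topics it may read
    audit_log: list[str] = field(default_factory=list)

    def query(self, agent_id: str, topic: str) -> KnowledgeEntry | None:
        """The call an agent makes inside its act-and-observe loop."""
        allowed = topic in self.permissions.get(agent_id, set())
        self.audit_log.append(
            f"{datetime.now(timezone.utc).isoformat()} {agent_id} "
            f"queried '{topic}' allowed={allowed}"
        )
        return self.entries.get(topic) if allowed else None

svc = KnowledgeService(
    entries={"reactor-4-drift": KnowledgeEntry(
        topic="reactor-4-drift",
        content="Thursday readings up to 89 C are validated drift; "
                "monitor for two cycles before escalating.",
        source_expert="shift supervisor, twelve years on Reactor 4",
        validated="engineering review, 2021")},
    permissions={"ops-agent-01": {"reactor-4-drift"}},
)
print(svc.query("ops-agent-01", "reactor-4-drift").content)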
Here’s the problem: most enterprise knowledge doesn’t exist in this form. It exists in PDFs. In email threads. In people’s heads. In Confluence pages that haven’t been updated since 2021. In SharePoint folders with names like “Archive_Final_v2_REAL.” An agent can’t call a PDF. An agent can’t call someone’s memory of what happened in a client meeting three years ago. An agent can’t call tacit knowledge that nobody’s written down.
This creates an infrastructure gap that is — and I don’t use this word lightly — enormous. Every enterprise deploying agentic AI needs structured, agent-callable knowledge to make those agents actually useful. Without it, you have agents that are autonomously confident and comprehensively ignorant — which is a polite way of describing a system that will make decisions based on whatever generic information it can scrape together, without any of the institutional context that makes decisions good.
The demand for agent-callable knowledge is about to explode. And the supply — for reasons we covered in Force 1 — is structurally constrained.
Why This Convergence Doesn’t Reverse

I’ve sat through enough hype cycles to be deeply allergic to the word “paradigm shift.” Every new technology gets it. Cloud was a paradigm shift. Mobile was a paradigm shift. Blockchain was a paradigm shift. Half the time, the shift was real but overstated. The other half, it was just a new coat of paint on an old building.
This is different, and I can tell you exactly why. In a typical hype cycle, the driving force is singular: a new technology, a new platform, a new capability. It creates excitement, investment flows in, reality disappoints, the cycle corrects, and eventually the technology finds its appropriate (usually smaller) niche. The correction happens because the single driving force encounters friction — technical limitations, market resistance, regulatory pushback — and equilibrium reasserts itself.
The convergence I’m describing has three independent driving forces, and they reinforce each other. Data depletion makes proprietary knowledge scarce. Scarcity makes knowledge economically valuable. Economic value attracts regulatory frameworks that formalize ownership and licensing. Formalized ownership creates market infrastructure for knowledge trading. Market infrastructure enables monetization. Monetization incentivizes knowledge capture. Meanwhile, agentic AI creates massive new demand for structured knowledge, which accelerates the entire cycle.
There is no scenario in which all three forces reverse simultaneously. Data un-depletes? The internet un-pollutes itself with human-generated content? Regulators decide copyright doesn’t apply to AI training after all? Agentic AI turns out to be a fad? You’d need all three to happen at once, and the probability of that is — let me put it diplomatically — not something I’d bet on.
The window for action is now. Not because I’m trying to create urgency for dramatic effect — I’m constitutionally incapable of that kind of manipulation and would find it embarrassing — but because the dynamics of knowledge capture create compounding advantages. The organization that starts systematically capturing its institutional knowledge today has a two-year head start that cannot be purchased. Unlike software, which can be licensed overnight, and unlike compute, which can be rented by the hour, institutional knowledge can only be built organically, over time, through the slow, messy, deeply human process of asking experts what they know and turning their answers into something a machine can use.
Two years from now, the organizations that started today will have structured knowledge bases that make their AI systems measurably, demonstrably, competitively superior. And the organizations that waited will be trying to catch up by licensing generic knowledge from marketplaces — which is better than nothing, but it’s the equivalent of buying store-brand ingredients while your competitor is working with produce from their own garden. The meals will be different. The customers will notice.
The Fork in the Road
I want to end this chapter by zooming out, because the convergence isn’t just a business story. It’s a civilizational fork.
Path A: the supply-demand imbalance resolves through concentration. A small number of organizations — the ones that moved first, invested most, or simply happened to already sit on valuable knowledge assets — control the lion’s share of structured, AI-consumable knowledge. They license it to everyone else at prices that reflect monopoly dynamics. AI systems converge on the same knowledge bases, producing the same outputs, and the competitive advantage goes not to who knows more but to who charges more for access. The knowledge economy becomes an extraction economy. A few get very rich. Most get dependency.

Path B: the supply-demand imbalance resolves through distribution. Organizations invest in capturing their own knowledge. Marketplaces emerge with real competition, real price discovery, real quality standards. Individuals get mechanisms to monetize their expertise. Open knowledge initiatives preserve the long tail — the monastery manuscripts, the indigenous ecological wisdom, the retired metallurgist’s alloy data. AI systems become genuinely specialized and genuinely useful, because they’re fed genuinely unique knowledge. The knowledge economy becomes a knowledge ecosystem.
Path A requires nothing. It’s the default. It’s what happens when nobody makes a deliberate choice.
Path B requires effort, investment, and — hardest of all — the willingness to treat knowledge as something worth preserving even when the immediate ROI is unclear.
I wrote this book because I believe Path B is both better and achievable. But I’m not naive enough to think it’s inevitable. The technology enables both paths equally. The outcome depends entirely on what the people reading this decide to do.
So — what’s it going to be?
Roughly 80% of the world’s knowledge has never been written down.
I need you to sit with that number for a moment, because it sounds like the kind of statistic someone pulls out at a conference to sound provocative — and then everyone nods and moves on to the next slide. So let me make it uncomfortably concrete.
The metallurgist who spent thirty years perfecting heat treatment sequences for aerospace components — sequences that aren’t in any textbook because they evolved through thousands of micro-adjustments based on how specific furnaces behave in specific humidity conditions — that knowledge is in his hands, not in a document. The surgeon who can feel when something is wrong before the imaging confirms it — not mysticism, but pattern recognition accumulated over ten thousand procedures — that knowledge is in her nervous system, not in a medical database. The farmer in the Mekong Delta who knows which rice varieties to plant based on how the river smelled in March — knowledge passed down through five generations, validated
against reality every single harvest — that knowledge is in a language that has 400 speakers left and no written grammar.
All of that is in the 80%.
The number comes from the long-standing observation in knowledge management research that explicit knowledge — the kind you can write down, codify, store in a system — represents roughly 20% of what an organization or a society actually knows. The other 80% is tacit: embodied in skills, embedded in routines, encoded in relationships, carried in memories, and transmitted — when it’s transmitted at all — through apprenticeship, demonstration, conversation, and the kind of slow, messy, human-to-human interaction that doesn’t scale and doesn’t digitize easily.
This chapter is about that 80%. Why it matters more than the 20% we’ve already digitized. Why we’ve been ignoring it. And why, for the first time in history, there’s a reason — an economic, structural, unavoidable reason — to stop ignoring it.
It’s Not Just a Corporate Problem

The usual framing of the tacit knowledge problem is organizational. Company X has experts approaching retirement, their knowledge isn’t documented, the company faces a “brain drain.” It’s a real problem — I’ve seen it destroy operational capability in manufacturing, in consulting, in engineering firms — but framing it purely as a corporate issue misses the scale of what we’re actually dealing with.
This is a civilizational problem.
Across every domain of human activity — medicine, agriculture, craftsmanship, law, engineering, ecology, navigation, cooking, construction, conflict resolution — there exist vast bodies of knowledge that were developed over centuries, refined through practice, and never committed to any medium that a digital system can access. Not because people were lazy. Because the knowledge was the kind that resists codification. It’s procedural — you learn it by doing, not by reading. It’s contextual — it only makes sense within a specific environment, with specific tools, under specific conditions. It’s relational — it’s transmitted from master to apprentice, from elder to community, from parent to child, in interactions that are as much about trust and observation as they are about content.
The Enlightenment gave us this idea — a useful but incomplete idea — that knowledge worth having is knowledge that can be written down, systematized, and made universally accessible. And that’s true for a certain kind of knowledge: scientific laws, mathematical proofs, historical records, technical specifications. The explicit 20%. The kind that fits in books and databases. The kind the internet was built to distribute.
But the Enlightenment framework has a blind spot the size of a continent, and the blind spot is everything that can’t be easily written down. The 80% that Michael Polanyi called “tacit” in 1966, when he wrote the single most important sentence in the history of knowledge management: “We can know more than we can tell.”
That sentence should be tattooed on the forehead of every Chief Data Officer in the world. We can know more than we can tell. The gap between what humans know and what humans have explicitly articulated is not a documentation problem. It’s a structural feature of how knowledge works. And until very recently, that gap was something we simply lived with — an accepted limitation, like gravity or quarterly reporting.
The Inventory of What We’re Losing

Let me give you a partial inventory of the 80%, organized not by industry but by the type of knowledge that’s disappearing. Because the types matter — they tell you what’s hard about capturing it and what’s needed to make it usable.
Embodied knowledge — the kind that lives in muscle memory, in physical intuition, in the hands of someone who’s done something ten thousand times. A master welder who can hear the difference between a good arc and a bad one. A baker who adjusts hydration by how the dough feels, not by what the recipe says. A violinist who knows where to place a finger to the fraction of a millimeter — without conscious calculation, without measurement, through years of practice that rewired the neural pathways. This knowledge cannot be captured by writing it down. The person who holds it often can’t articulate it. And yet it’s real, it’s valuable, and it produces outcomes that no amount of explicit instruction replicates.

Contextual knowledge — the kind that only makes sense within a specific environment. The torque example from Chapter 5 lives here: 42 Nm on Station 7, not because the physics are different, but because the heat exchanger proximity creates a condition that the general specification doesn’t account for. Every organization has hundreds, thousands of these — local adaptations, site-specific workarounds, customer-specific protocols that exist because someone, at some point, discovered through failure that the standard approach doesn’t work here. This knowledge is almost never documented, because from the perspective of the organization’s formal knowledge systems, it’s an exception. An edge case. A deviation from the standard. But in practice, the exceptions are the knowledge. The standard is just information.

Relational knowledge — the kind that exists in the connections between people, not in any individual’s head. How a sales team coordinates when a deal is going sideways. Which department head to call when you need an emergency approval and the official process will take three weeks. The unwritten rules of how decisions actually get made — not the org chart, but the shadow org chart that everyone knows and nobody draws. This knowledge is inherently distributed and inherently social. It can’t be extracted from one person because no one person holds it. It emerges from the network.

Cultural knowledge — and here I’m stepping outside the enterprise for a moment, because this matters. Indigenous ecological knowledge. Traditional medicine. Agricultural practices evolved over millennia in specific bioregions. Craft techniques preserved within specific communities. This knowledge is under acute threat — not from digitization failure but from cultural disruption, language death, and the economic marginalization of the communities that hold it. A language dies roughly every two weeks. With each one, an entire framework for understanding the world — tested and refined over generations — disappears.

Experiential knowledge — the synthesized judgment that comes from years of doing something, failing at it, adjusting, and developing an intuition that works faster and more accurately than any analytical process. The ER doctor who walks into a room and knows the patient is septic before seeing the lab results. The trader who exits a position because something “feels wrong” — and six hours later, the market confirms it. The parent who can tell from the specific quality of their child’s silence that something happened at school. This knowledge is real, reliable, and fundamentally non-transferable through any medium other than shared experience.
That’s a partial list. The full inventory would fill a book much longer than this one. But the pattern should be clear: the 80% isn’t marginal, supplementary, or nice-to-have. It’s the knowledge that makes things actually work — in organizations, in communities, in civilizations. The 20% we’ve digitized is the scaffolding. The 80% is the building.
Why Nobody Captured It (Until Now)
The obvious question is: if this knowledge is so valuable, why hasn’t anyone systematically captured it?
The answer has three layers, and the top two are the ones everyone talks about while the third is the one that actually matters.
The first layer is technical. Tacit knowledge is hard to capture because, by definition, the people who hold it can’t easily articulate it. You can’t just interview an expert and write down what they say — because what they say is often a simplified, incomplete, and sometimes misleading version of what they actually do. The gap between articulated knowledge and practiced knowledge is enormous, and bridging it requires techniques — structured observation, process elicitation, scenario-based
questioning, cognitive task analysis — that are time-intensive, skill-intensive, and don’t scale.
The second layer is organizational. Knowledge capture competes for resources with everything else — product development, customer acquisition, quarterly targets. And knowledge capture has a nasty property from a management perspective: the costs are immediate, the benefits are diffuse and deferred. You spend $500,000 this year to capture the knowledge of your retiring chief engineer. The payoff shows up in avoided mistakes, better decisions, and faster onboarding — but it shows up over years, across departments, and it’s nearly impossible to attribute causally. No one ever gets promoted for a failure that didn’t happen because someone captured knowledge that prevented it. That’s the incentive problem.
The third layer — the one that actually matters — is economic. And this is where the story changes.
Until now, there was no buyer for captured knowledge.
I mean that quite literally. You could invest enormous effort in extracting tacit knowledge from your best people, structuring it, making it searchable — and then what? It sat in a system. A human might find it if they searched for it. A human might not. The consumption mechanism was passive: make it available, hope someone uses it. There was no active consumer that could reliably, systematically, at scale turn captured knowledge into measurable value.
AI is that buyer. And agents are the buyer’s most voracious form.
An AI agent encountering a decision point doesn’t hope to find relevant knowledge. It demands it. It identifies what it needs, queries the available knowledge bases, retrieves contextual information, applies it to the decision, acts, and moves on — dozens or hundreds of times per hour. Every agent interaction is a knowledge transaction. Every knowledge transaction creates measurable value: a better recommendation, a faster resolution, an avoided error, a more accurate prediction.
For the first time in history, knowledge capture has an economic buyer that is insatiable, measurable, and always on.
That changes everything about the incentive structure. The retired metallurgist’s alloy knowledge isn’t just “worth preserving” in some abstract cultural sense — it’s monetizable. An agent serving materials science inquiries will pay for that knowledge (or rather, the company deploying the agent will pay for it) because it produces better outputs than the generic information available in public training data. The Tuscan monastery sitting on uncatalogued 16th-century manuscripts has, for the first time, a realistic economic path from “priceless cultural artifact” to “revenue-generating knowledge asset” — because an AI system cross-referencing historical chemistry observations with modern materials science might find connections that no human researcher had the time or the mandate to look for.
The monastery doesn’t need to understand agentic AI. It needs to understand that someone will pay for what it has — and that the “someone” is a system that never sleeps, never gets bored,
and processes knowledge at a rate no human institution can match.
The Incentive Cascade
Let me map this out, because the dynamics are self-reinforcing in a way that matters for strategic planning.
Agentic AI creates demand for structured, callable knowledge. Demand creates economic value for knowledge. Economic value creates incentives for knowledge holders — individuals, organizations, communities, institutions — to capture and structure their knowledge. Captured knowledge feeds AI systems that become measurably better. Better AI systems attract more users. More users create more demand. The cycle accelerates.
This is not theoretical. The early signals are already visible.
Consulting firms are investing in “knowledge asset” programs — not the old-school knowledge management initiatives that produced document repositories, but structured efforts to capture expert judgment and decision frameworks in forms that AI systems can consume. Manufacturing companies are running “knowledge extraction sprints” where retiring engineers spend their last months working with knowledge engineers (a role that barely existed two years ago) to formalize decades of operational intuition. Academic institutions are exploring knowledge licensing — not just publishing papers but
structuring their research outputs as AI-consumable knowledge products.
And the most interesting signal: individuals are starting to realize that what they know has economic value. Not their labor — their knowledge. The consultant who’s spent twenty years in pharmaceutical regulatory affairs, the engineer who designed three generations of turbine blades, the farmer who’s been breeding drought-resistant varieties for forty years — these people sit on knowledge that is, in the new economy, genuinely valuable. And for the first time, there’s a mechanism to turn that value into income without requiring them to become full-time employees of an AI company.
This is the part that connects back to the Prologue. If we get this right — if the economic incentives for knowledge capture are distributed broadly enough — then the Knowledge Economy isn’t just a corporate strategy play. It’s a mechanism for valuing human experience. For paying people not for their time but for what they know. For creating an economy where the question isn’t “how many hours did you work” but “what knowledge did you contribute.”
That sounds utopian. Maybe it is. But the infrastructure for it is being built right now, by people who are thinking about quarterly earnings — not Star Trek. The beautiful thing about economic incentives is that they don’t require idealism to function. They just require the math to work.
And the math is starting to work.
What the 80% Actually Looks Like When You Capture It
I want to close with something practical, because this chapter has been more philosophical than some readers might prefer, and I owe you a bridge to Part II of the book.
When tacit knowledge gets captured well — when someone actually does the hard work of extracting it from experts, structuring it, validating it, and making it AI-consumable — it doesn’t look like a document. It doesn’t look like a FAQ. It doesn’t look like a wiki page.
It looks like a judgment call with context.
“When the customer asks for a delivery acceleration on a standing order, check whether they’ve had quality complaints in the last 90 days. If yes, the request is probably about a replacement for defective units, not genuine demand increase — process as priority but flag for quality review. If no, check seasonal patterns — Q4 requests from retail customers are almost always real. Q1 requests from industrial customers are almost always planning noise. Route accordingly.”
That’s knowledge. It’s contextual, experiential, conditional, and it produces better outcomes than any generic instruction. It took a supply chain manager fifteen years to accumulate. It takes a knowledge engineer about two hours to extract and structure. And once it’s in a system, it serves every agent, every
decision, every interaction — forever.
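To make that concrete in code: here is a minimal sketch of how that routing rule might be encoded so an agent can execute it. Everything in it (the class names, the field names, the priority labels) is a hypothetical illustration of the captured judgment above, not anyone’s production API.

```python
# A sketch of the delivery-acceleration rule as agent-callable logic.
# All names and fields here are invented for illustration.

from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Complaint:
    filed_on: date

@dataclass
class Customer:
    segment: str                          # e.g. "retail" or "industrial"
    quality_complaints: list = field(default_factory=list)

def route_acceleration_request(customer: Customer, request_date: date) -> dict:
    """Route a delivery-acceleration request on a standing order."""
    window_start = request_date - timedelta(days=90)

    # Recent quality complaints: probably replacement demand, not real growth.
    if any(c.filed_on >= window_start for c in customer.quality_complaints):
        return {"priority": "high", "flag": "quality_review",
                "rationale": "likely replacement for defective units"}

    # Seasonal pattern: Q4 retail is almost always real demand,
    # Q1 industrial is almost always planning noise.
    quarter = (request_date.month - 1) // 3 + 1
    if customer.segment == "retail" and quarter == 4:
        return {"priority": "high", "flag": None,
                "rationale": "Q4 retail requests are almost always real"}
    if customer.segment == "industrial" and quarter == 1:
        return {"priority": "low", "flag": "verify_demand",
                "rationale": "Q1 industrial requests are usually planning noise"}

    return {"priority": "normal", "flag": None, "rationale": "no special pattern"}
```

Fifteen years of judgment, thirty lines of conditional logic. The hard part was never the code.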
Multiply that by every expert in your organization. Then multiply it by every expert in the world.
That’s the 80%. That’s what we’ve been leaving on the table.
Part II: The Hidden Treasure
Most people who run companies think they have a knowledge problem. They don’t. They have an information problem dressed up in a knowledge costume — and nobody in the room is willing to point out that the emperor’s Confluence wiki isn’t wearing any clothes.
I made the conceptual case for why knowledge and information are different layers of the same hierarchy in Chapter 1. I showed you what’s happening to the information supply in Chapter 2. I walked through the forces converging to make knowledge the scarce resource in Chapter 3, and mapped the 80% of undigitized human knowledge in Chapter 4. If you’ve read those chapters, you understand the theory.
This chapter is about making the theory operational. Because understanding the distinction in the abstract is worth nothing if you can’t look at your own organization — or your own life, for that matter — and tell the difference between what you have
and what you know.
So let me be precise. And then let me give you the tools to be precise yourself.
The Distinction, Made Painfully Concrete
Information is a fact with context. Knowledge is a fact with judgment.
Here’s the example I keep coming back to because it makes the point with surgical clarity. A maintenance technician at a manufacturing plant asks the AI system: What’s the recommended torque for the mounting bolts on Station 7?
An information system — which is what most enterprise “knowledge” platforms actually are — returns: 45 Nm, per manufacturer specification, document reference MNT-2019-047.
Correct. Usable. Completely generic. Any technician at any plant running the same equipment gets the same answer, because it’s the same information.
A knowledge system returns something different: Manufacturer spec is 45 Nm. However, on Station 7 specifically, use 42 Nm — the housing assembly has a documented thermal expansion issue under sustained load that causes micro-warping at standard torque. This was identified after three seal failures in Q2 2019 (incident reports IR-2019-114 through IR-2019-116). The 42 Nm specification was validated by the engineering team in July 2019 and has eliminated the failure mode. Note: this adjustment applies only to Station 7 due to its proximity to the heat exchanger on Line 3.
Same question. Radically different answer. The first gives you what the manual says. The second gives you what the organization knows — which includes the manual, plus three failures, plus a root cause analysis, plus institutional memory about why this specific station behaves differently from every other station running the same equipment.
Information tells you what. Knowledge tells you what, plus why, plus when it doesn’t apply, plus what went wrong the last time someone ignored the nuance.
That’s not a marginal difference. In the torque example, it’s the difference between a routine maintenance task and a $200,000 production line shutdown. And the reason I keep coming back to this example is that every single organization has a version of it. Every one. The specifics change — it’s a torque value in manufacturing, a client relationship dynamic in consulting, a regulatory nuance in pharma, a soil condition in agriculture
— but the pattern is universal: the official answer and the real answer are different, and the gap between them is where competitive advantage lives.
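If you want the distinction in code rather than prose, here is a toy contrast between the two systems. The stores and keys are invented for illustration, and real systems would sit on databases rather than dictionaries; the shape of the answer is the point.

```python
# A toy contrast: generic information versus scoped, evidenced knowledge.

INFORMATION_STORE = {
    ("mounting_bolts", "torque"): "45 Nm per manufacturer spec (MNT-2019-047)",
}

KNOWLEDGE_STORE = {
    ("station_7", "mounting_bolts", "torque"): {
        "spec": "45 Nm",
        "override": "42 Nm",
        "why": ("documented thermal expansion near the Line 3 heat exchanger "
                "causes micro-warping at standard torque"),
        "evidence": ["IR-2019-114", "IR-2019-115", "IR-2019-116"],
        "scope": "Station 7 only",
    },
}

def answer(station: str, component: str, attribute: str) -> str:
    # A knowledge system checks for a scoped, validated override first...
    k = KNOWLEDGE_STORE.get((station, component, attribute))
    if k:
        return (f"Spec is {k['spec']}, but use {k['override']} here: {k['why']} "
                f"(scope: {k['scope']}; evidence: {', '.join(k['evidence'])})")
    # ...and only then falls back to generic information.
    return INFORMATION_STORE.get((component, attribute), "no data")

print(answer("station_7", "mounting_bolts", "torque"))  # the real answer
print(answer("station_4", "mounting_bolts", "torque"))  # the manual's answer
```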
What AI Gets Right and Where It Hits a Wall
I want to be fair to the technology, because this isn’t an anti-AI argument — it’s a precision argument about where AI currently delivers value and where it doesn’t. And I say this as someone who builds these systems for a living.
Large language models are extraordinarily good at processing information. They can ingest a 200-page contract, extract obligations, cross-reference clauses, and flag inconsistencies in under a minute. They can summarize a quarter’s worth of customer support tickets and identify emerging patterns. They can translate between languages, reformat data, generate code, and produce first drafts of virtually anything text-based. All of that is information processing — and they’re better at it than any human who has ever lived.
But ask a public AI a question that requires knowledge — meaning contextual, experiential, judgment-laden understanding of a specific domain — and something interesting happens. The answer sounds great. It’s fluent, well-structured, confident. And it’s generically correct in the way that a textbook answer is correct… which is to say, correct in the abstract and useless in the specific.
Ask Claude or GPT about how to handle a difficult regulatory audit in the pharmaceutical industry and you’ll get a competent overview of regulatory frameworks, best practices, and general guidance. Ask someone who’s actually navigated three FDA warning letters and survived a consent decree, and you’ll get something entirely different — the unwritten rules, the things that technically aren’t required but that the reviewer will absolutely flag, the specific documentation formats that signal competence to the inspectors, the timing strategies that buy you breathing room, the political dynamics between the regional office and headquarters that affect how findings are escalated. None of that is in any document. All of it is knowledge.
The gap between these two answers — the information answer and the knowledge answer — is the competitive advantage gap. And here’s the thing that should terrify every executive relying on out-of-the-box AI: that gap is widening, not closing. Because as public models converge on the same depleting training data, their outputs converge toward commodity-level information. Your ChatGPT gives the same answer as your competitor’s ChatGPT. The differentiator was never the model. The differentiator is what you feed it.
The current generation of models cannot generate knowledge. They can process it, restructure it, surface it, even recombine existing knowledge in occasionally surprising ways. But they cannot create new knowledge from experience, because they don’t have experience. They have training data. Knowledge — real knowledge — comes from doing things, failing at things, adjusting, trying again, and building an intuition for how a specific domain actually behaves versus how the textbook says it should behave.
That’s not a limitation that will be solved by bigger models or more compute. It’s a structural characteristic of what knowledge is.
The Audit: Five Questions You Need to Answer
So here’s where this gets operational. I’ve spent four chapters explaining why this matters. Now I want to give you a framework for figuring out where you stand — because most organizations have never done an honest inventory of their knowledge versus their information.
I call this the Knowledge Audit, and it’s five questions. They’re simple to state and uncomfortable to answer honestly. Which, in my experience, means they’re the right questions.
Question 1: If your top ten domain experts left tomorrow, what would your AI systems lose? This is the test that separates knowledge-rich organizations from information-rich ones. If the answer is “nothing — everything they know is in our systems,” then you either have an extraordinarily good knowledge engineering program or — far more likely — you’re confusing their documents with their knowledge. The documents are the tip. The judgment underneath is the iceberg.
List the ten people in your organization whose departure would hurt most. Not the most senior — the most knowledgeable. The
ones everyone calls when something unusual happens. The ones who get consulted before major decisions, even when they’re not in the meeting officially. Those are your knowledge assets. Now ask: how much of what makes them valuable is in a system your AI can access?
If the honest answer is less than 20% — which, in my experience, it almost always is — then you have an information system, not a knowledge system.
Question 2: Where does your AI give different answers than your best people? This is the gap analysis. Pick ten real questions that come up in your operations — the kind of questions your team actually deals with. Run them through your AI system. Then run them through your best expert. Compare the answers.
Where they converge, you have information-level coverage. Fine. Necessary. Not differentiating.
Where they diverge — where the expert gives an answer the AI doesn’t, or adds context the AI misses, or warns about something the AI doesn’t know to warn about — that’s your knowledge gap. That’s the uncaptured competitive advantage. And that’s where your investment should go.
Question 3: What do your people know that your competitors’ people also know — and what do they know that nobody else knows? This is the differentiation test. Shared knowledge — industry standards, regulatory requirements, common best practices — is necessary but not differentiating. It’s table stakes. Your competitors have it too, and their AI can access the same public information yours can.
Proprietary knowledge — the things your organization has learned through its own specific experience, its own failures, its own innovations — is the competitive moat. The torque value. The client relationship insight. The production optimization that was discovered accidentally during a night shift in 2019 and never written down because nobody thought it was important enough.
Most organizations, when they do this analysis honestly, discover that their AI is well-fed on shared knowledge and starving for proprietary knowledge. Which means their AI is performing at industry average. By design.
Question 4: How much of your operational knowledge is in people’s heads versus in systems? Not documents. Not emails. Not meeting recordings. Knowledge — the kind that includes judgment, context, and conditional reasoning.
The honest answer, for most organizations, is that somewhere between 70% and 90% of operational knowledge is tacit — in people’s heads, in team dynamics, in unwritten protocols. The 10% to 30% that is in systems is almost entirely information, not knowledge.
That tacit-to-explicit ratio is your vulnerability metric. Every percentage point of knowledge that exists only in human heads is a percentage point that your AI can’t access, your agents can’t use, your new hires can’t benefit from, and your competitors can’t replicate — but also a percentage point that walks out the door every time someone retires, changes jobs, or takes a sick day.
Question 5: If you could make one person’s knowledge available to every AI system and every employee in your organization simultaneously — whose knowledge would it be, and what would change? This is the value question. It forces you to think about knowledge in terms of impact rather than volume. The answer is almost never the CEO. It’s usually the person three levels down who everyone informally consults — the “go-to” person whose phone never stops ringing because they’re the only one who really understands how something works.
Now imagine that person’s knowledge — not their personality, not their judgment in the philosophical sense, but their operational knowledge: the patterns they recognize, the rules of thumb they apply, the warnings they give, the exceptions they know about — imagine all of that, structured and available, feeding every AI system in your organization. Every customer interaction, every maintenance decision, every sales call, every product development review.
That’s the difference between an information system and a knowledge system. That’s the difference between an AI that sounds helpful and an AI that actually is.
The Knowledge Advantage Is Asymmetric
Here’s why this matters strategically, and not just operationally.
Information is symmetric. By definition, once it’s published, indexed, and searchable, it’s available to everyone. Your competitor can access the same documentation, the same research papers, the same industry reports, the same training data as you. There is no sustained competitive advantage in information, because information equalizes.
Knowledge is asymmetric. Your organization’s specific experiences, your specific failures, your specific optimizations — those are yours. They can’t be scraped. They can’t be licensed from a marketplace (though some components of them can, which is a different discussion). They can only be built, slowly, through the messy process of doing things and learning from them.
This asymmetry is the entire foundation of competitive advantage in the Knowledge Economy. And it operates differently from every previous source of competitive advantage in one crucial way: it compounds.
An organization that captures its knowledge and feeds it to AI systems today doesn’t just have a static advantage. It has a learning advantage. The AI system that has access to proprietary knowledge makes better decisions. Better decisions generate
better outcomes. Better outcomes generate new knowledge. New knowledge gets captured and fed back into the system. The flywheel accelerates.
The organization that starts this cycle in 2026 will be — by 2028 — operating at a knowledge level that a competitor starting in 2028 cannot reach until 2030 at the earliest. Because you can’t buy the two years of accumulated, validated, compounding institutional knowledge. You can only build it.
This is why I keep saying the window is measured in years, not decades. Not because the technology has a deadline — it doesn’t — but because the advantage compounds. The early mover doesn’t just get a head start. They get an accelerating head start. The gap widens with time, not narrows.
From Theory to Action
I’ve spent Part I of this book explaining why the world is shifting from information to knowledge as the primary source of value. If I’ve done my job, you’re not just nodding — you’re running the five audit questions in your head and not loving the answers.
Part II is about what you do with that uncomfortable realization. The next chapter gets into the specific architecture of how knowledge capture actually works — not the theory, but the engineering. How you get from “our best engineer knows things our AI doesn’t” to “our AI knows everything our best engineer knows and applies it a thousand times per day.”
It’s not magic. It’s not even particularly complicated in concept. It’s just hard in practice — because it requires talking to humans, which has always been the part that technology companies find most inconvenient.
But then again — if it were easy, everyone would already be doing it, and there’d be no advantage to capture.
The fact that it’s hard is the moat.
Part III: Mining the Gold

The Method — From Human Heads to Machine Minds
Everything I’ve argued so far — the depletion, the convergence, the 80% problem, the asymmetric advantage — comes down to one practical question: How do you actually get knowledge out of people’s heads and into a form that AI systems can use?
This is where most books on similar topics get vague. They’ll give you a framework, a few diagrams, some aspirational language about “knowledge transformation,” and then leave you alone with the actual problem. I find that insulting. If I’m going to spend four chapters telling you that knowledge capture is the most important strategic initiative of the next decade, I owe you enough detail to actually do it.
So this chapter is the method. Not the only method — there are variations, refinements, and organization-specific adaptations that matter — but the foundational approach that I’ve seen work in practice, across industries, at scale. It has three phases:
Discover, Capture, Structure. Each one is harder than it sounds, each one has failure modes that will derail you if you don’t know they’re coming, and each one is absolutely essential.
I’ll be direct about something: this chapter is going to be longer and more detailed than the ones before it. If you’re the kind of reader who wants the big picture and trusts your team to handle the execution, read the first section of each phase and skip the rest. If you’re the kind of reader who needs to understand the mechanics before you’ll commit resources — and in my experience, the best executives are this kind — read all of it.
Phase 1: Discover
Before you capture anything, you need to know what’s worth capturing. This sounds obvious and it isn’t. Most organizations that attempt knowledge capture start with the wrong question: “What knowledge do we have?” That’s too broad. It leads to boil-the-ocean initiatives that try to document everything and end up documenting nothing useful.
The right question is: “Where is the gap between what our AI systems produce and what our best people know — and where does that gap cost us the most?”
That’s a different starting point entirely. It’s targeted. It’s economic. And it gives you a prioritization framework from
day one.
The Knowledge Gap Map: The first practical step is building what I call a Knowledge Gap Map. It’s exactly what it sounds like — a structured inventory of where your organization’s AI outputs diverge from expert judgment, ranked by economic impact.
Here’s how you build one: Take your ten most consequential decision types — the decisions your organization makes repeatedly that have significant financial, operational, or strategic impact. In manufacturing, that might be: maintenance scheduling, quality assessment, supplier selection, production sequencing, equipment calibration. In consulting, it might be: staffing decisions, proposal scoping, risk assessment, client relationship management, knowledge domain matching. In healthcare, it might be: treatment protocol selection, diagnostic triage, resource allocation, patient risk stratification, care pathway optimization.
For each decision type, do three things.
First, document how your current AI or information systems support that decision. What data do they access? What output do they produce? What’s the quality and specificity of that output? Be honest — if your “AI-assisted” process is really just a search over a document repository with a language model summarizing the results, write that down.
Second, interview the people who actually make these decisions — the experts, the veterans, the people whose judgment you trust when it matters. Ask them: when you make this decision, what do you know that the system doesn’t? What factors do you consider that aren’t in the data? What rules of thumb do you apply? What have you learned from past failures? What would you tell a new hire that they’d never learn from the documentation alone?
Third, quantify the gap. When the expert’s answer diverges from the system’s answer, what’s the cost? Not theoretical cost — actual cost. The maintenance decision that prevents a $200,000 shutdown. The staffing decision that saves a project from failure. The diagnostic triage that catches a condition six hours earlier. Put numbers on it. Approximate numbers are fine. The point isn’t precision — it’s prioritization.
What you’ll end up with is a map that looks something like this: Decision Type A has a large knowledge gap with high economic impact — that’s your highest-priority capture target. Decision Type B has a small gap with low impact — that’s a nice-to-have. Decision Type C has a large gap but low individual impact, except it happens ten thousand times a month — so the aggregate impact is enormous. That’s your second-highest priority.
This map is your strategic document. It tells you where to start, how to prioritize, and — critically — how to build a business case that a CFO will fund. Because “we need to capture institutional knowledge” gets a polite nod and no budget. “We identified $4.2 million in annual avoidable losses driven by
a gap between our AI systems and our expert judgment in maintenance scheduling, and here’s a plan to close that gap” gets a meeting with the board.
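The arithmetic behind that ranking is simple enough to sketch. The decision types and numbers below are invented (chosen to echo the examples above); what matters is that gap severity times cost times frequency, not gut feel, drives the priority order.

```python
# A sketch of the prioritization arithmetic behind the Knowledge Gap Map.

decision_types = [
    # (name, gap severity 0-1, cost per divergent decision in $, decisions/year)
    ("Maintenance scheduling", 0.7, 20_000,     300),
    ("Supplier selection",     0.3,  5_000,      50),
    ("Quality assessment",     0.6,     50, 120_000),  # small cost, huge volume
]

def annual_gap_cost(gap: float, cost: float, freq: int) -> float:
    return gap * cost * freq

for name, gap, cost, freq in sorted(
        decision_types, key=lambda d: annual_gap_cost(*d[1:]), reverse=True):
    print(f"{name}: ~${annual_gap_cost(gap, cost, freq):,.0f}/year at risk")
```

The output is exactly the kind of ranked, dollar-denominated list a CFO will actually read.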
Identifying the Knowledge Holders: Once you have your gap map, you need to find the right people. This is less straightforward than it seems, because the people who hold the most valuable knowledge are not always the people with the most impressive titles.
I’ve found three reliable heuristics for identifying knowledge holders in an organization.
The phone test: When something unusual happens — an unexpected failure, a weird edge case, a client situation nobody’s seen before — who do people call? Not who should they call according to the org chart. Who do they actually call? Follow the informal information flow. The person who gets the most “hey, quick question” messages is almost always sitting on a goldmine of tacit knowledge. (A back-of-envelope version of this test in code appears after the three tests below.)

The exception test: Who gets consulted when the standard process doesn’t apply? Every organization has standard procedures, and those procedures handle 80% of situations adequately. The interesting knowledge lives in the other 20% — the exceptions, the edge cases, the “we tried the normal approach and it didn’t work, so we called Sarah.” Sarah is your knowledge holder.
The retirement test: If this person announced their retirement tomorrow, who would panic? Which processes would degrade? Which decisions would get worse? The magnitude of the organizational anxiety triggered by a departure announcement is a remarkably accurate proxy for the knowledge value that person holds.
Run these three tests. You’ll get a list. That list will overlap only partially with your org chart’s hierarchy. Some of your most senior people will not be on it — they hold positional authority, not knowledge authority. Some of your most junior people will surprise you — they’ve been absorbing knowledge from mentors and from hands-on experience, and they hold more than anyone realizes.
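Here is the promised back-of-envelope version of the phone test: count who receives the most informal help requests in a chat or email export. The field names and trigger phrases are hypothetical; adapt them to whatever log you actually have.

```python
# A crude phone test: rank people by how often colleagues bring them
# unusual problems. Markers and message fields are invented examples.

from collections import Counter

HELP_MARKERS = ("quick question", "have you ever seen", "weird",
                "edge case", "not in the docs")

def phone_test(messages: list[dict]) -> list[tuple[str, int]]:
    """Rank recipients by volume of informal help requests."""
    hits = Counter()
    for m in messages:
        if any(marker in m["text"].lower() for marker in HELP_MARKERS):
            hits[m["recipient"]] += 1
    return hits.most_common(10)  # candidate knowledge holders

sample = [
    {"recipient": "sarah", "text": "Hey, quick question about the Line 3 sensor"},
    {"recipient": "sarah", "text": "Never seen this edge case before, ideas?"},
    {"recipient": "tom",   "text": "Lunch?"},
]
print(phone_test(sample))  # [('sarah', 2)]
```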
This is also the moment where you’ll encounter the first political challenge: some of the people on your list will not want to participate. They’ll sense — correctly — that externalizing their knowledge changes their position in the organization. I’ll address how to handle this in the Capture phase. For now, just build the list.
The Knowledge Taxonomy: The last piece of the Discovery phase is building a taxonomy — a structured classification of the types of knowledge you’re going to capture. This matters because different types of knowledge require different capture
techniques, and if you try to capture everything the same way, you’ll get mediocre results across the board.
I use five categories, which map roughly to the types I described in Chapter 4.
Procedural knowledge — how to do something. The step-by-step process that an expert follows, including the decision points, the conditional branches, and the “if this happens, do that instead” variations that never make it into standard operating procedures. This is the most capturable type of tacit knowledge, because it has a natural structure: inputs, steps, decisions, outputs.

Diagnostic knowledge — how to figure out what’s wrong. Pattern recognition, symptom interpretation, root cause analysis. This is harder to capture because it’s often intuitive — the expert “just knows” what’s wrong without being able to fully articulate the reasoning chain. The capture technique here is scenario-based: present the expert with cases (real or constructed) and ask them to think aloud through their diagnostic process.

Predictive knowledge — what’s going to happen. Risk assessment, trend recognition, early warning signals. This is the knowledge that shows up as “gut feeling” — the veteran who says “this project is going to slip” three months before it does. Capturing it requires understanding the signals the expert is reading: what specific observations trigger their prediction, even if they’re not consciously aware of all of them.

Relational knowledge — who, what, and how the network functions. Who to contact for what. Which vendor will actually deliver versus which one will overpromise. How decisions really get made. This is the shadow organizational knowledge that makes people effective, and it’s often the hardest to capture because it involves judgments about people — which feels uncomfortable to formalize.

Contextual knowledge — why things are the way they are here. The 42 Nm on Station 7. The client who always requires three rounds of review, not because the contract says so but because their VP of procurement uses the review process to demonstrate diligence to their board. The historical reasons behind current practices that, if lost, will lead someone to “optimize” a process that was designed that way for a reason nobody remembers.
Build your taxonomy. Tag your gap map items with the relevant knowledge types. You now have a discovery document that tells you: what knowledge to capture, from whom, what type it is,
and how valuable it is. That’s more strategic clarity than most organizations have ever had about their knowledge assets.
Phase 2: Capture
This is the phase where things get human. And messy. And political. And where most knowledge initiatives quietly die.
I’m not going to pretend otherwise: knowledge capture from human experts is fundamentally a social process. It requires trust, skill, time, and — more than anything — a genuine respect for the people whose knowledge you’re extracting. I’ve seen knowledge capture programs fail not because the methodology was wrong but because the people running them treated experts like vending machines: insert question, receive answer, move on. That’s not how humans work. That’s not how knowledge works. The best knowledge — the stuff that’s most valuable and hardest to access — only comes out when the expert feels genuinely respected, genuinely understood, and genuinely safe.
That said, there’s a method to it. Let me walk you through it.
The Knowledge Engineering Interview: The core capture technique is the knowledge engineering interview — a structured conversation designed to extract tacit knowledge and transform it into explicit, documentable form. This is not a regular interview. It’s not journalism. It’s not a survey. It’s a specific discipline with specific techniques, and getting it right is the difference between capturing knowledge and capturing opinions.
A good knowledge engineering interview has five characteristics.
It follows the work, not the org chart. You don’t ask an expert to describe their role. You ask them to walk you through specific situations — recent ones, ideally — where they made decisions, solved problems, or handled exceptions. “Tell me about the last time something unexpected happened on the production line” is a better opening than “What do you know about production line management?” The first question triggers episodic memory and produces specific, actionable knowledge. The second triggers self-presentation and produces generalities.
It uses probing, not leading. When the expert says “I just knew something was off,” you don’t accept that. You probe: “What specifically did you notice? Was it a sound? A reading on the display? Something about the timing?” Tacit knowledge often manifests as pattern recognition that the expert has never had to decompose into its constituent signals. Your job is to help them decompose it — respectfully, patiently, without making them feel like they’re being interrogated.
It captures the exceptions, not the rules. The standard process is documented. Nobody needs you to capture it again. What you need are the deviations — the moments where the expert overrides the standard approach, applies different judgment, considers additional factors. “When would you not follow the standard procedure?” is the most valuable question in the knowledge engineer’s toolkit. The answer to that question is the knowledge.
It triangulates. One expert’s knowledge is a perspective. Three experts’ knowledge — compared, reconciled, synthesized — is an asset. Never capture from a single source if you can avoid it. Capture from multiple experts on the same topic, then look for convergence (where they agree — that’s validated knowledge) and divergence (where they disagree — that’s either a gap in understanding or a contextual difference that’s itself valuable to document).
It’s iterative. The first interview gets you maybe 60% of what the expert knows about a topic. The second — conducted after you’ve structured the first interview’s output and can present it back to the expert for review — gets you another 25%. The third gets the last 15%, which is often the most valuable: the edge cases, the “oh, I forgot to mention” additions, the corrections to what they said the first time around. Budget for three passes. Two if you’re resource-constrained. Never one.
The AI-Assisted Capture Pipeline: Here’s where the technology catches up with the human process, and where 2026 is genuinely different from any previous era of knowledge management.
LLMs are remarkably good at certain aspects of knowledge capture. Not at conducting the interview itself — that still requires a human knowledge engineer with domain awareness and interpersonal skill — but at everything that happens after the interview.
Transcription and structuring. The interview is recorded (with consent, obviously). An LLM transcribes it, identifies the key knowledge statements, separates them from the conversational scaffolding, and produces a structured first draft organized by your taxonomy categories. What used to take a knowledge engineer two days of post-processing now takes thirty minutes of review and correction.
Gap identification. The LLM analyzes the captured knowledge against your existing documentation and identifies gaps — places where the expert mentioned something that contradicts or extends the current documentation, places where the expert referenced situations that have no documentation at all, places where the captured knowledge is ambiguous or incomplete. This gives you the agenda for the second interview.
Consistency checking. When you’ve captured from multiple experts, the LLM cross-references their knowledge and flags inconsistencies. Expert A says the threshold is 85°C. Expert B says it’s 88°C. That’s not an error to smooth over — it’s a signal.
Maybe they’re referring to different equipment. Maybe one of them is working with outdated information. Maybe there’s a contextual factor that makes both answers correct under different conditions. The LLM identifies these divergences; the knowledge engineer resolves them.
Formalization. The captured knowledge — messy, narrative, full of implicit context — gets transformed into structured knowledge representations: decision trees, conditional rules, annotated procedures, context-tagged assertions. The LLM produces the first draft of these formalizations. The knowledge engineer and the original expert review and correct them.
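For readers who want the skeleton rather than the description, here is a minimal sketch of that pipeline. The llm() function is a stand-in for whatever model client you use, and the prompts are illustrative; the pipeline shape, not the wording, is what I’m showing.

```python
# A skeletal post-interview pipeline: structure, gap-check, cross-check.
# llm() is a placeholder, not a real client; prompts are illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def structure_transcript(transcript: str, taxonomy: list[str]) -> str:
    """Turn a raw interview transcript into tagged knowledge statements."""
    return llm(
        "Extract discrete knowledge statements from this interview transcript. "
        f"Tag each with one of: {', '.join(taxonomy)}. Drop conversational "
        "filler; preserve every condition and exception.\n\n" + transcript)

def find_gaps(structured: str, existing_docs: str) -> str:
    """Agenda for the second interview: contradictions and missing coverage."""
    return llm(
        "Compare these captured statements against current documentation. "
        "List contradictions, extensions, and undocumented situations.\n\n"
        f"CAPTURED:\n{structured}\n\nDOCS:\n{existing_docs}")

def check_consistency(captures: list[str]) -> str:
    """Cross-reference multiple experts; flag divergences, don't smooth them."""
    return llm(
        "These captures cover the same topic from different experts. Flag "
        "every factual divergence for human resolution.\n\n"
        + "\n---\n".join(captures))
```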
This pipeline doesn’t replace the human. It amplifies the human. A skilled knowledge engineer using this pipeline can capture, process, and structure knowledge at roughly five times the rate of one working without it. Which means that the “it takes too long and costs too much” objection — which was legitimate in the 1990s — is now largely obsolete. The technology has caught up. The bottleneck isn’t processing capacity. It’s the willingness to start.
Managing the Politics: I promised I’d address this, and I won’t sugarcoat it: the political dynamics of knowledge capture are the single most common reason these initiatives fail. Not the technology. Not the methodology. The politics.
Here’s the pattern. You identify an expert. You explain the initiative. They smile and say they’re happy to help. Then one of three things happens.
The minimizer. They participate, but they minimize. They give you the standard procedures, the textbook answers, the things that are already documented. The real knowledge — the edge cases, the workarounds, the hard-won judgment — stays behind a wall. Not because they’re being malicious. Because sharing that knowledge feels like giving away the thing that makes them valuable.
The hoarder. They explicitly resist. “You can’t capture what I do — it’s too complex.” “It takes years of experience — you can’t put it in a system.” Both statements may be partially true. But they’re also convenient defenses for someone who has correctly identified that their knowledge is their job security and has no interest in commoditizing it.
The over-sharer. They give you everything — and I mean everything. Every opinion, every grievance, every tangent about how things used to be better when they were in charge. The volume is high. The signal-to-noise ratio is low. And sorting the genuine knowledge from the personal narrative becomes its own project.
How do you handle these dynamics? There’s no magic. But there are principles.
Frame it as amplification, not extraction. “We’re not trying to replace you. We’re trying to make your knowledge available at scale. Right now, you can help three people a day with your
expertise. With this system, you can help three hundred.” That reframe is not spin — it’s true, and the best experts know it’s true because they’re already frustrated by being a bottleneck.
Create economic incentives. In organizations that are moving toward knowledge monetization, the experts whose knowledge is captured become revenue contributors — their knowledge generates measurable value, and they can be compensated accordingly. In organizations that aren’t there yet, tie knowledge contribution to performance reviews, bonuses, or recognition programs. Make it pay to share.
Protect the source. Experts who share knowledge need to know they’re not making themselves redundant. The most effective approach I’ve seen: make the expert the validator of the knowledge system. They capture their knowledge, and then they become the ongoing quality authority — the person who reviews, updates, and certifies that the system’s knowledge remains accurate. That’s a role, not a redundancy. It actually increases their value to the organization.
Start with the willing. In every organization, there are experts who want to share — who are frustrated that their knowledge dies when they leave, who take pride in teaching, who see the broader mission. Start with them. Build the first success stories. Let the results speak. The reluctant ones will come around when they see their peers receiving recognition, resources, and — if you’ve set up the incentives right — compensation.
Phase 3: Structure
Raw captured knowledge — even when it’s been transcribed, organized, and validated — is not yet usable by AI systems. It’s a pile of refined ore. The structuring phase is where you turn it into something an agent can actually call.
This is the most technical phase, and I’ll try to keep it accessible without dumbing it down, because the decisions you make here have enormous downstream impact on how well your knowledge system performs.
From Narrative to Knowledge Graph: The fundamental challenge of structuring is this: human knowledge is narrative. It’s stories, anecdotes, conditional reasoning, context-dependent judgment. AI systems — particularly agents — need structured, queryable, machine-readable knowledge. Bridging that gap is the core engineering challenge.
The most effective structure for enterprise knowledge in 2026 is the knowledge graph — a network of entities (things, concepts, people, processes) connected by relationships (causes, requires, contradicts, applies-to, depends-on) and annotated with properties (conditions, confidence levels, sources, expiration dates).
The torque example, in a knowledge graph, looks something like this:
Entity: Station 7 Mounting Bolts
  Relationship: has-specification → 45 Nm (source: manufacturer, document: MNT-2019-047)
  Relationship: has-operational-override → 42 Nm
    Override conditions: thermal expansion from heat exchanger proximity on Line 3
    Validation: Engineering team, July 2019
    Evidence: Incident reports IR-2019-114, -115, -116
    Scope: Station 7 only (other stations use manufacturer spec)
    Confidence: High (validated, no failures since implementation)
    Last reviewed: March 2025
That’s one piece of knowledge. A well-structured enterprise knowledge graph might contain tens of thousands to hundreds of thousands of such interconnected pieces — and the power comes not from any individual piece but from the connections between them. An agent querying this graph doesn’t just get the torque value. It can traverse the relationships to understand why the override exists, what evidence supports it, when it was last validated, and what other stations might be affected by similar conditions. That’s not search. That’s reasoning infrastructure.
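If it helps to see that structure held in code, here is one way to do it, using networkx as a stand-in for a real graph store. The node names and attributes follow the example above; a production system would use a dedicated graph database, not an in-memory graph.

```python
# The torque example as a property-annotated multigraph.

import networkx as nx

g = nx.MultiDiGraph()

g.add_edge("station_7_mounting_bolts", "45 Nm",
           key="has-specification",
           source="manufacturer", document="MNT-2019-047")

g.add_edge("station_7_mounting_bolts", "42 Nm",
           key="has-operational-override",
           condition="thermal expansion from heat exchanger proximity on Line 3",
           validation="Engineering team, July 2019",
           evidence=["IR-2019-114", "IR-2019-115", "IR-2019-116"],
           scope="Station 7 only",
           confidence="high",
           last_reviewed="2025-03")

# An agent traversing the graph gets the value plus its whole story:
for _, value, relation, attrs in g.out_edges("station_7_mounting_bolts",
                                             keys=True, data=True):
    print(relation, "->", value, attrs)
```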
The Evolution: From RAG to Knowledge Runtimes: Most organizations that are doing anything with knowledge and AI in 2026 are using Retrieval-Augmented Generation — RAG. The basic RAG architecture is straightforward: user asks a question, the system retrieves relevant documents from a vector store, the language model reads the retrieved documents and generates an answer. It works. For information retrieval, it works remarkably well.
But RAG has a ceiling, and the ceiling is this: RAG retrieves documents. It doesn’t retrieve knowledge. The distinction matters because a document is a container — it might hold relevant information alongside irrelevant information, it might contain knowledge that requires contextual interpretation the retrieval system doesn’t provide, and it might miss connections to other documents that change the meaning of what’s retrieved.
The 2026 evolution — and this is where the field is moving fast — is what I think of as Knowledge Runtimes. A knowledge runtime is an orchestration layer that sits between the AI agent and the underlying knowledge stores, and it does several things that basic RAG doesn’t.
Multi-source synthesis. Instead of retrieving from a single vector store, a knowledge runtime queries across multiple knowledge sources -- the knowledge graph, document stores, structured databases, real-time sensor data, external APIs -- and synthesizes a unified answer. The torque question doesn't just hit the knowledge graph. It also checks the maintenance log, the current equipment status, and the environmental conditions -- and integrates all of that into a contextual response.
Confidence scoring. A knowledge runtime attaches confidence levels to its outputs. "42 Nm, confidence: high, based on validated engineering override and 5 years of incident-free operation" versus "approximately 40-43 Nm, confidence: medium, based on
partial documentation and one expert interview." The agent -- or the human using the agent -- can make decisions about how much to trust the answer.
Provenance tracking. Every piece of knowledge in the response is traceable back to its source: which expert said it, when it was captured, how it was validated, when it was last reviewed. This is not a nice-to-have. In regulated industries, it's a compliance requirement. In all industries, it's an integrity requirement -- because AI systems that confidently present unreliable knowledge are worse than AI systems that admit uncertainty.
Governance and access control. Not all knowledge should be available to all agents. Competitive pricing knowledge shouldn't be accessible to a customer-facing agent. HR-sensitive knowledge shouldn't be accessible to a general operations agent. A knowledge runtime enforces access policies -- which knowledge can be accessed by which agents under which conditions -- in a way that basic RAG simply can't.
Continuous updating. Knowledge changes. The 42 Nm override might be revised when the heat exchanger on Line 3 gets replaced. A knowledge runtime handles updates without requiring full reprocessing -- individual knowledge elements can be added, modified, or deprecated while the rest of the system continues to operate.
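A minimal sketch of that orchestration follows. Every interface below is assumed for illustration: `policy.permits`, the `query` method on each source, and the confidence labels are invented names, not a real API. What matters is the order of operations: governance first, synthesis second, confidence and provenance attached to everything that comes out.

```python
# Sketch of a knowledge runtime's query path. All interfaces are assumed
# for illustration; the point is the order of operations, not API names.

from dataclasses import dataclass

@dataclass
class KnowledgeResult:
    answer: str
    confidence: str    # "high" / "medium" / "low"
    provenance: list   # source records: who, when, how validated

def query_runtime(agent, question, sources, policy) -> KnowledgeResult:
    # Governance first: drop sources this agent is not cleared to see.
    allowed = [s for s in sources if policy.permits(agent, s)]
    # Multi-source synthesis: knowledge graph, documents, live operational data.
    partials = [r for s in allowed if (r := s.query(question)) is not None]
    # The answer's confidence is capped by its weakest piece of evidence.
    rank = {"high": 2, "medium": 1, "low": 0}
    confidence = min((p.confidence for p in partials),
                     key=rank.__getitem__, default="low")
    return KnowledgeResult(
        answer=" ".join(p.answer for p in partials),  # naive merge for the sketch
        confidence=confidence,
        provenance=[p.source_record for p in partials],  # traceable to origin
    )
```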
Building a knowledge runtime is not a weekend project. It’s a significant engineering investment. But it’s also the infrastructure that separates “we have some AI tools” from “we have a knowledge-powered enterprise.” And the organizations making this investment now — in 2026 — are building the operational backbone that will make their AI systems genuinely, measurably, durably superior.
Quality Over Quantity: The Curation Discipline: I want to end this section with something that goes against the instincts of most technology executives: resist the urge to capture everything.
The value of a knowledge system is not proportional to its size. It’s proportional to its accuracy, its relevance, and its maintainability. A knowledge base with 500 high-quality, validated, well-structured knowledge elements will outperform one with 50,000 unvalidated, poorly structured ones — because the AI system using the smaller base will generate consistently reliable outputs, while the one using the larger base will generate a mixture of excellent and terrible outputs with no way for the user to tell which is which.
This is counterintuitive in an era that has conditioned us to believe that more data is always better. It isn’t. More knowledge is better — but only if it meets three criteria.
Validated. The knowledge has been confirmed by multiple sources, tested against reality, or formally reviewed by a domain authority. Unvalidated knowledge in a system is not an asset -- it's a liability. It will generate confident, wrong outputs. Which is worse than no output at all, because at least "I don't know" is honest.
Maintained. Knowledge has a shelf life. The 42 Nm override is valid today. After the Line 3 heat exchanger replacement, it may not be. Knowledge that isn't reviewed on a defined cadence degrades into misinformation -- and misinformation in an AI system is uniquely dangerous because the system delivers it with the same confidence as accurate knowledge.
Scoped. Every piece of knowledge should have clear applicability boundaries. The 42 Nm override applies to Station 7, on Line 3, under current equipment configuration. If someone generalizes it to "all stations should use 42 Nm," they've turned knowledge into misinformation. Scope -- the explicit statement of where a piece of knowledge does and doesn't apply -- is as important as the knowledge itself.
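Those three criteria translate directly into a gate that knowledge should pass before an agent is allowed to use it. The sketch below assumes each element carries the validation, review, and scope metadata described above; the field names and the one-year review cadence are invented for illustration.

```python
# A gate enforcing the three criteria before knowledge reaches an agent.
# Field names and the review cadence are illustrative assumptions.

from datetime import date, timedelta

REVIEW_CADENCE = timedelta(days=365)  # assumed policy: re-validate yearly

def usable(element: dict, query_scope: str, today: date) -> bool:
    if element.get("validation") == "unvalidated":
        return False  # unvalidated knowledge is a liability, not an asset
    if today - element["last_reviewed"] > REVIEW_CADENCE:
        return False  # stale knowledge degrades into misinformation
    if query_scope not in element["applies_to"]:
        return False  # out of scope: the 42 Nm override is Station 7 only
    return True

torque_override = {
    "validation": "expert_reviewed",
    "last_reviewed": date(2025, 3, 1),
    "applies_to": {"station_7"},
}
print(usable(torque_override, "station_7", date(2026, 1, 15)))  # True
print(usable(torque_override, "station_4", date(2026, 1, 15)))  # False: wrong scope
```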
Start small. Capture the highest-impact knowledge first — the stuff your gap map identified as most economically valuable. Validate it rigorously. Structure it carefully. Deploy it, measure the results, and expand based on demonstrated value.
Five hundred validated insights, carefully curated, will change how your organization operates. Fifty thousand unvetted entries will create a system nobody trusts and nobody uses.
I’ve seen both. The small, curated approach wins every time. Not because ambition is bad — it isn’t — but because trust is built incrementally, and a knowledge system that produces one confidently wrong answer loses more credibility than a hundred right answers earn.
Bringing It Together: The Flywheel
The three phases — Discover, Capture, Structure — are not a one-time project. They’re a continuous cycle.
You discover knowledge gaps. You capture the knowledge to close them. You structure it for AI consumption. The AI systems use it, produce better outcomes, and — in the process — generate new data about where the remaining gaps are. Which feeds back into discovery. Which drives more capture. Which produces more structured knowledge. The cycle accelerates.
The organizations that understand this — that build knowledge engineering as an ongoing capability rather than a one-time initiative — are the ones that will compound their advantage year over year. Because the flywheel doesn’t just maintain itself. It generates momentum. Each rotation produces better AI, which produces better outcomes, which generates better data about what knowledge is still missing, which drives more targeted capture.
Building this flywheel is the strategic priority. Not selecting the right AI vendor. Not choosing the right model. Not optimizing your prompt engineering. Those are implementation details. The flywheel — the continuous cycle of discovering, capturing, and structuring proprietary knowledge — is the machine that produces competitive advantage.
Everything else is a feature. This is the architecture.
You’ve captured the knowledge. You’ve structured it. It sits in a system — a knowledge graph, a runtime, whatever architecture you chose. Your internal AI systems are using it, and the results are measurably better than what you had before.
Now what?
For most organizations, the answer is: nothing. They built the knowledge base for internal use, they’re seeing returns on that investment, and they move on. And that’s fine. That alone justifies the effort. But it’s also leaving an enormous amount of value on the table — because the same knowledge that makes your internal AI better can, in many cases, make other people’s AI better too. And those other people will pay for it.
This chapter is about what happens when knowledge becomes a product. Not a byproduct of operations, not a side effect of doing business, but a deliberate, priced, tradeable asset with its own value chain, its own market infrastructure, and its own economics.
I realize this sounds abstract. It won’t be by the end of the chapter.
The Economics of Knowledge as a Product
Let’s start with a basic economic observation that most people in enterprise AI haven’t fully internalized yet.
In the Information Age, the dominant business model for content was advertising. You created content, you made it freely available, and you monetized the attention it attracted. This model worked for information — because information, once published, is non-rivalrous (my reading it doesn’t prevent your reading it) and non-excludable (once it’s on the internet, anyone can access it). Those two properties — non-rivalry and non-excludability — are the defining characteristics of a public good. And public goods are notoriously hard to monetize directly, which is why the entire internet economy defaulted to advertising.
Knowledge — the kind we’ve been discussing — has different economic properties. It is non-rivalrous in the same sense (my using it doesn’t diminish it), but it is excludable. If the 42 Nm override exists only in your structured knowledge base and not on the public internet, you can control who accesses it. That combination — non-rival but excludable — makes knowledge a club good in economic terms. And club goods have a very different monetization model: you charge for access.
This is not theoretical. It’s already happening. The licensing deals I described in Chapter 3 — News Corp, Reddit, academic publishers — are the first wave. But those deals are about information licensing. They’re selling access to large content libraries to AI labs for training purposes. That’s version 1.0 of the knowledge market. Necessary, but crude.
Version 2.0 is different. Version 2.0 is about selling structured, validated, domain-specific knowledge to AI agents in real time. Not bulk data licensing. Not training data sales. Transactional, on-demand, contextual knowledge delivery — where every API call represents a specific piece of knowledge being consumed by a specific agent for a specific purpose, and the knowledge provider gets compensated for that consumption.
That’s a fundamentally different market. And the infrastructure for it is being built right now.
What Agent-Callable Knowledge Actually Looks Like as a Product
Let me make this concrete, because “knowledge as a product” is the kind of phrase that sounds compelling in a strategy deck and means nothing until someone shows you the mechanics.
Imagine a materials engineering firm — let’s call them Schmidt & Partners, because this is a hypothetical — that has spent forty years developing expertise in high-temperature alloy behavior. Their engineers know things about how specific alloys perform under specific conditions that no public database contains, because the knowledge was developed through proprietary testing, proprietary failure analysis, and decades of accumulated shop floor experience.
Traditionally, Schmidt & Partners monetized this knowledge one way: consulting engagements. A client calls, describes a problem, an engineer spends two weeks analyzing it, delivers a report, sends an invoice. High-margin work, but inherently limited by the number of engineers and the hours in a day.
Now imagine Schmidt & Partners structures their accumulated knowledge into an agent-callable knowledge base. Not all of it — competitive secrets stay internal — but the subset that represents general high-temperature alloy expertise: material selection guidelines, failure mode databases, testing protocol recommendations, performance prediction models enriched with forty years of empirical data.
An AI agent working for an aerospace manufacturer encounters a materials selection problem. The agent queries available knowledge sources. It finds generic alloy data in public databases — commodity information, everyone has it. It also discovers Schmidt & Partners’ knowledge service, which offers domain-specific expertise that goes far beyond the generic data: specific performance predictions under specific conditions, informed by proprietary testing history, annotated with confidence levels and known edge cases.
The agent — or the system deploying the agent — decides the premium knowledge is worth the price for this decision. It executes a knowledge transaction. Schmidt & Partners’ knowledge base delivers a contextualized response. The aerospace manufacturer gets a materially better answer. Schmidt & Partners gets paid.
That transaction took three seconds. No human at Schmidt & Partners was involved. The knowledge — accumulated over forty years — generated revenue while the engineers who built it were asleep.
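From the agent's side, that transaction might look like the sketch below. To be clear about what is invented here: there is no standard knowledge-market protocol yet, so the `quote` and `purchase` calls, the confidence labels, and the budget logic are all illustrative. The shape is the point: free commodity data as the baseline, priced expertise bought only when it is worth it.

```python
# What the three-second transaction might look like from the agent's side.
# There is no standard knowledge-market protocol yet, so quote(), purchase(),
# and every field below are illustrative assumptions, not a real API.

def resolve_materials_question(query, budget, public_db, knowledge_services):
    answer = public_db.lookup(query)        # free commodity baseline
    for service in knowledge_services:      # e.g. the Schmidt & Partners endpoint
        offer = service.quote(query)        # price and confidence, before buying
        if offer.confidence == "high" and offer.price <= budget:
            answer = service.purchase(offer.id)  # metered call; the provider gets paid
            budget -= offer.price
    return answer                           # the best answer the budget allowed
```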
Now multiply this across every domain of human expertise. Legal precedent analysis. Agricultural soil science. Industrial process optimization. Medical diagnostics. Supply chain risk assessment. Architectural engineering. Pharmaceutical formulation. Every field where deep expertise exists and where AI agents are being deployed to make decisions.
That’s the knowledge marketplace.
The Value Chain
Every market needs a value chain — a clear path from raw material to finished product to end consumer. The knowledge marketplace is no different, and understanding its value chain is essential for anyone who wants to participate in it, whether as a knowledge provider, a platform operator, or a consumer.
Knowledge holders are the upstream source. They're the individuals, teams, organizations, communities, and institutions that possess domain-specific knowledge. The metallurgist. The law firm. The research university. The indigenous community with ecological knowledge. The monastery with uncatalogued manuscripts. Anyone who knows something that AI systems would benefit from knowing.
Knowledge engineers are the processors. They take raw tacit knowledge and transform it into structured, validated, AI-consumable form. This is the capture and structuring work described in Chapter 6, and it's where most of the human skill and effort concentrates. Knowledge engineering is an emerging profession -- it barely existed three years ago, and by 2028, I expect it to be one of the fastest-growing specialized roles in the economy.
Knowledge platforms are the infrastructure. They host structured knowledge, manage access control, handle transactions, ensure provenance tracking, and provide the APIs through which agents discover and consume knowledge. Think of them as the marketplaces -- the places where knowledge supply meets knowledge demand. Multiple platforms will emerge, just as multiple e-commerce platforms coexist. Some will be general-purpose. Others will specialize in specific domains -- healthcare knowledge, engineering knowledge, legal knowledge.
Agent operators are the downstream consumers. They're the companies deploying AI agents that need domain-specific knowledge to function effectively. Every enterprise running agentic AI is a potential knowledge consumer. And the volume of consumption will be enormous, because agents don't query a knowledge base once a day -- they query it hundreds or thousands of times per hour.
End users are the ultimate beneficiaries -- the humans whose decisions are improved, whose problems are solved faster, whose outcomes are better because the AI agent had access to genuine knowledge rather than generic information.
The value chain is clear. The economics are compelling. What’s missing — and what’s being built — is the marketplace infrastructure that connects these participants efficiently.
Monetization Models
Not all knowledge monetizes the same way. The right model depends on what kind of knowledge you have, who needs it, and how it’s consumed. Here are the models I see working — not in theory, but in the emerging practice of 2026.
Knowledge-as-a-Service (KaaS). Subscription-based access to a curated knowledge domain. An agricultural technology company subscribes to a soil science knowledge service. Their agents have continuous access to structured, validated knowledge about soil conditions, crop interactions, amendment protocols -- updated regularly, sourced from agricultural research institutions and experienced practitioners. Pricing is typically per-agent or per-query-volume, with tiered access levels.
Transactional micropayments. Pay-per-query access, priced based on the specificity and value of the knowledge delivered. A generic query ("what's the melting point of titanium") costs nothing -- that's commodity information. A specific query ("optimal welding parameters for Ti-6Al-4V at 3mm gauge in a humid marine environment, accounting for known fatigue behavior under cyclic loading") accesses premium knowledge and costs accordingly. The pricing mechanism needs to be dynamic -- reflecting the value and rarity of the knowledge, not just the computational cost of delivering it.
Revenue sharing. The knowledge provider receives a percentage of the economic value their knowledge creates for the consumer. This is harder to implement -- it requires measuring downstream impact -- but it's the most aligned model, because it means the knowledge provider benefits when their knowledge is genuinely useful and doesn't benefit when it isn't. Some early platforms are experimenting with proxy metrics: conversion rates for sales knowledge, error reduction rates for operational knowledge, diagnostic accuracy for medical knowledge.
Licensing and syndication. Bulk access agreements, similar to the current content licensing deals but for structured knowledge rather than raw content. A large enterprise might license an entire domain knowledge base for a fixed annual fee, with the right to deploy it across all their internal agents. This model works for knowledge that has broad applicability and stable value.
Hybrid models. Most knowledge providers will end up with a combination -- a base subscription for standard access, transactional pricing for premium queries, and revenue sharing for high-value applications. The market will sort out the optimal mix, as markets do.
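As a toy illustration of the dynamic pricing the transactional micropayments model calls for, consider the sketch below. The weights and thresholds are invented; the principle they encode is the one stated above, that price should track the specificity and rarity of the knowledge, not the computational cost of delivering it.

```python
# Toy dynamic pricing: price tracks specificity and rarity of the knowledge,
# not compute cost. All weights and thresholds are invented for illustration.

def price_query(specificity: float, rarity: float, validation_level: int,
                base_rate: float = 0.01) -> float:
    """specificity, rarity in [0, 1]; validation_level 0 (none) to 3 (tested)."""
    if specificity < 0.2:
        return 0.0  # "melting point of titanium" is commodity information
    multiplier = (1 + 9 * specificity) * (1 + 4 * rarity) * (1 + validation_level)
    return round(base_rate * multiplier, 4)

print(price_query(specificity=0.1, rarity=0.1, validation_level=0))  # free
print(price_query(specificity=0.9, rarity=0.8, validation_level=3))  # premium query
```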
The critical insight across all these models: knowledge that is structured, validated, and agent-callable commands dramatically higher prices than raw content or unstructured data. The structuring work described in Chapter 6 isn’t just an operational requirement — it’s a value-creation step. Raw expertise has potential value. Structured, agent-callable knowledge has realized value. The difference in pricing can be an order of magnitude or more.
Provenance, Trust, and the Quality Problem
Every marketplace faces a quality problem, and the knowledge marketplace faces it acutely — because the consequences of consuming bad knowledge are worse than the consequences of consuming bad information.
If a search engine returns a mediocre article, you waste two minutes reading it. If a knowledge service feeds an AI agent incorrect domain-specific guidance, the agent makes a bad decision — and bad decisions in manufacturing, healthcare, finance, or engineering have material consequences. The cost of bad knowledge isn’t annoyance. It’s liability. This means the knowledge marketplace needs quality infrastructure that goes far beyond what current content platforms provide.
Provenance tracking. Every piece of knowledge in a marketplace needs a traceable origin. Who generated it? How was it captured? What validation did it undergo? When was it last reviewed? This isn't just good practice -- it's the foundation of trust. An agent consuming knowledge from a marketplace needs to assess reliability, and reliability assessment requires provenance data.
Validation frameworks. Knowledge in a marketplace should carry explicit validation status: expert-reviewed, peer-corroborated, empirically tested, algorithmically cross-referenced, unvalidated. The consumer decides their own risk tolerance. A customer service agent might accept lower-validation knowledge for routine queries. A medical diagnostic agent should require the highest level available. The marketplace surfaces validation data; the consumer applies judgment.
Reputation systems. Over time, knowledge providers build track records -- measured not by reviews or ratings but by downstream outcomes. Knowledge from Provider X leads to demonstrably better agent performance than knowledge from Provider Y. The marketplace captures these signals and surfaces them as quality indicators. This is knowledge-market reputation, and it will become the primary competitive differentiator between providers.
Governance and audit trails. Regulated industries -- and increasingly, all industries -- require auditability. When an AI agent makes a decision based on marketplace knowledge, the entire chain needs to be auditable: what knowledge was consumed, from which provider, with what validation level, and how it influenced the decision. This audit infrastructure doesn't exist at scale yet. Building it is one of the most important near-term challenges.
Expiration and deprecation. Knowledge has a shelf life. The marketplace needs mechanisms to flag, deprecate, or remove knowledge that is no longer current. This is not a one-time quality check -- it's an ongoing discipline that requires active maintenance from knowledge providers and active monitoring from the platform.
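What would one entry in such an audit trail contain? A hypothetical record might look like the following. None of these field names are standardized anywhere yet, which is precisely the gap described above; they are assumptions about what a workable schema would need.

```python
# A hypothetical knowledge-consumption audit record. No such standard exists
# yet; every field name here is an assumption about a workable schema.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class KnowledgeAuditRecord:
    timestamp: datetime
    agent_id: str
    provider: str           # who supplied the knowledge
    element_id: str         # which element was consumed
    validation_level: str   # expert-reviewed, peer-corroborated, ...
    provenance: list        # capture date, expert, review history
    decision_ref: str       # the downstream decision it influenced

record = KnowledgeAuditRecord(
    timestamp=datetime(2026, 5, 4, 14, 2),
    agent_id="maintenance-agent-12",
    provider="internal-knowledge-graph",
    element_id="station7.torque_override",
    validation_level="expert-reviewed",
    provenance=["captured 2019-07", "reviewed 2025-03"],
    decision_ref="WO-88214",
)
```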
Build this quality infrastructure, and you get a marketplace that people trust. Trust attracts participation. Participation creates network effects. Network effects create a flywheel. Skip it, and you get a marketplace that’s the knowledge equivalent of the SEO-polluted web — full of content, devoid of value.
Who Wins in the Knowledge Marketplace
Let me be direct about the competitive dynamics, because they’re not evenly distributed.
Domain experts with structured knowledge win. The metallurgist, the regulatory specialist, the agricultural scientist — anyone with deep, specific, hard-won expertise that can be captured and structured has a monetizable asset. For the first time, their knowledge has a market price that reflects its actual value, not just their hourly consulting rate. This is economically liberating for individuals and small firms that have always been knowledge-rich and market-poor.
Organizations that invest early in knowledge engineering win. The first mover advantage I described in Chapter 5 applies here with even more force. The organization that builds a high-quality knowledge base today can start generating marketplace revenue by 2027 or 2028 — revenue that funds further knowledge capture, which builds a larger and more valuable knowledge base, which generates more revenue. The flywheel is economic, not just operational.
Platform operators with strong quality infrastructure win. The marketplace platforms that solve the trust problem — that build the provenance, validation, and reputation systems — will attract the best knowledge providers and the most discerning consumers. Quality begets quality. This is a network effect that rewards the platform that takes quality most seriously, not the one that scales fastest.
Societies that value knowledge preservation win. And this is the bigger point — the one that connects back to the Library of Alexandria and the Prologue. Societies, cultures, and institutions that invest in preserving and structuring their accumulated knowledge — not just for commercial purposes, but because they understand that knowledge is civilizational infrastructure — will be the ones that benefit most from the knowledge economy. The economic incentive is the engine. The civilizational benefit is the destination.
Who loses? Organizations that treated knowledge as a free input — that scraped without paying, trained without licensing, consumed without contributing. Organizations that invested in models and compute but not in knowledge. Organizations that assumed the AI would just figure it out without domain-specific expertise.
The market is repricing knowledge from “free” to “valuable.” Those who have it will be paid. Those who need it will pay. And those who ignored it will discover that the most expensive knowledge is the knowledge you don’t have when you need it.
I promised at various points in this book to tell you what to do on Monday morning. So here it is, for this chapter specifically.
If you’re a knowledge holder — an individual or an organization with domain-specific expertise — start thinking about which parts of your knowledge have external value. Not all of it does. Some of it is too context-specific to be useful outside your organization. Some of it is competitively sensitive and should stay proprietary. But some of it — the broadly applicable domain expertise, the validated best practices, the pattern libraries, the diagnostic frameworks — has value to anyone in your field. That’s your potential marketplace product.
If you’re an enterprise deploying AI agents — start thinking about your knowledge supply chain. Where are your agents getting their domain-specific knowledge today? Is it just whatever was in the training data? Is it your own document repository via RAG? Or do you have access to structured, validated, curated knowledge from genuine domain experts? If it’s the first or second option, your agents are operating on commodity knowledge. Your competitors’ agents might not be for long.
If you’re building infrastructure — and I mean this in the broadest sense, from technology platforms to policy frameworks — start thinking about the marketplace mechanics. What does knowledge provenance look like at scale? How do you price knowledge dynamically based on specificity and value? How do you build reputation systems that reward quality? How do you handle knowledge deprecation? These are unsolved problems, and the people who solve them will build the rails that the knowledge economy runs on.
The marketplace is coming. The question isn’t whether — it’s whether you’ll be a participant or a bystander.
And in my experience, bystanders in emerging markets don’t get a second chance to enter on favorable terms.
Part IV: The Bigger Picture
Winners, Losers, and the Knowledge Divide
I’ve spent the last three chapters being pragmatic — methods, structures, marketplaces, monetization models. The business case for knowledge capture is solid. The mechanics are real. The opportunity is measurable.
Now I need to talk about the thing that actually keeps me up at night. Not whether the knowledge economy will happen — it will. But who benefits when it does.
Because the same dynamics that make knowledge the defining competitive advantage of the next decade also have the potential to create the most consequential inequality of the 21st century. And if we’re not honest about that — if we write it off as someone else’s problem or assume the market will sort it out — we’ll wake up in 2030 with an economic structure that makes the current wealth gap look quaint.
I promised in the Prologue that I’d be direct. Here’s direct: the knowledge economy will either be the most distributive economic shift since the internet, or the most concentrative one since industrialization. Both outcomes are structurally possible. Only one of them is desirable. And the window for influencing which one we get is closing faster than most people realize.
But before I lay out those scenarios, I need to say something about what we’re losing. Because the knowledge economy doesn’t emerge from nothing — it emerges from the ruins of something that was, for a brief and extraordinary moment, genuinely beautiful.
Requiem for the Golden Age
For most of human history, knowledge was hoarded.
This is not a metaphor. It’s a literal description of how knowledge worked for approximately five thousand years of recorded civilization. The Library of Alexandria was not public in any modern sense — it was a royal institution, access controlled by the Ptolemaic court. Medieval monasteries preserved knowledge but also gatekept it — literacy itself was a privilege, and the texts were in Latin, which was the linguistic equivalent of a paywall. Private libraries of the Renaissance were status symbols of the wealthy. The printing press democratized information — books became affordable — but deep, specialized knowledge remained locked behind university walls, guild structures, and the economic barrier of years of unpaid apprenticeship.
Then came the internet. And for about twenty years — roughly 1998 to 2020, if I’m being generous — something unprecedented happened. Humans voluntarily shared their knowledge, freely, at scale, with strangers. Not all of it. Not the deepest proprietary secrets. But an astonishing amount. Stack Overflow. Wikipedia. Thousands of niche forums where an aerospace engineer would spend forty-five minutes writing a detailed answer to a stranger’s question about turbine blade metallurgy — for free, for the satisfaction of helping, for the dopamine hit of an upvote. Academic preprint servers. Open-source code repositories. Cooking blogs written by grandmothers who wanted their recipes to survive. Photography tutorials by professionals who just liked teaching.
No paywalls. No synthetic content. No algorithmic manipulation of what you saw. Just humans sharing what they knew with other humans who wanted to learn.
That was the golden age. And I need you to understand something about it: it was an anomaly. Not the natural state of things. Not the inevitable direction of progress. A brief, extraordinary, historically unprecedented window during which the economics of the internet — low distribution costs, social incentives for sharing, no viable monetization model for individual knowledge — accidentally created the closest thing to a universal knowledge commons that humanity has ever achieved.
It’s over. Not because someone decided to end it. Because the structural conditions that enabled it no longer exist.
The synthetic content flood killed the signal-to-noise ratio. The platform economy captured the value of shared knowledge and redirected it to shareholders. The AI labs scraped the commons and used it to build products that compete with the very people who created it. Copyright enforcement — necessary, justified, but consequential — is re-enclosing the commons behind licensing agreements. And the humans who used to share freely are doing it less, because the experience of sharing knowledge on the internet in 2026 is fundamentally different from what it was in 2010. You post an answer, and three AI-generated rewrites of your answer appear within hours, stripped of your nuance, attributed to nobody, optimized for engagement metrics.
The incentive to share is gone. The golden age ran on goodwill, and the goodwill has been extracted.
Here’s a thought experiment that clarifies the scale of what we’ve lost. The last major print edition of the Encyclopaedia Britannica was published in 2010 — 32 volumes, roughly 44 million words. At the time, it felt like a relic. Wikipedia had already surpassed it in scope by orders of magnitude. Why would anyone want a physical encyclopedia when the internet had everything?
Well. The 2010 Britannica had one property that no digital knowledge source in 2026 can guarantee: every word in it was written by a human with verified expertise, reviewed by editors with domain knowledge, and published by an institution with a 244-year reputation to protect. No synthetic content. No SEO manipulation. No algorithmic distortion. No model collapse. Just vetted, human-generated knowledge, frozen in print.
If you printed an equivalent encyclopedia today — one that captured the genuine, verified, human-generated knowledge we’ve accumulated since 2010 — how thick would it be? Honestly? For the knowledge (not the information, not the synthetic content, not the recycled rewrites) — it might not be much thicker than the 2010 edition. Because the vast majority of what’s been added to the internet since then is information at best and noise at worst. The knowledge production didn’t scale with the content production. The content just got louder.
I don’t say this to romanticize the past. The golden age had real problems — misinformation, filter bubbles, the digital divide. But it had one thing that no amount of AI sophistication can replicate: a critical mass of humans voluntarily sharing genuine, experience-based knowledge in an open, accessible, non-commercial space. That commons is gone. And it’s not coming back, for the same reason you can’t un-ring a bell or un-scrape a dataset.
What comes next is a return to the historical norm — but with modern infrastructure. Knowledge, once again, will be gated. The wealthy — wealthy individuals, wealthy organizations, wealthy nations — will have access to premium, structured, agent-delivered knowledge. Everyone else will get the synthetic equivalent: fluent, plausible, generically adequate, and fundamentally shallow. The startups get commodity knowledge from public models. The established players pay for exclusive insights from curated knowledge bases. The gap between what a well-funded AI knows and what a generic AI knows will widen, and that gap will map directly onto economic outcomes.
The rich will have their private libraries again. They’ll just be hosted in the cloud and delivered by agents instead of leather-bound and shelved in mahogany.
I don’t like this. I want to be clear about that. The humanist in me — the person who grew up believing the internet would democratize knowledge forever — finds it genuinely sad. But the analyst in me recognizes that it’s structural, not personal. The golden age wasn’t sustainable because it relied on economic conditions that no longer exist. Don’t hate the player. Hate the game. Or, better yet — understand the game well enough to change the rules where you can, and play it well where you can’t.
Which brings me to the question that actually matters for this chapter: given that knowledge is being re-enclosed, who ends up on which side of the wall?
Every economic era creates its own inequality. The Industrial Age divided the world into those who owned the means of production and those who sold their labor. The Information Age — despite its promise of democratization — divided the world into those who controlled attention (platforms, algorithms, data monopolies) and those who generated the content those platforms monetized. The pattern is remarkably consistent: each era’s defining resource starts out accessible and ends up concentrated.
The Knowledge Economy’s defining resource is proprietary knowledge. And the fault line it creates runs between those who have it, can structure it, and can monetize it — and those who don’t.
This fault line doesn’t follow the familiar divisions of the Information Age. It’s not simply rich versus poor, or developed versus developing, or tech-savvy versus tech-illiterate. It’s more specific and more insidious than that.
Between organizations, the divide will separate knowledge-rich enterprises from knowledge-poor ones. The company that spent three years building a structured knowledge base has AI systems that perform at a level its competitor simply cannot match -- not by buying better models, not by hiring better engineers, but only by investing the same three years in knowledge capture that it never started. Time is the moat. And time, unlike software, cannot be licensed.
Between individuals, the divide will separate those whose knowledge has market value from those whose knowledge doesn't -- or, more precisely, those who have access to the infrastructure that makes their knowledge monetizable from those who don't. The retired metallurgist whose alloy expertise is worth a fortune in a knowledge marketplace is only worth that fortune if someone structures his knowledge, hosts it on a platform, and connects it to agents that need it. Without that infrastructure, his knowledge dies with him. Same knowledge, same value -- completely different outcomes depending on access.
Between societies, the divide will separate those that invest in knowledge infrastructure from those that don't. Countries and cultures with strong educational systems, robust research institutions, and deliberate knowledge preservation programs will compound their advantage. Those without will find themselves in the same position as countries that missed the industrial or digital revolutions -- not necessarily poor, but structurally dependent on others' knowledge exports.
Between generations, the divide will separate those who accumulated knowledge during the golden era of learning-by-doing from those who entered the workforce after AI began automating the very processes through which knowledge was traditionally acquired. A junior engineer in 2026 has fewer opportunities to make the kind of mistakes that build genuine expertise -- because the AI catches them first. Efficient? Yes. But it also means the pipeline of future knowledge holders is narrowing. We're consuming the current stock of human knowledge faster than we're replenishing it.
That last point is the one that disturbs me most. The knowledge economy depends on human knowledge. Human knowledge depends on human experience. If AI removes the experiences through which humans develop knowledge, we’re sawing off the branch we’re sitting on. Not today. Not obviously. But with a slow, structural inevitability that will be visible only in retrospect — which is, historically, when it’s too late.
The Concentration Scenario
Let me paint the dark picture, because I think it’s the default trajectory if nobody intervenes.
A small number of organizations — call them the knowledge oligarchs — move early. They’re large, well-funded, strategically aware. They run systematic knowledge capture programs across their operations. They build premium knowledge bases. They sell access through marketplace platforms they control.
They acquire smaller knowledge-rich companies not for their products but for their institutional knowledge. They hire away domain experts not to do work but to extract and monetize their expertise.
Within five years, these organizations control the bulk of structured, AI-consumable knowledge in their respective domains. Their AI systems — fed by the best knowledge — produce the best outputs. Their clients and customers get the best results. Their competitive advantage becomes self-reinforcing: better knowledge → better AI → better outcomes → more clients → more revenue → more investment in knowledge capture → better knowledge. The flywheel I described in Chapter 6, running at full speed — but only for them.
Everyone else gets the generic version. Same models, same compute, but commodity knowledge. Their AI is competent, but not excellent. Their outputs are adequate, but not differentiated. They compete on price, not quality — because quality, in the knowledge economy, is a function of the knowledge you have, and they don’t have enough.
Meanwhile, individual knowledge holders — the experts, the practitioners, the people who actually generated the knowledge that the oligarchs monetized — receive a fraction of the value. They get a consulting fee, maybe a bonus, maybe a “thank you” in the company newsletter. The ongoing revenue stream from their monetized knowledge flows to the organization, not to them. Which is exactly how the Industrial Age worked: the laborer was compensated for time, while the owner captured the value of production. Same structure, different era, same outcome.
If this sounds familiar, it should. It’s the Information Age playbook, replicated. Users created content. Platforms monetized it. The value flowed upward. The knowledge economy, in its worst version, does the same thing with a more valuable input.
I don’t think this is inevitable. But I think it’s the path of least resistance. Markets, left to their own devices, tend toward concentration. And knowledge markets have a particularly strong concentrative tendency because of the compounding dynamics I’ve described: early movers build advantages that are structurally unreachable by late entrants. Without deliberate intervention — by organizations, by policymakers, by the people building the marketplace infrastructure — concentration is the default.
The Distribution Scenario
Now the other picture. The one I’m working toward.
In the distribution scenario, the knowledge economy is structured so that value flows to knowledge holders — not just to the platforms and organizations that aggregate and monetize knowledge, but to the individuals and communities that generated it. This requires four things that don’t exist at scale yet but are buildable.
Individual knowledge sovereignty. Mechanisms that allow individuals to own, control, and monetize their own knowledge. When the retired metallurgist contributes his alloy expertise to a marketplace, the revenue from that knowledge flows to him — not exclusively to whatever company employed him when the knowledge was developed. This is a legal and economic innovation, not a technical one. The technology for micropayment distribution is trivial. The frameworks for individual knowledge ownership are not. They need to be built.
The precedent, imperfect but instructive, is Creative Commons licensing and musician royalty systems. The principle is the same: the creator retains rights. The implementation for knowledge is harder, because knowledge is more diffuse and harder to attribute than a song. But the principle is sound, and the infrastructure to support it is within reach.
Accessible knowledge engineering. If structuring knowledge requires an expensive knowledge engineering team, then only well-funded organizations will be able to participate in the marketplace. The distribution scenario requires tools that make knowledge structuring accessible to individuals and small organizations — AI-assisted capture tools, standardized knowledge formats, template-based structuring frameworks that don’t require a PhD in ontology engineering to use.
Some of this is already emerging. LLM-powered interview tools that can guide an expert through a structured knowledge extraction process. Template libraries for common knowledge types. Automated validation frameworks that cross-reference captured knowledge against existing knowledge bases. The tools are getting better fast. What’s needed is deliberate investment in making them available broadly, not just to enterprise clients.
Open knowledge infrastructure. Not all knowledge should be proprietary. Just as open-source software created an enormous shared resource that everyone builds on, open knowledge initiatives can create shared knowledge commons that prevent the worst concentrative dynamics. Baseline domain knowledge — the kind that’s currently in textbooks and public databases — should remain open and freely accessible. Proprietary knowledge — the kind that creates competitive advantage — can be monetized. But the floor — the minimum level of knowledge available to everyone — needs to be protected.
This is where the Library of Alexandria argument from Chapter 2 becomes policy-relevant. Investing in knowledge preservation and open knowledge commons is not charity. It’s infrastructure. It raises the baseline for everyone, prevents monopoly over fundamental domain knowledge, and ensures that the knowledge economy has a healthy competitive ecosystem rather than a few dominant players controlling the essential inputs.
Fair value distribution. Marketplace platforms need to be designed — from the ground up — with value distribution as a core principle, not an afterthought. This means transparent pricing. It means revenue sharing that genuinely reflects the contribution of knowledge holders. It means governance structures that give knowledge providers a voice in how the marketplace operates. It means resisting the temptation to follow the platform economics playbook of “attract participants, capture value, extract rents.”
I realize I’m describing a market that doesn’t exist yet. I realize that the forces of concentration are strong and the incentives for extraction are real. I’m not naive about this. But I’ve also seen what happens when markets are designed with distribution in mind from the beginning versus when distribution is retrofitted after concentration has already occurred. The first approach works. The second is window dressing.
Let me zoom in from the macro picture to the individual, because this is where the knowledge economy either delivers on its promise or fails.
In the Information Age, individuals were content creators. They blogged, they posted, they contributed to forums. The value of that content was captured primarily by platforms. The individual got engagement, maybe some ad revenue, occasionally a career boost — but the economic value of their contributions flowed overwhelmingly to the aggregators. We accepted this because the alternative — a paywalled, fragmented internet — seemed worse.
In the Knowledge Economy, individuals are knowledge holders. And the question is whether the pattern repeats — whether knowledge flows from individuals to organizations to platforms, with the individual getting the smallest share — or whether this time, the economic structure can be designed differently.
I believe it can. Not because I’m an optimist — I’m not, particularly — but because the economics are different in one crucial way. Information was abundant and generic. Your blog post was competing with a million other blog posts. Knowledge is scarce and specific. The metallurgist’s alloy expertise is not competing with a million other metallurgists. It might be competing with ten. Or three. Or nobody. Scarcity creates pricing power. Pricing power creates the economic basis for individual participation.
A surgeon with twenty years of diagnostic expertise in a specific subspecialty. A farmer with multi-generational knowledge of drought-resistant crop varieties. A logistics manager who has optimized cold-chain distribution across three continents. A linguist who speaks one of the last fluent dialects of a dying language. Each of these people holds knowledge that is genuinely scarce, genuinely valuable, and — in the Knowledge Economy — genuinely monetizable.
The question is access. Can these individuals get their knowledge structured, hosted, and connected to the agents that need it? If the answer is yes — if the infrastructure is accessible, if the tools are affordable, if the marketplace platforms distribute value fairly — then the Knowledge Economy becomes something remarkable: an economy that compensates people for what they know. Not for their time. Not for their labor. For their accumulated understanding of how some specific part of the world works.
That’s not just an economic proposition. It’s a philosophical one. It says: your experience matters. What you’ve learned by doing — by failing, adjusting, succeeding — has value that persists beyond your employment, your retirement, your lifetime. It says: human knowledge is not a free input to be extracted and discarded. It’s an asset to be valued and compensated.
If that sounds like what I described in the Prologue — the Star Trek future where humans contribute what they’re uniquely good at — it should. Because this is the mechanism. Not a utopian aspiration. A market structure. One that happens to align human dignity with economic incentive. Which, in my experience, is the only kind of alignment that actually works at scale.
I wouldn’t respect your intelligence if I didn’t flag the hard problems. The knowledge economy raises ethical questions that don’t have easy answers, and pretending otherwise would be dishonest.
Who owns institutional knowledge? When an expert develops knowledge while employed at an organization, who owns it? The individual whose brain generated it? The organization whose resources enabled it? Current employment law mostly says the organization. But that framework was built for patents and trade secrets, not for the kind of tacit, experiential knowledge we're discussing. A new legal framework is needed -- and it will be contested, expensive, and slow to develop.
What about knowledge that shouldn't be monetized? Medical knowledge. Safety-critical engineering knowledge. Knowledge about environmental hazards. There are domains where monetizing knowledge -- putting it behind a paywall, making it available only to those who can afford it -- creates outcomes that are ethically indefensible. The knowledge economy needs mechanisms to ensure that knowledge with public safety implications remains accessible. Open knowledge commons, mandatory disclosure requirements, or public-interest exceptions to knowledge ownership -- the tools exist. The political will to implement them is the variable.
What about knowledge theft? If knowledge is an asset, it can be stolen. An AI system that's smart enough to extract knowledge from a marketplace without paying for it. A competitor that reverse-engineers your structured knowledge from your AI system's outputs. A disgruntled employee who copies the knowledge base on their way out. Knowledge security is an emerging field with emerging challenges, and the solutions are not yet mature.
What about the knowledge that resists monetization? Indigenous knowledge. Traditional ecological knowledge. Cultural knowledge that has spiritual or communal significance. The knowledge economy's tendency to price everything is in tension with knowledge traditions that view knowledge as communal, sacred, or inherently non-commercial. Navigating this tension -- respecting cultural values while enabling economic participation for those who want it -- requires sensitivity that markets don't naturally provide.
I don’t have clean answers to any of these. I don’t think anyone does yet. But I think the right approach is to build the marketplace with these questions built into its architecture — not as problems to be solved later, but as design constraints from the beginning. Markets that ignore ethical constraints don’t avoid ethical problems. They just discover them at scale, when they’re much harder to fix.
The Choice We’re Making
I started this chapter by saying the knowledge economy will either distribute or concentrate. I’ve described both scenarios. I’ve flagged the hard problems. And I’ve been honest about the fact that the default — without deliberate intervention — tends toward concentration.
So what determines the outcome?
Not technology. The technology enables both scenarios equally. Not market forces alone — markets are tools, and tools produce whatever outcome they’re designed for. Not government regulation, though that helps at the margins.
What determines the outcome is the design decisions being made right now — by the people building the platforms, setting the standards, writing the contracts, and establishing the norms. The architecture of the knowledge marketplace is being designed today. And architecture, once established, is very hard to change. Ask anyone who’s tried to retrofit privacy into a system that was built without it.
If the architects prioritize extraction — maximum value capture for the platform, minimum distribution to knowledge holders — we get the concentration scenario. If they prioritize distribution — fair value sharing, individual sovereignty, open knowledge commons — we get the better version.
I know which version I’m building toward. But I’m one person. The outcome depends on whether enough other people — builders, investors, policymakers, and yes, the executives reading this book — make the same choice.
The Knowledge Economy is not just about competitive advantage. It’s about the kind of economy — and the kind of society — we want to live in.
I realize that’s a heavy sentence to end on for a business book. Good. It should be heavy. Because the decisions that seem like business decisions today are, in fact, civilizational ones. And civilizational decisions deserve more than a quarterly planning cycle.
Predictions are a fool’s game. I know this because I’ve made enough of them — some embarrassingly wrong, some uncomfortably right — to understand that specificity and confidence are inversely correlated with accuracy. The more precise the prediction, the more likely it is to be wrong in the details. The more vague, the more useless.
So let me try to thread the needle: specific enough to be actionable, honest enough about uncertainty to be credible. Here’s what I think happens between now and 2030 — not as prophecy, but as the most likely trajectory given the forces we’ve mapped in this book.
And then, because I respect your intelligence and my own credibility, I’ll spend the second half of this chapter trying as hard as I can to demolish my own argument.
2026–2027: The Foundation Phase
We’re in it. Right now. And the defining characteristic of this phase is that most organizations don’t know they’re in it.
The foundation phase is marked by a few things happening simultaneously. AI labs are hitting the training data wall — not catastrophically, not publicly, but in the quiet recalibration of roadmaps and the sudden strategic interest in “data partnerships” and “knowledge assets.” The licensing economy is operational but chaotic — deals are being struck, prices are being discovered, and nobody has a reliable framework for what knowledge is actually worth. Agentic AI is moving from demos to early production deployments, and the organizations deploying it are discovering — some of them for the first time — that agents without domain-specific knowledge are impressively useless at anything beyond generic tasks.
The organizations that will dominate the next decade are making three moves right now.
They’re running knowledge audits. Not the perfunctory kind — the honest kind. The kind where you actually map the gap between what your AI produces and what your best people know, and you put a dollar figure on the delta.
They’re starting knowledge capture programs. Not companywide, not boil-the-ocean — targeted at the highest-value gaps identified in the audit. The first fifty knowledge elements, captured from the top five experts, structured for agent consumption. Proof of concept. Measurable results. Ammunition for the budget conversation.
And they’re thinking about knowledge as an asset class. Not just an operational input — a strategic asset with balance sheet implications. This is the hardest mental shift, because accounting standards don’t yet have a framework for valuing proprietary knowledge. But the organizations that start treating knowledge as an asset — protecting it, investing in it, measuring its contribution to outcomes — will be the ones that have something worth measuring when the standards catch up.
The foundation phase ends when knowledge capture stops being an innovation project and starts being an operational function. For the earliest movers, that transition happens by late 2027.
2027–2028: The Acceleration Phase
This is where things get interesting — and where the gap between early movers and everyone else becomes visible.
By 2027, the first generation of knowledge-augmented AI systems will have enough operational history to produce hard data. Not projections, not pilot results — full-scale production data showing the performance difference between AI with proprietary knowledge and AI without it. And those numbers will be compelling enough to trigger a rush.
I expect to see several things in this phase.
Knowledge engineering becomes a recognized profession. The people who can extract tacit knowledge from experts and structure it for AI consumption will be among the most sought-after hires in the market. Universities won’t have caught up yet — the curriculum doesn’t exist — so the early knowledge engineers will come from adjacent fields: information science, UX research, journalism, consulting. Anyone who knows how to ask the right questions and structure the answers.
Knowledge marketplace infrastructure matures. The early, fragmented marketplace experiments of 2026 consolidate into a few serious platforms with real transaction volume. Standardized knowledge formats emerge — probably driven by a combination of industry consortia and de facto standards set by the platforms with the most traction. Pricing models stabilize. The first knowledge-specific financial instruments appear — knowledge futures, knowledge insurance, knowledge-backed securities. That last one sounds absurd until you remember that twenty years ago, “data-backed valuations” sounded absurd too.
The first knowledge acquisitions happen. Large organizations begin acquiring smaller companies not for their products, revenue, or team — but explicitly for their institutional knowledge bases. A manufacturing conglomerate buys a family-owned specialty firm because that firm’s forty years of accumulated process knowledge, now structured and agent-callable, is worth more than the revenue the firm generates. This will be reported as “acqui-hires” or “strategic acquisitions.”
What it actually is: knowledge is becoming a balance sheet item, and the M&A market is pricing it in.
The knowledge divide becomes empirically measurable. Industry analysts begin publishing benchmarks comparing AI system performance across organizations, and the correlation between knowledge base quality and AI performance becomes impossible to ignore. The McKinseys and Gartners of the world publish reports with titles like “The Knowledge Advantage: Why AI Performance Diverges” — and suddenly, every board in every industry is asking their CTO: “What’s our knowledge strategy?”
If you haven’t started by this point, you’re not too late — but you’re late enough that catching up requires significantly more investment than starting would have cost two years earlier. The compounding advantage I described in Chapter 5 is now visible in the data, and the data is not encouraging for latecomers.
2028–2030: The Maturity Phase
By 2028, the knowledge economy stops being a thesis and starts being a reality that shows up in financial statements.
Knowledge appears on balance sheets. Not as goodwill, not as “intangible assets” in the vague accounting sense — as a specifically valued, specifically auditable asset class. Accounting standards bodies will have spent two years debating this, and by 2028 or 2029, the first frameworks for knowledge asset valuation will be in use. The valuation methodology will be imperfect — how do you value knowledge that hasn’t been used yet? — but the principle will be established: structured, validated, AI-consumable knowledge is an asset with measurable economic value.
Knowledge-as-a-Service becomes a revenue line. Organizations that invested early in knowledge capture find themselves sitting on knowledge bases that generate recurring revenue — not as a side effect, but as a deliberate business line. The materials engineering firm from Chapter 7 isn’t just a consulting company anymore — it’s a knowledge company that also does consulting. The revenue split starts shifting. By 2030, I expect knowledge services to represent 10-20% of revenue for the most advanced knowledge-rich organizations.
The agent economy reaches scale. Gartner’s prediction of 40% agent integration by end of 2026 will have proven either optimistic or conservative — I’d bet on roughly accurate, within a year or two — and by 2029, agent-mediated interactions will be the primary channel for a significant portion of enterprise operations. Every one of those agents is a knowledge consumer. The demand side of the knowledge economy is no longer speculative. It’s infrastructure.
The regulatory framework solidifies. By 2028, the legal questions that were contentious in 2025 — Does training on copyrighted data require licensing? Who owns institutionally generated knowledge? What are the disclosure requirements for AI-consumed knowledge? — will have answers. Not perfect answers. Not globally harmonized answers. But answers that are stable enough for organizations to plan around. The era of ambiguity closes. The era of compliance begins.
The knowledge divide is structural. The gap between knowledge-rich and knowledge-poor organizations is now too wide to close through catch-up investment alone. The early movers have three to four years of compounded knowledge, three to four years of flywheel momentum, three to four years of marketplace revenue that funded further knowledge capture. The latecomers can buy marketplace access, but they can’t buy the proprietary, internally generated knowledge that gives the leaders their edge. That knowledge was built over time, through the specific experiences of specific organizations. It is, by this point, the defining competitive asset.
The Scenarios: Best, Base, and Worst
Let me consolidate the timeline into three scenarios, because strategic planning requires thinking in ranges, not points.
Best case. The knowledge economy develops with strong distribution mechanisms. Individual knowledge sovereignty is established early. Open knowledge commons are funded and maintained. Marketplace platforms distribute value fairly. The knowledge divide is real but manageable -- like the digital divide, it exists but doesn't prevent meaningful participation. Human knowledge is valued, compensated, and preserved at scale. The golden age of the internet is gone, but what replaces it is something more sustainable: a knowledge economy that actually pays for what it consumes.
Base case. The knowledge economy develops with moderate concentration. A handful of large platforms dominate marketplace infrastructure. Early movers build significant advantages. Individual participation is possible but requires navigating complex infrastructure. Some knowledge commons exist but are underfunded. The divide is real and significant -- knowledge-rich organizations outperform knowledge-poor ones by a measurable margin -- but not catastrophic. Most organizations adapt, eventually, at higher cost than they would have paid if they'd started earlier.
Worst case. The knowledge economy develops with extreme concentration. Three to five organizations control the critical knowledge infrastructure. Knowledge holders -- individuals and small organizations -- have no viable alternative to accepting whatever terms the platforms offer. Open knowledge commons are neglected or deliberately undermined. The knowledge divide maps onto existing inequality lines and amplifies them. AI systems serve premium knowledge to premium clients and generic knowledge to everyone else. The promise of the knowledge economy becomes, in practice, another mechanism for the concentration of economic power.
I think the base case is the most likely outcome. The best case requires deliberate, coordinated intervention that I’m not confident the relevant institutions will provide. The worst case requires a degree of market failure that regulators — chastened by the platform economy experience — are more likely to prevent this time around. But “most likely” is not “certain,” and the distance between the base case and the worst case is determined by decisions being made in the next two to three years.
Steel-Manning the Objections
Now — the part where I try to break my own argument. Because any thesis that can’t survive its strongest objections isn’t worth publishing.
I’ve collected these objections from conversations with people I respect — investors, AI researchers, enterprise executives, and a few professional skeptics whose job is to poke holes in arguments like mine. Here they are, stated as strongly as I can make them, followed by my honest response.
Objection 1: "Synthetic data will solve the depletion problem." The strongest version of this argument: AI labs are getting increasingly good at generating synthetic training data. Techniques like self-play, constitutional AI, and RLHF allow models to improve without new human-generated data. The depletion of public data is real but irrelevant because the labs will route around it. Knowledge scarcity is a temporary bottleneck, not a structural constraint.
My response: Partly true, and I acknowledged this in Chapter 2. Synthetic data — carefully curated, combined with human data, verified through empirical pipelines — can improve model performance. The research on this is real. But synthetic data has a fundamental limitation that no amount of curation overcomes: it can only recombine existing knowledge. It cannot generate new knowledge from new experience. An AI trained on synthetic data about turbine blade metallurgy does not learn anything that wasn’t already in the human-generated data it was synthesized from. It might learn it more efficiently, more robustly, more consistently — but the ceiling is set by the human knowledge in the system.
The ICLR 2025 finding — that even tiny fractions of synthetic data can trigger collapse dynamics — is the empirical nail. Not a coffin nail, but a real constraint that limits how far synthetic data alone can take you. My thesis doesn’t require that synthetic data fails completely. It requires that proprietary human knowledge remains the highest-value input. And that, I believe, is robust even in the most optimistic synthetic data scenarios.
Objection 2: "AGI will make knowledge capture irrelevant." The strongest version: Once we have artificial general intelligence — systems that can genuinely reason, learn from experience, and generate new knowledge independently — the entire premise of your book collapses. Why capture human knowledge when the machine can generate its own? You’re writing a book about horses in 1905.
My response: Maybe. If AGI arrives in the next five years — and I mean genuine AGI, not the marketing version — then yes, much of this book becomes a historical artifact. But I’d make two points.
First, timeline. The most credible AGI researchers I know — the ones doing the actual work, not the ones on podcasts — give a wide range of estimates. Some say 2030. Some say 2040. Some say never, at least not through current approaches. If it’s 2030, the organizations that spent 2026–2030 capturing knowledge have a four-year competitive advantage and a structured knowledge base that’s valuable to AGI systems and to current systems. If it’s 2040, they have a fourteen-year advantage. If it’s never, they have a permanent one. Knowledge capture is the hedge that pays off in every scenario.
Second, even AGI needs knowledge. An AGI system arriving in 2030 doesn’t arrive with knowledge about your specific organization, your specific processes, your specific failure modes. It arrives with general intelligence and needs to be fed domain-specific knowledge to be useful in domain-specific contexts. The organizations that have that knowledge structured and ready will onboard AGI faster and more effectively than those scrambling to capture it after the fact.
I’m not betting against AGI. I’m betting that the organizations best positioned for AGI are the same ones best positioned for the pre-AGI world: the ones that treated their knowledge as an asset and invested accordingly.
Objection 3: "Approximate is good enough." The strongest version: For most business applications, you don’t need the 42 Nm precision. The 45 Nm answer from the generic AI is close enough. The marginal value of proprietary knowledge doesn’t justify the cost of capturing it. Your thesis overvalues precision in a world where “good enough” is, in fact, good enough.
My response: For some applications, this is correct. If you’re generating marketing copy, the generic AI is fine. If you’re summarizing meeting notes, the generic AI is fine. For any task where “roughly right” produces acceptable outcomes, there’s no compelling case for knowledge investment.
But here’s the thing: the applications where “approximately right” is acceptable are exactly the applications that are being commoditized fastest. If anyone’s AI can do it at 80% quality, it’s not a competitive advantage — it’s table stakes. The value lives in the 20% of applications where precision matters, where context matters, where the difference between the generic answer and the right answer has measurable economic consequences. The $200,000 production line shutdown. The misdiagnosed patient. The failed regulatory audit. The lost client relationship.
As AI becomes more pervasive, the commodity tier gets bigger and the premium tier gets more valuable. The question isn’t whether “good enough” exists — it does. The question is whether you want your organization to compete in the tier where everyone’s AI produces the same output, or in the tier where yours produces something better.
Objection 4: "This is just 'data is the new oil' repackaged." The strongest version: You’ve rebranded a tired concept. Replace “data” with “knowledge” and you’re telling the same story that every tech evangelist has been telling for a decade. The hype cycle will play out the same way: overinvestment, disappointment, correction.
My response: I spent a full section in Chapter 1 explaining why “data is the new oil” was wrong — because it confused the raw material with the refined product. The knowledge economy thesis is not a rebrand. It’s a correction.
Data is abundant, commoditized, and decreasing in marginal value. Knowledge is scarce, differentiated, and increasing in value. Data is machine-generated at scale. Knowledge is human-generated through experience. Data can be copied freely. Knowledge must be captured through deliberate, skill-intensive processes. The economics are fundamentally different.
But I take the objection seriously in one respect: the hype risk is real. The knowledge economy will be overhyped. Vendors will rebrand their document management platforms as “knowledge platforms.” Consultants will sell “knowledge transformation” programs that are just data governance with a new title. And some organizations will overspend on capture initiatives that don’t produce returns, because they captured the wrong knowledge, or captured it poorly, or failed to structure it for AI consumption.
The hype cycle doesn’t invalidate the thesis. It just means the thesis will be surrounded by noise, and the organizations that distinguish signal from noise — that invest in genuine knowledge engineering rather than rebranded information management — will be the ones that capture the real value.
Objection 5: "Privacy regulation will kill this." The strongest version: The GDPR, the AI Act, and their global equivalents will make knowledge capture and marketplace trading so legally fraught that the transaction costs will exceed the value. Every piece of captured knowledge potentially contains personal data, proprietary information, or regulated content. The compliance burden will strangle the knowledge economy in its cradle.
My response: This is the objection I take most seriously, because it’s the one with the most empirical support. Privacy regulation is real, it’s expanding, and it creates genuine friction.
But regulation didn’t kill e-commerce, despite the GDPR requiring consent for every cookie. It didn’t kill cloud computing, despite data sovereignty requirements. It didn’t kill the platform economy, despite antitrust scrutiny. Regulation shapes markets — it doesn’t eliminate them. The knowledge economy will be shaped by privacy regulation. It will be slower, more expensive, and more governance-heavy than it would be in a regulation-free environment. But it will exist, because the economic value of structured knowledge is too large for regulation to suppress entirely.
The organizations that treat regulation as a design constraint — building privacy into their knowledge capture processes from the beginning, anonymizing where necessary, obtaining consent where required, maintaining audit trails — will navigate it. The organizations that treat regulation as an afterthought will struggle. That’s not a failure of the thesis. That’s a competence filter.
The Timeline for Action
I’ll end this chapter where a strategist should: with a decision framework.
If you’re reading this in 2026 — and I’m writing it in 2026, so the timing works — you have roughly eighteen to twenty-four months before the acceleration phase makes early mover advantages visible and hard to match.
That’s enough time to run a knowledge audit. Enough time to capture the first hundred high-value knowledge elements from your top experts. Enough time to build a proof of concept that demonstrates measurable AI performance improvement. Enough time to establish knowledge engineering as an operational function rather than an innovation experiment.
It is not enough time to wait for someone else to go first, study what they did, and then follow. By the time the case studies are published and the best practices are codified, the first movers will have two years of compounding advantage. You’ll be reading about their success in a Gartner report while they’re already generating marketplace revenue from the knowledge base you haven’t started building.
The question isn’t whether the knowledge economy is coming. It’s whether you’ll be ready when it arrives.
And “ready” starts with a decision, not a strategy deck.
Epilogue: What Is a Life’s Work?
I started writing this book to make a business case. Proprietary knowledge as competitive advantage. The end of the Information Age. The economics of knowledge capture. Marketplace mechanics. The whole thing had a structure, a thesis, a target audience — C-level executives who need to understand why their AI strategy is missing the most important input.
That case is made. If you’ve read this far, you have the argument, the evidence, the method, and the timeline. You can close this book, walk into your next leadership meeting, and articulate why knowledge engineering is the strategic priority of the next decade. I hope you do.
But somewhere between the outline and the final chapter, something shifted. Not in the argument — the argument held up, got stronger, got validated by everything I researched. What shifted was what I realized the argument implies when you follow it past the boardroom and into the world.
And I can’t close this book without talking about it. Not because it will help you capture knowledge or build marketplace revenue. Because it might help you think about something that I believe — with a conviction I don’t usually allow myself — is the most important question of our time.
The question is this: In a future where AI can do almost everything better than a human, what defines the value of a human life’s work?
I don’t hear this question being asked. Not in the places where it should be asked loudest. I scroll through the Leitmedien — the major outlets, the opinion pages, the policy debates — and I see articles about AI taking jobs in customer service. About copyright disputes with AI-generated art. About whether ChatGPT should be allowed in schools. These are real issues. They are also, in the grand scheme of what’s happening, spectacularly beside the point.
They’re beside the point because they treat AI as a disruption within the existing system — a new tool that shifts some jobs, creates some legal questions, requires some policy adjustments. What they don’t address — what almost nobody is addressing, except in scattered Reddit threads and a few academic papers that nobody reads — is that AI doesn’t disrupt the system. It challenges the premise the system is built on.
The premise is this: human beings create economic value through their labor, their skills, and their knowledge. That value is compensated through wages. Wages enable participation in the economy. Participation in the economy is, in modern societies, the primary mechanism through which people access housing, food, healthcare, education, dignity, and social standing. Work is not just an economic activity. It’s the organizing principle of adult life. It’s how most people answer the question “what do you do?” — which, in most cultures, is really the question “who are you?”
AI is about to make that premise obsolete. Not for everyone. Not overnight. But structurally, irreversibly, and much faster than the public conversation acknowledges.
Let me walk through this without flinching, because I think flinching is what everyone’s been doing.
The easy version of the argument is: AI will automate routine tasks, humans will move to higher-value work. We’ve heard this before. It’s the standard reassurance, and it has a comforting historical precedent — the Industrial Revolution automated manual labor, and humans moved to cognitive labor. The machines took the muscles. We kept the minds.
But this time, the machines are taking the minds.
Not all of them. Not yet. But the trajectory is clear. Today, AI handles information processing. Tomorrow — and tomorrow is closer than most people’s planning horizons — it handles analysis, synthesis, planning, diagnosis, design, and decision-making. The cognitive tasks that were supposed to be the safe harbor for human workers are being automated at a pace that leaves almost no time for the “transition to higher-value work” that the optimists keep promising.
So what’s left?
The usual answers: creativity, empathy, physical craftsmanship, ethical judgment, human connection. And yes — these are real. AI is not going to replace a master carpenter’s feel for wood grain. It’s not going to replace a therapist’s ability to sit with someone in their darkest moment. It’s not going to replace the human need for other humans — for touch, for presence, for the specific comfort of being understood by someone who has also suffered.
But here’s the question nobody asks, and it’s the one that keeps me staring at the ceiling: what percentage of the global population has the aptitude, the training, and the opportunity to make a living from creativity, empathy, craftsmanship, or philosophy?
In the developed world — maybe 20%. Maybe 30% if we’re generous and if the education systems adapt fast enough, which they won’t, because they never do. That leaves 70 to 80 percent of the population in wealthy nations facing a future where their current skills are automated and the “higher-value” alternatives require aptitudes or training that aren’t universally available.
And we’re only talking about the developed world. Think about the developing economies. Think about India, where the IT outsourcing industry employs millions of people doing exactly the kind of cognitive work that AI is learning to do. Think about the Philippines, where call centers are a major economic pillar. Think about Bangladesh, where the garment industry — already under pressure from automation — is the backbone of the economy. Think about Sub-Saharan Africa, where the demographic dividend that economists have been promising depends on young populations finding productive employment in an economy that still needs human labor.
These are not countries of philosophers. They are countries of people who need jobs — real jobs, that pay real wages, that support real families. And the jobs they have, or the jobs they were counting on, are being automated at a speed that no retraining program, no policy intervention, no amount of “learn to code” optimism can match.
Will they become nations of artisans? Poets? Empathy workers? With what infrastructure? With what educational investment? With what cultural framework that says “your value as a human being is not defined by your economic productivity” — a framework that, by the way, doesn’t exist in any functioning economy on earth, including the ones that claim to believe it?
I don’t have an answer. I want to be honest about that. I’ve spent two decades in AI, I’ve thought about this more than most, and I don’t have an answer. What I have are questions that I think need to be asked by people with more power than I have — heads of state, institutional leaders, the people who write policy and allocate resources at scale.
Here’s what I do know, and it connects back to the thesis of this book in a way I didn’t fully appreciate when I started writing it.
Knowledge — human knowledge, the kind I’ve spent nine chapters arguing is the most valuable resource in the AI economy — is also the thing that might save us. Not in the business sense. In the civilizational sense.
If the Knowledge Economy develops along the lines I’ve described — if human knowledge is valued, compensated, preserved, and treated as a genuine asset — then we have, at least, a partial answer to the question of human value in an AI world. The answer would be: humans are the generators of knowledge. Not the processors of information — AI does that better. Not the executors of routine tasks — AI does that cheaper. But the originators of the experiential, contextual, judgment-laden understanding that comes from actually living in the world, making mistakes, learning, and building wisdom.
The surgeon who develops a new intuition for a rare condition by seeing her thousandth patient. The farmer who notices a pattern in soil behavior that no dataset contains because no sensor was measuring it. The social worker who understands something about a community’s dynamics that can’t be learned from data because it lives in relationships and trust. The musician who creates something that doesn’t exist yet and couldn’t exist without a human having felt something specific and choosing to express it.
That’s knowledge generation. And it’s the one thing that AI — even AGI, even the most optimistic projection of what artificial intelligence becomes — cannot do from first principles. Because it requires being a body in the world. It requires living. It requires the specific, messy, embodied, mortal experience of being human.
If we build an economy that values this — that compensates knowledge generation the way we currently compensate labor — then humans have a role that is not just dignified but essential. Not because we’re performing tasks that machines can’t do yet. Because we’re doing something that machines can’t do in principle: generating new knowledge from new experience.
That’s a big “if.” And it requires economic structures, policy frameworks, and social contracts that don’t exist yet. But it’s not impossible. It’s a design problem. And I’ve spent my career solving design problems.
I wrote in the Prologue that I grew up watching Star Trek. That the future I want is one where machines handle the work that humans only did because someone had to, and humans are free to do the things that make us human.
I still want that future. But I’m less certain now than I was when I started this book that we’ll get it by default. The technology enables both the utopia and the dystopia. The market, left alone, trends toward concentration. The political institutions that should be having this conversation are still debating whether AI-generated images violate copyright — which matters, but which is approximately as relevant to the real question as debating deck chair arrangements on a ship that’s changing course.
The real question is not “will AI take my job?” The real question is “what is a human life worth when machines can do most of what humans were paid to do?”
The answer to that question will be determined not by technology, but by the choices we make about how to use it. By the economic structures we build. By the policies we enact. By the values we decide to encode into the systems we’re creating right now.
This book has been, on its surface, about knowledge capture and competitive advantage. Underneath, it’s been about something larger: the argument that human knowledge — experiential, contextual, judgment-laden, irreducibly human — is the most valuable thing we have. Not just economically. Existentially.
If that sounds like too much for a business book — good. It should be too much for a business book. Because the questions we’re facing are too big for any single genre, any single discipline, any single framework to contain.
I started with a broken Google search. I’m ending with a question about the meaning of human contribution in a world that’s learning to do without us.
I don’t have the answer. Nobody does. But I know that the people reading this book — executives, strategists, builders, people with resources and reach and the ability to shape what comes next — are exactly the people who should be thinking about it.
Not next quarter. Now.
One last thing. A practical one, because I’m constitutionally incapable of ending entirely in the abstract.
If you take nothing else from this book, take this: go talk to the smartest person in your organization. The one everyone calls when something goes wrong. The one who’s been there twenty years and understands things nobody else understands. Ask them what they know that isn’t written down anywhere. Listen. Write it down. Structure it. Feed it to your AI.
That’s the Monday morning action. That’s where it starts. One conversation. One piece of knowledge captured. One step toward a system that values what humans know.
Everything else in this book — the market dynamics, the regulatory framework, the competitive strategy, the civilizational implications — grows from that one act: asking a human being what they know, and treating the answer as something worth preserving.
Because it is. It always was. We just forgot, for a while, in the noise.
Appendix A: The Monday Morning Knowledge Audit
This appendix distills the methodology from Chapter 6 into an executable diagnostic. It’s designed to be completed in five working days by a small team — ideally a senior operations leader, someone from IT or data, and one person who knows where the actual expertise lives in your organization. No consultants required. No software purchase necessary. Just structured observation and honest answers.
Day 1: The Knowledge Gap Map
Objective: Identify the ten most consequential decisions your organization makes regularly, and determine how much of the knowledge behind those decisions is captured versus resident in people’s heads.
For each decision, document:
Decision description -- What is being decided, how often, and by whom?
AI capability today -- If you ran this decision through your best current AI system, what would the output quality be? Rate honestly: Unusable / Directionally Correct / Good Enough / Expert-Level.
Expert judgment delta -- What does the human expert add that the AI cannot? Be specific. "Experience" is not an answer. "Knowing that the tolerance spec on Station 7 was changed in 2019 after a batch failure and the documentation was never updated" -- that's an answer.
Cost of the gap -- What does it cost when this decision is made without the expert? Rework, delays, quality failures, customer impact, regulatory risk. Estimate in currency, not adjectives.
Capture difficulty -- How hard would it be to extract and structure this knowledge? Easy (the expert can articulate it), Medium (requires structured interviews and observation), Hard (deeply embodied, requires apprenticeship-style extraction).
Rank the ten decisions by cost-of-gap multiplied by inverse capture difficulty. The top three are your starting points.
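To make the ranking mechanical, here is a minimal sketch in Python. The example decisions and the numeric difficulty weights are illustrative assumptions, not part of the method; all the method requires is that higher capture difficulty divides the score.

```python
# A minimal sketch of the Day 1 ranking. The difficulty weights
# below are illustrative assumptions, not prescriptions.
from dataclasses import dataclass

DIFFICULTY_FACTOR = {"Easy": 1.0, "Medium": 2.0, "Hard": 4.0}

@dataclass
class Decision:
    description: str
    cost_of_gap: float        # estimated cost when made without the expert
    capture_difficulty: str   # "Easy" | "Medium" | "Hard"

    @property
    def priority_score(self) -> float:
        # cost-of-gap multiplied by inverse capture difficulty
        return self.cost_of_gap / DIFFICULTY_FACTOR[self.capture_difficulty]

decisions = [
    Decision("Production line changeover settings", 200_000, "Medium"),
    Decision("Supplier qualification exceptions", 80_000, "Easy"),
    Decision("Root-cause analysis for batch failures", 500_000, "Hard"),
]

for d in sorted(decisions, key=lambda d: d.priority_score, reverse=True):
    print(f"{d.priority_score:>10,.0f}  {d.description}")
```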
Day 2: The Knowledge Holder Inventory
Objective: Identify the people in your organization whose departure would create irreplaceable knowledge gaps.
Three tests for identifying critical knowledge holders:
The Phone Test. When something goes wrong at 2 AM, who gets called? Not the person in the org chart. The person who actually gets the call. Write down every name that comes up across your top-ten decisions from Day 1.
The Exception Test. When the standard process doesn't work -- the edge case, the unusual configuration, the customer with the legacy system nobody else understands -- who handles it? These people are your knowledge holders.
The Retirement Test. For each name from the first two tests: if this person retired in six months, what would you lose? If the answer takes more than one sentence, they're critical.
For each identified knowledge holder, document: name, domain, estimated years to replacement (how long would it take to train someone to their level), and single point of failure status (are they the only person with this knowledge? Yes/No).
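The inventory needs no special tooling, but a consistent record format helps. A sketch, with assumed field names and invented people:

```python
# Illustrative Day 2 record format; field names and people are invented.
from dataclasses import dataclass

@dataclass
class KnowledgeHolder:
    name: str
    domain: str
    years_to_replacement: float    # time to train a successor to this level
    single_point_of_failure: bool  # the only person with this knowledge?

holders = [
    KnowledgeHolder("R. Weber", "Station 7 process tuning", 8.0, True),
    KnowledgeHolder("M. Okafor", "Legacy customer integrations", 3.0, False),
]

# A sensible capture order: single points of failure first,
# then by how long a replacement would take to train.
for h in sorted(holders, key=lambda h: (not h.single_point_of_failure,
                                        -h.years_to_replacement)):
    print(f"{h.name}: {h.domain} ({h.years_to_replacement:.0f} yrs to replace)")
```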
Day 3: The Knowledge Taxonomy
Objective: Classify the knowledge you’ve identified into types, because different types require different capture methods.
Five knowledge types and their capture approaches:
Procedural -- How to do something. Steps, sequences, techniques. Capture method: structured interviews, process observation, video documentation. Difficulty: Low to Medium.
Diagnostic -- How to identify what's wrong. Pattern recognition, elimination logic, experienced intuition about failure modes. Capture method: case-based interviews ("walk me through the last five times you diagnosed X"), apprenticeship observation. Difficulty: Medium to High.
Predictive -- How to anticipate what will happen. Market intuition, material behavior under stress, customer patterns. Capture method: scenario-based interviews ("what would you expect if..."), decision journaling over time. Difficulty: High.
Relational -- Who to call, how to navigate organizations, where the real decision-making happens. Capture method: relationship mapping, contextual interviews about stakeholder dynamics. Difficulty: High, and politically sensitive.
Contextual -- Why things are the way they are. Historical decisions, failed experiments, the reasons behind seemingly arbitrary processes. Capture method: "why" interviews, institutional archaeology. Difficulty: Medium, but often the highest-value capture because it prevents repeating expensive mistakes.
Map each knowledge gap from Day 1 to these types. You’ll likely find that the highest-cost gaps cluster in diagnostic and predictive knowledge — which is exactly the knowledge that’s hardest to capture and most valuable to AI agents.
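If it helps to keep that mapping sortable and filterable, here is a toy encoding in Python; the gap-to-type assignments below are illustrative, not prescriptive.

```python
# The five types and their capture approaches, encoded as data.
CAPTURE_APPROACH = {
    "procedural": ("interviews, observation, video", "Low-Medium"),
    "diagnostic": ("case-based interviews, apprenticeship", "Medium-High"),
    "predictive": ("scenario interviews, decision journaling", "High"),
    "relational": ("relationship mapping, contextual interviews", "High"),
    "contextual": ("'why' interviews, institutional archaeology", "Medium"),
}

# Map each Day 1 gap to one or more types (assignments are illustrative).
gap_types = {
    "Root-cause analysis for batch failures": ["diagnostic", "contextual"],
    "Production line changeover settings": ["procedural", "contextual"],
}

for gap, types in gap_types.items():
    print(gap)
    for t in types:
        method, difficulty = CAPTURE_APPROACH[t]
        print(f"  {t}: {method} [{difficulty}]")
```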
Day 4: The Readiness Assessment
Objective: Determine your organization’s actual capacity to begin knowledge capture.
Answer each question honestly. “Aspirationally yes” counts as no.
Infrastructure readiness: Do you have a system capable of storing structured knowledge beyond documents and databases -- meaning graph structures, semantic relationships, confidence levels, provenance metadata? If no, that's your first technical investment.
Talent readiness: Do you have anyone on staff who has conducted knowledge engineering interviews -- meaning structured, multi-pass extraction sessions designed to surface tacit knowledge? If no, you need to either train someone or hire for this. It's a specific skill, closer to investigative journalism than to data science.
Cultural readiness: When you tell your top knowledge holders "we want to capture what you know," will they respond with enthusiasm, suspicion, or indifference? Each response requires a different strategy. Enthusiasm: move fast. Suspicion: address incentives first (see the politics section in Chapter 6). Indifference: demonstrate value through quick wins.
Executive readiness: Is there a named executive who owns knowledge capture as a strategic priority -- meaning it's in their objectives, with budget, not a side project? If no, nothing else matters until this is fixed.
AI readiness: Do your current AI systems have the ability to consume structured knowledge at inference time -- meaning RAG, function calling, knowledge graph queries, or equivalent? If no, your capture efforts will produce assets that sit unused. Align the capture timeline with your AI architecture roadmap.
Day 5: The 90-Day Plan
Objective: Convert the diagnostic into action.
Based on the previous four days, define:
Pilot scope. Select one knowledge domain from your Day 1 top three -- ideally one with a willing knowledge holder (Day 2), primarily procedural or contextual knowledge (Day 3), and reasonable infrastructure readiness (Day 4). This is your proof of concept.
Team. Assign a knowledge engineer (even if they're learning on the job), a technical lead who can build the storage and retrieval infrastructure, and the knowledge holder(s). Keep it small. Three to five people.
Timeline. 90 days to first usable output. Not a report. Not a strategy document. A structured knowledge base that an AI agent can query, with measurable improvement in decision quality for the pilot domain.
Success metric. Define this before you start. "AI system performance on pilot domain decisions improves from [current rating] to [target rating] as measured by [specific evaluation method]." If you can't define the metric, you don't understand the problem well enough yet.
Budget. Honest estimate of person-hours, tool costs, and opportunity cost of the knowledge holders' time. For most organizations, a single-domain pilot runs $50K-$150K in fully loaded costs over 90 days. That's less than a single bad decision in most of the domains you identified on Day 1.
The Five-Question Knowledge Audit
For ongoing assessment after the initial diagnostic, ask these five questions quarterly:
What did our AI get wrong this quarter that a human expert would have gotten right? Every such instance is an unmapped knowledge gap. Track them. They're your roadmap.
Which knowledge holders are closer to departure than they were last quarter? Retirement timelines, job satisfaction, competitive offers. Your capture priority should track your attrition risk.
What new knowledge was generated this quarter that isn't captured anywhere? New processes, new customer insights, new failure modes. Knowledge capture isn't a one-time project. It's an ongoing function.
How many of our captured knowledge assets were actually consumed by AI systems this quarter? If the answer is "we captured a lot but nobody's using it" -- you have a delivery problem, not a capture problem.
What's our knowledge capture velocity -- and is it accelerating? Measure in structured knowledge units per month (however you define them). If this number isn't growing, your flywheel isn't spinning.
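Velocity is the only one of the five questions that reduces cleanly to arithmetic. A minimal sketch, assuming you log one timestamp per captured knowledge unit (the dates below are invented):

```python
# Sketch: capture velocity from a log of capture dates.
from collections import Counter
from datetime import date

capture_log = [
    date(2026, 1, 14), date(2026, 2, 3), date(2026, 2, 21),
    date(2026, 3, 9), date(2026, 3, 17), date(2026, 3, 30),
]

per_month = Counter((d.year, d.month) for d in capture_log)
velocities = [per_month[m] for m in sorted(per_month)]
print("units/month:", velocities)  # e.g. [1, 2, 3]
# The flywheel test: is the trend up, month over month?
print("accelerating:", all(a <= b for a, b in zip(velocities, velocities[1:])))
```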
Appendix B: Thesis Validation Matrix
This book makes a series of empirical claims. Some are wellestablished, some are projections, and some are interpretive. I believe in showing my work. Below is every major factual claim in the book, its source, its verification status, and — where relevant — the conditions under which the claim might turn out to be wrong.
A thesis you can’t falsify isn’t a thesis. It’s a religion. I’d rather be wrong and know it than right by accident.
Data Depletion & Synthetic Data
Claim: Public text data for AI training will be exhausted between 2026 and 2032. Source: Villalobos, P. et al. "Will we run out of data? Limits of LLM scaling based on human-generated data." arXiv:2211.04325 (2024 update). Presented at ICML 2024, Vienna. Peer-reviewed. Status: Well-established. Epoch AI's original 2022 estimate was 2024 for high-quality data; the 2024 revision moved the front end to 2026-2028 based on improved understanding of data quality filtering and multi-epoch training. The range reflects genuine uncertainty in what counts as "usable" data. Falsification condition: New, large-scale sources of high-quality human-generated text are discovered or created. Possible but unlikely at the scale required.
Claim: AI models trained recursively on synthetic data undergo "model collapse" -- progressive loss of distributional tails. Source: Shumailov, I. et al. "AI models collapse when trained on recursively generated data." Nature 631, 755–759 (2024). Status: Well-established. Demonstrated across language models, vision models, and diffusion models. Replicated independently. Falsification condition: New training techniques that fully prevent tail erosion in recursive synthetic training without access to original human data. Some mitigation techniques exist (accumulation strategies, verifier-based selection), but none fully solve the problem at scale.
Claim: Even tiny fractions of synthetic data (as low as 1 in 1,000 samples) can trigger model collapse, and larger models may amplify rather than mitigate the effect. Source: Dohmatob, E. et al. "Strong Model Collapse." ICLR 2025 Spotlight paper. Published March 2025. Status: Established in supervised regression settings. The 1-in-1,000 finding is specific to scaling law dynamics -- it means that scaling up training data does not help if even a small constant fraction is synthetic. The amplification effect in larger models is demonstrated theoretically and empirically in random feature models and two-layer neural networks. Extrapolation to full-scale LLMs is directional, not proven. Falsification condition: Large-scale empirical studies showing that frontier LLMs are robust to small synthetic data fractions. Not yet published.
Regulatory & Legal
Claim: A U.S. federal court ruled in Thomson Reuters v. Ross Intelligence that using copyrighted content for AI training is not fair use when the use is commercial and directly competitive. Source: Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., No. 1:20-CV-613-SB (D. Del. Feb. 11, 2025). Judge Stephanos Bibas, Third Circuit, sitting by designation. Status: Established. Partial summary judgment for Thomson Reuters on direct infringement and fair use. Ross found to have infringed over 2,000 Westlaw headnotes. Fair use factors 1 and 4 favored Thomson Reuters. Currently on interlocutory appeal to the Third Circuit (No. 25-2153, accepted June 17, 2025). Important nuance: This case involves a non-generative AI search tool that directly competed with Westlaw. Courts have not yet ruled on fair use for generative AI training. The ruling's applicability to foundation model training is disputed. Falsification condition: Third Circuit reversal, or subsequent rulings in generative AI cases (e.g., NYT v. OpenAI) reaching opposite conclusions on fair use.
Claim: The U.S. Copyright Office published a report stating that arguments for AI training being "inherently transformative" are "mistaken." Source: U.S. Copyright Office, "Copyright and Artificial Intelligence, Part 2: Copyrighted Works in AI Training." 108-page report, released May 9, 2025. Status: Established. The report is nonbinding but influential. It rejects both the argument that AI training is inherently non-expressive and the analogy to human learning. It describes a spectrum: noncommercial research with no output reproduction is likely fair use; copying expressive works from pirated sources to generate competing content is likely not. Falsification condition: Congressional legislation preempting Copyright Office guidance, or courts explicitly rejecting the report's reasoning.
Claim: The EU AI Act requires disclosure of training data for general-purpose AI models. Source: Regulation (EU) 2024/1689 (EU AI Act). Entered into force August 1, 2024. GPAI transparency obligations applicable from August 2, 2025. Status: Established. Article 53 requires providers of GPAI models to "draw up and make publicly available a sufficiently detailed summary about the content used for training." Implementation details are being developed through codes of practice. Falsification condition: Significant weakening through codes of practice or enforcement failures. Possible.
Licensing Economy
Claim: News Corp signed a deal with OpenAI worth over $250 million across five years. Source: Wall Street Journal, May 22, 2024. Confirmed by multiple outlets. The exact terms are not public; the $250M figure is reported by WSJ, which is a News Corp publication. Status: Widely reported but not independently confirmed to the dollar. The "over $250 million" figure should be treated as approximate.
Claim: Reddit licenses its data for AI training at approximately $60 million (Google) and $70 million (OpenAI) annually. Source: Google deal: widely reported at $60M/year since Reddit's IPO filing (Feb. 2024). OpenAI deal: estimated at ~$70M/year based on Reddit's disclosure to Adweek that AI licensing accounts for ~10% of $1.3B annual revenue ($130M), minus confirmed Google payment. Search Engine Land, Feb. 2025. Status: Google figure well-established. OpenAI figure is derived, not confirmed. Combined total (~$130M) is confirmed by Reddit's own disclosures.
Claim: Cloudflare launched AI bot blocking by default and a Pay-Per-Crawl marketplace in July 2025. Source: Cloudflare official announcement, July 1, 2025. Extensively covered by TechCrunch, MIT Technology Review, CNBC, Nieman Lab. Status: Established.
Agent Economy & Market Projections
Claim: 40% of enterprise applications will integrate task-specific AI agents by end of 2026, up from less than 5% in 2025. Source: Gartner, Inc. Press release, August 26, 2025. Attributed to Anushree Verma, Sr Director Analyst. Status: Projection by a major analyst firm. Gartner's best-case projection extends to 30% of enterprise application software revenue from agentic AI by 2035 ($450B+). These are forecasts, not established facts. Gartner has a mixed track record on technology adoption timelines. Falsification condition: Enterprise agent adoption plateaus well below 40% through 2026 due to governance concerns, integration costs, or underwhelming ROI.
Knowledge & Cultural Claims
Claim: The 2010 Encyclopaedia Britannica was the last print edition -- 32 volumes, approximately 44 million words, after 244 years of print publication. Source: Encyclopaedia Britannica, Inc. announcement via Jorge Cauz (President), March 13, 2012. Historical facts confirmed by Britannica's own records. Status: Established. 32 volumes, 44 million words, 129 pounds, $1,395 retail price. 8,000 sets of the 2010 edition were sold.
Claim: A language dies roughly every two weeks. Source: Widely cited figure attributed to UNESCO, National Geographic, and David Crystal's Language Death (2000). Based on Michael Krauss's 1992 paper "The World's Languages in Crisis." Status: Disputed. The Catalogue of Endangered Languages (ELCat, University of Hawai'i / Eastern Michigan University, 2013) found the actual rate is approximately one language every three to four months -- about 3.5 extinctions per year over the last 40 years. The "every two weeks" figure appears to be a forward-looking calculation from Krauss's upper-bound estimate of 90% loss by 2100, not an observed rate. The actual observed rate remains deeply concerning, but the specific claim requires qualification.
Interpretive Claims
The following are this book’s core interpretive arguments — not factual claims but theses that depend on how you read the evidence:
Thesis: Proprietary human knowledge will become the primary competitive differentiator in the AI economy. Depends on: Data depletion being real (well-supported), synthetic data having fundamental limits (supported with caveats), agent adoption creating demand for callable knowledge (projected), and no AGI breakthrough making human knowledge irrelevant (unknowable). Strongest counter-argument: AGI arrives earlier than expected, rendering the knowledge capture investment unnecessary. See Chapter 9, Objection 2.
Thesis: A knowledge marketplace will emerge as a significant economic structure by 2028-2030. Depends on: Agent demand (projected), standardization of knowledge formats (early stage), trust infrastructure development (not yet built), and regulatory frameworks that enable rather than prevent knowledge trading (uncertain). Strongest counter-argument: Network effects consolidate around 2-3 dominant platforms that extract most of the value, and the "marketplace" looks more like an oligopoly than a bazaar. See Chapter 8.
Thesis: The knowledge divide will become a structural feature of the global economy comparable to the digital divide. Depends on: Knowledge capture being expensive and requiring institutional capacity (likely), early movers gaining compounding advantages (consistent with other technology adoption patterns), and no effective redistribution mechanism emerging (uncertain). Strongest counter-argument: Open-source knowledge capture tools and methodologies democratize access fast enough to prevent structural concentration. I'd give this a 20% probability.
A Note on Methodology
I’ve tried to be rigorous about distinguishing between what I know, what I project, and what I believe. The research community will have legitimate disagreements with some of my interpretations — particularly around the severity of synthetic data limitations and the timeline for knowledge marketplace emergence. I welcome those disagreements. This book is meant to start a conversation, not end one.
Where I’ve cited specific numbers, I’ve provided sources. Where I’ve made projections, I’ve tried to state the conditions under which I’d be wrong. Where I’ve offered opinions, I’ve labeled them as such.
If anything in this matrix is incorrect or has been superseded by more recent findings, I’d like to know about it. The knowledge economy runs on correction as much as on creation.
Appendix C: Glossary
Terms are defined as used in this book. Where my usage di- verges from common industry usage, I’ve noted the distinction.
Agent (AI Agent) — An AI system that can take autonomous actions to achieve a goal, rather than simply responding to prompts. Agents can chain multiple steps, use tools, access external data sources, and make intermediate decisions without human intervention. In the context of this book, agents are the primary consumers of structured knowledge — the demand side of the knowledge economy.
Agentic AI — The architectural paradigm in which AI systems operate as autonomous or semi-autonomous agents rather than reactive tools. Distinguished from conversational AI (chatbots), generative AI (content creation), and analytical AI (data processing) by the agent’s capacity for planning, tool use, and goal-directed behavior.
Club Good — An economic term for a good that is excludable (you can prevent people from accessing it) but non-rival (one person’s use doesn’t diminish another’s). This book argues that structured knowledge is a club good — I can restrict access to my knowledge base, but your querying it doesn’t consume it. This is economically distinct from both public goods (non-excludable, non-rival) and private goods (excludable, rival).
Contextual Knowledge — Knowledge that is meaningful only within a specific operational, organizational, or cultural context. “The torque spec on Station 7 is 42 Nm, not 45” is contextual knowledge — it’s useless without knowing what Station 7 is and why the deviation from documentation matters. Contextual knowledge is among the hardest to capture and the most valuable to AI agents.
Data — Raw, unprocessed observations or signals. In the DIKW hierarchy used in this book: data is the base layer, information is data with structure and meaning, knowledge is information combined with experience and judgment, and wisdom is the ability to apply knowledge appropriately across contexts.
Data Depletion — The phenomenon whereby the supply of publicly available, high-quality human-generated text data for AI training is approaching exhaustion. Epoch AI estimates this will occur between 2026 and 2032. Sometimes called the “data wall.”
Diagnostic Knowledge — Knowledge used to identify what’s wrong — pattern recognition, elimination logic, experienced intuition about failure modes. The ER doctor who recognizes sepsis from subtle vital sign patterns before the lab results confirm it is applying diagnostic knowledge. Hardest knowledge type to capture, because experts often cannot articulate the patterns they’re recognizing.
Embodied Knowledge — Knowledge that resides in the body and its learned responses rather than in conscious thought. A welder hearing from the sound of the arc whether the feed rate is correct. A baker knowing from the feel of dough whether the hydration is right. Cannot be captured through interviews alone; requires observation, sensor data, and iterative validation.
Fair Use — A U.S. legal doctrine that permits certain uses of copyrighted material without the rights holder’s permission. The four-factor fair use test considers: purpose and character of the use, nature of the copyrighted work, amount used relative to the whole, and effect on the market for the original. The Thomson Reuters v. Ross Intelligence ruling (2025) was the first to substantially address fair use in the context of AI training.
Foundation Model — A large AI model trained on broad data that can be adapted for many downstream tasks. GPT-4, Claude, Gemini, and Llama are foundation models. They represent the baseline capability that structured knowledge augments — the “general intelligence” that still needs domain-specific knowledge to be useful.
Information — Data that has been organized, structured, and given context so that it becomes meaningful. A temperature reading is data. “The reactor core temperature exceeded safe thresholds at 14:32” is information. This book argues that the Information Age — the era in which access to information was the primary competitive advantage — is ending because AI has equalized access to information.
KaaS (Knowledge as a Service) — A business model in which structured knowledge is delivered to AI agents or systems on a subscription, transactional, or licensing basis. Analogous to SaaS for software. This book projects KaaS as a significant revenue category by 2028-2030.
Knowledge — In this book’s specific usage: information combined with experience, judgment, and context such that it enables better decisions. Knowledge is what a human expert adds beyond what an AI system trained only on public data can provide. The “45 Nm vs. 42 Nm” distinction throughout this book is the canonical example — information says 45, knowledge says 42.
Knowledge Capture — The process of extracting tacit knowledge from human experts and encoding it in structured, machine-readable formats. Distinguished from documentation (which records information) by its focus on judgment, exceptions, context, and the reasoning behind decisions.
Knowledge Divide — This book’s term for the structural inequality that emerges when some organizations, individuals, and societies accumulate and monetize structured knowledge while others do not. Analogous to the “digital divide” of the early internet era, but potentially more consequential because knowledge compounds.
Knowledge Economy — As used in this book: an economic system in which proprietary human knowledge — not information, not data, not compute — is the primary source of competitive advantage. This represents a specific phase transition from the Information Economy, driven by data depletion, regulatory convergence, and agent demand.
Knowledge Engineering — The discipline of extracting, structuring, and validating human knowledge for use by AI systems. This book argues it will become a recognized profession with its own methods, standards, and career tracks by 2027-2028. Distinct from data engineering (which handles data pipelines) and prompt engineering (which handles AI interaction design).
Knowledge Graph — A data structure that represents knowledge as entities (nodes), relationships (edges), and properties (attributes). Distinguished from databases by its emphasis on semantic relationships and context. This book argues that knowledge graphs (or their successors) are the natural representation for structured human knowledge because they preserve the relational and contextual dimensions that flat databases lose.
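A toy illustration of the difference, reusing this book's Station 7 example. The field names are my assumption; the point is that the value travels together with its relationship, its source, and the reason behind it:

```python
# A knowledge graph in miniature: nodes, a typed edge, and properties
# carrying context, confidence, and provenance. Field names are assumed.
graph = {
    "nodes": {
        "station_7": {"type": "workstation", "line": "assembly-3"},
        "torque_spec_s7": {"type": "parameter", "value_nm": 42},
    },
    "edges": [
        {
            "source": "station_7",
            "target": "torque_spec_s7",
            "relation": "operates_under",
            "confidence": 0.95,
            "provenance": "expert interview, 2026-02-11",
            "note": "Deviates from the documented 45 Nm; changed after a 2019 batch failure.",
        },
    ],
}

# An agent querying this graph gets not just the value (42 Nm) but the
# relationship, the source, and the reason for the deviation.
edge = graph["edges"][0]
value = graph["nodes"][edge["target"]]["value_nm"]
print(f"{edge['source']} {edge['relation']} {value} Nm "
      f"(confidence {edge['confidence']}, source: {edge['provenance']})")
```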
Knowledge Holder — A person who possesses critical tacit knowledge that is not captured in any documented system. Identified through the Phone Test, Exception Test, and Retirement Test described in Chapter 6.
Knowledge Marketplace — An economic platform where knowledge holders can make their structured knowledge available to AI agents and systems in exchange for compensation. This book argues such marketplaces will emerge as a significant economic structure by 2028-2030.
Knowledge Runtime — A system that goes beyond basic RAG to provide multi-source synthesis, confidence scoring, provenance tracking, governance controls, and continuous knowledge updating. The evolution from “retrieve and generate” to a genuine knowledge operating system.
Model Collapse — The progressive degradation of AI model quality when models are trained recursively on outputs from other AI models. Manifests as loss of distributional tails (rare, unusual, edge-case content), convergence toward generic outputs, and eventual incoherence. Demonstrated empirically in Nature (Shumailov et al., 2024) and theoretically at ICLR 2025 (Dohmatob et al.).
Predictive Knowledge — Knowledge used to anticipate what will happen — market intuition, material behavior under stress, customer patterns. The materials engineer who says “that alloy will fail under cyclic loading at this temperature” before any test confirms it is applying predictive knowledge.
Procedural Knowledge — Knowledge about how to do something — steps, sequences, techniques, workarounds. The easiest knowledge type to capture through structured interviews, and often the starting point for knowledge capture programs.
Provenance — The documented chain of origin and transformation of a piece of knowledge or data. In the knowledge economy context, provenance tracking answers: where did this knowledge come from, who validated it, when was it last updated, and what’s its confidence level? Essential for trust in knowledge marketplace transactions.
RAG (Retrieval-Augmented Generation) — A technique that augments AI model responses by retrieving relevant content from external sources at inference time, rather than relying solely on knowledge embedded during training. RAG is the current primary mechanism through which AI agents consume structured knowledge. This book argues that RAG will evolve into more sophisticated “knowledge runtimes.”
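A deliberately minimal sketch of the mechanism. The hashed bag-of-words embedding is a toy stand-in for a real embedding model, and the assembled prompt would go to whatever model your stack uses; nothing here is a specific vendor's API.

```python
# Minimal RAG: embed, retrieve top-k by similarity, assemble the prompt.
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy embedding: hashed bag-of-words. A real system uses a learned model.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def build_rag_prompt(question: str, knowledge_units: list[str], k: int = 3) -> str:
    q = embed(question)
    ranked = sorted(knowledge_units, key=lambda u: cosine(embed(u), q), reverse=True)
    context = "\n".join(ranked[:k])  # retrieved knowledge, injected at inference time
    return f"Answer using only this knowledge:\n{context}\n\nQuestion: {question}"

units = [
    "Station 7 torque spec is 42 Nm, not the documented 45 Nm (changed 2019).",
    "Supplier X requires a 48-hour lead time for alloy batches.",
]
print(build_rag_prompt("What torque should Station 7 use?", units, k=1))
```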
Relational Knowledge — Knowledge about people, organizations, and social dynamics — who to call, who actually makes decisions, which relationships enable or block action. The “shadow org chart” that every experienced employee understands but no document captures.
Structured Knowledge — Human knowledge that has been captured, organized, and encoded in formats that AI systems can query, reason over, and act on. Distinguished from unstructured documentation by its semantic richness, machine-readability, and integration of context, confidence levels, and provenance.
Tacit Knowledge — Knowledge that a person possesses but cannot easily articulate — often contrasted with explicit knowledge, which can be written down. Originally theorized by Michael Polanyi (“we can know more than we can tell”). This book argues that the tacit/explicit distinction is less useful than understanding the five knowledge types (procedural, diagnostic, predictive, relational, contextual), because each can have both tacit and explicit components.
Wisdom — In the DIKW hierarchy: the ability to apply knowledge appropriately across contexts, with judgment about when standard approaches apply and when they don’t. This book does not claim that AI systems can achieve wisdom, nor does it argue that the knowledge economy can capture it. Wisdom remains, for now, irreducibly human.
Appendix D: Recommended Reading
This is not a bibliography. These are the books, papers, and reports that shaped my thinking — and that I believe will help you develop your own perspective on the arguments in this book. I’ve annotated each entry because a reading list without context is just a shelf.
The Foundations
Polanyi, Michael. The Tacit Dimension. University of Chicago Press, 1966. The origin of "we can know more than we can tell." Polanyi's insight that human knowledge is irreducibly personal and embodied -- that a cyclist cannot explain how they balance, that a diagnostician cannot fully articulate their pattern recognition -- is the philosophical foundation for this book's central argument. Short, dense, and more relevant now than when it was written.
Nonaka, Ikujiro and Takeuchi, Hirotaka. The Knowledge-Creating Company. Oxford University Press, 1995. The SECI model (Socialization, Externalization, Combination, Internalization) remains the best framework for understanding how organizations create and transfer knowledge. Nonaka and Takeuchi
understood, thirty years before AI agents, that the hardest problem isn't storing knowledge -- it's converting tacit knowledge into forms that can travel across organizational boundaries. Read the model, skip the 1990s management jargon.
Davenport, Thomas and Prusak, Laurence. Working Knowledge. Harvard Business School Press, 1998. The practical counterpart to Nonaka. Davenport and Prusak documented why knowledge management 1.0 failed -- not because the technology was wrong but because the incentives were. Their analysis of knowledge markets inside organizations anticipates the external marketplace dynamics I describe in Chapter 7. The failure modes they catalog remain instructive.
The Data Depletion Thesis
Villalobos, Pablo et al. "Will we run out of data? Limits of LLM scaling based on human-generated data." arXiv:2211.04325, 2024 update. The Epoch AI paper that quantifies the depletion timeline. Peer-reviewed, presented at ICML 2024. Essential reading for anyone who wants to understand the supply-side constraint on AI progress. The 2024 revision is more nuanced than the 2022 original -- read both to understand how the field's thinking evolved.
Shumailov, Ilia et al. "AI models collapse when trained on recursively generated data." Nature 631, 755–759, 2024. The empirical demonstration of model collapse. What makes this paper important is not just the finding (recursive training on synthetic data degrades models) but the mechanism -- tail erosion, where the rare and unusual content disappears first. This is why synthetic data cannot substitute for human-generated knowledge: it loses exactly the edge cases that make AI systems useful for hard problems.
Dohmatob, Elvis et al. "Strong Model Collapse." ICLR 2025 Spotlight. The paper that showed even 1-in-1,000 synthetic data contamination can trigger collapse dynamics. Read the abstract and Section 3 for the core result. The implications for any organization relying on web-scraped training data are significant -- the internet is already contaminated with synthetic content, and the contamination is accelerating.
The Regulatory Landscape
U.S. Copyright Office. "Copyright and Artificial Intelligence, Part 2: Copyrighted Works in AI Training." May 2025. 108 pages. The full report is worth reading if you're making strategic decisions about data licensing or knowledge marketplace participation. The executive summary captures the key positions: training is not inherently transformative,
the fair use analysis is fact-specific, and licensing markets are expected to develop. Available free from copyright.gov.
Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., No. 1:20-CV-613-SB (D. Del. Feb. 11, 2025). The first U.S. court ruling on fair use in AI training. The opinion is publicly available and surprisingly readable. Essential for understanding why "we scraped it from the internet" is no longer a viable data strategy.
Regulation (EU) 2024/1689 -- The EU AI Act. The full regulation is bureaucratic, but Articles 51-56 on general-purpose AI models are the relevant sections. The transparency requirements for training data disclosure are the regulatory mechanism that will accelerate the licensing economy in Europe.
The Agent Economy
Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach. 4th Edition, Pearson, 2020. The standard textbook. The 4th edition includes substantial updates on deep learning, but the foundational chapters on search, planning, knowledge representation, and decision-making remain
the best introduction to the concepts that underpin agentic AI. If you're a non-technical reader trying to understand what agents actually do, start with Chapters 1-2 and then jump to the relevant application chapters.
Wooldridge, Michael. An Introduction to MultiAgent Systems. 2nd Edition, Wiley, 2009. Older but foundational for understanding multi-agent coordination -- how agents negotiate, share information, and resolve conflicts. The economic models of agent interaction (auction mechanisms, contract nets, coalition formation) are directly relevant to how knowledge marketplace transactions will work.
Gartner. "40% of Enterprise Applications Will Feature Task-Specific AI Agents by 2026." Press release, August 26, 2025. Read with appropriate skepticism -- Gartner's technology adoption timelines are often directionally correct but aggressive on timing. The value is in the five-stage adoption model (assistants → task agents → collaborative agents → agent ecosystems → new normal) and the $450B revenue projection for 2035.
Economics of Information and Knowledge
Shapiro, Carl and Varian, Hal. Information Rules. Harvard Business School Press, 1999. Written before the current AI wave but remarkably prescient about the economics of information goods -- network effects, lock-in, switching costs, and versioning. The sections on pricing information goods apply directly to knowledge marketplace economics. Read Chapter 1 and Chapter 6.
Mokyr, Joel. The Gifts of Athena: Historical Origins of the Knowledge Economy. Princeton University Press, 2002. Mokyr argues that the Industrial Revolution was fundamentally a knowledge revolution -- that the key input wasn't capital or labor but "useful knowledge" and the institutions that generated and transmitted it. His distinction between propositional knowledge ("knowing that") and prescriptive knowledge ("knowing how") anticipates the procedural/diagnostic/predictive taxonomy in this book.
Zuboff, Shoshana. The Age of Surveillance Capitalism. PublicAffairs, 2019. Essential for understanding the extractive model that this book's knowledge economy might become if we're not careful. Zuboff documents how behavioral data was claimed as a free resource, processed into prediction products, and sold in behavioral futures markets. Replace "behavioral data" with "human knowledge" and you have the concentration scenario from Chapter 8. Read it as a warning, not a prediction.
The Human Dimension
Crawford, Kate. Atlas of AI. Yale University Press, 2021. Crawford maps the physical, labor, and political infrastructure behind AI systems -- the mines, the data centers, the invisible human labor. Essential context for Chapter 8's discussion of the knowledge divide. AI is not an abstraction. It has a supply chain, and that supply chain has human beings in it.
Crystal, David. Language Death. Cambridge University Press, 2000. The source that originally popularized the "language dies every two weeks" figure (now revised downward -- see Appendix B). But Crystal's broader argument about the cognitive and cultural losses that accompany language death remains powerful and directly relevant to Chapter 4's discussion of cultural knowledge.
Graeber, David and Wengrow, David. The Dawn of Everything. Farrar, Straus and Giroux, 2021. Not about AI at all, but essential for thinking about the civilizational questions in the Epilogue. Graeber and Wengrow demonstrate that human societies have always been more flexible, more creative, and more varied in their organization than we typically assume. If you're worried about what happens when AI automates most productive labor, this book provides historical
evidence that humans have structured their economies in radically different ways before -- and could again.
For the Technical Reader
Lewis, Patrick et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. The foundational RAG paper. If you're building knowledge retrieval systems, start here.
Ji, Ziwei et al. "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys 55(12), 2023. Comprehensive overview of why AI systems confabulate and the current mitigation strategies. Directly relevant to understanding why structured knowledge with provenance tracking outperforms raw RAG for high-stakes applications.
Pan, Shirui et al. "Unifying Large Language Models and Knowledge Graphs: A Roadmap." IEEE Transactions on Knowledge and Data Engineering, 2024. Technical roadmap for LLM-knowledge graph integration. If you're building the infrastructure described in Chapter 6, this paper maps the current landscape and open problems.
One More
Borges, Jorge Luis. "The Library of Babel." 1941. A short story, not a research paper. Borges imagines a universe consisting of an infinite library containing every possible book -- every possible arrangement of characters. In this library, all knowledge exists. But finding any particular piece of knowledge is impossible, because it's buried in an infinite expanse of meaningless noise.
We are building the Library of Babel. AI systems trained on everything are systems that contain everything and understand nothing without curation. The knowledge economy is, at its heart, the project of turning the Library of Babel into something useful — one structured, validated, contextual piece of knowledge at a time.
Five pages. Read it tonight.
And the Last One… Watch Star Trek. All of it. Start with "The Next Generation". I never learned more about, well, almost anything, than I did there.
Thank You
Thank you for reading my thoughts - and yes, an AI helped me transform my German manuscript into fluent English - thank you, too.
Thom Heinrich