Context Engineering Explained for Founders Who Use AI, Not Build It

TL;DR: Context engineering is the discipline of controlling what information an AI model sees when it generates a response. Research shows AI quality degrades as context grows, even before the window is close to full. For founders using AI in their work, the takeaway is practical: structure beats volume, long conversations produce worse results, and giving AI exactly what it needs for the current task outperforms giving it everything you have.


The AI you opened this morning is not the same one you’ll be talking to an hour and a dozen topics later. Not because the model changed. Because the context did.

“Context engineering” has been flooding developer circles for the past several months. Andrej Karpathy, former Tesla AI director, described it as the broader discipline that prompt engineering was always a subset of. Shopify CEO Tobi Lütke flagged it in an internal memo that circulated widely: the most valuable new skill in an AI-first environment isn’t writing better prompts, it’s building better context. The term gained real traction in mid-2025, mostly among engineers and developers.

Most of what’s been written about it targets that audience. But the underlying principle applies to anyone using AI in their work, including founders using it to run operations, draft communications, or work through decisions. What follows is the practical version.


What Is Context Engineering?

Context engineering is the practice of deliberately controlling what information an AI model has access to when it generates a response. Prompt engineering focuses on how you ask a question. Context engineering focuses on what the model knows at the moment it answers. The context window includes far more than your question: it holds the full conversation history, any documents you’ve pasted in, prior outputs, instructions, and everything else that has accumulated during the session.

Anthropic defines it as “the set of strategies for curating and maintaining the optimal set of tokens during LLM inference, including all the other information that may land there outside of the prompts.” The core challenge is that this set of information grows over any real interaction, and how you manage that growth directly affects the quality of what comes back.
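One way to see what "context" actually contains is to sketch a chat session as data. This is an illustrative sketch, not any vendor’s real API: the message format is a common convention, and the four-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
# Sketch: what the model "sees" each turn is the full accumulated history,
# not just the latest question. Illustrative only; the ~4-characters-per-token
# estimate below is a crude assumption, not a real tokenizer.

def rough_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

context = []  # the context window: instructions, documents, every prior turn

def add_turn(role: str, text: str) -> int:
    """Append a message and return the total size the model now processes."""
    context.append({"role": role, "content": text})
    return sum(rough_tokens(m["content"]) for m in context)

add_turn("system", "You are a helpful operations assistant.")
add_turn("user", "Draft a follow-up email to a vendor. " + "details... " * 200)
total = add_turn("user", "Now, what was our refund policy again?")

# The short new question rides on top of everything that came before it.
print(total)
```

The point of the sketch: your third message may be one sentence, but the model processes the whole accumulated pile, and that pile only grows.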


Why Your AI Gets Worse Mid-Conversation

As the context window fills up, AI accuracy drops. A Chroma study from July 2025 tested 18 frontier models, including Claude, GPT-4.1, and Gemini 2.5, and found that every single one exhibits performance degradation as input length increases. The researchers called this “context rot.” Stanford’s prior work quantified the specific effect: accuracy falls from roughly 70-75% when relevant information sits at the start of the context, down to 55-60% when the same information gets buried in the middle. The information isn’t gone. The model just pays less attention to it.

This is what researchers call positional bias. Information at the beginning and end of the context window gets processed reliably. Information in the middle gets deprioritized. In a long conversation, your original instructions, key constraints, and the specific output you asked for all gradually migrate toward the middle.

The effect compounds when tasks require reasoning, not just retrieval. Stanford’s research found that performance degradation was significantly worse for questions requiring two reasoning steps compared to simple lookups. A conversation that drifted across three topics before reaching your actual question has set up worse conditions for that question than a clean, focused session would have.


Why Bigger Context Windows Don’t Fix This

More context capacity delays the problem; it doesn’t solve it. The Chroma study found degradation at every input length increment tested, not just when approaching the limit. A Databricks analysis found accuracy drops noticeably around 32,000 tokens, well before the million-token limits now available on some models.

The reason is architectural. LLMs run on transformer architecture, which calculates relationships between every token in the context and every other token. That’s n² pairwise relationships for n tokens. At 10,000 tokens, that’s 100 million calculations. At 100,000 tokens, 10 billion. The model’s ability to maintain equal attention across all of that doesn’t scale. Anthropic’s engineers call this an “attention budget”: every token you add depletes it by some amount, which is why they treat context as a finite resource with diminishing marginal returns rather than free storage.
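The arithmetic above is simple enough to sketch. This toy function just restates the n² relationship from the paragraph; it illustrates the scaling, not any real attention implementation.

```python
# Sketch of the arithmetic behind the "attention budget": self-attention
# relates every token to every other token, so work grows with n^2.

def pairwise_relationships(n_tokens: int) -> int:
    return n_tokens ** 2  # n^2 token-to-token comparisons

print(pairwise_relationships(10_000))   # prints 100000000 (100 million)
print(pairwise_relationships(100_000))  # prints 10000000000 (10 billion)

# Growing the input 10x grows the attention work 100x.
```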


The Founding Principle: Smallest Set of High-Signal Information

Anthropic’s guidance on context engineering comes down to one sentence: find the smallest possible set of high-signal tokens that maximize the likelihood of the desired outcome. This isn’t about brevity for its own sake. It’s about recognizing that every piece of information you add to the context competes with every other piece for the model’s attention, and irrelevant or redundant content actively dilutes quality.

For a founder or operator, the translation is direct: give AI exactly what it needs for this task. Not your whole week of notes, not the entire document when you need one answer from it, not every background detail you might mention. The same instinct behind lean process documentation applies here. As The Self-Managing Business observes about ops work: you document what actually gets asked, not what you think should be documented. AI context runs on the same principle.

This is also why handing AI an information architecture problem, dumping everything into a single long session and expecting it to sort things out, produces the same failure mode as keeping all your business knowledge in your head. Volume doesn’t substitute for structure. The same guide makes this point about automation too: automation without foundation is just chaos at a faster speed. That holds for AI interactions as much as it does for business processes.


What This Looks Like in Practice

Four habits follow directly from the research.

Start a new conversation for each task. Conversation history accumulates fast. A session that began with a draft email, moved through two other questions, and arrived at the thing you actually need has accumulated significant noise. Starting fresh gives the model a clean window and keeps your key information at the front, where it gets processed with full attention.

Front-load what matters. Positional bias is consistent across models: information at the start of the context window gets higher reliability than information in the middle. If your session has a specific goal, state it clearly at the beginning. Don’t bury the actual request after three paragraphs of context.

Be specific rather than complete. You don’t need to paste your entire SOP to get a useful answer about one step in it. You don’t need to share a full document when you need one section summarized. Giving AI the slice it needs for the current task produces better results than giving it everything and expecting it to find what matters. This is the same logic that makes working on the business in focused blocks more productive than open-ended sprawl.

When quality drops mid-session, start over. If you’ve been in the same conversation for a while and the responses start feeling less precise or less attuned to what you originally asked, that’s usually the context filling up. The fix isn’t to rephrase. It’s to start a new session with a focused prompt.
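The "be specific rather than complete" habit can be sketched in code. Everything here is hypothetical: `extract_section` is an illustrative helper, and the SOP text is made up. The point is simply that you hand the model one section, not the whole document.

```python
# Sketch: send the one relevant slice, not the whole document.
# `extract_section` is a hypothetical helper for illustration; the SOP
# content and its "## " heading convention are made up for the example.

def extract_section(document: str, heading: str) -> str:
    """Return just the named section of a plain-text doc with '## ' headings."""
    keep, capturing = [], False
    for line in document.splitlines():
        if line.startswith("## "):
            capturing = (line[3:].strip() == heading)
        elif capturing:
            keep.append(line)
    return "\n".join(keep).strip()

sop = """\
## Onboarding
Send the welcome packet within 24 hours.
## Refunds
Approve refunds under $100 without escalation.
## Offboarding
Archive the client folder after 30 days.
"""

prompt = extract_section(sop, "Refunds")  # one step, not the whole SOP
print(prompt)  # prints: Approve refunds under $100 without escalation.
```

The question about refunds now arrives with only the refunds policy attached, so nothing about onboarding or offboarding competes for the model’s attention.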


None of This Is Complicated

Context engineering is systems thinking applied to AI interactions. Treat what you give the model as a finite resource with diminishing returns. Structure beats volume. Clean beats comprehensive. Focused beats thorough.

The underlying problem is consistent: information without structure doesn’t scale, regardless of how much capacity you have available. That’s true for your business operations and it’s true for your AI interactions.

If you’re looking to build this kind of structured thinking into how you work, the work-with-me page is the place to start.

Frequently Asked Questions

What is context engineering in simple terms?

Context engineering is the practice of controlling what information an AI model has access to when it generates a response. Rather than focusing only on how to phrase a question, it asks what the model should know at the moment it answers. In practice, this means curating what goes into the conversation: the right information, in the right amount, at the right time.

Why does my AI seem to forget earlier parts of a long conversation?

AI models process context with what researchers call positional bias: information at the beginning and end of a conversation gets more reliable attention than information buried in the middle. As a session grows, your early instructions and constraints migrate toward the middle of the window, where they get deprioritized. This is a product of how transformer-based models process context, and starting a new conversation for each new task is the most direct fix.

Does starting a new conversation actually improve AI quality?

Yes, in most cases. A fresh context window means your key information sits at the start, where the model processes it with full attention. Long conversations accumulate background noise from earlier topics, resolved questions, and intermediate steps that no longer apply. Clearing that noise by starting fresh gives the model a cleaner signal to work from.

What is context rot and how does it affect me?

Context rot is the performance degradation that happens as input length increases in an LLM. A Chroma 2025 study tested 18 frontier models and found that every one exhibits this behavior at every input length increment tested. For practical users, it means the quality of AI responses declines over the course of a long session or when you feed the model large volumes of text. The model remains capable, but accuracy decreases as context fills up.

How is context engineering different from prompt engineering?

Prompt engineering focuses on how you phrase a request. Context engineering focuses on what the model knows when it processes that request. In a one-shot interaction, the two are nearly the same. In a multi-turn conversation or extended workflow, they diverge significantly. Conversation history, pasted documents, prior outputs, and tool results all contribute to context. Managing those inputs deliberately is context engineering. Writing the request well is prompt engineering. Both matter, but context engineering is the bigger lever for anyone using AI across more than a single message.

The Business Chaos Audit

Not sure where your operations are breaking? The Business Chaos Audit is a free Notion template that scores your setup across six operational areas and shows you exactly where to focus. Takes about 15 minutes.

