We’ve all been there: managing many moving pieces in our heads and then, BLAM, Slack dings, and we’ve lost our perfect mental model faster than an unsaved file in a crash. Working with LLM agents has upped the ante on how much we have to keep in our heads.
Consider reviewing user feedback reporting that files the user expected in the agent’s output are missing. We have to load into our memory:
- the LLM output in question
- the user’s feedback
- the Excel file of automated evaluation outputs
- the PowerPoint decks that are missing, and an awareness of all the slides within
- the external services the agent is pulling data from
- a touch of search relevancy tuning
- the dozen prompts the agent uses
- the order those prompts are fired off in
- oh yeah, and then there’s the code!
As we work through an issue, we’re juggling all of this context to find the best solution. I’m tired just thinking about it. (If you know an engineer, give them a hug; it’s been a long day.) Keeping everything in order mentally is crucial. One simple Slack message can blow it all up before we even realize it.
This isn’t just annoying. It’s cognitively expensive. And it turns out, engineers and large language models (LLMs) suffer in surprisingly similar ways when their context window gets blown up.
Working Memory vs. Context Window
Humans rely on working memory, our brain’s temporary scratchpad for solving problems, holding ideas, and managing complexity. It’s what lets you keep the structure of a code function in your head while tracking variable states, or mentally juggle five modules during a code refactor.
We can’t keep everything in context, all the time. Like an LLM’s context window, working memory has a finite size. Overflow it, and you lose track (or go crazy). And unlike an LLM, we can’t just “truncate and move on.”
The Real Cost of Context Switching
When engineers are interrupted, rebuilding context isn’t instant. We have to:
- Recall what decisions we made
- Rebuild the problem space in our heads
- Reconstruct the mental map of how systems interact
- Reread our own code and notes just to figure out where we left off
It’s not just “getting back to work”—it’s reloading your brain.
Studies show it takes an average of 23 minutes and 15 seconds to fully regain focus after a distraction. Get distracted just three times in a day and we’ve lost more than an hour! Undisturbed focus time is that important. Another study shares: “Our data suggests that people compensate for interruptions by working faster, but this comes at a price: experiencing more stress, higher frustration, time pressure and effort.” Frequent switching increases stress and reduces the quality of our output.
LLMs Drop It Like It’s Hot
LLMs hit context limits too, but they handle it differently. When the context window fills up, they just error out (or go crazy). We have to shrink the context we’re feeding them: older messages get dropped, or we summarize the past to keep some semblance of context alive, but it’s always a compromise.
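To make that concrete, here’s a minimal sketch of keeping a chat history under a token budget. It isn’t any particular library’s API; `count_tokens` and `summarize` are hypothetical stand-ins for whatever tokenizer and summarization call your stack actually provides.

```python
# Minimal sketch: drop or summarize the oldest turns so the history fits a budget.
# count_tokens and summarize are stand-ins, not real library calls.

MAX_TOKENS = 8_000


def count_tokens(message: dict) -> int:
    # Rough stand-in: roughly 4 characters per token.
    return max(1, len(message["content"]) // 4)


def summarize(messages: list[dict]) -> dict:
    # Stand-in for an LLM call that compresses old turns into one note.
    joined = " ".join(m["content"] for m in messages)
    return {"role": "system", "content": f"Summary of earlier turns: {joined[:500]}"}


def fit_context(messages: list[dict], budget: int = MAX_TOKENS) -> list[dict]:
    """Keep the newest turns verbatim and compress everything older."""
    if sum(count_tokens(m) for m in messages) <= budget:
        return messages

    kept, used = [], 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget * 0.8:  # reserve ~20% of the budget for the summary
            break
        kept.append(msg)
        used += cost

    older = messages[: len(messages) - len(kept)]
    return ([summarize(older)] if older else []) + list(reversed(kept))
```

Every pass through something like this is lossy: the summary keeps the gist, but the details you dropped are gone for good.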
Humans don’t have that luxury. We can’t just forget the past; we need it, so we have to rebuild it.
Even AI Engineers Lose Context
We’re not alone in this problem. I use Cursor Agent, and after a while, I have to start a new thread because the current one has lost its mind. The context is just… gone. Earlier decisions vanish. It’s like explaining your problem to someone who wasn’t paying attention—because they weren’t. It happens faster than my own memory and context give out, which can get quite frustrating.
This is why we use tools like memory buffers, snapshots, and chaining. Because agents, like humans, can’t remember everything.
Could we leverage similar techniques?
How Engineers Can Defend Against Context Collapse
If you want to hold onto your mental model, treat your brain like an LLM and externalize your memory:
- Document your thoughts before switching. If I know I’m about to be pulled out for a meeting, I will spend several minutes documenting where I am: leaving TODOs in my code for what I was thinking and where I’m heading next. (There’s a rough sketch of this after the list.)
- Keep thorough documentation. Knowing I’m going to be pulled away is one thing, but in reality, that’s rarely the case. As I dive into a large effort, I keep notes as I go: copied values and code, my thoughts on my progress, and so on. This is incredibly valuable for getting back up to speed faster. (Admittedly, I wish I did this more; unfortunately, I still believe I won’t get distracted…)
- Time-block and turn off notifications. Outlook can automatically block your calendar for Focus Time; use it! Marking your calendar as Busy or Out of Office is a great way to ward off scheduled distractions. During these times, turn off notifications from Slack, Outlook, Insta, Candy Crush, etc., and let yourself focus on your tasks and kill it. (I also have a few different Spotify playlists to help mute the distractions in my head.)
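Here’s the rough sketch I promised of the kind of “brain snapshot” helper I have in mind when I say externalize your memory. The file name and fields are just assumptions, one possible shape for it:

```python
# A tiny, hypothetical "brain snapshot" helper: dump where you are before a
# context switch so future-you can reload faster.
from datetime import datetime
from pathlib import Path

SNAPSHOT_FILE = Path("context_snapshots.md")  # assumed location; pick your own


def snapshot(working_on: str, next_step: str, open_questions: list[str]) -> None:
    """Append a timestamped note of current state, next step, and open loops."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    lines = [
        f"## {stamp}",
        f"- Working on: {working_on}",
        f"- Next step: {next_step}",
        *[f"- Open question: {q}" for q in open_questions],
        "",
    ]
    with SNAPSHOT_FILE.open("a") as f:
        f.write("\n".join(lines) + "\n")


# Example: run this right before getting pulled into a meeting.
snapshot(
    working_on="missing PowerPoint decks in agent output",
    next_step="check which prompt drops the file list",
    open_questions=["is the eval spreadsheet flagging these rows?"],
)
```

Run something like that right before a meeting and future-you gets a timestamped breadcrumb trail instead of a blank stare.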
If someone creates an app that can help me get up to speed faster, let me know!
Focus Is Your Superpower
Even in an AI-assisted world, sustained attention is still your edge. Whether you’re coding or collaborating with an agent, your ability to protect your context determines how fast, and how well, you solve problems.
Consider even the larger reasoning models or deep research. They think about their context, and build new context, until they get you an output. We can’t distract them. They’re focused, locked in, and will let you know when they’re done.
Context switches are inevitable. But they don’t have to wreck your flow.
What’s your system for staying focused? Do you rely on time-blocking? Sticky notes? A personal checklist or a running log file? Share what works for you, or what doesn’t, below!
Subscribe to my blog and get posts like this in your inbox. Share your email below, or follow me on Threads, LinkedIn, or BlueSky.
