Troubleshooting Guide: When AI Agents Get Stuck - Todo-MCP Blog
Troubleshooting | 10 min read | January 2025

Troubleshooting Guide: When AI Agents Get Stuck

Recovery patterns for AI-assisted development. What to do when your AI assistant goes off track, including verbal recovery patterns and symptom-based solutions.


A Note to the Reader

The other guides in this series describe how things should work. This guide describes what to do when they don't.

Every methodology encounters reality. Plans prove wrong. Context gets lost. The AI misunderstands. Work drifts. This guide provides recovery patterns for common failure modes - ways to get back on track without starting over.

The underlying philosophy remains the same: return to your last protection point and climb from there.

The Recovery Philosophy

When something goes wrong in AI-assisted development, the instinct is often to push forward harder. This usually makes things worse.

Instead:

  1. Stop - Recognize something is wrong
  2. Diagnose - Identify what went wrong and where
  3. Return - Go back to the last known-good state
  4. Reassess - Understand what led to the failure
  5. Resume - Continue from solid ground

This mirrors the climbing metaphor: when you slip, you fall to your last protection point - not to the ground. The protection points (tests, documentation, verified commits) exist precisely for this purpose.

Verbal Recovery Patterns

The words you use to enter recovery mode matter. These phrases have proven effective for stopping runaway execution and returning to stable ground.

Emergency Stop

When the AI is in a loop or heading in the wrong direction:

"Stop all work and do nothing, let's plan..."
"Stop. This approach isn't working. Let's step back."

Entering Recovery Mode

After something has gone wrong and you need to regroup:

"We came back because you got yourself into a loop. You were doing
deep research and fixing things, and eventually you locked yourself up.

So, do a resume, look at your context keys, then stop and talk to me.
Don't do anything else. Now we have a partially damaged database and
we have to figure out how to go back and fix it.

We are entering planning mode for a bit. Do nothing, we're planning."

Key Elements of Recovery Phrases

  • Explicit stop - "Stop all work" halts all action
  • Acknowledge the problem - "You got into a loop" names what went wrong
  • Request state check - "Look at your context keys" assesses current position
  • Block further action - "Don't do anything else" prevents compounding
  • Declare planning mode - "We're planning" returns to collaboration

The Recovery Conversation Pattern

1. Stop
   "Stop. Don't make any more changes."

2. Assess
   "Run context_resume. Tell me what you think the current state is."

3. Align
   "Here's what I see: [your observation]. Do these match?"

4. Plan recovery
   "We are entering planning mode. What do we need to do to get back
   to a known-good state?"

5. Execute recovery
   "Go ahead and [specific recovery action]."

Quick Reference: Symptom → Action

Use this table when you recognize a problem:

  • AI in a loop, same approach failing → "Stop. Investigate before trying again."
  • Plan was wrong, discovered mid-execution → Pause. Return to last good state. Re-plan.
  • Context lost between sessions → Run context_resume. Verify understanding.
  • Repeated misunderstanding → Reset: describe problem and success, not solution.
  • Tests failing, cause unknown → Characterize first. Find what changed. Then fix.
  • Scope creep → Separate discovered issues. Complete original scope only.
  • Verification being skipped → Stop. Run tests now. Make verification explicit.
  • Multi-session drift → Return to original goal. Audit current state. Realign.
  • Blocked on external → Document blocker. Find parallel work. Preserve state.

Symptom: The AI Keeps Trying the Same Failing Approach

What It Looks Like

The AI attempts a fix, it fails, and the AI tries essentially the same thing again. Loop repeats. No progress.

Why It Happens

  • The AI is pattern-matching to similar problems without understanding this one's specifics
  • Root cause wasn't actually identified during planning
  • The AI is guessing rather than investigating

Recovery

Step 1: Break the loop explicitly

"Stop. This approach isn't working. Let's step back."

Step 2: Return to investigation

"We need to understand WHY this is failing before trying again.
Don't propose a fix - investigate the actual behavior."

Step 3: Require evidence

"Show me what you found. What is actually happening vs. what we expected?"

Step 4: Only then fix

"Now that we understand the cause, what's the minimal fix?"

The Pattern: Failing approach → Stop → Investigate → Understand → Fix

Never: Failing approach → Same approach → Same approach → Frustration

Symptom: The Plan Was Wrong (Discovered Mid-Execution)

What It Looks Like

You're several tasks into execution when you realize the overall approach is flawed. Continuing will waste effort; the work so far may not be usable.

Why It Happens

  • Requirements were misunderstood during planning
  • A technical constraint wasn't discovered until implementation
  • The problem turned out to be different than diagnosed

Recovery

If the work so far is salvageable:

"Pause execution. The approach needs to change, but we can keep what we've done.

New constraint: [what you learned]
Let's revise the remaining tasks given this."

If the work so far must be abandoned:

"Stop. We need to back out these changes and rethink.

What we learned: [the insight]
Return to the last good commit.
Now let's re-plan with this knowledge."
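The "return to the last good commit" move translates directly into everyday git. Here is a minimal sketch in a throwaway repository; the `last-good` tag and file names are illustrative, not part of any required workflow:

```shell
# Throwaway repo so the commands are safe to run verbatim.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you

# A known-good state, tagged so it is easy to return to.
echo "stable" > app.txt
git add app.txt && git commit -qm "working baseline"
git tag last-good

# Work that turned out to be the wrong approach.
echo "broken experiment" > app.txt
git add app.txt && git commit -qm "flawed approach"

# Recovery: record what was tried, then return to solid ground.
git log --oneline > ../what-we-tried.txt
git reset --hard -q last-good
cat app.txt    # prints "stable" again
```

Tagging checkpoints as you go is the git equivalent of placing protection: the fall stops at the tag, and the log of what you tried survives for re-planning.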

The Key Insight: Mid-execution discovery isn't failure - it's learning. The mistake is continuing down a known-wrong path. Return to planning when planning is what's needed.

Symptom: Context Was Lost Between Sessions

What It Looks Like

Starting a new session, the AI doesn't remember what was happening. Or worse, it "remembers" something incorrect.

Why It Happens

  • Session ended without saving context
  • Context was saved but not in a recoverable format
  • Too much time passed and context became stale

Recovery

If you have task tracking:

"Run context_resume to see where we were."

Then verify:

"That's your understanding. Here's what I remember: [your recollection].
Do these align? What's missing?"

If context is truly lost:

"We're starting fresh. Here's the current state:
- What we were trying to do: [goal]
- What was completed: [done items]
- What remains: [remaining items]
- Last known-good state: [commit/checkpoint]

Please rebuild your understanding and confirm."

Prevention

At the end of every session, explicitly save state: what was accomplished, what remains, key decisions made, open questions. This is placing protection before you stop climbing.

Symptom: The AI Keeps Misunderstanding

What It Looks Like

You explain what you want. The AI does something different. You clarify. The AI still misses the point. Cycle repeats.

Why It Happens

  • You're describing the solution when you should describe the problem
  • Terminology mismatch - same words, different meanings
  • Implicit context that you haven't shared
  • The request is genuinely ambiguous

Recovery

Step 1: Reset the explanation

"Let me try explaining this differently.

The PROBLEM I'm trying to solve: [problem, not solution]
The CONTEXT you need: [background]
SUCCESS looks like: [concrete outcome]
CONSTRAINTS: [what must/must not happen]"

Step 2: Check understanding

"Before you do anything, tell me what you think I'm asking for."

Step 3: Correct at the understanding level

"Not quite. The key thing you're missing is: [specific gap]"

The Pattern: When communication fails repeatedly, the cause is usually one of:

  • Too much HOW, not enough WHY
  • Missing context that seems obvious to you
  • Ambiguity you don't see because you know what you mean

Describe the problem and the success criteria. Let the AI figure out the approach.

Symptom: Tests Are Failing and We Don't Know Why

What It Looks Like

Tests that were passing are now failing. Or new tests fail unexpectedly. The cause isn't obvious.

Why It Happens

  • Change had unintended side effects
  • Test assumptions no longer hold
  • Environment changed
  • Flaky tests

Recovery

Step 1: Characterize the failure

"Don't try to fix this yet. First, characterize:
- Which tests are failing?
- When did they start failing?
- What changed since they last passed?
- Do they fail consistently or intermittently?"

Step 2: Isolate the change

"What was the last known-good state?
Can we return to that state and verify the tests pass there?"

Step 3: Binary search if needed

"If multiple changes happened, bisect:
- Test at the midpoint
- Narrow down which change caused the failure"
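The bisect step is exactly what `git bisect run` automates. A self-contained sketch in a throwaway repository; the `grep` check stands in for whatever test command your project actually uses:

```shell
# Throwaway repo: build a small history, then let git find the bad commit.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you

# Six commits; the 4th silently introduces the "bug".
for i in 1 2 3 4 5 6; do
  if [ "$i" -ge 4 ]; then
    echo "bug at change $i" > state.txt
  else
    echo "ok at change $i" > state.txt
  fi
  git add state.txt && git commit -qm "change $i"
done

# Bisect between the first commit (good) and HEAD (bad).
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first"
# Exit 0 = good, non-zero = bad; grep stands in for your real test suite.
git bisect run sh -c '! grep -q bug state.txt' > ../bisect-out.txt
git bisect reset
grep "first bad commit" ../bisect-out.txt    # points at "change 4"
```

With six commits, bisect needs only two or three test runs instead of six; over a long history the savings compound.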

Step 4: Understand before fixing

"Now that we know which change caused it, explain WHY before fixing.
Don't guess - trace the actual failure path."

The Key Insight: Test failures are information. They tell you exactly where your protection held. Use them diagnostically, not just as obstacles to make green.

Symptom: Scope Has Crept Beyond Original Intent

What It Looks Like

What started as "add a login button" has become "refactor the entire authentication system." The work keeps expanding. No end in sight.

Why It Happens

  • "While we're here" additions
  • Discovered issues that "should" be fixed
  • Perfectionism
  • Unclear original boundaries

Recovery

Step 1: Acknowledge the creep

"We've drifted from the original scope. Let's reset.

Original goal: [what was actually requested]
Current scope: [what we're now trying to do]
The gap: [how we drifted]"

Step 2: Separate concerns

"These are the discovered issues. They go on a SEPARATE list, not this task:
- [issue 1]
- [issue 2]

This task is ONLY: [original scope]"

Step 3: Draw a hard line

"Complete the original scope. Commit it.
Then and only then, decide if we tackle the other issues."

Prevention

State boundaries explicitly at the start:

"This task is ONLY about [X].
If you discover other issues, note them but don't fix them."

Symptom: The AI Is Skipping Verification Steps

What It Looks Like

The AI marks things complete without running tests. Commits without verifying the build. Moves to the next task without confirming the previous one works.

Why It Happens

  • Speed pressure (real or perceived)
  • Assumption that "it should work"
  • Verification steps weren't explicit in the task

Recovery

Step 1: Stop and verify now

"Pause. Before we continue:
- Run all tests
- Verify the build passes
- Show me the results"

Step 2: Make verification explicit

"Going forward, every task completion requires:
- Tests pass (show output)
- Build passes (show output)
- Changes committed with message

Don't mark anything complete until these are verified."

Step 3: Establish the expectation

"We never skip verification. 'It should work' is not verification.
Only 'I ran it and here's the output' counts."

The Philosophy: Skipping verification is climbing without placing protection. It feels faster until you fall.
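The "don't mark anything complete until verified" rule can also be enforced mechanically. Below is a sketch of a commit gate in a throwaway repository; `run_tests` and `run_build` are placeholders for your project's real commands (e.g. `npm test`, `cargo build`):

```shell
# Throwaway repo for the demo.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you

# Placeholders: swap in your real test and build commands.
run_tests() { sh ./tests.sh; }
run_build() { true; }

# Commit only when every check passes; otherwise commit nothing.
verify_then_commit() {
  if run_tests && run_build; then
    git add -A && git commit -qm "$1"
  else
    echo "verification failed - nothing committed" >&2
    return 1
  fi
}

# A failing check blocks the commit...
echo "exit 1" > tests.sh
verify_then_commit "broken work" || echo "commit blocked"

# ...a passing one lets it through.
echo "exit 0" > tests.sh
verify_then_commit "verified work"
git log --oneline    # only "verified work" was committed
```

A gate like this makes "it should work" impossible to commit: the only path to a checkpoint runs through the checks.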

Symptom: Work Has Drifted Over Multiple Sessions

What It Looks Like

After several sessions, the work has gradually moved away from the original intent. Each individual step seemed reasonable, but the sum is off-target.

Why It Happens

  • No project brief to anchor against
  • Context degraded over time
  • Small pivots accumulated
  • No checkpoints to catch drift

Recovery

Step 1: Return to original intent

"Let's revisit the original goal.
What were we actually trying to accomplish?"

Step 2: Audit the current state

"Now look at what we have:
- What matches the original goal?
- What's tangential?
- What's contradictory?"

Step 3: Realign or redefine

Option A: "The original goal is still right. Let's course-correct."
Option B: "We've learned the goal should change. Let's redefine it explicitly."

Prevention

For multi-session work, create a project brief:

Project: [name]
Goal: [single clear statement]
Success Criteria: [how we know it's done]
Boundaries: [what's out of scope]

Check against this brief at the start of each session.

Symptom: We're Blocked on Something External

What It Looks Like

Work can't continue because of a dependency outside your control - waiting for API access, another team's component, a decision from stakeholders.

Recovery

Step 1: Document the blocker clearly

"We're blocked on: [specific thing]
What we need: [specific requirement]
Who owns it: [person/team]
Expected resolution: [timeline if known]"

Step 2: Identify parallel work

"While blocked, what CAN we do?
- Other tasks not dependent on this blocker
- Preparation work that will be needed once unblocked
- Documentation of what we've learned so far"

Step 3: Preserve state for resumption

"Save current state so we can resume cleanly when unblocked:
- Where we stopped
- What we were about to do
- Any context that might be forgotten"

The Key Insight: Being blocked on one thing doesn't mean being blocked on everything. Find parallel paths. Document thoroughly so resumption is quick.

The Meta-Pattern

Almost every recovery follows the same structure:

  1. Recognize - Something is wrong
  2. Stop - Don't push forward blindly
  3. Diagnose - Understand what went wrong
  4. Return - Go back to solid ground (last protection point)
  5. Fix the process - Address what led to the failure
  6. Resume - Continue from verified state

The instinct to push through is usually wrong. Stopping feels slow but prevents compounding the problem.

Key Quotes

"The best recovery is prevention - but when prevention fails, the second best is recognizing quickly and returning to solid ground."

"Pushing through a known problem is not persistence. It's stubbornness. Know the difference."

"Every failure teaches you where to place better protection next time."

