
AI-Driven Teaching

Teaching AI to Teach: Building Self-Guided Tours That Actually Work

The Training Problem We've All Suffered Through

You know how technical training usually goes:

Death by PowerPoint. Someone clicks through 80 slides about your new system. You watch. You nod. You understand nothing. A week later, you've forgotten everything. Everyone loses.

We learned long ago what actually works: hands-on labs. Identify the critical skills, build exercises around them, let people do the thing with guidance available when they get stuck.

The problem? This doesn't scale. Hands-on training requires experts available when learners need them. That means:

  • Scheduled training sessions (inconvenient timing)
  • Limited capacity (max 20 people per session)
  • Geographic constraints (travel costs, time zones)
  • Expertise bottleneck (your best people spending days teaching instead of building)

We've known for years that hands-on learning works. We just couldn't make it available 24/7 to everyone who needs it.


Enter AI: Scaling Expertise

What if you could embed expert guidance directly in your project — a "message in a bottle" that any AI assistant could discover and execute?

That's what we tried with our GenAI-Logic system. It's powerful but counterintuitive — it uses declarative rules instead of procedural code, a mental model shift that doesn't stick from reading docs alone. People need hands-on experience with a knowledgeable guide.

So we created tutor.md: 1,300 lines of detailed instructions for AI assistants to conduct 30-45 minute hands-on guided tours. The AI would be the tour guide, walking users through the live system interactively, answering questions, helping when things go wrong.

The vision: Hands-on training that scales infinitely. Available anytime, anywhere. Expert guidance embedded in the project itself.

What happened: It failed spectacularly.

Here's what we learned about making AI reliable — and the pattern that emerged for teaching AI to teach.


First Failure: The Interface Problem

Our initial approach was straightforward:

Show them the Admin UI. Press Enter when they've explored it.

Now explain row-level security. Wait for them to say 'ok'.

Change the item quantity to 100. [Wait for observation]

What we expected: Users would press Enter or type 'ok' to advance through the tour.

What happened: Nothing. The AI just... stopped.

The lesson: We'd assumed all chat interfaces work like traditional terminals where pressing Enter sends a signal. In modern chat UIs (like GitHub Copilot), pressing Enter just creates a newline. The AI was literally waiting for input that would never come.

The fix: Explicit typed responses:

Type 'next' when you've explored the Admin UI.

Type 'ready' when you've observed the customer count.

Type 'continue' when you understand the security filters.

Simple enough. We fixed all 10 transition points. Problem solved, right?


Second Failure: The Disappearing Act

We restarted the tour to test our fixes. The AI began confidently:

"Let's start by launching the server and exploring the Admin UI..."

It walked through the initial setup, explained the commands, showed the UI. Then it said:

"Now let's look at row-level security. I'll show you the security code..."

It opened declare_security.py, displayed the filters, and then immediately jumped to:

"Moving on to the Logic section..."

Wait. What about exploring the Admin UI? What about demonstrating the security by logging in as different users? What about explaining authentication vs authorization?

The user's response: "val - this is not correct. in fact quite poor"

We'd caught the AI skipping entire sections despite having explicit instructions to execute them.


Third Failure: Pattern Recognition

We thought maybe we weren't clear enough. We added warnings:

⚠️ CRITICAL: DO NOT SKIP THIS SECTION

YOU MUST walk through the Admin UI before proceeding.

The AI skipped it again.

We made another section mandatory:

⚠️ CRITICAL SECTION - DO NOT SKIP

This is the Rule Patterns teaching moment...

The AI skipped that too.

The pattern emerged: The AI wasn't ignoring instructions. It was interpreting them. When the narrative felt "complete" to the AI, it moved on — even when explicit content remained.


The Time Machine Moment

After 3-4 iterations of catching failures and patching the tutor, we stepped back for a meta-discussion:

"Why does the AI keep skipping sections?"

The AI analyzed its own behavior:

"I treat the tutor as a narrative to interpret, not a script to follow. When I feel a section is 'done' conceptually, I move forward even if explicit steps remain. Sections without user prompts create ambiguous boundaries where I don't know to stop."

The insight: We were trying to make the instructions clearer. But the problem wasn't clarity — it was structure. The AI needed forcing mechanisms, not just warnings.

This was our "time machine" moment — the conversation where human and AI collaborated to diagnose a deeper issue. Not "the AI is broken," but "we're using the wrong approach for how AI actually works."


The Solution: Checklists as Forcing Mechanisms

We designed a checklist-driven approach:

1. Explicit Execution Checklist (for the AI)

At the start of tutor.md, we added:

## EXECUTION CHECKLIST (AI: Read This FIRST)

Before starting the tour, call manage_todo_list to build your tracking:

- [ ] Section 1: Admin UI and API Exploration
  - [ ] Start server (F5)
  - [ ] Show Admin UI (Customer→Orders→Items)
  - [ ] Show Swagger API
  - [ ] Explain MCP server
  - [ ] WAIT: User types 'next'

- [ ] Section 2A: Security Setup
  - [ ] Count customers (5)
  - [ ] Stop server
  - [ ] Explain what add-cust is
  - [ ] Run add-cust then add-auth
  - [ ] Restart server
  - [ ] WAIT: User types 'ready'

[... continues for all sections ...]

2. Forced Todo List Creation

The tutor now requires:

⚠️ BEFORE YOU BEGIN: Call manage_todo_list to create your checklist.
If you don't see it in your todo list, you haven't done it.

This creates visible tracking. If the AI skips something, it shows in the unchecked items.

3. Validation Prompts After Sections

After completing Section X, confirm with the user:
"I just covered: [list items]. Type 'confirmed' if I covered everything, 
or tell me what I missed."

This catches omissions early, before the tour moves on.


Why This Works: The Psychology of AI Reliability

Traditional software follows explicit instructions precisely. AI interprets context and intent — which is powerful but unpredictable for complex sequences.

What we learned:

1. Clarity ≠ Compliance

Even with "DO NOT SKIP" in bold, the AI would skip sections. Clarity helps humans; AI needs structural constraints.

2. Forcing Functions Over Warnings

  • ❌ Warning: "Make sure you do X"
  • ✅ Forcing: "Call manage_todo_list to build checklist. Check off X when done."

The checklist makes omissions visible, not merely incorrect.

3. Boundaries Must Be Explicit

Sections without user prompts create ambiguous boundaries. The AI doesn't know when a "section" ends, so it uses narrative intuition — which fails.

4. Observable State Reduces Drift

When the AI's progress is visible (via todo list), both the AI and user can catch drift. "You haven't checked off the Admin UI exploration" is clearer than "I think you skipped something."

5. Meta-Cognition Through Structure

The checklist forces the AI to think about what it should do before doing it. It's not just following narrative flow — it's executing a structured plan.


The Pattern: Teaching AI to Teach

This experience revealed a general pattern for reliable AI-driven processes:

For Simple Tasks (1-3 steps):

  • Clear instructions work fine
  • AI interprets intent successfully
  • Low risk of drift

For Complex Sequences (10+ steps, 30+ minutes):

  • Use checklists with explicit tracking
  • Add forcing functions (require todo list creation)
  • Create validation checkpoints (confirm before moving on)
  • Make progress visible (todo items checked off)
  • Expect interpretation and design around it

Key Design Principles:

  1. Assume the AI will interpret — Don't fight it, structure around it
  2. Make state observable — If you can't see it, you can't track it
  3. Force explicit acknowledgment — Don't let the AI assume steps are done
  4. Validate incrementally — Catch drift early, not at the end
  5. Use tools for scaffolding: manage_todo_list provides the structure (see the sketch below)
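
For illustration, here is a minimal Python sketch of the checklist-as-forcing-function idea. It is not the manage_todo_list tool itself (that is a tool in the AI assistant's own environment); it only shows the property we relied on: progress is explicit, observable state, so anything skipped surfaces as an unchecked item rather than a silent omission.

# Hypothetical sketch of checklist tracking (not the manage_todo_list API):
# steps must be explicitly checked off, and anything left unchecked is
# visible at the section boundary.
from dataclasses import dataclass, field

@dataclass
class Section:
    name: str
    steps: list[str]
    done: set[str] = field(default_factory=set)

    def check_off(self, step: str) -> None:
        if step not in self.steps:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def unchecked(self) -> list[str]:
        return [s for s in self.steps if s not in self.done]

section1 = Section("Admin UI and API Exploration",
                   ["Start server", "Show Admin UI", "Show Swagger API",
                    "Explain MCP server", "WAIT: user types 'next'"])
section1.check_off("Start server")
section1.check_off("Show Swagger API")

# Validation checkpoint: drift is visible, not silent
print("Still unchecked:", section1.unchecked())
# -> ['Show Admin UI', 'Explain MCP server', "WAIT: user types 'next'"]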

Business Rules for Documentation

There's an elegant parallel here: Just as our GenAI-Logic system uses declarative rules to maintain data integrity in APIs, our tutor uses maintenance guidelines to preserve instructional integrity.

API Business Rules:

Rule.constraint(validate=Customer,
    as_condition=lambda row: row.balance <= row.credit_limit,
    error_msg="balance must not exceed credit limit")
Ensures data consistency automatically on every transaction

Tutor Maintenance Guidelines:

When Updating This Tutor:
✅ Add new sections WITH user prompts at end
✅ Update EXECUTION CHECKLIST to match changes
✅ Test with fresh AI session to catch skipped sections
Ensures instructional consistency when content changes

Both are declarative specifications of how to maintain integrity:

  • API rules: "What must be true about the data"
  • Tutor guidelines: "What must be true about the instruction structure"

Both prevent errors that would be easy to make without them:

  • Without API rules: Inconsistent data, violated business logic
  • Without tutor guidelines: Skipped sections, broken teaching flow

The tutor's guidelines section is effectively business rules for documentation — a pattern that could apply to any complex instructional content meant for AI consumption.
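
To make "business rules for documentation" concrete, here is a minimal sketch of what such a check could look like: a hypothetical lint script (not part of GenAI-Logic) that enforces the two guidelines above: every section ends with an explicit typed-response prompt, and every section appears in the EXECUTION CHECKLIST. The heading and prompt patterns are assumptions about how the tutor is written.

# Hypothetical "tutor lint" sketch: declarative checks on tutor.md structure.
import re
from pathlib import Path

def check_tutor(path: str = "tutor.md") -> list[str]:
    text = Path(path).read_text()
    problems = []
    # Assume sections start with "## Section ..." and the EXECUTION CHECKLIST
    # appears before the first section.
    parts = re.split(r"\n(?=## Section )", text)
    checklist, sections = parts[0], parts[1:]
    for section in sections:
        title = section.splitlines()[0].removeprefix("## ")
        # Guideline 1: every section ends with an explicit user prompt
        if not re.search(r"Type '\w+'", section):
            problems.append(f"{title}: no user prompt (ambiguous boundary)")
        # Guideline 2: every section is tracked in the EXECUTION CHECKLIST
        if title not in checklist:
            problems.append(f"{title}: missing from EXECUTION CHECKLIST")
    return problems

if __name__ == "__main__":
    for problem in check_tutor():
        print("VIOLATION:", problem)

Run something like this after every edit, and a missing prompt or an out-of-date checklist fails loudly, the same way a violated constraint fails a transaction.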


The Message in a Bottle Pattern

Our tutor.md approach is novel in several ways:

1. Self-Teaching Projects

Instead of external docs, embed AI instructions directly in projects:

my-project/
  README.md          # For humans
  TUTOR.md           # For AI assistants
  src/
  tests/

When users ask "Guide me through this project," any AI can discover and execute the tutor.

2. Progressive Disclosure

We use add-cust commands to incrementally add complexity during the tour:

genai-logic add-cust --using=security  # Adds security features
genai-logic add-cust --using=discount  # Adds schema changes
genai-logic add-cust --using=B2B       # Adds integration

Each step builds on the previous, teaching patterns rather than showing everything at once.

3. Provoking Questions

Instead of just explaining, the tutor deliberately surfaces misconceptions:

After showing the rules work, ask:
"How did the system know to execute in the right order 
(Item → Order → Customer)?"

[Let them think procedurally]

Then explain: "It uses dependency discovery - no ordering required."

This addresses mental models explicitly, which passive docs can't do.

4. Teaching Patterns, Not Features

The tour emphasizes the why behind declarative rules:

  • Reuse: Rules apply across insert/update/delete automatically
  • Ordering: Dependency graphs, not manual sequencing
  • Conciseness: 5 rules vs 200+ lines of procedural code
  • Debugging: Transparent execution logs

Users leave understanding how to think about declarative systems, not just what buttons to click.
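
To make that concrete, here is roughly what the five rules look like, written in the same style as the constraint shown earlier. The attribute and model names follow the standard GenAI-Logic demo; treat this as a sketch of the pattern, not the exact tutorial source.

# A sketch of the five declarative rules (names per the standard demo).
# Customer, Order, Item, Product are the project's generated SQLAlchemy models.
from logic_bank.logic_bank import Rule

Rule.constraint(validate=Customer,
    as_condition=lambda row: row.balance <= row.credit_limit,
    error_msg="balance must not exceed credit limit")

Rule.sum(derive=Customer.balance, as_sum_of=Order.amount_total,
    where=lambda row: row.date_shipped is None)   # unshipped orders only

Rule.sum(derive=Order.amount_total, as_sum_of=Item.amount)

Rule.formula(derive=Item.amount,
    as_expression=lambda row: row.quantity * row.unit_price)

Rule.copy(derive=Item.unit_price, from_parent=Product.unit_price)

# Declaration order is irrelevant: change Item.quantity and dependency
# discovery re-fires Item.amount -> Order.amount_total -> Customer.balance,
# on insert, update, and delete alike.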


Broader Implications: AI as Teaching Medium

This experience suggests a new paradigm for technical education:

Traditional Approach:

  • Write documentation → Users read → Users experiment → Get stuck → Ask questions

AI-Guided Approach:

  • Embed tutor instructions → AI conducts tour → User experiences live → Questions answered in context

Key advantages:

  1. Personalized Pacing — AI adapts to user questions without derailing the sequence
  2. Active Learning — Users do things, not just read about them
  3. Context-Aware Help — AI sees the live system state when answering questions
  4. Scalable Expertise — One expert writes the tutor; infinite AI instances deliver it
  5. Living Documentation — Update tutor.md and all future tours improve

Key challenges:

  1. Reliability — AI needs forcing mechanisms for complex sequences
  2. Maintenance — Tutors must stay synchronized with code changes
  3. Failure Modes — What happens when the AI gets stuck or confused?
  4. Trust — Users must trust the AI is teaching correctly

What We Built

The final tutor.md is 1,343 lines covering:

  • 5 major sections (Create & Run, Security, Logic, Python Integration, B2B)
  • 15 checkpoints with explicit user prompts
  • Multiple teaching moments provoking procedural thinking then correcting it
  • Code examples showing actual implementation patterns
  • Metrics comparison (5 declarative rules vs 220+ lines of procedural code)
  • Execution checklist forcing AI to track progress

It works. When an AI assistant reads tutor.md and follows the checklist approach, it reliably conducts a 30-45 minute hands-on tour that teaches declarative thinking.


Try It Yourself

The pattern is open source and adaptable:

1. Create TUTOR.md in your project:

# AI Guided Tour: [Your Project Name]

## EXECUTION CHECKLIST (AI: Read This FIRST)

Before starting, call manage_todo_list to build your tracking:

- [ ] Section 1: Setup
  - [ ] Step A
  - [ ] Step B
  - [ ] WAIT: User types 'next'

[... your tour structure ...]

## Section 1: Setup

Walk the user through...

Type 'next' when ready to continue.

2. Design for forcing functions:

  • Require todo list creation at start
  • Add user prompts at boundaries
  • Include validation checkpoints
  • Make progress observable

3. Test with real users:

  • Watch for sections the AI skips
  • Note where users get confused
  • Iterate on the tutor structure
  • Add warnings where needed

4. Embrace the limitations:

  • AI will interpret — design around it
  • Complex sequences need scaffolding
  • Validation catches drift early
  • Structure > Clarity for reliability

The Meta-Lesson: Human-AI Collaboration

The most valuable part wasn't the final tutor — it was the process of discovering how to make AI reliable.

Our "time machine" conversation:

Human: "Why did you skip that section?"
AI: "I interpreted the narrative as complete. I didn't see a boundary."
Human: "Can warnings help?"
AI: "Not really. I need structural constraints, not just emphasis."
Human: "What kind of constraints?"
AI: "Forcing functions. Make me track progress explicitly."

This collaborative root-cause analysis — human noticing patterns, AI explaining its own behavior, together designing solutions — is the real pattern here.

Teaching AI to teach required:

  1. Human observation — "The AI keeps skipping sections"
  2. AI introspection — "I treat this as narrative, not script"
  3. Collaborative design — "What structure would force reliability?"
  4. Iterative testing — Trying solutions, catching failures, adjusting
  5. Honest assessment — AI admitting "I'm not confident this will work"

This is how we'll work with AI going forward: Not just using AI as a tool, but collaborating with AI to understand its limitations and design around them.


Conclusion: A New Kind of Documentation

tutor.md represents a different approach to technical education:

  • Not passive docs — Active guided experience
  • Not video tutorials — Interactive, personalized pacing
  • Not human-dependent — Scales infinitely via AI
  • Not fragile — Checklist-driven reliability

The pattern is generalizable. Any complex project can embed AI tutor instructions. Any AI assistant can execute them. Any user can get a personalized, hands-on guided tour.

But the deeper lesson is about working with AI effectively:

  • Expect interpretation, not just execution
  • Use structure, not just clarity
  • Make progress observable
  • Validate incrementally
  • Collaborate to understand limitations

AI can teach — but it needs the right scaffolding. Build that scaffolding well, and you get something remarkable: Projects that teach themselves.


Resources

  • Original testing article: Teaching AI to Program Itself
  • GenAI-Logic project: github.com/ApiLogicServer/ApiLogicServer-dev
  • Example tutor.md: See basic_demo/tutor.md in the repository
  • Pattern documentation: This article :)

About the Author:

Val Huber is the creator of GenAI-Logic / API Logic Server, exploring how AI can transform software development through declarative patterns. This is the second in a series on "learning to leverage AI" — practical lessons from building AI-integrated development tools.

Previous article: Teaching AI to Program Itself: How We Solved a 30-Year Testing Problem in One Week


Thanks to the GitHub Copilot team for the chat interface that made these experiments possible, and to the AI assistant that helped write this article about teaching AI to teach. Meta enough for you?