AI-Driven Teaching
Teaching AI to Teach: Building Self-Guided Tours That Actually Work
The Training Problem We've All Suffered Through
You know how technical training usually goes:
Death by PowerPoint. Someone clicks through 80 slides about your new system. You watch. You nod. You understand nothing. A week later, you've forgotten everything. Everyone loses.
We learned long ago what actually works: hands-on labs. Identify the critical skills, build exercises around them, let people do the thing with guidance available when they get stuck.
The problem? This doesn't scale. Hands-on training requires experts available when learners need them. That means:
- Scheduled training sessions (inconvenient timing)
- Limited capacity (max 20 people per session)
- Geographic constraints (travel costs, time zones)
- Expertise bottleneck (your best people spending days teaching instead of building)
We've known for years that hands-on learning works. We just couldn't make it available 24/7 to everyone who needs it.
Enter AI: Scaling Expertise
What if you could embed expert guidance directly in your project — a "message in a bottle" that any AI assistant could discover and execute?
That's what we tried with our GenAI-Logic system. It's powerful but counterintuitive — it uses declarative rules instead of procedural code, a mental model shift that doesn't stick from reading docs alone. People need hands-on experience with a knowledgeable guide.
So we created tutor.md: 1,300 lines of detailed instructions for AI assistants to conduct 30-45 minute hands-on guided tours. The AI would be the tour guide, walking users through the live system interactively, answering questions, helping when things go wrong.
The vision: Hands-on training that scales infinitely. Available anytime, anywhere. Expert guidance embedded in the project itself.
What happened: It failed spectacularly.
Here's what we learned about making AI reliable — and the pattern that emerged for teaching AI to teach.
First Failure: The Interface Problem
Our initial approach was straightforward:
Show them the Admin UI. Press Enter when they've explored it.
Now explain row-level security. Wait for them to say 'ok'.
Change the item quantity to 100. [Wait for observation]
What we expected: Users would press Enter or type 'ok' to advance through the tour.
What happened: Nothing. The AI just... stopped.
The lesson: We'd assumed all chat interfaces work like traditional terminals where pressing Enter sends a signal. In modern chat UIs (like GitHub Copilot), pressing Enter just creates a newline. The AI was literally waiting for input that would never come.
The fix: Explicit typed responses:
Type 'next' when you've explored the Admin UI.
Type 'ready' when you've observed the customer count.
Type 'continue' when you understand the security filters.
Simple enough. We fixed all 10 transition points. Problem solved, right?
Second Failure: The Disappearing Act
We restarted the tour to test our fixes. The AI began confidently:
"Let's start by launching the server and exploring the Admin UI..."
It walked through the initial setup, explained the commands, showed the UI. Then it said:
"Now let's look at row-level security. I'll show you the security code..."
It opened declare_security.py, displayed the filters, and then immediately jumped to:
"Moving on to the Logic section..."
Wait. What about exploring the Admin UI? What about demonstrating the security by logging in as different users? What about explaining authentication vs authorization?
The user's response: "val - this is not correct. in fact quite poor"
We'd caught the AI skipping entire sections despite having explicit instructions to execute them.
Third Failure: Pattern Recognition
We thought maybe we weren't clear enough, so we added a warning in bold: DO NOT SKIP this section.
The AI skipped it again.
We marked another section as explicitly mandatory.
The AI skipped that too.
The pattern emerged: The AI wasn't ignoring instructions. It was interpreting them. When the narrative felt "complete" to the AI, it moved on — even when explicit content remained.
The Time Machine Moment
After 3-4 iterations of catching failures and patching the tutor, we stepped back for a meta-discussion:
"Why does the AI keep skipping sections?"
The AI analyzed its own behavior:
"I treat the tutor as a narrative to interpret, not a script to follow. When I feel a section is 'done' conceptually, I move forward even if explicit steps remain. Sections without user prompts create ambiguous boundaries where I don't know to stop."
The insight: We were trying to make the instructions clearer. But the problem wasn't clarity — it was structure. The AI needed forcing mechanisms, not just warnings.
This was our "time machine" moment — the conversation where human and AI collaborated to diagnose a deeper issue. Not "the AI is broken," but "we're using the wrong approach for how AI actually works."
The Solution: Checklists as Forcing Mechanisms
We designed a checklist-driven approach:
1. Explicit Execution Checklist (for the AI)
At the start of tutor.md, we added:
## EXECUTION CHECKLIST (AI: Read This FIRST)
Before starting the tour, call manage_todo_list to build your tracking:
- [ ] Section 1: Admin UI and API Exploration
- [ ] Start server (F5)
- [ ] Show Admin UI (Customer→Orders→Items)
- [ ] Show Swagger API
- [ ] Explain MCP server
- [ ] WAIT: User types 'next'
- [ ] Section 2A: Security Setup
- [ ] Count customers (5)
- [ ] Stop server
- [ ] Explain what add-cust is
- [ ] Run add-cust then add-auth
- [ ] Restart server
- [ ] WAIT: User types 'ready'
[... continues for all sections ...]
2. Forced Todo List Creation
The tutor now requires:
⚠️ BEFORE YOU BEGIN: Call manage_todo_list to create your checklist.
If you don't see it in your todo list, you haven't done it.
This creates visible tracking. If the AI skips something, it shows in the unchecked items.
3. Validation Prompts After Sections
After completing Section X, confirm with the user:
"I just covered: [list items]. Type 'confirmed' if I covered everything,
or tell me what I missed."
This catches omissions early before moving on.
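To make the forcing mechanism concrete, here is a minimal Python sketch of the state a checklist like this creates. It is not the actual manage_todo_list tool (we don't reproduce its API here); it only shows how per-step done flags make omissions visible and how the validation prompt falls out of that state.

from dataclasses import dataclass, field

@dataclass
class Step:
    description: str
    done: bool = False

@dataclass
class Section:
    name: str
    steps: list[Step] = field(default_factory=list)

    def unchecked(self) -> list[str]:
        """Steps never marked done: the visible omissions."""
        return [s.description for s in self.steps if not s.done]

    def validation_prompt(self) -> str:
        """Build the end-of-section confirmation shown to the user."""
        covered = ", ".join(s.description for s in self.steps if s.done)
        return (f"I just covered: {covered}. Type 'confirmed' if I covered "
                "everything, or tell me what I missed.")

# Mirror of Section 1 from the execution checklist above
section1 = Section("Admin UI and API Exploration", [
    Step("Start server (F5)"),
    Step("Show Admin UI (Customer→Orders→Items)"),
    Step("Show Swagger API"),
    Step("Explain MCP server"),
])

section1.steps[0].done = True
section1.steps[1].done = True      # suppose the AI drifted after this point

print(section1.unchecked())        # ['Show Swagger API', 'Explain MCP server']
print(section1.validation_prompt())

If the AI stops halfway through, the gap shows up as unchecked items rather than a narrative that merely feels complete.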
Why This Works: The Psychology of AI Reliability
Traditional software follows explicit instructions precisely. AI interprets context and intent — which is powerful but unpredictable for complex sequences.
What we learned:
1. Clarity ≠ Compliance
Even with "DO NOT SKIP" in bold, the AI would skip sections. Clarity helps humans; AI needs structural constraints.
2. Forcing Functions Over Warnings
- ❌ Warning: "Make sure you do X"
- ✅ Forcing: "Call manage_todo_list to build checklist. Check off X when done."
The checklist makes omissions visible, instead of leaving them as silent errors.
3. Boundaries Must Be Explicit
Sections without user prompts create ambiguous boundaries. The AI doesn't know when a "section" ends, so it uses narrative intuition — which fails.
4. Observable State Reduces Drift
When the AI's progress is visible (via todo list), both the AI and user can catch drift. "You haven't checked off the Admin UI exploration" is clearer than "I think you skipped something."
5. Meta-Cognition Through Structure
The checklist forces the AI to think about what it should do before doing it. It's not just following narrative flow — it's executing a structured plan.
The Pattern: Teaching AI to Teach
This experience revealed a general pattern for reliable AI-driven processes:
For Simple Tasks (1-3 steps):
- Clear instructions work fine
- AI interprets intent successfully
- Low risk of drift
For Complex Sequences (10+ steps, 30+ minutes):
- Use checklists with explicit tracking
- Add forcing functions (require todo list creation)
- Create validation checkpoints (confirm before moving on)
- Make progress visible (todo items checked off)
- Expect interpretation and design around it
Key Design Principles:
- Assume the AI will interpret — Don't fight it, structure around it
- Make state observable — If you can't see it, you can't track it
- Force explicit acknowledgment — Don't let the AI assume steps are done
- Validate incrementally — Catch drift early, not at the end
- Use tools for scaffolding — manage_todo_list provides structure
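As a sketch of how the "explicit boundaries" and "force explicit acknowledgment" principles play out, here is one way to model the gate at each section boundary. The section names and tokens come from the tour above; the loop itself is illustrative, not part of GenAI-Logic or the tutor.

SECTION_GATES = {
    "the Admin UI and API exploration": "next",
    "the security setup": "ready",
    "the security filters": "continue",
}

def wait_for_token(section: str, expected: str) -> None:
    """Refuse to advance until the user types the exact boundary token."""
    while True:
        reply = input(f"Type '{expected}' when you've finished {section}: ").strip().lower()
        if reply == expected:
            return
        print(f"Still waiting for '{expected}' before moving on.")

def run_tour(present_section) -> None:
    """Every section ends at an explicit gate, so there is no ambiguous boundary to drift across."""
    for section, token in SECTION_GATES.items():
        present_section(section)        # demo the UI, run the commands, explain the code
        wait_for_token(section, token)  # a forcing function, not a warning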
Business Rules for Documentation
There's an elegant parallel here: Just as our GenAI-Logic system uses declarative rules to maintain data integrity in APIs, our tutor uses maintenance guidelines to preserve instructional integrity.
API Business Rules:
Rule.constraint(validate=Customer,
as_condition=lambda row: row.balance <= row.credit_limit,
error_msg="balance must not exceed credit limit")
Tutor Maintenance Guidelines:
When Updating This Tutor:
✅ Add new sections WITH user prompts at end
✅ Update EXECUTION CHECKLIST to match changes
✅ Test with fresh AI session to catch skipped sections
Both are declarative specifications of how to maintain integrity:
- API rules: "What must be true about the data"
- Tutor guidelines: "What must be true about the instruction structure"
Both prevent errors that would be easy to make without them:
- Without API rules: Inconsistent data, violated business logic
- Without tutor guidelines: Skipped sections, broken teaching flow
The tutor's guidelines section is effectively business rules for documentation — a pattern that could apply to any complex instructional content meant for AI consumption.
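If you take "business rules for documentation" literally, you can even lint a tutor the way rules check data. Here is a hypothetical sketch: it assumes sections are marked with "## Section" headings and boundaries with lines starting with Type '...', conventions borrowed from the excerpts in this article rather than from any published GenAI-Logic tooling.

import re
from pathlib import Path

HEADING_RE = re.compile(r"^## Section .+$", re.MULTILINE)
PROMPT_RE = re.compile(r"^Type '\w+'", re.MULTILINE)

def lint_tutor(path: str) -> list[str]:
    """Business rule for documentation: every section must end with an explicit user prompt."""
    text = Path(path).read_text(encoding="utf-8")
    headings = list(HEADING_RE.finditer(text))
    problems = []
    for i, heading in enumerate(headings):
        start = heading.end()
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        if not PROMPT_RE.search(text[start:end]):
            problems.append(f"{heading.group(0)}: no user prompt, ambiguous boundary")
    return problems

if __name__ == "__main__":
    for problem in lint_tutor("tutor.md"):
        print(problem)

Run over a tutor that follows these conventions, it flags exactly the failure we kept hitting: a section with no explicit boundary.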
The Message in a Bottle Pattern
Our tutor.md approach is novel in several ways:
1. Self-Teaching Projects
Instead of external docs, embed AI instructions directly in the project.
When users ask "Guide me through this project," any AI can discover and execute the tutor.
2. Progressive Disclosure
We use add-cust commands to incrementally add complexity during the tour:
genai-logic add-cust --using=security # Adds security features
genai-logic add-cust --using=discount # Adds schema changes
genai-logic add-cust --using=B2B # Adds integration
Each step builds on the previous, teaching patterns rather than showing everything at once.
3. Provoking Questions
Instead of just explaining, the tutor deliberately surfaces misconceptions:
After showing the rules work, ask:
"How did the system know to execute in the right order
(Item → Order → Customer)?"
[Let them think procedurally]
Then explain: "It uses dependency discovery - no ordering required."
This addresses mental models explicitly, which passive docs can't do.
4. Teaching Patterns, Not Features
The tour emphasizes the why behind declarative rules:
- Reuse: Rules apply across insert/update/delete automatically
- Ordering: Dependency graphs, not manual sequencing
- Conciseness: 5 rules vs 200+ lines of procedural code
- Debugging: Transparent execution logs
Users leave understanding how to think about declarative systems, not just what buttons to click.
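For readers who haven't seen the system, this is roughly what those five rules look like. The sketch follows the check-credit example from the GenAI-Logic samples, written from memory; the imports and attribute names may differ slightly from the basic_demo project.

from logic_bank.logic_bank import Rule
from database import models

# Check credit: the whole use case in five declarative rules.
# Execution order is derived from dependencies, not from the order written here.
Rule.constraint(validate=models.Customer,
                as_condition=lambda row: row.balance <= row.credit_limit,
                error_msg="balance must not exceed credit limit")
Rule.sum(derive=models.Customer.balance,
         as_sum_of=models.Order.amount_total,
         where=lambda row: row.date_shipped is None)   # only unshipped orders count
Rule.sum(derive=models.Order.amount_total,
         as_sum_of=models.Item.amount)
Rule.formula(derive=models.Item.amount,
             as_expression=lambda row: row.quantity * row.unit_price)
Rule.copy(derive=models.Item.unit_price,
          from_parent=models.Product.unit_price)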
Broader Implications: AI as Teaching Medium
This experience suggests a new paradigm for technical education:
Traditional Approach:
- Write documentation → Users read → Users experiment → Get stuck → Ask questions
AI-Guided Approach:
- Embed tutor instructions → AI conducts tour → User experiences live → Questions answered in context
Key advantages:
- Personalized Pacing — AI adapts to user questions without derailing the sequence
- Active Learning — Users do things, not just read about them
- Context-Aware Help — AI sees the live system state when answering questions
- Scalable Expertise — One expert writes the tutor; infinite AI instances deliver it
- Living Documentation — Update tutor.md and all future tours improve
Key challenges:
- Reliability — AI needs forcing mechanisms for complex sequences
- Maintenance — Tutors must stay synchronized with code changes
- Failure Modes — What happens when the AI gets stuck or confused?
- Trust — Users must trust the AI is teaching correctly
What We Built
The final tutor.md is 1,343 lines covering:
- 5 major sections (Create & Run, Security, Logic, Python Integration, B2B)
- 15 checkpoints with explicit user prompts
- Multiple teaching moments provoking procedural thinking then correcting it
- Code examples showing actual implementation patterns
- Metrics comparison (5 declarative rules vs 220+ lines of procedural code)
- Execution checklist forcing AI to track progress
It works. When an AI assistant reads tutor.md and follows the checklist approach, it reliably conducts a 30-45 minute hands-on tour that teaches declarative thinking.
Try It Yourself
The pattern is open source and adaptable:
1. Create TUTOR.md in your project:
# AI Guided Tour: [Your Project Name]
## EXECUTION CHECKLIST (AI: Read This FIRST)
Before starting, call manage_todo_list to build your tracking:
- [ ] Section 1: Setup
- [ ] Step A
- [ ] Step B
- [ ] WAIT: User types 'next'
[... your tour structure ...]
## Section 1: Setup
Walk the user through...
Type 'next' when ready to continue.
2. Design for forcing functions:
- Require todo list creation at start
- Add user prompts at boundaries
- Include validation checkpoints
- Make progress observable
3. Test with real users:
- Watch for sections the AI skips
- Note where users get confused
- Iterate on the tutor structure
- Add warnings where needed
4. Embrace the limitations:
- AI will interpret — design around it
- Complex sequences need scaffolding
- Validation catches drift early
- Structure > Clarity for reliability
The Meta-Lesson: Human-AI Collaboration
The most valuable part wasn't the final tutor — it was the process of discovering how to make AI reliable.
Our "time machine" conversation:
Human: "Why did you skip that section?"
AI: "I interpreted the narrative as complete. I didn't see a boundary."
Human: "Can warnings help?"
AI: "Not really. I need structural constraints, not just emphasis."
Human: "What kind of constraints?"
AI: "Forcing functions. Make me track progress explicitly."
This collaborative root-cause analysis — human noticing patterns, AI explaining its own behavior, together designing solutions — is the real pattern here.
Teaching AI to teach required:
- Human observation — "The AI keeps skipping sections"
- AI introspection — "I treat this as narrative, not script"
- Collaborative design — "What structure would force reliability?"
- Iterative testing — Trying solutions, catching failures, adjusting
- Honest assessment — AI admitting "I'm not confident this will work"
This is how we'll work with AI going forward: Not just using AI as a tool, but collaborating with AI to understand its limitations and design around them.
Conclusion: A New Kind of Documentation
tutor.md represents a different approach to technical education:
- Not passive docs — Active guided experience
- Not video tutorials — Interactive, personalized pacing
- Not human-dependent — Scales infinitely via AI
- Not fragile — Checklist-driven reliability
The pattern is generalizable. Any complex project can embed AI tutor instructions. Any AI assistant can execute them. Any user can get a personalized, hands-on guided tour.
But the deeper lesson is about working with AI effectively:
- Expect interpretation, not just execution
- Use structure, not just clarity
- Make progress observable
- Validate incrementally
- Collaborate to understand limitations
AI can teach — but it needs the right scaffolding. Build that scaffolding well, and you get something remarkable: Projects that teach themselves.
Resources
- Original testing article: Teaching AI to Program Itself
- GenAI-Logic project: github.com/ApiLogicServer/ApiLogicServer-dev
- Example tutor.md: See basic_demo/tutor.md in the repository
- Pattern documentation: This article :)
About the Author:
Val Huber is the creator of GenAI-Logic / API Logic Server, exploring how AI can transform software development through declarative patterns. This is the second in a series on "learning to leverage AI" — practical lessons from building AI-integrated development tools.
Previous article: Teaching AI to Program Itself: How We Solved a 30-Year Testing Problem in One Week
Thanks to the GitHub Copilot team for the chat interface that made these experiments possible, and to the AI assistant that helped write this article about teaching AI to teach. Meta enough for you?