Quality Booster

The 90-Day Plan: Introducing AI Testing in Your Team

An honest roadmap for test managers who want to introduce AI-powered testing — with ISTQB concepts, realistic expectations, and the mistakes I've made.

Andi

Test Manager

Before we start, an honest number: according to an MIT study on the “GenAI Divide”, 95% of all GenAI pilot projects deliver no measurable business value. Not because the technology is bad — but because teams start without a clear plan, want too much at once, and give up too early.

This article is my attempt to do better. Not a polished framework, but a roadmap based on real experience — my own and from the industry. I deliberately use concepts from the ISTQB CT-AI syllabus (Certified Tester AI Testing), because they provide a shared language that goes beyond tool hype.

The roadmap at a glance, from analysis to continuous learning: Understand (week 1–2) → Experiment (week 3–6) → Expand (week 7–10) → Anchor (week 11–12) → Learn (month 4+).

Before You Start: Two Things You Need to Understand

1. AI Amplifies What’s Already There

This is the most important insight across all experience reports: AI amplifies existing processes — it doesn’t fix broken ones. A team with solid testing fundamentals becomes more efficient with AI. A team with weak processes gets weak tests faster. The ISTQB calls this the principle of context-dependent testing — what works in one context doesn’t work in another.

So ask yourself first: Are your testing processes stable enough that you want to accelerate them? Or do you need to fix them first?

2. The Trust Curve

Teams go through four predictable phases when adopting AI:

  1. Enthusiasm — “This will change everything!”
  2. Experimentation — Quick wins seem possible
  3. Disillusionment — AI produces nonsense, maintenance effort grows
  4. Selective use — AI for specific, well-defined tasks

If you know this curve, phase 3 won’t surprise you. Most failed adoptions stall exactly there — because nobody anticipated the disillusionment.

The trust curve: if you know phase 3 is coming, it won’t surprise you. Most teams give up there.

Phase 1: Understand (Week 1–2)

Goal: Figure out where AI actually makes sense in your context — and where it doesn’t

What You Need

  • A small core team (2–3 people: test manager, senior tester, a developer)
  • Management backing
  • Honesty about your current maturity level

Weakness Analysis

Analyse your current testing process. The ISTQB distinguishes between testing of AI systems (CT-AI) and testing with AI tools (CT-GenAI). For this plan, we’re talking about the latter: where can AI tools support your existing work?

| Process step | Typical problem | AI potential | Realistic benefit |
|---|---|---|---|
| Test case creation | High manual effort | Generation from requirements | Good for boilerplate, weak on domain logic |
| Test data | Anonymisation is tedious | Synthetic data generation | One of the strongest use cases (+80% adoption in one year) |
| Selector maintenance | Brittle XPath/CSS locators | Self-healing locators | Works well, saves real time |
| Defect analysis | Triage takes too long | Automatic categorisation | Helpful as a first assessment, not a replacement |
| Documentation | Retroactive, incomplete | Live documentation | Useful, but review needed |

Stakeholder Mapping

A pattern I keep seeing: management and frontline teams have completely different expectations. According to the World Quality Report 2025/26, 39% of leaders say AI has “revolutionised” their processes — but only 19% of engineers agree. This gap is a reliable early indicator of trouble.

So deliberately identify:

  • Champions — Who’s curious and willing to experiment?
  • Experienced sceptics — Who has legitimate concerns? (Take these people seriously. Engineers with 10+ years of experience tend to be the most cautious — and they’re often right.)
  • Blockers — Who might actively work against the project?

By the End of Phase 1, You Have:

  • 1–2 concrete use cases to test (no more!)
  • An honest assessment: are our fundamentals solid enough?
  • A stakeholder map with communication plan

Phase 2: Experiment (Week 3–6)

Goal: Prove with a single use case whether AI works in your context

The Assertion Problem

Before you start, a warning: the most dangerous anti-pattern in AI-generated testing is what I call the assertion problem. AI generates tests that run without errors but validate nothing meaningful. Tests turn green, bugs ship to production, and you’ve created false confidence. The ISTQB describes this as automation bias — the tendency to over-trust AI-generated results.

That’s why the most important rule during the pilot is: every AI-generated test is reviewed by a human. According to ContextQA, 76% of companies using AI in testing have established human-in-the-loop processes — not because they don’t trust the technology, but because they’ve learned from experience.
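To make the assertion problem concrete, here is a minimal, purely illustrative sketch: a hypothetical `apply_discount` function with two tests. The first is the kind of test that typically comes back from a generator — it runs green but would pass for almost any implementation. The second actually pins the behaviour down.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: apply a percentage discount."""
    return round(price * (1 - percent / 100), 2)

# Anti-pattern: a "green" test with no meaningful assertion.
# It executes the code, but says nothing about correctness.
def test_discount_runs():
    result = apply_discount(100.0, 20)
    assert result is not None          # always true here
    assert isinstance(result, float)   # type, not behaviour

# Meaningful: expected values are pinned down, including edge cases.
def test_discount_values():
    assert apply_discount(100.0, 20) == 80.0    # standard case
    assert apply_discount(100.0, 0) == 100.0    # no discount
    assert apply_discount(100.0, 100) == 0.0    # full discount
```

In a review, the question to ask of every generated test is the one the second version answers: which wrong implementation would this test actually catch?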

Choose One Use Case

Pick a single use case for the pilot. Ideal criteria:

  • Not business-critical (no damage if errors occur)
  • Repeatable (at least 10 similar cases)
  • Measurable (quantifiable in time)
  • No complex domain knowledge required

From my experience, these use cases work best for getting started:

  • Test data generation — structurally valid, semantically unusual inputs
  • Boilerplate code — setup/teardown, parameterised variations
  • Test documentation — generating descriptions from existing tests
  • Root cause analysis — analysing stack traces, narrowing down failure causes

Less suitable for getting started: E2E test generation (too complex, too much domain knowledge required).
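The idea behind the test data use case can be sketched without any AI tooling at all: produce inputs that are structurally valid (they pass the format check) but semantically unusual. The `is_valid_name` validator and the generated values below are invented for illustration.

```python
import random
import string

def is_valid_name(value: str) -> bool:
    """Hypothetical format check: 1-50 characters, no digits."""
    return 0 < len(value) <= 50 and not any(c.isdigit() for c in value)

def unusual_names(seed: int = 42) -> list[str]:
    """Structurally valid but semantically odd names a human rarely types."""
    rng = random.Random(seed)
    return [
        "A",                               # minimal length
        "X" * 50,                          # maximal length
        "O'Brien-Løvström",                # apostrophe, hyphen, non-ASCII
        " ".join("a" for _ in range(10)),  # many short tokens
        "".join(rng.choice(string.ascii_letters) for _ in range(50)),
    ]

# Every generated value must still pass the format check; otherwise
# you are testing the validator, not the business logic behind it.
assert all(is_valid_name(n) for n in unusual_names())
```

An AI tool is good at producing this kind of list at scale; the human job is the final assertion, checking that the generated data actually fits the format contract.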

Pair Testing with AI

Establish the working model “tester + AI”. Not “AI tests autonomously”, but an iterative dialogue:

The pair-testing loop between tester and AI tool:

  1. Prompt with context — send the test case description plus context
  2. Suggestion — the AI generates test steps, data, and code
  3. Review & adjustment — the tester reviews, corrects, and optimises
  4. Final output — the revised, final test case
  5. Team validation — approval, feedback, learning

The pair-testing model combines human expertise with AI efficiency.

What to Measure

Don’t just measure speed. The ISTQB defines various ML performance metrics in the CT-AI syllabus — for our scenario, I translate that into pragmatic metrics:

| What you measure | Why |
|---|---|
| Time per test case (before/after) | Efficiency — but watch for hidden review burden |
| Proportion of meaningful assertions | Quality — the most important value |
| Where the AI failed | Learning — document every failure |
| Review effort per AI output | Honesty — if review costs 80% of manual time, the gain is only 20% |

Important: Don’t invent target values. Measure what actually happens. If the efficiency gain is 10% rather than the hoped-for 50%, that’s an honest insight — not a failure.
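The "honesty" metric from the table is worth writing down explicitly, because teams routinely report the generation speedup and forget the review cost. A minimal sketch (the function name and minute values are my own examples):

```python
def net_efficiency_gain(manual_minutes: float,
                        ai_minutes: float,
                        review_minutes: float) -> float:
    """Net time saving as a fraction of the manual baseline.

    Counts generation AND review time; a negative value means the
    AI-assisted workflow is slower overall than writing it by hand.
    """
    total_ai = ai_minutes + review_minutes
    return (manual_minutes - total_ai) / manual_minutes

# Example from the table above: review costs 80% of the manual time.
# Even with near-zero generation time, the net gain is only 20%.
print(net_efficiency_gain(manual_minutes=30, ai_minutes=0, review_minutes=24))
```

Tracking this one number per pilot week keeps the conversation honest when the raw "time per test case" figure looks spectacular.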

Anti-Patterns in This Phase

  • “Let’s test everything with AI” — The pilot must stay focused. One use case, not five.
  • Happy path overfitting — AI focuses on obvious scenarios. Edge cases, error states, and race conditions go untested — exactly where the real bugs live.
  • No manual validation — Every AI output gets reviewed. No exceptions.

By the End of Phase 2, You Have:

  • An honestly measured efficiency gain (or the insight that there isn’t one)
  • Documented lessons learned — especially: what did NOT work
  • A go/no-go decision for phase 3

Phase 3: Expand (Week 7–10)

Goal: Extend the pilot to additional areas — if it worked

Knowledge Transfer

The pilot team trains the wider group. Three workshops have proven effective:

  1. Fundamentals & tool handling (4h) — What can the tool do, what can’t it? Expectation management matters more than feature demos.
  2. Prompt engineering for testers (4h) — Providing context, formulating constraints, iteratively improving results. The ISTQB CT-GenAI syllabus treats prompt engineering as its own chapter — rightly so, it’s a new core competency.
  3. Review & quality assurance (4h) — How do you recognise whether an AI-generated test is meaningful? How do you spot missing or weak assertions?

Prioritise Additional Use Cases

Rank potential use cases by risk and benefit. The ISTQB principle of risk-based testing applies here too:

| Use case | Risk | Benefit | Recommendation |
|---|---|---|---|
| Test data generation | Low | High | Start immediately |
| Unit test scaffolding | Low | Medium | Good second step |
| API test generation | Medium | High | With review process |
| E2E test generation | High | High | Only after experience |
| Visual regression testing | Medium | Medium | If the use case exists |

Define Governance

At this point, you need clear rules:

  • Who may commit AI-generated tests without review? (Answer: nobody)
  • What data may be sent to external AI services? (Data residency, compliance)
  • How are AI-assisted tests marked in test documentation?
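For the third governance question, one lightweight, framework-agnostic option is to record the metadata directly on the test function. The decorator below is my own sketch, not a standard API; with pytest you would register a custom marker for the same purpose.

```python
def ai_assisted(tool: str, reviewed_by: str):
    """Mark a test as AI-generated and record the human reviewer."""
    def decorator(func):
        func.ai_assisted = {"tool": tool, "reviewed_by": reviewed_by}
        return func
    return decorator

# Hypothetical example: tool name and reviewer are placeholders.
@ai_assisted(tool="example-generator", reviewed_by="jane.doe")
def test_checkout_total():
    assert 2 * 10 == 20

# Governance check: every AI-assisted test must name a human reviewer.
meta = getattr(test_checkout_total, "ai_assisted", None)
assert meta is not None and meta["reviewed_by"]
```

The point is traceability: reports and audits can filter on the marker, and the rule "no review, no commit" becomes mechanically checkable instead of a matter of discipline.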

By the End of Phase 3, You Have:

  • At least 50% of the team familiar with the tool
  • 2–3 additional use cases in progress
  • Governance rules documented

Phase 4: Anchor (Week 11–12)

Goal: Integrate AI testing into the regular test process

Integration into Test Levels

The ISTQB defines classic test levels — AI support can be built into each, but with varying maturity:

| Test level | AI application | Maturity | Note |
|---|---|---|---|
| Unit test | Test scaffolding, boilerplate | High | Well researched, many tools |
| Integration test | API documentation from code | Medium | Works for standard APIs |
| System test | Test case generation from user stories | Low | Needs significant domain knowledge |
| Acceptance test | Documentation, protocol generation | Medium | Supportive, not replacing |

Process Adjustment

Update your test strategy:

  • Where is AI generation used — and where deliberately not?
  • What does the review process for AI output look like?
  • Which metrics are you tracking for AI support?
  • How do you handle concept drift — the ISTQB term for when the underlying data and requirements change over time?
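A concept-drift check does not need heavy machinery to start with. A minimal sketch, assuming you track one numeric property of your test data or results over time (the z-score threshold of 3 is an arbitrary example, not a recommendation):

```python
from statistics import mean, stdev

def drifted(baseline: list[float],
            recent: list[float],
            z_threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean sits far outside the baseline spread."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > z_threshold

# Hypothetical monitored value, e.g. average order size in test data.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
assert not drifted(baseline, [100.5, 99.5, 101.0])   # same regime
assert drifted(baseline, [250.0, 260.0, 255.0])      # regime has changed
```

When the check fires, that is the signal to revisit generated test data and prompts, because they encode assumptions about a world that has moved on.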

By the End of Phase 4, You Have:

  • AI testing documented in the test strategy
  • An established review process
  • CI/CD integration where it makes sense

Phase 5: Learn (From Month 4)

Goal: Continuously improve and stay honest

Establish Metrics

Define KPIs that actually tell you something:

| KPI | What it tells you | Warning |
|---|---|---|
| Time savings per test cycle | Efficiency gain | Without review effort, the number is misleading |
| Proportion of meaningful AI assertions | Output quality | The most important KPI |
| Defect detection rate before/after | Whether you’re actually testing better | Needs 3–6 months of data |
| Team adoption rate | Whether it’s being accepted | Low = signal, not problem |

Long-Term Perspective

Realistic timelines from the industry: most teams see first noticeable improvements after 6–12 months. Full value often shows after 18–24 months. The 90-day plan gets you through the most critical phase — the beginning. But don’t expect transformation in three months.

Topics for month 4–12:

  • Build an internal prompt library (knowledge management)
  • Learn about metamorphic testing — an ISTQB test technique that’s especially valuable when no clear test oracle exists
  • Gain first experience with agentic testing (autonomous AI agents that execute tests independently) — but with cost awareness: a single agentic test run can consume thousands of API tokens. Gartner predicts that over 40% of agentic AI projects will be cancelled by 2027 due to uncontrollable costs
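Metamorphic testing deserves a small illustration, because the idea is less exotic than the name: when you cannot say what the correct output *is* (no oracle), you can still check how outputs must *relate* across transformed inputs. The toy `score` function below is a stand-in I invented for an opaque system, such as an ML-based relevance model.

```python
def score(query: str, document: str) -> float:
    """Toy stand-in for an opaque scoring function (e.g. an ML model)."""
    terms = query.lower().split()
    words = document.lower().split()
    return sum(words.count(t) for t in terms) / (len(words) or 1)

doc = "the quick brown fox jumps over the lazy dog"

# Metamorphic relation 1: word order in the query must not change the score.
assert score("quick fox", doc) == score("fox quick", doc)

# Metamorphic relation 2: duplicating the document must not change the
# relative ranking of two queries.
a, b = score("fox", doc), score("cat", doc)
a2, b2 = score("fox", doc + " " + doc), score("cat", doc + " " + doc)
assert (a > b) == (a2 > b2)
```

Neither assertion needs to know the "right" score, which is exactly what makes the technique valuable for testing AI systems themselves.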

What Works — and What Doesn’t

Proven

  • Start small — A successful pilot convinces more than any presentation
  • Involve experienced sceptics — They find the weaknesses before it gets expensive
  • Always review — Human-in-the-loop isn’t a weakness, it’s quality assurance
  • Measure honestly — Even a “mere” 15% efficiency gain is a success if it’s sustainable

Failed

  • Big bang — Trying to replace all test automation at once
  • Tool first — Buying tools before the use case is clear
  • Without governance — Letting AI output into the pipeline without review
  • Ignoring fears — When the team has concerns, they’re usually justified
  • Unrealistic expectations — “50% faster in 30 days” is marketing, not reality. The Stack Overflow Developer Survey 2025 shows that only 33% of developers trust AI output, and 45% say debugging AI-generated code takes longer than writing it yourself

Summary

| Phase | Week | Focus | How you know it’s working |
|---|---|---|---|
| Understand | 1–2 | Use case analysis, stakeholders | 1–2 concrete, bounded use cases |
| Experiment | 3–6 | Pilot with one use case | Honestly measured efficiency gain |
| Expand | 7–10 | Knowledge transfer, more use cases | Team uses the tool independently |
| Anchor | 11–12 | Process integration, governance | AI testing documented in the strategy |
| Learn | 13+ | Metrics, continuous improvement | Data-driven decisions |

This plan is no guarantee — but it gives you a framework to avoid the typical mistakes. The most important lesson from my own experience: start, start small, and be honest about what works and what doesn’t.

Got questions or your own experiences? I’d love to hear from you at hello@quality-booster.com.

