The 90-Day Plan: Introducing AI Testing in Your Team
An honest roadmap for test managers who want to introduce AI-powered testing — with ISTQB concepts, realistic expectations, and the mistakes I've made.
Andi
Test Manager
Before we start, an honest number: according to an MIT study on the “GenAI Divide”, 95% of all GenAI pilot projects deliver no measurable business value. Not because the technology is bad — but because teams start without a clear plan, want too much at once, and give up too early.
This article is my attempt to do better. Not a polished framework, but a roadmap based on real experience — my own and from the industry. I deliberately use concepts from the ISTQB CT-AI syllabus (Certified Tester AI Testing), because they provide a shared language that goes beyond tool hype.
Before You Start: Two Things You Need to Understand
1. AI Amplifies What’s Already There
This is the most important insight across all experience reports: AI amplifies existing processes — it doesn’t fix broken ones. A team with solid testing fundamentals becomes more efficient with AI. A team with weak processes gets weak tests faster. The ISTQB calls this the principle of context-dependent testing — what works in one context doesn’t work in another.
So ask yourself first: Are your testing processes stable enough that you want to accelerate them? Or do you need to fix them first?
2. The Trust Curve
Teams go through four predictable phases when adopting AI:
- Enthusiasm — “This will change everything!”
- Experimentation — Quick wins seem possible
- Disillusionment — AI produces nonsense, maintenance effort grows
- Selective use — AI for specific, well-defined tasks
If you know this curve, phase 3 won't surprise you. Most failed adoptions stall exactly there, because nobody anticipated the disillusionment.
Phase 1: Understand (Week 1–2)
Goal: Figure out where AI actually makes sense in your context — and where it doesn’t
What You Need
- A small core team (2–3 people: test manager, senior tester, a developer)
- Management backing
- Honesty about your current maturity level
Weakness Analysis
Analyse your current testing process. The ISTQB distinguishes between testing of AI systems (CT-AI) and testing with AI tools (CT-GenAI). For this plan, we’re talking about the latter: where can AI tools support your existing work?
| Process Step | Typical Problem | AI Potential | Realistic Benefit |
|---|---|---|---|
| Test case creation | High manual effort | Generation from requirements | Good for boilerplate, weak on domain logic |
| Test data | Anonymisation is tedious | Synthetic data generation | One of the strongest use cases (adoption reportedly up 80% within a year) |
| Selector maintenance | Brittle XPath/CSS locators | Self-healing locators | Works well, saves real time |
| Defect analysis | Triage takes too long | Automatic categorisation | Helpful as first assessment, not as replacement |
| Documentation | Retroactive, incomplete | Live documentation | Useful, but review needed |
Stakeholder Mapping
A pattern I keep seeing: management and frontline teams have completely different expectations. According to the World Quality Report 2025/26, 39% of leaders say AI has “revolutionised” their processes — but only 19% of engineers agree. This gap is a reliable early indicator of trouble.
So deliberately identify:
- Champions — Who’s curious and willing to experiment?
- Experienced sceptics — Who has legitimate concerns? (Take these people seriously. Engineers with 10+ years of experience tend to be the most cautious — and they’re often right.)
- Blockers — Who might actively work against the project?
By the End of Phase 1, You Have:
- 1–2 concrete use cases to test (no more!)
- An honest assessment: are our fundamentals solid enough?
- A stakeholder map with communication plan
Phase 2: Experiment (Week 3–6)
Goal: Prove with a single use case whether AI works in your context
The Assertion Problem
Before you start, a warning: the most dangerous anti-pattern in AI-generated testing is what I call the assertion problem. AI generates tests that run without errors but validate nothing meaningful. Tests turn green, bugs ship to production, and you’ve created false confidence. The ISTQB describes this as automation bias — the tendency to over-trust AI-generated results.
That’s why the most important rule during the pilot is: every AI-generated test is reviewed by a human. According to ContextQA, 76% of companies using AI in testing have established human-in-the-loop processes — not because they don’t trust the technology, but because they’ve learned from experience.
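A minimal sketch of what the assertion problem looks like in practice. The function and both tests are invented for illustration; the pattern, however, shows up in almost every AI-generated test suite:

```python
def apply_discount(price: float, percent: float) -> float:
    """Toy function under test (illustrative only)."""
    return round(price * (1 - percent / 100), 2)

# Typical AI-generated test: runs green, validates almost nothing.
def test_discount_weak():
    result = apply_discount(100.0, 20)
    assert result is not None          # always true
    assert isinstance(result, float)   # checks the type, not correctness

# Human-reviewed version: concrete expected values, plus boundaries.
def test_discount_reviewed():
    assert apply_discount(100.0, 20) == 80.0
    assert apply_discount(100.0, 0) == 100.0    # boundary: no discount
    assert apply_discount(100.0, 100) == 0.0    # boundary: full discount
```

The weak test would stay green even if `apply_discount` returned the wrong amount; only the reviewed version would catch a real regression.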
Choose One Use Case
Pick a single use case for the pilot. Ideal criteria:
- Not business-critical (no damage if errors occur)
- Repeatable (at least 10 similar cases)
- Measurable (quantifiable in time)
- No complex domain knowledge required
From my experience, these use cases work best for getting started:
- Test data generation — structurally valid, semantically unusual inputs
- Boilerplate code — setup/teardown, parameterised variations
- Test documentation — generating descriptions from existing tests
- Root cause analysis — analysing stack traces, narrowing down failure causes
Less suitable for getting started: E2E test generation (too complex, too much domain knowledge required).
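To make "structurally valid, semantically unusual" concrete, here is a hand-rolled sketch. In practice you would let the AI propose such values and review them, or reach for a property-based library like Hypothesis; the function name and edge cases below are my own choices:

```python
import random
import string

def unusual_name_inputs(seed: int = 42, n: int = 5) -> list[str]:
    """Name strings that pass a naive 'non-empty string' check
    but exercise real edge cases (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed: reproducible test data
    edge_cases = [
        "O'Brien",        # apostrophe (escaping / SQL issues)
        "Anne-Marie",     # hyphenated name
        "Ümit Müller",    # non-ASCII characters
        " Bob",           # leading whitespace (trimming bugs)
        "A" * 255,        # length boundary
    ]
    randoms = ["".join(rng.choices(string.ascii_letters, k=rng.randint(1, 12)))
               for _ in range(n)]
    return edge_cases + randoms
```

The fixed seed matters: test data that changes on every run makes failures impossible to reproduce.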
Pair Testing with AI
Establish the working model "tester + AI". Not "AI tests autonomously", but an iterative dialogue: the tester provides context and constraints, the AI drafts tests or data, the tester reviews and tightens the assertions, and what the review reveals flows back into the next prompt.
What to Measure
Don’t just measure speed. The ISTQB defines various ML performance metrics in the CT-AI syllabus — for our scenario, I translate that into pragmatic metrics:
| What You Measure | Why |
|---|---|
| Time per test case (before/after) | Efficiency — but watch for hidden review burden |
| Proportion of meaningful assertions | Quality — the most important value |
| Where the AI failed | Learning — document every failure |
| Review effort per AI output | Honesty — if review costs 80% of manual time, the gain is only 20% |
Important: Don’t invent target values. Measure what actually happens. If the efficiency gain is 10% rather than the hoped-for 50%, that’s an honest insight — not a failure.
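The review-burden point is worth making explicit. A small helper (the function and parameter names are mine, not from any standard) to compute the net gain:

```python
def net_efficiency_gain(manual_min: float, generate_min: float,
                        review_min: float) -> float:
    """Net gain as a fraction of the manual baseline.
    Review effort counts against the saving."""
    return (manual_min - (generate_min + review_min)) / manual_min

# Example from the table: generation is near-free, but review costs
# 80% of the manual time, so the net gain is 20%, not a headline 100%.
gain = net_efficiency_gain(manual_min=60, generate_min=0, review_min=48)
```

The value can also go negative: if generating plus reviewing costs more than writing the test manually, the "AI speedup" is a net loss, and this formula makes that visible.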
Anti-Patterns in This Phase
- “Let’s test everything with AI” — The pilot must stay focused. One use case, not five.
- Happy path overfitting — AI focuses on obvious scenarios. Edge cases, error states, and race conditions go untested — exactly where the real bugs live.
- No manual validation — Every AI output gets reviewed. No exceptions.
By the End of Phase 2, You Have:
- An honestly measured efficiency gain (or the insight that there isn’t one)
- Documented lessons learned — especially: what did NOT work
- A go/no-go decision for phase 3
Phase 3: Expand (Week 7–10)
Goal: Extend the pilot to additional areas — if it worked
Knowledge Transfer
The pilot team trains the wider group. Three workshops have proven effective:
- Fundamentals & tool handling (4h) — What can the tool do, what can’t it? Expectation management matters more than feature demos.
- Prompt engineering for testers (4h) — Providing context, formulating constraints, iteratively improving results. The ISTQB CT-GenAI syllabus treats prompt engineering as its own chapter — rightly so, it’s a new core competency.
- Review & quality assurance (4h) — How do you recognise whether an AI-generated test is meaningful? How do you spot missing or weak assertions?
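For workshop 2, a hypothetical prompt skeleton that bakes in the lessons from phase 2: context first, then explicit constraints, then the task. All placeholder names and rules below are illustrative, not a standard:

```python
# Hypothetical prompt skeleton: context -> constraints -> task.
PROMPT_TEMPLATE = """\
Context: You are generating pytest tests for a {domain} module.
Function under test: {signature}
Known business rules: {rules}

Constraints:
- Every test must assert a concrete expected value, not just a type.
- Cover at least one boundary case and one error case.
- Do not invent helpers that are not listed above.

Task: Generate {n} test cases as runnable pytest functions.
"""

prompt = PROMPT_TEMPLATE.format(
    domain="billing",
    signature="apply_discount(price: float, percent: float) -> float",
    rules="percent is 0-100; result is rounded to 2 decimal places",
    n=3,
)
```

The constraints section is the part most teams skip, and it is exactly the part that reduces the assertion problem and happy-path overfitting described above.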
Prioritise Additional Use Cases
Rank potential use cases by risk and benefit. The ISTQB principle of risk-based testing applies here too:
| Use Case | Risk | Benefit | Recommendation |
|---|---|---|---|
| Test data generation | Low | High | Start immediately |
| Unit test scaffolding | Low | Medium | Good second step |
| API test generation | Medium | High | With review process |
| E2E test generation | High | High | Only after experience |
| Visual regression testing | Medium | Medium | If use case exists |
Define Governance
At this point, you need clear rules:
- Who may commit AI-generated tests without review? (Answer: nobody)
- What data may be sent to external AI services? (Data residency, compliance)
- How are AI-assisted tests marked in test documentation?
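How "marking" might look in code: with pytest you would typically register a custom marker (e.g. `@pytest.mark.ai_assisted`); here is a framework-agnostic sketch with a plain decorator, where the convention and field names are my own invention:

```python
def ai_assisted(reviewer: str, reviewed_on: str):
    """Tag a test as AI-generated and human-reviewed, so reports
    and audits can filter on it (illustrative convention)."""
    def wrap(fn):
        fn.ai_assisted = {"reviewer": reviewer, "reviewed_on": reviewed_on}
        return fn
    return wrap

@ai_assisted(reviewer="andi", reviewed_on="2026-01-15")
def test_example():
    assert 2 + 2 == 4
```

The point is traceability: when a marked test later misbehaves, you know it was machine-generated and who signed off on it.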
By the End of Phase 3, You Have:
- At least 50% of the team familiar with the tool
- 2–3 additional use cases in progress
- Governance rules documented
Phase 4: Anchor (Week 11–12)
Goal: Integrate AI testing into the regular test process
Integration into Test Levels
The ISTQB defines classic test levels — AI support can be built into each, but with varying maturity:
| Test Level | AI Application | Maturity | Note |
|---|---|---|---|
| Unit test | Test scaffolding, boilerplate | High | Well-researched, many tools |
| Integration test | API documentation from code | Medium | Works for standard APIs |
| System test | Test case generation from user stories | Low | Needs significant domain knowledge |
| Acceptance test | Documentation, protocol generation | Medium | Supportive, not replacing |
Process Adjustment
Update your test strategy:
- Where is AI generation used — and where deliberately not?
- What does the review process for AI output look like?
- Which metrics are you tracking for AI support?
- How do you handle concept drift? (The ISTQB term for the situation where the underlying data, and with it the behaviour you are testing against, changes over time.)
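One way to make concept drift operational, as a crude sketch: compare a current metric window against a baseline and flag large shifts. A real setup would use a proper statistical test (such as Kolmogorov-Smirnov); the threshold and function name here are my own:

```python
from statistics import mean, pstdev

def drift_score(baseline: list[float], current: list[float]) -> float:
    """How many baseline standard deviations the current window's
    mean has moved. Crude drift signal, not a statistical test."""
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        return 0.0 if mean(current) == mu else float("inf")
    return abs(mean(current) - mu) / sigma

# Example: response times drift upward; above some threshold
# (say, 3 standard deviations) the tests need re-examining.
baseline = [1.0, 1.1, 0.9, 1.0, 1.05]
current = [1.6, 1.7, 1.65, 1.8]
score = drift_score(baseline, current)
```

Even a rough signal like this beats the usual alternative, which is noticing drift only after a batch of tests has been silently asserting stale behaviour.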
By the End of Phase 4, You Have:
- AI testing documented in the test strategy
- An established review process
- CI/CD integration where it makes sense
Phase 5: Learn (From Month 4)
Goal: Continuously improve and stay honest
Establish Metrics
Define KPIs that actually tell you something:
| KPI | What It Tells You | Warning |
|---|---|---|
| Time savings per test cycle | Efficiency gain | Without review effort, the number is misleading |
| Proportion of meaningful AI assertions | Output quality | The most important KPI |
| Defect detection rate before/after | Whether you’re actually testing better | Needs 3–6 months of data |
| Team adoption rate | Whether it’s being accepted | Low = signal, not problem |
Long-Term Perspective
Realistic timelines from the industry: most teams see first noticeable improvements after 6–12 months. Full value often shows after 18–24 months. The 90-day plan gets you through the most critical phase — the beginning. But don’t expect transformation in three months.
Topics for month 4–12:
- Build an internal prompt library (knowledge management)
- Learn about metamorphic testing — an ISTQB test technique that’s especially valuable when no clear test oracle exists
- Gain first experience with agentic testing (autonomous AI agents that execute tests independently) — but with cost awareness: a single agentic test run can consume thousands of API tokens. Gartner predicts that over 40% of agentic AI projects will be cancelled by 2027 due to uncontrollable costs
What Works — and What Doesn’t
Proven
- Start small — A successful pilot convinces more than any presentation
- Involve experienced sceptics — They find the weaknesses before it gets expensive
- Always review — Human-in-the-loop isn’t a weakness, it’s quality assurance
- Measure honestly — Even a “mere” 15% efficiency gain is a success if it’s sustainable
Failed
- Big bang — Trying to replace all test automation at once
- Tool first — Buying tools before the use case is clear
- Without governance — Letting AI output into the pipeline without review
- Ignoring fears — When the team has concerns, they’re usually justified
- Unrealistic expectations — “50% faster in 30 days” is marketing, not reality. The Stack Overflow Developer Survey 2025 shows: only 33% of developers trust AI output, 45% say debugging AI-generated code takes more time than writing it yourself
Summary
| Phase | Week | Focus | How You Know It’s Working |
|---|---|---|---|
| Understand | 1–2 | Use case analysis, stakeholders | 1–2 concrete, bounded use cases |
| Experiment | 3–6 | Pilot with one use case | Honestly measured efficiency gain |
| Expand | 7–10 | Knowledge transfer, more use cases | Team uses tool independently |
| Anchor | 11–12 | Process integration, governance | AI testing documented in strategy |
| Learn | 13+ | Metrics, continuous improvement | Data-driven decisions |
This plan is no guarantee — but it gives you a framework to avoid the typical mistakes. The most important lesson from my own experience: start, start small, and be honest about what works and what doesn’t.
Got questions or your own experiences? I’d love to hear from you at hello@quality-booster.com.
Sources & Further Reading
- MIT: The GenAI Divide — Study on the gap between AI adoption and measurable business value (52 executive interviews, 153 leader surveys, 300+ public deployments)
- Capgemini World Quality Report 2025/26 — Annual industry report on software quality and testing trends
- Stack Overflow Developer Survey 2025 — AI section: trust, usage, and frustration with AI tools among 65,000+ developers
- ISTQB CT-AI Syllabus — AI Testing curriculum with 11 chapters, from ML fundamentals to AI-powered test methods
- ISTQB CT-GenAI — New module (2025) specifically for using generative AI in everyday testing
- ContextQA: The Perfection Trap — Why QA teams abandon AI adoption when it doesn’t work perfectly right away
- Gartner via Digital Watch: Agentic AI Project Cancellations — Prediction: 40%+ of agentic AI projects will be cancelled by 2027
- QAble: Is AI Really Helping Testing? — Analysis of the gap between AI promises and testing reality