Quality Booster

Self-Healing Tests: Hype or Help?

Automatic selector repair promises less test maintenance. I look at what actually works — from DOM comparison to LLM semantics to the question of whether you can just avoid the problem entirely.

Andi

Test Manager

Why do tests keep breaking when nothing changed in the logic? A button was renamed, a CSS class changed, a div swapped for a section element — and suddenly 30 tests are red. That costs time, nerves, and trust in automation. According to Capgemini’s World Quality Report, test maintenance eats up 25–35% of total automation effort.

Self-healing tests promise to solve exactly this: tests that repair themselves when a selector breaks. Sounds appealing. But how well does it actually work?

I looked at three approaches — from classic DOM comparison to AI-based semantics to the question of whether you can simply avoid the problem in the first place.


What Self-Healing Actually Does

The basic idea is simple: when a locator can’t find an element, the tool tries to identify the correct element through alternative means — instead of failing the test immediately. What differs is the intelligence of these alternative strategies. And that has evolved significantly in recent years.


Generation 1: DOM Comparison

The first generation of self-healing analyses DOM structure. When a locator breaks, the tool compares the current DOM against a stored snapshot and looks for the most similar element.

Healenium is the best-known open-source example. It wraps Selenium, stores DOM snapshots in PostgreSQL, and uses tree-matching algorithms to find the most likely element when a locator fails. If the similarity score exceeds a configurable threshold (default: 70%), the test continues.
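To make the mechanism concrete, here is a minimal sketch of Generation-1 healing: score each candidate element against a stored snapshot of the broken locator’s target, and accept the best match only above a threshold (Healenium’s default is 70%). All type and function names here are illustrative, not Healenium’s actual API, and the similarity measure is a simple Jaccard score over element features rather than a full tree-matching algorithm.

```typescript
// A stored snapshot of the element a locator used to resolve to.
type ElementSnapshot = {
  tag: string;
  text: string;
  attributes: Record<string, string>;
};

// Jaccard-style similarity over tag, text, and attribute key/value pairs.
function similarity(a: ElementSnapshot, b: ElementSnapshot): number {
  const features = (e: ElementSnapshot) =>
    new Set([
      `tag:${e.tag}`,
      `text:${e.text}`,
      ...Object.entries(e.attributes).map(([k, v]) => `${k}=${v}`),
    ]);
  const fa = features(a);
  const fb = features(b);
  const intersection = [...fa].filter((f) => fb.has(f)).length;
  const union = new Set([...fa, ...fb]).size;
  return union === 0 ? 0 : intersection / union;
}

// Pick the most similar candidate, or null if nothing clears the threshold.
function heal(
  snapshot: ElementSnapshot,
  candidates: ElementSnapshot[],
  threshold = 0.7
): ElementSnapshot | null {
  let best: ElementSnapshot | null = null;
  let bestScore = 0;
  for (const c of candidates) {
    const score = similarity(snapshot, c);
    if (score > bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return bestScore >= threshold ? best : null;
}

const stored: ElementSnapshot = {
  tag: "button",
  text: "Submit",
  attributes: { id: "submit-btn", type: "submit", class: "btn-primary", name: "order" },
};

// Same button after a CSS refactor: only the class changed. Heals.
const classRenamed: ElementSnapshot = {
  ...stored,
  attributes: { ...stored.attributes, class: "btn-main" },
};

// Heavier rewrite: label and id changed too, which drops the score
// below the threshold. No healing.
const rewritten: ElementSnapshot = {
  tag: "button",
  text: "Confirm",
  attributes: { id: "confirm-btn", type: "submit", class: "btn-primary", name: "order" },
};
```

With these fixtures, `heal(stored, [classRenamed, rewritten])` picks the class-renamed button, while `heal(stored, [rewritten])` returns null: exactly the strength and the weakness of structural comparison.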

Katalon Studio offers a “Self-Healing Mode” that cycles through alternative strategies (XPath, CSS, attributes, image recognition) when a locator fails.

What It Delivers

A study from the University of Genoa (Leotta et al., 2021) shows: DOM-based self-healing repairs 60–70% of locator failures for minor UI changes, but only 15–25% for major structural changes. In practice, teams report saving a few hours per week on large test suites.

Where It Fails

  • “Submit” becomes “Confirm” — different string, DOM comparison finds nothing
  • Complete redesign — new page structure, self-healing is powerless
  • Logic changes — the checkout now has 4 steps instead of 3? Not a locator problem, no healing
  • False healing — the test “heals” to the wrong element, turns green, but no longer tests what it should. This is the most dangerous risk.

Generation 2: LLM-Based Semantic Healing

This is where it gets interesting. Instead of repairing selectors, the AI understands the intent of the test step. An LLM doesn’t just see the DOM tree — it understands that “Submit” and “Confirm” mean the same thing. Or that “Warenkorb” and “Einkaufswagen” are semantically identical in German — even though no fuzzy matching in the world would catch that.

How It Works Technically

```mermaid
sequenceDiagram
    participant Test
    participant Playwright
    participant LLM

    Test->>Playwright: click('[data-testid=submit-btn]')
    Playwright-->>Test: Element not found!
    Test->>LLM: DOM + screenshot + context: "Button that completes the order"
    LLM->>LLM: Semantic analysis
    LLM-->>Test: button:has-text('Confirm') — Confidence 94%
    Test->>Playwright: click(button:has-text('Confirm'))
    Playwright-->>Test: Success
    Note over Test: Cache result + set review flag
```

The key difference: the LLM understands meaning, not just structure.
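The sequence above can be sketched as a thin wrapper around the click action: try the normal locator first, ask an LLM for a semantic replacement only when it breaks, cache the answer, and flag it for human review. The `Page` and `LlmClient` interfaces below are hypothetical stand-ins, not the API of any specific tool.

```typescript
// Minimal "heal only on failure" wrapper, under assumed interfaces.
type Page = { click(selector: string): Promise<boolean> }; // true if element was found

type Suggestion = { selector: string; confidence: number };
type LlmClient = { suggest(intent: string): Promise<Suggestion> };

const healingCache = new Map<string, string>(); // broken selector -> healed selector
const reviewQueue: string[] = []; // healed selectors awaiting human review

async function clickWithHealing(
  page: Page,
  selector: string,
  intent: string, // e.g. "button that completes the order"
  llm: LlmClient,
  minConfidence = 0.9
): Promise<boolean> {
  // Fast path: the original (or previously healed) selector still works.
  const effective = healingCache.get(selector) ?? selector;
  if (await page.click(effective)) return true;

  // Slow path: ask the LLM for a semantically equivalent element.
  const { selector: healed, confidence } = await llm.suggest(intent);
  if (confidence < minConfidence || !(await page.click(healed))) return false;

  healingCache.set(selector, healed); // don't pay for the same call twice
  reviewQueue.push(healed); // a green test after healing still needs human eyes
  return true;
}
```

The cache keeps API costs bounded, and the review queue addresses the false-healing risk from Generation 1: a healed test stays green but is explicitly marked as needing verification.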

Tools That Already Do This

| Tool | Approach | Key Feature |
|---|---|---|
| Momentic | Tests in natural language, LLM finds element at runtime | "Click the button that submits the order" |
| Octomind | AI agent generates and repairs Playwright tests | Direct CI/CD integration |
| Shortest | Open source, Playwright + AI | shortest("user can submit the contact form") |
| ZeroStep | Playwright plugin | ai("click login button", { page }) |

The “Warenkorb → Einkaufswagen” Example

This is where LLM healing proves its worth:

| Approach | "Submit" → "Confirm" | "Warenkorb" → "Einkaufswagen" |
|---|---|---|
| CSS/XPath | Breaks | Breaks |
| data-testid | Survives (if unchanged) | Survives (if unchanged) |
| DOM comparison (Healenium) | Maybe, if context unchanged | No — completely different string |
| LLM semantic | Understands: both = "complete" | Understands: both = "shopping cart" |

The Cost Question

The elephant in the room: LLM calls aren’t free.

  • On every step: 50–200ms latency + $0.01–0.05 per call. With 200 tests of 30 interactions each, that's 6,000 calls, roughly $60–300 per run: expensive.
  • Only on failure (recommended): Normal locators first, LLM only when they break. Cache the result. Keeps costs manageable.
  • Local models: A fine-tuned small language model on your own server. No API costs, but initial setup effort.

The Alternative: Don’t Break in the First Place

There’s a third way — and it might be the smartest: write selectors that are stable from the start.

Playwright has this built in as a design principle. Instead of relying on CSS classes or XPaths, Playwright recommends locators tied to the meaning of an element:

| Strategy | Example | Survives |
|---|---|---|
| getByRole() | page.getByRole('button', { name: 'Submit' }) | CSS refactoring, ID changes, DOM restructuring |
| getByTestId() | page.getByTestId('checkout-btn') | Everything except deliberate removal |
| getByText() | page.getByText('Welcome') | Classes, IDs, structure — all irrelevant |
| getByLabel() | page.getByLabel('Email address') | Forms stay stable |

The beauty: getByRole('button', { name: 'Submit' }) survives CSS refactoring, ID changes, and DOM restructuring — as long as there’s a button called Submit.

Plus: Playwright has auto-waiting built in (automatically waits until an element is visible and clickable) and strict mode (throws an error if a locator is ambiguous). That prevents two more major causes of flaky tests.

But: It Doesn’t Solve Everything

Playwright locators also break when the text changes. getByRole('button', { name: 'Submit' }) fails when the button suddenly says Confirm. This is exactly where LLM healing would be a sensible complement.


All Three Approaches Compared

| | DOM Comparison | LLM Semantic | Resilient Locators |
|---|---|---|---|
| When active | After the break | After the break | Before the break (prevention) |
| "Submit" → "Confirm" | Maybe | Yes | No (text changed) |
| Completely new layout | No | Partially | No |
| Logic change | No | No | No |
| False healing? | Risk | Lower risk | No risk |
| Cost | Low | Medium (API costs) | None |
| Setup effort | Medium | Medium–High | Low |
| Best niche | Legacy suites with many XPaths | Semantic changes, multilingual | New projects |

When to Use What

Starting a new project? Invest in resilient locators (getByRole, getByTestId). That prevents 80–90% of locator problems at the source. You’ll rarely need self-healing.

Have a large legacy suite with brittle selectors? Healenium or Katalon can help as a bridge while you gradually improve selectors. Don’t expect miracles — 30–50% of locator failures will be caught.

Your dev team regularly renames UI labels? LLM-based healing (Momentic, Octomind, ZeroStep) is the only solution that understands semantic changes. The “shopping cart” rename scenario is no problem here.

Want to combine everything? The strongest strategy: resilient locators as the foundation + LLM healing as a safety net for failures. Normally everything runs via Playwright locators; on a break, the LLM steps in, caches the result, and optionally creates a fix.
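That layered strategy boils down to a small resolution chain: try the resilient, meaning-based locators first, and invoke the LLM healer only when all of them miss. The interfaces below are illustrative stand-ins for whatever page driver and healer you actually use.

```typescript
// Does the current page contain an element for this locator?
type Finder = (selector: string) => boolean;

function resolve(
  find: Finder,
  locators: string[],          // ordered, most meaningful first (role, testid, text)
  healer: () => string | null  // LLM fallback, e.g. a cached suggestion
): string | null {
  for (const loc of locators) {
    if (find(loc)) return loc; // prevention worked, no AI involved
  }
  return healer();             // safety net for what slipped through
}
```

In the common case the healer is never called, so the suite runs at normal Playwright speed and cost; the LLM only pays for itself on actual breakage.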


My Take

Self-healing is not hype — the problem it addresses is real and measurable. But there’s an important difference between repair and prevention.

DOM-based self-healing (Healenium & co.) is a solid technique for legacy suites but has clear limits. LLM-based semantic healing is a genuine step forward — for the first time, a tool can understand that “Confirm” and “Submit” mean the same thing.

Still: if you’re setting up a new test project today, invest in stable locators first. It’s cheaper, faster, and prevents the problem at the source. Self-healing — especially the LLM variant — then serves as the safety net for the cases that slip through.

Or put differently: the best test isn’t one that repairs itself — it’s one that doesn’t break in the first place.


Sources

  • Leotta et al. (2021): “Self-Healing Web Test Automation”, Journal of Systems and Software — 60–70% repair rate for minor changes
  • Capgemini World Quality Report 2022/23 — Test maintenance: 25–35% of automation effort
  • SmartBear State of Testing Report 2023 — 22% use AI-assisted test maintenance
  • Current research on AI-assisted test automation and self-healing mechanisms
  • Healenium — Open source self-healing for Selenium
  • Momentic — LLM-based tests in natural language
  • Octomind — AI agent for Playwright tests
  • Shortest — Open source Playwright + AI
  • ZeroStep — Playwright plugin for AI-powered element finding
  • Playwright Locators documentation