Using DOM and Vision AI Models for Testing

When creating test instructions in Magic Inspector, you'll notice an AI Models button at the end of certain instruction input fields. This button allows you to choose between different AI models for element identification, depending on the type of instruction you've selected.

Currently, Magic Inspector offers two AI models for element identification:

DOM-based model (labeled as "1" in the screenshot below)
Vision-based model (labeled as "2" in the screenshot below)

It's important to understand how these models differ, because each one suits different scenarios. Let's look at each model's characteristics, strengths, and weaknesses.

DOM-based Model

The DOM-based model is the default and often the most efficient option for element identification.

How it works

This model serializes the useful elements of your page's Document Object Model (DOM) and isolates the element that best matches your description.

Strengths

Very efficient when components are well crafted, with good placeholders and labels
Highly effective when developers use semantic HTML tags and meaningful attributes
Can accurately identify elements based on their role, name, or other DOM properties
More reliable than the vision model in most standard web scenarios

Weaknesses

Struggles with poorly marked-up HTML or highly visual elements
Practically impossible to describe elements by their visual position or appearance
Cannot identify elements that are not properly represented in the DOM
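To see why labels and semantic attributes matter so much to a DOM-based approach, here is a toy Python sketch. It is not Magic Inspector's implementation, whose internals aren't described here: the names `ElementCollector` and `best_match`, the chosen tag and attribute lists, and the simple word-overlap score (standing in for the AI matching step) are all hypothetical. It serializes "useful" elements into short text descriptors, then picks the descriptor that shares the most words with your description.

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch: which tags count as "useful" elements,
# and which attributes carry human-readable meaning.
USEFUL_TAGS = {"a", "button", "input", "select", "textarea", "label"}
MEANINGFUL_ATTRS = ("aria-label", "placeholder", "title", "name", "alt", "role", "type")
VOID_TAGS = {"input"}  # useful tags that never have a closing tag


class ElementCollector(HTMLParser):
    """Serializes useful elements into simple {tag, text} descriptors."""

    def __init__(self):
        super().__init__()
        self.elements = []  # serialized descriptors, in document order
        self._open = []     # indices of useful elements that are still open

    def handle_starttag(self, tag, attrs):
        if tag not in USEFUL_TAGS:
            return
        attr_map = dict(attrs)
        # Keep only attribute values that describe the element to a human.
        words = [attr_map[a] for a in MEANINGFUL_ATTRS if attr_map.get(a)]
        self.elements.append({"tag": tag, "text": " ".join(words)})
        if tag not in VOID_TAGS:
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        if tag in USEFUL_TAGS and tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Visible text inside an open useful element joins its descriptor.
        if self._open and data.strip():
            element = self.elements[self._open[-1]]
            element["text"] = (element["text"] + " " + data.strip()).strip()


def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    return {w.strip(".,!?\"'()").lower() for w in text.split()} - {""}


def best_match(html, description):
    """Return the descriptor sharing the most words with the description, or None."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    wanted = tokenize(description)
    best, best_score = None, 0
    for element in collector.elements:
        score = len(wanted & tokenize(element["tag"] + " " + element["text"]))
        if score > best_score:
            best, best_score = element, score
    return best


page = """
<form>
  <input type="email" placeholder="Email address">
  <input type="password" placeholder="Password">
  <button type="submit">Sign in</button>
  <div class="icon-plus"></div>
</form>
"""

print(best_match(page, "the password field"))  # {'tag': 'input', 'text': 'Password password'}
print(best_match(page, "the sign in button"))  # {'tag': 'button', 'text': 'submit Sign in'}
print(best_match(page, "the plus icon"))       # None
```

The labeled inputs and the button are easy to match, while the unlabeled `div` icon can't be matched at all. That mirrors the weaknesses above: anything the DOM doesn't describe in words is effectively invisible to this kind of approach, which is where a vision-based model can help.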

Vision-based Model

The vision model serves as an excellent alternative when the DOM model falls short.

How it works

This model is trained to find elements visually based on their characteristics and position on the page.

Strengths

Can identify components that are not easily findable within the DOM
Excels at differentiating elements based on visual cues
Allows for visual descriptions like "The Plus (+) icon button at the bottom right corner"

Weaknesses

Generally less reliable than the DOM model for common web UIs
More prone to hallucinations due to the large variety of UI designs

Choosing the Right Model

When deciding which model to use, consider the following:

Start with the DOM model for most scenarios, especially when working with well-structured web applications.
If you're having trouble identifying an element using the DOM model even after refining your description, switch to the vision model.
Use the vision model when you need to describe elements based on their visual characteristics or position.

With practice, you'll learn which model works best for each part of your app, and choosing between them will become second nature.

What if it still doesn't work?

If you've tried both models and still can't get the result you want, you can disable AI and use our fallback methods instead, as explained in the testing without AI guide.