Using DOM and Vision AI Models for Testing
When creating test instructions in Magic Inspector, you'll notice an AI Models button at the end of certain instruction input fields, depending on the type of instruction you've selected. This button lets you choose which AI model is used to identify the element.
Currently, Magic Inspector offers two AI models for element identification:
- DOM-based model (labeled as "1" in the screenshot below)
- Vision-based model (labeled as "2" in the screenshot below)
It's crucial to understand the differences between these models as they cater to different use cases and scenarios. Let's dive into each model's characteristics, strengths, and weaknesses.
DOM-based Model
The DOM-based model is the default and often the most efficient option for element identification.
How it works
This model serializes useful elements of your page and isolates the element that best matches your description based on the Document Object Model (DOM) structure.
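To make the idea concrete, here is a minimal sketch of description-to-DOM matching. It is not Magic Inspector's actual implementation: the element dictionaries, the `serialize` and `best_match` helpers, and the simple word-overlap scoring are all assumptions chosen for illustration.

```python
import re

# Hypothetical, simplified view of elements extracted from a page.
ELEMENTS = [
    {"tag": "input", "type": "email", "placeholder": "Email address", "aria-label": "Email"},
    {"tag": "button", "text": "Sign in", "role": "button"},
    {"tag": "a", "text": "Forgot your password?", "href": "/reset"},
    {"tag": "div", "class": "x7f3a"},  # poorly marked-up element: nothing useful to match
]

# Attributes assumed to carry meaning a tester would mention in a description.
USEFUL_KEYS = ("tag", "role", "type", "text", "placeholder", "aria-label", "name")


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def serialize(element):
    """Flatten the useful attributes of one element into a single string."""
    return " ".join(str(element[k]) for k in USEFUL_KEYS if k in element)


def best_match(description, elements):
    """Return the element whose serialized form shares the most words with the description."""
    wanted = tokenize(description)
    return max(elements, key=lambda el: len(wanted & tokenize(serialize(el))))


match = best_match("the Sign in button", ELEMENTS)
print(serialize(match))
```

In this sketch the bare `div` serializes to just its tag name, so no description can single it out. That is the same reason a DOM-based approach depends on meaningful labels, placeholders, and roles.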
Strengths
- Very efficient when components are well-crafted with good placeholders, labels, or semantic attributes
- Highly effective when developers use semantic HTML tags and meaningful attributes
- Can accurately identify elements based on their role, name, or other DOM properties
- More reliable in most standard web scenarios
Weaknesses
- Struggles with poorly marked-up HTML or highly visual elements
- Makes it nearly impossible to target elements by their visual position or appearance
- Cannot identify elements that are not properly represented in the DOM
Vision-based Model
The vision model serves as an excellent alternative when the DOM model falls short.
How it works
This model is trained to find elements visually based on their characteristics and position on the page.
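The positional part of this can be sketched as follows. A real vision model works directly from the rendered page; this example only illustrates the "which one is at the bottom right?" reasoning, assuming some detector has already produced labeled bounding boxes. The `Detection` class, the `closest_to_corner` helper, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class Detection:
    """A hypothetical visual detection: a label plus a bounding box in pixels."""
    label: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def closest_to_corner(detections, label, viewport, corner="bottom-right"):
    """Among detections with the given label, return the one nearest the named corner."""
    vw, vh = viewport
    corners = {
        "top-left": (0, 0),
        "top-right": (vw, 0),
        "bottom-left": (0, vh),
        "bottom-right": (vw, vh),
    }
    candidates = [d for d in detections if d.label == label]
    if not candidates:
        return None
    return min(candidates, key=lambda d: math.dist(d.center, corners[corner]))


# Assumed detections on a 1280x720 screenshot: two plus icons that look alike.
detections = [
    Detection("plus icon", x=40, y=30, width=24, height=24),
    Detection("plus icon", x=1200, y=650, width=48, height=48),
    Detection("search icon", x=1230, y=20, width=24, height=24),
]

target = closest_to_corner(detections, "plus icon", viewport=(1280, 720))
print(target)
```

Two elements with the same label are told apart purely by where they sit on screen, which is information the DOM structure alone does not reliably provide.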
Strengths
- Can identify components that are not easily findable within the DOM
- Excels at differentiating elements based on visual cues
- Allows for visual descriptions like "The Plus (+) icon button at the bottom right corner"
Weaknesses
- Generally less reliable than the DOM model for common web UIs
- More prone to hallucinations due to the large variety of UI designs
Choosing the Right Model
When deciding which model to use, consider the following:
- Start with the DOM model for most scenarios, especially when working with well-structured web applications.
- If you're having trouble identifying an element using the DOM model even after refining your description, switch to the vision model.
- Use the vision model when you need to describe elements based on their visual characteristics or position.
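The guidance above can be summarized as a small decision rule. The sketch below is only an illustration: `suggest_model`, the `VISUAL_CUES` word list, and the two-attempt threshold are assumptions, not part of Magic Inspector.

```python
import re

# Assumed list of words that hint at a visual description; tune it for your own app.
VISUAL_CUES = {
    "top", "bottom", "left", "right", "corner", "above", "below",
    "icon", "red", "green", "blue", "color", "colour", "shaped",
}


def suggest_model(description, failed_dom_attempts=0, max_dom_attempts=2):
    """Suggest "dom" or "vision" for an element description.

    Defaults to DOM, prefers vision for visual descriptions, and switches
    to vision once refining the description for the DOM model has failed
    max_dom_attempts times.
    """
    words = set(re.findall(r"[a-z]+", description.lower()))
    if words & VISUAL_CUES:
        return "vision"
    if failed_dom_attempts >= max_dom_attempts:
        return "vision"
    return "dom"


print(suggest_model("The email input field"))
print(suggest_model("The Plus (+) icon button at the bottom right corner"))
print(suggest_model("The submit button", failed_dom_attempts=2))
```

A keyword list like this is a crude heuristic and will misjudge some descriptions; it is meant to show the order of the checks, not to replace your own judgment.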
After a while, you will intuitively know which model works best depending on the parts of your app you're testing, and choosing between them will become second nature.
What if it still doesn't work?
If you've tried both models and still can't get the result you want, you can disable the AI and use our fallback methods, as explained in the testing without AI guide.
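The overall order of escalation (DOM model, then vision model, then no AI) can be pictured as a simple chain. In Magic Inspector you make these choices through the interface; the sketch below only illustrates the sequence. The three locator functions are stand-ins, and the non-AI fallback is assumed here to be a hand-written selector, which may differ from the methods described in the guide.

```python
def locate_with_fallback(description, strategies):
    """Try each (name, locator) pair in order; return the first non-None result."""
    for name, locator in strategies:
        result = locator(description)
        if result is not None:
            return name, result
    return None, None


# Stand-in locators for illustration only; each returns a result or None.
def dom_locator(description):
    return None  # pretend the DOM model found nothing


def vision_locator(description):
    return None  # pretend the vision model found nothing either


def manual_locator(description):
    return "#checkout-button"  # hypothetical hand-written selector, used with AI disabled


strategies = [
    ("dom", dom_locator),
    ("vision", vision_locator),
    ("no-ai", manual_locator),
]

outcome = locate_with_fallback("The checkout button", strategies)
print(outcome)
```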