Inside the AI Shopping Sandbox
To explore how AI agents make purchasing decisions, the researchers used the mock site to display eight products in a fixed grid. They then simulated thousands of shopping sessions, presenting the AI agents with randomized product arrangements. Each experiment varied one or more attributes: where a product appeared on the page, its price, rating, review count, and whether it carried a “Sponsored” or “Overall Pick” badge.
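
As a rough illustration of this kind of randomization, the sketch below builds one simulated page. It is not the researchers' code: the 2-by-4 layout, the `make_session` helper, the field names, and the rule of one badge of each kind per page are all assumptions made for the example. Price, rating, and review count could be randomized in the same way.

```python
import random

# Assumed layout for eight products; the article only says "fixed grid".
GRID_ROWS, GRID_COLS = 2, 4


def make_session(products, rng):
    """Return one randomized page as a list of grid cells (hypothetical helper)."""
    order = list(products)
    rng.shuffle(order)  # random arrangement of products across grid slots

    # Assumption: one "Sponsored" and one "Overall Pick" badge per page,
    # placed on two distinct, randomly chosen slots.
    sponsored_slot, pick_slot = rng.sample(range(len(order)), 2)

    page = []
    for slot, name in enumerate(order):
        row, col = divmod(slot, GRID_COLS)
        badge = None
        if slot == sponsored_slot:
            badge = "Sponsored"
        elif slot == pick_slot:
            badge = "Overall Pick"
        page.append({"product": name, "row": row, "col": col, "badge": badge})
    return page


rng = random.Random(0)  # seeded so a simulation run can be repeated
products = [f"P{i}" for i in range(GRID_ROWS * GRID_COLS)]
example_page = make_session(products, rng)
```

Because position and badges are assigned at random, they are independent of the products themselves, which is what makes the later comparisons meaningful.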

This structure allowed the researchers to isolate causal effects. For example, they could determine whether an item was chosen more often because of its price or simply because it appeared on the top row. They found that positioning had an outsized impact, and not always in the ways designers might expect. While all models favored top-row placement, preferences across columns differed: GPT-4.1 leaned left, Claude Sonnet 4 favored the center, and Gemini 2.5 Flash preferred the right. Sellers optimizing for AI visibility need to consider not just where a product appears but also which agent they are trying to appeal to.
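
One simple way to read a position effect out of randomized sessions is to tabulate where the chosen product was sitting. The sketch below does that for a toy log; the `selection_rate_by` function, the data format, and the four hand-made sessions are illustrative assumptions, not the study's data or analysis.

```python
from collections import Counter


def selection_rate_by(sessions, key):
    """Share of all picks that landed in each row or column.

    sessions: list of (page, chosen_product) pairs, where page is a list of
    dicts with "product", "row" and "col" entries (assumed format).
    key: "row" or "col".
    """
    picks = Counter()
    for page, chosen in sessions:
        for cell in page:
            if cell["product"] == chosen:
                picks[cell[key]] += 1
    total = len(sessions)
    return {position: count / total for position, count in sorted(picks.items())}


# Toy log: two products, A and B, swapped between the top and bottom row.
page_a_top = [{"product": "A", "row": 0, "col": 0}, {"product": "B", "row": 1, "col": 0}]
page_b_top = [{"product": "B", "row": 0, "col": 0}, {"product": "A", "row": 1, "col": 0}]
toy_sessions = [
    (page_a_top, "A"),
    (page_b_top, "B"),
    (page_a_top, "B"),
    (page_b_top, "B"),
]

row_rates = selection_rate_by(toy_sessions, "row")
```

With positions assigned at random, each row would receive an equal share of picks if position did not matter, so a lopsided split points to a position effect rather than a product effect.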

Badging also played a significant role. Products marked as “Sponsored” saw their selection rates drop, while those labeled “Overall Pick” experienced substantial gains in attention. These effects persisted even when all other variables were held constant, indicating that AI agents, like human consumers, interpret platform signals, sometimes even discounting advertisements in favor of perceived endorsements.
Different Models, Different Markets
One of the study’s most striking conclusions is how differently AI models behave. Claude Sonnet 4, GPT-4.1, and Gemini 2.5 Flash frequently made divergent choices when asked to choose among identical assortments. For example, Claude favored one brand in the fitness watch category nearly twice as often as the other models. These preferences were consistent and measurable, suggesting that each AI model effectively creates its own miniature market with its own demand patterns.
That variability has real implications. In a world where AI agents mediate a growing share of online purchases, sellers may find themselves vulnerable to shifts in model behavior, especially when updates roll out. In one documented case, simply switching from an older to a newer version of Gemini reshuffled market shares across multiple categories, even though the listings themselves didn’t change. For sellers, that means a model upgrade could function as a demand shock. For platforms, it raises questions about how stable rankings and endorsements will be in an AI-driven environment.
Another key insight comes from experiments on rationality and instruction following. Although most advanced models passed basic tests—such as choosing the cheapest or highest-rated item when clearly optimal—performance varied. Some models, for instance, failed to recognize slight differences in price or rating, dismissing them as insignificant or mistakenly attributing them to display errors. While these behaviors may be understandable in ambiguous cases, they also point to risks: consumers delegating decisions to agents may not always get the best deal.
Gaming AI Agents
In a final set of experiments, the researchers explored how sellers might respond to the rise of AI agents. They simulated a world in which sellers use their own AI tools to adjust product descriptions, with the goal of better appealing to automated buyers. These tweaks were minor, usually just rewording or reordering features, but the effects were occasionally dramatic.
In about 25 percent of cases, a single round of AI-generated edits to a product’s description produced statistically significant increases in selection share. One mousepad, for instance, saw its market share jump by more than 20 percentage points after a rewritten description made it more attractive to the GPT-4.1 agent. Most changes had little to no effect, but the outliers suggest that careful optimization, especially when tuned to a specific model, can be highly effective.
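
To make "statistically significant" concrete, here is a minimal sketch of one common check, a one-sided two-proportion z-test on selection share before and after a description change. The article does not say which test the researchers used, so this is a stand-in, and the counts in the example are made up for illustration.

```python
from math import erfc, sqrt


def two_proportion_z(picks_before, n_before, picks_after, n_after):
    """Return (lift, z, one-sided p-value) for an increase in selection share."""
    share_before = picks_before / n_before
    share_after = picks_after / n_after
    lift = share_after - share_before

    # Pooled share under the null hypothesis that the edit changed nothing.
    pooled = (picks_before + picks_after) / (n_before + n_after)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if std_err == 0:
        # Product was picked always or never in both arms: no evidence either way.
        return lift, 0.0, 0.5

    z = lift / std_err
    p_value = 0.5 * erfc(z / sqrt(2))  # P(Z > z) for a standard normal
    return lift, z, p_value


# Hypothetical counts: picked in 100 of 1,000 sessions before the rewrite
# and in 300 of 1,000 sessions after it, a 20-percentage-point lift.
lift, z, p_value = two_proportion_z(100, 1000, 300, 1000)
```

A small p-value here would suggest the jump is unlikely to be noise from session-to-session randomness. When many listings are tested at once, some correction for multiple comparisons would normally be needed as well.
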
This dynamic hints at the emergence of a new meta-game in e-commerce. Rather than simply optimizing for human consumers, sellers may begin to tailor their listings for bots. The language, layout, and even visual presentation of a product could be subtly altered to appeal to an agent’s preferences, much like marketers once fiercely optimized webpages for search engine algorithms.
The New E-Commerce Playbook
As AI agents begin to shape what products get seen and sold, the old rules of online retail may no longer apply. Sellers can no longer rely on pricing alone, nor can they assume that promotional placements will have the same impact on machine shoppers. Instead, they’ll need to understand which features agents prioritize and how different AI models behave.
For platforms, the rise of agentic shopping raises existential questions about design, monetization, and fairness. For instance, if page position influences different AI agents in different ways, how will platforms use rankings as a monetization lever? And if one model’s preferences dominate a category, does that create an unfair advantage for certain products?
To help answer these questions, the researchers have open-sourced their ACES framework, allowing others to replicate, adapt, and build on their experiments. As consumers hand over more of their decision-making to machines, tools like this may become essential for all stakeholders (AI agent developers, buyers, sellers, platforms, and regulators) trying to navigate the next phase of digital commerce.