
Uncovering the Costly Bias in Marketplace Testing

Statistical bias could be misleading your product and feature testing, according to research from Columbia Business School Professor Hannah Li, but solutions might be easier than you think.

Published
April 21, 2025
Publication
Research In Brief
Focus On
Digital Future, Marketplace Design
Article Author(s)
Jonathan Sperling

Writer/Editor
Marketing and Communications
Category
Thought Leadership
Topic(s)
Data/Big Data, AI and Transformative Tech, Marketplace

About the Researcher(s)

Hannah Li

Assistant Professor of Business
Decision, Risk, and Operations Division

A/B testing can be a relatively quick, cost-efficient way for leaders and their companies to test new features on a subset of users and understand the impact before broader deployment. In many industries, however, this testing comes with a serious caveat.

Imagine you're testing a new feature on your website — say, the impact of showing better-quality photos for rental listings on a platform like Airbnb. You randomly split users into two groups in preparation for an A/B test: a treatment group sees new, high-quality photos, while a control group sees the original, standard images.
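
The setup described above amounts to a per-user coin flip. A minimal sketch of that kind of split, with illustrative names (not any platform's actual implementation):

```python
import random

def assign_groups(user_ids, p_treatment=0.5, seed=42):
    """Independently assign each user to the treatment or control group."""
    rng = random.Random(seed)
    return {
        uid: "treatment" if rng.random() < p_treatment else "control"
        for uid in user_ids
    }

groups = assign_groups(range(10_000))
n_treated = sum(1 for arm in groups.values() if arm == "treatment")
```

The crucial hidden assumption is that these per-user assignments are all the experiment needs, which holds only if users do not affect one another.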

In a perfect world, each user's behavior would be unaffected by what the other group sees. But that assumption often breaks down in reality, especially in marketplaces or social networks. According to research from Hannah Li, an assistant professor in Columbia Business School's Decision, Risk, and Operations Division, users don't operate in isolation; they interact, compete, and influence each other.

"When you run A/B testing in marketplaces where you have users buying and selling things from each other, the users are no longer going to be independent," Li says.

Key Takeaways:

- Traditional A/B testing assumes user independence: the treatment assigned to one individual does not influence the behavior of another.

- In platforms like marketplaces or social networks, this assumption often fails because users interact, compete, or influence one another, creating interference bias.

- As a result, companies risk wrongly rolling out or rejecting features, all while believing they're making sound, data-driven decisions.

- Smarter experimental designs, such as Two-Sided Randomization, can reduce this bias.

- Other biases can arise in recommendation systems, where users strategically interact with their recommendation algorithms by deliberately changing how they engage with content.

Preventing Statistical Bias

Li explained that when someone in the treatment group books a listing due to the higher-quality photos, there's now one less listing available for someone in the control group. This means the treatment unintentionally affects the control group, violating a core assumption of A/B testing: independence. 

That distortion is what Li and her fellow researchers call interference bias, a phenomenon that can inflate estimates by as much as 230 percent, meaning companies might believe an intervention is more than twice as effective as it actually is. That can lead to false confidence in a product change: launching something you think is a success, only to find it doesn't work in the real world. Worse, it might cause you to kill ideas that would have worked, simply because your experiment didn't account for how users affect one another. All the while, the company believes it is making airtight, data-driven decisions.
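
The dynamic Li describes can be reproduced in a toy model. The sketch below is not the paper's continuous-time Markov chain; it is a simplified deterministic approximation with illustrative parameters, in which treated and control buyers draw from one shared pool of listings, so the naive A/B comparison overstates the true global effect:

```python
def booked_fraction(p, n_buyers, n_listings):
    """Per-buyer conversion rate when *all* buyers convert with
    probability p, scaled by the fraction of listings still available
    (a deterministic 'fluid' approximation of finite inventory)."""
    booked = 0.0
    for _ in range(n_buyers):
        avail = max(n_listings - booked, 0.0) / n_listings
        booked += p * avail
    return booked / n_buyers

def mixed_experiment(p_treat, p_ctrl, n_buyers, n_listings):
    """Naive user-level A/B test: treatment and control buyers
    alternate over one *shared* inventory, so every treated booking
    removes a listing the control group could have booked."""
    booked_t = booked_c = 0.0
    for i in range(n_buyers):
        avail = max(n_listings - booked_t - booked_c, 0.0) / n_listings
        if i % 2 == 0:
            booked_t += p_treat * avail
        else:
            booked_c += p_ctrl * avail
    half = n_buyers / 2
    return booked_t / half, booked_c / half

p_t, p_c, N, M = 0.2, 0.1, 2_000, 100  # scarce inventory: 2,000 buyers, 100 listings
rate_t, rate_c = mixed_experiment(p_t, p_c, N, M)
naive_lift = rate_t - rate_c  # what the naive A/B test reports
# True global effect: everyone treated vs. everyone in control
gte = booked_fraction(p_t, N, M) - booked_fraction(p_c, N, M)
```

Under these assumed parameters the naive estimate comes out several times larger than the global effect, the same direction of inflation the research warns about.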

In their research, Li and her co-researchers found that implementing the right experimental systems can curtail this bias.

Interference in Action

To investigate how interference bias arises in two-sided platforms, the researchers developed a formal marketplace model using continuous-time Markov chains. This mathematical framework allowed them to simulate a dynamic environment where buyers and sellers arrive, interact, and transact over time. 

Li and her co-researchers found that this bias can be prevented through a novel form of experimental design known as Two-Sided Randomization (TSR). Rather than randomizing only sellers or only buyers into treatment and control groups, TSR randomizes both sides of the marketplace simultaneously. This design allows the platform to measure the competition effects between sellers and between buyers, the source of the interference bias, and account for these effects in the experiment's estimates.
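
A minimal sketch of a TSR assignment, assuming the common convention that an interaction receives the new feature only when both sides are treated (function and variable names are illustrative):

```python
import random

def tsr_assign(buyer_ids, listing_ids, p_buyer=0.5, p_listing=0.5, seed=7):
    """Two-sided randomization: buyers and listings are randomized
    independently, so the experiment observes all four
    buyer-arm/listing-arm combinations."""
    rng = random.Random(seed)
    buyer_arm = {b: rng.random() < p_buyer for b in buyer_ids}
    listing_arm = {s: rng.random() < p_listing for s in listing_ids}
    return buyer_arm, listing_arm

def shows_new_feature(buyer, listing, buyer_arm, listing_arm):
    """Assumed convention: the feature (e.g., upgraded photos) appears
    only when both the buyer and the listing are treated."""
    return buyer_arm[buyer] and listing_arm[listing]
```

Because partially treated interactions (one side treated, one not) are observed too, the platform can estimate how much treated demand crowds out control demand rather than conflating it with the feature's effect.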

This leads to far more accurate estimates of an experiment's Global Treatment Effect (GTE) — the metric most companies care about when deciding whether to roll out a feature to all users. Simulations from Li and her co-researchers' paper show that TSR consistently produces lower bias than standard experimental methods, across a wide range of market conditions.

If TSR is not feasible, there are other approaches companies can take, according to Li. Cluster Randomization, for example, groups users into clusters (e.g., by region) and randomizes entire clusters to treatment or control, minimizing cross-group interaction.
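
A sketch of region-level assignment (names are illustrative):

```python
import random

def cluster_randomize(user_to_region, seed=3):
    """Randomize at the region (cluster) level: users likely to compete
    for the same inventory all share one experimental arm."""
    rng = random.Random(seed)
    arms = {
        region: rng.choice(["treatment", "control"])
        for region in sorted(set(user_to_region.values()))
    }
    return {user: arms[region] for user, region in user_to_region.items()}
```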

Another technique is Switchback Testing. Instead of splitting users into control and treatment groups, the platform alternates the treatment across time periods for all users (e.g., on one day, off the next).
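
A sketch of such a schedule:

```python
def switchback_schedule(n_periods):
    """Whole-platform switchback: the feature is on in even periods
    (e.g., days) and off in odd ones, so treated and control
    observations never share the same moment of inventory."""
    return ["treatment" if t % 2 == 0 else "control" for t in range(n_periods)]
```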

When Users are Strategic

A subsequent paper by Li studies how users strategically interact with online platforms to influence the content recommended to them, another form of bias that can throw companies off.

Typically, platforms like TikTok, Netflix, and Amazon suggest content based on users' past behaviors, assuming user interactions are straightforward reflections of their preferences. However, Li and her co-researchers' study suggests that users often engage in strategic behavior to shape their future recommendations.

For instance, when participants were informed that an algorithm prioritizes "likes" and "dislikes," they used these features almost twice as much as those told the algorithm focuses on viewing time. Through surveys, the researchers found that nearly half of the participants admitted to altering their behavior on platforms to control future recommendations. Some users even reported avoiding content they enjoy to prevent the platform from over-recommending similar content in the future.

"If you watch a video on YouTube, the platform learns that you like it. If you don't watch it, they learn you don't like it. But what we heard is that users are strategizing. They may see a YouTube video and actually like it, but they know that if they click on it, they will get millions of the same videos for the next three weeks. So, they don't watch the video," Li says, adding that "when this happens, the data that's being collected is not representative of the user's true preferences."

Experimental Music

To study how users adapt their behavior in response to recommendation systems, Li and her co-authors created their own music streaming app—essentially a simplified version of Spotify. This gave them total control over what users saw and how the system reacted. By stripping away real-world platform complexities, they could focus entirely on whether users tried to “game” the algorithm.

The study’s 750 participants were randomly assigned to different conditions in a controlled environment. Everyone listened to songs and could “like” or “dislike” them, or just skip ahead. In the first session, participants used the music player naturally, as if they were on a real platform. 

In the following session, participants were randomly told different things about how the recommendation algorithm worked. Some were told the system cared most about likes/dislikes, others were told it prioritized listening time, and a control group got no guidance.

This setup let the researchers test how user behavior changed depending on what users believed the algorithm cared about, without changing the actual algorithm. By observing how people's actions varied under these scenarios, the researchers could see whether users acted strategically, choosing actions not just based on personal enjoyment but also on what they thought would "train" the algorithm in their favor.

The researchers paid close attention to the number of “likes” and “dislikes” and how long users stayed on each song, or dwell time. The researchers also conducted follow-up surveys to confirm whether users admitted to similar strategic behaviors on real-world platforms like Spotify or TikTok.
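
A sketch of how event logs like these might be aggregated per experimental condition; the field names and schema are hypothetical, not the study's actual data format:

```python
from collections import defaultdict

def summarize_by_condition(events):
    """Aggregate per-condition behavioral metrics from play events.
    Each event is assumed to look like:
      {"condition": str, "action": "like"|"dislike"|"skip"|"none",
       "dwell_sec": float}
    Returns the explicit-feedback rate and mean dwell time per condition."""
    acc = defaultdict(lambda: {"plays": 0, "feedback": 0, "dwell": 0.0})
    for event in events:
        stats = acc[event["condition"]]
        stats["plays"] += 1
        stats["dwell"] += event["dwell_sec"]
        if event["action"] in ("like", "dislike"):
            stats["feedback"] += 1
    return {
        cond: {
            "feedback_rate": s["feedback"] / s["plays"],
            "mean_dwell_sec": s["dwell"] / s["plays"],
        }
        for cond, s in acc.items()
    }
```

Comparing these summaries across conditions is what would reveal, for example, that users told the algorithm watches likes/dislikes use those buttons far more often.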

Li suggested that the fact that users are strategizing indicates that recommendation systems, such as Instagram's "Explore" page, may be over-indexing on known user preferences rather than exploring new content. Adjusting the algorithms to be less heavy-handed in pushing familiar content could help address this issue.

She also noted that users would ideally be able to more easily alter the algorithm behind their personal feed rather than strategize their behavior. Giving users more control and transparency over the recommendation system could help mitigate strategization.

 

Adapted from “Measuring Strategization in Recommendation,” by Hannah Li of Columbia Business School, Sarah H. Cen of Massachusetts Institute of Technology, Andrew Ilyas of Massachusetts Institute of Technology, Jennifer Allen of Massachusetts Institute of Technology, and Aleksander Mądry of Massachusetts Institute of Technology.

Also adapted from “Experimental Design in Two-Sided Platforms,” by Hannah Li of Columbia Business School, Ramesh Johari of Stanford University, Inessa Liskovich of Stanford University, and Gabriel Y. Weintraub of Stanford University.
