Generative AI can transform research, but hidden bias risks distorting insights. Learn how inclusive data and oversight support fair, trustworthy outcomes.
Synthetic research is having a moment. As Generative AI (GenAI) tools get better at simulating consumers and projecting market scenarios, research teams are using them to move faster and stretch smaller budgets. That pace is exciting, but it also makes it risky not to talk plainly about bias.
Diversity, Equity, and Inclusion (DEI) aren't just ethics boxes to check in research. They're how you get accurate, durable, valuable insights. If your data or methods systematically underrepresent or misread parts of the market, your strategy wobbles.
Hidden biases in GenAI can slip past even experienced researchers because the outputs look so confident and coherent. That's the danger, but it's fixable. Keep reading to learn how to address these biases and make the most of your synthetic research.
GenAI refers to models that create new content (text, images, audio, or synthetic data) by learning patterns from large training sets.
In research, these models can simulate survey responses, generate personas, write summaries of open-ends, and even synthesize "what-if" scenarios based on historical data.
Under the hood, they're predicting the next token or pixel, not "understanding" the world like a person would. Research from the Massachusetts Institute of Technology (MIT) has shown that large language models (LLMs) can encode and reproduce societal biases present in their training data, particularly when applied to downstream decision-making tasks.
When AI models train on synthetic data that predominantly represents one demographic, they develop a skewed understanding of consumer behavior.
In a nutshell, AI creates synthetic data that shapes insight generation. GenAI helps teams:
Analyze mountains of unstructured data (reviews, chats, social posts) in minutes
Draft hypotheses, discussion guides, even iterative survey items
Generate synthetic consumer segments for early prototyping
Summarize respondent feedback with sentiment and themes at scale
The benefits are real: cost savings and speed (even the ability to explore "long tail" questions at low cost).
Tools like these also democratize access to research across an organization. However, the same scalability that makes GenAI powerful can also quietly and quickly amplify bias if you don't build guardrails.
Bias can sneak in from several places:

[Image omitted. Source: generated by the author via ChatGPT.]
Data is the first culprit. If the training data underrepresents certain groups or overrepresents specific cultural contexts, the model's outputs reflect that skew. The National Institute of Standards and Technology (NIST) calls these "sociotechnical" risks and urges teams to address them across the AI lifecycle.
Algorithms present another problem. Optimization choices can favor majority patterns. For example, sentiment or toxicity classifiers have exhibited higher false positives on language varieties like African American English, which can distort insights if used uncritically.
Then there's human oversight. Prompting decisions, label guidelines, and interpretation practices introduce their own assumptions. Even well-meaning researchers can unintentionally instruct a model in ways that exclude or stereotype.
Short-term vs. long-term effects: In the short term, biased outputs lead to noisy insights and poor decisions. You might miss a promising segment or choose messages that don't land.
Over the long haul, bias compounds. When your team normalizes incomplete pictures, strategy drifts and trust erodes, especially among communities that feel unseen by your brand.
Misrepresentation and exclusion: Misrepresentation happens when GenAI paraphrases open-ended statements in ways that flatten dialects or cultural references. Your toplines can misstate what people actually said and meant.
Meanwhile, exclusion occurs when your synthetic data doesn't model disability, language diversity, rural consumers, or older adults. Your projections will ignore them. Over time, that can harden into product gaps and inequitable experiences.
Learn from Bryan Henry, President of PeterMD. Having worked on healthcare research, he emphasizes the risk of relying too heavily on synthetic outputs without validation.
Henry says, “GenAI can surface patterns quickly, but it doesn’t inherently know what’s missing. In the field of healthcare, where patient variability is significant, overlooking underrepresented groups can lead to insights that feel complete but fail in practice.”
An MIT news article explains that bias emerges because training data reflects real-world societal bias, which models then reproduce. That’s why MIT researchers have continued to train LLMs to reduce harmful stereotypes, such as gender and racial biases. However, a lot of work still needs to be done.
We can't remove all biases. However, we can reduce them and make them visible enough to manage. Here’s how:
Build inclusive datasets on purpose. Oversample underrepresented groups for fine-tuning tasks tied to your markets. Make sure your synthetic seeds and exemplars span age, geography, disability, language, and socioeconomic status.
Document your data. Adopt Datasheets for Datasets so stakeholders can see who's in the data, who's missing, how labels were defined, and where known limitations sit.

Preserve subgroup signals. When summarizing or de-identifying, keep tags that let you analyze fairness later (e.g., self-identified demographics, language variety). Synthetic data can be useful, but it tends to inherit upstream skews unless you explicitly rebalance it.
For example, a brand developing new T-shirt collections might oversample feedback from diverse groups (with different age ranges, body types, cultural backgrounds, etc.) to ensure designs resonate broadly.
As a brand, you document who contributed input and track which groups may still be underrepresented. When analyzing results, you preserve subgroup tags (e.g., language or region) to spot gaps and rebalance future collections accordingly.
A retail brand using synthetic personas to test product-market fit found that early outputs skewed heavily toward urban, English-speaking consumers. After auditing their seed data and rebalancing inputs to include rural and multilingual segments, their follow-up research surfaced entirely different purchase drivers—particularly around pricing sensitivity and accessibility. This shift directly influenced their regional pricing strategy.
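The audit-and-rebalance step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `segment` field, the sample records, and the target shares are all hypothetical, and duplication-based oversampling is just one of several ways to rebalance seed data.

```python
import random
from collections import Counter

def representation_gaps(records, key, targets):
    """Observed share minus target share per group (negative = underrepresented)."""
    counts = Counter(r[key] for r in records)
    total = len(records)
    return {g: counts.get(g, 0) / total - share for g, share in targets.items()}

def oversample_to_targets(records, key, targets, seed=0):
    """Top up smaller groups by sampling with replacement until shares match `targets`.

    Records whose group is not listed in `targets` are left out of the result.
    """
    rng = random.Random(seed)
    by_group = {g: [r for r in records if r[key] == g] for g in targets}
    if any(not rows for rows in by_group.values()):
        raise ValueError("A target group has no records; it cannot be oversampled.")
    # The group that is largest relative to its target fixes the final size,
    # so no group ever has to be cut down.
    size = max(len(rows) / targets[g] for g, rows in by_group.items())
    balanced = []
    for g, rows in by_group.items():
        extra = max(0, round(targets[g] * size) - len(rows))
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=extra))
    return balanced

# Hypothetical seed set that skews urban, with made-up target shares.
seed_records = (
    [{"segment": "urban"}] * 80
    + [{"segment": "rural"}] * 15
    + [{"segment": "multilingual"}] * 5
)
targets = {"urban": 0.5, "rural": 0.3, "multilingual": 0.2}

gaps = representation_gaps(seed_records, "segment", targets)
balanced = oversample_to_targets(seed_records, "segment", targets)
balanced_counts = Counter(r["segment"] for r in balanced)
print(gaps)
print(balanced_counts)
```

Duplicated records add weight, not new information. A gap report like this is most useful as a prompt to collect real input from the missing groups, with oversampling as a stopgap.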
Set fairness goals up front. Define parity metrics relevant to research tasks: false-positive/negative rates in sentiment by subgroup, or calibration of satisfaction scores across demographics. Stanford's HELM is a good reference for multi-dimensional evaluation.

Use bias detection toolkits. Open-source libraries like IBM's AI Fairness 360 and Microsoft's Fairlearn help quantify group-level disparities and test mitigation approaches.
Constrain generation with prompts and controls. Steer models to consider specific subgroups and ask for separate summaries by demographic or region before generating an overall roll-up.
Involve domain and community experts. Bring in cultural advisors or community researchers to stress test prompts and labels. This isn't "review at the end." It's co-design.
Case in point: If you are a rental website builder, you might set fairness goals to ensure listings and recommendations perform equally well across different renter groups, such as families, students, or people with disabilities.
As such, you use bias detection tools to check if certain groups are shown fewer options or receive less accurate matches, then adjust your model and prompts to correct this. You also involve community advisors to review how listings are described and categorized. This ensures the platform remains inclusive and fair by design.
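One of the parity metrics mentioned above, the false-positive rate of a sentiment classifier by subgroup, is simple enough to compute by hand. The sketch below uses plain Python so it runs anywhere. The group names, the audit sample, and the 0.10 threshold are all assumptions for illustration; toolkits like Fairlearn and AI Fairness 360 offer maintained versions of this kind of check.

```python
from collections import defaultdict

def false_positive_rate_by_group(rows):
    """rows: (group, truth, predicted) with 1 = flagged as negative, 0 = not.

    Returns, per group, the share of truly non-negative comments that the
    classifier wrongly flagged as negative.
    """
    benign = defaultdict(int)
    wrongly_flagged = defaultdict(int)
    for group, truth, predicted in rows:
        if truth == 0:
            benign[group] += 1
            if predicted == 1:
                wrongly_flagged[group] += 1
    return {g: wrongly_flagged[g] / n for g, n in benign.items()}

def parity_gap(rates):
    """Largest difference in rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical hand-labeled audit sample: 10 non-negative comments per group,
# plus 5 genuinely negative comments per group that were flagged correctly.
rows = (
    [("variety_a", 0, 1)] * 1 + [("variety_a", 0, 0)] * 9
    + [("variety_b", 0, 1)] * 4 + [("variety_b", 0, 0)] * 6
    + [("variety_a", 1, 1)] * 5 + [("variety_b", 1, 1)] * 5
)
MAX_GAP = 0.10  # assumed fairness goal; set your own up front

rates = false_positive_rate_by_group(rows)
gap = parity_gap(rates)
needs_review = gap > MAX_GAP
print(rates, round(gap, 2), needs_review)
```

In this made-up sample, the classifier wrongly flags four in ten benign comments from one group and one in ten from the other, so the gap exceeds the goal and the output would go to human review before any topline is reported.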
Audit early and often. Run pre-deployment and ongoing fairness evaluations, just as you would QA survey logic. Tools aside, set a cadence and assign owners.
Add human-in-the-loop checkpoints. Before insights roll up to decision-makers, have trained reviewers evaluate subgroup summaries and language choices.
Close the loop with real-world signals. Compare AI-generated insights with actual market performance, complaints, support tickets, and community feedback. Adjust models and prompts when drift shows up.
Publish model cards for internal use. Even if you don't share them publicly, concise model documentation helps teams remember what a model is good for and what it isn't.

For instance, a healthcare organization documenting medical negligence might run regular audits to check whether its AI system flags incidents consistently across different patient groups.
Before reports are finalized, trained reviewers assess language and summaries to ensure no group is misrepresented or overlooked. They then compare AI findings with actual complaints, legal cases, and patient feedback.
Lastly, they update models as gaps appear and maintain internal model documentation to track limitations and proper use.
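Closing the loop with real-world signals can start as a simple comparison between what the AI estimated and what was actually observed, subgroup by subgroup. The following is a minimal sketch: the subgroup names, the satisfaction scores on a 0 to 1 scale, and the tolerance are hypothetical values chosen for illustration.

```python
def drift_report(ai_estimates, observed, tolerance):
    """Compare AI-generated estimates with observed values for each subgroup.

    A subgroup is marked "drift" when the absolute gap exceeds `tolerance`,
    and "no_real_world_data" when there is nothing to compare against.
    """
    report = {}
    for group, estimate in ai_estimates.items():
        if group not in observed:
            report[group] = {"status": "no_real_world_data", "gap": None}
            continue
        gap = estimate - observed[group]
        status = "drift" if abs(gap) > tolerance else "ok"
        report[group] = {"status": status, "gap": round(gap, 3)}
    return report

# Hypothetical satisfaction scores (0 to 1): synthetic estimate vs. survey result.
ai_estimates = {"urban": 0.82, "rural": 0.80, "older_adults": 0.78}
observed = {"urban": 0.80, "rural": 0.61}

report = drift_report(ai_estimates, observed, tolerance=0.05)
for group, result in report.items():
    print(group, result)
```

The "no_real_world_data" status matters as much as the drift flag. A subgroup with a confident synthetic estimate and no observed data behind it is exactly the kind of gap an audit owner should chase.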
GenAI excels at pattern-finding and summarization. But markets are made of people, and people live in context. That's where human researchers shine: translating cultural cues, interrogating outliers, asking better follow-up questions. This is how human co-creation will shape the future!
In other words, AI excels at processing vast amounts of data, but human researchers bring cultural context and empathy that machines can't replicate. The most successful projects pair AI's analytical power with researchers who understand the communities they're studying. This combination catches biases that either approach alone would miss.
Prompt engineering for inclusivity: Teach teams to ask for subgroup views and counter-examples, not just top-three themes.
Fairness literacy: Make sure everyone understands basic fairness metrics and how to read a bias report.
Data documentation: Train researchers to write and use dataset and model documentation as part of the workflow.
Community-centered methods: Strengthen qualitative skills that surface lived experiences. Hold co-creation sessions and accessibility checks, even moderated discussions with local context.
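To make the prompt engineering skill concrete, here is a small sketch of a prompt template that asks for subgroup views and counter-examples before any roll-up. The function name, the wording of each step, and the subgroup labels are illustrative assumptions, not a tested or recommended prompt, and the sketch deliberately stops short of calling any model.

```python
def build_inclusive_prompt(task, subgroups, n_counterexamples=2):
    """Build a prompt that requests per-subgroup summaries before an overall one."""
    lines = [
        f"Task: {task}",
        "Before writing any overall summary, complete these steps:",
    ]
    for i, group in enumerate(subgroups, start=1):
        lines.append(
            f"{i}. Summarize the main themes for {group} respondents, "
            "quoting their own wording where possible."
        )
    step = len(subgroups) + 1
    lines.append(f"{step}. List {n_counterexamples} responses that contradict the dominant themes.")
    lines.append(f"{step + 1}. Name any subgroup with too few responses to summarize reliably.")
    lines.append(f"{step + 2}. Only then write an overall roll-up and state where subgroups differ.")
    return "\n".join(lines)

# Hypothetical task and subgroup labels.
prompt = build_inclusive_prompt(
    task="Summarize open-ended feedback on the new checkout flow.",
    subgroups=["rural", "Spanish-speaking", "screen-reader user"],
)
print(prompt)
```

Keeping the template in code rather than in individual chat windows means the subgroup list can be reviewed by community advisors and reused consistently across projects.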
A well-known study published in Science found that a widely used healthcare algorithm underestimated the needs of Black patients compared to white patients with the same health conditions.
The issue wasn’t the model itself; it was the data. The system used healthcare spending as a proxy for illness severity. But because less was historically spent on Black patients, the algorithm learned a distorted pattern.
In practice, this meant certain patients were less likely to be flagged for additional care. Once researchers replaced cost-based proxies with direct health indicators, the bias dropped significantly.
This is a clear example of how synthetic or AI-driven insights can look accurate on the surface but still miss entire populations if the inputs are flawed.
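The proxy problem is easier to see with a toy example. In the sketch below, every number is invented for illustration and none of it comes from the Science study. Two hypothetical groups have identical health needs, but less is spent per condition on one of them. A count of conditions stands in for "direct health indicators," which is an assumption of this sketch.

```python
from collections import Counter

# Made-up spending per health condition; group_b receives half as much care spend.
SPEND_PER_CONDITION = {"group_a": 1000, "group_b": 500}

# Five patients per group with 1 to 5 conditions each, so needs are identical.
patients = [
    {"group": g, "conditions": c, "cost": c * SPEND_PER_CONDITION[g]}
    for g in SPEND_PER_CONDITION
    for c in range(1, 6)
]

def flagged_by(score_key, k=4):
    """Flag the top-k patients by the given score and count them per group."""
    top = sorted(patients, key=lambda p: p[score_key], reverse=True)[:k]
    return Counter(p["group"] for p in top)

by_cost = flagged_by("cost")          # proxy: historical spending
by_health = flagged_by("conditions")  # direct indicator: number of conditions

print("Flagged using cost:", dict(by_cost))
print("Flagged using conditions:", dict(by_health))
```

Ranking by cost flags three patients from group_a and one from group_b, while ranking by conditions flags two from each. Nothing in the ranking logic changed between the two runs. Only the input used as the score did.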
Research highlighted by the Stanford Institute for Human-Centered AI (HAI) shows that AI systems can misinterpret African American English (AAE), often labeling it as more negative or toxic than it actually is.
In a market research setting, this becomes a real problem. If teams use AI tools to analyze open-ended survey responses or social media feedback, they may accidentally overstate negative sentiment from certain groups or misread tone entirely.
That can lead to incorrect conclusions about campaign performance or customer satisfaction. This can even cause brands to overlook valuable segments. This is why subgroup checks and human review are still essential when working with AI-driven analysis.
GenAI can amplify insights work if we use it with intention. However, bias creeps in through data, algorithms, and human choices. If left unchecked, it undermines DEI and warps the very market understanding we depend on.
We know what works: build inclusive datasets, document them, set clear fairness goals, and audit routinely. More importantly, keep skilled humans in the loop. Pairing AI with human empathy is how you catch what either would miss alone.
Ultimately, inclusive research drives better outcomes while leaving less opportunity on the table. To learn how leading teams are applying GenAI responsibly, explore the latest insights at GreenBook.