July 15, 2025
How market researchers are using Gen AI to generate synthetic data, accelerate insights, and unlock new use cases while protecting privacy.
In a field defined by tight timelines, complex audiences, and rising privacy concerns, synthetic data is emerging as a powerful way to reimagine how market researchers work. Enabled by the rapid evolution of generative AI, synthetic data allows researchers to simulate consumer responses, test hypotheses, and experiment at scale—often without ever fielding a survey.
It’s not a replacement for traditional research—but it is fast becoming a vital complement.
At IIEX North America 2025, Ali Henriques, Executive Director of Qualtrics Edge, emphasized how synthetic data is unlocking speed, scale, and control for insights professionals around the world. Backed by findings from the company’s MR Trends study—an analysis of over 3,000 global researchers across 18 markets—Henriques made the case that synthetic data is no longer a fringe innovation. It’s a foundational shift.
Synthetic data refers to information that is artificially generated to replicate the statistical patterns and properties of real-world data. Unlike traditional datasets sourced from actual respondents, synthetic data is produced through algorithms—often powered by Gen AI models—that simulate realistic responses or behaviors.
For researchers, synthetic data answers the call for:
Privacy-respecting approaches
Faster research cycles
Cost-effective testing
Scalable access to hard-to-reach populations
It offers a flexible, ethical, and increasingly accurate way to explore ideas, prototype survey designs, and even stress-test hypotheses without the friction of recruiting respondents, collecting consent, or handling sensitive data.
Synthetic data is gaining traction in a wide range of applications across the research lifecycle. Among the most common:
Message and concept testing: Simulate how audiences might respond to new creative or claims in early-stage development.
Segmentation prototyping: Populate fictional but realistic consumer groups to explore behavioral or attitudinal patterns.
Behavioral simulations: Model shopping journeys, clickstreams, or responses to pricing changes—without exposing real user data.
Data augmentation: Fill in gaps or rebalance skewed datasets with synthetic inputs.
Pre-launch experimentation: Test “what-if” scenarios without needing to commit real-world spend.
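To make the data augmentation use case above concrete, here is a minimal sketch of the arithmetic behind rebalancing a skewed sample. It only works out how many synthetic records each segment would need to reach a target mix; it does not generate them. The segment names, counts, target shares, and the `synthetic_top_up` helper are all hypothetical and are not drawn from the MR Trends study or any Qualtrics tooling.

```python
def synthetic_top_up(counts, target_shares):
    """Return how many synthetic records each segment needs to hit target shares.

    Real records are never dropped: the final sample size is the smallest
    total at which no segment exceeds its target share.
    """
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    # Smallest overall sample size that keeps every existing real record.
    total = max(counts[seg] / target_shares[seg] for seg in counts)
    return {
        seg: max(0, round(total * target_shares[seg]) - counts[seg])
        for seg in counts
    }


# Hypothetical skewed sample: older respondents are under-represented.
real_counts = {"18-34": 300, "35-54": 120, "55+": 30}
target_shares = {"18-34": 0.5, "35-54": 0.3, "55+": 0.2}

top_up = synthetic_top_up(real_counts, target_shares)
print(top_up)
```

In this made-up case the smallest segment would end up mostly synthetic, which is one reason to label the added records and validate them before relying on the rebalanced data.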
These emerging use cases align closely with findings from Qualtrics' MR Trends study. According to Henriques, the study found that synthetic data is already being used as a full replacement for human input in 52% of cases, and 40% of researchers are applying it across both qualitative and quantitative use cases—with early-stage innovation being the most common focus area.
Generative AI powers much of today’s synthetic data creation. Using large language models (LLMs) and diffusion models, researchers can generate highly realistic text, numerical data, or even images based on carefully crafted prompts.
In market research, this means:
Generating synthetic open-ends that mimic real consumer language
Creating AI-powered personas for empathy-building or segmentation
Simulating full survey responses across demographic profiles
Producing qualitative-style narratives to test emotional reactions or storytelling strategies
By blending historical data, behavioral frameworks, and creative prompting, AI makes it possible to generate synthetic data that’s nuanced, diverse, and representative of specific audiences or use cases.
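As a rough illustration of simulating responses across demographic profiles, the sketch below builds one prompt per profile from a small grid of attributes. It stops at prompt construction and does not call any model, since the article names no specific LLM or API. The attribute lists, the question, and the `build_prompt` helper are assumptions made for this example.

```python
from itertools import product

# Hypothetical profile attributes and survey question.
AGE_BANDS = ["18-34", "35-54", "55+"]
REGIONS = ["Northeast", "South", "Midwest", "West"]
QUESTION = (
    "How likely are you to try a new plant-based snack bar? "
    "Answer on a 1-5 scale and explain why in one sentence."
)


def build_prompt(age_band, region, question):
    """Compose a persona-conditioned prompt for one demographic profile."""
    return (
        f"You are answering a survey as a consumer aged {age_band} "
        f"living in the {region} region of the United States.\n"
        f"Question: {question}\n"
        "Reply in the voice of that consumer."
    )


# One prompt per combination of age band and region.
prompts = [
    {"age_band": age, "region": region,
     "prompt": build_prompt(age, region, QUESTION)}
    for age, region in product(AGE_BANDS, REGIONS)
]

print(len(prompts), "prompts built")
print(prompts[0]["prompt"])
```

Keeping the profile attributes alongside each prompt makes it easier to tie generated answers back to the segment they are meant to represent.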
For insight professionals, the biggest draw is how synthetic data accelerates the research process:
Speed: Hypotheses can be tested and refined in hours, not weeks.
Scale: Responses can be generated across low-incidence or sensitive segments.
Access: Synthetic data reduces friction tied to personally identifiable information (PII), compliance, or recruitment.
Pre-fieldwork validation: Teams can stress-test survey logic and messaging before committing budget.
As Henriques noted during her IIEX session, researchers are already seeing measurable benefits:
“The benefits are pretty obvious. Condensed timelines for data collection, we're getting insights more quickly, improving the accuracy of insights, we're controlling a bit more of the who and the what they're representing and then getting richer, more detailed data.”
This doesn't mean giving up research rigor. Instead, it's about working smarter. As Henriques put it:
“We’re able to hold on to the aspects of research that are really, really important to us… letting go of a little control… but we're hanging on to the phases of study design and analysis and reporting.”
One of the most immediate advantages of synthetic data is privacy. Because the responses are simulated, researchers can explore sensitive topics or test unreleased concepts with far less risk of leaks or compliance problems.
Henriques offered a sharp example:
“Privacy manifests in a couple of ways in synthetic. One is that synthetic respondents are not going to screenshot your concept and share it on Reddit or any other blog, right? There's privacy inherent in the nature of the response, if you will, but it's also helpful for unpacking sensitive groups and populations. There’s certain populations that are really hard to conduct research with. Well all of that kind of privacy... we shed ourselves of that in a synthetic world.”
This makes synthetic data especially useful for innovation teams, regulated industries, and studies involving at-risk or marginalized populations.
As with any methodology, quality assurance is critical. Synthetic data should be validated through:
Statistical comparison with historical benchmarks
Manual review for coherence and realism
Bias detection tools to surface skew or hallucinated data
Iterative prompting to fine-tune outputs
Importantly, synthetic data shouldn’t be treated as a substitute for real-world measurement. It’s best used for prototyping, exploration, and directional insight—not final decision-making without validation.
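One simple way to approach the statistical comparison with historical benchmarks is to compare answer shares for a single-choice question. The sketch below uses total variation distance, which is just one possible metric and is not named in the article. The response data, the 0.10 review threshold, and the function names are hypothetical.

```python
from collections import Counter


def answer_shares(responses):
    """Return the share of each answer option in a list of responses."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return {option: n / len(responses) for option, n in counts.items()}


def total_variation_distance(shares_a, shares_b):
    """Half the sum of absolute share differences (0 = identical, 1 = disjoint)."""
    options = set(shares_a) | set(shares_b)
    return 0.5 * sum(
        abs(shares_a.get(o, 0.0) - shares_b.get(o, 0.0)) for o in options
    )


# Hypothetical data: a historical benchmark and a synthetic sample.
benchmark = ["agree"] * 50 + ["neutral"] * 30 + ["disagree"] * 20
synthetic = ["agree"] * 65 + ["neutral"] * 25 + ["disagree"] * 10

tvd = total_variation_distance(answer_shares(benchmark), answer_shares(synthetic))

REVIEW_THRESHOLD = 0.10  # assumed cut-off; tune per study
needs_review = tvd > REVIEW_THRESHOLD
print(f"TVD = {tvd:.2f}, needs review: {needs_review}")
```

A gap above the chosen threshold does not show the synthetic data is wrong. It only flags the question for the manual review and iterative prompting steps listed above.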
Synthetic data is powerful, but it’s not without challenges. Key concerns include:
Overconfidence in simulated responses that sound real but may not reflect true sentiment
Opacity of AI models, which can introduce hidden bias
Ethical usage, especially around transparency and disclosure
Misinterpretation by stakeholders who may not understand what synthetic data is—or isn’t
The solution? Be clear, cautious, and collaborative. Label synthetic inputs. Document methods. Train internal stakeholders on how to interpret and act on these emerging sources of data.
Synthetic data is no longer a curiosity—it’s becoming core. The MR Trends study shows just how confident researchers are in its growth:
71% of global researchers believe that synthetic data will make up more than half of all data collection within the next three years.
Expect to see:
Synthetic data modules integrated into survey and analytics platforms
Real-time simulations to support agile, in-flight decision-making
Cross-modal generation, combining text, visual, and behavioral stimuli
Standardization and benchmarking to ensure data quality
Broader adoption across teams, including brand strategy, innovation, UX, and beyond
Synthetic data, powered by generative AI, offers more than speed or novelty—it represents a new layer of intelligence in the research process. By pairing simulation with strategy, researchers can move faster, explore further, and reduce risk along the way.
As Henriques and others are demonstrating, the future of market research isn’t just digital—it’s synthetic, scalable, and smarter than ever before.
Now is the time to pilot, test, and share best practices—because the teams who learn to work with synthetic data today will be the ones shaping the insights landscape tomorrow.
Disclaimer
The views, opinions, data, and methodologies expressed above are those of the contributor(s) and do not necessarily reflect or represent the official policies, positions, or beliefs of Greenbook.
More from Ashley Shedlock
Discover how digital twins are transforming marketing research with AI-driven consumer models, faster insights, and predictive foresight.
Discover how AI is reshaping insights jobs, what tasks it automates, and the human skills professionals need to thrive in the AI era.
Stay ahead with AI-driven social media insights. Discover how Converseon, Wonderflow & Revuze help brands turn chatter into strategy.
Explore how new qualitative research tools—from AI interviews to UX testing—help brands uncover the ...