The Limits of Synthetic Data in Market Research

Mar 26

Synthetic data in market research is quickly becoming one of the most talked-about topics in the industry. AI models can now generate responses that look remarkably human. For some use cases, this opens up exciting possibilities — faster insights and the ability to rapidly prioritize ideas before investing in full research studies.

But there’s also a growing debate about how far synthetic data should go. As we explored in our perspective on the future of surveys in a world of synthetic data, the rise of AI-generated participants raises important questions about whether simulated insights can truly replace real human perspectives.

Some companies are beginning to suggest that synthetic participants could replace real survey participants entirely. That claim is prompting many researchers to ask how far the role of synthetic data in market research should extend.

When Synthetic Data Looks Real — But Isn’t

One of the strengths of large language models is their ability to generate responses that sound very human. But that doesn’t necessarily mean they reflect real people.

“Synthetic data can be extremely plausibly human-looking, but plausible is not the same as accurate or representative,” says Josh Seltzer, CTO of Nexxt Intelligence | inca.

Large language models are trained on massive amounts of existing text and are very good at producing responses that resemble the average patterns within that data. The challenge is that market research insight rarely comes from the average. Often, the most valuable insights come from people who think differently — the edge cases, the unexpected opinions, or the emerging behaviors that haven’t yet become mainstream. If synthetic participants mainly reproduce the average, those important signals can easily be lost.

This is an important consideration for teams relying on AI-powered quantitative research and modern AI insights platforms for surveys, where the goal is not just scale but meaningful human insight.

Representation and Emerging Trends

Another limitation of synthetic data in market research is representation. Much of the content used to train large language models comes from the internet, and that data tends to reflect a relatively narrow portion of the global population.

As Josh explains, a large portion of online text comes from what researchers describe as “Western, educated, industrialized, rich and democratic (WEIRD)” populations.

This means AI-generated responses may struggle to accurately represent minority perspectives or underrepresented communities. Synthetic data can also have difficulty capturing new or emerging trends. Because language models rely heavily on existing data, they are often better at describing what already exists than predicting what might happen next, particularly when researchers are exploring entirely new product categories or ideas.

For organizations exploring AI-powered market research methods, this raises an important point: AI can support research, but it cannot fully replace real human perspectives.

Where Synthetic Data Can Still Be Useful

None of this means synthetic data has no place in research.

In fact, there are situations where it can be genuinely useful. For example, it can help with:

hypothesis generation and scenario planning when quick directional input is needed
rapid screening of large sets of ideas
data augmentation

In the first two situations, synthetic data may provide helpful guidance before conducting research with real people. But when businesses are making major strategic or financial decisions, researchers still need something synthetic data cannot fully replicate: real human perspectives.

As Phil Sutcliffe, Managing Partner with inca | Nexxt Intelligence, notes, market research ultimately exists to understand people: their motivations, experiences, and emotions. And that requires listening to real voices, not just simulations of them.

This is why many researchers are exploring conversational research approaches that combine AI efficiency with real human perspectives. At Nexxt Intelligence, we developed the inca platform to support this balance — helping researchers collect deeper qualitative insight at scale while keeping real participant voices at the center of the research process.

The most powerful research approaches combine AI and human insight: they let researchers quickly capture perspectives from real people at scale while grounding their insights in authentic human experience.

Kathy Cheng
