Multi-source sampling presents challenges in ensuring a consistent, reliable blend. SSI is working to identify the components that drive consumer preferences, and to find the controlling factors that make two samples truly comparable to one another, across time and across sources.
Multi-source samples have largely been considered a necessary evil - a requirement when no single source can handle a large or low-incidence sampling need. Researchers have avoided using multiple sources, perhaps because sampling theory dictates the use of a single, defined population.
But the world is changing, and the traditional paradigm of e-mail invitations to access panels is no longer enough to sustain research into the future. Many people want to give their opinions but may shy away from the commitment of joining a panel. Just as a manufacturer of golfing equipment that conducts research solely among country club members would miss large segments of its target audience, panels may be missing a distinct type of opinion-giver. A blended approach incorporates panels, communities, and groups with aligned interests. The whole Internet becomes the panel.
But how can a sample buyer ensure a consistent, reliable blend when using multiple sources, especially when no external benchmarks exist?
We can use quotas by socio-demographics, as we have always done, to create samples that look alike and that "look like" the general population. But demographics are often not the most helpful or relevant stabilizers. Take the absurd example of an online sample sourced only from a fishing website. Fishing is popular across most demographic groups, so we could produce a sample from this site that looked "right."
But what sort of answers would members of that sample give? If the question is about brands of coffee bought, then their answers might reflect the wider population. Being interested in fishing doesn't affect the coffee you drink. But what if the topic is media consumption? Anglers spend their weekends fishing, not watching TV. Their answers on TV viewing may not be anywhere close to the answers of the general population, and we might not have a true picture of media consumption.
It is not unusual for a product to be equally liked by both genders and across all age groups, regions and social classes - think of Coke, for example. But something drives consumer preference for Coke over Pepsi. It is this "something," or a proxy for it, that needs to be controlled for, or stratified on, in the sampling process to ensure that two samples are truly comparable to one another, across time and across sources.
SSI is working to identify the components of this "something" and to design a stabilization method that goes beyond socio-demographics. We have tested a broad set of factors, including psychographic, neurographic, and personality variables that define groups of people and might potentially drive variance in behavior. Among the variables that proved powerful in explaining variance were chronotype and propensity factors such as risk aversion.
We found differences across the sources we tested on a wide range of topics and questions, such as attitudes to new technology and shopping habits. Importantly, balancing on the psychographic and neurographic variables did more to explain the variance than additional demographic balancing did, as shown in the table below:
| | No balancing | Balancing with additional socio-demographic variables | Balancing with psychographic, etc. variables |
|---|---|---|---|
| Between-sample variation metric 1* | 0.88 | 0.76 | 0.48 |
| Improvement factor | | 1.15 | 1.83 |
| Between-sample variation metric 2** | 18% | 14% | 9% |
| Improvement factor | | 1.29 | 1.92 |
*Metric 1 was calculated as follows: the means of the variables were tabulated per source; the variance between those sample means was calculated for each variable; and the variances were summed.
**Metric 2 was calculated as the percentage of subsample means deviating by more than 0.1 standard deviations from the total-sample score.
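For readers who want to reproduce these diagnostics on their own blended samples, here is a minimal sketch of how the two footnoted metrics could be computed, assuming respondent-level standardized scores are available per source. The function name, data layout, and toy data are illustrative assumptions, not SSI's actual implementation.

```python
import numpy as np

def between_sample_variation(samples: dict[str, np.ndarray]) -> tuple[float, float]:
    """Compute the two between-sample variation metrics described above.

    `samples` maps a source name to a (respondents x variables) array of
    standardized survey scores. Names and layout are illustrative assumptions.
    """
    # Metric 1: per-source means for each variable, then the variance of
    # those means across sources, summed over all variables.
    source_means = np.vstack([s.mean(axis=0) for s in samples.values()])
    metric1 = source_means.var(axis=0).sum()

    # Metric 2: share of per-source means deviating by more than 0.1
    # standard deviations from the pooled (total-sample) mean.
    pooled = np.concatenate(list(samples.values()), axis=0)
    total_mean, total_sd = pooled.mean(axis=0), pooled.std(axis=0)
    deviations = np.abs(source_means - total_mean) / total_sd
    metric2 = (deviations > 0.1).mean()

    return metric1, metric2

# Toy usage with three hypothetical sources and five variables:
rng = np.random.default_rng(0)
samples = {src: rng.normal(loc=shift, size=(200, 5))
           for src, shift in [("panel", 0.0), ("community", 0.1), ("river", 0.2)]}
m1, m2 = between_sample_variation(samples)
print(f"metric 1: {m1:.3f}, metric 2: {m2:.1%}")
```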
To take one specific example, there is a difference in ownership of items such as BlackBerry® cell phones and MP3 players across the panel and non-panel sources we tested. This can be traced to an underlying behavior (likelihood of adopting new technology) and, beyond that, to an underlying attitude (willingness to try new things).
Instead of having to control for variation on every question about ownership of individual items, we can use "willingness to try new things" as a single variable that explains variation in a variety of individual behaviors and usages.
This is one example of how we arrive at an element in a stabilization cluster. We have taken the first steps toward a concise list of relevant variables that can be used as stabilization metrics for a broad range of research subjects at the individual-participant level, rather than the source level. Identifying bias and controlling for it in multi-sourced samples is a work in progress, but we see this approach as a big step in the right direction.
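As a concrete illustration of what balancing at the individual-participant level can look like, the sketch below derives simple post-stratification weights that align a blended sample with a target distribution on a hypothetical stabilization variable (`novelty_seeking`, standing in for "willingness to try new things"). The targets, variable names, and data are assumptions for illustration only; SSI's actual stabilization method is detailed in the white paper referenced below.

```python
import pandas as pd

def stabilization_weights(df: pd.DataFrame, var: str,
                          target: dict[str, float]) -> pd.Series:
    """Post-stratification weights aligning `var`'s observed distribution
    with a target distribution. Illustrative, not SSI's actual procedure."""
    observed = df[var].value_counts(normalize=True)
    # Weight each respondent by target share / observed share of their cell.
    return df[var].map(lambda cell: target[cell] / observed[cell])

# Hypothetical blended sample with a three-level stabilization variable.
df = pd.DataFrame({
    "novelty_seeking": ["low"] * 50 + ["medium"] * 30 + ["high"] * 20,
    "owns_mp3_player": [0] * 40 + [1] * 10 + [0] * 15 + [1] * 15 + [0] * 5 + [1] * 15,
})
target = {"low": 0.4, "medium": 0.35, "high": 0.25}
df["weight"] = stabilization_weights(df, "novelty_seeking", target)

# Weighted estimates from different sources balanced to the same target
# distribution become comparable to one another.
print((df["owns_mp3_player"] * df["weight"]).sum() / df["weight"].sum())
```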
This content was provided by Survey Sampling International. For complete research details, download the White Paper, Blending Sample 1+1>2. To learn how sample blending can benefit your projects, contact your SSI account manager at [email protected] or +1.203.567.7200. Visit the company website at www.surveysampling.com.