White Paper:
Cluster Analysis Gets Complicated
by Rajan Sambandam, TRC
Segmentation studies using cluster analysis have become commonplace. However, the variables used for clustering may be collinear, and unless this is addressed it can strongly affect the results of the analysis. This article investigates at what level collinearity becomes a problem, why it is a problem, and how to get around it. Simulated data allow a clear demonstration of the issue without clouding it with extraneous factors.
Collinearity is a natural problem in clustering.
So how can researchers get around it?
Cluster analysis is widely used in segmentation studies for several reasons. First of all, it's easy to use. In addition, there are many variations of the method, most statistical packages have a clustering option, and for the most part it's a good analytical technique. Further, the non-hierarchical clustering technique k-means is particularly popular because it's very fast and can handle large data sets. Cluster analysis is a distance-based method: it uses Euclidean distance (or some variant) in multidimensional space to assign objects to the clusters to which they are closest. However, collinearity can become a major problem when such distance-based measures are used, and unless it is addressed it can produce distorted results.
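The assignment step described above can be sketched in a few lines of Python. This is a minimal illustration and does not reproduce the code of any particular statistical package. The function names `euclidean` and `assign_to_nearest` and the example points are hypothetical.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest center.

    This is the assignment step that k-means repeats (alternating with
    recomputing the centers) until cluster memberships stop changing.
    """
    return [
        min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
        for p in points
    ]


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
centers = [(1.0, 1.5), (8.5, 8.0)]
print(assign_to_nearest(points, centers))  # [0, 0, 1, 1]
```

Every variable enters the distance on an equal footing, which is why the mix of variables fed into the analysis matters so much.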
Collinearity can be defined simply as a high level of correlation between two variables. (When more than two variables are involved, the term is multicollinearity.) How high does the correlation have to be before the term collinearity applies? While rules of thumb are prevalent, there doesn't appear to be any strict standard, even in regression-based key driver analysis. It's also not clear whether such rules of thumb would apply to segmentation analysis.
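Checking basis variables for high pairwise correlation is straightforward. The sketch below computes Pearson correlations and flags pairs at or above a cutoff. Since no strict standard exists, the default of 0.8 is purely an illustrative assumption, as are the variable names and ratings.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute correlation meets the threshold.

    The 0.8 default is an illustrative assumption, not an accepted standard.
    """
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical ratings from five respondents on three basis variables.
columns = {
    "price_value": [3, 4, 5, 6, 7],
    "affordability": [2, 4, 5, 7, 7],
    "service": [5, 1, 4, 2, 3],
}
print(collinear_pairs(columns))  # [('affordability', 'price_value', 0.969)]
```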
Collinearity is a problem in key driver analysis because, when two independent variables are highly correlated, it becomes difficult to accurately partial out their individual impact on the dependent variable. This often results in beta coefficients that look unreasonable, for example with an unexpected sign or size. While this makes it easy to observe the effects of collinearity in the data, developing a solution may not be straightforward.
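A small simulation shows this instability. The sketch below is for illustration only. It assumes two predictors with equal true effects (both slopes 1.0) and fits ordinary least squares on many repeated samples. It then compares how much the estimated coefficient for the first predictor varies when the predictors are uncorrelated versus highly correlated (r = 0.98, an arbitrary choice). The function names are hypothetical.

```python
import random
import statistics


def fit_two_predictor_ols(x1, x2, y):
    """OLS slopes for y ~ b1*x1 + b2*x2 + intercept.

    Centering removes the intercept, leaving a 2x2 system of normal
    equations that is solved directly.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def spread_of_b1(r, reps=200, n=50, seed=1):
    """Standard deviation of the estimated b1 across repeated samples.

    Both true slopes are 1.0; x2 has population correlation r with x1.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [r * a + (1 - r * r) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(fit_two_predictor_ols(x1, x2, y)[0])
    return statistics.stdev(estimates)


print(round(spread_of_b1(0.0), 2), round(spread_of_b1(0.98), 2))
```

With the correlated predictors, the spread of the estimate should be several times larger, even though the true effect never changes. That is the "unreasonable beta" symptom in miniature.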
The problem is different in segmentation using cluster analysis because there's no dependent variable or beta coefficient. Segments are created from a set of observations measured on a specified number of variables. Each observation belongs to one segment, and each segment can be described in terms of all the variables used in the analysis. From a marketing research perspective, the objective is always to identify groups of observations that are similar to each other on certain characteristics, or basis variables, in the hope that these groups translate into marketing opportunities. In a sense, all segmentation methods strive for internal cohesion and external isolation among the segments.
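One common way to quantify cohesion and isolation uses within-segment and between-segment sums of squares. The article does not prescribe these measures; they are swapped in here for illustration. The function names and toy data are hypothetical.

```python
def sum_sq(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def cohesion_and_isolation(points, labels):
    """Return (within-segment SS, between-segment SS).

    Smaller within-SS means tighter (more cohesive) segments; larger
    between-SS means segment centers sit farther from the overall mean
    (more isolated segments).
    """
    grand = centroid(points)
    within = between = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        c = centroid(members)
        within += sum(sum_sq(p, c) for p in members)
        between += len(members) * sum_sq(c, grand)
    return within, between


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(cohesion_and_isolation(points, [0, 0, 1, 1]))  # (4.0, 100.0)
print(cohesion_and_isolation(points, [0, 1, 0, 1]))  # (100.0, 4.0)
```

For a fixed data set the two quantities always add up to the same total sum of squares (104 here). A partition that tightens the segments therefore also pushes them apart.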
When the variables used in clustering are collinear, some concepts effectively get more weight than others. If two variables are perfectly correlated, they effectively represent the same concept. But that concept is now represented twice in the data and hence gets twice the weight of each of the other variables. The final solution is likely to be skewed toward that concept, which could be a problem if it's not anticipated. With many variables and multicollinearity, the analysis is in effect conducted on some unknown number of concepts that is smaller than the number of variables actually used.
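A toy example makes the doubling concrete. The numbers below are invented for illustration. A respondent is compared with two hypothetical segment centers, first on variables a and b, and then with a duplicated so that the concept it measures appears twice. With the duplicate, the squared distance becomes $d^2 = 2(\Delta a)^2 + (\Delta b)^2$.

```python
def squared_distance(p, q):
    """Squared Euclidean distance; each variable adds (p_i - q_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Return (index of the closest center, list of squared distances)."""
    dists = [squared_distance(point, c) for c in centers]
    return dists.index(min(dists)), dists


# Two basis variables (a, b). Center 0 differs from the respondent on a,
# center 1 differs on b.
respondent = (0.0, 0.0)
centers = [(2.0, 0.0), (0.0, 2.5)]
print(nearest(respondent, centers))  # (0, [4.0, 6.25])

# Add a third variable that is a perfect copy of a, as if the same
# concept had been measured twice. Its squared difference now counts twice.
respondent_dup = (0.0, 0.0, 0.0)
centers_dup = [(2.0, 2.0, 0.0), (0.0, 0.0, 2.5)]
print(nearest(respondent_dup, centers_dup))  # (1, [8.0, 6.25])
```

Nothing about the respondent or the centers has changed in substance. Yet the assignment flips to the center that agrees on the doubly weighted concept.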
For example, while the intention may have been to cluster on 20 variables, the analysis may actually be conducted on seven concepts that may be unequally weighted. In this situation, there could be a large gap between the analyst's intention (clustering 20 variables) and what happens in reality (segments based on seven concepts), and the segmentation analysis could go in an undesirable direction. Thus, even though cluster analysis groups people rather than variables, the correlations between variables still affect the results.
Can It Be Demonstrated?
Is it possible to demonstrate the effect of collinearity in clustering? Further, is it possible to show at what level collinearity becomes a problem in segmentation analysis? The answer to both questions is yes, if we're willing to make two assumptions: (1) regardless of the data used, certain types of segments are more useful than others, and (2) the problem of collinearity in clustering can be demonstrated using the minimum number of variables (i.e., two).
These assumptions are not as restrictive as they initially seem. Consider the first assumption. Studies that evaluate segmentation methods (for example, which method works best, or how outliers or different scales affect the results) have traditionally used either real data about which a lot is known, or simulated data where segment membership is known.
However, to demonstrate the effect of collinearity, we need data in which the level of correlation between variables can be controlled. This rules out real data. Creating a data set where segments are predefined and correlations can also be varied is almost impossible, because the two are linked. But if good segments are simply defined as those with clearly differing values on the variables used, then simulated data with controlled correlation can be used without knowing segment membership in advance.
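One standard way to simulate two variables with a chosen correlation is sketched below. The article does not say how its data were generated, so this construction, bivariate standard normal data, is an assumption for illustration. The function names are hypothetical.

```python
import math
import random


def simulate_pair(n, r, seed=0):
    """Draw n observations of two standard normal variables with correlation r.

    Uses x2 = r*x1 + sqrt(1 - r^2)*e, where x1 and e are independent N(0, 1),
    so x2 is also N(0, 1) and corr(x1, x2) = r in the population.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for a in x1]
    return x1, x2


def sample_corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


for target in (0.0, 0.5, 0.9):
    x1, x2 = simulate_pair(5000, target, seed=42)
    print(target, round(sample_corr(x1, x2), 3))
```

Because only r changes between runs, any difference in the resulting segments can be attributed to the level of correlation rather than to other features of the data.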
Segments with uniformly high or low mean values on all the variables tend to be less useful than those with a mix of values. Since practicality is what defines the goodness of a segmentation solution, this is an acceptable standard to use. Further, segments with uniformly high or low values on all variables are easy to identify without any segmentation technique. It's only in the mix of values that a richer understanding of the data emerges. It could be argued that the very reason for using any sort of multivariate segmentation technique is to identify segments with a useful mix of values on different variables.
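This usefulness criterion can be made operational with a simple rule. The rule below is an assumption, not a definition taken from the article. A segment whose means are above the overall mean on every variable, or at or below it on every variable, is "uniform"; anything else is "mixed". Segment names and values are invented.

```python
def classify_profile(segment_means, overall_means):
    """Label a segment 'uniformly high', 'uniformly low', or 'mixed'.

    Uniformly high: above the overall mean on every variable.
    Uniformly low: at or below the overall mean on every variable.
    Otherwise the profile is a mix of high and low values.
    """
    above = [s > o for s, o in zip(segment_means, overall_means)]
    if all(above):
        return "uniformly high"
    if not any(above):
        return "uniformly low"
    return "mixed"


overall = (5.0, 5.0)
segments = {"A": (7.1, 6.8), "B": (3.2, 2.9), "C": (6.9, 3.1), "D": (3.0, 7.2)}
for name, means in segments.items():
    print(name, classify_profile(means, overall))
```

Segments A and B could be spotted from simple averages. C and D are the mixed profiles that a multivariate technique is meant to uncover.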

Turning to the second assumption, the problem is much easier to demonstrate if we restrict the scope of the analysis to the minimum. Two variables are enough to demonstrate collinearity. Since collinearity is usually a matter of bivariate correlation, the results carry over to any number of variables in an actual analysis, taken two at a time. With only two variables, four segments can adequately represent the practically interesting combinations (high on both, low on both, and high on one but low on the other, in either direction). Hence the results reported here are only in the two- to four-segment range, although I extended the analysis up to seven segments to see whether the pattern of results held.
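For readers who want to experiment, the sketch below shows one hypothetical way to set up a two-variable design. It simulates two variables at several correlation levels, runs plain k-means for two to four segments, and counts how many segments have a mixed profile (above the mean on one variable only). This is not the author's procedure, which is described in the full paper. The correlation levels, sample size, seed, and the use of zero (the population mean of the simulated variables) as the reference point are arbitrary assumptions, and no particular output is claimed.

```python
import math
import random


def simulate_pair(n, r, rng):
    """n points from two standard normal variables with population correlation r."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for a in x1]
    return list(zip(x1, x2))


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means on 2-D points; returns the final centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centers


def count_mixed(centers):
    """Segments whose center is above 0 on exactly one of the two variables."""
    return sum(1 for cx, cy in centers if (cx > 0) != (cy > 0))


rng = random.Random(7)
for r in (0.0, 0.5, 0.9):
    data = simulate_pair(1000, r, rng)
    counts = [count_mixed(kmeans(data, k, rng)) for k in (2, 3, 4)]
    print(f"r={r}: mixed segments for k=2,3,4 -> {counts}")
```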
The rest of this article is available in PDF format.
This article was written by Rajan Sambandam of TRC, a full-service market research provider located in Fort Washington, PA.