
White Paper


TRC

Fort Washington, PA
P: (215) 641-2200
E: admin@trchome.com
W: www.trchome.com

Research and analytics firm that pairs customized solutions with senior-level attention and the latest choice modeling approaches to solve business


Cluster Analysis Gets Complicated

Rajan Sambandam, TRC

Segmentation studies using cluster analysis have become commonplace. However, the data may be affected by collinearity, which can distort the results of the analysis unless addressed. This article investigates what level of collinearity presents a problem, why it's a problem, and how to get around it. Simulated data allows a clear demonstration of the issue without clouding it with extraneous factors.

Collinearity is a natural problem in clustering.

So how can researchers get around it?
Cluster analysis is widely used in segmentation studies for several reasons. First of all, it’s easy to use. In addition, there are many variations of the method, most statistical packages have a clustering option, and for the most part it’s a good analytical technique. Further, the non-hierarchical clustering technique k-means is particularly popular because it’s very fast and can handle large data sets. Cluster analysis is a distance-based method because it uses Euclidean distance (or some variant) in multidimensional space to assign each object to the cluster to which it is closest. However, collinearity can become a major problem when such distance-based measures are used: unless addressed, it can produce distorted results.
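The distance-based assignment described above can be sketched in a few lines. This is only an illustration of the assignment step, not a full k-means routine, and the centroids and observations are invented values rather than data from the article.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` in Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical cluster centers and observations on two variables.
centroids = [(1.0, 1.0), (5.0, 5.0)]
observations = [(0.5, 1.5), (4.0, 6.0), (2.0, 2.5)]

# Each observation is assigned to whichever center it is closest to.
assignments = [nearest_centroid(obs, centroids) for obs in observations]
print(assignments)
```

Because every variable enters the distance on equal terms, anything that changes how often a concept appears among the variables also changes which center is "closest".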

Collinearity can be defined simply as a high level of correlation between two variables. (When more than two variables are involved, this is called multicollinearity.) How high does the correlation have to be for the term collinearity to be invoked? While rules of thumb are prevalent, there doesn't appear to be any strict standard even in the case of regression-based key driver analysis. It's also not clear if such rules of thumb would be applicable for segmentation analysis.

Collinearity is a problem in key driver analysis because, when two independent variables are highly correlated, it becomes difficult to accurately partial out their individual impact on the dependent variable. This often results in beta coefficients that don't appear to be reasonable. While this makes it easy to observe the effects of collinearity in the data, developing a solution may not be straightforward.

The problem is different in segmentation using cluster analysis because there’s no dependent variable or beta coefficient. A certain number of observations measured on a specified number of variables are used for creating segments. Each observation belongs to one segment, and each segment can be defined in terms of all the variables used in the analysis. From a marketing research perspective, the objective in each case is to identify groups of observations similar to each other on certain characteristics, or basis variables, with the hope this would translate into opportunities. In a sense, all segmentation methods are trying for internal cohesion and external isolation among the segments.

When variables used in clustering are collinear, some variables get a higher weight than others. If two variables are perfectly correlated, they effectively represent the same concept. But that concept is now represented twice in the data and hence gets twice the weight of each of the other variables. The final solution is likely to be skewed in the direction of that concept, which could be a problem if it’s not anticipated. In the case of multiple variables and multicollinearity, the analysis is in effect being conducted on some unknown number of concepts that is smaller than the number of variables being used in the analysis.
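The double-weighting argument can be checked with simple arithmetic. In the sketch below, two hypothetical respondents are compared on two concepts, and then again after concept A is entered a second time as a perfectly correlated copy. The respondent values are invented for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical respondents measured on (concept A, concept B).
p = (1.0, 5.0)
q = (4.0, 1.0)

# The same respondents when concept A appears twice in the data.
p_dup = (p[0], p[0], p[1])
q_dup = (q[0], q[0], q[1])

gap_a = (p[0] - q[0]) ** 2          # contribution of one copy of concept A
total = squared_distance(p, q)       # concept A counted once
total_dup = squared_distance(p_dup, q_dup)  # concept A counted twice

share_a = gap_a / total              # share of the distance due to concept A
share_a_dup = 2 * gap_a / total_dup  # share after duplication

print(total, total_dup)
print(round(share_a, 3), round(share_a_dup, 3))
```

Here concept A's gap contributes 9 of 25 units of squared distance when it is entered once, and 18 of 34 when it is entered twice, so its share rises from 36% to roughly 53% even though no new information was added.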

For example, while the intention may have been to conduct a cluster analysis on 20 variables, it may actually be conducted on seven concepts that may be unequally weighted. In this situation, there could be a large gap between the intention of the analyst (clustering 20 variables) and what happens in reality (segments based on seven concepts). This could cause the segmentation analysis to go in an undesirable direction. Thus, even though cluster analysis deals with people, correlations between variables have an effect on the results of the analysis.

Can It Be Demonstrated?
Is it possible to demonstrate the effect of collinearity in clustering? Further, is it possible to show at what level collinearity can become a problem in segmentation analysis? The answer to both questions is yes, if we’re willing to make the following assumptions: (1) regardless of the data used, certain types of segments are more useful than others, and (2) the problem of collinearity in clustering can be demonstrated using the minimum requirement of variables (i.e., two).

These assumptions are not as restrictive as they initially seem. Consider the first assumption. Traditionally, studies that seek to understand segmenting methods (in terms of the best method to use, effect of outliers, or scales) tend to use either real data about which a lot is known, or simulated data where segment membership is known.

However, to demonstrate the effect of collinearity, we need to use data where the level of correlation between variables can be controlled. This rules out the real data option. Creating a data set where segments are pre-defined and correlations can be varied is almost impossible because the two are linked. But in using simulated data where correlation can be controlled, the need for knowing segment membership is averted if good segments can be simply defined as ones with clearly varying values on the variables used.
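One common way to simulate two variables with a controlled correlation is to mix two independent standard normal draws. The article does not say how its data were generated, so the sketch below is an assumption rather than the author's procedure; the function names `simulate_pair` and `pearson` are hypothetical helpers.

```python
import math
import random

def simulate_pair(n, rho, seed=0):
    """Draw n observations of (x, y), both standard normal, with population correlation rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # y shares the fraction rho of z1; the remainder is independent noise.
        ys.append(rho * z1 + scale * z2)
    return xs, ys

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Step the target correlation up and check what the sample actually shows.
for rho in (0.0, 0.3, 0.6, 0.9):
    xs, ys = simulate_pair(20000, rho, seed=42)
    print(rho, round(pearson(xs, ys), 3))
```

The sample correlation will differ slightly from the target because of sampling noise, which shrinks as `n` grows. Each simulated data set could then be passed to a clustering routine to see how the segments change as the correlation rises.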

Segments with uniformly high or low mean values on all the variables generally tend to be less useful than those with a mix of values. Since practicality is what defines the goodness of a segmentation solution, this is an acceptable standard to use. Further, segments with uniformly high or low values on all variables are easy to identify without using any segmentation analysis technique. It’s only in the mix of values that a richer understanding of the data emerges. It could be argued that the very reason for using any sort of multivariate segmentation technique is to be able to identify segments with a useful mix of values on different variables.
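The distinction between uniform and mixed segments can be made concrete with a small helper. The rule used here (comparing each segment mean with the overall mean of that variable) is an assumption made for illustration, since the article does not give an exact cutoff, and the segment means are invented.

```python
def describe_segment(segment_means, overall_means):
    """Label a segment by comparing its mean on each variable with the overall mean.

    Assumed rule: a value counts as "high" only if it is strictly above the
    overall mean; values equal to or below it count as "low".
    """
    above = [m > o for m, o in zip(segment_means, overall_means)]
    if all(above):
        return "uniformly high"
    if not any(above):
        return "uniformly low"
    return "mixed"

# Hypothetical overall means and four segment profiles on two variables.
overall = (3.0, 3.0)
segments = {
    "A": (4.2, 4.1),
    "B": (1.8, 2.0),
    "C": (4.0, 1.9),
    "D": (2.1, 4.3),
}

labels = {name: describe_segment(means, overall) for name, means in segments.items()}
print(labels)
```

Under this rule, segments A and B are the uniform kind that could be found without any clustering, while C and D are the mixed profiles the article treats as more useful.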

Addressing the second assumption, the problem is a lot easier to demonstrate if we restrict the scope of the analysis to the minimum. Using just two variables is enough to demonstrate collinearity. Since bivariate correlation is usually the issue when conducting analysis, the results are translatable to any number of variables used in an actual analysis when taken two at a time. With only two variables being used, four segments can adequately represent the practically interesting combinations of two variables. Hence the results reported here are only in the two- to four-segment range, although I extended the analysis up to seven segments to see if the pattern of results held.

To read the rest of this article in pdf format, click here.

This article was written by Rajan Sambandam of TRC, a full-service market research provider located in Fort Washington, PA.

[Nov 16, 2009]


