Big Data Integration and Your Customer Genome

One of the Big Data challenges businesses face is integrating different data silos. Integrating these disparate customer data sets helps your analytics team identify the interrelationships among the different pieces of customer information, including customers’ values, interests, attitudes about your brand, interactions with your brand and more. Integrating facts about your customers lets you understand how all the diverse variables work together (i.e., are related to each other), driving deeper customer insight.

A data set can be described by its sample size (the number of things you are measuring) and its number of variables (the number of facts about a given thing). I revisited an article published earlier this year on the size of Big Data (How Big is Big Data?). The interactive graphic plots many different data sets along these two size-related dimensions.

Data sets in the upper left quadrant of this chart include a few things on which many facts are known; data sets about the human genome (see the Human Genome Project) are of this type. Data sets in the lower right quadrant include many things on which only a few facts are known (e.g., the US Census); data silos in business are of this type.

Mapping and understanding all human genes allows for deep personalization in healthcare through focused drug treatments (i.e., pharmacogenomics) and risk assessment of genetic disorders (e.g., genetic counseling, genetic testing). The Human Genome Project allows healthcare professionals to look beyond the “one size fits all” approach to a more tailored approach to addressing the healthcare needs of a particular patient.

Siloed data sets prevent business leaders from gaining a complete understanding of their customers. In this scenario, analytics can only be conducted within one data silo at a time, restricting the set of information (i.e., variables) that can be used to describe a given phenomenon. Your analytic models are then likely underspecified (not using the complete set of useful predictors), which decreases their predictive power and increases their error. The bottom line is that you are not able to make the best prediction about your customers because you don’t have all the necessary information about them.
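
To make the cost of an underspecified model concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than real customer data: the variable names (satisfaction, usage, spend), the coefficients and the simulated values are all made up. One hypothetical silo holds satisfaction scores, another holds product usage, and spend depends on both. The sketch fits a least-squares model with one silo's variable, then with both, and compares their in-sample error.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def fit_one(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_two(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (t - my) for u, t in zip(x1, y))
    s2y = sum((v - m2) * (t - my) for v, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def mse(pred, y):
    """Mean squared error between predictions and observed values."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

# Simulated (hypothetical) customers: spend depends on facts held in two silos.
rng = random.Random(42)
n = 200
satisfaction = [rng.gauss(0, 1) for _ in range(n)]   # silo A (e.g., survey data)
usage = [rng.gauss(0, 1) for _ in range(n)]          # silo B (e.g., product logs)
spend = [50 + 10 * s + 15 * u + rng.gauss(0, 2)
         for s, u in zip(satisfaction, usage)]

# Underspecified model: analytics confined to silo A.
a, b = fit_one(satisfaction, spend)
mse_silo = mse([a + b * s for s in satisfaction], spend)

# Fully specified model: both silos integrated.
a, b1, b2 = fit_two(satisfaction, usage, spend)
mse_full = mse([a + b1 * s + b2 * u for s, u in zip(satisfaction, usage)], spend)

print("Error using one silo:  ", round(mse_silo, 2))
print("Error using both silos:", round(mse_full, 2))
```

Because the two-variable model contains the one-variable model as a special case, its in-sample error can never be higher. In this simulated setup it is far lower, since usage carries real information about spend that the single-silo model cannot see.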

Data Integration as Your Customer Genome Project

Figure 1. Data Integration is an exercise in creating your customer genome.

Using the 2×2 graphical approach to understanding data size, we can see how the value of your integrated business data is greater than the sum of its parts. Figure 1 looks at four different scenarios of how businesses can use their data. The lower right quadrant is business as usual: when departments keep their data siloed, each department knows only a few things about the customers. Analytics can build only general rules for broad customer segments (e.g., male vs female; age segments).

The lower left quadrant represents one-off projects where a sample of customers is used to study a phenomenon. Analytics in these types of projects may be less valuable because the results may not generalize to the other customers and because omitted metrics lead to poor (e.g., underspecified) models.

Key Account programs are best categorized as projects falling in the upper left quadrant, where you know a lot of things about a few “important” customers (Accounts). In these situations, analytic results of a small set of accounts may be difficult to generalize to the entire customer base.

Integrated data sets (those in the upper right quadrant) allow you to know a lot of things about all your customers. Analytics applied to these types of data help you generate better predictive models containing all the key variables that are useful in predicting your outcome. Additionally, using these “customer genomic” type data sets, you are able to target specific customers with personalized treatment that resonates with them (i.e., segmentation on steroids).
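
As a rough illustration of what integration means in practice, the following Python sketch combines three hypothetical silos into one record per customer. The silo names, customer IDs and variables are invented for the example, and it assumes every silo shares a common customer ID and that variable names do not collide across silos. Real integration usually also requires identity matching and data cleaning, which this sketch skips.

```python
# Three hypothetical silos, each knowing only a few facts per customer.
crm = {
    "c1": {"age": 34, "region": "West"},
    "c2": {"age": 51, "region": "East"},
    "c3": {"age": 45, "region": "South"},
}
support = {
    "c1": {"tickets": 2},
    "c2": {"tickets": 0},
}
survey = {
    "c1": {"satisfaction": 8},
    "c3": {"satisfaction": 6},
}

def integrate(*silos):
    """Combine silos keyed by customer ID into one record per customer.

    Customers missing from a silo simply lack that silo's variables
    (the equivalent of an outer join).
    """
    merged = {}
    for silo in silos:
        for customer_id, facts in silo.items():
            merged.setdefault(customer_id, {}).update(facts)
    return merged

genome = integrate(crm, support, survey)

for customer_id in sorted(genome):
    facts = genome[customer_id]
    print(customer_id, "has", len(facts), "known variables:", facts)
```

Each silo on its own describes a customer with one or two variables; the integrated record for a customer present in every silo holds all of them, which is what moves a data set toward the upper right quadrant.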


Businesses can get much more useful insight about their customers if they integrate their data silos. The mapping and understanding of human genes provides a useful analogy for business leaders. Each data silo contains only a small part of what defines your customers, and your insights are limited by the variables used in your modeling. By knowing more about your customers (the variables and how they are related to each other), you can better describe each specific customer and build better, more comprehensive predictive models that improve the customer experience through personalization.
