Customer Data Platform: What Features Make for a Great CDP

“Customer data platform” (CDP) is a term that gets thrown around a lot these days, often without a genuine understanding of what these platforms do. There is an abundance of solutions on the market, each tackling different aspects, but CDPs typically address, to varying degrees, the same five principal axes:

  1. Identity
  2. Data Cleansing, Transformation and Enrichment
  3. Data Centralization
  4. Audience and Segmentation
  5. Data Integration & Analytics


Identity

CDPs build a unified customer profile around a concept of identity. An identity strategy shapes how data is linked or merged together. When looking at identity strategies, there are usually four central axes to consider:

  • Deterministic vs. Probabilistic
  • Hard vs. Soft Merge
  • Gold attributes
  • Treatment of historical data

Deterministic vs. Probabilistic: Most CDPs handle deterministic matching, meaning that they match customer records when an explicit identifier such as a customer ID, email, name, or phone number allows the records to be linked with an exact match. Some CDPs, such as AgilOne, offer probabilistic (propensity) matching on top of a deterministic matching strategy.

Propensity matching allows records to be merged when there is a high likelihood that they belong to the same user, for instance when they share the same first name, last name, and zip code.
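The difference between the two matching styles can be sketched in a few lines. This is a minimal illustration and not any vendor's algorithm: the `Record` fields, the integer weights, and the threshold of 90 are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    record_id: str
    email: Optional[str]
    first_name: str
    last_name: str
    zip_code: str


def _norm(value: Optional[str]) -> str:
    """Normalize a field so that case and stray whitespace do not block a match."""
    return (value or "").strip().lower()


def deterministic_match(a: Record, b: Record) -> bool:
    """Exact match on an explicit identifier (here: the email address)."""
    return bool(_norm(a.email)) and _norm(a.email) == _norm(b.email)


# Hypothetical weights (they sum to 100); a real system would tune or learn them.
WEIGHTS = {"first_name": 30, "last_name": 40, "zip_code": 30}


def match_score(a: Record, b: Record) -> int:
    """Sum the weights of all non-empty fields on which both records agree."""
    score = 0
    for field, weight in WEIGHTS.items():
        left, right = _norm(getattr(a, field)), _norm(getattr(b, field))
        if left and left == right:
            score += weight
    return score


def probabilistic_match(a: Record, b: Record, threshold: int = 90) -> bool:
    """Treat two records as the same person when the score reaches the threshold."""
    return match_score(a, b) >= threshold
```

With these weights, two records without a shared email still match when first name, last name, and zip code all agree, while agreement on last name and zip code alone (a score of 70) does not.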

Hard vs. Soft merge: Identity strategies also diverge in how they store the different profiles once a match has happened. Hard merge strategies combine both profiles into a single profile record, which, depending on the situation, makes it either very difficult or impossible to revert the merge at a later stage. Soft merge strategies, on the other hand, keep every record exactly as it was originally provided.

They work by creating associations between the different profile records. As such, they are very appropriate when using probabilistic matching strategies or any deterministic matching rule with a high degree of uncertainty that might need to be rolled back.
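A soft merge can be pictured as a set of links sitting next to untouched records. The sketch below is an assumption about how such a store might look, with a hypothetical `SoftMergeStore` class; it is not how any particular CDP stores its profiles.

```python
from collections import defaultdict


class SoftMergeStore:
    """Keeps original records intact and stores merges as reversible links."""

    def __init__(self):
        self.records = {}   # record_id -> attributes, never rewritten by a merge
        self.links = set()  # each link is a frozenset of two record ids

    def add_record(self, record_id, attributes):
        self.records[record_id] = dict(attributes)

    def link(self, a, b):
        if a == b:
            raise ValueError("cannot link a record to itself")
        self.links.add(frozenset((a, b)))

    def unlink(self, a, b):
        """Roll back a merge by dropping the association only."""
        self.links.discard(frozenset((a, b)))

    def profile(self, record_id):
        """Return the ids of all records reachable from record_id via links."""
        neighbours = defaultdict(set)
        for link in self.links:
            a, b = tuple(link)
            neighbours[a].add(b)
            neighbours[b].add(a)
        seen, stack = {record_id}, [record_id]
        while stack:
            current = stack.pop()
            for nxt in neighbours[current] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return seen
```

Because `unlink` only removes an association, rolling back an uncertain match leaves every source record as it was ingested.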

Gold attributes: Identity strategies also vary in how they promote specific attributes to the golden record. Some rely on the latest available profile, some on the latest attribute update date, and others allow promotion by “trusted” source. It is worth considering how your CDP handles attribute promotion, as not all CDPs have the same degree of refinement in this respect.
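Two of these promotion rules can be compared side by side. The sketch below assumes each attribute observation is a tuple of `(attribute, value, source, updated_at)`; that shape, the function names, and the source names in the usage note are illustrative and not taken from any CDP.

```python
from datetime import datetime


def promote_latest(observations):
    """Golden record where the most recently updated value wins per attribute."""
    golden = {}
    for attr, value, _source, _updated_at in sorted(observations, key=lambda o: o[3]):
        golden[attr] = value  # later updates overwrite earlier ones
    return golden


def promote_trusted(observations, source_priority):
    """Golden record where the most trusted source wins; ties go to the latest update.

    source_priority lists sources from most to least trusted. Sources that are
    not listed are treated as least trusted.
    """
    priority = {source: index for index, source in enumerate(source_priority)}
    best = {}
    for attr, value, source, updated_at in observations:
        rank = (-priority.get(source, len(priority)), updated_at)
        if attr not in best or rank > best[attr][0]:
            best[attr] = (rank, value)
    return {attr: value for attr, (_rank, value) in best.items()}
```

Given an older email from a hypothetical `crm` source and a newer one from `web_form`, `promote_latest` keeps the newer value while `promote_trusted(observations, ["crm", "web_form"])` keeps the CRM value.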

Treatment of Historical data: The identity strategy also defines how historical data is promoted to the golden profile. Do we want all historical information from all merged profiles to be part of this “golden record,” or do we only want to capture new information going forward? Consider, for instance, data captured on an e-commerce website: before the customer signs up or purchases a product, they provide little to no PII that would allow them to be identified properly.

Once a customer has signed up or purchased a product, we can usually associate that data with a known profile if one exists, or create a new one within the CDP. Whether or not to leverage the data from when the customer was not yet identified is a choice to make when defining the identity strategy.
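The choice boils down to a filter on the events tied to an anonymous identifier. In the minimal sketch below, the event shape (`anonymous_id`, `ts`, `name`) and the function name are assumptions made for illustration.

```python
def events_for_profile(events, anonymous_id, identified_at, include_history):
    """Select the events to attach to a profile once a visitor is identified.

    events:          list of dicts with 'anonymous_id', 'ts' and 'name' keys
    identified_at:   timestamp of the sign-up or purchase that revealed the identity
    include_history: True to also keep events captured before identification
    """
    return [
        event
        for event in events
        if event["anonymous_id"] == anonymous_id
        and (include_history or event["ts"] >= identified_at)
    ]
```

With `include_history=True` the pre-signup browsing is carried onto the golden profile; with `False` only events from the moment of identification onward are kept.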

Data Cleansing, Transformation and Enrichment

Data Cleansing and transformation: CDPs can normally apply specific data transformations at ingestion. Tealium, for instance, offers a Tally variable that increments a counter for each event provided, as well as rolling-sum variables. AgilOne calculates specific attributes and propensity scores based on its internal data model. Other vendors, such as mParticle, take a more developer-focused approach to data cleansing and transformation by enabling an AWS Lambda function callback, essentially executing your own piece of code whenever there is incoming data.
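To make the idea of ingestion-time variables concrete, here is a minimal sketch of a per-profile event tally and a rolling spend sum. It is not Tealium's or mParticle's implementation: the `ProfileAggregates` class, the event fields, and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import Counter, deque


class ProfileAggregates:
    """Ingestion-time aggregates for one profile (assumes events arrive in time order)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.tally = Counter()      # event name -> number of occurrences
        self._purchases = deque()   # (timestamp, amount), oldest first

    def on_event(self, event):
        """Callback run for every incoming event."""
        self.tally[event["name"]] += 1
        if event["name"] == "purchase":
            self._purchases.append((event["ts"], event["amount"]))

    def rolling_spend(self, now):
        """Sum of purchase amounts within the last window_seconds before now."""
        cutoff = now - self.window_seconds
        while self._purchases and self._purchases[0][0] <= cutoff:
            self._purchases.popleft()
        return sum(amount for _ts, amount in self._purchases)
```

The same `on_event` hook is where a developer-supplied callback would typically clean or reshape the payload before it is stored.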

Data Enrichment: Some CDPs can enrich customer data through integrations with third-party data vendors such as Experian Mosaic and ConsumerView, Acxiom LiveRamp, or Oracle Datalogix, through cookie syncing, or by leveraging predictive models.

Aggregate Values Enrichment

Certain CDPs offer the ability to enrich the raw data being provided by calculating aggregate values or rollups, or by performing certain data transformations.

Raw data enrichment

A wide range of data can be obtained from third-party vendors, including demographic data, lifestyle and interest data, financial data, and purchase behavior. Below is a sample list of data points that can be acquired through these third parties:

  • Demographic: age, gender, education, occupation, household income, marital status, or the number of kids in the household
  • Financial data: credit score, household profitability score, property and mortgage data, credit card utilization
  • Lifestyle and interest data: such as interest in sports, video games, movies, traveling
  • Purchase behavior: Affinity towards certain product categories, affinity to purchase in specific channels such as online e-commerce
  • Address validation: such as whether or not the address on record is still an active address.

Predictive models

Predictive models can also be used to enrich the customer profile. Some of the types of scoring that a CDP can provide include:

  • CLV: Customer lifetime value (CLV) prediction
  • Discount sensitivity: such as propensity to purchase
  • Propensity models: such as likelihood to visit, convert, purchase, churn, open, …
  • Recommendation: content or product affinity, next best action
  • Predictive segmentation: behavioral clustering/segmentation, or lookalike modeling
  • Household Clustering: understanding which of your customers are part of the same household.

A lot of this data enrichment can be done directly in the CDP or through external third-party providers.

Data Centralization

CDPs serve the purpose of data centralization. To that end, they usually offer an API that allows for the lookup of user attributes, ingested events, and audience membership.

This centralized data can then be used for personalization or A/B testing. Some CDPs notably offer the ability to split audiences across different test and control groups. This gives us a unified representation of group assignments across the different touchpoints.
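One common way to keep group assignments consistent across touchpoints is to derive them deterministically from the customer identifier. The sketch below uses that hashing approach as an assumption; the article does not say how any given CDP implements its split, and `assign_group` is a hypothetical helper.

```python
import hashlib


def assign_group(customer_id, experiment, control_share=0.5):
    """Deterministically assign a customer to 'control' or 'test' for an experiment.

    The same (experiment, customer_id) pair always lands in the same group,
    so every touchpoint that computes or looks up the assignment agrees.
    """
    key = f"{experiment}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "control" if bucket < control_share else "test"
```

Including the experiment name in the hashed key keeps assignments independent between experiments, so a customer in the control group of one test is not automatically in the control group of the next.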

The different CDPs vary a lot in their ability to act as a central information hub. The offerings vary by retention policy, API limits or pricing model, and their ability to serve both customer profile data and related events.

Audience and Segmentation

mParticle Segmentation interface

CDPs offer the ability to segment the user base on any attribute ingested into the platform. These audience segments can then be exported to the different systems that have been integrated with the CDP.

CDPs traditionally work with “Adaptive Segments,” i.e., segments that are constantly recalculated.

In some cases, they might offer “static segments,” segments that are calculated only once. This is usually the case when the CDP has to process “cold” data or process custom segments created by SQL queries.
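The difference between the two segment types comes down to when the rule is evaluated. The following sketch uses hypothetical `AdaptiveSegment` and `StaticSegment` classes and an invented `high_value` rule purely for illustration.

```python
def high_value(profile):
    """Hypothetical segment rule: customers who spent at least 500."""
    return profile["total_spend"] >= 500


class AdaptiveSegment:
    """Membership is recomputed from the current profiles on every read."""

    def __init__(self, rule):
        self.rule = rule

    def members(self, profiles):
        return {pid for pid, profile in profiles.items() if self.rule(profile)}


class StaticSegment:
    """Membership is computed once, at creation time, and then frozen."""

    def __init__(self, rule, profiles):
        self._members = frozenset(
            pid for pid, profile in profiles.items() if rule(profile)
        )

    def members(self):
        return set(self._members)
```

If a customer's spend crosses the threshold after both segments are created, only the adaptive segment picks them up.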

Data Integration & Analytics

CDPs facilitate the integration of customer data between different systems. They typically have connector marketplaces where integrations can be configured in just a few clicks.

Tealium’s eventstream display ads connector integration

Some of the integrations they offer, such as an integration with Google Analytics through the Measurement Protocol or with certain ad vendors, are often referred to as server-side tagging. The typical areas of integration tackled by CDPs are:

  • Advertising: DSPs, Facebook, Google Marketing Platform,…
  • Analytics Measurement & A/B Testing: Google Analytics, Adobe Analytics, Optimizely, Mixpanel …
  • Email/Marketing Automation: MailChimp, SendGrid, Salesforce Marketing Cloud, Emarsys, …
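As an illustration of server-side tagging, the sketch below builds the body of an event hit in the style of the Universal Analytics Measurement Protocol (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `el`). The parameter names reflect that protocol version as commonly documented; treat them, the endpoint constant, and the `build_ga_event_hit` helper as assumptions to verify against Google's current documentation. The code only builds the payload and does not send it.

```python
from urllib.parse import urlencode

# Endpoint commonly documented for Universal Analytics hits (assumption; verify before use).
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_ga_event_hit(tracking_id, client_id, category, action, label=None):
    """Build the form-encoded body for a server-side 'event' hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property / tracking id
        "cid": client_id,    # anonymous client id
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if label is not None:
        params["el"] = label  # optional event label
    return urlencode(params)


body = build_ga_event_hit("UA-XXXXX-Y", "555", "checkout", "purchase")
```

A CDP connector would POST a body like this from its own servers whenever a matching event is ingested, instead of relying on a tag firing in the visitor's browser.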

Segment Marketing Cloud integration

The depth of the integration varies by platform and CDP: some offer just an audience connector or a feed connection, while others also offer an event connector and/or a two-way connector.

Some CDPs allow customer data to be leveraged for analytics purposes. AgilOne and Lytics, for instance, give access to their internal data schemas for querying, analysis, and segmentation. CDPs that offer features within the analytics space normally address the need for analytical data across two dimensions: 1) Data Access and 2) Data Structure.

Data Access: CDPs offer a variety of ways to access and leverage the data for analytics purposes. Some offer SQL-like queries for segmentation, an interactive query environment, dashboard integrations, or the ability to connect directly to the CDP’s database using an ODBC or JDBC connection. Others provide export options directly to databases or data warehouses, essentially allowing third-party software to leverage this data.
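A SQL-based segment definition might look like the following sketch, which uses an in-memory SQLite database as a stand-in for a CDP's queryable store. The `customers` and `orders` tables, their columns, and the spend threshold are invented for the example and do not reflect any vendor's schema.

```python
import sqlite3

# In-memory database standing in for the CDP's queryable store (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id TEXT REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES ('c1', 'a@example.com'), ('c2', 'b@example.com');
    INSERT INTO orders (customer_id, amount)
    VALUES ('c1', 120.0), ('c1', 80.0), ('c2', 30.0);
    """
)

# Segment definition: customers whose total spend reaches a minimum amount.
SEGMENT_SQL = """
    SELECT c.customer_id
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    HAVING SUM(o.amount) >= ?
    ORDER BY c.customer_id
"""


def segment_members(connection, min_spend):
    """Return the ids of customers whose total order amount is at least min_spend."""
    return [row[0] for row in connection.execute(SEGMENT_SQL, (min_spend,))]
```

The same query could be issued over an ODBC or JDBC connection, or run against a warehouse export, depending on which access path the CDP provides.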

mParticle’s datawarehouse export

Data Structure: These CDPs tend to rely on relational models and normally include data structures that go beyond customer attributes and events. They might also include product master data, location-based master data, or, for CDPs oriented towards the retail world, store master data. Some CDPs, such as Treasure Data, furthermore let you define your own schemas for ingestion and data processing.

Sources: https://medium.com/analytics-and-data/customer-data-platform-what-features-makes-for-a-great-cdp-74227bc4d028