Using your heatmap

The color of the tile represents the magnitude of difference for each comparison (read more below). Yellow represents the most different pair of data sets while navy blue represents the least different pairing.

This overview can help you craft a high-level narrative about your competitors, products, customer segments or geographic markets.

Clicking into a specific tile will also present you with several options:

View - this will take you to the comparison explorer where you can see all of the detail of the analysis and build insight cards
Dismiss - this icon can be used to note that you are not interested in further analysis for a particular comparison
Pin - this icon can be used to note that you want to do further investigation for the comparison
Favourite - this icon can be used to note a comparison that you have looked at and contains interesting insights

Understanding difference percentiles

The topical differences in a pair of data sets are compared to determine the difference percentile.

We use a measure called the difference percentile to show the magnitude of these differences relative to a benchmark sample. The percentile tells you how different the language in the two data sets is.

A score of 0% means that the texts are practically the same, while a score of 100% means that they are extremely different. For example, if the comparison between Funyuns and Ruffles scores 91%, that pair of data sets is more different than about 91% of the comparisons in the benchmark sample, making it one of the most different pairings. A lower score, on the other hand, means that the texts are more similar to each other.

To view the difference percentile for each comparison, enable the toggle in the bottom-right of the heatmap.

How are difference percentiles calculated?

The difference percentile is derived from the correlation score between the linguistic features in your data sets.

The score for each pair of data sets is then compared against a benchmark distribution of difference scores, generated from a sample of over 5,000 comparisons, to understand how different that pair is.
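As a rough illustration of the idea, the sketch below ranks one raw score against a benchmark distribution to get a percentile. It is not the product's actual implementation: the function name `difference_percentile`, the tiny benchmark list, and the assumption that a higher raw score means "more different" are all hypothetical.

```python
from bisect import bisect_right


def difference_percentile(raw_score, benchmark_scores):
    """Return the percentage of benchmark scores that are <= raw_score.

    Assumption: raw_score is a difference score where higher means
    "more different" (for example, one derived from a correlation score).
    """
    ranked = sorted(benchmark_scores)
    if not ranked:
        raise ValueError("benchmark_scores must not be empty")
    # bisect_right counts how many benchmark scores are <= raw_score.
    return 100.0 * bisect_right(ranked, raw_score) / len(ranked)


# Hypothetical benchmark; the real sample contains over 5,000 comparisons.
benchmark = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]

print(difference_percentile(0.65, benchmark))  # 70.0
```

In this toy example, a raw score of 0.65 is at or above 7 of the 10 benchmark scores, so it lands at the 70th percentile.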

Why do we use difference percentiles instead of raw scores?

The difference percentile is used rather than the raw correlation score because it clarifies the magnitude of differences between your two data sets, rather than just providing a decimal number.

To put it simply, the colors on a heatmap are relative to your project: they take into account the scores of all the individual comparisons within it. The percentiles, by contrast, help you understand each difference in a wider context, as represented by a large benchmark sample of comparisons.
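To make the distinction concrete, here is a minimal sketch of project-relative scaling. It assumes a simple min-max normalization, which may not be how the heatmap actually assigns colors. The function name `relative_color_positions` and the sample scores are hypothetical.

```python
def relative_color_positions(project_scores):
    """Map each comparison's raw score to a 0.0-1.0 position on the color scale.

    Assumption: 0.0 stands for navy blue (least different pair in the project)
    and 1.0 stands for yellow (most different pair in the project).
    """
    lowest = min(project_scores.values())
    highest = max(project_scores.values())
    span = highest - lowest
    return {
        pair: ((score - lowest) / span if span else 0.0)
        for pair, score in project_scores.items()
    }


# Hypothetical raw difference scores for three comparisons in one project.
scores = {
    ("Brand A", "Brand B"): 0.25,
    ("Brand A", "Brand C"): 0.50,
    ("Brand B", "Brand C"): 0.75,
}

positions = relative_color_positions(scores)
print(positions[("Brand B", "Brand C")])  # 1.0, the most different pair in this project
```

Under this assumption, the most different pair in a project is always yellow, even if all of its comparisons would rank low against the benchmark. This is why the percentile gives extra context.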
