The platform includes a built-in model that represents general written English. This article explains what Standard English is, when to use it, how to create comparisons against it, and how to interpret the results.

What is Standard English?

Relative Insight’s Standard English model is a general representation of written English. It comprises 9,954,331 words representing 175,954 unique parts of speech from 100,760 different sources, sampled from Wikipedia articles and forum conversations on a wide variety of topics. The model is built into the platform and can be used for many kinds of comparisons.

While the best comparisons will most often be between similar data sources (e.g. website copy vs website copy, reviews vs news articles), there are several situations in which the Standard English model can be very useful.

Please note that comparison against Standard English is not currently available for Relative Health.

When to use Standard English

1. To identify key themes within a data set

When analyzing a new data set, it is often helpful to do some preliminary analysis to identify key linguistic features. This kind of ‘baselining’ can help you determine potential ways to split your data based on the content of the text (topics, words, phrases, emotions and/or grammar) to build additional comparisons.

This approach is also useful when your data set is too small to be split, or when you can’t obtain a suitable data set to compare it against.

Example:

  • You have a large set of reviews or tweets about your brand

  • You create a comparison against Standard English

  • The words ‘functionality’ and ‘design’ each surface with a high relative difference

  • Using this information, you split your data set on these words and create a new comparison of people who commented on the design vs those who commented on the functionality
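
If you prepare the split outside the platform before uploading, the last step above could be sketched roughly as follows. This is only an illustration: `split_by_keyword` and the sample reviews are hypothetical and are not part of the Relative Insight platform.

```python
import re

def split_by_keyword(texts, keywords):
    """Group texts by the keywords they mention (case-insensitive, whole words).

    A text that mentions several keywords is placed in each matching group.
    """
    groups = {kw: [] for kw in keywords}
    for text in texts:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for kw in keywords:
            if kw.lower() in tokens:
                groups[kw].append(text)
    return groups

# Hypothetical sample data for illustration only.
reviews = [
    "Love the design, it looks great on my desk.",
    "The functionality is limited compared to rivals.",
    "Great design but the functionality needs work.",
    "Delivery was slow.",
]
groups = split_by_keyword(reviews, ["design", "functionality"])
```

Each list in `groups` could then be saved as its own data set and compared against the other. Note that in this sketch a review mentioning both words lands in both groups; you may prefer to exclude such overlaps.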

2. When you are interested in frequency analysis

Because it draws on a wide range of sources, Standard English is a good representation of the general distribution of words. If you’re trying to understand which words are ‘over-indexing’, it can provide a suitable basis of comparison.

Example:

  • You are interested in understanding the SEO performance of your website content

  • You create a comparison against Standard English

  • Looking at ‘frequencies’, you search for each of your keywords/phrases to check whether you have achieved sufficient keyword density
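
For readers who want a feel for what keyword density means, here is a minimal sketch. It assumes density is measured as occurrences per 100 words and handles single-word keywords only; the platform’s own frequency figures may be calculated differently, and `keyword_density` is a hypothetical helper.

```python
import re

def keyword_density(text, keyword):
    """Occurrences of a single-word keyword per 100 words of text.

    Assumption: simple lowercase tokenization on letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * tokens.count(keyword.lower()) / len(tokens)

# Hypothetical page copy for illustration only.
page = "Our running shoes are light. These shoes suit trail running and road running."
running_density = keyword_density(page, "running")
shoes_density = keyword_density(page, "shoes")
```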

Creating comparisons against Standard English

While defining your questions, when you are prompted to select your basis of comparison, select ‘That’s all’.

If you are creating a comparison directly from the Data Library, select the data set you are interested in, then open the ‘Models’ tab and select ‘Standard English’.

Interpreting insights when using Standard English

When creating comparisons against the Standard English model, expect more words than usual to surface with very high or infinite relative difference values. Approach these discoveries with caution, as many of them will reflect source- or topic-specific language that may not be particularly insightful.

For example, if you compare product descriptions from a supermarket against Standard English, it is likely that the topic ‘food’ will surface with a high relative difference. However, this is not surprising given the nature of the text being analyzed.
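
To see why very high or infinite values arise, consider the rough sketch below. It assumes relative difference is the ratio of a word’s relative frequency in your data to its relative frequency in the reference text; this article does not state the platform’s exact calculation, so treat the formula and the helper names as illustrative assumptions.

```python
import math
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its share of all words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}

def relative_difference(word, target_freqs, reference_freqs):
    """Ratio of target to reference relative frequency (assumed definition).

    Returns infinity when the word appears in the target but never in the
    reference, and 0.0 when it appears in neither.
    """
    target = target_freqs.get(word, 0.0)
    reference = reference_freqs.get(word, 0.0)
    if reference == 0.0:
        return math.inf if target > 0.0 else 0.0
    return target / reference

# Hypothetical toy texts standing in for your data and the reference model.
target = relative_frequencies("fresh organic food fresh bread food")
reference = relative_frequencies("the food was fine and the weather was fine")

food_diff = relative_difference("food", target, reference)    # appears in both texts
fresh_diff = relative_difference("fresh", target, reference)  # absent from the reference
```

In this toy example, ‘food’ is about three times more frequent in the target, while ‘fresh’ never appears in the reference and so comes out as infinite. Topic-specific words behave the same way against a general model, which is why such values deserve caution.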

Update: You can now also perform comparisons against the Standard German model, as long as your data is in German. Follow the same steps as above and choose the appropriate language in the ‘Models’ tab.
