Data cleaning allows you to exclude certain elements of the language set that may otherwise distort the analysis. 

The available options will vary depending on the data source you are uploading but may include the removal of:

  • Retweets ♻️
  • Spam and promotions 🗑️
  • Duplicates and similar posts
  • Posts from bots, public figures and organisations 🤖🏢
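
As a rough illustration of what options like these do conceptually, the sketch below filters a small list of posts. It is not the platform's actual implementation: the `Post` class, the `is_retweet` flag, the `clean_posts` function and its parameters are all hypothetical names invented for this example, and duplicate detection here is only a simple match on case and whitespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical post record; field names are assumptions for illustration."""
    author: str
    text: str
    is_retweet: bool = False


def clean_posts(posts, remove_retweets=True, remove_duplicates=True):
    """Return the posts that survive the selected (hypothetical) cleaning options."""
    seen = set()
    kept = []
    for post in posts:
        if remove_retweets and post.is_retweet:
            continue
        # Normalise case and whitespace so trivially different copies match.
        key = " ".join(post.text.lower().split())
        if remove_duplicates and key in seen:
            continue
        seen.add(key)
        kept.append(post)
    return kept


posts = [
    Post("a", "Great product!"),
    Post("b", "RT Great product!", is_retweet=True),
    Post("c", "great   product!"),
    Post("d", "Not for me."),
]

cleaned = clean_posts(posts)
print([p.author for p in cleaned])  # ['a', 'd']
```

In this sketch the retweet from `b` and the near-identical copy from `c` are dropped, so repeated wording does not count more than once in any later analysis.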

Data cleaning can be enabled at the time of upload, or afterwards from within the Data Library.

During upload

  1. Select the applicable data source/format
  2. Click the small arrow next to 'Show cleaning options' to expand the list
  3. Select the cleaning options you wish to enable
  4. Click 'Save'

From the Data Library

  1. Use the checkboxes to select the data set(s) for which you want to enable or change cleaning options
  2. Select the washing machine icon at the top of the screen
  3. Select the desired options
  4. Click 'Save'

Note: any comparisons built using a language set whose cleaning options you have changed will be updated automatically.
