What is data cleaning?

Data cleaning lets you exclude elements of a data set that might otherwise distort the analysis.

The available cleaning options will vary depending on the data source you are uploading but may include the removal of:

  • Retweets

  • Spam and promotions

  • Duplicates and similar posts

  • Posts from bots, public figures and organisations

Data cleaning can be enabled at the time of upload, or afterwards via the Data Library.

Data cleaning at the time of upload

  1. When creating a new project or question, select the applicable data source/format when prompted

  2. Click the small arrow next to 'Show cleaning options' to expand the list of options

  3. Select the cleaning options you wish to enable and click 'Save'

Enabling data cleaning after data has been uploaded

  1. Navigate to the relevant project folder in the Data Library

  2. Use the checkboxes to select the data set(s) for which you want to enable or change cleaning options

  3. Select the washing machine icon at the top of the screen

  4. Select the desired cleaning options and click 'Save'

If you change the cleaning options for a data set, the results of any existing comparisons that include it will be updated automatically.
