Best practices to arrive at clean survey data

Once data is collected, all you have is raw data, and it needs to be processed before it yields actionable insights. Heard of ‘garbage in, garbage out’? It applies especially well to survey data.


To begin with, the collected data often has problems.

For example, a survey response may show the respondent's age as 243. In another, a respondent may have entered their own name as their city of residence. Before performing any analysis, such instances need to be fixed: either remove those responses or correct them.
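
To make the two options concrete, here is a minimal sketch in plain Python. It assumes each response has been exported as a dictionary with hypothetical `name`, `age` and `city` fields, and it treats an age outside an assumed 0 to 120 range, or a city identical to the respondent's name, as a problem.

```python
# Each response is a dict; field names and age bounds are hypothetical.
responses = [
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Leo", "age": 243, "city": "Lisbon"},
    {"name": "Mia", "age": 27, "city": "Mia"},
]

MIN_AGE, MAX_AGE = 0, 120  # assumed plausible range for a respondent's age


def problems(response):
    """Return the names of fields in a response that look invalid."""
    bad = []
    if not MIN_AGE <= response["age"] <= MAX_AGE:
        bad.append("age")
    # A city identical to the respondent's name was probably typed in the wrong box.
    if response["city"].strip().lower() == response["name"].strip().lower():
        bad.append("city")
    return bad


# Option 1: remove every response that has at least one problem.
removed = [r for r in responses if not problems(r)]

# Option 2: keep every response but blank out (set to None) only the bad
# fields, so they can be corrected by hand or treated as missing later.
corrected = []
for r in responses:
    fixed = dict(r)  # copy so the original export stays untouched
    for field in problems(r):
        fixed[field] = None
    corrected.append(fixed)

print(len(removed))  # 1
print(corrected[1])  # {'name': 'Leo', 'age': None, 'city': 'Lisbon'}
```

Removing is simpler, but correcting keeps the other answers in a response that is only partly wrong.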

Here are some techniques to apply to the data before it is ready for analysis:

Data labelling

Label the possible responses to each question as numerical or categorical. For numerical data, you can further define limits, such as a plausible range for age, so that any discrepancies stand out.
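
One way to put labelling into practice is to write the labels down as a small schema and check every response against it. The sketch below assumes answers arrive as strings or numbers; the question names and their limits are hypothetical.

```python
# Hypothetical schema: each question is labelled numerical or categorical,
# and numerical questions get an allowed range (the limits are assumptions).
SCHEMA = {
    "age": {"kind": "numerical", "min": 0, "max": 120},
    "hours_of_sleep": {"kind": "numerical", "min": 0, "max": 24},
    "favorite_sport": {"kind": "categorical"},
}


def find_discrepancies(response, schema=SCHEMA):
    """List (question, value, reason) tuples for answers that break the schema."""
    issues = []
    for question, rule in schema.items():
        value = response.get(question)
        if value is None:
            continue  # unanswered questions are a separate problem
        if rule["kind"] == "numerical":
            try:
                number = float(value)
            except (TypeError, ValueError):
                issues.append((question, value, "not a number"))
                continue
            if not rule["min"] <= number <= rule["max"]:
                issues.append((question, value, "out of range"))
        # Categorical answers are expected to be non-empty text.
        elif not isinstance(value, str) or not value.strip():
            issues.append((question, value, "empty or non-text answer"))
    return issues


print(find_discrepancies({"age": "243", "hours_of_sleep": "seven", "favorite_sport": "Tennis"}))
# [('age', '243', 'out of range'), ('hours_of_sleep', 'seven', 'not a number')]
```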

💡 Tip: Start by picking the right type for each question. For example, EV lets you create both a numerical question and a short-answer question. To collect someone’s age, a short answer would work, but a numerical question is more apt. This matters even more for phone numbers, where some respondents typically include the country code while others don’t. EV provides a dedicated question type for phone numbers that addresses this by letting the survey taker choose their country from a drop-down.
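
If phone numbers have already been collected through a short-answer field, they can still be normalized after export. The sketch below assumes a hypothetical default country code for numbers typed without one and uses a deliberately simplified rule set; for real data, the `phonenumbers` library for Python (a port of Google's libphonenumber) handles country-specific rules properly.

```python
import re

DEFAULT_COUNTRY_CODE = "91"  # hypothetical: assume most respondents are in India


def normalize_phone(raw, default_cc=DEFAULT_COUNTRY_CODE):
    """Return a phone number as '+<country code><number>', or None if unusable.

    Simplified sketch: it only distinguishes numbers written with an explicit
    '+' or '00' international prefix from local numbers, and assumes local
    numbers belong to default_cc. Real numbering rules vary by country.
    """
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)  # keep only the digits
    if not digits:
        return None
    if raw.startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    # Local number: drop a single leading trunk '0', then prepend the default code.
    if digits.startswith("0"):
        digits = digits[1:]
    return "+" + default_cc + digits


for answer in ["+91 98765 43210", "098765 43210", "98765-43210"]:
    print(normalize_phone(answer))  # +919876543210 each time
```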

Data scrubbing

Survey respondents quite often introduce typos: they may misspell a country name, a sport, or a food item. These need to be cleaned up so that variants of the same answer are counted together.
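
Misspellings in free-text answers can often be mapped back to a known list of valid answers with fuzzy string matching. This sketch uses `difflib.get_close_matches` from Python's standard library; the list of countries and the 0.8 similarity cutoff are assumptions to tune for your own data, and answers that match nothing come back as `None` for manual review.

```python
import difflib

# Hypothetical canonical list; in practice, build it from the answers you expect.
COUNTRIES = ["India", "Indonesia", "Germany", "Brazil", "United States"]


def clean_answer(answer, choices=COUNTRIES, cutoff=0.8):
    """Map a free-text answer to the closest canonical choice, or None.

    difflib scores similarity between 0 and 1; answers scoring below
    `cutoff` against every choice are left for manual review.
    """
    # Compare case-insensitively, but return the canonical spelling.
    lowered = {c.lower(): c for c in choices}
    match = difflib.get_close_matches(answer.strip().lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[match[0]] if match else None


print(clean_answer("Germny"))   # Germany
print(clean_answer(" brasil ")) # Brazil
print(clean_answer("Mars"))     # None
```

A higher cutoff makes fewer, safer corrections; a lower one fixes more typos but risks merging genuinely different answers.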

💡 Tip: Consider using drop-down or multiple-choice question types when you know the set of options is limited. Lists of cities, PIN/ZIP codes, and even favorite items are better handled with a fixed set of options.

Partial responses

Survey creators face a dilemma when deciding whether to make every question mandatory before a survey taker can submit their response. Enforcing this may reduce the number of responses, while not enforcing it can leave you with partial responses. For partial responses, you need to decide whether to discard the entire response or find a way to fill in the missing values.
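
The two options can be sketched as follows, assuming unanswered questions are exported as `None` and the hypothetical `age` and `rating` questions are numerical. Filling gaps with the median of the answered values is just one simple technique, and it assumes that respondents who skipped a question resemble those who answered it.

```python
from statistics import median

# Hypothetical partial responses: None marks an unanswered question.
responses = [
    {"age": 34, "rating": 4},
    {"age": None, "rating": 5},
    {"age": 29, "rating": None},
    {"age": 41, "rating": 3},
]


def complete_only(rows):
    """Option 1: discard any response with at least one missing answer."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_with_median(rows):
    """Option 2: keep every response, filling gaps with the column median."""
    filled = [dict(r) for r in rows]  # copy so the original rows stay untouched
    for col in {key for r in rows for key in r}:
        answered = [r[col] for r in rows if r.get(col) is not None]
        if not answered:
            continue  # nobody answered this question; nothing to impute from
        fill = median(answered)
        for r in filled:
            if r.get(col) is None:
                r[col] = fill
    return filled


print(len(complete_only(responses)))  # 2
print(fill_with_median(responses)[1]) # {'age': 34, 'rating': 5}
```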

💡 Tip: Once you have finished preparing the survey, try taking it yourself. If you find questions or options that are not as important as they seemed when you created them, be ruthless and get rid of them. There is no magic formula for how many questions a survey can have before takers start to feel its length, but shorter is always better.


Data balancing

If you are comparing the preferences of two or more demographics, it’s important not to let one demographic dominate the data. For example, say you run a survey comparing the food habits of two generations, millennials and Gen Z. Of the 100 responses you receive, 75 come from millennials and 25 from Gen Z. You could drop 50 of the millennial responses, chosen at random, so that you compare 25 responses from each group and the data is balanced.
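
Dropping responses should be done at random, so you don't accidentally keep a biased subset. The sketch below downsamples every group to the size of the smallest one; the `generation` field and the seed are illustrative assumptions, and the data reproduces the 75/25 split from the example.

```python
import random
from collections import defaultdict


def balance_by_downsampling(rows, group_key, seed=None):
    """Randomly drop responses so every group has as many as the smallest group."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        # sample() picks without replacement, so no response is used twice.
        balanced.extend(rng.sample(members, target))
    return balanced


# The example from the text: 75 millennial and 25 Gen Z responses.
responses = [{"generation": "millennial", "id": i} for i in range(75)] + [
    {"generation": "gen z", "id": i} for i in range(75, 100)
]
balanced = balance_by_downsampling(responses, "generation", seed=42)
print(len(balanced))  # 50
```

Downsampling throws data away; weighting each group's responses instead of dropping them is a common alternative when responses are scarce.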

💡 Tip: When you distribute a survey, you have no idea who will fill it out. However, once submissions start coming in, consider sending the survey to people or segments that are important but underrepresented.


Data shuffling

When drawing a sample from a large dataset, shuffle the data before picking the sample. Say you have 10,000 responses to a survey and want to pick 25 for a qualitative analysis. For the sample to be a good one, those 25 must be picked at random. If you simply took the first 25 rows, you might end up with a cluster of responses from a group that took the survey at the same time; shuffling prevents such unwanted patterns.

💡 Tip: A simple way to do this is to export the data to a spreadsheet and assign a random number to each row (for example, with the RAND() function). Then sort the data by this random number and take the top rows, so that responses from any one group are unlikely to be concentrated in your sample.
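
The spreadsheet trick translates directly into code: attach a random number to each row, sort on it, and take the first rows. This is a sketch with a hypothetical list of 10,000 responses and an arbitrary seed, which makes the sample reproducible.

```python
import random


def random_sample_by_sorting(rows, k, seed=None):
    """Mimic the spreadsheet trick: tag each row with a random number, sort, take k."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])  # sort only on the random number
    return [row for _, row in keyed[:k]]


responses = [{"id": i} for i in range(10_000)]  # stand-in for 10,000 survey responses
sample = random_sample_by_sorting(responses, 25, seed=7)
print(len(sample))  # 25
```

In Python, `random.sample(responses, 25)` achieves the same result in a single call.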

Can you think of any more tips? Please share them in the comments! Now go ahead and dust off the data you collected!