The art of collecting, cleaning and storing small data. Part 2
In the previous post, we listed a few useful tips for collecting small data. Data cleaning is what typically comes next. So today we are going to highlight four key ideas that can help you save valuable time, avoid headaches and carry out high-quality data analysis.
Data cleaning
Tip #1. Create a separate worksheet for data cleaning.
First, once you have the raw dataset, make a duplicate of it. Data cleaning is typically a mix of manual and automated work, so it is easy to make a mistake while manipulating raw data. Making a duplicate takes only a few clicks and can save a lot of time and energy, because you will always have the original dataset at your disposal. Second, do all cleaning in a separate worksheet next to the worksheet with the raw data, for much the same reason.
So, at this stage, you have two spreadsheets: one with the original raw data, and one for cleaning that contains at least two worksheets.
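If you ever script this step instead of copying the file by hand, here is a minimal Python sketch of the same idea. It is only an illustration under assumptions: the function name make_working_copy and the "_cleaning" file-name suffix are hypothetical choices, not an Excel feature. It copies the raw file once and refuses to overwrite an existing working copy, so the original stays untouched.

```python
import shutil
from pathlib import Path


def make_working_copy(raw_path: str, suffix: str = "_cleaning") -> Path:
    """Copy a raw data file so that cleaning never touches the original.

    The naming scheme (original stem + suffix) is a hypothetical convention.
    """
    raw = Path(raw_path)
    working = raw.with_name(raw.stem + suffix + raw.suffix)
    if working.exists():
        # Do not overwrite a working copy that may already contain cleaning work.
        raise FileExistsError(f"{working} already exists")
    shutil.copy2(raw, working)  # copy2 also preserves the file's timestamps
    return working
```

For example, calling it on a hypothetical "survey.xlsx" would create "survey_cleaning.xlsx" in the same folder, and all cleaning then happens in that copy.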
Tip #2. Clean distinct columns on a separate worksheet.
Much of the cleaning process involves working on one particular column. Using "Find and Replace" to correct misspelt entries, applying codes and identifying data types are typical examples of such operations. The main purpose of doing this on a separate worksheet is to avoid mechanical errors that can easily creep into other columns.
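The benefit of restricting a fix to a single column can be sketched in Python. This is a hypothetical illustration, not how Excel works internally: rows are plain dictionaries, and replace_in_column only rewrites a cell in the chosen column when the whole cell matches a known misspelling (similar to ticking "Match entire cell contents" in Excel's Find and Replace). Other columns are copied unchanged, so a misspelling that happens to appear in a free-text column is not silently altered.

```python
def replace_in_column(rows, column, replacements):
    """Return new rows in which only `column` is corrected.

    A cell is replaced only when its entire value is a key in `replacements`;
    every other column is copied as-is.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows stay untouched
        value = new_row.get(column)
        if value in replacements:
            new_row[column] = replacements[value]
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data with a misspelt city name.
rows = [
    {"city": "Londn", "note": "Londn office"},
    {"city": "Paris", "note": "head office"},
    {"city": "Londn", "note": "branch"},
]
fixed = replace_in_column(rows, "city", {"Londn": "London"})
for row in fixed:
    print(row)  # the "note" column of the first row still reads "Londn office"
```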
Tip #3. Use Excel functions and formulas.
Excel is a great tool with a shed-load of built-in functions and formulas that can help with data cleaning. Features such as "Remove Duplicates" and "Find and Replace", and formulas such as "TRIM()" (removes extra spaces), "CLEAN()" (removes non-printable characters) and "SUBSTITUTE()" (replaces one piece of text with another) can be very helpful. For instance, a typical problem is stray spaces: they are invisible in an Excel cell and therefore easy to overlook, but other analytical and statistical packages treat "London" and "London " as two different values. Hence, if you import data from Excel without removing extra spaces, you might get inaccurate results.
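To make the effect of these formulas concrete, here is a rough Python approximation of them. It is a sketch under stated assumptions, not Microsoft's implementation: excel_trim mimics TRIM (only the ordinary space character is trimmed and collapsed), excel_clean mimics CLEAN (removes control characters with codes 0 to 31), and excel_substitute covers only SUBSTITUTE's basic form without the optional instance argument. clean_cell chains them in the same order as the common Excel idiom =TRIM(CLEAN(SUBSTITUTE(A1, CHAR(160), " "))), which also converts non-breaking spaces into ordinary ones first.

```python
import re


def excel_trim(text: str) -> str:
    """Approximate TRIM(): strip leading/trailing spaces and collapse runs
    of spaces between words into one. Only the ordinary space is affected."""
    return re.sub(" +", " ", text).strip(" ")


def excel_clean(text: str) -> str:
    """Approximate CLEAN(): drop control characters with codes 0-31
    (tabs, line breaks and similar)."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def excel_substitute(text: str, old: str, new: str) -> str:
    """Approximate SUBSTITUTE() without instance_num: replace every occurrence."""
    return text.replace(old, new)


def clean_cell(text: str) -> str:
    # Non-breaking space -> ordinary space, remove control characters,
    # then tidy up the remaining spaces.
    return excel_trim(excel_clean(excel_substitute(text, "\u00a0", " ")))


# Hypothetical entries that look identical on screen but differ character by character.
raw = ["London", "London ", " London", "London\u00a0", "Lon\tdon"]
print(len(set(raw)))                      # 5 distinct values before cleaning
print(len({clean_cell(v) for v in raw}))  # 1 distinct value after cleaning
```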
Excel training is definitely worth your time. Have a look at these courses from Microsoft and TUDelft.
Tip #4. Send errors back to the original source.
Analysts frequently work with shared datasets, such as a departmental database. It is therefore a good habit to report any errors you find back to the original source. That way the errors get fixed where they originate, and next time you will save time because there will be less cleaning to do.
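One simple way to make such reports easy is to note each correction as you make it. The Python sketch below is a hypothetical example of that habit, not a prescribed format: log_correction and corrections_to_csv are illustrative names, and the column layout (row, column, original, corrected) is an assumption. The resulting CSV text can be attached to an email or ticket for whoever maintains the source data.

```python
import csv
import io

FIELDS = ["row", "column", "original", "corrected"]


def log_correction(log, row_id, column, original, corrected):
    """Record one fix so that it can later be reported to the data owner."""
    log.append({"row": row_id, "column": column,
                "original": original, "corrected": corrected})


def corrections_to_csv(log) -> str:
    """Render the correction log as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()


# Hypothetical corrections found while cleaning.
corrections = []
log_correction(corrections, 2, "city", "Londn", "London")
log_correction(corrections, 7, "email", "jane.doe@exmaple.com", "jane.doe@example.com")
print(corrections_to_csv(corrections))
```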
Cleaning raw data is part of an analyst's everyday routine. It is not as fascinating as data analysis or data visualisation, but it is a necessary task. Creating and following rules and tricks for this process can make it far more organised and satisfying. In the next post, we will share tips for storing data, the last step in preparing your data for further analysis or sharing.