♻️ Data Cleaning
In data science, unclean data refers to a dataset that contains errors, inconsistencies, or inaccuracies, making it unsuitable for analysis without preprocessing. Such data may have missing values, duplicate entries, incorrect formatting, inconsistent naming conventions, outliers, or other issues that can impact the quality and reliability of the data. These problems can arise from various sources, such as...
Cleaning the data involves identifying and addressing these issues to ensure that the dataset is accurate, complete, and reliable before further analysis or modeling takes place.
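Excel handles this without any code. If you'd like to see the "identify" step spelled out, here is a minimal plain-Python sketch that counts three of the issues listed above: missing values, exact duplicate rows, and stray spaces. The sample rows and the `profile` helper are hypothetical and exist only for this illustration. They are not part of the course dataset.

```python
# Hypothetical sample rows for illustration only (not from the course dataset).
rows = [
    {"country": "Nigeria", "cases": "120"},
    {"country": "Nigeria", "cases": "120"},  # exact duplicate of the row above
    {"country": " Ghana ", "cases": ""},     # stray spaces and a missing value
    {"country": "kenya", "cases": "85"},     # inconsistent capitalization
]

def profile(rows):
    """Count missing cells, exact duplicate rows, and cells with leading/trailing spaces."""
    # A cell counts as missing if it is empty or contains only whitespace.
    missing = sum(1 for r in rows for v in r.values() if v.strip() == "")

    # A row is a duplicate if an identical row has already been seen.
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # A cell is "padded" if stripping spaces would change it.
    padded = sum(1 for r in rows for v in r.values() if v != v.strip())
    return {"missing": missing, "duplicates": duplicates, "padded": padded}

print(profile(rows))  # {'missing': 1, 'duplicates': 1, 'padded': 1}
```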
Data cleaning with Excel
In Excel, data cleaning can involve tasks such as removing duplicate values, correcting misspellings, handling missing data by filling in or deleting the values, and formatting data appropriately. Excel provides us with various built-in functions and tools, such as filters, conditional formatting, and formulas, that can help with data cleaning tasks.
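To make "filling in or deleting the values" concrete, the plain-Python sketch below shows both options side by side on a hypothetical column of numbers, where `None` stands in for a blank cell. Filling with the average is only one possible choice, assumed here for illustration. Depending on the data, zero, the previous value, or a domain-specific default may fit better.

```python
from statistics import mean

# Hypothetical column of daily counts; None marks a blank cell.
values = [10, None, 14, 12, None]

# Option 1: delete the blanks, similar to filtering out empty rows.
deleted = [v for v in values if v is not None]

# Option 2: fill the blanks, here with the mean of the known values (an assumed choice).
fill_value = mean(deleted)
filled = [fill_value if v is None else v for v in values]

print(deleted)  # [10, 14, 12]
print(filled)   # [10, 12, 14, 12, 12]
```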
By cleaning data in Excel, we improve the quality of our datasets and make sure the data is ready for further analysis or visualization. To get a good understanding of how to clean a dataset using Microsoft Excel...
- Watch the next video 📺.
- Pause and practice along with the tutor.
A brief recap of data cleaning using Excel...
In the video above, we covered the following data cleaning techniques:
- Separating Text - separating multiple pieces of text in one column into different cells.
- Removing Duplicates - removing duplicate data with the unique() formula and the replace feature.
- Letter cases - using proper() to fix inconsistent capital letters.
- Spacing fixes - removing extra spaces with the trim() formula.
- Splitting text - using Flash Fill to automatically separate data such as city and country.
- Percentage formats - changing numbers to percentages.
- Text to Number - converting text to values for further calculations.
- Removing Blank Cells - removing blank cells from a dataset.
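If you're curious what a few of these Excel tools do to the text, here is a rough plain-Python sketch. The helper names (`excel_trim`, `excel_proper`, `excel_unique`, `split_city_country`, `as_percentage`) and the sample strings are hypothetical and exist only for this illustration. They approximate the behaviour of trim(), proper(), unique(), text splitting, and percentage formatting, and are not exact reimplementations of Excel. In particular, the splitter uses a fixed rule (split at the first comma). Flash Fill, by contrast, infers a pattern from examples you type.

```python
import re

def excel_trim(text: str) -> str:
    """Like trim(): drop leading/trailing spaces and collapse inner runs of spaces to one."""
    return re.sub(" +", " ", text).strip(" ")

def excel_proper(text: str) -> str:
    """Approximates proper(): capitalize the first letter of each word, lowercase the rest."""
    return text.title()

def excel_unique(values):
    """Like unique() on one column: keep the first occurrence of each value, in order.
    Values are compared exactly as written, so fix spacing and case first."""
    return list(dict.fromkeys(values))

def split_city_country(text: str):
    """Split 'City, Country' at the first comma (a fixed rule, not Flash Fill's pattern guessing)."""
    city, _, country = text.partition(",")
    return city.strip(), country.strip()

def as_percentage(value: float) -> str:
    """Display a fraction as a whole-number percentage, e.g. 0.25 -> '25%'."""
    return f"{value:.0%}"

# Hypothetical messy entries: extra spaces, inconsistent case, and a hidden duplicate.
raw = ["  lagos,   nigeria ", "LAGOS, NIGERIA", "accra, ghana"]
cleaned = excel_unique([excel_proper(excel_trim(s)) for s in raw])
print(cleaned)                                   # ['Lagos, Nigeria', 'Accra, Ghana']
print([split_city_country(s) for s in cleaned])  # [('Lagos', 'Nigeria'), ('Accra', 'Ghana')]
print(as_percentage(0.25))                       # 25%
```

The order of the steps matters. Only after trimming and fixing the case do the two "Lagos" entries become identical, so only then can unique() catch the duplicate.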
👩🏾🎨 Practice: Clean the smell... 🎯
A smaller sample of the global COVID-19 dataset is provided here for this exercise.
- Create a copy of the dataset for your own use.
- Explore the dataset to get a sense of what it represents.
- Using your data cleaning skills, attempt the following...
  - Remove duplicate data, if any
  - Handle blank spaces
  - Convert the column from text to number
  - Implement other cleaning techniques of your choice
- Submit this exercise using this form.
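If you'd like to check your thinking on the first three tasks, here is a minimal plain-Python sketch on a few hypothetical rows. The column names and values are made up and are not taken from the COVID-19 sample. The sketch drops fully blank rows, removes exact duplicate rows, and converts numeric text such as "1,200" into numbers. It assumes the comma is a thousands separator. In locales that use a comma as the decimal mark, you would need a different rule.

```python
# Hypothetical rows as they might arrive from a sheet: every cell is text.
rows = [
    ["Country", "Confirmed"],  # hypothetical header
    ["Nigeria", "1,200"],
    ["Nigeria", "1,200"],      # exact duplicate
    ["", ""],                  # fully blank row
    ["Ghana", " 850 "],        # number stored as text with stray spaces
]

def to_number(text: str):
    """Convert numeric text like '1,200' or ' 850 ' to a number; return None if blank.
    Assumes ',' is a thousands separator and '.' the decimal mark."""
    cleaned = text.replace(",", "").strip()
    if cleaned == "":
        return None
    return float(cleaned) if "." in cleaned else int(cleaned)

header, *body = rows

# 1. Drop rows where every cell is blank.
body = [r for r in body if any(cell.strip() for cell in r)]

# 2. Remove exact duplicate rows, keeping the first occurrence.
body = [list(t) for t in dict.fromkeys(tuple(r) for r in body)]

# 3. Convert the second column from text to numbers.
body = [[country.strip(), to_number(confirmed)] for country, confirmed in body]

print(header, body)  # ['Country', 'Confirmed'] [['Nigeria', 1200], ['Ghana', 850]]
```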
👉🏾 Next, we'll take a deep dive into creating cool visualizations with Excel.