Practices
COVID-19 Pandemic
This practice exercise involves working with the COVID-19 pandemic dataset. Here, you'll mainly work on cleaning the dataset.
TODO
Using your knowledge of data cleaning, clean this dataset by...
- Identifying missing values: The first step is to identify any missing values in the data. This can be done using the isnull() method in pandas.
- Filling missing values: Once the missing values have been identified, they need to be filled. This can be done in a variety of ways, such as replacing them with the column's mean, median, or mode (for example, with the pandas fillna() method).
- Removing outliers: Outliers are data points that are significantly different from the rest of the data. They can distort the results of an analysis, so it is important to remove them. Outliers can be identified using the zscore() function from scipy.stats, for example by flagging values whose absolute z-score exceeds a chosen threshold.
- Normalizing the data: The data may need to be normalized before it can be analyzed. This means that the data should be converted to a common scale. This can be done using the min-max normalization method, which rescales each value x to (x - min) / (max - min) so that the results fall between 0 and 1. You can read more about this method!
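The four steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the toy data and the column names (country, new_cases) are made up and do not reflect the real dataset's schema, the median fill and the z-score threshold of 3 are just one reasonable choice, and the z-score is computed by hand with pandas rather than with scipy.stats.zscore.

```python
import pandas as pd

# Hypothetical toy data standing in for the COVID-19 dataset.
# The column names are assumptions, not the real schema.
df = pd.DataFrame({
    "country": list("ABCDEFGHIJKL"),
    "new_cases": [100.0, 120.0, None, 110.0, 90.0, 105.0,
                  5000.0, 95.0, 115.0, 98.0, 102.0, 108.0],
})

# 1. Identify missing values: count them per column.
missing_counts = df.isnull().sum()

# 2. Fill missing values. The median is used here because it is less
#    affected by extreme values than the mean (an illustrative choice).
df["new_cases"] = df["new_cases"].fillna(df["new_cases"].median())

# 3. Remove outliers: keep rows whose absolute z-score is below 3.
#    ddof=0 gives the population standard deviation.
z = (df["new_cases"] - df["new_cases"].mean()) / df["new_cases"].std(ddof=0)
cleaned = df[z.abs() < 3].copy()

# 4. Min-max normalization: rescale the column to the range [0, 1].
lo, hi = cleaned["new_cases"].min(), cleaned["new_cases"].max()
cleaned["new_cases_scaled"] = (cleaned["new_cases"] - lo) / (hi - lo)

print(missing_counts)
print(cleaned)
```

On a very small dataset a threshold of 3 may never be reached, so the threshold (and whether removal is appropriate at all) should be chosen with the actual data in mind.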
Here are some additional tips for data cleaning:
- Be careful not to introduce bias into the data when cleaning it.
- Test the data after cleaning it to make sure that it is still valid.
- Document the cleaning process so that it can be repeated if necessary.
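The tip about testing the data after cleaning could look something like the sketch below. The helper name check_cleaned and the specific checks are hypothetical; which checks make sense depends on the dataset and on the cleaning steps you chose.

```python
import pandas as pd


def check_cleaned(df: pd.DataFrame, scaled_col: str) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if df.empty:
        problems.append("no rows left after cleaning")
    if df.isnull().any().any():
        problems.append("missing values remain")
    if df.duplicated().any():
        problems.append("duplicate rows present")
    # A min-max scaled column should lie entirely within [0, 1].
    if not df[scaled_col].between(0, 1).all():
        problems.append(f"{scaled_col} has values outside [0, 1]")
    return problems


# Example with a made-up column name.
sample = pd.DataFrame({"new_cases_scaled": [0.0, 0.5, 1.0]})
print(check_cleaned(sample, "new_cases_scaled"))
```

Recording the output of checks like these alongside each cleaning step is one simple way to document the process so it can be repeated.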
Submission
You are required to submit documentation for practice exercises over the course of the term. Each one will count for 1/10 of your practice grade, or 2% of your overall grade.
- Practice exercises will be graded for completion, not perfect correctness.
- You have to document that you did the work, but we won't be checking if you got it right.
- You MUST attempt the quiz "Practices - Data Collection and Cleaning" on Gradescope after the exercise to get the grade for this exercise.
Your log will count for credit as long as:
- It is accessible to your instructor, and
- It shows your own work.