Practices

COVID-19 Pandemic

This practice exercise involves working with the COVID-19 pandemic dataset. You'll mainly work on cleaning it.

COVID-19 Pandemic dataset

TODO

Using your knowledge of data cleaning, clean this dataset by...

  • Identifying missing values: The first step is to find any missing values in the data. In Python, this can be done with the pandas isnull() method.
  • Filling missing values: Once the missing values have been identified, they need to be filled. Common choices are to replace them with the column's mean, median, or mode.
  • Removing outliers: Outliers are data points that differ significantly from the rest of the data. They can distort the results of an analysis, so it is important to remove them. In Python, outliers can be identified with the zscore() function from scipy.stats.
  • Normalizing the data: The data may need to be normalized before it can be analyzed, meaning that it is converted to a common scale. One way to do this is min-max normalization, which rescales each value to the range [0, 1]. Read up on this method if it is new to you.
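
The four steps above can be sketched in pandas as follows. This is a minimal illustration, not a reference solution: the toy data, the column names (country, new_cases), the choice of the median as the fill value, and the z-score threshold of 3 are all assumptions made for the example and do not come from the actual dataset. The z-score is computed by hand with ddof=0, which matches the default of scipy.stats.zscore().

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset's columns will differ.
df = pd.DataFrame({
    "country": list("ABCDEFGHIJKL"),
    "new_cases": [10.0, 12.0, np.nan, 11.0, 9.0, 13.0,
                  10.0, 12.0, 11.0, 10.0, 9.0, 500.0],
})

# 1. Identify missing values: count NaNs in each column.
missing_counts = df.isnull().sum()

# 2. Fill missing values. The median is used here (an assumption)
#    because it is less sensitive to extreme values than the mean.
df["new_cases"] = df["new_cases"].fillna(df["new_cases"].median())

# 3. Remove outliers: drop rows whose absolute z-score is 3 or more.
#    ddof=0 matches the default of scipy.stats.zscore().
Z_THRESHOLD = 3.0  # assumed cutoff; adjust for your data
z = (df["new_cases"] - df["new_cases"].mean()) / df["new_cases"].std(ddof=0)
cleaned = df[z.abs() < Z_THRESHOLD].copy()

# 4. Min-max normalization: rescale the column to the range [0, 1].
col = cleaned["new_cases"]
cleaned["new_cases_scaled"] = (col - col.min()) / (col.max() - col.min())

print(missing_counts)
print(cleaned)
```

Note that the z-score of a single extreme value is bounded by the sample size, so on very small samples a threshold of 3 may never be exceeded. The toy data uses twelve rows for that reason.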

Here are some additional tips for data cleaning:

  • Be careful not to introduce bias into the data when cleaning it.
  • Test the data after cleaning it to make sure that it is still valid.
  • Document the cleaning process so that it can be repeated if necessary.
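
One way to test the data after cleaning is to run a few automated sanity checks. The sketch below is only an illustration: the helper name check_cleaned, the column name new_cases_scaled, and the specific checks are hypothetical, and the right checks depend on your dataset.

```python
import pandas as pd

def check_cleaned(df: pd.DataFrame, scaled_col: str):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # No missing values should remain after filling.
    if df.isnull().any().any():
        problems.append("missing values remain")
    # Fully duplicated rows usually indicate a cleaning mistake.
    if df.duplicated().any():
        problems.append("duplicate rows present")
    # A min-max normalized column must lie within [0, 1].
    if not df[scaled_col].between(0, 1).all():
        problems.append(f"{scaled_col} falls outside [0, 1]")
    return problems

# Hypothetical cleaned data, used only to demonstrate the helper.
cleaned = pd.DataFrame({
    "country": ["A", "B", "C"],
    "new_cases_scaled": [0.0, 0.5, 1.0],
})
problems = check_cleaned(cleaned, "new_cases_scaled")
print(problems if problems else "all checks passed")
```

Keeping checks like these in a script or notebook alongside the cleaning code also helps document the process so it can be repeated.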

Submission

You are required to submit documentation for practice exercises over the course of the term. Each one will count for 1/10 of your practice grade, or 2% of your overall grade.

  • Practice exercises will be graded for completion, not perfect correctness.
  • You have to document that you did the work, but we won't be checking if you got it right.
  • You MUST attempt the quiz "Practices - Data Collection and Cleaning" on Gradescope after the exercise to receive the grade for this exercise.

Your log will count for credit as long as:

  • It is accessible to your instructor, and
  • It shows your own work.