🔢 Data and Spreadsheets

As a multidisciplinary field, data science uses a myriad of tools for different tasks across the phases of the data science workflow, and we'll explore some of these tools in this course. In this section, we'll start by looking at spreadsheets, and then explore a popular web-based spreadsheet application - Google Sheets. To start with, let us understand what we mean by spreadsheets and why we need them as data scientists.

What are Spreadsheets?

Spreadsheets are often designed to hold numeric and short text data types. Today, there are many spreadsheet programs that can be used locally on your PC or online through your browser. They provide different features to ease data manipulation.

Overall, spreadsheets are a useful tool for data science tasks, particularly for tasks that involve organizing, manipulating, and analyzing data on a smaller scale. However, for more complex data analysis tasks or larger datasets, specialized software tools and/or programming languages may be required.
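To give a feel for what "programming languages" can look like for this kind of task, here is a minimal Python sketch that totals a column per group while reading one row at a time, so the whole table never has to be open at once. The CSV content, the column names (`region`, `emissions`) and the helper `total_by_group` are all made up for illustration; they are not taken from the course dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV text standing in for a file that would be
# uncomfortable to open in a spreadsheet. Values are invented.
SAMPLE_CSV = """region,emissions
Africa,10.5
Asia,20.0
Africa,4.5
Europe,7.0
"""


def total_by_group(lines, group_col, value_col):
    """Sum value_col for each distinct value of group_col, row by row."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)


# io.StringIO lets the sample text be read like an open file.
totals = total_by_group(io.StringIO(SAMPLE_CSV), "region", "emissions")
print(totals)
```

With a real file, the same function could be given an open file object instead of `io.StringIO`. For small tables, a spreadsheet is usually the quicker route.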

How can I use a spreadsheet?

Popular spreadsheet software currently available includes Microsoft Excel, Apple Numbers, LibreOffice, OpenOffice, Smartsheet, and Zoho Sheet, among others. Of these, Microsoft Excel is the most popular within the data science community. To get started quickly without any installation bottleneck, we'll be using Google Sheets - a web-based spreadsheet.

Next, we'll explore a sample dataset using Google Sheets. As we've learnt in the previous video, you can have more than one worksheet in a workbook. In this sample dataset, we have 3 worksheets, each with a different dataset.

  • corona_virus - official daily counts of COVID-19 cases, deaths and vaccine utilisation.
  • movies - information about movies, including their names, release dates, user ratings, genres, overviews, and others.
  • emissions - information about methane gas emissions globally.

👩🏾‍🎨 Practice: Data playground... 🎯

  1. Create a copy of this dataset for your own use.
  2. Explore each worksheet to get a sense of what the data represents.
  3. Attempt the following questions.
    • Which country 🏳️ has the highest number of active COVID-19 cases?
    • Which movie 🎞️ has the lowest budget?
    • What are the total emissions 🌅 globally?
    • Try to change the 🔴 colour of each worksheet tab.

Note: Remember to use the built-in functions where necessary.
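For readers who want to see what such built-in functions compute, the sketch below mirrors the common spreadsheet functions SUM, MAX and MIN in Python. The rows, titles and `budget` values are hypothetical and exist only to illustrate the idea; they are not the contents of the practice dataset.

```python
# Hypothetical rows: titles and budgets are invented for illustration.
movies = [
    {"title": "Movie A", "budget": 120},
    {"title": "Movie B", "budget": 45},
    {"title": "Movie C", "budget": 300},
]

# Pull one column out of the table, like selecting a range in a sheet.
budgets = [row["budget"] for row in movies]

total_budget = sum(budgets)    # comparable to =SUM(range)
highest_budget = max(budgets)  # comparable to =MAX(range)
lowest_budget = min(budgets)   # comparable to =MIN(range)

# Find the whole row holding the smallest value, similar to sorting
# the column in ascending order and reading the first row.
cheapest = min(movies, key=lambda row: row["budget"])

print(total_budget, highest_budget, lowest_budget, cheapest["title"])
```

In a spreadsheet, the same results come from typing the function into a cell and selecting the column range, which is the approach to use for the practice questions.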