Data Science Tools
As previously stated, data scientists use different combinations of tools on a daily basis to capture, organize, manipulate, analyze, visualize, and communicate their findings. In this section, we are going to explore the most popular tools used by data scientists. In this lesson, we'll focus on the tools listed below; other tools will be explored as we progress through the course.
Python
Just as we use natural languages like Swahili, English, French, Arabic, and Spanish to communicate with one another, we communicate with computers using predefined languages known as programming languages, so that our instructions can be executed. As you've probably learnt in your Programming 1 & 2 courses, Python is a powerful programming language that is applicable to many areas. One such area is data science. If you need a refresher on Python, you can use the interactive platform below.
Quick intro to Python
In subsequent weeks, we'll be using Python and its libraries to gather, explore, clean, and manipulate our data. But before then, let us look at some popular tools and Python libraries that are common among data scientists.
❓ How can I work with data using Python?
Previously, we've seen how it is possible to capture, clean, manipulate, and visualize data using Excel. However, Excel limits you to the features it provides, and there is much more you can do as a data scientist. This is why you need Python: to programmatically do everything you did in Excel and much more. To do that, we'll be using Jupyter Notebook.
Jupyter Notebook
The unique feature of Jupyter Notebook is that it allows you to write code in small, manageable chunks called cells, which can be executed independently. This interactive nature makes it easy to experiment with code, test different ideas, and see immediate results. You can write code in languages like Python or R, and with the click of a button, execute the cell to see the output.
Jupyter Notebook also supports the inclusion of visualizations, images, and formatted text, making it an excellent tool for data analysis, data visualization, and presenting your findings.
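As a rough sketch of how cells work, the snippet below shows what two cells of a notebook might contain. The `# Cell` comments mark where each cell would begin, and the `scores` values and variable names are made up for illustration. Because cells in a notebook share the same session, a variable defined in one cell can be reused in a later one.

```python
# Cell 1: define a small, made-up dataset (illustrative values only)
import statistics

scores = [72, 85, 90, 66, 78]

# Cell 2: reuse `scores` from Cell 1 and summarize it,
# much like AVERAGE and MAX would in Excel
average_score = statistics.mean(scores)
highest_score = max(scores)

print("Average:", average_score)
print("Highest:", highest_score)
```

In a notebook, you could change the values in the first cell, run it again, and then rerun the second cell to see the updated summary without rewriting anything.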
For this course, we'll be using a cloud version of Jupyter Notebook called Google Colab. With this, you can avoid the need to install and configure Jupyter Notebook. Let's look at what Google Colab is all about.
With Colab, you can do everything you've done using the Python shell and more. To wrap up, let's look at the benefits of Colab for a data scientist.
- Free resources: provides free cloud computing resources.
- Collaboration: allows multiple users to work on the same notebook simultaneously.
- Integration with Google Drive: integrates with Google Drive, allowing users to easily access and store data files and notebooks.
- Pre-installed libraries: comes with many pre-installed libraries and frameworks commonly used in data science, such as TensorFlow, PyTorch, and scikit-learn.
- Code execution: allows users to execute code in real time and see the results immediately.
- Visualization: provides support for data visualization tools such as Matplotlib and Seaborn.
Overall, Google Colab is a powerful tool for data scientists, providing access to powerful computing resources, collaboration tools, and a range of features for data analysis and machine learning.
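To see which of the libraries mentioned above are available in your own environment, something like the following sketch could be run in a notebook cell. The helper `check_libraries` and the list `LIBRARIES` are hypothetical names chosen for this example. The import names are those the libraries are commonly imported under, and which ones are actually installed depends on your environment.

```python
# Check which data science libraries can be imported in the current environment.
from importlib.util import find_spec

# Import names used here are assumptions for illustration; note that
# scikit-learn is imported as "sklearn" and PyTorch as "torch".
LIBRARIES = ["tensorflow", "torch", "sklearn", "matplotlib", "seaborn"]


def check_libraries(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


for name, installed in check_libraries(LIBRARIES).items():
    status = "available" if installed else "not installed"
    print(f"{name}: {status}")
```

This only checks whether a library can be found; it does not import it, so it runs quickly even for large frameworks.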
- From the list of Python libraries below, group each library as one of the following: visualization, machine learning, data manipulation, and utilities.
- Pandas
- Bokeh
- NumPy
- Matplotlib
- PyTorch
- Keras
- scikit-learn
- Polars
- TensorFlow
- OpenCV
- Share your answers using this padlet.
- You can like other cool answers on the padlet as well.
👉🏾 Next week, we'll dive deep into data collection and cleaning.