Visualization for EDA
EDA examines and summarizes the main characteristics of the data before we dive into more advanced analyses. Visualization, the use of graphical representations to understand and explore data, is one of its most important tools.
Visualization plays a crucial role in EDA because it allows us to explore patterns, relationships, and distributions within the data. By creating visualizations, we can better understand the data and identify trends, outliers, and potential correlations between variables.
For example, imagine we have a dataset containing information about the sales of different products in a store over time. By creating visualizations, such as line plots or bar charts, we can easily see the sales trends, identify the highest-selling products, or observe any seasonal patterns. Visualizations make it easier to comprehend large amounts of data at a glance and can help us make data-driven decisions and derive meaningful insights.
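The store example can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the product names and sales figures are made up for demonstration, the helper names `total_sales` and `plot_sales` are hypothetical, and it assumes the `matplotlib` library is installed.

```python
# Hypothetical monthly unit sales for three made-up products (illustrative only).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = {
    "Notebook": [120, 135, 150, 160, 155, 170],
    "Pen": [300, 280, 310, 330, 320, 350],
    "Backpack": [40, 35, 30, 45, 90, 140],
}


def total_sales(sales_by_product):
    """Return the total units sold for each product."""
    return {product: sum(units) for product, units in sales_by_product.items()}


def plot_sales(months, sales_by_product):
    """Draw a line plot of monthly trends next to a bar chart of totals."""
    # Imported here so total_sales() stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    totals = total_sales(sales_by_product)
    fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

    # Line plot: one line per product shows how sales change over time.
    for product, units in sales_by_product.items():
        ax_line.plot(months, units, marker="o", label=product)
    ax_line.set_title("Monthly sales trend")
    ax_line.set_xlabel("Month")
    ax_line.set_ylabel("Units sold")
    ax_line.legend()

    # Bar chart: total units per product makes the best seller easy to spot.
    ax_bar.bar(list(totals.keys()), list(totals.values()))
    ax_bar.set_title("Total sales by product")
    ax_bar.set_xlabel("Product")
    ax_bar.set_ylabel("Units sold")

    fig.tight_layout()
    return fig
```

Calling `plot_sales(months, sales)` followed by `matplotlib.pyplot.show()` displays both charts. The line plot is the one to read for trends and seasonal patterns, while the bar chart answers the "highest-selling product" question at a glance.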
📺 Visualization in descriptive statistics by Greg Martin 👨🏾‍💻
👩🏾‍🎨 Practice: Visualization for EDA 🎯
Imagine you have collected data about movie ratings from a group of people. Here's a simplified dataset representing the number of hours spent watching movies per week and the corresponding average rating given by each person:
Hours Watched: [6, 8, 5, 4, 9, 7, 3, 2, 7, 5]
Ratings: [4.5, 3.8, 4.0, 3.2, 4.7, 4.3, 2.9, 3.1, 4.2, 3.8]
- Create a scatter plot of hours watched vs. ratings. Label the axes appropriately.
- Calculate the correlation coefficient between hours watched and ratings.
- Based on the scatter plot and the correlation coefficient, describe the strength and direction of the relationship between hours watched and movie ratings.
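If you are unsure where to start, here is one possible starting point for the tasks above, written in Python. It is a sketch under a few assumptions: the helper names `pearson_r` and `plot_hours_vs_ratings` are hypothetical, the correlation coefficient is taken to be the Pearson coefficient, and plotting assumes the `matplotlib` library is installed.

```python
import math

# Data from the practice exercise.
hours_watched = [6, 8, 5, 4, 9, 7, 3, 2, 7, 5]
ratings = [4.5, 3.8, 4.0, 3.2, 4.7, 4.3, 2.9, 3.1, 4.2, 3.8]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def plot_hours_vs_ratings(hours, ratings):
    """Draw a labeled scatter plot of hours watched against average rating."""
    # Imported here so pearson_r() stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(hours, ratings)
    ax.set_xlabel("Hours watched per week")
    ax.set_ylabel("Average rating")
    ax.set_title("Hours watched vs. average movie rating")
    return fig
```

Call `plot_hours_vs_ratings(hours_watched, ratings)` and then `matplotlib.pyplot.show()` to see the plot, and `pearson_r(hours_watched, ratings)` to get the coefficient. A value close to +1 or -1 indicates a strong linear relationship, a value near 0 a weak one, and the sign gives the direction.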
➡️ In the next section, you'll practice what you've learnt so far this week 🏙️.