Exercise: One-variable Linear Regression
Here is a sample dataset that contains the mid-term and final exam scores of 10 students. The goal is to predict the final exam score based on the mid-term exam score.
| Mid-term | Final |
|---|---|
| 70 | 80 |
| 80 | 90 |
| 90 | 100 |
| 60 | 70 |
| 50 | 65 |
| 40 | 50 |
| 30 | 40 |
| 20 | 30 |
| 10 | 20 |
| 0 | 10 |
Questions:
- What is the number of features in this dataset?
- What is the number of data points (m) in this dataset?
- Which of these `w` and `b` values will give the smallest cost function?
  - `w = 1` and `b = 10`
  - `w = 0.5` and `b = 20`
  - `w = 3` and `b = 7`
- Write simple Python code to calculate the cost function for each pair of `w` and `b` values, then try different values to find the smallest cost.
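
The questions mention a cost function without writing it out. Judging from the answer code in this exercise, the intended one appears to be the squared-error cost with a 1/(2m) factor. The indexing notation below is an assumption added for illustration and is not taken from the original text:

$$
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( w\,x^{(i)} + b - y^{(i)} \right)^2
$$

Here $x^{(i)}$ is the mid-term score of the $i$-th student, $y^{(i)}$ is that student's final score, and $m$ is the number of data points. A smaller $J(w, b)$ means the line $w x + b$ fits the data more closely.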
Answer
- There is only one feature (X) in this dataset: the mid-term exam score. The final exam score is the label (Y) that we are trying to predict.
- The number of data points (m) is 10, because there are 10 students in this dataset.
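- `w = 1` and `b = 10` give the smallest cost of the three options: 1.25, compared with 190 for `w = 0.5`, `b = 20` and 5387.25 for `w = 3`, `b = 7` (using the 1/(2m) factor from the code). You can check this by hand or use this code to calculate the cost for each pair of values: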
import numpy as np

# Given dataset
mid_term_scores = np.array([70, 80, 90, 60, 50, 40, 30, 20, 10, 0])
final_scores = np.array([80, 90, 100, 70, 65, 50, 40, 30, 20, 10])

# Calculate the squared-error cost for given values of w and b.
# The 1/(2m) factor makes this half of the mean squared error (MSE).
def calculate_cost(w, b):
    predicted_final = w * mid_term_scores + b
    m = len(mid_term_scores)
    cost = (1 / (2 * m)) * np.sum((predicted_final - final_scores) ** 2)
    return predicted_final, cost

# Test cases
w_values = [1, 0.5, 3]
b_values = [10, 20, 7]

for w, b in zip(w_values, b_values):
    predicted_final, cost = calculate_cost(w, b)
    print(f"For w = {w} and b = {b}:")
    print("Predicted Final Scores:", predicted_final)
    print("Cost (half the MSE):", cost)
    print()
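
The last question also asks you to try other values. One way to do that, sketched here under stated assumptions, is a brute-force grid search over candidate `w` and `b` values, cross-checked against an ordinary least-squares fit from `np.polyfit`. The search ranges and step sizes, and the names `cost`, `w_grid`, and `b_grid`, are hypothetical choices for illustration and are not part of the original exercise.

```python
import numpy as np

# Same dataset as in the exercise
mid_term_scores = np.array([70, 80, 90, 60, 50, 40, 30, 20, 10, 0])
final_scores = np.array([80, 90, 100, 70, 65, 50, 40, 30, 20, 10])


def cost(w, b):
    """Squared-error cost J(w, b) = 1/(2m) * sum((w*x + b - y)^2)."""
    errors = w * mid_term_scores + b - final_scores
    return np.sum(errors ** 2) / (2 * len(mid_term_scores))


# Hypothetical search ranges, chosen only for illustration
w_grid = np.linspace(0.5, 1.5, 101)   # w from 0.5 to 1.5 in steps of 0.01
b_grid = np.linspace(0.0, 20.0, 201)  # b from 0 to 20 in steps of 0.1

# Evaluate every (w, b) pair on the grid and keep the one with the lowest cost
best_w, best_b, best_cost = min(
    ((w, b, cost(w, b)) for w in w_grid for b in b_grid),
    key=lambda candidate: candidate[2],
)
print(f"Grid search:   w = {best_w:.2f}, b = {best_b:.1f}, cost = {best_cost:.4f}")

# Cross-check: np.polyfit with deg=1 returns the least-squares slope and intercept
ls_w, ls_b = np.polyfit(mid_term_scores, final_scores, deg=1)
print(f"Least squares: w = {ls_w:.4f}, b = {ls_b:.4f}, cost = {cost(ls_w, ls_b):.4f}")
```

On this grid the lowest cost is 1.125, at `w = 1.0` and `b = 10.5`, slightly below the 1.25 obtained with `b = 10`. The least-squares fit lands close by, at roughly `w = 1.003` and `b = 10.36`, which suggests that `w = 1`, `b = 10` is the best of the three listed options but not the overall minimum.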