I'm trying to understand the internal workings of the LinearRegression model in Scikit-learn.
This is my dataset
And this is my dataset after performing one-hot-encoding.
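(The screenshots aren't reproduced here; as a stand-in, the encoding step looks roughly like the following sketch. The column names and values are hypothetical, not the actual data.)

```python
import pandas as pd

# Hypothetical stand-in for the real dataset
df = pd.DataFrame({
    'Car Model': ['BMW X5', 'Audi A5', 'BMW X5'],
    'Mileage': [69000, 87600, 57000],
    'Age(yrs)': [6, 8, 5],
    'Sell Price($)': [18000, 12000, 26100],
})

# pd.get_dummies replaces the categorical column with one 0/1 column
# per category; drop_first=True drops one to avoid the dummy-variable trap.
encoded = pd.get_dummies(df, columns=['Car Model'], drop_first=True)
print(encoded.columns.tolist())
```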
And these are the values of the coefficients and intercept after performing linear regression.
Sell Price is the dependent column and the rest of the columns are features.
And these are the predicted values, which work fine in this case.
I noticed that the number of coefficients is 1 greater than the number of features. So this is how I generated the feature matrix:
import numpy as np

feature_matrix = dataFrame.drop(['Sell Price($)'], axis='columns').to_numpy()
# Column of 1s to act as the bias (intercept) term
bias_column = np.ones((len(feature_matrix), 1))
# Prepend the bias column; axis=1 concatenates along columns, axis=0 along rows
feature_matrix = np.concatenate([bias_column, feature_matrix], axis=1)
What I want to know is how Scikit-learn uses these coefficients and the intercept to predict values.
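For what it's worth, my understanding is that predict() is just the dot product of the features with coef_ plus intercept_. A minimal sketch on made-up numbers (the real dataset isn't reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up numeric features and target, standing in for the real dataset
X = np.array([[69000, 6], [35000, 3], [57000, 5], [22500, 2]], dtype=float)
y = np.array([18000, 34000, 26100, 40000], dtype=float)

model = LinearRegression()
model.fit(X, y)

# predict() computes X @ coef_ + intercept_ for each row
manual = X @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True

# Equivalently, with a leading column of 1s the intercept becomes the
# first entry of a single parameter vector
theta = np.concatenate([[model.intercept_], model.coef_])
X_aug = np.hstack([np.ones((len(X), 1)), X])
print(np.allclose(X_aug @ theta, model.predict(X)))  # True
```

If that's right, the bias column of 1s shouldn't be needed at prediction time, since fit_intercept=True already handles the intercept separately.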
This is what I tried.
I also noticed that the value I get from this calculation is equal to the mileage in every case. But mileage isn't the dependent variable here. So what's going on?