I'm trying to understand the internal workings of the LinearRegression model in Scikit-learn.
This is my dataset
And this is my dataset after performing one-hot-encoding.
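(The screenshots aren't reproduced here; as a stand-in, the encoding step looks roughly like the following sketch. The column names and values are hypothetical, not the actual data.)

```python
import pandas as pd

# Hypothetical stand-in for the real dataset
df = pd.DataFrame({
    'Car Model': ['BMW X5', 'Audi A5', 'BMW X5'],
    'Mileage': [69000, 87600, 57000],
    'Age(yrs)': [6, 8, 5],
    'Sell Price($)': [18000, 12000, 26100],
})

# pd.get_dummies replaces the categorical column with one 0/1 column
# per category; drop_first=True drops one to avoid the dummy-variable trap.
encoded = pd.get_dummies(df, columns=['Car Model'], drop_first=True)
print(encoded.columns.tolist())
```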
And these are the values of the coefficients and intercept after performing linear regression.
Sell Price is the dependent column and the rest of the columns are features.
And these are the predicted values, which work fine in this case.
I noticed that the number of coefficients is 1 greater than the number of features. So this is how I generated the feature matrix:
import numpy as np

feature_matrix = dataFrame.drop(['Sell Price($)'], axis='columns').to_numpy()
# Column of 1s to act as the bias (intercept) term
bias_column = np.ones((len(feature_matrix), 1))
# Prepend the bias column; axis=1 concatenates along columns, axis=0 along rows
feature_matrix = np.concatenate([bias_column, feature_matrix], axis=1)
What I want to know is how Scikit-learn uses these coefficients and the intercept to predict values.
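For what it's worth, my understanding is that predict() is just the dot product of the features with coef_ plus intercept_. A minimal sketch on made-up numbers (the real dataset isn't reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up numeric features and target, standing in for the real dataset
X = np.array([[69000, 6], [35000, 3], [57000, 5], [22500, 2]], dtype=float)
y = np.array([18000, 34000, 26100, 40000], dtype=float)

model = LinearRegression()
model.fit(X, y)

# predict() computes X @ coef_ + intercept_ for each row
manual = X @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True

# Equivalently, with a leading column of 1s the intercept becomes the
# first entry of a single parameter vector
theta = np.concatenate([[model.intercept_], model.coef_])
X_aug = np.hstack([np.ones((len(X), 1)), X])
print(np.allclose(X_aug @ theta, model.predict(X)))  # True
```

If that's right, the bias column of 1s shouldn't be needed at prediction time, since fit_intercept=True already handles the intercept separately.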
This is what I tried.
I also noticed that the value I get from this calculation is equal to the mileage in every case. But mileage isn't the dependent variable here. So what's going on?