Linear Regression using Python

Linear Regression is usually the first machine learning algorithm that every data scientist comes across. It is a simple model but everyone needs to master it as it lays the foundation for other machine learning algorithms.

Where can Linear Regression be used?
It is a very powerful technique and can be used to understand the factors that influence profitability. It can be used to forecast sales in the coming months by analyzing the sales data for previous months. It can also be used to gain various insights about customer behaviour. By the end of this blog we will build a model that finds the line which best fits the data.

This is the first blog of the machine learning series that I am going to cover. One can get overwhelmed by the number of articles on the web about machine learning algorithms. My purpose in writing this blog is two-fold: it can act as a guide for those who are entering the field of machine learning, and it can act as a reference for me.

Table of Contents

  1. What is Linear Regression
  2. Hypothesis of Linear Regression
  3. Training a Linear Regression model
  4. Evaluating the model
  5. scikit-learn implementation

What is Linear Regression

The objective of a linear regression model is to find a relationship between one or more features (independent variables) and a continuous target variable (dependent variable). When there is only one feature, it is called Univariate Linear Regression, and if there are multiple features, it is called Multiple Linear Regression.

Hypothesis of Linear Regression

The linear regression model can be represented by the following equation:

Y = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

  • Y is the predicted value
  • θ₀ is the bias term
  • θ₁, …, θₙ are the model parameters
  • x₁, x₂, …, xₙ are the feature values

The above hypothesis can also be represented in vectorized form as

Y = θᵀx

  • θ is the model’s parameter vector, including the bias term θ₀
  • x is the feature vector with x₀ = 1
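In code, this vectorized hypothesis is just a dot product. Here is a minimal sketch (the function and variable names are my own, not from the original post):

    import numpy as np

    # X: (m, n + 1) feature matrix whose first column is all ones (x_0 = 1)
    # theta: (n + 1, 1) parameter vector, including the bias term theta_0
    def predict(X, theta):
        # Vectorized hypothesis: Y = X . theta
        return X @ theta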

Data-set

Let’s create a random data set to train our model.
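The original snippet is not reproduced here; a minimal sketch that generates a similar random data set (a noisy line; the sample size, coefficients and seed are my choices and may differ from the original) could look like this:

    import numpy as np

    np.random.seed(42)                         # for reproducibility
    X = 2 * np.random.rand(100, 1)             # 100 samples of a single feature
    y = 3 * X + 2 + np.random.randn(100, 1)    # a straight line plus Gaussian noise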

Training a Linear Regression Model

Training the model here means finding the parameters θ so that the model best fits the data.

How do we determine the best fit line?
The line for which the error between the predicted values and the observed values is minimum is called the best fit line or the regression line. These errors are also called residuals. The residuals can be visualized as the vertical lines from the observed data values to the regression line.

To define and measure the error of our model, we define the cost function as the sum of the squares of the residuals:

J(θ) = 1/(2m) · Σ ( h(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²

where the hypothesis function h(x) is given by

h(x) = θᵀx = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

the sum runs over all training examples, and m is the total number of training examples in our data set.

Why do we take the square of the residuals and not their absolute value? Because we want to penalize the points which are farther from the regression line much more than the points which lie close to the line.
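As a small sketch, the cost function above can be written in a few lines of NumPy (this assumes X already contains the bias column x₀ = 1; the function name is mine):

    import numpy as np

    def compute_cost(X, y, theta):
        # J(theta) = 1/(2m) * sum((h(x) - y)^2), with h(x) = X . theta
        m = len(y)
        residuals = X @ theta - y
        return (1 / (2 * m)) * np.sum(residuals ** 2)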

Our objective is to find the model parameters for which the cost function is minimum. We will use Gradient Descent to find them.

Gradient descent

Gradient descent is a generic optimization algorithm used in many machine learning algorithms. It iteratively tweaks the parameters of the model in order to minimize the cost function. The steps of gradient descent are outlined below.

  1. We first initialize the model parameters with some random values. This is also called random initialization.
  2. Now we need to measure how the cost function changes as its parameters change. Therefore we compute the partial derivatives of the cost function with respect to the parameters θ₀, θ₁, …, θₙ. For example, the partial derivative with respect to θ₀ is

∂J(θ)/∂θ₀ = (1/m) · Σ ( h(x⁽ⁱ⁾) − y⁽ⁱ⁾ )

Similarly, the partial derivative of the cost function with respect to any parameter θⱼ can be written as

∂J(θ)/∂θⱼ = (1/m) · Σ ( h(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xⱼ⁽ⁱ⁾

We can compute the partial derivatives for all parameters at once using the gradient vector

∇J(θ) = (1/m) · Xᵀ (Xθ − y)

where X is the m × (n + 1) matrix of feature values, including the bias column x₀ = 1.

3. After computing the derivatives we update each parameter as given below

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ

where α is the learning rate.

We can update all the parameters at once using

θ := θ − α · ∇J(θ)

We repeat steps 2 and 3 until the cost function converges to its minimum value. If the value of α is too small, the cost function takes a long time to converge. If α is too large, gradient descent may overshoot the minimum and may finally fail to converge.
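In vectorized NumPy form, steps 2 and 3 boil down to two lines. A sketch (the variable shapes and the names X, y, theta, alpha are my assumptions):

    # X: (m, n + 1) feature matrix with a bias column, y: (m, 1) targets,
    # theta: (n + 1, 1) parameters, alpha: learning rate, m: number of examples
    gradient = (1 / m) * X.T @ (X @ theta - y)   # partial derivatives for all parameters
    theta = theta - alpha * gradient             # simultaneous parameter update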

To demonstrate the gradient descent algorithm, we initialize the model parameters with 0. The equation becomes Y = 0. Gradient descent algorithm now tries to update the value of the parameters so that we arrive at the best fit line.

When the learning rate is very small, gradient descent takes a long time to find the best fit line.

When the learning rate is moderate, gradient descent converges to the best fit line at a reasonable pace.

When the learning rate is arbitrarily high, the gradient descent algorithm keeps overshooting the best fit line and may even fail to find it.

Implementing Linear Regression from scratch

The complete implementation of linear regression with gradient descent is given below.
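The original listing is not reproduced here; below is a self-contained sketch along the same lines (the class name, learning rate and iteration count are my choices and may differ from the original):

    import numpy as np

    class LinearRegressionGD:
        # Linear regression trained with batch gradient descent.

        def __init__(self, alpha=0.05, n_iterations=1000):
            self.alpha = alpha                 # learning rate
            self.n_iterations = n_iterations
            self.cost_history = []             # J(theta) after every iteration

        def fit(self, X, y):
            m = len(y)
            Xb = np.c_[np.ones((m, 1)), X]     # add the bias column x_0 = 1
            self.theta = np.zeros((Xb.shape[1], 1))   # initialize parameters with 0
            for _ in range(self.n_iterations):
                residuals = Xb @ self.theta - y
                gradient = (1 / m) * Xb.T @ residuals          # vectorized partial derivatives
                self.theta = self.theta - self.alpha * gradient  # simultaneous update
                self.cost_history.append((1 / (2 * m)) * np.sum(residuals ** 2))
            return self

        def predict(self, X):
            Xb = np.c_[np.ones((len(X), 1)), X]
            return Xb @ self.theta

    # Example usage with the random data generated earlier
    model = LinearRegressionGD(alpha=0.05, n_iterations=1000).fit(X, y)
    y_predicted = model.predict(X)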

The plot of the cost function vs the number of iterations is given below. We can observe that the cost function decreases with each iteration initially and finally converges after nearly 100 iterations.

Till now we have implemented linear regression from scratch and used gradient descent to find the model parameters. But how good is our model? We need some measure to calculate the accuracy of our model. Let’s look at various metrics to evaluate the model we built above.

Evaluating the performance of the model

We will be using Root Mean Squared Error (RMSE) and the Coefficient of Determination (R² score) to evaluate our model.

RMSE is the square root of the mean of the squared residuals:

RMSE = √( (1/m) · Σ ( y⁽ⁱ⁾ − h(x⁽ⁱ⁾) )² )

The R² score is given by

R² = 1 − SSᵣ / SSₜ

where SSₜ is the total sum of squared errors if we take the mean of the observed values as the predicted value, and SSᵣ is the sum of the squares of the residuals.
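A minimal sketch of how these metrics might be computed, assuming the y and y_predicted arrays from the from-scratch model above (the numbers printed below are the ones reported in the original post):

    import numpy as np

    rmse = np.sqrt(np.mean((y - y_predicted) ** 2))   # root mean squared error

    ss_t = np.sum((y - y.mean()) ** 2)                # total sum of squares, SS_t
    ss_r = np.sum((y - y_predicted) ** 2)             # residual sum of squares, SS_r
    r2 = 1 - ss_r / ss_t                              # coefficient of determination

    print(f"SSt - {ss_t}")
    print(f"SSr - {ss_r}")
    print(f"R2 score - {r2}")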

SSₜ - 69.47588572871659
SSᵣ - 7.64070234454893
R² score - 0.8900236785122296

If we use the mean of the observed values as the predicted value, the total squared error (SSₜ) is 69.47588572871659, whereas with regression the residual squared error (SSᵣ) is 7.64070234454893. We reduced the prediction error by ~89% by using regression.

Now let’s try to implement linear regression using the popular scikit-learn library.

Scikit-learn implementation

scikit-learn is a very powerful library for data science. The complete code is given below.
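Again, the original listing is not shown here; a sketch of the scikit-learn version (it assumes the same X and y arrays from earlier; the exact values printed by the original post are listed further below):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score

    reg = LinearRegression()
    reg.fit(X, y)                  # scikit-learn adds the intercept term itself
    y_predicted = reg.predict(X)

    rmse = np.sqrt(mean_squared_error(y, y_predicted))
    r2 = r2_score(y, y_predicted)

    print(f"The coefficient is {reg.coef_}")
    print(f"The intercept is {reg.intercept_}")
    print(f"Root mean squared error of the model is {rmse}.")
    print(f"R-squared score is {r2}.")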

The model parameters and the performance metrics of the model are given below:

The coefficient is [[2.93655106]]
The intercept is [2.55808002]
Root mean squared error of the model is 0.07623324582875013.
R-squared score is 0.9038655568672764.

This is very close to what we achieved when we implemented linear regression from scratch.

That’s it for this blog. The complete code can be found in this GitHub repo.

Conclusion

We have learnt the concepts of linear regression and gradient descent. We also implemented the model using the scikit-learn library.

In the next blog of this series, we will take a real data set and build a linear regression model.
