Week 3: Running a Lasso Regression Analysis @apoorvaml-week3 - Tumblr Blog

Week 3: Peer-graded Assignment: Running a Lasso Regression Analysis

This assignment is for the Coursera course "Machine Learning for Data Analysis" by Wesleyan University.

It covers "Week 3: Peer-graded Assignment: Running a Lasso Regression Analysis".

I am working on Lasso Regression Analysis in Python.

Syntax used to run Lasso Regression Analysis

Dataset description: hourly rental data spanning two years.

The dataset can be found on Kaggle.

Features:

yr - year

mnth - month

season - 1 = spring, 2 = summer, 3 = fall, 4 = winter

holiday - whether the day is considered a holiday

workingday - whether the day is neither a weekend nor holiday

weathersit - 1: Clear, Few clouds, Partly cloudy

2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

temp - temperature in Celsius

atemp - "feels like" temperature in Celsius

hum - relative humidity

windspeed (mph) - wind speed, miles per hour

windspeed (ms) - wind speed, metres per second

Target:

cnt - number of total rentals

Code used to run Lasso Regression Analysis

Corresponding Output

Interpretation

A lasso regression analysis was conducted to identify the subset of variables, from a pool of 12 categorical and quantitative predictors, that best predicted a quantitative response variable: the total number of bike rentals. Categorical predictors included weather condition and two binary variables for holiday and working day, which help keep the selected model interpretable with fewer predictors. Quantitative predictors included year, month, temperature, humidity and wind speed. Data were randomly split into a training set containing 70% of the observations and a test set containing the remaining 30%. The least angle regression algorithm with k=10 fold cross validation was used to estimate the lasso regression model on the training set, and the model was validated on the test set. The change in the cross-validation mean squared error at each step was used to identify the best subset of predictor variables.
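The steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the assignment: it uses synthetic data in place of the bike rental dataset, and the variable names, the random seeds and the standardization step are my own assumptions. Only the 70/30 split, the least angle regression lasso and the 10-fold cross validation come from the description.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the rental data (NOT the real dataset):
# 12 predictors, of which only the first three actually drive the target.
rng = np.random.default_rng(0)
n_samples, n_features = 500, 12
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [5.0, -3.0, 2.0]
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

# 70% training / 30% test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

# Standardize with training-set statistics only (an assumed preprocessing
# step), so the lasso penalty treats all predictors on the same scale.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Lasso fitted by least angle regression, penalty chosen by 10-fold CV.
model = LassoLarsCV(cv=10).fit(X_train_std, y_train)

# Predictors whose coefficients were not shrunk to zero.
selected = [i for i, c in enumerate(model.coef_) if c != 0]
train_mse = mean_squared_error(y_train, model.predict(X_train_std))
test_mse = mean_squared_error(y_test, model.predict(X_test_std))

print("chosen alpha:", model.alpha_)
print("selected predictor indices:", selected)
print("training MSE:", train_mse)
print("test MSE:", test_mse)
```

A test MSE close to the training MSE suggests the selected model is not badly overfitted. With the real data, X would hold the predictor columns listed above and y the cnt column.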

Lasso tends to shrink some coefficients to exactly zero, which removes those predictors from the model. Ridge regression, by contrast, shrinks coefficients toward zero but generally does not set them to exactly zero.
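A small sketch of why this happens, under a simplifying assumption: for a single standardized predictor (or an orthonormal design), both penalized solutions have a closed form in terms of the unpenalized estimate b. The function names, the example estimates and the penalty value below are hypothetical and chosen only for illustration.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: minimizer of 0.5*(b - beta)**2 + lam*abs(beta)."""
    magnitude = max(abs(b) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0  # estimates within lam of zero are set exactly to zero
    return magnitude if b > 0 else -magnitude


def ridge_shrink(b, lam):
    """Proportional shrinkage: minimizer of 0.5*(b - beta)**2 + 0.5*lam*beta**2."""
    return b / (1.0 + lam)


ols_coefs = [3.0, -1.5, 0.4, -0.2]  # hypothetical unpenalized estimates
lam = 0.5                           # hypothetical penalty strength

lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]

print("lasso:", lasso_coefs)
print("ridge:", ridge_coefs)
```

Lasso subtracts a fixed amount from each estimate and clips at zero, so small estimates vanish entirely. Ridge divides every estimate by the same factor, so a nonzero estimate stays nonzero.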

#Running a Lasso Regression Analysis #Machine Learning for Data Analysis #Wesleyan University #Coursera #Python #Week3
