Week 4 - Data Analysis Tools
Dataset Description:
The dataset contains variables about purchases made in several regions. The goal is to determine whether some age groups spend more in store or out of store.
Qualitative variables: in.store, region
Quantitative variables: age, amount, items
Source: https://www.kaggle.com/datasets/thedevastator/demographical-shopping-purchases-data
The Code
import pandas as pd
import statsmodels.formula.api as smf
import seaborn
import matplotlib.pyplot as plt

def ageGroup(row):
    # Map a row's age to a coarse age-group label
    if row["age"] > 60:
        return "Senior"
    elif row["age"] > 30:
        return "Adult"
    elif row["age"] >= 18:
        return "Young"
    elif row["age"] > 9:
        return "Teenage"
    else:
        return "Child"
data = pd.read_csv('Demographic_Data_Orig.csv', low_memory=False)
# Convert the quantitative columns to numeric; unparseable values become NaN
data['age'] = pd.to_numeric(data['age'], errors='coerce')
data['items'] = pd.to_numeric(data['items'], errors='coerce')
data['amount'] = pd.to_numeric(data['amount'], errors='coerce')
data["AgeGroup"]=data.apply(ageGroup, axis=1)
data = data[["region","items","in.store","amount", "AgeGroup"]].dropna()
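The row-wise ageGroup function above can also be written as a vectorized lookup. The following is a minimal sketch, assuming the same thresholds as ageGroup; the helper name age_group_vectorized and the demo ages are hypothetical and not part of the original script.

```python
import numpy as np
import pandas as pd

def age_group_vectorized(age: pd.Series) -> pd.Series:
    """Label ages with the same thresholds as ageGroup, without a row-wise apply."""
    conditions = [age > 60, age > 30, age >= 18, age > 9]
    labels = ["Senior", "Adult", "Young", "Teenage"]
    # np.select picks the label of the first condition that is True.
    # Rows matching none of them (including missing ages) get the default
    # "Child", which mirrors the else-branch of ageGroup.
    return pd.Series(np.select(conditions, labels, default="Child"), index=age.index)

# Made-up ages, chosen only to exercise every branch
demo = pd.DataFrame({"age": [5, 10, 18, 30, 31, 60, 61]})
demo["AgeGroup"] = age_group_vectorized(demo["age"])
print(demo)
```

On a large file this avoids calling a Python function once per row. Note that, as in the original function, a missing age ends up labeled "Child", so it may be worth dropping missing ages before grouping.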
# ANOVA-style OLS model on the full sample
model1 = smf.ols(formula='amount ~ C(region)', data=data).fit()
print(model1.summary())

# Recode the in.store flag into readable labels
recode1 = {1: 'In', 0: 'Out'}
data['in.store'] = data['in.store'].map(recode1)
sub1 = data[['amount', 'in.store']].dropna()
print("means for Amount by In vs Out Store")
m1 = sub1.groupby('in.store').mean()
print(m1)

print("standard deviation for Amount by In vs Out Store")
st1 = sub1.groupby('in.store').std()
print(st1)
# bivariate bar graph
seaborn.barplot(x="in.store", y="amount", data=data, ci=None)
plt.xlabel('In Store')
plt.ylabel('Amount')

# Subsets used to check whether age group moderates the association
sub2 = data[(data['AgeGroup'] == 'Young')]
sub3 = data[(data['AgeGroup'] == 'Adult')]
# The labels name the explanatory variable the formula actually uses (region)
print('association between region and amount - Young')
model2 = smf.ols(formula='amount ~ C(region)', data=sub2).fit()
print(model2.summary())

print('association between region and amount - Adult')
model3 = smf.ols(formula='amount ~ C(region)', data=sub3).fit()
print(model3.summary())
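# numeric_only=True skips the text column AgeGroup, which recent pandas versions refuse to average
print("means - Young")
m3 = sub2.groupby('in.store').mean(numeric_only=True)
print(m3)
print("means - Adult")
m4 = sub3.groupby('in.store').mean(numeric_only=True)
print(m4)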
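The models in the code above use region as the explanatory variable. If the question is whether in-store versus out-of-store purchasing is associated with amount within each age group, the moderation check could be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (not the Kaggle file), the helper anova_by_group is hypothetical, and the column is renamed to in_store because in.store is not a valid Python identifier and so cannot be written directly into the formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the shopping data (NOT the Kaggle file)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "in.store": rng.choice(["In", "Out"], size=n),
    "AgeGroup": rng.choice(["Young", "Adult"], size=n),
    "amount": rng.normal(loc=800, scale=200, size=n),
})

# Rename so the column can be referenced inside a formula
demo = demo.rename(columns={"in.store": "in_store"})

def anova_by_group(df, group_col="AgeGroup"):
    """Fit amount ~ C(in_store) separately within each level of group_col."""
    results = {}
    for level, subset in df.groupby(group_col):
        results[level] = smf.ols("amount ~ C(in_store)", data=subset).fit()
    return results

models = anova_by_group(demo)
for level, model in models.items():
    # fvalue and f_pvalue describe the overall F test of each fitted model
    print(level, "F =", round(model.fvalue, 3), "p =", round(model.f_pvalue, 3))
```

If the F test is significant in one age group but not in the other, that would suggest age group moderates the relationship between purchase channel and amount. With the random data used here, no such pattern is expected.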
General Result
Young Group
Adult Group
Mean