Discover Top Posts Tagged with #chi square


My comrade Aron made these adorable pieces of pixel fan art of my OCs. He doesn't use Tumblr, but I got his permission to post them.

Chi-square is the eldest brother, Z-score is the middle one and Framework is the youngest. They all have mathematical motifs.

#not my art #aron art #pixel art #z score #chi square #framework

Chi Square: Language Recognition II

I thought I would build on the last post by making a simple spreadsheet that can easily show which language is being used. I chose the groupings of letters so that, as long as there are at least 1000 letters in the text, it will satisfy the chi-square condition of no expected values less than 1 and no more than 20% of expected values less than 5. Data I…
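The expected-value condition mentioned above can be checked mechanically. Here is a minimal Python sketch of that check; the function name and the six group probabilities are hypothetical and are not taken from the original spreadsheet.

```python
import numpy as np

def satisfies_expected_count_rule(probabilities, n_letters):
    """Check the rule: no expected count below 1, at most 20% of them below 5."""
    expected = np.asarray(probabilities, dtype=float) * n_letters
    no_tiny_cells = bool(np.all(expected >= 1))
    share_below_five = float(np.mean(expected < 5))
    return no_tiny_cells and share_below_five <= 0.20

# Hypothetical probabilities for six letter groups (they sum to 1).
group_probs = [0.40, 0.30, 0.15, 0.10, 0.046, 0.004]

# With 1000 letters the smallest expected count is about 4; with 100 it is about 0.4.
print(satisfies_expected_count_rule(group_probs, 1000))
print(satisfies_expected_count_rule(group_probs, 100))
```

With these made-up probabilities the rule holds for a 1000-letter text and fails for a 100-letter one, which is the kind of trade-off the letter groupings are meant to manage.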


#chi square #language #spreadsheets

My Thesis - A Mathematical and Statistical Algorithm to detect bias in discourse


One of the most amazing things about writing my thesis in LINFO at the faculty of Lettere at Università di Roma Tor Vergata was the development of an algorithm for the detection of bias in written and oral speech. I proposed an iteration (using convolutional networks, Chi-square distribution, and some other … spices), also applicable to Artificial Intelligence (not mentioned at that time, but…


#Algorithm #Artificial Intelligence #Chi Square #CNN #Convolution #Convolutional Neural Networks #LINFO #Linguist #linguistics #Neural Networks #Neurolinguistics #Raffaello Palandri #Statistics #Thesis #University #Yourself Understood

COOL! Goodness of fit can test 45 probability distributions

So cool! I found that the software attached to a book (traditional Chinese edition) can run goodness-of-fit tests. People with a stereotyped view think that the test is not a good tool for finding a distribution model. But now 45 distributions (39 continuous and 6 discrete) can be tested, and one of them might be a proper fit for the data. In particular, the software can also find the parameter values. WOW!

#statistics #goodness of fit #chi square #probability distribution #modelling

Is frequency of drinking alcohol associated with alcohol abuse/dependence among adults?

In this short study, I try to figure out whether there is a relation between the frequency of drinking alcohol and alcohol abuse/dependence among adults. I used the chi-square test to study the relation between the two categorical variables.

I used the following as my explanatory and response variables:

Explanatory Variable: Drinking frequency (Categorical with 6 levels: drinking 1, 2.5, 4, 8, 14, 30 days/month)

Response Variable: alcohol abuse/dependence (Categorical with 2 levels: yes, no)

The null hypothesis in this case is H_0 : There is no relation between the explanatory and response variables, i.e. frequency of drinking alcohol among adults is not associated with alcohol abuse/dependence.

The alternate hypothesis in this case is H_a : There is a relation between the explanatory and response variables, i.e. frequency of drinking alcohol among adults is associated with alcohol abuse/dependence.

The U.S. National Epidemiological Survey on Alcohol and Related Conditions (NESARC) is a survey designed to determine the magnitude of alcohol use and psychiatric disorders in the U.S. population. It is a representative sample of the non-institutionalized population 18 years and older.

According to the chi-square test on the NESARC survey data, it was found that:

The p value (1.09e-125) is lower than 0.05, which gives us strong evidence to reject the null hypothesis in favor of the alternate hypothesis. In other words, looking at the chi-square value and p value in this case, it could be concluded that the proportion of adults with alcohol abuse/dependence differs significantly across the drinking-frequency levels, i.e. alcohol abuse/dependence and drinking frequency are associated with each other.
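As a rough sketch, the overall test can be reproduced from the observed counts reported in this post's crosstab output, without needing the NESARC file itself. The variable names here are my own, not the author's.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts as reported in the post's crosstab output.
# Rows: ALCABDEPP12DX (0 = no, 1 = yes).
# Columns: USFREQMO (1, 2.5, 4, 8, 14, 30 days/month).
observed = np.array([
    [1734, 2022, 1801, 1335, 1064, 1066],
    [692, 1237, 1172, 1320, 1299, 1336],
])

chi2, p_value, dof, expected = chi2_contingency(observed)

# Share of respondents with abuse/dependence at each frequency level.
abuse_rate = observed[1] / observed.sum(axis=0)

print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
print("abuse/dependence rate per level:", abuse_rate.round(3))
```

The test compares proportions across the six columns, so the conclusion is about how the abuse/dependence rate changes with drinking frequency, not about mean frequencies.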

POST HOC TEST FOR CHI SQUARE STUDY

Bonferroni Adjustment.

Since the explanatory variable has 6 categorical levels, the total number of paired comparisons will be 5+4+3+2+1 = 15. The Bonferroni adjustment in this case will be p/c = 0.05/15 = 0.0033, where p = the significance level (the largest p value at which the null hypothesis is still rejected) and c = number of comparisons.
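The arithmetic above can be double-checked with a few lines of Python. This is only an illustrative sketch; the variable names are mine.

```python
from itertools import combinations

levels = [1, 2.5, 4, 8, 14, 30]          # drinking frequency in days/month
pairs = list(combinations(levels, 2))     # every unordered pair of levels

alpha = 0.05
adjusted_alpha = alpha / len(pairs)       # Bonferroni: divide by the number of comparisons

print(len(pairs))
print(round(adjusted_alpha, 4))
```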

The pairs compared in the post hoc study are:

1) 1 day/month vs 2.5 days/month

2) 1 day/month vs 4 days/month

.....

15) 14 days/month vs 30 days/month

The pairwise p values of each post hoc comparison are shown in the table below.

(Python code and results posted in ‘keep reading’ section below)

For two levels to be significantly different, the observed p value must be smaller than the Bonferroni-adjusted threshold of 0.0033.

It could be observed that frequency 1 day/month is significantly different from all other frequencies. The graphical illustration of all the comparisons is attempted below.
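For readers without the NESARC file, here is a hedged sketch of the same post hoc procedure run directly on the observed counts from the post's crosstab output. It slices two columns at a time instead of recoding the survey data, so it is a stand-in for the author's loop rather than a copy of it.

```python
from itertools import combinations

import numpy as np
from scipy.stats import chi2_contingency

levels = [1, 2.5, 4, 8, 14, 30]
# Observed counts from the post's crosstab (rows: no / yes abuse or dependence).
observed = np.array([
    [1734, 2022, 1801, 1335, 1064, 1066],
    [692, 1237, 1172, 1320, 1299, 1336],
])
adjusted_alpha = 0.05 / 15  # Bonferroni threshold for 15 comparisons

not_significant = []
for i, j in combinations(range(len(levels)), 2):
    sub_table = observed[:, [i, j]]  # 2x2 table for this pair of levels
    # For 2x2 tables SciPy applies Yates' continuity correction by default.
    chi2, p_value, dof, _ = chi2_contingency(sub_table)
    flag = "significant" if p_value < adjusted_alpha else "not significant"
    print(f"{levels[i]} vs {levels[j]}: chi2 = {chi2:.2f}, p = {p_value:.3g} ({flag})")
    if p_value >= adjusted_alpha:
        not_significant.append((levels[i], levels[j]))

print("pairs that do not differ:", not_significant)
```

Only two adjacent pairs fail to reach the adjusted threshold, which is consistent with 1 day/month differing from every other level.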

The Post HOC results could be readjusted and graphically shown as below:

-----------------------------------------------------------------------------------------------------------

Code used (run for each post hoc pair):

    import pandas
    import scipy.stats

    # `sub2` (the NESARC subset) and `pair` (one of the 15 frequency pairs)
    # are defined earlier in the script, which is not shown here.
    int1, int2 = pair
    # Keep only the two levels being compared; every other level maps to NaN.
    recode = {int1: int1, int2: int2}
    ct = pandas.crosstab(sub2['ALCABDEPP12DX'], sub2['USFREQMO'].map(recode))
    print("chi sq test of subcategory: {}".format(pair))
    print(ct)
    # Column proportions.
    colsum = ct.sum(axis=0)
    colpct1 = ct / colsum
    print(colpct1)
    print("chi sq test of subcategory: {}".format(pair))
    cs = scipy.stats.chi2_contingency(ct)
    print(cs)

-----------------------------------------------------------------------------------------------------------

CHI SQUARE TEST RESULT:

Observed counts:

    USFREQMO        1.0   2.5   4.0   8.0  14.0  30.0
    ALCABDEPP12DX
    0              1734  2022  1801  1335  1064  1066
    1               692  1237  1172  1320  1299  1336

Column proportions:

    USFREQMO            1.0       2.5       4.0       8.0      14.0      30.0
    ALCABDEPP12DX
    0              0.714757  0.620436  0.605785  0.502825  0.450275  0.443797
    1              0.285243  0.379564  0.394215  0.497175  0.549725  0.556203

chi-square value, p value, degrees of freedom, expected counts:

    (591.9713868394573, 1.0977806293259079e-125, 5,
     array([[1361.32429407, 1828.75345192, 1668.26757059, 1489.82522702, 1325.97250902, 1347.85694738],
            [1064.67570593, 1430.24654808, 1304.73242941, 1165.17477298, 1037.02749098, 1054.14305262]]))

POST HOC RESULTS:

The observed counts and column proportions printed for each pair are the corresponding two columns of the tables above. Every pairwise test has 1 degree of freedom.

    Pair (days/month)   chi-square value       p value
    (1, 2.5)            54.770698468039214     1.3544517385419353e-13
    (1, 4)              69.69478952340515      6.922873814322958e-17
    (1, 8)              237.16710978927773     1.6308613236724482e-53
    (1, 14)             343.6361803742184      1.0303562387865595e-76
    (1, 30)             362.64890146193477     7.4610349684920965e-81
    (2.5, 4)            1.3461082339165986     0.24595963457382342
    (2.5, 8)            81.98141569491013      1.3737232237288822e-19
    (2.5, 14)           159.49488059167192     1.4588542363941614e-36
    (2.5, 30)           173.31103550758314     1.3997260876822529e-39
    (4, 8)              59.84368736597389      1.0269822030382857e-14
    (4, 14)             127.43002159243181     1.4958322334350576e-29
    (4, 30)             139.4246551914622      3.55656003277558e-32
    (8, 14)             13.626977552572807     0.00022295851672377766
    (8, 30)             17.3849257183094       3.052372229409848e-05
    (14, 30)            0.17687528965654875    0.6740724342219784

Expected counts for each pair:

    (1, 2.5)    [[1602.82427441, 2153.17572559], [ 823.17572559, 1105.82427441]]
    (1, 4)      [[1588.42563438, 1946.57436562], [ 837.57436562, 1026.42563438]]
    (1, 8)      [[1465.34028735, 1603.65971265], [ 960.65971265, 1051.34028735]]
    (1, 14)     [[1417.40405095, 1380.59594905], [1008.59594905,  982.40405095]]
    (1, 30)     [[1406.95940348, 1393.04059652], [1019.04059652, 1008.95940348]]
    (2.5, 4)    [[1999.2228819, 1823.7771181], [1259.7771181, 1149.2228819]]
    (2.5, 8)    [[1849.92610754, 1507.07389246], [1409.07389246, 1147.92610754]]
    (2.5, 14)   [[1788.91390964, 1297.08609036], [1470.08609036, 1065.91390964]]
    (2.5, 30)   [[1777.74103515, 1310.25896485], [1481.25896485, 1091.74103515]]
    (4, 8)      [[1656.59701493, 1479.40298507], [1316.40298507, 1175.59701493]]
    (4, 14)     [[1596.26030735, 1268.73969265], [1376.73969265, 1094.26030735]]
    (4, 30)     [[1585.78437209, 1281.21562791], [1387.21562791, 1120.78437209]]
    (8, 14)     [[1269.29952172, 1129.70047828], [1385.70047828, 1233.29952172]]
    (8, 30)     [[1260.56060906, 1140.43939094], [1394.43939094, 1261.56060906]]
    (14, 30)    [[1056.28331584, 1073.71668416], [1306.71668416, 1328.28331584]]

#data analysis tools #Chi square #chi2 #Saurabh3494

Is there anyone who can help me with my math homework

#please #math #math homework #statsisitcs #chi square #PLEASE

Hey, so I'm taking a statistics class, and I need to write a final paper. In order to do that, I need to have a dataset, with at least 30 respondents. I've written up a short survey - it's 3 questions long, multiple choice, yes/no answers for each. It won't take long at all and it would really help me out if you could fill this out for me. Everything would be kept totally anonymous. Thanks a lot in advance!

https://docs.google.com/forms/d/1TLpm7HB1I-dQXgLzGTOiQFF9vAN2_nimcfmKCRKb6yU/edit#responses

#statistics #political science #homework #homework help #chi square

Data Analysis with Chi Square Test of Independence: Life expectancy Against Income per Person Using SAS

Here I use the GapMinder dataset to examine the effect of different categories of income per person on life expectancy, using the Chi Square Test of Independence.

In particular, I am interested in statistically answering the question below:

Do categories of life expectancy (high/low) vary significantly among the various categories of income per person?

To approach this, I have chosen the hypotheses below for my categorical explanatory variable (income per person, collapsed into three categories: Very Low, Low and High) and my categorical response variable (life expectancy, collapsed into two categories: High and Low):

H0 : Life expectancy is the same across all categories of income per person

Ha : Life expectancy is not the same across all categories of income per person

The Null Hypothesis (H0), if not rejected, signifies that there is no evidence that life expectancy differs across the categories of income per person, and therefore no evidence of a significant relationship between these two variables.

The Alternate Hypothesis (Ha), if supported by rejecting H0, signifies that life expectancy is not the same across the different categories of income per person, and therefore there is evidence of a significant relationship between these two variables.

Since my categorical explanatory variable, income per person, has more than two categories, rejecting the null hypothesis will not determine exactly which categories are significantly different from the others. In that case, I will also need to perform a post hoc test to find the significantly different categories. For Chi Square, unfortunately, there is no easy one-shot post hoc test, so I will run pairwise Chi Square tests on the different categories of the explanatory variable to determine which pairs show a significant difference.

In order to keep Type I error in check, I will apply Bonferroni's correction to the significance threshold when deciding whether or not to reject the null hypothesis. In my case, three categories of the explanatory variable result in 3 separate pairwise tests, so I'll use the threshold α = 0.05 / 3 = 0.017 for the post hoc tests.

Here is the code snippet in SAS:

Program Output

Chi Square Test on all categories

Pairwise Post Hoc (income per person category 1 and 2)

Pairwise Post Hoc (income per person category 1 and 3)

Pairwise Post Hoc (income per person category 2 and 3)

Observations from Chi Square Output

1. The categorical variable icpp_cat (income per person categories) has three categories represented as:

a. 1: Very Low (0 to 1000)

b. 2: Low (1000 to 5000)

c. 3: High (more than 5000)

2. The categorical response variable le_cat (life expectancy) has two categories

a. 0 : Low (less than 60 years)

b. 1 : High (more than 60 years)

3. Out of the 213 available data points 186 were considered for analysis (23 values were missing)

4. The Chi Square procedure calculated a p value < 0.0001, which is much less than the threshold of α = 0.05. Therefore, we have enough evidence to reject the null hypothesis (H0) in favor of the alternate hypothesis (Ha).

5. As a result, Chi Square Test revealed that income per person (collapsed into 3 categories) and life expectancy are significantly associated.

6. However, it does not tell us which categories are different. That can be determined from the post hoc test.
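The category definitions listed above can be illustrated in Python (the post's own SAS code is not reproduced here). This is only a sketch: the rows are made up, the column names incomeperperson and lifeexpectancy are assumptions, and the handling of values exactly on a boundary (1000, 5000, 60) is my choice because the post does not specify it.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical rows standing in for the GapMinder data.
data = pd.DataFrame({
    "incomeperperson": [300, 800, 950, 600, 1500, 3200, 4800, 2500, 7000, 12000, 25000, 9000],
    "lifeexpectancy": [48, 52, 57, 61, 66, 71, 69, 58, 75, 80, 82, 77],
})

# 1: Very Low (0 to 1000), 2: Low (1000 to 5000), 3: High (more than 5000)
data["icpp_cat"] = pd.cut(
    data["incomeperperson"],
    bins=[0, 1000, 5000, float("inf")],
    labels=[1, 2, 3],
)
# 0: Low (60 years or less, by assumption), 1: High (more than 60 years)
data["le_cat"] = (data["lifeexpectancy"] > 60).astype(int)

table = pd.crosstab(data["le_cat"], data["icpp_cat"])
print(table)

# With so few made-up rows the p value is meaningless; only the mechanics matter.
chi2, p_value, dof, expected = chi2_contingency(table.to_numpy())
print(f"dof = {dof}")
```

The crosstab has life expectancy categories as rows and income categories as columns, which is the layout the chi-square test of independence works on.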

Observations from Post Hoc Test

1. Post Hoc between icpp_cat 1 and 2 (Very Low and Low) produces a p value < 0.0001, which is much less than α = 0.017 (with Bonferroni's correction). So, we reject the null hypothesis that they are the same and conclude that these two categories show different life expectancy.

2. Post Hoc between icpp_cat 1 and 3 (Very Low and High) produces a p value < 0.0001, which is much less than α = 0.017 (with Bonferroni's correction). So, we reject the null hypothesis that they are the same and conclude that these two categories show different life expectancy.

3. Post Hoc between icpp_cat 2 and 3 (Low and High) produces a p value = 0.8331, which is much higher than α = 0.017 (with Bonferroni's correction). So, we cannot reject the null hypothesis, and there is no evidence that these two categories differ in life expectancy.

In summary, the post hoc test reveals that income per person categories 2 and 3 (Low and High) are not significantly different in terms of life expectancy. However, category 1 (Very Low) is significantly different from categories 2 and 3 and has a much lower life expectancy.

#sas #data analysis #chi square