Non-Parametric Tests: Practice
TRUE or FALSE: The sign test is a non-parametric test.
TRUE or FALSE: Non-parametric tests usually assume your samples are normally distributed.
nonparametric statistics for applied research
Non-parametric Econometrics and Quantile Regressions Online.
Hot on the heels of some cool new econometric papers come a couple of cool new Stata programs.
http://froelich.vwl.uni-mannheim.de/1357.0.html
Kernel Density Estimates, Part 3
When piling dimensions onto a KDE, the possibility of correlations among the data is a striking concern. Incorporating multiple strongly correlated variables results in counting the underlying evidence explaining the results several times, possibly skewing your predictions if the evidence weighs in a different direction, and making them overconfident.
Performing a factor analysis (FA) or principal component analysis (PCA) on the data used to generate predictions prior to creating a KDE can thus not only reduce the dimensionality of your data, but also increase the accuracy of your predictions.
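As a rough illustration of the decorrelation step, the sketch below rotates a pair of strongly correlated variables onto their principal axes using only the standard library. The data are synthetic stand-ins (not the LAUSD measures), the helper names `pca_2d` and `correlation` are made up for this example, and a real analysis would more likely use a library PCA or FA routine.

```python
import math
import random

def pca_2d(points):
    """Rotate 2-D points onto their principal axes (pure-Python sketch)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the first principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Project each centered point onto the two orthogonal axes.
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]

def correlation(points):
    """Pearson correlation between the two coordinates."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
# Synthetic, strongly negatively correlated pair (assumed data, for illustration).
raw = []
for _ in range(200):
    a = rng.gauss(0, 1)
    raw.append((a, -0.85 * a + rng.gauss(0, 0.5)))

scores = pca_2d(raw)
print("correlation before:", correlation(raw))
print("correlation after: ", correlation(scores))
```

The rotated components are uncorrelated by construction, so a KDE built on them no longer counts the shared evidence twice. Dropping the second component would give the dimensionality reduction mentioned above.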
To demonstrate this method, I reused the dataset of LAUSD schools and tested the accuracy of % health risk predictions through cross-validation: eliminate a school from the dataset, then determine the likelihood that a KDE generated from the remainder of the dataset assigns to the actual health risk of the eliminated school, using the eliminated school's measures as the evidence. I iterated through a variety of variables, as well as methods of pre-processing the data. The 10 most effective methods of predicting the health risk of LAUSD middle schools, with probabilities normalized over only those 10, are (abbreviations listed below):
Directly using %FRM and %WnHS: p = 21.2% that this is the best model.
PCAmle of API, %AAnHS, and %WnHS: p = 13.1%.
Two component PCA of API and %AAnHS: p = 13.0%.
PCAmle of %FRM and API: p = 11.1%.
PCAmle of API and %WnHS: p = 10.3%.
Two factor FA of %FRM, %AAnHS, and %WnHS: p = 7.9%.
Directly using API score, %AAnHS, and %WnHS: p = 7.0%.
Two factor FA of %FRM, API, and %HLS: p = 6.4%.
Directly using API and %AAnHS: p = 5.9%.
Two component PCA of API and %WnHS: p = 4.1%.
PCAmle = PCA using Minka's MLE heuristic for determining the number of components.
%FRM = % of students on free or reduced meal plans.
API = Academic Performance Index.
%AAnHS = % African American non-Hispanic students.
%AS = % Asian students.
%HLS = % Hispanic or Latino students.
%WnHS = % White non-Hispanic students.
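The cross-validation scoring described above can be sketched roughly as follows: each point is held out in turn, a Gaussian KDE built from the rest gives the conditional density of the held-out outcome, and the summed log-likelihoods of competing models are normalized into probabilities. The data are synthetic, the bandwidths are arbitrary, and the normalization assumes equal prior weight on each model; the post does not say exactly how its probabilities were computed, so treat this as one plausible reading rather than the original procedure.

```python
import math
import random

def gauss(u, h):
    """Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def loo_log_likelihood(xs, ys, hx, hy):
    """Sum of log p(y_i | x_i) under a KDE built without point i."""
    total = 0.0
    for i in range(len(xs)):
        num = den = 0.0
        for j in range(len(xs)):
            if j == i:
                continue  # leave the held-out point out of the KDE
            w = gauss(xs[i] - xs[j], hx)
            num += w * gauss(ys[i] - ys[j], hy)
            den += w
        density = num / den if den > 0 else 0.0
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total

def normalize(log_scores):
    """Turn log-likelihoods into weights summing to 1 (equal priors assumed)."""
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

rng = random.Random(1)
# Synthetic predictors: one drives the outcome, the other is unrelated.
informative = [rng.uniform(0, 1) for _ in range(60)]
irrelevant = [rng.uniform(0, 1) for _ in range(60)]
risk = [0.8 * x + rng.gauss(0, 0.05) for x in informative]

log_scores = {
    "informative": loo_log_likelihood(informative, risk, 0.1, 0.05),
    "irrelevant": loo_log_likelihood(irrelevant, risk, 0.1, 0.05),
}
model_probs = normalize(log_scores)
print(model_probs)
```

Subtracting the largest log-likelihood before exponentiating keeps the normalization numerically stable when the scores differ by many orders of magnitude.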
While effective results were attained in some cases by using the data directly, overall prediction accuracy increased when correlations between the variables were removed first. I plan to investigate the details of why the results are what they are, in hopes of building a predictive model.
Some of the results defy my expectations. For example, %FRM has a strong negative correlation with %WnHS (r^2 = 0.716), yet using both directly was the highest-rated model. This may be analogous to building a PMF for coin flips and updating it 10 times per flip. If one gets lucky in the first 10 flips, landing 5 heads and 5 tails, the probability mass will very quickly and strongly converge on p(heads) = 0.5, making this PMF stand out in predictive power compared to another that is correctly updated only once per flip. But this is only because it converged far more rapidly than the evidence would allow, and in this example it happened, by luck, to do so on the correct value.
Using strongly correlated variables directly is methodologically unsound and will not lead to accurate predictions in the long run: a few lucky "direct" methods scored well in cross-validation, but most performed poorly.
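The coin-flip analogy can be made concrete with a small grid-based sketch. It starts from a uniform prior over p(heads) and compares a PMF updated once per flip with one updated 10 times per flip; the function names, the uniform prior, and the grid resolution are illustrative choices rather than anything from the original analysis.

```python
def posterior(heads, tails, updates_per_flip=1, grid_size=101):
    """Grid posterior over p(heads), starting from a uniform prior.

    updates_per_flip > 1 applies each flip's likelihood that many times,
    mimicking evidence that is counted repeatedly.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    k = updates_per_flip
    weights = [(p ** (heads * k)) * ((1 - p) ** (tails * k)) for p in grid]
    z = sum(weights)
    return grid, [w / z for w in weights]

def spread(grid, pmf):
    """Standard deviation of the PMF."""
    mean = sum(p * m for p, m in zip(grid, pmf))
    return sum(m * (p - mean) ** 2 for p, m in zip(grid, pmf)) ** 0.5

grid, honest = posterior(5, 5, updates_per_flip=1)
_, overcounted = posterior(5, 5, updates_per_flip=10)

# Same 10 flips, but the over-counted PMF is far narrower around 0.5.
print("spread, one update per flip: ", spread(grid, honest))
print("spread, ten updates per flip:", spread(grid, overcounted))
```

Both PMFs peak at 0.5, but the over-counted one claims far more certainty than 10 flips can justify. Had the first 10 flips been unrepresentative, it would have been just as confident about a wrong value.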
Nonparametric Approaches to Auctions by Athey and Haile