@appliedfunctions
p-hacking explained with poker hands
R Psychologist’s guide to multilevel models in R is stupendous.
It’s called the garden of forking paths. If you get to choose your data-exclusion rule, you get to win the “p less than .05 game,” you get to publish your articles in top journals, and if you’re really lucky you get $$$.
Paxil: What went wrong? - Andrew Gelman
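The quote's point can be seen in a small simulation. This is a simplified sketch, not Gelman's analysis. Two groups are drawn from the same normal distribution, so every "significant" difference is a false positive. A hypothetical analyst then tries three made-up exclusion rules and keeps the best p-value. The rule names and cutoffs are assumptions for illustration. Gelman's argument is broader, because the analyst need not consciously try several rules, so treat this as the explicit-multiple-tries special case.

```r
# Simulate one null experiment and return its p-value under each exclusion rule
forking_paths_p <- function(n = 30) {
  x <- rnorm(n)  # group A
  y <- rnorm(n)  # group B, same population: no true difference
  rules <- list(
    keep_all = function(v) v,
    trim_2sd = function(v) v[abs(v - mean(v)) < 2 * sd(v)],
    trim_1.5sd = function(v) v[abs(v - mean(v)) < 1.5 * sd(v)]
  )
  vapply(rules, function(rule) t.test(rule(x), rule(y))$p.value, numeric(1))
}

set.seed(2015)
p_values <- replicate(2000, forking_paths_p())  # 3 rules x 2000 experiments

# False-positive rate with one rule fixed in advance vs. picking the best rule
one_rule_rate <- mean(p_values["keep_all", ] < 0.05)
best_rule_rate <- mean(apply(p_values, 2, min) < 0.05)
c(one_rule = one_rule_rate, best_rule = best_rule_rate)
```

The first rate should sit near the nominal 5%. The second can only be as large or larger, because the smallest of three p-values is never bigger than any one of them.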
My sister Bayesia thought this approach was crazy, though.
bayesian - What's the difference between a confidence interval and a credible interval? - Cross Validated
I was searching for Star Wars memes, and I saw this beauty. The asymmetry between recognizing speech and talking sets off so many associations for me.
It reminds me of the difference between receptive and expressive vocabulary. The number of words that we speak or write on a day-to-day basis is much smaller than the words we can read or comprehend. Which is the truer measure of vocabulary knowledge? Or maybe that question is a red herring, and it doesn’t make sense to partition our words into the ones we know well enough to use in a sentence versus the ones we know well enough to understand in a sentence.
The meme also makes me think of Hinton’s work in deep learning: “To recognize shapes, first learn to generate images”. In a digit-recognition network, you train your model to recognize (to correctly label) a digit from a pixel image. Traditionally, you use supervised learning, so the model starts guessing labels for images immediately and learns by minimizing its error rate. Hinton instead suggests that the model first undergo unsupervised “generative pretraining” and learn to generate those images. With this training, it can discover a useful set of general features and structure in the images. Once those features are in place, learning to label images becomes a matter of “discriminative fine-tuning” for the network. I still haven’t wrapped my head around the recognition/generation asymmetry in this domain, admittedly, but the meme reminds me of it. Maybe it’s a difference between seeing and imagining.
Lastly, the meme reminds me of P vs. NP, the question in computer science about different kinds of computational problems. The problems in P can be solved efficiently (P for polynomial time), like sorting a list or multiplying numbers. The problems in NP can have their answers verified efficiently (NP for nondeterministic polynomial time), like Sudoku, Minesweeper or subset sum. You don’t need to solve the Sudoku again to check the answer; just check each row, column and box for duplicate numbers. The problems in P can have their solutions checked efficiently because they can be solved efficiently: just solve the problem again and see if the solution matches the new answer. The million-dollar question is whether the family of problems that can be solved efficiently equals the family of problems that can be verified efficiently. (The answer is probably not.) Here the production/recognition asymmetry translates into a difference between solving and checking, and into the question of whether there are problems that are easy to check but inherently hard to solve.
So, that Star Wars meme--it’s a real tapestry of meaning if you ask me.
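The solving/checking asymmetry from the P vs. NP paragraph can be made concrete with subset sum: does some subset of these numbers add up to the target? In this minimal sketch, checking a proposed answer takes one pass over it, while the naive solver may try all 2^n subsets. The function names and example numbers are illustrative assumptions, and brute force is only the simplest solver, shown for contrast.

```r
# Verify a proposed certificate: `chosen` holds indices claimed to sum to `target`
verify_subset_sum <- function(numbers, target, chosen) {
  all(chosen %in% seq_along(numbers)) &&  # valid indices
    !anyDuplicated(chosen) &&             # each number used at most once
    sum(numbers[chosen]) == target        # one pass to add them up
}

# Solve by brute force: try every non-empty subset, smallest first
solve_subset_sum <- function(numbers, target) {
  n <- length(numbers)
  for (k in seq_len(n)) {
    for (s in combn(n, k, simplify = FALSE)) {
      if (sum(numbers[s]) == target) return(s)
    }
  }
  NULL  # no subset works
}

numbers <- c(3, 34, 4, 12, 5, 2)
answer <- solve_subset_sum(numbers, target = 9)
answer                                  # indices 3 and 5 (4 + 5 = 9)
verify_subset_sum(numbers, 9, answer)   # TRUE
verify_subset_sum(numbers, 9, c(1, 2))  # FALSE: 3 + 34 is not 9
```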
Poster, code, and data for the poster I presented at the 56th Annual Meeting of the Psychonomic Society (Nov. 2015 in Chicago, IL).
Next steps are to elaborate on the home language input factor structure and examine other ways to quantify lexical processing.
loess explained in a GIF [Simply Statistics]
Vocabulary Growth Milestones
How do children's vocabularies grow over the first two years of life? Rescorla, Mirak, and Singh (2000) summarized findings from parental diary studies from the 70s and 80s as a series of milestones. A child's first words appear around 10-13 months, followed by a period of slow and steady vocabulary growth (on the order of 10 words/month) during 12-18 months. After a child has accumulated around 50 words, around 16-21 months, the "vocabulary spurt" begins and the child's vocabulary growth accelerates dramatically. Indeed, Huttenlocher and colleagues (1991) modeled vocabulary growth from 14 to 26 months as a simple quadratic function of age.
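To see what "a simple quadratic function of age" implies, here is a toy sketch with made-up numbers, not Huttenlocher and colleagues' data or estimates. Under a quadratic, the month-to-month gain itself grows by a constant amount, which is one way to describe accelerating vocabulary growth. The curve 2 * (age - 12)^2 is an arbitrary assumption.

```r
# Made-up, exactly quadratic vocabulary sizes from 14 to 26 months
age <- 14:26
vocab <- 2 * (age - 12)^2

# Fitting a quadratic in age recovers 288 - 48 * age + 2 * age^2,
# which is 2 * (age - 12)^2 multiplied out
fit <- lm(vocab ~ age + I(age^2))
round(coef(fit), 6)

# Monthly gains are 10, 14, 18, ... words: each month adds 4 more than the last
gains <- diff(vocab)
diff(gains)
```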
Importantly, there are individual differences in vocabulary growth rates. All children start out with the same number of words (zero), and there is very little variation in vocabulary size during the first year. In the first edition of the MacArthur Communicative Development Inventories (CDI; 1994), the authors likewise observed little variation in vocabulary sizes early on. Around 13 months, however, variation increases as the vocabulary percentiles begin to fan out.
We can reproduce this fanning-out finding and the other developmental phases using the Stanford Wordbank, a database of scores for the 2007 edition of the CDI. The figure below shows individual scores on the CDI Words and Gestures from 8 to 18 months.
The fitted lines in the figure show percentiles at each slice in time. The data are cross-sectional, so different children make up the scores in each time slice. The topmost line tracks the 90th percentile: the vocabulary size that separates the top 10% of word learners from everyone else in each time slice.
We can observe some trends mentioned above within this plot. First, the 90th and the 10th percentiles noticeably begin to drift apart around 12-13 months. This is the "fanning out" mentioned above; individual differences in vocabulary sizes begin to emerge around this point. Second, the precocious word-learners in each sample, shown in the top two lines, begin to show accelerating vocabulary growth, whereas the median word-learners are still in that slow, steady phase of word-learning.
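Because the data are cross-sectional, a percentile line is a summary computed separately within each age slice. Here is a minimal base-R sketch, assuming one row per child with an age in months and a vocabulary score. The helper name and the six scores are hypothetical. The figure's fitted lines are presumably smoothed, whereas this computes raw empirical percentiles per month.

```r
# Empirical vocabulary percentiles within each age slice
percentiles_by_age <- function(age, vocab,
                               probs = c(0.10, 0.25, 0.50, 0.75, 0.90)) {
  # One vector of scores per month of age (different children in each slice)
  slices <- split(vocab, age)
  # Quantiles per slice; vapply returns one column per slice, so transpose
  out <- t(vapply(slices, quantile, numeric(length(probs)),
                  probs = probs, names = FALSE))
  colnames(out) <- paste0("p", probs * 100)
  out
}

# Made-up scores for six children, three per month
percentiles_by_age(age = c(8, 8, 8, 9, 9, 9), vocab = c(0, 5, 10, 10, 20, 30))
```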
The Wordbank also provides data on the CDI Words and Sentences, so we can track growth in vocabulary sizes from 16 to 30 months.
The same caveats apply as before: the data points in each time slice come from different samples of children. A further caveat applies as well: the CDI Words and Sentences inventory has 680 words, so we see a classic ceiling effect, and the curves begin to decelerate as they approach the ceiling. Nevertheless, the figure depicts the onset of accelerating growth in the lower percentiles (10th and 25th).
Confusion Matrix Statistics on Late Talker Diagnoses
How many late talkers are just late bloomers? More precisely, how many children identified as late talkers at 18 months have caught up one year later? This is an important question. From a clinical perspective, we want to support children with language delays, but it is also inefficient to spend resources fixing a self-correcting problem.
Fernald & Marchman (2012) touch on this question. Children falling below the 20th percentile in vocabulary score at 18 months were labeled “late talkers”. These children, along with a control group of timely-talkers, participated in an eyetracking study at 18 months and had their vocabulary measured every 3 months until 30 months of age.
In their sample, 22 of 36 late talkers were late bloomers, catching up to the normal vocabulary range at 30 months, and 42 of 46 timely talkers remained in the normal range of vocab development. The authors later report that eyetracking reaction times at 18 months predicted rates of vocabulary growth in both groups. In particular, the late-bloomers were significantly faster than the children who did not catch up.
The authors repeatedly report confusion matrix statistics on different subsets of the data. This makes sense: the question of late bloomers is also a question about the positive predictive value of a late-talker diagnosis. In the majority of cases, a “late talker” label at 18 months did not predict continued delay one year later, so the diagnosis has poor positive predictive value (14/36 = 39%).
Confusion Matrix Measures in R
I’d like to report similar measures in my own analyses, so I figured out how to reproduce their results in R. And it’s as simple as calling the confusionMatrix function in the caret package. First, let’s re-create their data.
```r
library("dplyr")

# Counts from paper
lt_bloomed <- 22
lt_delayed <- 14
td_still_td <- 42
td_delayed <- 4

# Reproduce their data-set (one row per reported child)
levels <- c("WNL at 30m", "Delayed at 30m")

lt_data <- data_frame(
  Outcome = rep(levels, times = c(lt_bloomed, lt_delayed)),
  Group = "LT at 18m",
  Predicted = levels[2]
)

td_data <- data_frame(
  Outcome = rep(levels, times = c(td_still_td, td_delayed)),
  Group = "TD at 18m",
  Predicted = levels[1]
)

all_kids <- bind_rows(td_data, lt_data) %>%
  mutate(ChildID = seq_along(Outcome)) %>%
  select(ChildID, Group, Predicted, Outcome)

# Looks like a real dataset now
sample_n(all_kids, 5, replace = FALSE)
#> Source: local data frame [5 x 4]
#>
#>   ChildID     Group      Predicted        Outcome
#>     (int)     (chr)          (chr)          (chr)
#> 1      23 TD at 18m     WNL at 30m     WNL at 30m
#> 2      72 LT at 18m Delayed at 30m Delayed at 30m
#> 3      34 TD at 18m     WNL at 30m     WNL at 30m
#> 4      68 LT at 18m Delayed at 30m     WNL at 30m
#> 5      50 LT at 18m Delayed at 30m     WNL at 30m
```
Next, we just call confusionMatrix on the predicted values and the reference values.
```r
# Current versions of caret require factors with identical levels.
# Listing "Delayed at 30m" first makes it the positive class (caret's default).
outcome_levels <- c("Delayed at 30m", "WNL at 30m")
conf_mat <- caret::confusionMatrix(
  data = factor(all_kids$Predicted, levels = outcome_levels),
  reference = factor(all_kids$Outcome, levels = outcome_levels)
)
conf_mat
#> Confusion Matrix and Statistics
#>
#>                 Reference
#> Prediction       Delayed at 30m WNL at 30m
#>   Delayed at 30m             14         22
#>   WNL at 30m                  4         42
#>
#>                Accuracy : 0.6829
#>                  95% CI : (0.5708, 0.7813)
#>     No Information Rate : 0.7805
#>     P-Value [Acc > NIR] : 0.9855735
#>
#>                   Kappa : 0.3193
#>  Mcnemar's Test P-Value : 0.0008561
#>
#>             Sensitivity : 0.7778
#>             Specificity : 0.6562
#>          Pos Pred Value : 0.3889
#>          Neg Pred Value : 0.9130
#>              Prevalence : 0.2195
#>          Detection Rate : 0.1707
#>    Detection Prevalence : 0.4390
#>       Balanced Accuracy : 0.7170
#>
#>        'Positive' Class : Delayed at 30m
#>
```
Here, we can confirm the positive predictive value (true positives / positive calls¹) is 14/36 = 0.3889. The negative predictive value is noteworthy: most children not diagnosed as late talkers did not show a delay one year later (NPV = 42/46 = 0.913).
¹ Technically, caret computes the PPV from the sensitivity, specificity and prevalence rather than directly from the counts.
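The numbers in the caret printout can be checked by hand from the 2x2 table. This base-R sketch (the variable names are illustrative) computes the PPV both directly from the counts and in the sensitivity/specificity/prevalence form mentioned in the footnote, plus Cohen's kappa and McNemar's test. All of these should agree with caret's output up to rounding.

```r
# 2x2 table from the paper: rows = prediction at 18 months, columns = outcome at 30 months
tab <- matrix(
  c(14, 4, 22, 42), nrow = 2,
  dimnames = list(
    Prediction = c("Delayed", "WNL"),
    Reference = c("Delayed", "WNL")
  )
)

tp <- tab["Delayed", "Delayed"]  # late talkers still delayed
fp <- tab["Delayed", "WNL"]      # late talkers who bloomed
fn <- tab["WNL", "Delayed"]      # timely talkers who became delayed
tn <- tab["WNL", "WNL"]          # timely talkers still in the normal range
n <- sum(tab)

sensitivity <- tp / (tp + fn)  # 14/18
specificity <- tn / (tn + fp)  # 42/64
prevalence <- (tp + fn) / n    # 18/82

# PPV directly from the counts, and in the sensitivity/specificity/prevalence form
ppv_direct <- tp / (tp + fp)   # 14/36
ppv_bayes <- (sensitivity * prevalence) /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
npv <- tn / (tn + fn)          # 42/46

# Cohen's kappa: observed agreement corrected for chance agreement
p_observed <- (tp + tn) / n
p_chance <- sum(rowSums(tab) * colSums(tab)) / n^2
cohen_kappa <- (p_observed - p_chance) / (1 - p_chance)

# McNemar's test on the off-diagonal cells (continuity-corrected by default)
mcnemar_p <- mcnemar.test(tab)$p.value

round(c(ppv_direct = ppv_direct, ppv_bayes = ppv_bayes, npv = npv,
        kappa = cohen_kappa, mcnemar_p = mcnemar_p), 7)
```

The two PPV forms are algebraically identical here: sensitivity × prevalence reduces to 14/82 and (1 − specificity) × (1 − prevalence) to 22/82, leaving 14/36.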
Word Recognition Notecards
These are some "notecards" for Magnuson, Mirman, and Myers (2013).
Gating: Incrementally play longer and longer chunks of a word. Listeners guess which word each chunk is the beginning of. For example:
b
ba
ban
bani
banis
banist
banister
Listeners provide a variety of guesses in the early gates (-> activation of many words in parallel). Gate 5 is the uniqueness point; there is only one completion of the word. The recognition point (gate of correct guessing) often precedes the uniqueness point (-> incremental activation of words). The amount that recognition precedes uniqueness is probably determined by the target's word-frequency (more frequent words guessed earlier).
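The uniqueness point can be computed mechanically: at each gate, keep the words in the lexicon that begin with the fragment heard so far, and stop when only the target remains. This minimal sketch assumes a tiny hypothetical lexicon and uses spelling as a stand-in for phonemes (real gating works with sounds, not letters). The toy lexicon is chosen so that the uniqueness point of "banister" lands at gate 5, as in the notes above.

```r
# A toy, hypothetical lexicon (spelled words standing in for phoneme strings)
lexicon <- c("bad", "ban", "band", "bandit", "banian", "banister", "bank", "banner")

# Words still consistent with the first `gate` letters of `word`
cohort_at_gate <- function(word, gate, lexicon) {
  lexicon[startsWith(lexicon, substr(word, 1, gate))]
}

# First gate at which the target is the only remaining candidate
uniqueness_point <- function(word, lexicon) {
  for (gate in seq_len(nchar(word))) {
    cohort <- cohort_at_gate(word, gate, lexicon)
    if (length(cohort) == 1 && cohort == word) return(gate)
  }
  NA_integer_  # never unique, e.g. "ban" is a prefix of "band"
}

# Cohort size at each gate of "banister": 8 8 7 2 1 1 1 1
sapply(seq_len(nchar("banister")),
       function(g) length(cohort_at_gate("banister", g, lexicon)))
uniqueness_point("banister", lexicon)  # 5
```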
Last winter, I wrote an implementation of the TRACE model of word recognition in pure R. I developed the package as my final project in my course on parallel distributed processing (taught by Tim Rogers). Yesterday, I finally got around to putting my report on the project online.
The above figure is from the Ganong effect demo in that paper. It shows the activation of phonemes and word units over time when the network was presented with “Xlug” where “X” is an intermediate sound between /p/ and /b/. The network initially guesses “blood” and “plug”, but decides on “plug” once /g/ arrives. Afterwards, top-down connections from “plug” to /p/ cause activation for /p/ to rise above /b/.
Programming postmortem
I wrote a naive implementation using message-passing object-oriented programming. Each network node is a bundle of data, and we send instructions (messages) to these nodes to tell them to collect input from neighboring nodes or to update their activation values.
This implementation was partly inspired by Jeremy Kun’s implementation of backpropagation in Python. His implementation convinced me to start with a few nodes, make sure they behave as expected, and bootstrap from there. And that's how this implementation developed: I first created toy nodes to implement generic functionality, then figured out input nodes and feature detector nodes, wrote some functions to assemble a layer of those nodes, and then iterated to phoneme and word nodes/layers. Very interactive and organic, bootstrapped from the bottom up.
For the #rstats people out there, I create the nodes as R6 objects. R objects normally have copy-on-modify semantics, so when I update x$value, I get a modified copy of the object x. R6 objects have reference semantics instead: they are modified in place, so every variable pointing to a node sees the same, updated node. I'm not sure why I decided on R6 last year, but I probably reasoned that node$update() made more sense than node <- update(node). I also convinced myself that R6 objects would yield better performance because thousands of nodes wouldn't have to be copied on every network tick.
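The copy-on-modify versus reference-semantics distinction can be shown without R6 itself, since R6 objects are built on R environments. This is a minimal base-R sketch; bump() is a hypothetical update function, not part of the TRACE package.

```r
# Hypothetical update: raise a node's activation by 0.1
bump <- function(node) {
  node$activation <- node$activation + 0.1
  node
}

# A node stored as a list has copy-on-modify semantics
node_list <- list(activation = 0)
bump(node_list)               # returns a modified copy...
node_list$activation          # ...so the original is still 0
node_list <- bump(node_list)  # hence the `node <- update(node)` style

# A node stored in an environment (what R6 objects are built on) is shared
node_env <- new.env()
node_env$activation <- 0
bump(node_env)                # modifies the same object in place
node_env$activation           # now 0.1
```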
Unfortunately, the naive implementation doesn't scale. When I tried to regenerate the plots yesterday, it took an hour for one network to complete 60 ticks. (I don't remember it being that slow last year, but oh well.) I could try to optimize the pain away, but that would require profiling to find the pain points. Alternatively, I could ditch the naive implementation and use a couple of data-frames: store the nodes in a giant data-frame, and use split-apply-combine operations to, e.g., summarise all the incoming activations to each node.
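The data-frame alternative might look something like this minimal sketch. Connections live in one table with hypothetical columns `from`, `to` and `weight`, and each node's net input is a grouped sum over its incoming connections. It ignores TRACE specifics such as the activation-update rule and its parameters, and the nodes and weights are made up.

```r
# Hypothetical nodes and their current activations
nodes <- data.frame(
  id = c("p", "b", "plug"),
  activation = c(0.5, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# One row per connection: sender, receiver, connection weight (made-up values)
edges <- data.frame(
  from = c("p", "b", "plug"),
  to = c("plug", "plug", "p"),
  weight = c(1.0, -0.5, 0.3),
  stringsAsFactors = FALSE
)

net_input <- function(nodes, edges) {
  # Look up each connection's sender activation
  sender_activation <- nodes$activation[match(edges$from, nodes$id)]
  # Split the weighted inputs by receiver, sum each group, combine
  receiver <- factor(edges$to, levels = nodes$id)
  sums <- tapply(edges$weight * sender_activation, receiver, sum)
  # Nodes with no incoming connections get 0 rather than NA
  sums[is.na(sums)] <- 0
  sums
}

net_input(nodes, edges)  # p: 0.03, b: 0, plug: 0.4
```

All connections are processed in one vectorized pass per tick instead of sending a message to each node object.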
Oh and another thing
My re-implementation was not a direct port of any other implementations. That is, I didn't translate some source code into another language. Instead, I based my implementation on the description in the original paper. This top-down approach led to the most annoying part of the project.
Acoustic features like voiced, consonantal, vocalic, etc. are implemented in TRACE using values on a continuum from 1 to 8. A fully vocalic sound /a/ gets an 8 for vocalic, but a continuant sound like /s/ is kinda vocalic, so it gets a value of 4. The original TRACE paper leads one to believe that each sound has just one value for each feature:
Based on the text, I implemented /k/'s acute feature as a vector [0, 0, 1, 0, 0, 0, 0, 0]. WRONG! That's not what they did in 1986. The original C-TRACE code defined the acute feature for /k/ as [.1, .3, 1, .3, .1, 0, 0, 0]. These smeared feature values popped up somewhere in the specs of each phoneme, so all of my phoneme definitions were wrong. This mismatch between the description in the paper and the original code caused a lot of headaches as my simulations failed to obtain expected results. That was the most frustrating part of the project, illustrating a disadvantage of my paper-first strategy of implementing TRACE. A write-up can gloss over details, but the code doesn't lie.
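The difference between the two encodings is easy to see in code. In this sketch, one_hot_feature() mimics the paper-based reading, and smeared_feature() spreads a value over neighboring levels using the /k/ acute weights quoted above (.1, .3, 1, .3, .1). Both helpers are hypothetical. Applying that same spread to every feature and level is an assumption; it has not been checked against the C-TRACE source.

```r
# A feature value on the 1-8 scale as a one-hot vector (the paper-based reading)
one_hot_feature <- function(level, n_levels = 8) {
  v <- numeric(n_levels)
  v[level] <- 1
  v
}

# The same value smeared over neighboring levels, truncated at the edges of the scale
smeared_feature <- function(level, n_levels = 8,
                            kernel = c(0.1, 0.3, 1, 0.3, 0.1)) {
  v <- numeric(n_levels)
  half_width <- (length(kernel) - 1) / 2
  for (i in seq_along(kernel)) {
    pos <- level - half_width - 1 + i
    if (pos >= 1 && pos <= n_levels) v[pos] <- kernel[i]
  }
  v
}

smeared_feature(3)  # 0.1 0.3 1.0 0.3 0.1 0.0 0.0 0.0, the /k/ acute vector

# Overlap between adjacent levels: none when one-hot, substantial when smeared
sum(one_hot_feature(3) * one_hot_feature(4))  # 0
sum(smeared_feature(3) * smeared_feature(4))  # 0.66
```

The overlap is one plausible reason the encoding matters: with smeared values, phonemes one step apart on a feature send each other's feature detectors partial input.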
Even research of exemplary quality may have irreproducible empirical findings because of random or systematic error. Direct replication is the attempt to recreate the conditions believed sufficient for obtaining a previously observed finding and is the means of establishing reproducibility of a finding with new data. A direct replication may not obtain the original result for a variety of reasons: Known or unknown differences between the replication and original study may moderate the size of an observed effect, the original result could have been a false positive, or the replication could produce a false negative. False positives and false negatives provide misleading information about effects, and failure to identify the necessary and sufficient conditions to reproduce a finding indicates an incomplete theoretical understanding.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349 (6251). http://doi.org/10.1126/science.aac4716
I'm reading the reproducibility paper for our psych department's brown-bag series. I appreciate how the authors defend the scientific value of replications, especially in the final sentence of this paragraph. To really understand a scientific finding, we should be able to specify conditions under which the result will succeed or fail in replication. It's a shame that journals regard replications as lesser science.
I’m still agnostic on whether the report is a scathing indictment of psychological research. Students in the psych department are having a mock debate about the article: Psychology is in crisis! versus No, it’s not. Maybe they can sway me. I will admit that my jaw did drop a little when the article revealed that the replicated studies were all published in 2008. (I had wrongly assumed the project targeted older, well-cited studies for replication.)
Using dplyr to back up a MySQL database
This summer I started developing a MySQL database for our lab. It’s my first experience working with MySQL, phpMyAdmin, MS Access, and dplyr’s remote data capabilities. Because I don’t know SQL at all but am a ninja at dplyr, I’ve been developing a helper R package to automate some tasks with my lab’s database. I’d like to share how I handle backing up the database into a portable set of minimally documented csvs.
(Caveat: I’m sure there are more idiomatic or efficient ways of performing these tasks by working with SQL directly but I’m going to stubbornly play to my strengths.)
Back up each table
I’d like to be able to back up the whole database with a single function call. Backing up the individual tables is easy. Here’s my function to download a table and write it to a csv.
```r
# Download a tbl from a db connection and write it to a csv
backup_tbl <- function(tbl_name, src, output_dir) {
  # Try to download the tbl, defaulting to an empty data-frame on failure
  # (tryCatch() does the job of dplyr::failwith(), which has been deprecated)
  df <- tryCatch(
    collect(tbl(src, tbl_name)),
    error = function(e) data_frame()
  )
  output_file <- file.path(output_dir, paste0(tbl_name, ".csv"))
  message("Writing ", output_file)
  readr::write_csv(df, output_file)
  df
}
```
Now, in another function, I just have to apply backup_tbl to each table in the database.
```r
# `this_backup_dir` is a timestamped directory created a few lines earlier
tbls <- src_tbls(src)
dfs <- lapply(tbls, backup_tbl, src = src, output_dir = this_backup_dir)
names(dfs) <- tbls
# `dfs` returned later to provide a copy of the dl'd data to the R session
```
This code is a really good start, but what good is a bunch of csvs without documentation?
Back up the metadata as well
One feature I appreciate in MySQL is the optional comments on tables and fields, which let me write a brief description of each table and of each column in a table. The screenshot below from phpMyAdmin shows fully commented fields for a table of scores from a vocabulary test.
By downloading these descriptions and bundling them with the backed-up tables, I can generate a minimal codebook to accompany the csvs. So I create a function called describe_tbl that centers on the following two lines:
```r
# Get the field (column) descriptions of one table
this_query <- sprintf("SHOW FULL COLUMNS FROM %s", tbl_name)
info <- DBI::dbGetQuery(src, statement = this_query)
```
This query lets me grab the metadata depicted in the earlier screenshot:
```r
describe_tbl(my_db, "PPVT")
#>    Table           Field Index      DataType      DefaultValue NullAllowed
#> 1   PPVT    ChildStudyID   UNI       int(11)              <NA>          NO
#> 2   PPVT          PPVTID   PRI       int(11)              <NA>          NO
#> 3   PPVT  PPVT_Timestamp            datetime CURRENT_TIMESTAMP          NO
#> 4   PPVT       PPVT_Form       enum('A','B')              <NA>         YES
#> 5   PPVT PPVT_Completion                date              <NA>         YES
#> 6   PPVT        PPVT_Raw             int(11)              <NA>         YES
#> 7   PPVT   PPVT_Standard             int(11)              <NA>         YES
#> 8   PPVT        PPVT_GSV             int(11)              <NA>         YES
#> 9   PPVT        PPVT_Age              int(3)              <NA>         YES
#> 10  PPVT       PPVT_Note        varchar(255)              <NA>         YES
#>                                                Description
#> 1  Child-Study ID (uniquely defines a Child-Study pairing)
#> 2                                   PPVT Administration ID
#> 3                   When each record (row) was last edited
#> 4           PPVT test form used. A, B or NULL (if unknown)
#> 5                                  Date PPVT was completed
#> 6                              Raw score (number of words)
#> 7                                           Standard score
#> 8                                       Growth scale value
#> 9     Age in months (rounded down) when PPVT was completed
#> 10                             Notes on test administration
```
Terrific. Now the final ingredient is to get the comments attached to each table. As above, I create a function called describe_db to wrap a single query:
```r
# Get the status (including the comment) of every table
info <- DBI::dbGetQuery(src, statement = "SHOW TABLE STATUS")
```
This query grabs lots of backend information about each table in the database (DB engine, collation, average row length, etc.), but for my codebook, I keep just the number of rows and comment columns from each table status. Here’s what the function returns:
```r
# Look at just the documented, in-use tables
describe_db(my_db) %>%
  filter(Description != "", Rows != 0)
#>   Database             Table Rows
#> 1      l2t             BRIEF  224
#> 2      l2t             Child  224
#> 3      l2t               EVT  224
#> 4      l2t        LENA_Admin  182
#> 5      l2t        LENA_Hours 2968
#> 6      l2t     MinPair_Admin  190
#> 7      l2t MinPair_Responses 7674
#> 8      l2t              PPVT  224
#>                                                   Description
#> 1 Scores from Behvr Rating Inventory of Exec Func (Preschool)
#> 2         Unique IDs and demographics of children in database
#> 3                      Scores on Expressive Vocabulary Test 2
#> 4                                             LENA recordings
#> 5                   Stats from LENA recordings by hour-of-day
#> 6              Administrations of the Minimal Pairs experiment
#> 7       Trials and responses from the Minimal Pairs experiment
#> 8                 Scores on Peabody Picture Vocabulary Test 4
```
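Much of what describe_tbl() and describe_db() do is renaming and subsetting what MySQL returns. Here is a minimal sketch of that post-processing, with hypothetical helper names. SHOW FULL COLUMNS returns columns such as Field, Type, Null, Key, Default and Comment. SHOW TABLE STATUS returns Name, Rows and Comment among many others. The mapping onto the codebook column names printed above is inferred from that output, not taken from the lab's package. The mock data frames stand in for live query results.

```r
# Turn the result of "SHOW FULL COLUMNS FROM <tbl>" into field descriptions
tidy_field_info <- function(info, tbl_name) {
  data.frame(
    Table = tbl_name,
    Field = info$Field,
    Index = info$Key,
    DataType = info$Type,
    DefaultValue = info$Default,
    NullAllowed = info$Null,
    Description = info$Comment,
    stringsAsFactors = FALSE
  )
}

# Keep only the row count and comment from "SHOW TABLE STATUS"
tidy_table_status <- function(status, db_name) {
  data.frame(
    Database = db_name,
    Table = status$Name,
    Rows = status$Rows,
    Description = status$Comment,
    stringsAsFactors = FALSE
  )
}

# Mock query results standing in for DBI::dbGetQuery() output
mock_columns <- data.frame(
  Field = c("PPVTID", "PPVT_Raw"), Type = c("int(11)", "int(11)"),
  Null = c("NO", "YES"), Key = c("PRI", ""), Default = c(NA, NA),
  Comment = c("PPVT Administration ID", "Raw score (number of words)"),
  stringsAsFactors = FALSE
)
mock_status <- data.frame(
  Name = c("PPVT", "Scratch"), Rows = c(224, 0),
  Comment = c("Scores on Peabody Picture Vocabulary Test 4", ""),
  stringsAsFactors = FALSE
)

tidy_field_info(mock_columns, "PPVT")
# Documented, in-use tables only (mirrors the filter() call above)
subset(tidy_table_status(mock_status, "l2t"), Description != "" & Rows != 0)
```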
Finally, I assemble these three bits of functionality (back up each table, download field comments from each table, and download table status) together into a function called l2t_backup. This function writes all of these bits of information to a timestamped directory. Note that the final two messages refer to the metadata csvs.
```r
l2t_backup(my_db, "inst/backup")
#> Writing inst/backup/2015-08-19_09-51/BRIEF.csv
#> Writing inst/backup/2015-08-19_09-51/Caregivers.csv
#> Writing inst/backup/2015-08-19_09-51/Child.csv
#> Writing inst/backup/2015-08-19_09-51/ChildStudy.csv
#> Writing inst/backup/2015-08-19_09-51/EVT.csv
#> Writing inst/backup/2015-08-19_09-51/FruitStroop.csv
#> Writing inst/backup/2015-08-19_09-51/LENA_Admin.csv
#> Writing inst/backup/2015-08-19_09-51/LENA_Hours.csv
#> Writing inst/backup/2015-08-19_09-51/Literacy.csv
#> Writing inst/backup/2015-08-19_09-51/MinPair_Admin.csv
#> Writing inst/backup/2015-08-19_09-51/MinPair_Responses.csv
#> Writing inst/backup/2015-08-19_09-51/PPVT.csv
#> Writing inst/backup/2015-08-19_09-51/SES.csv
#> Writing inst/backup/2015-08-19_09-51/Scores_TimePoint1.csv
#> Writing inst/backup/2015-08-19_09-51/Study.csv
#> Writing inst/backup/2015-08-19_09-51/StudyTask.csv
#> Writing inst/backup/2015-08-19_09-51/VerbalFluency.csv
#> Writing inst/backup/2015-08-19_09-51/metadata/field_descriptions.csv
#> Writing inst/backup/2015-08-19_09-51/metadata/table_descriptions.csv
```
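The timestamped directory names in the messages above (e.g. 2015-08-19_09-51) can be built with format() on the current time. This is a minimal sketch. make_backup_dir() is a hypothetical helper, not part of the lab's package, and it also creates the metadata/ subfolder that the last two messages write into.

```r
# Create <parent_dir>/<YYYY-MM-DD_HH-MM>/metadata and return the timestamped path
make_backup_dir <- function(parent_dir, time = Sys.time()) {
  stamp <- format(time, "%Y-%m-%d_%H-%M")
  this_backup_dir <- file.path(parent_dir, stamp)
  dir.create(file.path(this_backup_dir, "metadata"),
             recursive = TRUE, showWarnings = FALSE)
  this_backup_dir
}

# Example in a temporary directory with a fixed time
make_backup_dir(tempdir(), as.POSIXct("2015-08-19 09:51:00", tz = "UTC"))
```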
explainr translates S3 objects into text using standard templates in a simple and convenient way.
The point of homework in graduate-level stats is to practice churning out results-section boilerplate. explainr is an in-development R package that takes care of some of that boilerplate for you.
```r
library("explainr")
library("magrittr")  # for %>%

# Wrap in blockquote and print
blockquote <- function(x) sprintf("<blockquote>%s</blockquote>", x)

# Test of equal proportions
ptest <- prop.test(x = 500, n = 1008)
text <- explain(ptest)
text %>% blockquote %>% cat
```
Which yields:
This was a one-sample proportion test of the null hypothesis that the true population proportion is equal to 0.5. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that true population proportion is different than 0.5. The observed sample proportion is 0.496031746031746 (500 events out of a total sample size of 1,008).
The confidence interval for the true population proportion is (0.464746, 0.5273481). This interval will contain the true population proportion 95 times out of 100.
The p-value for this test is 0.8254979. This, formally, is defined as the probability – if the null hypothesis is true – of observing a sample proportion that is as or more extreme than the sample proportion from this data set. In this case, this is the probability – if the true population proportion is 0.5 – of observing a sample proportion that is greater than 0.503968253968254 or less than 0.496031746031746.
```r
# Power calculation of t-test
test <- power.t.test(n = 20, delta = 1)
class(test) <- "power.t.test"
text <- explain(test)
text %>% blockquote %>% cat
```
Which yields:
A total of 20 subjects will be enrolled in this study. Using a two-sample t test of normal means we have 86% power to detect a treatment difference at a two-sided 0.05 significance level, if the true difference in the response between treatment groups is 1 unit. This is based on the assumption that the standard deviation of the response variable is 1.
For more sophisticated statistics, a visual summary is preferable.
```r
m <- lm(mpg ~ cyl * wt, mtcars)
explain(m, "visual")
```
It's not clear when or if explainr will materialize into a published package on CRAN, but it's a great concept.
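The core idea, dispatching on an object's S3 class to fill in a prose template, can be sketched in a few lines of base R. This is not explainr's code or API. describe() is a hypothetical generic, named to avoid clashing with explain(), and the templates are deliberately bare.

```r
# Hypothetical generic: pick a text template based on the object's class
describe <- function(x, ...) UseMethod("describe")

# Template for hypothesis tests (objects of class "htest")
describe.htest <- function(x, alpha = 0.05, ...) {
  decision <- if (x$p.value < alpha) "reject" else "do not reject"
  sprintf(
    "This was a %s. The p-value is %.3f, so at the %s level we %s the null hypothesis.",
    x$method, x$p.value, format(alpha), decision
  )
}

# Fallback for classes without a template
describe.default <- function(x, ...) {
  sprintf("No template for objects of class '%s'.", class(x)[1])
}

describe(prop.test(x = 500, n = 1008))
describe(lm(mpg ~ wt, mtcars))  # falls through to the default method
```

Adding support for a new kind of result means writing one more method, which is what makes the template approach convenient.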
We wanted a task that would require the infant to show at least some evidence of semanticity, so it was essential that the task not just require the infant to listen to the form of a word but also require the infant to link the word to an object or event. But we wanted as well to be likely to capture infant sensitivities as close as possible to the 10-12 month age. At this age, word comprehension knowledge likely involves only recognitory understanding rather than referential understanding (Oviatt, 1982; Hirsh-Pasek and Golinkoff, 1996). Recognitory comprehension involves only associative knowledge, and thus simply requires that the infant learn the link between a word and an object. Referential comprehension requires an understanding of the symbolic link between a word and an object, entailing the notion that the word can stand for the object even when the object is not present. We thus developed a word-object associative-learning task in which the infant is required to learn the arbitrary association between a particular word and a particular object, and to show recognition of that learning by differential looking to a pairing that violates, versus one that maintains, the associative link that has been learned. Even if this type of associative word-object learning does not constitute full referential word learning, there is considerable consensus that such associative learning provides a necessary step toward referential word learning.
Werker, J. F., & Fennell, C. T. (2004). Listening to sounds versus listening to words: Early steps in word learning. Weaving a lexicon, 79-109.
Posting this paragraph because 1) I had never heard about the distinction between recognitory and referential comprehension until now and 2) I like how the authors describe what the baby has to learn and how he or she has to then demonstrate that learning in the lab.
Mirman on individual differences
In honor of Dan Mirman visiting our campus to give a workshop, I'd like to quote my favorite chunk of his book on growth curve analysis.
But first, let's look at some plots. In particular, we are looking at some eyetracking data, provided by Mirman. Subjects were shown four images and heard the name of one of the images (i.e., the target). The target words fell into two categories: high frequency versus low frequency. The plot below shows the proportion of looking to target over time averaged across participants and across conditions:
That's pretty cool. High frequency words were viewed more quickly, so they are more easily recognized or processed.
But we have nested data; each subject contributed multiple trials. Individuals are different, as the raw data show:
High frequency beats low frequency in most of the subjects, but for some subjects there is less of an effect of frequency.
Through the miracle of multilevel modeling, we can estimate curves for each individual participant, allowing us to capture individual variability in our model estimates. We do this by estimating fixed effects (like the average curves from the first plot) and estimating random effects, which describe how individuals and their individual curves deviate randomly from the overall curve. The plot below shows the overall curves (thick, shaded in a confidence interval) and the individual curves (thin):
As before, high frequency still beats low frequency, but the fixed effect reflects a phantom of aggregation--the "hypothetical prototypical individual" in the quote below. The frequency effect varies across people, as the individual curves illustrate. How and why these individual curves differ is the deeper, next-level question we can and should ask ourselves, as Mirman discusses below.
Traditional analyses like t-tests and ANOVA assume random variation among individual participants and stop there, limiting theories to describing a hypothetical prototypical individual. However, we can ask a deeper question: what is the source of this variability among individuals? This is an important question because individual differences provide unique constraints on our theories. Insofar as individuals differ from that prototype, this tells us something about how the system (cognitive, psychological, behavioral, neural, etc.) is organized. A good theory should not just account for the overall average behavior of a system, but also for the ways in which the system's behavior varies. For example, a good theory of human language processing should not only account for how typical college students process language, but also how language processing develops from infancy through adulthood into old age and how it breaks down, both in developmental and acquired disorders. All of this variability is not random--it is structured by the nature of the system--but we can't understand that structure unless we can quantify individual differences. Traditional data analysis methods like t-tests and ANOVAs do not provide a method for doing this. (Mirman, 2014, p. 8)
Emphasis mine.
Mirman, D. (2014). Growth Curve Analysis and Visualization Using R. Florida, USA: Chapman & Hall/CRC.
You know that if your computer beats you at chess, it is really the program that has beaten you, not the silicon atoms or the computer as such. The abstract program is instantiated physically as a high-level behavior of vast numbers of atoms, but the explanation of why it has beaten you cannot be expressed without also referring to the program in its own right. The program has also been instantiated, unchanged, in a long chain of different physical substrates, including neurons in the brains of the programmers and radio waves when you download the program via wireless networking, and finally as states of long- and short-term memory banks in your computer. The specifics of that chain of instantiations may be relevant to explaining how the program reached you, but it is irrelevant to why it beat you: there, the content of the knowledge (in it, and in you) is the whole story. That story is an explanation that refers ineluctably to abstractions; and therefore those abstractions exist, and really do affect physical objects in the way required by the explanation.
David Deutsch, Chapter 5, The Reality of Abstractions (2011)
This quote nicely reiterates the importance of using the appropriate level of description when talking about computations. Douglas Hofstadter makes this point with a bunch of analogies in I Am A Strange Loop. Some of his examples, off the top of my head, include:
You don't need to know the exact state of every molecule of gas to describe a balloon popping--in fact, the arrangement of the system could be vastly different and the outcome would not change at all. Pop!
Being a car mechanic doesn't help you explain the flow of traffic.
Having a well-defined set of abstractions allows you to escape the incomprehensible complexity of describing physical substrates. Once you start using these abstractions and start talking about the system at a convenient and comprehensible level of description, you grant these abstractions the causal power to affect physical objects, as noted in the quote above. And that's okay! Words are symbolic abstractions, and words do things.