And just like that my time's up, and I'm leaving the University of Manchester after almost half a decade. Sentimental perhaps, not morose - above are some parting shots (I still haven't taken my film camera down off the shelf).
I started Nonlinear Dynamics: Mathematical and Computational Approaches last night, from the fantastic Liz Bradley, at the Santa Fe Institute (notable hub for such research, which I first encountered through renowned complexity theorist Stuart Kauffman).
It's really satisfying to see received notions of "flow" rendered concrete within dynamical systems theory, giving a framework to trains of thought so ephemeral they begin to feel impossible to work into a scientific format.
A pair of papers (from Yogi Jaeger, whose work I've written about previously) recently talked in such terms, and reinvigorated my memories of studying biological oscillations, within which can be glimpsed this strange crossover of biology and fluid dynamics. As summarised by Anton Crombach:
Dynamics of body plan segmentation in long and short-germ insects more similar than previously thought: A damped oscillator imposes temporal order on posterior gap gene expression in Drosophila
Less simplification, more insight. Analysis of phase space in a data-driven model with time-dependent parameters: Dynamic Maternal Gradients Control Timing and Shift-Rates for Gap Gene Expression
There's a good 50-year-long backstory to this field - for a time complexity science was the "hot topic" in biology (not necessarily in positive terms!). The hype I've seen of it in magazines from the 90s, exploited for publicity with some canny approach to influencing stock markets no doubt, harks forward to "Big Data" in recent years (though do note that there is valid and fascinating work on multidimensional data, relevant to biological analysis, beneath Big Data's spiel).
I've been reading Complexity: Life at the Edge of Chaos by Roger Lewin, who gives a staggering account of the field for a lay audience, and nothing short of a tour of Hollywood for a biochemist who's bumped into these figures in dribs and drabs.
(I was disappointed to find that the reason I hadn't heard of Lewin before was likely his conversion to pseudoscience in later life, in what would be his final published book)
In a 2015 talk, Marc Vidal (Harvard) - proponent of the "edgetic" view of biological networks which I've written about previously - briefly showed a map of the world of complexity science, which I traced back to this site.
My notes on the first module are in a Wiki here, though a problem I'm currently finding is that their relevance is to some extent conditional on my own - increasingly specific - interests: using complexity theory to understand nonlinear dynamical systems, including the undeniably abstract notions of optical and attention processes in theories of mind, for application to neural network processing of knowledge (to put it as plainly as possible).
It's not all pie in the sky, however: for an idea of why these are relevant, take a look at this blog post: Attention and Augmented Recurrent Neural Networks, by Chris Olah and Shan Carter at Google Brain. To follow the research in this field, you need to take up background reading in psychology (as notions such as transfer learning, direct analogies from human psychology, will be dropped into technical discussion).
In other news, the company I'm interning with changed its name to benevolent.ai this week... the new site is an order of magnitude more swish (and it was already pretty neat) - it really seems like a different venture all of a sudden. The former vice president of IBM Watson was announced to be joining the company on the same day: IBM's AI guru leaps over to Brit biz benevolent.ai. It's... a lot to take in.
Three talks I found by the man in question, Jérôme Pesenti, are all kinds of incredible:
Keynote discussion on IBM Watson and Big Data insight (2014)
"Cognitive computing | TEDxBermuda" (2014)
"New advances in cognitive computing" (2015), a talk at ENS Cachan (i.e. more intended for researchers than public relations, but not overly technical)
From Alison B. Lowndes's Bachelor's dissertation, Deep Learning with GPU Technology for Image & Feature Recognition:
Cognitive computing systems, such as IBM's Watson, allow reasoning via probabilistic natural language processing for interacting more naturally with computers than ever before. The pace of progress is so great within cognitive deep learning that NVIDIA is now shipping a plug-n-play CNN "appliance" for the academic and developer community integrating 4 GPU's and their DIGITS system, a powerful visualisation and configuration tool for neural networks, which I also demonstrate in this report.
benevolent.ai is situated in the "Knowledge Quarter", neighbours with Facebook, Google and research institutes from the Crick to the Alan Turing Institute. It's... a lot to take in.
Spot benevolent.ai [formerly known as Stratified Medical] right next to Euston station... via: knowledgequarter.london
The past few weeks have been fruitful in some ways but pretty incommunicable, even if I had found the time to write much here. I've begun some new literature curation projects to channel and structure my reading (and to avoid overloading naivelocus.com):
thermodynamics and dynamical systems at spin.systems (for now site is just a placeholder, readings are being synced to Twitter @systems_spin)
machine learning, computer vision and optics/microscopy at @naiveoculus (no website)
There's a sense of being on the edge of understanding, and of being "good enough" to proceed (though I'm not an artificial intelligence expert, I think that's understood): having a sufficient grasp lets you carry on downstream. Chris Meiklejohn, one of the technical workers I'm currently idolising [for want of a better word], wrote recently about how this process (not quite the overbaked complaint of "impostor syndrome") of riding out to overcome... incompleteness of your experience (or knowledge) is not so much fraudulent as an honorable task, and a necessary period for upstarts in interdisciplinary work.
Speaking of which, another paper recommendation: Greg Wilson, Jennifer Bryan, Karen Cranston et al. (2016) Good Enough Practices in Scientific Computing. arXiv: Software Engineering (cs.SE), 1609.00037
Coming back to the sense of ideas being rendered concrete, I feel a lot of what's happened in the past few months won't settle for a long time to come; however, there are bubbles in which it seems to be useful in my workaday research. The mathematical readings I'm undertaking now are more out of necessity, so as not to have certain trains of thought suffocate in my ignorance of lower level detail (I have seen computing researchers write - though not without rebuttal - that mathematics/physics ignorance is a barrier to where you can take your research in this domain). I wish I'd been given the support for these as an undergraduate, but the idea is barely on the table, and I don't really know who the relevant parties are to take the issue up with. MOOCs are great, but nothing can stand in for meaningful institutional support.
On that I'll say no more - oh and if you are interested, I found out too late that the Royal Statistical Society and SIAM (Society for Industrial and Applied Mathematics) both offer free membership for undergraduate students. Such is life.
A couple of sites at which I see abstract mathematical notions being relevant to biochemical research:
Louis Maddox (@biochemistries), 7 September 2016
See: tweet, on this paper proposing "noncommutative biology" (nicely summarised in the conversation shown above)
I've not written about permutation interconnect networks but they're very easy to find out more about in the network computing literature: see for example this paper: "Fast subword permutation instructions based on butterfly networks" which describes how interconnection networks are used to solve permutation problems down at the machine hardware level of a computer.
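To give a feel for what a butterfly interconnect does, here is a minimal Python sketch (not taken from the paper above). It assumes a power-of-two number of items and a made-up `controls` layout, one on/off flag per two-input switch per stage; real subword-permutation instructions encode their control bits differently. Each stage pairs up positions a fixed stride apart, and the stride halves from stage to stage.

```python
def butterfly_permute(items, controls):
    """Route items through a log2(n)-stage butterfly network.

    controls[stage][pair] is True when that switch swaps its two inputs.
    This control layout is hypothetical, chosen only for illustration.
    """
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(items)
    stride = n // 2
    stage = 0
    while stride >= 1:
        pair = 0
        for i in range(n):
            # Visit each switch once, from its lower input (stride bit clear).
            if (i & stride) == 0:
                if controls[stage][pair]:
                    j = i | stride
                    out[i], out[j] = out[j], out[i]
                pair += 1
        stride //= 2
        stage += 1
    return out


n = 8
all_swap = [[True] * (n // 2) for _ in range(3)]   # every switch crossed
no_swap = [[False] * (n // 2) for _ in range(3)]   # every switch straight

print(butterfly_permute(list("abcdefgh"), all_swap))  # the list reversed
print(butterfly_permute(list("abcdefgh"), no_swap))   # unchanged
```

With every switch crossed, an item at position i ends up at position i XOR (n - 1), which is why the whole list comes out reversed. A single butterfly cannot realise every permutation, but it shows how a handful of control bits select one.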
The modern scientific programming language Julia gets you down to this level of abstraction: the magic words "code_native" let you see the 'machine code' rather than the human-readable code written by the programmer, such as a simple for loop to repeatedly do something on a list of items (e.g. for a biology lab's set of genes, proteins, etc.).
On LLVM and Julia, see Working with LLVM
Julia: Reflection and introspection
Leah Hanson's blog: Julia introspects
All in all it doesn't seem too far-fetched to bring these two disparate fields of reading (optical/network computing and modelled biological regulation) together, and may help the latter go further. However, it can be difficult to explain or reason through in terms others share while the ideas are coming into focus.
Cats in trees, permutohedra, and other colourful associahedra
I'd never seen a geometrical representation of phylogenetics before (the "tree" diagrams used to investigate heritage, similar to pedigree charts but where branch length indicates some measure of evolutionary divergence from an ancestor).
Need to read a bunch of papers in more detail, but discovering the word "permutohedron" has made my day - as well as the fact it has an alternate spelling, permutahedron (both show up in the literature).
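For the curious, the object itself is easy to build. A minimal sketch, assuming the standard construction: the vertices of the order-n permutohedron are the permutations of (1, ..., n) read as points in n-dimensional space, and two vertices share an edge when one is obtained from the other by swapping the two coordinates holding consecutive values. The function name is my own.

```python
from itertools import permutations


def permutohedron(n):
    """Return (vertices, edges) of the order-n permutohedron.

    Edges are frozensets of two vertex indices.
    """
    vertices = list(permutations(range(1, n + 1)))
    index = {v: k for k, v in enumerate(vertices)}
    edges = set()
    for v in vertices:
        position = {value: p for p, value in enumerate(v)}
        for value in range(1, n):
            # Swap the coordinates holding `value` and `value + 1`.
            w = list(v)
            a, b = position[value], position[value + 1]
            w[a], w[b] = w[b], w[a]
            edges.add(frozenset((index[v], index[tuple(w)])))
    return vertices, edges


vertices, edges = permutohedron(4)
print(len(vertices), len(edges))  # 24 36
```

Order 3 gives a hexagon (6 vertices, 6 edges) and order 4 the truncated octahedron; in general there are n! vertices and (n - 1) * n! / 2 edges.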
As well as a mathematical concept, as in "permutations and combinations", a permutation in common usage is an anagram, or an alternative presentation of something. The dual spelling seems to be the sort of terrible deliberate meta-humour dearly beloved of mathematicians that works its way deep into tacit usage if you ask me... By way of example, statistical programmer extraordinaire Hadley Wickham released another R package recently, forcats.
It's an anagram of factors, and concerned with the statistical concept of 'factor'. The R source code documentation notes:
(the terms "category" and "enumerated type" are also used for factors)
Hence, to the statistically-minded, Hadley's squeezing both cats (a nod to Cat, the category of small categories, itself a 2-category in category theory) and permutation into a pun... or I may be reading too much into it ;-)
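For anyone who hasn't met factors, here is a toy Python sketch of the idea, not forcats or R itself. A factor stores each observation as an integer code pointing into an ordered list of levels, so reordering the levels changes the presentation without touching the data. The `Factor` class and its method names are invented for illustration; the frequency reordering is loosely in the spirit of forcats' fct_infreq().

```python
from collections import Counter


class Factor:
    """A toy categorical variable: integer codes pointing into ordered levels."""

    def __init__(self, values, levels=None):
        # Default level order is the sorted unique values (an assumption
        # mirroring base R's default for character input).
        self.levels = list(levels) if levels is not None else sorted(set(values))
        lookup = {level: code for code, level in enumerate(self.levels)}
        self.codes = [lookup[v] for v in values]

    def values(self):
        """Decode the integer codes back into their labels."""
        return [self.levels[c] for c in self.codes]

    def reorder_by_frequency(self):
        """Return a new Factor whose most common level comes first."""
        counts = Counter(self.values())
        new_levels = sorted(self.levels, key=lambda level: -counts[level])
        return Factor(self.values(), new_levels)


genes = Factor(["kr", "hb", "kr", "gt", "kr", "hb"])
print(genes.levels, genes.codes)      # ['gt', 'hb', 'kr'] [2, 1, 2, 0, 2, 1]

by_count = genes.reorder_by_frequency()
print(by_count.levels, by_count.codes)  # ['kr', 'hb', 'gt'] [0, 1, 0, 2, 0, 1]
```

The observations are identical in both objects; only the ordering of the levels (a permutation of them, fittingly) has changed.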
Some thought that sprang into my head when using forcats for the first time, about its key concepts of factors, levels, and orderings, led me to [as I understand it] the formal geometric development of factors, in a 1982 publication I couldn't find online but which is mentioned in "Factor Space, the Theoretical Base of Data Science" and other material on the subject.
Excerpt from The Basics of Factor Spaces, chapter 12 of Fuzzy Neural Intelligent Systems (2000):
The original definition of "factor spaces" was proposed by Peizhuang Wang [1]. He used factor spaces to explain the source of randomness and the essence of probability laws. In 1982, he gave an axiomatic definition of factor spaces [2]. Since then he has applied factor spaces to the study of artificial intelligence (Wang 1990, Kandel 1990). Several applications in the area of fuzzy information processing have been discussed (Li 1994). This chapter provides an introduction to the basic concepts and methods of applications of factor spaces.
To cut a large body of work short, Peizhuang Wang, who originally proposed 'factor space', is now writing about "Cognition Math Based on Factor Space". Update - I'm not sure if I want to suggest reading this paper after feedback, see note below for an alternative.
Big data is the era we are faced with, the character of big data is I & I, Internet and Intelligence. Internet is the wing and intelligence is the soul of information revolution [1,2]. Big data era is not beyond, but belonged in the historical stage of information revolution, and the core of big data is intelligence still. However, big data is a new stage at the internet time. When all computers communicate each other on line, what is the kind of computer like? The role of the CPU of the computer has been marginalized and the data processing software plays the main role of AI. The entity of AI machine, so called the fifth or post-fifth generation computer, will be replaced by the man-machine cognition network [3]; the mode of AI will be changed from the bottom-top manual work to the combination of top-bottom and bottom-top network; which makes the man-machine cognition network intelligently huff 'n' puff big data from internet. Different from human intelligence, the subject of AI is not brain, but machine. How do machine emulate intelligence of brain? Is it possible to construct a brainlike machine? No matter how advanced science becomes, it is not possible to make a machine as a clone of brain. It is mystery that the insuperable barrier does not take away all belief from AI researchers. Even though the ebb of fifth generation computer in 1990s hints that computer must emulate from the structure of human brain, people still have confidence on AI facing the difficulty of structure-emulation. Indeed, we would cognize that brain is the cognition's subject, but not the very cognition. Is there a cognition theory keeping a little independence from brain? It concerns with the relationship between cognition information and ontological information [3]. Even though brain has influence to ontology information, ontology information is independent from the subject of cognition essentially, and there exists inner cognition theory to guide artificial intelligence.
There were theories arising in artificial intelligences, unfortunately, they are not deep and united but shallow and split [3,4]. There have been no deep and united artificial intelligence theory yet. No a strong theory, no substantial practice! Therefore, we are going to build a strong theory of artificial intelligence [5,6].
Artificial intelligence canât be achieved without mathematics. Existent heterogeneity of intelligence theory is caused by the heterogeneity of mathematics. To build a united information theory, we must to build a united cognition math.
(None of the references were available through Google Scholar or online materials I could find, included below anyway)
1. Wang PZ (2014) Factor space, a mathematical preparing for the coming of big data tide (special talk), High-end Forum on Big Data. Chinese Academy of Sciences, Beijing
2. Huang CF (2015) A way to test if wisdom network can improve intelligence (Special talk). In: International conference of orient thinking and fuzzy logic: the 50th anniversary of fuzzy sets. Dalian, China
3. Zhong YX (2002) Information science principle. Beijing University in Posts and Telecommunication Press, Beijing
4. He HC, Ma YC (2008) Information, intelligence and logics. North-west University of Polytechnical Press, Xi'an
5. Zhong YX (2012) Higher principle of artificial intelligence, the idea, method, model and theory. Science Press, Beijing
6. He HC, Wang H, Liu YH, Wang YJ, Du YW (2001) Universal logics principle. Science Press, Beijing
Update: I mentioned Wang's recent paper on 'cognition math' to Hadley and he replied that "I would best summarise that paper as jolly obfuscatory"... which, reading between the lines, I guess means it's not just a language barrier but er, work not worth my time reading... Going to leave the above here anyway (references not being on Google Scholar is probably the red flag to note in future). Instead, see: Garrett Grolemund and Hadley Wickham (2014) A Cognitive Interpretation of Data Analysis. International Statistical Review 82(2)
Hadley took up a post as Adjunct Professor of Statistics at the University of Auckland late last year, but continues his role as chief data scientist with RStudio (developers of a major code editor used with R, widely used by life sciences researchers).
Flowers and Bouquets, above, comes via Devadoss et al. (2013) Polyhedral Covers of Tree Space.
Further reading:
Pattern-Avoiding Polytopes
A new paper posted to the arXiv, within which I discovered the permutohedron and dug out the rest of the papers in this list...
Associahedra via spines
Which nestohedra are removahedra?
See: nestohedron
A Terrible Expansion of the Determinant
Permutrees
Graph properties of graph associahedra
See: associahedron
Convex Polytopes from Nested Posets
See: partially ordered sets (posets)
Poset vectors and generalized permutohedra
Free algebraic structures on the permutohedra
Generalized Permutohedra from Probabilistic Graphical Models
Most of evolutionary theory has abstracted away from how information is coded in the genome and how this information is transformed into traits on which selection takes place. While in the earliest stages of biological evolution, in the RNA world, the mapping from the genotype into function was largely predefined by the physical-chemical properties of the evolving entities (RNA replicators, e.g. from sequence to folded structure and catalytic sites), in present-day organisms, the mapping itself is the result of evolution. ... several in silico evolutionary studies [have examined] the consequences of evolving the genetic coding, and the ways this information is transformed, while adapting to prevailing environments. Such multilevel evolution leads to long-term information integration. Through genome, network, and dynamical structuring, the occurrence and/or effect of random mutations becomes nonrandom, and facilitates rapid adaptation. This is what does happen in the in silico experiments. Is it also what did happen in biological evolution?
Excerpt from the introduction to a chapter in Evolutionary Systems Biology by Paulien Hogeweg, who coined the term 'bioinformatics' (to mean informatics of biotic systems).
She gave the keynote to the European Conference on Computational Biology today, introduced by her PhD student Jaap Heringa. The talk doesn't seem to be online yet, but the phrases in its title come from this chapter in Evolutionary Systems Biology, which was cited as 'in press' at the time of publication of her brief review of James A. Shapiro's Evolution: A View from the 21st Century.
... [the] profound... challenge is to unravel how evolution generated such richly structured, versatile evolving systems that biological experiments have shown biological systems to be. This question is conspicuously absent throughout the book, and he explicitly claims in the last chapter that it is (presently) outside the realm of science. In my opinion in silico evolution however can shed light on the evolution of evolution if it is allowed enough degrees of freedom. Indeed this is currently my main research focus, the main results relevant to the present discussion are reviewed in (Hogeweg 2012). Taking the classical Darwinian random mutation and selection as starting point, we have shown that through genome structuring through transposable elements, and through evolved genotype phenotype mapping, long term evolution leads to random mutations which are non-random in occurrence and/or effect and biased to advantageous mutations. We have also shown that the genome dynamics, gleaned from phylogenetic reconstruction, and experimental evolution, is mimicked in our models. This is just a beginning, but it shows we are still far from understanding what the basic paradigm of "random mutation selection" can do. Much remains to be discovered (yes in the twenty-first century!).
via Manuel Corpas, Twitter
What I'm reading this morning
Letsou & Cai (2016) Noncommutative Biology: Sequential Regulation of Complex Networks. PLOS Computational Biology
Venkataram et al. (2016) Development of a Comprehensive Genotype-to-Fitness Map of Adaptation-Driving Mutations in Yeast Cell (pdf, DOI link here)
Ross (2016) The Dark Matter of Biology. Biophysical Journal
Nissen et al. (2016) Publication bias and the canonization of false facts. arXiv [Physics and Society]
Jacobs et al. (2016) Structure-Based Prediction of Protein-Folding Transition Paths. Biophysical Journal
Blog post: Should we sample time series more frequently? [a report from the Royal Statistical Society's international conference taking place in Manchester today]
Rapid Brownian Motion Primes Ultrafast Reconstruction of Intrinsically Disordered Phe-Gly Repeats Inside the Nuclear Pore Complex in Scientific Reports, 2016
Just came across this recent work from the Mofrad lab (UCB and Lawrence Berkeley National Laboratory), and just blown away by it. The NPC contains nucleoporin proteins, which are "intrinsically disordered" (i.e. highly dynamic rather than rigidly structured, see blog tag / Wiki) and as such come up "fuzzy" to any imaging tool a lab cares to throw at it.
This dynamism is core [literally] to nuclear transport, letting hydrophobic FG [phenylalanine-glycine motif] repeats flail every which way in the confined space of the nuclear pore complex channel (often described as a "basket" composed of ~30 different nucleoporins), creating an "oily spaghetti" as it's described here, i.e. a hydrogel which instills a phase separated boundary across which ~1000 molecules pass per second (described in notes I wrote here last year).
See also: 2013 post on Peter Tompa's review of hydrogel formation by IDPs, "microtrabecular lattice" reincarnate as he would have it.
Here, the hydrogel is presented as self-oscillating - compared to the fluctuations predicted by Alan Turing in 1952 (as written about a couple of years back, here). Belousov-Zhabotinsky reactions were observed in the 1960s, as Turing had foreseen from thinking about developmental processes, instantiated in a formal mathematical framework as reaction-diffusion systems.
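As a reminder of what "reaction-diffusion system" means in practice: two concentrations u and v each spread out by diffusion while reacting with one another, i.e. du/dt = Du * laplacian(u) + f(u, v) and dv/dt = Dv * laplacian(v) + g(u, v). Below is a minimal sketch of one explicit time step on a 1D ring. To be clear about the assumptions: the reaction terms are the Gray-Scott model, swapped in as a well-known stand-in rather than Turing's own equations or BZ chemistry, and the parameter values are illustrative choices of mine.

```python
def laplacian_periodic(field):
    """Discrete 1D Laplacian with periodic boundaries and unit grid spacing."""
    n = len(field)
    return [field[(i - 1) % n] - 2.0 * field[i] + field[(i + 1) % n]
            for i in range(n)]


def gray_scott_step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.060, dt=1.0):
    """Advance both fields by one explicit Euler step (illustrative parameters)."""
    lap_u = laplacian_periodic(u)
    lap_v = laplacian_periodic(v)
    new_u, new_v = [], []
    for i in range(len(u)):
        reaction = u[i] * v[i] * v[i]  # u is consumed, v is produced
        new_u.append(u[i] + dt * (du * lap_u[i] - reaction + feed * (1.0 - u[i])))
        new_v.append(v[i] + dt * (dv * lap_v[i] + reaction - (feed + kill) * v[i]))
    return new_u, new_v


# Start from the uniform state (u = 1, v = 0) and seed a small disturbance.
n = 64
u = [1.0] * n
v = [0.0] * n
for i in range(n // 2 - 3, n // 2 + 3):
    u[i], v[i] = 0.5, 0.25

for _ in range(500):
    u, v = gray_scott_step(u, v)
```

The undisturbed state u = 1, v = 0 is a fixed point of the update, so anything that happens comes from the seeded patch interacting with diffusion. Note that the two species diffuse at different rates, which is the ingredient Turing's instability relies on.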
In this piece the authors write:
The concentration fluctuations within the FG-meshwork are reminiscent of the cyclic motions of self-oscillating gels induced by an oscillatory chemical reaction called the Belousov-Zhabotinsky (BZ) reaction (Maeda 2008). In those systems, periodic chemical energy of the BZ reaction is converted to mechanical oscillations within the polymeric meshwork. Indeed, fluctuations in self-oscillating hydrogels has been harnessed for mass transport and cargo delivery purposes (Murase 2008, Shinohara 2008). However, there is a fundamental difference between those systems and the NPC in that no chemical reaction occurs inside the FG-meshwork, nor is there any external source of energy to wriggle the FG-meshwork.
Indeed, the strange beauty of IDPs is that their effects are purely entropic - of the possession of a jam-packed microcanonical ensemble, which can be viewed in thermodynamic terms or through information theory (mentioned regarding Boltzmann recently, and in the post last week on critical states, I touched on how these threads were pulled together by Shannon).
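The link between the two views can be checked in a few lines. A minimal sketch, assuming W equally likely microstates (the microcanonical case): the Shannon entropy of that uniform distribution, measured in nats, equals Boltzmann's ln W, i.e. S / k_B. The function names and the example distributions are my own.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in nats: H = -sum(p * ln p), skipping empty states."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def boltzmann_entropy_over_k(microstate_count):
    """Boltzmann entropy divided by k_B: S / k_B = ln W."""
    return math.log(microstate_count)


W = 1024
disordered = [1.0 / W] * W                         # every conformation equally likely
ordered = [0.97] + [0.03 / (W - 1)] * (W - 1)      # one dominant conformation

print(shannon_entropy(disordered), boltzmann_entropy_over_k(W))
print(shannon_entropy(ordered))
```

The first line prints two values that agree to floating-point precision; the second is much smaller, which is the sense in which a chain locked into one dominant conformation has given up entropy relative to a disordered one.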
Back to Berkeley, the authors go on to present a totally new view of the situation:
Instead, we propose that the thermal noise, spreading through the geometrically confined NPC channel that hosts numerous transient, individually weak hydrophobic and electrostatic bonds, along with the delicate structures of the end-tethered FG-repeats, en masse produce incessant rapid Brownian motion, leading to continuous concentration fluctuations in the NPC channel.
I suppose thermal noise would have been taken as given in descriptions of motion at this scale by previous authors, but in this work it is explored quantitatively: the authors found that the "wriggling" induced by incessant fluctuations of Brownian motion, produced by the reverberation of thermal noise upon these radially confined and channel-tethered molecules, kept the NPC permanently sealed.
Butting in with a book recommendation to readers at this point: Howard C. Berg's Random Walks in Biology.
This begs the question... if a molecule breaks through, how long does it take to "reconstruct" the dense meshwork? Satisfyingly, they wrote that inter-FG motif hydrophobic bonds suction matters back together, "self-healing" (but later this was disproven by simulation, hydrophobic crosslinks were suggested instead). The fine details of simulation take the wind out of this really nice result so I won't go into their minutiae.
As well as the analogy to BZ reactions, the authors compare the system to the cell crawling mechanism (actin filaments driving lamellipodium protrusion, attachment, release/retraction and elastic-propelled procession along the surface).
We quantified the reconstruction pattern and time for the FG-meshwork in detail and proposed a time-dependent relation for this process which is biphasic with a rapid and a slow phase. The reconstruction occurs mainly during the first phase, which is mainly entropically driven; a cavity within a dense meshwork is entropically highly unfavorable, and thus, once the macromolecule passes through, the meshwork quickly rearranges itself. Once the configurational entropic cost is compensated, the density fluctuations within the cavity, ρ(t), continues more slowly with lots of "small-amplitude" peaks and valleys during the saturating phase. Imaginably, favorable inter- and intra-FG-meshwork hydrophobic as well as electrostatic interactions play a more visible role in this phase.
...time of reconstruction, τ, lies anywhere between 0.44 ± 0.12 μs and 7.91 ± 0.30 μs, depending on the macromolecule size and shape. Given the transport time of a single macromolecule is anywhere between 3 ms to several seconds, the reconstruction occurs three to seven orders of magnitude faster than the actual transport. This indicates that the reconstruction process is ultrafast and "instantaneous" compared to the timescale of the entire transport process.
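The "three to seven orders of magnitude" figure is quick to sanity-check from the numbers quoted. A rough sketch: the reconstruction and transport times come from the excerpt, while the 5 s upper transport time is my own stand-in for "several seconds".

```python
import math


def orders_of_magnitude(slow_seconds, fast_seconds):
    """How many powers of ten separate two timescales."""
    return math.log10(slow_seconds / fast_seconds)


tau_fastest = 0.44e-6       # s, quickest reconstruction quoted
tau_slowest = 7.91e-6       # s, slowest reconstruction quoted
transport_quick = 3e-3      # s, quickest transport quoted
transport_slow = 5.0        # s, ASSUMED value for "several seconds"

narrowest_gap = orders_of_magnitude(transport_quick, tau_slowest)
widest_gap = orders_of_magnitude(transport_slow, tau_fastest)
print(round(narrowest_gap, 1), round(widest_gap, 1))  # 2.6 7.1
```

So the gap runs from a little under three orders of magnitude (slowest reconstruction against quickest transport) to about seven, which matches the range the authors give once rounded.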
Note to readers who don't come across biophysics much: in statistical mechanics, angle brackets ⟨around something⟩ often indicate "over an ensemble", as in "averaged" (they can also be Dirac notation)
More precisely, τ is even shorter than the time of local diffusional motion of the macromolecule. Here, by local we mean the time it takes for a particle to diffuse a distance equal to its characteristic length, L. For example, for a globular cargo of 20 nm in diameter it takes about ⟨t⟩ = ⟨L²⟩/6D ≈ 15.0 μs to diffuse its diameter in cytoplasmic viscosity [47]. Since the cargo diffusion inside the NPC channel is slower than that within the cytoplasm [45], 15.0 μs is the lower bound on the local diffusional time of a 20-nm globular cargo inside the NPC. Yet, τ for the cavity created by the same cargo is 3.89 ± 0.14 μs (±SEM) (Table 1), meaning that the reconstruction process is at least fourfold faster than the local diffusional motion. This implies that the cargo does not leave any void behind itself, suggesting the NPC channel is perpetually sealed so that the traffic of macromolecules does not lead to breaking the permeability barrier or "leaking"
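The diffusion estimate in the quote follows the standard three-dimensional relation t = L^2 / (6D). A back-of-envelope sketch of what the quoted 15.0 μs implies: the diffusion coefficient is backed out from the authors' own numbers, and the viscosity then follows from the Stokes-Einstein relation. The temperature of 310 K is my assumption, not a value from the paper.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def diffusion_time(length_m, diffusion_coeff):
    """Time to diffuse a distance L in three dimensions: t = L^2 / (6 D)."""
    return length_m ** 2 / (6.0 * diffusion_coeff)


def stokes_einstein_viscosity(radius_m, diffusion_coeff, temperature_k=310.0):
    """Viscosity implied by D = k_B T / (6 pi eta r); 310 K is ASSUMED."""
    return BOLTZMANN * temperature_k / (6.0 * math.pi * diffusion_coeff * radius_m)


diameter = 20e-9            # m, cargo size from the quote
quoted_time = 15.0e-6       # s, local diffusion time from the quote
reconstruction = 3.89e-6    # s, tau for the same cargo from the quote

implied_d = diameter ** 2 / (6.0 * quoted_time)          # about 4.4e-12 m^2/s
implied_viscosity = stokes_einstein_viscosity(diameter / 2, implied_d)
speed_ratio = quoted_time / reconstruction

print(implied_d, implied_viscosity, speed_ratio)
```

The implied viscosity comes out at roughly 5 mPa·s, a few times that of water, and the ratio of the two times is a little under four. Because 15.0 μs is a lower bound on diffusion inside the channel, "at least fourfold" is a fair reading.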
There are more conclusions which I won't go into: within the radial zone of FG motifs' wanderings there's a "rod-like zone" of particular rigidity, and various fascinating curiosities noted through experiment, such as how hydrophobic binding spots may scale with size of the macromolecular surface.
One thing's for sure: physicists still want to drill down to the finer details, as ever.
Nonetheless, from the viewpoint of polymer physics criteria, whether the FG-meshwork is truly a "gel", or an entangled meshwork of polymers, or a confined polymer brush, or something else, awaits further in-depth mechanical investigation.
In his book on IDP structure and function (to my knowledge the only such text on the topic), Peter Tompa stuck to 'entropic brush' - I think there's acknowledgement that these are only convenient placeholders until later work follows up proceedings.
Ending with a nod to the systems biological analogies they left as they went, of oscillatory processes and self-regulation, the authors conclude:
More importantly, the current study also suggests that the aggregation of FG-repeats within the NPC channel can be viewed as a novel biopolymeric stimulus-responsive network that immediately changes its conformation in a stimulus-dependent manner. In a shape- and size-dependent way, FG-meshwork "rapidly opens up" in response to hydrophobic affinity difference, while the intrinsic ultrafast Brownian motion of FG-repeat biopolymers "quickly closes" the meshwork as soon as such a stimulus disappears. These call for new investigations on the NPC from novel biopolymer physics viewpoints, combined with shape effects
What's that phrase, everything that connects computes...?
Further readings
Some of my favourite IDP papers of late, starting with the father of the field:
Vladimir Uversky (2016) Dancing Protein Clouds: The Strange Biology and Chaotic Physics of Intrinsically Disordered Proteins. The Journal of Biological Chemistry
Vladimir Uversky (2016) Protein intrinsic disorder-based liquid-liquid phase transitions in biological systems: Complex coacervates and membrane-less organelles. Advances in Colloid and Interface Science
Bergeron-Sandoval (2016) Mechanisms and Consequences of Macromolecular Phase Separation. Cell
Wu and Fuxreiter (2016) The Structure and Dynamics of Higher-Order Assemblies: Amyloids, Signalosomes, and Granules. Cell
Schmidt and Görlich (2016) Transport Selectivity of Nuclear Pores, Phase Separation, and Membraneless Organelles. Trends in Biochemical Sciences
Nott et al. (2015) Phase Transition of a Disordered Nuage Protein Generates Environmentally Responsive Membraneless Organelles. Molecular Cell
Patel et al. (2015) A Liquid-to-Solid Phase Transition of the ALS Protein FUS Accelerated by Disease Mutation. Cell
Csizmok et al. (2016) Dynamic Protein Interaction Networks and New Structural Paradigms in Signaling. Chemical Reviews
Pak et al. (2016) Sequence Determinants of Intracellular Phase Separation by Complex Coacervation of a Disordered Protein. Molecular Cell
...You get the picture - there really is a minor surge of papers on this topic, it's wonderful to watch a new aspect of biological regulation appear before our eyes. The word 'coacervate' just came back to me, it's oddly rarely used on my network but I've been seeing it in these papers regularly.
coacervate: a colloid-rich viscous liquid phase which may separate from a colloidal solution on addition of a third component.
Inevitable Wikipedia definition:
Coacervation is a unique type of electrostatically-driven liquid-liquid phase separation, resulting from association of oppositely charged macro-ions. The term "coacervate" is sometimes used to refer to spherical aggregates of colloidal droplets held together by hydrophobic forces.[1] Coacervate droplets can measure from 1 to 100 micrometres across, while their soluble precursors, are typically on the order of less than 200 nm. The name "coacervate" derives from the Latin coacervare, meaning "to assemble together or cluster".
The process of coacervation was famously proposed by Alexander Oparin and J. B. S. Haldane as crucial in his early theory of abiogenesis (origin of life/ prozikhozhdenic zhiney). This theory proposes that metabolism predated information replication, although the discussion as to whether metabolism or molecules capable of Template replication came first in the origins of life remains open and for decades the theory of Oparin and Haldane was the leading approach to the origin of life question.
My former research supervisor M. Madan Babu was off on a mountain at some unspecified conference the other week, a photo and audio recording of which popped onto my Twitter feed (Cell Conversations: Illuminating the Dark Proteome). In the conversation recorded, Madan asked "How many phase separated structures can actually coexist, and how do they interact with each other? What kind of biology do they drive?"
...so subsequently that's what I've been ticking over on too. The list above came from looking through the citations to the particularly wonderful Nott 2015 study ("disordered nuage" protein coacervates, tying together P-granules, Cajal bodies and other incunabula of some new textbook understanding... there really is a whole hidden level down there...) - as well as my Google Scholar alerts Twitter bot on the topic, which leaves a tweet trail of mine and others' readings and occasional comments (it can be hard to remember from a title alone if you've read a paper sometimes).
Do check them out:
Google Scholar: citations to Nott et al. (2015) since 2016
Twitter: @IDP_papers
...it's with regret I note that I did set up a nice blog to send these to but forgot to link it up to the feed :-( idp-papers.tumblr.com - watch this space perhaps...
Post-script
My other notes from the Cell Conversations session (via Twitter)
New challenges: to decode combinatorial SLiM code & low complexity sequence patterns conferring ability to phase separate
"Barely scratched surface of signalling puncta", e.g. neural Shh Src sequestrat'n in cancer
Madan Babu: "How many phase separated structures can actually coexist, and how do they interact with each other? What kind of biology do they drive?"
Advice to next generation: "key would be to be bold, and apply a multidisciplinary approach… try to link IDP function directly to biology"
In 1995, while he was a graduate student at McGill University in Montreal, the biomedical scientist Peter Friedl saw something so startling it kept him awake for several nights. Coordinated groups of cancer cells he was growing in his adviser's lab started moving through a network of fibers meant to mimic the spaces between cells in the human body.
For more than a century, scientists had known that individual cancer cells can metastasize, leaving a tumor and migrating through the bloodstream and lymph system to distant parts of the body. But no one had seen what Friedl had caught in his microscope: a phalanx of cancer cells moving as one. It was so new and strange that at first he had trouble getting it published. "It was rejected because the relevance [to metastasis] wasn't clear," he said. Friedl and his co-authors eventually published a short paper in the journal Cancer Research.
Two decades later, biologists have become increasingly convinced that mobile clusters of tumor cells, though rarer than individual circulating cells, are seeding many - perhaps most - of the deadly metastatic invasions that cause 90 percent of all cancer deaths. But it wasn't until 2013 that Friedl, now at Radboud University in the Netherlands, really felt that he understood what he and his colleagues were seeing. Things finally fell into place for him when he read a paper by Jeffrey Fredberg, a professor of bioengineering and physiology at Harvard University, which proposed that cells could be "jammed" - packed together so tightly that they become a unit, like coffee beans stuck in a hopper.
Nice piece I came across in Quanta magazine last week, after my own post on 'jamming' - biochemistri.es/jammed
I'll write again about this topic soon as it's quite a lovely mix of biophysical, biochemical and translational research, along with a long and elegantly communicated history to pick through.
Code-switching
I'm going to return to using this blog more often, but it'll be more free form than long dives into papers and subject areas (I was working on a system earlier this semester to facilitate that process, queck, but it's still bouncing around the drawing board).
One of the notions in the aforementioned is to provide little collections of links (codified as 'eggs', long story), so in that spirit... some papers, blogs, and links I fancy throwing in after the above:
Note for new followers:
I also have a stream of papers, articles, blogs etc., naïve locus:
sync'd to Twitter here
...and posts to a blog you can follow through Tumblr at naivelocus.com, or through its RSS feed here.
❧
"Nice set of intro articles on key concepts in statistical analysis of biological data. Useful teaching resource." in Nature, via Anshul Kundaje, Stanford, who works on machine learning - a key application of statistics in biology at the moment.
Two recent papers from the lab of Jacky Goetz on tumour biomechanics:
Seeing is believing: multi-scale spatio-temporal imaging towards in vivo cell biology. Journal of Cell Science, August 2016
Intravital Correlative Microscopy: Imaging Life at the Nanoscale. Trends in Cell Biology, August 2016
Network biology concepts in complex disease comorbidities fresh off the press at Nature Reviews Genetics
Scaling in complex systems: A link between the dynamics of networks and growing interfaces. Scientific Reports, December 2014
Feynman's original treatise on nanotechnology: 'There's Plenty of Room at the Bottom' (web page, PDF)
Many of these ideas are now possible (so says Subutai Ahmad, at machine learning company Numenta)
Plenty of Room Revisited (Nature Nanotechnology, 2008)
Fedde Benedictus, What is a dimension?
while you're on his blog, How natural is the natural logarithm?
... and Probability 0 is not impossibility
Bonus, more on the physics side of matters:
Space Emerging from Quantum Mechanics by Sean Carroll, theoretical physicist at Caltech
I'm not usually into that sort of thing, but this podcast interview with him apparently namechecks: Thomas Bayes, Rod Thorn, Albert Camus, Fitz Dixon, Christopher Nolan, Brian Greene, Orlando Woolridge...
I came across the quantum-emergent space blog post via the tech side of my online feeds (I've begun separating the biochemist in me off from that, but hopefully there's no need to do that here...)
Postscript to my last post, from John von Neumann and the Foundations of Quantum Physics
I noticed Ludwig Wittgenstein popping up as I read about Boltzmann earlier, it seems it was due to this 'picture theory' of his. I was shocked to find that in fact Wittgenstein had wanted to study under Boltzmann (via)
The early Wittgenstein can help us to understand modern physics. This may be unexpected, although we know that Tractatus was inspired by Wittgenstein's study of the philosopher-physicists Heinrich Hertz and Ludwig Boltzmann. Wittgenstein often referred to Hertz and planned to study under Boltzmann, but was prevented from doing so by Boltzmann's sudden suicide.
Who knew...
In the passage above (pictured): the "irritating image" of "someone drowning while everything passes before his eyes", introduced by a lucid imagination from the phrase "Revue passieren zu lassen" - to reminisce, literally 'to let pass a revue' [in the theatrical sense] - though my German isn't up to much...
Today marks the 110th anniversary of Ludwig Boltzmann's death, a suicide at Duino, near Trieste, Italy, while on holiday from his post in Vienna, where he was born.
This remembrance came to me today via Seamus Blackley, Xbox co-creator and particle physicist whose Twitter avatar has been Boltzmann for years.
R.I.P. Ludwig Boltzmann, suicide today 110 years ago amidst brutal unrelenting harassment for his belief in "atoms."
I thought I'd never heard of this before, but it turns out I had, yet the source I read it on first time round was quite sketchy [both senses of the word]. What I'd not read of was the timing of this event in the grand scheme of 20th century natural science. The excerpts above highlight some of this significance.
In brief, Boltzmann was succeeded in his chair of physics at the university by former student Fritz Hasenöhrl, in turn doctoral advisor to Erwin Schrödinger. Schrödinger writes of the dead kingsman as his world, my first love, horrendous hollow.
Pictured in 2002: Boltzmann's grave, marked with his formula for entropy
Photograph by Thomas Schneider, 2002, schneider.ncifcrf.gov/boltzmann.html. Schneider's site is worth investigating on the nature of entropy in biology, or else through his Google Scholar profile.
Working at the National Institutes of Health, Thomas specialises in biological information theory - on which he gave a brief review in 2010.
The idea that we could build molecular communications systems can be advanced by investigating how actual molecules from living organisms function. Information theory provides tools for such an investigation. This review describes how we can compute the average information in the DNA binding sites of any genetic control protein and how this can be extended to analyze its individual sites. A formula equivalent to Claude Shannon's channel capacity can be applied to molecular systems and used to compute the efficiency of protein binding. This efficiency is often 70% and a brief explanation for that is given. The results imply that biological systems have evolved to function at channel capacity, which means that we should be able to build molecular communications that are just as robust as our macroscopic ones.
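To make "average information in the DNA binding sites" a little more concrete, here is a minimal sketch of the per-position calculation: the maximum uncertainty for DNA (2 bits) minus the Shannon uncertainty of the bases observed at each aligned position. The function name `site_information` and the toy sequences are my own inventions for illustration, and I've left out the small-sample correction that a careful analysis would apply, so treat it as a cartoon of the idea rather than Schneider's actual method.

```python
import math
from collections import Counter

def site_information(sites, alphabet="ACGT"):
    """Per-position information (bits) for aligned, equal-length sites.

    Hypothetical helper: information = log2(|alphabet|) - H(position),
    with no small-sample correction applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    h_max = math.log2(len(alphabet))  # 2 bits for a four-letter alphabet
    info = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        total = sum(counts.values())
        # Shannon uncertainty of the bases seen at this position
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(h_max - h)
    return info

# Made-up aligned "binding sites", purely for illustration
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
per_position = site_information(toy_sites)
total_bits = sum(per_position)  # summed over positions (assumes independence)
```

Fully conserved positions score the full 2 bits, while positions that tolerate variation score less; summing over positions gives a rough total for the site.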
Boltzmann's letter to Franz Brentano was an excerpt of Ludwig Boltzmann His Later Life and Philosophy, 1900-1906: Book One, the history of Viennese physics in the early 20C via The Golden Age of Theoretical Physics (Google brings up this freely available copy). The subsequent passage discusses how the young Erwin took up study of Hamiltonian mechanics. These subjects are seen in computational/quantum chemistry more than strictly biological sciences, though hybrid MC, a.k.a. Hamiltonian Monte Carlo, is an example of application in statistical physics which enters biology through bioinformatics.
Schrödinger's What Is Life was highly influential in biology - a former university lecturer of mine, Matthew Cobb, wrote about this in The Guardian in 2013 (and is prone to a casual reference to Riemannian manifolds from what I remember).
While this may appear a little obscure to bioscientists, associates of this Viennese school abound in biophysics. Peter Debye, for example, lends his name to the Debye length used to quantify DNA rigidity scales, and was the casual muse behind the Schrödinger equation.
Following up on de Broglie's ideas, physicist Peter Debye made an offhand comment that if particles behaved as waves, they should satisfy some sort of wave equation. Inspired by Debye's remark, Schrödinger decided to find a proper 3-dimensional wave equation for the electron. He was guided by William R. Hamilton's analogy between mechanics and optics, encoded in the observation that the zero-wavelength limit of optics resembles a mechanical system: the trajectories of light rays become sharp tracks that obey Fermat's principle, an analog of the principle of least action.
See also:
Selected meanderings, mainly picked out of/pointing to Wikipedia, for the motivated student
Ludwig Boltzmann: The Man who Trusted Atoms
The general struggle for existence of animate beings is not a struggle for raw materials - these, for organisms, are air, water and soil, all abundantly available - nor for energy which exists in plenty in any body in the form of heat, but a struggle for [negative] entropy, which becomes available through the transition of energy from the hot sun to the cold earth.
"Entropy and life" - quite a nice Wiki entry which cites Boltzmann in its origins
Wahrscheinlichkeit, the German word for probability (in the context of Boltzmann's entropy formula)
Boltzmann brains: "a hypothesized self-aware entity which arises due to random fluctuations out of a state of chaos. The idea is named for the physicist Ludwig Boltzmann (1844-1906), who advanced an idea that the Universe is observed to be in a highly improbable non-equilibrium state because only when such states randomly occur can brains exist to be aware of the Universe. The term for this idea was then coined in 2004 by Andreas Albrecht and Lorenzo Sorbo." -- "Can the universe afford inflation?" Physical Review D, 70(6), 063528. (arXiv)
Boltzmann, a crater on the southern limb of the moon named for Ludwig: "This formation has become eroded by many tiny impacts, leaving the features rounded and worn. Little of the original rim still stands above the surrounding terrain, leaving only a depression in the surface."
Ergodicity, a term derived from the Greek έργον (ergon, work) and οδός (odos, path or way) - chosen by Boltzmann while working on a problem in statistical mechanics; giving rise to ergodic theory (a branch of mathematics studying invariance in dynamical systems).
Boltzmann's H-theorem, the tendency of a nearly-ideal gas to decrease in the quantity H: subsequent discussion leads to Loschmidt's paradox [concerning reversibility], Gibbs's formulation of the H-theorem, and Liouville's theorem, etc. etc....
relatedly, the assumption of molecular chaos in isolated systems, which has been hotly debated.
"Correct Boltzmann counting", associated with statistical ensembles as discussed on this blog in relation to protein conformational [i.e. folding] states for example; detailed in:
JR Ray (1984) (Eur J Phys)
Wikipedia: Maxwell-Boltzmann statistics
Boltzmann's principle of detailed balance, that "at equilibrium, each elementary process should be equilibrated by its reverse process" (which finds application in the statistics of Markov processes).
The Club of Vienna, headed up by zoologist Rupert Riedl whose philosophical writings savoured Boltzmann's attacks on Kant's "static" evolutionary categories of the mind (long story)...
The Wicht Club, formed by G. W. Pierce who studied as a postdoc in Boltzmann's Leipzig lab before moving to Harvard to teach.
Gustav Herglotz, a student of Boltzmann's who worked on relativity and seismology, but made contributions in number theory, differential geometry, complex analysis &c.
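As an aside on the detailed balance item in the list above: the condition is easy to check numerically for a small Markov chain. The sketch below builds a Metropolis-style transition matrix over three made-up energy levels (the energies, `kT` and all function names are assumptions of mine, not from any of the sources linked) and verifies that the probability flux between every pair of states cancels under the Boltzmann distribution.

```python
import math

def boltzmann_weights(energies, kT=1.0):
    """Normalised Boltzmann probabilities exp(-E/kT) / Z."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def metropolis_matrix(energies, kT=1.0):
    """Transition matrix: uniform proposal to another state, Metropolis acceptance."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                accept = min(1.0, math.exp(-(energies[j] - energies[i]) / kT))
                P[i][j] = accept / (n - 1)
        P[i][i] = 1.0 - sum(P[i])  # rejected moves stay put
    return P

def satisfies_detailed_balance(pi, P, tol=1e-12):
    """True if pi[i] * P[i][j] == pi[j] * P[j][i] for every pair of states."""
    n = len(pi)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))

energies = [0.0, 0.5, 2.0]  # arbitrary illustrative energy levels
pi = boltzmann_weights(energies)
P = metropolis_matrix(energies)
balanced = satisfies_detailed_balance(pi, P)
```

A chain that cycles deterministically through its states (0 to 1 to 2 and back to 0) keeps a uniform stationary distribution but fails this check, which is a handy way to remember that detailed balance is stronger than mere stationarity.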
Diverse forms of matter around us can simply be classified as those that flow and those that don't and exhibit rigidity, a resistance to deform when a force is applied. This elementary distinction informs our ability to comprehend their properties and to manipulate them. It is easy enough to pick up a spoon, a solid that pushes back against our fingers, and pour a spoonful of sugar into and stir a cup of coffee, a liquid that will flow in response to the force we exert. A cupful of coffee, sugar poured into a cup, and water flowing down an incline, are examples of flowing matter, whereas a spoon or a lump of rock or window glass are solids, characterized by their rigidity. Of course, the same substance can exist in either a flowing or rigid state. Water that runs off a roof with ease on a warm day will turn rigid when the temperature drops and will hang down the edges as icicles, resisting gravity. Somewhat less obviously, one may think of the sugar that pours off the spoon easily as being in a rigid state inside the jar, as may be noticed if one attempts to press it down with the back of a spoon. These transformations are of obvious importance in understanding the properties of these substances. Of these, the transformation of water into ice is a well understood, so-called first-order phase transition, involving discontinuous changes in properties, such as density. The rigidity of ice is understood in terms of the periodic arrangement of molecules, which reduces (or breaks) the continuous translational symmetry of the liquid's microscopic structure, in which molecules have no such regularity. But the emergence of rigidity in substances that remain disordered when they transform from flowing to rigid states (such as sugar compacted into a jar) remains mysterious, and many fundamental aspects of the transition to the rigid state are poorly understood.
Srikanth Sastry, Critically Jammed - a commentary in this week's PNAS on a paper by Carl Goodrich and colleagues.
We have an intuition for the condensed matter physics of "critical points", as Sastry's recollection of the emergence of rigidity within a cup of coffee highlights so enjoyably.
As a brief aside, I came across the work of Douglas Hofstadter yesterday, who wrote of finding maths all too abstract as a university student. All that changed however, as told in "Analogies and Metaphors to Explain Gödel's Theorem".
I had always thought that I was a pretty abstract thinker, but what I began to realize about that time in my life was that, in fact, all of my thoughts are very concrete. They all are based on images, analogies and metaphors. I really think only in concrete ideas, and I found that I couldn't attach any concrete ideas to some of the mathematics I was learning.
In this issue of PNAS, Goodrich et al. propose a "Widom-like scaling ansatz". Ansatz is German for "initial placement of a tool at a work piece", and in practice used to mean:
the establishment of the starting equation(s), the theorem(s), or the value(s) describing a mathematical or physical problem or solution. It can take into consideration boundary conditions.
Widom-like is a reference to Widom scaling, after Benjamin Widom, a physical chemist awarded the Boltzmann Medal in 1998 "for his illuminating studies of the statistical mechanics of fluids and fluid mixtures and their interfacial properties, especially his clear and general formulation of scaling hypotheses for the equation of state and surface tensions of fluids near critical points".
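For orientation, the textbook form of Widom's scaling hypothesis for an ordinary critical point can be written as below. This is the standard magnetic/fluid version, not the jamming ansatz that Goodrich et al. propose; the symbols (reduced temperature t, ordering field h, scaling function Φ, gap exponent Δ) follow the usual conventions rather than anything in the PNAS paper.

```latex
% Widom scaling hypothesis, standard form near a critical point
% t = (T - T_c)/T_c is the reduced temperature, h the ordering field
\[
  f_s(t, h) \;=\; |t|^{\,2-\alpha}\, \Phi_{\pm}\!\left( \frac{h}{|t|^{\Delta}} \right),
  \qquad \Delta = \beta\delta
\]
% Differentiating with respect to h at h = 0 gives the order parameter
% and the susceptibility:
\[
  m \sim |t|^{\,2-\alpha-\Delta} = |t|^{\beta},
  \qquad
  \chi \sim |t|^{\,2-\alpha-2\Delta} = |t|^{-\gamma}
\]
% and hence the exponent relations
\[
  \alpha + 2\beta + \gamma = 2, \qquad \gamma = \beta(\delta - 1)
\]
```

The point of the ansatz is that the singular part of the free energy depends on its two variables only through one combination, so a handful of exponents ends up tied together rather than free.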
For anyone wanting to study this in greater detail, it's covered in chapter 3 of Phase Transitions and Collective Phenomena by Ben Simons of University of Cambridge Theory of Condensed Matter dept. (1997), PDFs at the link.
See also: The Jamming Transition and the Marginally Jammed Solid by Liu and Nagel (2010) Annual Review of Condensed Matter Physics, again free as PDF at the link.
Yeats bemoaned that "Things fall apart; the centre cannot hold; Mere anarchy is loosed upon the world." Equally calamitous is that things get stuck and often seemingly at the worst possible moment: they lose the ability to flow so that no further rearrangement is possible. This can occur as particles get wedged tightly together in a pipe on a factory floor or as molasses refuses to pour from a container as the temperature drops in winter. Of course, falling apart and getting stuck are just approaching from opposite directions this catastrophic transition in the dynamics known as the jamming transition. The metaphor of jamming is powerful: it extends across many physical phenomena to social behavior and politics and even to the internal state of mind of a poet. Our goal here is to review some progress that has been made in seeing if the concept of a jamming transition has a more general applicability (in the purely physical realm!) than had previously been realized and whether it can unite our understanding of different ways in which a flowing material or liquid can gain rigidity.
I don't have time to read into this much more deeply to do this topic justice, but reading Sastry's oh-so-casual lead-in to condensed matter physics, I felt the same excitement that I had back in undergrad at first reading of protein intrinsic disorder; that took me diving down into that rich interdisciplinary history (check my tagged posts and/or Wikipedia if you're unfamiliar).
I hope reading the accounts here might lead other young life scientists to explore the biochem.-contiguous fields of biophysics, from thermodynamics to entropy and its encoding in information theory by Claude Shannon, at which point definitions become indistinguishably mathematicotheoretical.
Claude Shannon, A Mathematical Theory of Communication (The Bell System Technical Journal, 1948) - PDF
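The step from thermodynamic entropy to Shannon's can be written out in a few lines. These are the standard textbook definitions (Boltzmann's formula as it appears on his grave, Gibbs's generalisation, and Shannon's measure in bits), with W the number of microstates and p_i the probability of state i:

```latex
% Boltzmann (equiprobable microstates), Gibbs, and Shannon entropies
\[
  S = k_B \ln W,
  \qquad
  S = -k_B \sum_i p_i \ln p_i,
  \qquad
  H = -\sum_i p_i \log_2 p_i
\]
% For a uniform distribution p_i = 1/W the Gibbs form reduces to Boltzmann's,
% and Shannon's H is simply the number of bits needed to label a microstate:
\[
  p_i = \frac{1}{W}
  \;\Longrightarrow\;
  S = k_B \ln W,
  \quad
  H = \log_2 W,
  \quad
  S = (k_B \ln 2)\, H
\]
```

So the two quantities differ only by a constant factor and a choice of logarithm base, which is why the definitions start to blur together at this point.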
The proposed scaling ansatz, as I understand it, reframes what was an energetic subject in a "scale-invariant" form, a concept well known through "dimensionless numbers", e.g. atomic weight or pH (see more).
Not to retread that well-discussed IDP path for the sake of it, I'll just mention that I was utterly amazed this week to stumble on an explanation of neural networks in terms of protein folding models ("spin funnels", in turn related to the broader biophysical topic of spin glasses).
"An Intro. to the Theory of Spin Glasses & Neural Networks", Viktor Dotsenko (1994) - PDF
Perhaps it's a bit too obscure for those unfamiliar with the current flurry of activity in machine learning research (which is really quite unavoidable in computational science circles), but I find the analogy a fantastic example of how information theoretic notions are whatever we wish to formulate them to be; how they may shift target from Facebook social network to protein atom long-range contacts. Models are just tools we use to approach problems, language with which to broach the mathematics underlying such processes.
Check out this 2013 post for a video and some links about the folding funnel model.
The lecture series on maths of deep learning can be found at joanbruna.github.io/stat212b.
I came across a talk the other week on quantum computing (D-Wave Systems founder Eric Ladizinsky), one application of which it turns out has been to solving protein folding optimisation problems (not to be confused with the protein folding problem itself).
From what I recall, there are only two (or three?) such machines made, one of which is with Google(?), another with NASA. From what I could find, the only published work in bioinformatics was a good few years ago to great media acclaim, from Alan Aspuru-Guzik (I asked on Twitter whether anyone knew of groups currently using the system and apparently not). From the prominence of the example in Ladizinsky's talk, I'd been convinced the work was ongoing, perhaps covertly - but no such excitement (Aspuru-Guzik continues to work in quantum chemistry, but it's hard to tell if there'll be new work with D-Wave on the horizon).
After reading the deep learning-as-spin funnel analogy, it got me wondering - would there be (or else was there potential for) a happy convergence of protein folding and artificial intelligence research? If both groups were able to devise their tasks with the same conceptualisation, at some level anyway... By partial explanation of the interest in this analogy: I'm starting an internship using deep learning systems for {bio/chem}informatics in a couple of weeks (with a company called Stratified Medical down in London) so no doubt I'll have time to develop some more substantial thoughts on the conceptual crossover.
Goodrich & co. write that their paper is a hopeful first step to reformulating the jamming transition in terms of renormalisation groups - and at this point I really am not qualified to explain, so do read the Goodrich paper instead!
See also:
Recent thesis, covers the "mapping between deep neural networks and renormalisation groups" nice and concisely: Fabian Holling, University of Cologne Institute of Theoretical Physics (August 2015) Renormalizing spin systems using deep learning techniques
Mehta & Schwab (2014) An exact mapping between the Variational Renormalization Group and Deep Learning
Image editing has become very important to me for processing, effectively a subconscious algorithm (Andrea Bertozzi's lecture at the Turing Institute this summer quite frankly blew my mind, and all its airy preconceptions of 'optics').
Looking into cancer is traumatising after it takes someone you know (agonisingly slowly). Particularly, when that someone had a clarity of vision you could only dream of. That you can never see a fraction of their power, barely perceive let alone understand.
Pictured is the camera I picked out today: no lens yet.
I have the cognitive terrain mapped out for a verbal/non-verbal reasoning suite of image processing functions which I aim to execute within the year, to assist researchers with what they call a 'working memory'. Life's spectrum is spectacular, and my thinking is we can only hope to conceive it fully before the tape is full / light blinks out, all comes to a head.
I'm reading various subjects in parallel at this point (and managing to arrange them successfully, at last) - ML/AI, statistics and static analysis, all of which infiltrate the fields of machine/computer vision. I'm flirting with the idea of a 'mathematical systems' PhD, but it seems like I'm stepping into industry for a hot minute first.
Deep thanks to those who continue to guide and sustain me through their brilliance - you know who you are.
Louis.
P.S. Those interested are invited to peruse my current research directions at `permut.co` - all comments & discussion appreciated. An example from this post is 'memory/network latency isomorphism', to use the language of #InformationTheory (#bioinformatics â #informatics). You can contact me through the regular channels as well as Twitter: đ¤ @biochemistries and @permutans
Tags for chance visibility: #machine vision #computer vision #image processing #NLP #natural language processing #machine learning #deep-learning #oncology #biotechnology
RIP -- Sir David MacKay, 22.04.1967 - 14.04.16. "Everything Is Connected". Gone far, far too soon
I touched on Erwin Chargaff's 1975 essay The Fever of Reason previously as chance may have it almost exactly a year ago, from its inclusion in his collected works 'Heraclitean Fire: Sketches from a Life Before Nature'.
He really is a wonderful writer (though doesn't fancy himself as 'an example for younger scientists to follow').
There exist mysterious links between language and the human brain ; and the heartless and brutal way in which language is used in our times, as if it were only a power tool in public relations, a shortcut from sly producer to gullible consumer, has always seemed to me the most threatening portent of incipient bestialization. It is frightening to observe that a progressive aphasia, not organically determined, appears to overtake large numbers of people, especially in this country, who seem to be unable to express themselves except by hoarse barks and (undeleted) expletives. The gift of tongues, not explainable on the basis of natural selection, is the true attribute of Menschwerdung (hominization); and it is only fitting that it is revoked shortly before the tails are beginning to grow.
As well as scientific ethics, this essay discusses being an amateur, being an outsider, disdain for complacency, and the 'extreme pedigree consciousness' of the sciences. He admits to 'avid extracurricular reading', dropping references from Kierkegaard to Nietzsche and Nabokov.
Only science has become complacent in our times; it slumbers beatifically in euphoric orthodoxy, disregarding with disdain the few timid voices of apprehension. These may, however, be the forerunners of horrible storms to come.
On America:
out of that threatening continent, somber and dehumanized, there seemed to arise a wind of the freedom of the absurd.
I'm rereading it on the occasion of Obama's visit to Hiroshima, as he calls for a "world without nuclear weapons" (full transcript of his speech posted here).
Seventy-one years ago, on a bright cloudless morning, death fell from the sky and the world was changed. A flash of light and a wall of fire destroyed a city and demonstrated that mankind possessed the means to destroy itself.
Why do we come to this place, to Hiroshima? We come to ponder a terrible force unleashed in a not-so-distant past. We come to mourn the dead, including over 100,000 Japanese men, women and children, thousands of Koreans, a dozen Americans held prisoner.
Their souls speak to us. They ask us to look inward, to take stock of who we are and what we might become.
...in the image of a mushroom cloud that rose into these skies, we are most starkly reminded of humanity's core contradiction. How the very spark that marks us as a species, our thoughts, our imagination, our language, our toolmaking, our ability to set ourselves apart from nature and bend it to our will - those very things also give us the capacity for unmatched destruction.
How often does material advancement or social innovation blind us to this truth? How easily we learn to justify violence in the name of some higher cause.
I'm not sure what the general public thinks of when they think of biology's place at the 'nexus' Chargaff writes of, that murderous potency of the life sciences. I'd say it's a dynamic that plays out at the level of biology's place in society - a much more casual one than nuclear physics or astrophysics for example (the world of explosions, bombs, and space flights).
It can be the way biochemical technology or biological information is framed, used, legalised, privatised, banned, or makes it to market, and the social role proposed for cures, poisons, tests and knowledge of one's molecular identity.
Birth control for example, a liberatory biochemical technology for women as regards their reproductive rights, is in modern times showing some of this more concerning nature, in the dubious way liberal media suggests it can 'solve poverty' (the Washington Post have been advocating this line from 2013 to 2015). A piece in the New Republic late last year called it a 'liberal obsession'.
Multimillionaire Ezra Klein (founder of media company Vox) thinks birth control is poverty's "magic solution" (linking to this piece).
Similar uncomfortable solutionism exists in the debate around an up-and-coming biochemical technology targeting gay men, 'PrEP' (pre-exposure prophylaxis) - in fact coverage tends to explicitly make the link to birth control. Through refusing to prescribe this freely on the National Health Service, the UK government recently stood accused of being complicit in the seroconversion of LGBTQ people (among whom the non-white and lower-class are at greater risk, making it not only a homophobic but a racist and classist hate crime in the view of campaigners).
Yet opponents say such 'safeguarding' prescriptions would lead to a rise in unsafe sex, aggravating the crisis of antibiotic resistance, antiretroviral-resistant HIV, and other unforeseen side effects from an altered culture (no microbial pun intended). Coverage of an HIV prevention study (Lo et al., 2016) in the Guardian today addresses the importance of the broader context for biomedical interventions in this domain:
Lo's paper "reminds us that it's vital not to look for quick fixes in HIV prevention", she said. On the contrary, fighting HIV transmission requires long-term interventions that take into account the sexual norms and practices, and the social and religious context of people who are vulnerable to HIV, "not just the mechanics of how one becomes infected", Hand said.
None of biological science's grim scenarios are "the end", none nearly as catastrophic as nuclear armageddon - though to read antibiotic resistance headlines it feels we push closer to it each year. Last Christmas, Nature warned of the creeping threat of "bacterial apocalypse", which unlike nuclear proliferation shows little sign of abating.
Some further reading
(but do go read Chargaff's essay!)
In the news: The UK Medical Research Council announced last week some £10 million to tackle antibiotic resistance, at the universities of Bristol, Leeds and Sheffield. Read about the projects here
Vogwill, Kojadinovic, and MacLean (2016) Epistasis between antibiotic resistance mutations and genetic background shape the fitness effect of resistance across species of Pseudomonas - finds "genetic background is a key determinant of the fitness costs of antibiotic resistance"
Allen and Waclaw, preprint submitted 19 May 2016: Antibiotic resistance: a physicist's view
This cracking video fresh from presentation at the London Calling Oxford Nanopore conference by Zamin Iqbal, showing real-time genome analysis of samples to detect resistance fast enough to tailor treatment regimes to patients (48 hours rather than partial info over weeks).
A pleasant little note in the journal IEEE Transactions on Information Theory by Claude Shannon (1956) - The Bandwagon. He offers words of caution to the field which exhibit parallels to that around machine learning today.
Seldom do more than a few of nature's secrets give way at one time.
It's been a dark couple of days since the passing of University of Cambridge scholar, Regius Professor of Engineering and former Chief Scientific Advisor of the Department of Energy and Climate Change Sir David MacKay, FRS. David succumbed to stomach cancer on Thursday the 14th of April this week, aged 48, in Addenbrooke's Hospital (Cambridge, England). The details from bemused diagnosis to final writings were communicated [through his blog](http://itila.blogspot.co.uk/), whose URL is an acronym of _Information Theory, Inference, and Learning Algorithms_ - the title of [his first book](http://www.inference.phy.cam.ac.uk/mackay/itila/). Its headline - _Everything Is Connected_ - was both a reference to this central theme, and a permit to intermingle the rest of his life within its posts.
> One of the themes of that book was that everything is connected. Not only information theory and machine learning, which are two sides of a single coin; but also communication, data compression, evolution, sex, satellites, discdrives, solitons, thermodynamics - pretty much any topics you care to mention. ITILA is a textbook about information theory whose goals include bringing out these connections, and making information theory and statistics fun.
A graph of the 'dependencies' of the material in Information Theory, Inference, and Learning Algorithms.
I was honoured to meet him for lunch once late last year, though reading that he would schedule it around his chemotherapy appointments left a lump in the throat. His passing yards from where I once slept and worked, and the tributes from those I look up to in the field bring a certain resonance to this headline.
His book on climate change (Sustainable Energy - Without the Hot Air) was constructed under the careful visual traditions of Edward Tufte (another statistician and computer scientist who became dedicated to the betterment of the public sphere), made freely available, and led the discourse on effective action against climate change. Bill Gates called it "one of the best books on energy that's been written. If someone is going to read just one book I would recommend this one". He was a model public scientist, and his separately compartmentalised technical writings on statistics and information are simply elegant.
He lectured me on information theory in a dream last night, and when I opened my eyes the very first thing I saw on my phone was a tweet from one of the figures I look up to repeating the news of his passing. I feel like I'll be carrying his teachings (along with an insufficient sense of their implications) through life, but it's just too soon to read many more of them at present.
Professor Sir David J. C. MacKay FRS, 22/4/1967 - 14/4/2016
I've mentioned him here twice before, both written before having met him. Once after watching his excellent introduction to Gaussian Processes, and once within a discussion of models of parsimony in systems biology (related to the book, ITILA).
On building a Gaussian distribution
On parsimony in systems biology
My first contact with his work came via a talk on enzyme thermodynamics he gave with his elder brother, Robert S MacKay, entitled "How does work work".
His PhD thesis, submitted at Caltech in 1991, was on Bayesian Methods for Adaptive Models, available online.
In 1996, he wrote a letter to Nature, on Bayesian statistics (which I've discussed here in recent weeks): "The pope is (probably) not an alien".
I'm not certain of the full technical implications of the word, but I've a feeling the headline of his blog is also related to 'connectionism'. Around the time of the NIPS conference discussed here, a mailing list sharing a manuscript of his opened "Dear Connectionists".
Obituaries and press:
Telegraph, Professor Sir David MacKay, physicist
The Register, Brit AI daddy Sir David MacKay dies: Polymath rebooted debate on climate change, co-founded software biz
Climate Home, David MacKay: "If everyone does a little, we'll achieve only a little"
Athene Donald's touching tribute, RIP Sir David Mackay
Mark Lynas's, What David MacKay taught me, and taught us all
I mentioned a bunch of statistics resources in my post on Bayesian statistics the other day, but perhaps so many as to be overwhelming.

For linear algebra specifically to be used for statistics (i.e. if your interest in more advanced mathematics is for scientific application in experimental design and interpretation), [James Gentle](http://mason.gmu.edu/~jgentle/)'s _Numerical Linear Algebra for Applications in Statistics_ (outline [here](http://mason.gmu.edu/~jgentle/books/nlabk.htm)) is a good introductory text (it doesn't go through proofs of each equation).

[David Cox](http://dacox.people.amherst.edu/iva.html), [John Little](http://math.holycross.edu/~little/homepage.html), and [Don O'Shea](http://www.ncf.edu/about-the-president/) have an approachable '_Introduction to computational algebraic geometry and commutative algebra_' (not as bad as it sounds!) which begins to build on such linear algebra, for 'algebraic geometry': [_Ideals, Varieties, and Algorithms_](http://link.springer.com/book/10.1007%2F978-3-319-16721-3) (4th ed. at the link; 3rd ed. full text [here](http://www.dm.unipi.it/~caboara/Misc/Cox,%20Little,%20O'Shea%20-%20Ideals,%20varieties%20and%20algorithms.pdf)). I took some time to work through the first chapter in detail motivated by the material in Pachter and Sturmfels's [_Algebraic Statistics for Computational Biology_](http://yaroslavvb.com/papers/pachter-algebraic.pdf), and found it quite striking that:

> The ability to regard a [polynomial](https://en.wikipedia.org/wiki/Polynomial) as a function is what makes it possible to link algebra and geometry.

If you've ever drawn `y = x` as a straight diagonal line, or any more complicated function, then you [implicitly] understand this, but the implications are far-reaching (and not restricted to the 2D ['affine'](https://en.wikipedia.org/wiki/Affine_plane) plane).

* y = f(x) is an [affine variety](https://en.wikipedia.org/wiki/Affine_variety) written as V(y − f(x))

Books like Cox's demystify the language around algebraic geometry (such as '[field](https://en.wikipedia.org/wiki/Field_(mathematics))', which can mean something like "all the [Real numbers](https://en.wikipedia.org/wiki/Real_number)") while Gentle's applies the concepts to the real world (that's lower case r!) where fields are often hardware-defined - computers use [floating points](https://en.wikipedia.org/wiki/Floating_point) to represent decimal numbers for example.
Not to be confused with neuroscience/epigenetics PhD-holding electronic music act Floating Points
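To make the 'hardware-defined field' point concrete, here's a minimal sketch (my own illustration, not something from Gentle's book) of how floating-point arithmetic departs from the arithmetic of the real numbers, and how exact rational arithmetic from Python's standard library avoids the problem at a cost in speed.

```python
import math
from fractions import Fraction

# 0.1 and 0.2 have no exact binary representation, so their
# floating-point sum is not exactly the double nearest to 0.3.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False with IEEE 754 doubles
close_enough = math.isclose(total, 0.3)   # compare with a tolerance instead

# Exact rational arithmetic behaves like the mathematical field Q.
exact_total = Fraction(1, 10) + Fraction(2, 10)
exact_equal = (exact_total == Fraction(3, 10))

print(exactly_equal, close_enough, exact_equal)  # False True True
```

This is why numerical code compares floating-point results against a tolerance rather than testing for equality.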
Essentially, biology should be taught with [more] statistics, and statistics nowadays is much enabled by/benefits from open-source/extensible computational systems like R in environments like RStudio, so an introductory course in statistical computing becomes an unwritten part of the biology curriculum. None of this is all that obvious to the average science student, so I hope I can help make things clearer through the occasional mealy-mouthed blog post like this...
See also
Recent post -- Conditional probability and Bayes rule
Edit – The blogger under the alias 'luysii' has funnily enough just written a post on this book a few weeks after mine, here: High level mathematicians look like normal people. Ideals, Varieties and Algorithms is apparently known as CLO after the initials of its authors, and was chosen for the Leroy P. Steele Prize for Mathematical Exposition, "because it is a rare book that does it all" (press release here)!

> It is accessible to undergraduates. It has been a source of inspiration for thousands of students of all levels and backgrounds. Moreover, its presentation of the theory of Groebner bases has done more than any other book to popularize this topic, to show the powerful interaction of theory and computation in algebraic geometry, and to illustrate the utility of this theory as a tool in other sciences.
When evaluating a therapeutic strategy based on CRISPR/Cas9, it is critical to understand that not only will HIV-1 be eliminated from latently infected cells, but the majority of uninfected cells will become resistant to HIV infection. Thus, there is a high likelihood that rebounding viral infections will be contained by the resistant cells. Still, some formidable challenges remain before this type of strategy can be implemented. First, it will be important to maximize elimination of viral sequences from patients. This will require analysis of the HIV-1 quasi-species harbored by patients' CD4+ T-cells and design of suitable, i.e. personalized CRISPRs. Second, improved delivery of CRISPR/Cas9 will be required to target the majority of circulating T-cells. In summary, our novel ex vivo findings that our lentiviral delivery-based approach reduced HIV-1 DNA copy numbers and protein levels in PBMCs of HIV-1 infected patients provides strong proof-of-concept evidence that CRISPR/Cas9 can be effectively utilized as part of HIV Cure strategies.
Kaminski et al., Elimination of HIV-1 Genomes from Human T-lymphoid Cells by CRISPR/Cas9 Gene Editing
A fascinating paper, using CRISPR rather than other gene editing techniques such as TALEN (as done recently in Karpinski et al.). CRISPR's guide RNA is used to target an HIV promoter sequence that all viral isolates share, and was found to remove HIV from multiple chromosomes. Impressively "both the integrated as well as pre-integrated, free-floating intracellular HIV-1 DNA are edited by Cas9/gRNA", as well as rendering cells resistant as mentioned above.
Further work seeks to compare the technique on 'naïve' and antiretroviral-treated tissue (controls vs. patients undergoing treatment) to
determine whether or not, in the context of ART, the virus enters into the latent stage and remains responsive to CRISPR/Cas9. Of note, results from these ex vivo studies using ART treated patient PBMCs and CD4+ T-cells show that CRISPR/Cas9 effectively suppresses viral replication by introducing InDel mutations.
Back in 2013, Ebina et al. debuted the idea in the same journal, at a time when the processes involved were less well understood (such as prediction of off-target effects).
Perhaps the most important finding in this study is that we could excise provirus from the host genome of HIV-1 infected cells, which may provide a ray of hope to eradicate HIV-1 from infected individuals. However, there are numerous hurdles that must be cleared before utilizing genome editing for HIV-1 eradication therapies such as gene therapy. First, the efficiency of genome-editing and/or proviral excision should be quantified in HIV infected primary cells, including latently infected CD4+ quiescent T cells. Second, an efficient delivery system must be developed. Fortunately, the CRISPR/Cas9 system has the advantage in size compared with TALENs. Thus, the CRISPR system has the potential to be delivered by lentivirus vectors, whereas TALENs do not because of their large size and repeat sequences. The final hurdle concerns possible off-target effects, which are pertinent concerns for all genome-editing strategies that may lead to nonspecific gene modification events. If Cas9 has off-target effects, then removal of the off-target activity may be the best approach before utilizing CRISPR/Cas system for anti-HIV treatment.
Kaminski's group sure enough did use lentiviral vectors to deliver the CRISPR/Cas system, using one lentivirus to boot out another - how strange.
My last post briefly touched on Needleman-Wunsch and Smith-Waterman dynamic programming techniques, for what I thought was a good, accessible explanation of how the algorithms work under the hood (for comparison to BLAST, which built off their foundation).
This plot, from a new paper from Mark Gerstein's group at Yale, really shows how far the field has moved on from these early works, however: Muir et al. (2016) The real cost of sequencing: scaling computation to keep pace with data generation.
Alignment tools have co-evolved with sequencing technology to meet the demands placed on sequence data processing. The decrease in their running time approximately follows Moore's Law (Fig. 3a). This improved performance is driven by a series of discrete algorithmic advances. In the early Sanger sequencing era, the Smith-Waterman and Needleman-Wunsch algorithms used dynamic programming to find a local or global optimal alignment. But the quadratic complexity of these approaches makes it impossible to map sequences to a large genome. Following this limitation, many algorithms with optimized data structures were developed, employing either hash-tables (for example, Fasta, BLAST (Basic Local Alignment Search Tool), BLAT (BLAST-like Alignment Tool), MAQ, and Novoalign) or suffix arrays with the Burrows-Wheeler transform (for example, STAR (Spliced Transcripts Alignment to a Reference), BWA (Burrows-Wheeler Aligner) and Bowtie).
In addition to these optimized data structures, algorithms adopted different search methods to increase efficiency. Unlike Smith-Waterman and Needleman-Wunsch, which compare and align two sequences directly, many tools (such as FASTA, BLAST, BLAT, MAQ, and STAR) adopt a two-step seed-and-extend strategy. Although this strategy cannot be guaranteed to find the optimal alignment, it significantly increases speeds by not comparing sequences base by base. BWA and Bowtie further optimize by only searching for exact matches to a seed. The inexact match and extension approach can be converted into an exact match method by enumerating all combinations of mismatches and gaps.
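To see where the 'quadratic complexity' in the passage above comes from, here is a minimal sketch of Needleman-Wunsch global alignment scoring. The scoring values (match +1, mismatch −1, gap −1) are my own illustrative assumptions, not those of any particular tool, and the function returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Every one of the len(a) * len(b) cells is filled in: the quadratic cost.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("ACGT", "AGT"))         # 2
```

The nested loop visits every cell of the table, so aligning a short read against a whole genome this way quickly becomes impractical. That is the motivation for the seed-and-extend heuristics described in the quoted text.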
Their paper is open access and covers plenty of ground - from new approaches like Salmon ('lightweight' alignment, using estimated position of reads on the genome) and Kallisto ('pseudoalignment', determining theoretical compatibility of reads rather than aligning) -- as described by Rob Patro over on his blog last year -- to discussion of budgets, Big Data, and observations on Moore's and (a new one to me) Kryder's laws.
It echoes Ewan Birney's call for life science infrastructure during his visit to the U.S. National Human Genome Research Institute, where he compared the state of biology and its information services against the monument to research architecture supplying CERN. Muir and colleagues suggest a 'biomedical cloud' on which datasets, analysis programs, and so on would be stored, rather than the commercial (and regularly failing) options that have become the norm.
Academic/scientific-specialised systems exist but, as has been pointed out, a meltdown just last night in a similar package manager, npm, highlights the liabilities such systems open up in the long term.
Npm blog, official response: kik, left-pad, and npm
Ars Technica: Rage-quit: Coder unpublished 17 lines of JavaScript and "broke the Internet"
The Verge: How an irate developer briefly broke JavaScript
# Conditional probability and Bayes' rule

Above are my notes on one particularly fundamental stats topic which (amazingly) was never formally taught on my undergraduate or Masters-level biostatistics courses: [conditional probability](https://en.wikipedia.org/wiki/Conditional_probability) and [Bayes' rule](https://en.wikipedia.org/wiki/Bayes%27_rule).

Despite the name, it's often argued that Bayes' theorem should be credited to [Pierre-Simon Laplace](https://en.wikipedia.org/wiki/Pierre-Simon_Laplace). In closing his _Philosophical Essay on Probabilities_, Laplace would write breathlessly:

> One sees in this essay that the theory of probabilities is basically only __common sense reduced to a calculus__. It makes one estimate accurately what right-minded people feel by a sort of instinct, often without being able to give a reason for it. It leaves nothing arbitrary in the choice of opinions and of making up one's mind, every time one is able, by this means, to determine the most advantageous choice. Thereby, it becomes the most happy supplement to ignorance and to the weakness of the human mind.

> If one considers the analytical methods to which this theory has given rise, the truth of the principles that serve as the groundwork, the subtle and delicate logic needed to use them in the solution of the problems, the public-benefit businesses that depend on it, and the extension that it has received and may still receive from its application to the most important questions of natural philosophy and the moral sciences; if one observes also that even in matters which cannot be handled by the calculus, it gives the best rough estimates to guide us in our judgements, and that it teaches us to guard ourselves from the illusions which often mislead us, one will see that __there is no science at all more worthy of our consideration, and that it would be a most useful part of the system of public education__.
A grasp of Bayes' theorem [has been described as](http://www.stat.cmu.edu/~kass/papers/about-bayes-rule.pdf) _a deep aesthetic experience and a pragmatic recognition of profound consequences_:

> Mathematical scientists often sense a combination of harmony and power in certain formulas... Bayes' Theorem gives such a formula. It says there is a simple, elegant way to combine current information with prior experience in order to state how much is known. It implies that sufficiently good data will bring previously disparate observers to agreement. It makes full use of available information, and it produces decisions having the least possible error rate.

> Bayes' Theorem is awe-inspiring, but when people are captivated by its spell they tend to proselytize, and become blinded to its fundamental vulnerability: __although most great equations of science are descriptive, the Bayesian use of Bayes' Theorem is different, it is prescriptive – suggesting how scientific inference should be done – and it requires strong assumptions; its magical powers depend on the validity of its probabilistic inputs__.

Hopefully I've annotated the above notes clearly enough to communicate the power of the concept, and the discussion that follows will give due diligence to its 'vulnerability'.

Regarding notation, you will also find the '[intersection](https://en.wikipedia.org/wiki/Intersection_(set_theory))' symbol (∩) replaced with a comma elsewhere, as in: P(A|B) = P(A, B) / P(B), notably in the [chain rule](https://en.wikipedia.org/wiki/Chain_rule_(probability)).
Both notations indicate a joint distribution (explained here).
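For reference, here is the standard derivation written out, assuming only that P(A) and P(B) are non-zero. It's the textbook route from the definition of conditional probability to Bayes' rule, and not a transcription of my notes.

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A, B)}{P(B)}
$$

$$
P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
$$

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$

The middle line is the chain rule for two events. Equating its two right-hand sides and dividing by P(B) gives Bayes' rule on the last line.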
These notes were made from a lecture on the Statistical Inference course with Biostatistics professor Brian Caffo, who calls the median as it's usually conceived "an estimator" for a "sample quantity" (i.e. for a given set of numbers), as distinct from the "population median" for "the target of estimation, the estimand". The course focusses on this connection between 'populations' (in an abstract sense) and sampling upon them through probability models.
YouTube has the course videos here, Brian's talks being a brief taste of another Coursera series, Mathematical Biostatistics Boot Camp 1.
If videos aren't your thing (or you just want further introduction), there's a nice beginner's example from Gelman et al.'s textbook Bayesian Data Analysis here, for a spell checker, using probabilities allegedly supplied by Google researchers. Gelman writes that he likes this particular example as:
> The models aren't just specified as a mathematical exercise, they represent some statement about reality. And the problem is close enough to our experience that we can consider ways in which the model can be criticized and improved, all in a simple example that has only three possibilities... it demonstrates how Bayesian data analysis works in the context of probabilities (both the "likelihood" and the "prior") that were constructed from data. I like having an intro example that goes beyond simple data models such as the binomial and simple prior information such as "p has to be somewhere between 0 and 1".
Comments regarding this example note that in a sense, it's not actually Bayesian:
> These applications of Bayes' rule to assess and update relative frequencies of events are also available to a "non-Bayesian". So there seems no disagreement there. As Wasserman emphasizes, there's a difference between using Bayes' rule and being a Bayesian about statistical inference.
> ...In fact, most inference for these kinds of problems is not Bayesian, at least in the sense of treating parameters probabilistically and using posterior uncertainty in inference. They estimate with posterior modes, or what a "non-Bayesian" might call penalized maximum likelihood. But that's mainly for computational reasons, not matters of principle.
> I also find it deeply ironic that "naive Bayes" is almost never done with Bayesian inference, leading to all sorts of confusion in the machine learning literature and very confusing names like "Bayesian spam filtering".
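A spell checker of this kind boils down to Bayes' rule over a handful of candidate words, which can be sketched in a few lines. The numbers below are hypothetical values I've made up for illustration (they are not the figures from the textbook), and the function name `posterior` is my own.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule over a finite set of hypotheses.

    posterior(h) = prior(h) * likelihood(h) / sum over all hypotheses.
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(data), the normalising constant
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical inputs: someone typed "radom"; which word did they intend?
prior = {"random": 0.90, "radon": 0.09, "radom": 0.01}          # relative word frequency
likelihood = {"random": 0.002, "radon": 0.0001, "radom": 0.95}  # P(typed "radom" | intended word)

post = posterior(prior, likelihood)
best_guess = max(post, key=post.get)
print(best_guess, round(post[best_guess], 3))
```

As the comments quoted above stress, nothing here requires being 'a Bayesian': both inputs are relative frequencies, and the rule simply combines them.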
I wanted to read a review of methods for computational biology the other day – "Features of ChIP-seq data peak calling algorithms with good operating characteristics" – and looking up one of the parameters for comparison (the F score) led me to some loose ends in my understanding of statistics.
Biologists' mathematical education has traditionally been neglected among the sciences at universities, though there do exist paths for the maths-curious explorer.
A good map of the territory can be found in Mark Gerstein's suggested graduate and undergraduate-level syllabus for computational biology.
All of Statistics: A Concise Course in Statistical Inference by Canadian statistician Larry Wasserman is a solid guide, "suitable for graduate students in computer science... and useful for students beginning graduate work in statistics" and "for people who want to learn probability and statistics quickly". Many results are stated without proof for brevity, and R code is provided to run analyses. I think this is a good fit for some types of biology student too.
> Students who analyze data, or who aspire to develop new methods for analyzing data, should be well grounded in basic probability and mathematical statistics. Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid.
A less mathematically formal, and more biology-oriented option [used for all graduate life scientists at my university] is Statistics: An Introduction Using R by Michael Crawley, professor of plant ecology at Imperial College London.
In the following post I'll discuss the state of Bayesian and frequentist stats in biological research, and round off with an explanation of F-scores.
Genomics is a fundamental technology in the life sciences, and the plain fact of the matter is it requires statistical training to design experiments for (same goes for all the other 'omics). A particularly nice review on the occasion of Sir David R. Cox's 80th birthday, A Cox Model for Biostatistics of the Future notes:
> The last decade has seen a strong shift towards the use of Bayesian inference in many branches of statistics including biostatistics. Whether this is because statisticians have been convinced by the philosophical arguments, or because the computing revolution and the development of Monte Carlo methods of inference has apparently reversed the historical computational advantage which classical methods held over their Bayesian rivals, is less clear. For a thought-provoking discussion of likelihood-based inference, including a critique of Bayesian inference, see Royall, 1997, Statistical Evidence: A Likelihood Paradigm.
A fortnight ago I wrote of how a new breed of DNA sequencer uses neural nets to identify its A's, C's, G's and T's, but natural language processing and machine learning are being used throughout biological research, often leaning on Bayesian foundations.
Roger Grosse (University of Toronto) gives a background on Metacademy:
> Bayesian statistics is a branch of statistics where quantities of interest (such as parameters of a statistical model) are treated as random variables, and one draws conclusions by analyzing the posterior distribution over these quantities given the observed data. While the core ideas are decades or even centuries old, Bayesian ideas have had a big impact in machine learning in the past 20 years or so because of the flexibility they provide in building structured models of real world phenomena. Algorithmic advances and increasing computational resources have made it possible to fit rich, highly structured models which were previously considered intractable.
One of the papers I really enjoyed last year, on a machine learning web server for identification of molecular recognition features (MoRFs) in a given protein sequence, has returned with a fresh Bayesian look: the Gsponer lab now use a "hierarchical application of Bayesâ rule" to score candidate MoRFs, as explained in this excellent video, from the code's author Nawar.
Three distinct property scores are now computed for the amino acid sequence, and used as 'component predictors' (for terminology see this 'meta-predictor' paper), measuring sequence similarity, stretches of intrinsic disorder, and residue conservation: combined through a hierarchical Bayesian statistical model written in C++ (source available here).
Two SVM [machine learning classifier] kernels are used, one sigmoid and one Gaussian (some explanation here), and the highest quality dataset was kept for final verification. Despite a slow multiple sequence alignment step for the third propensity score component, their web server is still faster than all existing services by their benchmarks, in addition to giving higher quality MoRF predictions. Really nice work.
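For anyone unfamiliar with the two kernel names, here is a minimal sketch of their standard textbook definitions. The parameter names `gamma` and `coef0` and their default values are my assumptions, borrowed from common SVM library conventions, and not the settings used in the paper.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float],
                    gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x: Sequence[float], y: Sequence[float],
                   gamma: float = 1.0, coef0: float = 0.0) -> float:
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    dot_product = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot_product + coef0)


# Identical points are maximally similar under the Gaussian kernel.
print(gaussian_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
print(sigmoid_kernel([0.0, 0.0], [1.0, 1.0]))   # 0.0
```

A kernel is a similarity function between two feature vectors: the Gaussian one decays with distance, while the sigmoid one squashes the dot product into the range −1 to 1.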
Demis Hassabis, CEO and co-founder of Google DeepMind, writing about the AlphaGo tournament this week, hinted the biomedical space was one of their next targets [in disease diagnosis]. Toward this end, they launched DeepMind Health in February, describing a collaboration within the UK's National Health Service at a Royal Society of Medicine-hosted lecture (video here). The Streams app for practitioners was said "to integrate both detection and task management into a single platform", using machine learning and AI. Applications to similar logistics in life sciences research may not be far off.
"Today's posterior is tomorrow's prior"
Alex Etz has a newbie-accessible series of writings, called Understanding Bayes, which follows on from his manuscript guide: How To Become A Bayesian. While these are useful in some ways, if you're clueless about cognitive modelling it can be a hit-and-miss experience looking through these resources. As a biosciences student, it definitely seems like I'm crashing a 'behavioral and social science' studies party.
Some of the core readings Alex points newcomers towards:
On Lindley (1993) The analysis of experimental data: The appreciation of tea and wine:
> A key takeaway from this paper is that Lindley's approach depends only on the observed data, so the results are interpretable regardless of whether the sampling plan was rigid or flexible or even known at all. Another key point is that the Bayesian approach is inherently comparative: Hypotheses are tested against one another and never in isolation. Lindley further concludes that, since the posterior probability that the null is true will often be higher than the p-value, the latter metric will discount null hypotheses more easily in general.
Rouder et al. (2009) Bayesian t-tests for accepting and rejecting the null hypothesis:
> present two related critiques of classic null-hypothesis significance tests: (1) They do not allow researchers to state evidence for the null hypothesis, and, perhaps more importantly, (2) they overstate the evidence against the null hypothesis...
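The slogan heading this section, that today's posterior is tomorrow's prior, can be made concrete with a small sketch of my own. I'm assuming a Beta prior on a success probability with binomial data (a conjugate pair, so updating is just addition), and the counts are made up for illustration.

```python
def update_beta(alpha: float, beta: float,
                successes: int, failures: int) -> tuple:
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


flat_prior = (1.0, 1.0)  # Beta(1, 1): uniform over the success probability

# Day 1: update the flat prior with the first batch of data.
day_one = update_beta(*flat_prior, successes=7, failures=3)
# Day 2: yesterday's posterior serves as today's prior.
day_two = update_beta(*day_one, successes=4, failures=6)

# Analysing all the data in a single step gives the same posterior.
all_at_once = update_beta(*flat_prior, successes=11, failures=9)

posterior_mean = day_two[0] / (day_two[0] + day_two[1])
print(day_two, all_at_once, round(posterior_mean, 3))  # (12.0, 10.0) (12.0, 10.0) 0.545
```

The order and batching of the observations make no difference to the final posterior, which fits Lindley's point above that the result depends only on the observed data.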
I'll be highlighting various resources within the discussion here so do take some time to check out the references along the way, as well as the Further reading section at the end (and as always, comments are open if you have any to share).
Not just a passing Bayes
John Horgan is a science writer whose 1993 Scientific American article "The Death of Proof" (better read as a PDF) called out "heresies" in science, whereby "some mathematicians are challenging the notion that formal proofs should be the supreme standard of truth".
> Another catalyst of change is the computer, which is compelling mathematicians to reconsider the very nature of proof and, hence, of truth. In recent years, some proofs have required enormous calculations by computers. No mere human can verify these so-called computer proofs, just other computers. Recently investigators have proposed a computational proof that offers only the probability – not the certainty – of truth, a statement that some mathematicians consider an oxymoron.
> Although no one advocates doing away with proofs altogether, some practitioners think the validity of certain propositions may be better established by comparing them with experiments run on computers or with real-world phenomena. "Within the next 50 years I think the importance of proof in mathematics will diminish," says Keith Devlin of Colby College, who writes a column on computers for Notices of the American Mathematical Society. "You will see many more people doing mathematics without necessarily doing proofs."
> ... Some workers are complaining bitterly about the computerization of their field and the growing emphasis on (oh, dirty word) "applications." One of the most vocal champions of tradition is Steven G. Krantz of Washington University. In speeches and articles, Krantz has urged students to choose mathematics over computer science, which he warns could be a passing fad. Last year, he recalls, a National Science Foundation representative came to his university and announced that the agency could no longer afford to support mathematics that was not "goal-oriented." "We could stand up and say this is wrong," Krantz grumbles, "but mathematicians are spineless slobs, and they don't have a tradition of doing that."
> "Proofs are the only laboratory instrument mathematicians have," he remarks, "and they are in danger of being thrown out." Although computer graphics are "unbelievably wonderful," he adds, "in the 1960s drugs were unbelievably wonderful, and some people didn't survive."
> ...Ronald L. Graham of AT&T Bell Laboratories suggests that the trend away from short, clear, conventional proofs that are beyond reasonable doubt may be inevitable. "The things you can prove may be just tiny islands, exceptions, compared to the vast sea of results that cannot be proved by human thought alone," he explains. Mathematicians seeking to navigate uncharted waters may become increasingly dependent on experiments, probabilistic proofs and other guides. "You may not be able to provide proofs in a classical sense."
Science trivia interlude: The Death of Proof starts with a description of Andrew Wiles's proof of Fermat's Last Theorem, for which just this week he received the Abel Prize: a stunning proof of Fermat's Last Theorem by way of the modularity conjecture for semistable elliptic curves, opening a new era in number theory (Abel foundation press release and biography; UoOx, Nature, New Scientist, Guardian). The Abel Prize is often called 'the mathematician's Nobel' [as is the Fields Medal, though Marcus du Sautoy notes that only tends to go to under-40s, meaning Wiles wouldn't have been eligible].
The piece also features Berkeley mathematician William Thurston (1982 Fields Medal recipient for showing interplay between analysis, topology, and geometry and the idea that a very large class of closed 3-manifolds carry a hyperbolic structure), a chat with Stephen Wolfram (of Wolfram Alpha fame, just 5 years after Mathematica debuted) and ventures into the beautiful chaos of Julia and Fatou sets.
The Death of Proof annoyed many mathematicians for claiming with the confidence of a few interviews that their field was becoming a lax shadow of its former self, and that their intellectual endeavours were all but obsolete in the cold light of Kurzweil's Singularity (back when it was still known as I. J. Good's 'intelligence explosion').
His subsequent book "The End Of Science" rocked a still bigger boat, declaring science in general had seen better days, and no 'big' advances would be forthcoming in the modern era. Thomas Kuhn's 1962 Structure of Scientific Revolutions (which suggested scientific theories are accepted as social phenomena rather than on a Platonic ideal of Objective Truth) was enlisted to serve this thesis, a 'postmodernist attack' ft. Chomsky.
He wrote a recollection of The Death of Proof a year ago, for so-called Pi Day, and a fascinating history of the controversy last year in Scientific American: Was I Wrong about "The End of Science"? (with endosymbiotic theorist Lynn Margulis making a cameo).
Horgan penned a fresh polemic in January this year, hanging Bayes' rule out to dry, along with its supporters whom he paints as 'cultish', guilty of aiding the enemies of reason, and becoming "too pervasive to ignore".
Its language is plainly hostile – with generalisations that arguably aren't conducive to balanced reporting, much less a nuanced discussion of the merits or flaws of Bayesian reasoning.
There's one part I found particularly bothersome, which implies Donald Rubin is a statistician gone rogue, for his role testifying for the tobacco industry (for technical expertise, not as a lawyer - there are ethically neutral factors that may compel a scientist/statistician to do so). Rubin wrote openly about his ethical dilemma:
> Personally, I have experienced essentially no hostility at any of these presentations [on consulting work at academic meetings], although at times there has been substantial hostility toward the tobacco industry, which I do not combat. I am defending the importance of honest and competent statistics, that is all.
Never mind either, the thoughtful response Gelman (Rubin's former PhD student) gave to the matter in 2005. Horgan concludes from Rubin's "prominence" that this alleged corruption reflects the deceitful nature of Bayesian statistics itself.
R. A. Fisher was an ardent frequentist - he popularised the p-value in his 1925 work Statistical Methods for Research Workers - who loathed Bayesian reasoning, and forced it out of the National Institutes of Health by Sharon McGrayne's account (see Further reading). Like Pearson (who formulated hypothesis testing in the 1920s), Galton (Pearson's mentor), and numerous other founding fathers of formal (bio)statistics, Fisher was a passionate eugenicist, becoming increasingly devoted to furthering American eugenics from around 1909 (the year in which his final report for President Roosevelt's commission on health and longevity gave a chapter to the 'question of race improvement through heredity'). He withdrew from propaganda efforts during the 1930s, denouncing an anti-Semitic radio broadcaster in 1938 - such is the history of biological research.
It's clear to anyone not in the midst of taking a cheap shot that the use, or defence, of Fisher's statistical work does not equate to support of the ideologies he held. Fisher was also a chain smoker, who consulted for the tobacco industry and notably insisted on the inability to show a causal link between cigarettes and cancer - and yet his statistical work is sound; so sound, in fact, that many hold a great deal of respect for it.
This was driven home for me when I realised that statistician and evolutionary biologist A. W. F. Edwards's mention of "apologists for Fisher", while reviewing Royall's Statistical Evidence in 2000, was actually referencing the dispute between the 'classical' traditionalists and Bayesians (see Further reading), not his moral failings.
> No apologist for Fisher will deny that he introduced the notion of a null hypothesis and the associated tests of significance as repeated-sampling procedures, but the modern use, or misuse, of these to represent evidence is quite alien to most of his thinking, and it should in any case be remembered that in the early days of their development Fisher still felt that it was going to be possible to develop a unified theory of statistics centred on 'exhaustive estimation' which would have justified attaching evidential meaning to significance tests.
(I'll get to these issues later). Allen Downey, author of Think Stats and Think Bayes, gives Horgan the benefit of the doubt, and interprets his article as expressing the importance of Cromwell's rule, which Lindley (1980, p.29) named after Oliver Cromwell's 1650 plea to the Scottish church: "I beseech you, in the bowels of Christ, think it possible you may be mistaken". It can be paraphrased as 'hard convictions are insensitive to counter-evidence': "logic, as distinct from experience, can make a probability zero, or one", i.e. for anything learned from experience the prior p(θ) ∈ (0,1), strictly between zero and one:
> Cromwell's rule is relevant when we consider the relationship between a Bayesian view of the world and the reality of that world that he learns by experience. We have seen that with a complete probabilistic description p(X|H), with X = (X1,X2) and X2 observed, the experience is translated into p(X1|X2,H), so that he need only update his probabilities according to the rules and no rethinking, only calculation is needed. This may be unsatisfactory as the following example shows.
The requirement Lindley named Cromwell's rule was termed 'regularity' in Carnap's 1971 Basic System of Inductive Logic (part I).
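To make Cromwell's rule concrete, here is a minimal sketch of my own (not Lindley's example, nor Downey's). The function names and the likelihood values of 0.9 and 0.1 are illustrative assumptions; the point is only that a prior of exactly zero or one is untouched by any amount of evidence, whereas even a very sceptical prior inside (0,1) can be moved.

```python
def bayes_update(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | data) from the prior P(H) and the two likelihoods."""
    numerator = prior * p_data_given_h
    evidence = numerator + (1.0 - prior) * p_data_given_not_h
    return numerator / evidence


def update_repeatedly(prior, n_observations, p_data_given_h, p_data_given_not_h):
    """Apply the same piece of evidence n_observations times in a row."""
    belief = prior
    for _ in range(n_observations):
        belief = bayes_update(belief, p_data_given_h, p_data_given_not_h)
    return belief


# Twenty observations, each nine times likelier if H is true than if it is false
# (0.9 and 0.1 are made-up likelihoods for illustration).
open_minded = update_repeatedly(0.001, 20, 0.9, 0.1)  # sceptical but not dogmatic
dogmatic_no = update_repeatedly(0.0, 20, 0.9, 0.1)    # prior of exactly zero

# Twenty observations pointing the other way, against a prior of exactly one.
dogmatic_yes = update_repeatedly(1.0, 20, 0.1, 0.9)

print(open_minded, dogmatic_no, dogmatic_yes)
```

The sceptical prior ends up close to one, while the two dogmatic priors stay exactly where they started, which is the sense in which hard convictions are insensitive to counter-evidence.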
Other statisticians were less generous, and read the piece as rushed, becoming derailed on elementary misunderstandings, veering off into 'nonsense' - a regrettable thing for popular science communication to be called by those it seeks to represent.
> > The potential for Bayes abuse begins with P(B), your initial estimate of the probability of your belief, often called the "prior." In the cancer-test example above, we were given a nice, precise prior of one percent, or .01, for the prevalence of cancer. In the real world, experts disagree over how to diagnose and count cancers. Your prior will often consist of a range of probabilities rather than a single number.
> > In many cases, estimating the prior is just guesswork, allowing subjective factors to creep into your calculations. You might be guessing the probability of something that - unlike cancer - does not even exist, such as strings, multiverses, inflation or God. You might then cite dubious evidence to support your dubious belief. In this way, Bayes' theorem can promote pseudoscience and superstition as well as reason.
> The problem he's talking about is, to use a cliche, not a bug but a feature. When the evidence doesn't prove, with mathematical certainty, whether a statement is true or false (i.e., pretty much always), your conclusions must depend on your subjective assessment of the prior probability. To expect the evidence to do more than that is to expect the impossible.
> In the example Horgan is using, suppose that a cancer test is given with known rates of false positives and false negatives. The patient tests positive. In order to interpret that result and decide how likely the patient is to have cancer, you need a prior probability. If you don't have one based on data from prior studies, you have to use a subjective one.
> The doctor and patient in such a situation will, inevitably, decide what to do next based on some combination of the test result and their subjective prior probabilities. The only choice they have is whether do it unconsciously or consciously.
> The second paragraph quoted above is simply nonsense. If you apply Bayesian reasoning to any of those things that may or may not exist, you will reach conclusions that combine your prior belief with the evidence. I have no idea in what sense doing this "promote[s] pseudoscience." More importantly, I have no idea what alternative Horgan would have us choose.
> Here's the worst part of the piece:
> > Embedded in Bayes' theorem is a moral message: If you aren't scrupulous in seeking alternative explanations for your evidence, the evidence will just confirm what you already believe. Scientists often fail to heed this dictum, which helps explains why so many scientific claims turn out to be erroneous. Bayesians claim that their methods can help scientists overcome confirmation bias and produce more reliable results, but I have my doubts.
> Horgan doesn't cite any examples of erroneous claims that can be blamed on Bayesian reasoning. In fact, this statement seems to me to be nearly the exact opposite of the truth.
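The cancer-test exchange above is easier to follow with numbers. The sketch below is an illustration only: the one percent prior comes from the quoted passage, but the test's sensitivity and false positive rate (both set so that the test is "99 percent accurate") and the two alternative priors are my assumptions, as are the function and variable names. It shows how much the answer depends on the prior, which is the respondent's point that a prior has to come from somewhere.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(cancer | positive test) via Bayes' theorem."""
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.99          # assumed: P(positive | cancer)
FALSE_POSITIVE_RATE = 0.01  # assumed: P(positive | no cancer)

# A range of priors rather than a single number; only 0.01 is from the quoted text.
posteriors = {
    prior: posterior_given_positive(prior, SENSITIVITY, FALSE_POSITIVE_RATE)
    for prior in (0.001, 0.01, 0.05)
}

for prior, posterior in posteriors.items():
    print(f"prior {prior:.3f} -> posterior {posterior:.3f}")
```

Under these assumed error rates the one percent prior gives a posterior of one half, and a tenfold change in the prior moves the posterior a long way in either direction. Writing the prior down at least makes that dependence visible.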
Gelman responded to the accusation of the prior being 'guesswork', that it was "misleading in that all parts of a model are subjective guesswork. Or, to put it another way, all of a statistical model needs to be understood and evaluated". This echoes §3.2: Sources of uncertainty in A Cox Model for Biostatistics of the Future (see Further reading):
> Currently, statistical theory deals explicitly and very thoroughly with... the theory of statistical inference. In essence, inference allows us to say: "this is what we have observed, but we might have observed something different" and to moderate our conclusions accordingly. To a limited extent, techniques such as Bayesian model-averaging, or classical nesting of the model of interest within a richer class of possible models, allow us to take account of uncertainty at the level of model formulation, but here the methodology is less well developed, and less widely accepted. Rather, it is generally accepted that model formulation is, at least in part, a subjective process and as such not amenable to formal quantification. All statisticians would surely agree that careful attention to design is of vital importance, yet our impression is that in the formal training of statistical graduates, courses on design typically occupy a very small fraction of the syllabus by comparison with courses on inference, modelling and, increasingly, computation. We predict that some relatively old ideas in experimental design, such as the construction of efficient incomplete block designs, will soon enjoy a revival in their importance under the perhaps surprising stimulus of bioinformatics, specifically gene expression data...
Gelman continued: "I object to the attitude that the data model is assumed correct while the prior distribution is suspect", then linking to a blog post of his, which recalled "the passionate battles" younger readers may not be aware of, from his 2013 paper on the 'perceived absurdity' of Bayesian inference (see Further reading):
> the missionary zeal of many Bayesians was matched, in the other direction, by a view among some theoreticians that Bayesian methods are absurd - not merely misguided but obviously wrong in principle. Such anti-Bayesianism could hardly be maintained in the present era, given the many recent practical successes of Bayesian methods. But by examining the historical background of these beliefs, we may gain some insight into the statistical debates of today. . .
Horgan wrote that before embarking on his exposé, he had encountered "wonkier" students using Bayes' theorem as "an almost magical guide for navigating through life", whose "rants" confused him, "as did [encyclopaedia] explanations" leading him to dismiss it as "a passing fad" (rather than perhaps, say, a complex academic topic warranting more intent study than a cursory glance over Wikipedia). Talk about prior beliefs...
It's curious (and surely exasperating for those whose work he writes so disparagingly of) that Horgan seeks to hold Bayesians accountable for pseudoscience and paranormal enthusiasts, given the many pages that have been devoted to countering such far-fetched claims.
For what it's worth... my final impression of Horgan's piece was that it was useful on balance, though it will surely confuse newcomers to the field (at whom, unfortunately, it is also pitched). Taking the time to refute the points or read analogous discussions has unearthed many more intricacies than introductory examples do, and kept my thoughts about Bayesian models grounded in the real world - Horgan highlights how Bayesian thinking could alleviate "overdiagnosis and overtreatment for cancer and other disorders". His piece also made me consider the uses and limitations of the process of 'reasoning by analogy' we all use when we lack expertise in a subject - essentially the process of learning through an ever-deepening (and thus never quite satisfactory) inventory of statistical and mathematical concepts.
"What Jeff said." Rhetorical flair makes the world go around.
Will Kurt (who last week published a brilliant Bayesian take on superstitious beliefs shown in an episode of The Twilight Zone) noted that Horgan should have consulted the paranormal-debunking perspective in E. T. Jaynes' 2003 text Probability Theory: The Logic of Science (freely available as PDFs via the author), tackled in depth in chapter 5: Queer Uses for Probability Theory.
> Laplace perceived this phenomenon long ago. His Essai Philosophique sur les probabilités (1819) has a long chapter on the 'Probabilities of Testimonies', in which he calls attention to "the immense weight of testimonies necessary to admit a suspension of natural laws". He notes that those who make recitals of miracles, "decrease rather than augment the belief which they wish to inspire; for then those recitals render very probable the error or the falsehood of their authors. But that which diminishes the belief of educated men often increases that of the uneducated, always avid for the marvelous."
You can view the 6th edition (1840) on archive.org (original French); Andrew Dale's English translation of the 5th edition (1995) is available via Springer.
In fact, Laplace discusses testimony throughout the essay, including several times just within the first chapter:
> The trajectory of a simple molecule of air or vapour is regulated in a manner as certain as that of the planetary orbits; the only difference between them is that which is contributed by our ignorance. Probability is relative in part to this ignorance and in part to our knowledge.
> … It is thus that the same matter recounted before a large crowd of people, finds various degrees of belief according to the extent of the listeners' knowledge. If the man who reports it is deeply convinced of it {i.e. of its truth}, and if by his calling and character he inspires great confidence, his account, however extraordinary it may be, will have the same degree of likelihood {or plausibility} for ignorant listeners as an ordinary matter reported by the same man, and they will believe it implicitly. However, if anyone of the listeners has had occasion to hear the same matter is denied by other equally respectable men, he will doubt the truth of the report; and the matter will be judged false by well-informed listeners who deem it inconsistent, either with well-authenticated matters or with the immutable laws of nature.
> … From the preceding discussion, we ought generally to conclude that the more extraordinary a fact is, the more need it has of strong evidential support. For the probabilities that those who witness it may either deceive or be deceived increase as the probability of the reality of the fact decreases. This will become particularly noticeable when we come to speak of the probability of testimony.
Aubrey Clayton describes it in an hour-long exposition of Jaynes' 5th chapter. The main discussion is around the Soal-Goldney experiments - a rigorous statistical assessment of alleged paranormal ESP abilities - the conclusive point coming around the 49-51 minute mark on directives for effective magic and trickery (later noted to be generally applicable, from court room evidence, to political report interpretation, and the movements of the heavens).
The take-home message concerns the 'classical', standard statistical interpretation of data testing a hypothesised sixth sense. Horgan writes that Bayes' rule opens the door to belief in such nonsense, whereas in fact:
> According to the orthodox statistical techniques [covered in Jaynes' later chapters and Aubrey's subsequent videos] we would certainly have rejected the null hypothesis of chance under any of these scenarios, and we would have no other hypothesis to replace it with.
> So, an ordinary statistical significance test would just tell you 'well, you've got some very unlikely data: it must be that there's a real effect here, and the person probably has ESP, because we reject the null hypothesis and we don't know what hypothesis to replace it with'. In our framework, we do know what hypotheses to replace it with, and we can test against hypotheses in a particular well-defined class.
[Video on YouTube](https://www.youtube.com/watch?v=eUABtMhxJXI)
Will writes:
> One of the most important ideas that Jaynes has, which nearly all discussions of Bayesian reasoning miss: ALL probabilities are conditional. Likewise there is an assumption when attacking "guesswork priors" that by not using a prior at all we are somehow closer to the Truth.
> ## At a philosophical level, Bayesian reasoning is reasoning that admits we live in a subjective universe, and quantifies subjective views.
> … basically, priors, explicit or not, have affected reasoning for all history, Bayesian stats at least let us quantify these.
See notes in the Further reading below on Bruno de Finetti's writing regarding Bayesian 'subjectivity'. I was quite surprised at the crossover statistics exhibits: the citation network of de Finetti and co. is an ornate web of particle physicists, philosophers (such as Rudolf Carnap mentioned earlier) and mathematicians (the likes of Turing), with what I'd previously considered 'my field' just a strand of geneticists, biostatisticians and bioinformaticians running through.
Other statisticians found the Sci. Am. piece "deeply confused… but wonderfully superficial", suggesting "the sudden popularity of Bayesian reasoning is partly caused by it having been ignored (at our peril) for so long".
p review
Epigraph to Gelman and Loken (2013), The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time
There's been long-running criticism of p-value misuse in biological research (which is starting to feel like a bit of a broken record).
Back in February, the American Statistical Association produced a statement on p-values, intended to "[articulate] in non-technical terms a few select principles that could improve the conduct or interpretation of quantitative science, according to widespread consensus in the statistical community" as guidance for "researchers, practitioners and science writers who are not primarily statisticians":
> Underpinning many published scientific conclusions is the concept of "statistical significance," typically assessed with an index called the p-value. While the p-value can be a useful statistical measure, it is commonly misused and misinterpreted.
> ...Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
These principles were:
> 1. P-values can indicate how incompatible the data are with a specified statistical model.
> 2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
> 3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
> 4. Proper inference requires full reporting and transparency.
> 5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
> 6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
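The ASA's informal definition, and the first two principles, can be made concrete with a toy calculation. This is only a sketch under assumptions of my own: the "specified statistical model" is a fair coin, the "statistical summary" is the number of heads, and the 16-heads-in-20-flips data are invented for illustration. The function name is hypothetical too.

```python
from math import comb


def binomial_p_value(n_flips, n_heads):
    """Two-sided p-value for n_heads out of n_flips under the model 'the coin is fair'.

    It is the probability, under that model, of a head count at least as far
    from n_flips / 2 as the one observed.
    """
    observed_distance = abs(n_heads - n_flips / 2)
    extreme_outcomes = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if abs(k - n_flips / 2) >= observed_distance
    )
    # Every sequence of flips has probability 1 / 2**n_flips under a fair coin.
    return extreme_outcomes / 2 ** n_flips


p_value = binomial_p_value(20, 16)
print(f"p = {p_value:.4f}")
```

The number that comes out is a statement about the data given the fair-coin model (principle 1). It is not the probability that the coin is fair, nor the probability that the result arose by chance alone (principle 2); neither of those can be computed without a prior.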
Media and academic blogs soon reacted to the statement, which raked back up long-simmering community tension over the importance of significance:
Nature news item, Statisticians issue warning over misuse of p-values (March 7th, 2016)
Optimistically-titled news piece on FiveThirtyEight, Statisticians Found One Thing They Can Agree On: It's Time To Stop Misusing P-Values
An interview with ASA's executive director Ron Wasserstein at Retraction Watch (the scientific publishing watchdog/blog), explaining the statement through lines from The Princess Bride, and his desire for a 'post p<0.05' era
> In the post p<0.05 era, sound statistical analysis will still be important, but no single numerical value, and certainly not the p-value, will substitute for thoughtful statistical and scientific reasoning.
Plain English run-down of the ASA's 6 principles on the ever-excellent Molecular Ecologist community blog: A statement on p-values that approaches significance*
Thoughtful response in opposition to simplistic readings of the statement from Tal Galili, the Tel Aviv University Statistics PhD student who runs the R-bloggers site
Two critical takes highlighting various assumptions behind the principles from Deborah Mayo: Don't throw out the error control baby with the bad statistics bathwater and "A small p-value indicates it's improbable that the results are due to chance alone" - fallacious or not? (more on the ASA p-value doc)
The statement garnered a reply from 'meta-research' Stanford scholar John P. Ioannidis (a recent interview with Retraction Watch introduced him as best known for a 2005 paper "Why Most Published Research Findings Are False" though his work appears regularly in the press). I can't find the original of this report (anyone?) but The Scientist magazine quotes him as writing that:
> adding more statistical layers does not solve the problems of "hidden multiplicity and selective reporting biases." Transparency - another of the ASA's principles - is essential. "Efforts to promote transparency in study design, conduct and reporting may have more to offer in this setting than blaming P-values,"
Note that multiplicity here means the multiple comparisons problem.
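As a rough guide to why multiplicity matters, the sketch below computes the chance of at least one "significant" result when every null hypothesis is true. It assumes the comparisons are independent and each is tested at the conventional 0.05 level; both are simplifying assumptions of mine, and the function name is hypothetical.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """P(at least one p < alpha) across independent tests when no effect exists."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


for n_comparisons in (1, 5, 20, 100):
    rate = familywise_error_rate(n_comparisons)
    print(f"{n_comparisons:>3} comparisons -> {rate:.2f}")
```

With twenty independent comparisons the chance of at least one false positive is already well over a half, so a lone p < 0.05 says little unless the number of comparisons behind it is reported.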
This week, Ioannidis published the results of a text-mining study in the Journal of the American Medical Association. His group had analysed all >12 million MEDLINE abstracts (a search engine-indexed subset of NCBI PubMed), along with all >843 thousand deposited full-text articles in the open access repository PubMed Central (PMC).
> A random sample of 1000 MEDLINE abstracts was manually assessed for reporting of p-values and other types of statistical information; of those abstracts reporting empirical data, 100 articles were also assessed in full text.
They found zero Bayes factors reported in the wild in the (tiny, n = 100) subset of papers assessed in full text.
A Bayes factor is the factor by which the prior odds of two competing hypotheses are multiplied to give their posterior odds, much like a likelihood ratio: see Further reading.
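A small worked example may help here. It is a sketch under assumptions of my own: two point hypotheses about a coin (fair, versus heads 80% of the time), invented data of 16 heads in 20 flips, and hypothetical function names. With point hypotheses the Bayes factor is simply the likelihood ratio; in general each likelihood is averaged over a prior on the model's parameters.

```python
from math import comb


def binomial_likelihood(n_flips, n_heads, p_heads):
    """P(n_heads in n_flips | coin lands heads with probability p_heads)."""
    return comb(n_flips, n_heads) * p_heads ** n_heads * (1 - p_heads) ** (n_flips - n_heads)


def bayes_factor(n_flips, n_heads, p_h1, p_h0):
    """Evidence for H1 over H0: the ratio of the two likelihoods."""
    return binomial_likelihood(n_flips, n_heads, p_h1) / binomial_likelihood(n_flips, n_heads, p_h0)


def posterior_probability(prior_probability, factor):
    """Convert a prior probability of H1 to a posterior via odds times Bayes factor."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * factor
    return posterior_odds / (1 + posterior_odds)


bf_10 = bayes_factor(20, 16, p_h1=0.8, p_h0=0.5)
print(f"Bayes factor for the biased coin: {bf_10:.1f}")

# The same evidence lands differently depending on the prior it is applied to.
for prior in (0.01, 0.1, 0.5):
    print(f"prior {prior:.2f} -> posterior {posterior_probability(prior, bf_10):.2f}")
```

The Bayes factor summarises what the data say about the two hypotheses relative to each other, and the prior odds carry everything else, which is why the two are reported separately.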
It would be interesting to see the extent to which Bayes factors are in use more broadly given the reports of Bayesian statistics "rippling through everything from physics to cancer research" (NYT, 2014). Ioannidis's results show that biomedical researchers are more comfortable presenting smaller p-values on the face of their paper (otherwise burying them in shame inside the body of the text).
The press release frames the findings alongside the ASA's statement, repeating the paper's conclusion that:
> from 1990 to 2015... increasing prevalence of p-values reported in the biomedical literature. Moreover, p-values reported in abstracts were in general lower (showing greater statistical significance) than p-values reported in the full text. The use of p-values was even more common in core clinical journals and in influential articles such as randomized trials and meta-analyses. The selection of more statistically significant p-values in the abstracts was prominent also in randomized trials and meta-analyses. In-depth manual analysis of a sample of 1000 abstracts and 100 full-text articles demonstrated that Bayesian methods and false-discovery rate methods were almost entirely absent, and use of CIs was seldom reported and provided mostly for risk metrics. Effect sizes were reported in a sizeable proportion of abstracts but almost always without information that would allow conveying their uncertainty. Furthermore, besides the substantial proportion of abstracts that report p-values, a larger proportion of abstracts included qualitative statements about significance, mostly without any other quantitative information.
There was also recognition of the practice behind one of the academic science Twittersphere's long-running memes ('#stillnotsignificant').
> There was also a small, but slowly increasing, number of reports that included only statistically nonsignificant results in the abstract. This is a welcome change and may suggest an increasing niche for the publication of "negative" studies. However, it is unclear whether these nonsignificant results are also interpreted as such by their authors. For example, there is evidence for spin effects, whereby investigators interpret nonsignificant findings as if they were significant. Moreover, the majority of abstracts did not report a single statistically nonsignificant p-value. Beyond biomedicine, other data suggest that "negative" results are disappearing from many scientific fields and from research conducted in many countries, and there are spuriously too many statistically significant results, while all results should be communicated in an unbiased fashion regardless of their statistical significance.
The authors recommend much the same as the ASA does: opt for estimation in place of testing where possible, but measure the uncertainty associated with effect sizes through confidence intervals (CIs, i.e. give error bars for population summary statistics):
> Overall, we do not recommend that p-values should be abandoned. Alternative statistics such as Bayes factors may also be warranted and helpful to consider in many cases, but even if there were a change from p-values to Bayes factors or false-discovery rates, this would not necessarily reduce the problem of selective reporting and lack of reporting of important information on effect sizes, such as absolute and relative risk measures and mean differences. The transparency, accuracy, and information content of the biomedical literature would benefit from increased reporting of both effect sizes and measures of uncertainty or at least both effect size and p-value in abstracts. Qualitative statements about significance in the absence of quantitative information are difficult to interpret and may be misleading because statistical, biological, and clinical significance are different concepts and subjective interpretation of significance may be incorrect. Therefore, such isolated qualitative statements should be avoided. By default, isolated reporting of p-values also should be avoided, unless a cogent argument can be made that effect size is not relevant (eg, in some genomic studies). In addition, journals should encourage investigators to report in their abstracts the quantitative findings of their main analyses and not necessarily those that were nominally statistically significant.
Full-text analyses would be possible with the likes of the Sci-Hub database, but at present copyright law prevents its mining (advocacy work is ongoing to change the lay of the land here, from the likes of The Content Mine group, to the Hague Declaration on text data mining (TDM) late last year, to Chris Hartgerink contesting Elsevier and Wiley's orders for him to cease and desist, Julia Reda's lobbying in European Parliament, etc...).
All that aside, Ioannidis's response to the ASA statement chimes pretty firmly with Andrew Gelman's (both were consulted during the statement's drafting), in which he expresses frustration at the omission of the full definition of a p-value. The statement's authors declined "to address the issue of multiple potential comparisons (Gelman and Loken, 2014)... in order to keep the statement reasonably simple".
Gelman wasn't significantly impressed:
> The whole point of the "garden of forking paths" (Gelman and Loken, 2014) is that to compute a valid p-value you need to know what analyses would have been done had the data been different. Even if the researchers only did a single analysis of the data at hand, they well could've done other analyses had the data been different. Remember that "analysis" here also includes rules for data coding, data exclusion, etc.
The "garden of forking paths" Gelman mentions here refers to the multiple comparisons problem, as discussed in a 2013 manuscript with Eric Loken.
In this garden of forking paths, whatever route you take seems predetermined, but that's because the choices are done implicitly. The researchers are not trying multiple tests to see which has the best p-value; rather, they are using their scientific common sense to formulate their hypotheses in a reasonable way, given the data they have. The mistake is in thinking that, if the particular path that was chosen yields statistical significance, this is strong evidence in favor of the hypothesis.
> data-analysis decisions [that] were theoretically-motivated based on previous literature, but where the details of data selection and analysis were not pre-specified and, as a result, were contingent on data.
> A dataset can be analyzed in so many different ways (with the choices being not just what statistical test to perform but also decisions on what data to include or exclude, what measures to study, what interactions to consider, etc.), that very little information is provided by the statement that a study came up with a p<.05 result. The short version is that it's easy to find a p<.05 comparison even if nothing is going on, if you look hard enough - and good scientists are skilled at looking hard enough and subsequently coming up with good stories (plausible even to themselves, as well as to their colleagues and peer reviewers) to back up any statistically-significant comparisons they happen to come up with.
> This problem is sometimes called "p-hacking" or "researcher degrees of freedom" (Simmons, Nelson, and Simonsohn, 2011). In a recent article, we spoke of "fishing expeditions, with a willingness to look hard for patterns and report any comparisons that happen to be statistically significant" (Gelman, 2013a).
> But we are starting to feel that the term "fishing" was unfortunate, in that it invokes an image of a researcher trying out comparison after comparison, throwing the line into the lake repeatedly until a fish is snagged. We have no reason to think that researchers regularly do that. We think the real story is that researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.
> It might seem unfair that we are criticizing published papers based on a claim about what they would have done had the data been different. But this is the (somewhat paradoxical) nature of frequentist reasoning: if you accept the concept of the p-value, you have to respect the legitimacy of modeling what would have been done under alternative data.
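A toy simulation can show why a single, honestly run test is still affected. This is my own sketch, not the analysis in the Gelman and Loken manuscript: I assume a study with no real effect, measured in two subgroups with known unit variance, and a researcher who looks at the data and then runs one test on whichever subgroup shows the larger difference. All names, sample sizes and the seed are arbitrary choices.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_study(rng, n_per_group=50):
    """Simulate one study with no true effect, measured in two subgroups.

    Returns one z statistic per subgroup for the treated-minus-control mean.
    """
    z_scores = []
    for _ in range(2):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        difference = sum(treated) / n_per_group - sum(control) / n_per_group
        standard_error = math.sqrt(2 / n_per_group)  # variance assumed known (= 1)
        z_scores.append(difference / standard_error)
    return z_scores


def false_positive_rates(n_studies=2000, seed=1):
    """Compare a pre-specified analysis with a data-dependent one."""
    rng = random.Random(seed)
    prespecified_hits = 0
    forked_hits = 0
    for _ in range(n_studies):
        z_a, z_b = simulate_null_study(rng)
        # Path fixed in advance: always test subgroup A.
        prespecified_hits += two_sided_p(z_a) < 0.05
        # Forking path: a single test, but on the subgroup the data point to.
        chosen = max((z_a, z_b), key=abs)
        forked_hits += two_sided_p(chosen) < 0.05
    return prespecified_hits / n_studies, forked_hits / n_studies


prespecified_rate, forked_rate = false_positive_rates()
print(f"pre-specified: {prespecified_rate:.3f}, forking path: {forked_rate:.3f}")
```

Only one test is ever run per study, yet the data-dependent choice should push the false positive rate towards roughly double the nominal 5% in this two-path setup. The nominal p-value is only valid if the analysis would have been the same whatever the data had looked like.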
This 2013 manuscript was to become a citation for multiple subsequent critiques of the p-value (including the ASA statement), carrying this disapproving reference to "the (somewhat paradoxical) nature of frequentist reasoning".
The phrasing is reminiscent of [University of Amsterdam Psychology professor] Eric-Jan Wagenmakers's working definition of Bayesian statistics as:
> > "the only statistical procedure that is coherent, meaning that it avoids statements that are internally inconsistent"
Gelman's reply to the ASA continues,
> When I was sent an earlier version of the ASAâs statement, I suggested changing the sentence to,
> ### âValid p-values cannot be drawn without knowing, not just what was done with the existing data, but what the choices in data coding, exclusion, and analysis would have been, had the data been different. This âwhat would have been done under other possible datasetsâ is central to the definition of p-value.â
> ### The concern is not just multiple comparisons, it is multiple potential comparisons.
> Even experienced users of statistics often have the naive belief that if they did not engage in âcherry-picking . . . data dredging, significance chasing, significance questing, selective inference and p-hackingâ (to use the words of the ASAâs statement), and if they clearly state how many and which analyses were conducted, then theyâre ok. In practice, though, as Simmons, Nelson, and Simonsohn (2011) have noted, researcher degrees of freedom (including data-exclusion rules; decisions of whether to average groups, compare them, or analyze them separately; choices of regression predictors and iteractions; and so on) can and are performed after seeing the data.
> A scientific hypothesis in a field such as psychology, economics, or medicine can correspond to any number of statistical hypotheses, and if the ASA is going to issue a statement warning about p-values, I think it necessary to emphasize that researcher degrees of freedomâthe garden of forking pathsâcan and does occur even without people realizing what they are doing. A researcher will see the data and make a series of reasonable, theory-respecting choices, ending up with an apparently successfulâthat is, âstatistically significantââfinding, without realizing that the nominal p-value obtained is meaningless. Ultimately the problem is not with p-values but with null-hypothesis significance testing, that parody of falsificationism in which straw-man null hypothesis A is rejected and this is taken as evidence in favor of preferred alternative B (see Gelman, 2014). Whenever this sort of reasoning is being done, the problems discussed above will arise. Confidence intervals, credible intervals, Bayes factors, cross-validation: you name the method, it can and will be twisted, even if inadvertently, to create the appearance of strong evidence where none exists.
Gelman cites the 2014 Sci. Am. piece, and it's a little frustrating to see - for a topic so important - that the corrections have not made it to the magazine's site as one would hope. The magazine's editors confused the very definition central to the whole discussion here, as Gelman was at pains to clarify in correspondence published to his blog:
> > Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation.
> How horrible! Russ correctly noted that the above statement is completely wrong, on two counts:
1. To the extent the p-value measures "confidence" at all, it would be confidence in the null hypothesis, not confidence in the data.
2. In any case, the p-value is not not not not not "the probability that a perceived result is actually the result of random variation." The p-value is the probability of seeing something at least as extreme as the data, if the model (in statistics jargon, the "null hypothesis") were true.
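To make that definition concrete, here is a minimal sketch of an exact one-sided p-value for a coin-tossing experiment. The data (9 heads in 10 tosses), the fair-coin null model, and the function name `binomial_p_value` are all assumptions chosen for illustration; none of them come from Gelman's post.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null).

    This is the probability, computed under the null model, of data
    at least as extreme as what was observed.
    """
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 heads in 10 tosses, null hypothesis of a fair coin.
p_value = binomial_p_value(successes=9, trials=10)
print(p_value)
```

Note what the number is not: it says nothing directly about the probability that the coin is fair, only how surprising the data would be if it were.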
Early this year Gelman returned to his green-fingered analogy to distinguish plausibility vs. probability, and prior distributions.
> Somebody doing a study where they found an interaction with "p less than .05," no, that's not strong evidence. That's where forking paths comes in. Forking paths comes into the p-value calculation, and forking paths comes into the prior. If you want to go full Bayes, that's fine with me, then you don't have to worry about other analyses the researcher might have done, you just have to worry about other models of the world that are just as plausible as your current favorite.
Meanwhile at Berkeley...
The exception to the aforementioned trend in life sciences departments might be U. C. Berkeley.
Most recently, they ran a pretty jaw-dropping course for all first years starting this academic year, apparently "aiming at being an intro to data course for all first year students" (course site with materials here).
data-8.appspot.com/sp16/course has details and materials to help direct/structure your own study.
Genomicist Lior Pachter's lab has produced the CuffLinks/TopHat RNA-seq transcriptome assembly/exon mapping software, (more recently: Kallisto, Sleuth and Shannon), which variously bring statistical rigour to the analysis of biological sequencing data. In 2013, Pachter wrote that Integrative Biology and Molecular & Cell Biology undergraduates were being required to study a 'Statistics and Combinatorics' course (some of the material can be found online).
Next week our semester begins and I'll be teaching Math 10 to more than 250 incoming students. Math 1, which is the standard 1st year calculus course is an option for students as well (although that may be discontinued in the future). To accommodate the new course requirements Integrative Biology has increased its math requirement from one to two semesters, and hopefully other biology departments will follow suit.
The course covers topics from three different areas:
calculus: the language of change.
discrete mathematics: the art of counting.
probability theory and statistics: the science of data analysis.
The premise of the course is that these topics are essential for describing and understanding biological systems, and for working with biological data. We are not alone in this belief. Recent reports and recommendations from institutions such as the HHMI and AAMC all suggest that undergraduate institutions rethink math education of biology students. In particular, they emphasize the point that there is much more for students to learn than just calculus.
Ideally a course covering the topics in Math 10 would be 2 years long, but this is impossible given the constraints of biology majors, who are already overburdened with course requirements. Instead, Math 10 integrates these topics so that they complement and reinforce each other. For example, integration is used to obtain cumulative distribution functions from probability density functions. Similarly, combinatorial concepts are introduced in the context of their statistical applications. The syllabus for my Fall 2013 class is posted on the Math 10a class website.
In an era where education debates are dominated by technology issues, it's easy to forget the basics: biology students are better off learning more math, statistics and computer science, and academics in those fields are better off teaching them.
A life scientist ~~walks~~ logs into a MOOC...
I'd heard a lot of positive murmurings about the Stat. Inf. MOOC ('massive open online course') from the R community on Twitter [#rstats] but was otherwise occupied by the statistics module on my own Masters course last semester.
It's run by 3 biostatisticians at Johns Hopkins University's Bloomberg School of Public Health. Having gone through quite a few of these courses over the years since the 2012 launches of Coursera/edX (which offered a shinier, more organised interface to materials than the generally less accessible and, from what I saw, more Physics-focussed options on YouTube/the long-running MIT OpenCourseware), I can say it's one of the best-executed (and best-intentioned) I've seen. I wish I'd found such nicely structured mathematical guidance sooner.
There's no paywalling of course materials here (it's all open source to the point of community best practice), they re-release the course and update its materials regularly, while a dedicated site ensures that at each run the course experience improves thanks to the efforts of students, such as with this full set of course notes (pdf).
While programming may not be to all biologists' tastes, schedules, or immediate needs, improved grasp of statistics is inseparable from good science - and I'd say this could bolster both postgraduate and motivated undergraduate (or even particularly curious pre-higher ed.) students.
To return to the F-score (or F-measure), it's calculated from the harmonic mean (a weighted average) of precision (a.k.a. positive predictive value, PPV) and recall (a.k.a. coverage, sensitivity, or true positive rate).
What do you mean there's more than one mean?
Wikipedia calls median and mode 'statistical locations' (median can also be thought of as "the only 2-quantile", and mode simply the most frequent value in a dataset, also used for nominal data such as a list of names) - these are distinct from the 'Pythagorean means': arithmetic mean, geometric mean, and harmonic mean.
The harmonic mean for a list of numbers a is nicely summed up in the Python [programming language] statement:
len(a) / sum(1/x for x in a)
whereas the well-known arithmetic mean would be sum(a) / len(a).
Note to aspiring programmers: While new coders generally get told to learn Python, for all things statistical it's often much more intuitive to use R (a programming language cooked up by and for statisticians): for one thing, it makes calculations on vectors (sets of values like our variable a here) much simpler.
In R, you can calculate the arithmetic mean for a numeric vector (i.e. a set of numbers) a as simply mean(a). The harmonic mean in R is therefore:
1 / mean(1/a)
i.e. the reciprocal of the mean of the reciprocals of each number in vector a.
If you've got a good eye you'll see that "1/mean(1/a)" multiplied by "mean(1/a)" = 1: the harmonic mean of a is the reciprocal of the arithmetic mean of the reciprocals 1/a. In other words, the arithmetic and harmonic means are reciprocal duals. The other Pythagorean mean, the geometric mean, $\sqrt[n]{a_1 a_2 a_3 \cdots a_n}$, or in R: exp(mean(log(a))), is its own reciprocal dual (i.e. the geometric mean of a is equal to 1 ÷ the geometric mean of 1/a).
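For anyone who would rather check the duality numerically than take it on trust, here is a minimal Python sketch of the three Pythagorean means. The function names and the example values are my own choices for illustration.

```python
from math import exp, isclose, log


def arithmetic_mean(a):
    return sum(a) / len(a)


def harmonic_mean(a):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(a) / sum(1 / x for x in a)


def geometric_mean(a):
    # Same trick as R's exp(mean(log(a))): average in log space.
    return exp(sum(log(x) for x in a) / len(a))


a = [1.0, 2.0, 4.0]  # made-up positive values
reciprocals = [1 / x for x in a]

# Harmonic mean of a times arithmetic mean of 1/a is 1 (reciprocal duals).
print(harmonic_mean(a) * arithmetic_mean(reciprocals))
# The geometric mean is its own reciprocal dual.
print(geometric_mean(a) * geometric_mean(reciprocals))
# Ordering for positive, non-identical values: harmonic < geometric < arithmetic.
print(harmonic_mean(a), geometric_mean(a), arithmetic_mean(a))
```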
Concluding Un-Bayesian Postscript to Statistical Fragments
The Stanford Coursera series Natural Language Processing begins with a demonstration of how minimising errors for even the simplest toy problem (finding text patterns with regular expressions) boiled down to increasing precision (maximising true positives & minimising false positives) and increasing coverage or recall (max. true positives & min. false negatives).
Harder tasks will often use machine learning classifiers which are much more powerful, but it turns out that even then regular expressions are used as features in the classifiers, and can be very useful in capturing generalisations.
Week 1 of this NLP course has a really nice framing of the Needleman-Wunsch global alignment and Smith-Waterman local alignment algorithms, widely used in biological sequence comparison. BLAST for example does something pretty much like Smith-Waterman â but 'cuts corners' with a heuristic for performance.
The second week's material, on n-grams, requires an understanding of conditional probability (not least to read its mathematical notation, which the Johns Hopkins University Statistical Inference course mentioned above can help shine light on).
Week 3 of this course, on text classification, explains that the F measure is a simple measure that combines precision and recall. For clarity, it's also not a Bayesian statistic - but it did end up motivating me to go back and study Bayes via conditional probability, so I thought I'd include this post-script.
Accuracy is a flawed measure for low-frequency events, e.g. when looking for something expected 1 in 1000 times, a lazy classifier could 'cheat' - ignoring all events - and still get 99.9% accuracy! This worst case scenario illustrates that good performance means (as noted earlier) valuing true positives only in proportion to the false positives that come with them (minimising spurious results: i.e. having the selectivity to distinguish real, true positives) and accounting for false negatives (minimising 'missed' positive results).
Maximising:
precision P (a.k.a. PPV), tp / (tp + fp), is to minimise 'mistakes',
recall R (a.k.a. sensitivity, or coverage), tp / (tp + fn), is to minimise 'misses'.
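The 'lazy classifier' scenario is easy to reproduce. The sketch below uses made-up labels (one positive in 1000) and a classifier that predicts negative every time; the helper name `evaluate` is hypothetical. Precision is returned as `None` when nothing is predicted positive, since tp / (tp + fp) is then undefined.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else None  # undefined with no positive calls
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall


# Made-up data: a single true positive among 1000 events.
y_true = [True] + [False] * 999
lazy_predictions = [False] * 1000  # the classifier ignores every event

accuracy, precision, recall = evaluate(y_true, lazy_predictions)
print(accuracy, precision, recall)
```

Accuracy looks superb while recall is zero, which is exactly why precision and recall are reported instead.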
As mentioned already, rather than the F-score being directly proportional to these statistics ($F \propto P \cdot R$), it is a weighted harmonic mean, usually expressed in the beta form.

$$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}}$$

$$F = \frac{(\beta^2 + 1) \cdot P \cdot R}{\beta^2 \cdot P + R}$$

letting β = 1, i.e. weighting precision and recall equally:

$$F_1 = \frac{2PR}{P + R}$$
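As a quick sanity check that the alpha and beta forms above agree, here is a small sketch. The function names are mine, and the conversion between the two weights, β² = (1 − α)/α, is the standard one rather than anything stated in the course notes quoted here.

```python
from math import isclose


def f_alpha(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (alpha form)."""
    return 1 / (alpha / precision + (1 - alpha) / recall)


def f_beta(precision, recall, beta=1.0):
    """The same quantity in the beta form; beta = 1 gives the F1-score."""
    b2 = beta**2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


P, R = 0.5, 1.0  # made-up precision and recall

f1 = f_beta(P, R)  # equals 2PR / (P + R)
print(f1, f_alpha(P, R, alpha=0.5))

# beta = 2 weights recall more heavily; it corresponds to alpha = 1 / (1 + beta^2) = 0.2
print(f_beta(P, R, beta=2.0), f_alpha(P, R, alpha=0.2))
```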
> For all positive data sets containing at least one pair of nonequal values, the harmonic mean is always the least of the three means, while the arithmetic mean is always the greatest of the three and the geometric mean is always in between
Haefner's Modelling Biological Systems notes a case study example of biochemists measuring plant growth in a system in which 3 nutrients interact (O'Neill et al. 1989), who compared a dozen methods and in a close call opted for 'additive rate' for molecular interactions rather than harmonic mean. For non-Michaelis-Menten modelled kinetics, harmonic mean would be preferable. They liken the flow of chemical reactants under limitations in a biological pathway to electrical current, and use the harmonic mean as physicists do to find mean resistance in a circuit (see Wikipedia).
F-score is usually expressed in the beta format, and as such equal weighting to both precision and recall (α=½ or β=1) is known as the F1-score, and is equal to F = 2PR/(P+R).
This will be a 'conservative' mean when compared to the arithmetic mean (i.e. closer to the minimum value), but also more robust to outliers that cause a standard arithmetic mean to shoot up.
One peculiar feature of the harmonic mean is (again) its 'regularity': the values whose mean is being calculated must be greater than 0. Thinking back to the physicists' analogy (or rather, application) of harmonic mean to resistance, supplying a zero value inserts a 'path of least resistance', resulting in zero. There's a brief discussion on the Cross-Validated forum:
> One physical interpretation of the harmonic mean is that if you have resistors in parallel, the total resistance is as though each resistor had the harmonic mean resistance. If one of the resistors has no resistance, there is no resistance over all (a short), and this is the same as if all resistors had no resistance.
It sounds like Python errors out rather than 'short' (give zero) based on the mathematical principle that 1/0 is not defined, whereas R gives `Inf` (the IEEE standard), and for neural network classifiers, another commenter says they filter out zero values before application.
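A minimal sketch of the Python side of that comparison, using made-up values. The hand-rolled one-liner raises `ZeroDivisionError` on a zero, whereas the standard library's `statistics.harmonic_mean` is documented (in recent Python 3 versions) to return zero in that case, i.e. it 'shorts' as the resistor analogy suggests.

```python
import statistics


def naive_harmonic_mean(a):
    # Direct translation of the formula; 1/0 is undefined, so a zero raises.
    return len(a) / sum(1 / x for x in a)


values = [0, 2, 4]  # made-up data containing a zero

try:
    naive_harmonic_mean(values)
    outcome = "no error"
except ZeroDivisionError:
    outcome = "ZeroDivisionError"

# The standard library treats a zero like a short circuit and returns 0.
library_result = statistics.harmonic_mean(values)

print(outcome, library_result)
```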
Naive Bayes models (also covered on the Stanford NLP course) are one type of model often assessed with F-scores.
Further reading
We need conditional probabilities, not summary statistics: Cancer statistics: WTF?, a guest post by Phil Price on Andrew Gelman's blog. His friend had been diagnosed with cancer, and the statement of how long they had to live (on which patients may make the important decision of whether to undergo chemotherapy) was a 'classical' summary statistic, rather than a conditional probability taking age, gender, etc. into account.
Zeger et al. (2004) A Cox Model for Biostatistics of the Future. Johns Hopkins University, Dept. of Biostatistics Working Papers. Working Paper 32 -- a really enjoyable paper celebrating Sir David Cox's 80th birthday, looking a decade into the future from 12 years ago... it aged well.
The very model of a model of a review. Zeger et al. (2004) A Cox Model for Biostatistics of the Future.
Contains an accessible description of hierarchical models (such as the Cox process), and also cites Royall's 1997 text, Statistical Evidence: a likelihood paradigm.
Royall's Statistical Evidence critiques Bayesian inference (which, as mentioned in the text, A. W. F. Edwards reviewed in 2000).
It's interesting to note here that "hierarchical [Bayesian] models are especially [used] as models for spatial or longitudinal data", as the spatial aspect of the Gsponer lab's model, for example, is the protein sequence, and its longitudinal dimension would be evolutionary change [i.e. time].
As a fresh pair of eyes the perspectives in these 'old' reviews of the then-contemporary are an invaluable historical education: for example, it's surprising (humbling I suppose) to read that generalised linear models were not invented until 1972. GLMs use a 'link function' (such as the logarithm) to relate the expected response to a linear predictor, i.e. one of the form y = mx + c, given an error structure (such as Poisson errors for count data, binomial errors for data on proportions, or exponential errors for survival analysis data).
Left: the error structure of various types of data (distributions named in-text here); right: descriptions for the four main deviance ('lack of fit') measures used to model errors in GLMs. Chapter 12 (Other Response Variables), in Crawley (2014) Introduction to Statistics using R, 2nd edition
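To see what a link function does in practice, here is a toy sketch (not a fitted model). The slope and intercept are made-up numbers, and the pairing of the log link with Poisson counts and the logit link with binomial proportions follows the usual textbook defaults rather than anything specific to Crawley's chapter.

```python
from math import exp, isclose, log


def linear_predictor(x, m, c):
    """The familiar straight line, eta = m*x + c, on the link scale."""
    return m * x + c


def inverse_log_link(eta):
    """Log link (e.g. Poisson count data): expected count = exp(eta) > 0."""
    return exp(eta)


def inverse_logit_link(eta):
    """Logit link (e.g. binomial proportions): expected proportion in (0, 1)."""
    return 1 / (1 + exp(-eta))


m, c = 0.5, 1.0  # hypothetical coefficients, not estimated from data
eta = linear_predictor(2.0, m, c)

expected_count = inverse_log_link(eta)
expected_proportion = inverse_logit_link(eta)
print(eta, expected_count, expected_proportion)
```

The point is that the straight line lives on the link scale, so predictions stay within the range the error structure allows (positive counts, proportions between 0 and 1).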
The GLM class essentially represents the state of the art for modelling the relationship between a set of mutually independent univariate responses and associated vectors of explanatory variables.
...note that one way to extend Nelder and Wedderburn's GLM class for independent responses Yi is to introduce a latent stochastic process into the linear predictor, so defining a generalized linear mixed model (GLMM).
The explosion of information technologies during the last few decades has changed forever the way in which empirical science is conducted. The collection, management and analysis of enormous data sets has become routine. The standard responses in biostatistical research have radically changed. Journal articles 20 years ago dealt mainly with binary, count or continuous univariate response variables. Generalized linear models was a breakthrough because it unified regression methods for the most common univariate outcomes.
But in today's studies, the response is commonly of very high dimension: an image with a million discrete pixels; a micro-array with a continuous measure of messenger RNA binding for each of 30,000 genes; a schizophrenia symptoms questionnaire with 30 discrete items. The intrinsically multivariate nature of data has been made possible by fast computers with inexpensive storage. The emerging fields of computational biology, bioinformatics and data mining are attempts to take advantage of the exponential growth in digitally-recorded information and in the computing power to deal with it.
Bioconductor demonstrates how the Internet can re-shape biostatistical research to involve larger teams of statisticians, computer scientists and biologists, loosely organized to achieve a common goal. A cautionary note is that an understandable focus on the computational challenges of bioinformatics brings with it a danger that the continuing importance of fundamental statistical ideas such as efficient design of incomplete block experiments may be forgotten.
Fisher (1925) Statistical Methods for Research Workers - reviews of the work are listed on Wikipedia
Statistics for experimental biologists: Putting the methods you use into context (defining frequentist, Bayesian, information-theoretic and likelihood methods)
Gelman (2013) "Not Only Defended But Also Applied": The Perceived Absurdity of Bayesian Inference, recounting William Feller's
notorious dismissal of Bayesian statistics, which is exceptional... in its intensity... [combining] a perhaps-understandable skepticism of the wilder claims of Bayesians with a naïve (in retrospect) faith in the classical Neyman–Pearson theory to solve practical problems in statistics.
Also discusses 'the link between Bayes and bogosity' and the perils of taking the prior at face value: "as a reasoning based on an 'infinite population of machines'". As mentioned in the text, see further links on this in Gelman's 2015 blog post
Biau et al. (2009) P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers
Richard Doll (2002) Proof of Causality: Deduction from Epidemiological Observation. Perspectives in Biology and Medicine -- a Fisher memorial lecture given in 2002,
last week's Sunday Bayes blog post from Alex Etz, "A brief history of Bayesian stats", who notes that:
Bayes' theorem was really formulated by Laplace. By all accounts, we should all be Laplacians right now.
There's a historical account of the specifics over at LessWrong, a summary of the popular history in Sharon McGrayne's The Theory That Would Not Die.
McGrayne gave a talk to Google employees in 2011, "The Theory That Would Not Die": How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, diving into some splendid anecdotes featuring Alan Turing, Claude Shannon, and Cold War history the LessWrong post treads more lightly over.
Introducing her, the host calls it "a very Googley book... Bayes' Theorem is used all over the place, and when I asked [the Google staff] if [they] were interested, it was either the first or the second most popular book that I have ever put up...". Her talk is really worth a watch:
By today's standards, Richard Price would be considered Thomas Bayes's co-author. If however there were justice in this world, Bayes' rule should be named after someone else entirely, and that is the great French mathematician Pierre Simon Laplace, who's better known today for the Laplace transform.
As a young man of 25, Laplace discovers the rule independently of Bayes, in 1774, and calls it the probability of causes. Laplace, also unlike Bayes, was the quintessential scientific researcher. He mathematised every science known to his day, and he spends the next 20 years off and on, in the midst of this enormous career, developing what we call Bayes' rule into the form it's used today (and actually used it).
But when Laplace dies, in 1827, the Western world begins almost a manic fad collecting precise and objective facts. There were clubs that collected them. Even women could do it!
Some of the famous numbers were the chest sizes of Scottish soldiers, the number of Prussian officers who were killed by kicking horses, the number of victims of cholera...
And with lots of these precise numbers at their disposal, any up-to-date statistician rejected Bayes' rule. They preferred to judge the probability of an event by the frequency that it occurred--nine times out of ten, three out of four, and so on. And eventually they will become known as the Frequentists, and the Frequentists become the great opponent of Bayes' rule up until quite recently because for them, modern science requires both objectivity and precise answers. And Bayes, of course, calls for a measure of belief and approximations and the Frequentists called that quote "subjectivity run amuck," "ignorance coined into science." By the 1920s, they were calling it, saying that Bayes "smacked of astrology, of alchemy", and another said, "We use Bayes formula with a sigh as the only thing available under the circumstances." Now, the surprising thing that I discovered in all of this time is the theorists and the philosophers denounced Bayes' rule as subjective.
People who had to deal with real world emergencies, who had to make one-time decisions based on scanty data, they kept right on using Bayes' rule - because they had to make do with what they had...
Robert E. Kass (of the Carnegie Mellon stats. dept.) highlights one of his contributions to McGrayne's book on the success of Bayesian inference:
Bayes' rule is influential now in ways its pioneers could never have envisioned, Rob Kass emphasizes. "Neither Bayes nor Laplace recognized a fundamental consequence of their approach, that the accumulation of data makes openminded observers come to agreement and converge on the truth. Harold Jeffreys, the modern founder of Bayesian inference for scientific investigation, did not appreciate its importance for decision-making. And the loyalists of the 1960s and 1970s failed to realize that Bayes would ultimately be accepted not because of its superior logic but because probability models are so marvelously adept at mimicking the variation in real-world data."
"In modern terms, the problem solved by Bayes in a quite convoluted notation was the inference of the binomial parameter p, conditioned on x successes in n trials, under the assumption that all values of p were a priori equally likely: $P(p \mid n, x) = P(x \mid n, p) \,/\, \sum_p P(x \mid n, p)$" - Giulio D'Agostini: from The mathematics of beliefs, section 5 of Probably a discovery: Bad mathematics means rough scientific communication (2011)
Going one step further, physicist Giulio D'Agostini writes that "Bayes did not really derive this formula, but only developed a similar inferential reasoning for the parameter of Bernoulli/binomial trials".
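The formula D'Agostini quotes can be tried out directly by evaluating the binomial likelihood on a grid of candidate values of p and normalising. This is only a sketch under my own assumptions: the data (7 successes in 10 trials), the grid resolution, and the function names are all made up for illustration.

```python
from math import comb


def binomial_likelihood(x, n, p):
    """P(x | n, p): probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def grid_posterior(x, n, grid_size=1001):
    """P(p | n, x) on an evenly spaced grid, with all p a priori equally likely."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    likelihoods = [binomial_likelihood(x, n, p) for p in grid]
    total = sum(likelihoods)  # the sum over p in the denominator
    return grid, [value / total for value in likelihoods]


# Hypothetical data: 7 successes in 10 trials.
grid, posterior = grid_posterior(x=7, n=10)

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
most_probable_p = grid[posterior.index(max(posterior))]
print(posterior_mean, most_probable_p)
```

With a flat prior the most probable value sits at the observed frequency x/n, while the posterior mean lands close to (x + 1)/(n + 2), which is Laplace's rule of succession.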
Laplace's fifth general principle of the probability calculus states:
If one calculates a priori {(a)} the probability of an event that has occurred, and {(b)} the probability of the compound event made up of this event and another event that one expects, then the second probability divided by the first will give the probability of the expected event conditional upon the occurrence of the observed event.
i.e. [with y as data, and θ as unobserved parameters], letting {(a)} be denoted P(y), and {(b)} (the joint distribution) be denoted P(θ, y), Laplace defines so-called Bayes' rule: $P(\theta \mid y) = P(\theta, y) \,/\, P(y)$.
His sixth general principle develops that concept further, to arrive at the full sense of Bayes' theorem:
The greater the probability of an observed event given anyone of a number of causes to which that event may be attributed, the greater the likelihood of that cause {given that event}. The probability of the existence of anyone of these causes {given the event} is thus a fraction whose numerator is the probability of the event given the cause, and whose denominator is the sum of similar probabilities, summed over all causes. If these various causes are not equally probable a priori, it is necessary, instead of the probability of the event given each cause, to use the product of this probability and the possibility of the cause itself. This is the fundamental principle of that branch of the analysis of chance that consists of reasoning a posteriori from events to causes.
So for an event (i.e. data) y, and [multiple potential] causes θ (in the sense that the diagnostic test B discussed earlier expanded to B or Bc, its complementary event) this is the expanded form of Bayes' rule: $P(\theta \mid y) = \dfrac{P(y \mid \theta) \cdot P(\theta)}{\sum_{\theta} P(y \mid \theta) \cdot P(\theta)}$
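Laplace's sixth principle translates almost word for word into code: multiply each cause's prior by the probability of the event given that cause, then divide by the sum over all causes. The sketch below applies it to a two-cause diagnostic test; the prevalence and test error rates are invented numbers, and the function name `posterior` is my own.

```python
from math import isclose


def posterior(priors, likelihoods):
    """P(cause | event) for each cause, from P(cause) and P(event | cause)."""
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())  # denominator: summed over all causes
    return {cause: value / evidence for cause, value in joint.items()}


# Hypothetical diagnostic test: 1% prevalence, 95% sensitivity, 5% false positive rate.
priors = {"disease": 0.01, "no disease": 0.99}
likelihood_of_positive = {"disease": 0.95, "no disease": 0.05}

result = posterior(priors, likelihood_of_positive)
print(result)
```

Because the causes are not equally probable a priori, the prior dominates: even after a positive result from a fairly accurate test, "no disease" remains the more probable cause under these made-up numbers.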
For more advanced material in stats and machine learning, take a look around the inconspicuously-named videolectures.net (I've previously written about David Mackay's excellent Gaussian Process Basics talk from there).
Talk given by Jean-Michel Marin on Approximate Bayesian Computing methods for model choice in machine learning, which notes applications in biology (specifically population genetics).
The goal is to recover some elements of populations' history. To analyse the structure of genetic data, these methods use the gene trees.
The formulation of a model is constrained by an evolutionary scenario that mimics the historical and demographic reality. Such a scenario summarizes the evolutionary history of populations by a sequence of demographic events from an ancestral population.
Our datasets are composed of genetic information coming from several locus [on the genome], and right now we have more and more...
The relations between these genetic positions can be modelled in different ways. We can have a common genealogy, which says that all locus are completely independent, or a lot of recombination (infinite recombination on the genome) to give completely independent genealogies. I give an example where I consider a model where I have independent loci which are a target at some moment... and for neutral models, [Kimura](https://en.wikipedia.org/wiki/Motoo_Kimura) wrote there is no selection effect. There is a lot of work in population genetics with respect to finding some selection effect...
To get acquainted with all things Bayes, see the textbook: 'Bayesian Data Analysis', by Gelman et al.
Sean Eddy's 2004 article in Nature Biotechnology gives a good introduction and highlights many more resources: What is Bayesian statistics?
Kent Staley's 2014 thesis, Pragmatic Warrant for Frequentist Statistical Practice: The Case of High Energy Physics, which explains the use of a 5σ standard in declaring the discovery of the Higgs boson significant at CERN's ATLAS Experiment, before critiquing significance testing, and explaining how drawing on both frequentist and Bayesian interpretations results in High Energy Physicists' "philosophy of pragmatism" somewhere betwixt the two.
Forum discussion on the difference between likelihood ratio and Bayes factor
Wagenmakers (2016) "Bayesian benefits for the pragmatic researcher", an approachable 10 page paper aimed at researchers in psychology, a field that makes extensive use of Bayesian stats…
…as I found out when I accidentally attended a conference on Bayesian psychology research 2 years ago (still, made for some nice notes: Bayesian inference as a methodological innovation in the life sciences led by Eric-Jan Wagenmakers, Professor of Neurocognitive Modelling at the University of Amsterdam's Department of Psychology).
An interesting paper in Genetic Epidemiology (2008) on the advantages of using Bayes factors for GWAS: Bayes Factors for Genome-Wide Association Studies: Comparison with p-values
As a summary measure of noteworthiness p-values are difficult to calibrate since their interpretation depends on minor allele frequency (MAF) [Ed.: "the frequency at which the least common allele occurs in a given population"] and, crucially, on sample size. A consequence is that a consistent decision-making procedure using p-values requires a threshold for significance that reduces with sample size, contrary to common practice.
At the last statistical genetics meeting I attended at the local MRC research centre I remember p-values on Manhattan plots (Gosia Trynka presented the GoShifter method, which uses "only high-frequency (MAF > 5%) bi-allelic autosomal SNPs with a genome-wide significant (p < 5 × 10⁻⁸) association with any trait"), as did Ewan Birney (director of the European Bioinformatics Institute) in a recent talk at the NIH:
Statistical law, mythology, and Peter Donnelly has divined that 5 × 10⁻⁸ is a good genome-wide significance level, that captures the multiple testing that's going on. Do not really ask anyone to justify that number, but we're all super comfortable... er... with... 5 × 10⁻⁸. But we of course did a hundred factors, and tested a hundred different [cardiac imaging] dimensions, so we needed to penalise ourselves by another factor of a hundred.
And so it was these points here, coming up above 5 × 10⁻¹⁰ which made me just... very very happy. My belief is that it's the first time that an unsupervised approach on imaging genetics has worked. I don't believe anybody else has done this. And that is all credit to Hannah [Meyer] and Antonio [de Marvao].
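The arithmetic behind "another factor of a hundred" is simply dividing the genome-wide threshold by the number of phenotypes tested. Calling this a Bonferroni-style correction is my label, not Birney's, and the helper name and example p-values below are hypothetical.

```python
from math import isclose

GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level
N_PHENOTYPES = 100            # imaging dimensions tested, as described in the talk

# Penalise for testing 100 phenotypes: divide the threshold by 100.
adjusted_threshold = GENOME_WIDE_THRESHOLD / N_PHENOTYPES


def is_significant(p_value, threshold=adjusted_threshold):
    """True if an association clears the multiple-testing-adjusted threshold."""
    return p_value < threshold


print(adjusted_threshold)
print(is_significant(1e-10), is_significant(1e-9))  # made-up p-values
```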
Ewan also discussed EMBL-EBI colleague Oliver Stegle's Probabilistic Estimation of Expression Residuals (PEER), "which consists of a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles by using factor analysis methods" (Stegle et al., 2012), developed from Stegle et al. 2010.
PEER is a latent factor modelling system with a very strong Bayesian flavour... now if that sounds pretty cool, and sexy, that's fine, you too can say these words if you want to, and draw this diagram. I don't actually know how the Bayesian magic works. I just know that it works.
But let me just tell you how PEER is set up, you have phenotype here, this is 1500 people, by 27,000 dimensions, and you say that this is broken up into known factors (like weight, or sex and age), hidden factors, and then residuals. Now PEER's been around for a long time, and is used a lot in eQTL studies, and in eQTL studies, you hope that the hidden factors are 'missing batch problems', or weird things going on in your lab... and residuals, you hope, is your cleaned up signal. But we actually wanted to use it in the opposite way, where we wanted the signal - the genetic signal - to go into these 'hidden factors', and nothing to go into the residuals.
Bruno de Finetti, Philosophical Lectures on Probability (2008), ch. 4 covers Bayes' Theorem, though the more general principles of subjective/objective views of probabilities are relevant to the Bayesian vs. 'classical' or 'orthodox' statistical viewpoints.
"Subjective probability is... an aid to give a reliable measure of what cannot be measured objectively."
"It is commonly said that an event E has a certain probability P(E), meaning by the term 'event' every event of a certain type (for instance, obtaining Heads by tossing a coin) rather than a single well-defined fact. This habit is not erroneous in itself: after all, it is just a matter of terminology. Nonetheless, it has the disadvantage of suggesting that one actually wants to refer to the 'trials of the same event,' in the sense intended by the objectivists. Probability, on the other hand, always concerns single events, even under the hypothesis that the trial is equally probable. Each event is what it is individually. And whenever probability is to be judged, such a probability should be thought of as conditional on a particular state of information.
Strictly speaking, the latter should be made explicit in full: yet this is practically impossible, for it would mean condensing into one proposition the entire experience that the bearer of the probability judgment has accumulated since birth, or even since he was in his mother's womb."
This explanation could be easily grasped even by schoolboys. Nonetheless there are many philosophical points of view that in an attempt at turning into objective what is subjective, arrive at complicated alternative formulae without verifying whether the latter have a foundation or not. Since 1764 - the year of the publication of Bayes' essay (Bayes [1764] 1970) - there has been an ongoing lively debate on this point. Among the major subsequent probabilists there have been divergences of opinion on some alternative empirical formulae, which were, at first sight, quite reasonable. I. J. Good calls those regolette (some of which are artful and some simplistic) "adhockeries." I cannot deny that sometimes they might even have an approximative meaning: yet none of them is grounded on reasoning that is directly derived from the concept of conditional probability.
The defendants of those alternative rules argue that they are, in practice, more useful than Bayes' theorem and try to show that they better account for inductive behaviour. I do not mean to show any disrespect towards other scholars but I believe that those attempts lack any foundation. Although I am respectful of others' work, I cannot help but notice that every formula alternative to Bayes' theorem is grounded on empirical and qualitative arguments and therefore lacks a robust foundation.
...objectivists think that objective data, instead of being a piece of evidence of a circumstance that helps one forming an opinion, constitutes just the essence of probability.
...And there is no big difference if we look at the issue from an empirical point of view. One is not allowed, in fact, to say: “these are the mathematician’s or the philosopher’s quibbles, there is no need to be so subtle.” The reason is that there is a profound difference between those who see in objective circumstances reasons to argue for some probability values and those who pretend to define probability in terms of those circumstances instead. It must be added, however, that it is not true that on the one side of this watershed everything is objective whilst on the other everything is subjective. But if we take as objective what – if grounded on considerations which may well be objective and reasonable – is but a subjective judgment, we lose the possibility to examine, for any single case, all its concurrent circumstances. Moreover, in such a case, a person who makes a probabilistic judgment, would not be responsible for it.
Some try to turn the judgement of probability into something objective through the concept of frequency. For instance, instead of saying: “this is my evaluation,” they say: “this is the result of a series of a thousand tosses of a coin.” But this result is a fact that happened only once: a series of heads only or of tails only could have likewise happened. It is reasonable to attach a very small probability to such an outcome: but likewise reasonable would be to attach a small probability to any other succession of a thousand tosses, the one which actually occurred included. The identification of probability and frequency is an attempt to give an objective meaning to probability by means of useless complications.
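To make the thousand-toss point concrete for myself, here is a minimal sketch (my own, not from the lectures). It assumes independent tosses of a fair coin, which in de Finetti's terms is itself a judgment rather than a given, and the function names `sequence_probability` and `count_probability` are made up for illustration. Under that assumption, every exact sequence of a thousand tosses, all heads included, gets the same tiny probability; only the count of heads distinguishes "typical" from "surprising".

```python
from fractions import Fraction
from math import comb


def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one exact sequence of 'H'/'T' under independent tosses.

    The fair-coin default and the independence of tosses are assumptions
    of this sketch, not something the quoted text prescribes.
    """
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob


def count_probability(n, k, p_heads=Fraction(1, 2)):
    """Probability of exactly k heads in n tosses, in any order."""
    return comb(n, k) * p_heads**k * (1 - p_heads) ** (n - k)


n = 1000
all_heads = "H" * n          # the "surprising" series
mixed = "HT" * (n // 2)      # one particular unremarkable-looking series

p_all_heads = sequence_probability(all_heads)
p_mixed = sequence_probability(mixed)

# Both exact sequences are equally improbable: (1/2)**1000 each.
print(p_all_heads == p_mixed)

# What differs is how many sequences share a given head count.
print(float(count_probability(n, 500)), float(count_probability(n, n)))
```

Exact fractions are used so that the equality check is not blurred by floating-point underflow; the final line converts to floats only for display.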
Related post: excerpt from chapter 5 (Physical Probability and Complexity) in this book of de Finetti's lectures; the transcript of a conversation with his students on probability and physical laws.
Bonus: further listening
Roger D. Peng (one of the professors on the JHU online course) and Hilary Parker (data analyst at Etsy) discuss the p-value in episode 11 of the Not So Standard Deviations podcast: tune in here