slipstream @jrlittle - Tumblr Blog

Conclusion

Excel users have a strong mental model of how data analysis works, and this makes learning to program more difficult. However, learning to program will allow you to do things that you can’t do easily in Excel.

#r #excel #programming #analysis

A quick introduction to Apache Spark

#spark #hadoop #amplab

The Long Read: The ability of statistics to accurately represent the world is declining. In its wake, a new age of big data controlled by private companies is taking over – and putting democracy in peril

“Statistics began life as a tool through which the state could view society, but gradually developed ... [to become] one of many pillars of liberalism and Enlightenment.”

“The declining authority of statistics – and the experts who analyse them – is at the heart of the crisis that has become known as ‘post-truth’.”

In contrast, “data analysts’ skills are often not developed [from or] for the study of society.”

As datafication becomes a normative foundation, “... it is not just the quantity of data that is different. It represents an entirely different type of knowledge, accompanied by a new mode of expertise.”

... there is no fixed scale of analysis (such as the nation) nor any settled categories (such as “unemployed”).

A “post-statistical” society questions whether indicators such as GDP and unemployment continue to carry political clout ... if they don’t, it won’t necessarily herald the end of experts, less still the end of truth. The question to be taken more seriously, now ..., is where the crisis of statistics leaves representative democracy.

#government data #technology #politics #office for national statistics #uk news #statistics #big data #guardian #the guardian #democracy

After 174 years, John Little is closing its last department store in Singapore. The remaining outlet in Plaza Singapura will shutter by the end of December. Read more at straitstimes.com.

That’s it. I’m closing all my Singapore options. Retail storefronts just can’t compete with the sexy allure of data science!

#data science #big data #data analytics

R has found its way into a good number of news groups who do data journalism. Andrew Flowers for FiveThirtyEight talks about how they use the statistical computing language throughout their workflo…

22-minute video (you can alter the playback speed). Informative and interesting.

#r #[R] #538 #fivethirtyeight

Quick Start to R for Data Wrangling

Introduction to R - interactive and online tutorial by DataCamp

Data Wrangling with R & RStudio (dplyr & tidyr) -- Video

Introduction to dplyr

Quick Start to Exploratory.io (Exploratory.io is a User Interface [UI] to pre-configured R with dplyr, tidyr, and ggplot2, aka Hadleyverse)

#quickstart #rstudio #dplyr #tidyr #[r] #R

Quick Start Introduction to Tableau

Tableau for Students

Tableau Guide

Tableau Introduction and Workshop with data (Video)

Lynda Video for Tableau

#tableau #quickstart

Created by libjohn

#exploratory.io

Workshops by Duration

#exploratory.io

Registrations by Workshops Sessions Offered

#exploratory.io

Registrants’ Academic Status by Academic Year

#exploratory.io

Total Workshop Sessions offered by Academic Year

#exploratory.io

Unique Workshops offered by Academic Year

#exploratory.io

I recently had a chance to play with Exploratory.io. The tool/website/desktop application bills itself as “an interactive and reproducible real data wrangling and analysis experience powered by R and visualization”.

Upon download, what you find is a very nice drag-and-drop interface combined with a cloud-based sharing platform. In essence, the tool masks much of the command-line complexity inherent to R and to the popular dplyr and ggplot2 packages (I think a few additional packages are also loaded), which simplifies the wrangling process. And it does this without removing the command line, so tweaking is still possible.

Of course, with simplification comes a loss of customization. I think it’s a fair trade-off. It may fall short of being a total, unbridled power tool, but it delivers a quicker start and a gentler learning curve by putting many of the necessary data-munging commands at your fingertips, relieving you of the need to memorize the arcana of [R] commands and switches.

For the non-data-scientist (and perhaps for the data scientist, too), this provides a welcome shortcut to simple analysis.

Brief Definitions from ShellyPalmer on Data Science

Brief excerpts from “What Do You Do With Data”, referenced in another ShellyPalmer article that notes data science literacy does not demand fluency ...

Transformational Analytics

Aggregation – a class of techniques used to summarize data, including basic statistics such as means and weighted averages, medians, the Gaussian distribution and standard deviations. Other aggregation techniques include probability distribution fitting (fitting a probability distribution to repeated measurements of a variable phenomenon – remember “method of moments” and “maximum likelihood” from Stats class?) and good, old-fashioned plotting of points on a graph.
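To make aggregation concrete, here is a minimal Python sketch using only the standard library’s `statistics` module (R users would reach for `mean()`, `weighted.mean()`, `median()` and `sd()`). The attendance numbers and weights are made up for illustration. For a Gaussian, the method-of-moments and maximum-likelihood estimates happen to coincide, so one fit covers both ideas from Stats class.

```python
import statistics

# Hypothetical sample: attendance at seven workshop sessions (made-up numbers).
attendance = [12, 15, 9, 20, 15, 11, 18]
# Hypothetical weights, e.g. counting double-length sessions twice.
weights = [1, 1, 1, 2, 1, 1, 2]

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

summary = {
    "mean": statistics.fmean(attendance),
    "weighted_mean": weighted_mean(attendance, weights),
    "median": statistics.median(attendance),
    "stdev": statistics.stdev(attendance),  # sample standard deviation (n - 1)
}

# Distribution fitting: for a Gaussian, method of moments and maximum
# likelihood both estimate mu as the mean and sigma as the standard
# deviation computed with an n (not n - 1) denominator.
fitted = statistics.NormalDist(
    mu=statistics.fmean(attendance),
    sigma=statistics.pstdev(attendance),
)
```

Plotting the points (and the fitted curve) is the “good, old-fashioned” last step, left out here to keep the sketch dependency-free.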

Enrichment – a set of techniques employed to add information to, or fill gaps in, a data set – for example, adding zip + 4 to five-digit zip codes, appending purchase data or credit scores or even simply standardizing prefixes or suffixes.
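A small enrichment sketch follows, assuming plain dictionaries as records. The ZIP codes and the ZIP+4 lookup table are hypothetical stand-ins (a real pipeline would query an address-verification service); the point is the pattern of filling gaps from a lookup and standardizing prefixes, keeping the original value when no answer exists.

```python
# Hypothetical customer records with inconsistent name prefixes.
records = [
    {"name": "dr. Ada Lopez", "zip": "12345"},
    {"name": "MR Sam Chen", "zip": "54321"},
    {"name": "Kim Park", "zip": "99999"},
]

# Hypothetical lookup table: five-digit ZIP -> ZIP+4 (invented values).
ZIP4_LOOKUP = {"12345": "12345-6789", "54321": "54321-0001"}
PREFIXES = {"dr": "Dr.", "mr": "Mr.", "mrs": "Mrs.", "ms": "Ms."}

def standardize_prefix(name):
    """Normalize a leading honorific such as 'dr.' or 'MR' to one spelling."""
    first, _, rest = name.partition(" ")
    key = first.rstrip(".").lower()
    if key in PREFIXES and rest:
        return f"{PREFIXES[key]} {rest}"
    return name

def enrich(record):
    """Return a copy with a standardized name and an added ZIP+4 field.

    The gap is filled only when the lookup has an answer; otherwise the
    original five-digit ZIP is carried over unchanged.
    """
    out = dict(record)
    out["name"] = standardize_prefix(record["name"])
    out["zip4"] = ZIP4_LOOKUP.get(record["zip"], record["zip"])
    return out

enriched = [enrich(r) for r in records]
```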

Processing – everything from data munging or data wrangling (the cleaning up of data) to entity extraction (identifying key terms in unstructured data that have value) to true feature extraction (building derived values from existing data).
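The three flavors of processing can be sketched in a few lines of Python, assuming messy rows stored as strings. The rows are hypothetical, and the hashtag regex is only a toy form of entity extraction (real systems use NLP tooling); `is_weekend` and `n_tags` illustrate derived features.

```python
import re
from datetime import date

# Hypothetical raw rows: stray whitespace and amounts stored as text.
raw = [
    {"when": " 2016-11-03 ", "amount": "$1,200.50", "note": "Spark demo #spark #hadoop"},
    {"when": "2016-11-05", "amount": " $85 ", "note": "R intro #R #dplyr"},
]

HASHTAG = re.compile(r"#(\w+)")

def process(row):
    # Munging: strip whitespace and parse text into proper types.
    when = date.fromisoformat(row["when"].strip())
    amount = float(row["amount"].strip().lstrip("$").replace(",", ""))
    # Entity extraction (toy version): pull hashtags out of free text.
    tags = [t.lower() for t in HASHTAG.findall(row["note"])]
    # Feature extraction: derive new values from existing fields.
    return {
        "date": when,
        "amount": amount,
        "tags": tags,
        "is_weekend": when.weekday() >= 5,  # Saturday = 5, Sunday = 6
        "n_tags": len(tags),
    }

processed = [process(r) for r in raw]
```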

Learning Analytics

Regression – a common way to predict the future based on the past by modeling the relationships among variables. There are many types of regression techniques, but all share the common goal of predicting the value of a dependent variable when related (independent) variables are available, or estimating the effect of an explanatory variable on the dependent variable.
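The simplest member of the family is ordinary least squares with one explanatory variable. The sketch below fits y = intercept + slope · x and then predicts the dependent variable at a new x; the slope is the estimated effect of x on y. The data are hypothetical and deliberately lie exactly on y = 1 + 2x so the answer is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: registrations (y) versus sessions offered (x).
sessions = [1, 2, 3, 4, 5]
registrations = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x

intercept, slope = fit_line(sessions, registrations)
predicted = intercept + slope * 6  # predict the dependent variable at x = 6
```

With real, noisy data the points scatter around the line and the fit minimizes the sum of squared vertical distances.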

Clustering – just what it sounds like. The goal is to group a set of data points so that the ones with the most in common are closest together. Importantly, clustering is not a specific formula; it can be accomplished with any of a number of algorithms, and it is almost always an iterative process.
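k-means is one common clustering algorithm and shows the iterative part nicely: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. This is a bare-bones sketch with a naive initialization (the first k points) and made-up 2-D data; production code would use smarter seeding and a library implementation.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids

# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(points, k=2)
```

Here the naive start puts both initial centers in the low group, and it takes a couple of passes for one center to migrate to the high group, which is exactly the iteration the definition mentions.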

Classification – algorithms and other techniques used to identify to what category or subpopulation a data point belongs. When speaking about classifications, you must be careful to also identify the discipline you are speaking about. Statisticians use the term differently than practitioners of machine learning do.
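In the machine-learning sense, a classifier learns from labeled examples and assigns a category to a new data point. k-nearest neighbors is about the simplest one: look at the k closest labeled points and take a majority vote. The features, labels and data below are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbors.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: (hours of R practice, hours of Excel use) -> user type.
train = [
    ((9.0, 1.0), "programmer"),
    ((8.0, 2.0), "programmer"),
    ((7.5, 1.5), "programmer"),
    ((1.0, 9.0), "spreadsheet"),
    ((2.0, 8.0), "spreadsheet"),
    ((1.5, 7.0), "spreadsheet"),
]

label = knn_classify(train, (8.5, 1.0))
```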

Predictive Analytics

Simulation – a set of techniques used to create a simulated environment for testing predictive models.
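One way to read “a simulated environment for testing predictive models” is a Monte Carlo check: generate many synthetic data sets from a process whose truth you know, run the model on each, and see whether it recovers that truth. In this sketch the true process (y = 1 + 2x plus Gaussian noise), the sample sizes and the seed are all assumptions chosen for illustration, and the model under test is a least-squares slope.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def simulate_once(rng, true_slope=2.0, n=50, noise=1.0):
    """Generate one synthetic data set from a known process and return
    the slope the model recovers from it."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + true_slope * x + rng.gauss(0, noise) for x in xs]
    return ols_slope(xs, ys)

rng = random.Random(42)  # fixed seed so the experiment is repeatable
estimates = [simulate_once(rng) for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)
```

If the model were biased, `mean_estimate` would drift away from the true slope of 2.0; the spread of `estimates` shows how much a single real-world fit might wobble.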

Optimization – a wide-ranging tool set for making optimal selections from a set of alternatives. Commonly used for pricing and maximizing yield.
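For the pricing case, the smallest possible optimization is picking the best option from a finite set of alternatives. The sketch assumes a hypothetical linear demand curve (units sold fall by 4 for every unit of price) and chooses the candidate price that maximizes revenue. Real pricing work swaps in estimated demand and proper solvers, but the objective-plus-alternatives shape is the same.

```python
def demand(price):
    """Hypothetical linear demand: units sold fall as price rises."""
    return max(0.0, 100.0 - 4.0 * price)

def revenue(price):
    """Objective to maximize: price times units sold."""
    return price * demand(price)

candidate_prices = [5, 8, 10, 12.5, 15, 20]
best_price = max(candidate_prices, key=revenue)
```

For this particular curve, revenue peaks at a price of 12.5 (50 units, 625 in revenue), so the search over candidates lands on the same answer the calculus would give.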

#data science #literacy #transform #predict #analytics
