As someone who has used, created, and studied statistics for the better part of the past decade (or even longer, if you count my days studying the backs of baseball cards as a child), nothing gives me greater joy than watching the rise of terms such as “big data,” “analytics,” and “data science.” Although these buzzwords are so new that they often don’t have set definitions, data analysis itself is nothing new.
As long as conscious decisions have been made - by man, animal, or machine - data analysis has been part of this world. It predates spreadsheets, computers, and even math, by thousands of years. For example, making the basic decision of what to eat is actually data analysis. You consider the descriptive elements of a food item (does it provide energy, does it taste good, is it poisonous, etc.) and then you decide whether or not to eat it. This is analogous to inputting different variables into a model that spits out an output.
What matters in both a mental model that takes seconds and a statistical model that takes hours is an understanding of the data. The first, often forgotten, assumption when creating a model is that the relationship of one variable to another actually makes sense to study. This seems so intuitive and simple that neglecting this step would be silly, but relationships in a data set are complex. They require an understanding of what the data means before a valid inference or conclusion can be drawn.
“Causation” and “correlation” are not the only ways to classify data relationships. Instead, they are two ends on a classification spectrum. This is what separates analysts from computers. Any computer can create a great model from data, but it is the expert who derives meaning and application from that model. Often a single data point can have many levels of meaning, which is something only an analyst of the data can understand.
This is where PareUp fits into the world of food waste data. Very little academic research has been done on commercial food waste, but that does not mean it doesn’t exist. It may be raw and less extensive than what one might want in a perfect world, but the relationships are still there to be found.
Let’s look at a simple example. Imagine a hypothetical bakery gives us data with only two variables: time and the number of items that are considered waste at the end of each day. There is not a lot to work with initially, but a closer look shows that something is going on. A simple plot of time versus waste may look like this:
After a superficial analysis of the data, it is easy to conclude that there is some clear relationship. But where does it fall on the correlation/causation spectrum? It would be pretty reckless to conclude causation, that each passing day of the month causes an increase in waste. It also wouldn’t really make sense. During this time period, there might have been a number of other changes, like in-store controlled activities (such as changes in production, price, or item variety), changes that impact store traffic (such as economic or environmental factors), or changes in attitude towards the store or the items in it. In this case, time may actually be a proxy for any or all of those factors. What we actually see is a pretty clear correlation, and analyzing why these variables are correlated will give us a good start on figuring out the cause(s) of the underlying trend. PareUp is working with a number of stores to do just that.
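For readers who want to see the bakery example in numbers, here is a minimal sketch of measuring how strongly day and waste move together. The figures are made up purely for illustration (they are not real bakery or PareUp data), and the function name `pearson_r` is my own. It computes the standard Pearson correlation coefficient, which ranges from -1 to 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: day of the month and items wasted at closing time.
days = list(range(1, 11))
waste = [4, 5, 5, 7, 6, 8, 9, 9, 11, 12]

r = pearson_r(days, waste)
print(f"Pearson r between day and waste: {r:.2f}")
```

A value of r close to 1 only says that the two columns rise together. It says nothing about why, so the work of checking production, price, traffic, and the other factors that time may be standing in for still has to be done.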
Many industries, from marketing to athletics, have invested heavily in data analysis, but it is still very young in the world of food. This is due not only to a lack of data but also to a lack of understanding about the story data can tell. Statistics do not discriminate by industry. The statistical techniques and theories that help marketers target their advertisements to individuals are the same ones that help baseball franchises build winning teams, and the same ones that PareUp is using to help stores reduce waste. What we do know, though, is that there is a causal relationship between more data and a reduction in waste. We’re just getting started, and we look forward to sharing what we find!