Here’s a good question that came up in the LinkedIn Statistics & Analytics Consultants discussion group (from John Rogers, slightly edited here):
I am curious as to what defines “Big Data”. Is it considered a population or a large sample? Are we talking parameters or statistics?
Anyone interested in the term “Big Data” and its implications would do well to read Gil Press’s article “A Short History of Big Data” (http://onforb.es/16bw9Kt).
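When you generalize from the data you have to any other case, the data is a sample. So the results of a Big Data analysis are statistics (estimates), not parameters. However, Big Data sources are usually convenience samples, not random samples, so the assumptions behind classical statistical analyses are not met. Standard errors and confidence intervals assume that every member of the population had a known chance of being selected. With a convenience sample, a larger n shrinks the standard error but does nothing about selection bias. That’s one reason why we should be cautious in interpreting Big Data, and not get too uppity about our fancy analyses.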
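To make the convenience-sample point concrete, here is a minimal simulation sketch. All of the numbers are hypothetical and chosen only for illustration: 30% of a made-up population has some trait, and people with the trait are three times as likely to show up in the “Big Data” source (60% vs. 20% inclusion probability). The function name `simulate` and its parameters are likewise just for this example. It compares a small simple random sample with a much larger convenience sample drawn from the same population.

```python
import random
import statistics


def simulate(seed: int = 2014, pop_size: int = 100_000, srs_size: int = 1_000):
    """Compare a small simple random sample with a large convenience sample.

    Hypothetical setup: 30% of the population has a trait (coded 1).
    Members with the trait are three times as likely to appear in the
    convenience source (inclusion probability 0.60 vs. 0.20).
    """
    rng = random.Random(seed)
    population = [1 if rng.random() < 0.30 else 0 for _ in range(pop_size)]
    parameter = statistics.fmean(population)  # the true population share

    # Simple random sample: every member is equally likely to be chosen.
    srs = rng.sample(population, srs_size)

    # Convenience sample: whether a member is included depends on the trait itself.
    convenience = [x for x in population
                   if rng.random() < (0.60 if x == 1 else 0.20)]

    return {
        "parameter": parameter,
        "srs_n": len(srs),
        "srs_estimate": statistics.fmean(srs),
        "convenience_n": len(convenience),
        "convenience_estimate": statistics.fmean(convenience),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>22}: {value}")
```

Under these assumptions, the convenience sample holds roughly 32,000 records, about 30 times more than the random sample, yet its estimate should land near 0.56 rather than the true share of about 0.30. The 1,000-record random sample should land close to 0.30. Making the convenience sample even bigger would only make it more precisely wrong.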
Date: October 20, 2014