Wednesday, October 22, 2014

Big Data: Parameters or Statistics?

Here’s a good question that came up in the LinkedIn Statistics & Analytics Consultants discussion group (from John Rogers, slightly edited here):

I am curious as to what defines “Big Data”. Is it considered a population, a large sample? Are we talking parameters or statistics?

My reply:

Anyone interested in the term “Big Data” and its implications would do well to read Gil Press’ article, A Short History of Big Data. http://onforb.es/16bw9Kt
When you generalize from the data that you have to any other case, the data is a sample. So, the results of Big Data analysis are statistics. However, Big Data sources are usually convenience samples, not random samples. So, the assumptions for classical statistical analyses are not met. That’s one reason why we should be cautious in interpreting Big Data, and not get too uppity about our fancy analyses.
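
To make the difference concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the population, the "activity" score, the spend formula, and the function name compare_samples are all hypothetical. It shows how a large convenience sample (only the most active people get logged) can miss the population mean badly, while a much smaller random sample lands close to it.

```python
import random
import statistics


def compare_samples(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a random sample and a convenience sample from one population."""
    rng = random.Random(seed)

    # Hypothetical population: each person has an activity score in [0, 1),
    # and the quantity of interest (spend) rises with activity.
    population = []
    for _ in range(population_size):
        activity = rng.random()
        spend = 50 + 100 * activity + rng.gauss(0, 10)
        population.append((activity, spend))

    # The parameter: the mean over the entire population.
    parameter = statistics.fmean(spend for _, spend in population)

    # Statistic from a random sample: every person is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)
    random_stat = statistics.fmean(spend for _, spend in random_sample)

    # Statistic from a convenience sample: only highly active people are logged.
    convenience_sample = [p for p in population if p[0] > 0.8]
    convenience_stat = statistics.fmean(spend for _, spend in convenience_sample)

    return {
        "parameter": parameter,
        "random_stat": random_stat,
        "random_n": len(random_sample),
        "convenience_stat": convenience_stat,
        "convenience_n": len(convenience_sample),
    }


if __name__ == "__main__":
    for name, value in compare_samples().items():
        print(f"{name}: {value:.1f}")
```

In this toy setup the convenience sample has many more rows than the random sample, yet its mean sits well above the population mean. More data did not fix the bias, because the way the data was collected is tied to the thing being measured.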


News galore

Data Mining for Dummies, my epic tome for beginning data miners, is available now.

Here’s the scoop:

Data Mining for Dummies is an easy-to-read new book for beginners in data mining, published by John Wiley and Sons and available through your favorite bookseller.
Data Mining for Dummies is for business people, information technology professionals and students who want to…
• Know what data mining is all about
• See what’s really involved in data mining, icky parts and all
• Find friendly expert guidance for getting started as a hands-on data miner
Data Mining for Dummies is written in a light, yet no-nonsense, style for readers who are new to data mining. You won’t need any special expertise to read and understand this book.
Beginners can learn the basics of data mining, including:
• Understanding data mining concepts
• Embracing a comprehensive data mining process
• Planning for data mining
• Gathering data from internal, public and commercial sources
• Preparing data for exploration and predictive modeling
• Building predictive models
• Selecting software and dealing with vendors
Author Meta S. Brown is a hands-on data miner who has educated thousands of beginners from industry, government and academia in the fundamentals of data mining. She’s known in the analytics community for her articles, books and talks on data mining, text mining and classical statistics, reaching out to audiences from novices to working professionals.
Here’s what Tom Khabaza, pioneering data miner and Founding Chairman of the Society of Data Miners, has to say about Data Mining for Dummies:
Meta S. Brown tells it like it is, more than anyone else in the field.
Data Mining for Dummies is the first data mining book for beginners which gives an accurate picture of what we data miners do. This is a landmark for the profession, and an essential tool for anyone learning or teaching practical data mining. I will be recommending it to everyone I meet: business people, students and teachers alike.
Where to find Data Mining for Dummies:
• Your favorite independent bookseller (find one on Indiebound http://bit.ly/1ruU9n0)
• Powell’s City of Books http://bit.ly/1qFLkQG
• Amazon http://amzn.to/1eFD3WI
• Barnes and Noble http://bit.ly/1qFLAz8
• Ask your local library to get it. ISBN: 978-1-118-89317-3


Why Data Analysts Need Business Analysts

Why Data Analysts Need Business Analysts http://bit.ly/alla023

Find out why it’s good to have business analysts on your team. Article on All Analytics.


Breeding Analytics Talent Through AP Statistics

Breeding Analytics Talent Through AP Statistics http://bit.ly/alla024

The next generation of data analysts is breeding in the high schools. Article on All Analytics.


Text Analytics Summit West 2013

Just back from Text Analytics Summit West 2013!

Heard some terrific talks.

One of the things I particularly liked was Mark Eduljee’s concise set of seven principles for useful analysis. I’ll be writing about the details soon!

Mingzhu Lu mentioned embarrassingly parallel computing, another topic begging for more explanation – maybe I should do a distributed computing piece similar to my Bluffer’s Guide to NoSQL Databases.

Janine Johnson led a GATE workshop. This gave participants the opportunity to see GATE (a text analytics tool for developers) in action, and get a good sense of how developers can work with it. Some of the crowd installed GATE and tried it hands-on. The rest of us watched as Janine demonstrated – and I, for one, saw more of what the tool could do in the two-hour workshop than I probably could have worked out on my own in two days. It probably would have taken me more than two hours just to install it and get it running!


Storytelling for Data Analysts

“Storytelling with data is critical. But the emphasis is on data, not story.”
– Richard Hren, marketing strategist


Storytelling for Data Analysts

http://bit.ly/alla021


Data is not statistics

You may be aware of data.gov, the Federal government website devoted to providing access to certain types of data. This site facilitates access to records of government activity, information that has been available in the past, but not necessarily in a form that lent itself to data analysis or use in computer applications. While this may be useful to you, it is what it’s called – data – not statistics. It’s raw data which has not been analyzed.

Often, what you need is not raw data. You might need to know the typical income of a plumber, the number of fatalities associated with various forms of transportation, or the proportion of high-school students who graduated last year in your state. These are statistics.
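
Here is a small Python sketch of that distinction. The records below are invented purely for illustration (they are not real income figures from data.gov or anywhere else), and the function name median_income is hypothetical. The list of records is the raw data. The single number that comes out of the function is a statistic.

```python
from statistics import median

# Hypothetical raw records, invented for illustration only.
raw_records = [
    {"occupation": "plumber", "income": 48_000},
    {"occupation": "plumber", "income": 52_000},
    {"occupation": "plumber", "income": 61_000},
    {"occupation": "electrician", "income": 58_000},
    {"occupation": "electrician", "income": 64_000},
]


def median_income(records, occupation):
    """Reduce raw records to one statistic: the median income for an occupation."""
    incomes = [r["income"] for r in records if r["occupation"] == occupation]
    if not incomes:
        raise ValueError(f"no records for occupation: {occupation!r}")
    return median(incomes)


if __name__ == "__main__":
    print(median_income(raw_records, "plumber"))
```

I used the median rather than the mean here because "typical income" usually means a value that a few very high earners cannot pull upward. That is a judgment call on my part, and it is the kind of choice a published statistic should document.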

Statistical information is available through many sources. Federal, state and local governments provide statistics. So do nonprofit organizations. Commercial entities develop statistics, and often make them available to the public, sometimes for a fee, yet often at no charge. In most cases, these statistics are prepared by well-qualified data analysts, who may provide significant information on background, methods and interpretation of results.

I’ve just written an article on great sources for statistics, to be published later this year. I’ll post an update with a link for you when it becomes available.


Don’t Go Steady (With 1 Data Analysis Method)

Old dogs still need new tricks.

Don’t Go Steady (With 1 Data Analysis Method)

http://bit.ly/alla020


When Less (Data) Is More (Information)

Have Big Data? Here’s why you still need to understand sampling.

When Less (Data) Is More (Information)

http://bit.ly/alla016


Mo’ Data Blues

There’s a lot that Big Data can’t do.

Mo’ Data Blues

http://bit.ly/alla018

