Thursday, 17 of April of 2014

Why Data Analysts Need Business Analysts

Why Data Analysts Need Business Analysts http://bit.ly/alla023

Find out why it’s good to have business analysts on your team. Article on All Analytics.



Breeding Analytics Talent Through AP Statistics

Breeding Analytics Talent Through AP Statistics http://bit.ly/alla024

The next generation of data analysts is breeding in the high schools. Article on All Analytics.



Text Analytics Summit West 2013

Just back from Text Analytics Summit West 2013!

Heard some terrific talks.

One of the things I particularly liked was Mark Eduljee’s concise set of seven principles for useful analysis. I’ll be writing about the details soon!

Mingzhu Lu mentioned embarrassingly parallel computing, another topic begging for more explanation – maybe I should do a distributed computing piece similar to my Bluffer’s Guide to NoSQL Databases.

Janine Johnson led a GATE workshop. This gave participants the opportunity to see GATE (a text analytics tool for developers) in action, and get a good sense of how developers can work with it. Some of the crowd installed GATE and tried it hands-on. The rest of us watched as Janine demonstrated – and I, for one, saw more of what the tool could do in the two-hour workshop than I probably could have worked out on my own in two days. It probably would have taken me more than two hours just to install it and get it running!



Storytelling for Data Analysts

“Storytelling with data is critical. But the emphasis is on data, not story.”
– Richard Hren, marketing strategist


Storytelling for Data Analysts

http://bit.ly/alla021



Data is not statistics

You may be aware of data.gov, the Federal government website devoted to providing access to certain types of data. This site facilitates access to records of government activity, information that has been available in the past, but not necessarily in a form that lent itself to data analysis or use in computer applications. While this may be useful to you, it is what it’s called – data – not statistics. It’s raw data which has not been analyzed.

Often, what you need is not raw data. You might need to know the typical income of a plumber, the number of fatalities associated with various forms of transportation, or the proportion of high-school students who graduated last year in your state. These are statistics.
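To make the distinction concrete, here is a minimal sketch in Python. The records, field names and helper functions are all invented for illustration; they are not taken from data.gov or any real source. The list of records is the raw data, and the two numbers computed from it are statistics.

```python
from statistics import median

# Hypothetical raw records, invented purely for illustration.
records = [
    {"occupation": "plumber", "income": 48000},
    {"occupation": "plumber", "income": 52000},
    {"occupation": "plumber", "income": 61000},
    {"occupation": "teacher", "income": 45000},
    {"occupation": "teacher", "income": 50000},
]


def typical_income(rows, occupation):
    """Return the median income for one occupation (a statistic)."""
    incomes = [r["income"] for r in rows if r["occupation"] == occupation]
    return median(incomes)


def proportion(rows, predicate):
    """Return the share of rows for which predicate(row) is true."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


# Two statistics summarizing the raw records above.
plumber_median = typical_income(records, "plumber")
plumber_share = proportion(records, lambda r: r["occupation"] == "plumber")

print(plumber_median)
print(plumber_share)
```

The median is used here as one reasonable reading of "typical"; a mean or a trimmed mean would be equally valid choices, and a real analysis would document which one was used and why.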

Statistical information is available through many sources. Federal, state and local governments provide statistics. So do nonprofit organizations. Commercial entities develop statistics, and often make them available to the public, sometimes for a fee, yet often at no charge. In most cases, these statistics are prepared by well-qualified data analysts, who may provide significant information on background, methods and interpretation of results.

I’ve just written an article on great sources for statistics, to be published later this year. I’ll post an update with a link for you when it becomes available.



Don’t Go Steady (With 1 Data Analysis Method)

Old dogs still need new tricks.

Don’t Go Steady (With 1 Data Analysis Method)

http://bit.ly/alla020



When Less (Data) Is More (Information)

Have Big Data? Here’s why you still need to understand sampling.

When Less (Data) Is More (Information)

http://bit.ly/alla016



Mo’ Data Blues

There’s a lot that Big Data can’t do.

Mo’ Data Blues

http://bit.ly/alla018



5 Steps Every Data Analyst Must Know

Every statistical test, simple or complex, has the same 5-step structure.

5 Steps Every Data Analyst Must Know

http://bit.ly/alla019



Analytics Lessons from Penises, Professors & Prohibitions

A recent All Analytics post of mine, Analytics Lessons from Penises, Professors & Prohibitions, did not sit well with some of the professionals in LinkedIn’s Advanced Business Analytics, Data Mining and Predictive Modeling discussion group. Two members of that group made some rather strong comments about the piece.

I offered to address each of their comments here on my blog. Carey Butler accepted. His questions and my replies appear below. These are followed by comments from Don Philip Faithful, who said “Thank you for asking, Meta, although it isn’t really necessary to do so. In a public forum, I fully expect my comments to cause riots.” (Riots?) No riots here, I think, Don.

What follows is a heckuva long blog post.

The original article, Analytics Lessons from Penises, Professors & Prohibitions is here: http://bit.ly/alla022

The LinkedIn discussion is here (this requires an account) http://linkd.in/1eeoz22

Questions from Carey Butler:

I have a few questions. Couldn’t you find something better to write about? How is this “tough analytics”? How did this earn the label “research”? Isn’t this kind of article more social engineering than factual and informative information? Can we not elevate our “tough analytics” to something more important than the size of peoples organs? This article does more to disgust, than to edify. Did you notice the source article also entertains by citing its “Most Popular” with “Charles Manson and I Are Going To Get Married.”? Aren’t the issues you discuss far more important than the story you have chosen to frame them with? Do you not see that our collective focus is being brought down to base level with such “research”?

— Carey Butler

Carey,

Thank you for reading my latest article, and for asking questions. I’m happy to respond to each and every one.

Question 1: Couldn’t you find something better to write about?

I write about many aspects of data analysis. For example, I am coauthor of a how-to book for users of a popular data mining product and an upcoming academic book on Big Data analytics, and author of many articles (links to many of them can be found at http://bit.ly/metaarticles).

May I point out that the title of the article begins with “Analytics Lessons…”. I believe that these lessons are important, and that the cases mentioned illustrate the points effectively and in a memorable way. If you don’t agree, that’s OK; you’re not the first person ever to dislike something that I have written, and you won’t be the last. However, I have other feedback that tells me many others enjoy the article and find it useful.

Question 2: How is this “tough analytics”?

Analytics challenges are not always about fancy math or data management. Tough, in these examples, refers to the difficulty of obtaining accurate and unbiased data. No amount of fancy math or data management will correct the problem of having inadequate data.

Question 3: How did this earn the label “research”?

re•search [ri-surch, ree-surch] noun
1. diligent and systematic inquiry or investigation into a subject in order to discover or revise facts, theories, applications, etc.
Source: http://dictionary.reference.com/browse/research?s=t

Ansell conducted a systematic investigation in order to revise facts. They needed accurate, current data about penis size, so they collected it in a methodical manner. That, sir, is research.

Question 4: Isn’t this kind of article more social engineering than factual and informative information?

Are you speaking of the article that I wrote, or Floyd Elliot’s humor piece that I quoted, or Ansell’s research study? I’ll just cover all of these.

My article described a challenging research case and what it has in common with more mundane research. “Social engineering” (http://en.wikipedia.org/wiki/Social_engineering_%28security%29), according to Wikipedia, is the use of trickery or fraud to obtain personal information. I did not encourage the use of trickery or fraud, did I? Not social engineering.

Floyd Elliot’s piece, “That’s Not Normal” is a work of humor. Or is it? On the surface, a light-hearted laugh, but beneath the surface, a thinly veiled deception with the real motive of driving men to discuss math! And look at the comments – they are about math! Could this be… social engineering? Oh wait, Floyd said clearly that men would discuss math if it had to do with their penises, and then provided an example. There was no deception, he was upfront about it. Not social engineering after all.

And how about Ansell? Did they trick men into unwittingly revealing private information? Hmmm, let’s think about this. They asked men in a bar to step into a tent to present their penises for measurement. It’s true that we don’t know exactly what was said. And the men had probably been drinking. Yet it is hard to imagine the man who expected that a gloved technician approaching his genitalia with a ruler had some other purpose in mind. Once more, not social engineering.

Question 5: Can we not elevate our “tough analytics” to something more important than the size of peoples [sic] organs?

The analytics community devotes a lot of energy to understanding what gets people to click on links and purchase items they do not need. Tremendous resources are poured into tracking trivial digital interactions. Netflix offered a $1 million prize for a model to predict what movies people will like, and I never once heard any data analyst complain that the problem was unimportant.

Ansell makes condoms. These modest pieces of latex enable millions of people to protect their health and life, and to have control over their own reproductive destinies. These are very important matters, Carey, and I find them all worthwhile.

If, Carey, you would like to open a discussion about something you find more important than protecting health, life and reproductive destiny, please do so. I will not stand in your way.

One last thing – we’re adults here, so let’s call a penis a penis, not an organ.

Question 6: This article does more to disgust, than to edify. Did you notice the source article also entertains by citing its “Most Popular” with “Charles Manson and I Are Going To Get Married.”?

No, Carey, I did not notice that. However, I would not have been surprised to see that, given that the piece appeared in the comedy section of the Huffington Post.

What I did notice was Floyd Elliot’s remarkable mix of serious math with lighthearted presentation, and the relevance of those things for my own work.

Question 7: Aren’t the issues you discuss far more important than the story you have chosen to frame them with?

Maybe so.

The issues are very important. Yet for readers to learn my position on the issues, I must first have their attention.

Take you, for example. You are so offended by what I have written that you felt moved to respond, and sharply. Yet you looked at not only my article, but an article that was referenced within it.

Question 8: Do you not see that our collective focus is being brought down to base level with such “research”?

No, Carey, I do not see that.

The research is serious, but people are silly. I see that people will focus on challenging topics, if there’s something in it for them. If that something is a giggle, it’s OK with me.

And now, some thoughts from Don Philip Faithful:

Like Carey, I think you could have used a more meaningful example to make your points. However, since you have chosen this specific research to make your case, I guess it is fair game for me to comment on the research itself. I’m trying to determine the usefulness of the data since the research seems to contribute to a certain level of intellectual masturbation. Nazi scientists collected all sorts of phenotypical data, drawing inferences regarding racial intelligence and fitness. If it is necessary to accept the research to legitimize your points, then in short you have failed to substantiate your case. For instance, we should not just follow the rules of law but also the rules of social conduct, making your first point superfluous. At the same time, I’m not questioning your points, just their applicability and usefulness beyond the measurement of genitalia.

Dear Don,

Since I have already discussed the meaning and usefulness of Ansell’s research above, let me address some of the unique aspects of your commentary.

Item 1: the research seems to contribute to a certain level of intellectual masturbation

Don, bro, have you looked at some of the other online discussions among data analysts? In one group, I saw a discussion that went on for pages and pages over whether statistics is math. That’s a fine example of unproductive use of intellect.

Ansell conducted practical research to support manufacturing.

Item 2: Nazi scientists collected all sorts of phenotypical data, drawing inferences regarding racial intelligence and fitness.

Every doctor on planet Earth collects data about each and every person who walks in the door, and draws inferences from it. Teachers also collect data. So do all sorts of people in all sorts of professions with all sorts of motives. Yet you have chosen to mention the Nazis.

As a Jewish woman whose family members actually faced Nazis, I am somewhat opinionated about your choice of words. Likening people or work you don’t like to Nazis on the thinnest of premises is disrespectful, Don. And it’s a pretty weak argument as well – measuring people is a routine activity. Measuring doesn’t make one a Nazi, nor anything similar.

Item 3: If it is necessary to accept the research to legitimize your points, then in short you have failed to substantiate your case.

Hmmm. The research is there to illustrate points, not really to legitimize them. How much evidence do you need, for example, to accept the point that you should know the laws regarding your research and obey them?

And there’s nothing superfluous about reminding people to know and obey the law.

Item 4: I’m not questioning your points, just their applicability and usefulness beyond the measurement of genitalia.

Hoo boy, Don. They were pretty everyday data collection guidelines. If issues like knowing the law and monitoring staff don’t strike you as applicable to cases other than the three mentioned in “Analytics Lessons from Penises, Professors & Prohibitions” http://bit.ly/alla022, you have a pretty unconventional viewpoint on data collection.

