Wednesday, February 22, 2012

One Model Does Not Fit All: Part 2

This is the second of three posts in a series that began with One Model Does Not Fit All: Part 1.

One Model Fits All

A government agency invited me in to talk to the staff about some cool new items in the statistics goodie bag. The audience was a mix of people, from analysts with modest training to PhDs, from many departments of the agency. Nobody was obligated to attend; it was just an opportunity to pick up some new tricks.

During a presentation on decision tree methods, two men, obviously experienced analysts, sat together and heckled me through much of the talk. They took the position that everything could be modeled using logistic regression, and that logistic regression could do anything that decision trees could do. In their view, the presentation, and the methods presented, were without value.

Sooner or later, this will happen to you, so you need to understand what’s going on in situations like this. The hecklers weren’t there to learn, nor to make an impression on me. They were showing off for the others in the audience, and reinforcing their own sense of self-importance. This is a common occurrence.

The others weren’t stupid, so while trying to act smart, these guys made the opposite impression on many people in the room. It’s important to understand, though, that the hecklers were also customers, and it’s not cool to embarrass customers. If this had been an academic teaching environment, I might have been free to really set them straight, but the limitations of the schedule and my purpose didn’t allow for that. So the wise guys didn’t learn much, if anything, in that session. There wasn’t time to explore the cases where their favorite technique was weak, and contrast these with alternatives in detail.

The solution: Be open-minded about modeling techniques. In real life, there is no one-equation-fits-all model.

Tune in tomorrow for the final story: Don’t Hang Your Hat on a Dummy.


One Model Does Not Fit All: Part 1

In “Better than Brute Force: Analysis with Big Data,” I touched lightly on the advantages of modeling for segments, rather than seeking one big holy grail of a model that covers everyone and everything. I’ve often been surprised by professional analysts whose education seems more than adequate to get them over that conceptual hurdle, but… I guess it isn’t.

Before some big-word-using consultant tries to sell you on the mother of all logistic regressions, I have some stories you should hear. Here’s the first:

The Clothing Model

A military data analyst came to me with a statistical model he was using to predict clothing sizes. I was perplexed from the start, since I would have thought you just measure the person and that’s it. But he had a real issue – he needed to estimate demand for some highly specialized clothing items worn by military pilots, and people didn’t always prefer what seemed the likely size.

He had put together a model, but it was not performing very well. I grew suspicious. “Are any of these pilots women?” I asked. Some were. I took his table of predicted and actual sizes, and ran down the list pointing out certain cases, telling him that each one was a woman. He checked the data for each case I suggested, and yes, each was a woman.

What made this so obvious? The clothing item was a flight suit – a tight-fitting garment. It was designed for men. The primary measurement used to identify size in men’s clothing is the chest circumference. If you know a man’s chest measurement, and you know he’s healthy and young enough to be a military pilot, you have all the data you need for this application. Now I hope this doesn’t come as a shock to the male readers, but women are not shaped just like men. Women have – this is the technical term – “breasts” – which exhibit a significant level of variability in size and shape, between individuals. The best fitting garment for a woman is consistently a smaller size than for a man with the same chest measurement, and the bigger the breasts, the bigger the discrepancy. Any questions? You’re all taking notes, right?

If you know a guy’s chest measurement, you know his size. If you know a gal’s chest measurement, you know her chest measurement. This analyst had the best of intentions; he was just a little too well educated to think clearly.

The solution: Segment the data by gender, build two separate models. And find some more dependable measurements to use for the women.
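
For readers who like to see the idea in code, here is a minimal sketch of segmenting before modeling. Everything in it is an assumption made for illustration: the data are synthetic, the 3-unit offset and the noise levels are invented, and the helper names (`fit_line`, `rmse`, `columns`) are hypothetical. It is not the analyst's actual model, just a straight-line fit done once for everyone and once per segment.

```python
import random
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(model, xs, ys):
    """Root mean squared error of a fitted line on the given points."""
    intercept, slope = model
    return sqrt(mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)))


def columns(rows):
    """Split (gender, chest, best_size) rows into chest and size lists."""
    return [r[1] for r in rows], [r[2] for r in rows]


# Synthetic data only. Assumption: for men the best size tracks chest
# measurement closely; for women it runs smaller and is much noisier.
rng = random.Random(0)
rows = []
for _ in range(200):
    chest = rng.uniform(34, 46)
    rows.append(("M", chest, chest + rng.gauss(0, 0.5)))
for _ in range(40):
    chest = rng.uniform(32, 42)
    rows.append(("F", chest, chest - 3 + rng.gauss(0, 1.5)))

# One model for everyone.
pooled = fit_line(*columns(rows))

# One model per segment.
by_segment = {}
for gender in ("M", "F"):
    segment_rows = [r for r in rows if r[0] == gender]
    by_segment[gender] = fit_line(*columns(segment_rows))

# Compare the two approaches on the women's records.
women_x, women_y = columns([r for r in rows if r[0] == "F"])
pooled_rmse_f = rmse(pooled, women_x, women_y)
segment_rmse_f = rmse(by_segment["F"], women_x, women_y)
print(f"women, pooled model:    RMSE {pooled_rmse_f:.2f}")
print(f"women, segmented model: RMSE {segment_rmse_f:.2f}")
```

The comparison is in-sample, so it only shows that the pooled line is pulled toward the larger group. It does nothing about the second half of the solution: the women's model still needs better measurements than chest circumference alone.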


Tune in tomorrow for: One Model Fits All.


More on secrets

In yesterday’s post, I mentioned an article from The New York Times which discussed customer research at Target. The folks at Target weren’t so happy about the journalist’s research. He offered them a draft for review, and they replied telling him the piece had a number of errors, but wouldn’t say what they were.

Let me take a stab at pointing out some concerns that come to mind as I read the article. I have no knowledge of what they do inside Target, so this comes purely from my own experience as an analyst.

The article says that Target tries to track every customer with an ID.

It’s not easy to do this well, but Target has a lot to work with. The usual means of tracking purchasing behavior is through a loyalty program. Target does not have a loyalty program, per se, but they offer a house credit card, which many people use. With their huge customer base, customers using that card represent a lot of information. However, customers may not always use the same card, and sometimes they may even pay in (gasp) cash.

The article says that Target used their baby shower registry to study the buying habits of pregnant women.

That made me think for a minute, as it slowly dawned on me that I was in that registry. My name was there, but was my husband’s? Maybe, maybe not. We probably paid for our Target purchases with his credit card much of the time. Another tracking problem.

For research purposes, you’d want to compare the behavior of pregnant women with demographically similar women who are not pregnant. No registry for those. Some educated guesswork needed to solve that problem.

The article gives a fictional example and a fictional probability that the fictional woman is pregnant and due in a certain month.

It’s just an example, and a little too perfect at that. The idea is useful, but don’t get too wound up in the details of the example. Real-life models are rarely, if ever, as perfect as that.

Feel free to add some thoughts of your own….


Handling Secrets

There’s an article in The New York Times today, “How Companies Learn Your Secrets,” which offers a lot of insight into how practical predictive models are built, as well as how we need to put thought into how to integrate those models into daily business.

The predictive modeling theme is buried in with a lot of material about habit, neuroscience and how the author lost weight. If your interest is in analytics, you can just skim, maybe even skip all of that. Concentrate on the business process – Target wanted to build its shopper base: more shoppers, shopping more often, buying more things. They knew that shoppers change habits during certain life-changing events, and the biggest of these is the birth of a child. So they used their data to look for hints that a customer might be pregnant.

Read the article to learn more about the data exploration, interesting findings, and right and wrong ways to put model predictions into everyday use.


Controlled measurement matters

It’s hard to measure a baby’s height. There’s no point in asking the baby to stand up nice and straight, is there? People who do this all the time use the right tools to measure, and the right technique: stretch the kid out and hold it still.

If you want to do any meaningful analysis in business, you need dependable measurements to work with. You can’t grab the entire economic system and hold it still, of course. Yet there are many opportunities for controlled measurement in real business applications. You can’t control the whole society, but you may be able to control your own actions, and those of your coworkers. You can also control the function of inanimate things like equipment and websites.

Let’s say you have been using the same coupons since the beginning of time, and you’d like to try something new. If you replace the old coupon with a new one today and watch to see what happens, you’re going to get lousy information. Why? Because the world is a squirming baby! Things are changing, and whether the response to your new coupon is fabulous, crummy or so-so, you won’t really know what’s going on. It won’t be the response to a coupon, but the response to a mix of everything affecting your business at that moment, plus a new coupon.

What you need is a controlled test. You need a group of people to get the old coupon – exactly as it has always been. And you need another group to get the new coupon. In order to separate the effect of the coupon change from all the other excitement in the world, you need to match the two groups in every way you can – deliver the coupons at the same time, in the same way, avoiding any kind of systematic bias in who gets which. In other words, don’t send one version to men and the other to women, don’t form the groups by geography, or age, or anything else that might affect results. You want random samples, and the definition of random is that every single person has an equal chance of getting into either one of the two samples.

That way, at least for that moment with its unique economic conditions, you’ll know which coupon worked better.
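
Here is a rough sketch of what the random split described above might look like in code. The function names, the customer IDs and the response counts are all hypothetical. The two-proportion z-test at the end is one common way to compare response rates; it is my assumption for this illustration, not a method prescribed in this post.

```python
import random
from math import erf, sqrt


def assign_groups(customer_ids, seed=None):
    """Shuffle the customers and split them in half, so every customer
    has the same chance of landing in either group."""
    rng = random.Random(seed)
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z(resp_a, n_a, resp_b, n_b):
    """Two-sided z-test for a difference between two response rates.

    Returns (z, p_value); z is positive when group B responds more."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF written with erf, then doubled for two sides.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical mailing list of 2,000 customer IDs.
old_coupon_group, new_coupon_group = assign_groups(range(2000), seed=42)

# Made-up redemption counts, for illustration only.
z, p_value = two_proportion_z(
    resp_a=50, n_a=len(old_coupon_group),
    resp_b=80, n_b=len(new_coupon_group),
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Both groups should get their coupons at the same time and in the same way. The random split only protects you from bias in who gets which version.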

[There’s a lot more that could be said about analysis methods and continued testing that are relevant to this issue. I’ll write more about those topics down the road.]


No room for romance

Yesterday, I posted a piece on Smart Data Collective called “Analytics: What’s Passion Got to Do with It?” If it isn’t obvious enough from the title, let me paraphrase myself:

The point of analytics is to bring reason into decision-making. Passion does not add value to the process.

That was a bit of a fork in the side of some readers. One response that I enjoyed came from Rob Saker (@robsaker), who tweeted:

I couldn’t disagree more with the author. Idealistic.

Rob is a Chicagoan and so am I, so maybe I’ll get to meet him one day soon. Here’s his bio:

Advanced Analytics leader, presently with MillerCoors. Tweets & opinions here are my own. And yes, I get free beer.

Let me lay it on the line: I’m a romance-killer. Analytics is utilitarian. Apply it to things that matter to you, be passionate about them if it suits the situation. Analytics itself is merely a set of tools, no more a matter of passion than a boxful of hammers and screwdrivers.

On that note, a little quote from Billy Beane, of the Oakland A’s and Moneyball fame:

There’s no room for romance in baseball or business.

[You can read more of Mr. Beane's advice in Miranda Miller's "5 'Moneyball' Tips for Search Marketers from Billy Beane."]


Analytics: What’s Passion Got to Do with It?

Long, long ago, in a land far, far away, there were frogs. Lots and lotsa frogs. Oh, the frogs were really beautiful princesses and handsome princes, but you’d never know it. They never got around to any kissing! Why not? They must have lacked passion.

Read more about the frogs, passion and analytics in “Analytics: What’s Passion Got to Do with It?”, my new post on Smart Data Collective.



But Can She Type?

A few days ago, Gary Cokins asked, “Could Beethoven have implemented business analytics?” A strange question, indeed. After all, Beethoven had his own job to do. Can the average analytics manager write a symphony? Can any analytics manager write a symphony? Does Beethoven have to do everything around here? Or, as Seth Grimes put it, “Would Beethoven Have Given a Rat’s Ass about Business Analytics?”

If I may take the liberty of paraphrasing Mr. Grimes’ response, the answer is, “No.”

This reminds me of the once-famous poster: a large photograph of Golda Meir, then prime minister of Israel, with the caption, “But Can She Type?”

More importantly for us in the analytics trade, change to present tense and substitute any name you want: Does [fill in] Give a Rat’s Ass about Business Analytics? Unless the fill in is one of our own kind, the answer is still, “No.”

The gal riding next to you on the commuter train doesn’t care about business analytics. Your manager’s manager’s manager doesn’t care about business analytics. Your prospect doesn’t care about business analytics. People care about themselves, and their own business interests. While it may be true that business analytics has everything to do with that, they probably don’t know that and it’s going to take a heck of a lot of effort to make that connection in their brains.

What’s the lesson here? Next time you want someone to care about what you do, don’t talk about what you do. Instead, ask yourself: “What does this gal/guy care about? How can I relate to issues that resonate with this person?”

If you’d like a few hints on how to do that, read my piece, “Talk Analytics with Executives: 4 Things You Must Understand.”



Sentiment Analysis Symposium May 8, 2012 in New York City

Sentiment Analysis Symposium is coming up May 8 in New York. I attended this event last year in San Francisco, and it was worth the trip.

http://sentimentsymposium.com/


Brilliant minds in manufacturing

I’ve done a lot of work with manufacturers, and I never cease to be amazed by what goes on inside factories. Outside, they are often plain and may not inspire the imagination. Inside, the complexity of making everyday things is mind-boggling. It’s amazing to me, then, to attend tech industry events and find that when people there speak of technology, they speak of computers and cell phones, of IT and programming, but never of making the many other things we use each day. It’s ignorance, and prejudice, and we ought to know better.

