April 14, 2013

How is the hot new field of "Data Science" different from dull old marketing research?

From the NYT:
HARVARD BUSINESS REVIEW calls data science “the sexiest job in the 21st century,” and by most accounts this hot new field promises to revolutionize industries from business to government, health care to academia. 
The field has been spawned by the enormous amounts of data that modern technologies create — be it the online behavior of Facebook users, tissue samples of cancer patients, purchasing habits of grocery shoppers or crime statistics of cities. Data scientists are the magicians of the Big Data era. They crunch the data, use mathematical models to analyze it and create narratives or visualizations to explain it, then suggest how to use the information to make decisions. 
In the last few years, dozens of programs under a variety of names have sprung up in response to the excitement about Big Data, not to mention the six-figure salaries for some recent graduates.

I started in the marketing research field in 1982, working on the "purchasing habits of grocery shoppers" using the new flood of data from checkout scanners. By 1987, we had all purchases from about a tenth of the supermarkets in the country. My wife worked on "tissue samples of cancer patients" in 1987. 

I wouldn't discourage people from getting interested in these kinds of fields, but "the sexiest job in the 21st century"?

People make a lot of money by (roughly in order) owning things, selling things, and motivating and managing (and firing) people. Analyzing stuff is fine work to get paid to do if you have an analytical personality, but, in the long run, don't expect to get paid like the sales guys. 

35 comments:

wren said...

I think this kind of thing is how Steve's ideas will gain traction.

I know a lot of people in the "Applied Behavior Analysis" industry, making money off of autism.

They need to justify their salaries to school districts, etc., so they are extremely fond of the term "evidence-based." It pops up in their conversations all of the time.

They love thinking of new data to collect, taking lots of data and analysing their data, because then they can say that what they are doing is "evidence based." And get paid loads.

Maybe Steve could rename this blog "Evidence Based Data Analysis" or something, and everyone would be digging it.

The sexiest blog of the 21st century.

Anononymous said...

This is only of interest to the advertising industry. It doesn't really benefit society. It's more money spent on advertising budgets and higher prices on products to pay for it.


"There will be almost half a million jobs in five years ... The average salary was $89,100"

Salaries to be paid out of bigger advertising budgets.

Steve Sailer said...

"The sexiest blog of the 21st century."

Yeah, but by the time this whole data analysis thing catches on, I'll be totally into blogging about the rules of golf 168 hours per week.

wren said...

"Yeah, but by the time this whole data analysis thing catches on, I'll be totally into blogging about the rules of golf 168 hours per week."

That's why you need the sales guys earning the big bucks selling golf rules analysis to the world's thought leaders and trend drivers.

GLS said...

It's all about big data. Big data, big data, big data. If you're in the tech world, HBASE and Hadoop are all the rage.

There's a fad element to this. The technology is real enough, no question about that, but I think the value of all this data has yet to be shown.

Regarding the "data scientist" role, the guys I've worked with in the past who have this title are more like high-level computer programmers/mathematicians. They're working on the algorithms that actually do the work of going into a huge data repository and getting whatever result is asked for. On the project I was working on, we were trying to surface relevant product recommendations for users, similar to what Amazon does once they have a little bit of information about you. In theory, those recommendations get better and more relevant as they learn more about you. That's the idea anyway, right or wrong.

Anonymous said...

Technically Buffett is an analyst who puts his money behind his analysis. Very well paid.

Pollo Asado said...

I've worked in analytics since the late 90s, and I've spent about 10 years now reading about an impending shortage of analytical talent and how it's the Next Big Thing. Unfortunately (for me), the shortage hasn't happened yet. It's a nice living, and if you're in marketing analytics you make slightly more than somebody at the equivalent level doing just marketing, but you're not going to get rich doing it (not so far, anyway).

Oh, and in this field a lot of companies like to hire immigrants with poor language skills on a contract basis. If we don't allow more immigration, analyses will be left rotting in the fields.

Anonymous said...

The dirty little secret of market research is that it is tailored to what the clients want to hear. And the clients want accounting-type numbers, not research. If you researched "purchasing power of grocery shoppers" I speculate you did not send it unfiltered to Kraft or Coca-Cola or Procter & Gamble. They pay for a number that fits their agenda.

Fortunately for the public you didn't stay in the market research racket, but became one of the few truly interesting and courageous pundits around.

FWG said...

Hey no complaints from me on the golf thing, Steve.

anon said...

long term capital management

sunbeam said...

Sort of an aside, but there really isn't much anonymity on the internet, unless you jump through hoops, and even then...

Just saying that I think "they" know who is seeing internet advertising, and coupled with credit card and other info...

Let's take a credit card provider. They know what you are buying. Coupled with all the tracker info, which doesn't appear to be regulated, it seems like it is possible to determine how effective advertising actually is, even at the individual level.

Credit card info is regulated, but with all the overseas operations, it's not much of a stretch for me to imagine all kinds of things you can do legally if you have the right lawyers and right pull.

50% of advertising wasted? No more.

Anyone know how this is playing out now?

Anonymous said...

Why would the sales guys get paid more? Data analysis seems like a more important and unique skill?

MSL said...

Data analysis requires the use of various recent computer optimizations (parallel processing/algorithmic optimizations etc.) that aren't typically taught in statistics/marketing courses.

Data science is overhyped though.

Chicago said...

So who'll be left to market all the various goodies to? The millions of former illegals now working at the carwash and in poultry plants? The many millions on Food Stamps?

Anonymous said...

An olde but a goodee... Any field with "Science" in its name... isn't.

Cail Corishev said...

You're right, it's basically the same thing. But here's the difference as I visualize it (in broad stereotypes):

Marketing research: Guys in white shirts and black ties working 9-5 writing COBOL programs to mine data that was gotten by bugging people: phone surveys, those annoying grocery checkout cards, and so on.

Data analysis: Guys in casual clothes (or pajamas) working on a per-job basis, writing in languages like Perl to mine "found" data: web site traffic patterns, ad click rates, genetic databases, and the like.

In the latter case, the data is either incidental (if you own a web site, you have traffic data you can analyze), or you get a massive amount of data to analyze from very little effort (a few blood samples could keep you busy searching for various gene patterns for ages). There's very little sense that you have to go get the data from people; it's mostly just there for the taking in huge amounts.

By the way, there's an immigration angle: on discussion forums for this kind of programming, you get a lot of Indian programmers who are clearly in way over their heads. They'll ask (in horrible English) stuff like, "How can I search a 100GB file for any pattern like ACGTTA being repeated within 100 characters, but one character can be different?" Often it's obvious the questioner doesn't even know the basics of the language he's trying to use, so there's no chance of him figuring out a problem that complex even with help; but a boss at the code factory said do it, and he got stuck. His only hope is that someone will do it for him for the challenge or to show off, which unfortunately happens a lot.

Very few of them seem capable of doing any serious programming on their own (whether from lack of aptitude or poor training I won't bother to guess) so I don't worry about them out-programming me. They're so cheap, though, that it's impossible to compete with them for freelance jobs -- at least until the client realizes he's getting what he paid for and decides to look elsewhere.

Mr. Anon said...

"HARVARD BUSINESS REVIEW calls data science “the sexiest job in the 21st century,” and by most accounts this hot new field promises to revolutionize industries from business to government, health care to academia."

So, according to Harvard, government is now an "industry"?

joecanuck said...

Steve, it's an exciting new field because it applies a lot of machine learning research that until very recently was unusable outside of academia.

Anonymous said...

"People make a lot of money by (roughly in order) owning things, selling things, and motivating and managing (and firing) people. Analyzing stuff is fine work to get paid to do if you have an analytical personality, but, in the long run, don't expect to get paid like the sales guys. "

They make less than investment bankers or traders, but more than pretty much any other profession. The salaries are high because understanding the algorithms requires a significant math background and a few great analysts are vastly superior to an army of engineers.

Anonyia said...

Data mining is also being pushed in education. I don't think much good will come out of it, other than employment opportunities for data analysts.

http://www.geekwire.com/2013/gates-sxswedu-data/

Anonymous said...

"This is only of interest to the advertising industry. It doesn't really benefit society."

Advertising doesn't benefit society? How do you think you learn about things that exist and can benefit you? From an online paper:

"Today men with risk of heart trouble know to take half an aspirin a day. By 1988 it was well established that aspirin greatly reduces the risk of myocardial occlusion. But for years the FDA forbade aspirin makers from advertising that fact (the FDA still significantly restricts advertising about it). The FDA surely killed tens, and quite possibly hundreds, of thousands of Americans by this restriction alone"

How much time do you think a busy GP spends researching new uses of familiar medicines? After advertising, almost all GPs know to recommend aspirin for at-risk patients.

Cail Corishev said...

"They make less than investment bankers or traders, but more than pretty much any other profession. The salaries are high because understanding the algorithms requires a significant math background and a few great analysts are vastly superior to an army of engineers."

But outsourcing companies like IBM are sure going to try replacing them with armies of (cheap) programmers and engineers anyway.

I think it's jealousy, to some extent. A boss would rather have a hundred employees who are clearly below him in ability, each supplying a piece of the puzzle, rather than one genius, even if the genius is ultimately more productive. Problem is, the genius will make the boss feel stupid, and maybe even threaten his job -- or at least threaten his job security, because the boss is dependent on the genius to keep producing for him. The army of average guys is more predictable.

Cail Corishev said...

"After advertising, almost all GPs know to recommend aspirin for at-risk patients."

To be fair, most of them also recommend a diet high in grains and industrial seed oils as "heart-healthy" -- in large part thanks to advertising telling them so. So the aspirin thing is a bit of a broken clock.

Anonymous said...

Problem is, none of the "data scientists" I work with understand the concepts in this book...

https://en.wikipedia.org/wiki/Fooled_by_Randomness

Real scientists do, and that's why they tend to shy away from trendy, sexy applications of the arithmetic.

Anonymous said...

"... it's an exciting new field because it applies a lot of machine learning research that until very recently was unusable outside of academia."

Yes. Although it's easy to laugh at how fast the marketing folks have taken up Big Data This and Big Data That, there is some fire behind the smoke. Take classical stats, machine learning, throw in Bayesian statistics and folks that apply things like Bayesian networks/graphs to areas such as language understanding/modeling and DNS analysis, add Hidden Markov Models (very handy for all sorts of things, heavily used in DNA analysis), and you've got something that truthfully could be considered a field.

Machine Learning alone has a set of algorithms, some of which are pretty simple (decision trees) and are pretty effective at taking a pile of data that has been manually classified and coming up with a deployable "recognizer"... without doing any explicit programming.

Nothing magic, but a lot of useful stuff. I think Google has been able to do things using Bayesian models and the huge amount of language data that web-crawlers can collect to produce better language understanding models than all the old stuff based on classic linguistics theory... or something like that.

For the stat folks, at a deep level (probably too nuanced for the marketing Big Data-ers and maybe me) there are apparently real statistics issues. If I have it right it's impossible to calculate almost all of the initial estimators that statistical procedures often require. For instance, see:

"The Big Data Bootstrap"

"The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding."


So working out how to do statistics at this scale really is a bit different. Though I don't think that's what most people mean by Big Data. They just mean using Hadoop or anything that uses Map/Reduce. Sort of writing algorithms you could deploy in disk-based merge sort, one of the oldest techniques in the book.

DR said...

"Analyzing stuff is fine work to get paid to do if you have an analytical personality, but, in the long run, don't expect to get paid like the sales guys. "

Jim Simons disagrees with you.

http://www.forbes.com/profile/james-simons/

Dahinda said...

"People make a lot of money by (roughly in order) owning things, selling things, and motivating and managing (and firing) people."

Speaking of statistics, what are my odds of just winning the lottery? That would be sexy!

Unknown said...

"Problem is, none of the "data scientists" I work with understand the concepts in this book...

https://en.wikipedia.org/wiki/Fooled_by_Randomness

Real scientists do, and that's why they tend to shy away from trendy, sexy applications of the arithmetic."

You're full of it and so is Taleb. Fooled by Randomness and its ilk are literally books; most of the phenomena that data scientists work on are things like web traffic, which has no feedback loops, black swans, etc.

candid_observer said...

"Nothing magic, but a lot of useful stuff. I think Google has been able to do things using Bayesian models and the huge amount of language data that web-crawlers can collect to produce better language understanding models than all the old stuff based on classic linguistics theory... or something like that."

I'd like to see some genuinely convincing examples of the usefulness of these techniques in the real world.

By their fruits ye shall know them.

Anthony said...

"How is the hot new field of "Data Science" different from dull old marketing research?"

Was it a marketing person who came up with the idea of rebranding "marketing research" as "Data Science"?

NOTA said...

It's worth noticing the broad pattern: Some people are good at convincing people to do what they want them to do, at negotiating effectively, and at impressing others with the value of their product or service or company. Those same people are also extremely good at doing those same things, employing those same skills, when convincing someone to pay them more money, when negotiating for a higher salary and a better job, and when impressing prospective employers with their value as employees.

Other people aren't as good at that stuff. They may very well be much better at cranking out working code, or keeping the network up, or building bridges that don't fall down, or producing medicines that make sick people well again. But while all those are important things, they're a lot further from the point where you get paid. Indeed, the great majority of working scientists are extremely smart, hard-working people who are, at best, going to get a middle-class lifestyle and a relatively nice level of social prestige. Some subset of that work will change the world in amazing ways, and leave our grandchildren thinking of us as poor and deprived. But that doesn't mean they're going to be getting a larger slice of the pie anytime soon.

Anonymous said...

This will lead to artificial intelligence.

Anonymous said...

"The field has been spawned by the enormous amounts of data that modern technologies create — be it the online behavior of Facebook users, tissue samples of cancer patients, purchasing habits of grocery shoppers or crime statistics of cities. Data scientists are the magicians of the Big Data era. They crunch the data, use mathematical models to analyze it and create narratives or visualizations to explain it, then suggest how to use the information to make decisions."

It's called "scientific computing". To quote Wikipedia:
"Computational science (also scientific computing or scientific computation) is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems. In practical use, it is typically the application of computer simulation and other forms of computation from numerical analysis and theoretical computer science to problems in various scientific disciplines."

Now you might wonder: "Where is the science in 'online behavior of Facebook users' or 'purchasing habits of grocery shoppers'?" The answer is "Operations Research".

- from Germany

Alcalde Jaime Miguel Curleo said...

Computers are really fast now, that's all it means--fast enough that monopolists like Intel are unmotivated to spend billions improving their widgets any more. Journalists are innumerate and, in the old scholarly sense, illiterate, so they're easily snowed by PR about Hadoop or Google or Bitcoins or hyperscale cat photo processing. When Nate Silver just averaged some poll results on a time chart he appeared unto them as a god.