Neil Saunders has an interesting (to me) blog post up this morning, with the title “Dumped on by data scientists.” He takes the appearance of “data scientist” in a Chronicle of Higher Ed article as an occasion to rant a little bit about the term. For Neil, it’s redundant, as the act of doing science necessarily requires data; it’s insulting, as if “scientist” weren’t cool enough and you had to add “data”; and it’s misleading, as many people who call themselves “data scientists” are actually dealing with business data rather than scientific data.
Without disagreeing that there’s a terminological sprawl going on, I did want to address the use of the term, and partially disagree with Neil.
As someone with scientific training who uses scientific tools to solve business problems, I certainly struggle with a description of my role. “Data Scientist” or “Statistical Data Scientist” is actually pretty good, as it correctly indicates that I use scientific techniques (controlled experiments, sophisticated statistics) to understand our company’s data. I often describe myself as a “Statistician”, too, which gets across some of the same ideas without people having to do a double take and parse a new phrase. I also sometimes describe myself as doing “Operations Research” (aka “Management Science”, although I don’t use that term), since I use some of the tools of that field, as well as of Artificial Intelligence/Machine Learning, to optimize certain objective functions.
“Business Intelligence” actually is not that good a term for what I do, as most of what is usually called BI is about tools for better/more relevant/faster access to data for business people to use. This is not a bad thing to be doing, at all, but it’s different from the predictive and inferential statistical methods that I use in my job.
I don’t know what the right answer is. It might depend on the precise person and their precise role. My title, for instance, is the result of a back-and-forth with my boss, HR, and others, trying to find words that have appropriate meanings both internally and externally. “Technical Lead” is a rank, indicating that I run technical projects without (formally) managing people. “Inventory Optimization and Research” covers a variety of areas. “Inventory” here means “sellable units”, like boxes on a shelf, or in this case, like scheduled airline flights. That is probably baffling for an external audience without an explanation, but extremely clear inside the company. “Optimization” means what it sounds like, in both a technical and a non-technical sense, and for both internal and external audiences. “Research” indicates a focus on the development of long-term and cutting-edge systems. “Data Scientist” didn’t end up in there, but it could have.
For people using Big Data tools and scientific methods to study topics inside academia, the right answer seems to me to be to put the field of study first. You’re not a “Data Scientist”, you’re an astrophysicist, or a bioinformatician, or a neuroscientist, with a specialization in statistical methods. If you’re a generalist inside the academy, you’re probably a statistician. Perhaps “Data Scientist” should be restricted to people applying scientific tools and techniques to problems of non-academic interest? That might work, as long as it included people who do things like apply predictive analytic tools to hospital admissions data.