“Who is your data scientist, and what do they do?”
Arnold Schwarzenegger, 1990
As the volume of data generated by a modern, digitally connected society has grown, so has our ability to process this data and derive value from it. This has brought the analysis of data to the fore. No longer an auxiliary function of a business, analytics, and particularly the ability to make predictions, is now a key revenue-generating part of any strategy. The result has been the definition of both the discipline known as ‘Data Science’ and the role of ‘Data Scientist’, both often criticised as hype, and not always entirely without reason.
Data Science is a term that has generated endless blog posts (here’s another), articles, talks, and musings, all circling one fundamental question: “what the hell is it?”. It says everything, and absolutely nothing, at the same time. This is, in part, what feeds the hype narrative surrounding data science. On the one hand, it suggests the use of scientific methods, techniques, and technologies to do something presumably valuable with data. On the other, it tells us nothing at all about what that might actually be. For one thing, all sciences use data, so describing something as ‘data science’ can fairly be accused of being vague in the extreme.
For decades, companies have employed analysts and statisticians to better understand their business and make more insightful decisions. What seems to be new is the widespread adoption by the commercial sector of more complex mathematical techniques (often known, tested and developed in academia for decades), implemented at scale using modern computing technology. Thanks to the ever-increasing volume of data, data science is the next evolutionary step, not a freshly sprung entity. A reasonable high-level representation is given by the diagram below:
On the skills side, this has been made possible by a rapid increase in academically trained scientists moving into the private and commercial sectors. This has been driven in equal parts by:
- A lack of funding in science and academia,
- An increase in graduates trained to PhD and MSc level,
- Investment in data by the private sector,
- Greater ease of access to large real-world datasets in industry relative to academia.
However, the demand for suitably skilled data scientists still outstrips supply.
WHAT IS DATA SCIENCE?
It is perhaps unhelpful to view ‘Data Science’ as a distinct discipline. Instead, it is more useful to think of it as an interdisciplinary tool kit applied across a wide range of sectors and industries. The answer to “What is Data Science?” therefore may not lie in the headline description ‘Data Science’, but in a general characterisation of what it means to “do data science”. We should ask ourselves: “are there any common themes in the problems solved, the methods used, the technologies leveraged, and the philosophy applied?”.
Here is a list of what we came up with at T-DAB:
- Solving problems through the use of data, advanced mathematical techniques, and modern computing technology at scale
- Question- and hypothesis-driven, but less reductionist in philosophy than traditional research – models are informed by data and as little as possible by humans
- Motivated by the return of value and the extraction of actionable insight
- Defined by a focus on the use of algorithms to uncover patterns within data and make predictions
- A focus on the ability of algorithms to update themselves based on new data (machine learning)
- Use of larger-than-normal (big) datasets and high-dimensional data
- The integration and use of multiple data sources and types
THE DATA SCIENCE SAUCE
Data science has some defining quirks that set it apart from more traditional analytics:
- Data science is often more concerned with predictive power than with statistical significance per se.
- Establishing causation is not always imperative, so long as predictive accuracy is high and there is no need to take action based on the predictive features themselves.
- The focus on prediction has led to a propensity for complex, difficult-to-interpret algorithms, such as neural networks.
- Finally, data science has extended into areas outside the traditional domain of data analysis and business intelligence: image recognition, computer vision, and the wider use of unstructured data.
One might also be tempted to define data science according to the traits of those who practise it. If you search Google for ‘what is a data scientist’, you will get the same list of skills that a data scientist ‘must have’, repeated again and again:
Note that among such descriptions you will rarely come across the one thing that really defines any scientist: subject matter expertise. A mathematician who knows nothing of stars makes for a poor astrophysicist. The same is true for the data scientist, and it is possibly this that truly distinguishes the data scientist from the analyst or the engineer. To be successful, data scientists must have a deep understanding (or the ability to gain one very quickly) of the business and industry they work in, and a creative mind to help them design novel solutions to problems. This is where the true value of the data scientist, and of data science, lies: the ability to combine deep subject matter expertise, multiple data sources, and advanced analytical techniques.
DATA SCIENCE IS HERE TO STAY
It is a mistake to think that ‘Data Science’ will, like other hypes, fade away simply because its definition is broad and changing. Though the term covers a wide range of both established and evolving techniques and technologies, these are definitely here to stay. Data is certainly not going away, and nor is the desire of society and business to benefit and profit from it. So rather than asking ‘what is data science?’, ask instead: ‘what can it do for you?’