By Mervyn Mooi, Director at Knowledge Integration Dynamics (KID).
Data scientists have become some of the most prominent professionals in companies today. What they promise is considerable, and it grows daily.
There are many definitions of a data scientist; IBM notably defines the role as follows:
“A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modelling, statistics, analytics and maths. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organisation approaches a business challenge. A good data scientist will pick the right problems that have the most value to the organisation.”
A data scientist is therefore highly skilled at overcoming data and information challenges.
Their prominence is not surprising, considering that data scientists at Internet giants such as Google and LinkedIn have shown the impact they can have. For example, LinkedIn data scientists worked out how to recommend people you may know, a feature that boosted page views by millions and, with them, LinkedIn's marketability. When the feature was tested in 2006, it achieved click-through rates 30% higher than other prompts to visit additional pages on LinkedIn. It shifted the organisation into a higher gear, and since 2011 the company's share price has increased 65%.
But behind the façade of the data scientist demigods, enshrouded in near-mythic qualities, lie pragmatic realities that responsible organisations must account for if they seek to employ them and to achieve the business results they promise. The desired results span a vast range of business issues. Some of them are dialogue with consumers; accelerated product development; regulatory, reputational and operational risk analysis; data security and compliance; new revenue streams; reduced production maintenance costs; personalised Web site experiences to propel marketing campaigns and customer engagement; tight financial and operational performance monitoring and results analysis; maximised customer profitability; minimised customer churn; fraud detection; and more effective marketing campaigns.
First reality: teams
Examine the functions data scientists must perform to advance these business goals, and their demigod status quickly comes down to earth. The list of required skills is so long that it would take several lifetimes to acquire fully. It includes statistics, mathematics, predictive analytics, computer science, data engineering, programming, information management, integration, data warehousing, distributed data processing (Hadoop, MapReduce and more), data quality, business analysis, data mining, visualisation, content management, collaboration, presentation authoring and executive communication, among others.
The first reality, then, is that no single person can fulfil this function. It takes a team, whose size depends on the scope of the company and the projects it plans.
Second reality: scarcity
The next reality that hits home is the scarcity of these people, which contributes to their prohibitive cost. They are so scarce that they are often referred to as unicorns. In SA, data scientists are routinely offered positions with salaries averaging half a million rand per annum, even without much experience. Experienced data scientists command up to triple that amount. Even so, a data scientist who possesses all of the above skills at once has yet to be found.
Although the data scientist came to prominence with the advent of big data, data science as a discipline is much older and draws on a wide variety of ICT skills. This is where data scientists find their place: they quickly bring together the skills and knowledge needed to produce information and insights that create value for a company.
Third reality: foundations
With that in mind, the third reality is that adequate infrastructure and maturity must be in place before a company hires even one data scientist. No sensible executive should incur the expense of employing data scientists without first ensuring that the fundamentals are sound.
There should be a data management strategy, good data quality, good data governance, a data platform and architecture, and operational data processes. These must logically be in place before statistical and predictive analyses, and before the full range of interrogative and operational functions that companies seek to perform with their data.
Fourth reality: toolbox
The fourth reality is that these people require a toolbox. They need analytical tools to perform data quality experiments and roll-outs, as well as statistical and programming tools such as the Statistical Package for the Social Sciences (SPSS); the R programming language and software environment for statistical computing and graphics; Apache Pig, a high-level platform for processing data on Hadoop; the general-purpose Python programming language for quick, high-level programming; and MATLAB (Matrix Laboratory), which is useful for linear algebra, solving algebraic and differential equations, and numerical integration.
Larger South African organisations already have, to some extent, the foundations they require and even some of the skills to launch a data science function. Some have already acquired the requisite skilled personnel and have projects in operation.
Many smaller companies, by contrast, are gingerly dipping a toe into the water because they understand the business benefits. What they do not understand is what lies behind the façade of those benefits. They do not have the resources to retain experienced people indefinitely, they do not have the fundamentals in place, and they struggle to grasp the scope of their intentions. These companies are not, however, left out in the cold. The scarcity of skills has led recognised data consultancies to invest in personnel to fill the breach.