Technology: What is Data Science?
Some of you may recall a well-known 2012 Harvard Business Review article that proclaimed data scientist the sexiest job of the 21st century. Since then, Data Science has grown rapidly, as many businesses have transformed digitally and leveraged their data to discover interesting insights and drive business value.
Today, Data Science remains a buzzword, albeit a very broad one, closely associated with (and in some ways encompassing) other buzzwords such as AI, machine learning and big data. Data Scientists are among the best-paid professionals and are highly sought after by virtually all kinds of businesses across industries. Tertiary institutions across the globe have slowly begun to offer Bachelor's and Master's degrees in Data Science to train the next generation of Data Scientists and fill the massive skills gap that still exists in the job market. So how did this highly esteemed profession seemingly boom overnight and take almost every industry by storm?
First, let us define the scope of Data Science. Data Science is a multidisciplinary field of methodical scientific practice that combines statistics, mathematics, computer science and information science. Practitioners fall into various groups according to their areas of expertise within Data Science, but most commonly they are referred to as Data Scientists. The goal of Data Scientists is to extract knowledge and insights from data and use this new understanding of the subject matter to drive business processes, decisions and value through some form of automation.
To support the work of data scientists, however, there is typically a team of data engineers, data architects, machine learning engineers and analysts who, together with the data scientists, form the core unit of the data science ecosystem within an organisation. To understand who all these people are, it helps to compare a Data Science team with a café.
Café
Customers typically come to buy coffee.
The café serves several different kinds of coffee to cater to the diverse customers who visit it.
Below are some of the duties you may see in a café:
· Someone to deliver the beans
· Someone to store and maintain the beans, ensuring the beans are of the expected quality and ready for use any time
· Someone to plan and map out the logistics of the beans
· Finally, baristas to make coffee from the beans
Data Science team
Business stakeholders typically seek some kind of data science solution.
The Data Science team can deliver a plethora of data science solutions to meet the requirements of the business stakeholders. These range from data delivery, insights discovery and reporting to predictive modelling and much more.
Below are some of the duties you may see in a Data Science team:
· Data engineers to deliver data to the platform, ensuring the data is stored correctly, maintained properly, of the expected quality and ready for use at any time
· Data architects to plan and map out the logistics of the data, including how different datasets relate to each other logically
· Finally, data scientists, analysts and machine learning engineers to use the data to derive data science solutions
There was a time when Data Science was not a hot topic, what changed?
Many people may wonder: why is Data Science such a big deal now? Let’s break this question down into two parts, which we will explore briefly: 1) Why was Data Science not as big a thing before? and 2) Why is Data Science so hot in the market right now? On the first point, quantitative analytics has been around for quite some time. Statisticians were analysing data long before the rise of Data Science, and the perceptron, an early neural network, was introduced back in 1958. So what was the pivotal moment in history that gave rise to, and accelerated the growth of, this brand new profession?
In short, the rise of Data Science can be fundamentally attributed to the following four factors:
· Big data
· Access to hardware
· Open source community
· Emergence of MOOCs
Big data is another buzzword that has been popping up all over town, much like Data Science. It is essentially data that is so large and complex that it can no longer be studied, analysed or processed by traditional means. For instance, a small business may use an Excel spreadsheet to keep tabs on its customer activity. As the business operates over the years, the Excel file may eventually run out of rows (a single worksheet holds at most 1,048,576), and the owner may decide to create a new spreadsheet to capture new data. Now imagine millions of rows of data arriving every second, enough to fill many of these spreadsheets each second. The business would need a new way to store and keep up with this velocity of data growth, such as setting up ETL (extract, transform, load) jobs and databases that automatically capture data from source. Consider big data the next step up, where even traditional databases no longer suffice and technologies such as Hadoop and MapReduce make a material difference to system and business performance. With all this data coming in, businesses can now dive deeper into their data than ever before to understand and optimise every business decision and strategy to drive business value.
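To make the MapReduce idea a little more concrete, here is a minimal, single-machine sketch in Python of its three stages (map, shuffle, reduce), counting activity records per customer. It is a teaching toy, not Hadoop's API: the "customer_id,activity" record format and all function names are hypothetical, and a real framework would run many map and reduce tasks in parallel across a cluster rather than in one process.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


# Hypothetical record format: "customer_id,activity".
def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (key, 1) pair for each activity record."""
    customer_id, _activity = record.split(",", 1)
    yield customer_id, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key (here, a sum)."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    events = ["c1,login", "c2,purchase", "c1,purchase", "c3,login", "c1,logout"]
    print(map_reduce(events))  # {'c1': 3, 'c2': 1, 'c3': 1}
```

The value of splitting the work this way is that each map call and each per-key reduce call is independent, so a framework can spread them across many machines once the data no longer fits on one.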
Over the years, hardware has continued to improve rapidly, in line with Moore’s Law: the observation that the number of transistors on a chip doubles roughly every two years, which has historically meant more computing power at a lower cost. Almost 60 years on, this trend has broadly held. The same computing power fits into ever smaller hardware, while the total computing power available keeps growing every year. Coupled with cloud computing, this has made hardware far more accessible to businesses across the globe, giving businesses across industries the flexibility and capacity to run the complex computations needed for machine learning, neural networks and more at scale.
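A quick worked calculation shows why a steady two-year doubling adds up so quickly. This assumes an idealised, constant doubling period of exactly two years; real-world progress has been less regular. Here $P_0$ is a starting level of computing power and $t$ is the number of years elapsed (both symbols introduced here for illustration):

```latex
P(t) = P_0 \cdot 2^{t/2}

t = 20:\quad P(20) = P_0 \cdot 2^{10} = 1024\,P_0 \approx 10^{3}\,P_0

t = 60:\quad P(60) = P_0 \cdot 2^{30} = 1{,}073{,}741{,}824\,P_0 \approx 10^{9}\,P_0
```

In other words, two decades of doubling gives roughly a thousandfold increase, and six decades roughly a billionfold one, which is the scale of change that made today's machine learning workloads affordable.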
One of the key factors in the growth of Data Science is its open source community. Anyone, at any time, can contribute to Data Science research and development and participate in discussions through online channels such as GitHub, Stack Overflow, Slack and others. This allows R&D projects, products and services to be developed more effectively and efficiently by people from diverse backgrounds who bring their unique skills and expertise to the field. A good example is XGBoost, a gradient boosting algorithm that Tianqi Chen built as part of his PhD research in 2014. Chen made his code publicly available, and XGBoost gained widespread popularity after its use in the Higgs Machine Learning Challenge. Today, XGBoost is a standard tool in the average data scientist’s toolbox, and the community has ported the algorithm to R and Python, creating libraries and packages that support its use.
Finally, with the emergence around 2012 of MOOCs (massive open online courses) such as Coursera, Udacity and edX, the knowledge and skill set required to become a data scientist became available to the wider public in a more structured form. This enabled total beginners with no prior programming experience, no mathematics or statistics training, and no computer science background to start learning and enter the field.
Therefore, it was the convergence of these four factors that enabled Data Science to be practised at scale and to grow at a pace rarely seen in other professions.
Why is Data Science so hot in the market now?
The drive to create business value has been around since the beginning of time, or at least since commerce and trade were invented. Data Science is the latest tool for driving business value: much as electricity changed the way people worked during the Second Industrial Revolution, data is changing the way we work today.
To leverage data, businesses need to attract talent with a specific skill set to unlock its power. One such group is data scientists, whose job is to turn data into dollar value. However, demand for data scientists has far outpaced supply, and there is currently a massive global shortage of experienced data scientists.
Nonetheless, organisations are not letting this skills shortage slow the adoption and growth of their own data science capabilities. Many are lowering the bar to hire data analysts, who are typically less experienced and skilled than data scientists, to support data science projects, and are upskilling these analysts to become data scientists.
Two industries in particular stand out in their adoption of Data Science: telecommunications and financial services. Popular Data Science projects in these fields include fraud detection, predictive analytics, customer segmentation, customer churn prevention, customer lifetime value prediction, recommendation engines and sentiment analysis. In turn, these projects support the business by empowering management to make better decisions, identify opportunities, drive rapid experimentation and make real-time decisions at scale (such as automatically pre-approving credit card applications), among many other benefits.
If you are interested in learning more about Data Science and how it can help your business grow, stay tuned to our blog: in the next article, we will dive into ‘Who is a Data Scientist?’ and ‘How can you find the right Data Scientists for your business?’