Technology: Who is a Data Scientist?
In our earlier article “Technology: What is Data Science?”, we briefly explored what Data Science is and how the field evolved over time into what it is today: an integral part of business across all industries, with its practitioners in high demand and handsomely remunerated for their work.
As we are living in the big data era, Data Science has become a high-yield, high return on investment (ROI) field that enables businesses to harness the true power of big data by processing the huge volumes of data generated across various business processes, sources and day-to-day activities. Often described as a discipline vast in scope, Data Science is a multidisciplinary field of methodical scientific practice that combines statistics, mathematics, computer science and information science. This breadth reflects the work itself: Data Science projects typically require solving mathematical and statistical problems and involve predictive analytics, data modelling, data engineering, data mining, visualisation and more.
Practitioners of Data Science, or data scientists, therefore typically possess a wide range of skills in order to cover the whole spectrum of Data Science work the business expects, and are often required to keep improving and learning new skills as the industry evolves and technology advances. In this article, we will explore the knowledge, experience and skill set typically required to become a data scientist.
Who is a data scientist?
There is a somewhat infamous description of data scientists online, originating from a Twitter post: “A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician”. This is more or less true. The role of a data scientist does not necessarily involve software engineering, nor are data scientists primarily statisticians, because the early influx of data science practitioners came from all walks of life, mostly from quantitative disciplines such as physics, bioinformatics and engineering.
Today, a data scientist’s day-to-day responsibilities may include:
1. Identify business problems, create hypotheses and define the scope of the research project
2. Extract and process huge volumes of both structured and unstructured data for the research project, either from internal systems such as relational databases or externally through APIs, web scraping, surveys, IoT systems and others
3. For data that is not available in relational databases, build new ETL pipelines to productionise the data, or load it ad hoc into sandpit areas to store the data required for the project
4. Work with source system owners or subject matter experts to understand data captured and apply appropriate business rules to filter out non-relevant data
5. Employ various statistical methods to profile and analyse the dataset, and clean the data accordingly to produce a dataset that can be consumed directly for analytics and modelling
6. Perform analysis and modelling over the dataset using both statistical methods and more advanced modelling techniques such as machine learning and neural networks
7. Continuously tune the model’s hyperparameters to produce the best-performing model, for example through a grid search or random search over learning rates, numbers of training iterations and other algorithm-specific parameters
8. Produce a set of metrics and a report with visualisations for the final model, and present them to key stakeholders for approval
9. Productionise the model, e.g. by producing an executable JAR for daily scoring jobs
10. Set up model monitoring to ensure the model continues to perform well in production, and recalibrate or rebuild it when required
11. Identify new technologies and cost-effective changes to improve the above process, such as building new in-house capabilities to automate data profiling or model building
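To make step 7 a little more concrete, here is a minimal Python sketch of grid search versus random search. It is an illustration under stated assumptions, not a production recipe: `validation_score` is a hypothetical stand-in for "train the model and score it on held-out data", and the parameter names and ranges are assumptions chosen for the example. In practice, libraries such as scikit-learn provide ready-made tools for this.

```python
import itertools
import random


def validation_score(learning_rate, n_iterations):
    """Hypothetical stand-in for training a model and scoring it on held-out data.

    It simply peaks at learning_rate=0.1, n_iterations=200 for illustration;
    a real project would fit and evaluate an actual model here.
    """
    return -((learning_rate - 0.1) ** 2) * 100 - ((n_iterations - 200) / 100) ** 2


# Assumed search space for the grid search (illustrative values only).
PARAM_GRID = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3],
    "n_iterations": [50, 100, 200, 400],
}


def grid_search(score_fn, grid):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(score_fn, n_trials, seed=0):
    """Evaluate n_trials randomly sampled settings; return (best_params, best_score)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Assumed range: learning rate sampled on a log scale, 0.001 to ~0.316.
            "learning_rate": 10 ** rng.uniform(-3, -0.5),
            # Assumed range: randint is inclusive at both ends.
            "n_iterations": rng.randint(50, 400),
        }
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    print("grid:", grid_search(validation_score, PARAM_GRID))
    print("random:", random_search(validation_score, n_trials=16))
```

Both searches above use a budget of 16 trials, but the grid only ever tries four distinct learning rates, while random search tries 16 different ones. This is a commonly cited reason why random search can find good settings more efficiently when only a few of the parameters really matter.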
A data scientist’s role may differ across organisations and industries, but most of the above still holds true. As data scientists build their experience and expertise, their responsibilities change: they often specialise in one area of the data science spectrum, such as model building or the productionisation of data assets, and may also take on more project and people management responsibilities.
Given the above, a successful data scientist typically needs the following skills, knowledge and experience:
Knowledge: machine learning, neural networks, maths (e.g. linear algebra), statistics
Programming languages and tools: Python, R, SQL, Spark
Platform of operation: Microsoft Azure, AWS, Google Cloud, Hadoop
Soft skills: communication, presentation, leadership, critical thinking, complex problem solving, creativity, adaptability, willingness to learn, and the ability to learn quickly and apply new skills
Education: an advanced research degree such as an MSc or PhD in a quantitative field is highly preferable
Experience: at least 3 years of experience demonstrating the ability to solve real-world data science problems
In terms of career prospects, data scientists are at the higher end of the scale: there is a global shortage of talent, demand for data scientists is far outpacing supply, and demand is set to keep growing as more businesses across the world adopt big data and data science.
To address this shortage, many universities and educational institutions around the world have started offering Data Science programmes and professional certifications. Many people hope that a Data Science degree or some form of certification will provide the skills and knowledge needed to improve their chances of breaking into the industry. Either way, there is no doubt that this new breed of data scientists will be very different from the early pioneers of the field, who brought their expertise and skills from a diverse array of experience in other areas.
If you are interested in learning more about Data Science and how it can help your business grow, please stay tuned to our blog, as our next article will take a deep dive into “How can you find the right Data Scientists for your business?”