Data Science Digest #1 - Dec 11, 2022
@LexFridman's Conversation With @AndrewNg | Concept Drift | In-demand Data Science Skills | @Google's Pre-ML Checklist for Data Preparation
Hi all,
Welcome to the first-ever issue of Data Science Digest, where we level up our data skills one newsletter at a time. I hope you like this first issue, and even if you don’t, it’s only going to get better from here! Have an amazing day and stay datalicious!
I. Watch | @LexFridman's Conversation With @AndrewNg
This is episode #73 of the Lex Fridman Podcast, in which @LexFridman, an AI researcher at MIT, talks with @AndrewNg, co-founder of Google Brain and Coursera and inarguably one of the most influential figures in AI and machine learning.
What better way to kick off our inaugural issue than these two amazing people in the same room talking about deep learning, education and AI? Stream it on Spotify, or watch it on YouTube here.
II. Read | Concept Drift in ML
Concept Drift - In the context of machine learning and predictive analytics, concept drift refers to a change over time in the relationship between the input features and the target variable, which often degrades model performance.
Further reading on how concept drift affects your model performance, why you should care about it, and what you can do to address it:
1. A Gentle Introduction to Concept Drift in Machine Learning | @JasonBrownlee
2. How Concept Drift Ruins Your Model Performance | @AlexandraAmidon
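To make the definition concrete, here is a minimal sketch in plain Python. It is a toy illustration, not taken from either article: the data is synthetic, and the helper names (`make_stream`, `fit_slope`, `mean_abs_error`) and the "error more than doubled" rule are assumptions chosen for the demo. The inputs keep the same distribution throughout; only the relationship between x and y changes, which is what separates concept drift from a simple shift in the input data.

```python
import random
from statistics import mean

def make_stream(n, slope, rng):
    """Generate n (x, y) pairs where y = slope * x plus small Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 0.1)) for x in xs]

def fit_slope(data):
    """Least-squares slope for a line through the origin."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def mean_abs_error(slope, data):
    """Average absolute prediction error of the fitted line on new data."""
    return mean(abs(y - slope * x) for x, y in data)

rng = random.Random(0)  # fixed seed so the demo is repeatable

# Train once on data where y is roughly 2 * x.
train = make_stream(500, slope=2.0, rng=rng)
model_slope = fit_slope(train)

# Later batches: same inputs, but the x -> y relationship flips in the second one.
before = make_stream(500, slope=2.0, rng=rng)
after = make_stream(500, slope=-1.0, rng=rng)

err_before = mean_abs_error(model_slope, before)
err_after = mean_abs_error(model_slope, after)

# Hypothetical alert rule: flag drift if the error more than doubles.
drift_suspected = err_after > 2 * err_before
print(f"error before: {err_before:.3f}, error after: {err_after:.3f}, "
      f"drift suspected: {drift_suspected}")
```

In practice you would track a metric like this on a rolling window of recent predictions and retrain or investigate when it degrades; the threshold here is arbitrary.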
III. See | In-demand Data Science Skills
A few years ago, when I decided to plunge into the world of data science, I didn’t really know where to start. The sheer volume of tools, technologies and programming languages at my disposal was overwhelming, and that was 5 years ago. I can only imagine how much more intimidating it is today.
Whether you are just starting out in data science or itching to learn something new, it makes sense to see which skills are in demand and what the top technology companies are looking for.
@JeffHale from DataAwesome has a very good article, which you can find here, where he scraped data science job postings from 2019 across various job portals to figure out the most in-demand skills and technologies for data scientists.
@TerenceShin did a similar analysis in 2021 that you can find here.
Not surprisingly, through the years the top 3 most in-demand data science skills have been Python, R and SQL. If you are completely new to data science, my advice would be to not get distracted by all the fancy technologies out there. Start with SQL, which is essential for fetching data, then Python for data manipulation, and Tableau for data visualization.
IV. Do | Pre-ML Data Preparation Checklist From @Google
According to this Forbes article from 2016 that is often cited in the data community: “Data preparation accounts for about 80% of the work of data scientists”. That number might not be entirely accurate, and it likely varies with the data and the business problem at hand, but the computing principle of GIGO (Garbage In, Garbage Out) holds true in data science as well: high-quality data is essential to the success of any machine learning project.
The very first step in any machine learning project is to figure out what data you need, understand the data you have, assess its quality, and prepare it before it is fed into a machine learning model.
To make data preparation easier, our friends @Google have provided a comprehensive 29-point checklist, Is My Data Any Good? A Pre-ML Checklist, which offers guidelines and rules of thumb for assessing and preparing data for a machine learning project.
Although not all of the checks will apply to your specific problem, it shouldn’t take you long to skim through them. You should definitely add this checklist to your data science repertoire.
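As a small taste of what assessing data quality can look like in code, here is a hedged sketch in plain Python. The two checks (missing values and exact duplicate rows) are generic examples and are not quoted from the Google checklist; the function name `quality_report` and the sample rows are made up for illustration.

```python
def quality_report(rows, required):
    """Count missing required fields and exact duplicate rows.

    rows: list of dicts, one per record.
    required: column names that should never be empty.
    """
    # A value counts as missing if the key is absent, None, or an empty string.
    missing = {
        col: sum(1 for r in rows if r.get(col) in (None, ""))
        for col in required
    }

    # A row is a duplicate if an identical row was already seen.
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}

# Made-up sample data with one repeated row and a few gaps.
rows = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Boston"},
    {"id": 2, "age": None, "city": "Boston"},
    {"id": 3, "age": 29, "city": ""},
]

report = quality_report(rows, required=["age", "city"])
print(report)
```

Running a report like this before any modeling gives you a quick sense of how much cleaning is ahead; a real project would add checks specific to its data and problem.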
Thank you for subscribing and reading the newsletter. I appreciate your patience and time!
If you liked this post from Data Science Digest, why not share it? Oh, and if you haven’t subscribed already, you can subscribe by clicking the button below.
I’m Praneeth Kandula. I fell in love with data and analytics in 2017, and thus began my data science journey. There’s still a long, long way to go, and through this newsletter I hope to share, inspire, learn and grow, so that together we level up our data skills one newsletter at a time.