Data Science Digest — We Are Back!

Hi All,
I have some good news for you…

Data Science Digest is back! We’ve been “offline” for a while, but no worries: you’ll receive regular digest updates with top news and resources on AI/ML/DS every Wednesday, starting today.

Currently, I’m working on adding more content types to the digest. Coming soon: webinars, interviews, quizzes, surveys, and much more.

If you’re more used to getting updates every day, follow us on social media:

Telegram -
Twitter -
LinkedIn -
Facebook -

We’ve launched the digest page on Patreon. So, if you want to support us and get more awesome stuff in the future (yes, there’ll be some perks), don’t hesitate to chip in.

And finally, your feedback is very much appreciated. Feel free to share any ideas with me and the team, and we’ll do our best to make Data Science Digest a better place for all.

Dmitry Spodarets.


Recommendation Algorithms & System Designs of YouTube, Spotify, Airbnb, Netflix, and Uber
Ever wondered how top technology companies can so accurately predict their customers’ next step? Look no further! In this article, you will find a collection of recommendation algorithms and system designs they may be using. Note that the author collected all the information from open sources and draws on personal experience; there is no guarantee that the designs are 100% correct.

R vs. Python vs. Julia
Data Scientists are used to writing code in R and Python, but a new programming language for Data Science, Julia, may become their tool of choice. Julia promises C-like performance without compromising the way Data Scientists write code and interact with data, and brings a refreshing programming mindset to the DS community. Eager to learn more about Julia and how it stands against R and Python? Check out this article!

A Machine Learning Model Monitoring Checklist: 7 Things to Track
Once a model is deployed in production, you need to ensure it performs as expected and account for data/model drift and other changes affecting accuracy and precision. Here comes model monitoring! In this article, we will look into the specifics of model monitoring and explore open-source tools that you can start using today. It also features a short, 7-step checklist to help you make machine learning work in the real world.
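The article has its own seven-item checklist; as a minimal, illustrative sketch of just one item (data drift), here is how a feature’s Population Stability Index (PSI) between training data and live traffic might be tracked. The `psi` helper and the common 0.2 alert threshold are conventions assumed for the example, not taken from the article:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty buckets to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)   # reference distribution
live_stable = rng.normal(0.0, 1.0, 10_000)     # same distribution in production
live_shifted = rng.normal(0.8, 1.0, 10_000)    # mean has drifted

print(f"stable:  {psi(train_feature, live_stable):.3f}")   # near zero
print(f"shifted: {psi(train_feature, live_shifted):.3f}")  # above the 0.2 alert level
```

A rule of thumb is to alert when PSI exceeds roughly 0.2; the open-source tools mentioned in the article automate checks like this across all features.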

How to Break a Model in 20 Days. A Tutorial on Production Model Analytics.
When your machine learning model is deployed in production, you are only halfway done: you need to constantly retrain and tune it to ensure the accuracy of its predictions. Learn how frequent retraining, together with model monitoring, can help you address production model maintenance and add one more layer of assurance for model quality.

Data Science Learning Roadmap for 2021
Ever dreamed about starting a new year by delving into Data Science? This article will help! It features a learning framework and is full of useful resources and innovative project ideas that will help you build a solid portfolio of work showcasing expertise in data science. You will explore: Programming for Data Science, Data Collection & Data Cleaning, Data Analysis & BI, Data Engineering, Applied Statistics & Mathematics, AI & ML.

17 Types of Similarity and Dissimilarity Measures Used in Data Science
Measuring similarity and dissimilarity is quite important in any data science work, simply because it allows you to “see” how close or distant data objects are from each other. In this article, the author takes a deep dive into 17 types of similarity and dissimilarity measures to help you navigate the various metrics and their applications in Data Science. The amount of work done by the author is massive, so the article is definitely worth checking out.
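The choice of measure changes what “close” means. As a small illustration (not from the article itself): cosine similarity ignores magnitude while Euclidean distance does not, and Jaccard similarity compares sets rather than vectors:

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance: sensitive to magnitude."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Angle between vectors: insensitive to magnitude."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard(s1, s2):
    """Overlap of two sets: intersection over union."""
    return len(s1 & s2) / len(s1 | s2)

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length

print(euclidean(a, b))          # sqrt(14) ≈ 3.742: the vectors are "far apart"
print(cosine_similarity(a, b))  # 1.0: yet they point the same way
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

The same pair of objects can look near or far depending on the metric, which is exactly why surveying 17 of them is worthwhile.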

Mixing Normal Images and Adversarial Images When Training CNNs
This tutorial will guide you on the path to Computer Vision mastery. As a first step, you will learn how to generate batches mixing normal and adversarial images during training to improve your model’s ability to generalize and defend against adversarial attacks. The tutorial is extra detailed and covers everything CNN beginners may need to start preparing images for their networks.
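The batch-mixing idea can be sketched without a deep learning framework. In this toy example a linear logistic model stands in for the tutorial’s CNN, and FGSM perturbs half of each batch; all names and the 50/50 split are illustrative assumptions, not the tutorial’s code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps=0.1):
    # FGSM for a linear logistic model: step along the sign of the input
    # gradient of the binary cross-entropy loss, dL/dx = (p - y) * w.
    grad = (sigmoid(x @ w) - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad)

def mixed_batches(X, y, w, batch_size=8, adv_fraction=0.5, eps=0.1, seed=0):
    """Yield shuffled batches in which a fraction of samples is adversarial."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        xb, yb = X[b].copy(), y[b]
        n_adv = int(len(b) * adv_fraction)
        xb[:n_adv] = fgsm(xb[:n_adv], yb[:n_adv], w, eps)
        yield xb, yb

X = np.random.default_rng(1).normal(size=(32, 4))
y = (X[:, 0] > 0).astype(float)
w = np.array([1.0, -0.5, 0.2, 0.0])
batches = list(mixed_batches(X, y, w))
print(len(batches), batches[0][0].shape)  # 4 batches of shape (8, 4)
```

Training on such mixed batches is the core trick: the model sees clean and perturbed versions of the data side by side, which is what improves robustness.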

Awesome AI-ML-DL Repository
If you are interested in all things AI/ML/DL as we are, check out this awesome repository by @neomatrix369 that features study notes and a curated list of helpful resources on the topic. The repo is created by and for engineers, developers, data scientists, and all other professionals looking to sharpen their mastery of AI, ML, and DL. As with other repos on GitHub, you can easily contribute to, watch, star, fork, and share the repo with others in the tech community.

Unit Testing Python Code in Jupyter Notebooks
Writing unit tests is the natural way of doing things when you work on production code, library code, or in test-driven environments. But what about writing unit tests for your Jupyter notebooks? In some cases you don’t need them, but when you go beyond data exploration you definitely do: you still have to maintain and monitor your code in the long run. Learn how unit tests can help you make your Data Science work more efficient, and then try it in practice. You won’t be disappointed!
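One common pattern (illustrative here, not necessarily the article’s exact approach) is to keep tests in a notebook cell and run them with `unittest.main`, passing `exit=False` so the runner doesn’t shut down the kernel; the `normalize` helper is a made-up example:

```python
import unittest

def normalize(values):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

class TestNormalize(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        # Degenerate case: all inputs equal, avoid division by zero.
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])

# In a notebook cell: argv avoids Jupyter's own flags, exit=False keeps
# the kernel alive after the tests run.
unittest.main(argv=["ignored"], exit=False)
```

Rerunning the cell after every change to `normalize` gives notebook code the same safety net that a test suite gives a library.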

How to Deploy Machine Learning / Deep Learning Models to the Web
Machine and deep learning models should not exist in a vacuum (theoretical environments). They need to be deployed to production and used by businesses and customers. In this article, the author provides a step-by-step guide to deploying models to the web and accessing them as a REST API using Heroku and GitHub. You will also learn how to access that API using the Python requests module and cURL.
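The article uses the requests module; here is a stdlib-only sketch of the same kind of JSON POST. The endpoint URL and payload shape are hypothetical placeholders, not the article’s actual app:

```python
import json
import urllib.request

# Hypothetical endpoint: substitute the URL of your own Heroku app.
URL = "https://your-app.herokuapp.com/predict"

payload = {"features": [5.1, 3.5, 1.4, 0.2]}
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually query the deployed model (requires network access):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     prediction = json.loads(resp.read())

# The equivalent cURL call:
# curl -X POST https://your-app.herokuapp.com/predict \
#      -H "Content-Type: application/json" \
#      -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

print(req.get_method(), req.full_url)
```

With requests the same call collapses to `requests.post(URL, json=payload)`, which is the form the article walks through.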

Transfer Learning and Data Augmentation Applied to the Simpsons Image Dataset
In this article, the author uses the Simpsons characters dataset to experiment with data augmentation for transfer learning. From image filtering to splitting the data into training, testing, and validation sets, a series of experiments is conducted to address the problems of small datasets and overfitting. Check out this step-by-step guide to learn about the results and the final metrics the experiments yielded.


You Only Look One-level Feature
Explore feature pyramid networks (FPN) for one-stage detectors and learn why they are so successful (hint: divide-and-conquer optimization in object detection rather than multi-scale feature fusion). Dig into the code to examine the optimization problem from an entirely new perspective: You Only Look One-level Feature (YOLOF). The new method builds upon Dilated Encoder and Uniform Matching, key components that bring considerable improvements on the COCO benchmark.

Monocular Quasi-Dense 3D Object Tracking
Predicting future locations of surrounding objects is a complex task that requires reliable and accurate 3D tracking frameworks. Delve into a new framework that associates moving objects and estimates their full 3D bounding box information from a sequence of 2D images captured on a moving platform. Experiments performed on simulation data and real-world benchmarks such as the KITTI, nuScenes, and Waymo datasets show that the framework offers robust object association and tracking in various urban-driving scenarios.

Zero-Shot Text-to-Image Generation
One of the major tasks of text-to-image generation is finding better modeling assumptions for training on a fixed dataset. Such assumptions can involve complex architectures, auxiliary losses, or side information that make the data harder to handle. Discover a new method of text-to-image generation based on a transformer that autoregressively models the text and image tokens as a single stream of data. Learn why, when trained at scale, it is competitive with previous domain-specific models evaluated in a zero-shot fashion.


The Applied Machine Learning Course at Cornell Tech
Starting from the very basics, the course covers the most important ML algorithms and how to apply them in practice. The slides are Jupyter notebooks with programmatically generated figures so that readers can tweak parameters and regenerate the figures themselves. The course explores topics such as how to prioritize model improvements, diagnose overfitting, perform error analysis, visualize loss curves, etc.