How to Become a Machine Learning Engineer in 2024
Introduction
I turned my attention to machine learning ten years ago when the glory days of deep learning began. At that point, I knew next to nothing about the subject, but two years later, I started my first machine-learning company.
What I didn't have, however, was a University degree. Instead, I learned everything I know from online resources and my own passion projects.
This post is an attempt to lay out an effective strategy for anyone who wants to become a machine learning engineer in 2024. Since I want this guide to be accessible to everybody, every resource I point to is free.
I focus on "traditional" machine learning rather than generative AI, but you'll want a solid understanding of the underlying technology anyway.
Also, subscribe to my newsletter! 🥳
About the structure
I decided to write a single long post from zero to hero instead of writing one for each step. Consequently, this blog post is over 7,000 words and probably not something you'll read in one sitting.
This guide to becoming a machine learning engineer has the following four sections:
- Part 1: Zero to Beginner
- Part 2: Beginner to Intermediate
- Part 3: Intermediate to Advanced
- Part 4: Advanced to Hero
Each part contains 3-4 main topics. Some of them tell you what you need to know and point you toward great free resources. Others are more about your mindset and other tips to keep in mind.
If you find this information helpful, make sure to share it! 🙂
Part 1: Zero to Beginner
Some of us start with no prior knowledge or experience, and that’s ok. It’s never too late to learn something new. When you want to become a machine learning engineer but you’re starting from zero, it’s key to build a solid foundation. Here’s what you need to do:
If this feels too basic for you, feel free to check out part 2 instead.
Start by learning Python
- Why Python
- Use virtual environments
- Begin with Jupyter notebooks
- Write code the right way
- Never copy code
- Use GitHub
- Study guide & Resources
Why Python?
It’s common for aspiring software engineers to get stuck selecting a programming language. Luckily, that’s not an issue for machine learning. Python is by far the most popular option and has plenty of fantastic libraries that make our job easy.
Apart from machine learning, Python is an excellent choice for new programmers. It’s versatile and easy to learn. You don’t need to become an expert before you start developing algorithms, but there are some best practices to keep in mind from day one.
But first, I want to mention that it's somewhat common to integrate machine learning algorithms into software that's written in another language, such as C#. When that's the case, you'll still create the algorithm in Python but save it in some general format that you can load with corresponding libraries.
As you become comfortable writing code, it’s usually straightforward to pick up other languages when you need to.
Use virtual environments
Experienced machine learning engineers and programmers always use virtual environments for their projects. The purpose is to separate the program you’re working on from your local machine. Some of the many advantages are:
- You don’t install libraries directly on the computer
- Your program won’t crash if you update a dependency in another project
- It’s easy to transfer the environment to a new computer
Beginners often skip this because they want to start working on the fun stuff immediately. However, if you start using virtual environments right away, you’re going to save hundreds of hours in the long run.
The bugs you encounter when you have incompatible dependencies drain a lot of energy and add nothing to your learning.
If something goes wrong with a virtual environment, you can often delete it and create a new one. Try doing that if everything is installed directly on your computer.
There might be a tiny hurdle initially, but it doesn’t take long to learn. Soon, you’ll thank me. A popular option for Python is miniconda.
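On the command line, you would typically create an environment with `python -m venv demo-env` (or `conda create` if you go with miniconda). As a minimal sketch, the same thing can be done from Python's standard library; the directory name here is just an example:

```python
# Minimal sketch: create a throwaway virtual environment with the
# standard-library venv module. On the command line you'd typically
# run `python -m venv demo-env` instead.
import os
import venv

env_dir = "demo-env"  # hypothetical directory name
venv.EnvBuilder(with_pip=False).create(env_dir)  # with_pip=False keeps it fast

# The environment gets its own interpreter config and site-packages.
print(os.path.isdir(env_dir))  # → True
```

After activating the environment (`source demo-env/bin/activate` on Linux/macOS), anything you install stays isolated from the rest of your machine.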
Begin with Jupyter Notebooks
Many data scientists work with Jupyter Notebooks. They're excellent for visualizing data and have a friendly interface. You can write markdown, which is perfect for creating detailed documentation and adds to your learning.
Once you get more comfortable with programming best practices, you’ll stop using Jupyter notebooks for programming. They are not intended or suitable for software development.
However, it’s always a useful tool when you present your work to customers or other stakeholders. It’s easy to create beautiful visualizations and explain core concepts with markdown. You can test it here. I recommend the classic notebook, but JupyterLab works as well.
Write code the right way
There's more to programming than making your code run. It should be beautiful, easy to understand, and reproducible. The two most essential elements of good code are functions and naming. Put everything in functions and make sure that each one does one thing only.
In many universities and online courses, you’ll learn to put comments into your code. As a result, many people think that comments are essential for clarity, but that’s not true. Great code doesn’t need comments because it’s clear anyway.
Here is a perfect GitHub repository (clean-code-python) that explains the essence of writing code the right way.
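To make the point concrete, here's a small before-and-after sketch in that same spirit (the example and names are mine, not from the repository):

```python
# Hard to follow: one function doing everything, explained by a comment.
def proc(d):
    # filter and average
    t = [x for x in d if x > 0]
    return sum(t) / len(t)

# Clearer: small, well-named functions that don't need comments.
def keep_positive(values):
    return [value for value in values if value > 0]

def mean(values):
    return sum(values) / len(values)

def mean_of_positive(values):
    return mean(keep_positive(values))

print(mean_of_positive([3, -1, 5, -2, 4]))  # → 4.0
```

Notice how each function in the second version does exactly one thing, and the names make the comment unnecessary.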
Never copy code
When you're learning, you should write all the code yourself. It helps you understand what's going on, and you'll remember it for longer. Copy and paste is bad practice because you might introduce a bug or suboptimal code into your programs. This is even more true today, when you can generate most of your code with AI.
Once you get better at programming, you can start copying code if you understand what it does. Just don’t fool yourself. Here’s a blog post with a more detailed discussion.
Use GitHub
Every single programmer out there knows about git. It’s a tool for collaboration, and almost all software teams use it. You should learn it for several reasons, including:
- You can see all the changes you made and revert them if necessary
- Potential employers can see your work
- It’s best practice and simple to learn
GitHub is the most popular platform. If you don’t have an account, start one right away. It’s free and takes no time. When someone applies to my company, I ask for their GitHub. If they don’t have one, it’s over unless there are some exceptional circumstances.
Study guide & Resources
You can read books and participate in online courses, but the key to learning Python is to write code every day. Nothing substitutes hands-on practice.
Here’s a list of five free resources to help you on your quest. I ordered the list, but you can skip ahead if a concept is familiar. Use each one as a complement to hands-on practice.
1. Learn Python – Full Course for Beginners
This is a four-hour-long YouTube video with zero ads. It’s an excellent introduction if you have very little experience with Python or programming in general.
2. Introduction to Python Programming
Next, we have a course from Udacity. It's slightly more involved than the previous one but still rudimentary.
3. Intro to Data Structures and Algorithms
In this course, you’ll dive deeper into common programming patterns. These algorithms have nothing to do with machine learning.
4. Version Control with Git
Here is a free course from Udacity for learning git. It’s a bit more than what you need for now since you’ll mostly work by yourself, but make sure to check it out.
5. Design RESTful APIs
In the last one, you’ll learn how to build web services. You don’t have to learn this right now, but it’s necessary later on when you start deploying your algorithms.
Become familiar with basic concepts
- Introduce yourself to machine learning
- Read about interesting use cases
- Study common algorithms
- Different types of learning
Introduce yourself to machine learning
Before you dive deep into building machine learning algorithms, you need to have a fundamental understanding of why it’s needed. You should be able to spot possible use-cases and notice potential issues.
There are tons of concepts to learn, but for now, you should focus on the cornerstones. Many of the more challenging ideas you'll encounter later derive from these building blocks. Here are some excellent posts and articles for you to read:
1. A visual introduction to machine learning
Probably the most beautiful visualization of machine learning available. Also, make sure to check out part two about model tuning.
2. No, Machine Learning is not just glorified Statistics
You often hear people say that machine learning is a part of statistics, but that’s not true. Here’s a fantastic blog post discussing this common misconception.
3. Pitfalls to avoid in building a successful machine learning program
Another good way to gain understanding is to read about common mistakes. This blog post describes some typical pitfalls.
Read about interesting use cases
Next, you should start reading about use cases that you find interesting. You can apply machine learning to so many problems. Imagination is the limiting factor. Also, solutions are surprisingly similar across different industries, so always ask yourself if you can apply the same techniques to other areas.
Search for opportunities in your life or the industries that you understand. Make sure to question old assumptions and revisit challenges that the company previously deemed too difficult. It’s common to replace old solutions that took years to develop with a machine-learning algorithm. The best example is computer vision.
1. 100+ AI Use Cases & Applications in 2020
This article lists hundreds of common use cases and is a perfect place for you to start. Once you finish, check out other resources on aimultiple.com. They have several posts describing applications in different industries.
Study common algorithms
You don’t need to be a mathematician to become a machine learning engineer, but it’s good to have some general knowledge of how the algorithms work. Understanding the inner workings gives you some confidence going forward. The same concepts occur across algorithms, and the math is relatively straightforward.
Knowing which algorithm to use is difficult for beginners. There are many alternatives, and it’s not always clear when one is better than the others. Luckily, machine learning engineers use only a fraction of the algorithms frequently. You’ll see this when you enter Kaggle competitions.
1. 9 Key Machine Learning Algorithms Explained in Plain English
Here’s an article from freecodecamp.org explaining the most basic algorithms in an intuitive way.
2. Machine Learning Algorithms For Beginners with Code Examples in Python
Next, we have another post looking at simple algorithms. Here, the authors add tiny code snippets to show you what it looks like in practice.
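In that same spirit, here's a minimal from-scratch sketch of one of the simplest algorithms, simple linear regression with one feature, fitted with the closed-form least-squares solution (nothing but the standard library is needed):

```python
# Simple linear regression (one feature) via ordinary least squares:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # → 2.0 1.0
```

Libraries like scikit-learn do the same thing (plus much more) for you, but seeing the few lines of math behind a model builds the confidence mentioned above.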
Different types of learning
There are several types of machine learning. Supervised learning, unsupervised learning, and reinforcement learning are the most well-known.
Knowing what they are and how they work is useful. It helps you to understand the requirements to start a machine learning project.
1. 14 Different Types of Learning in Machine Learning
For now, you'll work mostly with supervised learning. It's also the type companies apply most often.
Experiment with tabular data
- Why tabular data?
- Where to find datasets?
- Visualizing data
- Libraries to master
- Study guide & Resources
Why tabular data?
Tabular data is a great place to start your machine-learning experimentation. Usually, it comes as a CSV file where each row represents a data point and each column a feature. Tabular datasets are often smaller, and that allows for faster iterations.
For some industries, tabular data is the most common format. A couple of examples are financial data, employee surveys, and game statistics.
Even if you are more interested in deep learning and data types like images, you should spend some time here. It's an effective way to learn the key concepts you read about previously.
The algorithms take less time to train compared to deep learning models. As a result, you can try many more variations and experiments. You can even play around with different types of ensembles.
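To show what "each row a data point, each column a feature" looks like in code, here's a minimal sketch using only the standard library (a real project would typically use pandas, and the column names here are made up):

```python
# Each row is a data point, each column a feature.
import csv
import io

# A tiny in-memory CSV standing in for a downloaded dataset.
raw = """age,income,defaulted
25,40000,0
47,82000,0
31,25000,1
"""

rows = list(csv.DictReader(io.StringIO(raw)))
print(len(rows))          # → 3
print(rows[0]["income"])  # → 40000
```

With pandas, `pd.read_csv("file.csv")` gives you the same structure as a DataFrame, with typed columns and far more convenient tooling.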
Where to find datasets
The best place to find datasets is on Kaggle. There are over 40,000 datasets. Each has a rating, so it’s easy to avoid the bad ones. If you don’t find what you’re looking for, go to Google.
Start with a highly-rated dataset that interests you. Look for the ones marked as CSV, and don't take the largest you can find. It's often tempting to go for big data, but that's not necessary for learning. If you choose one of the most upvoted alternatives, you'll find many user-created kernels to inspire you. Here are some perfect options:
- World Development Indicators
- European Soccer Database
- Credit Card Fraud Detection
- Kiva Crowdfunding
- Global Terrorism Database
You don't need to pick one from the list. There are so many great options that I know you'll find something that's perfect for you. Just don't spend too much time selecting a dataset, and once you pick one, stick with it before taking on a new one.
How to visualize data
Becoming a machine learning engineer involves more steps than writing code, although that’s the most critical part. You must learn to explain your work to people of any knowledge level. To do that, you need a deep understanding of machine learning and utilize tools for visualization.
In general, machine learning engineers tend to do less analysis than data scientists. Instead, we trust the algorithm and test its performance thoroughly.
It’s still necessary to learn data analysis and visualization. Here are some beautiful notebooks from Kaggle that show you how to do it:
Udacity offers a free course called Intro to Data Science that’s great for learning data analysis.
Libraries to master
When becoming a machine learning engineer, there are some Python packages to master. The ones you’ll see most frequently are:
- Pandas – Handling tabular data
- Matplotlib and Seaborn – Visualizing data
- Scikit-learn – Machine learning algorithms and helper functions
- XGBoost and LightGBM – Gradient Boosted Decision Trees
You are free to try out other libraries as well. There are plenty of great alternatives. Don’t purchase a course on any of the above. The best way to learn is to experiment. You can always look at the documentation if you get stuck.
Study guide & Resources
Here is where your journey towards becoming a machine learning engineer starts. Now it’s time to dig in and start building algorithms. As a first step, I encourage you to play around with the simple models you read about previously and compare performances on the dataset you picked.
Just like before, the only way to truly learn is through hands-on practice. You can’t expect to master the craft by just watching videos and following tutorials. As soon as something feels difficult or challenging, that’s when you learn, so dig in!
There is nothing wrong with using online tutorials if you keep the previous points in mind. The internet has tons of alternatives for you. Some courses are expensive, and some are free. In accordance with the rest of this guide, here are a couple of free alternatives.
– Intro to Machine Learning & Intermediate Machine Learning
Here are two free courses on Kaggle, a place where you’ll spend a lot of time in the upcoming sections. Feel free to check out their other courses as well, especially the one on feature engineering.
– Introduction to Machine Learning
Udacity has some of the best courses for learning tech. Their main programs are rather expensive, but they have free courses as well.
Part 2: Beginner to Intermediate
To go from beginner to intermediate, it’s time to start challenging yourself and make sure that your solutions measure up. You must compare your work to others and learn from people who are better than you.
There are also a ton of machine learning topics to study. Here are my suggestions for beginners:
- Create a profile on Kaggle
- Starting with deep learning
- Changing setup
- Adding more data types
If you know all of this, go to part 3.
Create a profile on Kaggle
- Learning by competing
- Cross-validation
- Public kernels and discussions
- Inspiring competitors
Learning by competing
You’ve already come across Kaggle in this guide. It’s a website that hosts machine learning competitions. Competing is a great way to learn because you get direct feedback when you submit solutions.
Also, other members share code and ideas. There’s no better place to stay updated on the latest best practices.
Many people don’t participate in competitions because they don’t think they’re good enough. That’s the worst excuse in the world because no one cares except you. It doesn’t matter if you finish last or first (which you won’t); what matters is learning effectively. Once you register, head over to competitions.
As you can see, there are prizes for finishing among the top three. Tempting as it is, don't think about prize money when entering a competition. It's unlikely that you'll finish at the top, at least right now. Pick one that you find interesting and start hacking.
During competitions, people post their complete solutions for everybody to see. You'll see tons of people copy that code and submit it; don't be like them. The frustrating part is that when you start, these public solutions might score higher than your submissions. As a result, you'll be far down the leaderboard. It doesn't matter. Don't copy and submit, end of story.
Cross-validation
Cross-validation is a central concept to succeed as a machine learning engineer, and that shows on Kaggle. You need to know how good your algorithm is before you submit it (or push it to production).
Every master competitor on Kaggle says that getting your cross-validation to reflect the public score is half the battle. A new version with a higher CV-score should improve your position on the leaderboard. If it doesn’t, you’re probably not solving the correct problem.
I bring this up because it’s equally critical in the real world. You must know what performance to expect from your algorithm in production. There are hundreds of reasons why your scores can be overly optimistic, and that’s a dangerous situation.
Obsession with cross-validation is necessary when becoming a machine learning engineer. Check out this article for more details.
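Getting a trustworthy CV score starts with the split itself. Here's a minimal from-scratch sketch of k-fold cross-validation (in practice you'd use scikit-learn's `KFold`; the evaluator here is a dummy stand-in for training and scoring a model):

```python
# K-fold cross-validation: split the data into k folds, train on k-1
# of them, evaluate on the held-out fold, and average the k scores.
def k_fold_indices(n_samples, k):
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_score(evaluate, n_samples, k=5):
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / k

# Dummy evaluator: "score" is just the fraction of data used for training.
score = cross_val_score(lambda tr, va: len(tr) / 10, n_samples=10, k=5)
print(score)  # → 0.8
```

Every data point ends up in the validation set exactly once, which is what makes the averaged score a reasonable estimate of real-world performance.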
Public kernels and discussions
For every competition and dataset on Kaggle, you’ll find kernels created by users, usually in the form of Jupyter notebooks. Sometimes, there are full solutions that you can submit directly to the competition yourself and get the same result as the uploader. That’s a great example of how not to become a machine learning engineer.
Obviously, copying other people’s solutions and submitting them is bad. The only person you’re fooling is yourself. Also, the best competitors don’t submit their code this way. Instead, they are active in the discussion section, and you should be too.
Inspiring competitors
You’ll see the same people frequently appear at the top of every competition. They have a lot to teach. Make sure to follow their profiles and read what they post. Look at their notebooks and discussion threads. You can find them all on the leaderboard.
Keep in mind that most of them are researchers and not software engineers. You’ll find multiple solutions that don’t follow the programming best practices that you learned about previously.
Starting with Deep Learning
- Understanding neural networks
- Computer vision
- Data augmentation
- Study guide & Resources
Understanding neural networks
Neural networks are almost synonymous with deep learning. It’s a type of algorithm that you can apply to nearly every use case. Compared to other algorithms, neural networks perform better on more complex data types. Typical examples are images, text, and audio. On tabular data, it’s usually better to go with some decision tree.
Neural networks come in many forms and sizes. For example, convolutional neural networks are one type of architecture that works well on images. When it comes to size, the algorithms can have many millions of parameters that you tune during training. That’s the reason why it’s so hard to understand why it behaves a certain way.
There's also an infinite number of ways to modify your neural networks. You can tune hyper-parameters, change the architecture, add regularization techniques, and much more. To avoid over-engineering, you need to know current best practices so that you can create a good-enough solution quickly.
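To demystify the training process a little, here's a tiny sketch of its core loop: a single parameter tuned by gradient descent on a squared-error loss. Real networks have millions of such parameters and use frameworks like PyTorch, but the principle is the same:

```python
# One parameter w, model y = w * x, loss = (prediction - target)^2.
# Gradient descent: dloss/dw = 2 * (w*x - target) * x.
w = 0.0
x, target = 2.0, 6.0  # we want the model to learn w = 3
learning_rate = 0.1

for step in range(50):
    prediction = w * x
    gradient = 2 * (prediction - target) * x
    w -= learning_rate * gradient

print(round(w, 4))  # → 3.0
```

Each step nudges the parameter in the direction that reduces the loss; back-propagation is just this calculation repeated, via the chain rule, for every parameter in the network.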
Computer vision
There are many use cases for deep learning, but I think computer vision is the best place to start to get the hang of deep learning quickly. It’s relatively straightforward to create reasonable convolutional neural networks, and most use cases are similar. As a result, you can begin to develop reusable packages that allow you to complete your next project more effectively.
Another thing that makes computer vision suitable for people new to deep learning is that many essential machine learning concepts are easy to integrate. Two examples are data augmentation (I’ll talk more about that later) and transfer learning.
Transfer learning is a fantastic concept that allows you to take a pre-trained network and utilize it on your problem. Subsequently, you can create awesome algorithms with fewer data points and save days of training. You’ll never find a winning solution in a computer vision competition that doesn’t involve transfer learning. Check out this blog post for a detailed explanation.
Compared to tabular data, datasets with images tend to be larger and won’t fit into memory. That might be a little bit uncomfortable initially, but it isn’t a problem. There are tons of datasets online. I suggest you head over to Kaggle and look for something that makes you excited. Here are a couple of beginner-friendly datasets:
Before you start hacking, check out 5 Computer Vision Techniques for a great overview of the field.
Data augmentation
Another concept that’s extra important when working with deep learning is data augmentation. The goal is to create more training data by modifying what we already have. As you know by now, quantity is critical when training complex algorithms (otherwise, you risk overfitting), so this should make sense to you.
There are plenty of open-source libraries that you can use for augmentation, and frameworks like PyTorch have built-in tools as well. One popular alternative is Albumentations. As practice, I also recommend implementing augmentations yourself.
It’s common for beginners to use augmentations carelessly. You don’t want to change your data so that it doesn’t represent reality anymore. It’s good practice to start with a little augmentation and add more as you go.
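As a minimal sketch of the idea, here's a horizontal flip implemented by hand on a tiny "image" (a nested list of pixel values; libraries like Albumentations do this and much more on real images):

```python
# Horizontal flip: reverse each row of pixels. The flipped image is a
# new, plausible training example that keeps the same label.
def horizontal_flip(image):
    return [list(reversed(row)) for row in image]

image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(horizontal_flip(image))  # → [[3, 2, 1], [6, 5, 4]]
```

This also illustrates the caution above: flipping a photo of a cat is harmless, but flipping an image of a road sign or a handwritten digit can change its meaning entirely.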
Study guide & Resources
1. A friendly introduction to Deep Learning and Neural Networks & A friendly introduction to Convolutional Neural Networks and Image Recognition
Two beginner-friendly explanations of the inner workings of neural networks from Luis Serrano.
2. Intro to deep learning with PyTorch
A course on Udacity for learning PyTorch
3. Intro to Tensorflow for deep learning
A course on Udacity for learning Tensorflow
4. Fast.ai
Probably one of the most appreciated deep learning courses on the internet. It’s free and has zero ads.
Changing setup
- Limitations of Jupyter Notebooks
- Increasing need for computation
- Cloud instances & Local development
Limitations of Jupyter Notebooks
Now it's time to stop using notebooks and start programming like a software developer. Notebooks don't scale well when you have thousands of lines of code, and it's hard to divide your program into multiple files and functions (something you should do). They're still useful for data analysis and visualization, though, and I frequently use notebooks when presenting results or examples to customers.
Many other setups work. You can use an IDE like Atom or VS Code. Most of them have support for iPython and Jupyter in case you need it. I often use Atom together with an addon called Hydrogen. That way, you get everything you like from Notebooks, but it’s easier to apply programming best practices.
Another great setup is VS Code for remote development. I always program on my laptop, so when I need additional computation, I spin up a server on one of the many cloud platforms. With this setup, I can write code directly on a remote machine with very little latency.
Increasing need for computation
By now, you might notice that your computer isn’t strong enough for machine learning. It’s especially apparent with deep learning. Training complex neural networks on large datasets takes time. For some use cases, it might even take days.
If your computer doesn’t have a GPU, you’ll hate your life as soon as you start running experiments. Waiting days for your training to finish is not an option, especially when you’re testing multiple algorithms.
Luckily for all machine learning engineers, the gaming industry has spent billions on creating processing units that can handle a lot of computations in parallel. It turns out that they are suitable for training deep learning algorithms as well.
You don't need a GPU on your computer to create and train algorithms. I am happy doing the work on my Macbook. When I want to train an algorithm, I simply spin up a virtual machine with one of the many cloud providers.
The most cost-effective alternative is vast.ai. It's a website where people list machines that you can access. The price of the cheapest options is around $0.30 an hour.
Cloud instances & Local development
For me, the perfect setup is to have a lightweight and beautiful laptop with access to some cloud platform. In some cases (including mine), it’s a better alternative than having an expensive workstation.
The reason is that you don't need a particularly strong computer when you write code. It's only required when you train your algorithms. Just make sure that your laptop is strong enough to test that your code works and has enough memory to debug your algorithms.
Many cloud providers allow you to turn off your GPU when you don’t need it and only pay for storage. Another great thing about this setup is that it reduces the temptation to run unnecessary experiments. It also allows you to continue developing during training.
Adding more data types
- Natural language processing
- Sequential data
- Video
- Handling large datasets
Natural language processing
Once you feel comfortable with computer vision, you should start learning natural language processing. Text is probably the type of data you’ll come across most often in the real world. Most companies have hundreds of processes revolving around letters, such as writing contracts, reports, and advertisements.
It’s also one of the machine learning subfields that have experienced the most progress in recent times. New best practices completely replace state-of-the-art from a couple of years ago. That makes NLP exciting and there are many opportunities. The transition started with this paper about BERT.
Text is more challenging to work with than images because it isn't numeric. To make it usable, you first need to translate it into something that computers understand. In that process, you always lose some information. That's not an issue with other typical data types like images, video, or audio.
I recommend that you pick a beginner-friendly dataset like Amazon Fine Food Reviews or Sentiment140 and start hacking.
If you’re interested in natural language processing, you have to check ruder.io. Sebastian is an NLP researcher at Google DeepMind. He writes about research and other machine learning related topics.
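The translation step mentioned above can be sketched as a minimal bag-of-words vectorizer. Modern NLP uses learned embeddings and models like BERT instead, but this toy version shows exactly where information gets lost: word order disappears entirely.

```python
# Bag-of-words: map each text to a vector of word counts over a
# shared vocabulary. Word order is discarded in the process.
def build_vocabulary(texts):
    return sorted({word for text in texts for word in text.lower().split()})

def vectorize(text, vocabulary):
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

texts = ["the food was great", "the service was not great"]
vocab = build_vocabulary(texts)
print(vocab)                        # → ['food', 'great', 'not', 'service', 'the', 'was']
print(vectorize("great food", vocab))  # → [1, 1, 0, 0, 0, 0]
```

Note that "great food" and "food great" map to the same vector, which is precisely the kind of lost information that more sophisticated representations try to preserve.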
Sequential data
Sequential data is another common data type. The most obvious example is historical prices for different financial instruments, such as stocks and currencies. Other examples include audio and vibrations. Text is also sequential, but that's where the similarities end.
If you’re like most people, you immediately start thinking about becoming a billionaire by predicting stock prices. That brings me to an important point. It’s hard to make predictions, especially about the future. But it’s a perfect learning opportunity because you need to work obsessively with validating your models, and that’s critical for many use-cases.
When it comes to financial data, you can take it one step further by including additional information. A typical idea is to combine stock prices with news articles. Here’s a previous competition on Kaggle.
LSTM and GRU are two common types of neural networks for working with sequences. They’ve been around for many years but are still relevant. Here are a couple of great articles to increase your understanding:
Video
A video is a sequence of images. Therefore, everything you learned previously is relevant here. For many use cases, such as detecting objects, you handle video just like you handle regular images. It becomes more challenging when you want to follow that object across frames. An excellent dataset for testing object tracking is the Stanford Drone Dataset.
There are several other exciting use cases involving video as well. You can experiment with things like action detection, compression, and even using image sequences for some reinforcement learning purposes.
Handling large datasets
Large datasets don't fit into memory. Instead, you need to read from disk during training. When you work with images or videos, I recommend that you always read from storage in your data pipeline. It's better to save memory for the algorithm.
For consistency, it’s good practice to do this even when you have smaller datasets. Doing the same thing the same way makes it easier to reuse work from previous projects.
If the dataset is too large for your computer, it’s a more challenging problem. However, you don’t need the entire dataset when you develop. It’s often better to pick a small sample since everything you do takes less time and space. You only need the full dataset when you’re finished with experimentation and want to train the final algorithm.
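A common pattern for this is a generator that yields one batch at a time instead of loading everything up front. Here's a minimal sketch (frameworks like PyTorch wrap the same idea in their `Dataset`/`DataLoader` abstractions):

```python
# Lazy data pipeline: yield small batches on demand rather than
# loading the whole dataset into memory at once.
def batch_iterator(samples, batch_size):
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch

# `samples` could itself be a generator that reads files from disk
# one by one, so memory use stays bounded by the batch size.
batches = list(batch_iterator(range(7), batch_size=3))
print(batches)  # → [[0, 1, 2], [3, 4, 5], [6]]
```

The same iterator works unchanged whether `samples` is a small in-memory list during development or a stream over the full dataset for the final training run.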
Part 3: Intermediate to Advanced
So far, most focus has been on hands-on practice, which is the most critical aspect of becoming a machine learning engineer. However, when you want to go from intermediate to advanced, theoretical knowledge plays an important role.
If this feels too basic for you, feel free to check out part 4 instead.
Understanding the math
- Do you need to master the math?
- Essential concepts
- Read some research papers
- Study guide & Resources
Do you need to master the math?
Math is less critical for machine learning engineers than most people think. You can build useful algorithms as long as you have a decent understanding of how they work and master the art of validation.
Remember, this is a guide on how to become a machine learning engineer, not a scientist. Your goal is not to invent the next breakthrough algorithm. Instead, your specialty is to adapt and implement the state of the art on the problem in front of you.
However, at this stage, you want to go from intermediate to advanced. And to do that transition, it helps to understand the math. We’re still not talking about getting a Ph.D. in mathematics. Luckily, that’s not needed because everything you need to understand is relatively straightforward.
Essential concepts
In machine learning, the mathematics is surprisingly simple, at least to the level required for engineers. Almost every concept you need to master shows up in first-year university courses.
A perfect example of an essential concept is back-propagation in deep learning. When we train neural networks, we tune parameters by calculating derivatives for each one with respect to the loss. Sure, we have different optimizers that add a layer of complexity, and we exploit the chain rule to an unparalleled degree. Still, it's derivatives, and we don't need to do them by hand.
You’ve already come across the most relevant math in previous courses. If you didn’t fully understand it, I recommend revisiting that material.
In summary, what you need to study is linear algebra, multivariate calculus, and some statistics. You’ll get everything you need (and more) from undergraduate material.
Remember, the mathematics is easy, but you might be unaccustomed to it. Have patience, and things will start to make sense.
Read some research papers
In addition to studying math, I recommend that you start reading some research papers. Even if you have a firm grasp of mathematics, understanding research requires practice. Not necessarily because it’s complex but because it’s different from other materials.
The ability to understand research papers is the main reason why we want to improve our skills in mathematics. When you’re faced with a new problem, the first thing you do is study the latest research. That’s how you decide what to develop.
At this point, don’t worry about implementing what you read; just get used to the language. Later, you’ll start implementing research as well, and you can probably do that better than the scientists.
If you don’t read research, what you know is soon out of date. At some point, you start delivering outdated solutions to customers.
Study guide & Resources
1. 3Blue1Brown – Neural networks
An excellent and effective walkthrough of neural networks and back-propagation.
2. An overview of gradient descent optimization algorithms
This is a fantastic article from ruder.io explaining deep learning optimizers in detail.
3. Mathematics for Machine Learning
If you want to dive deeper, here’s a free ebook covering everything from basic linear algebra to neural networks.
4. Stanford University Youtube channel
You can find full math and machine learning courses on Stanford's YouTube channel. It’s the only place you need for university content.
Compete as much as possible
- Are you any good?
- Go your own way
- Focus on one competition
- What did the winners do?
Are you any good?
I’m surprised that so few machine learning engineers compete on Kaggle. If you don’t compare your results with others, how do you know if you’re any good? The answer is that you don’t, so make sure to use the opportunity.
One thing to add is that your score on specific metrics isn’t everything that matters. On Kaggle, the difference between the top scores is often negligible. Spending all that time improving an algorithm in a real project is usually a waste of money and time.
Still, if you’re far down the leaderboard, there might be something missing from your arsenal. A great goal is to place somewhere in the top 5% or maybe even higher.
Go your own way
I’ve already told you that you, under no circumstances, should copy other people’s solutions. Now, I want to tell you about another way to learn effectively through competition.
For as long as possible, try to be competitive without looking at discussions or notebooks. If you can place high on the leaderboard by yourself, you’ve come a long way towards becoming a kick-ass engineer.
Once you’re out of ideas and have done everything in your power, it’s ok to look at other competitors. Sometimes you’ll discover bright ideas that have a significant impact on your score. With all the work you’ve done already, you can quickly implement that idea, and it will stick with you forever.
Focus on one competition
There are usually two or three competitions running at the same time. It’s tempting to do a little bit on each, but that’s ineffective. To maximize your learning, pick one and dive deep. Study previous competitions, read research papers, and learn about the domain.
What did the winners do?
Many top performers share their solutions after the competition. Usually, they post a description and point you to a GitHub repository. It’s a goldmine for your future career. Here are four examples:
- Web Traffic Time Series Forecasting by Arthur Suilin
- Quora Insincere Questions Classification by Psi
- Handwritten Grapheme Classification by deoxy
- Deepfake Detection Challenge by Selim Seferbekov
Remember that many of the best competitors are more scientists than developers. As a result, you might need to rewrite their code based on the best practices you know so well by now.
Deploying your algorithms
- Data scientist & Data engineer
- Micro-services & Endpoints
- AWS Lambda
- Study guide & Resources
Data scientist & Data engineer
A typical setup within companies is to have a data science team and a data engineering team. The data scientists are responsible for developing the models (usually in notebooks). When they are satisfied with an algorithm, they hand it over to the data engineering team, who put it into production.
The problem at many of the companies I’ve talked to is that the data engineering team has little knowledge of machine learning, while the data scientists have nothing to do with the production environment.
As a result, the process of updating the algorithm is ineffective, and it’s not clear who keeps track of changes in performance, input distributions, and other potential problems. We are currently working with a couple of R&D departments, helping to improve this process by providing endpoints to the algorithms that they control.
Micro-services & Endpoints
Unless you’re deploying your algorithm directly to a device, always try to create standalone microservices. The point is to separate your algorithm from other parts of the program and make it accessible through an endpoint.
There are several reasons why this approach is appropriate. One is that the algorithm shouldn’t take up memory and computation from other backend functions. Another reason is that microservices are easy to monitor, and if something breaks, only that service goes down.
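To make the endpoint idea concrete, here's a minimal sketch using only the Python standard library. The `/predict` route and the linear "model" are placeholders for whatever algorithm you deploy; in practice you'd likely use a framework such as Flask or FastAPI behind a proper server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def predict(features):
    # Placeholder "model": a fixed linear scorer standing in for a trained algorithm.
    weights = [0.4, -0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet

def query(port, features):
    # Tiny client: POST a JSON payload to the endpoint and read the prediction.
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prediction"]
```

Everything the rest of the system needs to know about the algorithm is now hidden behind one HTTP route, which is exactly what makes the service easy to monitor, swap out, and scale independently.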
Kubernetes is a fantastic open-source tool that makes running microservices straightforward. It monitors your services and makes sure they’re always up and running. You can even run multiple instances of the same service, which is great for machine learning: it’s possible to scale up individual services depending on their requirements.
We deploy all algorithms into our Kubernetes cluster and make them accessible to our customers. We measure performance, downtime, behavior, and other metrics automatically for each endpoint. Sometimes, we add additional functionality, such as saving challenging data points.
AWS Lambda
For smaller algorithms with a lot of downtime, AWS Lambda is a perfect deployment option. You only pay when it’s in use, which often makes it a cost-effective alternative.
If no one accesses your service for a while, it shuts down. The next time someone wants to use it, it performs a cold start and rebuilds everything from scratch. For small services, that approach works great.
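For a Python function, all AWS Lambda needs is a module exposing a handler with the `(event, context)` signature. Here's a minimal sketch; the JSON payload format and the linear scorer are hypothetical stand-ins for your real model:

```python
import json

def lambda_handler(event, context):
    """Entry point that AWS Lambda calls for each request.

    `event` carries the request data; with an API Gateway trigger, the
    JSON payload arrives as a string under the "body" key.
    """
    payload = json.loads(event["body"])
    features = payload["features"]
    # Placeholder for real inference: a fixed linear scorer.
    score = sum(w * x for w, x in zip([0.5, -0.25], features))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": score}),
    }
```

Because the handler is just a function, you can call it locally with a fake event dictionary to test it before deploying.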
However, if you want to deploy a deep learning model, your solution probably doesn’t fit. In that case, and if you don’t want to set up a Kubernetes cluster, you can use an AWS EC2 instance. There are many similar alternatives on the various cloud platforms.
Study guide & Resources
1. Getting started with AWS Lambda
Here’s a series on YouTube to help you get going with AWS Lambda. It’s a deployment option that you need in your toolbox.
2. Scalable Microservices with Kubernetes
This is a free course from Udacity, where you learn to build and deploy microservices using Kubernetes. They also cover Docker.
3. Kubernetes & Docker & AWS Lambda
Also, make sure to check out the official documentation and tutorials posted by the creators themselves.
Part 4: Advanced to Hero
Going from advanced to hero is all about making a difference with your work. Gone are the days when you could implement a state-of-the-art network to the thunderous sounds of applause.
It's time to do work that benefits everyone.
Effective experimentation
- Test stuff that matters
- Reuse your code
- Automate repetitive tasks
- Never back down from a challenge
Test stuff that matters
I’ve heard experienced data scientists recommend platforms where you can run thousands of experiments at the same time. It sounds fantastic, but you don’t need that, and it’s a waste of energy. There are hundreds of hyperparameters to tune and many architectures to try. Even so, most adjustments don’t make a difference, so don’t waste your time.
I want to make a connection with chess. The difference between grandmasters and ordinary players isn’t raw calculation. Instead, grandmasters investigate fewer options because they understand what matters.
When you have a new project, begin by planning your experiments. You’ll quickly notice that there are thousands of things to test. Once you start training something, it’s also tempting to change small things to get a tiny improvement in performance. Please don’t fall for that temptation because it’s the most significant time sink in machine learning.
A better approach is to prioritize experiments that can yield drastically different results. One such example is to test different types of algorithms and architectures. It’s also a more effective way to learn because you’ll write more code.
Another simple but powerful tip is to split your data before experimentation. Doing so allows you to retrain from a previous checkpoint instead of starting from scratch. It’s also necessary if you want to compare different experiments to each other.
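A minimal sketch (assuming NumPy) of such a reproducible split: the same seed always produces the same train/validation indices, so every experiment is validated on identical data.

```python
import numpy as np

def fixed_split(n_samples, val_fraction=0.2, seed=0):
    """Return reproducible train/validation index arrays.

    The same seed always yields the same split, so every experiment
    is compared on identical validation data."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(n_samples * val_fraction)
    return order[n_val:], order[:n_val]  # train_idx, val_idx

train_idx, val_idx = fixed_split(1000)
```

Saving the indices (or just the seed) alongside each experiment is enough to make results comparable months later.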
Reuse your code
Reusing your code sounds obvious, but few people do it correctly. We don’t want to copy and paste code between projects. Instead, create Python packages and import them when necessary. You have already mastered git, so you know about versioning as well.
Whenever you notice something that you do frequently, remove it from your project and create a separate library. It doesn’t matter if it’s just one function or thousands of lines of code.
Publish your repositories on GitHub and share your code with the world. Maybe someone likes what you do and wants to contribute. Never feel ashamed of your work; we are all learning.
Automate repetitive tasks
It’s not only code that you can reuse. Whenever you do something, again and again, automate it. It can be anything from setting up a new project to changing your local version of Kubernetes.
Start with something simple, like setting up new projects and creating GitHub repositories. Next, you can try something more challenging, such as automatically spinning up new cloud instances for running experiments.
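As a starting point, here's a sketch of a small Python script that scaffolds a new project. The directory layout is just one reasonable convention, and `git init` only runs if git is installed.

```python
import shutil
import subprocess
from pathlib import Path

def new_project(name, base="."):
    """Scaffold a new machine learning project and initialize git if available."""
    root = Path(base) / name
    for sub in ("data", "notebooks", "src", "tests"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "README.md").write_text(f"# {name}\n")
    (root / ".gitignore").write_text("data/\n__pycache__/\n.venv/\n")
    if shutil.which("git"):  # only run git if it's installed
        subprocess.run(["git", "init", str(root)], check=True,
                       capture_output=True)
    return root
```

Once a script like this exists, starting a project costs seconds instead of minutes, which makes you far more likely to actually start.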
Never back down from a challenge
If you only pick up one tip from this guide, this is it. The best way to learn is to challenge yourself and step out of your comfort zone. When you only work on things you fully understand, you stop making progress. It’s the main reason why experience is a poor indicator of performance. What matters is deliberate practice.
There are so many ways to fool yourself when learning something. The most common one is copying something that you think you understand. Remember all the times you peeked at the answer to a math question: when you look at the answer, everything seems obvious, but the next day, you still can’t solve it on your own.
Always make your best effort before searching for the answer. You won’t feel a sensation of flow, but you’ll make progress. Here’s an excellent blog post from Scott Young (the author of Ultralearning) about flow and mastery.
Implementing papers
- Papers with code
- Do it better
Papers with code
Implementing papers is one of the most rewarding exercises you can do at this point. It requires that you read carefully and understand the topic deeply.
The best place to find machine learning research is paperswithcode.com. The purpose of the website is to link research to code: you can navigate to a topic that interests you and find several relevant repositories.
When you’re starting a new use case, it’s a perfect place to start. You can quickly catch up on best practices and see how others have solved similar tasks.
Do it better
Most of the code won’t live up to your high standards. Try to implement it yourself using programming best practices.
Test your solution on real problems and tweak if necessary. Make sure to upload your work on GitHub so others can benefit.
Work on bleeding edge concepts
- Generative adversarial networks
- Reinforcement learning
Generative adversarial networks
GANs are an exciting machine learning technique where multiple algorithms compete with each other. In the simplest case, you have two algorithms.
The first one, called the Discriminator, separates fake and real data. The second one (Generator) creates fake data and tries to fool the Discriminator.
It’s a perfect example of how open the field of machine learning is to innovation.
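To show the adversarial loop in its simplest form, here's a deliberately tiny sketch (assuming NumPy): a two-parameter Generator learns to shift noise toward a target Gaussian while a logistic-regression Discriminator tries to tell real from fake. Real GANs use deep networks and a framework's autograd, but the alternating updates have the same structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

real_mean = 4.0        # real data: samples from N(4, 1)
a, b = 1.0, 0.0        # generator:      G(z) = a * z + b
w, c = 0.0, 0.0        # discriminator:  D(x) = sigmoid(w * x + c)
lr, batch = 0.05, 64

for _ in range(2000):
    z = rng.normal(size=batch)
    x_real = rng.normal(loc=real_mean, size=batch)
    x_fake = a * z + b

    # --- Discriminator step: ascend log D(real) + log(1 - D(fake)) ---
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # --- Generator step: descend -log D(fake) (non-saturating loss) ---
    d_fake = sigmoid(w * (a * z + b) + c)
    # Chain rule: dL/da = -(1 - D) * w * z, dL/db = -(1 - D) * w
    a -= lr * np.mean(-(1 - d_fake) * w * z)
    b -= lr * np.mean(-(1 - d_fake) * w)

fake_mean = np.mean(a * rng.normal(size=10_000) + b)
```

After training, the generated samples should sit near the real mean of 4: the Generator improved only because the Discriminator kept pointing out the difference.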
- Stanford University: Generative Models
- Face editing with Generative Adversarial Networks
- Deep Generative Modeling | MIT 6.S191
Reinforcement learning
A hot topic is reinforcement learning. Here, an agent learns a policy by taking actions and receiving rewards.
There are sub-fields here as well. The most exciting one is deep reinforcement learning, where a neural network represents the policy. A famous example is AlphaZero.
I encourage you to experiment with reinforcement learning. It’s different from what you’ve done previously.
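A good first experiment is tabular Q-learning, which captures the actions-and-rewards loop without any neural network. Here's a minimal sketch (assuming NumPy) on a made-up six-state corridor where the agent must learn to walk right:

```python
import numpy as np

# A tiny 1-D corridor: states 0..5, actions 0=left, 1=right.
# The agent starts at state 0 and gets reward 1 for reaching state 5.
N_STATES, GOAL = 6, 5
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
        if rng.random() < epsilon:
            action = int(rng.integers(2))
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))  # break ties randomly
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q toward reward + discounted best next value.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state, action])
        state = nxt

greedy_policy = Q.argmax(axis=1)  # should learn to always go right
```

The random tie-breaking matters: without it, the agent can get stuck repeating the first action before it has learned anything.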
Currently, it’s hard to find use cases for reinforcement learning in companies. One exception is game developers.
- An introduction to Reinforcement Learning
- Introduction to Deep Reinforcement Learning
- MIT 6.S091: Introduction to Deep Reinforcement Learning
- MIT 6.S191 (2019): Deep Reinforcement Learning
Covering new ground
- Inventing use-cases
- Multiple inputs and outputs
- Challenging datasets
Inventing use-cases
What I love the most about machine learning is the room for creativity. You can apply the techniques you’ve learned to almost anything. The limiting factor is imagination.
One of the best things you can do to go from advanced to hero is to come up with a use case yourself. Find a problem that you care about and investigate if you can develop an algorithm to solve that problem. Who knows, you might be on to something.
Remember to plan your work accordingly. Read the latest research and write down the ideas that are suitable for your use case. Put down the critical steps to create the first version and implement them.
Multiple inputs and outputs
Another opportunity for innovation is to combine multiple types of input and output. Nothing stops your algorithm from performing more than one task. You can use both video and audio and teach an algorithm to describe the scene. And why not add transcription at the same time?
If you have multiple outputs, they can have separate objective functions, and you backpropagate them differently. Maybe you turn off the gradient for one output, given some conditions.
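As a minimal sketch of that idea (assuming NumPy, with a linear model standing in for a real network): two heads share one layer, each head has its own squared-error objective, and one head's gradient is masked during the first phase of training.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y1 = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)  # task 1 targets
y2 = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)  # task 2 targets

W = rng.normal(size=(5, 8)) * 0.1   # shared layer
h1 = rng.normal(size=8) * 0.1       # head for task 1
h2 = rng.normal(size=8) * 0.1       # head for task 2
lr = 0.01

def losses(W, h1, h2):
    H = X @ W
    return np.mean((H @ h1 - y1) ** 2), np.mean((H @ h2 - y2) ** 2)

l1_init, l2_init = losses(W, h1, h2)

for step in range(500):
    H = X @ W
    e1 = H @ h1 - y1
    e2 = H @ h2 - y2
    # Conditionally mask one head's gradient, e.g. freeze task 2 early on.
    mask2 = 0.0 if step < 100 else 1.0
    gW = (2 / len(X)) * (X.T @ (np.outer(e1, h1) + mask2 * np.outer(e2, h2)))
    W -= lr * gW
    h1 -= lr * (2 / len(X)) * (H.T @ e1)
    h2 -= lr * mask2 * (2 / len(X)) * (H.T @ e2)

l1, l2 = losses(W, h1, h2)
```

The `mask2` flag is the whole trick: each objective contributes its own gradient to the shared weights, and you can switch any contribution off under whatever condition you choose.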
You can always add more complexity. Just remember to work on feasible projects. The last thing you want to do is to take on something so hard that you lose motivation. In the best of worlds, you should always work on something right outside your comfort zone.
Challenging datasets
Now, you’ve worked with several datasets. However, most of them are relatively straightforward. To challenge yourself further, find datasets that don’t look like anything you’ve seen before. Here are some alternatives:
- Human Pose Dataset
- 3D Object Detection for Autonomous Vehicles
- High Energy Physics particle tracking in CERN detectors
For some datasets, none of the best practices you learned previously apply. That’s when you need to get out of your comfort zone and use your brain. You won’t create something useful right away, but if you stick to it, the learning opportunity is massive.
This type of original work is more common among scientists, but engineers are capable as well. You’ll approach the challenge with a different perspective and a different mindset. You are a software developer.
Thank you!
Wow, you made it all the way through. I’m impressed. I hope this was the guide you wanted. If it was, help other aspiring machine learning engineers by sharing.
Let me know if you have any questions. Have a good one!