Should You Study the Data Science Nanodegree on Udacity?

Introduction

Do you want to become a Data Scientist but don’t know where to start? Today, I’m reviewing the Data Science Nanodegree from Udacity to help answer that question! 🔥

A few words about data science

Data science is a perfect skill to learn if you want a new career or to become more valuable in your current workplace. Your new tools apply to both small everyday tasks and significant business problems. Understanding how to work with data is certainly something that would benefit most careers.

What makes a good data science course

Data science is a skill that requires both theory and practice. A good course should give you a solid understanding of the fundamentals while challenging you with hands-on exercises. Let’s see if the Udacity Data Science Nanodegree lives up to that.

Summary

Here’s a quick summary in case you’re on a tight schedule.

Curriculum (3.5/5)

Of the Udacity programs I’ve taken, the Data Science Nanodegree has the broadest curriculum. It’s great overall, but the lessons on object-oriented programming and web development felt out of place.

Projects & Exercises (4/5)

The projects are good, but there’s too much boilerplate for my taste. I learn more by building something from scratch instead of filling in what’s missing. Luckily, the capstone project ticks all the boxes.

Affordability (3.5/5)

You can learn anything for free, but if you enjoy online courses, you get a lot of value from your investment. Whether it’s affordable depends on your finances, and it’s not worth the money if it restricts your life in other ways.

Overall (4/5)

The Udacity Data Science Nanodegree is excellent for anyone who wants to build a solid understanding of foundational techniques. It’s broad rather than deep, and you’ll learn essential concepts of several related fields, such as data engineering, software engineering, machine learning, and data analysis.

Many believe that data science and machine learning are the same, but that’s incorrect. If you want to build deep learning models, this is not the best course for you. The Deep Learning Nanodegree is a better alternative, and I’ve reviewed that program as well.


Full Review of the Data Science Nanodegree

Let’s dive into the detailed walkthrough of the Data Science Nanodegree curriculum.

Part 1: Introduction to data science

The program starts with some preparatory videos. You learn about CRISP-DM (Cross-Industry Standard Process for Data Mining), which you’ll need again in the capstone project.

The first lesson about the data science process also covers how to approach a problem and gives some tips on what to look for.

I already had experience with many of the tools in this section, so I skipped some of the content. If you’re unfamiliar with tools such as Jupyter and Pandas, the exercises are worth doing.

Communication in data science

Communicating your findings to stakeholders is a big part of data science. Communication includes explaining the problem, visualizing data, and proving that your algorithms work.

The first section ends with you selecting a dataset and writing a blog post that follows the CRISP-DM process. I find this type of project, where you’re forced to show your work publicly, a bit annoying.

Of course, most people studying on Udacity want to find work, so having a good portfolio is critical.

It’s a decent start, but there’s more to come!

Part 2: Software engineering

It’s great that they have a section about software engineering in this Nanodegree.

Bad code is the most common reason I reject candidates for machine learning engineer positions. In any role where you write code, it’s critical to write good code. Making stuff work is not enough.

This part doesn’t include a project, so you don’t need to watch it to finish the Data Science Nanodegree.

Luckily, the project in Part 3 involves some software development.

Learning about best practices

First, you’ll learn about best practices such as writing modular code, documentation, and version control. The content is great, and I’m happy that they emphasize the need for readable code.

The second part is about software engineering in a team. It covers testing and code reviews. Both are essential topics, but they’re skills you learn through practice, not videos.
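As a small illustration of modular, documented, and tested code, here is a hypothetical helper function (my own example, not code from the course) together with two test functions. The tests are written as plain `test_*` functions, the style that pytest collects automatically.

```python
def normalize_column(values):
    """Scale a list of numbers to the range [0, 1].

    Returns a list of zeros if all values are equal, to avoid dividing by zero.
    """
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_normalize_column_scales_to_unit_range():
    # The smallest value maps to 0.0 and the largest to 1.0.
    assert normalize_column([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalize_column_handles_constant_input():
    # A constant column must not raise ZeroDivisionError.
    assert normalize_column([3, 3]) == [0.0, 0.0]
```

Keeping the function small and free of side effects is what makes it this easy to test; each test documents one expected behavior.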

Object-oriented programming

I’m not a fan of object-oriented programming. It makes perfect sense in theory but often leads to code that’s difficult to debug when something goes wrong.

Object-oriented vs. functional programming is a massive debate and not something I want to include in my Udacity Data Science Nanodegree review. But to be clear, I prefer functions most of the time.

The content is good if you like object-oriented programming, but for the sake of programmers worldwide, don’t make classes and objects of everything.

Web development

There’s also a short section about web development, which felt a bit out of place. You get to launch a live web page and learn to use some handy Python libraries, but that’s about it.

Web applications are the best way to make your work accessible to billions of people, so web development is worth learning. But for that, you’re better off learning a dedicated front-end framework such as React.

The most helpful library you learn about in this lesson is Flask, which allows you to create web services in Python. That’s perfect if you want to build an API for one of your models that others can call.
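To show what that looks like, here is a minimal sketch of a Flask endpoint that serves predictions. The `/predict` route, the JSON format, and the `predict` function (which simply sums the inputs in place of a real trained model) are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a trained model: it just sums the inputs."""
    return sum(features)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse a JSON body such as {"features": [1, 2, 3]} and return a JSON prediction.
    payload = request.get_json(force=True)
    features = payload.get("features", [])
    return jsonify({"prediction": predict(features)})
```

Assuming the file is saved as `app.py`, recent Flask versions can serve it with `flask --app app run`, after which other programs can POST JSON to `/predict`. Swapping the stand-in for a real model, loaded once at startup, is the only change needed to expose your own work this way.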

Part 3: Data engineering

Large companies usually have many people working on preparing data for the science teams, but in smaller companies, data engineering is a part of your job description.

ETL pipelines

First, you learn about ETL pipelines, where ETL stands for “Extract”, “Transform”, and “Load”. It’s a typical flow for data-driven organizations and a good starting point for the lessons. Exactly what happens inside these pipelines depends on the use case and data types.
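To make the three stages concrete, here is a minimal ETL sketch using only Python’s standard library. The toy CSV data, the `orders` table, and the rule of dropping rows with a missing amount are assumptions for illustration; a real pipeline would more likely use a library such as Pandas and a persistent database.

```python
import csv
import io
import sqlite3

# Toy, made-up input data standing in for a real source file or API.
RAW_CSV = """order_id,product,amount
1,widget,19.99
2,gadget,
3,widget,5.00
"""


def extract(text):
    """Extract: read raw rows (as dicts of strings) from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows with a missing amount and cast fields to proper types."""
    records = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record; dropped here purely for illustration
        records.append((int(row["order_id"]), row["product"].strip(), float(row["amount"])))
    return records


def load(records, conn):
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# → [(1, 'widget', 19.99), (3, 'widget', 5.0)]
```

Keeping each stage in its own function means you can swap the source or the destination without touching the transformation logic.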

The lessons give a good overview of the most common data engineering concepts, such as outlier handling, normalization, and feature engineering. At this stage, it’s more about understanding these concepts than learning specific methods.

When I work with data, I don’t want anyone to remove outliers ahead of time. It’s a good idea to be conservative about transforming data in your pipelines.
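One way to follow that advice is to flag outliers instead of deleting them, so whoever uses the data can decide what to do with them. The sketch below uses the common 1.5 × IQR rule via `statistics.quantiles`; the threshold and the sample values are illustrative assumptions, not something the course prescribes.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return booleans marking values outside [Q1 - k*IQR, Q3 + k*IQR].

    The data itself is left untouched; downstream users decide what to do
    with the flagged rows.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]


readings = [10, 12, 11, 13, 12, 95]
print(flag_outliers(readings))
# → [False, False, False, False, False, True]
```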

NLP pipelines

Next, we have my biggest nightmare: preprocessing text data. It doesn’t have to be bad, but I’ve worked a lot with PDF documents, which are often a real pain. The lessons keep things simple by using text that’s easier to work with.

You start by learning about traditional ways of preprocessing text, such as tokenization and one-hot encoding. Your main tool is NLTK, the Swiss Army knife of text-preprocessing libraries.
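Here is a minimal sketch of those two steps. NLTK’s word tokenizer depends on downloadable model data, so to keep the example self-contained it uses a simple regular-expression tokenizer as a stand-in. Each message becomes a binary vector over the vocabulary, which is the one-hot vectors of its tokens combined. The messages are made up.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a regex stand-in for NLTK)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs):
    """Map every distinct token to a column index, in alphabetical order."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {word: idx for idx, word in enumerate(vocab)}


def one_hot(doc, vocab):
    """Return a binary vector marking which vocabulary words appear in the document."""
    vector = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:  # words outside the vocabulary are ignored
            vector[vocab[tok]] = 1
    return vector


messages = ["We need water!", "Need food and water."]
vocab = build_vocabulary(messages)
print(vocab)  # → {'and': 0, 'food': 1, 'need': 2, 'water': 3, 'we': 4}
print([one_hot(m, vocab) for m in messages])  # → [[0, 0, 1, 1, 1], [1, 1, 1, 1, 0]]
```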

Today, deep learning is used for most NLP tasks, which introduces additional preprocessing steps such as word embeddings (Word2Vec, for example). If you’re interested, Udacity also includes optional lessons about those concepts.

Machine learning pipelines

The last type of pipeline in this sequence of lessons is the machine learning pipeline. You learn how to use another convenient library called Scikit-learn.

This type of pipeline converts the data into a format you can feed to your algorithm. It also manages training and prediction.
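A minimal sketch of such a pipeline with Scikit-learn’s `Pipeline` class: `CountVectorizer` and `TfidfTransformer` turn raw text into features, and a `LogisticRegression` classifier handles training and prediction. The toy messages, the labels, and the choice of classifier are my own assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy, made-up messages: 1 = asks for water, 0 = does not.
messages = [
    "we need clean water",
    "please send drinking water",
    "the road is blocked",
    "power is out in the city",
]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),      # text -> token counts
    ("tfidf", TfidfTransformer()),    # counts -> TF-IDF weights
    ("clf", LogisticRegression()),    # weights -> class prediction
])

pipeline.fit(messages, labels)
print(pipeline.predict(["water please", "road blocked"]))
```

Because preprocessing and model live in one object, `fit` and `predict` apply exactly the same transformations, and the whole pipeline can be tuned with `GridSearchCV` using step-prefixed parameter names such as `clf__C`.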

These lessons help you get up and running with data engineering and machine learning, but there’s much more to learn. The functions you include in your pipelines are often entirely dependent on your dataset.

It’s great to have libraries such as NLTK and Scikit-learn that take care of almost everything that doesn’t depend on your specific dataset.

Project: Disaster response pipelines

In this section’s project, your task is to implement everything you did in the previous lessons in one place. The dataset contains emergency messages, and your task is to create an NLP pipeline to preprocess the messages and a machine learning pipeline to train a model.

You’re free to implement your own ideas in the pipeline, but you can also do exactly what was covered in the lessons. You also need to do a little visualization and software development.

Part 4: Experimental design and recommendations

This section is a great example of why I love how Udacity approaches online education. By spending an entire section teaching you how to design valuable data science experiments, you learn the skills that make you eligible for a data science position.

It’s the type of foundational knowledge that’s easy to miss if you only learn through practice.

Designing experiments

This section covers one of the most essential topics in both data science and machine learning. How do you create experiments that you can trust?

Trusting experiments that don’t represent reality is the most common mistake in data science, and it’s very easy to make.

This shorter section is purely theoretical, and there are no exercises. You’ll learn about common pitfalls when designing experiments, and the section ends with a discussion on ethics.

It’s a reasonably shallow section, which is understandable given the complexity of the topic. It should give you some critical insights if you’re new to data science.

Statistical considerations and A/B testing

In this lesson, you’ll learn how to tell whether you can trust the results of your experiments. Understanding when something is statistically significant is essential to avoid costly mistakes where you jump to conclusions too quickly.

These concepts are critical for both data science and machine learning. In machine learning, you might want to know if one algorithm architecture outperforms another.

If you don’t know whether the improvement is significant, you can end up jumping back and forth between similar configurations without reaching any clear conclusions.
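For example, say you have per-fold validation scores for two model configurations. A paired sign-flip permutation test is one simple way to check whether the difference could be due to chance. The scores below are made up, and this is just one of several tests you could use.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on paired scores (e.g. per-fold accuracy).

    Under the null hypothesis that both configurations perform equally well,
    the sign of each per-fold difference is arbitrary. We enumerate every sign
    assignment and return the fraction whose mean difference is at least as
    extreme as the observed one.
    """
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = mean(s * d for s, d in zip(signs, diffs))
        if abs(flipped) >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


# Made-up per-fold validation accuracies for two configurations.
config_a = [0.81, 0.79, 0.80, 0.82, 0.78]
config_b = [0.83, 0.80, 0.82, 0.83, 0.80]
print(paired_permutation_test(config_a, config_b))  # → 0.0625
```

Even though configuration B wins on every fold here, five folds only allow 32 sign patterns, so the smallest achievable two-sided p-value is 2/32 = 0.0625, which is not below the usual 0.05 threshold. Enumerating every pattern only works for a handful of folds; with more, you would sample random sign flips instead.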

Even though I’ve studied statistics, it’s one of my weaker areas as a machine learning engineer. The reason is that I work almost exclusively with deep learning and seldom need to use statistics in my work.

A/B testing

Now, we combine statistical considerations and experiment design to learn about A/B testing. This is a standard approach for testing a change by comparing how different user groups respond to it.
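For a conversion-style metric, a classic way to analyze an A/B test is a two-proportion z-test. This sketch uses the normal approximation with a pooled standard error; the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled conversion rate for the standard error (normal approximation).
    Returns the z statistic and the two-sided p-value.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control (A) vs. a changed page (B).
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers (10% vs. 12.5% conversion), z is about 2.50 and the p-value about 0.012, so at a 5% significance level you would reject the hypothesis that both versions convert equally well.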

I have never done any proper A/B testing in my work, but if you work on a product with a large user base, you do it all the time.

Recommendation engines

Creating good recommendations for users is an OG use case for data scientists.

The section begins with an introduction where you learn about the fundamentals of recommendation engines and the core challenges that make this an interesting use case.

Next up, you learn to create recommendations using matrix factorization. Studying these more traditional methods is valuable even if you want to solve the same problem with the latest approaches.

Matrix factorization is one of those mathematical concepts that often appears in data science and machine learning. Understanding why and how it works can help you understand many related concepts, such as PCA.
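To give a feel for how matrix factorization produces recommendations, here is a compact sketch of FunkSVD-style factorization trained with stochastic gradient descent on only the observed ratings. The toy ratings matrix, the hyperparameters, and the function names are assumptions for illustration.

```python
import numpy as np


def funk_svd(ratings, latent_features=2, learning_rate=0.01, iters=500, seed=0):
    """Factorize a ratings matrix with missing entries (np.nan) as user_mat @ item_mat.

    Only observed ratings drive the gradient updates, which is what lets the
    method handle the sparse matrices typical of recommendation problems.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    item_mat = rng.random((latent_features, n_items))
    observed = np.argwhere(~np.isnan(ratings))

    for _ in range(iters):
        for i, j in observed:
            error = ratings[i, j] - user_mat[i] @ item_mat[:, j]
            # Compute both gradient steps before applying either one.
            user_step = learning_rate * 2 * error * item_mat[:, j]
            item_step = learning_rate * 2 * error * user_mat[i]
            user_mat[i] += user_step
            item_mat[:, j] += item_step
    return user_mat, item_mat


def observed_mse(ratings, predictions):
    """Mean squared error over the ratings that were actually observed."""
    mask = ~np.isnan(ratings)
    return float(np.mean((ratings[mask] - predictions[mask]) ** 2))


# Toy ratings (rows = users, columns = items); np.nan marks unrated items.
ratings = np.array([
    [5, 4, np.nan, 1],
    [4, np.nan, 1, 1],
    [1, 1, 5, np.nan],
    [np.nan, 1, 4, 5],
])

user_mat, item_mat = funk_svd(ratings)
predictions = user_mat @ item_mat
print(round(observed_mse(ratings, predictions), 3))
print(np.round(predictions, 1))
```

Multiplying the two factors fills in the missing entries, and the items with the highest predicted scores among those a user hasn’t rated become that user’s recommendations.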

I prefer to only watch the videos and do all the implementation during the project. However, if you don’t have much programming experience, it’s a good idea to do the exercises. You’ll usually implement the same techniques in the project, so the time spent writing code in the lessons isn’t wasted.

The last standard project

The section ends with a project where your task is to create a recommendation engine for IBM products. It follows the usual format: you get a notebook and implement the requested code.

You also need to answer a couple of questions to show that you understand what you’re building.

It’s a good project, even though I’m not a fan of boilerplate code. Without the capstone project, I would question whether the program proves that you know anything about data science.

Part 5: Data science capstone

The general idea

Udacity provides some problems and datasets you can use, but you’re free to pick something completely different. You can work on anything from image analysis to recommendation engines, as long as you apply what you’ve learned in the Data Science Nanodegree.

A general recommendation for capstone projects

Whenever you’re free to create your own project, it’s easy to bite off more than you can chew. Remember that the goal is to show that you understand the content, not to solve a previously unsolved problem.

Therefore, I recommend selecting a fairly simple problem and focusing on executing every step, from data exploration to modeling, really well.

My capstone project

I decided to use an old Kaggle competition where the goal was to predict future energy consumption for buildings. I reasoned that I’d already worked a lot with unstructured data, so now I wanted something that required more data science and less machine learning.


Verdict

Now that you know what the program covers, let me tell you what I liked and what I didn’t.

Things I loved about this Nanodegree

The positives first

Short and to-the-point lessons

In general, the people at Udacity are experts in creating content that teaches you critical concepts in a way that’s both effective and memorable. Using that strength for in-demand and applicable tech skills is a perfect combination.

It’s also smart that they trim their programs down to teach you exactly what you need to know to start your career. This is even more true for the Data Science Nanodegree than for most of the others I’ve completed.

I really liked that they had an entire section about constructing good experiments, because that’s the single most important aspect of data science. If they only wanted to create engaging content, they would have replaced that section with one about machine learning.

Good and interesting projects

All the projects were fun and relevant, but maybe a little bit too easy. There’s a lot to learn, so, obviously, four projects can’t cover everything, but I’m happy with the selection.

I like that they included a project with some simple software engineering, because in my experience that’s where many data scientists struggle.

I also liked that the Nanodegree ended with a capstone project. You get to build a project of your own choosing, but it requires much more work than the other projects.

Things I didn’t love

And some negatives

Too many tiny exercises

This is a matter of preference, but for me, the in-lesson exercises are mostly in the way. I would happily trade them for more video content.

As mentioned earlier, I think these exercises are good for people with less experience in programming and data science. Implementing something yourself is a great way to understand how it works.

I want more mathematics

What originally got me into machine learning was that it combines my passion for programming and math. Obviously, data science is a math-heavy subject, but that doesn’t show in the Nanodegree.

Instead, the Nanodegree focuses on teaching you practical skills such as tools, frameworks, and how to avoid common pitfalls. That’s all well and good, but I would appreciate some deeper explanation of algorithms and statistical methods.


Is it Worth the Money?

Whether the course is worth the price depends on your learning style and financial situation. If you can comfortably spare $249 a month, this is a good option for cementing your foundational knowledge of data science.

If you enjoy high-quality video content and believe that a course like this can speed up your learning, it’s worth a shot. The difference between Udacity and other course platforms is that Udacity spends more time and money on creating engaging content.

However, if the price strains your finances too much, it’s worth looking at alternatives. You can also start with one of their free courses to decide if Udacity is the best platform for you.

Personal discounts

It's possible to get a discount on your first month of learning. So, if you're price sensitive, clear your calendar, grab the discount and finish the course in a month. It's 100% possible.

Udacity has also changed its pricing model, and you now get access to all their paid courses when you enroll in the Data Science Nanodegree.

Also, when you finish a Nanodegree, you continue to have access to the content after you cancel your subscription.

Who should study this Nanodegree?

To me, this Nanodegree clearly has a specific student in mind. If you don’t have experience with data science or programming and want to learn exactly what it takes to work as a junior data scientist, this is a perfect program.

With that said, you still need to continue learning once you graduate, and no online course can substitute for hands-on practice.


Final Words

To finish my Udacity Data Science Nanodegree review, I want to emphasize a couple of earlier points. You can purchase the best online course or attend the top university, but there’s no way of learning that’s more effective than doing the work.

The benefit of a program such as this one is that you learn faster once you start hacking on your own, because you have a firm grasp of the fundamentals. You’re also closer to knowing what you don’t know, which is essential to mastering a skill.

It takes years to master a skill like data science, but only months before you can start adding value. Don’t be afraid to apply for positions early. Take a pay cut (if you can afford it) and prove your worth through hard work and progress.

Thanks for reading! ☺️