Normalisation for ML problems

Hubert Rzeminski
3 min read · Jun 21, 2021

One of the most common tasks in data preparation is feature normalisation. Even though this is not a complicated topic, I struggled to understand when to use it and which type of normalisation to choose.

In this short article, I hope to help you guys out by shedding some light on this topic.

We will cover:

  • What is min-max normalisation?
  • What is standardisation?
  • Why should you normalise your data?

What is min-max normalisation?

In this method, we simply change the range of the data from [min, max] to [0, 1], or in some cases, when you have negative values, you may choose to change the range to [-1, 1].

Note: The min-max scaler will not reduce the impact of outliers.

For applied ML you don't need to memorise the formula, as most libraries already provide ready-made implementations, but if you're interested, here it is:
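x_scaled = (x − min(x)) / (max(x) − min(x))

As a minimal sketch of the ready-made route, here is how this might look with scikit-learn's MinMaxScaler (the data below is made up purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales (made-up values)
X = np.array([[2000.0,  10.0],
              [3500.0,  55.0],
              [5000.0, 100.0]])

scaler = MinMaxScaler()             # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)  # each column is rescaled to [0, 1]
print(X_scaled)
```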

What is standardization?

A common misconception is that standardisation changes the shape of your data's distribution, which it does not; it only shifts and rescales it.

Standardisation simply shifts your data so its mean is 0 and scales it so its standard deviation is 1 (the variance is therefore also 1). In most cases this is done feature-wise (calculated per feature).

Again, you can use a ready-made Python implementation for this, and if you're interested, below is the formula:
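z = (x − μ) / σ

where μ is the mean of the feature and σ its standard deviation. As a minimal sketch, here it is with scikit-learn's StandardScaler (reusing the made-up data from above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[2000.0,  10.0],
              [3500.0,  55.0],
              [5000.0, 100.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # each column now has mean 0 and std 1

print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # approximately [1, 1]
```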

Why should you normalise your data?

A quick way to see whether you may need to normalise is to check if your features have very different ranges, e.g. when predicting house prices, one feature might be the square footage of the house (between 2000 and 5000) while another is the pool size in square metres (between 10 and 100).

In most cases, ML algorithms will perform better when the features are on a similar scale (though not always). Here are a few algorithms that almost always get a boost in performance after normalisation (a short example follows the list):

  • Linear and logistic regression
  • Neural networks
  • PCA
  • Linear discriminant analysis
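
As a minimal sketch of what this looks like in practice, here is the same logistic regression evaluated with and without standardisation (the dataset and model settings are just assumptions for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

raw = LogisticRegression(max_iter=5000)
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Scaling usually makes optimisation more stable and often improves the score
print(cross_val_score(raw, X, y, cv=5).mean())
print(cross_val_score(scaled, X, y, cv=5).mean())
```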

Conclusion

I want to end this quick recap with a few other types of normalisation and a few extra tips:

  • Robust scaler: similar to the min-max scaler, except the result is not forced into a fixed range; use it when you want to reduce the impact of outliers (see the sketch after this list).
  • When standardising roughly normally distributed data, about 68% of the values will land between -1 and 1.
  • Deep learning algorithms usually benefit from zero mean and unit variance.
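
As a minimal sketch of the robust scaler mentioned above, here is scikit-learn's RobustScaler, which centres on the median and scales by the interquartile range (the data is made up to include an obvious outlier):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an obvious outlier (made-up values)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # centres on the median, scales by the IQR
X_robust = scaler.fit_transform(X)
print(X_robust.ravel())  # the bulk of the data keeps a sensible scale despite the outlier
```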

I would also like to add that, in my opinion, experimentation is the best solution when you don't know whether or how to normalise.


Hubert Rzeminski

Hey, I'm a 3rd year computer science student. I create these blogs both to help me learn and to hopefully help others who are in a similar position to mine.