Feature Scaling Made Simple

Masa Abdalhalim
4 min read · Jun 18, 2021

Every numerical value consists of a unit and a magnitude. Take age as an example: if my age is 24, the magnitude is 24 and the unit is years. The key thing to understand is that when we have many features, each feature is measured in its own unit and magnitude.

Feature Scaling? 🤔

Feature Scaling is a part of Feature Engineering. It allows us to put all of our features on the same scale.

But.. why?

It’s an important part of the data preprocessing phase for some Machine Learning algorithms: we scale the features down to a common range so that no feature dominates the others, which would otherwise cause the model to effectively consider only the dominant features.

That said, it doesn’t mean we need to apply Feature Scaling for all Machine Learning models. It’s only necessary for some of them, and we’ll see which ones later in this article.

How to do Feature Scaling?

There are actually many techniques, but the two most commonly used scaling techniques are Normalization and Standardization.

What is the difference?

  • Normalization (also called Min-Max Scaler): it simply scales down all values of the feature to be between 0–1, using the formula below

x_scaled = (x − x_min) / (x_max − x_min)

So basically, it subtracts the minimum value from each value, then divides by the difference between the maximum and minimum values. Since both the numerator and the denominator are non-negative, and the numerator is always less than or equal to the denominator, all values of the feature end up between 0–1.

Below is the implementation using python ⬇️

Normalization in python
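As a sketch, the min-max formula above can be written by hand with NumPy (scikit-learn also provides a MinMaxScaler that does the same job); the ages array is made-up example data:

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D array to the [0, 1] range: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

ages = np.array([18, 24, 35, 60])
print(min_max_scale(ages))  # smallest value maps to 0.0, largest to 1.0
```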

  • Standardization (also called Z-score Normalization or Standard Scaler): it rescales the feature so that the result has mean 0 and standard deviation 1, like the Standard Normal Distribution. For roughly normally distributed data, most of the values will then fall between -3 and 3. It uses the formula below

z = (x − μ) / σ

So it subtracts the mean of the feature from each value, then divides by the standard deviation, which is nothing but the square root of the variance.

Below is the implementation using python ⬇️

Standardization in python
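A minimal hand-written sketch of that formula with NumPy (scikit-learn's StandardScaler is the usual choice in practice); the ages array is made-up example data:

```python
import numpy as np

def standardize(x):
    """Transform a 1-D array to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

ages = np.array([18, 24, 35, 60])
z = standardize(ages)
print(z)  # mean of z is ~0, standard deviation is ~1
```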

When to use Feature Scaling?

When we use any Machine Learning algorithm that involves Euclidean distance, or a Deep Learning algorithm where Gradient Descent is involved.

Examples are: KNN, K-Means clustering, and all deep learning algorithms such as Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN).
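To see why distance-based algorithms need scaling, here is a small sketch with made-up age and salary values; the min/max ranges used for scaling are assumptions for illustration only:

```python
import numpy as np

# Two people: [age in years, salary in dollars] (hypothetical values).
a = np.array([24, 50_000])
b = np.array([60, 52_000])

# Unscaled: salary dominates the Euclidean distance almost entirely.
print(np.linalg.norm(a - b))  # ~2000.3 -- the 36-year age gap barely matters

# After min-max scaling both features to [0, 1]
# (assumed ranges: age 18-70, salary 30k-80k):
a_s = np.array([(24 - 18) / (70 - 18), (50_000 - 30_000) / (80_000 - 30_000)])
b_s = np.array([(60 - 18) / (70 - 18), (52_000 - 30_000) / (80_000 - 30_000)])
print(np.linalg.norm(a_s - b_s))  # now both features contribute comparably
```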

When is it not important?

In algorithms such as Decision Trees, Random Forests, and XGBoost. In a Decision Tree, for example, each split is made on a single feature's threshold, so it doesn't matter whether the feature's values are high or low: the tree will behave the same anyway.

Normalization Vs. Standardization, When to use each?

In most Machine Learning algorithms that require scaling, Standardization performs better than Normalization (it does the job almost all the time), therefore my recommendation is to go for Standardization.

On the other hand, Normalization is useful when the features do not follow a Normal Distribution, or when we specifically need the values bounded to a fixed range.

Also, in deep learning techniques like ANNs and CNNs we often use Normalization because we want the inputs scaled down to between 0–1. For example, image pixel values lie between 0–255, so after scaling they lie between 0–1. Similarly, ANNs built with TensorFlow and Keras learn the weights more quickly when the inputs are in the 0–1 range.
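The pixel case can be sketched in one line, assuming a tiny made-up 8-bit grayscale image:

```python
import numpy as np

# Hypothetical 8-bit grayscale image: pixel values in [0, 255].
image = np.array([[0, 128, 255],
                  [64, 32, 200]], dtype=np.uint8)

# Normalize to [0, 1] before feeding it to a neural network.
scaled = image.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```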

Last Note

When we apply Feature Scaling, we should always apply it to the training and testing sets SEPARATELY. Be very cautious not to fit the scaler on the whole dataset at once, as this would mix the test set into the mean and standard deviation in the case of Standardization, and into the minimum and maximum values in the case of Normalization. So fit and transform on the training set, then only transform the testing set. Why so? In order to avoid Data Leakage.
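A sketch of that workflow with scikit-learn's StandardScaler, on random made-up data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up dataset: 100 samples, 3 features.
X = np.random.default_rng(0).normal(size=(100, 3))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on the training set
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Calling `fit_transform` on the full dataset before splitting would leak the test set's statistics into training, which is exactly the Data Leakage mentioned above.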

With that all said, I hope you enjoyed this blog and until next time stay well✌🏻
