Software Blog Hub

What is Normalisation and Standardisation in Data Science?

Data Science

Introduction

Normalisation and standardisation are two common preprocessing techniques included in any Data Scientist Course. These techniques are used in data science to scale and transform features before feeding them into machine learning algorithms. Normalisation and standardisation play a vital role in preparing data for analysis and modelling.

Importance of Normalisation and Standardisation

The following sections describe the role of normalisation and standardisation in rendering data suitable for analysis. Unless data is correctly preprocessed using these techniques, the results obtained from analysis can be skewed and incorrect.

Basic Normalisation and Standardisation Equations

The common basic equations for normalisation and standardisation that will be taught in any Data Scientist Course are the following:

Normalisation: Normalisation typically refers to scaling each feature to a range between 0 and 1. It is useful when the features have different scales. One of the most common normalisation techniques is Min-Max scaling, which transforms each feature to the range [0, 1] using the formula:

X_scaled = (X − Xmin) / (Xmax − Xmin)

where 𝑋 is the original feature value, Xmin is the minimum value of the feature, and Xmax is the maximum value of the feature.
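To make the formula concrete, here is a minimal sketch of Min-Max scaling in plain Python (the function name and sample values are illustrative, not part of any particular library):

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] using Min-Max scaling."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # All values are identical; map everything to 0.0 by convention.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

ages = [18, 25, 40, 60]
scaled = min_max_scale(ages)  # smallest value maps to 0.0, largest to 1.0
```

Note the guard for a constant feature: when Xmax equals Xmin, the formula would divide by zero, so a convention (such as mapping everything to 0) is needed.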

Standardisation: Standardisation (also called Z-score normalisation) transforms the data to have a mean of 0 and a standard deviation of 1. It is particularly useful when the features are normally distributed. The formula for standardisation is:

Z = (X − 𝜇) / 𝜎

where X is the original feature value, 𝜇 is the mean of the feature, and 𝜎 is the standard deviation of the feature.
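The same formula can be sketched in a few lines of Python using the standard library's statistics module (the function name and sample values are illustrative):

```python
import statistics

def standardise(values):
    """Transform values to Z-scores: mean 0, standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

heights = [150.0, 160.0, 170.0, 180.0]
z_scores = standardise(heights)  # resulting list has mean 0 and std dev 1
```

After the transformation, a value of +1 means the original observation sat one standard deviation above the feature's mean.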

Normalisation scales the data to a fixed range (usually [0, 1]), while standardisation rescales the data so that it has a mean of 0 and a standard deviation of 1. The choice between normalisation and standardisation depends on the specific characteristics of the data and the requirements of the machine learning algorithm being used.
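In practice, both techniques are usually applied through a library rather than by hand. A minimal sketch using scikit-learn (assuming it is installed; the sample data is illustrative) might look like:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature with three observations on an arbitrary scale.
X = np.array([[1.0], [5.0], [10.0]])

# Min-Max normalisation: rescales the feature to the range [0, 1].
X_norm = MinMaxScaler().fit_transform(X)

# Standardisation: rescales the feature to mean 0, standard deviation 1.
X_std = StandardScaler().fit_transform(X)
```

The scalers are fitted on the training data only and then reused (via `transform`) on test data, so that no information from the test set leaks into the preprocessing step.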

Conclusion

Overall, normalisation and standardisation are essential techniques in the data preprocessing pipeline, ensuring that the data is appropriately prepared for analysis and modelling, leading to more reliable and accurate results. Although advanced methods of normalisation and standardisation are usually used in research studies and by statisticians, the basic methods are essential first steps in any data analysis process and are often mandatory topics in a Data Science Course in Mumbai or a data analysis course curriculum.

Business name: ExcelR- Data Science, Data Analytics, Business Analytics Course Training Mumbai

Address: 304, 3rd Floor, Pratibha Building. Three Petrol pump, Lal Bahadur Shastri Rd, opposite Manas Tower, Pakhdi, Thane West, Thane, Maharashtra 400602

Phone: 09108238354

Email: enquiry@excelr.com
