One Hot Encoding visually explained using Excel

Published: 13 May 2021
on channel: Kunaal Naik | Data Science Masterminds


Some algorithms can work directly with categorical data.

For example, a decision tree can be learned directly from categorical data without any data transformation (this depends on the specific implementation).

Most machine learning algorithms, however, cannot work directly on label data. They require all input and output variables to be numeric.

In general, this is mostly a constraint of efficient implementations of machine learning algorithms rather than a hard limitation of the algorithms themselves.

This means that categorical data must be converted into numeric form. If the categorical variable is an output variable, you may also want to convert the model's predictions back into categorical form for presentation or for use in a particular application.
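The round trip described above can be sketched in a few lines of plain Python. This is only an illustration: the label values and the helper names (`to_code`, `to_label`, `encode`, `decode`) are made up for this example and do not come from the video.

```python
# Hypothetical example data: class labels stored as strings.
labels = ["cat", "dog", "bird", "dog", "cat"]

# Map each distinct category to an integer code (sorted for a stable order).
categories = sorted(set(labels))
to_code = {category: code for code, category in enumerate(categories)}
# Reverse mapping, used to turn numeric predictions back into labels.
to_label = {code: category for category, code in to_code.items()}


def encode(values):
    """Convert category labels to integer codes."""
    return [to_code[v] for v in values]


def decode(codes):
    """Convert integer codes back to category labels."""
    return [to_label[c] for c in codes]


encoded = encode(labels)
decoded = decode(encoded)
print(encoded)
print(decoded)
```

The reverse mapping is what lets a numeric prediction be shown to a user as a category name again.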

In this video, you get to see the logic behind how One Hot Encoding is implemented visually using Excel.

Integer encoding (assigning each category an integer) is not sufficient for categorical variables whose categories have no ordinal relationship.

In fact, using an integer encoding and letting the model assume a natural order between categories can lead to poor performance or unexpected results (for example, predictions that fall halfway between two categories). In this case, one-hot encoding can be applied to the integer representation.

This is where the encoded integer variable is removed and a new binary variable is added for each unique integer value.
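As a minimal sketch of that step, assuming the categories have already been integer encoded, the following plain Python builds one binary column per unique integer value. The function name `one_hot` and the colour codes are hypothetical, chosen only for illustration; it mirrors the column-per-category comparison you would set up in a spreadsheet rather than any particular library's API.

```python
def one_hot(codes):
    """Replace each integer code with a row of 0/1 flags,
    one flag per unique code found in the input."""
    unique_codes = sorted(set(codes))
    return [[1 if code == u else 0 for u in unique_codes] for code in codes]


# Hypothetical integer-encoded colours: 0 = red, 1 = green, 2 = blue.
colour_codes = [0, 1, 2, 1]
rows = one_hot(colour_codes)

for code, row in zip(colour_codes, rows):
    print(code, row)
```

Each row contains exactly one 1, so no ordering between the categories is implied.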

#onehotencoding #sklearn #machinelearning #excel