Naive Bayes classifiers (Two Classes)

Introduction

Bayes' theorem describes the probability of an event taking place given prior knowledge about features that might impact the event. Naive Bayes classifiers are based on Bayes' theorem and assume each feature is independent given that the class is known. In other words, the classifier assumes that one feature being present in a class is unrelated to any other feature being present in the same class. Features can be discrete or continuous. A discrete feature takes only a limited set of distinct values, while a continuous feature is measured on a scale and can be divided into ever finer levels. Discrete variables are described by a probability mass function; continuous variables are described by a probability density function. The example in this tutorial works with discrete variables.


We want to predict whether a football match will take place based on two features: weather and temperature. In this example there are two classes: yes (y1) and no (y2); the match will either take place or not. The priors, the probabilities of the match happening or not happening, must be known. They can be estimated by dividing the number of times each class was assigned by the total number of data points. In the example below, data is available for 14 past matches. Nine matches were classified as yes (y1), so the probability of the match taking place, P(y1), is 9/14. Five matches were classified as no (y2), so the probability of the match not taking place, P(y2), is 5/14.
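As a minimal sketch of this counting step, the priors can be computed in Python. The label list here is hypothetical, chosen only to match the 9/14 and 5/14 split described above:

```python
from collections import Counter

# Hypothetical class label for each of the 14 past matches
# ("yes" = the match took place), matching the 9-to-5 split above.
labels = ["yes"] * 9 + ["no"] * 5

counts = Counter(labels)
priors = {cls: n / len(labels) for cls, n in counts.items()}
print(priors)  # {'yes': 0.642..., 'no': 0.357...}, i.e. 9/14 and 5/14
```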

Since this example contains discrete variables, the likelihood, represented by the probability mass function p(X|yi) where i = 1, 2, must also be known. It describes how often a match went ahead (or not) in the past under each combination of weather and temperature conditions.
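The per-class probability mass functions can be estimated by counting feature values separately within each class. The observations below are hypothetical (the tutorial does not list the individual matches), arranged to give 9 yes and 5 no outcomes:

```python
from collections import Counter, defaultdict

# Hypothetical past observations: (weather, temperature, played?).
data = [
    ("sunny", "hot", "no"), ("sunny", "hot", "no"), ("overcast", "hot", "yes"),
    ("rainy", "mild", "yes"), ("rainy", "cool", "yes"), ("rainy", "cool", "no"),
    ("overcast", "cool", "yes"), ("sunny", "mild", "no"), ("sunny", "cool", "yes"),
    ("rainy", "mild", "yes"), ("sunny", "mild", "yes"), ("overcast", "mild", "yes"),
    ("overcast", "hot", "yes"), ("rainy", "mild", "no"),
]

# Count each feature's values separately within each class.
class_counts = Counter()
weather_counts = defaultdict(Counter)
temp_counts = defaultdict(Counter)
for weather, temp, played in data:
    class_counts[played] += 1
    weather_counts[played][weather] += 1
    temp_counts[played][temp] += 1

# Per-class probability mass function, e.g. p(weather = "sunny" | yes).
p_sunny_given_yes = weather_counts["yes"]["sunny"] / class_counts["yes"]
print(p_sunny_given_yes)  # 2/9 with this hypothetical data
```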

Bayes' theorem states that the probability of the match taking place given a set of weather and temperature features, P(y1|X), called the *posterior*, is equal to the likelihood, the probability mass function p(X|y1), times the prior, the probability of the match taking place P(y1), divided by the probability distribution of the input data, p(X). In general, P(yi|X) = p(X|yi) P(yi) / p(X) for i = 1, 2.
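Under the naive independence assumption, the likelihood p(X|yi) factorises into one term per feature. Continuing the hypothetical counts from the previous sketch, the posterior numerator for a new sample might be computed like this:

```python
# Score a new match with weather = "sunny" and temperature = "cool",
# reusing class_counts, weather_counts and temp_counts from above.
def posterior_numerator(weather, temp, cls):
    prior = class_counts[cls] / sum(class_counts.values())
    # Naive assumption: p(X | class) factorises over the two features.
    p_weather = weather_counts[cls][weather] / class_counts[cls]
    p_temp = temp_counts[cls][temp] / class_counts[cls]
    return p_weather * p_temp * prior

score_yes = posterior_numerator("sunny", "cool", "yes")  # 9/14 * 2/9 * 3/9 ~ 0.048
score_no = posterior_numerator("sunny", "cool", "no")    # 5/14 * 3/5 * 1/5 ~ 0.043
# Dividing each score by p(X) would give the true posteriors, but p(X)
# is identical for both classes, so the comparison is unchanged.
```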

If the probability of playing given the weather and temperature is lower than the probability of not playing under the same conditions, the sample is assigned to the no class. In other words, the prediction is not to play.


  • If P(y1|X) < P(y2|X), assign sample X to y2


If the probability of playing given the weather and temperature is higher than the probability of not playing under the same conditions, the sample is assigned to the yes class. In other words, the prediction is to play.


  • If P(y2|X) < P(y1|X), assign sample X to y1


If the probabilities of playing and not playing given the weather and temperature are equal, the sample can be assigned to either class; the sketch after this list puts all three rules together.


  • If P(y2|X) = P(y1|X), assign sample X to y1 or y2
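Putting the three rules together, here is a minimal decision function building on the hypothetical posterior_numerator sketch above. The tie is broken arbitrarily in favour of yes, since either assignment is valid:

```python
def predict(weather, temp):
    score_yes = posterior_numerator(weather, temp, "yes")
    score_no = posterior_numerator(weather, temp, "no")
    if score_yes < score_no:
        return "no"   # P(y1|X) < P(y2|X): assign sample X to y2
    if score_no < score_yes:
        return "yes"  # P(y2|X) < P(y1|X): assign sample X to y1
    return "yes"      # tie: either class is valid; "yes" chosen arbitrarily

print(predict("sunny", "cool"))  # -> "yes" with the hypothetical data above
```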
