
Naive Bayes Classifiers (Multi-Class)

Introduction & Dataset


Naive Bayes classifiers for more than two classes work similarly to naive Bayes classifiers for two classes. In this case, we will be looking at 13 different features of wine: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, od280/od315 of diluted wines, and proline. Based on these 13 features, each wine will be categorized into one of three classes: Class 0 (y1), Class 1 (y2), and Class 2 (y3). This time, the features are continuous, not discrete.


Again, the priors, or the probabilities of a wine being assigned to a particular class, must be known. For the example below, data is available for 178 wines. 59 wines were classified as Class 0 (y1), so P(y1) = 59/178. 71 wines were classified as Class 1 (y2), so P(y2) = 71/178. 48 wines were classified as Class 2 (y3), so P(y3) = 48/178.
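
As a quick arithmetic check, these priors work out to roughly 0.331, 0.399, and 0.270. Here is a minimal sketch computing them from the counts given above:

import numpy as np

# Class counts for the 178 wines: Class 0, Class 1, Class 2
counts = np.array([59, 71, 48])

# Priors P(y_i) = count_i / total count
priors = counts / counts.sum()
print(priors)  # approximately [0.331, 0.399, 0.270]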


Because the dataset uses continuous features, the likelihood is represented by the probability density function p(X|yi), where i = 1, 2, 3. The process is similar to classifying into two classes, except that we now compare the posterior probabilities of three classes:


  • If P(y2|X) < P(y1|X) and P(y3|X) < P(y1|X), assign sample X to y1


If the probability of the wine being of Class 0 given the features is greater than the probability of the wine being of Class 1 and is greater than the probability of the wine being of Class 2, then the sample is assigned to Class 0.


  • If P(y1|X) < P(y2|X) and P(y3|X) < P(y2|X), assign sample X to y2


If the probability of the wine being of Class 1 given the features is greater than the probability of the wine being of Class 0 and is greater than the probability of the wine being of Class 2, then the sample is assigned to Class 1.


  • If P(y1|X) < P(y3|X) and P(y2|X) < P(y3|X), assign sample X to y3


If the probability of the wine being of Class 2 given the features is greater than the probability of the wine being of Class 0 and is greater than the probability of the wine being of Class 1, then the sample is assigned to Class 2, as sketched below.
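
In code, these three rules reduce to picking the class with the largest posterior. A minimal sketch, where the posteriors array below is a hypothetical example rather than values from the wine data:

import numpy as np

# Hypothetical posteriors P(y1|X), P(y2|X), P(y3|X) for a single sample X
posteriors = np.array([0.2, 0.7, 0.1])

# Assign X to the class whose posterior is largest
predicted = np.argmax(posteriors)
print("Assign sample X to Class", predicted)  # Class 1 (y2)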




Figure 1: (Left) Likelihood represented by the probability density function p(X|yi), where i = 1, 2, 3. (Center) Likelihood multiplied by the class priors. (Right) Class posterior probabilities and the resulting decision boundaries.



Let's take a look at implementing a naive Bayes classifier with more than two classes using scikit-learn.


Load the wine dataset and assign it to a variable. Print the names of the features and labels to see the features and classes of the dataset.


# Import scikit-learn dataset library
from sklearn import datasets
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Load dataset
wine = datasets.load_wine()

# print the names of the 13 features
print("Features: ", wine.feature_names)

# print the label type of wine(class_0, class_1, class_2)
print("Labels: ", wine.target_names)

Features: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Labels: ['class_0' 'class_1' 'class_2']

By running the code below, the likelihood of the feature alcohol given each of the three classes (0, 1, and 2) is graphed. The likelihood is represented by the probability density function p(X|yi), estimated here with a Gaussian kernel density estimate.


alcohol = wine.data[:, 0]
classes = wine.target

alcohol_class0 = [w[0] for w in zip(alcohol, classes) if w[1] == 0]
alcohol_class1 = [w[0] for w in zip(alcohol, classes) if w[1] == 1]
alcohol_class2 = [w[0] for w in zip(alcohol, classes) if w[1] == 2]
classes_all = ["Class 0", "Class 1", "Class 2"]

kde_0 = gaussian_kde(alcohol_class0)
dist_space_0 = np.linspace(0, 15, 250)
kde_1 = gaussian_kde(alcohol_class1)
dist_space_1 = np.linspace(0, 15, 250)
kde_2 = gaussian_kde(alcohol_class2)
dist_space_2 = np.linspace(0, 15, 250)

# plot the results
plt.title('Probability Density Function (alcohol | y_i)')
plt.ylim((0, 1))plt.xlim((np.amin(alcohol), np.amax(alcohol)))
plt.xlabel('x') plt.ylabel('p(x | y_i)')
plt.plot(dist_space_0, kde_0(dist_space_0), label='y_1 Class 0')
plt.plot(dist_space_1, kde_1(dist_space_1), label='y_2 Class 1')
plt.plot(dist_space_2, kde_2(dist_space_2), label='y_3 Class 2')
plt.legend()
plt.show()



By running the code below, the likelihood of the feature alcohol given the three classes (0, 1, and 2) multiplied by the priors of each class is graphed.


alcohol = wine.data[:, 0]
classes = wine.target

alcohol_class0 = [w[0] for w in zip(alcohol, classes) if w[1] == 0]
alcohol_class1 = [w[0] for w in zip(alcohol, classes) if w[1] == 1]
alcohol_class2 = [w[0] for w in zip(alcohol, classes) if w[1] == 2]
classes_all = ["Class 0", "Class 1", "Class 2"]

kde_0 = gaussian_kde(alcohol_class0)
dist_space_0 = np.linspace(0, 15, 250)
kde_1 = gaussian_kde(alcohol_class1)
dist_space_1 = np.linspace(0, 15, 250)
kde_2 = gaussian_kde(alcohol_class2)
dist_space_2 = np.linspace(0, 15, 250)

# plot the results
plt.title('Likelihood * Class Priors')
plt.ylim((0, 1))
plt.xlim((np.amin(alcohol), np.amax(alcohol)))
plt.xlabel('x')
plt.ylabel('p(x | y_i)*p(y_i)')
plt.plot(dist_space_0, kde_0(dist_space_0) * (len(alcohol_class0) / len(alcohol)), label='y_1 Class 0')
plt.plot(dist_space_1, kde_1(dist_space_1) * (len(alcohol_class1) / len(alcohol)), label='y_2 Class 1')
plt.plot(dist_space_2, kde_2(dist_space_2) * (len(alcohol_class2) / len(alcohol)), label='y_3 Class 2')
plt.legend()
plt.show()
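
To obtain the class posteriors shown in the right panel of Figure 1, divide each likelihood-times-prior curve by their sum (the evidence) at every x. A minimal sketch, reusing the KDEs and class lists from the blocks above; the grid is restricted to the observed alcohol range to avoid dividing by near-zero evidence in the tails:

# evaluate likelihood * prior for each class on a common grid
x_grid = np.linspace(np.amin(alcohol), np.amax(alcohol), 250)
joint_0 = kde_0(x_grid) * (len(alcohol_class0) / len(alcohol))
joint_1 = kde_1(x_grid) * (len(alcohol_class1) / len(alcohol))
joint_2 = kde_2(x_grid) * (len(alcohol_class2) / len(alcohol))

# normalize by the evidence so the three posteriors sum to 1 at every x
evidence = joint_0 + joint_1 + joint_2
plt.title('Class Posterior Probabilities')
plt.xlabel('x')
plt.ylabel('p(y_i | x)')
plt.plot(x_grid, joint_0 / evidence, label='y_1 Class 0')
plt.plot(x_grid, joint_1 / evidence, label='y_2 Class 1')
plt.plot(x_grid, joint_2 / evidence, label='y_3 Class 2')
plt.legend()
plt.show()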


We can print a number of pieces of information to understand some of the characteristics of the dataset.


print("Class 0 count: ", np.count_nonzero(wine.target == 0))
print("Class 1 count: ", np.count_nonzero(wine.target == 1))
print("Class 2 count: ", np.count_nonzero(wine.target == 2))
print("Total wines for classification: ", len(wine.target))

Class 0 count: 59
Class 1 count: 71
Class 2 count: 48
Total wines for classification: 178

# print the shape of the data (features)
wine.data.shape

(178, 13)

# print the wine data features (top 5 records)
print(wine.data[0:5])

[[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00
3.060e+00 2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]
[1.320e+01 1.780e+00 2.140e+00 1.120e+01 1.000e+02 2.650e+00 2.760e+00
2.600e-01 1.280e+00 4.380e+00 1.050e+00 3.400e+00 1.050e+03]
[1.316e+01 2.360e+00 2.670e+00 1.860e+01 1.010e+02 2.800e+00 3.240e+00
3.000e-01 2.810e+00 5.680e+00 1.030e+00 3.170e+00 1.185e+03]
[1.437e+01 1.950e+00 2.500e+00 1.680e+01 1.130e+02 3.850e+00 3.490e+00
2.400e-01 2.180e+00 7.800e+00 8.600e-01 3.450e+00 1.480e+03]
[1.324e+01 2.590e+00 2.870e+00 2.100e+01 1.180e+02 2.800e+00 2.690e+00
3.900e-01 1.820e+00 4.320e+00 1.040e+00 2.930e+00 7.350e+02]]

# try printing the top 5 records using pandas for an easier-to-read display
import pandas as pd
df = pd.DataFrame(wine.data[0:5], columns = wine.feature_names)
df.head()

# print the wine labels (0: class_0, 1: class_1, 2: class_2)
print(wine.target)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
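
To close the loop, here is a minimal sketch of training scikit-learn's GaussianNB on all 13 features; the 70/30 split and random_state below are illustrative choices, not part of the walkthrough above:

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# split the data into a training set and a test set (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

# fit a Gaussian naive Bayes model and evaluate it on the held-out wines
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))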