K-Nearest Neighbors Classification with Multiple Labels
Introduction
To classify inputs with k-nearest neighbors (KNN), three things are required: a labeled dataset, a distance metric for comparing datapoints, and a value of k, the number of nearest neighbors to consult.
K-Nearest Neighbors classification assigns a label to an unknown datapoint based on the classes of its neighbors. First, the distance from the unknown point to every labeled datapoint is computed. Then the k nearest neighbors are retrieved, and the label most common among them becomes the predicted label.
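The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `knn_predict` and the toy dataset are made up for this example, and Euclidean distance is assumed as the metric.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Predict a label for `query` by majority vote among its k nearest neighbors."""
    # Measure the distance from the query to every labeled datapoint
    distances = [
        (math.dist(query, p), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Retrieve the k nearest neighbors
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # The most common label among them is the prediction
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: two clusters labeled "A" and "B"
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (0.5, 0.5), k=3))  # -> A
print(knn_predict(points, labels, (5.5, 5.5), k=3))  # -> B
```

A query near a cluster inherits that cluster's label because its nearest neighbors all vote the same way.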
The value of k is a hyperparameter chosen when building the model. The figure below shows how selecting the right k value can impact the classification.
Image from: Murphy, K. P. (2021). Figure 1.15. In *Machine Learning: A Probabilistic Perspective*. MIT Press.
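The sensitivity to k can also be demonstrated with a small sketch. In this hypothetical dataset, a single mislabeled "B" point sits inside the "A" cluster: with k=1 the prediction latches onto that noisy neighbor, while k=3 lets the majority of genuine "A" neighbors outvote it. The helper `knn_predict` below is an illustrative Euclidean-distance majority-vote implementation, not taken from the figure's source.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k nearest neighbors of `query`."""
    distances = sorted(
        (math.dist(query, p), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# "A" cluster with one noisy "B" point at (0.6, 0.6)
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.6, 0.6)]
labels = ["A", "A", "A", "A", "B"]

# k=1 overfits to the noisy point; k=3 recovers the true cluster label
print(knn_predict(points, labels, (0.5, 0.5), k=1))  # -> B
print(knn_predict(points, labels, (0.5, 0.5), k=3))  # -> A
```

Small k values make the decision boundary follow individual points (including noise), while larger k values smooth it out, at the cost of blurring genuine class boundaries.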