KNN Algorithm (things you should know)!

Jay Vinay
3 min read · May 8, 2021

KNN (k-nearest neighbors) is one of the simplest algorithms used in machine learning for regression and classification problems. It keeps the training data and classifies a new data point based on a similarity measure (e.g. a distance function). Classification is done by a majority vote of the point's neighbors: the new point is assigned to the class that is most common among its k nearest neighbors. Increasing k, the number of neighbors consulted, can improve accuracy up to a point, because a single noisy neighbor then has less influence on the vote.
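The majority vote described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names (`euclidean`, `knn_classify`) and the toy data are made up for this example, and Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort the training points by distance to the query and keep the k closest.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k neighbors wins the vote.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (made up for illustration): two small clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (2.0, 2.0), k=3))  # A
print(knn_classify(points, labels, (6.0, 6.5), k=3))  # B
```

Note that there is no training step here: all the work happens at prediction time, when the new point is compared with every stored point.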

KNN can be used for both classification and regression predictive problems, although in industry it is more widely used for classification. The KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near each other. To select the K that's right for your data, run the KNN algorithm several times with different values of K and choose the K that produces the fewest errors while preserving the algorithm's ability to make accurate predictions on data it hasn't seen before.
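One way to try several values of K is sketched below. It assumes leave-one-out evaluation as the stand-in for "data it hasn't seen before" (each point is predicted from all the others); that choice, the helper names, and the tiny dataset are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k points in `train` closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_errors(data, k):
    """Predict each point from all the others and count the mistakes."""
    errors = 0
    for i, (point, label) in enumerate(data):
        others = data[:i] + data[i + 1:]  # hold out point i as "unseen" data
        if knn_predict(others, point, k) != label:
            errors += 1
    return errors


# Made-up 1-D data: class A on the left, class B on the right,
# plus one noisy "A" point (9.5) sitting inside the B cluster.
data = [
    ((1.0,), "A"), ((2.0,), "A"), ((3.1,), "A"), ((4.3,), "A"),
    ((8.0,), "B"), ((9.1,), "B"), ((10.3,), "B"), ((11.6,), "B"), ((12.2,), "B"),
    ((9.5,), "A"),
]

candidate_ks = [1, 3, 5, 7, 9]
errors_by_k = {k: leave_one_out_errors(data, k) for k in candidate_ks}
best_k = min(candidate_ks, key=errors_by_k.get)  # first K with the fewest errors

print(errors_by_k)  # {1: 3, 3: 1, 5: 1, 7: 1, 9: 10}
print(best_k)       # 3
```

On this toy data, K=1 is misled by the single noisy point, while K=9 consults every remaining point and simply predicts whichever class is larger, so a value in between does best.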

**What is K in KNN algorithm?** K in KNN is the number of nearest neighbors considered when assigning a label to the current point. K is an extremely important parameter, and choosing its value is the most critical decision when working with the KNN algorithm. The process of choosing the right value of K is referred to as parameter tuning and is of great significance in achieving better accuracy. If K is too small, the model is likely to overfit, because a few noisy neighbors can decide the prediction. If K is too large, distant points start to dominate the vote, the decision boundary is oversmoothed, and prediction becomes more computationally expensive. Most data scientists choose an odd value for K when the number of classes is 2, so that the vote cannot end in a tie. A common rule of thumb for choosing K is k = sqrt(n), where n is the total number of data points. Selecting the value of K depends on the individual case, and sometimes the best method is to run through different values of K and verify the outcomes. Using cross-validation, the KNN algorithm can be tested for different values of K, and the value that gives the best accuracy can be taken as the optimal K.

**When should you use KNN Algorithm?** The KNN algorithm is a good choice if you have a small dataset and the data is noise-free and labeled. When the dataset is small, the classifier completes execution in a shorter time. If your dataset is large, plain KNN becomes impractical without speed-ups, because every prediction has to compare the new point against the stored training data.

**The most important parameters (as named in scikit-learn's KNeighborsClassifier) are:**
n_neighbors: the value of k, the number of neighbors considered.
weights: how the neighbors' votes are weighted. This takes uniform (all neighbors count equally), distance (weight is the inverse of the distance to the new point), or a user-defined callable. The default value is uniform.
algorithm: the data structure used to search for neighbors. This takes ball_tree, kd_tree or brute; the default is auto, which tries to select the most suitable option for the current dataset.
metric: the distance metric (Euclidean, Manhattan, etc.). The default is minkowski with p=2, which is the Euclidean distance.
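To make the effect of these parameters concrete, here is a small pure-Python sketch. It is not the library's implementation: `knn_predict` and `METRICS` are hypothetical names that only borrow the parameter names from the list above, the data is made up, and the `algorithm` option is left out because the sketch always does a brute-force search.

```python
import math
from collections import defaultdict

# Distance functions selectable by name, mirroring the `metric` parameter.
METRICS = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}


def knn_predict(train, query, n_neighbors=3, weights="uniform", metric="euclidean"):
    """Hypothetical helper: predict a label for `query` from (point, label) pairs."""
    distance = METRICS[metric]
    # Brute-force search: measure every training point, keep the closest ones.
    nearest = sorted((distance(p, query), label) for p, label in train)[:n_neighbors]

    votes = defaultdict(float)
    for d, label in nearest:
        if weights == "uniform":
            votes[label] += 1.0      # every neighbor counts equally
        elif weights == "distance":
            if d == 0:
                return label         # exact match: avoid dividing by zero
            votes[label] += 1.0 / d  # closer neighbors count more
        else:
            raise ValueError(f"unknown weights: {weights!r}")
    return max(votes, key=votes.get)


# Made-up data: one "A" point very close to the query, two "B" points farther away.
train = [
    ((1.0, 1.0), "A"), ((3.0, 1.0), "B"), ((1.0, 3.0), "B"),
    ((-3.0, -3.0), "A"), ((6.0, 6.0), "B"),
]
query = (1.2, 1.2)

print(knn_predict(train, query, n_neighbors=3, weights="uniform"))   # B
print(knn_predict(train, query, n_neighbors=3, weights="distance"))  # A
```

With uniform weights the two B neighbors outvote the single A neighbor, while with distance weights the much closer A point carries more weight than both B points combined.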

Jay Vinay

Computer Science Engineering student. Interested in psychology and cognitive sciences, and loves to code.