Naive Bayes

Jay Vinay
2 min read · Sep 2, 2020

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. Bayes’ theorem provides a way to calculate the probability of a piece of data belonging to a given class, given our prior knowledge. It is stated as:

P(class|data) = (P(data|class) * P(class)) / P(data)

where P(class|data) is the probability of the class given the provided data, P(data|class) is the likelihood of the data under that class, P(class) is the prior probability of the class, and P(data) is the probability of the data.
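To make this concrete, here is a minimal sketch using scikit-learn’s GaussianNB, which applies exactly this rule with a Gaussian likelihood for each feature. The iris dataset and the train/test split are illustrative choices for the sketch, not something from the discussion above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative data: any numeric feature matrix X and class labels y would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = GaussianNB()          # assumes features are conditionally independent given the class
clf.fit(X_train, y_train)   # estimates P(class) and a per-feature Gaussian P(feature|class)

print(clf.score(X_test, y_test))      # accuracy on held-out data
print(clf.predict_proba(X_test[:1]))  # P(class|data) for a single sample
```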

**Gaussian process regression**

A decision tree arrives at an estimate by asking a series of questions of the data, each question narrowing the possible values until the model is confident enough to make a single prediction. The order of the questions, as well as their content, is determined by the model, and the questions are all asked in a True/False form.

The GaussianProcessRegressor implements Gaussian processes for regression purposes. The algorithm assumes a mean of zero if normalize_y=False, and renormalizes the targets to zero mean if normalize_y=True. Then a covariance matrix for the features in the dataset is estimated. In other words, the algorithm must find the normal distribution which maximizes the log marginal likelihood of each pair of variables: marginal likelihood because we hold all other variables constant, log for numerical stability, and likelihood in terms of the likelihood function of the observed data. This is done using the optimizer parameter. But the log marginal likelihood is non-convex, so it may have multiple local maxima, and thus n_restarts_optimizer is provided to allow you to solve for the covariance matrix multiple times and pick the values with the highest level of consensus.

The data the regression is applied to does not necessarily have to be the original data in its original vector space. As with other techniques, like support vector machines, we may detect more complex boundaries by providing a non-linear kernel. Also as with support vector machines, the default kernel is RBF.

**Probabilistic predictions with Gaussian process classification (GPC)**

Consider the predicted probability of GPC for an RBF kernel with different choices of the hyperparameters: arbitrarily chosen hyperparameters versus the hyperparameters corresponding to the maximum log-marginal-likelihood (LML). While the hyperparameters chosen by optimizing LML have a considerably larger LML, they perform slightly worse according to the log-loss on test data. This is because they exhibit a steep change of the class probabilities at the class boundaries (which is good) but have predicted probabilities close to 0.5 far away from the class boundaries (which is bad). This undesirable effect is caused by the Laplace approximation used internally by GPC.
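As a hedged sketch of how the pieces above fit together in scikit-learn (the toy data, kernel choice, and parameter values are illustrative assumptions, not prescribed by the text):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier, GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(30, 1))  # toy 1-D inputs
y = np.sin(X).ravel()                # toy regression targets

# Regression: RBF kernel, targets renormalized to zero mean, and several
# optimizer restarts because the log marginal likelihood is non-convex.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                               normalize_y=True,
                               n_restarts_optimizer=5,
                               random_state=0)
gpr.fit(X, y)
mean, std = gpr.predict(X, return_std=True)
print("LML:", gpr.log_marginal_likelihood_value_)

# Classification: GPC returns class probabilities via predict_proba,
# computed internally with the Laplace approximation.
y_class = (y > 0).astype(int)
gpc = GaussianProcessClassifier(kernel=RBF(length_scale=1.0), random_state=0)
gpc.fit(X, y_class)
print(gpc.predict_proba(X[:3]))  # predicted P(class|data) for a few samples
```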

Originally published at https://www.jayvinay.com on September 2, 2020.
