Artificial Intelligence: What is the Naive Bayes Classifier?

Nivan Gujral
6 min read · Jun 12, 2020

All humans make decisions on a daily basis, such as the clothes we are going to wear and the food we are going to eat. Making decisions is so common for us, but have you ever thought about computers making decisions for us? That might sound crazy, but currently, from online ads to spam filters, computers are trying to predict the decisions we would make. The Naive Bayes Classifier is one of the most commonly used classifiers for making these predictions.

What Is Conditional Probability?

In order to understand conditional probability, first we need to understand what probability is. Probability is the likelihood of an event occurring. For example, when you flip a coin it can land on either heads or tails, so the probability of the coin landing on heads is 50%. Similarly, when a die is rolled it can only show a number from 1–6, so the probability of rolling a 1 is about 17%.

Conditional probability is the likelihood of an event or outcome occurring, given that another event has already occurred. It is found by dividing the probability of both events happening together by the probability of the event that is already known to have happened.

Let’s take the example of a school with a total population of 100 people, made up of teachers and students, each of whom is either male or female. Let’s figure out the conditional probability that a person is male, given that the person is a teacher. The data states that there are 60 teachers in total and 12 of them are male. Therefore the conditional probability is 20%, since 12 / 60 = 0.20.
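As a quick sanity check, here is the same calculation written out in a few lines of Python, using the counts from the example above:

```python
# Counts taken straight from the school example above.
total_teachers = 60
male_teachers = 12

# P(Male | Teacher): out of everyone who is a teacher,
# what fraction is male?
p_male_given_teacher = male_teachers / total_teachers
print(p_male_given_teacher)  # 0.2
```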

What is the Bayes Rule?

The Bayes Rule is a way of going from P(X|Y) to P(Y|X), where X is the evidence and Y is the outcome:

P(Y|X) = P(X|Y) * P(Y) / P(X)

P(X|Y) is the probability of seeing the evidence when the outcome is known, which can be estimated from the training data. P(Y|X) is the probability of the outcome when the evidence is known, which is what the classifier predicts for test data.
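Here is a minimal sketch of the rule in Python. The value for P(Male) is not given in the school example, so the 0.40 below is a made-up number used only to show how the pieces fit together:

```python
def bayes_rule(p_x_given_y, p_y, p_x):
    """Bayes Rule: P(Y|X) = P(X|Y) * P(Y) / P(X)."""
    return p_x_given_y * p_y / p_x

# Hypothetical numbers: P(Male | Teacher) = 0.20 and P(Teacher) = 0.60
# come from the school example; P(Male) = 0.40 is assumed here.
p_teacher_given_male = bayes_rule(p_x_given_y=0.20, p_y=0.60, p_x=0.40)
print(p_teacher_given_male)  # 0.3
```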

How does the Naive Bayes Classifier work?

The Bayes Rule is a formula for finding the probability of Y when X is known, but in real-world problems there are commonly multiple X variables. The Naive Bayes Classifier is an extension of the Bayes Rule in which the X’s are assumed to be independent of each other. The classifier is called “Naive” because this assumption of independence is rarely exactly true. With features x1, x2, …, xn, the rule becomes:

P(Y=k | X) = P(x1|Y=k) * P(x2|Y=k) * … * P(xn|Y=k) * P(Y=k) / P(X)

The left-hand side of the equation is known as the posterior probability. The posterior probability is the conditional probability assigned to an outcome after the evidence is taken into account.

The right-hand side of the equation has two terms in its numerator. The first term is called the “likelihood of evidence”, which is the conditional probability of each X given the class. Since all of the X’s are assumed to be independent of one another, the algorithm multiplies the individual likelihoods together to get the overall “probability of likelihood of evidence”. The second term is called the prior, which is the overall probability of Y=k, where k is a class of Y.
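Putting those two terms together, a bare-bones version of the scoring step might look like the sketch below. It drops the denominator P(X), since it is the same for every class and does not change which class wins:

```python
import math

def naive_bayes_score(prior, likelihoods):
    """Posterior score for one class: P(Y=k) * product of P(x_i | Y=k).

    The denominator P(X) is identical for every class, so it can be
    left out when we only want to compare classes against each other.
    """
    return prior * math.prod(likelihoods)
```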

Naive Bayes Classifier Setup

Let’s suppose we have 1,000 animals which are either elephants, dogs, or lions. These are the possible classes of the Y variable. Let’s also suppose that we can describe these animals by whether they are big, have fur, or have a tail. The objective of this classifier is to predict whether an animal is an elephant, a dog, or a lion based on whether it is big, has fur, or has a tail.

The first step for the Naive Bayes Classifier is to find the prior of each animal class out of all the animals in the population. In this case, there are 1,000 animals in the population: 500 of them are elephants, 300 of them are dogs, and 200 of them are lions. The classifier divides the number of animals of one type by the total number of animals in the population. Therefore the priors for elephants, dogs, and lions are 0.50, 0.30, and 0.20 respectively.

P(Y=Elephant) = 500 / 1000 = 0.50

P(Y=Dog) = 300 / 1000 = 0.30

P(Y=Lion) = 200 / 1000 = 0.20
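In Python, this step might look like the following, using the counts given above:

```python
# Class counts taken straight from the example.
counts = {"elephant": 500, "dog": 300, "lion": 200}
total = sum(counts.values())  # 1000

# Prior for each class = class count / total population.
priors = {animal: n / total for animal, n in counts.items()}
print(priors)  # {'elephant': 0.5, 'dog': 0.3, 'lion': 0.2}
```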

The second step is to find the proportion of each feature across all the animals in the population. In this case, out of the 1,000 animals in the population, 600 of the animals are big, 500 of the animals have fur, and 1,000 of the animals have tails. Similarly to the first step, the classifier divides the number of animals with a feature by the total number of animals. Therefore the proportions for big, has fur, and has a tail are 0.60, 0.50, and 1.00 respectively.

P(x1=Big) = 600 / 1000 = 0.60

P(x2=Fur) = 500 / 1000 = 0.50

P(x3=Tail) = 1000 / 1000 = 1.00
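The same kind of calculation works for the feature proportions:

```python
# Overall feature counts across all 1,000 animals, from the example above.
feature_counts = {"big": 600, "fur": 500, "tail": 1000}
total_animals = 1000

# Proportion of each feature = feature count / total population.
evidence = {f: n / total_animals for f, n in feature_counts.items()}
print(evidence)  # {'big': 0.6, 'fur': 0.5, 'tail': 1.0}
```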

The third step is to find the product of the conditional probabilities of the features for each class. In the formula, this is P(x1 | Y=k), where x1 is the feature and k is the animal class. To find this probability, the classifier divides the number of animals in the class that have the feature by the class population. For example, the data says that there are 450 big elephants, so P(Big | Elephant) equals 0.90, because 450 / 500 = 0.90. After finding the probability for each feature, the classifier multiplies all of the probabilities together. It then repeats this process for each of the other animal classes.

Probability of Likelihood for Elephant

P(x1=Big | Y=Elephant) = 450/ 500 = 0.90

P(x2=Fur | Y=Elephant) = 0 / 500 = 0.00, treated as 1.00 here so the product is not wiped out (see Laplace Correction below)

P(x3=Tail | Y=Elephant) = 500 / 500 = 1.00

So, the overall probability of likelihood of evidence for an elephant = 0.90 * 1.00 * 1.00 = 0.90
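Putting the numbers together for the elephant class, the calculation looks like this (the fur term is kept at 1.00, exactly as in the worked example above):

```python
# Numbers for the elephant class, taken from the example above.
prior_elephant = 0.50
likelihood_elephant = 0.90 * 1.00 * 1.00   # big * fur * tail

elephant_score = prior_elephant * likelihood_elephant
print(elephant_score)  # 0.45

# The same calculation is repeated with the dog and lion priors and
# their own per-feature probabilities (those counts are not listed in
# this article); the class with the highest score is the prediction.
```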

The fourth and final step is to substitute the priors and the products of the probabilities into the Naive Bayes formula to find which class has the highest probability. In this case, the elephant has the highest probability of all the animal classes.

What is Laplace Correction?

In the example above, the probability of fur given an elephant is zero, since fur is not present in any of the elephants. When even one feature has a probability of zero, the entire product becomes zero, no matter how strong the other features are. In order to avoid this, the classifier increases the count of the variable that is zero to 1 in the numerator, so that the probability is never exactly zero. This method is called Laplace Correction.
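Here is a small sketch of what that correction might look like in code. Note that the common textbook form also adds a matching term to the denominator; that exact variant is an assumption here, not something spelled out in this article:

```python
def laplace_probability(feature_count, class_count, n_values=2, alpha=1):
    """Laplace-smoothed conditional probability.

    Adds `alpha` to the numerator and `alpha * n_values` to the
    denominator (n_values = number of possible values of the feature,
    2 for a yes/no feature) so no probability is ever exactly zero.
    """
    return (feature_count + alpha) / (class_count + alpha * n_values)

# No elephant in the data has fur, so the raw probability is 0 / 500 = 0.
print(laplace_probability(0, 500))    # ~0.002 instead of 0
print(laplace_probability(450, 500))  # ~0.898, barely changed from 0.90
```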

Widespread Use of the Naive Bayes Classifier

The Naive Bayes Classifier provides intelligence to applications that many of us use every day and is transforming the way that computers can make real-time predictions. Because it is so fast, it is often used to make predictions in real time. It can also be used in recommendation systems to predict whether a user would like a certain resource or product. It is also widely used in text classification, where it often performs well compared to more complex algorithms. Even though the Naive Bayes Classifier has many different applications, it has some limitations. One limitation is that it assumes that all of the attributes are independent. Another limitation is that it treats all attributes equally, so weights can’t be added to the important attributes to increase their contribution to the final decision. The Naive Bayes Classifier remains one of the most common and most practical classifiers to date, and it is a foundational tool in Artificial Intelligence.
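For readers who want to try this out, here is a tiny, hypothetical spam-filter sketch using scikit-learn’s MultinomialNB implementation; the messages and labels below are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training messages and labels for illustration only.
messages = ["win a free prize now", "meeting at noon tomorrow",
            "claim your free reward", "lunch with the team today"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)  # word counts as features

model = MultinomialNB()  # Laplace smoothing is on by default (alpha=1.0)
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free prize inside"])))  # likely ['spam']
```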

Hi, I am Nivan Gujral! I am a 13-year-old, who is passionate about the intersections between AI and aerospace. Send me an email at nivangujral@gmail.com if you would like to further discuss this article or just talk.
