Artificial Intelligence 🤖
Basics Refresher
Probability Density/Mass functions

Probability density function and probability mass function

Probability density function

The normal distribution is the most popular, and that is an example of a probability density function. The following figure is an example of a normal distribution curve:

Normal Distribution

It's tempting to read this graph as the probability of a given value occurring, but that's a little misleading when you're talking about continuous data, because a continuous distribution has an infinite number of possible data points. A value could be 0, or 0.001, or 0.00001, so the probability of any one very specific value occurring is vanishingly small. A probability density function really tells you the probability of a value falling within a given *range*, and that's the way you have to think about it.
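To make that concrete, here's a small sketch using only Python's standard library (the function name `normal_pdf` is my own, not from the text). It approximates the probability of landing within ±h of the mean under a standard normal curve; as the window shrinks, so does the probability:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of the normal distribution at x
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Approximate P(-h < X < h) with width * height at the midpoint;
# the probability of "exactly 0" is the limit as h -> 0, which is 0.
for h in [0.1, 0.001, 0.00001]:
    print(h, 2 * h * normal_pdf(0.0))
```

The density at a point is not itself a probability; it only becomes one when multiplied by a width, which is exactly the "range of values" idea above.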

So, for example, in the normal distribution shown in the above graph, between the mean (0) and one standard deviation from the mean (1σ) there's a 34.1% chance of a value falling in that range. You can tighten that range up or spread it out as much as you want and work out the corresponding values, but that's the way to think about a probability density function: for a given range of values, it gives you the probability of that range occurring.

  • You can see in the graph that you're pretty likely to land close to the mean (0), within one standard deviation (-1σ to 1σ): adding 34.1% and 34.1% gives 68.2%, the probability of landing within one standard deviation of the mean.
  • However, between two and three standard deviations (-3σ to -2σ and 2σ to 3σ), we're down to just a little over 4% (4.2%, to be precise).
  • Beyond three standard deviations (below -3σ or above 3σ), we're at much less than 1%.
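The band probabilities listed above can be verified with nothing but Python's standard library, by building the standard normal CDF from the error function (the helper name `phi` is my own choice):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF, expressed via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Probability of landing in each band on the positive side of the mean;
# double these for the symmetric two-sided bands.
for lo, hi in [(0, 1), (1, 2), (2, 3), (3, float("inf"))]:
    print(f"{lo}σ to {hi}σ: {phi(hi) - phi(lo):.1%}")
```

Running this prints roughly 34.1%, 13.6%, 2.1%, and 0.1%; doubling the 2σ-to-3σ band gives the 4.2% quoted above.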

So, the graph is just a way to visualize and talk about these probabilities. Again, a probability density function gives you the probability of a data point falling within some range of a given value, and the normal distribution is just one example of a probability density function. We'll look at some more in a moment.

Common probability density functions include the normal, uniform, and exponential distributions.

Probability mass function

Now, when you're dealing with discrete data, that nuance about an infinite number of possible values goes away, and we call the result something different: a probability mass function. If you're dealing with discrete data, you can talk about probability mass functions. Here's a graph to help visualize this:

For example, you can plot a normal probability density function of continuous data as the black curve shown in the graph, but if we quantize that into a discrete dataset, as we would with a histogram, we can say how many times the number 3 occurs, and we can actually say the number 3 has a little over a 30% chance of occurring. So a probability mass function is how we visualize the probability of discrete data occurring, and it looks a lot like a histogram because it basically is a histogram.
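As a sketch of what a probability mass function looks like in code (I'm using the binomial distribution as the example here, which isn't the specific dataset from the graph), a PMF directly assigns a probability to each discrete value:

```python
from math import comb

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials,
    # each succeeding with probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. 10 fair coin flips: the probability mass at each count of heads
for k in range(11):
    print(k, f"{binomial_pmf(k, 10, 0.5):.3f}")
```

Unlike a density, each of these numbers is itself a probability, and they sum to exactly 1 across all possible values, just like the bars of a normalized histogram.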

Common probability mass functions include the binomial, Poisson, and Bernoulli distributions.

Terminology Difference

A probability density function is a continuous curve that describes the probability of a range of values occurring for continuous data; a probability mass function gives the probabilities of specific discrete values occurring in a dataset.