Types of data

Different types of data call for different techniques, so always keep in mind what kind of data you're dealing with when you're analyzing it.

Numerical data

Let's start with numerical data. It's probably the most common data type. Basically, it represents some quantifiable thing that you can measure. Some examples are heights of people, page load times, stock prices, and so on: things that vary, that you can measure, and that have a wide range of possible values. Now there are basically two kinds of numerical data, so a flavor of a flavor if you will.

Discrete data

There's discrete data, which is integer-based and typically counts some sort of event. An example is how many purchases a customer made in a year. That can only take discrete values. They bought one thing, or they bought two things, or they bought three things. They couldn't have bought 2.25 things or three and three-quarters things. It's a value that is restricted to integers.
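As a rough illustration of discrete data, here is a minimal Python sketch that counts purchases per customer. The purchase log and customer names are made up for the example and are not from any real dataset.

```python
from collections import Counter

# Hypothetical purchase log: one entry per purchase, labeled by customer.
purchases = ["alice", "bob", "alice", "carol", "alice", "bob"]

# Counting events always yields whole numbers.
purchase_counts = Counter(purchases)

for customer, count in sorted(purchase_counts.items()):
    print(customer, count)

# Every count is an int; a value like 2.25 purchases cannot occur.
all_integers = all(isinstance(c, int) for c in purchase_counts.values())
print(all_integers)
```

However the log grows, the counts stay integers, which is what makes this data discrete.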

Continuous data

The other type of numerical data is continuous data, which has an infinite range of possible values because you can go into fractions. Going back to the height of people, there is an infinite number of possible heights: you could be five feet and 10.37625 inches tall. Likewise, the time it takes to check out on a website could be 10.7625 seconds for all you know, and the same goes for how much rain falls on a given day. In each case there's no limit to the precision. That's continuous data.
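A small sketch of continuous data in Python might look like the following. The checkout times are invented sample values (only 10.7625 echoes the text), and the point is that the precision you record is a choice you make when measuring.

```python
from statistics import mean

# Hypothetical checkout times in seconds; any fractional value is possible.
checkout_times = [10.7625, 8.31, 12.0049, 9.5]

# Averages of continuous data are themselves fractional values.
average_time = mean(checkout_times)

# Rounding discards precision that the underlying quantity still has.
rounded_times = [round(t, 1) for t in checkout_times]

print(average_time)
print(rounded_times)
```

In practice such values are stored as floating-point numbers, which only approximate the unlimited precision of the real quantity.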

To recap, numerical data is something you can measure quantitatively with a number, and it can be either discrete, where it's integer-based like an event count, or continuous, where the value can fall anywhere in a range, to any level of precision.

Categorical data

The second type of data that we're going to talk about is categorical data, and this is data that has no inherent numeric meaning.

Most of the time, you can't really compare one category to another directly. Examples are gender, yes/no questions, race, state of residence, product category, and political party. You can assign numbers to these categories, and often you will, but those numbers have no inherent meaning.
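To make the point about assigned numbers concrete, here is a minimal sketch that encodes a categorical field as integer codes. The survey responses and the alphabetical coding scheme are assumptions chosen for illustration, and any other assignment would be just as valid.

```python
from collections import Counter

# Hypothetical survey responses: state of residence for each respondent.
states = ["Texas", "Ohio", "Texas", "Maine", "Ohio", "Texas"]

# Assign an arbitrary integer code to each category (alphabetical order here).
codes = {name: i for i, name in enumerate(sorted(set(states)))}
encoded = [codes[s] for s in states]

# Counting how often each category appears is meaningful...
frequency = Counter(states)
most_common_state = frequency.most_common(1)[0][0]

# ...but arithmetic on the codes is not: this "average state" describes nothing.
meaningless_average = sum(encoded) / len(encoded)

print(codes)
print(most_common_state)
```

Renumbering the categories would change `meaningless_average` but not `most_common_state`, which is a quick way to see that the codes are labels and not measurements.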

Ordinal data

The last category that you tend to hear about with types of data is ordinal data, and it's sort of a mixture of numerical and categorical data. A common example is star ratings for a movie or music, or what have you.

In this case, we have categorical data in that a rating could be 1 through 5 stars, where 1 might represent poor and 5 might represent excellent, but the categories do have mathematical meaning. We know that 5 is better than 1, so this is a case where the different categories have a numerical relationship to each other. I can say that 1 star is less than 5 stars, that 2 stars is less than 3 stars, and that 4 stars is greater than 2 stars in terms of a measure of quality. Now you could also think of the actual number of stars as discrete numerical data. So, it's definitely a fine line between these categories, and in a lot of cases you can treat ordinal data and discrete numerical data interchangeably.
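One possible way to model ordinal data in Python is an `IntEnum`, which keeps the category labels while preserving their order. The class name `Stars`, the sample ratings, and the labels for 2 through 4 stars are assumptions for this sketch; only "poor" and "excellent" come from the text.

```python
from enum import IntEnum
from statistics import median

class Stars(IntEnum):
    # Labels for 2-4 stars are illustrative placeholders.
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4
    EXCELLENT = 5

# Hypothetical ratings for one movie.
ratings = [Stars.GOOD, Stars.EXCELLENT, Stars.POOR, Stars.VERY_GOOD, Stars.GOOD]

# The categories can be compared and sorted because they are ordered.
is_better = Stars.EXCELLENT > Stars.POOR
ranked = sorted(ratings)
middle_rating = median(ratings)

# Treating the stars as discrete numerical data also works.
average_stars = sum(ratings) / len(ratings)

print(is_better, middle_rating.name, average_stars)
```

The median relies only on the ordering, while the average additionally assumes the gap between each pair of neighboring ratings is the same, which is the judgment call you make when you treat ordinal data as numerical.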