A Categorical is a pandas data type that corresponds to categorical variables in statistics. A categorical variable usually takes a fixed number of possible values. Examples of categorical variables include gender, social class, blood type, and country.
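To make the idea concrete, here is a minimal sketch of a categorical column built from a handful of blood types. The sample values and the variable name blood are made up for illustration; they do not come from the dataset used later.

```python
import pandas as pd

# Hypothetical sample data: blood types for six people.
blood = pd.Series(["A", "B", "O", "A", "AB", "O"], dtype="category")

print(blood.dtype)                 # the column's data type
print(list(blood.cat.categories))  # the fixed set of distinct values
print(list(blood.cat.codes))       # the integer code stored for each row
```

Each row holds a small integer code that points into the list of categories, which is why the set of possible values is fixed.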
The usual approach in pandas has been to create a category column after reading the file. The code snippet below shows how this works: it uses the astype() function to convert a column to a category column.
Let’s take a look at the code:
import pandas as pd

drinks = pd.read_csv('http://bit.ly/drinksbycountry')

print("Datatype of each column:")
print(drinks.dtypes)

drinks['continent'] = drinks.continent.astype('category')

print("\nDatatype after creating category column:")
print(drinks.dtypes)
Explanation:
The first output shows that the continent column is of type object and not a categorical column.
The code then converts the continent column to a categorical column using the astype() function.
The second output shows that the continent column is now a categorical column.
The above approach works fine, but what if we could do this conversion while reading the file itself?
Take a look at the code to see how this works.
import pandas as pd

drinks = pd.read_csv('http://bit.ly/drinksbycountry',
                     dtype={'continent': 'category'})

print("Datatype of each column:")
print(drinks.dtypes)
Explanation:
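The dtype parameter of read_csv() maps the continent column to the category type, so the column is already categorical as soon as the file is read.
Similarly, you can set the data type of multiple columns using key-value pairs.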
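As a rough sketch of the multi-column case, the example below passes several key-value pairs to dtype. To keep it runnable without a network connection, it reads a small inline CSV instead of the drinks file. The rows, the name csv_text, and the choice of float64 for beer_servings are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file.
csv_text = """country,continent,beer_servings
Albania,Europe,89
Algeria,Africa,25
Japan,Asia,77
France,Europe,127
"""

# One key-value pair per column whose type should be set while reading.
sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={
        "country": "category",
        "continent": "category",
        "beer_servings": "float64",
    },
)

print(sample.dtypes)
```

Columns that are not listed in the dictionary keep the type that pandas infers for them.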