Explain Categorical Data in Pandas?


Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time, or rating via Likert scales. Every value of categorical data is either one of the categories or np.nan.
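A small sketch of the behaviour described above. The data (blood types, "low/medium/high" levels) is made up for illustration. When no categories are given, pandas infers them from the distinct values. When categories are given explicitly, any value outside that set becomes NaN.

```python
import pandas as pd

# Categories inferred from the data (sorted distinct values).
blood = pd.Series(["A", "B", "A", "O", "AB"], dtype="category")
print(blood.cat.categories.tolist())  # ['A', 'AB', 'B', 'O']

# Explicit categories: "extreme" is not one of them, so it is stored as NaN.
levels = pd.Categorical(
    ["low", "high", "medium", "extreme"],
    categories=["low", "medium", "high"],
)
print(levels)
print(levels.codes)  # integer code per value; -1 marks the missing value
```

Internally each value is stored as a small integer code pointing into the categories, with -1 reserved for missing values.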

The categorical data type is useful in the following cases:

A string variable consisting of only a few different values. Converting such a string variable to a categorical variable saves memory, because each distinct string is stored once and every row holds only a small integer code.

The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max use the logical order instead of the lexical order.

As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
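A rough sketch of the first two use cases, using made-up data. It compares memory for a repetitive string column stored as object dtype versus categorical. It also shows how an ordered CategoricalDtype changes sorting and min/max. The exact byte counts depend on the pandas version and platform, so only the comparison is meaningful.

```python
import pandas as pd

# 1) Memory: a low-cardinality string column as object vs. category.
colors = pd.Series(["red", "green", "blue"] * 100_000)
as_object = colors.memory_usage(deep=True)
as_category = colors.astype("category").memory_usage(deep=True)
print(as_object, as_category)  # the categorical version is much smaller

# 2) Logical order: an ordered categorical sorts by category order,
#    not alphabetically.
words = pd.Series(["two", "three", "one", "two"])
ordered = words.astype(
    pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
)
print(words.sort_values().tolist())    # ['one', 'three', 'two', 'two']
print(ordered.sort_values().tolist())  # ['one', 'two', 'two', 'three']
print(ordered.min(), ordered.max())    # one three
```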
