Nov 4, 2019 in R Language
Q:

Factor in R: Categorical & Continuous Variables

1 Answer

Nov 4, 2019

Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.

In a dataset, we can distinguish two types of variables: categorical and continuous.

A categorical variable takes its values from a limited, usually fixed, set of groups. Examples include country, year, gender, and occupation.

A continuous variable, however, can take any value within a range, whether whole or decimal. Examples include revenue and the price of a share.

Categorical Variables

R stores categorical variables as factors. The code below converts a character variable into a factor variable. Many machine learning algorithms cannot work with character strings directly; a factor addresses this by storing each category as an integer code with a text label attached.

Syntax

factor(x = character(), levels, labels = levels, ordered = is.ordered(x))

Arguments:

x: A vector of data, usually character or integer values with few distinct values. Decimal values are accepted as well, but they rarely make useful categories.

levels: An optional vector of the possible values taken by x. By default, it is the sorted unique values of x.

labels: An optional vector of labels for the levels. For example, 1 can take the label `male` and 0 the label `female`.

ordered: A logical flag that determines whether the levels are treated as ordered.
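To make the levels and labels arguments concrete, here is a minimal sketch. The 0/1 coding and the names gender_codes and gender_factor are hypothetical, chosen only to mirror the `male`/`female` example above.

```r
# Hypothetical coded data: 0 = female, 1 = male
gender_codes <- c(1, 0, 0, 1, 1)

# `levels` lists the raw values; `labels` renames them in the same order
gender_factor <- factor(gender_codes,
                        levels = c(0, 1),
                        labels = c("female", "male"))

levels(gender_factor)        # the labels, in the order given by `levels`
as.character(gender_factor)  # each raw value replaced by its label
```

Because labels are matched to levels by position, the two vectors should always be written in the same order.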

Example:

Let's create a factor from a character vector.

# Create gender vector

gender_vector <- c("Male", "Female", "Female", "Male", "Male")

class(gender_vector)

# Convert gender_vector to a factor

factor_gender_vector <- factor(gender_vector)

class(factor_gender_vector)

Output:

## [1] "character"

## [1] "factor"
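The integer codes behind a factor can be inspected directly. The sketch below rebuilds the same vector so that it runs on its own; it relies on the default behaviour of factor(), which sorts the unique values to form the levels.

```r
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# Levels are the sorted unique values: "Female" comes before "Male"
levels(factor_gender_vector)

# Each element is stored as the position of its level: 2 1 1 2 2
as.integer(factor_gender_vector)
```

This is the integer representation that modelling functions work with, while the labels remain attached for printing.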

Ordinal Categorical Variable

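Ordinal categorical variables have a natural ordering. We declare this with ordered = TRUE (the shortened form order = TRUE also works through partial argument matching). The order itself, from lowest to highest, is given by the levels argument. With ordered = FALSE, the levels are stored but not ranked.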

Example:

We can create an ordered factor and then use summary to count the values for each level.

# Create Ordinal categorical vector

day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')

# Convert `day_vector` to a factor with ordered level

factor_day <- factor(day_vector, ordered = TRUE, levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

# Print the new variable

factor_day

Output:

## [1] evening   morning   afternoon midday    midnight  evening

## Levels: morning < midday < afternoon < evening < midnight
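As a short sketch of what the ordering allows, the code below rebuilds the same ordered factor, counts the values per level with summary, and compares two elements. Comparisons such as > are only meaningful for ordered factors.

```r
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

# Count the observations per level, reported in level order
summary(factor_day)

# Compare elements using the declared order: is 'evening' later than 'morning'?
factor_day[1] > factor_day[2]

# The highest level present in the data
max(factor_day)
```

Here 'evening' appears twice and every other level once, and the comparison is TRUE because 'evening' comes after 'morning' in the levels vector.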

Continuous Variables

Continuous variables are what R creates by default for numbers. They are stored as numeric or integer. We can see this in mtcars, a built-in dataset that gathers information on different car models. It is available by name in a standard R session, so we can check the class of the variable mpg (miles per gallon) directly. The class is "numeric", indicating a continuous variable.
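The check on mpg can be written as follows. This sketch assumes a standard R session, where the datasets package that provides mtcars is attached by default.

```r
# mtcars ships with R; mpg holds miles per gallon for each car
class(mtcars$mpg)       # "numeric"

# A continuous variable is not a factor
is.factor(mtcars$mpg)   # FALSE

# Peek at the first few values to see the decimals
head(mtcars$mpg)
```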
