
Nov 13, 2019 in R Language
Q: tapply() function in R Language

1 Answer

Nov 13, 2019

The function tapply() splits a vector into groups defined by the levels of a factor and applies a function (mean, median, min, max, etc.) to each group.

tapply(X, INDEX, FUN = NULL)
Arguments:
-X: An object, usually a vector
-INDEX: A factor, or a list of factors, each the same length as X
-FUN: Function applied to each group of values in X (one group per factor level)
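
To make the roles of the three arguments concrete, here is a minimal sketch on made-up data. The names scores, group, and group_means are hypothetical and used only for illustration.

```r
# Hypothetical toy data: six scores split across two groups.
scores <- c(10, 20, 30, 40, 50, 60)
group  <- factor(c("a", "b", "a", "b", "a", "b"))

# X = scores, INDEX = group, FUN = mean.
# FUN is called once per level of INDEX, on the subset of X in that level.
group_means <- tapply(scores, group, mean)
print(group_means)
```

The result is a named vector with one entry per level of group, so a single group can be picked out by name, for example group_means["a"].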

Part of the job of a data scientist or researcher is to compute summaries of variables, for instance the average of a measurement within each group. Data are often grouped by ID, city, country, and so on. Summarizing by group can reveal patterns that an overall summary hides.

To understand how it works, let's use the iris dataset. This dataset is very famous in the world of machine learning, where it is commonly used to predict which of three iris species a flower belongs to: Setosa, Versicolor, or Virginica. For each flower, the dataset records the length and width of its sepals and petals.

As a first step, we can compute the median sepal width for each species. tapply() is a quick way to perform this computation.

data(iris)
tapply(iris$Sepal.Width, iris$Species, median)

Output:

##     setosa versicolor  virginica 
##        3.4        2.8        3.0
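
Since INDEX can also be a list of factors, the same call can be extended to cross two groupings. The sketch below is an illustration only: the petal_size factor and its cutoff of 4 are assumptions made up for this example and are not part of the iris dataset.

```r
data(iris)

# Hypothetical second grouping: petals longer than 4 count as "long".
petal_size <- factor(iris$Petal.Length > 4,
                     levels = c(FALSE, TRUE),
                     labels = c("short", "long"))

# With a list of two factors, tapply() returns a matrix:
# one row per species, one column per petal_size level.
# Combinations with no observations are filled with NA.
width_by_group <- tapply(iris$Sepal.Width,
                         list(iris$Species, petal_size),
                         median)
print(width_by_group)
```

Cells such as setosa/long come back as NA because no setosa flower has a petal that long, which is a quick way to spot empty combinations.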
