The function tapply() applies a function, such as mean, median, min, or max, to the values of a vector within each level of a factor.
tapply(X, INDEX, FUN = NULL)
-X: An object, usually a vector
-INDEX: A factor, or a list of factors, each the same length as X
-FUN: Function applied to the values of X within each group defined by INDEX
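To make the roles of the three arguments concrete, here is a minimal sketch on toy data. The vectors scores and groups are hypothetical names invented for this illustration; they do not come from any dataset.

```r
# Hypothetical toy data: six scores, each belonging to group "a" or "b"
scores <- c(10, 20, 30, 40, 50, 60)          # plays the role of X
groups <- factor(c("a", "b", "a", "b", "a", "b"))  # plays the role of INDEX

# FUN = mean is applied to the scores within each level of groups
group_means <- tapply(scores, groups, mean)
print(group_means)
```

The result has one entry per factor level, named after that level, so a single group's value can be read with group_means[["a"]].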
Part of the job of a data scientist or researcher is to compute summaries of variables, for instance the average of a measurement within groups that share a characteristic. Much data is grouped by ID, city, country, and so on, and summarizing by group often reveals patterns that an overall summary hides.
To understand how it works, let's use the iris dataset, which is very well known in machine learning. It is typically used to predict which of three iris species a flower belongs to: Setosa, Versicolor, or Virginica. For each flower, the dataset records the length and width of the sepal and the petal.
As a first step, we can compute the median sepal width for each species. tapply() is a quick way to perform this computation.
tapply(iris$Sepal.Width, iris$Species, median)
##     setosa versicolor  virginica
##        3.4        2.8        3.0
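One way to see what tapply() does here is to reproduce the same result by hand: split the values by species, then apply the function to each piece. The sketch below uses the base R functions split() and sapply(); the variable names by_tapply and by_split are chosen only for this illustration.

```r
# Median sepal width per species, computed two ways
by_tapply <- tapply(iris$Sepal.Width, iris$Species, median)

# Manual equivalent: split the vector into one piece per species,
# then take the median of each piece
by_split <- sapply(split(iris$Sepal.Width, iris$Species), median)

# Compare the numeric values only (the two results carry different attributes)
print(all.equal(as.numeric(by_tapply), as.numeric(by_split)))
```

For a single grouping factor the two approaches give the same numbers; tapply() simply does the split and the apply in one call.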