Nov 13, 2019 in R Language
Q: tapply() function in R Language

1 Answer

Nov 13, 2019

The function tapply() splits a vector into groups defined by one or more factors and applies a function to each group. It is typically used to compute a summary measure (mean, median, min, max, etc.) for each group.

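tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
-X: An R object that can be split into groups, usually a vector
-INDEX: A factor, or a list of factors, each the same length as X (non-factors are coerced with as.factor())
-FUN: Function applied to each group (subset) of X; if NULL, tapply() instead returns the group index of each element of X
-...: Further arguments passed on to FUN
-default: Value used for empty group combinations when the result is simplified (NA by default)
-simplify: If TRUE and FUN returns a single value per group, the result is an array of those values; otherwise it is a list-like array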
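A minimal sketch on made-up toy data (the names `x`, `g`, `y`, `h`, `by_group` and `with_na` are illustrative only) shows that FUN is called once per group, not once per element, and that arguments placed after FUN, such as `na.rm = TRUE`, are passed on to it through `...`:

```r
# Toy data (made up for illustration): five values and the group of each one
x <- c(2, 4, 6, 8, 10)
g <- factor(c("a", "a", "b", "b", "b"))

# mean() is called once per group: on c(2, 4) for "a" and on c(6, 8, 10) for "b"
by_group <- tapply(x, g, mean)
print(by_group)  # a = 3, b = 8

# Arguments after FUN are forwarded to it, here na.rm = TRUE for mean()
y <- c(1, NA, 3, 5)
h <- factor(c("a", "a", "b", "b"))
with_na <- tapply(y, h, mean, na.rm = TRUE)
print(with_na)   # a = 1 (the NA is dropped), b = 4
```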

Part of the job of a data scientist or researcher is to compute summaries of variables, for instance the average of a measurement within each group defined by some characteristic. Data are often grouped by ID, city, country, and so on, and summarizing by group can reveal patterns that a single overall summary hides.
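To illustrate grouping by a characteristic such as a city, here is a sketch on a small hypothetical data frame (`sales` and its columns `city`, `year` and `amount` are invented for the example). Passing a list of two grouping variables as INDEX gives one cell per combination, so the result is a matrix:

```r
# Hypothetical data (made up for illustration): one sale amount per row
sales <- data.frame(
  city   = c("Paris", "Paris", "Lyon", "Lyon", "Paris", "Lyon"),
  year   = c(2018, 2019, 2018, 2019, 2019, 2018),
  amount = c(100, 150, 80, 120, 50, 40)
)

# One grouping variable: average amount per city (city is coerced to a factor)
avg_by_city <- tapply(sales$amount, sales$city, mean)
print(avg_by_city)       # Lyon = 80, Paris = 100

# Two grouping variables: rows are cities, columns are years
avg_by_city_year <- tapply(sales$amount, list(sales$city, sales$year), mean)
print(avg_by_city_year)  # e.g. Lyon/2018 = 60, Paris/2019 = 100
```

If some city/year combination had no rows, its cell would be filled with the value of the `default` argument, NA unless you change it.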

To understand how it works, let's use the iris dataset, which is very well known in machine learning. It is commonly used to predict which of three iris species a flower belongs to: setosa, versicolor or virginica. For 50 flowers of each species, it records the length and width of the sepals and petals (in centimetres).
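Since iris ships with base R (in the datasets package), its structure can be checked directly before summarizing it. This sketch uses tapply() itself, with length() as FUN, to count the flowers per species (`counts` is an illustrative name):

```r
# iris is available automatically from the datasets package
str(iris)             # 150 rows: four numeric measurements plus the Species factor
levels(iris$Species)  # "setosa" "versicolor" "virginica"

# Counting the rows in each group by using length() as FUN
counts <- tapply(iris$Sepal.Length, iris$Species, length)
print(counts)         # 50 flowers per species
```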

As a first step, we can compute the median sepal width for each species. tapply() is a quick way to perform this computation.

tapply(iris$Sepal.Width, iris$Species, median)


##     setosa versicolor  virginica 
##        3.4        2.8        3.0
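Other measures follow the same pattern. The sketch below (variable names are illustrative) swaps in mean(), forwards an extra argument to quantile() through `...`, and uses range(). Because range() returns two values per group, tapply() cannot simplify that result to a plain numeric vector and returns a list-like array instead:

```r
# Mean sepal width per species
mean_width <- tapply(iris$Sepal.Width, iris$Species, mean)

# 90th percentile per species: probs = 0.9 is passed on to quantile()
q90_width <- tapply(iris$Sepal.Width, iris$Species, quantile, probs = 0.9)

# range() returns c(min, max), so each cell of the result holds a vector
range_width <- tapply(iris$Sepal.Width, iris$Species, range)

print(mean_width)
print(q90_width)
print(range_width[["setosa"]])  # minimum and maximum sepal width for setosa
```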
