R Aggregate Function: Summarise & Group_by() Example

Question

R Aggregate Function: Summarise & Group_by() Example

1 Answer

MBarbieri · Answer 1 · 2019-11-14T06:11:21+0000

Summarizing a variable gives a first idea of the data. Summarizing it by group is more informative still, because it shows how the distribution differs across groups.

In this tutorial, you will learn how to summarize a dataset by group with the dplyr library.

Before computing any summary, prepare the data in three steps:

Step 1: Import the data
Step 2: Select the relevant variables
Step 3: Sort the data

library(dplyr)

# Step 1: import the data
data <- read.csv("https://raw.githubusercontent.com/guru99-edu/R-Programming/master/lahman-batting.csv") %>%
  # Step 2: select the relevant variables
  select(playerID, yearID, AB, teamID, lgID, G, R, HR, SH) %>%
  # Step 3: sort the data
  arrange(playerID, teamID, yearID)

It is good practice to run glimpse() right after importing a dataset to get an idea of its structure.

# Structure of the data
glimpse(data)

Summarise()

The syntax of summarise() is basic and consistent with the other verbs included in the dplyr library.

summarise(df, variable_name = expression)
arguments:
- `df`: Dataset used to construct the summary statistics
- `variable_name = expression`: Name of the new summary variable and the expression that computes it

Look at the code below:

summarise(data, mean_run = mean(R))

Code Explanation

summarise(data, mean_run = mean(R)): Creates a variable named mean_run, which is the average of the column R (runs) from the dataset data.
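
To make the syntax concrete, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are assumptions for illustration, not part of the Lahman data. It shows that summarise() accepts several name = expression pairs at once and returns a single row, and that mean() returns NA when a value is missing unless you pass na.rm = TRUE.

```r
library(dplyr)

# Hypothetical toy data: runs scored (R) over four seasons, one value missing
toy <- data.frame(
  playerID = c("a01", "a01", "b02", "b02"),
  R        = c(10, 20, NA, 30)
)

# Without na.rm, a single missing value makes the mean NA
with_na <- summarise(toy, mean_run = mean(R))

# Several summaries in one call; na.rm = TRUE drops the missing value first
summary_all <- summarise(
  toy,
  mean_run = mean(R, na.rm = TRUE),  # (10 + 20 + 30) / 3
  max_run  = max(R, na.rm = TRUE),
  n_rows   = n()                     # counts rows, including the one with NA
)

print(with_na)
print(summary_all)
```

Whether the real batting data contains missing values in R is something to check with glimpse() or summary() before relying on either form.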

Group_by vs no group_by

On its own, summarise() collapses the whole dataset into a single row, which is rarely what you want. It is most useful after group_by(), where it returns one summary row per group: dplyr automatically applies the function to each group you passed to the verb group_by().

Note that group_by() also works with the other verbs (e.g. mutate(), filter(), arrange(), ...).
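
As a rough illustration of group_by() combined with verbs other than summarise(), the sketch below uses a made-up data frame (`toy`, with invented leagues and home run counts). After group_by(), mutate() and filter() evaluate their expressions within each group instead of over the whole table.

```r
library(dplyr)

# Hypothetical toy data: home runs (HR) for players in two leagues
toy <- data.frame(
  lgID = c("AL", "AL", "NL", "NL"),
  HR   = c(10, 30, 5, 15)
)

# mutate() after group_by(): every row receives the mean of its own league
with_league_mean <- toy %>%
  group_by(lgID) %>%
  mutate(league_mean_hr = mean(HR)) %>%
  ungroup()

# filter() after group_by(): keep rows above the mean of their own league
above_league_mean <- toy %>%
  group_by(lgID) %>%
  filter(HR > mean(HR)) %>%
  ungroup()

print(with_league_mean)
print(above_league_mean)
```

Calling ungroup() at the end is a design choice: it removes the grouping so that later steps do not silently operate per group.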

The pipe operator (%>%) is convenient when you have more than one step. For example, you can compute the average number of home runs by baseball league.
```
data %>%
  group_by(lgID) %>%
  summarise(mean_run = mean(HR))
```
Code Explanation
- data: Dataset used to construct the summary statistics
- group_by(lgID): Compute the summary separately for each value of the variable `lgID`
- summarise(mean_run = mean(HR)): Compute the average number of home runs (HR) and store it as mean_run
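
The difference between a grouped and an ungrouped summary is easier to see side by side. The sketch below assumes a small invented data frame (`toy`) rather than the real batting data, and adds max() and n() purely as examples of other summary functions.

```r
library(dplyr)

# Hypothetical toy data: home runs (HR) in two leagues
toy <- data.frame(
  lgID = c("AL", "AL", "NL", "NL", "NL"),
  HR   = c(10, 30, 5, 15, 10)
)

# No group_by(): one row for the whole table
overall <- toy %>%
  summarise(mean_hr = mean(HR))

# With group_by(): one row per league
by_league <- toy %>%
  group_by(lgID) %>%
  summarise(
    mean_hr = mean(HR),  # average home runs within the league
    max_hr  = max(HR),   # largest value within the league
    n_rows  = n()        # number of rows in the league
  )

print(overall)
print(by_league)
```

Here `overall` has a single row, while `by_league` has one row for each distinct value of lgID.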
