We have three methods to deal with missing values:
- Exclude all of the missing observations
- Impute with the mean
- Impute with the median
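As a minimal sketch of the three options on a single numeric vector, assuming a made-up vector `x` with one missing value (illustrative data only):

```r
# Illustrative numeric vector with one missing value (made-up data)
x <- c(2, 4, 4, 5, 100, NA)

# 1. Exclude the missing observation
x_drop <- x[!is.na(x)]

# 2. Impute with the mean of the observed values (115 / 5 = 23)
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# 3. Impute with the median of the observed values (2, 4, 4, 5, 100 -> 4)
x_median <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

print(x_drop)
print(x_mean)
print(x_median)
```

The single large value (100) pulls the mean up to 23, while the median stays at 4. This is why median imputation is often preferred for skewed columns.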
The following table summarizes how to list and remove the missing observations:
Library | Objective | Code |
---|---|---|
base | List the columns that contain missing values | colnames(df)[apply(df, 2, anyNA)] |
stats (base R) | Remove every row that has a missing value | na.omit(df) |
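A short sketch of the two commands in the table, run on a hypothetical data frame `df` (column names and values are made up for illustration):

```r
# Hypothetical data frame with missing values in two of its three columns
df <- data.frame(
  age    = c(23, NA, 35, 41),
  income = c(3200, 4100, NA, 5000),
  city   = c("A", "B", "C", "D")
)

# Names of the columns that contain at least one NA.
# apply() first coerces df to a matrix, then runs anyNA() on each column.
missing_cols <- colnames(df)[apply(df, 2, anyNA)]
print(missing_cols)  # "age" "income"

# Drop every row that has an NA in any column (rows 2 and 3 here)
df_complete <- na.omit(df)
print(nrow(df_complete))  # 2
```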
Imputation with the mean or median can be done in two ways:
Method | Details | Advantages | Disadvantages |
---|---|---|---|
Step by step with apply() | Find the columns with missing values, compute the mean/median of each, store those values, then replace the missing values with mutate() | You know the mean/median values used for imputation | Longer execution time; can be slow on large datasets |
Quick way with sapply() | Use sapply() and data.frame() to search for and replace the missing values with the mean/median automatically | Short and fast code | The imputed values are not stored, so you do not see them |
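Both approaches can be sketched on a hypothetical all-numeric data frame `df` as follows. Two assumptions to note: the step-by-step version replaces values with a base R loop as a stand-in for `dplyr::mutate()`, so it runs without extra packages, and the quick version assumes every column is numeric, because `sapply()` returns a matrix that `data.frame()` then turns back into a data frame.

```r
# Hypothetical all-numeric data frame with missing values
df <- data.frame(
  x = c(1, NA, 3, 10),
  y = c(NA, 2, 2, 8),
  z = c(5, 6, 7, 8)
)

# Imputation function: use median here for median imputation
impute_fun <- mean

## Step-by-step method
# 1. Find the columns that contain NA
na_cols <- colnames(df)[apply(df, 2, anyNA)]

# 2. Compute and store the replacement value for each of those columns
impute_values <- sapply(df[na_cols], impute_fun, na.rm = TRUE)
print(impute_values)  # x = 14/3, y = 4

# 3. Replace the NAs column by column (base R stand-in for dplyr::mutate())
df_step <- df
for (col in na_cols) {
  df_step[[col]][is.na(df_step[[col]])] <- impute_values[[col]]
}

## Quick method with sapply()
# Replace the NAs in every column with that column's mean in one pass;
# the imputed values are used but not kept anywhere.
df_quick <- data.frame(
  sapply(df, function(v) ifelse(is.na(v), impute_fun(v, na.rm = TRUE), v))
)

print(df_step)
print(df_quick)
```

Both versions produce the same values. The step-by-step version also leaves `impute_values` around for inspection, which is the trade-off summarized in the table.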