We will begin with the select() verb. We don't necessarily need all the variables, and a good practice is to select only the variables you find relevant.
We have 181 missing observations, almost 90 percent of the dataset. If you exclude those rows, you won't have enough data left to carry on with the analysis.
The other possibility is to drop the Comments variable with the select() verb.
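To see why dropping rows is costly, it can help to count the missing values per column before deciding. The sketch below uses base R only. The data frame `trips` and its columns are hypothetical stand-ins for the tutorial's `df`, not the real data.

```r
# Hypothetical toy data; a stand-in for the tutorial's df, not the real dataset.
trips <- data.frame(
  Day      = 1:5,
  Distance = c(10.2, 8.5, 12.1, 9.9, 11.0),
  Comments = c(NA, "traffic", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(trips))
print(na_counts)

# Share of rows with a missing value in each column
na_share <- colMeans(is.na(trips))
print(na_share)

# Removing every row that contains an NA keeps only the complete rows
complete_rows <- na.omit(trips)
print(nrow(complete_rows))
```

In this toy example only one row survives `na.omit()`, which mirrors the problem described above: when a single column holds most of the missing values, removing the column keeps far more data than removing the rows.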
We can select variables in different ways with select(). Note that the first argument is the dataset.
- `select(df, A, B, C)`: Select the variables A, B and C from the df dataset.
- `select(df, A:C)`: Select all variables from A to C from the df dataset.
- `select(df, -C)`: Exclude C from the df dataset.
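The three forms above can be tried on a small data frame. This is a minimal sketch that assumes the dplyr package is installed. The data frame `toy` and its columns A to D are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data frame with four columns, A through D.
toy <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)

by_name   <- select(toy, A, B, C)  # pick columns by name
by_range  <- select(toy, A:C)      # pick every column from A through C
without_c <- select(toy, -C)       # keep everything except C

print(names(by_name))
print(names(by_range))
print(names(without_c))
```

The first two calls return the same columns here because A, B and C are adjacent in `toy`. The range form depends on column order, so the name form is the safer choice when that order may change.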
You can use the third form to exclude the Comments variable:
step_1_df <- select(df, -Comments)