
When we know the two datasets won't match completely (some IDs exist in only one of them), we can choose to return only the rows whose key exists in both datasets. This is useful when we need a clean dataset, or when we don't want to impute the resulting missing values with the mean or median.

The inner_join() function from dplyr helps here: it keeps only the rows whose key appears in both data frames and excludes the unmatched rows.

library(dplyr)
inner_join(df_primary, df_secondary, by = 'ID')
...
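A minimal, self-contained sketch follows. The contents of df_primary and df_secondary are made up for illustration, since the original data isn't shown. It contrasts inner_join(), which keeps only the IDs present in both tables, with full_join(), which keeps every ID and fills the gaps with NA. Those NA values are what you would otherwise have to impute.

```r
library(dplyr)

# Hypothetical example data: ID 1 exists only in df_primary,
# ID 5 exists only in df_secondary
df_primary <- data.frame(ID = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
df_secondary <- data.frame(ID = c(2, 3, 4, 5), z = c(0.2, 0.3, 0.4, 0.5))

# inner_join() keeps only IDs 2, 3 and 4, which appear in both tables
joined <- inner_join(df_primary, df_secondary, by = "ID")
print(joined)

# For contrast: full_join() keeps all five IDs and introduces NA values
# (z is NA for ID 1, y is NA for ID 5)
full <- full_join(df_primary, df_secondary, by = "ID")
print(full)
```

With inner_join() the result contains no missing values, so no imputation step is needed. The trade-off is that the unmatched rows (IDs 1 and 5 here) are dropped from the analysis.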