What are some of the best practices for data cleaning?

1 Answer

Some of the best practices for data cleaning include the following:

  1. Sort data by different attributes
  2. For large datasets, cleanse the data stepwise, improving it with each step until you reach good data quality
  3. Break large datasets into smaller subsets. Working with less data will increase your iteration speed
  4. To handle common cleansing tasks, create a set of utility functions, tools, or scripts. These might include remapping values based on a CSV file or SQL database, regex search-and-replace, and blanking out all values that don’t match a regex
  5. If you have several data cleanliness issues, rank them by estimated frequency and attack the most common problems first
  6. Analyze the summary statistics for each column (standard deviation, mean, number of missing values)
  7. Keep track of every data cleaning operation, so you can alter or remove operations if required
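
As a rough illustration of points 4, 6, and 7, here is a minimal Python sketch using only the standard library. All names in it (`remap_values`, `blank_non_matching`, `column_summary`, `CleaningLog`) and the sample rows are hypothetical, made up for this example. It assumes the data is a list of dictionaries with string values, for instance as read by `csv.DictReader`.

```python
import re
import statistics


def remap_values(rows, column, mapping):
    """Replace values in `column` using a lookup table (e.g. loaded from a CSV file)."""
    return [{**row, column: mapping.get(row[column], row[column])} for row in rows]


def blank_non_matching(rows, column, pattern):
    """Blank out every value in `column` that does not fully match `pattern`."""
    regex = re.compile(pattern)
    return [
        {**row, column: row[column] if regex.fullmatch(row[column] or "") else ""}
        for row in rows
    ]


def column_summary(rows, column):
    """Return missing count, mean, and sample standard deviation of a numeric column."""
    values = [float(r[column]) for r in rows if r[column] not in ("", None)]
    return {
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values) if values else None,
        "stdev": statistics.stdev(values) if len(values) > 1 else None,
    }


class CleaningLog:
    """Apply cleaning steps and record each one so it can be reviewed or removed later."""

    def __init__(self):
        self.steps = []

    def apply(self, rows, name, func, *args):
        self.steps.append(name)  # record the step name before running it
        return func(rows, *args)


# Hypothetical sample data
rows = [
    {"country": "USA", "age": "34"},
    {"country": "U.S.", "age": "n/a"},
    {"country": "Germany", "age": "29"},
]

log = CleaningLog()
rows = log.apply(rows, "remap country", remap_values, "country", {"U.S.": "USA"})
rows = log.apply(rows, "blank bad ages", blank_non_matching, "age", r"\d+")

summary = column_summary(rows, "age")
print(summary)
print(log.steps)
```

Each helper returns a new list instead of modifying the input, so an earlier version of the data stays available if a step needs to be changed or dropped. For larger datasets, a library such as pandas would likely be a better fit than plain dictionaries.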
