What are some best practices for data cleaning?
asked
May 21, 2021
in
Data Analytics
by
SakshiSharma
data-cleaning
data-analytics
1
Answer
0
votes
answered
May 21, 2021
by
SakshiSharma
Some best practices for data cleaning include the following:
Sort the data by different attributes.
For large datasets, cleanse the data stepwise, improving it with each step until you reach good data quality.
For large datasets, break them into smaller chunks. Working with less data increases your iteration speed.
To handle common cleansing tasks, create a set of utility functions, tools, or scripts. These might include remapping values based on a CSV file or SQL database, regex search-and-replace, or blanking out all values that don't match a regex.
If you have several data cleanliness issues, rank them by estimated frequency and attack the most common problems first.
Analyze the summary statistics for each column (standard deviation, mean, number of missing values).
Keep track of every data cleaning operation, so you can alter or remove operations if required.
...
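Several of the points above (chunking, reusable utilities, ranking problems by frequency, per-column summary statistics, and tracking operations) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. All names here (`chunked`, `remap_values`, `blank_non_matching`, `column_summary`, `rank_problems`, `CleaningLog`) and the sample rows are hypothetical, and rows are assumed to be plain dicts of strings.

```python
import re
import statistics
from collections import Counter


def chunked(rows, size):
    """Yield successive slices of `rows` so large data can be cleaned piecewise."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def remap_values(rows, column, mapping):
    """Replace values in `column` via a lookup dict (e.g. one loaded from a CSV file)."""
    return [{**row, column: mapping.get(row[column], row[column])} for row in rows]


def blank_non_matching(rows, column, pattern):
    """Set values in `column` to None unless the whole value matches `pattern`."""
    regex = re.compile(pattern)

    def keep(value):
        return isinstance(value, str) and regex.fullmatch(value) is not None

    return [{**row, column: row[column] if keep(row[column]) else None} for row in rows]


def column_summary(rows, column):
    """Mean, standard deviation, and missing-value count for a numeric column."""
    values = [row[column] for row in rows]
    present = [float(v) for v in values if v not in (None, "")]
    return {
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
        "stdev": statistics.stdev(present) if len(present) > 1 else None,
    }


def rank_problems(problems):
    """Order observed problem labels from most to least frequent."""
    return Counter(problems).most_common()


class CleaningLog:
    """Record named cleaning steps so they can be replayed, changed, or removed."""

    def __init__(self):
        self.steps = []  # list of (name, function) pairs, in application order

    def add(self, name, func):
        self.steps.append((name, func))

    def remove(self, name):
        self.steps = [(n, f) for n, f in self.steps if n != name]

    def run(self, rows):
        for _, func in self.steps:
            rows = func(rows)
        return rows


# Hypothetical sample data, for illustration only.
rows = [
    {"country": "U.S.", "age": "34"},
    {"country": "USA", "age": "n/a"},
    {"country": "Germany", "age": "29"},
]

log = CleaningLog()
log.add("remap_country", lambda data: remap_values(data, "country", {"U.S.": "USA"}))
log.add("blank_bad_age", lambda data: blank_non_matching(data, "age", r"\d+"))

cleaned = log.run(rows)
summary = column_summary(cleaned, "age")
```

Because every step returns new rows rather than modifying the input, removing a step from the log and calling `run` again on the original data effectively undoes that operation.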