What are some best practices for data cleaning?
asked May 21, 2021 in Data Analytics by SakshiSharma
data-cleaning
data-analytics
1 Answer
answered May 21, 2021 by SakshiSharma
Some best practices for data cleaning include the following:
Sort the data by different attributes, which makes outliers and inconsistent values easier to spot.
For large datasets, cleanse the data stepwise, improving it with each step until you reach good data quality.
For large datasets, break the data into smaller subsets. Working with less data increases your iteration speed.
To handle common cleansing tasks, create a set of utility functions, tools, or scripts. These might include remapping values based on a CSV file or SQL database, regex search-and-replace, and blanking out all values that don't match a regex.
If you have several data cleanliness issues, rank them by estimated frequency and attack the most common problems first.
Analyze the summary statistics for each column (standard deviation, mean, number of missing values).
Keep track of every data cleaning operation, so you can alter or remove operations if required.
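As a rough illustration of the utility functions, per-column summary statistics, and operation tracking mentioned above, here is a minimal sketch using only the Python standard library. All names in it (load_mapping, remap, regex_replace, blank_unless_match, summarize, CleaningLog) and the two-column "old,new" CSV layout are assumptions made for this example, not part of any particular tool.

```python
import csv
import io
import re
import statistics


def load_mapping(csv_text):
    """Read 'old,new' pairs from CSV text into a dict (assumed two-column layout)."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(csv_text)) if len(row) >= 2}


def remap(values, mapping):
    """Replace each value found in the mapping; leave the others unchanged."""
    return [mapping.get(v, v) for v in values]


def regex_replace(values, pattern, replacement):
    """Apply a regex search-and-replace to every non-missing value."""
    return [v if v is None else re.sub(pattern, replacement, v) for v in values]


def blank_unless_match(values, pattern):
    """Set to None every value that does not fully match the regex."""
    return [v if v is not None and re.fullmatch(pattern, v) else None for v in values]


def summarize(values):
    """Count missing values and compute mean and standard deviation of the rest."""
    present = [float(v) for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
        "stdev": statistics.stdev(present) if len(present) > 1 else None,
    }


class CleaningLog:
    """Record each operation so steps can be reviewed, altered, or dropped later."""

    def __init__(self):
        self.steps = []

    def apply(self, name, func, values, *args):
        # Store the step name and its arguments before running it.
        self.steps.append((name, args))
        return func(values, *args)


# Example usage on a small, made-up column of ages.
log = CleaningLog()
ages = ["34", " 41 ", "n/a", "29", "abc"]
ages = log.apply("strip whitespace", regex_replace, ages, r"^\s+|\s+$", "")
ages = log.apply("keep integers only", blank_unless_match, ages, r"\d+")
stats = summarize(ages)

# Remapping state codes from a small, made-up CSV mapping.
states = remap(["NY", "TX"], load_mapping("NY,New York\nCA,California\n"))
```

Because each step goes through CleaningLog.apply, log.steps holds the ordered list of operations, which is one simple way to revisit or remove a step later. For real datasets, a library such as pandas would likely be a better fit than plain lists.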