bookdown.org/martin_shepperd/ModernDataBook/C5_DataQualCheck.html
Top Highlights
Validity
Accuracy
Completeness
Consistency
Traceability
Timeliness
There is a definite ordering for any data quality process:
1. Identify problems …
2. then determine and document the response
   - ignore (not recommended)
   - clean
   - impute
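The ordering above can be sketched in a few lines of plain Python (standard library only, not the {validate} package mentioned in these highlights). Everything here is an assumption made for illustration: the records, the 0 to 100 score range, the function names, and the mapping from problem type to response are all hypothetical. The point is only that problems are identified first, without touching the data, and the chosen response is recorded afterwards.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "score": 72.0},
    {"id": 2, "score": None},
    {"id": 3, "score": 130.0},
]


def identify_problems(rows):
    """Step 1: find problems without changing the data."""
    problems = []
    for row in rows:
        score = row["score"]
        if score is None:
            problems.append((row["id"], "missing score"))
        elif not 0 <= score <= 100:  # assumed valid range
            problems.append((row["id"], "score out of range"))
    return problems


# Step 2: decide on a response per problem type (assumed choices).
RESPONSES = {
    "missing score": "impute",
    "score out of range": "clean",
}


def document_responses(problems):
    """Step 2: record the chosen response for every identified problem."""
    return [
        {"id": rid, "problem": p, "response": RESPONSES.get(p, "ignore")}
        for rid, p in problems
    ]


problems = identify_problems(records)
log = document_responses(problems)
for entry in log:
    print(entry)
```

Keeping the two steps as separate functions means the list of problems exists before any decision about cleaning or imputing is made, and the log gives a written record of each decision.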
first
Data quality checking
many useful packages to help
everyday and necessary part of a data scientist’s work
One straightforward and useful package is {validate}
simple features of this package with a hypothetical example based on some education data.
show
Education example, we have a data frame with four variables or columns
first two variables are factors
remaining two are numeric
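As a rough illustration of inspecting such a table, the sketch below uses plain Python rather than R or {validate}. The column names, the values, and the two helper functions are hypothetical and are not taken from the original example. It lists the distinct levels of a categorical column and flags rows where a supposedly numeric column holds something else.

```python
# Hypothetical table: column names and values are invented for illustration.
rows = [
    {"school": "A", "grade": "pass", "hours": 12.0, "mark": 64.0},
    {"school": "B", "grade": "fail", "hours": 3.5, "mark": 38.0},
    {"school": "A", "grade": "Pass", "hours": 9.0, "mark": "71"},
]

CATEGORICAL = ["school", "grade"]  # first two columns
NUMERIC = ["hours", "mark"]        # remaining two columns


def levels(table, column):
    """Return the sorted distinct values of a categorical column."""
    return sorted({row[column] for row in table})


def non_numeric(table, column):
    """Return the indices of rows whose value is not an int or float."""
    return [
        i
        for i, row in enumerate(table)
        if isinstance(row[column], bool)
        or not isinstance(row[column], (int, float))
    ]


for name in CATEGORICAL:
    print(name, "levels:", levels(rows, name))
for name in NUMERIC:
    print(name, "non-numeric rows:", non_numeric(rows, name))
```

Listing levels is a cheap way to spot inconsistent spellings of the same category, and the type check catches numbers that were read in as text.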
oddities
three levels
function