The Data Refinement process prepares input data for effective entity resolution by making it accurate and consistent. It involves three key activities: validation, alignment, and enrichment. Validation identifies and corrects erroneous or potentially incorrect data, for example by removing clearly fake or default values (e.g., invalid email addresses). Alignment conforms data to predefined reference values, formats, or fields, so that previously inconsistent formats (e.g., phone numbers) are standardized and compatible for matching. Enrichment adds further attributes that improve entity resolution. By making data as consistent as possible, this process reduces errors in associating records and improves the overall accuracy of entity resolution.
Note: DMBoK2 section 1.3.3.4.3 refers to this process as "Data Validation, Standardization, and Enrichment".
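As an illustration only, the three activities described above could be sketched in Python roughly as follows. Nothing here comes from DMBoK2: the function names, the placeholder email list, the simplified email pattern, the assumption of 10-digit national phone numbers with country code 1, and the derived `email_domain` attribute are all hypothetical choices made for the example.

```python
import re

# Hypothetical list of "clearly fake or default" values (an assumption).
PLACEHOLDER_EMAILS = {"test@test.com", "none@none.com"}
# Deliberately simple shape check, not a full email address validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(value):
    """Validation: return a cleaned email, or None if it looks invalid or fake."""
    if not value:
        return None
    email = value.strip().lower()
    if email in PLACEHOLDER_EMAILS or not EMAIL_PATTERN.match(email):
        return None
    return email


def align_phone(value, default_country_code="1"):
    """Alignment: reduce varied phone formats to a single '+<digits>' form.

    Assumes 10-digit national numbers and a one-digit country code.
    """
    if not value:
        return None
    digits = re.sub(r"\D", "", value)  # drop spaces, dashes, parentheses
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits if len(digits) == 11 else None


def enrich(record):
    """Enrichment: add a derived attribute that may help matching."""
    enriched = dict(record)
    email = record.get("email")
    enriched["email_domain"] = email.split("@", 1)[1] if email else None
    return enriched


def refine(record):
    """Apply validation, then alignment, then enrichment to one record."""
    refined = dict(record)
    refined["email"] = validate_email(record.get("email"))
    refined["phone"] = align_phone(record.get("phone"))
    return enrich(refined)
```

Under these assumptions, a record such as `{"email": " Ann@Example.org ", "phone": "(555) 123-4567"}` would come out with a lowercased email, a phone number in the single `+1...` form, and an added `email_domain` field, while a placeholder email would be replaced with `None` rather than carried into matching.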
Source | Data Management Body of Knowledge 2nd Edition (DMBoK2) |
Publisher | DAMA |