Data profiling in a data warehousing project
Profiling in a data warehousing or business project contributes to the project's success and more…
A good profiler analyzes the data, its structure and all of its elements with one basic attitude:
EVERYTHING IS POSSIBLE!!
All data must be analyzed; never assume that a source is accurate. Humans are not perfect and make mistakes.
A data dictionary is a collection of basic metadata about data attributes. It includes basic attribute listings, detailed descriptions and usage patterns, as well as reference information such as valid values and their meanings, default values, etc.
Subject area models define the main data subjects: categories of high-level business objects whose data is stored in the database. Relational data models depict the logical relationships between entities and attributes.
Data models and the data dictionary are the initial sources of knowledge about the data. Data profiling is a group of experimental techniques aimed at examining the data and understanding its actual structure and dependencies.
Profiling is so important because actual data is often very different from what is theoretically expected. Over time, data models and dictionaries become inaccurate. Data profiling is like an X-ray showing the hidden truth. It is key to building correct data mappings and quality rules. As a rule of thumb, the more in-depth analysis and profiling we conduct, the easier it is to design a comprehensive set of data mappings and quality rules and to achieve greater success in data conversion and consolidation.
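As a minimal sketch of that gap between documentation and reality, the Python snippet below compares what a data dictionary entry claims about an attribute with what the data actually contains. The attribute name `gender`, the documented codes and the sample rows are all invented for illustration; they do not come from any real system.

```python
from collections import Counter

# Hypothetical data dictionary entry: documented valid values and nullability.
DICTIONARY = {
    "gender": {"valid_values": {"M": "male", "F": "female"}, "nullable": False},
}

# Hypothetical rows extracted from a source system.
ROWS = [
    {"id": 1, "gender": "M"},
    {"id": 2, "gender": "F"},
    {"id": 3, "gender": "f"},
    {"id": 4, "gender": None},
    {"id": 5, "gender": "U"},
    {"id": 6, "gender": "M"},
]


def check_against_dictionary(rows, attribute, entry):
    """Count observed values and report those the dictionary does not document."""
    observed = Counter(row.get(attribute) for row in rows)
    undocumented = {
        value: count
        for value, count in observed.items()
        if value is not None and value not in entry["valid_values"]
    }
    # Nulls only count as a finding when the dictionary says the attribute is mandatory.
    unexpected_nulls = 0 if entry["nullable"] else observed.get(None, 0)
    return {
        "observed": dict(observed),
        "undocumented": undocumented,
        "unexpected_nulls": unexpected_nulls,
    }


report = check_against_dictionary(ROWS, "gender", DICTIONARY["gender"])
print("Undocumented values:", report["undocumented"])
print("Unexpected nulls:", report["unexpected_nulls"])
```

The function only reports what it finds. Deciding whether `f` should be mapped to `F`, or whether `U` is a legitimate code missing from the dictionary, remains a business decision.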
Difference from data cleansing
- Data profiling shows the content of the data.
- It helps the data governance committee to define data cleansing rules.
- The data cleansing rules are then implemented by the development team.
- Conclusion: data profiling does not by itself resolve data quality issues.
Data profiling is often mistakenly equated with attribute profiling. The cause of that mistake is the proliferation of efficient attribute profiling tools. However, comprehensive data profiling is a far broader exercise.
The techniques are:
- Subject profiling examines subjects in different tables or on different systems and helps to find where the information about each subject is stored;
- Relationship profiling is an exercise in identifying entity keys and relationships, as well as counting occurrences for each relationship in the data model. It is necessary to validate existing relational data models or to build them when none are available;
- Attribute profiling examines values of individual data attributes and provides information about the frequencies and distributions of their values. It helps to identify the meaning and allowed values of an attribute;
- Timeline profiling looks for patterns in historical data, such as the temporal distribution of the data, patterns of values for different time periods, etc.;
- State-transition model profiling examines the lifecycle of state-dependent objects and provides actual information about the order and characteristics of states and actions. It helps to build or validate state-transition models;
- Dependency profiling uses various pattern recognition techniques to find hidden relationships between attribute values.
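Three of the techniques listed above can be sketched in a few lines of plain Python: value frequencies for a single attribute, key and cardinality checks between two tables, and counts of observed state transitions. This is only an illustration under stated assumptions. The tables, column names and status codes are hypothetical, and a real project would normally run such checks in SQL or a profiling tool.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: customers, their orders, and an order status history.
CUSTOMERS = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
ORDERS = [
    {"order_id": 10, "customer_id": 1, "status": "SHIPPED"},
    {"order_id": 11, "customer_id": 1, "status": "NEW"},
    {"order_id": 12, "customer_id": 2, "status": "shipped"},
    {"order_id": 13, "customer_id": 9, "status": "NEW"},  # customer 9 does not exist
]
# Each entry is (order_id, sequence number, status).
STATUS_HISTORY = [
    (10, 1, "NEW"), (10, 2, "PAID"), (10, 3, "SHIPPED"),
    (11, 1, "NEW"),
    (12, 1, "NEW"), (12, 2, "SHIPPED"),  # this order never passed through PAID
]


def attribute_profile(rows, attribute):
    """Attribute profiling: frequency of each value of one attribute."""
    return Counter(row[attribute] for row in rows)


def relationship_profile(parents, children, key):
    """Relationship profiling: orphaned child keys and children-per-parent counts."""
    parent_keys = {row[key] for row in parents}
    children_per_key = Counter(row[key] for row in children)
    orphans = sorted(k for k in children_per_key if k not in parent_keys)
    # How many parents have 0, 1, 2, ... children.
    cardinality = Counter(children_per_key.get(k, 0) for k in parent_keys)
    return {"orphans": orphans, "cardinality": dict(cardinality)}


def transition_profile(history):
    """State-transition profiling: count each observed (from_state, to_state) pair."""
    by_object = defaultdict(list)
    for object_id, sequence, state in history:
        by_object[object_id].append((sequence, state))
    transitions = Counter()
    for events in by_object.values():
        states = [state for _, state in sorted(events)]
        transitions.update(zip(states, states[1:]))
    return transitions


print(attribute_profile(ORDERS, "status"))
print(relationship_profile(CUSTOMERS, ORDERS, "customer_id"))
print(transition_profile(STATUS_HISTORY))
```

In this invented sample, the attribute profile exposes two spellings of the same status, the relationship profile exposes an order that points to a missing customer, and the transition profile exposes a path from `NEW` straight to `SHIPPED`. None of these findings is fixed by the profile itself; each one feeds the definition of mappings and quality rules.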