Relationship profiling in a data warehouse
Relationship profiling is an exercise in identifying entity keys and relationships as well as counting occurrences for each relationship in the data model. It is necessary to validate existing relational data models or build them when none are available.
Definition of relationship profiling
A relational data model describes a high-level logical data structure using standard concepts, such as:
- Entity is a class of structurally similar persons, things, places, concepts, or events about which data is recorded. Each representative of an entity is called an entity occurrence;
- Attribute is the most primitive atomic characteristic of an entity;
- Relationship is an association between occurrences of two entities;
- Relationship cardinality indicates how many occurrences of each entity can participate in the relationship;
- Primary key is a nominated set of attributes that uniquely identifies each entity occurrence;
- Foreign key ties an attribute or a collection of attributes of one entity with the primary key of another entity.
In practice, data models are often not kept up to date with the actual data. Entity-relationship profiling provides information about the relationships that actually exist between entities. Several relationship profiling techniques are used:
Identity profiling checks primary keys and other unique keys within entities. It provides information about the true identity of the various entities and identifies any duplicates.
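As a minimal sketch of identity profiling, the check below counts how often each candidate key value occurs and reports the values that occur more than once. The function name `find_duplicate_keys`, the `customers` sample rows, and the column names are hypothetical and chosen only for illustration; in a real warehouse the same check would normally be run as a query against the table itself.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return candidate key values that occur more than once, with their counts."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data: customer_id is the nominated primary key.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 2, "name": "Robert"},
]

duplicates = find_duplicate_keys(customers, ["customer_id"])
print(duplicates)  # {(2,): 2}
```

An empty result means the nominated attributes are unique in the profiled data. It does not prove that they form the true key, only that the data does not contradict it.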
Reference profiling checks foreign keys. It provides information about foreign key violations in the real data.
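A reference check can be sketched as a search for orphans: dependent records whose foreign key value matches no primary key value in the parent entity. The names `find_orphan_rows`, `customers`, and `orders` are hypothetical, and the sketch assumes that an empty (`None`) foreign key is a missing reference rather than a violation, which may not match the rules of a given model.

```python
def find_orphan_rows(child_rows, fk_column, parent_rows, pk_column):
    """Return child rows whose foreign key has no matching parent primary key."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [
        row
        for row in child_rows
        # Assumption: a None foreign key is "no reference", not a violation.
        if row[fk_column] is not None and row[fk_column] not in parent_keys
    ]


# Hypothetical sample data: orders.customer_id references customers.customer_id.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 2},
    {"order_id": 103, "customer_id": 7},     # no such customer
    {"order_id": 104, "customer_id": None},  # missing reference
]

orphans = find_orphan_rows(orders, "customer_id", customers, "customer_id")
print(orphans)  # [{'order_id': 103, 'customer_id': 7}]
```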
Relationship cardinality is rarely represented correctly in relational data models. For example, optionality is frequently built into entity-relationship diagrams simply because real data is imperfect: strong entities are routinely allowed to have no corresponding weak entity records because database designers expect bad and missing data.
Cardinality profiling is used to understand the true relationship cardinality. It is a simple exercise in counting all relationship occurrences. Once counted, the results are presented in a cardinality frequency diagram, which shows how many of the parent records have 0, 1, 2, and so on corresponding dependent records.
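The counting behind a cardinality frequency diagram can be sketched in two passes: first count the dependent records per parent key, then count how many parents share each of those counts. The function `cardinality_frequency` and the sample `customers` and `orders` rows are hypothetical, and the sketch assumes that parent key values are unique (that is, identity profiling has already passed).

```python
from collections import Counter


def cardinality_frequency(parent_rows, pk_column, child_rows, fk_column):
    """Map 'number of dependent records' to 'number of parents with that many'."""
    # Pass 1: number of dependent records per foreign key value.
    children_per_parent = Counter(row[fk_column] for row in child_rows)
    # Pass 2: number of parents per child count; parents with no
    # dependents are counted under 0.
    return Counter(children_per_parent.get(row[pk_column], 0) for row in parent_rows)


# Hypothetical sample data.
customers = [{"customer_id": i} for i in (1, 2, 3, 4)]
orders = [
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 1},
    {"order_id": 103, "customer_id": 2},
    {"order_id": 104, "customer_id": 4},
]

frequency = cardinality_frequency(customers, "customer_id", orders, "customer_id")
for child_count in sorted(frequency):
    print(child_count, frequency[child_count])
# 0 1
# 1 2
# 2 1
```

Here one customer has no orders, two have exactly one, and one has two. A non-zero count at 0 is the kind of evidence needed to decide whether the relationship is genuinely optional or the data is simply incomplete.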