The VDM Registry
- Common Codes
- For each registered code category, the registry stores and manages the list of canonical values together with their names, abbreviations, descriptions and associations to other codes. Multilingual support is optional. Each code is assigned a universal ID (UID) that can be used as a feature or a key in the data warehouse. The code-UID assignment is "sticky": once a code has been assigned a UID, it always gets the same UID, even if it is deleted and reinserted. Each canonical code is associated with match-codes (the values coming in from source systems), so that lookup functions can convert incoming code values to canonical ones. The system also supports contexts, so that exceptions to the regular rules can be accommodated.
- Surrogate Key Service
- A series of "domains" is maintained, each with its own ID generator, to provide universally unique IDs for the series. UIDs can be 4-byte or 8-byte integers, and their assignment is sticky: once a UID is associated with a literal, that literal always gets the same UID. The service also provides an API that takes the natural key as a parameter and returns the UID for that value. If the value appears for the first time, a new key is generated; otherwise, the previously assigned UID is returned. The service guarantees the integrity of the mapping even if multiple requests are issued simultaneously from different processes.
- Run Groups and Logging (Batches)
- Run Groups are batch IDs that track the pedigree of database data. A Run Group is associated with every source file name entering the system. The same file processed more than once takes the same Run Group; however, not every Run Group has to have a file association. A Run Group associated with a file is allowed into the system only once. If an ETL process attempts to run the same Run Group again, the system stops unless an explicit instruction to "Reverse" the old Run Group is given, in which case the system removes the data of the old Run Group before reprocessing the new one. This, of course, assumes insert-only processing of the base data, or the retention of "before" images. Non-source-file Run Groups have dependent Run Groups that help trace the lineage of secondary transformations, usually performed inside the database. These rules are customized to the situation and needs at hand. A log is maintained for each Run Group. Every time the process runs, a message with the details of the run is appended to an XML area of the record. The history and status can be retrieved by Run Group, date range, fact family, parser version, channel of execution or any other tracked parameter.
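The Common Codes behaviour described above (sticky UIDs, match-codes, and contexts that override the regular rule) can be illustrated with a minimal in-memory sketch. It is not the registry's actual interface: the class `CodeCategory`, its method names, and the sample codes and the `legacy_hr` context are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional, Set, Tuple


class CodeCategory:
    """Hypothetical in-memory stand-in for one registered code category."""

    def __init__(self) -> None:
        # canonical code -> UID; entries are never removed, which makes UIDs sticky
        self._uids: Dict[str, int] = {}
        # canonical codes that currently exist (deleted codes are dropped from here)
        self._active: Set[str] = set()
        # (context or None, match-code) -> canonical code
        self._match: Dict[Tuple[Optional[str], str], str] = {}
        self._next_uid = 1

    def add_code(self, canonical: str) -> int:
        """Insert (or reinsert) a canonical code and return its UID."""
        if canonical not in self._uids:
            self._uids[canonical] = self._next_uid
            self._next_uid += 1
        self._active.add(canonical)
        return self._uids[canonical]

    def delete_code(self, canonical: str) -> None:
        """Delete a code but keep its UID reserved for a later reinsert."""
        self._active.discard(canonical)

    def add_match_code(self, match_code: str, canonical: str,
                       context: Optional[str] = None) -> None:
        """Map an incoming source value to a canonical code, optionally per context."""
        if canonical not in self._active:
            raise KeyError(f"unknown canonical code: {canonical}")
        self._match[(context, match_code)] = canonical

    def lookup(self, match_code: str, context: Optional[str] = None) -> Optional[str]:
        """Return the canonical code; a context rule wins over the regular rule."""
        canonical = self._match.get((context, match_code))
        if canonical is None and context is not None:
            canonical = self._match.get((None, match_code))
        if canonical not in self._active:
            return None
        return canonical


gender = CodeCategory()
uid_female = gender.add_code("FEMALE")
gender.add_code("MALE")
gender.add_match_code("F", "FEMALE")
gender.add_match_code("1", "MALE")
# Exception for one source system, expressed as a context:
gender.add_match_code("1", "FEMALE", context="legacy_hr")

print(gender.lookup("1"))                       # MALE
print(gender.lookup("1", context="legacy_hr"))  # FEMALE
```

Keeping the UID table separate from the set of active codes is one simple way to make a deleted and reinserted code receive its original UID.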
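The get-or-create contract of the Surrogate Key Service can be sketched as follows. This is only an illustration under stated assumptions: the storage is SQLite from the Python standard library, the table names `domain_counter` and `surrogate_key` and the functions `open_key_store` and `get_uid` are hypothetical, and the concurrency guarantee is approximated by running each request inside a `BEGIN IMMEDIATE` transaction so that only one writer at a time can assign a key.

```python
import sqlite3


def open_key_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) key store and create its tables if needed."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domain_counter ("
        " domain TEXT PRIMARY KEY,"
        " next_uid INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS surrogate_key ("
        " domain TEXT NOT NULL,"
        " natural_key TEXT NOT NULL,"
        " uid INTEGER NOT NULL,"
        " PRIMARY KEY (domain, natural_key))"
    )
    return conn


def get_uid(conn: sqlite3.Connection, domain: str, natural_key: str) -> int:
    """Return the UID for a natural key, generating one on first sight."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT uid FROM surrogate_key WHERE domain = ? AND natural_key = ?",
            (domain, natural_key),
        ).fetchone()
        if row is None:
            # Each domain has its own generator, starting at 1.
            conn.execute(
                "INSERT OR IGNORE INTO domain_counter (domain, next_uid) VALUES (?, 1)",
                (domain,),
            )
            uid = conn.execute(
                "SELECT next_uid FROM domain_counter WHERE domain = ?", (domain,)
            ).fetchone()[0]
            conn.execute(
                "UPDATE domain_counter SET next_uid = next_uid + 1 WHERE domain = ?",
                (domain,),
            )
            conn.execute(
                "INSERT INTO surrogate_key (domain, natural_key, uid) VALUES (?, ?, ?)",
                (domain, natural_key, uid),
            )
        else:
            uid = row[0]
        conn.execute("COMMIT")
        return uid
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = open_key_store()
print(get_uid(conn, "customer", "ACME-001"))  # 1 (first time: new key)
print(get_uid(conn, "customer", "ACME-002"))  # 2
print(get_uid(conn, "customer", "ACME-001"))  # 1 (sticky: same key again)
```

With a file-backed database instead of `:memory:`, several processes could call `get_uid` against the same store; the primary key on `(domain, natural_key)` acts as a second line of defence against duplicate assignments.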
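The Run Group rules for source files (same file, same Run Group; a second load is refused unless a "Reverse" is requested; every attempt is logged) can be sketched in memory as well. The names `RunGroupRegistry`, `RunGroupAlreadyLoaded`, `load` and the `reverse` flag are hypothetical, the warehouse is replaced by a plain dictionary of rows, and the XML log layout (`<log>` with `<run status="...">` children) is an assumption, since the source does not describe its schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


class RunGroupAlreadyLoaded(Exception):
    """Raised when a file's Run Group is loaded again without a reverse."""


class RunGroupRegistry:
    """Hypothetical in-memory model of file-based Run Groups and their logs."""

    def __init__(self) -> None:
        self._by_file: Dict[str, int] = {}      # file name -> Run Group ID
        self._rows: Dict[int, List[object]] = {}  # Run Group ID -> loaded rows
        self._logs: Dict[int, ET.Element] = {}  # Run Group ID -> XML log root
        self._next_id = 1

    def run_group_for(self, file_name: str) -> int:
        """The same file name always maps to the same Run Group."""
        if file_name not in self._by_file:
            self._by_file[file_name] = self._next_id
            self._logs[self._next_id] = ET.Element("log")
            self._next_id += 1
        return self._by_file[file_name]

    def _log(self, run_group: int, status: str, **details: object) -> None:
        # Append one <run> element per attempt to the Run Group's XML log.
        attrs = {name: str(value) for name, value in details.items()}
        ET.SubElement(self._logs[run_group], "run", status=status, **attrs)

    def load(self, file_name: str, rows: List[object], reverse: bool = False) -> int:
        """Load a file; refuse a repeat unless reverse=True removes the old data."""
        run_group = self.run_group_for(file_name)
        if run_group in self._rows:
            if not reverse:
                self._log(run_group, "rejected")
                raise RunGroupAlreadyLoaded(file_name)
            # Insert-only base data makes the reversal a delete by Run Group.
            removed = self._rows.pop(run_group)
            self._log(run_group, "reversed", rows=len(removed))
        self._rows[run_group] = list(rows)
        self._log(run_group, "loaded", rows=len(rows))
        return run_group

    def rows(self, run_group: int) -> List[object]:
        return list(self._rows.get(run_group, []))

    def statuses(self, run_group: int) -> List[str]:
        return [entry.get("status") for entry in self._logs[run_group]]


registry = RunGroupRegistry()
rg = registry.load("sales_2020_01.csv", ["row-a", "row-b"])
try:
    registry.load("sales_2020_01.csv", ["row-c"])
except RunGroupAlreadyLoaded:
    pass  # the process stops here unless a reverse is requested
registry.load("sales_2020_01.csv", ["row-c"], reverse=True)
print(registry.statuses(rg))  # ['loaded', 'rejected', 'reversed', 'loaded']
```

Dependent Run Groups for in-database transformations are left out of this sketch; they would need an additional parent-to-child mapping.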