Versioned Dimensional Model (VDM)

The Versioned Dimensional Model (VDM) originated in 1991, before dimensional modeling was popularized by Kimball. The earliest publications (1992) appeared in the Relational Journal (Codd and Date's Relational Institute), followed through the 1990s by articles in trade magazines and conference presentations. Since 2000, various white papers and reports have been available only to existing and prospective customers upon request.

The VDM Registry

The Registry is a collection of persistent functions that reliably handle the following areas:
Common Codes
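
As a sketch of how persistent common-code lookups could be exposed to shell-based ETL jobs (the file layout and function names below are illustrative assumptions, not the Registry's actual interface):

  # codes.txt holds one "area|code|description" record per line, e.g.:
  #   country|US|United States
  #   country|DE|Germany

  # Look up the description for a code in a given area.
  # Usage: code_lookup country US
  code_lookup() {
      awk -F'|' -v a="$1" -v c="$2" \
          '$1 == a && $2 == c { print $3; found = 1 }
           END { exit !found }' codes.txt
  }

  # Register a new code by appending it to the persistent store.
  # Usage: code_put country FR France
  code_put() {
      printf '%s|%s|%s\n' "$1" "$2" "$3" >> codes.txt
  }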

Parallelism Models

There are three modes of parallelism, shown below:
Three Parallelism Models
VDMETL on Regular Unix
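
As a minimal illustration of the plain-Unix flavor of parallelism (file names and the parser command are assumptions; a real channel would use a VDMGEN©-generated parser):

  # Split a large stage file into four chunks and parse them in
  # parallel, one Unix process per chunk; 'parser' stands in for a
  # generated parser executable (hypothetical name).
  split -n l/4 stage_file.dat chunk.
  for f in chunk.*; do
      parser < "$f" > "$f.out" &
  done
  wait                               # block until all parsers finish
  cat chunk.*.out > stage_file.parsed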

Operational Setup

To facilitate easy management of files, we propose one of two methods of setting up the environment.
Moving Files Through Directories
Under this approach, files are moved into a landing directory. When the files are ready for processing, they are moved again to a directory monitored by daemon processes, which recognize each file, choose the correct channel it should be processed in, and invoke the vdmetl command to process it. After processing is complete, the file is moved to a "check" directory and from there to the archive.
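
A minimal sketch of such a daemon loop, assuming illustrative directory names and vdmetl options (the real channel-selection rules are not shown here):

  # Poll the monitored directory and route each arriving file.
  while true; do
      for f in /data/processing/*; do
          [ -f "$f" ] || continue
          case "$(basename "$f")" in    # choose a channel by file name
              claims_*) channel=claims  ;;
              policy_*) channel=policy  ;;
              *)        channel=default ;;
          esac
          # --channel is a hypothetical option used for this sketch
          vdmetl --channel "$channel" "$f" \
              && mv "$f" /data/check/   # hand off for verification
      done
      sleep 30
  done
  # A separate step later moves verified files from /data/check to
  # the archive.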

The Mapping Manifest - A Machine-Readable Specification

The specific manifests discussed here capture mappings from COBOL copybook names and positions to table columns. The purpose of these manifests is twofold:
  1. As human-readable documents, they provide detailed mapping and transformation specifications.
  2. As machine-readable documents, they assist in the development of correct and validated parsers that can process stage files and populate database tables.
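
For illustration only, a manifest entry might pair a copybook field with its target column along these lines (the column layout and names below are invented, not the actual VDM manifest format):

  # copybook field   pos  len  picture  target column        transform
  WS-CUST-ID           1    9  9(9)     customer.cust_id     none
  WS-CUST-NAME        10   30  X(30)    customer.cust_name   trim
  WS-EFF-DATE         40    8  9(8)     customer.eff_date    yyyymmdd to date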

HDFS and Unix Integration

There are two straightforward ways and one more complex way of achieving HDFS-VDM integration.
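
As an illustration of the straightforward end of the spectrum, files can be copied or streamed across the HDFS boundary with the standard HDFS shell (paths are illustrative, and 'parser' stands in for a hypothetical VDM-generated parser):

  # Copy a processed file into HDFS for use by Hadoop jobs.
  hdfs dfs -put stage_file.parsed /landing/stage_file.parsed

  # Or stream HDFS data through a Unix parser without a local copy.
  hdfs dfs -cat /landing/raw_file.dat | parser > stage_file.parsed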

HDFS Design and Synergies with VDM

The Hadoop Distributed File System (HDFS) uses divide-and-conquer techniques under the covers to distribute data and processing. According to Tom White's Hadoop: The Definitive Guide (O'Reilly), the design of HDFS is driven by three primary objectives: storing very large files, streaming (write-once, read-many-times) data access, and running on commodity hardware.

VDMETL - Hadoop Integration

There are two levels at which integration can occur:

  • Use HDFS...
    • to transform and store files for use by Hadoop
    • to land, store, manage, process and archive source data targeted to traditional DBMS platforms
  • Enable VDM-generated parsers and other modules to run under Hadoop using the map-reduce architecture (a sketch follows this list)
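
For the second level, Hadoop Streaming can run an ordinary stdin/stdout parser as a mapper. A sketch, assuming a VDM-generated executable named 'parser' and illustrative HDFS paths (the streaming jar location varies by installation):

  # Run a VDM-generated parser as a map-only streaming job; -files
  # ships the parser executable to the task nodes.
  hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -D mapreduce.job.reduces=0 \
      -files parser \
      -input /landing/raw \
      -output /parsed/raw \
      -mapper parser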

The VDMETL© Run-Time Framework

The run-time VDMETL© framework consists of the components required to decompress, read, parse, and load source files into their intended target. It uses parsers generated with VDMGEN© and services for handling code lookups, exceptions, and logging. The framework components are described in more detail through the links below.
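
Conceptually the run-time chain is an ordinary Unix pipeline; a minimal sketch with hypothetical component names ('parser' for a VDMGEN©-generated parser, 'loader' for the target-specific load step):

  # Decompress, parse, and load one source file; in this sketch the
  # parser writes rejected records to stderr (an assumption) and the
  # loader appends its messages to a log.
  zcat source_file.dat.gz \
      | parser 2> rejects.dat \
      | loader >> load.log 2>&1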

VDMETL Framework

The VDMETL© framework is a customizable, open set of scripts and processes that uses the VDMGEN© capabilities to create, maintain, and execute ETL data provisioning activities with simple, readily available Unix tools. The run-time framework can vary from the provided basic, parallel, scalable framework relying on communicating Unix processes to a Hadoop/HDFS streaming framework. A totally redesigned Spark-based solution is under development. Parsers are generated from human- and machine-readable mapping specification documents.
