VDM Public

Public information; No registration required

TY/LY Relation

Every TY (this year's) day, week, period or quarter, has a corresponding LY (last year) day, week, period or quarter. This implicit mapping is straightforward for 52-week years. The acid test, however, is in handling 53-week years.

There are two common approaches for defining the TY-LY relationship:

  1. The TY and LY have the same day number (1-371) and week number (1-53.) Higher levels, i.e. periods and quarters, are made up of the same days and weeks following a 4-4-5 pattern, so any TY day belongs to the same period as its corresponding LY day. The only oddity is that we have to skip the 53d week in year-to-year comparisons.
VDM Access: 

Balancing Storage & CPU Organization

Each partition has to have not only its own directories and filesystems, but these file systems must be striped over enough disk spindles connected through an I/O path of adequate bandwidth.

VDM Access: 

Platform Requirement

The following configuration is based on these general guidelines. It covers data partition nodes, coordinator/administration node, application server and spare. We propose initially to configure all nodes as production data servers for flexibility and expandability. QA and Test nodes can be built with higher density and less I/O bandwidth per core. The following diagram captures the data path components, unit rates and relationships driving their configuration selections and sizing. Data Path - Unit Rates and Configuration Rationale

    VDM Access: 

    Workload Assumptions and Design Capacities

    The current workload is has the following characteristics:

    Storage

    Current state

    • 6.0 TB of base uncompressed total data volume (about 2TB/year)
    • 3.0 TB of index space (1TB.Year)
    • 4.0 TB of Active base uncompressed data (about 66%)

    Design Capacity

    • 15 TB of base uncompressed data for 5 year history @ 3.0TB/Year based on 2000 stores
    • 5.0 TB Added for Transaction Detail and other new functionality 5.0 TB
    • 9.0 TB of index space
    • 10.0 TB of Active base uncompressed data including new MQTs.
    • After compression - active data 5.0 TB, raw data 10.0 TB - Index 9.0 TB
    VDM Access: 

    Time-Versioned Behavior

    When it comes to time versioning of objects in a dimension, VDM enforces the following two basic rules:

    • Object identity (ID) is separate from version identifier (VID). A version or an object is identified by the combination {ID+VID}
    • An object cannot have more than one effective version at a given point in time

    There are three methods by which the versioning intervals are assigned:

    Dense versioning

    Each object of the dimensions has exactly one version effective for any point in time from the beginning-of-time to the end-of-time.

    Semi-dense versioning

    VDM Access: 

    Dimensions as Hierarchies

    VDM recognizes three types of dimensions from the perspective of their behavior as hierarchies:

    Trivial Single-Level

    These dimensions are usually modeled as a single entity. They carry an identifier and basic descriptive attributes, such as a name and description. They potentially may have additional attributes that can be used to aggregate facts. As an example, take a Geo-location dimension, which may have Zip-Code as an attribute. We may aggregate and report sales by zip-code and retain summaries with zip as one of the aggregate dimensions.

    Fixed Level Hierarchies

    VDM Access: 

    The Relational Taboo - Recursive Dimensions and Topological Ordering (Nested Sets)

    The original article was published in the October/November 1992 issue of the Relational Journal, and proposed a way to effectively make ancestor-descendent searches in hierarchies working around relational database limitations, i.e. the lack of recursion capability. Since then variants on the technique appeared at trade journals and books. An excellent explanation is included in the "SQL for Smarties" by J. Celco as well as his more recent "Trees and Hierarchies in SQL fr Smarties".

    VDM Access: 

    Time Dimension

    The dimension of time is such a core and critical concept in dimensional design and analytics that sometimes we forget some of the subtleties and nuances involved. Since practically every large table in the data warehouse contains a reference to the time dimension, any time or calendar change may have substantial repercussions. Here are the things to consider:

    Time Grain

    Most business applications require daily grain. There are often requirements that involve time-of-day grain, but before you decide to build a time dimension at the hour or minute level consider the problem and its implications carefully.

    VDM Access: 

    Fact behaviors

    These behaviors primarily influece the functionality of maintaining, accessing and reporting on facts at various aggregation levels.

    VDM Access: 

    Pages