Let's take a simple example to illustrate the approach. Our database contains five facts, and each of them has millions or billions of rows. The facts, although they share dimensions, were developed at different times by different teams, so the naming conventions are not 100% consistent. Our objective is to list the count of rows and distinct count of load-batches and distinct count of accounts for each of the facts. Ideally we could write a simple query like the following, and only change the name of the table we run it against:
The problem is that the spelling of the fields BATCH_NUMBER and ACCOUNT_NUMBER is not consistent across all facts, so we need to vary each query accordingly. To handle this using the VDM approach we can take the following steps:
- Create a manifest, that contains three columns:
- The Fact Table name
- The column name used for Batch Number
- The column name used for Account Number
- The first row of the manifest file contains the labels used as variable names. Subsequent five rows describe each of the five facts.
- Write the SQL template as above, but instead of using the actual names for the three items in the manifest, use variables that will be macro-substituted by the code generator as follows:
- Use __TABNAME__ in place of the FACT in the from clause
- Use __BATCHNAME__ in place of the BATCH_NUMBER
- Use __ACCOUNTNAME__ in place of the ACCOUNT_NUMBER
- The query in the template should look like this:
- Use the vdmiter function to display the SQL after the substitution, or simply execute it.