How General should Facts be in the Database

When certain abstract behaviors are added to the system, the fact taxonomy automatically extends to include more specializations. Let's take the sales fact as an example: When we add the TY/LY comparison behavior, we autmatically break down Sales to TY_Sales, LY_Sales and possibly PY_Sales corresponding to this year, last year and the prior year respectively. Now consider introducing the Relocation behavior and each of these categories of sales are further subdivided into regular and relocated store sales, leading to six distinct metrics. If we bring in the Same-Store Sale distinction the Relocated Store Sales are further broken down to SSS_Sales and Non-SSS-Sales.

The key question we have to reconcile is whether (or to what degree) the database schema should be exposed to all these nuances and refinements of the basic Sales metric. And if so, does the database need to recognize all of the refinements or only some of them? One extreme is to implement all specializations of Sales in the BI tools so that queries can be formulated to extract and properly report each subset of sales from a database that only delivers Sales. Another approach is for the database to deliver each of these refined metrics (TY_SSS_SLS, LY_SSS_SLS, TY_RELO_SLS etc.) and the BI tools to map directly to them. There are obvious advantages and disadvantages in both approaches that can be summarized in these two lines of thinking:

  • Keeping the database metrics generic rather than specific leads to simpler and more stable structure and content. The metadata-driven capabilities of BI tools can generate SQL of substantial complexity that does not need to be built into the database
  • Building business transformation rules inside each of the BI, reporting and extraction tools is redundant and prone to inconsistency. If SSS_LY_Sales is defined at the database, then the BI can use it, extraction tools for feeding internal or external systems can share it, and the definition needs to be managed in one place only.

A criterion, that is not often considered, is the ability to leverage aggregations in delivering these metrics. This does not necessarily require that each metric appear in an MQT, but it asks for making such metrics computable from aggregated metrics and other attributes in MQTs. Here are some guidelines to do this:

  1. Include additive metrics. Metrics that are not additive can some times be converted to additive ones. For example avoid averages, and include amounts and counts. By dividing the amount by the count at any level an average can be computed.
  2. Enable localization of computation by making each metric computable from a fact and its dimensions. LY Sales should be directly computed from the Sales fact and the time dimension. Similarly, relocated store sales and same store sales should be computable from the fact, the store and time dimensions
  3. Check for symmetry, and make sure that opposite behaviors have symetrical non-contextual computations. Ensure that metrics that are computed differently for symatrical behaviors are resolved in the database. For example, TY/LY Sales under the Relo and Non-Relo behavior are symetrical, since the designation of "relo" or "non-relo" applies equally to TY and LY. However, if you consider the TY/LY and Same-Store behaviors, you will notice that the "Same Store" classification is different for TY sales than it is for LY Sales. More specifically, sales for a particular day and a particular store, can be consider same store in comparisons with the previous year, and non-same-store when compared to the next year. Although the calculation of whether last year's fact is qualified as same-store can be derived from this year's store and date, such a computation would violate the localization rule defined earlier. In order to decide if last year's fact can be considered same-store we would need this computation to be derived by the fact and its dimensions directly. Since the direction of comparison requires a different computation or different indicator or flag, then for the database must be aware of the distinction between TY and LY metrics.
VDM Access: