Metric Integration through Union-Pivot Views
When code generators, such as MicroStragegy, build reports that integrate metrics from different fact tables they typically produce small contained queries or table expressions against each fact, and then they use a pivoting step that align the metrics into rows by their common key. Depending on the code generation settings a report may follow one of two strategies:
- It uses temporary tables and independent queries for collecting the required intermediate results from each fact (e.g. Sales, Sales Adjustments, Inventory, Inventory Adjustments) and each separate set of criteria (e.g. this year vs. last year). A final query pulls the generated temporary tables into a single result using a full outer join with coalesced keys in the on-clause. Indexes and runstats can be generated for each temporary table.
- All the queries are issued as table expressions in a single with-query ending in the usual pivot. Syntactically the table expressions are equivalent to the individual queries described earlier with the same final pivot step.
The first option is legacy, while the second options only evolved more recently since table expressions made it into the SQL standard. The advantage of the first approach, is that it isolates the complexity into stepwise queries, while the advantage of the second, is the avoidance of the explicit table overhead, and the potential for the optimizer to improve and rewrite the statement as one unit. For reports with a large number of facts and metrics that generate a large number of passes, the first method may still be more efficient as the second option overwhelms the optimizer which has to deal with an explosive increase of branches to evaluate. More often than not the clock stops the pruning and searching for the best option much too early resulting in an inefficient plan that is typically slower than option 1. It seems that the first option degrades when there are too many rows entering the final pivot stage, while the second option suffers when there are too many facts/stages involved. One problem that both approaches suffer from, is the way that Microstrategy integrates the intermediate result sets by the use of nested outer joins. A union-based pivot is better suited and is likely to improve both options. In the case of the temporary tables, it results in a simple sort-merge that almost always beats the outer join technique. More importantly however it helps simplify the optimization of the second option. The union imposes a separate optimization block for each of its member table expressions, and thus imposes a logical segmentation to the optimization, drastically limiting complexity. This makes the second approach possible with all its advantages and mitigating its disadvantages. Although it is inevitable that sooner or later Microstrategy should correct this inefficiency, and provide an option to union-pivot, in the meantime we can integrate facts into a smaller number of union-pivot views which Micrstrategy can use in place of the underlying facts, thus creating fewer passes with more metrics on each pass. As the views get queried, the union pivot built into the view efficiently integrates the metrics leaving a smaller number of intermediate results and fewer rows for the final outer-join pivot inevitably generated by Microstrategy. To illustrate the concept assume four facts: SLS (Sales), SLS_ADJ (sales adjustments), RCPT (store receipts), RCPT_ADJ (receipt adjustments) and assume that each is keyed on product key P, sore key S, and date key D. Finally lets assume that each of these facts carries an amount AMT and a QTY. Here is how the base facts would look: SLS(P#, S#, D#, SLS_AMT, SLS_QTY) SLS_ADJ(P#, S#, D#, SLS_ADJ_AMT, SLS_ADJ_QTY) RCPT(P#, S#, D#, RCPT_AMT, RCPT_QTY) RCPT_ADJ(P#, S#, D#, RCPT_ADJ_AMT, RCPT_ADJ_QTY) A typical Microstrategy report involving amounts from these four facts comparing the last financial period vs. the same period last year, would create eight passes, one for each fact for this year's period and one for last year's. It would finally outer join the eight intermediate results to return the report. To improve the situation, we can produce this union-pivot view. Now, Microstrategy will have to pick all metrics from the same table, our new view, which will require only two passes (one for each year) before the outer-join-pivot. When the query gets executed, DB2 will push the predicates through the union-all view and will execute a four-pass equivalent plan for each year. It will only apply one "group by" around each and pass the results to the outer-join-pivot step.