Multi-Fact Cubes and Metric Integration
Most BI reports involve more than one fact table whose metrics need to be pulled together. Pivot union views can be used to combine metrics from multiple facts of same (or almost same) dimensionality while exploiting their MQTs. Two key considerations in building such views are:
- Maintain best performance for widest range of queries
- The structure of pivot union views typically entails a small number of members in a "union all" query, each of the members involving a fact joined to some dimension(s). The sources of inefficiency can stem from two situations:
- Queries that can be answered using a subset of the union members should not be burdened accessing the unneeded members.
- Queries that don't need to take all the joins should not have to be burdened by the unneeded joins.
The first concern can be addressed in two ways. The most efficient way is to expose a constant literal for each member of the union which can be filtered by the query. In those situations the optimizer will eliminate the unneeded members from the plan. This approach provides the best guarantees but it requires the user or tool issuing the query to know something about these literal constants and to filter them accordingly. Another way is to associate a view column with a predicate that differentiates each member of the query. For example, if each table represents a separate year, then associating each member in the view definition with a condition "where yr=9999" or "DT between '2008-01-01' and '2008-12-31'". This approach requires that the plan accesses all members, but the actual cost of such access is typically negligible.
The second concern is to avoid unnecessary joins. Since the optimizer sees a join in the view, even if no columns are involved in the query from the joined table it will have to execute the join just in case it is a filtering join (i.e. it reduces the rows of the result set). The way to assure the optimizer that there is no row in the fact that does not have a dimension counterpart is to declare a foreign key and make it mandatory. In that case, the optimizer is likely to ignore the join. If the foreign key is optional (i.e. nulls are allowed) then the optimizer can still drop the dimension join but will filter-out rows with null foreign keys.
- How do we deal with dimensions that are present in one contributing fact but not the other?