Saturday, December 3, 2011


I fielded a question recently on the forums about what bridge tables are and what role they can play in a dimensional design.  The specific case was for someone trying to design a dimensional model for a healthcare organzation with a clinical research unit.  In their case, they had observations and lab results that they envisioned as fact tables.  And they had patients, physicians, procedures and diagnoses that they envisioned as dimensions.  The challenge was how to capture the many-to-many relationship between the events and the procedures and diagnoses.  And what benefits bridge tables would provide in this situation. 

Reality can be complicated.  This use case is a great example of that.  We don't always have the luxury of simple data with simple relationships.  That would be too easy.  In the real world, there are complicated scenarios like this one, that force us to design more flexible models.  Bridges are one way to provide the needed flexibility in a dimensional design.
Specifically, bridges allow you to capture the existence of a multi-valued dimension. So in our example use case in healthcare, each observation could have multiple procedures and multiple diagnoses related to it. 3 of our options would be:

1. Force the granularity to a level where the fact table has single valued dimension relationships. In this case, 1 row per procedure per diagnosis. If we can easily allocate all of the other relationships down to this lower level, try it. I don't believe this would be practical though with procedures and diagnoses. The reality is typically that the patient can have multiples.

2. Force a fixed number of slots in the fact table for a set limit of procedures and diagnoses. Say 5 of each. That means 10 FKs in the fact table, even for cases where there were less than 5 procedures and 5 diagnoses. And worse, when there were more than 5 or either, your model doesn't allow it, so that data is not captured and your users cannot answer that question. I've seen it implemented this way in billing systems and carried into the data mart design as a result, but would only recommend it if your users agree that anything beyond your fixed number of slots is not important to their analysis. If you go with this approach, I'd recommend still capturing the history with all diagnoses in a persistent staging area or some sort of backup, just so when they change their mind or an opportunity comes up that requires those additional slots, you'll have the historical data to support a redesign/backfill.

3. Use bridges to capture the many-to-many relationship between an observation and procedures and diagnoses. The bridge allows for a single key to be planted in the fact table. And the bridge captures all of the groupings of procedures or diagnoses that exist in your data.

NOTE: the colors in the models represent Yellow as playing a fact role and blue as playing a dimension role (like in the SSAS DSV)
The elegance of the bridge approach over a fixed-slot based approach is apparent in the flexibility to properly handle more (or less) procedures/diagnoses, and to enable filtering on a single attribute. For example, with SQL, if you wanted to see only the observations where a patient had a given diagnosis, in the slot-based model, you'd have to apply your filter to each slot (WHERE diagnosis1 = 'this' OR diagnosis2 = 'this OR diagnosis3 = 'this OR...). On the bridge, you simply apply your condition on the single attribute. You get a similar behavior in a drag-and-drop OLAP solution like Analysis Services. You will have to model the bridge as a many-to-many in Analysis Services or PowerPivot, so performance will play a role in your design. There are plenty of resources to assist with deciding what is too large and with how to implement these models in Analysis Services and PowerPivot. Some examples:

Russo - Many-To-Many Revolution

Regarding the need to drill across to the other fact table, you should be able to create a relationship from the Diagnosis Group and Procedure Group tables to the other fact, assuming the same group applies to both facts. Through that relationship, you can drill across to the other fact table.

Another good resource is always the Kimball writings on these topics. I would definitely recommend taking a look at The Kimball Group Reader. Article 10.3 actually has a full explanation of a healthcare bill with a diagnosis bridge model (like the full bridge I’ve shown above). They also have many other writings on this topic (typically indexed under multi-valued dimensions or bridges).

And one additional note. In the healthcare environment I worked in before, we had the same many-to-many issue with physicians. Multiple physicians (admitting, attending, nurses, tech, etc.) could be involved in a given episode. Another opportunity to put this M2M pattern to use if your environment has a similar reality.
Hope this helps.

No comments:

Post a Comment