Data Vault: Lower Total Cost of Ownership (Part 1)

When an organization undertakes a major Information Technology project, most executives, managers, and project sponsors overlook an important component in the total cost of ownership equation; “maintenance and enhancements“.

I am not referring to software, hardware, and cloud hosting maintenance fees.  When capitalized dollars are used as the funding source for the project, due diligence is given to these items as these costs are folded into the depreciation schedule associated with the endeavor.

I am referring to the steady state costs.

Based on the architecture and design patterns employed in the project, what are the future carrying expenses as enhancements and maintenance activities are performed?

In my experience, there are two use cases which affect the underlying data warehouse supporting business intelligence applications.  First, is the discovery of a new dimension.  Second, is the consumption of a new data source.

Assume an organization has a robust “Sales” Data Warehouse.  Since the “Sales” team paid for the project, the dimension and fact tables were designed from a “Sales” perspective.  The data in the warehouse contains both pipeline and closed deals.  In support of bonus and commission analytics, payment details are stored in the warehouse closed deals.

Since the launch, the finance team has been granted access to the platform.  Upon use of the platform, the finance team notices several financial dimensions (Cost Center, General Ledger Account, etc.) are absence from the payment details.  The finance team requests these dimensions be added to the “Sales” data warehouse.

The additional and most likely unplanned operating expense required to implement this use case will vary based on the underlying platform architecture and design patterns.

The cost is dependent on the changes required in the warehouse data model, the design pattern implemented within the existing Extract, Transformation, and Load (ETL) software, the extent of regression testing required to ensure financial integrity, and the cycle time required to implement these changes throughout the development, quality assurance, and production platforms.

In a Kimball design, the use case will generate several new Dimensions.  Each new Dimensional key will require inclusion into one or more Fact tables.  For every existing Fact table impacted by the change, a full table reload will be required.  It’s possible that the input data sources required to reload the Fact table(s) may no longer be available.  This will necessitate the development of conversion ETL software which utilizes the existing Fact table as an input source.  In which case, the missing input sources will default the key for each new Dimension.

The existing ETL software program responsible for loading the fact table(s) will require maintenance.  The exercise introduces risk to the existing code line.  In order to mitigate the risk, extensive testing, and balancing must be implemented.

In a Data Vault design, dimensions are implemented as Hubs and Satellites.  The Hub contains the business natural key, while the Satellite contains the remaining attribution.  Satellites are effective dated so that historical changes to dimensional data is readily available.

Relationships between Hubs are modeled in Link tables, and the attribution of the relationship is stored in a Link Satellite.

In the Data Vault, each new dimension is implemented as a Hub and a SatelliteLink and Link Satellite tables are created based on the relationships discovered by the introduction of the new entities.  The Link Satellite table contains the fact table attribution of the particular business relationship.  Fact table attribution becomes distributed in a data vault.  Attributes are stored based on the business relationship between the Hubs.

In the Data Vault design, the original portion of the “Sales” model remains in tact.  ETL software is developed to support the processing of the additional financial Hubs, Satellites, and Links to the existing portion of the “Sales” vault.  The existing ETL software which manages the transaction data into the vault remains unchanged.  The logical data model is extended to seamlessly blend the new database artifacts as appropriate.

Bare Cove’s best practice recommends the creation of a logical abstraction layer from every physical data model.  This provides a single point of performance tuning and a single access point to the data.  In both Kimball and Data Vault designs, I have deployed logical models.   Physical Data Vaults can be logically persisted as a Kimball design.

In part two of the series, I will tackle the second use case; the consumption of an additional data source.

Editors Note: I am practitioner of Data Vault modeling.  I am not certified in this subject matter.  The concepts presented here are introductory in nature and are not substitute for professional training.  I do not recommend the implementation of a Data Vault without the proper level of training, support, and/or certification.

Dan Linstedt is the inventor of the Data Vault.  Please visit his web site for further information, training, and certification.



Leave a Reply