Have you ever tried to fill every corner of a box with a single ball? The ball might fit, but there will always be gaps.
Have you ever compared the characteristics of an apple to those of an orange? However many times you repeat the comparison, the only conclusion you can draw is that they are different. You can’t glean any additional information from the apple-to-orange comparison.
Now think of a data warehouse design. The data warehouse will leave gaps, make comparative analysis difficult, and it won’t lend itself to self-service business intelligence if it’s built without:
- a properly formatted physical structure
- data that’s been subjected to a rigorous filtering and transformation process
- a data warehousing schema that’s easy for an end user to use and understand
Bin Jiang, a distinguished professor at a large university in China, suggests that the infrastructure of the data warehouse is an extremely important component. The infrastructure includes the system hardware and software that make up the data warehouse.
Jiang is correct, but I have been on multiple data warehousing projects where infrastructure components (CPUs, memory, storage, etc.) were decided by teams other than the data warehousing team, without consultation or coordination. Because of this lack of coordination, rework and modifications to the infrastructure were required. This problem can be avoided when teams work collaboratively and follow proven data warehousing standards.
Jiang lists two other noteworthy points regarding the data warehouse infrastructure – it should be unique and it should provide functionalities suited for data analysis with little or no data manipulation required by the end user. These functionalities include:
- Data integration – All types of data including data that is structurally and semantically different should be integrated.
- Data collection – The data warehouse should be the only infrastructure within the organization that keeps the collected snapshots available online for the whole organization for as long as the business requires.
- Data preparation – Appropriate filtering and transformation should be in place to make the data useful.
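
To make the three functionalities above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Jiang's method or any product's API: the two source extracts (`crm_rows`, `erp_rows`), their field names, and the helper functions `integrate`, `collect`, and `prepare` are all hypothetical.

```python
from datetime import date

# Hypothetical source extracts with different structures and naming (assumed data).
crm_rows = [
    {"cust_id": 1, "full_name": " Ada Lovelace ", "revenue_usd": "120.50"},
    {"cust_id": 2, "full_name": "Alan Turing", "revenue_usd": None},
]
erp_rows = [
    {"customer": 3, "name": "grace hopper", "revenue_cents": 9900},
]


def integrate(crm, erp):
    """Data integration: map structurally different sources onto one schema."""
    unified = []
    for row in crm:
        unified.append({"customer_id": row["cust_id"], "name": row["full_name"],
                        "revenue": row["revenue_usd"], "source": "crm"})
    for row in erp:
        # The ERP extract stores cents, so convert to the common unit (dollars).
        unified.append({"customer_id": row["customer"], "name": row["name"],
                        "revenue": row["revenue_cents"] / 100, "source": "erp"})
    return unified


def prepare(rows):
    """Data preparation: filter out unusable rows, then normalize the rest."""
    prepared = []
    for row in rows:
        if row["revenue"] is None:
            continue  # drop rows that cannot be analyzed
        prepared.append({**row,
                         "name": row["name"].strip().title(),
                         "revenue": round(float(row["revenue"]), 2)})
    return prepared


def collect(snapshots, rows, snapshot_date):
    """Data collection: keep every dated snapshot instead of overwriting."""
    snapshots[snapshot_date.isoformat()] = rows
    return snapshots


snapshots = collect({}, prepare(integrate(crm_rows, erp_rows)), date(2024, 1, 31))
for row in snapshots["2024-01-31"]:
    print(row["customer_id"], row["name"], row["revenue"], row["source"])
```

Because the cleanup happens before the data reaches the end user, the rows in each snapshot already share one schema and one unit, which is the "little or no data manipulation" goal described above.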
Source: Dennis Earl Hardy - Tibco Spotfire