Strengthening the analytics relationship between IT and the business
If a tree falls in a forest, but nobody is around to hear it fall, does it make a sound?
Or how about this one: If you build a system to support analytics across your organization and load it with tons of data, but nobody really uses it, does your organization actually have analytical data?
Don’t worry, you didn’t go back in time to a college philosophy class — you won’t be graded on your responses to either of these questions.
You can think of a data warehouse as a direct ancestor of a data lake. Data warehousing came onto the scene around 1990, and it has been the go-to approach for enterprise analytics in the decades since.
Far too many of today’s data warehouses are like that tree falling in a forest. The IT side of your company originally set out to build an enterprise-wide home for analytical data that would support reporting, business intelligence, data visualization, and other analytical needs from every corner of your organization.
Alas, that data warehouse, like so many others, came up short. Maybe the data warehouse doesn’t contain certain sets of data that are needed for critical analytics. Perhaps the data warehouse contents aren’t properly organized and structured and are difficult to access with the business intelligence tools available. Whatever the reason may be, your organization’s business users finally said, “To heck with it!” and built their own smaller-scale data marts to satisfy their own departmental or functional analytical needs.
Along the way, a sense of distrust built up between your IT organization and the business users who are supposed to be its customers, at least when it came to analytics and data. Not good!
The data lake presents your organization with an opportunity for a fresh start. You can apply many of the best practices, as well as the painful lessons, from 30-plus years of data warehousing to your data lake efforts and avoid repeating the mistakes and shortcomings of the past. As your data lake gets built, whether you’re on the IT side or the business side of your company, you can help rebuild that essential trust, especially when it comes to all-important analytics and the resulting data-driven insights.
Reducing Existing Stand-Alone Data Marts
You really can’t argue with the original concept of an enterprise data warehouse! Figure 2-1 illustrates the basic idea of a single home for most or all of the data needed to support a broad range of analytics across the entire enterprise.
Sounds like a great idea, right?
FIGURE 2-1: The vision of an enterprise data warehouse.
Dealing with the data fragmentation problem
A lofty vision is one thing; reality is often something else. Figure 2-2 illustrates how almost every organization’s idea of centralized, enterprise-scale data warehousing eventually surrendered to a landscape littered with numerous stand-alone, nonintegrated data marts.
Okay, so maybe the idea of “Do your own thing, and build your own data mart” got out of control. Now that you can see what a mess that approach created, why not just retire those data marts and fold them into your probably underutilized enterprise data warehouse?
A collection of independent data marts is almost always hampered by a lack of common master data (for example, a “customer” may mean something different to your sales team than it does to your marketing team), different software packages and technologies across the data marts, and other challenges. Taken together, these challenges make it almost impossible to consolidate separate, independent data marts back into a single data warehouse. Most organizations instead throw their hands up in the air and say that they’re following a federated data warehouse approach. You “create” a federated data warehouse by simply declaring that some or all of your data marts are part of a “federation” and, considered together, are sort of like a data warehouse. “Um … yeah, that’s our story, and we’re sticking to it. It’s magic!” (Not really … and not all that valuable from an enterprise-wide perspective.)
FIGURE 2-2: The reality of numerous stand-alone data marts.
Decision point: Retire, isolate, or incorporate?
What should you do about your proliferation of data marts now that your organization is building a data lake? The short answer: Get rid of the data marts … or at least most of them!
You have three main options for how to deal with your proliferation of independent data marts as part of your data lake initiative:
Retire some or all of the data marts, and replace them with data lake functionality.
Isolate some of the data marts, and leave them in place alongside your new data lake.
Incorporate some of your data marts as components of your data lake.
If your existing data marts are creaking and groaning and are now coming up short even for the analytical needs of their respective users, here’s a great idea: Get rid of them!
Figure 2-3 shows how your new data lake gives you the perfect opportunity not only to get your data mart proliferation under control but also to upgrade your overall analytics.
FIGURE 2-3: Using a data lake to retire data marts.
Chances are, most of your data marts, especially those that have been around for a while, support descriptive analytics (basic business intelligence functions such as drilling down from summarized data into lower levels of detail to gain additional insights). But what about more advanced needs, such as machine learning, data mining, and other artificial intelligence–enabled analytics? Probably not so much!
So, why keep those aging data marts around? Redirect the data feeds from your source systems into your new data lake, and rebuild your analytics for accounting, your human resources (HR) organization, sales and marketing, and other parts of your enterprise within the data lake environment.
What if one of your existing data marts is an absolute work of genius? Suppose that three or four years ago, your company built a data mart to support your annual strategic planning cycle. Your strategic planning data mart has data feeds from numerous applications and systems around your enterprise. Do you really want to reinvent the wheel just because you’re now building a data lake?
Great news: You don’t have to throw away your data mart baby along with the data lake water! (Okay, maybe not the best metaphor, but you get the idea.)