Alan R. Simon - Data Lakes For Dummies


Data Lakes For Dummies: summary, description, and annotation


Take a dive into data lakes

“Data lakes” is the latest buzzword in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer to the question: “What exactly is a data lake and do I need one for my business?” Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can’t) achieve, and what you need to do to create the lake that best suits your particular needs.

With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you’ve got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for interpreting what you’ve stored.

Understand and build data lake architecture

Store, clean, and synchronize new and existing data

Compare the best data lake vendors

Structure raw data and produce usable analytics

Whatever your business, data lakes are going to form ever more prominent parts of the information universe every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible, and make sure your business isn’t left standing on the shore.

Data Lakes For Dummies: online excerpt

Establishing a Migration Path for Your Data Warehouses

Data warehousing has been on the scene since around 1990, which means that thousands of enterprise-wide data warehouses have been built and deployed over the years. In fact, looking back at the B-52 analogy earlier in this chapter, you can think of a data warehouse as the equivalent of a propeller-driven airplane that preceded the jet aircraft era, which, of course, makes the data lake the equivalent of that technology-leaping jet.

Some ultramodern, large-scale enterprise data warehouses have been built in the past several years, using relatively new technologies such as the SAP HANA in-memory database management system. Many others, however, were built on older relational databases and are still chugging along. They still work okay, for the most part. But in this new era of data lakes, it’s time to decide what to do about the old-timers.

Sending a faithful data warehouse off to a well-deserved retirement

If your data warehouse is really showing its age, your best bet is to hold a nice retirement party in the company cafeteria with cake and ice cream for everyone and with a few speeches about how wonderfully the data warehouse has served the company’s enterprise-wide reporting and business intelligence mission over the years. (Okay, you can probably skip the cake and ice cream, as well as the cafeteria party itself.)

Then you can do the same thing for your data warehouse that you do for any of your creaky, brittle data marts. Build a new set of data feeds from your source applications and systems into the data lake. Then within your data lake, rebuild the data models that your data warehouse used to support business intelligence and reporting alongside machine learning and other advanced analytics (see Figure 2-6).

FIGURE 2-6: Migrating your data warehouse into your new data lake.

Your old data warehouse contents were likely stored in a dimensional model such as a star schema or a snowflake schema. Inside a data lake, the equivalent models might also be dimensional. Alternatively, you could be using a columnar database such as Amazon Redshift. You can still use a visualization tool such as Tableau or a classic business intelligence tool such as MicroStrategy, but your database design will differ from your old data warehouse.
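To make the idea of a dimensional model a little more concrete, here is a minimal sketch of a star schema: one fact table surrounded by two dimension tables. It uses Python's built-in sqlite3 module purely for illustration. The table names, column names, and sample rows are all hypothetical and are not taken from the book; a real data lake would use its own storage engine and a far richer model.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_department (
            department_key  INTEGER PRIMARY KEY,
            department_name TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER NOT NULL
        );
        -- The fact table holds the measure (rating) plus keys into each dimension.
        CREATE TABLE fact_evaluation (
            employee_id    INTEGER NOT NULL,
            department_key INTEGER NOT NULL REFERENCES dim_department(department_key),
            date_key       INTEGER NOT NULL REFERENCES dim_date(date_key),
            rating         REAL NOT NULL
        );
        """
    )


def load_sample_rows(conn):
    """Insert a handful of made-up rows so the query below has data."""
    conn.executemany(
        "INSERT INTO dim_department VALUES (?, ?)",
        [(1, "Engineering"), (2, "Sales")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?)",
        [(20230101, 2023), (20240101, 2024)],
    )
    conn.executemany(
        "INSERT INTO fact_evaluation VALUES (?, ?, ?, ?)",
        [
            (101, 1, 20230101, 4.0),
            (102, 1, 20230101, 3.0),
            (103, 2, 20230101, 5.0),
            (101, 1, 20240101, 5.0),
        ],
    )


def average_rating_by_department_and_year(conn):
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute(
        """
        SELECT d.department_name, t.year, AVG(f.rating)
        FROM fact_evaluation AS f
        JOIN dim_department AS d ON d.department_key = f.department_key
        JOIN dim_date AS t ON t.date_key = f.date_key
        GROUP BY d.department_name, t.year
        ORDER BY d.department_name, t.year
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    for row in average_rating_by_department_and_year(connection):
        print(row)
```

The point of the sketch is the shape of the query: a reporting tool joins the central fact table to whichever dimensions it wants to slice by, then aggregates the measure.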

Resettling a data warehouse into your data lake environment

Suppose you and your team actually did a fantastic job architecting and building your data warehouse. You did your work and deployed the data warehouse only a few years ago, using fairly modern technology. To put it simply, your data warehouse just isn’t ready for retirement. But you still want to build a data lake to take advantage of modern big data technology. What should you do in this case?

Just as with a solidly built data mart, you can sort of “forklift” a well-architected data warehouse into your data lake environment. You’ll still have to do some rewiring of data feeds, and you’ll be adding complexity to your overall analytical data architecture. But there’s no sense in exiling a solidly built data warehouse into oblivion if it can still deliver value for you for a while to come.

Aligning Data with Decision Making

You don’t set out to build a data lake just to stuff tons of data into a modern big data environment. You build a data lake to support analytics throughout your enterprise. And the reason for your organization’s analytics is to deliver data-driven insights, with the emphasis on the term data-driven.

For better or for worse, the term analytics means different things to different people. As you set out to build your data lake, you need to understand what analytics means to your organization.

Deciding what your organization wants out of analytics

You should think of analytics as a continuum of questions that you ask about some particular function or business process within your organization, with the answers coming from your data:

What happened?

Why did it happen?

What’s happening right now?

What’s likely to happen?

What’s something interesting and important out of this mountain of data?

What are our options?

What should we do?

Your data lake needs to support the entire analytics continuum in all corners of your organization.

Suppose that Jan, your company’s CPO, is incredibly pleased with the work that Raul’s team did to have your data lake support machine learning models for the evaluation cycle. So, she asks Raul to expand the HR organization’s usage of analytics that are enabled by the data lake. Raul sits down with his analysts, Julia and Dhiraj, to create a master list of analytical questions that should be considered for implementation.

Raul’s team has the easiest time with “What happened?” types of questions, because these are what your company’s data warehouse and data marts have been producing for years. Now, though, your data marts and data warehouse will either be retired or incorporated into the data lake environment, so your data lake can take over this mission and serve up the data to answer questions along the lines of:

Which employees have consistently been rated in the top quintile in each department during the past three years?

Which employees have received the largest percentage salary increases during all evaluation periods during the past five years?

How many new employees were hired in each of the past three years?

How many employees left during each of the past years? How many of those resigned? How many were involuntarily terminated? How many retired?

Because your company’s executives are somewhat on the formal side, your list of “What happened?” questions will be categorized under the label descriptive analytics. In other words, your data lake will be producing analytics that describe something that happened in the past (which might be the very recent past, several years ago, or perhaps even farther back). But just like your existing data warehouse and data marts mostly do, your data lake will now be producing descriptive analytics.
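As a rough illustration of descriptive analytics, the sketch below answers two of the “What happened?” questions from the list above: how many employees were hired each year, and how many left each year, broken down by reason. The employee records, field names, and exit-reason labels are invented for the example; in practice the data would come from the HR data stored in your data lake.

```python
from collections import Counter
from datetime import date

# Hypothetical employee records; "left" is None for current employees.
EMPLOYEES = [
    {"id": 1, "hired": date(2022, 3, 1), "left": None, "exit_reason": None},
    {"id": 2, "hired": date(2022, 9, 15), "left": date(2024, 1, 31), "exit_reason": "resigned"},
    {"id": 3, "hired": date(2023, 6, 1), "left": date(2024, 5, 10), "exit_reason": "retired"},
    {"id": 4, "hired": date(2024, 2, 1), "left": None, "exit_reason": None},
    {"id": 5, "hired": date(2023, 1, 9), "left": date(2023, 11, 30), "exit_reason": "terminated"},
]


def hires_per_year(employees):
    """Count new hires by calendar year of the hire date."""
    counts = Counter(e["hired"].year for e in employees)
    return dict(sorted(counts.items()))


def departures_per_year(employees):
    """Count departures by (year, exit reason), skipping current employees."""
    counts = Counter(
        (e["left"].year, e["exit_reason"])
        for e in employees
        if e["left"] is not None
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(hires_per_year(EMPLOYEES))
    print(departures_per_year(EMPLOYEES))
```

Both functions only summarize events that have already occurred, which is what makes them descriptive rather than diagnostic.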

You also need the data lake to help you dig into the reasons something happened. For example, your descriptive analytics tell you that the number of employees who voluntarily resigned from the company last year was 25 percent above the yearly average for the previous five years. Inquiring minds want to know why!
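The “25 percent above the yearly average” comparison is simple arithmetic: take the mean of the previous five years as a baseline, then express last year's count as a percentage difference from it. Here is a small sketch; the resignation counts are made-up numbers chosen so the result comes out to 25 percent, and the function name is hypothetical.

```python
from statistics import mean


def percent_above_average(current, history):
    """Return how far `current` sits above the mean of `history`, in percent.

    A negative result means `current` is below the historical average.
    """
    baseline = mean(history)
    if baseline == 0:
        raise ValueError("historical average is zero; percentage is undefined")
    return (current - baseline) / baseline * 100


if __name__ == "__main__":
    # Hypothetical voluntary resignations for the previous five years.
    previous_five_years = [38, 42, 40, 41, 39]  # mean is 40
    last_year = 50
    print(percent_above_average(last_year, previous_five_years))  # 25.0
```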

Diagnostic analytics help you dig into the “why” factor for what your descriptive analytics tell you, and — congratulations! — your data lake will take on another assignment. In this case, you can be sure that Jan, your CPO, will be digging for answers now that she’s clued in to the increase in employee turnover.

Raul is well aware that, although insight into past results is an important part of your company’s analytics continuum, Jan and the other executives — as well as many others at all levels of your organization — also need deep insights into what’s happening right now. Before working in HR, Raul used to be in the supply chain organization. His specialty there was providing up-to-the-minute, near-real-time reports and visualizations for logistics and transportation throughout the entire supply chain.
