Alan R. Simon - Data Lakes For Dummies

Take a dive into data lakes

“Data lakes” is the latest buzzword in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer to the question: “What exactly is a data lake, and do I need one for my business?” Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can’t) achieve, and what you need to do to create the lake that best suits your particular needs.

With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you’ve got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for interpreting what you’ve stored.

Understand and build data lake architecture

Store, clean, and synchronize new and existing data

Compare the best data lake vendors

Structure raw data and produce usable analytics

Whatever your business, data lakes are going to form ever more prominent parts of the information universe every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible, and make sure your business isn’t left standing on the shore.

The Hadoop open source environment, particularly the Hadoop Distributed File System (HDFS), is one of the first and most popular examples of big data. Some of the earliest data lakes were built, or at least begun, using HDFS as the foundation.

For purposes of establishing a data lake foundation, Amazon’s S3 and Microsoft’s ADLS both qualify as big data. Why? Both S3 and ADLS support the three Vs of big data, which are as follows:

Storing extremely large volumes of data

Supporting a variety of data, including structured, unstructured, and semi-structured data

Allowing incoming data to flow into the data lake at very high velocity, rather than requiring (or at least encouraging) periodic batches of data

Think of big data as a core technology foundation that supports the three Vs of next-generation data management. Big data by itself, however, is just a platform. It’s the natural body of water (the lake itself) at a popular lakeside resort. When you divide your big data into multiple zones, add capabilities to transmit data across those zones, and then govern the whole environment, you’ve built a data lake surrounding that big data foundation. You’ve done the analytical data equivalent of building the docks, the restaurants, and the boat slips surrounding the lake itself.
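To make the "zones plus transmission plus governance" idea a little more concrete, here is a minimal Python sketch. It is only a toy model under stated assumptions: the class name DataLake, the zone names "raw" and "refined", the method names, and the sample files are all hypothetical and do not come from the book or from any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy model: zones layered over a big data store, plus a governance log."""

    # Zone names are illustrative assumptions, not terms defined in the text.
    zones: dict[str, dict[str, object]] = field(
        default_factory=lambda: {"raw": {}, "refined": {}}
    )
    audit_log: list[str] = field(default_factory=list)

    def ingest(self, key: str, payload: object) -> None:
        """Land incoming data of any type in the raw zone and log it."""
        self.zones["raw"][key] = payload
        self.audit_log.append(f"ingest {key} -> raw")

    def promote(self, key: str, source: str, target: str) -> None:
        """Transmit one item across zones and record the movement."""
        self.zones[target][key] = self.zones[source][key]
        self.audit_log.append(f"promote {key}: {source} -> {target}")


lake = DataLake()
lake.ingest("orders.csv", "id,total\n1,9.99\n")        # structured
lake.ingest("click.json", {"user": 7, "page": "/"})    # semi-structured
lake.ingest("memo.txt", "Call the vendor on Monday")   # unstructured
lake.promote("orders.csv", source="raw", target="refined")
```

The plain dictionaries stand in for the big data foundation, the two zones and the promote step stand in for the "docks and boat slips" built around it, and the audit log is a very small stand-in for governance.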

The Data Lake Water Gets Murky

In addition to data lakes, you may come across references to data ponds, data puddles, data rivers, data oceans, and data hot tubs. (Just kidding about the last one.) What’s going on here?

Your job when planning, architecting, building, and using a data lake is complicated by the fact that you don’t have an official definition published by some sort of standards body, such as the American National Standards Institute (ANSI) or the International Organization for Standardization (ISO). That means that you or anyone else can define, use, and even publish your own terminology. You can call a smaller portion of a data lake a “data pond” if you want, or refer to a collection of data lakes as a “data ocean.”

Don’t panic! Of all the “data plus a body of water” terms you’ll run across, data lake is by far the most commonly used. All the characteristics of a data lake — solid architecture, support for multiple forms of data, a support ecosystem surrounding the data — apply to what you can call a data pond or any other term.

If William Shakespeare were still around and plied his trade as an enterprise data architect rather than as a writer, he would put it this way: “A data lake by any other name would still be worth the time and effort to build.”

BACK TO THE FUTURE WITH NAME CHANGES

In the early 1990s, data warehousing was the newest and most popular game in town for analytical data management. By the mid-’90s, the concept of a data warehouse was adapted to a data mart — essentially, a smaller-scale data warehouse. The original idea behind a data mart called for the data warehouse feeding a subset of its data into one or more data marts — sort of a “wholesaler-retailer” model.

The first generation of data warehouse projects, especially very large ones, was hallmarked by a high failure rate. By the late ’90s, data warehouses were viewed as large, complex, and expensive efforts that were also very risky. A data mart, on the other hand, was smaller, less complex, and less expensive, and, thus, considered to be less risky.

The need for integrated analytical data was stronger than ever by the end of the ’90s. But just try to get funding for a data warehousing project! Good luck!

Time for plan B.

Data warehouses went out of style for a while. Instead, data marts became the go-to solution for analytic data. No matter how big and complex an environment was, chances are, you’d refer to it as a data mart rather than a data warehouse. In fact, the idea of an independent data mart sprang up, and the original architecture for a data mart (receiving data from a data warehouse rather than directly from source systems) became known as a dependent data mart.

Fast-forward a couple of decades, and it’s back to the future. First, big data sort of evolved into data lakes. Now you have analysts, consultants, and vendors complicating the picture with their own terminology. This won’t be the last time you’ll see shifting names and terminology in the world of analytic data, so stay tuned!
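As a rough illustration of the dependent versus independent data mart distinction described in the sidebar above, the following Python sketch classifies a mart by what feeds it. The names DataMart, fed_by, classify, and the sample system names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMart:
    name: str
    fed_by: str  # hypothetical label for the upstream system


def classify(mart: DataMart, warehouses: set[str]) -> str:
    """Dependent marts are fed by a data warehouse; independent ones
    take their data directly from source systems."""
    return "dependent" if mart.fed_by in warehouses else "independent"


warehouses = {"enterprise_dw"}
sales = DataMart("sales_mart", fed_by="enterprise_dw")  # wholesaler-retailer model
hr = DataMart("hr_mart", fed_by="payroll_app")          # fed straight from a source

for mart in (sales, hr):
    print(mart.name, "is", classify(mart, warehouses))
```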

Chapter 2

Planning Your Day (and the Next Decade) at the Data Lake

IN THIS CHAPTER

Taking advantage of big data

Broadening your data type horizons

Implementing a built-to-last analytical data environment

Reeling in existing stand-alone data marts

Blockading new stand-alone data marts

Deciding what to do about your data warehouses

Aligning your data lake plans with your organization’s analytical needs

Setting your data velocity speed limits

Getting a handle on your analytical costs

Suppose that you and about 15 other family members or friends all head to your favorite lake for a weeklong summer vacation.

You love going to the lake because you jump into your sailboat every day and spend hours out on the water. Others in your group, though, have their own favorite pastimes. Some prefer a boat with a little more “oomph” and spend their days in speedboats, zooming up and down the length of the lake. Others prefer leisurely canoeing. Some are into waterskiing, so they take turns latching onto one of those speedboats and zipping along the water. Others in your group are into fishing, and that’s how they spend most of their time at the lake. Still others aren’t interested in going out on the water at all; they plop down on the beach to read, soak up some rays, and even grab a snooze every afternoon.

A data lake is very much like that weeklong trip to your favorite lake. Because a data lake is an enterprise-scale effort, spanning numerous organizations and departments, as well as many different business functions, you and your coworkers will likely seek a wide variety of benefits and outcomes from all that hard work.