
Dave Fowler - The Informed Company



Title:
The Informed Company
Author:
Dave Fowler
Genre:
unrecognised / in English
Year:
unknown
ISBN:
not available
The Informed Company: Summary, Description, and Annotation


Learn how to manage a modern data stack and get the most out of data in your organization! Thanks to the emergence of new technologies and the explosion of data in recent years, we need new practices for managing and getting value out of data. In the modern, data-driven competitive landscape, the “best guess” approach (reading blog posts here and there and patching together data practices without any real visibility) is no longer going to hack it.
The Informed Company provides definitive direction on how best to leverage the modern data stack, including cloud computing, columnar storage, cloud ETL tools, and cloud BI tools. You'll learn how to work with Agile methods and set up the processes that are right for your company to use your data as a key weapon for your success . . . You'll discover best practices for every stage, from querying production databases at a small startup all the way to setting up data marts for different business lines of an enterprise.
In their work at Chartio, authors Fowler and David have learned that most businesspeople are almost completely self-taught when it comes to data. When they do use resources, those resources are outdated, so they miss out on the latest cloud technologies and advances in data analytics. This book will firm up your understanding of data and bring you into the present with knowledge of what works and what doesn't.
Discover the data stack strategies that are working for today's successful small, medium, and enterprise companies.

Learn the different Agile stages of data organization, and the right one for your team.

Learn how to maintain Data Lakes and Data Warehouses for effective, accessible data storage.

Gain the knowledge you need to architect Data Warehouses and Data Marts.

Understand your business's level of data sophistication and the steps you can take to “level up” your data.
The Informed Company is the definitive data book for anyone who wants to work faster and more nimbly, armed with actionable decision-making data.

The Informed Company: Excerpt


This stage is right if:

Only a few people are going to be working with this dataset.

Data needs are minimal at the moment.

Only a few small data sources exist.

The only people who need to make new visuals are fairly technical.

It's time for the next stage if:

Data is accessed from multiple places/applications.

There's a need for unique or combined charts/dashboards for cloud application sources like Salesforce.

A growing number of people need access to data.

There are performance issues.

Data is getting too big for a transactional database to operate efficiently.

Nontechnical users need to create charts without help.

Stage 2. Lake

Once a company needs to run analyses on multiple data sources, each of which needs joining, filtering, and manipulation, it must move to a data lake. Blending data sources lets several actors in an organization query a large subset of the company's complete data. In turn, funneling various sources into a data lake supports database performance at a reasonably large (not necessarily “big data”) scale.

A central motivation for a data lake is the need to pipe data to business intelligence tools. For example, when working with data from Salesforce, Hubspot, Jira, and Zendesk, each service has its own in‐app dashboards and its own data application programming interfaces (APIs). Configuring input data streams for each business tool is a confusing, time‐consuming, and unsustainable workflow that becomes practically unworkable at scale. Likewise, performing in‐house analyses across various sources can wildly complicate otherwise simple queries. By contrast, a data lake holds all relevant data in one place, so analysts can use straightforward SQL queries to obtain business insights.
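To make that contrast concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It assumes two hypothetical source tables, `crm_accounts` and `support_tickets`, standing in for exports from tools like Salesforce and Zendesk, and it uses an in-memory SQLite database as a stand-in for the lake; a real lake would live in cloud storage or a cloud database, fed by an ETL tool. Once both sources sit in one place, a question that spans them is a single join.

```python
import sqlite3

# Hypothetical extracts standing in for two SaaS sources (CRM accounts and
# support tickets). In practice an ETL tool would pull these through each
# vendor's API; here they are hard-coded rows.
crm_accounts = [(1, "Acme Corp"), (2, "Globex")]
support_tickets = [
    (101, 1, "open"),
    (102, 1, "open"),
    (103, 2, "closed"),
    (104, 2, "open"),
]

# An in-memory SQLite database plays the role of the data lake: one place
# where every source lands as its own table.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE crm_accounts (account_id INTEGER, name TEXT)")
lake.execute(
    "CREATE TABLE support_tickets (ticket_id INTEGER, account_id INTEGER, status TEXT)"
)
lake.executemany("INSERT INTO crm_accounts VALUES (?, ?)", crm_accounts)
lake.executemany("INSERT INTO support_tickets VALUES (?, ?, ?)", support_tickets)

# A cross-tool question becomes one ordinary SQL join.
open_tickets_by_account = lake.execute(
    """
    SELECT a.name, COUNT(*) AS open_tickets
    FROM support_tickets AS t
    JOIN crm_accounts AS a ON a.account_id = t.account_id
    WHERE t.status = 'open'
    GROUP BY a.name
    ORDER BY a.name
    """
).fetchall()

print(open_tickets_by_account)  # [('Acme Corp', 2), ('Globex', 1)]
```

Without the lake, the same answer would require calling two vendor APIs and matching records in application code.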

The central challenge in the lake stage is knowing which toolset and methodology will unify and (safely) store a company's data. Companies combining their data also run into performance issues, for which we offer solutions. And perhaps most important of all, the architecture chosen during lake development determines how easy (or hard) it will be for a company to build its future data warehouse.

This stage is right if:

There's a need for unique or combined charts/dashboards for cloud application sources like Salesforce.

A core set of people can learn the ins and outs of the structure of the messy data.

You're intimidated by data modeling. (Don't be: that's why this book exists.)

There's no time for even light data modeling.

Large datasets need performant queries.

It's time for the next stage if:

More than a few people are going to be working with this dataset.

A clean source of truth would eliminate integrity issues.

There's a need to separate the structure of the data from the always‐changing transactional sources.

There's a need to adopt DRY (Don't Repeat Yourself) principles.

Modeling

Data requires transformation so that it is more usable by different people or systems. Modeling refers to the process of making these transformations to the data.
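As a small illustration of what such a transformation can involve, the Python sketch below takes hypothetical raw rows (cryptic field names such as `uid` and `pln`, text-typed IDs, inconsistent casing, and test accounts mixed in) and produces clean `User` records. The field names and cleaning rules are assumptions chosen for illustration; in a real stack this logic would more often be written in SQL inside the warehouse or a modeling tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Modeled, analysis-ready user record (hypothetical schema)."""
    user_id: int
    email: str
    plan: str


# Hypothetical raw rows as they might land in a lake: cryptic column names,
# string-typed IDs, inconsistent casing and whitespace, and test accounts.
raw_rows = [
    {"uid": "17", "EMAIL": " Ada@Example.com ", "pln": "PRO", "is_test": "0"},
    {"uid": "18", "EMAIL": "qa@internal.test", "pln": "free", "is_test": "1"},
    {"uid": "19", "EMAIL": "grace@example.com", "pln": "Free", "is_test": "0"},
]


def model_users(rows):
    """Rename, cast, normalize, and filter raw rows into User records."""
    return [
        User(
            user_id=int(row["uid"]),             # cast ID from text to int
            email=row["EMAIL"].strip().lower(),  # trim and lowercase email
            plan=row["pln"].lower(),             # consistent plan labels
        )
        for row in rows
        if row["is_test"] != "1"                 # drop internal test accounts
    ]


users = model_users(raw_rows)
# Two records remain (user_id 17 and 19), with lowercase emails and plans.
print(users)
```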

DRY

An acronym for “Don't Repeat Yourself,” a software design ethos that avoids repetitive patterns through modular design.

Stage 3. Warehouse (Single Source of Truth)

As more people begin to work with the data lake, questions multiply: What data is where? Why? What criteria should queries use when looking for data insights? What do these schemata mean? Unavoidable complexity makes data harder to obtain, especially for less‐technical colleagues. Even among in‐house experts, more schemata and entities (i.e., tables and views) cause more communication headaches. In time, the data lake serves all the data but makes it harder to obtain the right data. Writing queries and sharing knowledge within the organization both become harder.

All of these problems can be addressed with a clean and simplified version of the data, something we refer to as “a single source of truth.”

This stage, creating a data warehouse, has historically been quite a nightmare, and many books have been written on how best to model data for analytical processing. But these days there are more straightforward paradigms that have been tried and tested: ones that not only reduce the need to document the oddities found across an organization's schemata but also save the time spent repeating, editing, and maintaining messy “boilerplate” query steps (e.g., “every time you query the orders table, make sure to adjust all orders from England to be in local time”).
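The rule of adjusting England orders to local time is a good example of boilerplate that a warehouse model can encode once. The Python sketch below assumes raw order timestamps are stored in UTC and uses SQLite (standard-library `sqlite3`) as a stand-in for the warehouse; the table, view, and function names (`raw_orders`, `orders`, `utc_to_london`) are hypothetical. The conversion implements the current UK daylight-saving rule by hand so the example needs no time-zone database; production code would more likely rely on the warehouse's own time-zone functions.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def last_sunday(year: int, month: int) -> datetime:
    """Midnight starting the last Sunday of `month` (month must be 1-11)."""
    day = datetime(year, month + 1, 1) - timedelta(days=1)  # last day of month
    while day.weekday() != 6:  # Monday == 0, Sunday == 6
        day -= timedelta(days=1)
    return day


def utc_to_london(ts: str) -> str:
    """Convert a UTC timestamp string to UK local time (GMT or BST).

    Follows the current UK rule: BST (UTC+1) runs from 01:00 UTC on the last
    Sunday of March to 01:00 UTC on the last Sunday of October.
    """
    utc = datetime.strptime(ts, FMT)
    bst_start = last_sunday(utc.year, 3) + timedelta(hours=1)
    bst_end = last_sunday(utc.year, 10) + timedelta(hours=1)
    offset = timedelta(hours=1) if bst_start <= utc < bst_end else timedelta(0)
    return (utc + offset).strftime(FMT)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL on this connection can call it.
conn.create_function("utc_to_london", 1, utc_to_london)
conn.executescript(
    """
    CREATE TABLE raw_orders (id INTEGER, country TEXT, created_at_utc TEXT);
    INSERT INTO raw_orders VALUES
        (1, 'England', '2023-07-01 12:00:00'),
        (2, 'England', '2023-01-15 12:00:00'),
        (3, 'France',  '2023-07-01 12:00:00');

    -- The England rule is written once, here; analysts query `orders`
    -- and never repeat it. Other countries pass through unchanged.
    CREATE VIEW orders AS
    SELECT id,
           country,
           CASE WHEN country = 'England'
                THEN utc_to_london(created_at_utc)
                ELSE created_at_utc
           END AS created_at
    FROM raw_orders;
    """
)

rows = conn.execute("SELECT id, created_at FROM orders ORDER BY id").fetchall()
print(rows)
# [(1, '2023-07-01 13:00:00'), (2, '2023-01-15 12:00:00'), (3, '2023-07-01 12:00:00')]
```

Because every downstream query reads from the `orders` view, changing the rule later means editing one definition rather than hunting down every query that copied it.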

In the data warehouse section of the book, we review how to clean data lakes and investigate standard practices for managing data complexity. In addition, we offer ways to establish an architecture with data integrity in mind. We provide modeling tool suggestions and an example SQL style guide. Finally, we give our recommendations for team structure, such as appointing a lead to oversee this process and ongoing warehouse maintenance.


This stage is right if:

More than a few people are going to be working with data.

A clean source of truth would eliminate integrity issues.

There's a need to impose a consistent structure on top of the data lake.

There's a need to adopt DRY principles.

It's time for the next stage if:

The democratization of data would help others explore and understand data without help.

It's time to teach and enable business users to be more effective.

Projects exist that require formats different from those that currently exist in the data lake.

Having truly informed employees is vital to your company's competitive success.

Stage 4. Marts

Good news: your data is clean, and the BI product speaks directly to tables in the warehouse. Using a tool like Tableau or Looker, non‐analysts within the organization can self‐serve answers to their questions. By that, we mean they are empowered to engage with the data directly rather than needing an analyst to build or run queries for them. This is excellent news: as more people use the data, they become better informed, and everyone across the company can use data to their advantage.

But given enough time, hundreds of tables accumulate in a warehouse. Users become overwhelmed when trying to find relevant data. It's also possible that, depending on the team, department, or use case, different people want to use the same data structured in different ways. So while the meanings of individual fields are unified, the abstractions used by different departments have diverged.

To sort through these challenges, we progress to the data mart stage. Data marts are smaller, more specific sources of truth for a team or topic of investigation. For example, the sales team may only need 12 or so tables from the central warehouse, while the marketing team may need 20 tables, some of them the same as the sales team's and some different.
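One lightweight way to picture this is as per-team sets of views over shared warehouse tables. The Python sketch below assumes hypothetical table names, a hypothetical `<mart>__<table>` naming convention, and uses SQLite (standard-library `sqlite3`) as a stand-in for the warehouse, scaled down to three tables per mart; real deployments often use separate schemas or a BI tool's own modeling layer instead. It shows the sales and marketing marts overlapping on one table while each exposes a different subset.

```python
import sqlite3

# Hypothetical warehouse tables (columns trimmed to an id for brevity).
WAREHOUSE_TABLES = ["accounts", "opportunities", "orders", "campaigns", "web_sessions"]

# Hypothetical mart definitions: each team sees only the tables it needs,
# and some tables (here, "accounts") appear in more than one mart.
MARTS = {
    "sales": ["accounts", "opportunities", "orders"],
    "marketing": ["accounts", "campaigns", "web_sessions"],
}

conn = sqlite3.connect(":memory:")
# Names come from the trusted constants above, so f-string SQL is acceptable here.
for table in WAREHOUSE_TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")

# Each mart is a thin layer of views over the warehouse, so the warehouse
# remains the single source of truth underneath every mart.
for mart, tables in MARTS.items():
    for table in tables:
        conn.execute(f"CREATE VIEW {mart}__{table} AS SELECT * FROM {table}")


def mart_views(conn, mart):
    """Return the view names a given team would browse, sorted by name."""
    names = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
    ).fetchall()
    return [name for (name,) in names if name.startswith(f"{mart}__")]


print(mart_views(conn, "sales"))
# ['sales__accounts', 'sales__opportunities', 'sales__orders']
print(sorted(set(MARTS["sales"]) & set(MARTS["marketing"])))  # ['accounts']
```

Using views rather than copied tables means a fix made once in the warehouse reaches every mart automatically.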

Just as a warehouse lead manages the data warehouse, data marts benefit from being facilitated by mart leads. Each mart lead helps educate colleagues and communicate subject matter expertise within the domain of their mart while supporting everyday maintenance tasks. Further simplifying data into local marts not only improves usability but also makes data integrity easier to maintain. After all, responsibility for maintenance is distributed among mart leads rather than resting on a single person. An organization that leverages data marts effectively is an example of intra‐company data literacy in action.
