1 Cover
2 Title Page The Informed Company How to Build Modern Agile Data Stacks that Drive Winning Insights Dave FowlerMatt David
3 Copyright
4 Dedication
5 About This BookWhy Write This Book Who This Book Is For Who This Book Is Not For Who Wrote the Book Who Edited the Book Influences How This Book Was Written How to Read This Book
6 Foreword
7 Introduction Merging Business Context with Data Information The Four Stages of Agile Data Organization
8 STAGE 1: SOURCE aka Siloed Data Chapter One: Starting with Source Data Common Options for Analyzing Source Data Chapter Two: The Need to Replicate Source Data Chapter Three: Source Data Best Practices Keep a Complexity Wiki Page Snippet Dictionary Use a BI Product Double Check Results Keep Short Dashboards Design Before Building
9 STAGE 2: DATA LAKE aka Data Combined Chapter Four: Why Build a Data Lake? What Is a Data Lake? Reasons to Build a Data Lake Summarized Chapter Five: Choosing an Engine for the Data Lake Modern Columnar Warehouse Engines Modern Warehouse Engine Products Database Engines Recommendation Chapter Six: Extract and Load (EL) Data ETL versus ELT EL/ETL Vendors Extract Options Load Options Multiple Schemas Other Extract and Load Routes Chapter Seven: Data Lake Security Access in Central Place Permission Tiers Chapter Eight: Data Lake Maintenance Why SQL? Data Sources Performance Upgrade Snippets to Views
10 STAGE 3: DATA WAREHOUSE aka the Single Source of Truth Chapter Nine: The Power of Layers and Views Make Readable Views Layer Views on Views Start with a Single View Chapter Ten: Staging Schemas Orient to the Schemas Pick a Table and Clean It Other Staging Modeling Considerations Building on Top of Staging Schemas Chapter Eleven: Model Data with dbt Version Control Modularity and Reusability Package Management Organizing Files Macros Incremental Tables Testing Chapter Twelve: Deploy Modeling Code Branch Using Version Control Software Commit Message Test Locally Code Review Schedule Runs Chapter Thirteen: Implementing the Data Warehouse Manage Dependencies Combine Tables Within Schemas Combine Tables Across Schemas Keep the Grain Consistent Create Business Metrics Keeping Accurate History Chapter Fourteen: Managing Data Access How to Secure Sensitive Data in the Data Warehouse How to Secure Sensitive Data in a BI Tool Chapter Fifteen: Maintaining the Source of Truth Track New Metrics Deprecate Old Metrics Deprecate Old Schemas Resolve Conflicting Numbers Handling Ongoing Requests and Ongoing Feedback Updating Modeling Code Manage Access Tuning to Optimize Code Review All Modeling Maintenance Checklist
11 STAGE 4: DATA MARTS aka Data Democratized Chapter Sixteen: Data Mart Implementation Views on the Data Warehouse Segment Tables Access Update Chapter Seventeen: Data Mart Maintenance Educate Team Identifies Issues Identify New Needs Help Track Success Chapter Eighteen: Modern versus Traditional Data Stacks: What's Changed?What's Changed? Chapter Nineteen: Row‐ versus Column‐Oriented Database Row‐Oriented Databases Column‐Oriented Databases Summary Chapter Twenty: Style Guide Example Simplify Clean Naming Conventions Share It Chapter Twenty-One: Building an SST Example First Attempt—Same Tables with Prefixes Second Attempt—Operational Schema (Source Agnostic) Third Attempt—Application Separate, Other Sources Smashed Less Planning, More Implementing
12 Acknowledgments and Contributions Thank‐yous
13 Index
14 End User License Agreement
1 Chapter 5TABLE 5.1 Selection Factors
2 Chapter 10TABLE 10.1 FriendsTABLE 10.2TABLE 10.3TABLE 10.4
3 Chapter 13TABLE 13.1 OrdersTABLE 13.2TABLE 13.3TABLE 13.4TABLE 13.5
4 Chapter 19TABLE 19.1 Facebook_FriendsTABLE 19.2TABLE 19.3TABLE 19.4TABLE 19.5TABLE 19.6TABLE 19.7 Facebook_FriendsTABLE 19.8TABLE 19.9TABLE 19.10TABLE 19.11TABLE 19.12TABLE 19.13TABLE 19.13
5 Chapter 21TABLE 21.1 Duplicate Records by Email
1 About This Book Figure A.1 Data management is a collaborative process. Figure A.2
2 Introduction Figure I.1 Business context vs technical know how chart. Figure I.2 The four stages of agile data organization represent a process th...
3 Chapter 1 Figure 1.1 Various methods for data analysis for data sources. Figure 1.2 Example built in dashboard showing common metrics from Zendesk.... Figure 1.3 A basic export feature on a web services dashboard providing a CS... Figure 1.4 pgAdmin dashboard is a popular IDE for PostgreSQL. Figure 1.5 Geckoboard like dashboard displaying standard sales metrics. Figure 1.6 Mixpanel cohort analysis. Figure 1.7 Chartio Dashboard Executive Summary of Sales Metrics.
4 Chapter 2 Figure 2.1 A production system should be used with care.Figure 2.2 A cloned data source with read‐only access.Figure 2.3 It is dangerous to use a production system for non production pur...
5 Chapter 3Figure 3.1 Using an SQL file in an editor to manage dashboard metrics.Figure 3.2 A custom SQL building feature, "Visual SQL" from Chartio.Figure 3.3 A very long dashboard.Figure 3.4 A simple dashboard outline.Figure 3.5 The book cover for How to Design a Dashboard, by Matt David.
6 Chapter 4Figure 4.1 A data lake containing multiple data sources.Figure 4.2 Figure 4.3 A dashboard using data from various data sources.Figure 4.4 For dashboards that consist of large aggregations a transactional...
7 Chapter 5Figure 5.1 Figure 5.2 A transactional database can read and write rows quickly, and an ...Figure 5.3 Figure 5.4 Figure 5.5 Figure 5.6
8 Chapter 6Figure 6.1 Data can be transformed while being moved to a data warehouse.Figure 6.2 Data can first be loaded into a lake and transformed there. This ...Figure 6.3 Figure 6.4 Figure 6.5 Figure 6.6
9 Chapter 7Figure 7.1 Security moves from each data source to the data lake.Figure 7.2 A table of data sources and with a key graphic indicating which c...
10 Chapter 8Figure 8.1 Source data being loaded with SQL into a data lake.Figure 8.2 More sources being added to the data lake.Figure 8.3 Data sources having errors connecting to the data lake.Figure 8.4 A warning about the connection to the data lake.Figure 8.5 A database receiving a query, then fetching data from a cached st...Figure 8.6 The “Workload Management Configuration” section on an AWS Redshif...Figure 8.7 In Google Cloud's BigQuery, it's possible to set maximum bytes bi...Figure 8.8 In Chartio, a dashboard setting for controlling the frequency of ...
11 Chapter 9Figure 9.1 A view references a table’s data without changing how the table’s...Figure 9.2 A view is an SQL abstraction on top of underlying data.Figure 9.3 Views can reference other views allowing you to create layers wit...Figure 9.4 It may not be obvious at first how to model data, start with maki...Figure 9.5 A warehouse is a cleaned usable version of the data in the lake....
12 Chapter 10Figure 10.1 The four stages of agile data organization with an intermediary ...Figure 10.2 Figure 10.3 Figure 10.4 Figure 10.5 Figure 10.6 indicates all the types of cleaning we have done.Figure 10.7 Figure 10.8 Figure 10.9 A typical SQL query joining tables de‐normalized to a new struct...
13 Chapter 11Figure 11.1 Figure 11.2 Common modeling errors people make.Figure 11.3 An illustration of table entities connected to a local machine s...
14 Chapter 12Figure 12.1 Diagram showing the process lifecycle of model updates.Figure 12.2
15 Chapter 13Figure 13.1 Figure 13.2 Figure 13.3 Figure 13.4 A diagram of two tables joined to create a wider table.Figure 13.5 Figure 13.6 Figure 13.7 Two tables, combined with a UNION SQL statement.Figure 13.8 A screenshot from a dash showing a metric for new trials and dai...
16 Chapter 14Figure 14.1 The stages of agile data organization showing how security for m...Figure 14.2 Warehouse resources with security protocols applied.
Читать дальше