1 Cover
2 Acknowledgments

I have been fortunate to work again with professionals from Waterside Productions, Wiley, and Google to create this Study Guide. Carole Jelen, vice president of Waterside Productions, and Jim Minatel, associate publisher at John Wiley & Sons, continue to lead the effort to create Google Cloud certification guides. It was a pleasure to work with Gary Schwartz, project editor, who managed the process that got us from outline to a finished manuscript. Thanks to Christine O’Connor, senior production editor, for making the last stages of book development go as smoothly as they did.

I was also fortunate to work with Valerie Parham-Thompson again. Valerie’s technical review improved the clarity and accuracy of this book tremendously.

Thank you to the Google Cloud subject-matter experts who reviewed and contributed to the material in this book:

    Damon A. Runion: Technical Curriculum Developer, Data Engineering
    Julianne Cuneo: Data Analytics Specialist, Google Cloud
    Geoff McGill: Customer Engineer, Data Analytics
    Susan Pierce: Solutions Manager, Smart Analytics and AI
    Rachel Levy: Cloud Data Specialist Lead
    Dustin Williams: Data Analytics Specialist, Google Cloud
    Gbenga Awodokun: Customer Engineer, Data and Marketing Analytics
    Dilraj Kaur: Big Data Specialist
    Rebecca Ballough: Data Analytics Manager, Google Cloud
    Robert Saxby: Staff Solutions Architect
    Niel Markwick: Cloud Solutions Architect
    Sharon Dashet: Big Data Product Specialist
    Barry Searle: Solution Specialist - Cloud Data Management
    Jignesh Mehta: Customer Engineer, Cloud Data Platform and Advanced Analytics

My sons James and Nicholas were my first readers, and they helped me to get the manuscript across the finish line.

This book is dedicated to Katherine, my wife and partner in so many adventures.
3 About the Author

Dan Sullivan is a principal engineer and software architect. He specializes in data science, machine learning, and cloud computing. Dan is the author of the Official Google Cloud Certified Professional Architect Study Guide (Sybex, 2019), Official Google Cloud Certified Associate Cloud Engineer Study Guide (Sybex, 2019), NoSQL for Mere Mortals (Addison-Wesley Professional, 2015), and several LinkedIn Learning courses on databases, data science, and machine learning. Dan has certifications from Google and AWS, along with a Ph.D. in genetics and computational biology from Virginia Tech.
4 About the Technical Editor

Valerie Parham-Thompson has experience with a variety of open source data storage technologies, including MySQL, MongoDB, and Cassandra, as well as a foundation in web development in software-as-a-service (SaaS) environments. Her work in both development and operations in startups and traditional enterprises has led to solid expertise in web-scale data storage and data delivery. Valerie has spoken at technical conferences on topics such as database security, performance tuning, and container management. She also often speaks at local meetups and volunteer events. Valerie holds a bachelor’s degree from the Kenan-Flagler Business School at UNC-Chapel Hill, has certifications in MySQL and MongoDB, and is a Google Certified Professional Cloud Architect. She currently works in the Open Source Database Cluster at Pythian, headquartered in Ottawa, Ontario. Follow Valerie’s contributions to technical blogs on Twitter at dataindataout.
5 Introduction
6 Assessment Test
7 Answers to Assessment Test
8 Chapter 1: Selecting Appropriate Storage Technologies
    From Business Requirements to Storage Systems
    Technical Aspects of Data: Volume, Velocity, Variation, Access, and Security
    Types of Structure: Structured, Semi-Structured, and Unstructured
    Schema Design Considerations
    Exam Essentials
    Review Questions
9 Chapter 2: Building and Operationalizing Storage Systems
    Cloud SQL
    Cloud Spanner
    Cloud Bigtable
    Cloud Firestore
    BigQuery
    Cloud Memorystore
    Cloud Storage
    Unmanaged Databases
    Exam Essentials
    Review Questions
10 Chapter 3: Designing Data Pipelines
    Overview of Data Pipelines
    GCP Pipeline Components
    Migrating Hadoop and Spark to GCP
    Exam Essentials
    Review Questions
11 Chapter 4: Designing a Data Processing Solution
    Designing Infrastructure
    Designing for Distributed Processing
    Migrating a Data Warehouse
    Exam Essentials
    Review Questions
12 Chapter 5: Building and Operationalizing Processing Infrastructure
    Provisioning and Adjusting Processing Resources
    Monitoring Processing Resources
    Exam Essentials
    Review Questions
13 Chapter 6: Designing for Security and Compliance
    Identity and Access Management with Cloud IAM
    Using IAM with Storage and Processing Services
    Data Security
    Ensuring Privacy with the Data Loss Prevention API
    Legal Compliance
    Exam Essentials
    Review Questions
14 Chapter 7: Designing Databases for Reliability, Scalability, and Availability
    Designing Cloud Bigtable Databases for Scalability and Reliability
    Designing Cloud Spanner Databases for Scalability and Reliability
    Designing BigQuery Databases for Data Warehousing
    Exam Essentials
    Review Questions
15 Chapter 8: Understanding Data Operations for Flexibility and Portability
    Cataloging and Discovery with Data Catalog
    Data Preprocessing with Dataprep
    Visualizing with Data Studio
    Exploring Data with Cloud Datalab
    Orchestrating Workflows with Cloud Composer
    Exam Essentials
    Review Questions
16 Chapter 9: Deploying Machine Learning Pipelines
    Structure of ML Pipelines
    GCP Options for Deploying Machine Learning Pipeline
    Exam Essentials
    Review Questions
17 Chapter 10: Choosing Training and Serving Infrastructure
    Hardware Accelerators
    Distributed and Single Machine Infrastructure
    Edge Computing with GCP
    Exam Essentials
    Review Questions
18 Chapter 11: Measuring, Monitoring, and Troubleshooting Machine Learning Models
    Three Types of Machine Learning Algorithms
    Deep Learning
    Engineering Machine Learning Models
    Common Sources of Error in Machine Learning Models
    Exam Essentials
    Review Questions
19 Chapter 12: Leveraging Prebuilt Models as a Service
    Sight
    Conversation
    Language
    Structured Data
    Exam Essentials
    Review Questions
20 Appendix: Answers to Review Questions
    Chapter 1: Selecting Appropriate Storage Technologies
    Chapter 2: Building and Operationalizing Storage Systems
    Chapter 3: Designing Data Pipelines
    Chapter 4: Designing a Data Processing Solution
    Chapter 5: Building and Operationalizing Processing Infrastructure
    Chapter 6: Designing for Security and Compliance
    Chapter 7: Designing Databases for Reliability, Scalability, and Availability
    Chapter 8: Understanding Data Operations for Flexibility and Portability
    Chapter 9: Deploying Machine Learning Pipelines
    Chapter 10: Choosing Training and Serving Infrastructure
    Chapter 11: Measuring, Monitoring, and Troubleshooting Machine Learning Models
    Chapter 12: Leveraging Prebuilt Models as a Service
21 Index
22 End User License Agreement
List of Tables
1 Chapter 1
    Table 1.1
    Table 1.2
    Table 1.3
    Table 1.4
    Table 1.5
    Table 1.6
    Table 1.7
2 Chapter 9
    Table 9.1
3 Chapter 11
    Table 11.1
    Table 11.2
    Table 11.3
List of Illustrations
1 Chapter 1
    Figure 1.1 Choosing a storage technology in GCP
    Figure 1.2 Example graph of friends
2 Chapter 2
    Figure 2.1 Basic Cloud SQL configuration
    Figure 2.2 Optional configuration parameters in Cloud SQL
    Figure 2.3 Configuring Cloud Spanner
    Figure 2.4 Configuring a Bigtable cluster
    Figure 2.5 Cost of a three-node Bigtable production cluster
    Figure 2.6 BigQuery interactive interface with sample query
3 Chapter 3
    Figure 3.1 A simple directed graph
    Figure 3.2 A simple cyclic graph
    Figure 3.3 An example ingestion stage of a data pipeline
    Figure 3.4 Data pipeline with transformations
    Figure 3.5 Example pipeline DAG with storage
    Figure 3.6 Complete data pipeline from ingestion to analysis
    Figure 3.7 A stream with sliding and tumbling three window
    Figure 3.8 Data pipeline with both a hot path and a cold path
    Figure 3.9 Creating a Cloud Dataflow job in the console using a template
    Figure 3.10 Specifying parameters for the Word Count Template