Interactive Online Learning Environment and TestBank
Learning the material in the Official Google Cloud Certified Professional Data Engineer Study Guide is an important part of preparing for the Professional Data Engineer certification exam, but we also provide additional tools to help you prepare. The online TestBank will help you understand the types of questions that will appear on the certification exam.
The sample tests in the TestBank include all the questions in each chapter as well as the questions from the assessment test. In addition, there are two practice exams with 50 questions each. You can use these tests to evaluate your understanding and identify areas that may require additional study.
The flashcards in the TestBank will push the limits of what you should know for the certification exam. Over 100 questions are provided in digital format. Each flashcard has one question and one correct answer.
The online glossary is a searchable list of key terms introduced in this Study Guide that you should know for the Professional Data Engineer certification exam.
To start using these to study for the Google Cloud Certified Professional Data Engineer exam, go to www.wiley.com/go/sybextestprep and register your book to receive your unique PIN. Once you have the PIN, return to www.wiley.com/go/sybextestprep, find your book, and click Register, or log in and follow the link to register a new account or add this book to an existing account.
People learn in different ways. For some, a book is an ideal way to study, whereas other learners may find video and audio resources a more efficient way to study. A combination of resources may be the best option for many of us. In addition to this Study Guide, here are some other resources that can help you prepare for the Google Cloud Professional Data Engineer exam:
The Professional Data Engineer Certification Exam Guide: https://cloud.google.com/certification/guides/data-engineer/
Exam FAQs: https://cloud.google.com/certification/faqs/
Google’s Assessment Exam: https://cloud.google.com/certification/practice-exam/data-engineer
Google Cloud Platform documentation: https://cloud.google.com/docs/
Coursera’s on-demand courses in the “Architecting with Google Cloud Platform Specialization” and “Data Engineering with Google Cloud” programs are both relevant to data engineering: www.coursera.org/specializations/gcp-architecture and https://www.coursera.org/professional-certificates/gcp-data-engineering
Qwiklabs Hands-on Labs: https://google.qwiklabs.com/quests/25
Linux Academy Google Cloud Certified Professional Data Engineer video course: https://linuxacademy.com/course/google-cloud-data-engineer/
The best way to prepare for the exam is to perform the tasks of a data engineer and work with the Google Cloud Platform.
Exam objectives are subject to change at any time without prior notice and at Google’s sole discretion. Please visit the Google Cloud Professional Data Engineer website (https://cloud.google.com/certification/data-engineer) for the most current listing of exam objectives.
Objective | Chapter
Section 1: Designing data processing systems |
1.1 Selecting the appropriate storage technologies | 1
1.2 Designing data pipelines | 2, 3
1.3 Designing a data processing solution | 4
1.4 Migrating data warehousing and data processing | 4
Section 2: Building and operationalizing data processing systems |
2.1 Building and operationalizing storage systems | 2
2.2 Building and operationalizing pipelines | 3
2.3 Building and operationalizing infrastructure | 5
Section 3: Operationalizing machine learning models |
3.1 Leveraging prebuilt ML models as a service | 12
3.2 Deploying an ML pipeline | 9
3.3 Choosing the appropriate training and serving infrastructure | 10
3.4 Measuring, monitoring, and troubleshooting machine learning models | 11
Section 4: Ensuring solution quality |
4.1 Designing for security and compliance | 6
4.2 Ensuring scalability and efficiency | 7
4.3 Ensuring reliability and fidelity | 8
4.4 Ensuring flexibility and portability | 8
Chapter 1 Selecting Appropriate Storage Technologies
Google Cloud Professional Data Engineer Exam objectives covered in this chapter include the following:
1 Designing data processing systems
✔ 1.1 Selecting the appropriate storage technologies
Mapping storage systems to business requirements
Data modeling
Tradeoffs involving latency, throughput, transactions
Distributed systems
Schema design
Data engineers choose how to store data for many different situations. Sometimes data is written to a temporary staging area, where it stays for only seconds or less before an application reads it and deletes it. In other cases, data engineers arrange long-term archival storage for data that must be retained for years. Data engineers are increasingly called on to work with data that streams into storage constantly and in high volumes. Data generated by Internet of Things (IoT) devices is an example of streaming data.
Another common use case is storing large volumes of data for batch processing, including using data to train machine learning models. Data engineers also consider how much the structure of data varies. Some data, like the kind found in online transaction processing, is highly structured and varies little from one record to the next. Other data, like product descriptions in a product catalog, can have a varying set of attributes. Data engineers consider these and other factors when choosing a storage technology.
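The contrast between highly structured and variably structured data can be made concrete with a small sketch. This is a hypothetical illustration: the `Transaction` fields, the catalog items, and the `shared_attributes` helper are assumptions made for the example, not part of any Google Cloud service. Every transaction record shares the same fields, while catalog items share only a couple of attributes.

```python
from dataclasses import dataclass, asdict
from typing import Any

# Highly structured data (hypothetical OLTP example):
# every record has the same fields with the same types.
@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    account_id: str
    amount_cents: int

# Data with a varying set of attributes (hypothetical product catalog):
# each item may carry different attributes.
catalog: list[dict[str, Any]] = [
    {"sku": "A100", "name": "Laptop", "ram_gb": 16, "screen_in": 14},
    {"sku": "B200", "name": "T-shirt", "size": "M", "color": "blue"},
    {"sku": "C300", "name": "Novel", "author": "Unknown", "pages": 320},
]

transactions = [
    Transaction("t1", "acct-1", 2500),
    Transaction("t2", "acct-2", 990),
]
transaction_dicts = [asdict(t) for t in transactions]


def shared_attributes(items: list[dict[str, Any]]) -> frozenset:
    """Return the attribute names present in every item.

    A small overlap relative to the total number of attributes
    signals high variety in the structure of the data."""
    attribute_sets = [frozenset(item) for item in items]
    if not attribute_sets:
        return frozenset()
    return frozenset.intersection(*attribute_sets)


if __name__ == "__main__":
    print(sorted(shared_attributes(transaction_dicts)))
    print(sorted(shared_attributes(catalog)))
```

Here all three transaction fields are shared across records, but the catalog items have only `sku` and `name` in common. That kind of variety is one of the factors that pushes a design toward a schema-flexible store rather than a fixed relational schema.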
This chapter covers objective 1.1 of the Google Cloud Professional Data Engineer exam: Selecting the appropriate storage technologies. In this chapter, you will learn about the following:
The business aspects of choosing a storage system
The technical aspects of choosing a storage system
The distinction between structured, semi-structured, and unstructured data models
Designing schemas for relational and NoSQL databases
By the end of this chapter, you should understand the various criteria data engineers consider when choosing a storage technology. In Chapter 2, “Building and Operationalizing Storage Systems,” we will delve into the details of Google Cloud storage services.
From Business Requirements to Storage Systems
Business requirements are the starting point for choosing a data storage system. Data engineers will use different types of storage systems for different purposes. The specific storage system you should choose is determined, in large part, by the stage of the data lifecycle for which the storage system is used.
The data lifecycle consists of four stages:
Ingest
Store
Process and analyze
Explore and visualize
Ingestion is the first stage in the data lifecycle, and it entails acquiring data and bringing data into the Google Cloud Platform (GCP). The storage stage is about persisting data to a storage system from which it can be accessed for later stages of the data lifecycle. The process and analyze stage begins with transforming data into a usable format for analysis applications. Explore and visualize is the final stage, in which insights are derived from analysis and presented in tables, charts, and other visualizations for use by others.
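The four lifecycle stages can be sketched as a chain of plain Python functions, one per stage. This is a minimal, hypothetical illustration. The sensor readings, the function names, and the `InMemoryStore` class are assumptions made for the example. `InMemoryStore` stands in for a real storage system, and nothing here is a GCP API.

```python
from typing import Iterable

# Hypothetical raw input: sensor readings as "sensor_id,value" strings.
RAW_READINGS = [
    "sensor-1,21.5",
    "sensor-2,19.0",
    "sensor-1,22.5",
    "sensor-3,bad-value",
]


def ingest(source: Iterable[str]) -> list[str]:
    """Stage 1 (ingest): acquire data and bring it into the system."""
    return [line.strip() for line in source if line.strip()]


class InMemoryStore:
    """Stage 2 (store): a stand-in for a persistent storage system."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def write(self, records: Iterable[str]) -> None:
        self._records.extend(records)

    def read_all(self) -> list[str]:
        return list(self._records)


def process(records: Iterable[str]) -> dict[str, float]:
    """Stage 3 (process and analyze): transform raw records into a usable
    format, dropping malformed values, then average the readings per sensor."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for record in records:
        sensor_id, _, value = record.partition(",")
        try:
            reading = float(value)
        except ValueError:
            continue  # skip records that cannot be parsed
        totals[sensor_id] = totals.get(sensor_id, 0.0) + reading
        counts[sensor_id] = counts.get(sensor_id, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}


def explore(averages: dict[str, float]) -> str:
    """Stage 4 (explore and visualize): present results as a text table."""
    rows = [f"{'sensor':<10}{'avg':>6}"]
    rows += [f"{s:<10}{avg:>6.1f}" for s, avg in sorted(averages.items())]
    return "\n".join(rows)


if __name__ == "__main__":
    store = InMemoryStore()
    store.write(ingest(RAW_READINGS))
    print(explore(process(store.read_all())))
```

Keeping each stage behind its own function mirrors the point above: the stage a piece of data is in determines what kind of storage and processing it needs, so each stage can be swapped independently.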