Dan Sullivan - Official Google Cloud Certified Professional Data Engineer Study Guide

The proven Study Guide that prepares you for this new Google Cloud exam

The Official Google Cloud Certified Professional Data Engineer Study Guide provides everything you need to prepare for this important exam and master the skills necessary to land that coveted Google Cloud Professional Data Engineer certification. Beginning with a pre-book assessment quiz to evaluate what you know before you begin, each chapter features exam objectives and review questions, plus the online learning environment includes additional complete practice tests.
Written by Dan Sullivan, a popular and experienced online course author for machine learning, big data, and Cloud topics, the Official Google Cloud Certified Professional Data Engineer Study Guide is your ace in the hole for deploying and managing analytics and machine learning applications.
• Build and operationalize storage systems, pipelines, and compute infrastructure
• Understand machine learning models and learn how to select pre-built models
• Monitor and troubleshoot machine learning models
• Design analytics and machine learning applications that are secure, scalable, and highly available. 
This exam guide is designed to help you develop an in-depth understanding of data engineering and machine learning on Google Cloud Platform.

Text files of natural language content

Audio files

Video files

Binary large objects (BLOBs)

Data is considered unstructured if it does not have a schema that influences how the data is stored or accessed. Unstructured data may have an internal structure that is not relevant to the way it is stored. For example, natural language is highly structured according to the syntax rules of languages. Audio and video files may have an internal format that includes metadata as well as content. Here again, there is structure within the file, but storage systems do not use that structure, which is why this kind of data is classified as unstructured.

Google’s Storage Decision Tree

Google has developed a decision tree for choosing a storage system that starts with distinguishing structured, semi-structured, and unstructured data. Figure 1.1 is based on the decision tree published at https://cloud.google.com/solutions/data-lifecycle-cloud-platform.

Figure 1.1 Choosing a storage technology in GCP

Schema Design Considerations

Structured and semi-structured data has a schema associated with it. Structured data is usually stored in relational databases whereas semi-structured data is often stored in NoSQL databases. The schema influences how data is stored and accessed, so once you have determined which kind of storage technology to use, you may then need to design a schema that will support optimal storage and retrieval.

The distinction between relational and NoSQL databases is becoming less pronounced as each type adopts features of the other. Some relational databases support storing and querying JavaScript Object Notation (JSON) structures, similar to the way that document databases do. Similarly, some NoSQL databases now support ACID (atomicity, consistency, isolation, durability) transactions, which are a staple feature of relational databases.

Relational Database Design

Data modeling for relational databases begins with determining which type of relational database you are developing: an online transaction processing (OLTP) database or an online analytical processing (OLAP) database.

OLTP

Online transaction processing (OLTP) databases are designed for transaction processing and typically follow data normalization rules. There are currently 10 recognized forms of normalization, but most transaction processing systems follow no more than three of those forms:

The first form of normalization requires that each column in the table hold an atomic value, that the table contain no repeating groups, and that the table have a primary key, which is one or more ordered columns that uniquely identify a row.

The second form of normalization includes the first form and creates separate tables for values that apply to multiple rows and links them using foreign keys. A foreign key is one or more ordered columns that correspond to a primary key in another table.

The third form of normalization, which includes the second form, eliminates any columns from a table that do not depend on the key.
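The role of primary and foreign keys in a normalized design can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a schema from the book: the customer and customer_order tables, their columns, and the sample rows are all hypothetical. SQLite is used only because it runs in memory without setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalized schema: customer data lives in one table and
# orders refer to it through a foreign key instead of repeating it.
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_total REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 25.0)")
conn.execute("INSERT INTO customer_order VALUES (101, 1, 40.0)")

# An order that points at a nonexistent customer violates the foreign key.
try:
    conn.execute("INSERT INTO customer_order VALUES (102, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Reading the customer name for each order requires a join.
rows = conn.execute("""
    SELECT c.customer_name, o.order_id
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(orphan_rejected, rows)
```

The customer name is stored once, so changing it touches a single row, at the cost of a join whenever orders are listed with names.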

These rules of normalization are designed to reduce the risk of data anomalies and to avoid the storage of redundant data. Although they serve those purposes well, they can lead to high levels of I/O operations when joining tables or updating a large number of indexes. Using an OLTP data model requires a balance between following the rules of normalization to avoid anomalies and designing for performance.

Denormalization, that is, intentionally violating one of the rules of normalization, is often used to improve query performance. For example, repeating customer names in both the customer table and an order table could avoid having to join the two tables when printing invoices. By denormalizing, you can reduce the need to join tables since the data that would have been in another table is stored along with other data in the row of one table.
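The invoice example can be sketched as follows, again with Python's sqlite3 module. The table names (order_normalized, order_denormalized), columns, and rows are assumptions made for illustration. The sketch only shows that the two designs return the same invoice lines, one with a join and one without; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one order table follows the normalized design,
# the other repeats customer_name in every order row.
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE order_normalized (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total       REAL
);
CREATE TABLE order_denormalized (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER,
    customer_name TEXT,
    total         REAL
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO order_normalized VALUES (10, 1, 25.0), (11, 2, 40.0);
INSERT INTO order_denormalized VALUES (10, 1, 'Alice', 25.0), (11, 2, 'Bob', 40.0);
""")

# Normalized design: the invoice needs a join to find the customer name.
invoice_with_join = conn.execute("""
    SELECT o.order_id, c.customer_name, o.total
    FROM order_normalized AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# Denormalized design: the name is already in the order row.
invoice_without_join = conn.execute("""
    SELECT order_id, customer_name, total
    FROM order_denormalized
    ORDER BY order_id
""").fetchall()

print(invoice_with_join == invoice_without_join)
```

The trade-off is that a customer name change now has to be applied in two tables, which is the kind of update anomaly the normalization rules are meant to prevent.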

OLAP

Online analytical processing (OLAP) data models are often used for data warehouse and data mart applications. OLAP models are also called dimensional models because data is organized around several dimensions. OLAP models are designed to facilitate the following:

Rolling up and aggregating data

Drilling down from summary data to detailed data

Pivoting and looking at data from different dimensions—sometimes called slicing and dicing

OLAP can be implemented in relational databases or in specialized multidimensional data stores.
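When OLAP is implemented in a relational database, the three operations listed above often come down to GROUP BY queries over a fact table. The following is a rough sketch using Python's sqlite3 module; the sales table, its dimensions (region, product, year), and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table with three dimensions and one measure.
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("West", "widget", 2023, 100.0),
        ("West", "gadget", 2023, 50.0),
        ("East", "widget", 2023, 70.0),
        ("East", "widget", 2024, 30.0),
    ],
)

# Rolling up: aggregate the measure to the region level.
rollup = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Drilling down: break one region's summary into per-product detail.
drilldown = conn.execute(
    "SELECT product, SUM(amount) FROM sales "
    "WHERE region = 'West' GROUP BY product ORDER BY product"
).fetchall()

# Pivoting: look at the same data along the product dimension instead.
pivot = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

print(rollup, drilldown, pivot)
```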

SQL Crash Course

In-depth knowledge of SQL is not necessarily required to pass the Google Cloud Professional Data Engineer exam, but knowledge of SQL may help if a question includes a SQL statement.

SQL has three types of statements that developers use:

Data definition language (DDL) statements, which are used to create and modify database schemas

Data manipulation language (DML) statements, which are used to insert, update, delete, and query data

Data query language (DQL), which consists of a single statement: SELECT

Table 1.4 shows examples of data definition statements and their function. Table 1.5 shows data manipulation examples, and Table 1.6 shows query language examples.

Table 1.4 Data definition language examples

DDL statement | Example | Explanation
CREATE TABLE | CREATE TABLE address (address_id INT PRIMARY KEY, street_name VARCHAR(50), city VARCHAR(50), state VARCHAR(2)); | Creates a table with four columns. The first is an integer and the primary key; the other three are variable-length character strings.
CREATE INDEX | CREATE INDEX addr_idx ON address(state); | Creates an index on the state column of the address table.
ALTER TABLE | ALTER TABLE address ADD (zip VARCHAR(9)); | Adds a column called zip to the address table. ALTER is also used to modify and drop entities.
DROP INDEX | DROP INDEX addr_idx; | Deletes the index addr_idx.
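The DDL statements in Table 1.4 can be tried out in an in-memory SQLite database through Python's sqlite3 module. This is a sketch under one stated assumption: SQLite does not accept the parenthesized ADD (...) form shown in the table, so the ALTER TABLE statement is written in SQLite's ADD COLUMN syntax. Other databases may differ again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: four columns, the first being the primary key.
conn.execute("""
    CREATE TABLE address (
        address_id  INT PRIMARY KEY,
        street_name VARCHAR(50),
        city        VARCHAR(50),
        state       VARCHAR(2)
    )
""")

# CREATE INDEX: index the state column.
conn.execute("CREATE INDEX addr_idx ON address(state)")

# ALTER TABLE: SQLite spelling of the statement from Table 1.4.
conn.execute("ALTER TABLE address ADD COLUMN zip VARCHAR(9)")

# List the column names to confirm that zip was added.
columns = [row[1] for row in conn.execute("PRAGMA table_info(address)")]

index_query = (
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name = 'addr_idx'"
)
index_before_drop = conn.execute(index_query).fetchall()

# DROP INDEX: remove the index again.
conn.execute("DROP INDEX addr_idx")
index_after_drop = conn.execute(index_query).fetchall()

print(columns, index_before_drop, index_after_drop)
```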

Table 1.5 Data manipulation language examples

DML statement | Example | Explanation
INSERT | INSERT INTO address VALUES (1234, '56 Main St', 'Seattle', 'WA'); | Adds a row to the table with the specified values, which are in column order
UPDATE | UPDATE address SET state = 'OR' | Sets the value of the state column to 'OR' for all rows
DELETE | DELETE FROM address WHERE state = 'OR' | Removes all rows that have the value 'OR' in the state column
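A short sketch of the DML statements in Table 1.5, run against an in-memory SQLite database with Python's sqlite3 module. The second inserted row is made up so that the effect of an UPDATE without a WHERE clause is visible on more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        address_id  INT PRIMARY KEY,
        street_name VARCHAR(50),
        city        VARCHAR(50),
        state       VARCHAR(2)
    )
""")

# INSERT: values are supplied in column order.
conn.execute("INSERT INTO address VALUES (1234, '56 Main St', 'Seattle', 'WA')")
# Hypothetical second row, added only for this illustration.
conn.execute("INSERT INTO address VALUES (5678, '12 Oak Ave', 'Spokane', 'WA')")

# UPDATE without a WHERE clause changes every row in the table.
updated = conn.execute("UPDATE address SET state = 'OR'").rowcount
states = [row[0] for row in conn.execute("SELECT state FROM address")]

# DELETE with a WHERE clause removes only the matching rows,
# which at this point is every row because all states are 'OR'.
deleted = conn.execute("DELETE FROM address WHERE state = 'OR'").rowcount
remaining = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]

print(updated, states, deleted, remaining)
```

In practice an UPDATE usually carries a WHERE clause so that only the intended rows change.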

Table 1.6 Data query language examples

DQL statement | Example | Explanation
SELECT … FROM | SELECT address_id, state FROM address | Returns the address_id and state values for all rows in the address table
SELECT … FROM … WHERE | SELECT address_id, state FROM address WHERE state = 'OR' | Returns the address_id and state values for all rows in the address table that have the value 'OR' in the state column
SELECT … FROM … GROUP BY | SELECT state, COUNT(*) FROM address GROUP BY state | Returns the number of addresses in each state
SELECT … FROM … GROUP BY … HAVING | SELECT state, COUNT(*) FROM address GROUP BY state HAVING COUNT(*) > 50 | Returns the number of addresses in each state that has more than 50 addresses
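The queries in Table 1.6 can be sketched with Python's sqlite3 module. The generated rows are hypothetical and chosen so that one state has exactly 50 addresses and another has 51, which shows that HAVING COUNT(*) > 50 keeps only states with more than 50 addresses. ORDER BY is added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        address_id  INT PRIMARY KEY,
        street_name VARCHAR(50),
        city        VARCHAR(50),
        state       VARCHAR(2)
    )
""")

# Hypothetical data: 51 rows for WA, 50 for OR, 3 for CA.
rows = []
next_id = 1
for state, count in [("WA", 51), ("OR", 50), ("CA", 3)]:
    for _ in range(count):
        rows.append((next_id, f"{next_id} Main St", "Anytown", state))
        next_id += 1
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?)", rows)

# SELECT ... FROM ... WHERE: only rows in one state.
oregon = conn.execute(
    "SELECT address_id, state FROM address WHERE state = 'OR'"
).fetchall()

# SELECT ... FROM ... GROUP BY: number of addresses per state.
per_state = conn.execute(
    "SELECT state, COUNT(*) FROM address GROUP BY state ORDER BY state"
).fetchall()

# SELECT ... FROM ... GROUP BY ... HAVING: filter the groups themselves.
large_states = conn.execute(
    "SELECT state, COUNT(*) FROM address "
    "GROUP BY state HAVING COUNT(*) > 50 ORDER BY state"
).fetchall()

print(len(oregon), per_state, large_states)
```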

NoSQL Database Design
