Seifedine Kadry - Big Data

Learn Big Data from the ground up with this complete and up‐to‐date resource from leaders in the field, Big Data: Concepts, Technology, and Architecture. You’ll learn about the creation of structured, unstructured, and semi‐structured data, data storage solutions, traditional database solutions like SQL, data processing, data analytics, machine learning, and data mining. You’ll also discover how specific technologies like Apache Hadoop, SQOOP, and Flume work.

Accessibly organized, Big Data includes illuminating case studies throughout the material, showing you how the included concepts have been applied in real‐world settings. Some of those concepts include:

The common challenges facing big data technology and technologists, like data heterogeneity and incompleteness, data volume and velocity, storage limitations, and privacy concerns
Relational and non‐relational databases, like RDBMS, NoSQL, and NewSQL databases
Virtualizing Big Data through encapsulation, partitioning, and isolating, as well as big data server virtualization
Apache software, including Hadoop, Cassandra, Avro, Pig, Mahout, Oozie, and Hive
The Big Data analytics lifecycle, including business case evaluation, data preparation, extraction, transformation, analysis, and visualization

Perfect for data scientists, data engineers, and database managers, Big Data also belongs on the bookshelves of business intelligence analysts who are required to make decisions based on large volumes of information. Executives and managers who lead teams responsible for keeping or understanding large datasets will also benefit from this book.

2.4.1 RDBMS Databases

An RDBMS is vertically scalable, exhibits ACID (atomicity, consistency, isolation, durability) properties, and supports data that adheres to a specific schema. The schema check is made each time data is inserted or updated, so an RDBMS is not ideal for capturing and storing data arriving at high velocity. This architectural limitation makes an RDBMS unsuitable as the primary storage for big data solutions.
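The schema check and the atomicity property can be illustrated with a minimal sketch. It uses Python's built‐in sqlite3 module as a small stand‐in for an RDBMS; the accounts table, the transfer and balances helpers, and the sample values are assumptions made for illustration and do not come from the book.

```python
import sqlite3

# In-memory SQLite database used as a small stand-in for an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 'alice', 100)")
conn.execute("INSERT INTO accounts VALUES (2, 'bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit made
        # earlier in the same transaction has been rolled back as well.
        return False


def balances(conn):
    """Return {owner: balance} for every account (hypothetical helper)."""
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))


print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 1, 2, 500))  # False: would overdraw alice
print(balances(conn))             # {'alice': 70, 'bob': 80}
```

Every write passes through the constraint checks before it is accepted. That per‐write validation is the cost the paragraph above refers to when data arrives at high velocity.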

For decades, relational database management systems running in corporate data centers have stored the bulk of the world’s data. But as data volumes grow, an RDBMS can no longer keep pace with the volume, velocity, and variety of data being generated and consumed.

Big data, which is typically a collection of data with massive volume and variety arriving at a high velocity, cannot be effectively managed with traditional data management tools. While conventional databases still exist and are used in a large number of applications, one of the key advancements in resolving the problems with big data is the emergence of modern alternative database technologies that do not require a fixed schema to store data; instead, the data is distributed across the storage nodes. The main alternatives are NoSQL and NewSQL databases.

2.4.2 NoSQL Databases

The term NoSQL (Not Only SQL) covers all non‐relational databases. Unlike an RDBMS, which exhibits ACID properties, a NoSQL database is subject to the trade‐offs of the CAP theorem (consistency, availability, partition tolerance) and typically follows the BASE (basically available, soft state, eventually consistent) model, in which the storage devices provide eventual consistency rather than immediate consistency. Hence, these databases are not appropriate for implementing large transactions.
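Eventual consistency can be pictured with a toy model. The sketch below is not the replication protocol of any particular NoSQL product; the EventuallyConsistentStore class, its method names, and the single‐primary layout are assumptions chosen to keep the example short.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy key-value store: one primary, several replicas, async replication."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # writes accepted but not yet replicated

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has stored it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Reads are served by a replica and may therefore return stale data.
        return self.replicas[replica].get(key)

    def sync(self):
        # Background replication: apply queued writes to every replica in order.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


store = EventuallyConsistentStore()
store.write("cart:42", "book")
print(store.read("cart:42"))  # None: the replica has not caught up yet
store.sync()
print(store.read("cart:42"))  # book
```

Between write and sync, a reader can observe the old state. A multi‐step transaction that needs every read to reflect the preceding writes cannot rely on such a store, which is why the text rules these databases out for large transactions.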

The various types of NoSQL databases, namely key‐value databases, document databases, column‐oriented databases, and graph databases, were discussed in detail in Section 2.3. Table 2.2 shows examples of each type.

Table 2.2 Popular NoSQL databases.

Key‐value databases   Document databases   Column databases   Graph databases
Redis                 MongoDB              DynamoDB           Neo4j
Riak                  CouchDB              Cassandra          OrientDB
SimpleDB              RethinkDB            Accumulo           ArangoDB
BerkeleyDB Oracle     MarkLogic            Big Table          FlockDB

2.4.3 NewSQL Databases

NewSQL databases provide scalable performance similar to that of NoSQL systems while retaining the ACID properties of a traditional database management system. VoltDB, NuoDB, Clustrix, MemSQL, and TokuDB are some examples of NewSQL databases.

NewSQL databases are distributed in nature, horizontally scalable, and fault tolerant, and they support the relational data model with three layers: the administrative layer, the transactional layer, and the storage layer. A NewSQL database is highly scalable and operates in a shared‐nothing architecture. It has SQL‐compliant syntax and uses the relational data model for storage. Because the syntax is SQL compliant, the transition from an RDBMS to the highly scalable system is made easy.
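Because the interface remains SQL, application code written against an RDBMS is largely expressed in the same terms. The sketch below shows the kind of code in question: fixed, parameterized SQL statements run inside transactions. SQLite is not a NewSQL system and is used here only as a locally runnable stand‐in for standard SQL; the orders table and the helper names are hypothetical.

```python
import sqlite3

# Fixed statements; only the bound parameters change between calls.
INSERT_ORDER = "INSERT INTO orders (customer, amount) VALUES (?, ?)"
TOTAL_FOR_CUSTOMER = (
    "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?"
)


def open_database():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer TEXT NOT NULL,"
        " amount INTEGER NOT NULL)"
    )
    return conn


def place_order(conn, customer, amount):
    # Each order is its own short transaction.
    with conn:
        conn.execute(INSERT_ORDER, (customer, amount))


def total_for(conn, customer):
    return conn.execute(TOTAL_FOR_CUSTOMER, (customer,)).fetchone()[0]


conn = open_database()
for customer, amount in [("alice", 30), ("bob", 20), ("alice", 15)]:
    place_order(conn, customer, amount)
print(total_for(conn, "alice"))  # 45
```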

The applications targeting NewSQL systems are those that execute the same queries repeatedly with different inputs and have a large number of transactions. Some commercial NewSQL databases are briefly described below.

2.4.3.1 Clustrix

Clustrix is a high‐performance, fault‐tolerant, distributed database. It is used in applications with massive transactional volume.

2.4.3.2 NuoDB

NuoDB is a cloud‐based, scale‐out, fault‐tolerant, distributed database. It supports both batch and real‐time SQL queries.

2.4.3.3 VoltDB

VoltDB is a scale‐out, in‐memory, high‐performance, fault‐tolerant, distributed database. It is used to make real‐time decisions to maximize business value.

2.4.3.4 MemSQL

MemSQL is a high‐performance, in‐memory, fault‐tolerant, distributed database. It is known for its fast performance and is used for real‐time analytics.

2.5 Scaling Up and Scaling Out Storage

Scalability is the ability of the system to meet the increasing demand for storage capacity. A system capable of scaling delivers increased performance and efficiency. With the advent of the big data era there is an imperative need to scale data storage platforms to make them capable of storing petabytes of data. The storage platforms can be scaled in two ways:

Scaling‐up (vertical scalability)

Scaling‐out (horizontal scalability)

Scaling‐up. Vertical scalability adds more resources to the existing server to increase its capacity to hold more data. The resources can be computation power, hard drive, RAM, and so on. This type of scaling is limited to the maximum scaling capacity of the server. Figure 2.13 shows a scale‐up architecture where the RAM capacity of the same machine is upgraded from 32 GB to 128 GB to meet the increasing demand.

Scaling‐out. Horizontal scalability adds new servers or components to meet the demand. Each additional component is termed a node. Big data technologies work on the basis of scaling out storage. Horizontal scaling enables the system to scale wider to meet the increasing demand. Scaling out storage uses low‐cost commodity hardware and storage components. The components can be added as required without much complexity. Multiple components connect together to work as a single entity. Figure 2.14 shows the scale‐out architecture where the capacity is increased by adding commodity hardware to the cluster to meet the increasing demand.

Figure 2.13 Scale‐up architecture.

Figure 2.14 Scale‐out architecture.
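The difference between the two approaches can be sketched in a few lines. The Node and Cluster classes, the server names, and the hash‐modulo placement rule below are illustrative assumptions, not the design of any specific storage platform; the 32 GB and 128 GB figures follow the scale‐up example above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity_gb: int


@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    def scale_up(self, name, new_capacity_gb):
        """Vertical scaling: give an existing node more resources."""
        for node in self.nodes:
            if node.name == name:
                node.capacity_gb = new_capacity_gb
                return
        raise KeyError(name)

    def scale_out(self, node):
        """Horizontal scaling: add another commodity node to the cluster."""
        self.nodes.append(node)

    def total_capacity_gb(self):
        return sum(node.capacity_gb for node in self.nodes)

    def node_for(self, key):
        """Pick the node that stores `key` (simple hash-modulo placement)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]


cluster = Cluster([Node("server-1", 32)])
cluster.scale_up("server-1", 128)          # same machine, more capacity
print(cluster.total_capacity_gb())         # 128
cluster.scale_out(Node("server-2", 128))   # more machines
cluster.scale_out(Node("server-3", 128))
print(cluster.total_capacity_gb())         # 384
```

Scaling up stops at the limit of one machine, while scaling out keeps growing as nodes are added. Note that plain hash‐modulo placement moves many keys whenever the node count changes; production systems often use consistent hashing or a similar scheme to limit that movement.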

Chapter 2 Refresher

1 The set of loosely connected computers is called _____.
a. LAN
b. WAN
c. Workstation
d. Cluster
Answer: d
Explanation: In a computer cluster all the participating computers work together on a particular task.

2 Cluster computing is classified into
a. High‐availability cluster
b. Load‐balancing cluster
c. Both a and b
d. None of the above
Answer: c

3 The computer cluster architecture emerged as a result of ____.
a. ISA
b. Workstation
c. Supercomputers
d. Distributed systems
Answer: d
Explanation: A distributed system is a computer system spread out over a geographic area.

4 Cluster adopts _______ mechanism to eliminate the service interruptions.
a. Sharding
b. Replication
c. Failover
d. Partition
Answer: c

5 _______ is the process of switching to a redundant node upon the abnormal termination or failure of a previously active node.
a. Sharding
b. Replication
c. Failover
d. Partition
Answer: c

6 _______ adds more storage resources and CPU to increase capacity.
a. Horizontal scaling
b. Vertical scaling
c. Partition
d. All of the mentioned
Answer: b
Explanation: Vertical scaling adds more resources, such as computation power, hard drive, and RAM, to the existing server to increase its capacity.

7 _______ is the process of copying the same data blocks across multiple nodes.
a. Replication
b. Partition
c. Sharding
d. None of the above
Answer: a
Explanation: Replication is the process of copying the same data blocks across multiple nodes to overcome the loss of data when a node crashes.
