Seifedine Kadry - Big Data

Title: Big Data
Author: Seifedine Kadry
Genre: unrecognised / in English
Year: unknown
ISBN: no data

Big Data: summary, description, and annotation

Learn Big Data from the ground up with this complete and up-to-date resource from leaders in the field. In Big Data: Concepts, Technology, and Architecture, you’ll learn about the creation of structured, unstructured, and semi-structured data, data storage solutions, traditional database solutions like SQL, data processing, data analytics, machine learning, and data mining. You’ll also discover how specific technologies like Apache Hadoop, SQOOP, and Flume work.

Accessibly organized, Big Data includes illuminating case studies throughout the material, showing you how the included concepts have been applied in real-world settings. Some of those concepts include:

- The common challenges facing big data technology and technologists, like data heterogeneity and incompleteness, data volume and velocity, storage limitations, and privacy concerns
- Relational and non-relational databases, like RDBMS, NoSQL, and NewSQL databases
- Virtualizing Big Data through encapsulation, partitioning, and isolating, as well as big data server virtualization
- Apache software, including Hadoop, Cassandra, Avro, Pig, Mahout, Oozie, and Hive
- The Big Data analytics lifecycle, including business case evaluation, data preparation, extraction, transformation, analysis, and visualization

Perfect for data scientists, data engineers, and database managers, Big Data also belongs on the bookshelves of business intelligence analysts who are required to make decisions based on large volumes of information. Executives and managers who lead teams responsible for keeping or understanding large datasets will also benefit from this book.

Table of Contents

1 Cover

2 Title Page

3 Copyright Page

4 Dedication Page

5 Acknowledgments

6 About the Author

7 1 Introduction to the World of Big Data
  1.1 Understanding Big Data
  1.2 Evolution of Big Data
  1.3 Failure of Traditional Database in Handling Big Data
  1.4 3 Vs of Big Data
  1.5 Sources of Big Data
  1.6 Different Types of Data
  1.7 Big Data Infrastructure
  1.8 Big Data Life Cycle
  1.9 Big Data Technology
  1.10 Big Data Applications
  1.11 Big Data Use Cases
  Chapter 1 Refresher
  Conceptual Short Questions with Answers
  Frequently Asked Interview Questions

8 2 Big Data Storage Concepts
  2.1 Cluster Computing
  2.2 Distribution Models
  2.3 Distributed File System
  2.4 Relational and Non‐Relational Databases
  2.5 Scaling Up and Scaling Out Storage
  Conceptual Short Questions with Answers

9 3 NoSQL Database
  3.1 Introduction to NoSQL
  3.2 Why NoSQL
  3.3 CAP Theorem
  3.4 ACID
  3.5 BASE
  3.6 Schemaless Databases
  3.7 NoSQL (Not Only SQL)
  3.8 Migrating from RDBMS to NoSQL
  Chapter 3 Refresher
  Conceptual Short Questions with Answers

10 4 Processing, Management Concepts, and Cloud Computing
  4.1 Data Processing
  4.2 Shared Everything Architecture
  4.3 Shared‐Nothing Architecture
  4.4 Batch Processing
  4.5 Real‐Time Data Processing
  4.6 Parallel Computing
  4.7 Distributed Computing
  4.8 Big Data Virtualization
  Part II: Managing and Processing Big Data in Cloud Computing
  4.9 Introduction
  4.10 Cloud Computing Types
  4.11 Cloud Services
  4.12 Cloud Storage
  4.13 Cloud Architecture
  Chapter 4 Refresher
  Conceptual Short Questions with Answers
  Cloud Computing Interview Questions

11 Chapter 5: Driving Big Data with Hadoop Tools and Technologies
  5.1 Apache Hadoop
  5.2 Hadoop Storage
  5.3 Hadoop Computation
  5.4 Hadoop 2.0
  5.5 HBASE
  5.6 Apache Cassandra
  5.7 SQOOP
  5.8 Flume
  5.9 Apache Avro
  5.10 Apache Pig
  5.11 Apache Mahout
  5.12 Apache Oozie
  5.13 Apache Hive
  5.14 Hive Architecture
  5.15 Hadoop Distributions
  Chapter 5 Refresher
  Conceptual Short Questions with Answers
  Frequently Asked Interview Questions

12 6 Big Data Analytics
  6.1 Terminology of Big Data Analytics
  6.2 Big Data Analytics
  6.3 Data Analytics Life Cycle
  6.4 Big Data Analytics Techniques
  6.5 Semantic Analysis
  6.6 Visual analysis
  6.7 Big Data Business Intelligence
  6.8 Big Data Real‐Time Analytics Processing
  6.9 Enterprise Data Warehouse
  Conceptual Short Questions with Answers

13 7 Big Data Analytics with Machine Learning
  7.1 Introduction to Machine Learning
  7.2 Machine Learning Use Cases
  7.3 Types of Machine Learning
  Chapter 7 Refresher
  Conceptual Short Questions with Answers

14 8 Mining Data Streams and Frequent Itemset
  8.1 Itemset Mining
  8.2 Association Rules
  8.3 Frequent Itemset Generation
  8.4 Itemset Mining Algorithms
  8.5 Maximal and Closed Frequent Itemset
  8.6 Mining Maximal Frequent Itemsets: the GenMax Algorithm
  8.7 Mining Closed Frequent Itemsets: the Charm Algorithm
  8.8 CHARM Algorithm Implementation
  8.9 Data Mining Methods
  8.10 Prediction
  8.11 Important Terms Used in Bayesian Network
  8.12 Density Based Clustering Algorithm
  8.13 DBSCAN
  8.14 Kernel Density Estimation
  8.15 Mining Data Streams
  8.16 Time Series Forecasting

15 9 Cluster Analysis
  9.1 Clustering
  9.2 Distance Measurement Techniques
  9.3 Hierarchical Clustering
  9.4 Analysis of Protein Patterns in the Human Cancer‐Associated Liver
  9.5 Recognition Using Biometrics of Hands
  9.6 Expectation Maximization Clustering Algorithm
  9.7 Representative‐Based Clustering
  9.8 Methods of Determining the Number of Clusters
  9.9 Optimization Algorithm
  9.10 Choosing the Number of Clusters
  9.11 Bayesian Analysis of Mixtures
  9.12 Fuzzy Clustering
  9.13 Fuzzy C‐Means Clustering

16 10 Big Data Visualization
  10.1 Big Data Visualization
  10.2 Conventional Data Visualization Techniques
  10.3 Tableau
  10.4 Bar Chart in Tableau
  10.5 Line Chart
  10.6 Pie Chart
  10.7 Bubble Chart
  10.8 Box Plot
  10.9 Tableau Use Cases
  10.10 Installing R and Getting Ready
  10.11 Data Structures in R
  10.12 Importing Data from a File
  10.13 Importing Data from a Delimited Text File
  10.14 Control Structures in R
  10.15 Basic Graphs in R

17 Index

18 End User License Agreement

List of Tables

1 Chapter 1
  Table 1.1 Differences in the attributes of big data and RDBMS.
  Table 1.2 Data Mining vs. Big Data.

2 Chapter 2
  Table 2.1 Student course registration database.
  Table 2.2 Popular NoSQL databases.

3 Chapter 8
  Table 8.1 Market basket data.
  Table 8.2 Itemset in a transaction.
  Table 8.3 Support of each items in a transaction.
  Table 8.4 Market basket data.
  Table 8.5 Binary database.
  Table 8.6 Vertical database.
  Table 8.7 Market Basket data.
  Table 8.8 Database.
  Table 8.9 Frequency of occurrence.
  Table 8.10 Priority of the items.
  Table 8.11 Itemset in a transaction.
  Table 8.12 Maximal/closed frequent itemset.
  Table 8.13 Transaction database.
  Table 8.14 Frequent itemsets with minsup = 3.
  Table 8.15 Frequent itemsets with tidset.
  Table 8.16 Transaction database.
  Table 8.17 Frequent Itemset with minsup = 3.
  Table 8.18 Tidset of the frequent itemset.
  Table 8.19 Comparison between Traditional data mining technique and mining da...

4 Chapter 10
  Table 10.1 Tableau data types.

List of Illustrations

1 Chapter 1
  Figure 1.1 Evolution of Big Data.
  Figure 1.2 3 Vs of big data.
  Figure 1.3 High‐velocity data sets generated online in 60 seconds.
  Figure 1.4 Big data: data variety.
  Figure 1.5 Sources of big data.
  Figure 1.6 Human‐ and machine‐generated data.
  Figure 1.7 Structured data: employee details of an organization.
  Figure 1.8 Unstructured data: the result of a Google search.
  Figure 1.9 XML file with employee details.
  Figure 1.10 Big data life cycle.
  Figure 1.11 Data integration.
  Figure 1.12 Hadoop core components.

2 Chapter 2
  Figure 2.1 Big data storage architecture.
  Figure 2.2 Cluster computing.
  Figure 2.3 Symmetric clusters.
  Figure 2.4 Asymmetric cluster.
  Figure 2.5 Distribution model.
  Figure 2.6 (a) Sharding. (b) Sharding example.
  Figure 2.7 Replication.
  Figure 2.8 Data replication.
  Figure 2.9 Master‐Slave model.
  Figure 2.10 Peer‐to‐peer model.
  Figure 2.11 Combination of sharding and replication.
  Figure 2.12 Data divided across multiple related tables.
  Figure 2.13 Scale‐up architecture.
  Figure 2.14 Scale‐out architecture.

3 Chapter 3
  Figure 3.1 Properties of a system following CAP theorem.
  Figure 3.2 RBDMS life cycle.
  Figure 3.3 RDBMS vs. NoSQL databases.
  Figure 3.4 A key‐value store database.
  Figure 3.5 General representation of graph database.
  Figure 3.6 Neo4J Relationships with properties.
  Figure 3.7 Relationship graph between course and employee.

4 Chapter 4
  Figure 4.1 Data processing cycle.
  Figure 4.2 Shared everything architecture.
  Figure 4.3 Symmetric multiprocessing memory.
  Figure 4.4 Distributed shared memory.
  Figure 4.5 Shared‐nothing architecture.
  Figure 4.6 Batch processing.
  Figure 4.7 Real‐time processing.
  Figure 4.8 Real‐time and batch computation systems example.
  Figure 4.9 Parallel computing.
  Figure 4.10 Distributed computing.
  Figure 4.11 System architecture before and after virtualization.
  Figure 4.12 Isolation.
  Figure 4.13 Service‐oriented architecture.
  Figure 4.14 Google File System architecture.
  Figure 4.15 Read algorithm: (a) The first three steps. (b) The last three st...
  Figure 4.16 Write algorithm: (a) The first three steps. (b) Steps 4 and 5. (...
  Figure 4.17 Cloud architecture.

5 Chapter 5
  Figure 5.1 Hadoop architecture.
  Figure 5.2 Hadoop ecosystem.
  Figure 5.3 Distributed file system vs. single machine.
  Figure 5.4 HDFS architecture.
  Figure 5.5 File write.
  Figure 5.6 File read.
  Figure 5.7 MapReduce model.
  Figure 5.8 Combiner illustration.
  Figure 5.9 JobTracker and TaskTracker.
  Figure 5.10 Word count algorithm.
  Figure 5.11 Hadoop 1.0 vs Hadoop 2.0.
  Figure 5.12 Active NameNode and standby NameNode.
  Figure 5.13 Hadoop 2.0.
  Figure 5.14 ResourceManager.
  Figure 5.15 NodeManager.
  Figure 5.16 YARN architecture.
  Figure 5.17 HBase architecture.
  Figure 5.18 RegionServer architecture.
  Figure 5.19 SQOOP import and export.
  Figure 5.20 SQOOP 1.0 architecture.
  Figure 5.21 Flume architecture.
  Figure 5.22 Pig – internal process.
  Figure 5.23 Oozie workflow.
  Figure 5.24 Apache Hive architecture.