1 Cover
2 Title Page
3 Copyright Page
4 Dedication Page
5 Acknowledgments
6 About the Author
7 1 Introduction to the World of Big Data 1.1 Understanding Big Data 1.2 Evolution of Big Data 1.3 Failure of Traditional Database in Handling Big Data 1.4 3 Vs of Big Data 1.5 Sources of Big Data 1.6 Different Types of Data 1.7 Big Data Infrastructure 1.8 Big Data Life Cycle 1.9 Big Data Technology 1.10 Big Data Applications 1.11 Big Data Use Cases Chapter 1 Refresher Conceptual Short Questions with Answers Frequently Asked Interview Questions
8 2 Big Data Storage Concepts 2.1 Cluster Computing 2.2 Distribution Models 2.3 Distributed File System 2.4 Relational and Non‐Relational Databases 2.5 Scaling Up and Scaling Out Storage Conceptual Short Questions with Answers
9 3 NoSQL Database 3.1 Introduction to NoSQL 3.2 Why NoSQL 3.3 CAP Theorem 3.4 ACID 3.5 BASE 3.6 Schemaless Databases 3.7 NoSQL (Not Only SQL) 3.8 Migrating from RDBMS to NoSQL Chapter 3 Refresher Conceptual Short Questions with Answers
10 4 Processing, Management Concepts, and Cloud Computing 4.1 Data Processing 4.2 Shared Everything Architecture 4.3 Shared‐Nothing Architecture 4.4 Batch Processing 4.5 Real‐Time Data Processing 4.6 Parallel Computing 4.7 Distributed Computing 4.8 Big Data Virtualization Part II: Managing and Processing Big Data in Cloud Computing 4.9 Introduction 4.10 Cloud Computing Types 4.11 Cloud Services 4.12 Cloud Storage 4.13 Cloud Architecture Chapter 4 Refresher Conceptual Short Questions with Answers Cloud Computing Interview Questions
11 5 Driving Big Data with Hadoop Tools and Technologies 5.1 Apache Hadoop 5.2 Hadoop Storage 5.3 Hadoop Computation 5.4 Hadoop 2.0 5.5 HBASE 5.6 Apache Cassandra 5.7 SQOOP 5.8 Flume 5.9 Apache Avro 5.10 Apache Pig 5.11 Apache Mahout 5.12 Apache Oozie 5.13 Apache Hive 5.14 Hive Architecture 5.15 Hadoop Distributions Chapter 5 Refresher Conceptual Short Questions with Answers Frequently Asked Interview Questions
12 6 Big Data Analytics 6.1 Terminology of Big Data Analytics 6.2 Big Data Analytics 6.3 Data Analytics Life Cycle 6.4 Big Data Analytics Techniques 6.5 Semantic Analysis 6.6 Visual analysis 6.7 Big Data Business Intelligence 6.8 Big Data Real‐Time Analytics Processing 6.9 Enterprise Data Warehouse Conceptual Short Questions with Answers
13 7 Big Data Analytics with Machine Learning 7.1 Introduction to Machine Learning 7.2 Machine Learning Use Cases 7.3 Types of Machine Learning Chapter 7 Refresher Conceptual Short Questions with Answers
14 8 Mining Data Streams and Frequent Itemset 8.1 Itemset Mining 8.2 Association Rules 8.3 Frequent Itemset Generation 8.4 Itemset Mining Algorithms 8.5 Maximal and Closed Frequent Itemset 8.6 Mining Maximal Frequent Itemsets: the GenMax Algorithm 8.7 Mining Closed Frequent Itemsets: the Charm Algorithm 8.8 CHARM Algorithm Implementation 8.9 Data Mining Methods 8.10 Prediction 8.11 Important Terms Used in Bayesian Network 8.12 Density Based Clustering Algorithm 8.13 DBSCAN 8.14 Kernel Density Estimation 8.15 Mining Data Streams 8.16 Time Series Forecasting
15 9 Cluster Analysis 9.1 Clustering 9.2 Distance Measurement Techniques 9.3 Hierarchical Clustering 9.4 Analysis of Protein Patterns in the Human Cancer‐Associated Liver 9.5 Recognition Using Biometrics of Hands 9.6 Expectation Maximization Clustering Algorithm 9.7 Representative‐Based Clustering 9.8 Methods of Determining the Number of Clusters 9.9 Optimization Algorithm 9.10 Choosing the Number of Clusters 9.11 Bayesian Analysis of Mixtures 9.12 Fuzzy Clustering 9.13 Fuzzy C‐Means Clustering
16 10 Big Data Visualization 10.1 Big Data Visualization 10.2 Conventional Data Visualization Techniques 10.3 Tableau 10.4 Bar Chart in Tableau 10.5 Line Chart 10.6 Pie Chart 10.7 Bubble Chart 10.8 Box Plot 10.9 Tableau Use Cases 10.10 Installing R and Getting Ready 10.11 Data Structures in R 10.12 Importing Data from a File 10.13 Importing Data from a Delimited Text File 10.14 Control Structures in R 10.15 Basic Graphs in R
17 Index
18 End User License Agreement
1 Chapter 1 Table 1.1 Differences in the attributes of big data and RDBMS. Table 1.2 Data Mining vs. Big Data.
2 Chapter 2 Table 2.1 Student course registration database. Table 2.2 Popular NoSQL databases.
3 Chapter 8 Table 8.1 Market basket data. Table 8.2 Itemset in a transaction. Table 8.3 Support of each items in a transaction. Table 8.4 Market basket data. Table 8.5 Binary database. Table 8.6 Vertical database. Table 8.7 Market Basket data. Table 8.8 Database. Table 8.9 Frequency of occurrence. Table 8.10 Priority of the items. Table 8.11 Itemset in a transaction. Table 8.12 Maximal/closed frequent itemset. Table 8.13 Transaction database. Table 8.14 Frequent itemsets with minsup = 3. Table 8.15 Frequent itemsets with tidset. Table 8.16 Transaction database. Table 8.17 Frequent Itemset with minsup = 3. Table 8.18 Tidset of the frequent itemset. Table 8.19 Comparison between Traditional data mining technique and mining da...
4 Chapter 10 Table 10.1 Tableau data types.
1 Chapter 1 Figure 1.1 Evolution of Big Data. Figure 1.2 3 Vs of big data. Figure 1.3 High‐velocity data sets generated online in 60 seconds. Figure 1.4 Big data—data variety. Figure 1.5 Sources of big data. Figure 1.6 Human‐ and machine‐generated data. Figure 1.7 Structured data—employee details of an organization. Figure 1.8 Unstructured data—the result of a Google search. Figure 1.9 XML file with employee details. Figure 1.10 Big data life cycle. Figure 1.11 Data integration. Figure 1.12 Hadoop core components.
2 Chapter 2 Figure 2.1 Big data storage architecture. Figure 2.2 Cluster computing. Figure 2.3 Symmetric clusters. Figure 2.4 Asymmetric cluster. Figure 2.5 Distribution model. Figure 2.6 (a) Sharding. (b) Sharding example. Figure 2.7 Replication. Figure 2.8 Data replication. Figure 2.9 Master‐Slave model. Figure 2.10 Peer‐to‐peer model. Figure 2.11 Combination of sharding and replication. Figure 2.12 Data divided across multiple related tables. Figure 2.13 Scale‐up architecture. Figure 2.14 Scale‐out architecture.
3 Chapter 3 Figure 3.1 Properties of a system following CAP theorem. Figure 3.2 RDBMS life cycle. Figure 3.3 RDBMS vs. NoSQL databases. Figure 3.4 A key‐value store database. Figure 3.5 General representation of graph database. Figure 3.6 Neo4J Relationships with properties. Figure 3.7 Relationship graph between course and employee.
4 Chapter 4 Figure 4.1 Data processing cycle. Figure 4.2 Shared everything architecture. Figure 4.3 Symmetric multiprocessing memory. Figure 4.4 Distributed shared memory. Figure 4.5 Shared‐nothing architecture. Figure 4.6 Batch processing. Figure 4.7 Real‐time processing. Figure 4.8 Real‐time and batch computation systems example. Figure 4.9 Parallel computing. Figure 4.10 Distributed computing. Figure 4.11 System architecture before and after virtualization. Figure 4.12 Isolation. Figure 4.13 Service‐oriented architecture. Figure 4.14 Google File System architecture. Figure 4.15 Read algorithm: (a) The first three steps. (b) The last three st... Figure 4.16 Write algorithm: (a) The first three steps. (b) Steps 4 and 5. (... Figure 4.17 Cloud architecture.
5 Chapter 5 Figure 5.1 Hadoop architecture. Figure 5.2 Hadoop ecosystem. Figure 5.3 Distributed file system vs. single machine. Figure 5.4 HDFS architecture. Figure 5.5 File write. Figure 5.6 File read. Figure 5.7 MapReduce model. Figure 5.8 Combiner illustration. Figure 5.9 JobTracker and TaskTracker. Figure 5.10 Word count algorithm. Figure 5.11 Hadoop 1.0 vs Hadoop 2.0. Figure 5.12 Active NameNode and standby NameNode. Figure 5.13 Hadoop 2.0. Figure 5.14 ResourceManager. Figure 5.15 NodeManager. Figure 5.16 YARN architecture. Figure 5.17 HBase architecture. Figure 5.18 RegionServer architecture. Figure 5.19 SQOOP import and export. Figure 5.20 SQOOP 1.0 architecture. Figure 5.21 Flume architecture. Figure 5.22 Pig – internal process. Figure 5.23 Oozie workflow. Figure 5.24 Apache Hive architecture.