Seifedine Kadry - Big Data


Big Data: summary and description


Learn Big Data from the ground up with this complete and up-to-date resource from leaders in the field. In Big Data: Concepts, Technology, and Architecture, you'll learn about the creation of structured, unstructured, and semi-structured data; data storage solutions; traditional database solutions like SQL; data processing; data analytics; machine learning; and data mining. You'll also discover how specific technologies like Apache Hadoop, SQOOP, and Flume work.

Accessibly organized, Big Data includes illuminating case studies throughout the material, showing you how the included concepts have been applied in real-world settings. Some of those concepts include:

- The common challenges facing big data technology and technologists, like data heterogeneity and incompleteness, data volume and velocity, storage limitations, and privacy concerns
- Relational and non-relational databases, like RDBMS, NoSQL, and NewSQL databases
- Virtualizing big data through encapsulation, partitioning, and isolation, as well as big data server virtualization
- Apache software, including Hadoop, Cassandra, Avro, Pig, Mahout, Oozie, and Hive
- The big data analytics lifecycle, including business case evaluation, data preparation, extraction, transformation, analysis, and visualization

Perfect for data scientists, data engineers, and database managers, Big Data also belongs on the bookshelves of business intelligence analysts who are required to make decisions based on large volumes of information. Executives and managers who lead teams responsible for keeping or understanding large datasets will also benefit from this book.


Customer retention is becoming important in competitive markets, where financial institutions might cut interest rates or offer better products to attract customers. Big data solutions help financial institutions retain customers by monitoring customer activity to detect loss of interest in the institution's personalized offers, or to spot when customers like a competitor's products on social media.

Chapter 1 Refresher

1 Big Data is _________.
a. Structured
b. Semi-structured
c. Unstructured
d. All of the above
Answer: d
Explanation: Big data is a blanket term for data that are too large in size, complex in nature, may be structured, unstructured, or semi-structured, and arrive at high velocity.

2 The hardware used in big data is _________.
a. High-performance PCs
b. Low-cost commodity hardware
c. Dumb terminals
d. None of the above
Answer: b
Explanation: Big data uses low-cost commodity hardware to build cost-effective solutions.

3 What does commodity hardware in the big data world mean?
a. Very cheap hardware
b. Industry-standard hardware
c. Discarded hardware
d. Low-specification industry-grade hardware
Answer: d
Explanation: Commodity hardware is low-cost, low-performance, low-specification functional hardware with no distinctive features.

4 What does the term "velocity" in big data mean?
a. Speed of input data generation
b. Speed of individual machine processors
c. Speed of ONLY storing data
d. Speed of storing and processing data
Answer: d

5 What are the data types of big data?
a. Structured data
b. Unstructured data
c. Semi-structured data
d. All of the above
Answer: d
Explanation: Machine-generated and human-generated data can be represented by the following primitive types of big data: structured data, unstructured data, and semi-structured data.

6 JSON and XML are examples of _________.
a. Structured data
b. Unstructured data
c. Semi-structured data
d. None of the above
Answer: c
Explanation: Semi-structured data have a structure but do not fit into the relational model. Semi-structured data are organized, which makes them easier to analyze than unstructured data. JSON and XML are examples of semi-structured data.
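To make the distinction concrete, here is a minimal sketch (the records and field names are invented for illustration) of why JSON counts as semi-structured: each record is self-describing, but records need not share one fixed schema the way relational rows do.

```python
import json

# Two records with overlapping but not identical fields --
# valid JSON, yet they would not fit a single fixed relational schema.
raw = '''
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Bilal", "phones": ["555-0101", "555-0102"]}
]
'''

records = json.loads(raw)
for rec in records:
    # Each record names its own fields, so structure travels with the data.
    print(sorted(rec.keys()))
```

Because the field names ride along with every record, a parser can still navigate the data even though the two records disagree on their columns.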

7 _________ is the process that corrects errors and inconsistencies.
a. Data cleaning
b. Data integration
c. Data transformation
d. Data reduction
Answer: a
Explanation: The data-cleaning process fills in missing values, corrects errors and inconsistencies, and removes redundancy in the data to improve data quality.

8 __________ is the process of transforming data into an appropriate format that is acceptable by the big data database.
a. Data cleaning
b. Data integration
c. Data transformation
d. Data reduction
Answer: c
Explanation: Data transformation refers to transforming or consolidating the data into an appropriate format acceptable by the big data database and converting it into logical and meaningful information for data management and analysis.

9 __________ is the process of combining data from different sources to give the end users a unified data view.
a. Data cleaning
b. Data integration
c. Data transformation
d. Data reduction
Answer: b

10 __________ is the process of collecting the raw data, transmitting the data to a storage platform, and preprocessing them.
a. Data cleaning
b. Data integration
c. Data aggregation
d. Data reduction
Answer: c

Conceptual Short Questions with Answers

1 What is big data? Big data is a blanket term for data that are too large in size and complex in nature, may be structured or unstructured, and arrive at high velocity.

2 What are the drawbacks of the traditional database that led to the evolution of big data? The limitations of traditional databases that led to the emergence of big data are:
- The exponential increase in data volume, which scales in terabytes and petabytes, became a challenge for the RDBMS in handling such a massive volume of data.
- To address this issue, the RDBMS added more processors and memory units, which in turn increased the cost.
- Almost 80% of the data fetched were in semi-structured or unstructured format, which the RDBMS could not deal with.
- The RDBMS could not capture data arriving at high velocity.

3 What are the factors that explain the tremendous increase in the data volume? Multiple disparate data sources are responsible for the tremendous increase in the volume of big data. Much of the growth can be attributed to the digitization of almost anything and everything around the globe: paying e-bills, online shopping, communicating through social media, e-mail transactions in organizations, digital representation of organizational data, and so forth.

4 What are the different data types of big data? Machine-generated and human-generated data can be represented by the following primitive types of big data: structured data, unstructured data, and semi-structured data.

5 What is semi-structured data? Semi-structured data have a structure but do not fit into the relational model. They are organized, which makes them easier to analyze than unstructured data. JSON and XML are examples of semi-structured data.

6 What do the three Vs of big data mean?
- Volume: the size of the data
- Velocity: the rate at which the data are generated and processed
- Variety: the heterogeneity of the data (structured, unstructured, and semi-structured)

7 What is commodity hardware? Commodity hardware is a low‐cost, low‐performance, and low‐specification functional hardware with no distinctive features. Hadoop can run on commodity hardware and does not require any high‐end hardware or supercomputers to execute its jobs.

8 What is data aggregation? The data aggregation phase of the big data life cycle involves collecting the raw data, transmitting the data to a storage platform, and preprocessing them. Data acquisition in the big data world means acquiring high-volume data arriving at an ever-increasing pace.

9 What is data preprocessing? Data preprocessing is an important process performed on raw data to transform it into an understandable format and provide access to consistent and accurate data. The data generated from multiple sources are erroneous, incomplete, and inconsistent because of their massive volume and heterogeneous sources, and it is pointless to store useless and dirty data. Additionally, some analytical applications have a crucial requirement for quality data. Hence, for effective, efficient, and accurate data analysis, systematic data preprocessing is essential.

10 What is data integration? Data integration involves combining data from different sources to give the end users a unified data view.
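A minimal sketch of that unified view (the two "sources", the customer ids, and the field names are invented for illustration, not taken from the book): two sources that describe the same customers are joined on a shared key.

```python
# Two hypothetical sources keyed by customer id.
crm = {101: {"name": "Asha"}, 102: {"name": "Bilal"}}
billing = {101: {"balance": 250.0}, 102: {"balance": 40.5}}

# Join on the shared key so each customer gets one unified record.
unified = {
    cid: {**crm.get(cid, {}), **billing.get(cid, {})}
    for cid in crm.keys() | billing.keys()
}
print(unified[101])
```

Real integration pipelines must also reconcile conflicting values and mismatched schemas; the merge-on-key step above is only the core idea.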

11 What is data cleaning? The data‐cleaning process fills in the missing values, corrects the errors and inconsistencies, and removes redundancy in the data to improve the data quality. The larger the heterogeneity of the data sources, the higher the degree of dirtiness. Consequently, more cleaning steps may be involved.
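The three cleaning steps named above can be sketched as follows (a toy example: the sensor readings, the valid range, and the mean-fill strategy are assumptions made for illustration).

```python
from statistics import mean

# Hypothetical raw readings: a missing value (None), an impossible
# error value (-999.0), and a duplicated value (22.0 appears thrice).
readings = [21.5, None, 22.0, -999.0, 22.0, 23.0, 22.0]

def is_valid(r):
    # Treat out-of-range values as errors.
    return r is not None and -50 <= r <= 60

# 1. Fill missing/erroneous values with the mean of the valid readings.
valid = [r for r in readings if is_valid(r)]
fill = mean(valid)
filled = [r if is_valid(r) else fill for r in readings]

# 2. Remove redundancy while preserving order.
deduped = list(dict.fromkeys(filled))
print(deduped)
```

The heavier the heterogeneity of the sources, the more such rules (range checks, fill strategies, deduplication keys) a real pipeline needs.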

12 What is data reduction? Data processing on massive data volume may take a long time, making data analysis either infeasible or impractical. Data reduction is the concept of reducing the volume of data or reducing the dimension of the data, that is, the number of attributes. Data reduction techniques are adopted to analyze the data in reduced format without losing the integrity of the actual data and yet yield quality outputs.
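Random sampling is one simple volume-reduction technique among several (others reduce dimensionality, i.e. the number of attributes). As a toy sketch with a synthetic dataset:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# A hypothetical one-million-row dataset reduced to a 1% sample.
data = range(1_000_000)
sample = random.sample(data, k=10_000)

# Analysis now runs on 1% of the volume; a uniform sample is drawn
# without replacement, so no row is counted twice.
print(len(sample))
```

The reduced set is analyzed in place of the full data, trading a small, quantifiable loss of precision for a large drop in processing time.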

13 What is data transformation? Data transformation refers to transforming or consolidating the data into an appropriate format that is acceptable by the big data database and converting them into logical and meaningful information for data management and analysis.
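A minimal sketch of such a transformation (the CSV export, its columns, and the normalization rules are invented for illustration): raw string-typed records are consolidated into typed, consistently formatted ones ready for loading.

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, string-typed amounts.
raw = "id,country,amount\n1,us,10.50\n2,DE,7\n3,us,3.25\n"

rows = csv.DictReader(io.StringIO(raw))

# Transform: cast ids and amounts to numbers, normalize country codes.
transformed = [
    {"id": int(r["id"]),
     "country": r["country"].upper(),
     "amount": float(r["amount"])}
    for r in rows
]
print(transformed[0])
```

In a real pipeline this step sits between extraction and loading, producing records in the one canonical format the downstream store expects.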

Frequently Asked Interview Questions

