Seifedine Kadry - Big Data


Learn Big Data from the ground up with this complete and up-to-date resource from leaders in the field.

In Big Data: Concepts, Technology, and Architecture, you’ll learn about the creation of structured, unstructured, and semi-structured data, data storage solutions, traditional database solutions like SQL, data processing, data analytics, machine learning, and data mining. You’ll also discover how specific technologies like Apache Hadoop, SQOOP, and Flume work.

Accessibly organized, Big Data includes illuminating case studies throughout the material, showing you how the included concepts have been applied in real-world settings. Some of those concepts include:

The common challenges facing big data technology and technologists, like data heterogeneity and incompleteness, data volume and velocity, storage limitations, and privacy concerns

Relational and non-relational databases, like RDBMS, NoSQL, and NewSQL databases

Virtualizing Big Data through encapsulation, partitioning, and isolating, as well as big data server virtualization

Apache software, including Hadoop, Cassandra, Avro, Pig, Mahout, Oozie, and Hive

The Big Data analytics lifecycle, including business case evaluation, data preparation, extraction, transformation, analysis, and visualization

Perfect for data scientists, data engineers, and database managers, Big Data also belongs on the bookshelves of business intelligence analysts who are required to make decisions based on large volumes of information. Executives and managers who lead teams responsible for keeping or understanding large datasets will also benefit from this book.

1 Introduction to the World of Big Data

CHAPTER OBJECTIVE

This chapter introduces big data and defines what the term actually means. It explains the limitations of the traditional database that led to the evolution of big data and gives insight into key big data concepts. A comparison between big data and the traditional database gives a clear picture of the drawbacks of the traditional database and the advantages of big data. The three Vs of big data (volume, velocity, and variety) that distinguish it from the traditional database are explained. With the evolution of big data, we are no longer limited to structured data. The chapter explains the different types of human- and machine-generated data that big data can handle, namely structured, semi-structured, and unstructured data, and describes the various sources contributing to this massive volume of data. It then walks through the stages of the big data life cycle, from data generation, acquisition, preprocessing, integration, cleaning, transformation, and analysis to visualization for making business decisions. Finally, it sheds light on the challenges that big data poses due to its heterogeneity, volume, velocity, and more.

1.1 Understanding Big Data

With the rapid growth in the number of Internet users, the volume of data being generated is growing exponentially. The data come from the millions of messages we send via WhatsApp, Facebook, or Twitter, from the trillions of photos taken, and from the hours and hours of video uploaded to YouTube every single minute. According to a recent survey, 2.5 quintillion (2 500 000 000 000 000 000, or 2.5 × 10^18) bytes of data are generated every day. This enormous amount of data is referred to as “big data.” Big data does not only mean that the data sets are large; it is a blanket term for data that are too large in size, complex in nature, structured or unstructured, and arriving at high velocity. Of the data available today, 80 percent has been generated in the last few years. The growth of big data is fueled by the fact that more data are generated in every corner of the world, all of which need to be captured.

Capturing this massive volume of data yields only meager value unless its IT value is transformed into business value. Managing and analyzing data have always been beneficial to organizations; converting the data into valuable business insights, on the other hand, has always been the greatest challenge. Data scientists struggled to find pragmatic techniques to analyze the captured data. The data have to be managed at an appropriate speed and time to derive valuable insight from them. These data are so complex that it became difficult to process them using traditional database management systems, which triggered the evolution of the big data era. Additionally, there were constraints on the amount of data that traditional databases could handle. As the size of the data increased, either performance decreased and latency increased, or it became expensive to add memory units. These limitations have been overcome with the evolution of big data technologies that let us capture, store, process, and analyze data in a distributed environment. Examples of big data technologies are Hadoop, a framework for big data processing as a whole; the Hadoop Distributed File System (HDFS) for distributed cluster storage; and MapReduce for processing.
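To make the idea of MapReduce-style processing more concrete, the following is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It is an illustration only and does not use the Hadoop API: the function names and the sample input splits are assumptions made for this example, and on a real cluster each split would typically be a block stored on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input splits, standing in for blocks held on separate nodes.
splits = ["big data needs big storage", "data arrives fast"]
counts = word_count(splits)
print(counts)
```

Because each call to map_phase depends only on its own split, the map work could in principle run on many machines at once, which is the property that lets this style of processing scale out in a distributed environment.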

1.2 Evolution of Big Data

The first documented appearance of the term big data was in a 1997 paper by NASA scientists describing the problems they faced in visualizing large data sets, which posed a captivating challenge for data scientists. The data sets were large enough to tax memory resources, and this problem was termed big data. Big data as a broader concept was first put forward by the noted consultancy McKinsey. The three dimensions of big data, namely volume, velocity, and variety, were defined by analyst Doug Laney. The processing life cycle of big data can be categorized into acquisition, preprocessing, storage and management, privacy and security, analysis, and visualization.

The broader term big data encompasses everything from web data, such as clickstream data, to patient health data, genomic data from biological research, and so forth.

Figure 1.1 shows the evolution of big data. The growth of data over the years is massive: from just 600 MB in the 1950s to 100 petabytes by 2010, which is equal to 100 000 000 000 MB.
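The conversion quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where 1 MB is 10^6 bytes and 1 PB is 10^15 bytes, since that is the convention under which 100 petabytes equals 100 000 000 000 MB; the helper name convert is introduced here only for illustration.

```python
# Decimal (SI) byte units; an assumption that matches the figures in the text.
MB = 10**6
PB = 10**15


def convert(amount, from_unit, to_unit):
    """Convert a whole-number amount from one byte unit to another."""
    return amount * from_unit // to_unit


pb_in_mb = convert(100, PB, MB)
print(pb_in_mb)  # 100000000000

# Growth from 600 MB to 100 PB, expressed as a multiplication factor.
growth_factor = pb_in_mb / 600
print(f"{growth_factor:.2e}")
```

Under these assumptions the data volume grew by a factor of roughly 1.7 × 10^8 between the two points mentioned.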


Figure 1.1 Evolution of Big Data.

1.3 Failure of Traditional Database in Handling Big Data

The relational database management system (RDBMS) was until recently the most prevalent medium for storing the data generated by organizations. A large number of vendors provide database systems. These RDBMS were devised to store data that were beyond the storage capacity of a single computer. The inception of a new technology is always due to limitations in older technologies and the need to overcome them. Below are the limitations of the traditional database in handling big data.

The exponential increase in data volume, which now scales to terabytes and petabytes, has made handling such massive volumes a challenge for the RDBMS.

To address this issue, RDBMS installations added more processors and memory units, which in turn increased the cost.

Almost 80% of the data fetched were in semi-structured or unstructured formats, which the RDBMS could not deal with.

RDBMS could not capture the data coming in at high velocity.

Table 1.1 shows the differences in the attributes of RDBMS and big data.
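One of the limitations listed above, the difficulty of fitting semi-structured data into a fixed schema, can be sketched in a few lines. This is an illustration under stated assumptions, not a benchmark or a description of any particular product: Python's built-in sqlite3 module stands in for an RDBMS with a static schema, a plain list of JSON strings stands in for a schema-on-read document store, and the table name, field names, and records are all hypothetical.

```python
import json
import sqlite3

# Hypothetical records: the second one carries a field the table lacks.
records = [
    {"id": 1, "name": "sensor-a"},
    {"id": 2, "name": "sensor-b", "location": "roof"},
]

# Static schema: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT)")

rejected = []
for rec in records:
    columns = ", ".join(rec)
    placeholders = ", ".join("?" for _ in rec)
    try:
        conn.execute(
            f"INSERT INTO devices ({columns}) VALUES ({placeholders})",
            tuple(rec.values()),
        )
    except sqlite3.OperationalError as err:
        # The unknown "location" column makes the insert fail.
        rejected.append((rec["id"], str(err)))

stored_rows = conn.execute("SELECT COUNT(*) FROM devices").fetchone()[0]

# Dynamic schema: each document is stored as-is and interpreted on read.
document_store = [json.dumps(rec) for rec in records]
locations = [json.loads(doc).get("location") for doc in document_store]

print(stored_rows, rejected)
print(locations)
```

In the relational case the schema would have to be altered before the second record could be stored, whereas the document list accepts both records and leaves the handling of missing fields to the reader of the data.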

1.3.1 Data Mining vs. Big Data

Table 1.2 shows a comparison between data mining and big data.

Table 1.1 Differences in the attributes of big data and RDBMS.

| Attributes    | RDBMS                  | Big data                         |
|---------------|------------------------|----------------------------------|
| Data volume   | gigabytes to terabytes | petabytes to zettabytes          |
| Organization  | centralized            | distributed                      |
| Data type     | structured             | unstructured and semi-structured |
| Hardware type | high-end model         | commodity hardware               |
| Updates       | read/write many times  | write once, read many times      |
| Schema        | static                 | dynamic                          |

Table 1.2 Data Mining vs. Big Data.

| S. No. | Data mining | Big data |
|--------|-------------|----------|
| 1 | Data mining is the process of discovering the underlying knowledge from the data sets. | Big data refers to a massive volume of data characterized by volume, velocity, and variety. |
| 2 | Structured data retrieved from spreadsheets, relational databases, etc. | Structured, unstructured, or semi-structured data retrieved from non-relational databases, such as NoSQL. |
| 3 | Data mining is capable of processing large data sets, but the data processing costs are high. | Big data tools and technologies are capable of storing and processing large volumes of data at a comparatively lower cost. |
| 4 | Data mining can process only data sets that range from gigabytes to terabytes. | Big data technology is capable of storing and processing data that range from petabytes to zettabytes. |

1.4 3 Vs of Big Data
