Maria Cristina Mariani - Data Science in Theory and Practice


Data Science in Theory and Practice: summary, description, and annotation


DATA SCIENCE IN THEORY AND PRACTICE delivers a comprehensive treatment of the mathematical and statistical models useful for analyzing data sets arising in various disciplines, like banking, finance, health care, bioinformatics, security, education, and social services. Written in five parts, the book examines some of the most commonly used and fundamental mathematical and statistical concepts that form the basis of data science. The authors go on to analyze various data transformation techniques useful for extracting information from raw data, long memory behavior, and predictive modeling. The book offers readers a multitude of topics all relevant to the analysis of complex data sets. Along with a robust exploration of the theory underpinning data science, it contains numerous applications to specific and practical problems. The book also provides examples of code algorithms in R and Python, together with pseudo-algorithms for porting the code to any other language. Ideal for students and practitioners without a strong background in data science, the book also covers topics like:

Analyses of foundational theoretical subjects, including the history of data science, matrix algebra and random vectors, and multivariate analysis

A comprehensive examination of time series forecasting, including the different components of time series and transformations to achieve stationarity

Introductions to both the R and Python programming languages, including basic data types and sample manipulations for both languages

An exploration of algorithms, including how to write one and how to perform an asymptotic analysis

A comprehensive discussion of several techniques for analyzing and predicting complex data sets

Perfect for advanced undergraduate and graduate students in Data Science, Business Analytics, and Statistics programs, Data Science in Theory and Practice will also earn a place in the libraries of practicing data scientists, data and business analysts, and statisticians in the private sector, government, and academia.

Data Science in Theory and Practice: online preview excerpt



1.3 Who is a Data Scientist?

The term “data scientist” was coined as recently as 2008, when companies realized the need for data professionals skilled in organizing and analyzing massive amounts of data. Data scientists are quantitative and analytical data experts who use their skills in both technology and social science to find trends and manage the data around them. With the growth of big data integration in business, they have moved to the forefront of the data revolution. They are part mathematician, part statistician, part computer programmer, and part analyst, equipped with a diverse and wide‐ranging skill set that balances knowledge of several programming languages with advanced experience in statistical learning and data visualization.

There is no definitive job description for the data scientist role. However, we outline here some of the things data scientists do:

Collecting and recording large amounts of unruly data and transforming it into a more usable format.

Solving business‐related problems using data‐driven techniques.

Working with a variety of programming languages, including SAS, Minitab, R, and Python.

Having a strong background in mathematics and statistics, including statistical tests and distributions.

Staying on top of quantitative and analytical techniques such as machine learning, deep learning, and text analytics.

Communicating and collaborating with both IT and business.

Looking for order and patterns in data, as well as spotting trends that enable businesses to make informed decisions.

Some of the useful tools that every data scientist or practitioner needs are outlined below:

Data preparation: The process of cleaning and transforming raw data into suitable formats prior to processing and analysis.

Data visualization: The presentation of data in a pictorial or graphical format so it can be easily analyzed.

Statistical learning or machine learning: A branch of artificial intelligence based on mathematical algorithms and automation. Artificial intelligence (AI) refers to the process of building smart machines capable of performing tasks that typically require human intelligence. Such machines are designed to make decisions, often using real-time data. Real-time data is information that is passed along to the end user as soon as it is gathered.

Deep learning: An area of statistical learning research that uses data to model complex abstractions.

Pattern recognition: Technology that recognizes patterns in data (often used interchangeably with machine learning).

Text analytics: The process of examining unstructured data and drawing meaning out of written communication.
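As a small illustration of the data preparation item above, the following Python sketch cleans a handful of raw records. It is only a sketch under stated assumptions: the records, the field names, and the choice to fill missing values with the median are all hypothetical and are not taken from the book.

```python
from statistics import median

# Hypothetical raw records: inconsistent casing, stray whitespace,
# missing or unparseable values, and a duplicate customer.
raw_records = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "BOB", "age": "", "spend": "80"},
    {"customer": "alice", "age": "34", "spend": "120.50"},
    {"customer": "Carol", "age": "29", "spend": "n/a"},
]


def to_number(text):
    """Return text as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(records):
    """Normalize names, parse numbers, drop duplicates, fill missing values."""
    cleaned, seen = [], set()
    for row in records:
        name = row["customer"].strip().title()
        if name in seen:  # keep only the first record per customer
            continue
        seen.add(name)
        cleaned.append({
            "customer": name,
            "age": to_number(row["age"]),
            "spend": to_number(row["spend"]),
        })
    # Assumed imputation rule: replace a missing value with the median
    # of the values that were observed for that field.
    for field in ("age", "spend"):
        observed = [r[field] for r in cleaned if r[field] is not None]
        fill = median(observed)
        for r in cleaned:
            if r[field] is None:
                r[field] = fill
    return cleaned


prepared = prepare(raw_records)
for record in prepared:
    print(record)
```

Median imputation is only one of many possible choices; in practice the appropriate rule depends on why the values are missing.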

We will discuss all of the above tools in detail in this book. There are several scientific and programming skills that every data scientist should have. Data scientists must be able to use key technical tools, including R, Python, SAS, SQL, Tableau, and several others. Because technology keeps evolving, data scientists must continually learn new and emerging techniques to stay on top of their game. We will discuss R and Python programming in Chapters 5 and 6.

1.4 Big Data

Big data is a term applied to ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by classical data‐processing tools. In particular, it refers to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency. Sources of big data include sensors, the stock market, devices, video/audio, networks, log files, transactional applications, the web, and social media, and much of this data is generated in real time and at a very large scale.

In recent times, the term “big data” (both stored and real‐time) tends to refer to the use of user behavior analytics (UBA), predictive analytics, or certain other advanced data analytics methods that extract value from data. UBA solutions look at patterns of human behavior and then apply algorithms and statistical analysis to detect meaningful anomalies from those patterns, that is, anomalies that indicate potential threats. Examples include the detection of hackers, insider threats, targeted attacks, financial fraud, and several others.
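The idea of flagging behavior that departs from an established pattern can be illustrated with a simple z-score check. This is a minimal sketch, not how any particular UBA product works: the login counts, the function name `is_anomalous`, and the threshold of three standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Flag value when it lies more than `threshold` sample standard
    deviations away from the mean of the user's past behavior."""
    baseline = mean(history)
    spread = stdev(history)  # requires at least two past observations
    if spread == 0:
        # Perfectly constant history: any change counts as a deviation.
        return value != baseline
    return abs(value - baseline) / spread > threshold


# Hypothetical number of logins per day for one user over a week.
daily_logins = [12, 15, 11, 14, 13, 12, 14]

print(is_anomalous(daily_logins, 14))  # a typical day
print(is_anomalous(daily_logins, 60))  # a burst of logins
```

Real systems typically combine many such signals and use more robust statistics, since a single extreme value in the history can inflate the standard deviation and hide later anomalies.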

Predictive analytics is the process of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Predictive analytics does not tell you what will happen in the future; rather, it forecasts what might happen, with some degree of certainty. Predictive analytics goes hand in hand with big data: businesses and organizations collect large amounts of real‐time customer data, and predictive analytics uses this historical data, combined with customer insight, to forecast future events. Predictive analytics helps organizations use big data to move from a historical view to a forward‐looking perspective of the customer. In this book, we will discuss several methods for analyzing big data.
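To make the notion of a forecast "with some degree of certainty" concrete, here is a minimal sketch that fits a straight-line trend to historical values by ordinary least squares and extrapolates one step ahead. The monthly sales figures and the function name `fit_trend` are hypothetical, and the residual standard error is used only as a rough indication of uncertainty, not as a full prediction interval.

```python
from math import sqrt


def fit_trend(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns the slope, the intercept, and the residual standard error.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Two parameters were estimated, hence n - 2 degrees of freedom.
    std_error = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return slope, intercept, std_error


# Hypothetical historical data: sales for months 1 through 5.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 15, 15, 18]

slope, intercept, std_error = fit_trend(months, sales)
forecast = intercept + slope * 6  # extrapolate to month 6
print(f"forecast for month 6: {forecast:.1f} (residual std. error {std_error:.1f})")
```

A linear trend is the simplest possible model; it is used here only because it keeps the link between historical data and a forward-looking estimate easy to follow.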

1.4.1 Characteristics of Big Data

Big data has one or more of the following characteristics: high volume, high velocity, high variety, and high veracity. That is, the data sets are characterized by huge amounts (volume) of frequently updated data (velocity) in various types, such as numeric, textual, audio, images, and videos (variety), with high quality (veracity). We briefly discuss each below.

Volume: Volume describes the quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.

Velocity: Velocity describes the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available both stored and in real time. Compared to small data, big data is produced more continually (at intervals of nanoseconds, seconds, minutes, hours, etc.). Two types of velocity related to big data are the frequency of generation and the frequency of handling, recording, and reporting.

Variety: Variety describes the types and formats of the data. Knowing these helps the people who analyze the data to use the resulting insight effectively. Big data draws from different formats and completes missing pieces through data fusion. Data fusion is a term used to describe the technique of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.

Veracity: Veracity describes the quality of data and the data value. The quality of the data obtained can greatly affect the accuracy of the analyzed results.

In the next subsection we will discuss some big data architectures. A comprehensive study of this topic can be found in the application architecture guide of the Microsoft technical documentation.
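The data fusion idea mentioned under variety, where several sources are integrated so that one source fills in what another is missing, can be sketched as follows. This is an illustrative assumption rather than a method from the book: the two sources, their field names, and the rule that earlier sources take priority are all hypothetical.

```python
def fuse(*sources):
    """Merge records from several sources keyed by a shared identifier.

    Earlier sources take priority; a later source only fills in fields
    that are still missing (absent or None).
    """
    fused = {}
    for source in sources:
        for key, record in source.items():
            target = fused.setdefault(key, {})
            for field, value in record.items():
                if target.get(field) is None:
                    target[field] = value
    return fused


# Hypothetical customer data held in two separate systems.
crm = {
    "c1": {"email": "a@example.com", "city": None},
    "c2": {"email": None, "city": "Lima"},
}
web_logs = {
    "c1": {"city": "Quito", "last_visit": "2021-03-01"},
    "c2": {"email": "b@example.com", "city": "Cusco"},
    "c3": {"last_visit": "2021-03-02"},
}

customers = fuse(crm, web_logs)
for customer_id, record in sorted(customers.items()):
    print(customer_id, record)
```

Here the fused record for `c2` keeps the city from the first source and takes the email from the second, so the combined view is more complete than either source alone. Resolving genuinely conflicting values usually needs a more careful rule than simple source priority.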

1.4.2 Big Data Architectures

Big data architectures are designed to handle the ingestion, processing, and analysis of data that is too large or complex for classical data-processing application tools. Some popular big data architectures are the Lambda architecture, the Kappa architecture, and the Internet of Things (IoT). We refer the reader to the Microsoft technical documentation on big data architectures for a detailed discussion of the different architectures. Almost all big data architectures include all or some of the following components:

Data sources: All big data solutions begin with one or more data sources. Some common data sources include the following: application data stores such as relational databases, static files produced by applications such as web server log files, and real‐time data sources such as Internet of Things (IoT) devices.
