Maria Cristina Mariani - Data Science in Theory and Practice


Data Science in Theory and Practice: description and abstract


DATA SCIENCE IN THEORY AND PRACTICE delivers a comprehensive treatment of the mathematical and statistical models useful for analyzing data sets arising in various disciplines, such as banking, finance, health care, bioinformatics, security, education, and social services. Written in five parts, the book examines some of the most commonly used and fundamental mathematical and statistical concepts that form the basis of data science. The authors go on to analyze various data transformation techniques useful for extracting information from raw data, long memory behavior, and predictive modeling. The book covers a multitude of topics relevant to the analysis of complex data sets. Along with a robust exploration of the theory underpinning data science, it contains numerous applications to specific, practical problems. The book also provides code examples in R and Python, along with pseudo-algorithms for porting the code to any other language. Ideal for students and practitioners without a strong background in data science, the book also covers topics such as:

- Analyses of foundational theoretical subjects, including the history of data science, matrix algebra and random vectors, and multivariate analysis
- A comprehensive examination of time series forecasting, including the different components of time series and transformations to achieve stationarity
- Introductions to both the R and Python programming languages, including basic data types and sample manipulations for both languages
- An exploration of algorithms, including how to write one and how to perform an asymptotic analysis
- A comprehensive discussion of several techniques for analyzing and predicting complex data sets

Perfect for advanced undergraduate and graduate students in Data Science, Business Analytics, and Statistics programs, Data Science in Theory and Practice will also earn a place in the libraries of practicing data scientists, data and business analysts, and statisticians in the private sector, government, and academia.

Data Science in Theory and Practice: excerpt


Data storage: Data for batch processing operations is typically stored in a distributed file store that can hold high volumes of large files in various formats. This kind of store is often called a data lake. A data lake is a storage repository that allows one to store structured and unstructured data at any scale until it is needed.

Batch processing: Because the data sets are so large, a big data solution often must process data files using long‐running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Typically, these jobs read source files, process them, and write the output to new files. Options include running U‐SQL jobs or using Java, Scala, R, or Python programs. U‐SQL is a data processing language that merges the benefits of SQL with the expressive power of one's own code.
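The read, process, and write cycle of a batch job can be illustrated at toy scale. The sketch below is a single-process stand-in for what a distributed engine would do, assuming hypothetical CSV source files with `region` and `amount` columns; the function names and the filtering rule are illustrative only.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable


def aggregate_rows(rows: Iterable[dict], min_amount: float = 0.0) -> Dict[str, float]:
    """Filter out small records, then sum the 'amount' column per 'region'."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        amount = float(row["amount"])
        if amount < min_amount:          # filter step
            continue
        totals[row["region"]] += amount  # aggregate step
    return dict(totals)


def run_batch_job(source_dir: Path, output_file: Path, min_amount: float = 0.0) -> Dict[str, float]:
    """Read every CSV source file, process the rows, and write the result to a new file."""
    def all_rows():
        for path in sorted(source_dir.glob("*.csv")):
            with path.open(newline="") as f:
                yield from csv.DictReader(f)

    totals = aggregate_rows(all_rows(), min_amount)
    with output_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "total_amount"])
        for region in sorted(totals):
            writer.writerow([region, totals[region]])
    return totals


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        (src / "part-0.csv").write_text("region,amount\neast,10\nwest,3\n")
        (src / "part-1.csv").write_text("region,amount\neast,5\nwest,20\n")
        print(run_batch_job(src, src / "summary.txt", min_amount=4))
        # prints {'east': 15.0, 'west': 20.0}
```

Keeping the filter/aggregate logic in `aggregate_rows`, separate from file I/O, mirrors how real batch frameworks separate the transformation from the storage layer, and makes the logic easy to test.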

Real‐time message ingestion: If the solution includes real‐time sources, the architecture must include a way to capture and store real‐time messages for stream processing. This might be a simple data store, where incoming messages are stored into a folder for processing. However, many solutions need a message ingestion store to act as a buffer for messages and to support scale‐out processing, reliable delivery, and other message queuing semantics.
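The buffering role of a message ingestion store can be sketched with a bounded in-memory queue. This is an assumption-laden toy, not a real broker: it shows first-in, first-out buffering and back-pressure (producers are refused when the buffer stays full), but it does not persist messages, so it gives none of the reliable-delivery guarantees mentioned above. The class and function names are hypothetical.

```python
import queue
import threading


class IngestionBuffer:
    """A bounded in-memory buffer between message producers and a stream processor.

    Illustrates buffering and back-pressure only; messages are not persisted."""

    def __init__(self, capacity: int = 1000):
        self._queue = queue.Queue(maxsize=capacity)  # put() blocks while full

    def publish(self, message, timeout: float = 1.0) -> bool:
        """Enqueue a message; return False if the buffer stayed full for `timeout` seconds."""
        try:
            self._queue.put(message, timeout=timeout)
            return True
        except queue.Full:
            return False

    def consume(self, timeout: float = 1.0):
        """Dequeue the oldest message (FIFO), or return None if none arrives in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


def produce(buffer: IngestionBuffer, n: int) -> None:
    """Simulated real-time source emitting n messages."""
    for i in range(n):
        buffer.publish({"id": i})


if __name__ == "__main__":
    buf = IngestionBuffer(capacity=2)  # tiny capacity to force back-pressure
    producer = threading.Thread(target=produce, args=(buf, 5))
    producer.start()
    received = [buf.consume() for _ in range(5)]
    producer.join()
    print(received)
```

Because the producer runs in its own thread, it simply waits whenever the two-slot buffer is full and resumes as the consumer drains it; the consumer still sees the messages in arrival order.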

Stream processing: After obtaining real‐time messages, the solution must process them by filtering, aggregating, and preparing the data for analysis. The processed stream data is then written to an output sink.
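One common way to filter and aggregate a stream is a tumbling (fixed, non-overlapping) time window. The sketch below assumes hypothetical `(timestamp, key, value)` events that arrive in timestamp order, and models the output sink as any callable; real stream processors also handle late and out-of-order events, which this toy ignores.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, key, value)
Event = Tuple[int, str, float]
Sink = Callable[[int, Dict[str, float]], None]


def process_stream(events: Iterable[Event], window_seconds: int, sink: Sink,
                   min_value: float = 0.0) -> None:
    """Filter events, sum values per key in tumbling windows, and write each
    completed window to `sink`. Assumes events arrive in timestamp order."""
    current_window = None
    totals: Dict[str, float] = defaultdict(float)
    for ts, key, value in events:
        if value < min_value:                                   # filter
            continue
        window_start = (ts // window_seconds) * window_seconds  # window the event falls in
        if current_window is not None and window_start != current_window:
            sink(current_window, dict(totals))                  # window closed: emit it
            totals.clear()
        current_window = window_start
        totals[key] += value                                    # aggregate
    if current_window is not None:
        sink(current_window, dict(totals))                      # flush the last window


if __name__ == "__main__":
    stream = [(0, "a", 1.0), (5, "b", 2.0), (7, "a", 3.0), (12, "a", 4.0), (15, "b", -1.0)]
    process_stream(stream, window_seconds=10,
                   sink=lambda start, agg: print(f"window {start}: {agg}"))
```

With this sample stream the sink receives `{'a': 4.0, 'b': 2.0}` for the window starting at 0 and `{'a': 4.0}` for the window starting at 10; the negative event is filtered out.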

Analytical data store: Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using analytical tools. The analytical data store used to serve these queries can be a Kimball‐style relational data warehouse, as seen in most classic business intelligence (BI) solutions. Alternatively, the data could be presented through a low‐latency NoSQL technology, such as HBase, or an interactive Hive database that provides a metadata abstraction over data files in the distributed data store.
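A Kimball‐style warehouse is typically organized as a star schema: a central fact table of measurements joined to descriptive dimension tables. As a small sketch, the in-memory SQLite database below stands in for a real warehouse; the table names, columns, and rows are made up for illustration.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny star schema: one fact table keyed to one dimension table."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            category    TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            sale_date   TEXT NOT NULL,
            amount      REAL NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "books"), (2, "music"), (3, "books")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, "2021-01-01", 12.0), (2, "2021-01-01", 8.0),
                      (3, "2021-01-02", 5.0), (2, "2021-01-02", 4.0)])


def sales_by_category(conn: sqlite3.Connection) -> list:
    """A typical analytical query: join facts to a dimension, then aggregate."""
    return conn.execute(
        """
        SELECT d.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS d ON f.product_key = d.product_key
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    print(sales_by_category(conn))  # [('books', 17.0), ('music', 12.0)]
```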

Analysis and reporting: The goal of most big data solutions is to provide insights into the data through analysis and reporting. Users can analyze the data using mathematical and statistical models as well as data visualization techniques. Analysis and reporting can also take the form of interactive data exploration by data scientists or data analysts.

Orchestration: Many big data solutions consist of repeated data processing operations, encapsulated in workflows, that transform source data, move data between multiple sources and sinks, load the processed data into an analytical data store, or move the results to a report or dashboard.
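At its core, orchestrating a workflow means running each step only after the steps it depends on have finished, which is a topological ordering of a dependency graph. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical four-step pipeline; production orchestrators add scheduling, retries, and monitoring on top of this idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Iterable, List


def run_workflow(steps: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Iterable[str]]) -> List[str]:
    """Run every step after all of its upstream steps; return the execution order.

    Raises graphlib.CycleError if the dependencies contain a cycle."""
    graph = {name: set(depends_on.get(name, ())) for name in steps}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        steps[name]()
    return order


if __name__ == "__main__":
    # Hypothetical pipeline: ingest -> transform -> load -> report
    steps = {
        "ingest": lambda: print("copy source files into the data lake"),
        "transform": lambda: print("run the batch job"),
        "load": lambda: print("load results into the analytical store"),
        "report": lambda: print("refresh the dashboard"),
    }
    run_workflow(steps, {"transform": ["ingest"], "load": ["transform"], "report": ["load"]})
```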

2 Matrix Algebra and Random Vectors

2.1 Introduction

The matrix algebra and random vectors presented in this chapter will enable us to state statistical models precisely. We will begin by discussing some basic concepts that will be essential throughout this chapter. For more details on matrix algebra, see Axler (2015).

2.2 Some Basics of Matrix Algebra

2.2.1 Vectors

Definition 2.1 (Vector) A vector $\mathbf{x}$ is an array of real numbers $x_1, x_2, \ldots, x_n$, and it is written as:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

Definition 2.2 (Scalar multiplication of vectors) The product of a scalar $c$ and a vector $\mathbf{x}$ is the vector obtained by multiplying each entry in the vector by the scalar:

$$c\mathbf{x} = \begin{bmatrix} c x_1 \\ c x_2 \\ \vdots \\ c x_n \end{bmatrix}.$$

Definition 2.3 (Vector addition) The sum of two vectors of the same size is the vector obtained by adding corresponding entries in the vectors:

$$\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix},$$

so that $\mathbf{x} + \mathbf{y}$ is the vector with the $i$th element $x_i + y_i$.
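Definitions 2.2 and 2.3 translate directly into code. A minimal sketch using plain Python lists (no NumPy, to stay dependency-free); the function names are illustrative and not taken from the book.

```python
from typing import List

Vector = List[float]


def scalar_multiply(c: float, x: Vector) -> Vector:
    """Definition 2.2: multiply every entry of x by the scalar c."""
    return [c * xi for xi in x]


def vector_add(x: Vector, y: Vector) -> Vector:
    """Definition 2.3: add corresponding entries; both vectors must have the same size."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same size")
    return [xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(scalar_multiply(2, x))  # [2.0, 4.0, 6.0]
print(vector_add(x, y))       # [5.0, 7.0, 9.0]
```

With NumPy arrays the same two operations are written simply as `c * x` and `x + y`.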

2.2.2 Matrices

Definition 2.4 (Matrix) Let $m$ and $n$ denote positive integers. An $m$-by-$n$ matrix is a rectangular array of real numbers with $m$ rows and $n$ columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

The notation $a_{ij}$ denotes the entry in row $i$, column $j$ of $A$. In other words, the first index refers to the row number and the second index refers to the column number.

Example 2.1

[Example 2.1: the example matrix and the result stated for it are not recoverable from the source.]
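In code, an $m$-by-$n$ matrix is often stored as a list of $m$ rows, each of length $n$. The sketch below, using a hypothetical matrix rather than the one from Example 2.1, shows how the 1-based row and column indices of Definition 2.4 map onto Python's 0-based indexing.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shape(A: Matrix) -> Tuple[int, int]:
    """Return (m, n): the number of rows and columns of A."""
    m = len(A)
    n = len(A[0]) if m else 0
    if any(len(row) != n for row in A):
        raise ValueError("all rows must have the same number of columns")
    return m, n


def entry(A: Matrix, i: int, j: int) -> float:
    """Return a_ij using the 1-based (row, column) convention of Definition 2.4."""
    m, n = shape(A)
    if not (1 <= i <= m and 1 <= j <= n):
        raise IndexError(f"a_{i}{j} is outside a {m}-by-{n} matrix")
    return A[i - 1][j - 1]  # Python lists are 0-based


A = [[1, 2, 3],
     [4, 5, 6]]          # a 2-by-3 matrix
print(shape(A))          # (2, 3)
print(entry(A, 2, 1))    # 4: row 2, column 1
```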

Definition 2.5 (Transpose of a matrix) The transpose operation $A^T$ of a matrix $A$ changes the columns into rows, i.e. in matrix notation $(A^T)_{ij} = a_{ji}$, where the superscript "$T$" denotes transpose.
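Continuing with matrices stored as lists of rows, the transpose of Definition 2.5 is a one-liner in Python. A minimal sketch; the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def transpose(A: Matrix) -> Matrix:
    """Definition 2.5: entry (i, j) of A^T is entry (j, i) of A.

    zip(*A) walks down the columns of A, so each column becomes a row."""
    return [list(column) for column in zip(*A)]


A = [[1, 2, 3],
     [4, 5, 6]]                      # 2-by-3
print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]  (3-by-2)
print(transpose(transpose(A)) == A)  # True: transposing twice returns A
```

The second check illustrates the identity $(A^T)^T = A$, which follows directly from $(A^T)_{ij} = a_{ji}$.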
