Maria Cristina Mariani - Data Science in Theory and Practice

DATA SCIENCE IN THEORY AND PRACTICE delivers a comprehensive treatment of the mathematical and statistical models useful for analyzing data sets arising in disciplines such as banking, finance, health care, bioinformatics, security, education, and social services. Written in five parts, the book examines some of the most commonly used and fundamental mathematical and statistical concepts that form the basis of data science. The authors go on to analyze various data transformation techniques useful for extracting information from raw data, long memory behavior, and predictive modeling. The book offers readers a multitude of topics, all relevant to the analysis of complex data sets. Along with a robust exploration of the theory underpinning data science, it contains numerous applications to specific, practical problems. The book also provides code examples in R and Python, along with pseudo‐algorithms for porting the code to any other language. Ideal for students and practitioners without a strong background in data science, the book covers the following topics: analyses of foundational theoretical subjects, including the history of data science, matrix algebra and random vectors, and multivariate analysis; a comprehensive examination of time series forecasting, including the different components of time series and transformations to achieve stationarity; introductions to both the R and Python programming languages, including basic data types and sample manipulations for both languages; an exploration of algorithms, including how to write one and how to perform an asymptotic analysis; and a comprehensive discussion of several techniques for analyzing and predicting complex data sets. Perfect for advanced undergraduate and graduate students in Data Science, Business Analytics, and Statistics programs, the book will also earn a place in the libraries of practicing data scientists, data and business analysts, and statisticians in the private sector, government, and academia.

Table 12.4 Discriminant scores for Citigroup in 2009 and IAG stock in 2011.

Table 13.1 Data matrix.

Table 13.2 Distance matrix.

Table 13.3 Stress and goodness of fit.

Table 13.4 Data matrix.

Table 14.1 Performance of the five models on the test dataset with 23 variables, measured by AUC and mean square error (MSE).

Table 14.2 Top 10 variables selected by the Random forest algorithm.

Table 14.3 Performance of the four models on the test dataset using the top 10 features selected by the Random forest model.

Table 15.1 Market basket transaction data.

Table 15.2 A binary [math image] representation of market basket transaction data.

Table 15.3 Grocery transactional data.

Table 15.4 Transaction data.

Table 16.1 Models' performance on the test dataset.

Table 18.1 Percentage of power for Discover data.

Table 18.2 Percentage of power for JPM data.

Table 18.3 Percentage of power for Microsoft data.

Table 18.4 Percentage of power for Walmart data.

Table 19.1 Determining [math image] and [math image] for [math image].

Table 19.2 Percentage of total power (energy) for the Albuquerque, New Mexico (ANMO) seismic station.

Table 19.3 Percentage of total power (energy) for the Tucson, Arizona (TUC) seismic station.

Table 21.1 Moments of the Poisson distribution with intensity [math image].

Table 21.2 Moments of the [math image] distribution.

Table 21.3 Scaling exponents of Volcanic Data time series.

Preface

This textbook is intended for practitioners and for graduate and advanced undergraduate students interested in Data Science, Business Analytics, and Statistical and Mathematical Modeling in disciplines such as Finance, Geophysics, and Engineering. It is designed to serve both as a textbook for several courses in these areas and as a reference guide for practitioners in industry.

The book combines a strong theoretical foundation with several applications to specific practical problems. It contains numerous techniques applicable to modern data science and other disciplines. In today's world, many fields are confronted with increasingly large amounts of complex data, and financial, healthcare, and geophysical data sampled at high frequency are no exception. These staggering amounts of data pose special challenges to finance and to other disciplines such as healthcare and geophysics, because traditional models and information technology tools can be poorly suited to their size and complexity. Probabilistic modeling, mathematical modeling, and statistical data analysis attempt to discover order in apparent disorder; this textbook may serve as a guide to new systematic approaches for carrying out these quantitative activities on complex data sets.

The textbook is split into five distinct parts. In the first part, Foundations of Data Science, we discuss some fundamental mathematical and statistical concepts that form the basis for the study of data science. In the second part, Data Science in Practice, we present a brief introduction to R and Python programming and to writing algorithms; various techniques for data preprocessing, validation, and visualization are also discussed. In the third part, Data Mining and Machine Learning Techniques for Complex Data Sets, and the fourth part, Advanced Models for Big Data Analytics and Complex Data Sets, we provide comprehensive techniques for analyzing and predicting different types of complex data sets.

We conclude this book with a discussion of ethics in data science: With great power comes great responsibility.

The authors express their deepest gratitude to Wiley for making the publication a reality.

El Paso, TX and Mahwah, NJ, USA

September 2021

Maria Cristina Mariani
Osei Kofi Tweneboah
Maria Pia Beccar‐Varela

1 Background of Data Science

1.1 Introduction

Data science is one of the most promising and high‐demand career paths for skilled professionals in the 21st century. Today, successful data professionals understand that they must go beyond the traditional skills of analyzing large amounts of data, statistical learning, and programming. To explore and discover useful information for their companies or organizations, data scientists must have a firm grasp of the full data science life cycle, along with the flexibility and understanding needed to maximize returns at each phase of the process.

Data science is a “concept to unify statistics, mathematics, computer science, data analysis, machine learning and their related methods” in order to find trends in, understand, and analyze actual phenomena with data. Because of the coronavirus disease (COVID-19) pandemic, many colleges, institutions, and large organizations asked their nonessential employees to work virtually. These virtual meetings have provided colleges and companies with plenty of data, and some aspects of the data suggest that virtual fatigue is on the rise. Virtual fatigue is defined as the burnout associated with overdependence on virtual platforms for communication. Data science provides tools to explore and reveal the best and worst aspects of virtual work.

In the past decade, data scientists have become necessary assets and are present in almost all institutions and organizations. These professionals are data‐driven individuals with high‐level technical skills who can build complex quantitative algorithms to organize and synthesize large amounts of information, which is then used to answer questions and drive strategy in their organizations. These technical skills are coupled with the communication and leadership experience needed to deliver tangible results to various stakeholders across an organization or business.

Data scientists need to be curious and result‐oriented, with good domain‐specific knowledge and communication skills that allow them to explain highly technical results to their nontechnical counterparts. They possess a strong quantitative background in statistics and mathematics, as well as programming knowledge with a focus on data warehousing, mining, and modeling, which they use to build and analyze algorithms. In short, data scientists are analytical data experts who have the technical skills to solve complex problems and the curiosity to explore how problems need to be solved.

1.2 Origin of Data Science

Data scientists are part mathematician, part statistician, and part computer scientist. Because they span both the business and information technology (IT) worlds, they are in high demand and well‐paid. Data scientists were not very popular a few decades ago; their sudden popularity reflects how businesses now think about “big data.” Big data is defined as a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be handled by traditional data‐processing application software. That bulky mass of unstructured information can no longer be ignored and forgotten. It is a virtual gold mine that can boost revenue, provided someone explores it and discovers business insights that no one thought to look for before. Many data scientists began their careers as statisticians, business analysts, or data analysts; as big data grew and evolved, those roles evolved as well. Data is no longer just an add‐on for IT to handle. It is vital information that requires analysis, creative curiosity, and the ability to translate high‐tech ideas into innovative ways to generate profit and to help practitioners make informed decisions.
