Computational Statistics in Data Science

An essential roadmap to the application of computational statistics in contemporary data science
Wiley StatsRef: Statistics Reference Online

In situations where the components of $\theta$ are in different units, stopping simulation when the variability in the estimator is small compared to the size of the estimate is natural. For a choice of norm $\|\cdot\|$, a relative-magnitude sequential stopping rule terminates simulation at

$$T_2(\epsilon) = \inf\left\{ n \geq 0 : \text{Volume}(C_\alpha(n))^{1/p} + s(n) \leq \epsilon \, \|\hat{\theta}_n\| \right\}.$$

This termination rule essentially controls the coefficient of variation for $\hat{\theta}_n$. An advantage here is that problem-free choices of $\epsilon$ can be used since problems where $\|\theta\|$ is small will automatically require smaller cutoff. A clear disadvantage is that this rule is ineffective when $\theta = 0$.
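The relative-magnitude idea can be sketched for a single (univariate) quantity estimated by IID sampling. The sketch below assumes the rule stops once the length of a normal-theory confidence interval is at most $\epsilon$ times the absolute value of the running mean. The function name `relative_magnitude_stop`, the minimum sample size `n_min`, the checking interval `check_every`, and the budget `max_n` are illustrative choices, not taken from the text.

```python
import math
import random
from statistics import NormalDist


def relative_magnitude_stop(draw, eps=0.05, alpha=0.05,
                            n_min=1000, check_every=500, max_n=1_000_000):
    """Draw IID samples until CI length <= eps * |running mean| (univariate sketch)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal quantile for the interval
    n, mean, m2 = 0, 0.0, 0.0                # Welford running mean and sum of squares
    while n < max_n:
        for _ in range(check_every):
            x = draw()
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if n < n_min:
            continue                         # never stop before the minimum size
        # Interval length plays the role of the "volume" when p = 1.
        length = 2 * z * math.sqrt(m2 / (n - 1) / n)
        if length <= eps * abs(mean):
            return n, mean, True
    return n, mean, False                    # budget exhausted, rule never met


rng = random.Random(1)
n, estimate, stopped = relative_magnitude_stop(lambda: rng.expovariate(1.0))
print(n, round(estimate, 3), stopped)
```

When the true mean is zero, the right-hand side of the comparison shrinks at the same rate as the interval length, so the sketch runs until `max_n` and returns `False`. This is the disadvantage noted above.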

5.2 MCMC

Although both $T_1(\epsilon)$ and $T_2(\epsilon)$ may be used in MCMC, a third alternative arises due to the correlation in the Markov chain. A relative-standard deviation sequential stopping rule terminates the simulation when the Monte Carlo variability (as measured by the volume of the confidence region) is small compared to the underlying variability inherent to the problem, $\Lambda$. That is,

$$T_3(\epsilon) = \inf\left\{ n \geq 0 : \text{Volume}(C_\alpha(n))^{1/p} + s(n) \leq \epsilon \, |\Lambda_n|^{1/(2p)} \right\}.$$

If this rule is used for IID Monte Carlo, then $\Sigma$ in Equation (2) is $\Lambda$, and [formula missing in source] for some other (deterministic) [symbol missing in source]. For MCMC, this sequential stopping rule connects directly to the concept of effective sample size [26]. That is, stopping at $T_3(\epsilon)$ is equivalent to stopping when

$$\widehat{\text{ESS}} = n \left( \frac{|\Lambda_n|}{|\Sigma_n|} \right)^{1/p} \geq \frac{2^{2/p} \, \pi}{\left( p \, \Gamma(p/2) \right)^{2/p}} \, \frac{\chi^2_{1-\alpha,p}}{\epsilon^2}. \tag{7}$$

Thus, simulation is terminated when the number of effective samples is larger than the lower bound in Equation (7). Effective sample size measures the number of equivalent IID samples that would produce equivalent variability in $\hat{\theta}_n$. Terminating simulation using Equation (7) is intuitive and easy to implement in MCMC sampling once appropriate estimators of $\Sigma$ and $\Lambda$ have been obtained.
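To give a sense of scale for the lower bound on effective samples, the sketch below evaluates it numerically. It assumes the bound has the form $2^{2/p}\pi / (p\,\Gamma(p/2))^{2/p} \cdot \chi^2_{1-\alpha,p} / \epsilon^2$, which is the form used in the multivariate effective sample size literature; that form is an assumption here, since the displayed equation is not legible in this excerpt. The function name `min_ess` is hypothetical, and the chi-square quantile is passed in by the caller so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist


def min_ess(p, chi2_quantile, eps):
    """Assumed lower bound on effective sample size for a p-dimensional problem.

    chi2_quantile is the (1 - alpha) quantile of a chi-square distribution
    with p degrees of freedom, supplied by the caller.
    """
    constant = 2 ** (2 / p) * math.pi / (p * math.gamma(p / 2)) ** (2 / p)
    return constant * chi2_quantile / eps ** 2


alpha, eps = 0.05, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
print(min_ess(1, z * z, eps))                  # p = 1: the quantile equals z squared
print(min_ess(2, -2 * math.log(alpha), eps))   # p = 2: the quantile has a closed form
```

Under these assumptions, with $\alpha = 0.05$ and $\epsilon = 0.05$, the bound is roughly 6146 effective samples for $p = 1$ and roughly 7529 for $p = 2$. Halving $\epsilon$ quadruples the requirement.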

6 Workflow

We have presented tools for determining when to stop a Monte Carlo simulation. The workflow starts by identifying $F$ and $\theta$ and then running a chosen sampler for some small $n^*$ iterations. Preliminary estimates of $\theta$ and $\Sigma$ or $\Lambda$ are obtained along with visualizations determining quality of the sampler. The simulation continues until a chosen stopping rule indicates termination using a prespecified $\epsilon$. In the following section, we present three examples where we demonstrate this workflow.
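The workflow described above can be sketched as a loop: a pilot run, preliminary estimates, and continued sampling until a stopping criterion is met. The sketch below is univariate and makes several assumptions that are not in the text: `ar1_sampler` is a hypothetical stand-in for an MCMC sampler, the asymptotic variance is estimated by batch means with batch size near $\sqrt{n}$, and the criterion is an effective sample size target supplied by the caller. Visual checks of sampler quality, which the workflow also calls for, are omitted.

```python
import math
import random


def ar1_sampler(rho, seed):
    """Hypothetical stand-in for an MCMC sampler: a stationary AR(1) chain."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))
    while True:
        yield x
        x = rho * x + rng.gauss(0.0, 1.0)


def estimate_ess(chain):
    """Univariate ESS: n * (sample variance) / (batch-means asymptotic variance)."""
    b = int(math.sqrt(len(chain)))        # batch size
    a = len(chain) // b                   # number of batches
    used = chain[len(chain) - a * b:]     # drop the earliest leftover draws
    n = a * b
    mean = sum(used) / n
    lam = sum((x - mean) ** 2 for x in used) / (n - 1)
    batch_means = [sum(used[i * b:(i + 1) * b]) / b for i in range(a)]
    sigma = b * sum((m - mean) ** 2 for m in batch_means) / (a - 1)
    return n * lam / sigma


def run_workflow(sampler, target_ess, n_star=2000, check_every=1000, max_n=200_000):
    """Pilot run of n_star draws, then extend until the ESS target is reached."""
    chain = [next(sampler) for _ in range(n_star)]
    while True:
        ess = estimate_ess(chain)
        if ess >= target_ess or len(chain) >= max_n:
            return chain, ess
        chain.extend(next(sampler) for _ in range(check_every))


chain, ess = run_workflow(ar1_sampler(0.5, seed=7), target_ess=6000)
print(len(chain), round(ess), round(sum(chain) / len(chain), 3))
```

Checking the criterion only every `check_every` draws keeps the cost of re-estimating the variance modest. The target of 6000 is illustrative, and `max_n` guards against a sampler that mixes too slowly to ever reach it.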

In our examples, we assume that a CLT (or asymptotic distribution) for Monte Carlo estimators exists. However, extra care must be taken when working with a generic Monte Carlo procedure. In particular, importance sampling can often yield estimators with infinite variances, for which a CLT cannot hold; see Refs [3, 4] for more details. A CLT is particularly difficult to establish for MCMC due to serial correlation in the Markov chain. However, many individual Markov chains have been shown to be at least polynomially ergodic, a property that, combined with suitable moment conditions, is sufficient for a CLT. For examples, see Jarner and Hansen [30], Roberts and Tweedie [31], Vats [32], Khare and Hobert [33], Tan et al. [34], Hobert and Geyer [35], and Jones and Hobert [36].

A similar workflow can be adopted for embarrassingly parallel implementations of Monte Carlo samplers. Given the power of the modern personal computer, most Monte Carlo samplers can run on multiple cores simultaneously, producing more samples in the same clock time. For IID Monte Carlo, averaging estimators across all independent runs is reasonable. However, for estimating $\Sigma$ in MCMC, estimation quality can be improved by sharing information across multiple runs at the end of the simulation; see Gupta and Vats [37] for more details.
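For the IID case, combining independent runs can be as simple as a sample-size-weighted average of the per-run estimates. The sketch below illustrates only that step; the function names `one_run` and `combine` and the exponential target are hypothetical, and the runs are executed in a plain loop rather than on separate cores to keep the example short.

```python
import random
from statistics import fmean


def one_run(seed, n):
    """One independent IID Monte Carlo run; returns (estimate, sample size)."""
    rng = random.Random(seed)          # a separate generator per run
    return fmean(rng.expovariate(1.0) for _ in range(n)), n


def combine(runs):
    """Sample-size-weighted average of per-run estimates."""
    total = sum(n for _, n in runs)
    return sum(est * n for est, n in runs) / total


# Each call could be dispatched to its own core; a loop keeps the sketch simple.
runs = [one_run(seed, 5000) for seed in range(4)]
print(round(combine(runs), 3))
```

Weighting by sample size makes the combined estimate equal to the mean of the pooled draws, which matters when runs are of unequal length. This sketch does not address the MCMC case, where the text points to pooling information when estimating the asymptotic covariance.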

Sequential stopping rules, particularly in MCMC, should not be implemented as a black-box procedure. Each implementation of the stopping rule must be accompanied by visualizations that give qualitative insights about the quality of the samplers. A better-quality sampler can significantly improve estimation and lead to shorter run times. We illustrate this point by comparing samplers in our examples.
