Computational Statistics in Data Science

An essential roadmap to the application of computational statistics in contemporary data science
Wiley StatsRef: Statistics Reference Online

In situations where the components of [formula] are in different units, stopping simulation when the variability in the estimator is small compared to the size of the estimate is natural. For a choice of norm [formula], a relative-magnitude sequential stopping rule terminates simulation at

[equation]

This termination rule essentially controls the coefficient of variation for [formula]. An advantage here is that problem-free choices of [formula] can be used, since problems where [formula] is small will automatically require a smaller cutoff. A clear disadvantage is that this rule is ineffective when [formula].
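As a rough illustration of the relative-magnitude idea in the univariate IID case, the sketch below stops once the half-width of a normal-theory confidence interval is at most a fraction `eps` of the absolute estimate. The function name `relative_magnitude_stop`, the exponential test distribution, the increment of 100 draws, and the use of the interval half-width in place of a confidence-region volume are all assumptions made for illustration, not the chapter's own implementation.

```python
import math
import random
import statistics


def relative_magnitude_stop(draws, eps, z=1.96):
    """Return True when the CI half-width is at most eps times |estimate|."""
    n = len(draws)
    estimate = statistics.fmean(draws)
    # Half-width of an approximate 95% interval for the mean (assumed measure
    # of Monte Carlo variability in this one-dimensional sketch).
    half_width = z * statistics.stdev(draws) / math.sqrt(n)
    return half_width <= eps * abs(estimate)


rng = random.Random(1)
# Pilot sample from an exponential distribution with mean 2 (hypothetical target).
draws = [rng.expovariate(0.5) for _ in range(100)]
while not relative_magnitude_stop(draws, eps=0.02):
    # Rule not yet satisfied: simulate 100 more draws and check again.
    draws.extend(rng.expovariate(0.5) for _ in range(100))

n_stop = len(draws)
estimate = statistics.fmean(draws)
print(n_stop, estimate)
```

Because the threshold scales with the estimate, the same `eps` can be reused across problems of different magnitude. The check cannot succeed in a meaningful way when the true mean is zero, which mirrors the disadvantage noted above.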

5.2 MCMC

Although both [formula] and [formula] may be used in MCMC, a third alternative arises due to the correlation in the Markov chain. A relative-standard deviation sequential stopping rule terminates the simulation when the Monte Carlo variability (as measured by the volume of the confidence region) is small compared to the underlying variability inherent to the problem [formula]. That is,

[equation]

If this rule is used for IID Monte Carlo, then [formula] in Equation (2) is [formula], and [formula] for some other (deterministic) [formula]. For MCMC, this sequential stopping rule connects directly to the concept of effective sample size [26]. That is, stopping at [formula] is equivalent to stopping when

[equation] (7)

Thus, simulation is terminated when the number of effective samples is larger than the lower bound in Equation (7). Effective sample size measures the number of equivalent IID samples that would produce equivalent variability in [formula]. Terminating simulation using Equation (7) is intuitive and easy to implement in MCMC sampling once appropriate estimators of [formula] and [formula] have been obtained.
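A minimal univariate sketch of an effective-sample-size check follows. It assumes a batch-means estimator of the asymptotic variance with batch size equal to the integer square root of the chain length, and it takes the lower bound `min_ess` as a user-supplied number rather than computing it from Equation (7). The function name `batch_means_ess`, the AR(1) test chain, and the value 5000 are hypothetical choices for illustration.

```python
import math
import random
import statistics


def batch_means_ess(chain):
    """Univariate ESS estimate: n * (sample variance) / (batch-means variance)."""
    n = len(chain)
    b = int(math.sqrt(n))        # batch size (assumed choice)
    a = n // b                   # number of full batches
    used = chain[n - a * b:]     # drop the earliest draws that do not fill a batch
    means = [statistics.fmean(used[i * b:(i + 1) * b]) for i in range(a)]
    sigma2 = b * statistics.variance(means)  # estimate of the asymptotic variance
    lam2 = statistics.variance(used)         # estimate of the marginal variance
    return len(used) * lam2 / sigma2


# Hypothetical correlated chain: AR(1) with lag-one autocorrelation 0.5.
rng = random.Random(7)
phi, x, chain = 0.5, 0.0, []
for _ in range(20000):
    x = phi * x + rng.gauss(0.0, 1.0)
    chain.append(x)

ess = batch_means_ess(chain)
min_ess = 5000                   # hypothetical lower bound on effective samples
stop = ess >= min_ess
print(round(ess), stop)
```

For an AR(1) chain with autocorrelation 0.5, the ratio of effective to actual sample size is (1 - 0.5) / (1 + 0.5) = 1/3 in theory, so the estimate should land near one third of the chain length, up to estimation noise.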

6 Workflow

We have presented tools for determining when to stop a Monte Carlo simulation. The workflow starts by identifying [formula] and [formula] and then running a chosen sampler for some small [formula] iterations. Preliminary estimates of [formula] and [formula] or [formula] are obtained, along with visualizations that help assess the quality of the sampler. The simulation continues until a chosen stopping rule indicates termination using a prespecified [formula]. In the following section, we present three examples where we demonstrate this workflow.
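The control flow of this workflow can be sketched as a small driver: a pilot run, a check of the stopping rule, and further simulation in increments until the rule fires. The names `run_until`, `draw_more`, and `should_stop`, the pilot and increment sizes, the safety cap `max_n`, and the simple fixed-width rule used in the demonstration are all assumptions for illustration; any sampler and any of the stopping rules discussed could be plugged in instead.

```python
import math
import random
import statistics


def run_until(draw_more, should_stop, pilot=500, step=500, max_n=200_000):
    """Pilot run, then extend the simulation until the rule fires or max_n is hit."""
    draws = draw_more(pilot)
    while not should_stop(draws) and len(draws) < max_n:
        draws.extend(draw_more(step))
    return draws


rng = random.Random(3)


def draw_more(k):
    # Stand-in sampler: k IID draws from Uniform(0, 1); a real sampler goes here.
    return [rng.random() for _ in range(k)]


def should_stop(draws, eps=0.005, z=1.96):
    # Stand-in rule: stop once the CI half-width for the mean drops below eps.
    return z * statistics.stdev(draws) / math.sqrt(len(draws)) <= eps


draws = run_until(draw_more, should_stop)
estimate = statistics.fmean(draws)
print(len(draws), estimate)
```

Keeping the sampler and the rule as separate callables makes it easy to inspect trace plots or other diagnostics between increments rather than treating the loop as a black box.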

In our examples, we assume that a CLT (or asymptotic distribution) for Monte Carlo estimators exists. However, extra care must be taken when working with a generic Monte Carlo procedure. In particular, importance sampling can often yield estimators with infinite variances, in which case a CLT cannot hold. See Refs [3, 4] for more details. A CLT is particularly difficult to establish for MCMC due to serial correlation in the Markov chain. However, many individual Markov chains have been shown to be at least polynomially ergodic; for examples, see Jarner and Hansen [30], Roberts and Tweedie [31], Vats [32], Khare and Hobert [33], Tan et al. [34], Hobert and Geyer [35], and Jones and Hobert [36].

A similar workflow can be adopted for embarrassingly parallel implementations of Monte Carlo samplers. Given the power of the modern personal computer, most Monte Carlo samplers can run on multiple cores simultaneously, producing more samples in the same clock time. For IID Monte Carlo, averaging estimators across all independent runs is reasonable. However, for estimating [formula] in MCMC, estimation quality can be improved by sharing information across multiple runs at the end of the simulation; see Gupta and Vats [37] for more details.
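For the IID case, averaging across independent runs can be sketched as follows. The runs are executed one after another here for simplicity; in practice each call could be dispatched to a separate core. The function name `one_run`, the seeds, the run length, and the target quantity (the second moment of a standard normal, which equals 1) are assumptions chosen for illustration.

```python
import random
import statistics


def one_run(seed, n):
    """One independent IID Monte Carlo run estimating E[X^2] for X ~ N(0, 1)."""
    rng = random.Random(seed)  # separate generator per run
    return statistics.fmean([rng.gauss(0.0, 1.0) ** 2 for _ in range(n)])


seeds = [11, 12, 13, 14]       # one hypothetical seed per core
per_run = [one_run(s, 50_000) for s in seeds]
# With equal run lengths, the plain average of the per-run estimates equals
# the estimate computed from all draws pooled together.
pooled = statistics.fmean(per_run)
print(per_run, pooled)
```

This simple averaging applies to the point estimate only; it does not address how to combine variance estimates from correlated MCMC runs.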

Sequential stopping rules, particularly in MCMC, should not be implemented as a black-box procedure. Each implementation of the stopping rule must be accompanied by visualizations that give qualitative insights into the quality of the samplers. A better-quality sampler can significantly improve estimation and lead to shorter run times. We illustrate this point by comparing samplers in our examples.
