Computational Statistics in Data Science

An essential roadmap to the application of computational statistics in contemporary data science
Wiley StatsRef: Statistics Reference Online

2 Statistical Software

Alfred G. Schissler and Alexander D. Knudson

The University of Nevada, Reno, NV, USA

This chapter surveys selected statistical software for users moving from basic applications to more advanced ones, including elaborate statistical modeling and machine learning (ML), simulation design, and big data settings. We begin with the most popular statistical software. For each language, we provide historical context for the computing environment, describe the principles that motivated its development (its purpose), discuss user environments and workflows, and weigh its strengths and shortcomings relative to other popular or notable statistical software, along with language support and other software features.

Next, we briefly mention an array of software used for statistical applications. We describe the specific purpose of each tool and the need it fills for data scientists. The aim is to be reasonably complete, giving a broad view of the statistical software ecosystem and leaving readers with some familiarity with the most prevalent languages and software.

After presenting this noteworthy software, we turn to a handful of emerging and promising statistical computing technologies. Our goal in these sections is to guide users who wish to be early adopters of a software application and readers whose current statistical programming language limits the scale of their work. Some of the latest tools for big data statistical applications are discussed in these sections.

To orient the reader to the discussion below, two tables are provided. Table 1 lists the software described in the chapter. Throughout, we discuss user environments and workflow considerations to provide practical guidance, aiming to increase efficiency and describe typical use cases. Table 2 summarizes the environments covered in the sections that follow.

1 User Development Environments

We begin by discussing user environments rather than specific statistical programming languages. The subsections below describe selected user development environments and related tools. This introductory material may be omitted if desired, and one can safely proceed to Section 2 for descriptions of the most popular statistical software.

Table 1 Summary of selected statistical software.

| Software | Open source | Classification | Style | Notes |
|---|---|---|---|---|
| Python | Y | Popular | Programming | Versatile, popular |
| R | Y | Popular | Programming | Academia/Industry, active community |
| SAS | N | Popular | Programming | Strong historical following |
| SPSS | N | Popular | GUI: menu, dialogs | Popular in scholarly work |
| C++ | Y | Notable | Programming | Fast, low‐level |
| Excel | N | Notable | GUI: menu, dialogs | Simple, works well for rectangular data |
| GNU Octave | Y | Notable | Mixed | Open source counterpart to MATLAB |
| Java | Y | Notable | Programming | Cross‐platform, portable |
| JavaScript, TypeScript | Y | Notable | Programming | Popular, cross‐platform |
| Maple | N | Notable | Mixed | Academia, algebraic manipulation |
| MATLAB | N | Notable | Mixed | Speedy, popular among engineers |
| Minitab | N | Notable | GUI: menu, dialogs | Suitable for teaching and simple analysis |
| SQL | Y | Notable | Programming | Necessary tool for databases |
| Stata | N | Notable | GUI: menu, dialogs | Popular in scholarly work |
| Tableau | N | Notable | GUI: menu, dialogs | Popular for business analytics |
| Julia | Y | Promising | Programming | Speedy, underdeveloped |
| Scala | Y | Promising | Programming | Typed version of Java, less boilerplate code |

Table 2 Summary of selected user environments/workflows.

| Software | Virtual environment | Multiple languages | Remote integration | Notes |
|---|---|---|---|---|
| Emacs, Vim | N | Y | Y | Extensible, steep learning curve |
| Jupyter project | Y | Y | Y | Open source, interactive data science |
| RStudio | Y | Y | Y | Excellent at creating reproducible reports/docs |

1.1 Extensible Text Editors: Emacs and Vim

GNU Emacs (https://www.gnu.org/software/emacs/) is free software and offers a powerful environment for working with statistical software. Emacs (or EMACS) is an extensible, customizable text editor that can be used for the majority of computer‐based tasks. Once a user has committed the keyboard‐centric interface to muscle memory, editing text for reports or code becomes rapid and can outpace point‐and‐click approaches. Emacs runs on all major operating systems and offers near‐seamless interaction with Linux‐based computing clusters. Its extensibility means that the interface stays constant while the underlying tools develop and change, which gives users the confidence to adopt new tools and adapt to new trends in software.

For statistical computing in Emacs specifically, the add‐on package Emacs Speaks Statistics (ESS) offers a unified user interface for R, S‐Plus, SAS, Stata, and OpenBUGS/JAGS, among other popular statistical packages. ESS installs quickly through an easy‐to‐use package manager. Once it is installed, a basic workflow is to open a file of an associated type (.R, .Rmarkdown, etc.), which triggers ESS mode. In ESS mode, code is highlighted, tab completion is enabled for rapid code writing and editing, and help documentation is integrated. Code can be evaluated interactively in separate processes (e.g., one or several R sessions) or run noninteractively through shell processes displayed in Emacs. Statistical visualizations appear in separate windows for easy plot development. As mentioned above, one can work seamlessly on remote servers (using TRAMP mode), which greatly reduces the inefficiencies inherent in switching between local and remote machines.
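To make two pieces of this workflow concrete, the following is a minimal Python sketch, not part of ESS or Emacs. It assumes the default TRAMP file‐name form `/method:user@host:/path` and assumes that a noninteractive R run amounts to calling the `Rscript` command on a script file. The function names `tramp_path`, `batch_command`, and `run_batch`, as well as the host and file names in the usage note, are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def tramp_path(host, remote_file, user=None, method="ssh"):
    """Build a remote file name in the default TRAMP form.

    Assumption: the target is reached with /method:user@host:/path,
    where the "user@" part is optional.
    """
    login = f"{user}@{host}" if user else host
    return f"/{method}:{login}:{remote_file}"


def batch_command(script, interpreter="Rscript"):
    """Return the argument list for a noninteractive run of `script`."""
    return [interpreter, str(Path(script))]


def run_batch(script, interpreter="Rscript"):
    """Run `script` noninteractively, as a shell process would.

    Returns (exit code, captured stdout), or None when the
    interpreter is not found on the PATH.
    """
    if shutil.which(interpreter) is None:
        return None
    done = subprocess.run(
        batch_command(script, interpreter),
        capture_output=True,
        text=True,
    )
    return done.returncode, done.stdout
```

For example, `tramp_path("cluster.example.org", "/home/ana/analysis.R", user="ana")` yields a name that could be passed to the Emacs find‐file prompt to edit the remote script in place, and `run_batch("analysis.R")` mirrors the noninteractive path on a machine where R is installed.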
