Computational Statistics in Data Science

An essential roadmap to the application of computational statistics in contemporary data science
Wiley StatsRef: Statistics Reference Online


2 Statistical Software

Alfred G. Schissler and Alexander D. Knudson

The University of Nevada, Reno, NV, USA

This chapter discusses selected statistical software in a format that will inform users transitioning from basic applications to more advanced ones, including elaborate statistical modeling and machine learning (ML), simulation design, and big data situations. We begin with discussions of the most popular statistical software. In the course of these expositions, we provide some historical context for each computing environment, discuss the foundational principles behind the development of the language (its purpose), describe user environments and workflows, and analyze the strengths and shortcomings of the language relative to other popular or notable statistical software, along with language support and other software features.

Next, we briefly mention an array of software used for statistical applications. We discuss the specific purpose of each software package and how the tool fills a need for data scientists. The aim here is to be reasonably complete, to provide a comprehensive view of the statistical software ecosystem, and to leave readers with some familiarity with the most prevalent languages and software.

After the presentation of noteworthy software, we transition to describing a handful of emerging and promising statistical computing technologies. Our goal in these sections is to guide users who wish to be early adopters of a software application and readers facing a scale‐limiting aspect of their current statistical programming language. Some of the latest tools for big data statistical applications are discussed in these sections.

To orient the reader to the discussion below, two tables are provided. Table 1 lists the software described in the chapter. Throughout, we discuss user environments and workflow considerations to provide practical guidance, aiming to increase efficiency and describe typical use cases. Table 2 summarizes the environments covered in the sections that follow.

1 User Development Environments

We begin by discussing user environments rather than focusing on specific statistical programming languages. The subsections below contain descriptions of some selected user development environments and related tools. This introductory material may be omitted if desired, and one can safely proceed to Section 2 for descriptions of the most popular statistical software.

Table 1 Summary of selected statistical software.

Software	Open source	Classification	Style	Notes
Python	Y	Popular	Programming	Versatile, popular
R	Y	Popular	Programming	Academia/Industry, active community
SAS	N	Popular	Programming	Strong historical following
SPSS	N	Popular	GUI: menu, dialogs	Popular in scholarly work
C++	Y	Notable	Programming	Fast, low‐level
Excel	N	Notable	GUI: menu, dialogs	Simple, works well for rectangular data
GNU Octave	Y	Notable	Mixed	Open source counterpart to MATLAB
Java	Y	Notable	Programming	Cross‐platform, portable
JavaScript, TypeScript	Y	Notable	Programming	Popular, cross‐platform
Maple	N	Notable	Mixed	Academia, algebraic manipulation
MATLAB	N	Notable	Mixed	Speedy, popular among engineers
Minitab	N	Notable	GUI: menu, dialogs	Suitable for teaching and simple analysis
SQL	Y	Notable	Programming	Necessary tool for databases
Stata	N	Notable	GUI: menu, dialogs	Popular in scholarly work
Tableau	N	Notable	GUI: menu, dialogs	Popular for business analytics
Julia	Y	Promising	Programming	Speedy, underdeveloped
Scala	Y	Promising	Programming	Typed version of Java, less boilerplate code

Table 2 Summary of selected user environments/workflows.

Software	Virtual environment	Multiple languages	Remote integration	Notes
Emacs, Vim	N	Y	Y	Extensible, steep learning curve
Jupyter project	Y	Y	Y	Open source, interactive data science
RStudio	Y	Y	Y	Excellent at creating reproducible reports/docs

1.1 Extensible Text Editors: Emacs and Vim

GNU's text editor Emacs (https://www.gnu.org/software/emacs/) is completely free software and offers a powerful solution for working with statistical software. Emacs (or EMACS) is an extensible and customizable text editor that can be used for the majority of computer‐based tasks. Once a user has committed the keyboard‐centric interface to muscle memory, editing text for reports or code becomes rapid and outpaces point‐and‐click approaches. Emacs works on all major operating systems and gives near‐seamless interaction on Linux‐based computing clusters. Its extensibility ensures that, while the latest tools develop and change, your interface remains constant. This stability gives users the confidence to adopt new tools and adapt to new trends in software.

For statistical computing specifically, we note the excellent add‐on package Emacs Speaks Statistics (ESS), which offers a unified user interface for R, S‐Plus, SAS, Stata, and OpenBUGS/JAGS, among other popular statistical packages. An easy‐to‐use package manager provides quick ESS installation. Once ESS is installed, a basic workflow is to open an associated file type (.R, .Rmarkdown, etc.) to trigger ESS mode. In ESS mode, code is highlighted, tab completion is enabled for rapid code generation and editing, and help documentation is integrated. Code can be evaluated interactively in separate processes (e.g., one or even multiple R sessions) or run noninteractively through shell processes displayed in Emacs. Statistical visualizations are displayed in separate windows for easy plot development. As mentioned above, one can work seamlessly on remote servers (using TRAMP mode), which greatly reduces the inefficiencies inherent in switching between local and remote machines.
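
The noninteractive route mentioned above amounts to handing a script to R from a shell process. The following is a minimal sketch of that idea outside Emacs, not a description of how ESS itself launches R. It assumes R is installed and that the `Rscript` executable is on the PATH; the helper names `build_rscript_command` and `run_noninteractive` and the file name `analysis.R` are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess


def build_rscript_command(script, args=()):
    """Return the argument list for a noninteractive R run of `script`."""
    # --vanilla skips saved workspaces and user profiles, so the run
    # depends only on the script and its arguments.
    return ["Rscript", "--vanilla", str(script), *args]


def run_noninteractive(script, args=()):
    """Run `script` with Rscript; return None if Rscript is not installed."""
    if shutil.which("Rscript") is None:
        # Assumption: Rscript is on the PATH. If it is not, do nothing.
        return None
    return subprocess.run(
        build_rscript_command(script, args),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output as text
        check=False,          # leave error handling to the caller
    )


if __name__ == "__main__":
    # Hypothetical script name; replace with a real file to try it.
    result = run_noninteractive("analysis.R")
    if result is None:
        print("Rscript not found on PATH")
    else:
        print(result.stdout)
```

Separating command construction from execution keeps the first helper easy to inspect and reuse, for example when the same command is to be sent to a remote machine rather than run locally.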