Computational Statistics in Data Science

An essential roadmap to the application of computational statistics in contemporary data science
Wiley StatsRef: Statistics Reference Online

3.8 MATLAB, GNU Octave

MATLAB began as a set of FORTRAN subroutines for solving linear systems (LINPACK) and eigenvalue problems (EISPACK). Cleve Moler developed most of the subroutines in the 1970s for use in the classroom. MATLAB quickly gained popularity, primarily through word of mouth. Developers rewrote MATLAB in C during the 1980s, adding speed and functionality. The parent company of MATLAB, The MathWorks, Inc., was created in 1984, and MATLAB has since become a fully featured tool that is often used in engineering and developer fields where integration with sensors and controls is a primary concern.

MATLAB has a substantial user base in government, academia, and the private sector. The MATLAB base distribution allows reading/writing data in ASCII, binary, and MATLAB proprietary formats. The data are presented to the user as an array, the MATLAB generic term for a matrix. The base distribution comes with a standard set of mathematical functions including trigonometric, inverse trigonometric, hyperbolic, inverse hyperbolic, exponential, and logarithmic. In addition, MATLAB provides the user with access to cell arrays, allowing for heterogeneous data across the cells and creation analogous to a C/C++ struct. MATLAB provides the user with numerical methods, including optimization and quadrature functions.

A highly similar yet free and open‐source programming language is GNU Octave. Octave offers many, if not all, features of the core MATLAB distribution, although MATLAB has many add‐on packages for which Octave has no equivalent, which may prompt a user to choose MATLAB over Octave. We caution analysts against using MATLAB/Octave as their primary statistical computing solution, as MATLAB's popularity is diminishing [4] – likely due to open‐source, more fully featured competitors such as R and Python.

3.9 Minitab®

Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner created Minitab in 1972 at the Pennsylvania State University to teach statistics. Now, Minitab Inc. owns the proprietary software. Academia and industry widely employ Minitab [4]. Minitab feels like Excel, but with many more advanced features. Its intuitive point‐and‐click design and spreadsheet‐like interface let users analyze data with a much shorter learning curve than more flexible programming environments require.

Minitab offers import tools and a comprehensive set of statistical capabilities. Minitab's features include basic statistics, ANOVA, fixed and mixed models, regression analyses, measurement systems analysis, and graphics including contour and rotating 3D plots. A full feature list resides at http://www.minitab.com/en‐us/products/minitab/features‐list/. For advanced users, a command‐line editor exists. Within the editor, users may customize macros (functions).

Minitab serves its user base well and will continue to be viable in the future. For teaching academics, Minitab provides near immediate access to many statistical methods and graphics. For industry, Minitab offers tools to produce standardized analyses and reports with little training. However, Minitab's flexibility and big data capabilities are limited.

3.10 Workload Managers: SLURM/LSF

Working on shared computing clusters has become commonplace in contemporary data science applications. Some working knowledge of workload managing programs (aka schedulers) is essential to running statistical software in these environments. Two popular workload managers for distributed high‐performance computing are SLURM (https://slurm.schedmd.com/documentation.html) and IBM's Platform Load Sharing Facility (LSF). These schedulers can be used to execute batch jobs on networked Unix and Windows systems on many different architectures. A user would typically interface with a scheduling program via a command‐line tool or through a scripting language. The user specifies the hardware resources and program inputs. The scheduler then distributes the work across resources, and jobs are run based on system‐prioritization schemes. In this way, hundreds or even thousands of programs can be run in parallel, increasing the scale of statistical computations possible within a reasonable time frame. For example, simulations for a novel statistical method could require many thousands of runs at various configurations, and this could be done in days rather than months.
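
As a rough illustration of the simulation example, the sketch below enumerates a grid of simulation configurations and renders a SLURM batch script that requests one array task per configuration. The `#SBATCH` options and the `SLURM_ARRAY_TASK_ID` variable are standard SLURM features; the script name `run_sim.py`, its `--task-id` argument, the resource values, and the helper names `build_configs` and `render_sbatch` are hypothetical and chosen only for illustration.

```python
from itertools import product


def build_configs(sample_sizes, effect_sizes):
    """Enumerate every (sample size, effect size) pair as one configuration."""
    return [{"n": n, "effect": e} for n, e in product(sample_sizes, effect_sizes)]


def render_sbatch(n_tasks, job_name="sim", time_limit="01:00:00", mem="2G",
                  command="python run_sim.py"):
    """Render a SLURM job-array script with one array task per configuration.

    `command` is a hypothetical program; it is assumed to accept --task-id
    and to look up its own configuration from that index.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_tasks - 1}",   # array indices 0 .. n_tasks-1
        f"#SBATCH --time={time_limit}",       # wall-clock limit per task
        f"#SBATCH --mem={mem}",               # memory per task
        "#SBATCH --output=sim_%A_%a.out",     # %A = job ID, %a = array index
        f"{command} --task-id $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"


configs = build_configs([50, 100, 500], [0.0, 0.2, 0.5])
script = render_sbatch(len(configs))
print(script)
```

Saving the printed text to a file and passing it to `sbatch` would queue the tasks; the scheduler then decides when and where each one runs. An LSF cluster would need a different script header, which this sketch does not cover.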

3.11 SQL

Structured Query Language (SQL) is the standard language for relational database management systems. While not strictly a statistical computing environment, the ability to query databases through SQL is an essential skill for data scientists. Nearly all companies seeking a data scientist require SQL knowledge, as much of an analyst's job is extracting, transforming, and loading data from an established relational database.
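
A minimal sketch of the kind of query an analyst might run is shown here, using Python's built‐in `sqlite3` module with an in‐memory database so that it is self‐contained. The `visits` table, its columns, and the data are invented for illustration; a production database would be reached through its own driver and connection settings.

```python
import sqlite3

# In-memory database standing in for an established relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, clinic TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "north", 80.0), (3, "south", 200.0), (1, "south", 50.0)],
)

# Extract and summarize: visit count and mean cost per clinic.
rows = conn.execute(
    """
    SELECT clinic, COUNT(*) AS n_visits, AVG(cost) AS mean_cost
    FROM visits
    GROUP BY clinic
    ORDER BY clinic
    """
).fetchall()
conn.close()

for clinic, n_visits, mean_cost in rows:
    print(clinic, n_visits, mean_cost)
```

Pushing the aggregation into the query, rather than pulling every row into the statistical environment first, is often the more scalable choice when the underlying table is large.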

3.12 Stata®

Stata is commercial statistical software, developed by William Gould in 1985. StataCorp currently owns/develops Stata and markets the product as “fast, accurate, and easy to use with both a point‐and‐click interface and a powerful, intuitive command syntax” (https://www.stata.com/). However, most Stata users maintain the point‐and‐click workflow. Stata strives to provide user confidence through regulatory certification.

Stata provides hundreds of tools across broad applications and methods. Even Bayesian modeling and maximum‐likelihood estimation are available. With its breadth, Stata targets all sectors – academia, industry, and government.

Overall, Stata impresses through active support and development while possessing some unique characteristics. Interestingly, in scholarly work over the past decade, only SPSS, R, and SAS have overshadowed Stata [4]. Taken together, we anticipate that Stata will remain popular. However, Stata's big data capabilities are limited, and we have reservations about whether industry will adopt Stata over competitors.

3.13 Tableau®

Tableau stemmed from visualization research by Stanford University's computer science department in 1999. The Seattle‐based company was founded in 2003. Tableau advertises itself as a data exploration and visualization tool, not statistical software per se. Tableau primarily targets the business intelligence market. However, Tableau provides a free, less powerful version for instruction.

Tableau is versatile and user‐friendly: it provides macOS and Windows versions and supports web‐based apps on iOS and Android. Tableau connects seamlessly to SQL databases, spreadsheets, cloud apps, and flat files. The software appeals to nontechnical “business” users via its intuitive user interface but also allows “power users” to develop analytical solutions by connecting to an R server or installing TabPy to integrate Python scripts.

Tableau could corner the data visualization market with its combination of an easy‐to‐learn interface and intricate features. We contend that big data demands visualization, as many traditional methods are not well suited for high‐dimensional, observational data. Based on its unique characteristics, Tableau will appeal broadly and could even emerge as a useful tool to supplement an R or Python user's toolkit.

4 Promising and Emerging Statistical Software

With a forward‐thinking mindset, our final section describes a few emerging and promising statistical software languages/packages that have the ability to meet tomorrow's complex modeling demands. If a reader encounters scalability challenges in their current statistical programming language, one of the following options may turn a computationally infeasible model into a useful one.
