Computational Statistics in Data Science
Wiley StatsRef: Statistics Reference Online
6 Concluding Remarks
We have attempted to evaluate the current statistical software landscape. Admittedly, our treatment has been shaped by our own experience. We have, however, sought to be fair in our appraisal and to give the burgeoning statistical programmer the information needed to choose tools well and improve their productivity. We began with in‐depth discussions of the most popular statistical software, followed by brief descriptions of many other noteworthy tools, and finally highlighted a handful of emerging packages. We hope that this organization is useful, but note that it is based solely on our experiences and on informal popularity studies [4]. We also offered a limited prognosis for the future of statistical software by identifying issues and applications likely to shape its development. We realize, of course, that the future is usually full of surprises, and only time will tell what actually occurs.
Acknowledgments
The work of the two authors, A.G. Schissler and A. Knudson, was partially supported by NIH grant 1U54GM104944 through the National Institute of General Medical Sciences (NIGMS) under the Institutional Development Award (IDeA) program. The authors thank the Wiley staff and the editor of this chapter, Dr Walter W. Piegorsch, for their expertise and support.
References
1 R Core Team (2018) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
2 Venables, W. and Ripley, B.D. (2013) S Programming, Springer Science & Business Media, New York, NY, USA.
3 Gentleman, R.C., Carey, V.J., Bates, D.M., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5 (10), R80.
4 Muenchen, R.A. (2019) The Popularity of Data Science Software, r4stats.com/articles/popularity.
5 Oliphant, T.E. (2006) A Guide to NumPy, vol. 1, Trelgol Publishing, Provo, UT, USA, p. 85.
6 Jones, E., Oliphant, T., and Peterson, P. (2001) SciPy: open source scientific tools for Python.
7 McKinney, W. (2011) pandas: a foundational Python library for data analysis and statistics. Python High Performance Sci. Comput., 14 (9), 1–9.
8 Seabold, S. and Perktold, J. (2010) Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference, vol. 57, p. 61.
9 Hunter, J.D. (2007) Matplotlib: a 2D graphics environment. Comput. Sci. Eng., 9 (3), 90–95.
10 Thomas, A., Spiegelhalter, D.J., and Gilks, W.R. (1992) BUGS: a program to perform Bayesian inference using Gibbs sampling. Bayesian Stat., 4 (9), 837–842.
11 Plummer, M. (2005) JAGS: just another Gibbs sampler. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria.
12 Intel (2007) Intel® Math Kernel Library Reference Manual, https://software.intel.com/en‐us/mkl.
13 Whaley, R.C. and Petitet, A. (2005) Minimizing development and maintenance costs in supporting persistently optimized BLAS. Softw. Pract. Exp., 35 (2), 101–121.
14 Xianyi, Z., Qian, W., and Chothia, Z. (2012) OpenBLAS, p. 88, http://xianyi.github.io/OpenBLAS.
15 Anderson, E., Bischof, C., Demmel, J., et al. (1990) Prospectus for an Extension to LAPACK. Working Note ANL‐90‐118, Argonne National Laboratory.
16 Guennebaud, G., et al. (2010) Eigen v3.
17 Sanderson, C. and Curtin, R. (2016) Armadillo: a template‐based C++ library for linear algebra. J. Open Source Softw., 1 (2), 26.
18 Iglberger, K., Hager, G., Treibig, J., and Rüde, U. (2012) High Performance Smart Expression Template Math Libraries. 2012 International Conference on High Performance Computing and Simulation (HPCS), pp. 367–373, IEEE.
19 Dagum, L. and Menon, R. (1998) OpenMP: an industry standard API for shared‐memory programming. IEEE Comput. Sci. Eng., 5 (1), 46–55.
20 Heller, T., Diehl, P., Byerly, Z., et al. (2017) HPX - An Open Source C++ Standard Library for Parallelism and Concurrency. Proceedings of OpenSuCo, p. 5.
21 Frank, E., Hall, M.A., and Witten, I.H. (2016) The WEKA Workbench, Morgan Kaufmann, Burlington, MA.
22 Raff, E. (2017) JSAT: Java statistical analysis tool, a library for machine learning. J. Mach. Learn. Res., 18 (1), 792–796.
23 Abadi, M., Agarwal, A., Barham, P., et al. (2015) TensorFlow: large‐scale machine learning on heterogeneous systems.
24 Zaharia, M., Xin, R.S., Wendell, P., et al. (2016) Apache Spark: a unified engine for big data processing. Commun. ACM, 59 (11), 56–65.
25 Meng, X., Bradley, J., Yavuz, B., et al. (2016) MLlib: machine learning in Apache Spark. J. Mach. Learn. Res., 17 (1), 1235–1241.
26 Bostock, M., Ogievetsky, V., and Heer, J. (2011) D3 data‐driven documents. IEEE Trans. Vis. Comput. Graph., 17 (12), 2301–2309.
27 Bezanson, J., Karpinski, S., Shah, V.B., and Edelman, A. (2012) Julia: a fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145.
28 Carpenter, B., Gelman, A., Hoffman, M.D., et al. (2017) Stan: a probabilistic programming language. J. Stat. Softw., 76 (1), 1–32.
Further Reading
1 de Leeuw, J. (2009) Journal of Statistical Software, Wiley Interdiscip. Rev. Comput. Stat., 1 (1), 128–129.
3 An Introduction to Deep Learning Methods
Yao Li1, Justin Wang2, and Thomas C. M. Lee2
1University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
2University of California at Davis, Davis, CA, USA
1 Introduction
Many models in the field of machine learning, such as deep neural networks (DNNs) and graphical models, are naturally represented in a layered network structure. The more layers such a model has, the more complex the functions it can represent. However, models with many layers are difficult to estimate optimally, so practitioners have generally restricted their models to fewer layers, trading expressivity for simplicity [1]. Deep learning explores ways to effectively train models with many hidden layers in order to retain their expressive power. One of the most effective approaches to deep learning was proposed by Hinton and Salakhutdinov [2]. Traditionally, estimating the parameters of network‐based models involves an iterative algorithm with randomly chosen initial parameters. Hinton's proposed method instead involves pretraining, that is, deliberately presetting the model's parameters in an effective manner rather than initializing them at random. In this chapter, we review the architectures and properties of DNNs and discuss their applications.
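To make the idea of a layered network concrete, the following is a minimal illustrative sketch (not code from the book) of a feedforward network as a composition of affine maps and nonlinearities, with the traditional random initialization the paragraph describes. All function names and the layer sizes are our own choices for illustration.

```python
import numpy as np

def relu(x):
    """Elementwise rectified linear activation."""
    return np.maximum(0.0, x)

def init_layers(sizes, rng):
    """Randomly initialize a (weights, biases) pair for each layer.

    sizes = [input_dim, hidden_1, ..., output_dim]; this is the
    conventional random initialization, as opposed to pretraining.
    """
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Compose the layers: each hidden layer is an affine map plus ReLU."""
    for w, b in layers[:-1]:
        x = relu(x @ w + b)
    w, b = layers[-1]
    return x @ w + b  # linear output layer

rng = np.random.default_rng(0)
layers = init_layers([4, 16, 16, 3], rng)        # two hidden layers
y = forward(rng.standard_normal((5, 4)), layers)  # batch of 5 inputs
print(y.shape)  # (5, 3)
```

Each additional entry in `sizes` adds one more composed nonlinear map, which is exactly the sense in which more layers allow more complex functions to be represented.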
We first briefly discuss the general machine learning framework and basic machine learning methodology in Section 2. We then discuss feedforward neural networks and backpropagation in Section 3. In Section 4, we explore convolutional neural networks (CNNs), the type of architectures that are usually used in computer vision. In Section 5, we discuss autoencoders, the unsupervised learning models that learn latent features without labels. In Section 6, we discuss recurrent neural networks (RNNs), which can handle sequence data.
2 Machine Learning: An Overview
2.1 Introduction
Machine learning is a field focused on the design and analysis of algorithms that can learn from data [3]. The field originated in artificial intelligence research in the late 1950s and developed independently of statistics. By the early 1990s, however, machine learning researchers realized that many statistical methods could be applied to the problems they were trying to solve. Modern machine learning is an interdisciplinary field that encompasses theory and methodology from both statistics and computer science.