Daniel J. Denis - Applied Univariate, Bivariate, and Multivariate Statistics Using Python

A practical, “how-to” reference for anyone performing essential statistical analyses and data management tasks in Python.

As an example, in chemistry and nutrition, the oxidative stability of an oil is a measure of how quickly the oil starts to degrade when heated and exposed to light. Presumably, consumers would prefer, on this basis, an oil with more oxidative stability to one with less (frying at very high temperatures can apparently degrade the oil). In a recent study (Guillaume and Ravetti, 2018), the oxidative stability of olive oil was found to be higher than that of, say, sunflower oil. Hence, one might be tempted to select olive oil over sunflower oil on this basis. However, does the difference in oxidative values translate into anything meaningful, or is it simply a numerical difference that is, for all practical purposes, academic? Olive oil may be more stable, but is that “more” really worth giving up sunflower oil if you in fact prefer sunflower? When analyzing and interpreting data, it is very easy to fall into the ranking trap, where the mere fact that one element ranks higher than another is taken to imply a pragmatic, or even physically meaningful, difference. The headline may be that “Olive oil is #1,” but is #10 practically the same anyway, or is the utility of the difference between oils large enough to influence one’s decision? The ranking differences may be inconsequential to the decision. For example, if I told you your primary care doctor ranked 100th out of 100 in his or her graduating class, you might at first assume your doctor is not very good. However, the differences between ranked quantities may be so slight that, on a practical level, they do not matter at all or are, at minimum, negligible. Differences may even be due to measurement error and hence not exist beyond chance. Likewise, the pilot of your aircraft may be virtually as competent as the best pilot out there, yet still rank lower on an imperfect measure.
Do not simply assume that a numerical change in what is being assessed represents a meaningful difference on a scientific (as opposed to numerical) level. Numerical differences do not necessarily equate to equivalent physical changes. Instead of being eager to include a bunch of measures in your thesis, dissertation, or publication, a good idea might be to work on, and deeply validate, what is being measured in the first place. Can something like self-esteem be measured? That is not a small or inconsequential question. You can pick up an existing questionnaire that purports to measure it, or you can first critically evaluate whether it is something measurable at all. Being able to correlate it with an existing measure does not, by itself, provide fundamental validity; it only provides statistical validity. The ultimate psychometric issue may still remain. For instance, how will you convince your committee that what you have measured is actually a good measure of self-esteem?

1.8 Data Analysis, Data Science, Machine Learning, Big Data

In recent years, the “data explosion” has gripped much of science, business, and almost every other field of inquiry. Thanks to computers and data warehouse capacities that could only have been dreamt of in years past (and that will seem trivial in years to come), the “data deluge” is officially upon us. The ease with which statistical and software analyses can be conducted has increased dramatically. New designations for quantitative analyses often fall under the names of data science and machine learning, and because data is so cheap to store, many organizations, academic and otherwise, can collect and store massive amounts of data, so much so that the analysis of such data sometimes falls under the title of “big data.” For example, world population data on COVID-19 were analyzed in an attempt to spot trends in the virus across age groups and in its comorbidity with other illnesses, among other things. Such analyses are usually done on very large and evolving databases. The mechanisms for storing and accessing such data are, rightly so, not truly areas of “statistics” per se, and have more to do with data engineering and the like. Machine learning, primarily an area of computer science, is an emerging field that emphasizes modern software technology in analyzing data, deciphering trends, and visually depicting results via advanced and sophisticated graphics. As you venture further into data analysis in general, some of the algorithms you use may come from this field.

Though data science, machine learning, and allied fields are relatively new and exciting, it is nonetheless important for the reader not to automatically associate new words with necessarily new “things.” Human beings are creatures of psychological association, so when we hear a new term, we often create a new category for it in our minds and assume that since there is a new word, there must be an equivalent new category. However, the new association we have created is not necessarily one-to-one with the reality of the object. The new vehicle promoted by a car company may be an older design “updated” rather than an entirely new vehicle. Hence, when you hear new terminology in quantitative areas, it is imperative that you never stop at the word alone, but instead delve deeper to see what is actually “there” in terms of new substance. Why is this approach to understanding important? Because otherwise, especially as a newcomer to these areas, you may come to believe that what you are studying is entirely novel. Indeed, it may be “new,” but it may not be as novel or categorically different from the “old” as you might at first think. Likewise, the humanistic psychology of the 1950s was not entirely new; the Greeks had very similar ideas. The marketing was new, but the ideas generally were not.

As an example, suppose you are fitting a model to data in machine learning and are concerned about overfitting the model to your data, which, in general, means fitting a functional form that matches the obtained data too closely, potentially leading to poor replication and generalizability. You may read about overfitting in a machine learning book and believe the concept belongs to machine learning. That is, you may believe overfitting is a property of machine learning models only! How false! While the term is often used in machine learning, it is definitely not specific to the field. Not only have the term and concept of overfitting long been used in statistics, but examples of scientists being concerned about overfitting, dating from before statistics separated from mathematics, are scattered throughout history! Hence, “overfitting” is no more unique to the field from which you are learning it than algorithms are unique to computer science. Algorithms are ancient; even the Babylonians were using primitive algorithms (Knuth, 1972). If you are not at least somewhat aware of history, you may come to believe that new words and terms necessarily imply new “things.” The concept is usually old news, however. That does not mean the new use of the word is not at least somewhat unique, or that it is not being applied to a new algorithm (for example). However, it is likely that the concept existed well before the word was paired with the thing it describes in a given field. Likewise, if you believe that support vector machines have anything to do with machines (and I have had students assume there must be a “machine” component within their mathematics!), you need to remember that words are imperfect descriptors for what is actually there. Indeed, much of language in general is nothing more than an approximation of what we truly wish to communicate.
As any linguist will tell you, language is far from a precise method of communication, but it is often the best we can do. Likewise, with music, a series of notes played on the piano with the goal of communicating a sentiment or emotional quality will necessarily fall short of doing so perfectly. It is an approximation. But how awkward it would be for the musician to follow up his or her performance with “What I meant to say was …” or “What is really behind those notes is …” Notions of machine learning, data science, statistics, and mathematics all conjure up associations, but you need to unpack and unravel those associations if you are to understand what is really there. In other words, just as an abstract numerical system may not perfectly coincide with the physical phenomena it represents, an abstract linguistic system (of which numerical systems might be considered a special case) rarely coincides perfectly with the objects it seeks to describe.
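Returning to the overfitting example, the concept predates the machine learning label and is easy to demonstrate with ordinary least-squares polynomial regression. The sketch below is an illustrative assumption rather than the author's code: it draws noisy data from a hypothetical straight line, fits a degree-1 and a degree-9 polynomial with NumPy's `Polynomial.fit`, and compares average error on the training points with error on fresh data from the same process. The function name, the "true" line, the noise level, and the sample sizes are all hypothetical choices.

```python
import numpy as np
from numpy.polynomial import Polynomial


def average_errors(degree, n_train=12, n_test=200, noise_sd=0.3, reps=200, seed=0):
    """Average training and test mean squared error of a polynomial fit.

    The data-generating process (a straight line plus Gaussian noise) and all
    settings are hypothetical choices made for illustration.
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    train_mse, test_mse = [], []
    for _ in range(reps):
        # Fresh noisy sample from the same "true" line y = 1 + 2x.
        y_train = 1 + 2 * x_train + rng.normal(0.0, noise_sd, n_train)
        # New observations from the same process, never seen during fitting.
        x_test = rng.uniform(0.0, 1.0, n_test)
        y_test = 1 + 2 * x_test + rng.normal(0.0, noise_sd, n_test)
        # Ordinary least-squares polynomial fit of the requested degree.
        model = Polynomial.fit(x_train, y_train, degree)
        train_mse.append(np.mean((model(x_train) - y_train) ** 2))
        test_mse.append(np.mean((model(x_test) - y_test) ** 2))
    return float(np.mean(train_mse)), float(np.mean(test_mse))


for degree in (1, 9):
    train, test = average_errors(degree)
    print(f"degree {degree}: train MSE = {train:.3f}, test MSE = {test:.3f}")
```

Because the degree-9 polynomial nests the straight line, it never has a higher training error; on new data, however, it typically does worse, since it has matched the noise in the obtained sample too closely. That is the replication and generalizability problem statisticians were describing long before the term became associated with machine learning.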