Daniel J. Denis - Applied Univariate, Bivariate, and Multivariate Statistics Using Python


A practical, “how-to” reference for anyone performing essential statistical analyses and data management tasks in Python.


As an example, suppose we wanted to measure the VO2-max of participants treated with a new COVID-19 medication vs. those not treated. VO2-max is essentially a measure of the maximum rate of oxygen uptake during intense exercise (e.g. a Tour de France cyclist has a higher VO2-max than you or I). The VO2-max variable is the response, which is considered continuous, as a function of the independent variable, treatment vs. control. For this, we are in the realm of z-tests or t-tests for means, or we could also perform an ANOVA on these variables. A regression analysis is also an option, since we can operationalize the independent variable as a binary dummy-coded predictor. When we flip things around, such that the grouping variable is now the response and VO2-max is the predictor, we are in the realm of discriminant analysis or logistic regression on two groups. Here, we would like to predict group membership based on the continuous predictor.

Notice that these models are answering different research questions, but at their core, it stands to reason that they must share great technical similarity. As we will see as we progress, indeed they do. A t-test, for example, can be considered, at least on a conceptual level, to house a very primitive discriminant function! When doing a t-test, we don’t “see” the idea of a discriminant function simply because it is not a question we are asking. Nonetheless, it is there in concept, underlying the technique. Once you understand the commonality of what underlies virtually all of these models, they will quickly lose their mystery. You will be less inclined to survey a decision tree of statistical methods and see only different procedures. What you will see instead is one larger model, with special cases and peculiarities in each method.
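The claim that a t-test, a one-way ANOVA, and a regression with a dummy-coded predictor are technically the same model can be checked numerically. Below is a minimal sketch in plain Python (standard library only, so no particular statistics package is assumed). The VO2-max numbers are invented for illustration, and `pooled_t`, `dummy_regression_t`, and `one_way_anova_f` are hypothetical helper names, not functions from the book or from any library.

```python
import math
from statistics import mean

# Hypothetical VO2-max scores (mL/kg/min), invented purely for illustration.
treated = [41.2, 44.5, 39.8, 46.1, 43.0, 42.7]
control = [38.9, 40.1, 37.5, 41.8, 39.2, 38.0]


def pooled_t(a, b):
    """Independent-samples t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss_within / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def dummy_regression_t(a, b):
    """t statistic for b1 in y = b0 + b1*d, with d = 1 for group a and 0 for group b."""
    y = a + b
    d = [1.0] * len(a) + [0.0] * len(b)
    n = len(y)
    d_bar, y_bar = mean(d), mean(y)
    sxx = sum((di - d_bar) ** 2 for di in d)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    b1 = sxy / sxx            # equals mean(a) - mean(b)
    b0 = y_bar - b1 * d_bar   # equals mean(b)
    rss = sum((yi - (b0 + b1 * di)) ** 2 for di, yi in zip(d, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA on any number of groups."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


t = pooled_t(treated, control)
print(t, dummy_regression_t(treated, control))    # the same value
print(t ** 2, one_way_anova_f(treated, control))  # F equals t squared
```

In practice one would reach for a library routine such as `scipy.stats.ttest_ind` (whose `equal_var=True` default gives the pooled test); the hand-rolled versions here simply keep every formula visible, so you can see that all three procedures are built from the same group means and the same within-group sum of squares.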

Most statistical models, even if used for different research purposes and to answer different research questions, are quite technically similar at their core. One of the goals of learning and understanding statistical modeling is to grasp this similarity as quickly as possible, so that you realize that differences in approach often have more to do with differences in research questions than with differences in underlying technical details.
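The “primitive discriminant function” hidden inside the t-test from the VO2-max example can be made concrete. The sketch below reuses exactly the ingredients of the pooled t-test (the two group means and the pooled variance) but turns the question around: given a VO2-max value, which group does it more likely come from? It is a one-predictor linear discriminant rule under assumed equal priors and equal group variances, not a full multi-predictor discriminant analysis; the data are invented and the helper names are hypothetical.

```python
from statistics import mean

# Hypothetical VO2-max scores (mL/kg/min), invented purely for illustration.
treated = [41.2, 44.5, 39.8, 46.1, 43.0, 42.7]
control = [38.9, 40.1, 37.5, 41.8, 39.2, 38.0]


def pooled_variance(a, b):
    """Within-group variance pooled across both groups (the same quantity the t-test uses)."""
    ma, mb = mean(a), mean(b)
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss_within / (len(a) + len(b) - 2)


def discriminant_score(x, a, b):
    """One-predictor linear discriminant score, assuming equal priors and equal variances.

    Positive scores favour group a, negative scores favour group b; the cut-off
    is the midpoint between the two group means.
    """
    ma, mb = mean(a), mean(b)
    weight = (ma - mb) / pooled_variance(a, b)
    return weight * (x - (ma + mb) / 2)


def classify(x):
    return "treated" if discriminant_score(x, treated, control) > 0 else "control"


# Re-classify the observed cases (an optimistic, in-sample hit rate).
hits = sum(classify(x) == "treated" for x in treated) + sum(
    classify(x) == "control" for x in control
)
print(classify(45.0), classify(38.0))          # treated control
print(hits, "of", len(treated) + len(control))  # 10 of 12
```

The weight `(ma - mb) / pooled_variance(a, b)` is built from the same mean difference and pooled variance that form the numerator and denominator of the pooled t statistic, which is one way to read the idea that the discriminant function is “there in concept” underneath the t-test.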

1.6.1 Continuity Is Not Always Clear-Cut

Although we have explained the distinction between continuity and discreteness at a mathematical level, at times it can be quite difficult to put these distinctions into practice. Since, as mentioned, there are no truly continuous measured variables in a practical sense, the question becomes when to treat a variable as continuous and when not to. After all, the number of coins in my pocket can hardly be considered a continuous variable. However, for the number of coins in the entire United States, we might get away with treating the variable as continuous, even if it is not. There are so many coins that computing such things as the average number of coins, a measure that assumes continuity, is not that far-fetched. Even census data often report continuous measures of otherwise discrete variables. “The average number of members per household is 3.4,” the census may report. Taken literally, this is nonsensical, since fractions of household members cannot exist! However, since it is convenient to use an arithmetic mean to describe such things, we are implicitly treating the variable as somewhat continuous. The key take-away point is to always inquire about the data on which you are computing measures of central tendency or variation. Do not assume that because a variable is being treated as somewhat continuous in statistical computation, it is in fact continuous in its true nature. It is best to start with the premise that continuity is a theoretical entity, and then see how far the presumably “continuous” measured research variable veers from it.
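The census example can be reproduced in a few lines. The household sizes below are hypothetical, chosen only so that the mean comes out to 3.4. The point is that the arithmetic mean of a discrete count can be a value that no single observation could ever take, whereas here the median and mode stay on the integer grid.

```python
from statistics import mean, median, mode

# Hypothetical household sizes (a discrete count), invented for illustration.
household_sizes = [2, 3, 3, 4, 5]

average = mean(household_sizes)    # treats the count as if it were continuous
print(average)                     # 3.4
print(average in household_sizes)  # False: no household has 3.4 members
print(median(household_sizes), mode(household_sizes))  # 3 3
```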

1.7 Using Abstract Systems to Describe Physical Phenomena: Understanding Numerical vs. Physical Differences

One of the key starting points to using and applying statistics to real phenomena is to understand and appreciate the difference between the tool you are using and the “stuff” you are applying it to. The two often do not correspond one-to-one. Simply because we represent a difference numerically does not imply that the difference exists on a physical level. Making this distinction is extremely important, especially in today’s age, where everything is about “data” and it is therefore simply taken for granted that what we choose to measure is “real” and that our measuring tool and system can capture such differences. In some cases they can, but in others, automatically equating numerical differences with actual substantive differences is foolish.

As an example, suppose I developed a questionnaire to assess your degree of pizza preference. Suppose I scaled the questionnaire from 0 to 10, where “0” indicates a dislike for pizza and “10” indicates a strong preference. Suppose you circle “7” as your choice and your friend circles “5.” Does that mean you prefer pizza more than your friend? Not necessarily. Simply because you have selected a higher number may not mean you enjoy pizza more. It may simply mean you selected a higher number. The measured distance between 5 and 7 may not equate to an actual difference in pizza preference.

Scales of measurement (Stevens, 1946) have been developed to try to highlight these and other issues, but, as we will see, they are far from adequate in solving the measurement problem. Everything we measure is based on a scale: we attempt to capture a phenomenon and assign a numerical measurement to it. A nominal scale is one in which labels are simply given to values of the variable. For example, “short” vs. “tall” when measuring height would represent a variable measurable on a nominal scale. However, we can do better. Since “tall” presumably contains more height than “short,” we can say tall > short (i.e. tall is greater than short) and consider the variable measurable on an ordinal scale. The next level of measurement is the interval scale, in which distances between values on the scale are presumed to be equal. For example, the difference in the number of coins in my pocket from 0 to 5 is the same as the difference from 5 to 10. If the scale has an absolute zero point, meaning that a measurement of “0” actually means “zero coins,” then the scale takes on the extra property of being a ratio scale.
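One way to make the hierarchy concrete is to record, for each level, which kinds of statements it licenses. The sketch below is an illustrative encoding only: the `Scale` enum, the `MINIMUM_LEVEL` table, and the `is_meaningful` helper are hypothetical names, not a library API. It follows the conventional reading of Stevens’s levels, in which each level keeps the operations of the levels below it and adds one more.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens's levels of measurement, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each kind of statement becomes meaningful.
# Higher levels inherit every operation of the levels below them.
MINIMUM_LEVEL = {
    "equality": Scale.NOMINAL,     # "short" vs. "tall" are different categories
    "order": Scale.ORDINAL,        # tall > short
    "difference": Scale.INTERVAL,  # 0 -> 5 coins is the same gap as 5 -> 10
    "ratio": Scale.RATIO,          # 10 coins is twice 5 coins (needs a true zero)
}


def is_meaningful(statement: str, scale: Scale) -> bool:
    """True if a statement of this kind is licensed by a variable on `scale`."""
    return scale >= MINIMUM_LEVEL[statement]


print(is_meaningful("order", Scale.NOMINAL))       # False
print(is_meaningful("difference", Scale.ORDINAL))  # False
print(is_meaningful("ratio", Scale.RATIO))         # True
```

Using an `IntEnum` lets the levels be compared with `>=`, which mirrors the cumulative nature of the hierarchy. As the text goes on to argue, deciding which level a real variable actually sits on is the hard part, and no lookup table settles that.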

A lot has been made of scales of measurement historically, and their importance is probably overstated in the literature. Where they are especially useful is in helping researchers understand and better appreciate that simply because they obtain a number to represent something, or a difference in numbers to represent a difference in that “something,” it does not necessarily follow that a precise correspondence between numbers and reality has been achieved. In many social sciences especially, assuming that such a correspondence exists is very unrealistic. While it is true that a difference in weight from 100 to 150 pounds represents the same distance as a difference from 150 to 200 pounds, both numerically and physically, for many social variables this correspondence is likely simply not to exist or, at a minimum, to be tremendously difficult to justify. What is more, associating change on an x-axis with change on a y-axis can be done quite easily numerically, but whether it means something physically is an entirely different question.
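The weight example can be contrasted with a rating scale in a few lines. Converting pounds to kilograms is a linear change of units, so equal gaps remain equal; for a rating treated as merely ordinal, any strictly increasing recoding of the numbers (squaring is used here as an arbitrary choice) is just as defensible, and it keeps the ordering of respondents while changing the spacing between them. The ratings are hypothetical.

```python
# Weight: converting pounds to kilograms is linear, so equal gaps stay equal.
LB_TO_KG = 0.45359237  # exact by definition of the international pound

weights_lb = [100, 150, 200]
weights_kg = [w * LB_TO_KG for w in weights_lb]
kg_gaps = [b - a for a, b in zip(weights_kg, weights_kg[1:])]
print(kg_gaps)  # two equal gaps of about 22.68 kg (up to floating-point rounding)

# Hypothetical 0-10 ratings: a strictly increasing recoding (here, squaring)
# preserves the ordering of respondents, but not the spacing between them.
ratings = [5, 7, 9]
recoded = [r ** 2 for r in ratings]
rating_gaps = [b - a for a, b in zip(ratings, ratings[1:])]
recoded_gaps = [b - a for a, b in zip(recoded, recoded[1:])]
print(rating_gaps, recoded_gaps)  # [2, 2] [24, 32]
print(sorted(ratings) == ratings and sorted(recoded) == recoded)  # True: order kept
```

If nothing about the measuring instrument tells us which of these codings is the “right” one, then claims that rely on equal distances, such as comparing the gap from 5 to 7 with the gap from 7 to 9, have no firm footing, which is exactly the difficulty described above for many social variables.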