Daniel J. Denis - Applied Univariate, Bivariate, and Multivariate Statistics Using Python

A practical, “how-to” reference for anyone performing essential statistical analyses and data management tasks in Python.

The problem, succinctly put, is that in many sciences, and contrary to the opinion you might expect from someone writing a data analysis text, students learn too much about how to obtain output at the expense of understanding what the output means or the process that is important in drawing proper scientific conclusions from said output. Sadly, in many disciplines, a course in “Statistics” would be more appropriately, and unfortunately, called “How to Obtain Software Output,” because that is pretty much all the course teaches students to do. How did statistics education in applied fields become so watered down? Since when did cultivating the art of analytical or quantitative thinking not matter? Faculty who teach such courses in such a superficial style should know better and instead teach courses with a lot more “statistical thinking” rather than simply generating software output. Among students (who should not necessarily know better – that is what makes them students), there often exists the illusion that simply because one can obtain output for a multiple regression, this somehow implies a multiple regression was performed correctly in line with the researcher’s scientific aims. Do you know how to conduct a multiple regression? “Yes, I know how to do it in software.” This answer is not a correct answer to knowing how to conduct a multiple regression! One need not even understand what multiple regression is to “compute one” in software. As a consultant, I have also had a client or two from very prestigious universities email me a bunch of software output and ask me “Did I do this right?” assuming I could evaluate their code and output without first having knowledge of their scientific goals and aims. “Were the statistics done correctly?” Of course, without an understanding of what they intended to do or the goals of their research, such a question is not only figuratively, but also literally impossible to answer, aside from assuring them that the software has a strong reputation for accuracy in number-crunching.

This overemphasis on computation, software or otherwise, is not right, and is a real problem, and is responsible for many misuses and abuses of applied statistics in virtually every field of endeavor. However, it is especially poignant in fields in the social sciences because the objects on which the statistics are computed are often statistical or psychometric entities themselves, which makes understanding how statistical modeling works even more vital to understanding what can vs. what cannot be concluded from a given statistical analysis. Though these problems are also present in fields such as biology and others, they are less poignant, since the reality of the objects in these fields is usually more agreed upon. To be blunt, a t-test on whether a COVID-19 vaccine works or not is not too philosophically challenging. Finding the vaccine is difficult science to be sure, but analyzing the results statistically usually does not require advanced statistics. However, a regression analysis on whether social distancing is a contributing factor to depression rates during the COVID-19 pandemic is not quite as easy on a methodological level. One is so-called “hard science” on real objects, the other might just end up being a statistical artifact. This is why social science students, especially those conducting non-experimental research, need rather deep philosophical and methodological training so they do not read “too much” into a statistical result, things the physical scientist may never have had to confront due to the nature of his or her objects of study. Establishing scientific evidence and supporting a scientific claim in many social (and even natural) sciences is exceedingly difficult, despite the myriad of journals accepting for publication a wide variety of incorrect scientific claims presumably supported by bloated statistical analyses. Just look at the methodological debates that surrounded COVID-19, which concerns an object that is relatively “easy” philosophically! Step away from concrete science, throw in advanced statistical technology and complexity, and you enter a world where establishing evidence is philosophical quicksand. Many students who use statistical methods fall into these pits without even knowing it, and it is the instructor’s responsibility to keep them grounded in what the statistical method can vs. cannot do. I have told students countless times, “No, the statistical method cannot tell you that; it can only tell you this.”

Hence, students of the empirical sciences need to be acutely aware and appreciative of the deeper issues of conducting their own science. This implies a heavier emphasis not on how to conduct a billion different statistical analyses, but on understanding the issues with conducting the “basic” analyses they are performing. It is a matter of fact that many students who fill their theses or dissertations with applied statistics may nonetheless fail to appreciate that very little of scientific usefulness has been achieved. What has too often been achieved is a blatant abuse of statistics masquerading as scientific advancement. The student “bootstrapped standard errors” (Wow! Impressive!), but in the midst of a dissertation that is scientifically unsound or at a minimum very weak on a methodological level.

A perfect example to illustrate how statistical analyses can be abused is when performing a so-called “mediation” analysis (you might infer by the quotation marks that I am generally not a fan, and for a very good reason I may add). At lightning speed, a student or researcher can regress Y on X, introduce Z as a mediator, and if statistically significant, draw the conclusion that “Z mediates the relationship between Y and X.” That’s fine, so long as it is clearly understood that what has been established is statistical mediation (Baron and Kenny, 1986), and not necessarily anything more. To say that Z mediates Y and X, in a real substantive sense, requires, of course, much more knowledge of the variables and/or of the research context or design. It first and foremost requires defining what one means by “mediation” in the first place. Simply because one computes statistical mediation does not, in any way whatsoever, justify somehow drawing the conclusion that “X goes through Z on its way to Y,” or anything even remotely similar. Crazy talk! Of course, understanding this limitation should be obvious, right? Not so for many who conduct such analyses. What would such a conclusion even mean? In most cases, with most variables, it simply does not even make sense, regardless of how much statistical mediation is established. Again, this should be blatantly obvious; however, many students (and researchers) are unaware of this, failing to realize or appreciate that a statistical model cannot, by itself, impart a “process” onto variables. All a statistical model can typically do, by itself, is partition variability and estimate parameters. Fiedler et al. (2011) recently summarized the rather obvious fact that without the validity of prior assumptions, statistical mediation is simply, and merely, variance partitioning. Fisher, inventor of ANOVA (analysis of variance), already warned us of this when he said of his own novel (at the time) method that ANOVA was merely a way of “arranging the arithmetic.” Whether or not that arrangement is meaningful has to come from the scientist and a deep consideration of the objects on which that arrangement is being performed. This idea, that the science matters more than the statistics applied to it, is at risk of being lost, especially in the social sciences where statistical models regularly “run the show” (at least in some fields) due to the difficulty in many cases of operationalizing or controlling the objects of study.
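
To make the point about statistical mediation concrete, here is a minimal sketch in plain Python. It is an illustration only and not a procedure taken from the text. It assumes the familiar three-regression recipe associated with Baron and Kenny (Y on X, Z on X, then Y on X and Z). The simulated data, the helper names (`cov`, `simple_slope`, `two_predictor_slopes`, `statistical_mediation`) and the sample size are all hypothetical. The data are deliberately built so that Z is a common cause of X and Y, and X has no effect on Z or Y at all.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Sample covariance; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def simple_slope(x, y):
    """OLS slope from regressing y on x (intercept included)."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x, z, y):
    """OLS slopes on x and on z from regressing y on both (intercept included)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    slope_x = (sxy * szz - szy * sxz) / det
    slope_z = (szy * sxx - sxy * sxz) / det
    return slope_x, slope_z

def statistical_mediation(x, z, y):
    """Three regressions: Y on X, Z on X, and Y on X and Z together."""
    c = simple_slope(x, y)                       # "total effect" of X on Y
    a = simple_slope(x, z)                       # slope of Z on X
    c_prime, b = two_predictor_slopes(x, z, y)   # "direct effect" and slope on Z
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}

# Hypothetical data: Z drives both X and Y; X does NOT act on Z or on Y.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

result = statistical_mediation(x, z, y)
for name, value in result.items():
    print(f"{name:>8}: {value: .3f}")
```

Under this setup the population values are c = 0.5, a = 0.5, b = 1, and c′ = 0, so the sample estimates should show a clear "indirect effect" a·b and a "direct effect" near zero. That is the textbook pattern for "full mediation," even though by construction nothing goes "through" Z. The identity c = c′ + a·b holds exactly for these least-squares fits, whatever the variables are, which is one way of seeing that the analysis arranges the arithmetic and partitions variability rather than revealing a process.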