Applied Univariate, Bivariate, and Multivariate Statistics Using Python
A Beginner’s Guide to Advanced Data Analysis
Daniel J. Denis
This edition first published 2021
© 2021 by John Wiley and Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Daniel J. Denis to be identified as the author of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. 
Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-57814-7 (hardback)
ISBN 978-1-119-57817-8 (ePDF)
ISBN 978-1-119-57818-5 (ePub)
ISBN 978-1-119-57820-8 (oBook)
Cover image: © Photographer/Getty Images
Cover design by Wiley
Set in 9.5/12.5 STIXTwoText by Integra Software Services, Pondicherry, India
To Kaiser
1 Cover
2 Title page
3 Copyright
4 Dedication
5 Preface
6 Chapter 1: A Brief Introduction and Overview of Applied Statistics
1.1 How Statistical Inference Works
1.2 Statistics and Decision-Making
1.3 Quantifying Error Rates in Decision-Making: Type I and Type II Errors
1.4 Estimation of Parameters
1.5 Essential Philosophical Principles for Applied Statistics
1.6 Continuous vs. Discrete Variables
1.6.1 Continuity Is Not Always Clear-Cut
1.7 Using Abstract Systems to Describe Physical Phenomena: Understanding Numerical vs. Physical Differences
1.8 Data Analysis, Data Science, Machine Learning, Big Data
1.9 “Training” and “Testing” Models: What “Statistical Learning” Means in the Age of Machine Learning and Data Science
1.10 Where We Are Going From Here: How to Use This Book
Review Exercises
7 Chapter 2: Introduction to Python and the Field of Computational Statistics
2.1 The Importance of Specializing in Statistics and Research, Not Python: Advice for Prioritizing Your Hierarchy
2.2 How to Obtain Python
2.3 Python Packages
2.4 Installing a New Package in Python
2.5 Computing z-Scores in Python
2.6 Building a Dataframe in Python: And Computing Some Statistical Functions
2.7 Importing a .txt or .csv File
2.8 Loading Data into Python
2.9 Creating Random Data in Python
2.10 Exploring Mathematics in Python
2.11 Linear and Matrix Algebra in Python: Mechanics of Statistical Analyses
2.11.1 Operations on Matrices
2.11.2 Eigenvalues and Eigenvectors
Review Exercises
8 Chapter 3: Visualization in Python: Introduction to Graphs and Plots
3.1 Aim for Simplicity and Clarity in Tables and Graphs: Complexity is for Fools!
3.2 State Population Change Data
3.3 What Do the Numbers Tell Us? Clues to Substantive Theory
3.4 The Scatterplot
3.5 Correlograms
3.6 Histograms and Bar Graphs
3.7 Plotting Side-by-Side Histograms
3.8 Bubble Plots
3.9 Pie Plots
3.10 Heatmaps
3.11 Line Charts
3.12 Closing Thoughts
Review Exercises
9 Chapter 4: Simple Statistical Techniques for Univariate and Bivariate Analyses
4.1 Pearson Product-Moment Correlation
4.2 A Pearson Correlation Does Not (Necessarily) Imply Zero Relationship
4.3 Spearman’s Rho
4.4 More General Comments on Correlation: Don’t Let a Correlation Impress You Too Much!
4.5 Computing Correlation in Python
4.6 T-Tests for Comparing Means
4.7 Paired-Samples t-Test in Python
4.8 Binomial Test
4.9 The Chi-Squared Distribution and Goodness-of-Fit Test
4.10 Contingency Tables
Review Exercises
10 Chapter 5: Power, Effect Size, P-Values, and Estimating Required Sample Size Using Python
5.1 What Determines the Size of a P-Value?
5.2 How P-Values Are a Function of Sample Size
5.3 What is Effect Size?
5.4 Understanding Population Variability in the Context of Experimental Design
5.5 Where Does Power Fit into All of This?
5.6 Can You Have Too Much Power? Can a Sample Be Too Large?
5.7 Demonstrating Power Principles in Python: Estimating Power or Sample Size
5.8 Demonstrating the Influence of Effect Size
5.9 The Influence of Significance Levels on Statistical Power
5.10 What About Power and Hypothesis Testing in the Age of “Big Data”?
5.11 Concluding Comments on Power, Effect Size, and Significance Testing
Review Exercises
11 Chapter 6: Analysis of Variance
6.1 T-Tests for Means as a “Special Case” of ANOVA
6.2 Why Not Do Several t-Tests?
6.3 Understanding ANOVA Through an Example
6.4 Evaluating Assumptions in ANOVA
6.5 ANOVA in Python
6.6 Effect Size for Teacher
6.7 Post-Hoc Tests Following the ANOVA F-Test
6.8 A Myriad of Post-Hoc Tests
6.9 Factorial ANOVA
6.10 Statistical Interactions
6.11 Interactions in the Sample Are a Virtual Guarantee: Interactions in the Population Are Not
6.12 Modeling the Interaction Term
6.13 Plotting Residuals
6.14 Randomized Block Designs and Repeated Measures
6.15 Nonparametric Alternatives
6.15.1 Revisiting What “Satisfying Assumptions” Means: A Brief Discussion and Suggestion of How to Approach the Decision Regarding Nonparametrics
6.15.2 Your Experience in the Area Counts
6.15.3 What If Assumptions Are Truly Violated?
6.15.4 Mann-Whitney U Test
6.15.5 Kruskal-Wallis Test as a Nonparametric Alternative to ANOVA
Review Exercises