This book is an elementary introduction to applied statistics using Python. For the most part, it assumes no prior knowledge of statistics or data analysis, though a prior introductory course is desirable. It can be appropriately used in a 16-week course in statistics or data analysis at the advanced undergraduate or beginning graduate level in fields such as psychology, sociology, biology, forestry, education, nursing, chemistry, business, law, and other areas where making sense of data is a priority rather than the formal theoretical statistics one may encounter in a more specialized program in a statistics department. The mathematics used in the book is minimal, and where math is used, every effort has been made to unpack and explain it as clearly as possible. The goal of the book is to obtain results using software rather quickly, while at the same time not completely dismissing important conceptual and theoretical features. After all, if you do not understand what the computer is producing, then the output will be quite meaningless. For deeper theoretical accounts, the reader is encouraged to consult other sources, such as the author’s more theoretical book, now in its second edition (Denis, 2021), or a number of other books on univariate and multivariate analysis (e.g., Izenman, 2008; Johnson and Wichern, 2007). The book you hold in your hands is merely meant to get your foot in the door, and so long as that is understood from the outset, it will be of great use to the newcomer to statistics and computing. It is hoped that you leave the book with a feeling of having better understood simple to relatively advanced statistics, while also experiencing a little bit of what Python is all about.
Python is used in performing and demonstrating data analyses throughout the book, but it should be emphasized that the book is not a specialty text on Python itself. In this respect, the book does not contain a deep introduction to the software, nor does it go into the language that makes up Python computing to any significant degree. Rather, the book is much more “hands-on” in that the code used is a starting point for generating useful results. That is, the code employed is that which worked for the problem under consideration and which the user can amend or adjust afterward when performing additional analyses. When it comes to coding with Python, there are usually several ways of accomplishing similar goals. In places, we also cite code used by others, assigning proper credit. There already exists a plethora of Python texts and user manuals that feature the software in much greater depth. Those users wishing to learn Python from scratch, become specialists in the software, and become efficient general-purpose programmers should consult those sources (e.g., see Guttag, 2013). For those who want some introductory exposure to Python for generating data-analytic results and wish to understand what the software is producing, it is hoped that the current book will be of great use.
In a book such as this, limited by a fixed number of pages, it is exceedingly difficult to instruct on both statistics and software simultaneously. Attempting to cover univariate, bivariate, and multivariate techniques in a book of this size in any kind of respectable depth or completeness of coverage is, well, an impossibility. Combine this with including software options and the impossibility factor increases! However, such is the nature of books like this one that attempt to survey a wide variety of techniques: one has to include only the most essential information to get the reader “going” on the techniques and advise him or her to consult other sources for further details. Targeting the right mix of theory and software in a book like this is the most challenging part, but so long as the reader (and instructor) recognizes that this book is but a foot-in-the-door to get students “started,” then I hope it will fall in the confidence band of a reasonable expectation. The reader wishing to better understand a given technique or principle will naturally find many narratives incomplete, while the reader hoping to find more details on Python will likewise find the book incomplete. On average, however, it is hoped that the current “mix” is of introductory use for the newcomer. It can be exceedingly difficult to enter the world of statistics and computing. This book will get you started. In many places, references are provided on where to go next.
Unfortunately, many available books on the market for Python are nothing more than slaps in the face to statistical theory, presenting a bunch of computer code that masks a true understanding of what the code actually accomplishes. Though data science is a welcome addition to the mathematical and applied scientific disciplines, and software has advanced by leaps and bounds in the area of quantitative analysis, it is also an unfortunate trend that a genuine understanding of statistical theory and methods is sometimes taking a back seat to what we will call “generating output.” The goal of research and science is not to generate software output. The goal is, or at least should be, to understand in a deeper way whatever output is generated. Code can be looked up far more easily than statistical understanding can. Hence, the goal of the book is to understand what the code represents (at least the important code on which techniques are run) and, to some extent at least, the underlying mathematical and philosophical mechanisms of one’s analysis. We return to this important distinction a bit later in this preface. Each chapter of this book could easily be expanded and developed into a deeper book more than 3–4 times the size of this book in its entirety.
The objective of this book is to provide a pragmatic introduction to data analysis and statistics using Python, providing the reader with a foot-in-the-door starting point for understanding elementary to advanced statistical concepts while affording him or her the opportunity to apply some of these techniques using the Python language.