Finally, I'd like to thank my wife Erin one final time (because you've got to save the best for last).
—Alex
I would like to acknowledge the many people who brought this book together.
First and foremost, I would like to acknowledge my coauthor-in-crime, Alex Gutman. For years, we discussed writing a book together. When the moment was right, we pulled the trigger. I couldn't have asked for a better coauthor.
Thanks to the wonderful folks at Wiley who helped put this together, including acquisition editor Jim Minatel and project editor John Sleeva. Also, I would like to acknowledge our technical editors, William Brenneman and Jen Stirrup, for your hard work reviewing the book. We took your comments to heart.
Last but not least, thank you to my partner, Katie Gray, who always believed in this project—and me.
—Jordan
1 Cover
2 Title Page
    Becoming a Data Head
    How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning
    ALEX J. GUTMAN
    JORDAN GOLDMEIER
3 Copyright
4 Dedication
5 About the Authors
6 About the Technical Editors
7 Acknowledgments
8 Foreword
    NOTE
9 Introduction
    THE DATA SCIENCE INDUSTRIAL COMPLEX
    WHY WE CARE
    DATA IN THE WORKPLACE
    YOU CAN UNDERSTAND THE BIG PICTURE
    WHO THIS BOOK IS WRITTEN FOR
    WHY WE WROTE THIS BOOK
    WHAT YOU'LL LEARN
    HOW THIS BOOK IS ORGANIZED
    ONE LAST THING BEFORE WE BEGIN
    NOTES
10 PART I: Thinking Like a Data Head
    CHAPTER 1: What Is the Problem?
        QUESTIONS A DATA HEAD SHOULD ASK
        UNDERSTANDING WHY DATA PROJECTS FAIL
        WORKING ON PROBLEMS THAT MATTER
        CHAPTER SUMMARY
        NOTES
    CHAPTER 2: What Is Data?
        DATA VS. INFORMATION
        DATA TYPES
        HOW DATA IS COLLECTED AND STRUCTURED
        BASIC SUMMARY STATISTICS
        CHAPTER SUMMARY
        NOTES
    CHAPTER 3: Prepare to Think Statistically
        ASK QUESTIONS
        THERE IS VARIATION IN ALL THINGS
        PROBABILITIES AND STATISTICS
        CHAPTER SUMMARY
        NOTES
11 PART II: Speaking Like a Data Head
    CHAPTER 4: Argue with the Data
        WHAT WOULD YOU DO?
        TELL ME THE DATA ORIGIN STORY
        IS THE DATA REPRESENTATIVE?
        WHAT DATA AM I NOT SEEING?
        ARGUE WITH DATA OF ALL SIZES
        CHAPTER SUMMARY
        NOTES
    CHAPTER 5: Explore the Data
        EXPLORATORY DATA ANALYSIS AND YOU
        EMBRACING THE EXPLORATORY MINDSET
        CAN THE DATA ANSWER THE QUESTION?
        DID YOU DISCOVER ANY RELATIONSHIPS?
        DID YOU FIND NEW OPPORTUNITIES IN THE DATA?
        CHAPTER SUMMARY
        NOTES
    CHAPTER 6: Examine the Probabilities
        TAKE A GUESS
        THE RULES OF THE GAME
        PROBABILITY THOUGHT EXERCISE
        BE CAREFUL ASSUMING INDEPENDENCE
        ALL PROBABILITIES ARE CONDITIONAL
        ENSURE THE PROBABILITIES HAVE MEANING
        CHAPTER SUMMARY
        NOTES
    CHAPTER 7: Challenge the Statistics
        QUICK LESSONS ON INFERENCE
        THE PROCESS OF STATISTICAL INFERENCE
        THE QUESTIONS YOU SHOULD ASK TO CHALLENGE THE STATISTICS
        CHAPTER SUMMARY
        NOTES
12 PART III: Understanding the Data Scientist's Toolbox
    CHAPTER 8: Search for Hidden Groups
        UNSUPERVISED LEARNING
        DIMENSIONALITY REDUCTION
        PRINCIPAL COMPONENT ANALYSIS
        CLUSTERING
        K-MEANS CLUSTERING
        CHAPTER SUMMARY
        NOTES
    CHAPTER 9: Understand the Regression Model
        SUPERVISED LEARNING
        LINEAR REGRESSION: WHAT IT DOES
        LINEAR REGRESSION: WHAT IT GIVES YOU
        LINEAR REGRESSION: WHAT CONFUSION IT CAUSES
        OTHER REGRESSION MODELS
        CHAPTER SUMMARY
        NOTES
    CHAPTER 10: Understand the Classification Model
        INTRODUCTION TO CLASSIFICATION
        LOGISTIC REGRESSION
        DECISION TREES
        ENSEMBLE METHODS
        WATCH OUT FOR PITFALLS
        MISUNDERSTANDING ACCURACY
        CHAPTER SUMMARY
        NOTES
    CHAPTER 11: Understand Text Analytics
        EXPECTATIONS OF TEXT ANALYTICS
        HOW TEXT BECOMES NUMBERS
        TOPIC MODELING
        TEXT CLASSIFICATION
        PRACTICAL CONSIDERATIONS WHEN WORKING WITH TEXT
        CHAPTER SUMMARY
        NOTES
    CHAPTER 12: Conceptualize Deep Learning
        NEURAL NETWORKS
        APPLICATIONS OF DEEP LEARNING
        DEEP LEARNING IN PRACTICE
        ARTIFICIAL INTELLIGENCE AND YOU
        CHAPTER SUMMARY
        NOTES
13 PART IV: Ensuring Success
    CHAPTER 13: Watch Out for Pitfalls
        BIASES AND WEIRD PHENOMENA IN DATA
        THE BIG LIST OF PITFALLS
        CHAPTER SUMMARY
        NOTES
    CHAPTER 14: Know the People and Personalities
        SEVEN SCENES OF COMMUNICATION BREAKDOWNS
        DATA PERSONALITIES
        CHAPTER SUMMARY
        NOTES
    CHAPTER 15: What's Next?
14 Index
15 End User License Agreement
1 Chapter 2
    TABLE 2.1 Example Dataset on Advertisement Spending and Revenue
2 Chapter 3
    TABLE 3.1 Probability Dentists Agree to an Advertising Claim
    TABLE 3.2 Possible Combinations of 4 out of 5 Dentists Agreeing
3 Chapter 6
    TABLE 6.1 Probabilities Scenarios with Associated Notation
    TABLE 6.2 Cumulative Probability of a Die Roll Less than 7
4 Chapter 7
    TABLE 7.1 Questions, Null Hypotheses (H0), and Alternative Hypotheses (Ha)
    TABLE 7.2 False Positive vs. False Negative Decision Errors
5 Chapter 8
    TABLE 8.1 Which of These Two Athletes are “Closest” to Each Other?
    TABLE 8.2 Clustering Algorithms Get Confused If Your Data Isn't Scaled.
    TABLE 8.3 Summarizing Unsupervised Learning and the Supervision Required
6 Chapter 9
    TABLE 9.1 Applications of Supervised Learning
    TABLE 9.2 Multiple Linear Regression Model Fit to Housing Data. All correspon...
    TABLE 9.3 Sample Housing Data
7 Chapter 10
    TABLE 10.1 Simple Dataset for Logistic Regression: Using GPA to Predict Inter...
    TABLE 10.2 Snapshot of the Intern Dataset from HR. The majors are CS = Comput...
    TABLE 10.3 Confusion Matrix for Predictions from a Classification Model with ...
    TABLE 10.4 Confusion Matrix for Predictions from a Classification Model with ...
8 Chapter 11
    TABLE 11.1 Converting Text to Numbers as a Bag of Words. The numbers represe...
    TABLE 11.2 Extending the Bag-of-Words Table with Bigrams. The resulting docum...
    TABLE 11.3 Representing Words as Vectors with Word Embeddings
    TABLE 11.4 A Basic Spam Classifier Example
9 Chapter 13
    TABLE 13.1 Success Rates of Surgical Techniques to Remove Kidney Stones
    TABLE 13.2 Simpson's Paradox Lurking in the Success Rates of Surgical Techniq...
10 Chapter 14
    TABLE 14.1 Seven Scenes of Communication Breakdown
1 Chapter 1
    FIGURE 1.1 Sentiment analysis trends
2 Chapter 3
    FIGURE 3.1 Weekly Customer Survey Results: Percent of Positive Reviews. The ...
    FIGURE 3.2 Reprint of American Scientist figure
3 Chapter 4
    FIGURE 4.1 Plot of test drives with critical component failures as a functio...
    FIGURE 4.2 Plots of flights with incidents of O-ring thermal distress as a f...
    FIGURE 4.3 Plots of flights with incidents of O-ring thermal distress as a f...
    FIGURE 4.4 Plot of test drives with and without critical component failures ...
4 Chapter 5
    FIGURE 5.1 A histogram showing the shape of sales price
    FIGURE 5.2 Using box plots to compare sales prices at different quality rank...
    FIGURE 5.3 A bar chart showing the counts by types of electrical installatio...
    FIGURE 5.4 A line chart showing the number of houses sold in different month...
    FIGURE 5.5 A scatter plot showing square footage and sales price
    FIGURE 5.6 Square footage and sales price have a correlation of 0.62, which ...
    FIGURE 5.7 Two datasets with a correlation of 0.8
    FIGURE 5.8 Datasaurus: Data is free to download and explore. 9Like Anscombe’...
5 Chapter 6
    FIGURE 6.1 Venn diagram showing the probability of two events happening toge...
    FIGURE 6.2 Tree diagram for scanning computers for a virus at a large compan...
6 Chapter 8
    FIGURE 8.1 Sorting cars based on different composite features. Notice how th...
    FIGURE 8.2 Principal component analysis groups and condenses the columns of ...
    FIGURE 8.3 PCA finds optimal weights that are used to create composite featu...
    FIGURE 8.4 The PCA algorithm creates a new dataset, the same size as the ori...
    FIGURE 8.5 Clustering is a technique that groups rows of a dataset together....
    FIGURE 8.6 The company's 200 locations, before clustering
    FIGURE 8.7 k-means in action on retail locations
7 Chapter 9
    FIGURE 9.1 Basic paradigm of supervised learning: mapping inputs to outputs...
    FIGURE 9.2 Many lines would fit this data reasonably well, but which line is...
    FIGURE 9.3 Least squares regression is finding the line through the data tha...
    FIGURE 9.4 Two competing models. The model on the left generalizes well, whi...
    FIGURE 9.5 In this plot, you can see how the model does not do well predicti...