Part III: Understanding the Data Scientist's Toolbox Data Heads understand the fundamental concepts of how statistical and machine learning models work. You'll gain an intuitive understanding of unsupervised learning, regression, classification, text analytics, and deep learning.
Part IV: Ensuring Success Data Heads understand the common mistakes and traps when working with data. You'll learn about technical pitfalls that cause projects to fail, and you'll learn about the people and personalities involved in data projects. Finally, we provide direction on how to succeed as a Data Head.
ONE LAST THING BEFORE WE BEGIN
We've established that the data field is growing faster than we can articulate the problems and opportunities it creates. We showed that our past (both society's and the authors’) is filled with data failures. And only by understanding that past can we understand the future. We started you down this path by introducing you to several important concepts in the restaurant classification example.
To understand data at a deeper level, you'll need to cut through the noise, think critically about data problems, and communicate effectively with data workers. Armed with this knowledge, we know you'll be well off.
Are you ready? Your journey to become a Data Head begins on the next page.
1 Venture Beat. “87% of data science projects failing”: venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production
2 www.brookings.edu/wp-content/uploads/2016/06/11_origins_crisis_baily_litan.pdf
3 Nate Silver wrote a series of articles describing this in great detail (fivethirtyeight.com/tag/the-real-story-of-2016). Pollsters wrongly assuming independence, just like in the mortgage crisis, was one mistake.
4 Note to our fellow statisticians: We just mean regular confidence, not statistical confidence.
5 K-nearest-neighbor can also be used to predict numbers instead of classes. These are called regression problems, and we'll cover them later in the book.
6 This idea is discussed in an amazingly helpful book: Wilson, G. (2019). Teaching tech together. CRC Press.
PART I Thinking Like a Data Head
Many companies rush to try the “next big thing” in data without ever pausing to ask the right business questions. Or learn basic data terminology. Or learn how to look at the world through a statistical lens.
Data Heads won't have that problem. Part I, “Thinking Like a Data Head,” prepares you for the road ahead and puts you in the right mindset to think about and understand data. Here's what we'll cover:
Chapter 1: What Is the Problem?
Chapter 2: What Is Data?
Chapter 3: Prepare to Think Statistically
CHAPTER 1 What Is the Problem?
“A problem well stated is a problem half solved.”
—Charles Kettering, inventor & engineer
The first step on your journey to become a Data Head is to help your organization work on data problems that matter.
That may sound obvious, but we suspect many of you have looked on as companies talked about how great data is but then went on to overpromise impact, misinterpret results, or invest in data technologies that didn't add business value. It often seems as if data projects are undertaken because companies like the sound of what they are implementing without fully understanding why the project itself is important.
This pattern leads to wasted time and money and can cause backlash against future data projects. Indeed, in their rush to find the hidden value they expect in data, many companies fail at the first step in the process: defining a business problem. 1 So, in this chapter, we go back to the start.
In the next sections, we'll look at the helpful questions Data Heads should ask to make sure what you're working on matters. We'll then share an example where not asking these questions leads to a project failure. Finally, we'll discuss some of the hidden costs of not clearly defining a problem right from the start.
QUESTIONS A DATA HEAD SHOULD ASK
In our experience, going back to first principles and asking the fundamental questions required to solve a problem is easier said than done. Every company has a unique culture, and team dynamics don't always lend themselves to openly asking questions, especially ones that might make others feel undermined. And many of those becoming Data Heads find that they don't have the space to even begin asking the important questions that will drive the projects forward. Which is why having a culture in which to ask these questions is as important as the questions themselves.
There's no one-size-fits-all formula for every company and every Data Head. If you are a leader, we ask that you create an open environment that will get the questions going. (This starts with inviting the technical experts into the room.) And ask questions yourself. This exhibits humility, a key leadership trait, and encourages others to join in. If you are more junior, we encourage you to try your best to ask these questions anyway, even if you're concerned it might upset the status quo. Our advice is to simply do your best. From experience, we believe that asking the right questions always goes a lot further than staying silent.
We want you to be prepared in the right way, trained to spot project warning signs and raise concerns at the outset. With that, we introduce five questions you should ask before attacking a data problem:
1 Why is this problem important?
2 Who does this problem affect?
3 What if we don't have the right data?
4 When is the project over?
5 What if we don't like the results?
Let's explain each in detail.
Why Is This Problem Important?
The first fundamental question is, “Why is this problem important?” It seems simple, but it's one that's often overlooked. We get caught up in the hype of how we're going to solve the problem, and what we think the solution can do, before the project even starts. At the end of this chapter, we'll talk about the true underlying effects of not answering this question. But at a minimum, this question sets the expectations for why a project should be undertaken. This is important because data projects take time and attention, and often additional investments in technology and data. Simply identifying the importance of the problem before starting the project will help direct company resources to where they are best used.
You can ask the question in different ways:
What keeps you (us) up at night?
Why does this matter?
Is this a new problem, or has it been solved already?
What is the size of the prize? (What's the return on investment?)
You'll want to understand how each person sees the problem. This will help you build alignment on whether the project should start and on how everyone will support it.
During these initial discussions, you'll want to keep the focus on the central business problem and pay close attention if you hear rumblings of recent technology trends. Talk of technical trends can quickly derail the meeting away from its business focus. Be on the lookout for two warning signs:
Methodology focus: In this trope, companies simply think trying some new analysis method or technology will set them apart. You've heard this marketing fluff before: “If you're not using Artificial Intelligence (AI), you're already behind …” Or, companies find some other buzzword they would like to incorporate (e.g., “sentiment analysis”).
Deliverable focus: Some projects go off track because companies focus too much on what the deliverable will be. They say the project needs to have an interactive dashboard, for example. You start the project, but the outcome becomes about the installation of the new dashboard or business intelligence system. Project teams need to take a step back and trace how what they want to build brings value to the organization.