LibCat » Книги » Приключения » unrecognised » Alex J. Gutman - Becoming a Data Head

Alex J. Gutman - Becoming a Data Head

Здесь есть возможность читать онлайн «Alex J. Gutman - Becoming a Data Head» — ознакомительный отрывок электронной книги совершенно бесплатно, а после прочтения отрывка купить полную версию. В некоторых случаях можно слушать аудио, скачать через торрент в формате fb2 и присутствует краткое содержание. Жанр: unrecognised, на английском языке. Описание произведения, (предисловие) а так же отзывы посетителей доступны на портале библиотеки ЛибКат.

Читать книгу

Название:
Becoming a Data Head
Автор:
Alex J. Gutman
Жанр:
unrecognised / на английском языке
Год:
неизвестен
ISBN:
нет данных
Рейтинг книги:
4 / 5. Голосов: 1
Избранное:

Добавить в избранное
Отзывы:
Написать комментарий
Ваша оценка:
- 80
- 1
- 2
- 3
- 4
- 5

Becoming a Data Head: краткое содержание, описание и аннотация

Предлагаем к чтению аннотацию, описание, краткое содержание или предисловие (зависит от того, что написал сам автор книги «Becoming a Data Head»). Если вы не нашли необходимую информацию о книге — напишите в комментариях, мы постараемся отыскать её.

"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful."
Competing on Analytics
Big Data @ Work
The AI Advantage
You’ve heard the hype around data—now get the facts. In
, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it.
You’ll learn how to:
Think statistically and understand the role variation plays in your life and decision making Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace Understand what’s really going on with machine learning, text analytics, deep learning, and artificial intelligence Avoid common pitfalls when working with and interpreting data
is a complete guide for data science in the workplace: covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head—an active participant in data science, statistics, and machine learning. Whether you’re a business professional, engineer, executive, or aspiring data scientist, this book is for you.

Becoming a Data Head — читать онлайн ознакомительный отрывок

Ниже представлен текст книги, разбитый по страницам. Система сохранения места последней прочитанной страницы, позволяет с удобством читать онлайн бесплатно книгу «Becoming a Data Head», без необходимости каждый раз заново искать на чём Вы остановились. Поставьте закладку, и сможете в любой момент перейти на страницу, на которой закончили чтение.

Тёмная тема

Шрифт:

↓

↑

Сбросить

Интервал:

↓

↑

Закладка:

Сделать

This setup can span across industries, from drug trials to marketing campaigns. In digital marketing, web designers frequently experiment on us by designing competing layouts or advertisements on web pages. When we shop online, a coin flip happens behind the scenes to determine if you are shown one of two advertisements, call them A and B. After several thousand unknowing guinea pigs visit the site, the web designers see which had led to more “click-throughs.” And because ads A and B were shown randomly, it's possible to determine which ad was better with respect to click-through rates because all other potential confounding features (time of day, type of web surfer, etc.) have been balanced out through randomization. You might hear experiments like this called “A/B tests” or “A/B experiments.”

We will talk more about why this discrepancy matters in Chapter 4, “Argue with the Data.”

Structured vs. Unstructured Data

Data is also said to be structured and unstructured . Structured data is like the data in your spreadsheets or in Table 2.1. It's been presented with a sense of order and structure in the form of rows and columns.

Unstructured data refers to things like text from Amazon reviews, pictures on Facebook, YouTube videos, or audio files. Unstructured data requires clever techniques to convert it into structured data required for analysis methods (see Part III of this book).

Is Data One or Many?

We should clarify where we stand on a debate you may not have known about or cared existed: Is data one or many?

The word data is actually the plural version of the word datum . (Like criteria —the plural of criterion . Or agenda—the plural of the word agendum .) If we were following proper rules of language, we would say “these data are continuous” and not “this data is continuous.”

Both of your authors have attempted to use the correct phrasing the data are …out in the real world and it's not for us. It just sounds weird. And we're not the only ones who think so. The popular data blog FiveThirtyEight.com 3 has argued that its usage is a mass noun , like water or grass.

BASIC SUMMARY STATISTICS

Data does not always look like a dataset or spreadsheet. It's often in the form of summary statistics. Summary statistics enable us to understand information about a set of data.

The three most common summary statistics are mean, median, and mode, and you're probably quite familiar with them. However, we wanted to spend a few minutes discussing these statistics because we frequently see the colloquial terms “normal,” “usual,” “typical,” or “average” used as synonyms for each of the terms. To avoid confusion, let's be clear on what each term means:

The mean is the sum of all the numbers you have divided by the count of all the numbers. The effect of this operation is to give you a sense of what each observation in your series contributes to the entire sum if every observation generated the same amount. The mean is also called the average.

The median is the midpoint of the entire data range if you sorted it in order.

The mode is the most common number in the dataset.

Mean, median, and mode are called measures of location or measures of central tendency. Measures of variation—variance, range, and standard deviation—are measures of spread. The location number tells you where on the number line a typical value falls and spread tells you how spread out the other numbers are from that value.

As a trivial example, the numbers 7, 5, 4, 8, 4, 2, 9, 4, and 100 have mean 15.89, median 5, and mode 4. Notice the mean (average), 15.89, is a number that doesn't appear in the data. This happens a lot: the average number of people in a household in the United States in 2018 was 2.63; basketball star LeBron James scores an average of 27.1 points per game.

It's a common mistake for people to use the average (mean) to represent the midpoint of the data, which is the median. They assume half the numbers must be above average, and half below. This isn't true. In fact, it's common for most of the data to be below (or above) the average. For example, the vast majority of people have greater than the average number of fingers (likely 9. something ).

To avoid confusion and misconceptions, we recommend sticking with mean or average, median, and mode for full transparency. Try not to use words like usual, typical, or normal.

CHAPTER SUMMARY

In this chapter, we gave you a common language to speak about your data in the workplace. Specifically, we described:

Data, datasets, and multiple names for the rows and columns of a dataset

Numerical data (continuous vs. count)

Categorical data (original vs. nominal)

Experimental vs. observational data

Structured vs. unstructured data

Measures of central tendency

With the correct terminologies in place, you're ready to start thinking statistically about the data you come across.

NOTES

1 1 There are additional levels of continuous data, called ratio and interval. Feel free to look them up, but we rarely see the terms used in a business setting. And there are situations when the distinction between continuous and count data doesn't really matter. High count numbers, like website visits, are often considered continuous for the purpose of data analysis rather than count. It's when the count data is near zero that the distinction really matters. We'll explore this more in the coming chapters.

2 2 Here's a quick example of confounding. In a drug trial, if the treatment group consists of only children and no one got sick, you'd be left wondering if their protection from the disease was caused by an effective drug treatment or because children had some inherent protection from the disease. The effect of the drug would be confounded with age. Random assignment between the control and treatment groups prevents this.

3 3 “Data Is” vs. “Data Are”: fivethirtyeight.com/features/data-is-vs-data-are

Конец ознакомительного фрагмента.

Текст предоставлен ООО «ЛитРес».

Прочитайте эту книгу целиком, купив полную легальную версию на ЛитРес.

Безопасно оплатить книгу можно банковской картой Visa, MasterCard, Maestro, со счета мобильного телефона, с платежного терминала, в салоне МТС или Связной, через PayPal, WebMoney, Яндекс.Деньги, QIWI Кошелек, бонусными картами или другим удобным Вам способом.