
Alex J. Gutman - Becoming a Data Head


Becoming a Data Head: Book Description


"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful."
(Endorsement by the author of Competing on Analytics, Big Data @ Work, and The AI Advantage)
You’ve heard the hype around data; now get the facts. In Becoming a Data Head, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it.
You’ll learn how to:
- Think statistically and understand the role variation plays in your life and decision making
- Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace
- Understand what’s really going on with machine learning, text analytics, deep learning, and artificial intelligence
- Avoid common pitfalls when working with and interpreting data
Becoming a Data Head is a complete guide to data science in the workplace, covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in the data trenches and set out to create a fun, approachable, and eminently readable book. Anyone can become a Data Head: an active participant in data science, statistics, and machine learning. Whether you’re a business professional, engineer, executive, or aspiring data scientist, this book is for you.

Becoming a Data Head: Excerpt


—Jim Barksdale, former Netscape CEO

Many people work with data without having a dialect for it. However, we want to ensure we're all speaking the same language to make the rest of the book easier to follow. So, in this chapter, we'll give you a brief crash course on data and data types. If you've taken a basic statistics or analytics course, you'll know the terms that follow, but parts of our discussion may not have been covered in your class.

DATA VS. INFORMATION

The terms data and information are often used interchangeably. In this book, however, we make a distinction between the two.

Information is derived knowledge. You can derive knowledge from many activities: measuring a process, thinking about something new, looking at art, and debating a subject. From the sensors on satellites to the neurons firing in our brains, information is continually created. Communicating and capturing that information, however, is not always simple. Some things are easily measurable while others are not. But we endeavor to communicate knowledge for the benefit of others and to store what we've learned. And one way to communicate and store information is by encoding it. When we do this, we create data. As such, data is encoded information.

An Example Dataset

Table 2.1 tells the story of a company. Each month, it runs a different marketing campaign online, on television, or in print media (newspapers and magazines). This process generates new information each month. The table the company has created is an encoding of this information, and thus it holds data.

A table of data, like Table 2.1, is called a dataset.

Notice that it has both rows and columns that serve specific functions in how we understand the table. Each row of the table (running horizontally, under the header row) is a measured instance of associated information. In this case, it's a measured instance of information for a marketing campaign. Each column of the table (running vertically) is a list of information we're interested in, organized into a common encoding so that we can compare each instance.

The rows of a table are commonly referred to as observations, records, tuples, or trials. Columns of datasets often go by the names features, fields, attributes, predictors, or variables.

Know Your Audience

Data is studied in many different fields, each with its own lingo, which is why there are many names for the same things. Some data workers, when talking about the columns in a dataset, might prefer “features” while others say “variables” or “predictors.” Part of being a Data Head is being able to navigate conversations across these groups and their preferences.

A data point is the intersection of an observation and a feature. For example, 150 units sold on 2021-02-01 is a data point.

TABLE 2.1 Example Dataset on Advertisement Spending and Revenue

Date	Ad Spending	Units Sold	Profit	Location
2021-01-01	2000	100	10452	Print
2021-02-01	1000	150	15349	Online
2021-03-01	3000	200	25095	Television
2021-04-01	1000	175	12443	Online

Table 2.1 has a header (a piece of non-numerical data) that helps us understand what each feature means. Note that not every dataset will have a header row. In such cases, the header row is implied, and the person working with the dataset must know what each feature means.
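To make the vocabulary concrete, here is a minimal sketch that encodes Table 2.1 as tab-separated text (the tab-separated format and the helper name `data_point` are our own illustrative choices, not something the table prescribes) and reads it with Python's standard `csv` module. Each row becomes one observation keyed by feature name, and a data point is the value where one observation meets one feature. The last lines show the headerless case: the feature names have to be supplied by someone who already knows what each column means.

```python
import csv
import io

# Table 2.1 encoded as tab-separated text; the first line is the header row.
TABLE_2_1 = (
    "Date\tAd Spending\tUnits Sold\tProfit\tLocation\n"
    "2021-01-01\t2000\t100\t10452\tPrint\n"
    "2021-02-01\t1000\t150\t15349\tOnline\n"
    "2021-03-01\t3000\t200\t25095\tTelevision\n"
    "2021-04-01\t1000\t175\t12443\tOnline\n"
)

# The header row supplies the feature (column) names.
reader = csv.DictReader(io.StringIO(TABLE_2_1), delimiter="\t")
features = reader.fieldnames
# Each remaining row is one observation: a dict mapping feature -> value.
rows = list(reader)


def data_point(rows, date, feature):
    """Hypothetical helper: the value where the observation for `date` meets `feature`."""
    for row in rows:
        if row["Date"] == date:
            return row[feature]
    raise KeyError(date)


print(len(rows), features)
print(data_point(rows, "2021-02-01", "Units Sold"))  # prints 150

# Without a header row, the feature names are implied and must be provided by hand.
headerless = TABLE_2_1.split("\n", 1)[1]
implied_features = ["Date", "Ad Spending", "Units Sold", "Profit", "Location"]
rows_no_header = list(
    csv.DictReader(io.StringIO(headerless), delimiter="\t", fieldnames=implied_features)
)
```

Note that `csv` reads every value as text, so "150" comes back as a string; deciding which features are numeric is a separate step.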

DATA TYPES

There are many ways to encode information, but data workers use a few specific types of encodings to store information and communicate results. The two most common data types are numeric and categorical.

Numeric data is mostly made up of numbers but might use additional symbols to identify units. Categorical data is made up of words, symbols, phrases, and (confusingly) sometimes numbers, like ZIP codes. Numeric and categorical data both split into further subcategories.
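A short sketch of why a ZIP code belongs with categorical data even though it is written with digits. The specific ZIP codes are illustrative only. Treating the label as a number silently drops a leading zero, and arithmetic on the result is meaningless.

```python
# A ZIP code looks numeric, but it is a label, not a quantity.
zip_as_text = "02134"            # illustrative ZIP code, stored as text
zip_as_number = int(zip_as_text)

print(zip_as_number)                      # prints 2134: the leading zero is gone
print(str(zip_as_number) == zip_as_text)  # prints False: the label was not preserved

# Arithmetic runs without error, but the "sum" of two ZIP codes means nothing.
meaningless_sum = int("02134") + int("10001")
```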

There are two main types of numeric data:

Continuous data can take on any value on a number line. It represents a fundamentally uncountable set of values. Consider the weather. The outside temperature, if collected and turned into data, would be a continuous variable. A local news station might measure a temperature of 65.62 degrees Fahrenheit. However, it may choose to report this number to you as 65 degrees, 66 degrees, or 65.6 degrees Fahrenheit.

Count (or discrete) data, unlike continuous data, is restricted to whole numbers. For example, the number of cars you own can be 0, 1, 2, or more, but not 1.23. This restriction reflects the underlying reality of the thing being measured.¹
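The two kinds of numeric data can be sketched in a few lines. The temperature example reuses the 65.62 degree reading: one measured value, reported at three precisions (truncating to get 65 is our assumption about how a station might arrive at that figure). The `is_valid_count` helper is a hypothetical check showing that a count only makes sense as a non-negative whole number.

```python
import math

# Continuous: a measured temperature can take any value on the number line.
measured_temp_f = 65.62

# The same measurement, reported at different precisions.
truncated = math.floor(measured_temp_f)   # 65
rounded = round(measured_temp_f)          # 66
one_decimal = round(measured_temp_f, 1)   # 65.6


def is_valid_count(value):
    """Hypothetical check: a count must be a non-negative whole number."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


print(truncated, rounded, one_decimal)             # prints 65 66 65.6
print(is_valid_count(2), is_valid_count(1.23))     # prints True False
```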

Categorical data also has two main types:

Ordered (or ordinal) data is categorical data with an inherent order. Surveys, for example, take advantage of ordinal data when they ask you to rate your experience from 1 to 10. While this looks like count data, it's not possible to say that the difference between ratings of 10 and 9 is the same as the difference between ratings of 2 and 1. Of course, ordinal categorical data does not have to be encoded as numbers. Shirt size, for example, is ordinal: small, medium, large, extra-large.

Unordered (or nominal) categorical data has no underlying order. Table 2.1, for example, has a Location feature with the values Print, Online, and Television. Other nominal variables include Yes/No responses and party affiliation (Democrat or Republican). The order in which their values are presented is always arbitrary; it's not possible to say one category is “greater than” another.
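One way to see the difference between ordinal and nominal data in code, as a sketch under our own encoding choices: shirt sizes are kept in a list whose positions carry the order, so comparisons and sorting are meaningful, while the Location values from Table 2.1 go into a set, which has no order at all. Python would happily compare "Print" > "Online" as strings, but that only reflects alphabetical order, not anything about the categories. Likewise, the gap between two ranks should not be read as a distance.

```python
# Ordinal: the categories have an inherent order, encoded here by list position.
SHIRT_SIZES = ["small", "medium", "large", "extra-large"]


def size_rank(size):
    """Position of a size in the ordering (a rank, not a measured distance)."""
    return SHIRT_SIZES.index(size)


# Ordering comparisons are meaningful...
print(size_rank("large") > size_rank("small"))   # prints True
# ...and so is sorting by that order.
print(sorted(["large", "small", "extra-large"], key=size_rank))
# prints ['small', 'large', 'extra-large']

# Nominal: no inherent order, so an unordered set is a natural fit.
LOCATIONS = {"Print", "Online", "Television"}

# Equality and membership are meaningful; "greater than" is not.
print("Online" in LOCATIONS)  # prints True
```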

You'll notice Table 2.1 also has a Date feature. Dates are an additional data type: they are sequential and, like numeric data, can be used in arithmetic expressions.
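A brief sketch of date arithmetic using Python's standard `datetime` module and two dates from Table 2.1. Subtracting dates yields an elapsed duration, adding a duration yields a new date, and because dates are sequential they compare and sort chronologically.

```python
from datetime import date, timedelta

jan = date(2021, 1, 1)
mar = date(2021, 3, 1)

# Subtracting two dates gives an elapsed duration (a timedelta).
print((mar - jan).days)          # prints 59 (31 days in January + 28 in February 2021)

# Adding a duration to a date gives a new date.
print(jan + timedelta(days=30))  # prints 2021-01-31

# Dates are sequential, so they compare and sort chronologically.
print(max(jan, mar))             # prints 2021-03-01
```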

HOW DATA IS COLLECTED AND STRUCTURED

The preceding section discussed data types within a dataset, but there are broader categories that describe how data was collected and how it's structured.

Observational vs. Experimental Data

Data can be described as observational or experimental, depending on how it's collected.

Observational data is collected based on what's seen or heard by a person or computer passively observing some process.

Experimental data is collected following the scientific method using a prescribed methodology.

Most of the data in your company, and in the world, is observational. Examples of observational data include visits to a website, sales on a given date, and the number of emails you receive each day. Sometimes it's saved for a specific purpose; other times, for no purpose at all. We've also heard the phrase “found data” used for this type of data; it's often created as a byproduct of things like sales transactions, credit card payments, Twitter posts, or Facebook likes. In that sense, it's sitting in a database somewhere, waiting to be discovered and used for something. Sometimes observational data is collected simply because it's free and easy to collect. But it can also be collected deliberately, as with customer surveys or political polls.

Experimental data, on the other hand, is not passively collected. It's collected deliberately and methodically to answer specific questions. For these reasons, experimental data represents the gold standard of data for statisticians and researchers. To collect experimental data, you must randomly assign a treatment to someone or something. Clinical drug trials are a common example of a process that generates experimental data. Patients are randomly split into two groups, a treatment group and a control group, and the treatment group is given the drug while the control group is given a placebo. The random assignment of patients should balance out information not relevant to the study (age, socioeconomic status, weight, etc.) so that the two groups are as similar as possible in every way except for the application of the treatment. This allows researchers to isolate and measure the effect of the treatment without having to worry about potential confounding features that might influence the outcome of the experiment.²
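The random assignment step described above can be sketched with Python's standard `random` module. The function name `randomly_assign` and the patient IDs are hypothetical, for illustration only. Shuffling the whole list and splitting it in half means each patient's group is decided by chance rather than by age, weight, or any other characteristic; the optional seed only makes the split reproducible for auditing.

```python
import random


def randomly_assign(subjects, seed=None):
    """Hypothetical helper: shuffle subjects and split them into treatment and control.

    Group membership is decided by chance alone, not by any characteristic of
    the subjects. With an odd number of subjects, control gets the extra one.
    """
    rng = random.Random(seed)        # independent generator; seed for reproducibility
    shuffled = list(subjects)        # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical patient IDs, for illustration only.
patients = [f"patient-{i:02d}" for i in range(1, 11)]
treatment, control = randomly_assign(patients, seed=42)

print(len(treatment), len(control))  # prints 5 5
```

In a real trial, randomization schemes are often more elaborate (for example, stratified or block randomization), but a plain shuffle captures the core idea.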