LibCat » Книги » Приключения » unrecognised » Administrative Records for Survey Methodology

Administrative Records for Survey Methodology

Здесь есть возможность читать онлайн «Administrative Records for Survey Methodology» — ознакомительный отрывок электронной книги совершенно бесплатно, а после прочтения отрывка купить полную версию. В некоторых случаях можно слушать аудио, скачать через торрент в формате fb2 и присутствует краткое содержание. Жанр: unrecognised, на английском языке. Описание произведения, (предисловие) а так же отзывы посетителей доступны на портале библиотеки ЛибКат.

Читать книгу

Название:
Administrative Records for Survey Methodology
Автор:
Неизвестный Автор
Жанр:
unrecognised / на английском языке
Год:
неизвестен
ISBN:
нет данных
Рейтинг книги:
3 / 5. Голосов: 1
Избранное:

Добавить в избранное
Отзывы:
Написать комментарий
Ваша оценка:
- 60
- 1
- 2
- 3
- 4
- 5

Administrative Records for Survey Methodology: краткое содержание, описание и аннотация

Предлагаем к чтению аннотацию, описание, краткое содержание или предисловие (зависит от того, что написал сам автор книги «Administrative Records for Survey Methodology»). Если вы не нашли необходимую информацию о книге — напишите в комментариях, мы постараемся отыскать её.

Addresses the international use of administrative records for large-scale surveys, censuses, and other statistical purposes Administrative Records for Survey Methodology Divided into four sections, the first describes the basics of administrative records research and addresses disclosure limitation and confidentiality protection in linked data. Section two focuses on data quality and linking methodology, covering topics such as quality evaluation, measuring and controlling for non-consent bias, and cleaning and using administrative lists. The third section examines the use of administrative records in surveys and includes case studies of the Swedish register-based census and the administrative records applications used for the US 2020 Census. The book's final section discusses combining administrative and survey data to improve income measurement, enhancing health surveys with data linkage, and other uses of administrative data in evidence-based policymaking. This state-of-the-art resource:
Discusses important administrative data issues and suggests how administrative data can be integrated with more traditional surveys Describes practical uses of administrative records for evidence-driven decisions in both public and private sectors Emphasizes using interdisciplinary methodology and linking administrative records with other data sources Explores techniques to leverage administrative data to improve the survey frame, reduce nonresponse follow-up, assess coverage error, mesaure linkage non-consent bias, and perform small area estimation.
Administrative Records for Survey Methodology

Administrative Records for Survey Methodology — читать онлайн ознакомительный отрывок

Ниже представлен текст книги, разбитый по страницам. Система сохранения места последней прочитанной страницы, позволяет с удобством читать онлайн бесплатно книгу «Administrative Records for Survey Methodology», без необходимости каждый раз заново искать на чём Вы остановились. Поставьте закладку, и сможете в любой момент перейти на страницу, на которой закончили чтение.

Тёмная тема

Шрифт:

↓

↑

Сбросить

Интервал:

↓

↑

Закладка:

Сделать

In addition, the increasing computerization of administrative records, has facilitated more extensive linking of previously disconnected administrative databases, to create more comprehensive and extensive information. Methods to link databases within administrative units based on common identifiers are easy to implement (see Chapter 9for more details). In the United States, which does not have a legal national identifier or ID document, the increased use of the Social Security Number (SSN) has facilitated linkage of government databases and among commercial data providers. In many European countries, individuals have national identifiers, and efforts are underway to allow for cross-border linkages within the European Union, in order to improve statistics on the workforce and the businesses of the common economic area created by what is now called the European Union. However, even when common identifiers are not available, linkage is possible (see Chapter 15).

The result has been that data on individuals, households, and business have become richer, collected from an increasing variety of sources, both as designed surveys and censuses, as well as organically created “administrative” data. The desire to allow policy makers and researchers to leverage the rich linked data has been held back, however, by the concerns of citizens and businesses about privacy. In the 1960s in the United States, researchers had proposed a “National Data Bank” with the goal of combining survey and administrative data for use by researchers. Congress held hearings on the matter, and ultimately the project did not go forward (Kraus 2013). Instead, and partially as a consequence, privacy laws were formalized in the 1970s. The U.S. “Privacy Act” (Public Law 93-579, 5 U.S.C. § 552a), passed in 1974, specifically prohibited “matching” programs, linking data from different agencies. More recently, the 2016 Australian Census elicited substantial controversy when the Australian Bureau of Statistics (ABS) decided to keep identifiable data collected through the census for a substantially longer time period, with the explicit goal of enabling linkages between the census and administrative data, as well as linkages across historical censuses (Australian Bureau of Statistics 2015; Karp 2016).

Subsequent decades saw a decline in public availability of highly detailed microdata on people, households, and firms, and the emergence of new access mechanisms and data protection algorithms. This chapter will provide an overview of the methods that have been developed and implemented to safeguard privacy, while providing researchers the means to draw valid conclusions from protected data. The protection mechanisms we will describe are both physical and statistical (or algorithmic), but exist because of the need to balance the privacy of the respondents, including the confidentiality protection their data receive, with society’s need and desire for ever more detailed, timely, and accurate statistics.

2.2 Paradigms of Protection

There are no methods for disclosure limitation and confidentiality protection specifically designed for linked data. Protecting data constructed by linking administrative records, survey responses, and “found” transaction records relies on the same methods as might be applied to each source individually. It is the richness inherent in the linkages, and in the administrative information available to some potential intruders, that pose novel challenges.

Statistical confidentiality can be viewed as “a body of principles, concepts, and procedures that permit confidentiality to be afforded to data, while still permitting its use of for statistical purposes” (Duncan, Elliot, and Salazar-González 2011, p. 2). In order to protect the confidentiality of the data they collect, NSOs and survey organizations (henceforth referred to generically as data custodians) employ many methods. Very often, data are released to the public as tabular summaries. Many of the protection mechanisms in use today evolved to protect published tables against disclosure. Generically, the idea is to limit the publication of cells with “too few” respondents, where the notion of “too few” is assessed heuristically.

We will not provide a detailed history or taxonomy of statistical disclosure limitation (SDL) and formal privacy models, instead will refer the reader to other publications on the topic (Duncan, Elliot, and Salazar-González 2011; Dwork and Roth 2014; FCSM 2005). We do need to set up the problem, which we will do by reviewing suppression, coarsening, swapping, and noise infusion (input and output). These are widely used techniques and the main issues that arise in applications to linked data can be understood with reference to these methods.

Suppressionis widely used to protect published tables against statistical disclosure. Suppression describes the removal of sub-tables, cells, or items in a cell from a published collection of tables if the item’s publication would pose a high risk of disclosure. This method attempts to forge a middle ground between the users of tabular summaries, who want increasingly detailed disaggregation, and publication rules based on cell count thresholds. The Bureau of Labor Statistics (BLS) uses suppression as its primary SDL technique for data releases based on business establishment censuses and surveys. From the outset, it was understood that primary suppression – not publishing easily identified data items – did not protect anything if the agency published the rest of the data, including summary statistics. Users could infer the missing items from what was published (Fellegi 1972). The BLS, and other agencies that rely on suppression, make “complementary suppressions” to reduce the probability that a user can infer the sensitive items from the published data (Holan et al. 2010). But there is no optimal complementary suppression technology – there are usually multiple complementary suppression strategies that achieve the same protection.

Researchers, however, are not indifferent to these strategies. A researcher who needs detailed geographic variation will benefit from data in which the complementary suppressions are based on removing detailed industries. A researcher who needs detailed industry variation will prefer data with complementary suppression based on geography. Ultimately, the committee that chooses the complementary suppression strategy will determine which research uses are possible and which are ruled out.

But the problem is deeper than this: suppression is a very ineffective SDL technique. Researchers working with the cooperation of the BLS have shown that the suppression strategy used in major BLS business data publications provides almost no protection if it is applied, as is currently the case, to each data release separately (Holan et al. 2010). Some agencies may use cumulative suppression strategies in their sequential data releases. In this case, once an item has been designated for either primary or complementary suppression, it would disappear from the release tables until the entire product is redesigned.

Many social scientists believe that suppression can be complemented by restricted access agreements that allow the researcher to use all of the confidential data but limit what can be published from the analysis. Such a strategy is not a complete solution because SDL must still be applied to the output of the analysis, which quickly brings the problem of which output to suppress back to the forefront.

Custom tabulations and data enclaves.Another traditional response by data custodians to the demand by researchers for more extensive and detailed summaries of confidential data, was to create a custom tabulation, a table not previously published, but generated by data custodian staff with access rights to the confidential data, and typically subject to the same suppression rules. As these requests increased, the tabulation and analysis work was offloaded onto researchers by providing them with access to protected microdata. This approach has expanded rapidly in the last two decades, and is widely used around the world. We discuss it in detail later in this chapter.