UI – Unemployment Insurance, which in the United States is administered by each of the states (and the District of Columbia)
U.S.C. – United States Code, the official compilation of the general and permanent federal statutes of the United States
Analytical validity: Exists when, at a minimum, estimands can be estimated without bias and their confidence intervals (or the nominal level of significance for hypothesis tests) can be stated accurately (Rubin 1987). The estimands can be summaries of the univariate distributions of the variables, bivariate measures of association, or multivariate relationships among all variables.
Coarsening: A method for protecting data that maps confidential values into broader categories, for example by grouping a continuous variable into the bins of a histogram.
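As a minimal sketch of coarsening, the following Python snippet maps each value to the lower edge of a fixed-width bin and then tabulates the bins. The function name `coarsen`, the bin width, and the earnings values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def coarsen(value, width=10_000):
    """Map a numeric value to the lower edge of its bin of the given width."""
    return (value // width) * width


# Hypothetical confidential earnings values (illustrative only).
earnings = [12_350, 18_900, 23_100, 27_640, 41_005]

# Each exact value is replaced by its broader category.
binned = [coarsen(v) for v in earnings]

# Tabulating the categories yields the histogram that would be released.
histogram = Counter(binned)
```

Only the bin labels and their counts would be published, so the exact values within a bin can no longer be read off the release.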
Confidentiality: A “quality or condition accorded to information as an obligation not to transmit […] to unauthorized parties” (Fienberg 2005, as quoted in Duncan, Elliot, and Salazar-González 2011). Confidentiality addresses data already collected, whereas privacy (see below) addresses the right of an individual to consent to the collection of data.
Data swapping: Sensitive records (usually households) are identified using a priori criteria and matched to “nearby” records. The values of some or all of the other variables, usually the geographic identifiers, are then swapped between the matched records, effectively relocating each record to the other’s location.
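A simplified sketch of data swapping in Python follows. It assumes that "nearby" is measured by a caller-supplied distance function and that only a single geographic field is swapped. The record layout, the risk criterion, and all names are hypothetical and do not describe any agency's production procedure.

```python
def swap_geography(records, is_risky, distance):
    """Swap the 'geo' field of each risky record with its nearest partner.

    A partner must lie in a different geography and must not already
    have been swapped. The input list is left unmodified.
    """
    out = [dict(r) for r in records]
    used = set()
    for i, rec in enumerate(records):
        if i in used or not is_risky(rec):
            continue
        candidates = [
            j for j, other in enumerate(records)
            if j != i and j not in used and other["geo"] != rec["geo"]
        ]
        if not candidates:
            continue
        # Nearest candidate under the supplied (hypothetical) distance.
        j = min(candidates, key=lambda k: distance(rec, records[k]))
        out[i]["geo"], out[j]["geo"] = records[j]["geo"], records[i]["geo"]
        used.update((i, j))
    return out


# Hypothetical households; large households are treated as sensitive.
households = [
    {"id": 1, "geo": "A", "size": 7},
    {"id": 2, "geo": "A", "size": 2},
    {"id": 3, "geo": "B", "size": 6},
    {"id": 4, "geo": "B", "size": 2},
]
swapped = swap_geography(
    households,
    is_risky=lambda r: r["size"] >= 7,
    distance=lambda a, b: abs(a["size"] - b["size"]),
)
```

Because geographies are exchanged pairwise, the number of records in each geography is unchanged, while the sensitive household now appears in its partner's location.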
Differential privacy: A class of formal privacy mechanisms. For instance, ε-differential privacy places an upper bound, parameterized by ε, on the ability of a user to infer from the published output whether any specific data item, or response, was in the original, confidential data (Dwork and Roth 2014).
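To illustrate how ε enters a concrete mechanism, the sketch below applies the Laplace mechanism, one standard ε-differentially private mechanism described in Dwork and Roth (2014), to a counting query. A counting query changes by at most 1 when one record is added or removed, so the noise scale is 1/ε. The function names, the data, and the choice of ε are hypothetical, and this is not presented as the mechanism used in any particular data product.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale).

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for r in records if predicate(r))
    # Sensitivity of a counting query is 1, so the scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 70]  # hypothetical confidential values
noisy = private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
```

Smaller values of ε give a larger noise scale and therefore stronger protection at the cost of accuracy.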
Dirichlet-multinomial distribution: A family of discrete multivariate probability distributions on a finite support of nonnegative integers. The probability vector p of the better-known multinomial distribution is obtained by drawing from a Dirichlet distribution with parameter α.
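The two-stage construction in this definition can be sketched in Python using only the standard library: first draw p from a Dirichlet distribution by normalizing independent gamma draws, then draw multinomial counts given p. The function names and the values of n and α are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha).

    Normalizing independent Gamma(alpha_k, 1) draws gives a Dirichlet draw.
    """
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_dirichlet_multinomial(n, alpha, rng):
    """Draw category counts for n trials from a Dirichlet-multinomial."""
    p = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    # Given p, the counts follow an ordinary multinomial distribution.
    for k in rng.choices(range(len(alpha)), weights=p, k=n):
        counts[k] += 1
    return counts


rng = random.Random(42)
counts = sample_dirichlet_multinomial(n=100, alpha=[2.0, 1.0, 0.5], rng=rng)
```

The result is a vector of nonnegative integers that sums to n, matching the support described above.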
Input noise infusion: Distorting the value of some or all of the inputs before any publication data are built or released.
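A minimal sketch of input noise infusion with multiplicative noise is shown below. It assumes that each input unit receives a factor that differs from 1 by at least a and at most b, drawn uniformly within those bands. The band limits, the uniform distribution, the firm names, and the counts are all assumptions made for illustration, not the parameters of any released product.

```python
import random


def draw_noise_factor(rng, a=0.05, b=0.15):
    """Draw a multiplicative factor in [1 - b, 1 - a] or [1 + a, 1 + b].

    Excluding the interval around 1 guarantees that every input is
    distorted by at least the fraction `a`.
    """
    magnitude = rng.uniform(a, b)
    sign = rng.choice((-1, 1))
    return 1 + sign * magnitude


rng = random.Random(7)

# Hypothetical confidential firm-level employment counts.
employment = {"firm_1": 120, "firm_2": 8, "firm_3": 43}

# Distort every input first, before any table is built.
factors = {firm: draw_noise_factor(rng) for firm in employment}
distorted = {firm: employment[firm] * factors[firm] for firm in employment}

# Published aggregates are computed only from the distorted inputs.
published_total = round(sum(distorted.values()))
```

The distortion happens at the input stage, so every statistic built afterwards inherits the protection without further per-table adjustment.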
Posterior predictive distribution (PPD): In Bayesian statistics, the distribution of unobserved (new or missing) values conditional on the observed values.
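In the usual textbook notation, which is assumed here rather than taken from the chapter, let y denote the observed values, ỹ the unobserved values, and θ the model parameters. The PPD averages the sampling distribution of ỹ over the posterior distribution of θ:

```latex
p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta, y)\, p(\theta \mid y)\, d\theta
```

When ỹ is conditionally independent of y given θ, the first factor simplifies to p(ỹ | θ).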
Privacy: “An individual’s freedom from excessive intrusion in the quest for information and […] ability to choose [… what …] will be shared or withheld from others” (Duncan, Jabine, and de Wolf 1993, quoted in Duncan, Elliot, and Salazar-González 2011). See also confidentiality, above.
Sampling: As an SDL method, sampling protects confidentiality by publishing only a fraction of the records in the data.
Statistical confidentiality or statistical disclosure limitation (SDL): Can be viewed as “a body of principles, concepts, and procedures that permit confidentiality to be afforded to data, while still permitting its use for statistical purposes” (Duncan, Elliot, and Salazar-González 2011, p. 2).
Suppression: The removal of cells from a published table when their publication would pose a high risk of disclosure.
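A minimal sketch of primary cell suppression in Python follows, assuming that disclosure risk is judged by a simple minimum-count threshold. The threshold value, the table contents, and the function name are hypothetical. The sketch also omits complementary suppression, which is typically needed so that suppressed cells cannot be recovered from published totals.

```python
def suppress_small_cells(table, threshold=3):
    """Replace counts below `threshold` with None (a suppressed cell)."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


# Hypothetical table of counts by county and industry.
table = {
    ("County A", "Mining"): 1,
    ("County A", "Retail"): 57,
    ("County B", "Mining"): 12,
}
published = suppress_small_cells(table)
```

Here the single-unit cell is withheld, while the larger cells are released unchanged.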
John M. Abowd is the Associate Director for Research and Methodology and Chief Scientist, U.S. Census Bureau, the Edmund Ezra Day Professor of Economics, Professor of Statistics and Information Science, and the Director of the Labor Dynamics Institute (LDI) at Cornell University, Ithaca, NY, USA. https://johnabowd.com. Ian M. Schmutte is Associate Professor of Economics at the University of Georgia, Athens, GA, USA. http://ianschmutte.org. Lars Vilhuber is Senior Research Associate in the Department of Economics and Executive Director of the Labor Dynamics Institute (LDI) at Cornell University, Ithaca, NY, USA. https://lars.vilhuber.com. The authors acknowledge the support of a grant from the Alfred P. Sloan Foundation (G-2015-13903) and NSF Grants SES-1131848, BCS-0941226, and TC-1012593. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau, the National Science Foundation, or the Sloan Foundation. All results presented in this work stem from previously released work, were used by permission, and were previously reviewed to ensure that no confidential information is disclosed.
1 Abowd, J.M. and McKinney, K.L. (2016). Noise infusion as a confidentiality protection measure for graph-based statistics. Statistical Journal of the IAOS 32 (1): 127–135. https://doi.org/10.3233/SJI-160958.
2 Abowd, J.M. and Schmutte, I.M. (2015). Economic analysis and statistical disclosure limitation. Brookings Papers on Economic Activity 50 (1): 221–267.
3 Abowd, J.M. and Vilhuber, L. (2012). Did the housing price bubble clobber local labor market job and worker flows when it burst? The American Economic Review 102 (3): 589–593. https://doi.org/10.1257/aer.102.3.589.
4 Abowd, J.M., Haltiwanger, J., and Lane, J. (2004). Integrated longitudinal employer–employee data for the United States. The American Economic Review 94 (2): 224–229.
5 Abowd, J.M., Stinson, M., and Benedetto, G. (2006). Final Report to the Social Security Administration on the SIPP/SSA/IRS Public Use File Project. 1813/43929. U.S. Census Bureau. http://hdl.handle.net/1813/43929.
6 Abowd, J.M., Stephens, B.E., Vilhuber, L. et al. (2009). The LEHD infrastructure files and the creation of the quarterly workforce indicators. In: Producer Dynamics: New Evidence from Micro Data (eds. T. Dunne, J.B. Jensen and M.J. Roberts). University of Chicago Press.
7 Abowd, J.M., Gittings, R.K., McKinney, K.L. et al. (2012). Dynamically consistent noise infusion and partially synthetic data as confidentiality protection measures for related time series. U.S. Census Bureau Center for Economic Studies Paper No. CES-WP-12-13. http://dx.doi.org/10.2139/ssrn.2159800.
8 Abowd, J.M., Schmutte, I.M., and Vilhuber, L. (2018). Disclosure avoidance and confidentiality protection in linked data. U.S. Census Bureau Center for Economic Studies Working Paper CES-WP-18-07.
9 Australian Bureau of Statistics (2015). Media release – ABS response to privacy impact assessment. Australian Bureau of Statistics. http://abs.gov.au/AUSSTATS/abs@.nsf/mediareleasesbyReleaseDate/C9FBD077C2C948AECA257F1E00205BBE?OpenDocument (accessed 05 August 2020).
10 Bender, S. and Heining, J. (2011). The research-data-centre in research-data-centre approach: a first step towards decentralised international data sharing. IASSIST Quarterly/International Association for Social Science Information Service and Technology 35 (3). https://www.iassistquarterly.com/index.php/iassist/article/view/119.
11 Browning, M., Jones, S., and Kuhn, P.J. (1995). Studies of the Interaction of UI and Welfare Using the COEP Dataset. LU2-153/224-1995E, Unemployment Insurance Evaluation Series. Ottawa: Human Resources Development Canada. http://publications.gc.ca/collections/collection_2015/rhdcc-hrsdc/LU2-153-224-1995-eng.pdf.
12 Bruno, G., D’Aurizio, L., and Tartaglia-Polcini, R. (2009). Remote processing of firm microdata at the Bank of Italy. No. 36, Bank of Italy. http://dx.doi.org/10.2139/ssrn.1396224 (accessed 05 August 2020).
13 Bruno, G., D’Aurizio, L., and Tartaglia-Polcini, R. (2014). Remote processing of business microdata at the Bank of Italy. In: Statistical Methods and Applications from a Historical Perspective, Studies in Theoretical and Applied Statistics (eds. F. Crescenzi and S. Mignani), 239–249. Springer International Publishing. http://link.springer.com/chapter/10.1007/978-3-319-05552-7_21.
14 Center for Economic Studies (2016). LODES Version 7. OTM20160223. U.S. Census Bureau. http://lehd.ces.census.gov/doc/help/onthemap/OnTheMapDataOverview.pdf (accessed 05 August 2020).