A short course sponsored by the Joint Program in Survey Methodology
BALANCING DATA CONFIDENTIALITY AND DATA QUALITY
SEPTEMBER 11-12, 2012
Presented at the Marriott Inn & Conference Center
3501 University Blvd, East, Hyattsville, Maryland 20783 USA
LAWRENCE H. COX
COURSE OBJECTIVES
Tabular and other summary data are built from data pertaining to individual entities (persons, households, businesses, organizations or groups). Microdata are unit-record data containing multiple item responses pertaining to individual entities. On-line data base query systems are now a reality. The need for data products that combine information across data bases and organizations is increasing. Overwhelmingly, the data from which these data products are built is reported at the individual entity level and is confidential.
Ethical survey practice demands that confidential data pertaining to individual persons or entities not be revealed through data products. Ethical concerns are often reinforced by legislation or regulation, such as the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) and the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Confidentiality concerns have been addressed by researchers and government statisticians over several decades, resulting in a suite of increasingly sophisticated and effective methods for statistical disclosure limitation (SDL), several of which have been implemented in software and incorporated in the survey practices of government statistical offices in the U.S. and abroad. Until very recently, however, the effects of disclosure limitation methods on data quality, completeness and usability have been largely ignored. The interplay between data confidentiality methods and data quality and usability is the unifying subject of this course.
This course has three objectives: (1) to familiarize the student with statistical disclosure limitation and SDL methods; (2) to examine potential effects of SDL methods on data completeness, quality and usability; and (3) to present SDL methods that, in addition to protecting confidentiality effectively, limit abbreviation or deterioration in the usability, quality and completeness of the released data product. Practical data quality questions include: What effect does the SDL method have on key statistics? What effect does the SDL method have on the distribution of the original data? How easy is it to analyze disclosure-limited data compared to original data? Are analytical outputs based on disclosure-limited data acceptable substitutes for those based on original data? How transparent should SDL methods be: how much can, and should, the agency reveal about how SDL is done?
This course will cover the following topics: reasons for confidentiality protection; legal and regulatory requirements, including CIPSEA and HIPAA; legal and administrative solutions for restricting unauthorized access to confidential data; survey methods for restricting released data and for quantifying and limiting disclosure in tabulations, microdata and on-line data base query systems; using research data centers and controlled remote access to increase authorized access to confidential data; and, balancing the confidentiality protection provided by SDL methods with their effects on the usability, quality and completeness of released data products. Emphasis will be placed on recognizing disclosure and evaluating the effectiveness of disclosure limitation strategies and their effects on data quality by means of lecture, discussion and simple numeric examples. Classroom notes, mathematical preliminaries, URLs, and references on disclosure limitation will be provided. The course is organized around three forms of data release—tabulations, microdata, data base query systems—but much of the material, particularly from the first day, is of general relevance.
THE INSTRUCTOR
Larry Cox has extensive experience in statistical disclosure and the development and implementation of disclosure limitation methods. He has published numerous papers, delivered many lectures, organized conferences and meetings, and taught several courses in the U.S. and abroad on privacy, confidentiality and statistical disclosure limitation. His research on SDL methods has led to adoption and automation of several of these methods by international statistical organizations for large-scale use. His research on quality-preserving SDL methods is at the forefront of this topic. He is currently assessing the state of SDL practice and writing a book on SDL. Dr. Cox's professional experience includes consulting, teaching, and research. He has served as senior research statistician for three government agencies and as Director of the Board on Mathematical Sciences, National Academy of Sciences. He is an elected member of the International Statistical Institute and a Fellow of the American Statistical Association. He has served as Chair of the ASA Committee on Privacy and Confidentiality and two ASA Sections, and on the ASA Board of Directors. He currently serves on the ISI Council.
WHO SHOULD ATTEND
Most survey data are collected under some assurances of confidentiality protection, yet few survey practitioners, managers, and data users are familiar with disclosure limitation methods and their effects on the quality and ease of use of disclosure-limited data products. A glance at statistical data released on the web reveals uneven and often substandard SDL practice by a variety of organizations. Individuals in Federal and State governments, business, universities, and non-profit organizations involved with survey methodology or the preparation or use of survey data will benefit from this course, as will their parent organizations. Statisticians concerned with assessing and preserving data quality will find the course especially useful. Those involved with the introduction of new or redesigned surveys and with CIPSEA or HIPAA will find the material important and timely. The course will address methodological, practical and policy aspects of the problem. Background in elementary statistics and linear algebra is desirable. Key points will be introduced and reinforced by means of simple numeric examples. In-class discussion of the literature will prepare interested students for further study and application of SDL methods. Rapidly developing areas such as statistical data base query systems, quality-preserving SDL, and secure distributed statistical analysis will be highlighted.
TENTATIVE SCHEDULE
Day 1
08:00-09:00 Check-In
09:00-10:15 What is Statistical Disclosure? Ethical, legal and statistical considerations, Examples of agency policies and practices, Data confidentiality under CIPSEA and HIPAA, Balancing the right to privacy with the need to know, Administrative solutions, Defining statistical disclosure quantitatively, Disclosure checklists, Geography, small domain data, and disclosure, Transparency of disclosure limitation procedures
10:15-10:30 Morning Break
10:30-12:00 Statistical Disclosure Limitation (SDL) for Frequency Count Data, Examining and defining the problem, Rounding and perturbation methods and their effects on data quality, Swapping and switching methods and their effects on data quality
12:00-01:15 Lunch
01:15-02:45 SDL for Aggregate Magnitude Data, Mathematical preliminaries, Quantifying disclosure: Statistical disclosure rules, Cell bounds and disclosure audit, Complementary cell suppression, Mathematical model for the cell suppression problem, Why cell suppression is a very difficult problem, Using mathematical networks for complementary cell suppression, Quality effects of cell suppression, Releasing interval data, Vulnerabilities of complementary cell suppression
02:45-03:00 Afternoon Break
03:00-04:30 SDL for Aggregate Magnitude Data (continued), Controlled tabular adjustment (CTA), The CTA method, Quality-preserving controlled tabular adjustment (QP-CTA), Minimum discrimination information controlled tabular adjustment (MDI-CTA), Perturbing the underlying microdata
Day 2
08:30-10:00 SDL in Microdata, Defining microdata disclosure, Likelihood of disclosure and risk of disclosure, Censoring, rounding and perturbation, Microaggregation and its effects on data quality, Blank and impute, Synthetic microdata and its effects on data quality, Contextual variables, Research data centers and remote access
10:00-10:15 Morning Break
10:15-11:45 SDL in Microdata (continued), Small domain data, Effectiveness of SDL methods for microdata, Disclosure risk analysis, Defining disclosure and disclosure risk, Secure multi-party regression
11:45-01:00 Lunch
01:00-02:15 New and emerging areas, SDL for on-line data base query systems, Statistical data base query systems as multi-dimensional tables, Estimating confidential and missing data, Releasing marginal totals or log-linear models and effects on data quality, Secure distributed statistical analysis
02:15-02:30 Afternoon Break
02:30-04:00 Wrap-Up and Discussion, Brief discussion of the literature, Questions and comments, Final remarks and discussion
COURSE MATERIALS
Registrants will be provided with a course lecture notebook.
MEALS
JPSM group continental breakfasts, lunches and refreshments are included in the course fee.
FEES
The course fee is $600 for staff at sponsoring agencies and affiliates, $600 for full-time university students, and $810 for other participants. JPSM Sponsor and Affiliate List: http://projects.isr.umich.edu/jpsm/info.cfm#sponsors.
REGISTRATION
Online registration is required. Short Courses: www.jpsm.org/shortcourses. Confirmation of acceptance will be sent after the registration form has been processed. The automatic web registration number is not an acceptance letter. Students are responsible for keeping track of their registration and course dates. Fees and awards are not transferable due to nonattendance. Contact JPSM if you have any questions concerning the status of your registration. The registration deadline is August 28, 2012.
PAYMENT
Payment by credit card is required. Post-registration payment may be made by faxing the payment form (https://projects.isr.umich.edu/jpsm/docs/FAX_Payment_UMICH.pdf) to (301) 314-7912. Please note the confirmation number when it is received. Payment is required by August 28, 2012.
CANCELLATION
Please notify JPSM as soon as possible if you need to cancel your registration. Cancellation requests should be submitted online. If you cancel by August 28, 2012, the full course fee will be refunded. Cancellation during the period of August 29, 2012 to September 5, 2012 will be subject to a $100 administrative charge; the remainder of the course fee will be reimbursed. In the event of cancellation on or after September 6, 2012, the full course fee will be forfeit.
FELLOWSHIPS
The Joint Program in Survey Methodology strives to increase the number of survey professionals from groups traditionally under-represented in the field. As part of this effort, a limited number of competitive fellowships are available to African-Americans, Latinos, Hispanic Americans, and Native American Indians for the short course. The registrant must be a US citizen or permanent resident.
Applicants should submit:
1. The online registration.
2. A 500-word essay that indicates their ethnic background and describes their reasons for wanting to attend this short course and how their participation will enhance their chosen career path.
3. A letter of recommendation written by a person knowledgeable about their aptitude and interest in survey methodology.
The online registration, essay, and letter of recommendation are due August 14, 2012. JPSM will evaluate the applications and inform the successful applicants by August 28, 2012. The fellowship covers the registration fee for the course including the materials to be distributed during the course and the JPSM group continental breakfast, luncheon and refreshments. Essays and recommendations may be either faxed to (301) 314-7911 or mailed to JPSM Short Course, University of Maryland, 1218 Lefrak Hall, College Park, MD 20742.
JPSM CITATION PROGRAMS
The citation programs are built around the JPSM short courses. The JPSM Citation in Introductory Survey Methodology is designed to provide the working professional or interested student with state-of-the-art knowledge about current principles and practices for conducting complex surveys combined with practical skills of day-to-day utility. The JPSM Citation in Introductory Economic Measurement is designed for professional staff who need to know the principles and practices of economic measurement. Completion of either citation program involves taking a semester-length JPSM credit-bearing course and eight JPSM short courses, of which four are specified core courses. For information on the Certificate and Citation Programs visit the website at http://www.jpsm.org or call 301-314-7911.
INQUIRIES
Questions for this course should be directed to the JPSM Short Course, University of Maryland, 1218 Lefrak Hall, College Park, MD 20742, Phone: (301) 314-7911, Fax: (301) 314-7912, Email: course@survey.umd.edu
JPSM HOME PAGE: www.jpsm.org
JPSM SHORT COURSES: www.jpsm.org/shortcourses
SPONSOR AFFILIATE LIST: projects.isr.umich.edu/jpsm/info.cfm#sponsors
TAX IDENTIFICATION University of Maryland: 55-6002003
DUNS NUMBER University of Maryland: 808124564