BALANCING DATA CONFIDENTIALITY AND DATA QUALITY
FEBRUARY 23 - 24, 2010
Presented at the Metro Center Marriott, Washington DC
LAWRENCE H. COX
COURSE OBJECTIVES
Ethical survey practice demands that confidential data pertaining to individual persons or entities not be revealed through released data products. Ethical concerns are often reinforced by legislation or regulation, such as the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) and the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Confidentiality concerns have been addressed by researchers and government statisticians over several decades, resulting in a suite of increasingly sophisticated and effective methods for statistical disclosure limitation (SDL), several of which have been implemented in software and incorporated in the survey practices of government statistical agencies in the U.S. and abroad. Until very recently, however, the effects of disclosure limitation methods on data quality, completeness and usability have been largely ignored. The interplay between data confidentiality methods and data quality and usability is the unifying subject of this course.
This course has three objectives: (1) to familiarize the student with statistical disclosure limitation and SDL methods; (2) to examine potential effects of SDL methods on data completeness, quality and usability; and, (3) to present SDL methods that, in addition to protecting confidentiality effectively, limit abbreviation or deterioration in the usability, quality and completeness of the released data product. Practical data quality questions include: What effect does the SDL method have on key statistics? What effect does the SDL method have on the distribution of the original data? How easy are disclosure-limited data to analyze compared to original data? Is analysis based on disclosure-limited data an acceptable substitute for analysis based on original data?
This course will cover the following topics: reasons for confidentiality protection; legal and regulatory requirements, including CIPSEA and HIPAA; legal and administrative solutions for restricting unauthorized access to confidential data; survey methods for restricting released data and for quantifying and limiting disclosure in tabulations, microdata and public use statistical data base query systems; using research data centers and controlled remote access to increase authorized access to confidential data; and, balancing the confidentiality protection provided by SDL methods with their effects on the usability, quality and completeness of released data products. Emphasis will be placed on recognizing disclosure and evaluating the effectiveness of disclosure limitation strategies and their effects on data quality by means of lecture, discussion and simple numeric examples. Classroom notes, mathematical preliminaries, URLs, and references on disclosure limitation will be provided. The course is organized around types of data release—tabulations, microdata, data base query systems—but much of the material, particularly from the first day, is of general relevance.
WHO SHOULD ATTEND
Most survey data are collected under some assurances of confidentiality protection, yet few survey practitioners, managers, and data users are familiar with disclosure limitation methods and their effects on the quality and ease of use of disclosure-limited data products. Individuals in Federal and State governments, business, universities, and non-profit organizations involved with survey methodology or the preparation or use of survey data will benefit from this course. Statisticians concerned with assessing and preserving data quality will find the course especially useful. Those involved with the introduction of new or redesigned surveys and with CIPSEA or HIPAA will find the material important and timely. The course will address methodological, practical and policy aspects of the problem. Background in elementary statistics and linear algebra is desirable. Key points will be introduced and reinforced by means of simple numeric examples. In-class discussion of the literature will prepare interested students for further study and application of SDL methods. Rapidly developing areas such as statistical data base query systems, quality-preserving SDL, and secure distributed statistical analysis will be highlighted.
THE INSTRUCTOR
Larry Cox has extensive experience in statistical disclosure and the development and implementation of disclosure limitation methods. He has published numerous papers, delivered many lectures, organized conferences and meetings, and taught several courses in the U.S. and abroad on privacy, confidentiality and statistical disclosure limitation. His research on SDL methods has led to adoption and automation of several of these methods by international statistical organizations for large-scale use. His recent research on quality-preserving SDL methods is at the forefront of this topic. Dr. Cox's professional experience includes consulting, teaching, and research. He has served as senior research statistician for three government agencies and as Director of the Board on Mathematical Sciences, National Academy of Sciences. He is an elected member of the International Statistical Institute and a Fellow of the American Statistical Association. He has served as Chair of the ASA Committee on Privacy and Confidentiality and on the ASA Board of Directors and the ISI Council.
COURSE MATERIALS
Registrants will be provided with a course lecture notebook.
MEALS
JPSM group continental breakfasts, lunches and refreshments are included in the course fee.
DAILY CHECK-IN
Course registrants should check in with JPSM Onsite each day of the course.
TENTATIVE SCHEDULE
| TUESDAY: FEBRUARY 23, 2010 | |
| 8:00 - 9:00 | Registrant Check-In and Continental Breakfast |
| 9:00 - 10:15 | What is Statistical Disclosure? |
| Ethical, legal and statistical considerations | |
| Examples of agency policies and practices | |
| Data confidentiality under CIPSEA and HIPAA | |
| Balancing the right to privacy with the need to know | |
| Administrative solutions | |
| Defining statistical disclosure quantitatively | |
| Geography, small domain data and disclosure | |
| 10:15 - 10:30 | Morning Break |
| 10:30 - 12:00 | Statistical Disclosure Limitation (SDL) for Frequency Count Data |
| Examining and defining the problem | |
| Rounding and perturbation methods and their effects on data quality | |
| Swapping and switching methods and their effects on data quality | |
| 12:00 - 1:15 | Lunch Break |
| 1:00 - 2:45 | SDL for Aggregate Magnitude Data |
| Mathematical preliminaries | |
| Quantifying disclosure: Statistical disclosure rules | |
| Cell bounds and disclosure audit | |
| Complementary cell suppression | |
| Mathematical statement of the cell suppression problem | |
| Why cell suppression is a very difficult problem | |
| Using mathematical networks for complementary cell suppression | |
| Quality effects of cell suppression | |
| Releasing interval data | |
| 2:45 - 3:00 | Afternoon Break |
| 3:00 - 4:30 | SDL for Aggregate Magnitude Data (cont.) |
| Controlled tabular adjustment (CTA) | |
| The CTA method | |
| Quality-preserving controlled tabular adjustment (QP-CTA) | |
| Minimum discrimination information controlled tabular adjustment (MDI-CTA) | |
| Perturbing the underlying microdata | |
| 4:30 | Adjourn |
| WEDNESDAY: FEBRUARY 24, 2010 | |
| 8:00 - 8:30 | Registrant Check-In and Continental Breakfast |
| 8:30 - 10:00 | SDL in Microdata |
| Defining microdata disclosure | |
| Likelihood of disclosure and risk of disclosure | |
| Censoring, Rounding, Perturbation | |
| Microaggregation and its effects on data quality | |
| Blank and impute | |
| Synthetic microdata and its effects on data quality | |
| Contextual variables | |
| Research data centers and remote access | |
| 10:00 - 10:15 | Morning Break |
| 10:15 - 11:45 | SDL in Microdata (continued) |
| Small domain data | |
| Effectiveness of SDL methods for microdata | |
| Disclosure risk analysis | |
| Defining disclosure and disclosure risk | |
| Secure multi-party regression | |
| 11:45 - 1:00 | Lunch Break |
| 1:00 - 2:15 | SDL in Statistical Data Bases |
| Statistical data base query systems as multi-dimensional tables | |
| Estimating confidential and missing data | |
| Releasing marginal totals or log-linear models and effects on data quality | |
| Secure distributed statistical analysis | |
| 2:15 - 2:30 | Afternoon Break |
| 2:30 - 4:00 | Wrap-Up and Discussion |
| Brief discussion of the literature | |
| Questions and comments | |
| Final remarks and discussion | |
| 4:00 | Adjourn |
FEES
The course fee is $600 for staff at sponsoring agencies and affiliates, $600 for full-time university students, and $810 for other participants. JPSM Sponsor Affiliate List: http://projects.isr.umich.edu/jpsm/info.cfm#sponsors.
REGISTRATION
Online registration is required. JPSM Short Courses: www.jpsm.org/shortcourses. Confirmation of acceptance will be sent after the registration form has been processed. Registration is not firm until you receive an acceptance email. The email will include directions to the course. The automatic web registration number is not an acceptance letter. The registration deadline is February 9, 2010.
PAYMENT
Payment by credit card is required. Post-registration payment may be made by calling (301) 314-7911, or by faxing the payment form to (301) 314-7912. Please note the confirmation number when it is received. Payment is required by February 9, 2010.
CANCELLATION
Please notify JPSM as soon as possible if you need to cancel your registration. Cancellation requests should be submitted online. If you cancel by February 9, 2010, the full course fee will be refunded. Cancellation during the period February 10-15, 2010 will incur a $100 administrative charge; the remainder of the course fee will be reimbursed. Cancellation on or after February 16, 2010 is subject to the full fee amount.
FELLOWSHIPS
The Joint Program in Survey Methodology strives to increase the number of survey professionals from groups traditionally under-represented in the field. As part of this effort, a limited number of competitive fellowships are available to African-Americans, Latinos, Hispanic Americans, and Native American Indians for the short course. The registrant must be a US citizen or permanent resident.
Fellowship applicants should submit:
1. The online registration form
2. A 500-word essay describing their reasons for wanting to attend this short course and how their participation will enhance their chosen career path. The essay should indicate the applicant’s background (i.e. African-American, Latino, Hispanic American, or Native American Indian) and why financial support is needed.
3. A letter of recommendation written by a person knowledgeable about their aptitude and interest in survey methodology.
The online registration, essay, and letter of recommendation are due January 12, 2010. JPSM will evaluate the applications and inform the successful applicants by January 26, 2010. The fellowship covers the registration fee for the course including the materials to be distributed during the course and the JPSM group continental breakfast, luncheon and refreshments. Essays and recommendations may be either faxed to (301) 314-7911 or mailed to JPSM Short Course, University of Maryland, 1218 Lefrak Hall, College Park, MD 20742.
JPSM CITATION PROGRAM
The citation program is built around the JPSM two-day short courses. The program is designed to provide the working professional, or student, with state-of-the-art knowledge and current principles and practices of complex surveys, and to provide practical skills of day-to-day utility. Completion of the citation involves taking a semester-length JPSM credit-bearing course "Fundamentals in Survey Methodology" and eight JPSM short courses, of which four must be from the core courses. For information and application materials visit the website http://www.jpsm.umd.edu/certcitat.htm or call 301-314-7911.
INQUIRIES
Questions for this course should be directed to the JPSM Short Course, University of Maryland, 1218 Lefrak Hall, College Park, MD 20742, Phone: (301) 314-7911, Fax: (301) 314-7912, Email: course@survey.umd.edu
Short Courses: www.jpsm.org/shortcourses
Sponsor Affiliate List: projects.isr.umich.edu/jpsm/info.cfm#sponsors
JPSM Home Page: www.jpsm.org
Tax Identification Number (University of Maryland): 55-6002003
Primary Funding for JPSM is from the Interagency Council on Statistical Policy