BALANCING DATA CONFIDENTIALITY AND DATA QUALITY     
A two-day short course sponsored by the Joint Program in Survey Methodology 
 

FEBRUARY 23 - 24, 2010
Presented at the Metro Center Marriott, Washington DC

LAWRENCE H. COX

COURSE OBJECTIVES
Ethical survey practice demands that confidential data pertaining to individual persons or entities not be revealed through released data products. Ethical concerns are often reinforced by legislation or regulation, such as the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) and the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Confidentiality concerns have been addressed by researchers and government statisticians over several decades, resulting in a suite of increasingly sophisticated and effective methods for statistical disclosure limitation (SDL), several of which have been implemented in software and incorporated in the survey practices of government statistical agencies in the U.S. and abroad.  Until very recently, however, the effects of disclosure limitation methods on data quality, completeness and usability have been largely ignored. The interplay between data confidentiality methods and data quality and usability is the unifying subject of this course.

This course has three objectives: (1) to familiarize the student with statistical disclosure limitation and SDL methods; (2) to examine potential effects of SDL methods on data completeness, quality and usability; and (3) to present SDL methods that, in addition to protecting confidentiality effectively, limit abbreviation or deterioration in the usability, quality and completeness of the released data product. Practical data quality questions include: What effect does the SDL method have on key statistics? What effect does the SDL method have on the distribution of the original data? How easy are disclosure-limited data to analyze compared to original data? Is analysis based on disclosure-limited data an acceptable substitute for analysis based on original data?

This course will cover the following topics: reasons for confidentiality protection; legal and regulatory requirements, including CIPSEA and HIPAA; legal and administrative solutions for restricting unauthorized access to confidential data; survey methods for restricting released data and for quantifying and limiting disclosure in tabulations, microdata and public use statistical data base query systems; using research data centers and controlled remote access to increase authorized access to confidential data; and balancing the confidentiality protection provided by SDL methods with their effects on the usability, quality and completeness of released data products. Emphasis will be placed on recognizing disclosure and evaluating the effectiveness of disclosure limitation strategies and their effects on data quality by means of lecture, discussion and simple numeric examples. Classroom notes, mathematical preliminaries, URLs, and references on disclosure limitation will be provided. The course is organized around types of data release (tabulations, microdata, data base query systems), but much of the material, particularly from the first day, is of general relevance.

WHO SHOULD ATTEND
Most survey data are collected under some assurances of confidentiality protection, yet few survey practitioners, managers, and data users are familiar with disclosure limitation methods and their effects on the quality and ease of use of disclosure-limited data products. Individuals in Federal and State governments, business, universities, and non-profit organizations involved with survey methodology or the preparation or use of survey data will benefit from this course. Statisticians concerned with assessing and preserving data quality will find the course especially useful. Those involved with the introduction of new or redesigned surveys and with CIPSEA or HIPAA will find the material important and timely. The course will address methodological, practical and policy aspects of the problem. Background in elementary statistics and linear algebra is desirable. Key points will be introduced and reinforced by means of simple numeric examples. In-class discussion of the literature will prepare interested students for further study and application of SDL methods.  Rapidly developing areas such as statistical data base query systems, quality-preserving SDL, and secure distributed statistical analysis will be highlighted.

THE INSTRUCTOR
Larry Cox has extensive experience in statistical disclosure and the development and implementation of disclosure limitation methods. He has published numerous papers, delivered many lectures, organized conferences and meetings, and taught several courses in the U.S. and abroad on privacy, confidentiality and statistical disclosure limitation. His research on SDL methods has led to adoption and automation of several of these methods by international statistical organizations for large-scale use. His recent research on quality-preserving SDL methods is at the forefront of this topic. Dr. Cox's professional experience includes consulting, teaching, and research. He has served as senior research statistician for three government agencies and as Director of the Board on Mathematical Sciences, National Academy of Sciences. He is an elected member of the International Statistical Institute and a Fellow of the American Statistical Association. He has served as Chair of the ASA Committee on Privacy and Confidentiality and on the ASA Board of Directors and the ISI Council.

COURSE MATERIALS
Registrants will be provided with a course lecture notebook.

MEALS
JPSM group continental breakfasts, lunches and refreshments are included in the course fee.

DAILY CHECK-IN
Course registrants should check in with JPSM Onsite each day of the course.

TENTATIVE SCHEDULE

TUESDAY: FEBRUARY 23, 2010
 8:00 - 9:00 Registrant Check-In and Continental Breakfast
 9:00 - 10:15 What is Statistical Disclosure?
  Ethical, legal and statistical considerations
    Examples of agency policies and practices
    Data confidentiality under CIPSEA and HIPAA
  Balancing the right to privacy with the need to know
    Administrative solutions
    Defining statistical disclosure quantitatively
  Geography, small domain data and disclosure
 10:15 - 10:30 Morning Break
 10:30 - 12:00 Statistical Disclosure Limitation (SDL) for Frequency Count Data
  Examining and defining the problem
  Rounding and perturbation methods and their effects on data quality
  Swapping and switching methods and their effects on data quality
 12:00 - 1:15 Lunch Break
 1:15 - 2:45 SDL for Aggregate Magnitude Data
  Mathematical preliminaries
  Quantifying disclosure: Statistical disclosure rules
  Cell bounds and disclosure audit
  Complementary cell suppression
    Mathematical statement of the cell suppression problem
    Why cell suppression is a very difficult problem
    Using mathematical networks for complementary cell suppression
    Quality effects of cell suppression
    Releasing interval data
 2:45 - 3:00 Afternoon Break
 3:00 - 4:30 SDL for Aggregate Magnitude Data (cont.)
  Controlled tabular adjustment (CTA)
    The CTA method
    Quality-preserving controlled tabular adjustment (QP-CTA)
    Minimum discrimination information controlled tabular adjustment (MDI-CTA)
  Perturbing the underlying microdata
 4:30 Adjourn

WEDNESDAY: FEBRUARY 24, 2010
 8:00 - 8:30 Registrant Check-In and Continental Breakfast
 8:30 - 10:00 SDL in Microdata
  Defining microdata disclosure
  Likelihood of disclosure and risk of disclosure
  Censoring, Rounding, Perturbation
  Microaggregation and its effects on data quality
  Blank and impute
  Synthetic microdata and its effects on data quality
  Contextual variables
  Research data centers and remote access
 10:00 - 10:15 Morning Break
 10:15 - 11:45 SDL in Microdata (continued)
  Small domain data
  Effectiveness of SDL methods for microdata
  Disclosure risk analysis
  Defining disclosure and disclosure risk
  Secure multi-party regression
 11:45 - 1:00 Lunch Break
 1:00 - 2:15 SDL in Statistical Data Bases
  Statistical data base query systems as multi-dimensional tables
  Estimating confidential and missing data
  Releasing marginal totals or log-linear models and effects on data quality
  Secure distributed statistical analysis
 2:15 - 2:30 Afternoon Break
 2:30 - 4:00 Wrap-Up and Discussion
  Brief discussion of the literature
  Questions and comments
  Final remarks and discussion
 4:00 Adjourn

FEES
The course fee is $600 for staff at sponsoring agencies and affiliates, $600 for full-time university students, and $810 for other participants. JPSM Sponsor Affiliate List: http://projects.isr.umich.edu/jpsm/info.cfm#sponsors.

REGISTRATION
Online registration is required. JPSM Short Courses: www.jpsm.org/shortcourses. Confirmation of acceptance will be sent after the registration form has been processed. Registration is not firm until you receive an acceptance email. The email will include directions to the course. The automatic web registration number is not an acceptance letter. The registration deadline is February 9, 2010.

PAYMENT
Payment by credit card is required. Post-registration payment may be made by calling (301) 314-7911 or by faxing the payment form to (301) 314-7912. Please note the confirmation number when it is received. Payment is required by February 9, 2010.

CANCELLATION
Please notify JPSM as soon as possible if you need to cancel your registration. Cancellation requests should be submitted online. If you cancel by February 9, 2010, the full course fee will be refunded. Cancellations during the period February 10-15, 2010 will incur a $100 administrative charge; the remainder of the course fee will be refunded. Cancellations on or after February 16, 2010 are subject to the full fee amount.

FELLOWSHIPS
The Joint Program in Survey Methodology strives to increase the number of survey professionals from groups traditionally under-represented in the field. As part of this effort, a limited number of competitive fellowships are available to African-Americans, Latinos, Hispanic Americans, and Native American Indians for the short course. The registrant must be a US citizen or permanent resident.

Fellowship applicants should submit:

1. The online registration form
 
2. A 500-word essay describing their reasons for wanting to attend this short course and how their participation will enhance their chosen career path. The essay should indicate the applicant’s background (i.e. African-American, Latino, Hispanic American, or Native American Indian) and why financial support is needed.
 
3. A letter of recommendation written by a person knowledgeable about their aptitude and interest in survey methodology.

The online registration, essay, and letter of recommendation are due January 12, 2010. JPSM will evaluate the applications and inform the successful applicants by January 26, 2010. The fellowship covers the registration fee for the course, including the materials to be distributed during the course and the JPSM group continental breakfast, luncheon and refreshments. Essays and recommendations may be either faxed to (301) 314-7912 or mailed to JPSM Short Course, University of Maryland, 1218 Lefrak Hall, College Park, MD 20742.

JPSM CITATION PROGRAM
The citation program is built around the JPSM two-day short courses. The program is designed to provide the working professional, or student, with state-of-the-art knowledge and current principles and practices of complex surveys, and to provide practical skills of day-to-day utility. Completion of the citation involves taking a semester-length JPSM credit-bearing course, "Fundamentals in Survey Methodology," and eight JPSM short courses, of which four must be from the core courses. For information and application materials, visit the website http://www.jpsm.umd.edu/certcitat.htm or call 301-314-7911.

INQUIRIES
Questions about this course should be directed to the JPSM Short Course, University of Maryland, 1218 Lefrak Hall, College Park, MD 20742, Phone: (301) 314-7911, Fax: (301) 314-7912, Email: course@survey.umd.edu

Short Courses: www.jpsm.org/shortcourses
Sponsor Affiliate List: projects.isr.umich.edu/jpsm/info.cfm#sponsors
JPSM Home Page: www.jpsm.org
Tax Identification Number (University of Maryland): 55-6002003
 
Primary Funding for JPSM is from the Interagency Council on Statistical Policy