If multiple SDA datasets are named in the 'SDADATA=' specifications in the HARC file, only one of them (usually the main dataset) can have a 'disclosure.txt' file. The other SDA dataset(s) (usually created to hold recoded and computed variables) must have a file named 'disc-id.txt' in their STUDYINF directory. This 'disc-id.txt' file should contain a single ID keyword with the format 'ID=abc', where 'abc' is the same ID or name used for this study in the 'disclosure.txt' file in the main SDA dataset, as described below.
This document describes the possible disclosure rules that may be specified. Note that the 'disclosure.txt' file can also contain additional specifications that suppress results from TABLES and MEANS considered too imprecise to display. Those extra parameters are discussed in a separate document on precision.
The 'disc-id.txt' file contains only the 'ID=' keyword, and the specified ID must match the ID in the 'disclosure.txt' file of the other (main) SDA dataset listed for this study in the HARC file.
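The ID-matching rule between 'disc-id.txt' and the main 'disclosure.txt' can be sketched in Python. This is a minimal illustration, not SDA's own code: the function names (find_id, ids_match, check_studyinf_dirs) are hypothetical, and the sketch assumes one 'KEYWORD = value' per line, full-line '#' comments, case-insensitive keywords, and an exact, case-sensitive comparison of the two IDs.

```python
import re
from pathlib import Path
from typing import Optional

# The keyword table calls for one word of letters or numbers (ASCII assumed).
_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def find_id(text: str) -> Optional[str]:
    """Return the value of the first ID keyword in the text, or None."""
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: '#' starts a full-line comment.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Assumption: keywords are case-insensitive; spaces around '=' are allowed.
        if sep and key.strip().upper() == "ID":
            return value.strip()
    return None


def ids_match(disclosure_text: str, disc_id_text: str) -> bool:
    """True if disc-id.txt names the same valid ID as the main disclosure.txt."""
    main_id = find_id(disclosure_text)
    if main_id is None or not _ID_PATTERN.fullmatch(main_id):
        return False
    # Assumption: the comparison is exact (case-sensitive).
    return find_id(disc_id_text) == main_id


def check_studyinf_dirs(main_studyinf: Path, other_studyinf: Path) -> bool:
    """Compare the IDs in the two STUDYINF directories (hypothetical helper)."""
    return ids_match(
        (main_studyinf / "disclosure.txt").read_text(),
        (other_studyinf / "disc-id.txt").read_text(),
    )
```

For example, ids_match("ID = survey25\n", "ID=survey25\n") is true, while a 'disc-id.txt' naming a different study ID is rejected.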
The valid keywords are as follows:
Keyword         Possible Specification              Default (if no keyword)
________________________________________________________________________________

DISCLOSURE ID FOR THE STUDY

ID=             a unique identifier for the         REQUIRED
                dataset with disclosure rules
                (one word, only letters or
                numbers)

PREVENT AN ANALYSIS FROM BEING RUN

VAREXCLUDE=     name(s) of variables that           All variables allowed
                cannot be used in analysis
COMBEXCLUDE=    pairs of variables that cannot      All combinations allowed
                be used together in the same
                analysis run and cannot be used
                at all to recode or compute new
                variables (see notes below)
MAXFILTERS=     maximum number of selection         Any number of filters OK
                filter variables that can be
                used in a single run
CONTROLVAR=     no, if control variables            A control variable is OK
                cannot be used in tables
LISTCASE=       no, if the 'listcase' program       Listcase run is OK
                is not allowed to run
SUBSET=         no, if the 'subset' program         Subset run is OK
                is not allowed to run

SUPPRESS THE OUTPUT AFTER RUNNING AN ANALYSIS

MINCELLN=       minimum number of cases in a        No required minimum cell N
                table cell to allow a table to
                be displayed (see notes below)
MINCELLWN=      minimum number of WEIGHTED          No required weighted minimum
                cases in a table cell to allow      cell N
                a table to be displayed
AVGCELLMIN=     minimum average cell size to        No required average cell N
                allow a table to be displayed
                (checks both the mean and the
                median cell size, excluding
                cells with no cases)
AVGCELLWMIN=    minimum WEIGHTED average cell       No required weighted average
                size                                cell N
MINCASEBYIVAR=  for regressions, minimum ratio      No limit on the number of
                of valid observations to the        independent vars
                number of independent vars
MONITORVAR=     varname(s), each optionally         No special monitored vars
                followed by (min_values)
                (see notes below)

SUPPRESS UNWEIGHTED NUMBER OF CASES IN OUTPUT

UNWEIGHTEDN=    no                                  Show unweighted N's
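The output-suppression keywords in the table above can be illustrated with a small Python sketch. The function names and the per-cell count inputs are hypothetical. The mean-and-median rule and the exclusion of empty cells come from the AVGCELLMIN entry; applying the same exclusion to MINCELLN and MINCELLWN, and the mean-and-median rule to AVGCELLWMIN, are assumptions.

```python
from collections.abc import Sequence
from statistics import mean, median
from typing import Optional


def table_suppressed(
    cell_n: Sequence[int],
    cell_wn: Sequence[float],
    mincelln: Optional[int] = None,
    mincellwn: Optional[float] = None,
    avgcellmin: Optional[float] = None,
    avgcellwmin: Optional[float] = None,
) -> list[str]:
    """Return the keywords whose rule would suppress this table.

    cell_n and cell_wn hold the unweighted and weighted N of each cell, in the
    same order. A value of None means the keyword is absent (no requirement).
    """
    # Cells with no cases are excluded, as the AVGCELLMIN entry specifies.
    # Assumption: the same exclusion applies to the other three checks.
    occupied = [(n, wn) for n, wn in zip(cell_n, cell_wn) if n > 0]
    if not occupied:
        return []  # no cases at all, so there is nothing to check
    ns = [n for n, _ in occupied]
    wns = [wn for _, wn in occupied]

    failed = []
    if mincelln is not None and min(ns) < mincelln:
        failed.append("MINCELLN")
    if mincellwn is not None and min(wns) < mincellwn:
        failed.append("MINCELLWN")
    # Both the mean and the median cell size must reach the minimum.
    if avgcellmin is not None and min(mean(ns), median(ns)) < avgcellmin:
        failed.append("AVGCELLMIN")
    # Assumption: the weighted check also uses both the mean and the median.
    if avgcellwmin is not None and min(mean(wns), median(wns)) < avgcellwmin:
        failed.append("AVGCELLWMIN")
    return failed


def regression_suppressed(
    n_valid: int, n_ivars: int, mincasebyivar: Optional[float] = None
) -> bool:
    """True if valid observations per independent variable fall below MINCASEBYIVAR."""
    if mincasebyivar is None or n_ivars == 0:
        return False
    return n_valid / n_ivars < mincasebyivar
```

Checking the median as well as the mean matters when a few large cells hide many small ones: cell sizes of 2, 2, and 40 have a mean above 10 but a median of 2, so an AVGCELLMIN of 10 would suppress that table.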
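The MONITORVAR keyword prevents results from being displayed when the cases they are based on have too few distinct values on a sensitive variable. For example, you may not want to release analysis results based on cases that all come from the same institution (such as the same prison). Assuming that there is a variable named 'prison', you could specify it as a monitored variable.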
By default the cases must come from at least two distinct categories of the monitored variable(s). However, you can specify a higher required number of categories by giving the desired number of categories in parentheses after the variable name. See the example below.
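One possible way to read a MONITORVAR specification such as 'INSTITUTION CBSA(3)' and apply it to the cases behind a result is sketched below. The function names are hypothetical; the sketch assumes entries separated by spaces or commas, uses None to represent a missing value, and counts distinct non-missing values for each monitored variable.

```python
import re
from collections.abc import Iterable, Mapping

# Required number of distinct categories when no "(n)" follows a name.
DEFAULT_MIN_CATEGORIES = 2

# Assumption: a name, optionally followed by "(n)"; entries separated by
# spaces and/or commas, as in "INSTITUTION CBSA(3)".
_ENTRY = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*(?:\((\d+)\))?")


def parse_monitorvar(spec: str) -> dict[str, int]:
    """Map each monitored variable to its required number of distinct values."""
    return {
        name: int(count) if count else DEFAULT_MIN_CATEGORIES
        for name, count in _ENTRY.findall(spec)
    }


def monitorvar_violations(
    cases: Iterable[Mapping[str, object]], required: Mapping[str, int]
) -> dict[str, int]:
    """Return the monitored variables that have too few distinct valid values.

    `cases` are the observations a result is based on. None stands for a
    missing value (an assumption about how missing data is represented).
    """
    seen: dict[str, set] = {var: set() for var in required}
    for case in cases:
        for var in required:
            value = case.get(var)
            if value is not None:
                seen[var].add(value)
    return {var: need for var, need in required.items() if len(seen[var]) < need}
```

With 'INSTITUTION CBSA(3)', a result whose cases all come from one institution fails the INSTITUTION check, and one whose cases span only two CBSA values fails the CBSA check.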
The default messages are listed below, each after the keyword that would be used for it in a language file. Note that one or more variable names, or a number, will sometimes be output after the message.
DIS_VAREXCLUDE = To preserve confidentiality, analyses are not permitted using the following variable(s):
DIS_COMBEXCLUDE = To preserve confidentiality, analyses are not permitted using the following combination(s) of variables:
DIS_VAREXCLUDE_RECODE = To preserve confidentiality, RECODE and COMPUTE are not permitted using the following variable(s):
DIS_MAXFILTERS = To preserve confidentiality, the number of filter variables cannot be greater than:
DIS_CONTROLVAR = To preserve confidentiality, tables cannot be run with control variables.
DIS_LISTCASE = To preserve confidentiality, the LISTCASE program cannot be used with this dataset.
DIS_SUBSET = To preserve confidentiality, the SUBSET program cannot be used with this dataset.
DIS_AVGCELLMIN = To preserve confidentiality, tables cannot be displayed unless the average number of observations in each cell is at least:
DIS_AVGCELLWMIN = To preserve confidentiality, tables cannot be displayed unless the average weighted number of observations in each cell is at least:
DIS_MINCELLN = To preserve confidentiality, tables cannot be displayed unless the number of observations in each cell is at least:
DIS_MINCELLWN = To preserve confidentiality, tables cannot be displayed unless the weighted number of observations in each cell is at least:
DIS_MINCASEBYIVAR = To preserve confidentiality, regression analyses cannot be shown unless the ratio of valid observations to the number of independent variables is at least:
DIS_MONITORVAR = To preserve confidentiality, analysis results cannot be displayed for any set of observations that has only a very small number of values on certain sensitive variables. In this case the sensitive variable(s) (and the minimum required number of valid values) was:
DIS_UNWEIGHTEDN = To preserve confidentiality, only weighted N’s can be shown.
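The relationship between the keywords, the default messages listed above, and the variable names or number that sometimes follow them can be sketched as follows. Only three of the messages are included for brevity. The 'DIS_KEY = message' line format assumed for a language file, the names load_overrides and disclosure_message, and the choice to join trailing details with commas are all assumptions for illustration.

```python
from collections.abc import Iterable
from typing import Optional

# A few of the default messages, keyed as they would be in a language file.
DEFAULT_MESSAGES = {
    "DIS_VAREXCLUDE": "To preserve confidentiality, analyses are not permitted "
    "using the following variable(s):",
    "DIS_MAXFILTERS": "To preserve confidentiality, the number of filter "
    "variables cannot be greater than:",
    "DIS_CONTROLVAR": "To preserve confidentiality, tables cannot be run "
    "with control variables.",
}


def load_overrides(language_text: str) -> dict[str, str]:
    """Collect 'DIS_... = message' lines (assumed format); ignore all others."""
    overrides = {}
    for raw in language_text.splitlines():
        key, sep, message = raw.partition("=")
        key = key.strip()
        if sep and key.startswith("DIS_"):
            overrides[key] = message.strip()
    return overrides


def disclosure_message(
    key: str,
    details: Iterable[object] = (),
    overrides: Optional[dict[str, str]] = None,
) -> str:
    """Return the message for `key`, followed by any variable names or number."""
    overrides = overrides or {}
    text = overrides[key] if key in overrides else DEFAULT_MESSAGES[key]
    # Assumption: trailing details are joined with commas after one space.
    extra = ", ".join(str(d) for d in details)
    return f"{text} {extra}" if extra else text
```

For example, disclosure_message("DIS_MAXFILTERS", [2]) yields the DIS_MAXFILTERS text followed by "2".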
# DISCLOSURE SPECIFICATIONS FOR DATA FILE

# ID FOR THIS DATASET
ID = survey25

# A. PREVENT AN ANALYSIS FROM BEING RUN

# Completely exclude these vars from analysis and recoding/computing
VAREXCLUDE = CASEID, LOCATIONID

# Exclude these combinations of vars (separated by ';') from analysis
# Also exclude the individual vars from being used by the 'recode'
# and 'compute' programs
COMBEXCLUDE = RACE, GENDER; AGE, RACE

# Maximum number of selection filters allowed in an analysis run
MAXFILTERS = 2

# No tables with a control variable if set equal to 'no'
CONTROLVAR = no

# The LISTCASE program cannot be run if set equal to 'no'
LISTCASE = no

# The SUBSET program cannot be run if set equal to 'no'
SUBSET = no

# B. SUPPRESS ANALYSIS OUTPUT AFTER RUNNING A PROGRAM

# Required average (mean and median) cell sizes - unweighted and weighted
AVGCELLMIN = 10
AVGCELLWMIN = 200

# Required size of smallest cell - unweighted and weighted
MINCELLN = 5
MINCELLWN = 100

# Ratio of cases to number of independent vars in regression
MINCASEBYIVAR = 100

# Check for at least 2 distinct values on the variable 'INSTITUTION'
# and at least 3 distinct values on 'CBSA'.
MONITORVAR = INSTITUTION CBSA(3)

# Suppress all unweighted N's if set equal to 'no'
UNWEIGHTEDN = no
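A file like the example above could be read and applied to a proposed analysis roughly as follows. This Python sketch is hypothetical (parse_disclosure, run_blocked, and recode_blocked are not SDA functions) and assumes full-line '#' comments, case-insensitive keywords and variable names, ';' between COMBEXCLUDE combinations, and ',' between the variables in each combination, as in the example.

```python
from collections.abc import Iterable


def parse_disclosure(text: str) -> dict[str, str]:
    """Parse 'KEYWORD = value' lines, skipping blank lines and '#' comments."""
    spec = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Assumption: keywords are case-insensitive.
            spec[key.strip().upper()] = value.strip()
    return spec


def _names(text: str) -> set[str]:
    """Split a comma-separated list of variable names (upper-cased)."""
    return {v.strip().upper() for v in text.split(",") if v.strip()}


def run_blocked(
    spec: dict[str, str],
    analysis_vars: Iterable[str],
    n_filters: int = 0,
    has_control: bool = False,
) -> list[str]:
    """Return the keywords that would prevent this analysis from running."""
    used = {v.upper() for v in analysis_vars}
    reasons = []
    if used & _names(spec.get("VAREXCLUDE", "")):
        reasons.append("VAREXCLUDE")
    # Combinations are separated by ';', the variables in each by ','.
    combos = [_names(c) for c in spec.get("COMBEXCLUDE", "").split(";") if c.strip()]
    if any(combo <= used for combo in combos):
        reasons.append("COMBEXCLUDE")
    if "MAXFILTERS" in spec and n_filters > int(spec["MAXFILTERS"]):
        reasons.append("MAXFILTERS")
    if has_control and spec.get("CONTROLVAR", "").lower() == "no":
        reasons.append("CONTROLVAR")
    return reasons


def recode_blocked(spec: dict[str, str], source_vars: Iterable[str]) -> set[str]:
    """Source variables that RECODE/COMPUTE may not use.

    Following the example's comments, this covers VAREXCLUDE and every
    variable named in COMBEXCLUDE.
    """
    blocked = _names(spec.get("VAREXCLUDE", ""))
    for combo in spec.get("COMBEXCLUDE", "").split(";"):
        blocked |= _names(combo)
    return {v.upper() for v in source_vars} & blocked
```

Returning the offending keywords, rather than a single yes/no, makes it straightforward to pick the corresponding DIS_ message for each rule that was violated.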
The words "preserve confidentiality" are set up to link to a file that can further explain the disclosure rules and the reasons for setting them up.
See also:
  harc        HARC file specifications
  language    LANGUAGE file specifications
  precision   Precision specifications