Default values for many of the keywords of individual variables may also be specified in this section (for example, the width of the field, the minimum valid code, or the default missing-data code). Each such characteristic is then set to the specified default, unless it is overridden in the individual variable specification.
The description of each variable MUST include its name and its location in the data file (beginning column number). The description must also include the following specifications, IF they are different from the default values: the width (number of columns), the record number, and the number of implied decimal places (if there are implied decimal places in the input field).
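As a rough illustration of how the record number, beginning column, and width locate a field, the following Python sketch pulls raw values out of one case stored as a list of fixed-width records. The function name `extract_field` and the sample data are hypothetical and are not part of SDA; the sketch also assumes that record and column numbers are 1-based.

```python
def extract_field(records, column, width=1, record=1):
    """Return the raw text of one field from a single case.

    records: list of strings, one string per record of the case
    column:  position of the left-most character (assumed 1-based)
    width:   number of columns occupied by the field
    record:  number of the record holding the field (assumed 1-based)
    """
    line = records[record - 1]
    start = column - 1
    return line[start:start + width]


# Hypothetical case made of two records
case = [
    "0001      15",                 # columns 1-4, then columns 11 and 12
    " " * 19 + "42" + "  " + "NE",  # columns 20-21 and columns 24-25
]

print(extract_field(case, column=1, width=4))
print(extract_field(case, column=20, width=2, record=2))
print(extract_field(case, column=24, width=2, record=2))
```

Because `width` and `record` default to 1 here, a one-column variable on the first record needs only its column, which mirrors the way the DDL keywords fall back to their defaults.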
Each variable description MAY also include a long label, descriptive text (such as questionnaire wording), and category labels for the code categories. If some of the code values represent invalid response codes, they may be flagged for exclusion from analysis; a minimum and a maximum valid code can also be specified (default values for these specifications can also be set).
The first variable description MUST be for a variable named ‘CASEID’, if the DDL file is to be input to the program MAKESDA in order to create or to add variables to an SDA dataset. If variables are added to an existing SDA dataset, MAKESDA checks the contents of CASEID to make sure that the value for each case matches the value stored previously.
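The CASEID check can be pictured with a short Python sketch. It only illustrates the idea of comparing each case's CASEID with the value stored earlier; the function name and data are hypothetical, and the way MAKESDA actually performs or reports the check is not described here.

```python
def find_caseid_mismatches(stored_ids, incoming_ids):
    """Return (case_number, stored, incoming) for every case that differs.

    Case numbers are counted from 1. zip() stops at the shorter list,
    so a differing number of cases is not detected by this sketch.
    """
    mismatches = []
    for case_number, (stored, incoming) in enumerate(
            zip(stored_ids, incoming_ids), start=1):
        if stored != incoming:
            mismatches.append((case_number, stored, incoming))
    return mismatches


stored = ["0001", "0002", "0003"]     # CASEID values already in the dataset
incoming = ["0001", "0003", "0002"]   # CASEID values in the new data file
print(find_caseid_mismatches(stored, incoming))
```

An empty result means every case lines up, so the new variables can safely be attached to the existing cases.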
description of the dataset as a whole
*
description of the CASEID variable
*
description of a variable
*
description of another variable
Keyword        Possible Specification              Default (if no keyword)
_____________________________________________________________________
(Descriptions of the data file)
title=         remainder of the line               REQUIRED
records/case=  number of records per case          1
reclen=        number of characters per record     80
ncases=        number of cases                     No checking for a
                                                   specific N of cases
path=          directory for new dataset           Current directory
charset=       character set                       U.S. ASCII
               (used to specify an alternate
               encoding of text;
               see below)
lang=          language                            No language enforcing
               (code to pass to browsers
               for display purposes;
               see the language document)
(DEFAULT VALUES for individual variable specifications)
blank=         a number into which an              No default conversion
               all-blank field will be             for blanks
               converted
blank_c=       blank conversion for                No default conversion
               character variables
other=         a number into which a field         No default conversion
               with other non-numeric              for other characters
               characters will be converted
               (numeric type only)
case_c=        default case conversion             No default case
               for character variables             conversion
min=           default minimum valid code          No default min
max=           default maximum valid code          No default max
md=            default missing-data code(s)        No default md
               for numeric variables
md_c=          default MD code(s) for              No default md
               character variables
sysmdlabel=    default label for system            (No Data)
               missing-data value
record=        default record number for           1
               location of variables
decimals=      default number of implied           0
               decimal places
type=          default variable type:              numeric
               numeric or character
width=         default number of columns           1
               for each variable
If default values for variable specifications have been set as part of the general dataset characteristics, those defaults (or global values) can be overridden for a particular variable by simply re-specifying the keyword as part of the definition of that variable.
Those default values can be nullified for a particular variable by setting the keyword equal to a blank or by specifying ‘noglobal’. For example, ‘min= ’ or ‘min=noglobal’ nullifies the default ‘min’ for the variable being defined, which is useful when no minimum valid value should apply to that variable.
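The override and nullify rules can be summarized in a small Python sketch. The function `effective_value` and the dictionaries are hypothetical and serve only to illustrate the precedence described above; `None` stands in for "no value defined".

```python
def effective_value(keyword, dataset_defaults, variable_spec):
    """Resolve one keyword for one variable.

    A keyword given in the variable's own specification wins over the
    dataset default. A blank value or 'noglobal' nullifies the default.
    Returns None when no value applies.
    """
    if keyword in variable_spec:
        value = variable_spec[keyword].strip()
        if value == "" or value == "noglobal":
            return None               # default nullified for this variable
        return value                  # default overridden
    return dataset_defaults.get(keyword)  # fall back to the dataset default


defaults = {"min": "1", "width": "2"}
print(effective_value("min", defaults, {"name": "v1"}))
print(effective_value("min", defaults, {"name": "v2", "min": "0"}))
print(effective_value("min", defaults, {"name": "v3", "min": "noglobal"}))
print(effective_value("min", defaults, {"name": "v4", "min": " "}))
```

The four calls cover the four cases in turn: default inherited, default overridden, default nullified with ‘noglobal’, and default nullified with a blank.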
Keyword        Possible Specification              Default (if no keyword)
_____________________________________________________________________
name=          a single string of 1-32             REQUIRED
               alphanumeric characters
iname=         name for this item in the           No instrument name
               instrument or questionnaire
record=        number of the record                Use dataset default
               containing this variable            (usually 1)
column=        column location of the              REQUIRED
               left-most character
width=         number of columns used by           Use dataset default
               this variable                       (usually 1)
decimals=      number of implied decimal           Use dataset default
               places                              (usually 0)
type=          numeric                             numeric, unless another
                                                   type has been set as
                                                   dataset default
label=         remainder of the line               No long variable label
catlabels=     category labels and text            No category labels
               (see discussion below)
md=            list of invalid codes and/or        No md codes
               ranges of codes (separated
               by blanks or commas)
               See discussion below.
min=           minimum valid code                  No defined minimum
max=           maximum valid code                  No defined maximum
blank=         code into which a field             System missing-data code
               containing only blanks
               will be converted
other=         code into which a field             Unless a non-numeric
               containing non-numeric              character is defined as MD,
               characters will be                  non-numeric fields will
               converted                           become system missing-data
sysmdlabel=    label for system missing-data       (No Data)
               value (from a blank input
               field)
text=          descriptive text of any length      No text stored for this
               (until next keyword)                variable
Keyword        Possible Specification              Default (if no keyword)
_____________________________________________________________________
name=          a single string of 1-32             REQUIRED
               alphanumeric characters
iname=         name for this item in the           No instrument name
               instrument or questionnaire
record=        number of the record                Use dataset default
               containing this variable            (usually 1)
column=        column location of the              REQUIRED
               left-most character
width=         number of columns used by           Use dataset default
               this variable                       (usually 1)
type=          character                           REQUIRED, unless default
                                                   type is character
label=         remainder of the line               No long variable label
catlabels=     category values and labels          No category labels
               (see discussion below)
md_c=          list of character codes to be       No missing data codes
               treated as invalid or MD
               (multiples separated by
               blanks or commas; blanks can
               be specified as MD by using
               empty quotes -- "")
               See discussion below.
blank_c=       character field into which an       No conversion of blanks
               all-blank field will be
               converted (including quotes
               if you give them)
case_c=        upper or lower                      Mixed case preserved
               (convert all the characters
               into upper or lower case)
text=          descriptive text of any length      No text stored for this
               (until next keyword)                variable
Note in particular that embedded blanks or quotes must be enclosed in single or double quotation marks.
Some examples are as follows:
catlabels= 1 Yes 2 No
catlabels= Y Yes N No
The syntax rules for specifying the category codes of character variables are the same as for specifying missing-data codes, as described in the section immediately above. In particular, see that section if the category codes for a character variable include blanks or quotation marks.
Long Category Text
If the text corresponding to a category is long, some analysis programs (outside of SDA) will create a shorter category label; this shorter label would be more appropriate for printing the results of an analysis such as crosstabulation.
Depending on the analysis program, the category label might be created by truncating the text to the first 16 or 20 characters. If the label created by truncating the text would be unclear or ambiguous, it is useful to provide your own abbreviated category label. This is done by enclosing the short label in square brackets after the category text. Programs that read the DDL file can then differentiate between the (long) text of a category and the (short) label corresponding to the same category.
catlabels=
1 Definitely will vote in the next election [Definitely vote]
2 Probably will vote in the next election [Probably vote]
3 Probably will not vote in the next election [Prob not vote]
4 Definitely will not vote in the next election [Def not vote]
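A program reading such category lines might separate the long text from the bracketed short label roughly as in the following Python sketch. The function name `split_category_text` is hypothetical, and the 16-character fallback is just one of the truncation lengths mentioned above, not a fixed rule.

```python
import re


def split_category_text(text, max_len=16):
    """Split category text into (long text, short label).

    If the text ends with a label in square brackets, that label is used.
    Otherwise the label is the text truncated to max_len characters.
    """
    match = re.match(r"^(.*?)\s*\[([^\]]*)\]\s*$", text)
    if match:
        return match.group(1), match.group(2)
    return text, text[:max_len]


print(split_category_text(
    "Definitely will vote in the next election [Definitely vote]"))
print(split_category_text("Probably will not vote in the next election"))
```

The second call shows why an explicit label helps: truncation alone gives ‘Probably will no’, which no longer says whether the respondent will vote.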
Category text can extend over more than one line, provided that a backslash (‘\’) is the last character of every line except the last line:
catlabels=
1 Definitely will vote\
in the next election [Def vote]
2 Probably will vote\
in the next election [Prob vote]
3 Probably will not\
vote in the next election [Prob not vote]
4 Definitely will not vote\
in the next election [Def not vote]
8 Don’t know
9 Refused
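The backslash continuation rule can be sketched in Python as follows. The helper `join_continued_lines` is hypothetical, and joining the pieces with a single space is an assumption made for this sketch; the exact whitespace handling of the programs that read DDL files is not specified here.

```python
def join_continued_lines(lines):
    """Merge each line ending in a backslash with the line that follows it."""
    joined = []
    pending = []
    for line in lines:
        if line.endswith("\\"):
            pending.append(line[:-1].strip())   # drop the backslash, keep going
        else:
            pending.append(line.strip())
            joined.append(" ".join(pending))    # the category text is complete
            pending = []
    if pending:                                 # final line ended with a backslash
        joined.append(" ".join(pending))
    return joined


category_lines = [
    "1 Definitely will vote\\",
    "in the next election [Def vote]",
    "8 Don't know",
]
for entry in join_continued_lines(category_lines):
    print(entry)
```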
If decimals=2, for example, the input value ‘1234’ would be stored as ‘12.34’. (This is the same as in previous versions of SDA.)
If decimals=2, for example, the input value ‘1.237’ will retain all of its decimals and will be stored as ‘1.237’ in versions 2.1 and later of SDA. (In previous versions of SDA that input value would have been rounded to 2 decimal places and would have been stored as ‘1.24’.)
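The two cases can be expressed as a short Python sketch that follows the behavior described for versions 2.1 and later. The function name `apply_implied_decimals` is hypothetical; `Decimal` is used so that the digits are kept exactly as they appear in the input field.

```python
from decimal import Decimal


def apply_implied_decimals(field, decimals=0):
    """Convert a numeric input field using the implied-decimals rule.

    A field with an explicit decimal point keeps all of its decimals.
    A field without one has its decimal point shifted left by `decimals`.
    """
    text = field.strip()
    value = Decimal(text)
    if "." in text:
        return value                   # explicit decimal point: stored as is
    return value.scaleb(-decimals)     # implied decimal places applied


print(apply_implied_decimals("1234", decimals=2))
print(apply_implied_decimals("1.237", decimals=2))
```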
In SDA, blank input fields will be set to the system missing-data value, unless the DDL specification for that variable (or for all variables, globally) includes the ‘blank=’ keyword, to specify what number those fields are to be converted into. (For example, one could specify ‘blank=-1’, to convert all blank numeric input fields to ‘-1’ in the SDA dataset.) This conversion does NOT affect the original ASCII data file.
Non-numeric characters such as ‘D’ and ‘R’ are valid for a numeric variable in SDA, and those characters will be stored as such in the dataset, provided that those characters have been defined as missing-data codes. (If those non-numeric characters have not been defined as missing-data codes, they will be treated as invalid codes.)
A period (‘.’) by itself in a field, or an input field containing other non-numeric characters that have not been defined as missing-data codes, will ordinarily be converted to the system missing-data value in SDA. However, if the DDL specification for that variable (or for all variables, globally) includes the ‘other=’ keyword, the non-numeric fields will be converted by SDA to the value specified after ‘other=’. That value will then be examined like any other input value, to see whether it is a valid value or has been defined as missing-data or out-of-range.
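The handling of blank and non-numeric fields for a numeric variable can be pictured with the following Python sketch. All names are hypothetical, `None` merely stands in for the system missing-data value, and the later validity checks (md, min, max) on the converted value are left out.

```python
import re

SYSMISS = None  # stand-in for the system missing-data value


def convert_numeric_field(field, blank=None, other=None, md_codes=()):
    """Apply the blank= and other= rules to one raw numeric input field."""
    text = field.strip()
    if text == "":
        # All-blank field: use blank= if given, else system missing
        return blank if blank is not None else SYSMISS
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", text):
        return text                    # ordinary numeric value
    if text in md_codes:
        return text                    # e.g. 'D' or 'R' defined as MD codes
    # Period alone, or other undefined non-numeric characters
    return other if other is not None else SYSMISS


print(convert_numeric_field("  "))
print(convert_numeric_field("  ", blank="-1"))
print(convert_numeric_field(" D", md_codes=("D", "R")))
print(convert_numeric_field(" ."))
print(convert_numeric_field(" .", other="-2"))
```

As in SDA, the sketch only decides what value is stored; it never touches the original data file.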
This copy feature is invoked by putting the word ‘copy’ on the asterisked line preceding the variable’s specifications. The variable whose attributes can be copied is either the previous variable (if no specific name is given) or some specific variable defined earlier in the same DDL file. The general layout is as follows:
description of v101
* copy
description of v102,
using all variable definitions of the PREVIOUS variable (v101) that are not specifically redefined in this new variable description.
* copy v75
description of v103,
using all variable definitions for v75 that are not specifically redefined in this new variable description (assuming that v75 has already been defined).
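The inheritance implied by ‘copy’ can be sketched in Python as a dictionary merge. The function `build_variables` and its input format are hypothetical: each entry pairs a copy target (`None` for no copy, an empty string for "the previous variable", or a variable name) with the keywords given explicitly for that variable.

```python
def build_variables(blocks):
    """Resolve 'copy' lines into complete variable definitions.

    blocks: list of (copy_target, spec) pairs in file order.
    Keywords given in spec replace the copied ones; all others are inherited.
    """
    variables = {}
    previous = None
    for copy_target, spec in blocks:
        base = {}
        if copy_target is not None:
            source = copy_target or previous   # "" means the previous variable
            base = dict(variables[source])
        merged = {**base, **spec}
        variables[merged["name"]] = merged
        previous = merged["name"]
    return variables


blocks = [
    (None, {"name": "v75", "column": "11", "md": "8,9"}),
    ("v75", {"name": "v76", "column": "12"}),   # like '* copy v75'
    ("", {"name": "v77", "column": "13"}),      # like '* copy'
]
result = build_variables(blocks)
print(result["v76"])
print(result["v77"])
```

Both copies inherit `md` from v75 but keep their own name and column, since those keywords are redefined in each new description.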
The following keywords are still recognized and are equivalent to the new keywords shown after the equal sign:
labels = catlabels
lrecl = reclen
noglob = noglobal
scale = decimals
The older missing-data keywords ‘MD1=mdvalue1’ and ‘MD2=mdvalue2’ are also recognized and are equivalent to the new form:
MD= mdvalue1, mdvalue2
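A tool that pre-processes older DDL files could normalize these legacy forms along the lines of the following Python sketch. The function `modernize` is hypothetical, and treating keywords case-insensitively is an assumption of the sketch rather than a documented rule.

```python
LEGACY_KEYWORDS = {
    "labels": "catlabels",
    "lrecl": "reclen",
    "scale": "decimals",
}


def modernize(spec):
    """Return a copy of spec with legacy keywords and values renamed."""
    updated = {}
    md_parts = []
    for keyword, value in spec.items():
        key = keyword.lower()          # assumption: keywords ignore case
        if key in ("md1", "md2"):
            md_parts.append((key, value))
            continue
        if value == "noglob":
            value = "noglobal"
        updated[LEGACY_KEYWORDS.get(key, key)] = value
    if md_parts:
        # MD1 and MD2 become one md= list, MD1 first
        updated["md"] = ", ".join(value for _, value in sorted(md_parts))
    return updated


old_spec = {"lrecl": "80", "scale": "2", "MD1": "8", "MD2": "9", "min": "noglob"}
print(modernize(old_spec))
```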
title= Some Election Study
records/case=2
reclen= 80
path= /mysda/election
*
name= CASEID
label= Case ID of Respondent
record= 1
column= 1
width= 4
*
name= v75
label= R’s Interest in Campaign
record= 1
column= 11
md= 8,9
catlabels=
1 Very Interested
2 Somewhat Interested
5 Not Interested
8 Don’t know, can’t answer [DK]
9 Refused to answer [Ref]
text=
Some people don’t pay much attention to political
campaigns so far this year. How about you, are you very
interested, somewhat interested, or not interested at all?
* copy v75
# Copy the category labels and MD definitions from the variable ‘v75’
# (Other specifications are redefined for ‘v76’)
name= v76
label= R’s Interest in Primary Election Results
column= 12
text=
How about the results of primary elections.
How interested in those results are you?
Are you very interested, somewhat interested,
or not interested at all?
*
name= age
label= Age of respondent
record= 2
column= 20
width= 2
md= 97-*
catlabels =
97 Age 97 or over
98 Don’t know
99 Refused
*
name= region
label= Character code for each region
record= 2
column= 24
width = 2
type = character
md_c = X
catlabels=
NE Northeastern states
NC North Central states
S Southern states
W Western states
X (Not available)
text =
Region of the country - coded from the state codes
*
name= weight
label= Weight variable
record= 2
column= 50
width= 6
decimals= 4
md= 0
text=
Weight variable with 4 implied decimal places.
_____________________________________________________________________
(For a more extended example, see the DDL file for the SDA test data, which is distributed with the SDA programs.)
| ddlmod | Modify or merge DDL files |
| language | Using non-English languages |
| makesda | Make SDA variables out of DDL and an ASCII data file |
| q4toddl | Convert CASES Q language files into DDL |
| xconvert | Convert SAS, SPSS, or Stata data definitions into DDL |