SDA 3.5 Documentation for DDLTOX
NAME
ddltox - Convert a DDL file to SAS, SPSS, or Stata definitions or
to XML (for DDI)
USAGE
ddltox [-option] -i input_file
DESCRIPTION
DDLTOX converts the data descriptions in a DDL file into data
definitions for SAS, SPSS, or Stata. The resulting definitions
file, together with a matching data file, can then be used to
create a system file for one of those systems and to run data
analysis procedures. The program can also generate variable
definitions in XML, following the conventions of the Data
Documentation Initiative (DDI, Version 2).
By default, DDLTOX creates SAS data definitions. To create
definitions for SPSS, Stata, or XML, use the ‘-x’ option.
DDLTOX obtains the data description information from the DDL file
given after the ‘-i’ flag on the command line. Since SAS, SPSS,
and Stata do not define the attributes of variables in exactly
the same way as DDL, there are occasionally some problems in
converting from DDL to one of those systems. Some of the options
of DDLTOX are designed to facilitate those conversions.
Warnings and error messages are placed in a file named
‘DDLTOX.MSG’. Users should view the contents of that file after
running the program.
MEANING OF THE OPTIONS
Define Input and Output Files
- -x system
- Type of output to produce -- SAS, SPSS, Stata, or XML.
(Default is SAS. Other system names must be given either in all
capital letters or in all lowercase, e.g., ‘SPSS’ or ‘spss’.)
- -i fname
- Take input from the DDL file ‘fname’. (This specification
is REQUIRED.)
- -o fname
- Write the new data definitions onto the file ‘fname’
(instead of to the standard output).
Options for Variable Names
- -@ character(s)
- Replace any ‘@’ character in a variable name with the
specified conversion character(s). Any ‘@’ characters in
category labels will also be converted, so that GOTO information
will reflect the revised variable names.
(This is useful for dealing with names from CASES instruments,
especially when converting to SAS or Stata definitions.)
- -m number
- Override the default maximum for the length of each variable
name (currently 32 characters) and set it to the number given
after ‘-m’. The new maximum must be between 8 and 32.
If a variable name exceeds the maximum length, the corresponding
variable definitions are skipped, and a warning is placed in the
‘DDLTOX.MSG’ file.
- -p prefix
- Use the characters given after ‘-p’ as a prefix to each
variable name. (This is useful if variable names begin with a
number, and they are to be converted to SAS, SPSS, or Stata.)
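As a rough illustration of how the ‘-@’, ‘-p’, and ‘-m’ options
interact, the Python sketch below converts a list of variable
names. It is a hypothetical helper, not DDLTOX's own code: the
name ‘convert_names’, the order in which the options are applied
(replace ‘@’, add the prefix, then check the length), and the
wording of the warning are all assumptions.

```python
def convert_names(names, at_repl=None, max_len=32, prefix="", warn=None):
    """Hypothetical sketch of DDLTOX-style variable-name conversion.

    at_repl -- replacement for '@' (like -@); None leaves '@' unchanged
    max_len -- maximum name length (like -m); must be between 8 and 32
    prefix  -- string prepended to every name (like -p)
    warn    -- optional list collecting warnings (stands in for DDLTOX.MSG)
    """
    if not 8 <= max_len <= 32:
        raise ValueError("maximum name length must be between 8 and 32")
    converted = []
    for original in names:
        name = original
        if at_repl is not None:
            name = name.replace("@", at_repl)
        # Assumption: the prefix counts toward the maximum length.
        name = prefix + name
        if len(name) > max_len:
            # Names that are too long are skipped, with a warning.
            if warn is not None:
                warn.append(f"{original}: name longer than {max_len} characters; skipped")
            continue
        converted.append(name)
    return converted


msgs = []
print(convert_names(["Q1@A", "2ND_WAVE"], at_repl="_", prefix="V"))
print(convert_names(["Q1@A", "2ND_WAVE"], at_repl="_", max_len=8, prefix="V", warn=msgs))
print(msgs)
```

With the default maximum both names survive as ‘VQ1_A’ and
‘V2ND_WAVE’; with a maximum of 8, ‘V2ND_WAVE’ (9 characters) is
dropped and one warning is recorded.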
Options for Category Labels
- -s
- Ignore short (bracketed) category labels -- SAS, SPSS, and
Stata output only. (See the discussion of short/long category
labels below.)
- -n max_characters
- Maximum number of characters to output as a short category
label -- XML output only. (Default is 60.) (See the discussion
of short/long labels for XML output below.)
Other Options
- -v fname
- Limit the variables processed to those contained in the
variable list file ‘fname’.
- -h
- Display short program help and available options. (The
program will not do anything else.)
MISSING DATA CONVERSION
Each system has its own method of indicating which codes are to
be considered invalid and therefore to be excluded from data
analyses.
DDLTOX will attempt to convert as many missing-data
specifications as it can for each variable. If there is
something that cannot be converted, a warning is placed into the
‘DDLTOX.MSG’ file.
SAS missing-data specifications
All numeric missing-data specifications in the DDL file are
converted to IF-statements in SAS. Each such statement sets the
referenced variable to the value ‘.’, if the value in the data
file matches the missing-data condition.
If character missing-data codes are used for numeric variables, a
special missing-data statement is included which applies to all
variables in the SAS definitions.
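The conversion described above can be pictured with a small
Python sketch that writes SAS IF-statements for a variable's
discrete missing-data codes and missing-data ranges. The function
name, its arguments, and the exact layout of the generated
statements are assumptions for illustration; the statements
DDLTOX actually writes may differ in form. The special statement
for character missing-data codes is not modeled.

```python
def sas_missing_statements(var, codes=(), ranges=()):
    """Hypothetical sketch: SAS IF-statements that set missing-data codes to '.'.

    codes  -- discrete numeric missing-data codes, e.g. [0, 9]
    ranges -- (low, high) pairs of missing-data ranges, inclusive
    """
    statements = [f"IF {var} = {code} THEN {var} = .;" for code in codes]
    # SAS accepts chained comparisons such as 97 <= V1 <= 99.
    statements += [f"IF {low} <= {var} <= {high} THEN {var} = .;" for low, high in ranges]
    return statements


for line in sas_missing_statements("V1", codes=[0, 9], ranges=[(97, 99)]):
    print(line)
```

For variable ‘V1’ with codes 0 and 9 and the range 97-99, this
prints three statements, the last being
‘IF 97 <= V1 <= 99 THEN V1 = .;’.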
SPSS missing-data specifications
The first three discrete missing-data codes in the DDL for a
variable are put into an SPSS ‘MISSING VALUES’ statement. Any
additional missing-data or valid range specifications for numeric
variables are converted to IF-statements.
Character missing-data codes for numeric variables are not
converted. If any are encountered, a warning is given.
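A minimal Python sketch of this split is shown below. It handles
only discrete numeric codes (not valid-range specifications), and
the function name and the use of ‘$SYSMIS’ in the generated
IF-statements are assumptions; DDLTOX's actual output may be
worded differently.

```python
def spss_missing(var, codes):
    """Hypothetical sketch of the SPSS rule described above.

    The first three discrete codes go into MISSING VALUES; any further
    codes become IF-statements that set the variable to system-missing.
    """
    codes = list(codes)
    first, rest = codes[:3], codes[3:]
    commands = []
    if first:
        values = ", ".join(str(code) for code in first)
        commands.append(f"MISSING VALUES {var} ({values}).")
    for code in rest:
        commands.append(f"IF ({var} = {code}) {var} = $SYSMIS.")
    return commands


for command in spss_missing("V2", [7, 8, 9, 0]):
    print(command)
```

Here codes 7, 8, and 9 fit in the ‘MISSING VALUES’ statement,
and the fourth code, 0, is handled by a separate IF-statement.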
Stata missing-data specifications
All numeric missing-data specifications in the DDL file are
converted to REPLACE-IF statements in Stata. Each such statement
sets the referenced variable to the value ‘.’, if the value in
the data file matches the missing-data condition.
Character missing-data codes for numeric variables are not
converted. If any are encountered, a warning is given.
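For comparison with the SAS case, the Python sketch below
generates Stata ‘replace ... if’ commands. The function name is
hypothetical, and using Stata's ‘inrange()’ function for
missing-data ranges is an assumption about how a range might be
expressed, not a description of DDLTOX's actual output.

```python
def stata_missing(var, codes=(), ranges=()):
    """Hypothetical sketch: Stata replace-if commands for missing-data codes.

    codes  -- discrete numeric missing-data codes
    ranges -- (low, high) pairs, inclusive; Stata's inrange() tests low <= x <= high
    """
    commands = [f"replace {var} = . if {var} == {code}" for code in codes]
    commands += [f"replace {var} = . if inrange({var}, {low}, {high})" for low, high in ranges]
    return commands


for command in stata_missing("v1", codes=[9], ranges=[(97, 99)]):
    print(command)
```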
XML missing-data specifications
The DDI specification can handle all of the DDL missing-data
conventions except for character missing-data codes for numeric
variables. If any of those specifications are encountered, a
warning is given.
SHORT/LONG CATEGORY LABELS
Short/long category labels for SAS, SPSS, or Stata (-s)
In a DDL file there can be a long label or text description for a
response code, plus a short label given in square brackets. For
example, the category label specification could look like:
Consistently votes Democrat [Democrat].
where the long label or text is ‘Consistently votes Democrat’,
and the short label is ‘Democrat’.
- If a short label is given in the DDL file for this category
When generating specifications for SAS, SPSS, or Stata, DDLTOX
will, by default, use the short category label instead of the
longer category label.
If the ‘-s’ option is specified, however, DDLTOX will ignore any
short labels in brackets and will always convert only the long
category label for SAS, SPSS, or Stata. (Note that the ‘-s’
option has no effect when the output is an XML file.)
- If a short label is NOT given in the DDL file for this
category
If a short label is not given, DDLTOX will use up to one line of
the long label (if there is one). If the long label has more
than 60 characters, only the first 60 characters will be used.
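The label rules above can be summarized in a short Python sketch.
It is an illustration under stated assumptions, not DDLTOX's
parser: the regular expression used to find the bracketed short
label, the treatment of a newline as the end of ‘one line’ of the
long label, and applying the same 60-character limit when ‘-s’ is
in effect are all assumptions.

```python
import re

# Assumption: a short label is bracketed text at the end of the label,
# optionally followed by a period, as in "Consistently votes Democrat [Democrat]."
SHORT_LABEL = re.compile(r"^(.*?)\s*\[([^\]]*)\]\s*\.?\s*$", re.DOTALL)


def split_label(text):
    """Return (long_label, short_label); short_label is None if absent."""
    text = text.strip()
    match = SHORT_LABEL.match(text)
    if match:
        return match.group(1), match.group(2)
    return text, None


def stats_label(text, ignore_short=False, max_chars=60):
    """Hypothetical sketch of the label chosen for SAS, SPSS, or Stata.

    ignore_short plays the role of the -s option.
    """
    long_label, short_label = split_label(text)
    if short_label is not None and not ignore_short:
        return short_label
    # Assumption: "one line" of the long label ends at the first newline.
    lines = long_label.splitlines()
    first_line = lines[0] if lines else ""
    return first_line[:max_chars]


print(stats_label("Consistently votes Democrat [Democrat]."))
print(stats_label("Consistently votes Democrat [Democrat].", ignore_short=True))
```

By default the sketch returns ‘Democrat’; with ‘ignore_short’
set, as with ‘-s’, it returns ‘Consistently votes Democrat’.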
Short/long category labels for XML (-n)
In the DDI specification, there are two XML elements for the
labels of categories -- the ‘labl’ element and the ‘txt’ (text)
element. In general, the ‘labl’ element is intended to be used
as a shorter label for statistical analysis programs, whereas the
‘txt’ element is intended to be used as a longer explanation of
the meaning of a particular category.
In a DDL file the short version of a category label is put
between square brackets after the longer category text. An
example of such a label would be:
Definitely will vote in the next election [Definitely vote]
where the short label is ‘Definitely vote’.
- If BOTH short and long labels are in the DDL file for this
category
When converting such labels into XML definitions, DDLTOX puts out
the short label using the ‘labl’ element and the longer category
text using the ‘txt’ element.
- If there is no short (bracketed) label in the DDL file for
this category
If the category label in the DDL file is no longer than a certain
number of characters (default 60), it will be put out using the
‘labl’ element. If the label is longer, it will be put out using
the ‘txt’ element.
The ‘-n’ option allows the user to define what is meant by
‘shorter’ or ‘longer’ when deciding whether to output the label
with ‘labl’ or ‘txt’. The number given after the option flag
replaces the default dividing criterion of 60 characters.
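The Python sketch below shows the XML rule using the standard
‘xml.etree.ElementTree’ module. The function name ‘ddi_category’
and the bracket-matching regular expression are assumptions; the
‘catgry’ and ‘catValu’ element names come from the DDI Codebook
(Version 2) and are not described elsewhere on this page.
‘max_chars’ plays the role of the ‘-n’ value.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: same bracket convention as in the DDL examples on this page.
SHORT_LABEL = re.compile(r"^(.*?)\s*\[([^\]]*)\]\s*\.?\s*$", re.DOTALL)


def ddi_category(value, label, max_chars=60):
    """Hypothetical sketch: build a DDI <catgry> element for one code.

    max_chars plays the role of the -n option (default 60).
    """
    catgry = ET.Element("catgry")
    ET.SubElement(catgry, "catValu").text = str(value)
    label = label.strip()
    match = SHORT_LABEL.match(label)
    if match:
        # Both labels present: short label -> labl, long text -> txt.
        ET.SubElement(catgry, "labl").text = match.group(2)
        ET.SubElement(catgry, "txt").text = match.group(1)
    elif len(label) <= max_chars:
        ET.SubElement(catgry, "labl").text = label
    else:
        ET.SubElement(catgry, "txt").text = label
    return catgry


element = ddi_category(1, "Definitely will vote in the next election [Definitely vote]")
print(ET.tostring(element, encoding="unicode"))
```

With the default limit, the unbracketed label ‘Definitely will
vote in the next election’ (41 characters) would be put out as
‘labl’; with ‘max_chars=40’, as with ‘-n 40’, it would be put out
as ‘txt’.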
EXAMPLES
- ddltox -i myDDL -o mySAS -@_
- Convert DDL to SAS definitions. Convert ‘@’ in variable names to
underscore symbols (‘_’).
- ddltox -x spss -i myddl -o mySPSS -s
- Convert DDL to SPSS definitions. Do not use bracketed short
category labels found in the DDL file.
- ddltox -x xml -n 40 -i myddl -o myXML
- Convert DDL to XML definitions following the DDI specification;
category labels longer than 40 characters will be put out using
the ‘txt’ element instead of the ‘labl’ element.
SEE ALSO
- DDI -- Data Documentation Initiative, Version 2
- DDL -- Data Description Language
- xconvert -- Convert SAS, SPSS, or Stata definitions into XML or
DDL
CSM, UC Berkeley
April 12, 2011