SDA 3.5 Documentation for Q4TODDL
NAME
Q4TODDL - convert Version 3 and 4 CASES Q-language files to DDL
USAGE
q4toddl [-option] [Q_files]
DESCRIPTION
Q4TODDL generates descriptions of variables in the Data
Description Language (DDL). The DDL file, together with a data
file, can then be used to produce an SDA dataset and codebook. A
DDL file can also be used to generate a data description setup
file for SPSS, SAS, or Stata (by using the DDLTOX program). See
the Q4TODDL examples document for several examples of Q-language
input and the DDL output produced by Q4TODDL.
This version of the Q4TODDL program has been updated for SDA. It
is almost identical to the older CSA version, except that the
interactive interface (based on old character-based technology)
is no longer supported. Q4TODDL still generates the old (CSA-
compatible) version of DDL. However, SDA programs are able to
read such DDL files.
OVERVIEW
Q4TODDL obtains the information about each variable from a CASES
Q-language instrument designed for computer-assisted
interviewing, data entry, or coding. It is important to
understand that some of the key elements used in DDL to describe
a variable (such as a long variable label and the designation of
certain codes as invalid or missing data) are not necessarily
included in a Q-language instrument. Q4TODDL will use whatever
information it finds in the Q-language file, but some
specifications may have to be added.
These additional specifications can either be added to the DDL
file produced by Q4TODDL, or they can be included in the Q-
language file itself in a way that will not affect the execution
of the CASES programs. The advantage of including these
additional specifications in the Q-language file itself is that
it consolidates in one place all of the relevant specifications
for a variable. Instructions for doing this are given below
under the heading "DDL Keywords as Q-language Extensions."
MODES OF OPERATION
Q4TODDL runs in two different modes, depending on how the program
is invoked.
1. Batch mode: q4toddl -b filename [Q_files]
The name of a file containing the desired options must be given
after the ‘-b’ option flag. See the Q4TODDL batch summary for a
summary of the relevant keywords.
2. Command-line option mode: q4toddl [-option] Q_files
See the Q4TODDL command summary for a summary of the relevant
command-line options.
In the following descriptions of Q4TODDL options, the
corresponding batch keywords and command-line options are given.
Every time Q4TODDL runs in either of the two modes, a file named
Q4TODDL.OPT is generated. This file contains the batch-mode
keywords for the options requested and the Q-language files used
as input on the last run. This file can be edited or used as is
to repeat a run in batch mode, using the ‘-b’ option. Since this
file is overwritten each time Q4TODDL runs, rename or copy it if
you want to preserve it.
FILES TO PREPARE
Q4TODDL obtains its information about each variable and about the
dataset as a whole from the following files:
1. Q-language file(s): ‘QFile = filenames’
The descriptive information about each variable comes from the Q-
language files. All Q-language files to be processed must have
names ending either with ‘.q’ or ‘.m’. (Files ending in ‘.m’
contain CASES code expanded from macros.) There must be at least
one such file.
2. Layout file: ‘Layout = filename’ / ‘-l filename’
This file contains the location of each variable in the data
file; it is usually generated either by running the ‘layout’
program in CASES (without the verbose options -v or -V), or by
running the CASES ‘output’ program (using the ‘-i=’ option). The
default name expected for this file is ‘LAYOUT’, but another name
can be specified as an option. Note that the current version of
Q4TODDL requires that there be a layout file.
If necessary, a layout file can be created manually, using the
following format: Each line consists of a variable name (the item
name in the Q-language instrument); variable type (integer,
float, or char); record number; starting column location; width
of the field (the number of columns; if there are implied decimal
places, they are indicated by appending a period (.) and the
number of implied decimals); and finally, an indicator of whether
or not the variable is an input item (for a non-input item, put a
zero in the last field or just leave it blank; for an input item,
put ‘Y’ or ‘1’ or any other character except a zero). Each of
these fields must be separated by one or more blanks. For an
example of a layout file, see the
q4toddl examples help file.
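The manual layout format described above can be checked with a
small parser. The following is a minimal Python sketch, not part of
Q4TODDL; the names ‘LayoutEntry’ and ‘parse_layout_line’ are
hypothetical, and the sample line ‘income float 2 10 4.2 Y’ is made
up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutEntry:
    name: str       # item name in the Q-language instrument
    vtype: str      # "integer", "float", or "char"
    record: int     # record number
    column: int     # starting column
    width: int      # number of columns in the field
    decimals: int   # implied decimal places (0 if none)
    is_input: bool  # True for an input item


def parse_layout_line(line: str) -> Optional[LayoutEntry]:
    """Parse one layout-file line (hypothetical helper); None for a blank line."""
    fields = line.split()  # fields are separated by one or more blanks
    if not fields:
        return None
    if len(fields) < 5:
        raise ValueError(f"too few fields in layout line: {line!r}")
    name, vtype, record, column, width_spec = fields[:5]
    # A width such as "4.2" means 4 columns with 2 implied decimal places.
    if "." in width_spec:
        width_text, decimals_text = width_spec.split(".", 1)
        width, decimals = int(width_text), int(decimals_text)
    else:
        width, decimals = int(width_spec), 0
    # A blank or "0" indicator marks a non-input item; anything else marks an input item.
    flag = fields[5] if len(fields) > 5 else "0"
    return LayoutEntry(name, vtype, int(record), int(column), width, decimals, flag != "0")


print(parse_layout_line("income float 2 10 4.2 Y"))
```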
3. File with overall dataset definitions: ‘Studydefs = filename’ / ‘-s filename’
Information about the dataset as a whole is not contained in
either the Q-language file or the layout file; therefore you must
prepare a separate file containing the appropriate information.
This file should include at least two sets of definitions -- the
overall dataset specifications (study title, directory location,
and default values for some of the specifications for individual
variables) and the specifications for the ‘CASEID’ variable.
(See the DDL document for explanations and examples of these
specifications.) The
default name for this file is ‘SDEFS’, but another name can be
specified as an option.
Q4TODDL simply copies the contents of this overall dataset
definition file to the beginning of its DDL output file. The
dataset definition file, consequently, could also contain DDL
specifications created on a previous run or by hand.
If this file is not found, Q4TODDL will warn you and will
generate the minimum necessary dataset definition and CASEID
keywords, but they will have blanks after the equal sign, and you
will have to edit the resulting DDL file manually. If you really
want to generate DDL without overall dataset definitions -- in
order to append the output to a DDL file that already has that
information, for example -- simply create a file named ‘SDEFS’
with nothing in it.
4. List of variables: ‘Varlist = filename’ / ‘-v filename’
If you want the program to generate DDL for only a subset of the
variables listed in the layout file, prepare a file containing
the names of the variables you want. Variable names may be
listed one per line or several per line, separated by spaces,
tabs, or commas. The default name for this file is ‘VARLIST’,
but another name can be specified as an option.
If no list of variables is provided, the program will ordinarily
generate DDL for every input item listed in the layout file
(including ‘no data’ input items, if any). See below for the
option to include non-input items when a varlist is not provided.
Note that DDL is generated for variables in the order in which
those variables are found in the Q-language file(s). A varlist
does not affect that order.
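Because a VARLIST may also name array elements such as
‘myarray(1,2)’ (see the note on array elements below), a reader
that treats commas as separators has to leave commas inside
parentheses alone. The Python sketch below is hypothetical, not
Q4TODDL's own code: it splits VARLIST text on spaces, tabs, commas,
and newlines outside parentheses, and then keeps the Q-language
file order regardless of the VARLIST order, as described above.

```python
SEPARATORS = " \t,\r\n"


def read_varlist(text: str) -> list[str]:
    """Split VARLIST text into names, keeping commas inside array subscripts."""
    names: list[str] = []
    current: list[str] = []
    depth = 0  # parenthesis depth; separators inside '(...)' are not name boundaries
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch in SEPARATORS:
            if depth == 0:
                if current:
                    names.append("".join(current))
                    current = []
                continue
            if ch != ",":
                continue  # drop blanks inside a subscript list
        current.append(ch)
    if current:
        names.append("".join(current))
    return names


def select_in_qfile_order(qfile_items: list[str], varlist: list[str]) -> list[str]:
    """Keep only VARLIST items, in the order they appear in the Q-language file(s)."""
    wanted = set(varlist)
    return [item for item in qfile_items if item in wanted]


names = read_varlist("age, sex\tincome\nmyarray(1,2)")
print(names)  # ['age', 'sex', 'income', 'myarray(1,2)']
print(select_in_qfile_order(["sex", "age", "income"], names))  # ['sex', 'age', 'income']
```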
OPTIONS
Q4TODDL has many options to help you extract as much useful
information as possible from the Q-language instrument. Since it
is quick and easy to run the program, you can readily try out the
different options and examine the output. Some examples can be
inspected in the examples document.
File Options
1. Alternate Names for Input Files
If the files you prepare ahead of time are placed in your current
directory and are given the default names, Q4TODDL will use their
contents without your having to do anything further. If you give
them different names, you must specify those names as options.
There are three files this applies to:
- Dataset definition file - default name = SDEFS
- Layout file - default name = LAYOUT
- List of variables - default name = VARLIST
There is no default name for the Q-language files. They
must always be specified by name (or by using the appropriate
wildcard characters ‘*’ and ‘?’).
2. Create an Inventory File: ‘Inventory’ / ‘-i’
If you select this option, Q4TODDL will save a list of all the
items processed (inventory) in the file ‘Q4TODDL.IN1’. This list
is written with one variable name per line, in the order that
they are found in the Q-language file(s). In addition, if any
item names have been transformed, either by using a prefix (see
"Miscellaneous Options" below), by using the special command
‘[##dname= ]’ (see "DDL Keywords as Q-language Extensions" below), or by
expanding array names into references to the individual array
elements, a second list of names is saved in the file
‘Q4TODDL.IN2’. This second inventory list gives the final DDL
names of all the items processed.
A useful application of this inventory option is to run Q4TODDL
without a VARLIST file. You will then generate a complete list
of the (input) items in the Q-language file(s) which have an
entry in the layout file. That list (saved in the file
‘Q4TODDL.IN1’) can be edited and used as the VARLIST for
subsequent Q4TODDL runs.
Another application of this option is to use this list of
variables as input to the SDA XCODEBK program. Without a list of
variables, XCODEBK will output the variable descriptions in
alphabetic order by name, which is unlikely to be the same order
desired for a codebook (usually you want it to be in the same
order as the questions were asked during the interview). If item
names have been transformed by Q4TODDL, the list to use for this
application is the one saved in ‘Q4TODDL.IN2’.
Data Location Conversion
1. Map layout record numbers to other values: ‘Map = a:b,c:d’ / ‘-m a:b,c:d’
The record number for each variable is obtained from the layout
file specified by the user. That record number is accurate if
the data file has been generated by the CASES ‘output’
program, using the ‘-i=itemlist’ option. In such cases, the
‘output’ program generates a layout file that matches the new
data file. This is the recommended procedure. If you create
your data file in that manner, you do NOT need to use this
mapping option.
If, however, the data file to be described with DDL is a subset of
the original set of records for each case, and if the overall
LAYOUT file (produced by the CASES ‘layout’ command) is used to
determine the locations of each variable, the record numbers in
the DDL must be adjusted to reflect that situation.
For example, if you create a data file containing only records 12
and 14, Q4TODDL needs to be informed that record 12 is now record
1, and record 14 is now record 2, for purposes of generating DDL
to match that specific data file. In this situation you should
also provide Q4TODDL with a list of variables, so that the
program will limit its output to variables on those two records;
otherwise you could end up with conflicting DDL definitions --
variable definitions from the original record 1 could conflict
with definitions from the original record 12 (which is mapped to
record 1 also).
If you want Q4TODDL to generate DDL for variables located on the
zero record (variables identified in the LAYOUT file as being on
record 0), you MUST use this mapping option. A location on
record 0 will not make sense to any statistical system. For
example, say you plan to create a data file that will contain for
each case four regular data records followed by that case’s zero
record (from the ZEROS file used by CASES). If you map record 0
to record 5, then the DDL locations will be correct.
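A minimal Python sketch of the ‘a:b,c:d’ mapping string, assuming
that records not named in the mapping keep their original numbers
(the text above does not say). The helper names are hypothetical.
‘find_conflicts’ illustrates why a VARLIST is advised: with the
mapping ‘12:1,14:2’, an unmapped original record 1 lands on the
same new record as original record 12.

```python
def parse_record_map(spec: str) -> dict[int, int]:
    """Turn a mapping string such as '12:1,14:2' into {12: 1, 14: 2}."""
    mapping: dict[int, int] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        old, new = pair.split(":")
        mapping[int(old)] = int(new)
    return mapping


def map_record(record: int, mapping: dict[int, int]) -> int:
    # Assumption: records that are not named in the mapping are left unchanged.
    return mapping.get(record, record)


def find_conflicts(records: set[int], mapping: dict[int, int]) -> dict[int, list[int]]:
    """Return each new record number that more than one original record maps onto."""
    targets: dict[int, list[int]] = {}
    for record in sorted(records):
        targets.setdefault(map_record(record, mapping), []).append(record)
    return {new: olds for new, olds in targets.items() if len(olds) > 1}


mapping = parse_record_map("12:1,14:2")
print(map_record(14, mapping))               # 2
print(find_conflicts({1, 12, 14}, mapping))  # {1: [1, 12]}
```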
2. Convert locations to a single long record: ‘Appending’ / ‘-a’
Data files generated by the CASES ‘output’ program (unless the
‘-i=itemlist’ option is used) usually have several records per
case, and each record is 80 columns wide. If the data file to be
described with DDL is going to be converted into a form in which
each case has one long record (to facilitate sorting, selecting
out certain cases, or for some other reason), the record/column
locations have to be adjusted. If this option is selected,
Q4TODDL will automatically make the adjustment -- for instance, a
variable location of record 2, column 10, will appear in the
output DDL file as record 1, column 90. If record mapping has
also been requested, the mapping is carried out first.
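The arithmetic in the example above (record 2, column 10 becomes
record 1, column 90) follows from placing each 80-column record end
to end. A hedged Python sketch with a hypothetical function name;
it applies an optional record mapping first, as the text specifies,
and assumes unmapped records keep their numbers.

```python
from typing import Optional

RECORD_WIDTH = 80  # each CASES 'output' record is 80 columns wide


def to_single_record(record: int, column: int,
                     mapping: Optional[dict[int, int]] = None,
                     record_width: int = RECORD_WIDTH) -> tuple[int, int]:
    """Convert a (record, column) location to its position on one long record."""
    if mapping:
        # Record mapping, if requested, is carried out before appending.
        record = mapping.get(record, record)
    if record < 1:
        raise ValueError("record 0 must be mapped to a real record number first")
    return 1, (record - 1) * record_width + column


print(to_single_record(2, 10))  # (1, 90)
```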
Diagnostic Message Handling
1. Save diagnostic messages in a particular file: ‘Errorfile=filename’ / ‘-e filename’
Select this option if you want warnings and error messages saved
in a file other than ‘Q4TODDL.MSG’. In addition to diagnostic
messages this file contains a cross-reference list of any item
names that have been transformed (except by the use of a standard
prefix), and a list of final DDL names that are longer than 8
characters (and will consequently cause problems for SPSS or
SAS).
2. Report missing variables: ‘Report’ / ‘-r’
Select this option if you want a list of the variables that were
NOT found in the Q-language files you specified. For a variable
to be considered missing, it must have been listed in the VARLIST
file, if supplied, or otherwise in the LAYOUT file. This option
is useful for detecting misspelled variable names in the VARLIST,
and it can also alert you that certain Q-language files have not
been specified for processing. This report of missing variables
is output to the same place as other diagnostic messages (see the
preceding option).
Missing Data Specifications
Q-language instruments often follow certain conventions in
assigning code values. For example, the code 8 (for a one-column
variable) might usually be reserved for a "don’t know" response,
and the code 9 might usually mean "refused to answer." Q4TODDL
allows you to take advantage of such conventions and thus
facilitate the task of specifying missing-data codes.
If these options are selected, they are overridden if a variable
in the Q-language file has its own missing-data specification
(such as the ‘[##md1= ]’ Q-language extension discussed below).
1. Automatic md1 Generation: ‘9fill’ / ‘-9’
If this option is selected, the ‘md1=’ keyword is generated for
each variable and set to all nines -- 9, 99, 999, and so forth,
depending on the width of the variable. If a particular variable
does not have a code value of all nines in the Q-language
instrument, no ‘md1=’ specification is generated, UNLESS the
‘0fill’ option is ALSO selected.
2. Automatic md2 Generation: ‘8fill’ / ‘-8’
If this option is selected, the ‘md2=’ keyword is generated for
each variable and set to one less than the all-nines value for
its width -- 8, 98, 998, and so forth, depending on the width of
the variable. If a particular variable does not have such a code
value in the Q-language instrument, no ‘md2=’ specification is
generated, UNLESS the ‘0fill’ option is ALSO selected.
3. Force Automatic md1 and md2 Generation: ‘0fill’ / ‘-0’
This option is only meaningful in conjunction with the ‘8fill’ or
‘9fill’ options discussed above. Normally, those options will NOT
produce an md1 or md2 specification unless the 8 or 9 code is
also specified as a precode in the instrument. The ‘0fill’
option causes this checking of the precodes to be skipped, and
the missing-data specifications are automatically generated
whether or not there is a matching precode in the Q-file for that
particular variable.
Typically this option would be required if the Q-language
instrument allows the input of characters like ‘D’ or ‘R’ for
"don’t know" or "refusal" responses, and if those characters are
then changed to numeric codes of ‘8’ and ‘9’.
4. Automatic min Generation: ‘Floor’ / ‘-f’
Select this option to generate the ‘min=’ keyword for each
variable, with the minimum valid value set to the lowest code
value given in the Q-language instrument for that variable.
Typically you would select this option so that blank fields in
the data file (produced by skip patterns) could be converted to
zero or some negative number, but still be considered outside the
range of valid codes.
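The four options above reduce to a few comparisons on a variable's
width and its precodes. The following Python sketch is illustrative
only (the function name and signature are hypothetical); it treats
precodes as a set of integers and does not model the per-variable
override by ‘[##md1= ]’ and similar extensions.

```python
def missing_data_keywords(width: int, precodes: set[int],
                          fill9: bool = False, fill8: bool = False,
                          fill0: bool = False, floor: bool = False) -> dict[str, int]:
    """Sketch of the md1=/md2=/min= values described for 9fill, 8fill, 0fill, and Floor."""
    keywords: dict[str, int] = {}
    nines = int("9" * width)  # 9, 99, 999, ... for widths 1, 2, 3, ...
    # Without 0fill, a code is generated only if it also appears as a precode.
    if fill9 and (fill0 or nines in precodes):
        keywords["md1"] = nines
    if fill8 and (fill0 or (nines - 1) in precodes):
        keywords["md2"] = nines - 1
    if floor and precodes:
        keywords["min"] = min(precodes)  # lowest code value in the instrument
    return keywords


print(missing_data_keywords(1, {1, 2, 8, 9}, fill9=True, fill8=True))  # {'md1': 9, 'md2': 8}
print(missing_data_keywords(2, {1, 2}, fill9=True))                   # {}
```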
Text Control Options
Q4TODDL routinely generates a ‘text=’ segment for each set of DDL
definitions for a variable.
For input items, this text segment consists of
the template area for that item (or for the entire form, if the
item is the first one on a multi-item form). Q-language commands
other than ‘[fill]’, ‘[goto]’, and ‘[etc]’ are removed from the
text. For many or even most input items, this text segment will
show the wording for the question being asked and is more or less
what we usually need in order to document that item.
For non-input items, the default text segment
is all of the plain text in the item; all commands are removed.
This default generation of the ‘text=’ segment can be further
restricted for a particular variable by using the ‘[##bt]’ and
‘[##et]’ Q-language extensions discussed below.
In addition, the following two options affect the generation of
the ‘text=’ segment for ALL variables.
1. Include all item/form lines in text: ‘QUesfull’ / ‘-q’
With this option, everything between the form name and the end of
the form is included in the ‘text=’ segment. This includes text
following the first precode, the entire pre-template and post-
template segments, and all Q-language commands. (For multi-item
forms, this option only affects the first item on the form.)
2. Suppress all text sections: ‘Textelim’ / ‘-t’
Select this option to suppress the generation of ALL ‘text=’
segments. This option overrides the preceding one, if both are
selected. If you only want to suppress the text for some of the
variables, you can use the ‘[##nt]’ Q-language extension, as
described below.
Miscellaneous Options
1. Attach a prefix to variable names: ‘Prefix=abc’ / ‘-p abc’
Q4TODDL generates a ‘name=’ specification for each variable, and
by default the name is the item name as given in the Q-language
instrument. If you select this option, and provide a prefix
containing from one to five characters, that prefix will be used
in every ‘name=’ specification. For example, if the prefix is
‘var’, Q-language item names ‘20’ and ‘21’ will be converted to
‘var20’ and ‘var21’. Note, however, that the use of the
‘[##dname= ]’ command for a particular item overrides this option
for that item.
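The naming rule can be sketched in a few lines of Python. The
function names are hypothetical. The check for names longer than 8
characters mirrors the list written to the diagnostic file
described under ‘Errorfile’ above; rejecting a prefix longer than
five characters is this sketch's own assumption about how an
invalid prefix might be handled.

```python
from typing import Optional


def ddl_name(item_name: str, prefix: str = "", dname: Optional[str] = None) -> str:
    """Return the DDL name for an item: [##dname= ] wins, otherwise prefix + item name."""
    if dname:
        return dname  # a [##dname= ] command overrides the prefix option
    if len(prefix) > 5:
        # Assumption: this sketch simply rejects prefixes longer than five characters.
        raise ValueError("prefix must contain from one to five characters")
    return prefix + item_name


def too_long_for_spss_or_sas(name: str) -> bool:
    """Flag final DDL names longer than 8 characters."""
    return len(name) > 8


print(ddl_name("20", prefix="var"))  # var20
```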
2. Suppress category labels: ‘Catselim’ / ‘-c’
Q4TODDL generates by default a ‘labels=’ segment for each
variable. This segment contains each category code and, as its
label, up to 80 characters of the text that appears after the
code on the same line of the Q-language file. Select this option
to suppress the generation of those category labels for all
variables. You can suppress category labels for individual
variables by using the ‘[##nc]’ Q-language extension.
3. Create DDL for non-input items: ‘Noninput’ / ‘-n’
If a VARLIST file is provided for Q4TODDL, the program will
generate DDL for every variable listed in that file, regardless
of whether or not a variable is an input item. If, however, no
VARLIST is provided, Q4TODDL will by default process all of the
input items that have an entry in the layout file (including ‘no
data’ input items) but ignore the non-input items. Select this
option if you want the program to generate DDL for both input and
non-input items in the layout file, if no VARLIST is specified.
Name of the Output File: ‘Output=filename’ / ‘-o filename’
Unless an output file is named, the DDL produced by Q4TODDL will
be sent to your screen (standard output). You can redirect that
output, but it is usually advisable to specify the desired name
for the output file. If that file already exists, it will be
overwritten. However, a file with a name ending in ‘.q’ will not
be overwritten. (This is to avoid accidentally overwriting the
Q-language files used as input to Q4TODDL.)
DDL KEYWORDS AS Q-LANGUAGE EXTENSIONS
Q-language instruments do not usually include all of the elements
used in DDL to describe a variable. As a result, a DDL file
produced by Q4TODDL may have to be edited, in order to be
adequate for input to the SDA ‘makesda’ program or to some other
statistical system. Alternatively, the additional specifications
can be added to the Q-language file itself in a way that will not
affect the execution of the CASES programs (by using the comment
mechanism). Some examples (and their effects) can be inspected
in the examples document.
The following Q-language extensions are recognized by Q4TODDL.
In CASES version 4 instruments, the commands for a variable must
be placed either in the template area corresponding to that item
(the part of the template BEFORE the field marker ‘@’) or in the
part of the post-template area corresponding to that item (the
part of the post-template AFTER the field marker ‘[@’ for that
item). In CASES version 3 instruments, the commands can be
placed anywhere in the item before the arrow ‘===>’.
1. DDL Keywords
[##label= ] long variable label -- up to 80 characters
(default is 1st line of text segment)
[##dname= ] DDL name for this item
(default is item name in Q-file)
[##type= ] variable type: decimal, integer, or character
[##scale= ] number of implied decimal places
[##min= ] minimum valid value
[##max= ] maximum valid value
[##md1= ] first missing-data code
[##md2= ] second missing-data code
[##blank= ] number into which an all-blank field
should be converted
[##other= ] number into which a field with non-numeric
characters should be converted
[##[ ] ] short label for a code value -- up to 16 chars
(this bracketed label can be
located either in the template or
in the post-template, after the
relevant code value in ‘<>’ and
before the next code value;
this label is in addition to
any plain text for a category found
in the item template.)
All of the above keywords except the last can be abbreviated to
3 characters, and the equal sign is optional. For instance,
[##label=myvariable] can be abbreviated as [##lab myvariable].
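A minimal Python sketch of recognizing these keyword extensions in
item text. It is not Q4TODDL's parser: it accepts each keyword
either in full or as its first three letters (one reading of the
abbreviation rule above), matches case exactly, ignores ‘[##nt]’,
‘[##nc]’, and the short-label form, and uses hypothetical names
throughout.

```python
import re
from typing import Optional

# Keywords from the list above; the short-label form [##[ ] ] is not handled here.
KEYWORDS = ("label", "dname", "type", "scale", "min", "max",
            "md1", "md2", "blank", "other")

# "[##" + keyword + optional "=" + value + "]"; blanks may separate keyword and value.
EXTENSION = re.compile(r"\[##([A-Za-z0-9]+)\s*=?\s*([^\]]*)\]")


def resolve_keyword(word: str) -> Optional[str]:
    """Map a full or three-letter keyword to its full form, or None if unrecognized."""
    for keyword in KEYWORDS:
        if word == keyword or word == keyword[:3]:
            return keyword
    return None


def find_extensions(text: str) -> dict[str, str]:
    """Collect DDL keyword extensions found in a block of Q-language text."""
    found: dict[str, str] = {}
    for match in EXTENSION.finditer(text):
        keyword = resolve_keyword(match.group(1))
        if keyword is not None:
            found[keyword] = match.group(2).strip()
    return found


print(find_extensions("How old are you? [##lab Age of respondent] [##md1=99] [##nt]"))
```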
2. Suppression of text or category labels
The following two commands will suppress the corresponding DDL
segment for one variable:
[##nt] Suppress the ‘text=’ segment
[##nc] Suppress category labels
3. Control of text to be output
Q4TODDL by default outputs a ‘text=’ segment that includes
everything between the beginning of the template area and the
first precode in ‘<>’ (or, optionally, everything in the whole
item or form). The following two commands can be used to
restrict further the text that is output to the DDL:
[##bt] Begin text for output to DDL
[##et] End text for output to DDL
Up to 50 pairs of these commands can be used within a single
item, in order to specify the blocks of Q-language text that
should be output as a ‘text=’ DDL segment. Each block specified
in such a manner is output beginning on a new line in the DDL.
Every item has an implied [##bt] marker at the beginning of the
item. Thus, if you only want the first four lines to be output
to the DDL, a closing [##et] command at the end of the fourth
line would be the only command required. Otherwise, [##bt] and
[##et] should always be used in pairs.
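The begin/end rules above can be expressed as a small state
machine. This Python sketch uses hypothetical names, does not
enforce the 50-pair limit, and assumes that a ‘[##bt]’ appearing
while a block is already open (for example at the very start of an
item, where a ‘[##bt]’ is already implied) is simply ignored.

```python
import re

# Split on the two markers while keeping them as separate pieces.
MARKERS = re.compile(r"(\[##bt\]|\[##et\])")


def text_blocks(item_text: str) -> list[str]:
    """Return the blocks of item text selected by [##bt]/[##et] markers."""
    blocks: list[str] = []
    current: list[str] = []
    inside = True  # every item starts with an implied [##bt]
    for piece in MARKERS.split(item_text):
        if piece == "[##bt]":
            if not inside:
                inside, current = True, []
            # Assumption: a [##bt] while a block is already open is ignored.
        elif piece == "[##et]":
            if inside:
                blocks.append("".join(current))
                inside = False
        elif inside:
            current.append(piece)
    if inside and "".join(current).strip():
        blocks.append("".join(current))  # no closing [##et]: keep the rest of the item
    return [block.strip() for block in blocks]


item = "[##bt]How old are you?[##et] (probe) [##bt]Enter age.[##et] <1>"
# Each block starts on a new line in the 'text=' segment.
print("\n".join(text_blocks(item)))
```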
NOTE ON NAMES FOR ARRAY ELEMENTS
All or some of the elements of arrays can have DDL definitions
produced by Q4TODDL. For example, if there is an array named
‘myarray’ defined as having a dimension of ‘[2 by 2]’, and if
‘myarray’ appears in the VARLIST, DDL will be produced for all
four elements of the array, and the DDL variable names will be:
myarray_1_1
myarray_1_2
myarray_2_1
myarray_2_2
If you want to generate DDL for only some of the array elements,
specify which elements you want in the VARLIST file. For
example, the second element of the array given above can be
specified in a VARLIST either as ‘myarray(1,2)’ or as
‘myarray(<1>,<2>)’. Notice that this latter form is the same as
the syntax required for an itemlist for the CASES ‘output’
program. A list of items used for ‘output’ can be edited and
used also for Q4TODDL.
If you request Q4TODDL to create an inventory file of items
processed (see the option description above), the names of array
elements will appear with different formats in each of the two
lists. In the file ‘Q4TODDL.IN1’ that second element of the
array will be listed as ‘myarray(<1>,<2>)’. This is the form
useful for creating and editing a list to be used either as a
VARLIST on successive runs of Q4TODDL or as an itemlist for the
CASES ‘output’ program.
In the second inventory list, ‘Q4TODDL.IN2’, that same element of
the array will be listed as ‘myarray_1_2’. This is the form that
would be required on a list of variables used as input to the
‘ddltox’ program to produce a setup for SPSS or SAS. This is
also the form that would be used in a variable list for the SDA
‘xcodebk’ program. Note that if a DDL name for the array has
been supplied with a ‘[##dname=]’ command, the transformed name
is used in this second inventory list.
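The two naming forms described in this note can be generated
mechanically. A hedged Python sketch with hypothetical function
names; it does not apply a ‘[##dname= ]’ replacement for the array
name.

```python
import itertools
import re

ELEMENT = re.compile(r"^(\w+)\(([^)]*)\)$")


def expand_array(name: str, dims: list[int]) -> list[str]:
    """DDL names for every element of an array, e.g. [2 by 2] -> name_1_1 ... name_2_2."""
    index_ranges = [range(1, size + 1) for size in dims]
    return [name + "".join(f"_{i}" for i in index)
            for index in itertools.product(*index_ranges)]


def element_forms(ref: str) -> tuple[str, str]:
    """Return the Q4TODDL.IN1-style and Q4TODDL.IN2-style names for one array element.

    Accepts either 'myarray(1,2)' or 'myarray(<1>,<2>)'.
    """
    match = ELEMENT.match(ref.replace(" ", ""))
    if match is None:
        raise ValueError(f"not an array element reference: {ref!r}")
    name, subscripts = match.group(1), match.group(2).split(",")
    indexes = [s.strip("<>") for s in subscripts]
    in1 = name + "(" + ",".join(f"<{i}>" for i in indexes) + ")"
    in2 = name + "".join(f"_{i}" for i in indexes)
    return in1, in2


print(expand_array("myarray", [2, 2]))
print(element_forms("myarray(1,2)"))  # ('myarray(<1>,<2>)', 'myarray_1_2')
```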
NOTE ON VARIABLE TYPES
There are four main types of variables that can be specified in a
CASES instrument: integer, float, character, and date. (These
distinctions are new in version 4; see below for the ‘no data’
type of variable). In general Q4TODDL converts both the integer
and the float types in CASES into the DDL integer type (with a
scale specification if necessary). Q4TODDL converts both the
character and date types in CASES into the DDL character type.
The distinction in DDL between integer and decimal types only
refers to the particular characters that are allowed (by the
older CSA programs) in the ASCII data file for a variable. Both
of those type specifications, however, produce a numeric variable
in an SDA (or CSA) dataset, and that variable can have implied
decimal places. (See the DDL documentation for more on this
topic.)
Since the ASCII data file produced by the CASES ‘output’ program
does not contain explicit decimal points, all numeric fields look
like integers. If there are implied decimal places in a number,
that information is contained in the layout file. For example,
if the layout file specifies that a variable is of type float,
and has a width of 4.2, Q4TODDL will translate that into a DDL
specification for a variable of type integer, with a width of 4,
and with a scale factor of 2. Be aware, however, that an integer
or float with a field width greater than 9 columns is currently
specified as a character variable in the DDL.
The conversion of character and date variables is more direct.
Character and date variables in CASES are always specified as
character type in DDL. Note, however, that you may not really
want a variable defined by CASES as a character or date variable
to be treated as such for purposes of analysis. CASES version 4
will consider an item to be of type character if it has any non-
numeric precodes that are not designated as ‘missing’, even if
those non-numeric precodes were never actually used. If such
items are really intended to be interpreted as numeric variables
for purposes of analysis, the type of that item can be changed to
integer in the layout file before running Q4TODDL. However, if
non-numeric characters are present in the data field itself, CSA
will not process such an "integer" variable unless the DDL
specification ‘other=’ has been added. SDA will set cases with
non-numeric characters in a numeric field to the system missing-
data value, unless the non-numeric codes have been specified as
missing-data.
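The width, scale, and type conversion for version 4 layouts, as
described in the preceding paragraphs, can be sketched as follows.
This is a hypothetical Python helper, not Q4TODDL code; the layout
type names ‘integer’, ‘float’, and ‘char’ come from the layout
format described earlier, while the spelling ‘date’ for date items
is an assumption.

```python
def ddl_type_spec(layout_type: str, width_spec: str) -> dict[str, object]:
    """Sketch: DDL type, width, and scale for one CASES version 4 layout entry."""
    if "." in width_spec:
        width_text, scale_text = width_spec.split(".", 1)
        width, scale = int(width_text), int(scale_text)
    else:
        width, scale = int(width_spec), 0
    # Character and date items, and numeric fields wider than 9 columns, become DDL character.
    if layout_type in ("char", "date") or width > 9:
        return {"type": "character", "width": width}
    spec: dict[str, object] = {"type": "integer", "width": width}
    if scale:
        spec["scale"] = scale  # implied decimal places
    return spec


print(ddl_type_spec("float", "4.2"))  # {'type': 'integer', 'width': 4, 'scale': 2}
```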
CASES version 3 does not provide a type specification for each
item in the layout file. For those instruments Q4TODDL attempts
to interpret each item as an integer; however, if the item has
one or more non-numeric precodes or it has a width greater than 9
columns, it is treated as a character variable.
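For version 3 layouts the type has to be inferred from the item
itself. A minimal Python sketch of the rule above, with a
hypothetical function name; it assumes precodes arrive as strings
and treats a precode as numeric when it is an optionally signed run
of digits.

```python
def infer_v3_type(width: int, precodes: list[str]) -> str:
    """Return 'character' or 'integer' for a CASES version 3 item."""
    def is_numeric(code: str) -> bool:
        digits = code[1:] if code.startswith("-") else code
        return digits.isdigit()

    if width > 9 or any(not is_numeric(code) for code in precodes):
        return "character"
    return "integer"


print(infer_v3_type(1, ["1", "2", "D", "R"]))  # character
```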
Q4TODDL does not currently attempt to create category labels for
character variables; for numeric variables it generates labels
for the numeric precodes, but it ignores any non-numeric
precodes.
The ‘no data’ type of variable in Version 4 of CASES is used for
informational screens. Such variables do not have data or data
locations and consequently cannot be stored in an SDA dataset.
However, they are sometimes of interest when generating codebooks
that document a CASES instrument. Q4TODDL will treat such items
as input items and will assign record and column specifications
of 0 (zero) in the DDL file. If the XCODEBK program is run
directly from a DDL file (instead of from an SDA dataset), ‘no
data’ items can appear in the codebook.
LIMITS
The maximum number of variables that can be processed at one time
(listed in the LAYOUT file) depends on available memory, but the
program should ordinarily handle several thousand variables. The
Q-language files being processed, however, can have an unlimited
number of items.
The maximum number of precodes per item is 110.
SEE ALSO
DDL - Data Description Language
q4toddlb - Summary of batch keywords for Q4TODDL command files
q4toddlc - Summary of command-line options for Q4TODDL
q4toddle - Examples of Q4TODDL input and output
CSM, UC Berkeley
April 12, 2011