SDA 3.5 Documentation for Q4TODDL


NAME

Q4TODDL - convert Version 3 and 4 CASES Q-language files to DDL

USAGE

q4toddl [-option] [Q_files]

DESCRIPTION

Q4TODDL generates descriptions of variables in the Data Description Language (DDL). The DDL file, together with a data file, can then produce an SDA dataset and codebook. A DDL file can also be used to generate a data description setup file for SPSS, SAS, or Stata (by using the DDLTOX program). See the Q4TODDL examples document for several examples of Q-language input and the DDL output produced by Q4TODDL.

This version of the Q4TODDL program has been updated for SDA. It is almost identical to the older CSA version, except that the interactive interface (based on old character-based technology) is no longer supported. Q4TODDL still generates the old (CSA-compatible) version of DDL. However, SDA programs are able to read such DDL files.

CONTENTS OF THIS DOCUMENT

 
OVERVIEW
 
MODES OF OPERATION
* Batch mode
* Command-line mode
 
FILES TO PREPARE
* Q-language files
* Layout file
* Dataset definitions
* List of variables
 
OPTIONS
 
FILE OPTIONS
* Alternate names for files
* Create list of items processed
 
DATA LOCATION CONVERSION
* Map record numbers to other values
* Convert locations to a single long record
 
DIAGNOSTIC MESSAGE HANDLING
* File for diagnostic messages
* Report missing variables
 
MISSING DATA SPECIFICATIONS
* Automatic MD1 specification
* Automatic MD2 specification
* Force Automatic MD1 and MD2 Generation
* Automatic MIN specification
 
TEXT CONTROL OPTIONS
* Include all item/form lines in text
* Suppress all text sections
 
MISCELLANEOUS OPTIONS
* Attach a prefix to variable names
* Suppress category labels
* Create DDL for non-input items
 
NAME OF THE OUTPUT FILE
 
DDL KEYWORDS AS Q-LANGUAGE EXTENSIONS
* DDL Keywords
* Suppression of text or category labels
* Control of text to be output
 
NOTE ON NAMES FOR ARRAY ELEMENTS
 
NOTE ON VARIABLE TYPES
 
LIMITS

OVERVIEW

Q4TODDL obtains the information about each variable from a CASES Q-language instrument designed for computer-assisted interviewing, data entry, or coding. It is important to understand that some of the key elements used in DDL to describe a variable (such as a long variable label and the designation of certain codes as invalid or missing data) are not necessarily included in a Q-language instrument. Q4TODDL will use whatever information it finds in the Q-language file, but some specifications may have to be added.

These additional specifications can either be added to the DDL file produced by Q4TODDL, or they can be included in the Q-language file itself in a way that will not affect the execution of the CASES programs. The advantage of including these additional specifications in the Q-language file itself is that it consolidates in one place all of the relevant specifications for a variable. Instructions for doing this are given below under the heading DDL Keywords as Q-language Extensions.


MODES OF OPERATION

Q4TODDL runs in two different modes, depending on how the program is invoked.

1. Batch mode: q4toddl -b filename [Q_files]


The name of a file containing the desired options must be given after the ‘-b’ option flag. See the Q4TODDL batch summary for a summary of the relevant keywords.

2. Command-line option mode: q4toddl [-option] Q_files

See the Q4TODDL command summary for a summary of the relevant command-line options.

In the following descriptions of Q4TODDL options, the corresponding batch keywords and command-line options are given.

Every time Q4TODDL runs in either of the two modes, a file named Q4TODDL.OPT is generated. This file contains the batch-mode keywords for the options requested and the Q-language files used as input on the last run. This file can be edited or used as is to repeat a run in batch mode, using the ‘-b’ option. Since this file is overwritten each time Q4TODDL runs, rename or copy it if you want to preserve it.


FILES TO PREPARE

Q4TODDL obtains its information about each variable and about the dataset as a whole from the following files:

1. Q-language file(s): ’QFile = filenames’

The descriptive information about each variable comes from the Q-language files. All Q-language files to be processed must have names ending either with ‘.q’ or ‘.m’. (Files ending in ‘.m’ contain CASES code expanded from macros.) There must be at least one such file.

2. Layout file: ’Layout = filename’ / ’-l filename’

This file contains the location of each variable in the data file; it is usually generated either by running the ‘layout’ program in CASES (without the verbose options -v or -V), or by running the CASES ‘output’ program (using the ‘-i=’ option). The default name expected for this file is ‘LAYOUT’, but another name can be specified as an option. Note that the current version of Q4TODDL requires that there be a layout file.

If necessary, a layout file can be created manually, using the following format: Each line consists of a variable name (the item name in the Q-language instrument); variable type (integer, float, or char); record number; starting column location; width of the field (the number of columns; if there are implied decimal places, they are indicated by appending a period (.) and the number of implied decimals); and finally, an indicator of whether or not the variable is an input item (for a non-input item, put a zero in the last field or just leave it blank; for an input item, put ‘Y’ or ‘1’ or any other character except a zero). Each of these fields must be separated by one or more blanks. For an example of a layout file, see the q4toddl examples help file.
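The manual layout format described above can be illustrated with a small parser. This is only a sketch of the field order given in the preceding paragraph, not the code Q4TODDL itself uses; the names `LayoutEntry` and `parse_layout_line` and the sample lines are hypothetical.

```python
from typing import NamedTuple


class LayoutEntry(NamedTuple):
    name: str        # item name in the Q-language instrument
    vartype: str     # integer, float, or char
    record: int      # record number
    column: int      # starting column
    width: int       # number of columns
    decimals: int    # implied decimal places (0 if none)
    is_input: bool   # False if the last field is zero or blank


def parse_layout_line(line: str) -> LayoutEntry:
    """Split one blank-separated layout line into its fields."""
    fields = line.split()
    name, vartype, record, column, width_field = fields[:5]
    # A width such as "4.2" means 4 columns with 2 implied decimals.
    width, _, decimals = width_field.partition(".")
    # A missing or zero last field marks a non-input item.
    flag = fields[5] if len(fields) > 5 else ""
    return LayoutEntry(name, vartype, int(record), int(column),
                       int(width), int(decimals or 0),
                       flag not in ("", "0"))


# Hypothetical layout lines, for illustration only.
print(parse_layout_line("income float 2 10 4.2 Y"))
print(parse_layout_line("calc1 integer 1 5 3 0"))
```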

3. File with overall dataset definitions: ’Studydefs = filename’ / ’-s filename’

Information about the dataset as a whole is not contained in either the Q-language file or the layout file; therefore you must prepare a separate file containing the appropriate information. This file should include at least two sets of definitions -- the overall dataset specifications (study title, directory location, and default values for some of the specifications for individual variables) and the specifications for the ‘CASEID’ variable. (See the DDL document for explanations and examples of these specifications.) The default name for this file is ‘SDEFS’, but another name can be specified as an option.

Q4TODDL simply copies the contents of this overall dataset definition file to the beginning of its DDL output file. The dataset definition file, consequently, could also contain DDL specifications created on a previous run or by hand.

If this file is not found, Q4TODDL will warn you and will generate the minimum necessary dataset definition and CASEID keywords, but they will have blanks after the equal sign, and you will have to edit the resulting DDL file manually. If you really want to generate DDL without overall dataset definitions -- in order to append the output to a DDL file that already has that information, for example -- simply create a file named ‘SDEFS’ with nothing in it.

4. List of variables: ’Varlist = filename’ / ’-v filename’

If you want the program to generate DDL for only a subset of the variables listed in the layout file, prepare a file containing the names of the variables you want. Variable names may be listed one per line or several per line, separated by spaces, tabs, or commas. The default name for this file is ‘VARLIST’, but another name can be specified as an option. If no list of variables is provided, the program will ordinarily generate DDL for every input item listed in the layout file (including ‘no data’ input items, if any). See below for the option to include non-input items when a varlist is not provided.

Note that DDL is generated for variables in the order in which those variables are found in the Q-language file(s). A varlist does not affect that order.
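As an illustration of the VARLIST format, the following sketch splits the contents of such a file into names, treating spaces, tabs, commas, and line breaks as separators. The function name `read_varlist_names` is hypothetical, and keeping a parenthesized array reference such as ‘myarray(1,2)’ together as one name is an assumption of this sketch (see the note on array elements later in this document), not documented Q4TODDL behavior.

```python
import re

# A name is a run of characters other than separators and parentheses,
# optionally followed by a parenthesized index list (assumed syntax for
# array elements), which may itself contain commas.
_NAME = re.compile(r"[^\s,()]+(?:\([^)]*\))?")


def read_varlist_names(text: str) -> list[str]:
    """Return the variable names found in the text of a VARLIST file."""
    return _NAME.findall(text)


# Hypothetical VARLIST contents, for illustration only.
sample = "q1, q2\tq3\nmyarray(<1>,<2>) q4"
print(read_varlist_names(sample))
```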


OPTIONS

Q4TODDL has many options to help you extract as much useful information as possible from the Q-language instrument. Since it is quick and easy to run the program, you can readily try out the different options and examine the output. Some examples can be inspected in the examples document.


File Options

1. Alternate Names for Input Files

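If the files you prepare ahead of time are placed in your current directory and are given the default names, Q4TODDL will use their contents without your having to do anything further. If you give them different names, you must specify those names as options. This applies to three files: the layout file (default name ‘LAYOUT’), the dataset definitions file (default name ‘SDEFS’), and the list of variables (default name ‘VARLIST’). The options for naming each of these files are given above under FILES TO PREPARE.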

There is no default name for the Q-language files. They must always be specified by name (or by using the appropriate wildcard characters ‘*’ and ‘?’).

2. Create an Inventory File: ’Inventory’ / ’-i’

If you select this option, Q4TODDL will save a list of all the items processed (inventory) in the file ‘Q4TODDL.IN1’. This list is written with one variable name per line, in the order that they are found in the Q-language file(s). In addition if any item names have been transformed, either by using a prefix (see "Miscellaneous Options" below), or by using the special command ‘[##dname= ]’ (see the "Q-Language Extensions" below), or by expanding array names into references to the individual array elements, a second list of names is saved in the file ‘Q4TODDL.IN2’. This second inventory list gives the final DDL names of all the items processed.

A useful application of this inventory option is to run Q4TODDL without a VARLIST file. You will then generate a complete list of the (input) items in the Q-language file(s) which have an entry in the layout file. That list (saved in the file ‘Q4TODDL.IN1’) can be edited and used as the VARLIST for subsequent Q4TODDL runs.

Another application of this option is to use this list of variables as input to the SDA XCODEBK program. Without a list of variables, XCODEBK will output the variable descriptions in alphabetic order by name, which is unlikely to be the same order desired for a codebook (usually you want it to be in the same order as the questions were asked during the interview). If item names have been transformed by Q4TODDL, the list to use for this application is the one saved in ‘Q4TODDL.IN2’.


Data Location Conversion

1. Map layout record numbers to other values: ’Map = a:b,c:d’ / ’-m a:b,c:d’

The record number for each variable is obtained from the layout file specified by the user. That record number is accurate if the data file has been generated by the CASES ‘output’ program, using the ‘-i=itemlist’ option. In such cases, the ‘output’ program generates a layout file that matches the new data file. This is the recommended procedure. If you create your data file in that manner, you do NOT need to use this mapping option.

If, however, the datafile to be described with DDL is a subset of the original set of records for each case, and if the overall LAYOUT file (produced by the CASES ‘layout’ command) is used to determine the locations of each variable, the record numbers in the DDL must be adjusted to reflect that situation.

For example, if you create a data file containing only records 12 and 14, Q4TODDL needs to be informed that record 12 is now record 1, and record 14 is now record 2, for purposes of generating DDL to match that specific data file. In this situation you should also provide Q4TODDL with a list of variables, so that the program will limit its output to variables on those two records; otherwise you could end up with conflicting DDL definitions -- variable definitions from the original record 1 could conflict with definitions from the original record 12 (which is mapped to record 1 also).

If you want Q4TODDL to generate DDL for variables located on the zero record (variables identified in the LAYOUT file as being on record 0), you MUST use this mapping option. A location on record 0 will not make sense to any statistical system. For example, say you plan to create a data file that will contain for each case four regular data records followed by that case’s zero record (from the ZEROS file used by CASES). If you map record 0 to record 5, then the DDL locations will be correct.
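The mapping specification can be pictured as a simple lookup table. The sketch below assumes that each ‘a:b’ pair means "layout record a becomes record b", which matches the example of records 12 and 14 above; the function name `parse_record_map` is hypothetical, and this is not Q4TODDL's own code.

```python
def parse_record_map(spec: str) -> dict[int, int]:
    """Turn a specification such as '12:1,14:2' into {12: 1, 14: 2}.

    Assumption: the number before the colon is the record number in the
    layout file, and the number after it is the record number in the
    data file to be described.
    """
    mapping = {}
    for pair in spec.split(","):
        old, new = pair.split(":")
        mapping[int(old)] = int(new)
    return mapping


# Records 12 and 14 become records 1 and 2 of the new data file.
print(parse_record_map("12:1,14:2"))
# The zero record is placed after four regular data records.
print(parse_record_map("0:5"))
```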

2. Convert locations to a single long record: ’Appending’ / ’-a’

Data files generated by the CASES ‘output’ program (unless the ‘-i=itemlist’ option is used) usually have several records per case, and each record is 80 columns wide. If the data file to be described with DDL is going to be converted into a form in which each case has one long record (to facilitate sorting, selecting out certain cases, or for some other reason), the record/column locations have to be adjusted. If this option is selected, Q4TODDL will automatically make the adjustment -- for instance, a variable location of record 2, column 10, will appear in the output DDL file as record 1, column 90. If record mapping has also been requested, the mapping is carried out first.
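The arithmetic behind this adjustment can be sketched as follows, assuming 80-column records as described above. The function name `to_single_record` and its parameters are hypothetical; leaving unmapped record numbers unchanged is an assumption of the sketch.

```python
def to_single_record(record, column, record_map=None, record_width=80):
    """Convert a (record, column) location to a location on one long record.

    Any record mapping is applied first, as described above; the column
    is then offset by the combined width of all preceding records.
    """
    if record_map:
        # Assumption: records not listed in the map keep their number.
        record = record_map.get(record, record)
    return 1, (record - 1) * record_width + column


# Record 2, column 10 becomes record 1, column 90.
print(to_single_record(2, 10))
```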

Diagnostic Message Handling

1. Save diagnostic messages in a particular file: ’Errorfile=filename’ / ’-e filename’

Select this option if you want warnings and error messages saved in a file other than ‘Q4TODDL.MSG’. In addition to diagnostic messages this file contains a cross-reference list of any item names that have been transformed (except by the use of a standard prefix), and a list of final DDL names that are longer than 8 characters (and will consequently cause problems for SPSS or SAS).

2. Report missing variables: ’Report’ / ’-r’

Select this option if you want a list of the variables that were NOT found in the Q-language files you specified. For a variable to be considered missing, it must have been listed in the VARLIST file, if supplied, or otherwise in the LAYOUT file. This option is useful for detecting misspelled variable names in the VARLIST, and it can also alert you that certain Q-language files have not been specified for processing. This report of missing variables is output to the same place as other diagnostic messages (see the preceding option).

Missing Data Specifications

Q-language instruments often follow certain conventions in assigning code values. For example, the code 8 (for a one-column variable) might usually be reserved for a "don’t know" response, and the code 9 might usually mean "refused to answer." Q4TODDL allows you to take advantage of such conventions and thus facilitate the task of specifying missing-data codes.

If these options are selected, they are overridden if a variable in the Q-language file has its own missing-data specification (such as the ‘[##md1= ]’ Q-language extension discussed below).

1. Automatic md1 Generation: ’9fill’ / ’-9’

If this option is selected, the ‘md1=’ keyword is generated for each variable and set to all nines -- 9, 99, 999, and so forth, depending on the width of the variable. If a particular variable does not have a code value of all nines in the Q-language instrument, no ‘md1=’ specification is generated, UNLESS the ’0fill’ option is ALSO selected.

2. Automatic md2 Generation: ’8fill’ / ’-8’

If this option is selected, the ‘md2=’ keyword is generated for each variable and set to a value equal to the width filled with nines, minus one -- 8, 98, 998, and so forth, depending on the width of the variable. If a particular variable does not have such a code value in the Q-language instrument, no ‘md2=’ specification is generated, UNLESS the ’0fill’ option is ALSO selected.

3. Force Automatic md1 and md2 Generation: ’0fill’ / ’-0’

This option is only meaningful in conjunction with the ‘8fill’ or ‘9fill’ options discussed above. Normally, those options will NOT produce an md1 or md2 specification unless the 8 or 9 code is also specified as a precode in the instrument. The ‘0fill’ option causes this checking of the precodes to be skipped, and the missing-data specifications are automatically generated whether or not there is a matching precode in the Q-file for that particular variable.

Typically this option would be required if the Q-language instrument allows the input of characters like ’D’ or ’R’ for "don’t know" or "refusal" responses, and if those characters are then changed to numeric codes of ’8’ and ’9’.
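The interaction of the ‘9fill’, ‘8fill’, and ‘0fill’ options can be summarized in a short sketch. It only restates the rules given above; the function name `auto_missing_codes` and its arguments are hypothetical, and the override by an item's own missing-data specification (such as ‘[##md1= ]’) is not modeled.

```python
def auto_missing_codes(width, precodes, nine_fill=False,
                       eight_fill=False, zero_fill=False):
    """Return the automatically generated md1/md2 values for one variable.

    width    -- field width of the variable, in columns
    precodes -- set of numeric code values defined in the instrument
    """
    all_nines = 10 ** width - 1          # 9, 99, 999, ...
    result = {}
    # Without zero_fill, a code is generated only if it is also a precode.
    if nine_fill and (zero_fill or all_nines in precodes):
        result["md1"] = all_nines
    if eight_fill and (zero_fill or (all_nines - 1) in precodes):
        result["md2"] = all_nines - 1    # 8, 98, 998, ...
    return result


# A two-column variable with precodes 1, 2, and 99 but no 98.
print(auto_missing_codes(2, {1, 2, 99}, nine_fill=True, eight_fill=True))
print(auto_missing_codes(2, {1, 2, 99}, nine_fill=True, eight_fill=True,
                         zero_fill=True))
```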

4. Automatic min Generation: ’Floor’ / ’-f’

Select this option to generate the ‘min=’ keyword for each variable, with the minimum valid value set to the lowest code value given in the Q-language instrument for that variable. Typically you would select this option so that blank fields in the data file (produced by skip patterns) could be converted to zero or some negative number, but still be considered outside the range of valid codes.

Text Control Options

Q4TODDL routinely generates a ‘text=’ segment for each set of DDL definitions for a variable.

For input items, this text segment consists of the template area for that item (or for the entire form, if the item is the first one on a multi-item form). Q-language commands other than ‘[fill]’, ‘[goto]’, and ‘[etc]’ are removed from the text. For many or even most input items, this text segment will show the wording for the question being asked and is more or less what we usually need in order to document that item.

For non-input items, the default text segment is all of the plain text in the item; all commands are removed.

This default generation of the ‘text=’ segment can be further restricted for a particular variable by using the ‘[##bt]’ and ‘[##et]’ Q-language extensions discussed below.

In addition, the following two options affect the generation of the ‘text=’ segment for ALL variables.

1. Include all item/form lines in text: ’QUesfull’ / ’-q’

With this option, everything between the form name and the end of the form is included in the ‘text=’ segment. This includes text following the first precode, the entire pre-template and post-template segments, and all Q-language commands. (For multi-item forms, this option only affects the first item on the form.)

2. Suppress all text sections: ’Textelim’ / ’-t’

Select this option to suppress the generation of ALL ‘text=’ segments. This option overrides the preceding one, if both are selected. If you only want to suppress the text for some of the variables, you can use the ‘[##nt]’ Q-language extension, as described below.

Miscellaneous Options

1. Attach a prefix to variable names: ’Prefix=abc’ / ’-p abc’

Q4TODDL generates a ‘name=’ specification for each variable, and by default the name is the item name as given in the Q-language instrument. If you select this option, and provide a prefix containing from one to five characters, that prefix will be used in every ‘name=’ specification. For example, if the prefix is ‘var’, Q-language item names ‘20’ and ‘21’ will be converted to ‘var20’ and ‘var21’. Note, however, that the use of the ‘[##dname= ]’ command for a particular item overrides this option for that item.
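The naming rule just described can be sketched in a few lines. The function name `ddl_name` is hypothetical, and raising an error for a prefix outside the one-to-five character range is an assumption of the sketch; this document does not say how Q4TODDL reacts to such a prefix.

```python
def ddl_name(item_name, prefix="", dname=None):
    """Return the value used in the 'name=' specification for one item."""
    if dname:
        # A [##dname= ] command overrides the prefix for that item.
        return dname
    if prefix and not 1 <= len(prefix) <= 5:
        # Assumed behavior; the actual handling is not documented here.
        raise ValueError("prefix must contain from one to five characters")
    return prefix + item_name


print(ddl_name("20", prefix="var"))               # var20
print(ddl_name("21", prefix="var"))               # var21
print(ddl_name("22", prefix="var", dname="age"))  # age
```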

2. Suppress category labels: ’Catselim’ / ’-c’

Q4TODDL generates by default a ‘labels=’ segment for each variable. This segment contains each category code and, as its label, up to 80 characters of the text that appears after the code on the same line of the Q-language file. Select this option to suppress the generation of those category labels for all variables. You can suppress category labels for individual variables by using the ‘[##nc]’ Q-language extension.

3. Create DDL for non-input items: ’Noninput’ / ’-n’

If a VARLIST file is provided for Q4TODDL, the program will generate DDL for every variable listed in that file, regardless of whether or not a variable is an input item. If, however, no VARLIST is provided, Q4TODDL will by default process all of the input items that have an entry in the layout file (including ‘no data’ input items) but ignore the non-input items. Select this option if you want the program to generate DDL for both input and non-input items in the layout file, if no VARLIST is specified.

Name of the Output File: ’Output=filename’ / ’-o filename’

Unless an output file is named, the DDL produced by Q4TODDL will be sent to your screen (standard output). You can redirect that output, but it is usually advisable to specify the desired name for the output file. If that file already exists, it will be overwritten. However, a file with a name ending in ’.q’ will not be overwritten. (This is to avoid accidentally overwriting the Q-language files used as input to Q4TODDL.)

DDL KEYWORDS AS Q-LANGUAGE EXTENSIONS

Q-language instruments do not usually include all of the elements used in DDL to describe a variable. As a result, a DDL file produced by Q4TODDL may have to be edited, in order to be adequate for input to the SDA ‘makesda’ program or to some other statistical system. Alternatively, the additional specifications can be added to the Q-language file itself in a way that will not affect the execution of the CASES programs (by using the comment mechanism). Some examples (and their effects) can be inspected in the examples document.

The following Q-language extensions are recognized by Q4TODDL. In CASES version 4 instruments, the commands for a variable must be placed either in the template area corresponding to that item (the part of the template BEFORE the field marker ‘@’) or in the part of the post-template area corresponding to that item (the part of the post-template AFTER the field marker ‘[@’ for that item). In CASES version 3 instruments, the commands can be placed anywhere in the item before the arrow ‘===>’.

1. DDL Keywords

     

     [##label= ]    long variable name -- up to 80 characters
                    (default is 1st line of text segment)
     [##dname= ]    DDL name for this item
                    (default is item name in Q-file)
     [##type= ]     variable type: decimal, integer, or character
     [##scale= ]    number of implied decimal points
     [##min= ]      minimum valid value
     [##max= ]      maximum valid value
     [##md1= ]      first missing-data code
     [##md2= ]      second missing-data code
     [##blank= ]    number into which an all-blank field should be
                    converted
     [##other= ]    number into which a field with non-numeric
                    characters should be converted
     [##[ ] ]       short label for a code value -- up to 16 chars
                    (this bracketed label can be located either in the
                    template or in the post-template, after the relevant
                    code value in ‘<>’ and before the next code value;
                    this label is in addition to any plain text for a
                    category found in the item template.)

All of the above keywords except the last can be abbreviated to 3 characters, and the equal sign is optional. For instance, [##label=myvariable] can be abbreviated as [##lab myvariable].

2. Suppression of text or category labels

The following two commands will suppress the corresponding DDL segment for one variable:

     [##nt]     Suppress the ‘text=’ segment
     [##nc]     Suppress category labels
     

3. Control of text to be output

Q4TODDL by default outputs a ‘text=’ segment that includes everything between the beginning of the template area and the first precode in ‘<>’ (or, optionally, everything in the whole item or form). The following two commands can be used to restrict further the text that is output to the DDL:

     [##bt]     Begin text for output to DDL
     [##et]     End text for output to DDL
     

Up to 50 pairs of these commands can be used within a single item, in order to specify the blocks of Q-language text that should be output as a ‘text=’ DDL segment. Each block specified in such a manner is output beginning on a new line in the DDL.

Every item has an implied [##bt] marker at the beginning of the item. Thus, if you only want the first four lines to be output to the DDL, a closing [##et] command at the end of the fourth line would be the only command required. Otherwise, [##bt] and [##et] should always be used in pairs.
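The effect of the [##bt] and [##et] markers can be sketched as a scan over the item text. This is a simplified illustration, not Q4TODDL's parser: the function name `text_blocks` is hypothetical, and the sketch assumes that an explicit [##bt] discards any text collected since the implied marker at the beginning of the item.

```python
import re


def text_blocks(item_text):
    """Return the blocks of text selected by [##bt] ... [##et] pairs."""
    markers = list(re.finditer(r"\[##(bt|et)\]", item_text))
    if not markers:
        return [item_text.strip()]   # no restriction requested
    blocks = []
    start = 0                        # implied [##bt] at the start of the item
    for m in markers:
        if m.group(1) == "bt":
            start = m.end()          # assumption: restart at an explicit [##bt]
        elif start is not None:
            blocks.append(item_text[start:m.start()].strip())
            start = None             # wait for the next [##bt]
    return blocks


# Only a closing [##et] is needed to keep the first lines of an item.
print(text_blocks("line one\nline two[##et]\nremaining lines"))
print(text_blocks("intro [##bt]Question text[##et] codes [##bt]more[##et]"))
```

Each block returned corresponds to one block that would begin on a new line in the DDL ‘text=’ segment.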


NOTE ON NAMES FOR ARRAY ELEMENTS

All or some of the elements of arrays can have DDL definitions produced by Q4TODDL. For example, if there is an array named ‘myarray’ defined as having a dimension of ‘[2 by 2]’, and if ‘myarray’ appears in the VARLIST, DDL will be produced for all four elements of the array, and the DDL variable names will be:

     myarray_1_1
     myarray_1_2
     myarray_2_1
     myarray_2_2
     

If you want to generate DDL for only some of the array elements, specify which elements you want in the VARLIST file. For example, the second element of the array given above can be specified in a VARLIST either as ‘myarray(1,2)’ or as ‘myarray(<1>,<2>)’. Notice that this latter form is the same as the syntax required for an itemlist for the CASES ‘output’ program. A list of items used for ‘output’ can be edited and used also for Q4TODDL.

If you request Q4TODDL to create an inventory file of items processed (see the option description above), the names of array elements will appear with different formats in each of the two lists. In the file ‘Q4TODDL.IN1’ that second element of the array will be listed as ‘myarray(<1>,<2>)’. This is the form useful for creating and editing a list to be used either as a VARLIST on successive runs of Q4TODDL or as an itemlist for the CASES ‘output’ program.

In the second inventory list, ‘Q4TODDL.IN2’, that same element of the array will be listed as ‘myarray_1_2’. This is the form that would be required on a list of variables used as input to the ‘ddltox’ program to produce a setup for SPSS or SAS. This is also the form that would be used in a variable list for the SDA ‘xcodebk’ program. Note that if a DDL name for the array has been supplied with a ‘[##dname=]’ command, the transformed name is used in this second inventory list.
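The two name formats for array elements can be related with a short sketch. The function names `expand_array` and `element_ddl_name` are hypothetical; the code only reproduces the ‘myarray’ example given above and does not model the effect of a ‘[##dname=]’ command.

```python
import re
from itertools import product


def expand_array(name, dims):
    """List the DDL names of all elements of an array, e.g. dims=(2, 2)."""
    index_ranges = (range(1, d + 1) for d in dims)
    return [name + "".join(f"_{i}" for i in idx)
            for idx in product(*index_ranges)]


def element_ddl_name(reference):
    """Convert 'myarray(1,2)' or 'myarray(<1>,<2>)' to 'myarray_1_2'."""
    m = re.fullmatch(r"(\w+)\((.*)\)", reference.strip())
    if not m:
        return reference.strip()     # not an array reference
    indices = re.findall(r"\d+", m.group(2))
    return "_".join([m.group(1), *indices])


print(expand_array("myarray", (2, 2)))
print(element_ddl_name("myarray(<1>,<2>)"))
```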


NOTE ON VARIABLE TYPES

There are four main types of variables that can be specified in a CASES instrument: integer, float, character, and date. (These distinctions are new in version 4; see below for the ‘no data’ type of variable). In general Q4TODDL converts both the integer and the float types in CASES into the DDL integer type (with a scale specification if necessary). Q4TODDL converts both the character and date types in CASES into the DDL character type.

The distinction in DDL between integer and decimal types only refers to the particular characters that are allowed (by the older CSA programs) in the ascii data file for a variable. Both of those type specifications, however, produce a numeric variable in an SDA (or CSA) dataset, and that variable can have implied decimal places. (See the DDL documentation for more on this topic.)

Since the ascii data file produced by the CASES ‘output’ program does not contain explicit decimal points, all numeric fields look like integers. If there are implied decimal places in a number, that information is contained in the layout file. For example, if the layout file specifies that a variable is of type float, and has a width of 4.2, Q4TODDL will translate that into a DDL specification for a variable of type integer, with a width of 4, and with a scale factor of 2. Be aware, however, that an integer or float with a field width greater than 9 columns is currently specified as a character variable in the DDL.
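The type conversion rules in this section can be condensed into a small sketch. The function name `ddl_type_spec` and the dictionary it returns are hypothetical; the type names follow the layout file format described earlier, with ‘date’ added as mentioned at the start of this section.

```python
def ddl_type_spec(cases_type, width, decimals=0):
    """Return the DDL type, width, and scale for one layout entry."""
    if cases_type in ("char", "date"):
        return {"type": "character", "width": width}
    if width > 9:
        # Numeric fields wider than 9 columns are specified as character.
        return {"type": "character", "width": width}
    spec = {"type": "integer", "width": width}
    if decimals:
        spec["scale"] = decimals     # implied decimal places
    return spec


# A float of width 4.2 becomes an integer of width 4 with scale 2.
print(ddl_type_spec("float", 4, 2))
print(ddl_type_spec("integer", 10))
```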

The conversion of character and date variables is more direct. Character and date variables in CASES are always specified as character type in DDL. Note, however, that you may not really want a variable defined by CASES as a character or date variable to be treated as such for purposes of analysis. CASES version 4 will consider an item to be of type character if it has any non-numeric precodes that are not designated as ‘missing’, even if those non-numeric precodes were never actually used. If such items are really intended to be interpreted as numeric variables for purposes of analysis, the type of that item can be changed to integer in the layout file before running Q4TODDL. However, if non-numeric characters are present in the data field itself, CSA will not process such an "integer" variable unless the DDL specification ’other=’ has been added. SDA will set cases with non-numeric characters in a numeric field to the system missing-data value, unless the non-numeric codes have been specified as missing-data.

CASES version 3 does not provide a type specification for each item in the layout file. For those instruments Q4TODDL attempts to interpret each item as an integer; however, if the item has one or more non-numeric precodes or it has a width greater than 9 columns, it is treated as a character variable.

Q4TODDL does not currently attempt to create category labels for character variables; for numeric variables it generates labels for the numeric precodes, but it ignores any non-numeric precodes.

The ‘no data’ type of variable in Version 4 of CASES is used for informational screens. Such variables do not have data or data locations and consequently cannot be stored in an SDA dataset. However, they are sometimes of interest when generating codebooks that document a CASES instrument. Q4TODDL will treat such items as input items and will assign record and column specifications of 0 (zero) in the DDL file. If the XCODEBK program is run directly from a DDL file (instead of from an SDA dataset), ‘no data’ items can appear in the codebook.


LIMITS

The maximum number of variables that can be processed at one time (listed in the LAYOUT file) depends on available memory, but the program should ordinarily handle several thousand variables. The Q-language files being processed, however, can have an unlimited number of items.

The maximum number of precodes per item is 110.


SEE ALSO

DDL Data Description Language
q4toddlb Summary of batch keywords for Q4TODDL command files
q4toddlc Summary of command-line options for Q4TODDL
q4toddle Examples of Q4TODDL input and output


CSM, UC Berkeley
April 12, 2011