SDA 3.5 Documentation for QEXTRACT
NAME
qextract - Extract item definitions from CASES 4.3 instruments
USAGE
qextract -b command_file_name
DESCRIPTION
QEXTRACT extracts item definitions and logical flow information
from CASES version 4.3 Q-language instrument and layout files; it
then writes the parsed information to an IDL file, written in the
Instrument Documentation Language.
Commands for QEXTRACT are placed in a batch command file, the
name of which is supplied to the program after the ‘-b’ flag.
Most commands are optional, but some commands must be used if the
CASES instrument was translated in certain ways.
Examples of command files are given below. Simplified
instructions are also provided in the summary document on
Instrument Documentation Procedures.
OVERVIEW
QEXTRACT reads the CASES Q-language files and the layout file
produced by the CASES ‘layout’ program, and it produces an IDL
file. The IDL file can then be processed by the XCODEBK program
to create an Instrument Document (IDOC).
A file of commands must be prepared for the QEXTRACT program; the
name of this file is given on the command line after the ‘-b’
flag. This document explains how to prepare such a file.
Every time QEXTRACT is run, a message is appended to the file
named ‘QEXTRACT.MSG’. If warnings or error messages are
generated by the program, they are put in that file, and a
message to that effect appears on the user’s screen.
QEXTRACT obtains the information about each variable or item from
a CASES Q-language instrument designed for computer-assisted
interviewing, data entry, or coding. It is important to
understand that some of the key elements used in IDL to describe
an item (such as a descriptive label and the designation of
certain codes as invalid or missing data) are not necessarily
included in a Q-language instrument. QEXTRACT will use whatever
information it finds in the Q-language files, but some
specifications may have to be added.
These additional specifications for items can either be added to
the IDL file produced by QEXTRACT, or they can be included in the
Q-language instrument itself in a way that will not affect the
execution of the CASES programs. The advantage of including
these additional specifications in the Q-language instrument
itself is that it consolidates in one place all of the relevant
specifications for an item; instructions for doing this are given
below under the heading Optional Element Tags in Q-language
Instruments.
DESCRIBING THE CASES INSTRUMENT FILES
The descriptive information about each item comes from the CASES
Q-language instrument files. Those files must be available to
QEXTRACT. Furthermore, QEXTRACT must know if certain options
were used in translating the instrument with the CASES ‘qt’
program.
Macros Used in Instruments (macros= yes)
Instrument files to be processed usually have names ending
with ‘.q’. However, if the instrument was translated with the
‘-m’ option for the CASES ‘qt’ program, all the macro-expanded
instrument files will have names ending with ‘.m’. If such is
the case, the ‘macros=yes’ option must be specified for QEXTRACT;
otherwise the program will only look for the unexpanded ‘.q’
files.
No Case Distinction in Item Names (nocase= yes)
If the instrument was translated with the ‘case insensitive’
option specified in the CASES ‘STUDYDEF’ file, there is no
distinction between upper and lower case item names. For
example, a command to ‘goto itemx’ means the same as ‘goto
ITEMX’. The QEXTRACT program needs to know whether or not the
instrument was designed to operate in case-insensitive mode, in
order to figure out the proper logical paths to and from each
item. If the instrument was translated with that CASES option,
the ‘nocase=yes’ option must also be specified for QEXTRACT.
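The effect of case-insensitive mode on path tracing can be sketched as
follows. This is only an illustration of the idea, not QEXTRACT's
actual code; the function name `resolve_target` and the item names are
hypothetical.

```python
def resolve_target(target, item_names, nocase=False):
    """Return the defined item name that a 'goto' target refers to, or None.

    With nocase=True, 'itemx' and 'ITEMX' are treated as the same item.
    With nocase=False, the target must match an item name exactly.
    """
    if nocase:
        # Index the defined names by their lower-case form.
        lookup = {name.lower(): name for name in item_names}
        return lookup.get(target.lower())
    return target if target in item_names else None


items = ["INTRO", "ITEMX"]
print(resolve_target("itemx", items, nocase=True))   # found
print(resolve_target("itemx", items, nocase=False))  # not found
```

If the instrument relies on case-insensitive names but ‘nocase=yes’ is
omitted, references like the second one above would not resolve, and
the paths to and from those items could be reported incorrectly.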
Location of the Instrument Files (cdir= directory)
If the Q-language instrument files are not in the directory in
which QEXTRACT is being run, the ‘CDir=’ option must be
specified. (See the keywords for command files below.)
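Taken together, the ‘macros=’ and ‘cdir=’ options determine which
file QEXTRACT looks for. The following sketch shows that rule under
stated assumptions: the function name `instrument_file` and the base
name ‘main’ are hypothetical, and the real program takes the list of
Q-language files from the layout file.

```python
from pathlib import PurePath


def instrument_file(base_name, macros=False, cdir=None):
    """Build the expected path of one instrument file.

    macros=True selects the macro-expanded '.m' file; otherwise '.q'.
    cdir names the directory holding the files; None means the
    directory in which the program is being run.
    """
    suffix = ".m" if macros else ".q"
    directory = PurePath(cdir) if cdir else PurePath(".")
    return directory / (base_name + suffix)


print(instrument_file("main"))
print(instrument_file("main", macros=True, cdir="inst"))
```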
LAYOUT FILE TO PREPARE (Layout= filename)
The layout file produced by the CASES ‘layout’ program contains
information on the location and type of each item in the
instrument, and also on the logical path from one item to the
next. It also contains the list of the Q-language files.
Note that the layout file MUST be generated by running the CASES
‘layout’ program using the ‘-qx’ flag, in order to be used as
input to QEXTRACT. Also, be sure to redirect the output of the
CASES ‘layout’ program to a file. The name of this file must be
supplied to the QEXTRACT program (unless you use the default name
‘LAYOUT’). For example:
layout -qx > LAYOUT
If you redirect the output of the CASES ‘layout’ program to a
file named anything other than ‘LAYOUT’, use the QEXTRACT
‘layout=’ command to tell the program the name of that file.
(See the keywords for command files below.)
FILES PRODUCED BY QEXTRACT
IDL File (Output= filename)
The main output of the program is the IDL file, which contains
the information necessary to document the instrument. The
default name for this output file is ‘IDOC.IDL’, but another name
can be specified as an option using the ‘output=’ specification.
(See the keywords for command files below.) If the named file
already exists, it will be overwritten.
Diagnostic Messages
Diagnostic and error messages are saved in a file named
‘QEXTRACT.MSG’. That file should always be viewed after running
QEXTRACT. Note that diagnostic messages are appended to that
file, so it can contain the record of many runs. The file can be
deleted whenever that record is no longer needed.
Inventory File
QEXTRACT saves an inventory of all the items processed during the
run. This list is written to the file ‘QEXTRACT.IN1’, with one
item name per line, in the order in which the items are found in
the Q-language file(s).
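Because the inventory has one item name per line, it is easy to load
into a script for further checking. A minimal sketch, assuming the
file holds nothing but item names (the function name `read_inventory`
is hypothetical):

```python
def read_inventory(lines):
    """Return the item names from inventory lines, in file order.

    'lines' may be any iterable of strings, including an open file.
    Blank lines are skipped as a precaution.
    """
    return [line.strip() for line in lines if line.strip()]


# Typical use after a QEXTRACT run:
#     with open("QEXTRACT.IN1") as handle:
#         items = read_inventory(handle)
print(read_inventory(["intro\n", "age\n", "income\n"]))
```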
List of Orphan Items
QEXTRACT figures out the path to each item in the instrument.
Items with no direct path to them are listed in the file
‘QEXTRACT.ORF’. For each item, the Q-language file in which it
is defined is also given.
Some items are intended to be reached only through a deliberate
skip on the part of the interviewer, such as a skip to an item
that sets up a callback. Those items will appear in the list of
orphan items, but are not instrument problems.
In the current version of QEXTRACT some items that are reached
only via references on the same form or screen will also appear
(improperly) on the list of orphan items, even though they are
not instrument errors.
As a result, this list can contain a certain number of "false
positive" results. Nevertheless, the list should be checked
carefully, since items with no path to them can cause serious
instrument problems.
OPTIONAL ELEMENT TAGS IN Q-LANGUAGE INSTRUMENTS
Q-language instruments do not usually include all of the possible
elements available in IDL files to describe an item. As a
result, IDL files that are produced by QEXTRACT may have to be
edited, in order to provide more complete documentation for the
instrument. An alternative to editing IDL files is to insert the
additional specifications or element tags into the Q-language
file itself in a way that will not affect the execution of the
CASES programs, by using the comment mechanism.
The tags for a specific item or variable may be placed either in
the template area corresponding to that item (the part of the
template BEFORE the field marker ‘@’) or in the part of the post-
template area corresponding to that item (the part of the post-
template AFTER the field marker ‘[@’ for that item).
If elements applicable to single items are specified in the pre-
template area of a multi-item form or screen, those elements will
apply to ALL of the items in the same form, unless overridden by
another specification of that element for a particular item.
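The override rule for multi-item forms can be pictured as a simple
merge in which an item's own tags take precedence over those given
for the whole form. This is an illustrative sketch only; the function
name `effective_tags` and the tag values are hypothetical.

```python
def effective_tags(form_tags, item_tags):
    """Combine form-level tags with one item's own tags.

    Tags given in the pre-template area apply to every item on the
    form; a tag given for a particular item replaces the form-level
    value for that item only.
    """
    merged = dict(form_tags)   # start from the form-wide settings
    merged.update(item_tags)   # item-specific settings win
    return merged


form_tags = {"responseunit": "Head of household", "md": "8,9"}
print(effective_tags(form_tags, {}))            # item with no tags of its own
print(effective_tags(form_tags, {"md": "99"}))  # item overriding 'md'
```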
In the list that follows, a distinction is made between (1)
element tags that are relevant both for documenting instruments
and for defining data files for current statistical software and
(2) element tags currently relevant only for documenting
instruments.
1. Element tags for both instrument and data file documentation
- [##label= Label for this item]
    One-line label for the content of an item; overrides the
    default behavior of QEXTRACT to construct a label out of the
    first line or two of item text, if any
- [##md= Missing-data codes or ranges]
    Missing-data codes, in addition to those defined in the
    instrument
- [##min= Minimum valid code]
    Minimum value to consider as a valid code, for data analysis
    purposes
- [##max= Maximum valid code]
    Maximum value to consider as a valid code, for data analysis
    purposes
- [##blank= Number]
    Number into which an all-blank field should be converted
- [##other= Number]
    Number into which a non-numeric field should be converted
- [##type= Variable type]
    Variable type (numeric or character); overrides information
    in the LAYOUT file
- [##decimals= Decimal places]
    Number of decimal places (for numeric variables); overrides
    information in the LAYOUT file
- [##dname= Dataset name]
    Name to give to this item for data analysis purposes (when
    generating definitions for SAS or SPSS)
- [##[Short label ] ]
    Short label for a code value -- up to 16 characters (this
    bracketed label can be located either in the template or in
    the post-template, after the relevant code value in ‘<>’ and
    before the next code value; this label is in addition to any
    plain text for a category found in the item template)
2. Element tags relevant only for documenting instruments
- [##universelabel= Description of item universe]
    Description of how you get to this item in the instrument
- [##flowlabel= Description of forward flow]
    Description of where you go next in the instrument
- [##analysisunit= Unit of analysis]
    Description of who or what the data from this item applies to
- [##responseunit= Response source]
    Description of who is answering this question
- [##keywords= 1st phrase; 2nd phrase; ...]
    Keywords that will be used for creating a keyword index of
    items; separate the individual key words or phrases by
    semicolons
- [##formlabel= Label for this form]
    One-line label for this multi-item form
- [##filelabel= Label for this file]
    One-line label for this Q-file or instrument module (should
    be placed near the beginning of the Q-file)
- [##sectionlabel= Label for the current section]
    One-line label for this section or group of items (should be
    placed near the beginning of the section)
- [##rosterlabel(xyz) = Label for roster ‘xyz’]
    One-line label for this roster (can be placed anywhere in
    the instrument)
- [##cyclelabel= Label for this roster cycle]
    One-line label for this cycle through a roster (should be
    placed soon after the ‘[roster begin]’ command)
Element names/tags can be in upper or lower case, and they can be
abbreviated down to the first three letters (except for the short
label for a code value, which just has brackets); the equal sign
is optional. For instance, [##label=myvariable] can be
abbreviated as [##lab myvariable]. If descriptions extend over
more than one line, repeat the element tag at the beginning of
each subsequent line.
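The abbreviation rules above can be sketched as a small parser. This
is an approximation written for illustration, not QEXTRACT's own
parser: the names `ELEMENTS`, `expand_element` and `parse_tags` are
hypothetical, and the sketch does not handle the bracketed short label
for a code value or the ‘rosterlabel(xyz)’ form.

```python
import re

# Element names taken from the two lists above.
ELEMENTS = (
    "label", "md", "min", "max", "blank", "other", "type", "decimals",
    "dname", "universelabel", "flowlabel", "analysisunit", "responseunit",
    "keywords", "formlabel", "filelabel", "sectionlabel", "rosterlabel",
    "cyclelabel",
)

# "[##name= value]" or "[##name value]"; the equal sign is optional.
TAG_RE = re.compile(r"\[##\s*([A-Za-z]+)\s*=?\s*([^\]]*)\]")


def expand_element(name):
    """Expand an abbreviated element name, or return None if unknown.

    A name is accepted in upper or lower case if it is a prefix of a
    full element name and has at least three letters (or is the whole
    name, for short names such as 'md').
    """
    name = name.lower()
    for full in ELEMENTS:
        if full.startswith(name) and len(name) >= min(3, len(full)):
            return full
    return None


def parse_tags(line):
    """Return (element, value) pairs for the tags found on one line."""
    pairs = []
    for match in TAG_RE.finditer(line):
        element = expand_element(match.group(1))
        if element is not None:
            pairs.append((element, match.group(2).strip()))
    return pairs


print(parse_tags("[##label=myvariable]"))
print(parse_tags("[##lab myvariable]"))
```

Both calls yield the same pair, since ‘lab’ expands to ‘label’.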
NOTE ON VARIABLE TYPES
There are three major types of variables that can be specified in
a CASES instrument: integer, float, and character. QEXTRACT
converts both the integer and the float types in CASES into the
numeric type in IDL. If there are implied decimal places in a
number, that information is contained in the layout file. For
example, if the layout file specifies that a variable is of type
float, and has a width of 4.2, QEXTRACT will translate that into
an IDL specification for a variable of numeric type, with a width
of 4, and with 2 implied decimal places.
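The conversion just described can be sketched as follows. The sketch
assumes the width is available as a string such as ‘4.2’; the exact
layout-file syntax is not described here, and the function names
`idl_type` and `split_width` are hypothetical.

```python
def idl_type(cases_type):
    """Map a CASES variable type to the corresponding IDL type."""
    cases_type = cases_type.lower()
    if cases_type in ("integer", "float"):
        return "numeric"
    if cases_type == "character":
        return "character"
    raise ValueError("unrecognized CASES type: " + cases_type)


def split_width(width_spec):
    """Split a width such as '4.2' into (width, implied decimals).

    A width with no decimal part, such as '3', has 0 implied decimals.
    """
    whole, _, fraction = width_spec.partition(".")
    return int(whole), int(fraction or 0)


# The example from the text: a float variable with a width of 4.2.
print(idl_type("float"), split_width("4.2"))
```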
Character variables in CASES are also specified as character type
in IDL. Note, however, that you may not really want a variable
defined by CASES as a character variable to be treated as such
for purposes of analysis. CASES will consider an item to be of
character type if it has any non-numeric precodes that are not
designated as ‘missing’, even if those non-numeric precodes were
never actually used. If such items are really intended to be
interpreted as numeric variables for purposes of analysis, the
type of that item can be changed to numeric by including
‘[##type=numeric]’ in the Q-language instrument for that item,
before running QEXTRACT.
QEXTRACT does not currently attempt to create category labels for
character variables; for numeric variables it generates labels
for the numeric precodes, but it ignores any non-numeric
precodes.
The ‘no data’ type of variable in CASES is used primarily for
informational screens. Such variables do not have data or data
locations. However, they are of interest when generating an
IDOC. QEXTRACT will treat such items as input items but will not
assign them any record and column specifications in the IDL file.
LIMITS
The maximum number of items that can be processed at one time
(listed in the LAYOUT file) depends on available memory. On PCs,
the processing of more than a few thousand items may result in
excessive swapping of memory to disk. In such cases, it is
helpful to close other open applications, to preserve memory for
running QEXTRACT. If the LAYOUT file is still too large, it may
be necessary to run the job on a computer with more memory.
KEYWORDS FOR COMMAND FILES
The command file contains the specifications for the run; all are
optional except the title. Specifications are given in the form
"keyword = something". Keywords may be given in any order, in
upper or lower case, one to a line. The valid keywords are as
follows, with significant characters shown in capital letters:
Keyword    Possible Specification              Default (if no keyword)
----------------------------------------------------------------------
Title=     Title of the study                  REQUIRED

Layout=    Name of layout file                 Use ‘LAYOUT’

MACros=    Yes                                 Use ‘.q’ instrument
           (assumes that the instrument        files
           was translated with the
           ‘-m’ option, to produce
           files with ‘.m’ suffix)

NOCase=    Yes                                 Upper/lower case in item
           (assumes that the instrument        names is significant
           was translated with
           ‘case insensitive’ specified
           in the STUDYDEF file)

Output=    Name of file into which             IDOC.IDL
           the IDL will be written

CDir=      Name of directory containing        Current directory
           the CASES instrument files
           (either .q or .m files,
           whichever are being used,
           depending on the ‘macros=’
           option)
Abbreviations
Keywords can be abbreviated down to the number of characters
required to differentiate them from other keywords. Sometimes
only one character is required. The keyword for the layout file,
for instance, can be given as "layout=" or "lay=" or even "l=".
Either upper or lower case may be used.
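One way to picture the abbreviation rule is the sketch below. It
assumes the shortest accepted form of each keyword is the capitalized
portion shown in the keyword table (so ‘l’ for Layout, ‘mac’ for
MACros); whether QEXTRACT also accepts shorter forms that happen to be
unique is not stated here. The names `KEYWORDS` and `resolve_keyword`
are hypothetical.

```python
from itertools import takewhile

# Keywords as written in the table, capitals marking significant characters.
KEYWORDS = ("Title", "Layout", "MACros", "NOCase", "Output", "CDir")


def resolve_keyword(abbrev):
    """Return the full lower-case keyword for an abbreviation, or None.

    The abbreviation may be in upper or lower case. It must be a
    prefix of the keyword and at least as long as the keyword's
    leading run of capital letters (an assumption of this sketch).
    """
    abbrev = abbrev.lower()
    for keyword in KEYWORDS:
        minimum = len(list(takewhile(str.isupper, keyword)))
        if keyword.lower().startswith(abbrev) and len(abbrev) >= minimum:
            return keyword.lower()
    return None


for name in ("layout", "lay", "l", "MAC"):
    print(name, "->", resolve_keyword(name))
```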
Comments
Anything on a line beginning with ‘#’ is ignored by the command
processor and can therefore be used for comments. Blank lines
are also ignored.
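The command-file format described above is simple enough to check
with a short script before running QEXTRACT. The following is a
minimal sketch, not the program's own command processor: the function
name `parse_command_file` is hypothetical, keyword abbreviations are
not expanded, and leading spaces before ‘#’ are tolerated as an
assumption.

```python
def parse_command_file(lines):
    """Parse 'keyword = something' lines into a dictionary.

    Lines beginning with '#' and blank lines are ignored. Keywords
    are stored in lower case, since case is not significant.
    """
    options = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comment or blank line
        keyword, separator, value = stripped.partition("=")
        if not separator:
            raise ValueError("expected 'keyword = something': " + stripped)
        options[keyword.strip().lower()] = value.strip()
    return options


sample = [
    "# Hypothetical command file for illustration",
    "",
    "Title = Fish and Hunt Survey",
    "output = fish.idl",
]
print(parse_command_file(sample))
```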
EXAMPLES OF COMMAND FILES
1. Basic commands, using mostly defaults
Title = Fish and Hunt Survey
output = fish.idl
2. Using all the optional keywords
Title = Survey of Program Participation
output = sipp.idl
layout = qxlayout
macros = yes
nocase = yes
cdir = c:\sipp\e-inst
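When QEXTRACT is run from a script, the command file can be generated
and the command line assembled as in the sketch below. It assumes the
program is installed under the name ‘qextract’ as shown under USAGE;
the function names and the command file name ‘fish.cmd’ are
hypothetical, and the sketch builds the command line without running
it.

```python
def command_file_text(options):
    """Render keyword/value pairs as command-file lines.

    Raises ValueError if the required title is missing.
    """
    if "title" not in options:
        raise ValueError("the Title= keyword is required")
    return "".join(f"{key} = {value}\n" for key, value in options.items())


def qextract_argv(command_file_name):
    """Return the argument list for 'qextract -b command_file_name'."""
    return ["qextract", "-b", command_file_name]


text = command_file_text({"title": "Fish and Hunt Survey",
                          "output": "fish.idl"})
print(text, end="")
print(qextract_argv("fish.cmd"))
```

The resulting text could be written to the command file, and the
argument list passed to a process-launching call such as Python's
`subprocess.run`.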
CSM, UC Berkeley
April 12, 2011