The InDeVal insertion/deletion evaluation tool: a program for finding target regions in DNA sequences and for aiding in sequence comparison

Background
The program InDeVal was originally developed to help researchers find known regions of insertion/deletion activity (with the exception of isolated single-base indels) in newly determined Poaceae trnL-F sequences and compare them with 533 previously determined sequences. It is supplied with input files designed for this purpose. More broadly, the program can be used to find specific target regions (referred to as "variable regions") in DNA sequence. A variable region is any specific sequence fragment of interest, such as an indel region, a codon or codons, or sequence coding for a particular RNA secondary structure.

Results
InDeVal input is DNA sequence and a template file (sequence flanking each variable region). Additional files contain the variable regions and user-defined messages about the sequence found within them (e.g., taxa sharing each of the different indel patterns). Variable regions are found by determining the position of flanking sequence (referred to as "conserved regions") using the LPAM (Length-Preserving Alignment Method) algorithm. This algorithm was designed for InDeVal and is described here for the first time. InDeVal output is an interactive display of the analyzed sequence, broken into user-defined units. Once the user is satisfied with the organization of the display, the information can be exported to an annotated text file.

Conclusions
InDeVal can find multiple variable regions simultaneously (28 indel regions in the Poaceae trnL-F files) and display user-selected messages specific to the sequence variants found. InDeVal output is designed to facilitate comparison between the analyzed sequence and previously evaluated sequence. The program's sensitivity to different levels of nucleotide and/or length variation in conserved regions can be adjusted. InDeVal is currently available for Windows in Additional file 1 or from .

Template name: > is required; in provided file, Standard refers to a template without major deletions.
Conserved region: Numbers do not affect the program, but can be included by the user for bookkeeping purposes.
Variable regions: Must be given as exact file names in quotation marks; need not be separated by a conserved region.
Template name (Second template): In provided file, name refers to taxon for which this template's major deletion is specific; location of deletion given in parentheses (not required by program).

Template name (Third template): In provided file, quotation marks indicate that most but not all sequenced members of this taxon contain this template's major deletion.
Position and name of this template's major deletion: Square brackets are necessary to indicate that this is deletion information and not sequence.
Numbers: Ignored by the program and used optionally to aid user orientation in the file; in provided file, the first figure numbers conserved regions consecutively, the second indicates the number of bases in a given conserved region.
Variable region path specification: Relative to directory containing conserved region file.
Optional user bookkeeping information: # signals that the program should ignore this line.
A conserved region file is organized into one or more templates (15 in the Poaceae trnL-F file, TemplatePtrnLF). The figure shows three annotated partial templates from TemplatePtrnLF.
Each template begins with a name, written on a single line beginning with a greater than sign (>), and extends to the name of the next template or to the end of the file. TemplatePtrnLF naming conventions, described in the annotations, are not required by InDeVal. The lines following the name contain conserved region sequence and variable region file names. Characters between quotation marks (") are interpreted as variable region file names; all other characters are interpreted as conserved region sequence. The sequence can contain both upper and lower case letters. Characters not corresponding to a base are ignored, except that information enclosed in square brackets ([]) is displayed in the Sequence Analysis Window. Lines beginning with a pound sign (#) are ignored completely and can be used for comments, which can aid orientation within the file.
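The file layout described above can be illustrated with a minimal Python sketch. It is not InDeVal's own code. The function names, the restriction of "bases" to A, C, G and T, the joining of conserved sequence that continues across lines, and the sample file contents are all assumptions made for illustration.

```python
import re

# Assumption: only A, C, G, T (either case) count as bases.
BASES = set("ACGTacgt")
# A quoted variable region file name, or bracketed deletion information.
TOKEN = re.compile(r'"([^"]*)"|\[([^\]]*)\]')


def _add_conserved(items, chunk):
    """Keep only base characters; extend the previous conserved item if any."""
    seq = "".join(c for c in chunk if c in BASES).upper()
    if not seq:
        return
    if items and items[-1][0] == "conserved":
        items[-1] = ("conserved", items[-1][1] + seq)
    else:
        items.append(("conserved", seq))


def parse_templates(text):
    """Return a list of (template name, items); items are (kind, value) pairs."""
    templates = []
    items = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are skipped
        if stripped.startswith(">"):
            items = []  # a name line opens a new template
            templates.append((stripped[1:].strip(), items))
            continue
        if items is None:
            continue  # content before the first template name
        pos = 0
        for m in TOKEN.finditer(stripped):
            _add_conserved(items, stripped[pos:m.start()])
            if m.group(1) is not None:
                items.append(("variable", m.group(1)))
            else:
                items.append(("note", m.group(2)))
            pos = m.end()
        _add_conserved(items, stripped[pos:])
    return templates


# Hypothetical file contents, not taken from TemplatePtrnLF.
sample = '''# bookkeeping comment
>Standard
1 8 ACGTacgt "VR01.txt"
2 4 GGCC "VR02.txt" "VR03.txt"
>Example taxon
[deletion of VR01]
1 4 TTAA
'''
templates = parse_templates(sample)
```

In this sketch the bookkeeping numbers disappear because they are not bases, and two adjacent quoted names yield two consecutive variable regions with no conserved region between them.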
This figure shows a hypothetical Poaceae trnL-F variable region file. A variable region file consists of a set of distinct sequence variations (highlighted in yellow in the figure), each followed by specific information (highlighted in blue), which is bracketed by the symbols > and <. The information (names of taxa in the Poaceae trnL-F files) will be displayed in the Taxa Boxes. Sequence can include spaces, numbers, and symbols (except >, <, and #) to highlight features of interest. These additional characters will not affect InDeVal analysis, but will be displayed in the Variable Region Sequence List Box.
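As an illustration of this layout, the following Python sketch reads a variable region file into a list of entries. It is a sketch under stated assumptions, not InDeVal's implementation. The function name, the treatment of lines beginning with # as comments, the restriction of bases to A, C, G and T, and the sample contents are hypothetical.

```python
import re

# One entry: a sequence variation followed by information between > and <.
ENTRY = re.compile(r"([^<>]*)>([^<>]*)<")
# Assumption: only A, C, G, T (either case) count as bases.
BASES = set("ACGTacgt")


def parse_variable_region(text):
    """Return one dict per variation: display text, bare bases, information."""
    # Assumption: lines beginning with # are comments and are dropped.
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )
    entries = []
    for m in ENTRY.finditer(body):
        display = " ".join(m.group(1).split())  # as shown to the user
        bases = "".join(c for c in display if c in BASES).upper()  # as analyzed
        info = " ".join(m.group(2).split())
        entries.append({"display": display, "bases": bases, "info": info})
    return entries


# Hypothetical file contents with made-up variants and taxon names.
sample = """# hypothetical example
ACGT----ACGT >Taxon A, Taxon B<
ACGT-TA-ACGT >Taxon C<
"""
entries = parse_variable_region(sample)
```

Keeping both the display text and the bare bases mirrors the distinction made above: the extra characters are shown to the user but play no part in the comparison.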
The Poaceae trnL-F variable region file symbol conventions, described in the figure annotations and InDeVal help files, are not required by InDeVal. Higher-level taxa were assigned according to the classification in the NCBI Entrez Nucleotides database. Taxa below the species level were not used, because they were not uniformly applied within the database. The only hybrid taxon in the files is Miscanthus ×giganteus J. M. Greef & M. Deuter ex Hodkinson & Renvoize, because it was the only one with a unique indel pattern.
This figure shows a hypothetical InDeVal output file. InDeVal can write the analysis from the Sequence Analysis Window to a text file. The text file includes information about the analysis, the analyzed sequence, and an annotation. The annotation shows which bases from the analyzed sequence were assigned to each of the separate regions. The user can decide whether or not to show information from the variable region files. Showing this information is not recommended with the Poaceae trnL-F files, because there are many variable regions and the information consists of very long taxa lists, which would only clutter the file. For other uses, however, it may be crucial that this information be included in the file.