Colors
Clustal X provides a versatile coloring scheme for the sequence alignment
display. The sequences (or profiles) are colored automatically, when they are
loaded. Sequences can be colored either by assigning a color to specific
residues, or on the basis of an alignment consensus. In the latter case, the
alignment consensus is calculated automatically, and the residues in each
column are colored according to the consensus character assigned to that
column. In this way, you can choose to highlight, for example, conserved
hydrophilic or hydrophobic positions in the alignment.
The 'rules' used to color the alignment are specified in a COLOR PARAMETER
FILE. Clustal X automatically looks for a file called 'colprot.par' for protein
sequences or 'coldna.par' for DNA, in the current directory. (If you are running
under UNIX, it then looks in your home directory, and finally in the
directories in your PATH environment variable.)
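The search order described above can be sketched in Python as follows. This is only an illustration of the lookup order, not the Clustal X source code; the function names and the use of the HOME variable for the home directory are assumptions.

```python
import os
from pathlib import Path


def candidate_dirs(is_unix=True, env=None, cwd=None):
    """Return the directories to search, in the order given in the help text."""
    env = os.environ if env is None else env
    dirs = [Path(cwd) if cwd else Path.cwd()]      # 1. current directory
    if is_unix:
        home = env.get("HOME")                     # 2. home directory (assumed $HOME)
        if home:
            dirs.append(Path(home))
        # 3. every directory listed in PATH
        dirs.extend(Path(p) for p in env.get("PATH", "").split(os.pathsep) if p)
    return dirs


def find_color_file(is_protein, is_unix=True, env=None, cwd=None):
    """Return the first matching color parameter file, or None if none is found."""
    name = "colprot.par" if is_protein else "coldna.par"
    for directory in candidate_dirs(is_unix, env, cwd):
        path = directory / name
        if path.is_file():
            return path
    return None  # caller falls back to the default residue colors
```

A call such as find_color_file(is_protein=True) returns a path, or None when the default colors should be used.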
By default, if no color parameter file is found, protein sequences are colored
by residue as follows:
<PRE>
Color     Residue Code
ORANGE    GPST
RED       HKR
BLUE      FWY
GREEN     ILMV
</PRE>
In the case of DNA sequences, the default colors are as follows:
<PRE>
Color     Residue Code
ORANGE    A
RED       C
BLUE      T
GREEN     G
</PRE>
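As an illustration, the two default tables above can be written as a simple lookup in Python. This is a sketch only: the names are hypothetical, and it assumes that residues not listed in a table receive no color and that case is ignored.

```python
# Default colors by residue, copied from the two tables above.
DEFAULT_PROTEIN = {"ORANGE": "GPST", "RED": "HKR", "BLUE": "FWY", "GREEN": "ILMV"}
DEFAULT_DNA = {"ORANGE": "A", "RED": "C", "BLUE": "T", "GREEN": "G"}


def residue_lookup(table):
    """Invert a color -> residues table into a residue -> color table."""
    return {res: color for color, residues in table.items() for res in residues}


def default_color(residue, is_protein=True):
    """Return the default color name for a residue, or None if it is not listed."""
    table = residue_lookup(DEFAULT_PROTEIN if is_protein else DEFAULT_DNA)
    return table.get(residue.upper())  # assumption: lookup is case-insensitive
```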
The default BACKGROUND COLORING option shows the sequence residues using a
black character on a colored background. It can be switched off to show
residues as a colored character on a white background.
Either the BLACK AND WHITE or the DEFAULT COLOR option can be selected. The
DEFAULT COLOR option looks first for the color parameter file (as described
above) and, if no file is found, uses the default residue-specific colors.
You can specify your own coloring scheme by using the LOAD COLOR PARAMETER FILE
option. The format of the color parameter file is described below.
<H4>
COLOR PARAMETER FILE
</H4>
This file is divided into 3 sections:
1) the names and rgb values of the colors
2) the rules for calculating the consensus
3) the rules for assigning colors to the residues
An example file is given here.
<PRE>
--------------------------------------------------------------------
@rgbindex
RED 0.9 0.1 0.1
BLUE 0.1 0.1 0.9
GREEN 0.1 0.9 0.1
YELLOW 0.9 0.9 0.0
@consensus
% = 60% w:l:v:i:m:a:f:c:y:h:p
# = 80% w:l:v:i:m:a:f:c:y:h:p
- = 50% e:d
+ = 60% k:r
q = 50% q:e
p = 50% p
n = 50% n
t = 50% t:s
@color
g = RED
p = YELLOW
t = GREEN if t:%:#
n = GREEN if n
w = BLUE if %:#:p
k = RED if +
--------------------------------------------------------------------
</PRE>
The first section is optional and is identified by the header @rgbindex. If
this section exists, each color used in the file must be named and the rgb
values specified (on a scale from 0 to 1). If the rgb index section is not
found, the following set of hard-coded colors will be used.
<PRE>
RED 0.9 0.1 0.1
BLUE 0.1 0.1 0.9
GREEN 0.1 0.9 0.1
ORANGE 0.9 0.7 0.3
CYAN 0.1 0.9 0.9
PINK 0.9 0.5 0.5
MAGENTA 0.9 0.1 0.9
YELLOW 0.9 0.9 0.0
</PRE>
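The handling of the optional @rgbindex section might look like the following Python sketch. It is not the parser used by Clustal X; the function names are hypothetical, and it assumes that blank lines are ignored and that each color line has exactly four whitespace-separated fields.

```python
# Hard-coded colors used when the file has no @rgbindex section.
HARD_CODED_COLORS = {
    "RED": (0.9, 0.1, 0.1), "BLUE": (0.1, 0.1, 0.9),
    "GREEN": (0.1, 0.9, 0.1), "ORANGE": (0.9, 0.7, 0.3),
    "CYAN": (0.1, 0.9, 0.9), "PINK": (0.9, 0.5, 0.5),
    "MAGENTA": (0.9, 0.1, 0.9), "YELLOW": (0.9, 0.9, 0.0),
}


def split_sections(text):
    """Group the lines of a color parameter file under their @header."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            current = line[1:].lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def color_table(text):
    """Return name -> (r, g, b), falling back to the hard-coded set."""
    lines = split_sections(text).get("rgbindex")
    if lines is None:
        return dict(HARD_CODED_COLORS)
    table = {}
    for line in lines:
        name, r, g, b = line.split()
        table[name] = (float(r), float(g), float(b))
    return table
```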
The second section is optional and is identified by the header @consensus. It
defines how the consensus is calculated.
The format of each consensus parameter is:
<PRE>
c = n% residue_list

where
    c             is a character used to identify the parameter.
    n             is an integer value used as the percentage cutoff point.
    residue_list  is a list of residues, each denoted by a single
                  character, delimited by a colon (:).
</PRE>
For example: # = 60% w:l:v:i
will assign the consensus character # to any column of the alignment in which
more than 60% of the residues are w, l, v or i.
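A minimal Python sketch of this rule follows. It is an illustration under stated assumptions, not the Clustal X implementation: the percentage is taken over every character in the column (gaps included), comparison is case-insensitive, and all consensus characters whose cutoff is exceeded are returned, because the text above does not say how several matches are resolved. The function names are hypothetical.

```python
def parse_consensus_rule(line):
    """Parse 'c = n% residue_list' into (c, n, set_of_residues)."""
    char, rhs = line.split("=", 1)
    cutoff, residue_list = rhs.split("%", 1)
    residues = {r.strip().lower() for r in residue_list.split(":")}
    return char.strip(), int(cutoff), residues


def column_consensus(column, rules):
    """Return every consensus character whose cutoff the column exceeds."""
    column = column.lower()
    hits = []
    for char, cutoff, residues in rules:
        matching = sum(1 for res in column if res in residues)
        # "more than n%": strict comparison, done in integers to avoid rounding
        if 100 * matching > cutoff * len(column):
            hits.append(char)
    return hits
```

For instance, with the single rule '# = 60% w:l:v:i', a five-residue column needs at least four of w, l, v or i to receive the # character.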
The third section is identified by the header @color, and defines how colors
are assigned to each residue in the alignment.
The color parameters can take one of two formats:
<PRE>
1) r = color
2) r = color if consensus_list

where
    r               is a character used to denote a residue.
    color           is one of the colors in the GDE color lookup table.
    consensus_list  is a list of consensus characters, each denoted by a
                    single character, delimited by a colon (:).
</PRE>
Examples:
1) g = ORANGE
will color all glycines ORANGE, regardless of the consensus.
2) w = BLUE if w:%:#
will color BLUE any tryptophan which is found in a column with a consensus of
w, % or #.
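The two rule formats can be sketched in Python as shown here. This is an illustration only; the function names are hypothetical, and it assumes that residues are compared case-insensitively, that the first matching rule for a residue wins, and that a residue with no matching rule is left uncolored.

```python
def parse_color_rule(line):
    """Parse 'r = color' or 'r = color if consensus_list'.

    Returns (residue, color, required), where required is None for an
    unconditional rule or a set of consensus characters otherwise.
    """
    residue, rhs = line.split("=", 1)
    parts = rhs.split()
    if len(parts) == 1:
        return residue.strip().lower(), parts[0], None
    if len(parts) == 3 and parts[1] == "if":
        return residue.strip().lower(), parts[0], set(parts[2].split(":"))
    raise ValueError(f"unrecognised color rule: {line!r}")


def residue_color(residue, column_consensus, rules):
    """Return the color for a residue given its column's consensus characters."""
    consensus = set(column_consensus)
    for rule_residue, color, required in rules:
        if rule_residue != residue.lower():
            continue
        # Unconditional rule, or at least one required consensus character present
        if required is None or required & consensus:
            return color
    return None
```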
>>HELP Q <<
Alignment Quality Analysis
<H3>
QUALITY SCORES
</H3>
--------------
Clustal X provides an indication of the quality of an alignment by plotting
a 'conservation score' for each column of the alignment. A high score indicates
a well-conserved column; a low score indicates low conservation. The quality
curve is drawn below the alignment.
Two methods are also provided to indicate single residues or sequence segments
which score badly in the alignment.
Low-scoring residues are expected to occur at a moderate frequency in all the
sequences because of their steady divergence due to the natural processes of
evolution. The most divergent sequences are likely to have the most outliers.
However, the highlighted residues are especially useful in pointing to
sequence misalignments. Note that clustering of highlighted residues is a
strong indication of misalignment. This can arise due to various reasons, for
example:
1. Partial or total misalignments caused by a failure in the
alignment algorithm. This usually happens only in difficult alignment cases.
2. Partial or total misalignments because at least one of the
sequences in the given set is partly or completely unrelated to the
other sequences. It is up to the user to check that the set of
sequences is alignable.
3. Frameshift translation errors in a protein sequence causing local
mismatched regions to be heavily highlighted. These are surprisingly
common in database entries. If suspected, a 3-frame translation of
the source DNA needs to be examined.
Occasionally, highlighted residues may point to regions of some biological
significance. This might happen for example if a protein alignment contains a
sequence which has acquired new functions relative to the main sequence set. It
is important to exclude other explanations, such as error or the natural
divergence of sequences, before invoking a biological explanation.
<H3>
LOW-SCORING SEGMENTS
</H3>
--------------------
Unreliable regions in the alignment can be highlighted using the Low-Scoring
Segments option. A sequence-weighted profile is used to indicate any segments
in the sequences which score badly. Because the profile calculation may take
some time, an option is provided to calculate LOW-SCORING SEGMENTS. The
segment display can then be toggled on or off without having to repeat the
time-consuming calculations.
For details of the low-scoring segment calculation, see the CALCULATION section
below.
<H4>
LOW-SCORING SEGMENT PARAMETERS
</H4>
------------------------------
MINIMUM LENGTH OF SEGMENTS: short segments (or even single residues) can be
hidden by increasing the minimum length of segments which will be displayed.
DNA MARKING SCALE is used to remove less significant segments from the
highlighted display. Increase the scale to display more segments; decrease the
scale to remove the least significant.
PROTEIN WEIGHT MATRIX: the scoring table which describes the similarity of each
amino acid to every other. The matrix is used to calculate the
sequence-weighted profile scores. Four 'in-built' Log-Odds matrices are offered:
the Gonnet PAM 80, 120, 250 and 350 matrices. A more stringent matrix, which
gives a high score only to identities and the most favoured conservative
substitutions, may be more suitable when the sequences are closely related. For
more divergent sequences, it is appropriate to use "softer" matrices which give
a high score to many other frequent substitutions. This option automatically
recalculates the low-scoring segments.
DNA WEIGHT MATRIX: Two hard-coded matrices are available:
1) IUB. This is the default scoring matrix used by BESTFIT for the comparison
of nucleic acid sequences. X's and N's are treated as matches to any IUB
ambiguity symbol. All matches score 1.0; all mismatches for IUB symbols score
0.9.
2) CLUSTALW(1.6). The previous system used by ClustalW, in which matches score
1.0 and mismatches score 0. All matches for IUB symbols also score 0.
A new matrix can be read from a file on disk, if the filename consists only
of lower case characters. The values in the new weight matrix should be
similarities and should be NEGATIVE for infrequent substitutions.
INPUT FORMAT. The format used for a new matrix is the same as that used by the
BLAST program. Any lines beginning with a # character are assumed to be comments. The
first non-comment line should contain a list of amino acids in any order, using
the 1 letter code, followed by a * character. This should be followed by a
square matrix of scores, with one row and one column for each amino acid. The
last row and column of the matrix (corresponding to the * character) contain
the minimum score over the whole matrix.
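A reader for this input format might be sketched in Python as follows. It is not the Clustal X reader; the function name is hypothetical. BLAST matrix files normally begin each row with the row's own symbol, so the sketch accepts rows with or without that leading label, which is an assumption beyond the description above.

```python
def parse_matrix(text):
    """Parse a BLAST-style matrix into (symbols, {(row, col): score})."""
    # Drop blank lines and comment lines beginning with '#'.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    symbols = lines[0].split()          # e.g. ['A', 'R', ..., '*']
    rows = lines[1:]
    if len(rows) != len(symbols):
        raise ValueError("matrix is not square")
    scores = {}
    for row_symbol, row in zip(symbols, rows):
        fields = row.split()
        if len(fields) == len(symbols) + 1:
            fields = fields[1:]         # assumed optional row label; discard it
        if len(fields) != len(symbols):
            raise ValueError(f"wrong number of scores in row {row_symbol!r}")
        for col_symbol, value in zip(symbols, fields):
            scores[(row_symbol, col_symbol)] = float(value)
    return symbols, scores
```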
<H4>
QUALITY SCORE PARAMETERS
</H4>
------------------------
You can customise the column 'quality scores' plotted underneath the alignment
display using the following options.
SCORE PLOT SCALE: this is a scalar value from 1 to 10, which can be used to
change the scale of the quality score plot.
RESIDUE EXCEPTION CUTOFF: this is a scalar value from 1 to 10, which can be
used to change the number of residue exceptions which are highlighted in the
alignment display. (For an explanation of this cutoff, see the CALCULATION OF
RESIDUE EXCEPTIONS section below.)
PROTEIN WEIGHT MATRIX: the scoring table which describes the similarity of
each amino acid to every other.
DNA WEIGHT MATRIX: two hard-coded matrices are available: IUB and CLUSTALW(1.6).
For more information about the weight matrices, see the help above for
the Low-scoring Segments Weight Matrix.
For details of the quality score calculations, see the CALCULATION section
below.
<STRONG>
SHOW LOW-SCORING SEGMENTS
</STRONG>
The low-scoring segment display can be toggled on or off. This option does not
recalculate the profile scores.
<STRONG>
SHOW EXCEPTIONAL RESIDUES
</STRONG>
This option highlights individual residues which score badly in the alignment
quality calculations. Residues which score exceptionally low are highlighted by