This is the on-line help file for Clustal X (version 1.81), using the NCBI
Vibrant Toolkit.
It should be named or defined as: clustalx_help
except with MSDOS in which case it should be named ClustalX.HLP
For full details of usage and algorithms, please read the CLUSTALW.DOC file.
Toby Gibson EMBL, Heidelberg, Germany.
Des Higgins UCC, Cork, Ireland.
Julie Thompson/Francois Jeanmougin IGBMC, Strasbourg, France.
>>HELP G <<
General help for CLUSTAL X (1.8)
Clustal X is a windows interface for the ClustalW multiple sequence alignment
program. It provides an integrated environment for performing multiple sequence
and profile alignments and analysing the results. The sequence alignment is
displayed in a window on the screen. A versatile coloring scheme has been
incorporated allowing you to highlight conserved features in the alignment.
The pull-down menus at the top of the window allow you to select all the
options required for traditional multiple sequence and profile alignment.
You can cut-and-paste sequences to change the order of the alignment; you can
select a subset of sequences to be aligned; you can select a sub-range of the
alignment to be realigned and inserted back into the original alignment.
Alignment quality analysis can be performed and low-scoring segments or
exceptional residues can be highlighted.
ClustalX is available for a number of different platforms including: SUN
Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECStations, Microsoft
Windows (32 bit) for PCs, Linux ELF for x86 PCs and Macintosh PowerMac. (See
the README file for Installation instructions.)
<H4>
SEQUENCE INPUT
</H4>
Sequences and profiles (a term for pre-existing alignments) are input using
the FILE menu. Invalid options will be disabled. All sequences must be in
one file. Seven formats are automatically recognised: NBRF/PIR, EMBL/SWISSPROT,
Pearson (Fasta), Clustal (*.aln), GCG/MSF (Pileup), GCG9 RSF and GDE flat file.
All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
except "-" which is used to indicate a GAP ("." in MSF/RSF).
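The character filtering rule above can be sketched in a few lines of Python.
This is only an illustration, not the Clustal X source: the function name
clean_sequence and the msf flag are hypothetical, and converting the MSF/RSF
"." gap to "-" on input is an assumption.

```python
def clean_sequence(raw: str, msf: bool = False) -> str:
    """Keep letters and gap characters, drop everything else.

    Hypothetical helper: in MSF/RSF mode the gap character is "." and it
    is normalised to "-" here (an assumption, not documented behaviour).
    """
    gap = "." if msf else "-"
    kept = []
    for ch in raw:
        if ch.isascii() and ch.isalpha():
            kept.append(ch)          # residue letter
        elif ch == gap:
            kept.append("-")         # gap position
        # spaces, digits and other punctuation are ignored
    return "".join(kept)
```

For example, clean_sequence("1 MKV-L 10") keeps only the residues and the gap.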
<H4>
SEQUENCE / PROFILE ALIGNMENTS
</H4>
Clustal X has two modes which can be selected using the switch directly above
the sequence display: MULTIPLE ALIGNMENT MODE and PROFILE ALIGNMENT MODE.
To do a MULTIPLE ALIGNMENT on a set of sequences, make sure MULTIPLE ALIGNMENT
MODE is selected. A single sequence data area is then displayed. The ALIGNMENT
menu then allows you to either produce a guide tree for the alignment, or to do
a multiple alignment following the guide tree, or to do a full multiple
alignment.
In PROFILE ALIGNMENT MODE, two sequence data areas are displayed, allowing you
to align 2 alignments (termed profiles). Profiles are also used to add a new
sequence to an old alignment, or to use secondary structure to guide the
alignment process. GAPS in the old alignments are indicated using the "-"
character. PROFILES can be input in ANY of the allowed formats; just use "-"
(or "." for MSF/RSF) for each gap position. In Profile Alignment Mode, a button
"Lock Scroll" is displayed which allows you to scroll the two profiles together
using a single scroll bar. When the Lock Scroll is turned off, the two profiles
can be scrolled independently.
<H4>
PHYLOGENETIC TREES
</H4>
Phylogenetic trees can be calculated from old alignments (read in with "-"
characters to indicate gaps) OR after a multiple alignment while the alignment
is still displayed.
<H4>
ALIGNMENT DISPLAY
</H4>
The alignment is displayed on the screen with the sequence names on the left
hand side. The sequence alignment is for display only; it cannot be edited here
(except for changing the sequence order by cutting-and-pasting on the sequence
names).
A ruler is displayed below the sequences, starting at 1 for the first residue
position (residue numbers in the sequence input file are ignored).
A line above the alignment is used to mark strongly conserved positions. Three
characters ('*', ':' and '.') are used:
'*' indicates positions which have a single, fully conserved residue
':' indicates that one of the following 'strong' groups is fully conserved:-
<PRE>
STA
NEQK
NHQK
NDEQ
QHRK
MILV
MILF
HY
FYW
</PRE>
'.' indicates that one of the following 'weaker' groups is fully conserved:-
<PRE>
CSA
ATV
SAG
STNK
STPA
SGND
SNDEQK
NDEQHK
NEQHRK
FVLIM
HFY
</PRE>
These are all the positively scoring groups that occur in the Gonnet Pam250
matrix. Strong groups are those with a score >0.5 and weak groups are those
with a score <=0.5.
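As an illustration of how the conservation line could be derived from the
groups listed above, here is a minimal Python sketch. It is not the Clustal X
implementation: the function names are hypothetical, "fully conserved" is
read as "every residue in the column belongs to one group", and leaving
columns that contain a gap unmarked is an assumption.

```python
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                      # assumption: gapped columns unmarked
    if len(residues) == 1:
        return "*"                      # single, fully conserved residue
    if any(residues <= set(group) for group in STRONG):
        return ":"                      # all residues within one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."                      # all residues within one weak group
    return " "


def conservation_line(alignment: list[str]) -> str:
    """Build the marker line for equal-length aligned sequences."""
    return "".join(conservation_symbol("".join(col)) for col in zip(*alignment))
```

Strong groups are tested before weak ones so that a column matching both is
marked with ':'.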
For profile alignments, secondary structure and gap penalty masks are displayed
above the sequences, if any data is found in the profile input file.
>>HELP F <<
Input / Output Files
LOAD SEQUENCES reads sequences from one of 7 file formats, replacing any
sequences that are already loaded. All sequences must be in 1 file. The formats
that are automatically recognised are: NBRF/PIR, EMBL/SWISSPROT, Pearson
(Fasta), Clustal (*.aln), GCG/MSF (Pileup), GCG9/RSF and GDE flat file. All
non-alphabetic characters (spaces, digits, punctuation marks) are ignored
except "-" which is used to indicate a GAP ("." in MSF/RSF).
The program tries to automatically recognise the different file formats used
and to guess whether the sequences are amino acid or nucleotide. This is not
always foolproof.
FASTA and NBRF/PIR formats are recognised by having a ">" as the first
character in the file.
EMBL/SWISSPROT formats are recognised by the letters "ID" at the start of the
file (the token for the entry name field).
CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.
GCG/MSF format is recognised by one of the following:
<UL>
<LI>
- the word PileUp at the start of the file.
</LI><LI>
- the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
at the start of the file.
</LI><LI>
- the word MSF on the first line of the file, and the characters ..
at the end of this line.
</LI>
</UL>
GCG/RSF format is recognised by the word !!RICH_SEQUENCE at the beginning of
the file.
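The recognition rules above can be summarised as a short Python sketch. It is
an illustration only: the function name detect_format and the order of the
tests are assumptions. FASTA and NBRF/PIR are reported together because both
begin with ">", and GDE is not detected because no rule for it is given here.

```python
def detect_format(text: str) -> str:
    """Guess the file format from the start of the file contents."""
    first_line = text.split("\n", 1)[0].rstrip()
    if text.startswith(">"):
        return "FASTA or NBRF/PIR"
    if text.startswith("ID"):
        return "EMBL/SWISSPROT"
    if text.startswith("CLUSTAL"):
        return "CLUSTAL"
    if text.startswith("!!RICH_SEQUENCE"):
        return "GCG/RSF"
    if (text.startswith("PileUp")
            or text.startswith(("!!AA_MULTIPLE_ALIGNMENT",
                                "!!NA_MULTIPLE_ALIGNMENT"))
            or ("MSF" in first_line and first_line.endswith(".."))):
        return "GCG/MSF"
    return "unknown"                    # e.g. GDE, not covered by this sketch
```

As the text notes, this kind of guessing is not always foolproof, so the
result is best treated as a hint.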
If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the
sequence will be assumed to be nucleotide. This works in 97.3% of cases but
watch out!
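The 85% rule can be sketched as follows. The function name
looks_like_nucleotide is hypothetical, and counting only alphabetic
characters (so that gaps do not affect the ratio) is an assumption.

```python
def looks_like_nucleotide(seq: str, threshold: float = 0.85) -> bool:
    """Return True if at least `threshold` of the letters are A, C, G, T, U or N."""
    residues = [ch.upper() for ch in seq if ch.isalpha()]
    if not residues:
        return False                    # nothing to judge
    hits = sum(ch in "ACGTUN" for ch in residues)
    return hits / len(residues) >= threshold
```

A short protein rich in A, C, G or T could still be misclassified, which is
why the guess should be checked.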
APPEND SEQUENCES is only valid in MULTIPLE ALIGNMENT MODE. The input sequences
do not replace those already loaded, but are appended at the end of the
alignment.
SAVE SEQUENCES AS... offers the user a choice of one of six output formats:
CLUSTAL, NBRF/PIR, GCG/MSF, PHYLIP, NEXUS or GDE. All sequences are written
to a single file. Options are available to save a range of the alignment,
switch between UPPER/LOWER case for GDE files, and to output SEQUENCE NUMBERING
for CLUSTAL files.
LOAD PROFILE 1 reads sequences in the same 7 file formats, replacing any
sequences already loaded as Profile 1. This option will also remove any
sequences which are loaded in Profile 2.
LOAD PROFILE 2 reads sequences in the same 7 file formats, replacing any
sequences already loaded as Profile 2.
SAVE PROFILE 1 AS... is similar to the Save Sequences option except that only
those sequences in Profile 1 will be written to the output file.
SAVE PROFILE 2 AS... is similar to the Save Sequences option except that only
those sequences in Profile 2 will be written to the output file.
WRITE ALIGNMENT AS POSTSCRIPT will write the sequence display to a postscript
format file. This will include any secondary structure / gap penalty mask
information and the consensus and ruler lines which are displayed on the
screen. The Alignment Quality curve can be optionally included in the output
file.
WRITE PROFILE 1 AS POSTSCRIPT is similar to WRITE ALIGNMENT AS POSTSCRIPT
except that only the profile 1 display will be printed.
WRITE PROFILE 2 AS POSTSCRIPT is similar to WRITE ALIGNMENT AS POSTSCRIPT
except that only the profile 2 display will be printed.
<H4>
POSTSCRIPT PARAMETERS
</H4>
A number of options are available to allow you to configure your postscript
output file.
PS COLORS FILE:
The exact RGB values required to reproduce the colors used in the alignment
window will vary from printer to printer. A PS colors file can be specified
that contains the RGB values for all the colors required by each of your
postscript printers.
By default, Clustal X looks for a file called 'colprint.par' in the current
directory (if you are running under UNIX, it then looks in your home directory,
and finally in the directories in your PATH environment variable). If no PS
colors file is found or a color used on the screen is not defined here, the
screen RGB values (from the Color Parameter File) are used.
The PS colors file consists of one line for each color to be defined, with the
color name followed by the RGB values (on a scale of 0 to 1). For example,
RED 0.9 0.1 0.1
Blank lines and comments (lines beginning with a '#' character) are ignored.
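A PS colors file in this layout could be read with a few lines of Python. The
sketch below is illustrative only: the function name parse_ps_colors is
hypothetical, and it assumes the fields are separated by whitespace.

```python
def parse_ps_colors(text: str) -> dict[str, tuple[float, float, float]]:
    """Map each color name to its (R, G, B) values on a 0 to 1 scale."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # blank lines and comments are ignored
        name, red, green, blue = line.split()
        colors[name] = (float(red), float(green), float(blue))
    return colors
```

Reading the example line above gives the entry RED with the values 0.9, 0.1
and 0.1.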
PAGE SIZE: The alignment can be displayed on A4, A3 or US Letter size
pages.
ORIENTATION: The alignment can be displayed on either a landscape or portrait
page.
PRINT HEADER: An optional header including the postscript filename and
creation date can be printed at the top of each page.
PRINT QUALITY CURVE: The Alignment Quality curve which is displayed underneath
the alignment on the screen can be included in the postscript output.
PRINT RULER: The ruler which is displayed underneath the alignment on the
screen can be included in the postscript output.
PRINT RESIDUE NUMBERS: Sequence residue numbers can be printed at the right
hand side of the alignment.
RESIZE TO FIT PAGE: By default, the alignment is scaled to fit the page size
selected. This option can be turned off, in which case a font size of 10 will
be used for the sequences.
PRINT FROM POSITION/TO: A range of the alignment can be printed. The default
is to print the full alignment. The first and last residues to be printed are
specified here.
USE BLOCK LENGTH: The alignment can be divided into blocks of residues. The
number of residues in a block is specified here. More than one block may then
be printed on a single page. This is useful for long alignments of a small
number of sequences. If the block length is set to 0, the alignment will not
be divided into blocks, but printed across a number of pages.
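To show what a block length means in practice, the sketch below computes the
column ranges of each block. It is only an illustration: the function name
split_into_blocks is hypothetical, and positions are numbered from 1 to match
the ruler.

```python
def split_into_blocks(n_columns: int, block_length: int) -> list[tuple[int, int]]:
    """Return (first, last) column positions for each block, numbered from 1."""
    if block_length <= 0:
        return [(1, n_columns)]         # 0 means: do not divide into blocks
    return [(start, min(start + block_length - 1, n_columns))
            for start in range(1, n_columns + 1, block_length)]
```

With 130 columns and a block length of 60, this gives three blocks, the last
one shorter than the others.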
>>HELP E <<
Editing Alignments
Clustal X allows you to change the order of the sequences in the alignment, by
cutting-and-pasting the sequence names.
To select a group of sequences to be moved, click on a sequence name and drag
the cursor until all the required sequences are highlighted. Holding down the
Shift key when clicking on the first name will add new sequences to those
already selected.
(Options are provided to Select All Sequences, Select Profile 1 or Select
Profile 2.)
The selected sequences can be removed from the alignment by using the EDIT
menu, CUT option.
To add the cut sequences back into an alignment, select a sequence by clicking
on the sequence name. The cut sequences will be added to the alignment,
immediately following the selected sequence, by the EDIT menu, PASTE option.
To add the cut sequences to an empty alignment (e.g. when cutting sequences from
Profile 1 and pasting them to Profile 2), click on the empty sequence name
display area, and select the EDIT menu, PASTE option as before.
The sequence selection and sequence range selection can be cleared using the
EDIT menu, CLEAR SEQUENCE SELECTION and CLEAR RANGE SELECTION options
respectively.
To search for a string of residues in the sequences, select the sequences to be