This is just an ASCII text version of the manuscript describing Clustal W, without the figures. It was published:

Nucleic Acids Research, 22(22):4673-4680.

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice.

Julie D. Thompson, Desmond G. Higgins1 and Toby J. Gibson*

European Molecular Biology Laboratory
Postfach 102209
Meyerhofstrasse 1
D-69012 Heidelberg
Germany

Phone:		+49-6221-387398
Fax:		+49-6221-387306
E-mail:		Gibson@EMBL-Heidelberg.DE
		Des.Higgins@EBI.AC.UK
		Thompson@EMBL-Heidelberg.DE

Keywords:	Multiple alignment, phylogenetic tree, weight matrix, gap penalty, dynamic programming, sequence weighting.

1 Current address: European Bioinformatics Institute
Hinxton Hall
Hinxton
Cambridge CB10 1RQ
UK.

* To whom correspondence should be addressed

ABSTRACT

The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to downweight near-duplicate sequences and upweight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

INTRODUCTION

The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology.
Multiple alignments are used to find diagnostic patterns to characterise protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of new sequences; to suggest oligonucleotide primers for PCR; and as an essential prelude to molecular evolutionary analysis. The rate of appearance of new sequence data is steadily increasing and the development of efficient and accurate automatic methods for multiple alignment is, therefore, of major importance. The majority of automatic multiple alignments are now carried out using the "progressive" approach of Feng and Doolittle (1). In this paper, we describe a number of improvements to the progressive multiple alignment method which greatly improve the sensitivity without sacrificing any of the speed and efficiency which make this approach so practical. The new methods are made available in a program called CLUSTAL W which is freely available and portable to a wide variety of computers and operating systems.

In order to align just two sequences, it is standard practice to use dynamic programming (2). This guarantees a mathematically optimal alignment, given a table of scores for matches and mismatches between all amino acids or nucleotides (e.g. the PAM250 matrix (3) or BLOSUM62 matrix (4)) and penalties for insertions or deletions of different lengths. Attempts at generalising dynamic programming to multiple alignments are limited to small numbers of short sequences (5). For much more than eight or so proteins of average length, the problem is uncomputable given current computer power. Therefore, all of the methods capable of handling larger problems in practical timescales make use of heuristics. Currently, the most widely used approach is to exploit the fact that homologous sequences are evolutionarily related.
One can build up a multiple alignment progressively by a series of pairwise alignments, following the branching order in a phylogenetic tree (1). One first aligns the most closely related sequences, gradually adding in the more distant ones. This approach is sufficiently fast to allow alignments of virtually any size. Further, in simple cases, the quality of the alignments is excellent, as judged by the ability to correctly align corresponding domains from sequences of known secondary or tertiary structure (6). In more difficult cases, the alignments give good starting points for further automatic or manual refinement.

This approach works well when the data set consists of sequences of different degrees of divergence. Pairwise alignment of very closely related sequences can be carried out very accurately. The correct answer may often be obtained using a wide range of parameter values (gap penalties and weight matrix). By the time the most distantly related sequences are aligned, one already has a sample of aligned sequences which gives important information about the variability at each position. The positions of the gaps that were introduced during the early alignments of the closely related sequences are not changed as new sequences are added. This is justified because the placement of gaps in alignments between closely related sequences is much more accurate than between distantly related ones. When all of the sequences are highly divergent (e.g. less than approximately 25-30% identity between any pair of sequences), this progressive approach becomes much less reliable.

There are two major problems with the progressive approach: the local minimum problem and the choice of alignment parameters. The local minimum problem stems from the "greedy" nature of the alignment strategy. The algorithm greedily adds sequences together, following the initial tree.
There is no guarantee that the global optimal solution, as defined by some overall measure of multiple alignment quality (7,8), or anything close to it, will be found. More specifically, any mistakes (misaligned regions) made early in the alignment process cannot be corrected later as new information from other sequences is added. This problem is frequently thought of as mainly resulting from an incorrect branching order in the initial tree. The initial trees are derived from a matrix of distances between separately aligned pairs of sequences and are much less reliable than trees from complete multiple alignments. In our experience, however, the real problem is caused simply by errors in the initial alignments. Even if the topology of the guide tree is correct, each alignment step in the multiple alignment process may have some percentage of the residues misaligned. This percentage will be very low on average for very closely related sequences but will increase as sequences diverge. It is these misalignments which carry through from the early alignment steps that cause the local minimum problem. The only way to correct this is to use an iterative or stochastic sampling procedure (e.g. 7,9,10). We do not directly address this problem in this paper.

The alignment parameter choice problem is, in our view, at least as serious as the local minimum problem. Stochastic or iterative algorithms will be just as badly affected as progressive ones if the parameters are inappropriate: they will arrive at a false global minimum. Traditionally, one chooses one weight matrix and two gap penalties (one for opening a new gap and one for extending an existing gap) and hopes that these will work well over all parts of all the sequences in the data set. When the sequences are all closely related, this works. The first reason is that virtually all residue weight matrices give most weight to identities.
When identities dominate an alignment, almost any weight matrix will find approximately the correct solution. With very divergent sequences, however, the scores given to non-identical residues will become critically important; there will be more mismatches than identities. Different weight matrices will be optimal at different evolutionary distances or for different classes of proteins. The second reason is that the range of gap penalty values that will find the correct or best possible solution can be very broad for highly similar sequences (11). As more and more divergent sequences are used, however, the exact values of the gap penalties become important for success. In each case, there may be a very narrow range of values which will deliver the best alignment. Further, in protein alignments, gaps do not occur randomly (i.e. with equal probability at all positions). They occur far more often between the major secondary structural elements of alpha helices and beta strands than within (12).

The major improvements described in this paper attempt to address the alignment parameter choice problem. We dynamically vary the gap penalties in a position and residue specific manner. The observed relative frequencies of gaps adjacent to each of the 20 amino acids (12) are used to locally adjust the gap opening penalty after each residue. Short stretches of hydrophilic residues (e.g. 5 or more) usually indicate loop or random coil regions and the gap opening penalties are locally reduced in these stretches. In addition, the locations of the gaps found in the early alignments are also given reduced gap opening penalties. It has been observed in alignments between sequences of known structure that gaps tend not to be closer than roughly eight residues on average (12). We increase the gap opening penalty within eight residues of existing gaps.
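
The position-specific gap opening penalty rules described above can be sketched in a few lines of Python. This is only an illustration, not the CLUSTAL W implementation: the numeric factors, the hydrophilic residue set, the order in which the rules take precedence, and the function names are all assumptions made for the example, and the residue-specific adjustment from reference (12) is left out because its values are not given here.

```python
# Illustrative sketch only: all factors and the residue set are assumed values.
HYDROPHILIC = set("GPSNDQEKR")  # assumed set of hydrophilic residues


def hydrophilic_run_mask(seq, min_run=5):
    """Mark positions that lie inside a run of >= min_run hydrophilic residues."""
    mask = [False] * len(seq)
    start = None
    for i, res in enumerate(seq + "#"):  # sentinel closes a trailing run
        if res in HYDROPHILIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask


def local_gap_open_penalties(seq, gap_positions, base_gop=10.0,
                             gap_factor=0.25, near_gap_factor=2.0,
                             hydrophilic_factor=0.5, window=8):
    """Return one gap opening penalty per position of seq.

    gap_positions: set of positions where earlier alignments opened a gap.
    Assumed precedence: existing gap > near an existing gap > hydrophilic run.
    """
    mask = hydrophilic_run_mask(seq)
    penalties = []
    for i in range(len(seq)):
        if i in gap_positions:
            factor = gap_factor            # reduced at existing gaps
        elif any(abs(i - g) <= window for g in gap_positions):
            factor = near_gap_factor       # increased within 8 residues of a gap
        elif mask[i]:
            factor = hydrophilic_factor    # reduced in hydrophilic stretches
        else:
            factor = 1.0
        penalties.append(base_gop * factor)
    return penalties


if __name__ == "__main__":
    seq = "LLLL" + "GPSND" + "L" * 15
    print(local_gap_open_penalties(seq, gap_positions={20}))
```

The penalties are returned as a plain list so that a dynamic programming routine could look up the opening penalty for each position instead of using one global value.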
The two main series of amino acid weight matrices that are used today are the PAM series (3) and the BLOSUM series (4). In each case, there is a range of matrices to choose from. Some matrices are appropriate for aligning very closely related sequences where most weight by far is given to identities, with only the most frequent conservative substitutions receiving high scores. Other matrices work better at greater evolutionary distances where less importance is attached to identities (13). We choose different weight matrices, as the alignment proceeds, depending on the estimated divergence of the sequences to be aligned at each stage.

Sequences are weighted to correct for unequal sampling across all evolutionary distances in the data set (14). This downweights sequences that are very similar to other sequences in the data set and upweights the most divergent ones. The weights are calculated directly from the branch lengths in the initial guide tree (15). Sequence weighting has already been shown to be effective in improving the sensitivity of profile searches (15,16).

In the original CLUSTAL programs (17-19), the initial guide trees, used to guide the multiple alignment, were calculated using the UPGMA method (20). We now use the Neighbour-Joining method (21) which is more robust against the effects of unequal evolutionary rates in different lineages and which gives better estimates of individual branch lengths. This is useful because it is these branch lengths which are used to derive the sequence weights. We also allow users to choose between fast approximate alignments (22) or full dynamic programming for the distance calculations used to make the guide tree.

The new improvements dramatically improve the sensitivity of the progressive alignment method for difficult alignments involving highly diverged sequences.
We show one very demanding test case of over 60 SH3 domains (23) which includes sequence pairs with as little as 12% identity and where there is only one exactly conserved residue across all of the sequences. Using default parameters, we can achieve an alignment that is almost exactly correct, according to available structural information (24). Using the program in a wide variety of situations, we find that it will normally find the correct alignment, in all but the most difficult and pathological of cases.

MATERIAL AND METHODS

The basic alignment method

The basic multiple alignment algorithm consists of three main stages: 1) all pairs of sequences are aligned separately in order to calculate a distance matrix giving the divergence of each pair of sequences; 2) a guide tree is calculated from the distance matrix; 3) the sequences are progressively aligned according to the branching order in the guide tree. An example using 7 globin sequences of known tertiary structure (25) is given in figure 1.

1) The distance matrix/pairwise alignments

In the original CLUSTAL programs, the pairwise distances were calculated using a fast approximate method (22). This allows very large numbers of sequences to be aligned, even on a microcomputer. The scores are calculated as the number of k-tuple matches (runs of identical residues, typically 1 or 2 long for proteins or 2 to 4 long for nucleotide sequences) in the best alignment between two sequences minus a fixed penalty for every gap. We now offer a choice between this method and the slower but more accurate scores from full dynamic programming alignments using two gap penalties (for opening or extending gaps) and a full amino acid weight matrix. These scores are calculated as the number of identities in the best alignment divided by the number of residues compared (gap positions are excluded).
Both of these scores are initially calculated as percent identity scores and are converted to distances by dividing by 100 and subtracting from 1.0 to give number of differences per site. We do not correct for multiple substitutions in these initial distances. In figure 1 we give the 7x7 distance matrix between the 7 globin sequences calculated using the full dynamic programming method.

2) The guide tree

The trees used to guide the final multiple alignment process are calculated from the distance matrix of step 1 using the Neighbour-Joining method (21). This produces unrooted trees with branch lengths proportional to estimated divergence along each branch. The root is placed by a "mid-point" method (15) at a position where the means of the branch lengths on either side of the root are equal. These trees are also used to derive a weight for each sequence (15). The weights are dependent upon the distance from the root of the tree but sequences which have a common branch with other sequences share the weight derived from the shared branch. In the example in figure 1, the leghaemoglobin (Lgb2_Luplu) gets a weight of 0.442 which is equal to the length of the branch from the root to it. The Human beta globin (Hbb_Human) gets a weight consisting of the length of the branch leading to it that is not shared with any other sequences (0.081) plus half the length of the branch shared with the horse beta globin (0.226/2) plus one quarter the length of the branch shared by all four haemoglobins (0.061/4) plus one fifth the branch shared between the haemoglobins and the myoglobin (0.015/5) plus one sixth the branch leading to all the vertebrate globins (0.062/6). This sums to a total of 0.221. By contrast, in the normal progressive alignment algorithm, all sequences would be equally weighted. The rooted tree with branch lengths and sequence weights for the 7 globins is given in figure 1.
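
The distance conversion and the weight arithmetic described above can be checked with a short Python sketch. The function names and the list-of-pairs representation of the path from a leaf to the root are assumptions made for this illustration; they are not taken from the CLUSTAL W source. Each branch length is divided by the number of sequences that share that branch, which for the last term means dividing 0.062 by six, as "one sixth the branch" implies.

```python
def percent_identity_to_distance(percent_identity):
    """Convert a percent identity score to differences per site (no correction)."""
    return 1.0 - percent_identity / 100.0


def sequence_weight(path_to_root):
    """Sum branch_length / n_sharing over the branches from a leaf to the root.

    path_to_root: list of (branch_length, number_of_sequences_sharing_branch).
    """
    return sum(length / n_sharing for length, n_sharing in path_to_root)


# Branch lengths quoted in the text for the 7-globin example.
hbb_human_path = [
    (0.081, 1),  # branch unique to Hbb_Human
    (0.226, 2),  # shared with the horse beta globin
    (0.061, 4),  # shared by all four haemoglobins
    (0.015, 5),  # shared by the haemoglobins and the myoglobin
    (0.062, 6),  # shared by all the vertebrate globins
]
lgb2_luplu_path = [(0.442, 1)]  # single branch from the root

if __name__ == "__main__":
    print(round(sequence_weight(hbb_human_path), 3))
    print(sequence_weight(lgb2_luplu_path))
    print(percent_identity_to_distance(75.0))
```

With the rounded branch lengths quoted in the text, the Hbb_Human sum comes out at roughly 0.22, close to the stated total of 0.221; the small difference is presumably due to rounding of the printed branch lengths.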
