This is just an ASCII text version of the manuscript describing Clustal W, without the figures.  It was published:

Nucleic Acids Research, 22(22):4673-4680.

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice.

Julie D. Thompson, Desmond G. Higgins1 and Toby J. Gibson*

European Molecular Biology Laboratory
Postfach 102209
Meyerhofstrasse 1
D-69012 Heidelberg
Germany

Phone:      +49-6221-387398
Fax:        +49-6221-387306
E-mail:     Gibson@EMBL-Heidelberg.DE
            Des.Higgins@EBI.AC.UK
            Thompson@EMBL-Heidelberg.DE

Keywords:   Multiple alignment, phylogenetic tree, weight matrix, gap penalty, dynamic programming, sequence weighting.

1 Current address: European Bioinformatics Institute
Hinxton Hall
Hinxton
Cambridge CB10 1RQ
UK.

* To whom correspondence should be addressed

ABSTRACT

The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences.  Firstly, individual weights are assigned to each sequence in a partial alignment in order to downweight near-duplicate sequences and upweight the most divergent ones.  Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned.  Thirdly, residue specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure.  Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions.  These modifications are incorporated into a new program, CLUSTAL W which is freely available.

INTRODUCTION

The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology.
Multiple alignments are used to find diagnostic patterns to characterise protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of new sequences; to suggest oligonucleotide primers for PCR; as an essential prelude to molecular evolutionary analysis.  The rate of appearance of new sequence data is steadily increasing and the development of efficient and accurate automatic methods for multiple alignment is, therefore, of major importance.  The majority of automatic multiple alignments are now carried out using the "progressive" approach of Feng and Doolittle (1).  In this paper, we describe a number of improvements to the progressive multiple alignment method which greatly improve the sensitivity without sacrificing any of the speed and efficiency which makes this approach so practical.  The new methods are made available in a program called CLUSTAL W which is freely available and portable to a wide variety of computers and operating systems.

In order to align just two sequences, it is standard practice to use dynamic programming (2).  This guarantees a mathematically optimal alignment, given a table of scores for matches and mismatches between all amino acids or nucleotides (e.g. the PAM250 matrix (3) or BLOSUM62 matrix (4)) and penalties for insertions or deletions of different lengths.  Attempts at generalising dynamic programming to multiple alignments are limited to small numbers of short sequences (5).  For much more than eight or so proteins of average length, the problem is uncomputable given current computer power.  Therefore, all of the methods capable of handling larger problems in practical timescales, make use of heuristics.  Currently, the most widely used approach is to exploit the fact that homologous sequences are evolutionarily related.
One can build up a multiple alignment progressively by a series of pairwise alignments, following the branching order in a phylogenetic tree (1).  One first aligns the most closely related sequences, gradually adding in the more distant ones.  This approach is sufficiently fast to allow alignments of virtually any size.  Further, in simple cases, the quality of the alignments is excellent, as judged by the ability to correctly align corresponding domains from sequences of known secondary or tertiary structure (6).  In more difficult cases, the alignments give good starting points for further automatic or manual refinement.

This approach works well when the data set consists of sequences of different degrees of divergence.  Pairwise alignment of very closely related sequences can be carried out very accurately.  The correct answer may often be obtained using a wide range of parameter values (gap penalties and weight matrix).  By the time the most distantly related sequences are aligned, one already has a sample of aligned sequences which gives important information about the variability at each position.  The positions of the gaps that were introduced during the early alignments of the closely related sequences are not changed as new sequences are added.  This is justified because the placement of gaps in alignments between closely related sequences is much more accurate than between distantly related ones.  When all of the sequences are highly divergent (e.g. less than approximately 25-30% identity between any pair of sequences), this progressive approach becomes much less reliable.

There are two major problems with the progressive approach:  the local minimum problem and the choice of alignment parameters.  The local minimum problem stems from the "greedy" nature of the alignment strategy.  The algorithm greedily adds sequences together, following the initial tree.
There is no guarantee that the global optimal solution, as defined by some overall measure of multiple alignment quality (7,8), or anything close to it, will be found.  More specifically, any mistakes (misaligned regions) made early in the alignment process cannot be corrected later as new information from other sequences is added.  This problem is frequently thought of as mainly resulting from an incorrect branching order in the initial tree.  The initial trees are derived from a matrix of distances between separately aligned pairs of sequences and are much less reliable than trees from complete multiple alignments.  In our experience, however, the real problem is caused simply by errors in the initial alignments.  Even if the topology of the guide tree is correct, each alignment step in the multiple alignment process may have some percentage of the residues misaligned.  This percentage will be very low on average for very closely related sequences but will increase as sequences diverge.  It is these misalignments which carry through from the early alignment steps that cause the local minimum problem.  The only way to correct this is to use an iterative or stochastic sampling procedure (e.g. 7,9,10).  We do not directly address this problem in this paper.

The alignment parameter choice problem is, in our view, at least as serious as the local minimum problem.  Stochastic or iterative algorithms will be just as badly affected as progressive ones if the parameters are inappropriate: they will arrive at a false global minimum.  Traditionally, one chooses one weight matrix and two gap penalties (one for opening a new gap and one for extending an existing gap) and hopes that these will work well over all parts of all the sequences in the data set.  When the sequences are all closely related, this works.  The first reason is that virtually all residue weight matrices give most weight to identities.
When identities dominate an alignment, almost any weight matrix will find approximately the correct solution.  With very divergent sequences, however, the scores given to non-identical residues will become critically important; there will be more mismatches than identities.  Different weight matrices will be optimal at different evolutionary distances or for different classes of proteins.  The second reason is that the range of gap penalty values that will find the correct or best possible solution can be very broad for highly similar sequences (11).  As more and more divergent sequences are used, however, the exact values of the gap penalties become important for success.  In each case, there may be a very narrow range of values which will deliver the best alignment.  Further, in protein alignments, gaps do not occur randomly (i.e. with equal probability at all positions).  They occur far more often between the major secondary structural elements of alpha helices and beta strands than within (12).

The major improvements described in this paper attempt to address the alignment parameter choice problem.  We dynamically vary the gap penalties in a position and residue specific manner.  The observed relative frequencies of gaps adjacent to each of the 20 amino acids (12) are used to locally adjust the gap opening penalty after each residue.  Short stretches of hydrophilic residues (e.g. 5 or more) usually indicate loop or random coil regions and the gap opening penalties are locally reduced in these stretches.  In addition, the locations of the gaps found in the early alignments are also given reduced gap opening penalties.  It has been observed in alignments between sequences of known structure that gaps tend not to be closer than roughly eight residues on average (12).  We increase the gap opening penalty within eight residues of existing gaps.
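
The position specific gap opening penalties described above can be illustrated with a rough sketch.  This is not the CLUSTAL W implementation: the hydrophilic residue set, the base penalty, every scaling factor and the order in which the rules are applied are assumptions chosen only to make the idea concrete.  The residue specific adjustment from observed gap frequencies is left out because the frequency table is not given in this part of the text.

```python
# Illustrative sketch only; all constants below are hypothetical.
HYDROPHILIC = set("DEGKNQPRS")  # assumed residue set, not taken from this text


def hydrophilic_run_mask(seq, min_run=5):
    """Mark positions that lie inside a run of >= min_run hydrophilic residues."""
    mask = [False] * len(seq)
    start = None
    for i, ch in enumerate(seq + "#"):  # sentinel closes a run at the end
        if ch in HYDROPHILIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask


def gap_open_penalties(aligned_seq, base_gop=10.0, min_run=5, window=8,
                       hydrophilic_factor=0.5, gap_site_factor=0.25,
                       near_gap_factor=2.0):
    """Return one gap opening penalty per column of an aligned sequence.

    Rules (precedence is an assumption):
      1. a column that already holds a gap ('-') gets a reduced penalty;
      2. a column within `window` residues of an existing gap gets an
         increased penalty;
      3. a column inside a hydrophilic stretch gets a reduced penalty;
      4. every other column keeps the base penalty.
    """
    gap_positions = [i for i, ch in enumerate(aligned_seq) if ch == "-"]
    in_loop = hydrophilic_run_mask(aligned_seq, min_run)
    penalties = []
    for i, ch in enumerate(aligned_seq):
        if ch == "-":
            p = base_gop * gap_site_factor
        elif any(abs(i - g) <= window for g in gap_positions):
            p = base_gop * near_gap_factor
        elif in_loop[i]:
            p = base_gop * hydrophilic_factor
        else:
            p = base_gop
        penalties.append(p)
    return penalties


# Hypothetical aligned sequence: a 5-residue hydrophilic stretch and one gap.
example = "AAAA" + "KDEGS" + "A" * 12 + "-" + "AAA"
penalties = gap_open_penalties(example)
print(list(zip(example, penalties)))
```

In this toy input the hydrophilic stretch receives the reduced penalty, the gap column receives the lowest penalty, and the eight columns on either side of the gap receive the raised penalty.
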
The two main series of amino acid weight matrices that are used today are the PAM series (3) and the BLOSUM series (4).  In each case, there is a range of matrices to choose from.  Some matrices are appropriate for aligning very closely related sequences where most weight by far is given to identities, with only the most frequent conservative substitutions receiving high scores.  Other matrices work better at greater evolutionary distances where less importance is attached to identities (13).  We choose different weight matrices, as the alignment proceeds, depending on the estimated divergence of the sequences to be aligned at each stage.

Sequences are weighted to correct for unequal sampling across all evolutionary distances in the data set (14).  This downweights sequences that are very similar to other sequences in the data set and upweights the most divergent ones.  The weights are calculated directly from the branch lengths in the initial guide tree (15).  Sequence weighting has already been shown to be effective in improving the sensitivity of profile searches (15,16).

In the original CLUSTAL programs (17-19), the initial guide trees, used to guide the multiple alignment, were calculated using the UPGMA method (20).  We now use the Neighbour-Joining method (21) which is more robust against the effects of unequal evolutionary rates in different lineages and which gives better estimates of individual branch lengths.  This is useful because it is these branch lengths which are used to derive the sequence weights.  We also allow users to choose between fast approximate alignments (22) and full dynamic programming for the distance calculations used to make the guide tree.

These improvements dramatically increase the sensitivity of the progressive alignment method for difficult alignments involving highly diverged sequences.
We show one very demanding test case of over 60 SH3 domains (23) which includes sequence pairs with as little as 12% identity and where there is only one exactly conserved residue across all of the sequences.  Using default parameters, we can achieve an alignment that is almost exactly correct, according to available structural information (24).  Using the program in a wide variety of situations, we find that it will normally find the correct alignment, in all but the most difficult and pathological of cases.

MATERIAL AND METHODS

The basic alignment method

The basic multiple alignment algorithm consists of three main stages: 1) all pairs of sequences are aligned separately in order to calculate a distance matrix giving the divergence of each pair of sequences; 2) a guide tree is calculated from the distance matrix; 3) the sequences are progressively aligned according to the branching order in the guide tree.  An example using 7 globin sequences of known tertiary structure (25) is given in figure 1.

1) The distance matrix/pairwise alignments

In the original CLUSTAL programs, the pairwise distances were calculated using a fast approximate method (22).  This allows very large numbers of sequences to be aligned, even on a microcomputer.  The scores are calculated as the number of k-tuple matches (runs of identical residues, typically 1 or 2 long for proteins or 2 to 4 long for nucleotide sequences) in the best alignment between two sequences minus a fixed penalty for every gap.  We now offer a choice between this method and the slower but more accurate scores from full dynamic programming alignments using two gap penalties (for opening or extending gaps) and a full amino acid weight matrix.  These scores are calculated as the number of identities in the best alignment divided by the number of residues compared (gap positions are excluded).
Both of these scores are initially calculated as percent identity scores and are converted to distances by dividing by 100 and subtracting from 1.0 to give number of differences per site.  We do not correct for multiple substitutions in these initial distances.  In figure 1 we give the 7x7 distance matrix between the 7 globin sequences calculated using the full dynamic programming method.

2) The guide tree

The trees used to guide the final multiple alignment process are calculated from the distance matrix of step 1 using the Neighbour-Joining method (21).  This produces unrooted trees with branch lengths proportional to estimated divergence along each branch.  The root is placed by a "mid-point" method (15) at a position where the means of the branch lengths on either side of the root are equal.  These trees are also used to derive a weight for each sequence (15).  The weights are dependent upon the distance from the root of the tree but sequences which have a common branch with other sequences share the weight derived from the shared branch.  In the example in figure 1, the leghaemoglobin (Lgb2_Luplu) gets a weight of 0.442 which is equal to the length of the branch from the root to it.  The Human beta globin (Hbb_Human) gets a weight consisting of the length of the branch leading to it that is not shared with any other sequences (0.081) plus half the length of the branch shared with the horse beta globin (0.226/2) plus one quarter the length of the branch shared by all four haemoglobins (0.061/4) plus one fifth the branch shared between the haemoglobins and the myoglobin (0.015/5) plus one sixth the branch leading to all the vertebrate globins (0.062/6).  This sums to a total of 0.221.  By contrast, in the normal progressive alignment algorithm, all sequences would be equally weighted.  The rooted tree with branch lengths and sequence weights for the 7 globins is given in figure 1.
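
The arithmetic in steps 1 and 2 can be checked with a short sketch.  It assumes a pairwise alignment is supplied as two equal-length strings with '-' marking gaps, and that the path from the root to a sequence is supplied as (branch length, number of sequences sharing that branch) pairs.  The function names and the toy alignment are hypothetical; only the branch lengths are taken from the text above.

```python
def percent_identity(aligned_a, aligned_b):
    """Identities divided by residues compared, as a percentage.

    Columns with a gap in either sequence are excluded from the comparison.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identities = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identities += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identities / compared


def identity_to_distance(pct_identity):
    """Divide by 100 and subtract from 1.0: differences per site, uncorrected."""
    return 1.0 - pct_identity / 100.0


def sequence_weight(path):
    """Sum each branch length divided by the number of sequences sharing it."""
    return sum(length / n_sharing for length, n_sharing in path)


# Branch lengths quoted in the text, listed from the root down to the leaf.
HBB_HUMAN_PATH = [(0.062, 6), (0.015, 5), (0.061, 4), (0.226, 2), (0.081, 1)]
LGB2_LUPLU_PATH = [(0.442, 1)]

# Hypothetical toy alignment: 5 ungapped columns, 4 identities.
pid = percent_identity("ACD-EF", "ACDGEY")
print(pid, identity_to_distance(pid))
print(sequence_weight(HBB_HUMAN_PATH), sequence_weight(LGB2_LUPLU_PATH))
```

With the rounded branch lengths printed in the text, the Hbb_Human sum comes to about 0.223 rather than the reported 0.221; the small difference is presumably due to rounding of the quoted branch lengths.
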
