This is just an ASCII text version of the manuscript describing Clustal W, without the figures.  It was published: Nucleic Acids Research, 22(22):4673-4680.

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice.

Julie D. Thompson, Desmond G. Higgins1 and Toby J. Gibson*

European Molecular Biology Laboratory
Postfach 102209
Meyerhofstrasse 1
D-69012 Heidelberg
Germany

Phone:		+49-6221-387398
Fax:		+49-6221-387306
E-mail:		Gibson@EMBL-Heidelberg.DE
		Des.Higgins@EBI.AC.UK
		Thompson@EMBL-Heidelberg.DE

Keywords:	Multiple alignment, phylogenetic tree, weight matrix, gap penalty, dynamic programming, sequence weighting.

1 Current address: European Bioinformatics Institute
Hinxton Hall
Hinxton
Cambridge CB10 1RQ
UK.

* To whom correspondence should be addressed

ABSTRACT

The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences.  Firstly, individual weights are assigned to each sequence in a partial alignment in order to downweight near-duplicate sequences and upweight the most divergent ones.  Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned.  Thirdly, residue specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure.  Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions.  These modifications are incorporated into a new program, CLUSTAL W which is freely available.

INTRODUCTION

The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology.
Multiple alignments are used to find diagnostic patterns to characterise protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of new sequences; to suggest oligonucleotide primers for PCR; and as an essential prelude to molecular evolutionary analysis.  The rate of appearance of new sequence data is steadily increasing and the development of efficient and accurate automatic methods for multiple alignment is, therefore, of major importance.  The majority of automatic multiple alignments are now carried out using the "progressive" approach of Feng and Doolittle (1).  In this paper, we describe a number of improvements to the progressive multiple alignment method which greatly improve the sensitivity without sacrificing any of the speed and efficiency which make this approach so practical.  The new methods are made available in a program called CLUSTAL W which is freely available and portable to a wide variety of computers and operating systems.

In order to align just two sequences, it is standard practice to use dynamic programming (2).  This guarantees a mathematically optimal alignment, given a table of scores for matches and mismatches between all amino acids or nucleotides (e.g. the PAM250 matrix (3) or BLOSUM62 matrix (4)) and penalties for insertions or deletions of different lengths.  Attempts at generalising dynamic programming to multiple alignments are limited to small numbers of short sequences (5).  For much more than eight or so proteins of average length, the problem is uncomputable given current computer power.  Therefore, all of the methods capable of handling larger problems in practical timescales make use of heuristics.  Currently, the most widely used approach is to exploit the fact that homologous sequences are evolutionarily related.
One can build up a multiple alignment progressively by a series of pairwise alignments, following the branching order in a phylogenetic tree (1).  One first aligns the most closely related sequences, gradually adding in the more distant ones.  This approach is sufficiently fast to allow alignments of virtually any size.  Further, in simple cases, the quality of the alignments is excellent, as judged by the ability to correctly align corresponding domains from sequences of known secondary or tertiary structure (6).  In more difficult cases, the alignments give good starting points for further automatic or manual refinement.

This approach works well when the data set consists of sequences of different degrees of divergence.  Pairwise alignment of very closely related sequences can be carried out very accurately.  The correct answer may often be obtained using a wide range of parameter values (gap penalties and weight matrix).  By the time the most distantly related sequences are aligned, one already has a sample of aligned sequences which gives important information about the variability at each position.  The positions of the gaps that were introduced during the early alignments of the closely related sequences are not changed as new sequences are added.  This is justified because the placement of gaps in alignments between closely related sequences is much more accurate than between distantly related ones.  When all of the sequences are highly divergent (e.g. less than approximately 25-30% identity between any pair of sequences), this progressive approach becomes much less reliable.

There are two major problems with the progressive approach:  the local minimum problem and the choice of alignment parameters.  The local minimum problem stems from the "greedy" nature of the alignment strategy.  The algorithm greedily adds sequences together, following the initial tree.
There is no guarantee that the globally optimal solution, as defined by some overall measure of multiple alignment quality (7,8), or anything close to it, will be found.  More specifically, any mistakes (misaligned regions) made early in the alignment process cannot be corrected later as new information from other sequences is added.  This problem is frequently thought of as mainly resulting from an incorrect branching order in the initial tree.  The initial trees are derived from a matrix of distances between separately aligned pairs of sequences and are much less reliable than trees from complete multiple alignments.  In our experience, however, the real problem is caused simply by errors in the initial alignments.  Even if the topology of the guide tree is correct, each alignment step in the multiple alignment process may have some percentage of the residues misaligned.  This percentage will be very low on average for very closely related sequences but will increase as sequences diverge.  It is these misalignments which carry through from the early alignment steps that cause the local minimum problem.  The only way to correct this is to use an iterative or stochastic sampling procedure (e.g. 7,9,10).  We do not directly address this problem in this paper.

The alignment parameter choice problem is, in our view, at least as serious as the local minimum problem.  Stochastic or iterative algorithms will be just as badly affected as progressive ones if the parameters are inappropriate: they will arrive at a false global minimum.  Traditionally, one chooses one weight matrix and two gap penalties (one for opening a new gap and one for extending an existing gap) and hopes that these will work well over all parts of all the sequences in the data set.  When the sequences are all closely related, this works.  The first reason is that virtually all residue weight matrices give most weight to identities.
When identities dominate an alignment, almost any weight matrix will find approximately the correct solution.  With very divergent sequences, however, the scores given to non-identical residues will become critically important; there will be more mismatches than identities.  Different weight matrices will be optimal at different evolutionary distances or for different classes of proteins.  The second reason is that the range of gap penalty values that will find the correct or best possible solution can be very broad for highly similar sequences (11).  As more and more divergent sequences are used, however, the exact values of the gap penalties become important for success.  In each case, there may be a very narrow range of values which will deliver the best alignment.  Further, in protein alignments, gaps do not occur randomly (i.e. with equal probability at all positions).  They occur far more often between the major secondary structural elements of alpha helices and beta strands than within them (12).

The major improvements described in this paper attempt to address the alignment parameter choice problem.  We dynamically vary the gap penalties in a position and residue specific manner.  The observed relative frequencies of gaps adjacent to each of the 20 amino acids (12) are used to locally adjust the gap opening penalty after each residue.  Short stretches of hydrophilic residues (e.g. 5 or more) usually indicate loop or random coil regions and the gap opening penalties are locally reduced in these stretches.  In addition, the locations of the gaps found in the early alignments are also given reduced gap opening penalties.  It has been observed in alignments between sequences of known structure that gaps tend not to be closer than roughly eight residues on average (12).  We increase the gap opening penalty within eight residues of existing gaps.
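The position-specific adjustment of the gap opening penalty described above can be sketched roughly as follows.  This is only an illustration under stated assumptions, not the program's actual rules: the multipliers, the hydrophilic residue set, the small residue table and the order in which the rules are applied are all placeholders chosen for the example, and the function names are hypothetical.  Only the run length of 5 and the window of eight residues come from the text.

```python
from typing import List, Set

HYDROPHILIC = set("GPSNDQEKR")  # assumed residue set, not given in this section
MIN_RUN = 5                     # "5 or more" hydrophilic residues
GAP_WINDOW = 8                  # "within eight residues of existing gaps"

# Placeholder multipliers for illustration only (NOT published values).
AT_GAP_FACTOR = 0.5        # reduced penalty at columns that already hold a gap
NEAR_GAP_FACTOR = 2.0      # increased penalty close to an existing gap
HYDROPHILIC_FACTOR = 0.67  # reduced penalty inside a hydrophilic stretch
RESIDUE_FACTOR = {"G": 0.8, "P": 0.8, "W": 1.2}  # hypothetical; others use 1.0


def hydrophilic_columns(seq: str) -> Set[int]:
    """Columns covered by a run of at least MIN_RUN hydrophilic residues."""
    cols: Set[int] = set()
    run: List[int] = []
    for i, ch in enumerate(seq + "#"):  # sentinel closes a run at the end
        if ch in HYDROPHILIC:
            run.append(i)
        else:
            if len(run) >= MIN_RUN:
                cols.update(run)
            run = []
    return cols


def gap_open_penalties(alignment: List[str], base_gop: float) -> List[float]:
    """Return one gap opening penalty per column of an existing alignment."""
    length = len(alignment[0])
    gap_cols = {i for i in range(length) if any(s[i] == "-" for s in alignment)}
    hydro_cols: Set[int] = set()
    for s in alignment:
        hydro_cols |= hydrophilic_columns(s)

    penalties = []
    for i in range(length):
        if i in gap_cols:
            factor = AT_GAP_FACTOR
        elif any(abs(i - g) <= GAP_WINDOW for g in gap_cols):
            factor = NEAR_GAP_FACTOR
        elif i in hydro_cols:
            factor = HYDROPHILIC_FACTOR
        else:
            # Average the residue-specific factors over the column.
            factors = [RESIDUE_FACTOR.get(s[i], 1.0) for s in alignment]
            factor = sum(factors) / len(factors)
        penalties.append(base_gop * factor)
    return penalties


if __name__ == "__main__":
    row = "AAAA-AAAAAAAAAASNDQEAAAAAWAAAA"
    for col, gop in enumerate(gap_open_penalties([row], 10.0)):
        print(col, row[col], gop)
```

The precedence used here (existing gap, then proximity to a gap, then hydrophilic stretch, then residue table) is one possible choice; the text in this section does not say how the rules combine.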
The two main series of amino acid weight matrices that are used today are the PAM series (3) and the BLOSUM series (4).  In each case, there is a range of matrices to choose from.  Some matrices are appropriate for aligning very closely related sequences where most weight by far is given to identities, with only the most frequent conservative substitutions receiving high scores.  Other matrices work better at greater evolutionary distances where less importance is attached to identities (13).  We choose different weight matrices, as the alignment proceeds, depending on the estimated divergence of the sequences to be aligned at each stage.

Sequences are weighted to correct for unequal sampling across all evolutionary distances in the data set (14).  This downweights sequences that are very similar to other sequences in the data set and upweights the most divergent ones.  The weights are calculated directly from the branch lengths in the initial guide tree (15).  Sequence weighting has already been shown to be effective in improving the sensitivity of profile searches (15,16).

In the original CLUSTAL programs (17-19), the initial guide trees, used to guide the multiple alignment, were calculated using the UPGMA method (20).  We now use the Neighbour-Joining method (21) which is more robust against the effects of unequal evolutionary rates in different lineages and which gives better estimates of individual branch lengths.  This is useful because it is these branch lengths which are used to derive the sequence weights.  We also allow users to choose between fast approximate alignments (22) or full dynamic programming for the distance calculations used to make the guide tree.

These improvements dramatically increase the sensitivity of the progressive alignment method for difficult alignments involving highly diverged sequences.
We show one very demanding test case of over 60 SH3 domains (23) which includes sequence pairs with as little as 12% identity and where there is only one exactly conserved residue across all of the sequences.  Using default parameters, we can achieve an alignment that is almost exactly correct, according to available structural information (24).  Using the program in a wide variety of situations, we find that it will normally find the correct alignment, in all but the most difficult and pathological of cases.

MATERIAL AND METHODS

The basic alignment method

The basic multiple alignment algorithm consists of three main stages: 1) all pairs of sequences are aligned separately in order to calculate a distance matrix giving the divergence of each pair of sequences; 2) a guide tree is calculated from the distance matrix; 3) the sequences are progressively aligned according to the branching order in the guide tree.  An example using 7 globin sequences of known tertiary structure (25) is given in figure 1.

1) The distance matrix/pairwise alignments

In the original CLUSTAL programs, the pairwise distances were calculated using a fast approximate method (22).  This allows very large numbers of sequences to be aligned, even on a microcomputer.  The scores are calculated as the number of k-tuple matches (runs of identical residues, typically 1 or 2 long for proteins or 2 to 4 long for nucleotide sequences) in the best alignment between two sequences minus a fixed penalty for every gap.  We now offer a choice between this method and the slower but more accurate scores from full dynamic programming alignments using two gap penalties (for opening or extending gaps) and a full amino acid weight matrix.  These scores are calculated as the number of identities in the best alignment divided by the number of residues compared (gap positions are excluded).
Both of these scores are initially calculated as percent identity scores and are converted to distances by dividing by 100 and subtracting from 1.0 to give number of differences per site.  We do not correct for multiple substitutions in these initial distances.  In figure 1 we give the 7x7 distance matrix between the 7 globin sequences calculated using the full dynamic programming method.

2) The guide tree

The trees used to guide the final multiple alignment process are calculated from the distance matrix of step 1 using the Neighbour-Joining method (21).  This produces unrooted trees with branch lengths proportional to estimated divergence along each branch.  The root is placed by a "mid-point" method (15) at a position where the means of the branch lengths on either side of the root are equal.  These trees are also used to derive a weight for each sequence (15).  The weights are dependent upon the distance from the root of the tree but sequences which have a common branch with other sequences share the weight derived from the shared branch.  In the example in figure 1, the leghaemoglobin (Lgb2_Luplu) gets a weight of 0.442 which is equal to the length of the branch from the root to it.  The Human beta globin (Hbb_Human) gets a weight consisting of the length of the branch leading to it that is not shared with any other sequences (0.081) plus half the length of the branch shared with the horse beta globin (0.226/2) plus one quarter the length of the branch shared by all four haemoglobins (0.061/4) plus one fifth the branch shared between the haemoglobins and the myoglobin (0.015/5) plus one sixth the branch leading to all the vertebrate globins (0.062/6).  This sums to a total of 0.221.  By contrast, in the normal progressive alignment algorithm, all sequences would be equally weighted.  The rooted tree with branch lengths and sequence weights for the 7 globins is given in figure 1.
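The two calculations described above, the conversion of a percent identity score to a distance and the derivation of a sequence weight from shared branch lengths, can be sketched as follows.  This is a minimal illustration, not the program's code: the function names are hypothetical, and each sequence is represented simply as the list of branches on its path to the root, paired with the number of sequences sharing each branch.  The branch lengths are the ones quoted in the text for Hbb_Human and Lgb2_Luplu.

```python
from typing import List, Tuple


def percent_identity_to_distance(percent_identity: float) -> float:
    """Divide by 100 and subtract from 1.0 (differences per site, uncorrected)."""
    return 1.0 - percent_identity / 100.0


def sequence_weight(path: List[Tuple[float, int]]) -> float:
    """Sum branch_length / n_sharing over the branches between a leaf and the root.

    Each entry is (branch length, number of sequences that share the branch).
    """
    return sum(length / n_sharing for length, n_sharing in path)


# Paths taken from the worked example in the text.
hbb_human_path = [
    (0.081, 1),  # branch leading to Hbb_Human alone
    (0.226, 2),  # shared with the horse beta globin
    (0.061, 4),  # shared by all four haemoglobins
    (0.015, 5),  # shared between the haemoglobins and the myoglobin
    (0.062, 6),  # leading to all the vertebrate globins
]
lgb2_luplu_path = [(0.442, 1)]  # single branch from the root

if __name__ == "__main__":
    print("Hbb_Human :", round(sequence_weight(hbb_human_path), 3))
    print("Lgb2_Luplu:", sequence_weight(lgb2_luplu_path))
    print("distance at 25% identity:", percent_identity_to_distance(25.0))
```

With the branch lengths exactly as printed, the Hbb_Human sum comes to about 0.223 rather than the 0.221 quoted; the small difference presumably reflects rounding of the branch lengths shown.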
