?? gsi-format.tex
字號:
% Mon Dec 5 15:23:18 1994\section{GSI format}{\tt GSI} (``generic sequence index'') is a format for indexingsequence databases. Database retrieval programs such as {\tt sfetch}can read GSI files when they are available to enable fast retrieval ofa sequence from large databases.GSI files are created from sequence databases by Perl scripts.Scripts are currently provided for indexing GenBank, SwissProt,GenPept, FASTA, and PIR formatted databases.\subsection {GSI programmatic details}A single GSI file indexes one or more files in a sequence database.It is a binary file consisting of a number of fixed-length records.There are three types of records: one header record, one file recordfor every file in the database, and one keyword record for everysequence retrieval key. (The retrieval key is usually the sequencename, but may also be a database accession number.) Every GSI record is 38 bytes long and contains three fields: 32 bytesof text (31 bytes plus a trailing NUL byte), a 2 byte network short,and a 4 byte network long. (``Network short'' and ``network long''refer to portable integer variables of fixed size and byte order. SeePerl manuals for a few more details.)The first record is a header. It contains a short identifying textstring (``GSI''), then the number of files indexed ({\tt nfiles}), andthe number of keywords indexed ({\tt nkeys}).The next {\tt nfiles} records (records 1..{\tt nfiles}) map filenumbers onto file names. The three fields are \verb+<filename> <filenumber> <file format>+. These records must be in numerical orderaccording to their file numbers. Because of the 31-characterrestriction on filename lengths, the sequence files will generallyhave to be in the same directory as the GSI index file. The fileformat number is defined in {\tt squid.h}:\begin{tabular}{rl}0 & Unknown \\ 1 & Intelligenetics\\ 2 & Genbank\\ 4 & EMBL\\ 5 & GCG single sequence\\ 6 & Strider \\ 7 & FASTA\\ 8 & Zuker\\ 9 & Idraw\\ 12 & PIR\\ 13 & Raw\\ 14 & SQUID\\ 16 & GCG data library \\ 101& Stockholm alignment\\102& SELEX alignment\\103& GCG MSF alignment\\104& Clustal alignment\\105& A2M (aligned FASTA) alignment\\106& Phylip\\\end{tabular}The remaining records ({\tt nfiles}+1..{\tt nfiles+nkeys}) are formapping keys onto files and disk offsets. The three fields are\verb+<keyword> <file number> <disk offset>+. These records must besorted in alphabetic order by their retrieval keys, because thefunction GSIGetOffset() locates a keyword in the index file by abinary search.\subsection{Relevant functions}\begin{description}\item[GSIOpen()] Opens a GSI index file.\item[GSIGetRecord()] Gets three fields from the current record.\item[GSIGetOffset()] Looks up a keyword in a GSI index and returns a filename, file format, and disk offset in the file.\item[SeqfilePosition()] Repositions an open sequence file to a given disk offset.\item[GSIClose()] Closes an open GSI index file.\end{description}
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -