<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_07) on Sun May 06 17:59:52 GMT 2007 --><TITLE>PieceTable (Heritrix 1.12.1)</TITLE><META NAME="keywords" CONTENT="org.archive.util.ms.PieceTable class"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){ parent.document.title="PieceTable (Heritrix 1.12.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY=""> <TR ALIGN="center" VALIGN="top"> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A> </TD> <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="class-use/PieceTable.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A 
HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD> </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../org/archive/util/ms/PieceReader.html" title="class in org.archive.util.ms"><B>PREV CLASS</B></A> NEXT CLASS</FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../index.html?org/archive/util/ms/PieceTable.html" target="_top"><B>FRAMES</B></A> <A HREF="PieceTable.html" target="_top"><B>NO FRAMES</B></A> <SCRIPT type="text/javascript"> <!-- if(window==top) { document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>'); } //--></SCRIPT><NOSCRIPT> <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2"> SUMMARY: NESTED | <A HREF="#field_summary">FIELD</A> | <A HREF="#constructor_summary">CONSTR</A> | <A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL: <A HREF="#field_detail">FIELD</A> | <A HREF="#constructor_detail">CONSTR</A> | <A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.util.ms</FONT><BR>Class PieceTable</H2><PRE>java.lang.Object <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><B>org.archive.util.ms.PieceTable</B></PRE><HR><DL><DT><PRE> class <B>PieceTable</B><DT>extends java.lang.Object</DL></PRE><P>The piece table of a .doc file. <p>The piece table maps logical character positions of a document's text stream to actual file stream positions. The piece table is stored as two parallel arrays. The first array contains 32-bit integers representing the logical character positions. 
The second array contains 64-bit data structures that are mostly mysterious to me, except that they contain a 32-bit subfile offset. The second array is stored immediately after the first array. I call the first array the <i>charPos</i> array and the second array the <i>filePos</i> array. <p>The arrays are preceded by a special tag byte (2) and then by the combined size of both arrays in bytes. The number of piece table entries must be deduced from this byte size. <p>Because of this bizarre structure, caching piece table entries is something of a challenge. A single piece table entry is actually located in two different file locations. If there are many piece table entries, then the charPos and filePos information may be separated by many bytes, potentially crossing block boundaries. The approach I took was to use two different buffered streams. Up to n charPos offsets and n filePos structures can be buffered in the two streams, so that looking up piece information requires no file seeking. (File seeking must still occur to jump from one piece to the next.) <p>Note that the vast majority of .doc files in the world will have exactly 1 piece table entry, representing the complete text of the document. Only those documents that were "fast-saved" should have multiple pieces. <p>Finally, the text in a .doc file can consist of either 16-bit Unicode characters (charset UTF-16LE) or 8-bit Cp1252 characters. One .doc file can contain both kinds of pieces. Whether or not a piece is Cp1252 is stored as a flag in the filePos value, bizarrely enough. 
If the flag is set, then the actual file position is the filePos with the flag cleared, then divided by 2.<P><P><DL><DT><B>Author:</B></DT> <DD>pjack</DD></DL><HR><P><!-- =========== FIELD SUMMARY =========== --><A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Field Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) static int</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/util/ms/PieceTable.html#CP1252_INDICATOR">CP1252_INDICATOR</A></B></CODE><BR> The bit that indicates if a piece uses Cp1252 or unicode.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) static int</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/util/ms/PieceTable.html#CP1252_MASK">CP1252_MASK</A></B></CODE><BR> The mask to use to clear the Cp1252 flag bit.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) static java.util.logging.Logger</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/util/ms/PieceTable.html#LOGGER">LOGGER</A></B></CODE><BR> </TD></TR></TABLE> <!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/util/ms/PieceTable.html#PieceTable(org.archive.io.SeekInputStream, int, int, int)">PieceTable</A></B>(<A HREF="../../../../org/archive/io/SeekInputStream.html" title="class in 
org.archive.io">SeekInputStream</A> tableStream, int offset, int maxCharPos, int cachedRecords)</CODE><BR> Constructor.</TD></TR></TABLE> <!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> int</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/util/ms/PieceTable.html#getMaxCharPos()">getMaxCharPos</A></B>()</CODE><BR> Returns the maximum character position.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> <A HREF="../../../../org/archive/util/ms/Piece.html" title="class in org.archive.util.ms">Piece</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/util/ms/PieceTable.html#next()">next</A></B>()</CODE>
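<P>The layout and flag arithmetic in the class description above can be illustrated with a minimal sketch. This is not the Heritrix implementation. The class name <CODE>PieceTableSketch</CODE> and its helper methods are hypothetical, and several details are assumptions that this page does not state: the values of <CODE>CP1252_INDICATOR</CODE> and <CODE>CP1252_MASK</CODE>, little-endian byte order, n + 1 charPos entries for n pieces, and the position of the 32-bit offset inside each 64-bit filePos structure.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PieceTableSketch {
    // Assumed values: the page names these constants but does not list them.
    static final int CP1252_INDICATOR = 1 << 30;
    static final int CP1252_MASK = ~(3 << 30);

    // Assumption: n pieces take n + 1 charPos ints (4 bytes each)
    // plus n filePos structures (8 bytes each).
    static int pieceCount(int sizeInBytes) {
        return (sizeInBytes - 4) / 12;
    }

    // True if the flag bit marks the piece as Cp1252 text.
    static boolean isCp1252(int filePos) {
        return (filePos & CP1252_INDICATOR) != 0;
    }

    // Flag set: clear the flag, then divide by 2. Flag clear: use the value as is.
    static int actualFilePos(int filePos) {
        return isCp1252(filePos) ? (filePos & CP1252_MASK) / 2 : filePos;
    }

    // Reads the raw filePos value of piece i from a table laid out as:
    // tag byte (2), int size, charPos array, filePos array.
    // Assumption: the 32-bit offset sits 2 bytes into each 8-byte structure.
    static int rawFilePos(ByteBuffer table, int i) {
        ByteBuffer b = table.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int size = b.getInt(1);                 // size follows the tag byte
        int n = pieceCount(size);
        int filePosStart = 5 + 4 * (n + 1);     // skip tag, size and charPos array
        return b.getInt(filePosStart + 8 * i + 2);
    }

    // Builds a synthetic two-piece table for demonstration only.
    static ByteBuffer sampleTable() {
        ByteBuffer b = ByteBuffer.allocate(5 + 28).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 2);                        // tag byte
        b.putInt(28);                           // 3 charPos ints + 2 filePos structures
        b.putInt(0).putInt(10).putInt(25);      // charPos array
        // Piece 0: Unicode text at file offset 1024.
        b.putShort((short) 0).putInt(1024).putShort((short) 0);
        // Piece 1: Cp1252 text at file offset 2048, stored doubled with the flag set.
        b.putShort((short) 0).putInt((2048 * 2) | CP1252_INDICATOR).putShort((short) 0);
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer table = sampleTable();
        for (int i = 0; i < 2; i++) {
            int raw = rawFilePos(table, i);
            System.out.println("piece " + i + ": cp1252=" + isCp1252(raw)
                    + " filePos=" + actualFilePos(raw));
        }
    }
}
```

<P>The sketch reads the whole table from one in-memory buffer for brevity. The class described on this page instead buffers the charPos and filePos arrays in two separate streams to avoid seeking between them.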