<p>1) A count of the number of runs or paragraphs described by the page.</p>

<p>2) An array of <b>fc</b>s, recorded in ascending order, demarcating the boundaries
between runs or paragraphs that are recorded adjacent to one another in the Word file.</p>

<p>3) In <b>character fkp</b>s, an array of offsets within the <b>fkp</b>, in one-to-one
correspondence with the array of <b>fc</b>s, that locate the properties of the run that
begins at a particular <b>fc</b>.</p>

<p>In <b>lvc fkp</b>s, an array of offsets within the fkp, in one-to-one correspondence with
the array of fcs, that locate the lvcxs that describe the run that begins at a particular
fc.</p>

<p>In <b>paragraph fkp</b>s, an array of <b>bx</b> structures follows the array of <b>fc</b>s,
in one-to-one correspondence with the array of <b>fc</b>s. Each <b>bx</b> begins with an
offset that locates the properties of the paragraph that begins at a particular fc. The
remainder of the <b>bx</b> contains a <b>phe</b> structure that encodes information about
the height of the paragraph that begins at that <b>fc</b>.</p>

<p>4) A group of <b>chpx</b>s if the <b>fkp</b> stores character properties, a group of <b>papx</b>s
if the <b>fkp</b> stores paragraph and table properties, or a group of lvcxs if the
fkp stores paragraph level and numbering cache information.</p>

<p>To find the <b>chpx</b>/<b>papx</b> corresponding to a particular character in a
document, calculate the <b>fc</b> coordinate for that character. Then search through the <b>bin
table</b> (see next entry) for the type of property you want to produce, to find
the <b>fkp</b> in the document stream whose array of <b>fc</b>s encompasses the <b>fc</b>
of the document character.</p>

<p>Then search within the <b>fkp</b> to find the index of the largest <b>fc</b> entry that
is less than or equal to the <b>fc</b> of the document character. Use this index to look
up an offset in the array of offsets (for <b>character fkps</b>) or in
the array of <b>bx</b>s (for <b>paragraph fkps</b>) within the <b>fkp</b>. Add this offset
to the beginning address of the <b>fkp</b> in memory. The result is the address of the first byte of the
desired <b>chpx</b>/<b>papx</b>.</p>
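
<p>The search within an fkp can be sketched as follows. This is a minimal illustration, not
a reader for the on-disk page: the function name and the sample values are hypothetical, the
fc and offset arrays are assumed to have been extracted from the page already, and the fc
array is assumed to carry one closing boundary after the last run.</p>

```python
from bisect import bisect_right


def find_property_offset(fcs, offsets, fc):
    """Return the offset within the fkp of the properties for the run containing fc.

    fcs     -- ascending run boundaries; assumed to hold len(offsets) + 1 entries
    offsets -- one offset per run, already expressed in bytes by the caller
    """
    if not fcs[0] <= fc < fcs[-1]:
        raise ValueError("fc is not covered by this fkp")
    # Index of the largest fc entry that is less than or equal to the target fc.
    index = bisect_right(fcs, fc) - 1
    return offsets[index]


# Hypothetical page contents: three runs and their property offsets.
fcs = [0x400, 0x41A, 0x450, 0x4C8]
offsets = [0x1F0, 0x1E4, 0x1D2]
offset = find_property_offset(fcs, offsets, 0x41A)
```

<p>Adding the returned offset to the address at which the fkp was loaded gives the first
byte of the chpx or, for a paragraph fkp whose offsets were taken from the bx array, the papx.</p>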

<p><b>bin table</b></p>

<p>Each <b>fkp</b> can be viewed as a bucket or <b>bin</b> that contains the properties of a
certain range of <b>fc</b>s in the Word file. In Word files, a <b>plc</b>, the <b>plcfbte</b>
(<b>pl</b>ex of f<b>c</b>s containing <b>b</b>in <b>t</b>able <b>e</b>ntries), is
maintained. It records the association between a particular range of <b>fc</b>s and
the <b>pn</b> (<b>p</b>age <b>n</b>umber) of the <b>fkp</b> that contains the properties
for that <b>fc</b> range in the file. In a <b>complex (fast-saved)</b> Word document,
<b>fkp</b> pages are intermingled with pages of text in a random pattern which
reflects the history of past fast saves. In a complex document, a <b>plcfbtechpx</b>, which
records the location of every <b>chpx fkp</b>, and a <b>plcfbtepapx</b>,
which records the location of every <b>papx fkp</b>, must both be stored. In a <b>non-complex,
full-saved</b> document, all of the <b>chpx fkps</b> are recorded in consecutive 512-byte
pages with the <b>fkp</b>s recorded in ascending <b>fc</b> order, as are all of the <b>papx
fkps</b>. A plcfbtelvcx serves the same purpose for lvcx fkps.</p>
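
<p>A bin table lookup might look like the sketch below. The function name and sample values
are hypothetical, and the step that converts a pn to a byte position (pn multiplied by the
512-byte page size) is an assumption drawn from the page size mentioned above rather than
something this section states outright.</p>

```python
from bisect import bisect_right

PAGE_SIZE = 512  # size of one fkp page, as given in the text


def find_fkp_file_offset(bin_fcs, pns, fc):
    """Return the assumed file offset of the fkp whose fc range contains fc.

    bin_fcs -- ascending fc boundaries from the plcfbte (len(pns) + 1 entries assumed)
    pns     -- page number recorded for each fc range
    """
    if not bin_fcs[0] <= fc < bin_fcs[-1]:
        raise ValueError("fc is outside the ranges described by the bin table")
    # Largest boundary that is less than or equal to fc selects the bin.
    index = bisect_right(bin_fcs, fc) - 1
    # Assumption: a page number addresses the file in PAGE_SIZE units.
    return pns[index] * PAGE_SIZE


# Hypothetical bin table with two fkps.
bin_fcs = [0x400, 0x800, 0xC00]
pns = [12, 13]
fkp_offset = find_fkp_file_offset(bin_fcs, pns, 0x7FF)
```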

<p>In a full-saved document, it may not have been possible to expand the plcfbtes during the
save process because of a lack of RAM. In that situation, the plcfbtes are interspersed
with the property pages in a linked list of fbd pages.</p>

<p><b>sep (section properties)</b></p>

<p>The data structure describing the properties of a particular section.</p>

<p><b>sepx (section property exceptions)</b></p>

<p>A data structure describing how the properties of a particular section differ from a
Word-defined standard <b>sep</b>. As in the <b>papx</b>, the differences between the <b>sep</b>
for a section and the standard <b>sep</b> are encoded as a list of sprms that describe how
the standard <b>sep</b> can be transformed into the section's <b>sep</b>. By
applying a <b>sepx</b>'s sprms to the standard <b>sep</b>, it is possible to reconstitute
the <b>sep</b> for that section.</p>

<p>The plcfsed, a data structure stored in a Word file, records the locations of all sepxs
stored in that file. The array of cps in the plcfsed records the boundaries of sections
in the Word document. The second array in the plcfsed, an array of seds (section
descriptors), is in one-to-one correspondence with the array of cps. Each sed stores the
beginning fc of the sepx that records the properties for a section. If the fc stored in a
sed is -1 (0xffffffff), the properties of the section are exactly equal to the standard section
properties.</p>

<p>The sep for a particular section may be constructed if the cp of a character in that
section is known. First search the array of cps in the plcfsed for the index of the largest
cp that is less than or equal to the cp of the character. Use this index to locate the sed
in the plcfsed which describes the section. The fc stored in the sed is the offset from
the beginning of the Word file at which the sepx is stored. If the stored fc is equal to
0xffffffff, then the sep for the section is exactly equal to the standard sep (see the sep
structure definition). Otherwise, read the sepx into memory and create a copy of the
standard sep. Finally, apply the sprms stored in the sepx to the copy to produce
the sep for the section.</p>
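
<p>The procedure above can be sketched as two small steps. Everything named here is
hypothetical: the cps and sed fcs are assumed to have been read from the plcfsed already, the
sep is modeled as a plain dictionary, and <b>apply_sprm</b> stands in for whatever routine
interprets a single sprm, which this section does not define.</p>

```python
from bisect import bisect_right

NO_SEPX = 0xFFFFFFFF  # fc value meaning "use the standard sep unchanged"


def locate_sepx(section_cps, sed_fcs, cp):
    """Return the file offset of the sepx for the section containing cp, or None."""
    # Largest section boundary that is less than or equal to cp.
    index = bisect_right(section_cps, cp) - 1
    if index < 0 or index >= len(sed_fcs):
        raise ValueError("cp does not fall inside any section")
    fc = sed_fcs[index]
    return None if fc == NO_SEPX else fc


def build_sep(standard_sep, sprms, apply_sprm):
    """Copy the standard sep, then apply each sprm from the sepx to the copy."""
    sep = dict(standard_sep)
    for sprm in sprms:
        apply_sprm(sep, sprm)
    return sep


# Hypothetical data: two sections, the second of which has no sepx.
section_cps = [0, 120, 300]
sed_fcs = [0x1200, 0xFFFFFFFF]
first = locate_sepx(section_cps, sed_fcs, 50)
second = locate_sepx(section_cps, sed_fcs, 120)
```

<p>When <b>locate_sepx</b> returns None, the caller can pass an empty sprm list to
<b>build_sep</b> and receive an unmodified copy of the standard sep.</p>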

<p><b>dop (document properties)</b></p>

<p>The data structure describing properties that apply to the document as a whole.</p>

<p><b>sub-document</b></p>

<p>A separate logical stream of text with properties for which correspondences with the
main document text are maintained. Word's headers/footers, footnotes, endnotes, macro
procedure text, annotation text, and text within textboxes are kept in separate
subdocuments. Each subdocument has its own cp coordinate space. In other words, data
structures are stored in Word files that are components of these subdocuments. These data
structures contain cp coordinates whose 0 point is the beginning of the subdocument text
stream instead of the beginning of the main document text stream.</p>

<p>In <b>full-saved documents</b>, a simple calculation with values stored in the <b>fib</b>
produces the file offset of the beginning of the subdocument text streams (if they exist). The
length of these streams is also stored.</p>

<p>In <b>fast-saved documents</b>, the <b>piece tables</b> of subdocuments are
concatenated to the end of the main document piece table. In this case, to identify the
beginning of the subdocument text, you must sum the length of the main document text stream
and the lengths of any subdocument text streams stored ahead of the subdocument
(information stored in the <b>fib</b>) and treat this sum as a <b>cp</b> coordinate. To
retrieve the text of the subdocument, you must do lookups in the piece table, starting
with the piece that contains the beginning <b>cp</b> coordinate, to find the physical
location of each piece of the subdocument text stream.</p>
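
<p>The summation for fast-saved documents can be illustrated as below. The function name and
the sample lengths are hypothetical; the caller is assumed to supply the stream lengths read
from the fib in stored order, with the main document first.</p>

```python
def subdocument_cp_range(stream_lengths, index):
    """Return (start_cp, limit_cp) of a text stream in the combined piece table.

    stream_lengths -- lengths of the main document (index 0) and each subdocument,
                      in the order in which the streams are stored
    index          -- position of the wanted stream in stream_lengths
    """
    # Sum the lengths of every stream stored ahead of the wanted one.
    start = sum(stream_lengths[:index])
    return start, start + stream_lengths[index]


# Hypothetical lengths: main text, then three subdocuments (one of them empty).
stream_lengths = [1000, 40, 0, 25]
start_cp, limit_cp = subdocument_cp_range(stream_lengths, 3)
```

<p>The returned start cp is the coordinate to look up in the piece table; piece lookups then
continue until the limit cp is reached.</p>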

<p><b>field</b></p>

<p>A field is a two-part structure that may be recorded in the cp stream of a document.
The first part of the structure contains <b>field codes</b>, which instruct Word for Windows
to insert text into the second part of the structure, the <b>field result</b>. Fields in
Word for Windows are used to insert text from an external file or to quote another part of a
document, to mark index and table of contents entries and produce indexes and tables of
contents, to maintain DDE links to other programs, and to produce dates, times, page numbers,
sequence numbers, etc. There are 91 different field types.</p>

<p>A <b>field begin mark</b> delimits the beginning of a field and precedes any of the
field codes stored in the field. The end of the field codes and the beginning of the field
result is marked with the <b>field separator</b>, and the field result and the field itself
are terminated by a <b>field end mark</b>.</p>

<p>The cp locations of the field begin mark, field separator, and field end mark are
recorded in <b>plcfld</b> data structures that are maintained for the main document and
all of the subdocuments of the main document whenever a field is inserted or edited. A
field can be <b>dead</b>, in which case it has no field separator, no field result, and no
entry in the <b>plcfld</b>. (See the definition of the fld structure for a list of
possible dead field code strings.) An array of two-byte <b>fld</b> structures is stored in
the <b>plcfld</b> in one-to-one correspondence with the cp entries recorded. An <b>fld</b>
associated with a <b>field begin mark</b> records the type of the field. An <b>fld</b>
associated with the <b>field end mark</b> records the current status of the field (i.e.
whether the result is dirty or has been edited, whether the result has been locked, etc.).</p>

<p>Fields may be nested. 20 levels of nesting are permitted.</p>
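
<p>Because fields nest, the marks recorded in a plcfld can be paired with a stack. The sketch
below assumes the marks have already been decoded into (cp, kind) tuples; the kind labels,
the function name and the dictionary layout are all hypothetical, and the decoding of the fld
structures themselves is not shown.</p>

```python
MAX_NESTING = 20  # nesting limit stated in the text


def pair_field_marks(marks):
    """Pair begin/separator/end marks into fields, innermost fields first.

    marks -- iterable of (cp, kind) in ascending cp order, where kind is
             "begin", "separator" or "end" (labels chosen for this sketch)
    """
    stack, fields = [], []
    for cp, kind in marks:
        if kind == "begin":
            if len(stack) == MAX_NESTING:
                raise ValueError("fields nested more than 20 levels deep")
            stack.append({"begin": cp, "separator": None, "end": None,
                          "depth": len(stack) + 1})
        elif kind in ("separator", "end"):
            if not stack:
                raise ValueError("%s mark at cp %d has no begin mark" % (kind, cp))
            if kind == "separator":
                stack[-1]["separator"] = cp  # belongs to the innermost open field
            else:
                field = stack.pop()
                field["end"] = cp
                fields.append(field)
        else:
            raise ValueError("unknown mark kind: %r" % (kind,))
    if stack:
        raise ValueError("field begin mark without a matching end mark")
    return fields


# Hypothetical marks: one field nested inside the field codes of another.
marks = [(5, "begin"), (9, "begin"), (12, "separator"), (15, "end"),
         (20, "separator"), (30, "end")]
fields = pair_field_marks(marks)
```

<p>For each returned field, the field codes lie between the begin mark and the separator, and
the field result lies between the separator and the end mark.</p>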

<p><b>bookmark</b></p>

<p>A <b>bookmark</b> associates a user-definable name with a range of text within a
document. A bookmark is frequently used as an operand in <b>field code</b> instructions
within a field. In Word for Windows, a bookmark is represented by three parallel data
structures: the <b>sttbbkmk</b>, the <b>plcbkf</b> and the <b>plcbkl</b>. The <b>sttbbkmk</b> is
a string table which contains the name of each bookmark that is defined. The <b>plcbkf</b>
records the beginning cp position of each bookmark. The <b>plcbkl</b> records the limit cp
position that delimits the end of a bookmark. Since bookmarks may be nested within one
another to any level, the <b>bkf</b> structure stored in the <b>plcbkf</b> consists of a
single index which specifies which entry in the <b>plcbkl</b> marks the end of the bookmark. The <b>bkl</b>
structure is not written to the file, and the plcbkl contains only cps.</p>
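
<p>Resolving bookmark ranges from the three parallel structures might look like this. The
function name and sample data are hypothetical, and the inputs are assumed to be lists
already extracted from the sttbbkmk, the plcbkf (cps and bkf indexes) and the plcbkl.</p>

```python
def bookmark_ranges(names, begin_cps, bkf_indices, limit_cps):
    """Map each bookmark name to its (begin_cp, limit_cp) range.

    names       -- bookmark names from the string table
    begin_cps   -- beginning cp of each bookmark, parallel to names
    bkf_indices -- for each bookmark, the index of its limit entry in limit_cps
    limit_cps   -- limit cps recorded in the plcbkl
    """
    ranges = {}
    for i, name in enumerate(names):
        # The bkf index is what lets nested bookmarks end in any order.
        ranges[name] = (begin_cps[i], limit_cps[bkf_indices[i]])
    return ranges


# Hypothetical data: "inner" is nested inside "outer", so it ends first.
names = ["outer", "inner"]
begin_cps = [10, 20]
bkf_indices = [1, 0]
limit_cps = [30, 50]
ranges = bookmark_ranges(names, begin_cps, bkf_indices, limit_cps)
```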

<p><b>picture</b></p>

<p>A picture is represented in the document text stream as a special character, an ASCII 1
whose chp has the fspec bit set to 1. The file location of the picture in the Word binary
file is stored in the character's chp in chp.fcpic. The fcpic is a byte offset into the
data stream. Beginning at the position recorded in chp.fcpic, a header data structure, the
pic, will be stored. If the picture is a reference to a TIFF file, a picture file or an
Office shape file, the name of the file will be recorded immediately following the pic in
a Pascal-style string. If the picture is an Office shape, a Windows metafile or a bitmap,
the shape, metafile or bitmap will immediately follow the pic. Pictures that are a
reference to an Office shape file will include both the filename and the shape, in that
order. Pictures inserted with Word 97 are in the new Office shape format (documented
elsewhere). However, pictures can be copied from older files into newer ones, and their old
format will persist until the picture is edited or displayed.</p>

<p>Some files (including all files created by Word for the Macintosh) may store Macintosh
PICT pictures as well. In this case, the pic structure is immediately followed by a
standard Windows metafile depicting a large &quot;x&quot;, so that older readers expecting
only a metafile after the pic will just display this &quot;x&quot;. If a reader detects
this standard &quot;x&quot; metafile, it can extract the sizes of the standard
&quot;x&quot; metafile and the Macintosh PICT picture that follows it from an early
portion of this &quot;x&quot; metafile. Please see Appendix B for a discussion of this
technique.</p>

<p><b>embedded object</b></p>

<p>The native data for embedded objects (objs) is stored similarly to pictures (pics). To
locate the native data for embedded objects, scan the plc of field codes for the mother,
header, footnote, annotation, textbox and header textbox documents
(fib.plcffldmom/hdr/ftn/atn/txbx/hdrtxbx). For each separator field, get the chp.</p>

<p>If chp.fspec=1 and chp.fobj=1, then this separator field has an associated embedded
object. The file location of the object data is stored in chp.fcobj. At the specified
location an object header is stored, followed by the native data for the object. See the
_objheader structure.</p>

<p>If chp.fole2=1, then this separator field has an associated OLE2 object. The fcpic will
be a unique integer that specifies the name of the object's sub-storage instead of an
offset into the data stream.</p>
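
<p>The two tests above can be combined into one classification step, sketched below. The chp
is modeled as a plain dictionary keyed by the field names used in the text, and the function
name and return values are hypothetical. Checking fole2 before fobj is an assumption of this
sketch; the text does not say which test takes precedence when both bits are set.</p>

```python
def classify_separator_object(chp):
    """Classify the object attached to a field separator from its chp.

    Returns ("ole2", storage_id), ("embedded", file_offset) or None.
    chp is a dict holding the fspec, fobj, fole2, fcobj and fcpic values.
    """
    # Assumption: an OLE2 object is recognized first, so that fcpic is treated
    # as a sub-storage identifier rather than as a data stream offset.
    if chp.get("fole2") == 1:
        return ("ole2", chp["fcpic"])
    if chp.get("fspec") == 1 and chp.get("fobj") == 1:
        return ("embedded", chp["fcobj"])
    return None


# Hypothetical chp values for two separators.
embedded = classify_separator_object({"fspec": 1, "fobj": 1, "fole2": 0, "fcobj": 0x2A00})
ole2 = classify_separator_object({"fspec": 1, "fobj": 1, "fole2": 1, "fcpic": 17})
```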

<p><b>office art object </b></p>

<p>An Office art object is represented in the document stream as a special character, an
ASCII 8, which has chp.fspec set to 1 for the run of text containing the character. Only
main documents and header documents contain Office art objects. The native data for
the Office art object may be obtained by taking the cp for the special character and using
this to find the corresponding entry in the <b>plcspa</b>. An entry in this plc
consists of a <b>fspa</b> structure, which is described elsewhere in this document.</p>

<p>Office art objects can have text attached to them. Text for the textboxes is stored
separately in the textbox subdocument of the main or header document. The textbox
subdocument contains a <b>plctxbxs</b> where the text from cp n to cp n+1 in the
subdocument is the text which is contained in a textbox as specified in the <b>txbxs</b>
structure for this n<sup>th</sup> entry in the <b>plctxbxs</b>. Textboxes can be linked in
chains of up to 32 textboxes. The ordering of textboxes in the subdocument is completely
unrelated to the document structure due to the nature of textbox linking. To find the text
for a given Office art object, the <b>txid</b> property (a long: high word is itxbxs+1,
low word is the sequence number) must be fetched from the Office art data for the shape.
This contains an index (itxbxs) into <b>plctxbxs</b> and a sequence number in the chain of
linked textboxes. The text for the entire chain of linked textboxes is stored from the cp
itxbxs to cp itxbxs+1 of plctxbxs. The <b>plctxbxbkd</b> describes the &quot;page
table&quot; within textbox stories (where the textboxes in each linked textbox chain are
thought of as &quot;pages&quot;). So, for each entry in the plctxbxs there is a
corresponding entry in the <b>plctxbxbkd</b> at the same cp, and there may be additional
entries in the <b>plctxbxbkd</b> to describe the breaks from one textbox to the next in
linked textbox chains.</p>
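
<p>Splitting a txid and locating the text of its chain can be sketched as follows. The
function names and sample values are hypothetical, and the cps are assumed to have been read
from the plctxbxs beforehand.</p>

```python
def decode_txid(txid):
    """Split a txid into (itxbxs, sequence number).

    The high word holds itxbxs + 1 and the low word holds the sequence number.
    """
    txid &= 0xFFFFFFFF  # treat the long as an unsigned 32-bit value
    high, low = txid >> 16, txid & 0xFFFF
    if high == 0:
        raise ValueError("txid does not refer to a plctxbxs entry")
    return high - 1, low


def chain_text_range(textbox_cps, txid):
    """Return the (start_cp, limit_cp) of the text for the chain named by txid."""
    itxbxs, _sequence = decode_txid(txid)
    return textbox_cps[itxbxs], textbox_cps[itxbxs + 1]


# Hypothetical values: the third plctxbxs entry, second textbox in its chain.
textbox_cps = [0, 14, 30, 52]
txid = (3 << 16) | 1
text_range = chain_text_range(textbox_cps, txid)
```

<p>Locating the portion of that range shown in one particular textbox of the chain would
additionally require the breaks recorded in the plctxbxbkd, which this sketch does not model.</p>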

<p>note</p>

<p>In this document, bit 0 is the low-order bit. Structures are described as they would be
declared in C for the Intel architecture. When numbering bytes in a word from low offset
towards high offset, two-byte integers will have their least significant eight bits stored
in byte 0 and most significant eight bits in byte 1. If bit 31 is the most significant bit
in a four-byte integer, bits 31 through 24 will be stored in byte 3 of the
integer, bits 23 through 16 will be stored in byte 2, bits 15 through 8 will be stored in
byte 1, and bits 7 through 0 will be stored in byte 0.</p>
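
<p>The byte ordering described in this note is little-endian. As a brief illustration, the
helpers below (hypothetical names, with made-up sample bytes) read two-byte and four-byte
integers in that order using Python's standard struct module.</p>

```python
import struct


def read_u16(data, offset):
    """Read a two-byte unsigned integer stored least significant byte first."""
    return struct.unpack_from("<H", data, offset)[0]


def read_u32(data, offset):
    """Read a four-byte unsigned integer stored least significant byte first."""
    return struct.unpack_from("<I", data, offset)[0]


# Byte 0 holds bits 7 through 0, so the bytes 34 12 decode to 0x1234.
raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])
word = read_u16(raw, 0)
dword = read_u32(raw, 2)
```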
<a name="03">

<h2>naming conventions</h2>
</a>