From: =?gb2312?B?08kgV2luZG93cyBJbnRlcm5ldCBFeHBsb3JlciA3ILGjtOY=?=
Subject: metakit-fileformat - Metakit Database System
Date: Fri, 29 Feb 2008 00:27:29 +0800
MIME-Version: 1.0
Content-Type: text/html;
charset="gb2312"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://www.equi4.com/metakit/metakit-ff.html
X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6000.16545
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<! -- -*- tcl doctools -*-=0A=
--><HTML><HEAD><TITLE>metakit-fileformat - Metakit Database =
System</TITLE>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dgb2312"><! -- Generated from file 'metakit-ff.man' by =
tcllib/doctools with format 'html'=0A=
--><! -- Copyright (c) 1996-2003 Jean Claude Wippler =
<jcw@equi4.com> -- Copyright (c) 2003 Andreas Kupries =
<andreas_kupries@users.sourceforge.net>=0A=
--><! -- CVS: $Id$ metakit-fileformat.n=0A=
-->
<META content=3D"MSHTML 6.00.6000.16609" name=3DGENERATOR></HEAD>
<BODY>
<H1>metakit-fileformat(n) 1.0 "Metakit Database System"</H1><A =
name=3Dname>
<H2>NAME</H2>
<P>metakit-fileformat - Metakit File Format <! -- Copyright JCW for =
metakit, Copyright AK for this document=0A=
--><! -- __________________________________________=0A=
--><A=20
name=3Ddescription>
<H2>DESCRIPTION</H2>This document specifies the file format used by the =
metakit=20
database library for persistent storage of its databases. The same =
format is=20
also used for serialization and subsequent transfer of a database over =
some=20
communication system, like pipes and sockets.=20
<P>How the metakit library uses files in the specified format is outside =
of the=20
scope of this document, although in some sections, hints for specific =
uses might=20
be given.=20
<P>To ensure that all terms are used unambiguously, both inside this =
document and when=20
discussing its contents, a glossary is provided. See section <A=20
href=3D"http://www.equi4.com/metakit/metakit-ff.html#glossary">GLOSSARY</=
A> at the=20
end.=20
<P>It was decided to specify the format in the form of a grammar in =
Extended=20
Backus-Naur Form (EBNF) augmented by free-form text to capture the=20
context-sensitive parts of the language. It is assumed that the reader =
of this=20
document is familiar with EBNF. <! -- =
__________________________________________=0A=
--><A name=3Dbackground>
<H2>BACKGROUND</H2>The background for this specification is =
<EM>metakit</EM> (<A=20
href=3D"http://www.equi4.com/metakit">http://www.equi4.com/metakit</A>), =
a=20
flexible database system developed by Jean-Claude Wippler (<A=20
href=3D"mailto:jcw@equi4.com">mailto:jcw@equi4.com</A>).=20
<P>In contrast to most other systems which handle their data row-wise it =
manages=20
the data in a column-oriented way, i.e. all data for a single column is =
handled=20
together. This characteristic is reflected in the file format too.=20
<P>What would be called tables in other relational database systems are =
known as=20
<EM>views</EM> in Metakit. Views consist of <EM>columns</EM>, which =
store=20
specific pieces of data in <EM>cells</EM>. The cells at the same =
row-index in=20
all columns of a view are called a <EM>row</EM>.=20
<P>Another difference is metakit's ability to define <EM>subview</EM> =
columns,=20
which are columns where the data in each cell is a complete view in its =
own=20
right, although all of these views share the same structural =
definition. <! -- =
-- Note: I am told that metakit allows heterogeneous subview columns, =
-- where each cell can have at least one of several possible -- =
structures, but currently lack information on those details. I -- =
especially do not have the information on how this ability is -- =
reflected in the file format. ... Another possibility is that -- each =
cell stores not only the view itself, but also its -- definition. =
-- =0A=
--><! -- __________________________________________=0A=
--><A=20
name=3Dtypedefinitions>
<H2>TYPE DEFINITIONS</H2>Metakit supports the following six types for =
its=20
columns. The key in the list below is the character used by metakit as =
indicator=20
for that type. See section <A=20
href=3D"http://www.equi4.com/metakit/metakit-ff.html#structuredefinition"=
>STRUCTURE=20
DEFINITION</A> for the place in which these characters are used.=20
<DL>
<DT><STRONG>S</STRONG>
<DD>All entries in a column of this type contain strings. <BR><BR>
<DT><STRONG>I</STRONG>
<DD>All entries in a column of this type contain integer numbers =
requiring at=20
most 32 bits of storage space. All entries will use the same number of =
bits to=20
store their data. <BR><BR>
<DT><STRONG>F</STRONG>
<DD>All entries in a column of this type contain single precision =
floating=20
point numbers, each taking up 4 bytes (32 bits) of space. <BR><BR>
<DT><STRONG>D</STRONG>
<DD>All entries in a column of this type contain double precision =
floating=20
point numbers, each taking up 8 bytes (64 bits) of space. <BR><BR>
<DT><STRONG>B</STRONG>
<DD>All entries in a column of this type contain arbitrary binary data =
of=20
arbitrary length. There is no bit-packing, the data is measured in =
bytes.=20
<BR><BR>
<DT><STRONG>L</STRONG>
<DD>All entries in a column of this type contain large integer =
numbers, each=20
taking up 8 bytes (64 bits) of space. </DD></DL><! -- =
__________________________________________=0A=
--><A=20
name=3Dcolumnmapping>
<H2>COLUMN MAPPING</H2>When metakit stores column data into a file or=20
serialization it places them into one or more <EM>itemvectors</EM>, the =
physical=20
containers for the data. How many itemvectors are required is dependent =
on the=20
type of the column, and on the data contained in it.=20
<P>This section describes only the basic mapping required to create the =
table of=20
contents (See <STRONG>TableOfContents</STRONG>), and none of the=20
<EM>secondary</EM> itemvectors indirectly reachable through the =
<EM>primary</EM>=20
itemvectors of a column listed in the table of contents.=20
<DL>
<DT><STRONG>I</STRONG>, <STRONG>L</STRONG>, <STRONG>F</STRONG>,=20
<STRONG>D</STRONG>
<DD>A single primary itemvector is used to store all column data. =
<BR><BR>
<DT><STRONG>S</STRONG>, <STRONG>B</STRONG>
<DD>Depending on the size of the string/binary data stored in the =
entries of a=20
column either two or three primary itemvectors are used to store the =
column=20
data. In addition secondary itemvectors may be reached through these, =
holding=20
the actual string/binary data. </DD></DL>The exact contents of each =
itemvector=20
are described in the upcoming grammar. See <STRONG>IVecData</STRONG> and =
its=20
variants. <! -- __________________________________________=0A=
--><A=20
name=3Dvariablesizeddata>
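<P>The type characters and their mapping to primary itemvectors, as
described above, can be summarised in a small lookup table. The Python
sketch below is only an illustration built from the statements in this
document. The names COLUMN_TYPES, fixed_width_bytes and
primary_itemvector_counts are hypothetical and are not part of the
metakit library.
<PRE class=3Dsample>
```python
# Hypothetical lookup table, derived only from the text of this document.
COLUMN_TYPES = {
    # char: (contents, bytes per entry or None, primary itemvector counts)
    "S": ("string", None, (2, 3)),
    "I": ("integer, at most 32 bits", None, (1,)),
    "F": ("single precision float", 4, (1,)),
    "D": ("double precision float", 8, (1,)),
    "B": ("binary data", None, (2, 3)),
    "L": ("large integer", 8, (1,)),
}


def fixed_width_bytes(type_char):
    """Return the per-entry size in bytes, or None if the type fixes none."""
    return COLUMN_TYPES[type_char][1]


def primary_itemvector_counts(type_char):
    """Return the possible numbers of primary itemvectors for a column."""
    return COLUMN_TYPES[type_char][2]


print(fixed_width_bytes("D"))          # 8
print(primary_itemvector_counts("S"))  # (2, 3)
```
</PRE>
<P>Type I has no fixed entry in the table because the text only bounds
it at 32 bits. The width actually used depends on the data in the
column.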
<H2>VARIABLE SIZED DATA</H2>One of the consequences of using a =
column-wise=20
representation for views is that for any insertion, deletion, or change =
of a row=20
the system has to relocate and copy all itemvectors for all the columns =
in the=20
view. This is not so big a problem for data of a fixed size, like for =
the types=20
<STRONG>I</STRONG>, <STRONG>F</STRONG>, <STRONG>D</STRONG>, and=20
<STRONG>L</STRONG>. For them this operation is only invoked when =
inserting or=20
deleting a row. Changing the value of a cell invokes only the relocation =
and=20
copying of the itemvectors for one column, and they tend to be =
relatively small.=20
<P>This situation changes when data of varying and arbitrary length is =
involved,=20
be it strings or just binary data (types <STRONG>S</STRONG> and=20
<STRONG>B</STRONG>). For them the simple method of storing all the data =
in one=20
itemvector and the sizes of the items in a second scales badly as even =
minuscule=20
changes cause the copying of large amounts of data.=20
<P>To evade this trap the file format uses a slightly more complex =
method.=20
Instead of only two itemvectors it employs three. The first two are the =
same=20
ones as for the simple method, with a small change. While the first =
itemvector=20
contains the sizes for all items, the second itemvector contains only =
the data=20
for the items with a size > 0. Items whose size is recorded as zero =
are not=20
stored in the second itemvector, but are <EM>indirect</EM>ly reachable =
through=20
the third itemvector, a catalog. Each of the third itemvector's items =
records=20
the location of another itemvector in the file and the information =
needed to=20
determine which row in the column the item belongs to. In other =
words, how=20
to interleave the items reachable through the catalog with the items in =
the=20
first two itemvectors to reconstruct their proper order at the logical =
level of=20
the column.=20
<P>With the above structure in place any writer of a database is now =
free to=20
decide where to actually place the variable sized data of a cell when =
writing=20
to the file. Namely either directly into the second itemvector, or into =
a block=20
of its own with the location of that block recorded in the catalog =
vector. By=20
storing smaller data directly and larger data indirectly the performance =
impact=20
of the large data is reduced considerably, because now only the =
itemvector=20
containing the catalog has to be copied for changes, whereas the large =
data=20
blocks often can be left in the location initially given to them.=20
<P>The relevant symbols of the grammar are =
<STRONG>IVecCatalogData</STRONG> and=20
<STRONG>VariableMapping</STRONG>. See section <A=20
href=3D"http://www.equi4.com/metakit/metakit-ff.html#formatgrammar">FORMA=
T=20
GRAMMAR</A> for their definition.=20
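<P>The interleaving of directly and indirectly stored items can be
sketched in Python. This is a minimal in-memory model and not the
on-disk encoding: the function name reconstruct_column, the dict used
for the catalog (row index to size and offset) and the file_bytes
stand-in for the database file are all assumptions made for
illustration.
<PRE class=3Dsample>
```python
def reconstruct_column(sizes, data, catalog, file_bytes):
    """Rebuild the logical items of one S or B column.

    sizes      -- one size per row; 0 means empty or stored indirectly
    data       -- concatenated bytes of all directly stored items
    catalog    -- hypothetical dict: row index to (size, offset)
    file_bytes -- stand-in for the file holding the indirect blocks
    """
    items = []
    pos = 0  # read position inside the direct data vector
    for row, size in enumerate(sizes):
        if size != 0:
            # Direct item: take the next slice of the data vector.
            items.append(data[pos:pos + size])
            pos += size
        elif row in catalog:
            # Indirect item: follow the catalog entry into the file.
            memo_size, offset = catalog[row]
            items.append(file_bytes[offset:offset + memo_size])
        else:
            # Size zero and no catalog entry: the item is empty.
            items.append(b"")
    return items


memo = b"a much longer memo item kept in its own block"
file_bytes = b"HEAD" + memo       # stand-in for the database file
sizes = [2, 0, 0, 3]              # rows 1 and 2 are recorded as zero
data = b"abxyz"                   # direct items of rows 0 and 3
catalog = {2: (len(memo), 4)}     # row 2 lives at offset 4
rows = reconstruct_column(sizes, data, catalog, file_bytes)
print(rows[0], rows[1], rows[3])  # b'ab' b'' b'xyz'
```
</PRE>
<P>In this model, changing the memo in row 2 only touches the catalog
entry and the separate block. The direct data vector stays as it is.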
<P>
<P>
<TABLE>
<TBODY>
<TR>
<TD bgColor=3Dblack> </TD>
<TD><PRE class=3Dsample> For the example let us assume that we have =
items 0 and 3 and 6 all
having small amounts of data, items 1 and 4 are empty, and 2 is a
larger memo item. Then the situation would be:
Column 0, the data: concatenated contents of items 0, 3, and 6.
Column 1, the sizes: sizes for entries 0, 3, and 6, the rest zeroes.
Column 2, the memos:
2 as byte-packed integer, meaning skip 2 rows
the size of the data in row 2, as byte-packed integer
the pointer to the data in row 2, as byte-packed integer
</PRE></TD></TR></TBODY></TABLE></P><! -- =
__________________________________________=0A=
--><A=20
name=3Dlexicalunits>
<H2>LEXICAL UNITS</H2>The lexical units of the grammar used here are =
the=20
fundamental pieces making up a metakit file or serialization. The only =
such unit is the=20
<EM>byte</EM>, containing 8 <EM>bits</EM>. <! -- =
__________________________________________=0A=
--><A name=3Dformatgrammar>
<H2>FORMAT GRAMMAR</H2>The grammar is written in a bottom up format. =
This means=20
that the more basic elements are specified first, and the specification =
of the=20
complete database is the last element.=20
<DL>
<DT><STRONG>Word</STRONG>
<DD>::=3D byte byte <BR><BR>A 16-bit word consists of two bytes. The =
endianness=20
of words is variable. When reading, the metakit library determines the =
actual=20
endianness from the marker in the <STRONG>Header</STRONG>. When writing =
the=20
metakit library uses the native endianness of the host on which it is =
running.=20
<BR><BR>
<DT><STRONG>Long</STRONG>
<DD>::=3D byte byte byte byte <BR><BR>A 32-bit long word consists of =
four bytes=20
(or two words). The endianness of long words is variable. When reading, =
the=20
metakit library determines the actual endianness from the marker in the =
<STRONG>Header</STRONG>. When writing the metakit library uses the =
native=20
endianness of the host on which it is running. <BR><BR>
<DT><STRONG>bpInt</STRONG>
<DD>::=3D [ bpiSignByte ] { bpiDataByte } bpiStopByte <BR><BR>The name =
is=20
short for byte-packed integer. It is a notation for storing =
arbitrarily=20
large integer numbers in a very compact way. Note that any number is =
always=20
stored in the most compact way possible. This means that leading zeros =
are=20
always stripped down as much as possible. In other words, no instance =
of=20
<STRONG>bpInt</STRONG> will contain a <STRONG>bpiDataByte</STRONG> of =
value=20