?? tokyocabinet.3
字號(hào):
.TH "TOKYOCABINET" 3 "2009-02-13" "Man Page" "Tokyo Cabinet".SH NAMEtokyocabinet \- a modern implementation of DBM.SH INTRODUCTION.PPTokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed\-length array..PPAs for database of hash table, each key must be unique within a database, so it is impossible to store two or more records with a key overlaps. The following access methods are provided to the database: storing a record with a key and a value, deleting a record by a key, retrieving a record by a key. Moreover, traversal access to every key are provided, although the order is arbitrary. These access methods are similar to ones of DBM (or its followers: NDBM and GDBM) library defined in the UNIX standard. Tokyo Cabinet is an alternative for DBM because of its higher performance..PPAs for database of B+ tree, records whose keys are duplicated can be stored. Access methods of storing, deleting, and retrieving are provided as with the database of hash table. Records are stored in order by a comparison function assigned by a user. It is possible to access each record with the cursor in ascending or descending order. According to this mechanism, forward matching search for strings and range search for integers are realized..PPAs for database of fixed\-length array, records are stored with unique natural numbers. It is impossible to store two or more records with a key overlaps. Moreover, the length of each record is limited by the specified length. Provided operations are the same as ones of hash database..PPTable database is also provided as a variant of hash database. Each record is identified by the primary key and has a set of named columns. Although there is no concept of data schema, it is possible to search for records with complex conditions efficiently by using indices of arbitrary columns..PPTokyo Cabinet is written in the C language, and provided as API of C, Perl, Ruby, Java, and Lua. Tokyo Cabinet is available on platforms which have API conforming to C99 and POSIX. Tokyo Cabinet is a free software licensed under the GNU Lesser General Public License..SH FEATURES.PPTokyo Cabinet is the successor of QDBM and improves time and space efficiency. This section describes the features of Tokyo Cabinet..SH THE DINOSAUR WING OF THE DBM FORK.PPTokyo Cabinet is developed as the successor of GDBM and QDBM on the following purposes. They are achieved and Tokyo Cabinet replaces conventional DBM products..PP.RSimproves space efficiency : smaller size of database file..brimproves time efficiency : faster processing speed..brimproves parallelism : higher performance in multi\-thread environment..brimproves usability : simplified API..brimproves robustness : database file is not corrupted even under catastrophic situation..brsupports 64\-bit architecture : enormous memory space and database file are available..br.RE.PPAs with QDBM, the following three restrictions of traditional DBM: a process can handle only one database, the size of a key and a value is bounded, a database file is sparse, are cleared. Moreover, the following three restrictions of QDBM: the size of a database file is limited to 2GB, environments with different byte orders can not share a database file, only one thread can search a database at the same time, are cleared..PPTokyo Cabinet runs very fast. For example, elapsed time to store 1 million records is 0.7 seconds for hash database, and 1.6 seconds for B+ tree database. Moreover, the size of database of Tokyo Cabinet is very small. For example, overhead for a record is 16 bytes for hash database, and 5 bytes for B+ tree database. Furthermore, scalability of Tokyo Cabinet is great. The database size can be up to 8EB (9.22e18 bytes)..SH EFFECTIVE IMPLEMENTATION OF HASH DATABASE.PPTokyo Cabinet uses hash algorithm to retrieve records. If a bucket array has sufficient number of elements, the time complexity of retrieval is `O(1)'. That is, time required for retrieving a record is constant, regardless of the scale of a database. It is also the same about storing and deleting. Collision of hash values is managed by separate chaining. Data structure of the chains is binary search tree. Even if a bucket array has unusually scarce elements, the time complexity of retrieval is `O(log n)'..PPTokyo Cabinet attains improvement in retrieval by loading RAM with the whole of a bucket array. If a bucket array is on RAM, it is possible to access a region of a target record by about one path of file operations. A bucket array saved in a file is not read into RAM with the `read' call but directly mapped to RAM with the `mmap' call. Therefore, preparation time on connecting to a database is very short, and two or more processes can share the same memory map..PPIf the number of elements of a bucket array is about half of records stored within a database, although it depends on characteristic of the input, the probability of collision of hash values is about 56.7% (36.8% if the same, 21.3% if twice, 11.5% if four times, 6.0% if eight times). In such case, it is possible to retrieve a record by two or less paths of file operations. If it is made into a performance index, in order to handle a database containing one million of records, a bucket array with half a million of elements is needed. The size of each element is 4 bytes. That is, if 2M bytes of RAM is available, a database containing one million records can be handled..PPTraditional DBM provides two modes of the storing operations: `insert' and `replace'. In the case a key overlaps an existing record, the insert mode keeps the existing value, while the replace mode transposes it to the specified value. In addition to the two modes, Tokyo Cabinet provides `concatenate' mode. In the mode, the specified value is concatenated at the end of the existing value and stored. This feature is useful when adding an element to a value as an array..PPGenerally speaking, while succession of updating, fragmentation of available regions occurs, and the size of a database grows rapidly. Tokyo Cabinet deal with this problem by coalescence of dispensable regions and reuse of them, and featuring of optimization of a database. When overwriting a record with a value whose size is greater than the existing one, it is necessary to remove the region to another position of the file. Because the time complexity of the operation depends on the size of the region of a record, extending values successively is inefficient. However, Tokyo Cabinet deal with this problem by alignment. If increment can be put in padding, it is not necessary to remove the region..SH USEFUL IMPLEMENTATION OF B+ TREE DATABASE.PPAlthough B+ tree database is slower than hash database, it features ordering access to each record. The order can be assigned by users. Records of B+ tree are sorted and arranged in logical pages. Sparse index organized in B tree that is multiway balanced tree are maintained for each page. Thus, the time complexity of retrieval and so on is `O(log n)'. Cursor is provided to access each record in order. The cursor can jump to a position specified by a key and can step forward or backward from the current position. Because each page is arranged as double linked list, the time complexity of stepping cursor is `O(1)'..PPB+ tree database is implemented, based on above hash database. Because each page of B+ tree is stored as each record of hash database, B+ tree database inherits efficiency of storage management of hash database. Because the header of each record is smaller and alignment of each page is adjusted according to the page size, in most cases, the size of database file is cut by half compared to one of hash database. Although operation of many pages are required to update B+ tree, QDBM expedites the process by caching pages and reducing file operations. In most cases, because whole of the sparse index is cached on memory, it is possible to retrieve a record by one or less path of file operations..PPEach pages of B+ tree can be stored with compressed. Two compression method; Deflate of ZLIB and Block Sorting of BZIP2, are supported. Because each record in a page has similar patterns, high efficiency of compression is expected due to the Lempel\-Ziv or the BWT algorithms. In case handling text data, the size of a database is reduced to about 25%. If the scale of a database is large and disk I/O is the bottleneck, featuring compression makes the processing speed improved to a large extent..SH NAIVE IMPLEMENTATION OF FIXED\-LENGTH DATABASE.PP
?? 快捷鍵說明
復(fù)制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號(hào)
Ctrl + =
減小字號(hào)
Ctrl + -