?? cff.txt
字號:
Culvers Fabulous Filesystem
Version 5.9
June 15, 1995
Copyright 1991, 1992, 1993, 1994, 1995 Norman D. Culver
All Rights Reserved
Send bug reports and suggestions to:
Oxbow Software
1323 S.E. 17th Street #662
Ft. Lauderdale, FL 33316
(954) 463-4754
ndc@icanect.net
This software package is distributed as a preliminary version for
TESTING PURPOSES ONLY, WITHOUT ANY WARRANTY;
without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
GENERAL DESCRIPTION
This distribution contains a single threaded 32 bit portable library
compiled with gcc for the Intel 386 cpu. The library contains external
references to functions in 'cfport.c' which is supplied as source code.
The user can modify cfport.c to accommodate a non-unix like OS.
The user API for the library is declared in the file 'cff.h' and described
in this document. CFF is a filesystem/database engine which handles
allocation of memory, extended memory and disk. It supports incrementally
hashed and B+ tree storage maps in an identical fashion for memory and
disk. The standard flavors of malloc are included. NEW for the first time,
to my knowledge, is the ability to malloc by category and the ability
to malloc PERMANENTLY.
Data is stored in and accessed through 'objects' which are referenced
in the classical filesystem manner. e.g. MEMORY/myobject/subobject/...
Objects are opened and closed in the time honored unix fashion. But there
is a big difference between CFF objects and normal filesystem objects.
Each object has the properties of a directory, stack, file, dictionary and
repository. Objects can be created as F_SORTED, in which case they are
B+ trees, and/or with their own local 'bitmaps' for locality of reference,
or, if hashed, with PREALLOCATED DATA AND ENTRIES for ultra fast inserts.
The objects expand and shrink automatically as entries are added or deleted.
The in-core index for hashed objects is never saved to disk because it can be
rebuilt when the object is loaded. Data can be accessed directly in the
localizer buffers or copied to/from user space.
COPY OBJECTS
The 'cfcopy' command will move an object and all subobjects within and
between filesystems. A filesystem on disk is denoted with the extension
'.cff'. The filesystems for memory and extended memory
(if it exists) are predefined as MEMORY and EXTDMEM. The copy command
will create a filesystem on disk if an object is copied to a '.cff' object.
The command:
cfcopy("newsys.cff", "MEMORY/object/subobject/target");
copies the object named 'target' to the disk file newsys.cff and thus
'target' becomes a filesystem. Conversely, filesystems can become objects, etc.
PATHNAME TRANSLATION
A pathname translation facility permits brevity.
Translation dictionaries are maintained for process, application and system.
Example:
void *h1, *h2;
cfdef("phone book", "myfile.cff/users/applications/faxmodem/phones");
h1 = cfopen("phone book", F_RDWR, NULL);
or
h2 = cfcopy("MEMORY/tempobj", "phone book");
or
h2 = cfcopy("MEMORY/tempobj", h1);
And when finished you can get a fully garbage collected and shrunk disk object
with the command:
cfcopy(h1, h2);
or
cfcopy("phone book", h2);
or
cfcopy("phone book", "MEMORY/tempobj");
EXTERNAL FILES
It is possible to access normal OS files and directories. If a path doesn't
reference a .cff filesystem then it is assumed to be external. Copy to/from an
external file only exercises the file property of an object, so it is
trivial to import/export external data.
FANCY MALLOCS
The CFF library supplies standard versions of malloc, calloc, realloc etc.
which MUST superceed the standard compiler versions. Don't worry, these
implementations are fast and debugged. They are implemented with skip lists
and keep their bookeeping info separate from the allocated data. This can
increase the speed in a demand paged environment by a large factor.
The enhancements to malloc are simple to use but rather complicated to
implement. They permit the programmer to malloc, calloc, free etc. by
CATEGORY and they are FAST. Category 0 is reserved for normal malloc, and
is somewhat segregated in the address space because it calls 'sbrk()'
directly. All other categories call cfmalloc() which does it's own calls
to sbrk (actually PORTSBRK in cfport.c). NOTE: Normal malloc, etc. will work
even if cfinit() is not issued.
Examples:
void *ptr;
ptr = mallocC (5, 1024); // malloc 1K bytes for category 5
freeC (5, ptr); // free the pointer in category 5
freecat (5); // free all memory allocated to category 5
Permanent categories have negative numbers.
NOTE -- VERSION 5.9 HAS DISABLED PERMANENT CATEGORIES
DUE TO THE VARIABILITY IN SUPPORTING OPERATING SYSTEMS
FOR ANY GIVEN COMPATIBLE SET OF OPERATING SYSTEMS, THEY
CAN EASILY BE RE-ENABLED.
ptr = malloc (-20, 1024); // malloc 1K bytes for permanent category -20
freeC (-20, ptr); // free the pointer in category -20
freecat (-20); // free all memory allocated to category -20
Permanent categories are enabled only if the programmer has named a
filesystem when 'cfinit' is called.
main()
{
cfinit("app1", 512, "myfile.cff"); // enables permanent categories in myfile.cff
or
cfinit("app1", 512, NULL); // permanent categories are disabled;
...
cfexit(); // saves active permanent categories and closes objects
// cfinit calls atexit with this function, you do not
// have to include it in the program, but it is harmless to do
// so.
}
Data in active permanent categories is reloaded to exactly the same memory
addresses at program startup and saved in the named filesystem at program exit.
Creep in the data segment as program development proceeds is accommodated
by setting the variable 'heapstart' in cfport.c. The programmer should
pick a number large enough to accommodate the environment in which development
is taking place. If C++ is used, static constructors can gobble a lot of
space before cfinit() is called. The command cfcreep() returns the amount
of space in the safety zone. WARNING: you can't change heapstart after
permanent categories have been saved in a file.
The named filesystem can hold permanent categories for multiple applications
providing the applications are uniquely identified by the first argument
to cfinit().
In a clean design, a programmer would save a small number of pointers (one??)
to a complex memory structure which will come and go automatically as the
application is run. The purpose here is to speed up the process of loading
permanent objects. Some CAD programs take 30 minutes to start; with CFF
startup can be reduced to seconds on similarly endianed machines.
for example:
main()
{
struct masterstruct *masterptr;
cfinit("myapp", 256,"myfile.cff"); // PERMINFO is now defined
if(!cfisnew(PERMINFO))
{/* The file is not new, load the saved copy of masterptr */
cfget(PERMINFO, "masterptr", 9, &masterptr, sizeof(void *));
}
else
{/* The file is new, build the basic structures */
masterptr = mallocC(-1, sizeof(masterstruct)); // or whatever
... // build memory structures based on masterptr
// and various negative categories
}
... // use and modify the permanent structures
// always use negative categories
/* Save the master pointer before exit */
cfreput(PERMINFO, "masterptr", 9, &masterptr, sizeof(void *));
cfexit();
}
PREDEFINED PATHNAMES (use as the leading element of a path)
MEMORY the memory filesystem
EXTDMEM the extended memory filesystem
PREDEFINED PATHNAMES or HANDLES (use as the leading element or as a handle)
PERMFILE the filesystem which was mentioned in cfinit()
PERMCAT an object in PERMFILE containing the app's permanent categories
PERMINFO an object in PERMFILE into which the user can store app info
PREDEFINED HANDLE (use only as a handle)
MEMTEMP a memory object which can be used by the programmer
ALIGNMENT
The granularity of the 5.9 release is 4 bytes for malloc and 32 bytes
for filesystem allocations. I am considering allowing each bitmap to
carry a different alignment, if all programmers used gcc then I could
use long long arguments in the user API.
NODE SIZES
Node (bucket) sizes can range from 512 bytes to 8192 bytes. When an
object is created, the default size is 2048 bytes. The programmer can
modify this size by setting a flag in the mode bits for cfopen. The
root directory of a filesystem is defaulted to 1024 bytes per node.
F_FILEONLY 512 bytes
F_BIGDIR 4096 bytes
F_HUGEDIR 8192 bytes
KEYS, ITEMS and DATA
Keys are the names of things in the filesystem. When used as part of a
path the key must contain only characters in the range 0x40 to 0x7f
(space to tilde) or it will be rejected by the name translation mechanism.
In all other cases a key may contain anything. When using hashed objects
the programmer should make a distinction between long and short keys.
Short keys are 8 bytes or less and are stored directly in the hash buckets
along with Items. Long keys are stored in a special KEYCACHE area and
require the overhead of the hash bucket entry which is 20 bytes plus
the overhead of the KEYCACHE which is 13 bytes + the size of the key.
Note: the minimum KEYCACHE allocation is 32 bytes. When a long key is
stored in a hash bucket the 8 byte key region is converted to an alternative
hash value for the key, this ensures that only one key comparison
is ever done. B+ tree objects store the keys and items together in the
tree nodes. The programmer should ensure that at least 2 keys will fit
into a B+ tree node, 1 key per node results in a B tree.
The key comparison routine can be set for each object with the command
cfkeycmp(handle, funcaddr);
Items are the basic insertable/retrievable elements in a node. They are
64 bits in size with the high 4 bits reserved for tags. B+ trees can support
untagged items if created with F_UNTAGGED set.
The programmer will find that 64 bit items are a bit of a pain, especially
if his compiler does not support the long long type. They are worth the
extra effort, believe me.
When the system stores a tagged item it checks the high 4 bits, if they
are non-zero it assumes that a tag is already present, otherwise the
item is tagged as a STO_VALUE. Be SURE that your program generated items
have the high 4 bits set to zero.
Due to the tagged nature of most items, the system can automatically allocate
and deallocate storage if the item happens to describe a chunk of data.
Currently the maximum size of a single contiguous chunk is 16 Megabytes -1.
This can change if I implement adjustable alignment per bitmap.
See the file cff.h for tag values.
The item comparison routine can be set for each object with the command
cfitemcmp(handle, funcaddr);
A copy of the default keycmp and itemcmp functions are located in the
module cfport.c so that the programmer can see what needs to be done for
alternative functions.
HASHED vs SORTED
The default method for an object is hashed. The root directory of a
filesystem is forced to be hashed even if created as F_SORTED. Thus
a SORTED object cannot become a filesystem with the cfcopy command.
I may change this but it complicates opening a filesystem.
SORTED pros
1. Sequential access of a sorted object produces sorted output.
2. Sorted objects can support untagged items, no subobjects are allowed.
3. Normal mode duplicate entries have the overhead of only one key per node.
4. An unlimited number of normal mode duplicate entries is supported.
5. There is no in-core index for the nodes. This could be important for
an object with 100 million items.
6. Inserts and deletes to sorted objects do not invalidate the mark.
SORTED cons
1. Access and update of sorted objects is MUCH slower than hashed.
2. Key length is limited by the node size.
HASHED pros
1. Hashed objects are fast to update and access, and VERY fast with short keys.
2. Key length is unlimited (16 Megabytes).
3. Hashed objects can be created with PREALLOCATED entries and data. This
produces complete locality of reference and great speed when creating
a database.
HASHED cons
1. The in-core index for each object takes up space (4 bytes per bucket).
This is a consideration when the object contains a lot of items. The
maximum bucket of 8192 bytes contains 406 items.
2. Normal mode duplicate entries is limited to 406 dups.
3. Under pathological conditions the hashed object may fill up, i.e. a
bucket may contain a mix of keys which cannot be split.
4. Sequential access to a hashed object produces unsorted output.
DUPLICATE ENTRIES
Keys with duplicate entries are supported in two ways: the normal
`dupnum' way and the `dupname' way. Dupnames are 48 bit unique names + a 12 bit
unique id. Each object can support and control 4093 dupname sets.
Dupnames can be used as keys or items. Normal dups must be accessed by
a key-item combination or sequentially; the items for each normal
dup should be different.
DO NOT MIX NORMAL AND DUPNAME DUPLICATES.
OBJECTS AS FILES
Any object is a file, just read and write to it. If you read
before writing, read will return ERROR. When an object is closed
space allocated to the file property is truncated to within 128 bytes
of the filesize, filesize is limited to 2G bytes.
OBJECTS AS STACKS
Any object is a stack, just push and pop items or data. Stacks retain
their data when an object is closed and/or copied, stack depth is unlimited.
OBJECTS AS DIRECTORIES
Any object is a directory, just create a subobject (subdirectory) with
the cfopen or cfsubopen commands. The directory property can be accessed
with cfopendir, cfreaddir, cfrewinddir, cftelldir, cfseekdir and cfclosedir.
OBJECTS AS DICTIONARIES
Any object is an dictionary, just insert, find and delete keyed items.
To find out what is in an object, issue the command cfprintentries(something).
OBJECTS AS REPOSITORIES
Any object is a repository, just put and get keyed data chunks
(max 16Meg per chunk, min 32 bytes per chunk).
INTERNAL BUFFERING
The 5.9 version supports 512 buffer headers and unlimited data
buffering. The cfinit command includes an argument for the number of 1K blocks
of data buffering allowed. cfinit("app",700,NULL) designates that 700K of
memory be allocated to buffering.
Data chunks can be accessed directly in the buffer region with the command
cflocalize(). The localizer will allow most of the defined
data region to be allocated to a single localization request, excess
buffers are flushed. The size of the data region may be increased/decreased
dynamically with the cfmodbufs() command.
This localizer does not localize blocks; it localizes chunks which may
be large or small. Most of the time the localizer is working with nodes
and keys so the memory reqirements are not large internally. The file
property is read/written in 8K chunks. If the user creates a huge chunk
then it is up to the user to deal with it.
Huge data chunks can be accessed with the command cfopen_chunk() or cfopen()
followed by reads and writes to the returned handle.
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -