Some JCR operations are defined to affect the persistent workspace storage
directly without going through the transient space of the session. Such
operations are handled by creating a new draft revision for just that
operation and persisting it as described above. If the operation succeeds,
the session is updated to use the persisted revision as the new base
revision.
[ngp/workspace.jpg] Workspace operation
Advanced Features
The revision model offers very straightforward implementations of many
advanced features. This section discusses some of the most prominent
examples.
* Transactions
Transactions that span multiple Session.save() operations are handled
with an alternative branch of persisted revisions. Instead of making a
persisted revision globally available as the latest revision of the
workspace, it is kept local to the transaction. When the transaction is
committed, all the revisions in the transaction branch are merged into
a single draft revision that is then persisted normally as described above.
[ngp/transaction.jpg] Transaction
If the merged revision cannot be persisted (causing the commit to fail) or
if the transaction is explicitly rolled back, the revisions in the
transaction branch are simply discarded.
This model can also easily support two-phase commits in a distributed
transaction.
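Under the hypothetical assumption that a revision can be represented as a map
of item paths to new values, the branch-and-merge behaviour described above can
be sketched as follows (all class and method names here are illustrative, not
part of any real JCR API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a transaction accumulates per-save revisions on a
// local branch; commit folds them into one draft revision, with later
// revisions overriding earlier ones; rollback simply discards the branch.
class TransactionBranch {
    private final List<Map<String, String>> branch = new ArrayList<>();

    /** Session.save() inside the transaction adds to the local branch. */
    public void save(Map<String, String> revision) {
        branch.add(revision);
    }

    /** Commit merges the branch into a single draft revision. */
    public Map<String, String> commit() {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> revision : branch) {
            merged.putAll(revision); // later revisions override earlier ones
        }
        branch.clear();
        return merged;
    }

    /** Rollback discards all revisions on the branch. */
    public void rollback() {
        branch.clear();
    }
}
```

The draft revision returned by <<<commit()>>> would then be persisted through
the normal single-revision path, so the transaction either becomes visible as
one atomic revision or not at all.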
* Namespace and Node Type Management
If the revision model were repository-scoped as discussed above, then
the namespace and node type registries could be managed as normal
(write-protected) content under the global <<<jcr:system>>> subtree as
described in the JCR specification. Such a solution, while probably more
complex than having the registries in custom data structures, would have
many nice features.
If these global registries were managed as normal content, most of the
other advanced features would also cover repository management. For
example, it would be possible to register or modify node types
transactionally, or even to make the node type and namespace registries
versionable. Backup and recovery operations would automatically include
this repository metadata, and no extra code would be required to cluster
node type or namespace changes. Even observation of the
<<<jcr:system/jcr:nodeTypes>>> subtree would come for free.
* Versioning
Since the revision model by default maintains a full change history of
the entire repository, it is possible to heavily optimize versioning
operations. For example, a check-in operation can be performed by simply
recording the persisted revision in which the checked-in node was found.
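A check-in of this kind copies no content at all; it only records a revision
number. The idea can be sketched as follows (a hypothetical illustration, not
the actual Jackrabbit version storage):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: because every persisted revision is retained, a
// version history can be just a list of revision numbers. Check-in records
// the current revision; restoring a version is a lookup of that number.
class VersionHistorySketch {
    private final List<Long> versions = new ArrayList<>();

    /** Check in: record the latest persisted revision as a new version. */
    public int checkin(long currentRevision) {
        versions.add(currentRevision);
        return versions.size() - 1; // index of the new version
    }

    /** The revision that holds the given version's frozen state. */
    public long revisionOf(int version) {
        return versions.get(version);
    }
}
```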
* Observation
All the information needed for sending JCR observation events is
permanently stored in the persisted revisions, which not only simplifies
the observation implementation but also enables many advanced observation
features.
One tricky issue that this model solves quite nicely is the problem of how
to handle access control for item removal events. Once the item in question
has been removed, many access control implementations no longer have a way
to determine whether a given session should be granted access to that item.
With the revision model it is possible to ask whether a session would have
been allowed to access the item while it still existed, and to filter
access to the removal events based on that information.
The full change history kept by the revision model enables a new feature,
<persistent observation>, in which a client can request all events since
a given checkpoint to be replayed to the registered event listeners
of a session.
The revision history can also be used as a full write-level audit trail
of the content repository.
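Since the persisted revisions themselves form the event journal, persistent
observation reduces to replaying all revisions after a client-supplied
checkpoint. A minimal sketch, with the revision number doubling as the
checkpoint token (names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the persisted revisions double as an event journal,
// so persistent observation is a replay of every revision persisted after
// a given checkpoint (here simply the revision number).
class EventJournalSketch {
    private final List<String> revisions = new ArrayList<>(); // change descriptions

    /** Persist a revision; the returned number serves as a checkpoint. */
    public long persist(String changes) {
        revisions.add(changes);
        return revisions.size();
    }

    /** Replay every change persisted after the given checkpoint. */
    public List<String> replaySince(long checkpoint) {
        return new ArrayList<>(revisions.subList((int) checkpoint, revisions.size()));
    }
}
```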
* Hot and Incremental Backups
Implementing hot backups is almost trivial since persisted revisions are
never modified. Thus it is possible for a backup tool to simply copy the
persisted revisions even if the repository that created them is still
running.
Once a full repository or workspace backup has been made, only new revision
files need to be copied to keep the backed up copy up to date. If the
revisions are stored as files on disk, then standard tools like <<<rsync>>>
can be used to maintain an incremental hot backup of the repository.
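Because revision files are immutable, the incremental step is nothing more
than a set difference: copy exactly those revision files that the backup does
not yet contain. A hypothetical sketch of that selection logic:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch: existing backup copies of immutable revision files
// never need to be re-checked or updated, so an incremental backup only
// has to copy the files missing from the backup.
class IncrementalBackupSketch {
    /** Returns the revision file names still missing from the backup. */
    public static Set<String> toCopy(Set<String> repository, Set<String> backup) {
        Set<String> missing = new LinkedHashSet<>(repository);
        missing.removeAll(backup);
        return missing;
    }
}
```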
* Point-in-Time Recovery
The revision model allows a repository or a workspace to be "rewound" to
a previous point in time without doing a full recovery from backups.
This makes it very easy and efficient to undo operations like the
accidental removal of large parts of the repository.
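In terms of the revision list, rewinding is simply discarding every persisted
revision after the chosen checkpoint; the remaining revisions already describe
the earlier state in full. A minimal hypothetical sketch:

```java
import java.util.List;

// Illustrative sketch: point-in-time recovery truncates the revision
// history at a checkpoint; no content needs to be restored from backups.
class RewindSketch {
    /** Keep only the first `checkpoint` revisions of the history. */
    public static List<String> rewind(List<String> revisions, int checkpoint) {
        return revisions.subList(0, checkpoint);
    }
}
```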
* Clustering
A repository cluster can be implemented on top of the revision model by
making sure that operations to persist revisions are synchronized across
cluster nodes.
For example a token passing system can be used to ensure that only one
cluster node can persist changes at a time. Once the node has persisted
a revision it can multicast it to the other nodes and release the
synchronization token. Since all change information is included in the
revision the other nodes can for example easily send the appropriate
observation events.
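The token-passing rule can be sketched as follows. The round-robin passing
order and all names here are hypothetical; any scheme that guarantees a single
token holder at a time would do:

```java
// Illustrative sketch: a cluster node may persist a revision only while
// it holds the shared token; after persisting and multicasting the
// revision it passes the token on (here in simple round-robin order).
class ClusterTokenSketch {
    private int holder = 0; // id of the node currently holding the token
    private final int nodes;

    public ClusterTokenSketch(int nodes) {
        this.nodes = nodes;
    }

    /** Attempt to persist a revision from the given node. */
    public boolean tryPersist(int nodeId) {
        if (nodeId != holder) {
            return false; // must wait for the token
        }
        // ...persist the revision and multicast it to the other nodes...
        holder = (holder + 1) % nodes; // release: pass the token on
        return true;
    }
}
```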
A node can easily be added to or removed from a cluster. A fresh node
will bootstrap itself by streaming the entire repository contents from
the other nodes.
An isolated cluster node can continue normal operation as a standalone
repository. When the node is returned to the cluster it will first stream
any new revisions from the other cluster nodes and request the
synchronization token to merge those changes with any revisions that were
persisted while the node was isolated. If the merge succeeds, the merged
revisions are multicast to the cluster and the node takes back its place
within the cluster. If the merge fails, the node will release the
synchronization token and remain isolated from the cluster. In such a case
an administrator needs to either manually resolve the merge failure or
use the point-in-time recovery feature to revert the isolated repository
to a state where it can rejoin the cluster.
Performance
It is still an open question how the revisions could be organized
internally to implement efficient access across histories that might
consist of thousands or even millions of individual revisions.
Efficient internal data structures are a key to achieving this goal,
but there are also a number of high-level optimizations that can be used
on top of the revision level to achieve better performance. Many of these
optimizations are independent of each other and require little or no
changes in other repository operations.
* Internal Data Structures
Simply persisting a list of added, modified, and removed items in a
revision is not likely to produce good performance as any content accesses
would then potentially need to traverse all the revisions to find the
item in question. Even if each revision is internally indexed so that
each item can be accessed in constant time, item access can still take
O(n) time where n is the number of persisted revisions. Thus a key to
improving performance is finding a way to avoid having to iterate through
all past revisions when locating a given node.
One potential approach could be to assign each node a sequence number
based on its location in the document order of the repository and to
manage these sequence numbers as they change over revisions. Each revision
would list the sequence number ranges that its changes affect. With this
information the implementation could in many cases infer whether it is
even possible for a node to exist in a given revision, and thus skip
those revisions when looking for the node.
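The skipping logic can be sketched as a range test per revision, newest first.
This is a hypothetical illustration; each `int[]{min, max}` stands for one
revision's affected sequence-number range:

```java
import java.util.List;

// Illustrative sketch: if each revision records the inclusive [min, max]
// range of node sequence numbers it touches, a lookup can skip every
// revision whose range cannot contain the node being searched for.
class RangeSkipSketch {
    /** Index of the newest revision that may contain the sequence number. */
    public static int findNewest(List<int[]> revisions, int sequenceNumber) {
        for (int i = revisions.size() - 1; i >= 0; i--) {
            int[] range = revisions.get(i);
            if (sequenceNumber >= range[0] && sequenceNumber <= range[1]) {
                return i; // only this revision needs closer inspection
            }
        }
        return -1; // no persisted revision can contain the node
    }
}
```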
Another alternative would be to use some sort of backward-looking
item index that indicates the revision in which a given item was last
stored. Unless such an index is stored as part of the revisions (probably
not in each revision), maintaining it could introduce an unwanted
synchronization block.
Since persisted revisions are never modified it is possible to heavily
read-optimize and index each revision. Especially for common situations
where read performance is heavily prioritized over write performance it
makes sense to spend extra time preparing complex read-only indexes or
other data structures when the revision is persisted. For example it might
be worth the effort to use some statistical access pattern data to find
the best possible ordering and indexing for a persisted revision.
* Combined Revisions
The number and granularity of revisions will likely be a limiting factor
in how efficiently the repository contents can be accessed. Many of the
potential internal revision data structures also work better the more
content there is in a revision. Thus it would be beneficial to increase
the size of individual revisions.
A repository implementation cannot affect how large the revisions
persisted by JCR clients are, but it can transparently combine or merge
any number of subsequent small revisions into one larger revision.
[ngp/merge.jpg] Combined revision
The combined revision can be used instead of the smaller revisions for all
operations where the exact revision of a modified item does not matter.
For example when querying and traversing the repository such transparent
combined revisions can speed things up considerably.
Revisions can be combined for example in a low-priority background thread.
Alternatively the repository implementation can offer an administrative
interface for explicitly combining selected revisions. The combine
operation can also be limited to just selected subtrees to optimize
access to those parts of the repository.
As an extreme case the combine operation can be performed on <all> revisions
up to a specified checkpoint. The combined revision will then contain the
full content tree up to that point in time. If the original revisions
are no longer needed for things like point-in-time recovery or persistent
observation, the combined revision could actually even replace all the
individual revisions it contains to avoid using excessive amounts of disk
space.
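For the extreme case described above, where the combine operation spans all
revisions from the beginning of the history, combining is a fold over the
change maps: later changes override earlier ones, and a removal can simply
drop the item. A hypothetical sketch, with `null` marking a removed item:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: combining all revisions from the start of the
// history folds their change maps into one full content snapshot; a null
// value marks a removal and cancels any earlier change to that item.
class CombineSketch {
    public static Map<String, String> combine(List<Map<String, String>> revisions) {
        Map<String, String> combined = new HashMap<>();
        for (Map<String, String> revision : revisions) {
            for (Map.Entry<String, String> change : revision.entrySet()) {
                if (change.getValue() == null) {
                    combined.remove(change.getKey()); // removal wins
                } else {
                    combined.put(change.getKey(), change.getValue());
                }
            }
        }
        return combined;
    }
}
```

Note that a combined revision covering only part of the history would instead
have to retain the removal markers, since it is still a diff against older
revisions.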
* Caching and Lazy Loading
Since the persisted revisions are never modified, it is possible to cache
their contents very aggressively. The caches can be very simple since there
is no need for any cache coherency algorithms.
The read-only nature of the revisions also allows many operations to be
postponed until the moment the relevant information is actually needed. For
example, a JCR Node instance can simply keep a reference to the on-disk
storage of the latest persisted state of the node and load related
information like property values or child node references only when it is
actually requested.
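The lazy-loading pattern can be sketched as follows; because the underlying
revision is immutable, the one-shot cache needs no invalidation (names and
types here are hypothetical):

```java
import java.util.Map;
import java.util.function.Supplier;

// Illustrative sketch: a node keeps only a reference to its on-disk
// revision (modelled here as a Supplier) and loads, then caches, its
// properties on first access. Revision immutability makes the cache safe.
class LazyNodeSketch {
    private final Supplier<Map<String, String>> loader; // reads the revision
    private Map<String, String> properties; // cached after first access

    public LazyNodeSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    public String getProperty(String name) {
        if (properties == null) {
            properties = loader.get(); // loaded only on first request
        }
        return properties.get(name);
    }
}
```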
* Concurrency
In a world where multiprocessor servers and multicore or soon even
manycore processors are commonplace it is essential for best performance
that a software system like a content repository uses every opportunity
for increased concurrency.
The revision model makes it possible to avoid all blocking of read
operations and requires write synchronization only when new revisions are
persisted. With optimistic constraint checking and a fallback mechanism the
write synchronization can even be limited to the very last step of
persisting a revision. However, this and the clustering support mentioned
above are not the only opportunities for concurrency that the model allows.
Repository operations like search queries, XML exports, and many
consistency and constraint checks can be formulated as map-reduce
operations that can concurrently operate (map) on many past revisions
and combine (reduce) the partial outcomes into the final result of the
operation. Such algorithms might not be worthwhile on normal repositories,
but offer a way to harness the benefits of massive parallelism in huge
content repositories that may reside in grid environments.
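As a hypothetical illustration of such a map-reduce formulation, a query like
"how many items were ever changed under a given subtree" can map over the
immutable, hence safely shared, revisions in parallel and reduce the partial
counts into one sum:

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch: each revision is mapped independently (and, with
// parallelStream, potentially concurrently) to a partial count; the
// partial counts are then reduced into the final result.
class MapReduceSketch {
    public static long countUnder(List<Map<String, String>> revisions, String prefix) {
        return revisions.parallelStream() // map: one partial result per revision
            .mapToLong(r -> r.keySet().stream()
                .filter(path -> path.startsWith(prefix))
                .count())
            .sum(); // reduce: combine the partial counts
    }
}
```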