PhpDig is a search engine written in PHP that uses a MySQL database backend. It features indexing of both static and dynamic pages, spidering of almost all links in HTML content (in hrefs, area maps, and frames), and full-text indexing. The appearance of the search results is skinnable through a very simple template system.
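As an illustration of the link spidering described above, here is a minimal Python sketch that collects links from anchors, image-map areas, and frames. It is not PhpDig's code (PhpDig is written in PHP); the names `LINK_ATTRS`, `LinkCollector`, and `extract_links` are hypothetical, and only the Python standard library is assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags a spider might follow, mapped to the attribute holding the link.
# This table is an assumption for the sketch, not PhpDig's actual rule set.
LINK_ATTRS = {"a": "href", "area": "href", "frame": "src", "iframe": "src"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs from anchors, image-map areas, and frames."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                if url not in self.links:
                    self.links.append(url)


def extract_links(html, base_url):
    """Return the de-duplicated links found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A crawler would call `extract_links` on each fetched page and queue the returned URLs for indexing; dynamic pages such as `b.php?id=1` are handled like any other link.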
ICA is used to classify text as an extension of the latent semantic indexing framework. ICA has been shown to align well with the context grouping structure in a human sense [1], and can therefore be used for unsupervised classification. The demonstration shows this on medical abstracts (the MED dataset); it uses BIC to estimate the number of classes and produces keywords for each class. The icaML algorithm is used.
Rainbow is a C program that performs document classification using one of several different methods, including naive Bayes, TFIDF/Rocchio, K-nearest neighbor, Maximum Entropy, Support Vector Machines, Fuhr's Probabilistic Indexing, and a simple-minded form of shrinkage with naive Bayes.
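To make one of the listed methods concrete, the following is a minimal Python sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is not Rainbow's implementation (Rainbow is a C program); the class name `NaiveBayes`, the whitespace tokenizer, and the smoothing choice are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()                       # all words seen in training

    def train(self, text, label):
        self.class_docs[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.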
The goal of this library is to make ODBC recordsets look just like an STL container. As a user, you can move through our containers using standard STL iterators, and if you insert(), erase(), or replace() records in our containers, the changes can be automatically committed to the database for you. The library's compliance with the STL iterator and container standards means you can plug our abstractions into a wide variety of STL algorithms for data storage, searching, and manipulation. In addition, the C++ reflection mechanism used by our library to bind to database tables allows us to add generic indexing and lookup properties to our containers with no special code required from the end user. Because our code takes full advantage of the template mechanism, it adds minimal overhead compared with using raw ODBC calls to access a database.
Abstract
The Lucene Server project is an attempt to extend the Jakarta Lucene tool with server capabilities.
Lucene is a robust Java API that enables you to create indexes from text sources and perform powerful searches on these indexes. With Lucene, an index must be created programmatically, and there are almost no possibilities for integrating index management in a distributed environment. In other words, out of the box, Lucene is suitable for integrating indexing and searching capabilities into a single application, but not for providing index and search services to multiple applications.
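The index-then-search workflow mentioned above can be sketched with a tiny in-memory inverted index. This is a conceptual Python illustration only: it does not use or reproduce Lucene's Java API, and the class `InvertedIndex` with its `add` and `search` methods is hypothetical.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Map each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of matching documents
        self.documents = {}               # id -> original text

    @staticmethod
    def analyze(text):
        """Lowercase the text and split it into alphanumeric terms."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Index one document under the given id."""
        self.documents[doc_id] = text
        for term in self.analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = self.analyze(query)
        if not terms:
            return []
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return sorted(result)
```

Because the index lives inside one process, every application needs its own copy, which is the single-application limitation that a server layer is meant to address.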
The Lucene Server project comes with a Java API that offers the following:
make it easy to create indexes in a declarative way by simply providing an XML configuration document.
make it easy to customize the way Lucene handles different kinds of data sources.
provide services for index management and searching that can be accessed from several applications.
enable batch task scheduling.