Harvest is a system that collects information and makes it searchable through a user-friendly web interface. It can gather information from the Internet and from intranets over HTTP, FTP, and NNTP, as well as from local files such as data on a hard disk, on CD-ROM, or on file servers. Supported formats, in addition to HTML, currently include dvi, ps, full text, mail, man pages, news, troff, WordPerfect, C sources, and many more.
webbase is an Internet web crawler written in C and later ported to C++. It uses a MySQL database to store information about crawled URLs, and it is available as a command-line program or as a library (shared or static). It is designed around two main functions: crawling the web to fetch documents, and building a full-text database from those documents. The crawler visits each document and stores interesting information about it locally. It revisits the document on a regular basis to make sure that it is still there, and updates the local copy if the document has changed. The full-text database uses the local copies of the documents to build a searchable index. Note that the full-text indexing functions themselves are not included in webbase.
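The revisit behaviour described above can be sketched roughly as follows. This is a minimal illustration, not webbase's actual code or database schema: the `UrlRecord` structure, the functions `due_for_revisit` and `record_visit`, and the use of a content hash to detect changes are all assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical record for one crawled URL (not webbase's real schema).
struct UrlRecord {
    std::string url;
    long last_visit;        // time of the last visit, in seconds
    std::size_t body_hash;  // hash of the locally stored copy
    bool alive;             // false once the document has disappeared
};

// True when at least `interval` seconds have passed since the last visit.
bool due_for_revisit(const UrlRecord& rec, long now, long interval) {
    return now - rec.last_visit >= interval;
}

// Records the outcome of a visit. Returns true when the document was
// found and its content differs from the stored copy, meaning the local
// copy (and any index built from it) should be refreshed.
bool record_visit(UrlRecord& rec, long now, bool found,
                  const std::string& body) {
    rec.last_visit = now;
    rec.alive = found;
    if (!found) {
        return false;  // document is gone; nothing to refresh
    }
    const std::size_t h = std::hash<std::string>{}(body);
    const bool changed = (h != rec.body_hash);
    rec.body_hash = h;
    return changed;
}
```

A crawler loop built on this sketch would call `due_for_revisit` for each stored record, fetch the URLs that are due, and pass the result to `record_visit`. Comparing hashes is only one possible way to detect a change; a real crawler might instead rely on HTTP headers such as `Last-Modified`.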