|
Version 7
|
|
Display document excerpt (with keywords highlighted) on the results page, plus bugfixes.
NEW! Mar '09
|
Version 6
|
|
Index JPG images, index GPS location data for mapping results, address "No" Trust problem and fix a few bugs.
|
Version 5
|
|
Remove Binary Serialization to solve Medium Trust problem; index OpenXML document formats.
|
Version 4
|
|
Refactored the codebase and added the ability to index and search Microsoft Word,
Excel, PowerPoint and Acrobat PDF files. Small improvements such as robots.txt
support and excluding regions of HTML were also added.
|
Version 3
|
|
Adds a "save to disk" feature for the catalog, along with feature suggestions,
bug fixes and code contributed by others
from previous versions.
|
Version 2
|
|
Extends Searcharoo to populate its search
catalog by spidering HTML pages: it follows links and imagemaps
to process both static and dynamically generated pages.
You can also search for multiple words.
|
Version 1
|
|
How to build a simple, extensible search engine using ASP.NET that
can crawl files and create a searchable catalog by processing the
text from HTML source.
|
Searcharoo.net is an open-source
C#/ASP.NET implementation of a search engine that you can download and use on your website. Pick the most recent
version from the menu and look for a download link.
The default interface should be familiar (and is easily customizable in ASPX/HTML, jQuery/AJAX or Silverlight 2.0)!

The results can show not only the text but also geo-location information (and URLs that open in Google Earth).
The articles describe how the engine itself is built, from a simple file-system crawler to
a fully-fledged web-spider. You can comment or ask questions on CodeProject.
In addition to information on this website, these search-related links
might be interesting/useful.
Web search technology is a huge subject, encompassing:
- networking (spidering the web),
- string and markup-language manipulation (parsing HTML),
- proprietary file formats (searching Word, Excel, PDF, etc.),
- language and text-parsing (finding words & sentences in documents, stemming and other
linguistic analysis),
- algorithms (finding matches, AND/OR queries, combining multiple word results),
- performance (both increasing spidering speed, and making large catalogs fast to search),
- user interface (presenting search input options, and results),
and I would encourage you to read as much as you can about these subjects and modify Searcharoo for your own specific purpose.
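To make the "algorithms" item above a little more concrete, here is a minimal sketch of a word catalog with AND/OR matching. It is an illustration only, written in Python rather than Searcharoo's C#, and it is not taken from the Searcharoo source: the function names `build_catalog` and `search`, the sample pages, and the crude tag stripping are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_catalog(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    catalog = defaultdict(set)
    for url, html in pages.items():
        # Crude tag stripping, for illustration only; a real parser handles far more.
        text = re.sub(r"<[^>]+>", " ", html)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            catalog[word].add(url)
    return catalog


def search(catalog, query, mode="AND"):
    """Return URLs matching all (AND) or any (OR) of the query words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [catalog.get(word, set()) for word in words]
    if mode == "AND":
        return set.intersection(*hits)
    return set.union(*hits)


# Hypothetical sample data standing in for spidered pages.
pages = {
    "/a.html": "<p>Spidering the web</p>",
    "/b.html": "<p>Parsing the HTML markup</p>",
}
catalog = build_catalog(pages)
```

With this sample data, `search(catalog, "the web")` matches only `/a.html`, while `search(catalog, "web html", mode="OR")` matches both pages. Ranking, stemming and phrase matching are deliberately left out of the sketch.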