How to build an ASP.NET website search engine

Once a website grows beyond a couple of dozen pages, it can be difficult to create a navigation scheme that lets users quickly find exactly what they are looking for. One way to improve site navigation is to add a search facility to the website.

Unfortunately, building a search facility for your website can be a time-consuming exercise. Although ASP.NET supports the searching of files using the Windows Indexing Service, writing code to query the Indexing Service can be quite complex. Furthermore, not all web hosting companies support the use of the Indexing Service, so this may not be an option for your website.

This example shows how to build a website search engine for ASP.NET. The code samples are in C#, but could be easily adapted for the VB.NET programming language.

Building Your Own ASP.NET Search Engine

While it is possible to build a file based search facility using C#, the problem with this approach is that a significant amount of effort would be required to build the file content indexing routine. A database would also be required to store the list of words within the website. Furthermore, if the file system is indexed rather than the actual website then it would be possible for undesirable content (e.g. include files, global.asax files, restricted access documents) to be indexed and appear in search results.

Building a word index for a website by using a web crawler is an obvious solution to these problems. The web crawler sees the same website content as an end user, so there is no problem with undesired content appearing in search results. Web crawlers can also be prevented from indexing certain parts of websites by making use of robots.txt files and the robots meta tag. Furthermore, a web crawler is not dependent on the underlying technology used on a website, so it can crawl websites regardless of whether they use PHP, ASP, ASP.NET or a combination of all three.
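To make the robots.txt mechanism more concrete, here is a minimal sketch of how a crawler might decide whether a path may be indexed. The RobotsRules class and its method names are hypothetical and are not part of The Website Utility. The sketch is deliberately simplified: it only reads Disallow rules that follow a "User-agent: *" line and treats each rule as a plain path prefix, ignoring Allow rules and wildcards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of The Website Utility.
public static class RobotsRules
{
    // Collects the Disallow path prefixes that apply to all crawlers ("User-agent: *")
    public static List<string> ParseDisallowed(string robotsTxt)
    {
        List<string> disallowed = new List<string>();
        bool appliesToAll = false;

        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            // Strip trailing comments and surrounding whitespace
            string line = rawLine;
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            line = line.Trim();

            // Skip blank lines and anything that is not a "field: value" pair
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                //Simplification: only the most recent User-agent line is considered
                appliesToAll = (value == "*");
            }
            else if (field == "disallow" && appliesToAll && value.Length > 0)
            {
                disallowed.Add(value);
            }
        }

        return disallowed;
    }

    // Returns false if the path starts with any disallowed prefix
    public static bool IsAllowed(string path, List<string> disallowed)
    {
        foreach (string prefix in disallowed)
        {
            if (path.StartsWith(prefix, StringComparison.Ordinal)) return false;
        }
        return true;
    }
}
```

With a robots.txt file that contains "Disallow: /admin/" under "User-agent: *", IsAllowed returns false for /admin/login.aspx and true for /products/index.aspx.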

Building a web crawler is not a trivial exercise, so this code sample relies on our web crawling product - The Website Utility. This product crawls any website and automatically builds the .NET class necessary to allow the website to be searched for text strings. Note that version 2.0 of the Microsoft .NET Framework or above is required.

The .NET search engine created by The Website Utility is contained within the partial class TWUSearch of the namespace com.WinnershTriangle.TheWebsiteUtility. The partial class is contained in two files: TWUSearchCode.cs and TWUSearchData.cs. Both of these files should be copied to the ASP.NET web application's App_Code folder - the TWUSearch class is then accessible to other code files in the web application.

The TWUSearch partial class has a number of methods and properties, which are described below:

Methods

  • SetQuery(string query) (returns void): Sets the text of the search query.
  • GetSearchResults() (returns DataSet): Retrieves search results.
  • GetErrorMessage() (returns string): Retrieves a description of the error.

Properties

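  • MaximumSearchResults (int): Gets/sets the maximum number of search results to return (default is 50).
  • NumberOfMatchingPages (int): Returns the number of pages that matched the search query.
  • ReturnPageTitles (bool): Set to false to turn off the return of page titles in the DataSet (default is true).
  • ReturnPageDescriptions (bool): Set to false to turn off the return of page descriptions in the DataSet (default is true).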
  • HasErrors (bool): Returns true if an error occurred (use the GetErrorMessage() method to retrieve the error message).
  • DebugMessage (string): Returns debugging messages (for troubleshooting only).

The C# partial class file TWUSearchData.cs contains the data structures needed for the search class. If you re-crawl a website to update the search facility, this is the only file that will have changed, so updating the search facility may be achieved by overwriting the website's previous copy of this file.

Using the ASP.NET Search Object from C#

The source code below shows how to instantiate the .NET website search class and retrieve a DataSet of search results matching the search query. In this example, the query is set from the Text property of a TextBox called TWUQuery, and the search results are databound to the GridView1 GridView control.

The results are sorted in descending order of rank by setting the DataView's Sort property. The code-behind file needs using directives for the System and System.Data namespaces.

/// <summary>
/// Show the search results after the search button is invoked
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
protected void submitbutton_Click(object sender, EventArgs e)
{
  //Initialise the search class
  com.WinnershTriangle.TheWebsiteUtility.TWUSearch SearchObject = new com.WinnershTriangle.TheWebsiteUtility.TWUSearch();

  //Set search query from the TextBox control
  SearchObject.SetQuery(TWUQuery.Text);

  //Initialise a DataSet for the search results
  DataSet SearchData = new DataSet();

  //Optionally change the maximum number of search results (default is 50)
  SearchObject.MaximumSearchResults = 25;

  //Set to false to turn off the return of page titles (default is true)
  SearchObject.ReturnPageTitles = true;

  //Set to false to turn off the return of page descriptions (default is true)
  SearchObject.ReturnPageDescriptions = true;

  //Retrieve the search results
  SearchData = SearchObject.GetSearchResults();

  //Note that if the search facility encounters an error you can call
  //the GetErrorMessage() method to retrieve a description of the error.

  string SearchError = SearchObject.GetErrorMessage();

  //Check to see if any matching pages were found
  if (SearchObject.NumberOfMatchingPages == 0)
  {

    //Did an error occur?
    if (SearchObject.HasErrors == false)
    {

      //User probably searched for a term that does not exist
      LabelSearchResults.Text = "No matching pages were found for this query. Please try another search.";
      GridView1.Visible = false;
    }

  }

  //Did an error occur?
  if (SearchObject.HasErrors)
  {

    LabelSearchResults.Text = "This search failed due to: " + SearchError + ". Please try another search.";
    GridView1.Visible = false;
  }

  //No errors were encountered and there were matching pages in the search
  //results, so display the search results GridView

  if (SearchObject.HasErrors == false && SearchObject.NumberOfMatchingPages > 0)
  {

    //Create a DataView from the search results data
    DataView SearchDataView = new DataView(SearchData.Tables[0]);

    //Sort the search results by rank
    SearchDataView.Sort = "PageRank DESC";

    GridView1.DataSource = SearchDataView;
    GridView1.Visible = true;

    //Show the number of search results
    LabelSearchResults.Text = SearchObject.NumberOfMatchingPages.ToString() + " matching page(s) were found.";

    //Bind the search results data to the GridView
    GridView1.DataBind();

  }

}
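
The sort step can also be tried in isolation, without the search class. The sketch below builds a small DataTable by hand and sorts it with the same Sort expression. The SearchResultSorting class, the PageUrl column and the sample rows are made up for illustration; only the PageRank column name comes from the code above. PageRank is assumed to be a numeric column here. If it were stored as text, the sort would be alphabetical rather than numerical.

```csharp
using System;
using System.Data;

// Hypothetical helper for illustration only
public static class SearchResultSorting
{
    // Builds a sample result table; the PageUrl column and the rows are invented
    public static DataTable BuildSampleResults()
    {
        DataTable table = new DataTable("Results");
        table.Columns.Add("PageUrl", typeof(string));
        table.Columns.Add("PageRank", typeof(int));

        table.Rows.Add("/about.aspx", 12);
        table.Rows.Add("/products.aspx", 87);
        table.Rows.Add("/contact.aspx", 45);

        return table;
    }

    // Returns a view of the results with the highest ranked page first
    public static DataView SortByRank(DataTable results)
    {
        DataView view = new DataView(results);
        view.Sort = "PageRank DESC";
        return view;
    }
}
```

Calling SortByRank(BuildSampleResults()) gives a view whose first row is /products.aspx, followed by /contact.aspx and /about.aspx. The underlying DataTable is left in its original order, as the DataView only changes how the rows are presented.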

How it Works

The Website Utility extracts all of the words from the website and finds the most relevant pages in the website for each word. Common English words (e.g. got, like, then) are removed, as are words of one or two characters in length. Word rankings depend on many factors, including their distribution through the entire website and their distribution in the content of a specific page.
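
As a rough illustration of the word filtering described above, the following sketch splits some text into words and discards common words and words of one or two characters. It is not the code used by The Website Utility: the WordFilter class is hypothetical, and the stop-word list is a short invented sample that starts with the three examples given above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of The Website Utility.
public static class WordFilter
{
    // Sample stop-word list (an assumption; the product's real list is not shown in this article)
    private static readonly string[] StopWords = { "got", "like", "then", "the", "and", "with" };

    // Returns the lower-cased words that would be worth indexing
    public static List<string> ExtractIndexableWords(string text)
    {
        List<string> words = new List<string>();

        // Treat any run of letters or digits as a word
        foreach (Match match in Regex.Matches(text, "[A-Za-z0-9]+"))
        {
            string word = match.Value.ToLowerInvariant();

            // Skip words of one or two characters
            if (word.Length <= 2) continue;

            // Skip common English words
            if (Array.IndexOf(StopWords, word) >= 0) continue;

            words.Add(word);
        }

        return words;
    }
}
```

For the text "I got an ASP.NET search engine, then it is like magic" this returns the words asp, net, search, engine and magic.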

Pages are sorted in search results according to their ranking for the particular word or words being searched for. The ranking scale goes from 0 to 99. Rank is higher for pages that most closely match the search term. In general, searching for words that are common on the site will produce search results with a lower rank than very specific words that occur on only one or two pages.
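
The ranking formula used by The Website Utility is not described here, so the following is only a sketch of one way that a 0 to 99 rank with these properties could be calculated. It assumes a simple scheme in which a page scores higher when the word makes up more of its content, and lower when the word occurs on many pages of the site. The RankSketch class, its parameters and the formula itself are all assumptions.

```csharp
using System;

// Hypothetical ranking sketch; NOT the formula used by The Website Utility.
public static class RankSketch
{
    public static int Rank(int occurrencesInPage, int wordsInPage,
                           int pagesContainingWord, int totalPages)
    {
        if (occurrencesInPage <= 0 || wordsInPage <= 0 ||
            pagesContainingWord <= 0 || totalPages <= 0)
        {
            return 0;
        }

        //Share of the page's words that match the search word (0 to 1)
        double termFrequency = Math.Min(1.0, (double)occurrencesInPage / wordsInPage);

        //1.0 when the word occurs on a single page, falling to 0.0 when it occurs on every page
        double rarity = 1.0;
        if (totalPages > 1)
        {
            rarity = Math.Log((double)totalPages / pagesContainingWord) / Math.Log(totalPages);
            rarity = Math.Max(0.0, rarity);
        }

        //Scale to the 0 to 99 range
        int rank = (int)(termFrequency * rarity * 100);
        return Math.Min(99, rank);
    }
}
```

Under these assumptions, a word found on only one page of a 50 page site ranks higher than a word with the same frequency that is found on 40 of those pages, and a word that appears on every page always ranks 0.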

Important Note: For very large websites or more sophisticated searching, you may need to consider using a specialised server-based search solution, such as using ASP.NET to search Microsoft's Indexing Service. The Indexing Service Companion can be used to allow the Indexing Service to search remote websites (and also to search more than one website simultaneously).

Author details

Brett Burridge has worked as a web developer since 1997 and has developed web applications for a range of corporations, start-up businesses and educational establishments.

Brett is presently employed as an Internet developer and technical writer through his own company, Winnersh Triangle Web Solutions Limited. The company produces a number of innovative products, including a range of software documentation tools, which include the ASP Documentation Tool™, the .NET Documentation Tool for VB.NET and C#, and the SQL Server Documentation Tool. Other products include The Website Utility, which functions as a website error checker, search engine optimizer and ASP/ASP.NET search engine builder application.

As well as the ASPAlliance, Brett has written articles for Ariadne.ac.uk, ASPToday, the software documentation portal www.softwaredocumentation.info, and has contributed recipes to the ASP.NET Developer's Cookbook.

Outside web development, Brett is interested in travelling (he has published travel logs from New York, Hong Kong and Tokyo), digital photography, tropical fishkeeping and collecting contemporary works of art by artists such as Doug Hyde.

Article history

"How to build an ASP.NET website search engine" published on ASPAlliance.com on 7 January 2007.


© page content copyright Brett Burridge 1998 - 2009.