How to build an ASP.NET website search engine

Once a website grows beyond a couple of dozen pages, it can be difficult to create a site navigation scheme that allows users to quickly find exactly what they're looking for. One way to improve site navigation is to add a search facility to the website.

Unfortunately, building a search facility for your website can be a time-consuming exercise. Although ASP.NET supports the searching of files using the Windows Indexing Service, writing code to query Indexing Service can be quite complex. Furthermore, not all web hosting companies support the use of Indexing Service, so this may not be an option for your website.

This example shows how to build a website search engine for ASP.NET. The code samples are in C#, but could easily be adapted for VB.NET.

Building Your Own ASP.NET Search Engine

While it is possible to build a file-based search facility using C#, this approach would require significant effort to build the file content indexing routine. A database would also be required to store the list of words within the website. Furthermore, if the file system is indexed rather than the actual website, undesirable content (e.g. include files, global.asax files, restricted access documents) could be indexed and appear in search results.

Building a word index for a website by using a web crawler is an obvious solution to these problems. The web crawler sees the same website content as an end user, so there is no problem with undesired content appearing in search results. Web crawlers can also be prevented from indexing certain parts of websites by making use of robots.txt files and the robots meta tag.
Furthermore, a web crawler is not dependent on the underlying technology used on a website, so it can crawl websites regardless of whether they use PHP, ASP, ASP.NET or a combination of all three.

Building a web crawler is not a trivial exercise, so this code sample relies on our web crawling product, The Website Utility. This product crawls any website and automatically builds the .NET class needed to search the website for text strings. Note that version 2.0 or above of the Microsoft .NET Framework is required.

The .NET search engine created by The Website Utility is contained within the partial class TWUSearch of the namespace com.WinnershTriangle.TheWebsiteUtility. The partial class is split across two files: TWUSearchCode.cs and TWUSearchData.cs. Both files should be copied to the ASP.NET web application's App_Code folder; the TWUSearch class is then accessible to other code files in the web application.

The TWUSearch partial class has a number of methods and properties, which are described below:

Methods
Properties
The C# partial class file TWUSearchData.cs contains the data structures needed for the search class. If you re-crawl a website to update the search facility, this is the only file that will have changed, so the search facility can be updated by overwriting the website's previous copy of this file.

Using the ASP.NET Search Object from C#

The source code below shows how to instantiate the .NET website search class and retrieve a DataSet of search results matching the search query. In this example, the query is set from the Text property of a textbox called TWUSearch, and the search results are databound to the GridView1 GridView control. The results are sorted in descending order of rank by setting the DataView's Sort property.

    /// <summary>
    /// Show the search results after the search button is invoked
    /// </summary>
    /// <param name="sender"></param>
    /// <param name="e"></param>
    protected void submitbutton_Click(object sender, EventArgs e)
    {
        //Initialise the search class
        com.WinnershTriangle.TheWebsiteUtility.TWUSearch SearchObject = new com.WinnershTriangle.TheWebsiteUtility.TWUSearch();

        //Set search query from the TextBox control
        //Initialise a DataSet for the search results
        //Optionally change the maximum number of search results (default is 50)
        //Optionally turn off the return of page titles (default is to return titles)
        //Optionally turn off the return of page descriptions (default is to return descriptions)
        //Retrieve the search results
        //Note that if the search facility encounters an error you can call
        //Check to see if any matching pages were found
            //Did an error occur?
            //User probably searched for a term that does not exist
            }
            //Did an error occur?
            LabelSearchResults.Text = "This search failed due to: " + SearchError + ". Please try another search.";
        //No errors were encountered and there were matching pages in the search
        //Create a DataView from the search results data
        //Sort the search results by rank
        GridView1.DataSource = SearchDataView;
        //Show the number of search results
        //Bind the search results data to the GridView
        }
    }

How it Works

The Website Utility extracts all of the words from the website and finds the most relevant pages in the website for each word. Common English words (e.g. got, like, then) are removed, as are words of one or two characters in length. Word rankings depend on many factors, including their distribution through the entire website and their distribution in the content of a specific page.

Pages are sorted in search results according to their ranking for the particular word or words being searched for. The ranking scale goes from 0 to 99, and rank is higher for pages that most closely match the search term. In general, searching for words that are common on the site will produce search results with a lower rank than searching for very specific words that occur on only one or two pages.

Important Note: For very large websites or more sophisticated searching, you may need to consider using a specialised server-based search solution, such as using ASP.NET to search Microsoft's Indexing Service. The Indexing Service Companion can be used to allow Index Server to search remote websites (and also to search more than one website simultaneously).
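The word filtering described under How it Works (dropping common English words and words of one or two characters) can be sketched roughly as follows. This is only an illustration of the idea, not The Website Utility's actual implementation: the class name WordFilterSketch, the method ExtractIndexableWords, the separator list and the three-entry stop-word list are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of The Website Utility.
public static class WordFilterSketch
{
    // Assumed stop-word list: the article names only "got", "like" and "then".
    private static readonly string[] StopWords = { "got", "like", "then" };

    // Returns the lower-cased words that would be kept for indexing:
    // words of three or more characters that are not stop words.
    public static List<string> ExtractIndexableWords(string text)
    {
        List<string> kept = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };
        string[] words = text.ToLowerInvariant()
                             .Split(separators, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            if (word.Length <= 2) continue;                    // drop one- and two-character words
            if (Array.IndexOf(StopWords, word) >= 0) continue; // drop common English words
            kept.Add(word);
        }
        return kept;
    }
}
```

For example, WordFilterSketch.ExtractIndexableWords("I got a search engine, then it worked") keeps only "search", "engine" and "worked". A real indexer would go on to rank the surviving words per page, which this sketch does not attempt.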
Author details

Brett Burridge has worked as a web developer since 1997 and has developed web applications for a range of corporations, start-up businesses and educational establishments. Brett is presently employed as an Internet developer and technical writer through his own company, Winnersh Triangle Web Solutions Limited. The company produces a number of innovative products, including a range of software documentation tools: the ASP Documentation Tool, the .NET Documentation Tool for VB.NET and C#, and the SQL Server Documentation Tool. Other products include The Website Utility, which functions as a website error checker, search engine optimizer and ASP/ASP.NET search engine builder application.

As well as ASPAlliance, Brett has written articles for Ariadne.ac.uk, ASPToday and the software documentation portal www.softwaredocumentation.info, and has contributed recipes to the ASP.NET Developer's Cookbook.

Outside web development, Brett is interested in travelling, digital photography, tropical fishkeeping and collecting contemporary works of art by artists such as Doug Hyde.

Article history

"How to build an ASP.NET website search engine" published on ASPAlliance.com on 7 January 2007.
© page content copyright Brett Burridge 1998 - 2009.