The Indexing Service Companion

This article describes the Indexing Service Companion, a Windows application I have created that allows Microsoft Indexing Service to index content from remote websites and ODBC databases.

The Problem

The Windows Indexing Service is a great product! On the administrative side, it is easy to install, performance is good, and once it is installed maintenance tasks are minimal. Developing search applications in ASP is also fairly straightforward thanks to the Query and Utility server components.

The main limitation of Indexing Service is that it can really only index content hosted on the same machine or network as the machine running Indexing Service. Although it is possible to set up a share to a Unix/Linux Apache webserver using a file sharing solution such as SAMBA, this isn't always satisfactory: Indexing Service is not case sensitive with respect to filenames, whereas Unix filenames are, and this can cause problems when displaying search results.

Another issue is that it can be a chore to prevent Indexing Service from indexing certain content on a server. Unlike a web robot, it has no concept of the Robots Exclusion Standard specification (i.e. robots.txt files) and is unaffected by the 'robots' meta tag.

The Solution

The solution is to retrieve and index content from a web server using a web robot. The web robot mimics a web browser, starting at one page in the site and following the links it finds until it has retrieved all of the pages of the site. The robot can potentially retrieve content from any webserver, regardless of the platform it is hosted on. Two products that allow you to do this are Microsoft's Site Server 3.0 and my own Indexing Service Companion.
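To make the traversal concrete, here is a minimal Python sketch of a robot that starts at one page and follows links until every page on the same host has been visited. It is illustrative only and is not taken from either product: the `crawl` function, the `LinkExtractor` class and the in-memory `SITE` dictionary (which stands in for real HTTP requests) are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first traversal of one site; returns {url: html}.

    `fetch` is a caller-supplied function that maps a URL to its HTML,
    or returns None if the page cannot be retrieved.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        if url in pages:
            continue
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urljoin(url, href).split("#")[0]
            # Stay on the starting host and skip pages already stored.
            if urlparse(target).netloc == site and target not in pages:
                queue.append(target)
    return pages


# Hypothetical three-page site held in memory instead of fetched over HTTP.
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="http://other.test/">X</a>',
    "http://example.test/a.html": '<a href="b.html">B</a> <a href="/">Home</a>',
    "http://example.test/b.html": "<p>No links here</p>",
}

pages = crawl("http://example.test/", SITE.get)
print(sorted(pages))
```

Passing the fetch function in as a parameter keeps the traversal logic separate from the network code, so the same loop can be exercised against test data or a real HTTP client.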
Microsoft Site Server 3.0

Microsoft's Site Server 3.0 software suite has a Search application that enhances Indexing Service by allowing you to (amongst other things) retrieve and index content from remote websites using an integrated web robot. For an overview of Site Server 3.0 Search, take a look at an article I wrote for ariadne.ac.uk. Unfortunately Site Server 3.0 Search has a few shortcomings, including:
Indexing Service Companion

The Indexing Service Companion is a cost-effective method of retrieving content from remote webservers for Indexing Service to index. It also allows content to be retrieved from ODBC databases, which can subsequently be indexed by Indexing Service.

Features

The main features of the Indexing Service Companion are:
System Requirements

The Indexing Service Companion can be used on a machine running Microsoft Windows 95 or any subsequent version of Windows. Windows NT 4.0 or Windows 2000 is recommended. It also (of course) requires a server running either Index Server on Windows NT 4.0 Server or Indexing Service on Windows 2000. Note that the Indexing Service Companion does not have to be run from the machine on which Indexing Service is installed.

Configuring and Running the Indexing Service Companion

The Indexing Service Companion executable file or Perl script needs to be run from the Windows command line. Fortunately there is only a single mandatory parameter, which tells the script which configuration file to use. So to run the Indexing Service Companion for the Sample Project, open a Command Prompt in the folder where the Indexing Service Companion files are installed and type the following:

    IndexServerCompanion.exe --c="SampleProject/SampleProject.ini"

It is of course possible to run the Indexing Service Companion from .bat scripts, which can then be scheduled using the AT command or the Windows Task Scheduler. This makes it straightforward to update the Indexing Service's index of website and database content at specific times and frequencies.

The configuration file (in this instance it is called SampleProject.ini) is a plain text file containing a number of settings. The Indexing Service Companion is supplied with full documentation in Microsoft's HTML Help format that describes each of the configuration settings.

When the script is run, the Indexing Service Companion displays details of its status in the Command Prompt window. A detailed log file is also created.

How the Indexing Service Companion Works

The Indexing Service Companion script contains a fully functional web robot that is able to extract the content from all of the required pages of the specified website.
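A well-behaved robot checks a site's robots.txt file before fetching a URL, and each page's robots meta tag before indexing it or following its links. The Python sketch below shows one way to make both checks with the standard library. It is an assumption-laden illustration, not the Companion's own code: the robot name `ExampleRobot`, the sample rules and the helper names `allowed` and `robots_meta` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real robot would download this
# from the root of the site being indexed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleRobot"):
    """True if robots.txt permits `agent` to fetch `url`."""
    return rules.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_meta(html):
    """Return the set of robots meta directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(allowed("http://example.test/index.html"))
print(allowed("http://example.test/private/report.html"))
print(robots_meta('<meta name="robots" content="noindex, nofollow">'))
```

A robot built on these helpers would skip any URL for which `allowed` returns False, would not save a page whose directives include `noindex`, and would not queue links from a page whose directives include `nofollow`.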
The robot supports the Robots Exclusion Standard specification, and also the robots meta tag contained within individual pages. Each file extracted from the website is modified to contain a special meta tag that gives the original URL (for web content). It is then saved to disk, from where it can be indexed by Indexing Service. The contents of these special meta tags can then be used by the ASP page displaying the results of a web search, so that clicking on a search result item will display the original URL. Unfortunately Indexing Service will not allow you to retrieve the content of custom meta tags without a minor modification in the Indexing Service's Microsoft Management Console (MMC), so there is also a special mode in the Indexing Service Companion that adds the original URL to the page's HTML <title> tag.

The Indexing Service Companion is also able to index content from database tables, queries (Microsoft Access) and stored procedures (SQL Server). Database connectivity is achieved through ODBC, so potentially any type of database that has an ODBC driver is supported.

Searching Web Content with the Indexing Service Companion

The Indexing Service Companion allows content from remote websites to be retrieved and subsequently indexed by Indexing Service. A working example of this may be seen on my website at the following URL:

This is a search page running on Internet Information Server 4.0 (Windows NT 4 Server) that allows you to search my ASPAlliance site (including the article you are presently reading!), together with the articles I have written for Ariadne.ac.uk and ASPToday.com. Since I don't have administrative access to the Indexing Service on the machine hosting the search page, I have used the feature of the Indexing Service Companion that combines the document's original URL with the original title.
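The title-based mode amounts to a round trip: the URL is written into the title when the page is saved, and split out again when search results are displayed. The following Python sketch illustrates both halves, assuming the layout used in this article's example (`ISC_URL=` followed by the URL, a tab character, then the original title). It is not the ASP code from the sample search page, and the function names `tag_title` and `split_title` are hypothetical.

```python
import re

PREFIX = "ISC_URL="


def tag_title(html, url):
    """Insert ISC_URL=<url> and a tab before the text of the page's <title>."""

    def replace(match):
        open_tag, text, close_tag = match.groups()
        return f"{open_tag}{PREFIX}{url}\t{text}{close_tag}"

    # Only the first <title> element is rewritten.
    return re.sub(
        r"(<title[^>]*>)(.*?)(</title>)",
        replace,
        html,
        count=1,
        flags=re.IGNORECASE | re.DOTALL,
    )


def split_title(title):
    """Return (url, original_title); url is None for an untagged title."""
    if title.startswith(PREFIX) and "\t" in title:
        head, original = title.split("\t", 1)
        return head[len(PREFIX):], original
    return None, title


page = "<html><head><title>ASP Documentation Systems</title></head></html>"
saved = tag_title(page, "http://asptoday.com/content.asp?id=1435")
indexed_title = re.search(r"<title>(.*?)</title>", saved, re.DOTALL).group(1)
url, original = split_title(indexed_title)
print(url)
print(original)
```

Returning `None` for the URL when the prefix is missing lets a results page fall back to the file's own path for documents that were not produced by the robot.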
For example, the <title> tag of the ASPToday article "ASP Documentation Systems" at http://asptoday.com/content.asp?id=1435 is modified in the saved file to read:

    <title>ISC_URL=http://asptoday.com/content.asp?id=1435 ASP Documentation Systems</title>

The URL and original title are separated by a tab character (shown here as a space). The search results page then contains a small piece of ASP code to split this title back into the article's URL and original title. The ASP code for the sample search page may be seen below:

Searching Databases with the Indexing Service Companion

The Indexing Service Companion is able to index content from database tables, queries (Microsoft Access) and stored procedures (SQL Server). It is of course entirely possible to search databases using Structured Query Language (SQL), but the Indexing Service Companion makes it a lot more straightforward to integrate database searches with Indexing Service results from web page searches. There are other advantages too: Indexing Service contains sophisticated pattern matching, and it is often a lot faster at returning search results than an equivalent SQL statement would be when using a database such as Microsoft Access.

The Indexing Service Companion retrieves the rows of a specified database table and generates an HTML file for each row, containing that row's data. Indexing Service can then be used to index the HTML files, and it is possible to extract the details of the table and row from which the data originated, so that the search results page can point to the original database data. A sample page produced from the SQL Server sample pubs database is shown below:

    <html>

In this example, the title field is optionally used to give the page a title, and the notes field is used for the description meta tag. Each of the custom ISC_ prefixed meta tags can be queried using Indexing Service, although retrieving their contents requires a minor configuration change to Indexing Service.
It is straightforward to create a page which, for example, returns the records where the value of the ISC_type meta tag is "mod_cook". The Indexing Service Companion can also modify the HTML's <title> tag to include the table name and row ID, e.g.:

    <title>ISC_Table=titles ISC_KeyField=title_id ISC_RowNumber=MC2222 Silicon Valley Gastronomic Treats</title>

Summary

The Indexing Service Companion allows Microsoft Indexing Service to index content from remote websites and ODBC databases, making it a cost-effective way of significantly extending the functionality of Indexing Service.

Comments/Suggestions?

I've released the Indexing Service Companion in the hope that other users may find it useful! I'd love to hear what you think of it. Is it useful? What new features do you want? Email me with your thoughts.

Downloads
Further information
Useful Development Tools
Author details

Brett Burridge has worked as a web developer since 1997 and has developed web applications for a range of corporations, start-up businesses and educational establishments. Brett is presently employed as an Internet developer and technical writer through his own company, Winnersh Triangle Web Solutions Limited. The company produces a number of innovative products, including a range of software documentation tools: the ASP Documentation Tool, the .NET Documentation Tool for VB.NET and C#, and the SQL Server Documentation Tool. Other products include The Website Utility, which functions as a website error checker, search engine optimizer and ASP/ASP.NET search engine builder application.

As well as ASPAlliance, Brett has written articles for Ariadne.ac.uk, ASPToday and the software documentation portal www.softwaredocumentation.info, and has contributed recipes to the ASP.NET Developer's Cookbook.

Outside web development, Brett is interested in travelling (with travel logs from New York, Hong Kong and Tokyo), digital photography, tropical fishkeeping and collecting contemporary works of art by artists such as Doug Hyde.

Contact Brett by emailing

Article history

"The Indexing Service Companion" published on ASPAlliance.com on 30 August 2002. Last revised 15 November 2007.
© page content copyright Brett Burridge 1998 - 2008.