ASPAlliance ASP Kitchen  

Using the HTTP Protocol With PerlScript and ASP

Introduction

One topic often discussed by ASP programmers is how to access content on other servers using protocols such as HTTP. Such techniques have many uses, such as checking that a URL entered into a web form is valid, or pulling stock quotes from one site and publishing them on another.

Handy Tip!

If you prefer to write your web applications in Perl, you will be pleased to know that Perl will be available as one of the many programming languages supported by the ASP.NET Framework.

There are several approaches to obtaining content from other servers, and in particular to using HTTP to access one web page programmatically from within another. ASP developers using VBScript or JScript might like to take a look at this article, which describes using an ActiveX object to achieve this. Alternatively, the AspHTTP™ component from ServerObjects Inc. is popular with developers.

An alternative approach is to use the PerlScript ActiveX scripting engine, which allows developers to write ASP documents in Perl rather than in the traditional VBScript or JScript. Like VBScript and JScript, Perl is an interpreted language and is relatively easy to learn. It has long been the language of choice for many web developers, and given its long association with the Internet, it is unsurprising that it offers excellent support for developing Internet applications. Perl’s strong text-handling capabilities also make it a good choice for scripts that extract and parse content from other servers.

Using PerlScript

If you want to write an ASP document in PerlScript, add the following directive as the first line of your document:

<%@ LANGUAGE="PerlScript" %>

All code between the <% and %> delimiters on the page will then be interpreted as PerlScript instead of the server’s default scripting language (usually VBScript).

Although you can, in theory, mix VBScript, JScript and PerlScript within the same document, doing so reduces server performance compared with using a single scripting engine. More importantly, you run the risk of your ASP document outputting content from the various scripting engines in a different order from the one you intended.

One further warning: letting your web pages fetch and process content from other web pages carries all kinds of security risks. Use this sample code with care, or restrict its use to an Intranet rather than a publicly accessible Internet site. Don’t forget, too, that extracting content from third-party web services could land you in legal difficulties unless you have explicit permission to do so!
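One simple precaution, when the URL to fetch comes from a user (for example from a web form), is to check it before handing it to LWP. The following is a minimal sketch, not part of the original article: it assumes the URI module (which libwww-perl already depends on) is installed, and the name IsAcceptableURL is hypothetical. It accepts only absolute http and https URLs and rejects obvious loopback hosts. It is not a complete defence, since a public host name can still resolve to an internal address and private ranges such as 10.x.x.x or 192.168.x.x are not checked.

```perl
use strict;
use warnings;
use URI;

# Hypothetical first-line check on a user-supplied URL before fetching it.
# Returns 1 if the URL looks acceptable, 0 otherwise.
sub IsAcceptableURL {
    my ($input) = @_;
    return 0 unless defined $input && length $input;

    my $uri    = URI->new($input);
    my $scheme = $uri->scheme;

    # Only allow absolute http:// and https:// URLs; this rejects file:,
    # javascript:, ftp: and relative URLs (which have no scheme).
    return 0 unless defined $scheme && $scheme =~ /^https?$/i;

    # Reject URLs with no host or with an obvious loopback host.
    # NOTE: private address ranges, IPv6 addresses and host names that
    # resolve to internal addresses are NOT covered by this check.
    my $host = lc($uri->host || '');
    return 0 if $host eq '' || $host eq 'localhost' || $host =~ /^127\./;

    return 1;
}
```

In an ASP page you might call it before CheckURL and write an error message instead of fetching the page whenever it returns 0.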

On to the code samples. The first is a function called CheckURL that determines whether a specified URL exists. The script uses libwww-perl (LWP), a collection of Perl modules for accessing the web programmatically.

<%
# Use the libwww Perl library
use LWP::UserAgent;
use HTTP::Request;

sub CheckURL {
    # Subroutine to check that a URL exists
    # The first argument of the function is the URL to check
    my ($url_to_check) = @_;

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent header sent with the request
    # (seen by the remote ASP server as HTTP_USER_AGENT)
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Stop reading the response content once it exceeds about 8192 bytes
    $ua->max_size(8192);

    # Initialise the HTTP request
    my $request = HTTP::Request->new(GET => $url_to_check);

    # Tell the server that HTML is the content type we want
    $request->header('Accept' => 'text/html');

    # Send the HTTP request
    my $result = $ua->request($request);

    # Check the outcome of the HTTP request
    my $url_status;
    if ($result->is_success) {
        $url_status = "$url_to_check was detected";
    } else {
        $url_status = "$url_to_check was not detected";
    }

    # Return a string with the status of the request
    return $url_status;
}
%>

This function can then be called using the following PerlScript (changing the required URL as appropriate):

<%
$Response->Write(CheckURL("http://www.brettb.com/"));
%>
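CheckURL downloads part of the page body just to find out whether the page exists. An HTTP HEAD request asks the server for the status line and headers only. The following is a minimal sketch of that idea rather than the article's own method: it swaps in HTTP::Tiny, a module bundled with Perl 5.14 and later (so it will not be present on older ActivePerl/PerlScript installations), and the name CheckURLHead is hypothetical. Because some servers refuse HEAD requests, it falls back to GET when the server answers 405 or 501.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical variant of CheckURL that sends a HEAD request, so only the
# status line and headers are transferred rather than the page body.
sub CheckURLHead {
    my ($url_to_check) = @_;

    my $http = HTTP::Tiny->new(
        agent   => 'Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)',
        timeout => 3,    # seconds
    );

    my $response = $http->head($url_to_check);

    # 405 (Method Not Allowed) and 501 (Not Implemented) indicate that the
    # server rejected the HEAD method itself, so retry with a normal GET.
    if ($response->{status} == 405 || $response->{status} == 501) {
        $response = $http->get($url_to_check);
    }

    # 'success' is true for any 2xx status (redirects are followed first);
    # connection failures are reported by HTTP::Tiny as status 599.
    if ($response->{success}) {
        return "$url_to_check was detected";
    }
    return "$url_to_check was not detected ($response->{status} $response->{reason})";
}
```

Calling CheckURLHead("http://www.brettb.com/") returns the same style of status string as CheckURL. If you would rather stay with libwww, LWP::UserAgent also provides a head method that returns an HTTP::Response object.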

Extending the script

PerlScript offers many ways of extending the basic script shown above. For example, replacing the last line of the CheckURL function (return $url_status;) with the following causes the function to return the actual HTML retrieved by the HTTP request:

return $result->content;

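This is useful if you want to parse the HTML in order to extract portions of it. Bear in mind that the max_size setting in CheckURL makes LWP stop reading once the content exceeds roughly 8192 bytes, so raise or remove that limit if you need the whole page.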
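As one illustration of parsing the retrieved HTML, the sketch below pulls out the page's title using HTML::HeadParser, which ships in the same HTML-Parser distribution as the HTML::LinkExtor module used later in this article. The function name ExtractTitle is hypothetical, and the sketch assumes the HTML is already in a string such as $result->content.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical helper: return the text of the <title> element in $html
# (for example the string returned by $result->content), or an empty
# string if the document has no title.
sub ExtractTitle {
    my ($html) = @_;

    # HTML::HeadParser only reads the <head> section; it stops parsing
    # as soon as it reaches the body of the document.
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);

    # The title is stored as a pseudo-header called 'Title'
    my $title = $parser->header('Title');
    return defined $title ? $title : '';
}
```

Inside CheckURL, the last line could then become return ExtractTitle($result->content); so that the function returns just the page title.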

Alternatively, if you are interested in the precise error message returned by the server, use the following as the last line instead:

return $result->error_as_HTML;

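If a URL is not found, the function returns a short HTML error page, which a browser displays along these lines (the status message is supplied by the remote server, so its wording varies):

An Error Occurred
404 Object Not Found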
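If you want the error details as plain text rather than as an HTML page, for example to log them or to embed them in your own markup, the HTTP::Response object returned by the request exposes the status code and reason phrase directly. The sketch below uses its status_line and is_success methods; the function name DescribeResult is hypothetical.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: describe the outcome of a request in plain text.
# $result is an HTTP::Response object, such as the one returned by
# $ua->request($request) in CheckURL.
sub DescribeResult {
    my ($url_to_check, $result) = @_;

    # status_line returns "<code> <message>", where the message is the
    # reason phrase sent by the server (e.g. "404 Object Not Found")
    my $status = $result->status_line;

    if ($result->is_success) {
        return "$url_to_check was detected ($status)";
    }
    return "$url_to_check was not detected ($status)";
}

# Example using a locally built response, so no network access is needed
my $missing = HTTP::Response->new(404, 'Object Not Found');
print DescribeResult('http://www.example.com/missing.html', $missing), "\n";
# prints "http://www.example.com/missing.html was not detected (404 Object Not Found)"
```

To use it from CheckURL, make the function's last line return DescribeResult($url_to_check, $result); instead.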

Writing a link extractor

The following code demonstrates how PerlScript can be used to extract all of the links (hyperlinks as well as image sources and other link attributes) from a document requested using HTTP. There are two functions: ExtractLinks and LinkCollector. ExtractLinks is the main function. LinkCollector is a callback: the HTML::LinkExtor parser created in ExtractLinks calls it for every tag that contains a link, and it adds the link URLs to a list. The two functions are shown below:

<%
# Use the libwww Perl library
use LWP::UserAgent;
use HTTP::Request;

# Use the link extracting HTML parser
use HTML::LinkExtor;

# The URI::URL module is used here to expand URLs by including their base reference
use URI::URL;

# Used to escape the links before writing them into the page
use HTML::Entities;

# List of the links found in the document. It is shared with
# LinkCollector, so it is declared as a package (global) variable.
our @LinksList;

sub ExtractLinks {
    # Subroutine to extract the links from a document
    # The first argument of the function is the URL to extract links from
    my ($url_to_check) = @_;

    # Empty the list in case the function has been called before
    @LinksList = ();

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent header sent with the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Stop reading the response after about 8 KB; links beyond this are missed
    $ua->max_size(8192);

    # Create an instance of the link extracting HTML parser;
    # it calls LinkCollector for every tag that contains a link
    my $parser = HTML::LinkExtor->new(\&LinkCollector);

    # Send the HTTP request, passing each chunk of content to the parser
    my $result = $ua->request(HTTP::Request->new(GET => $url_to_check),
                              sub { $parser->parse($_[0]) });
    $parser->eof;

    # If the request failed, return an error message
    return "$url_to_check was not detected" unless $result->is_success;

    # Expand relative URLs to include the base reference
    my $base = $result->base;
    @LinksList = map { url($_, $base)->abs } @LinksList;

    # Return the list of links, one per line, escaped for safe display
    my $links_html = "";
    for (@LinksList) {
        $links_html .= encode_entities("$_") . "<br>";
    }
    return $links_html;
}

# A short subroutine to collect the links into a list
sub LinkCollector {
    my ($tag, %attr) = @_;
    push(@LinksList, values %attr);
}
%>

The ExtractLinks subroutine can then be called using something like:

<%
$Response->Write(ExtractLinks("http://www.brettb.com/"));
%>
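HTML::LinkExtor can also make the links absolute by itself: if a base URL is passed as the second argument to new, each link is returned as an absolute URI object, which removes the need for a separate URI::URL step. The sketch below shows this on an HTML string that is already in memory, so it can be tried without a network connection; the function name ExtractLinksFromHTML is hypothetical. Note that ExtractLinks creates its parser before the response arrives, so to use this approach there you would first fetch the page and then parse $result->content with $result->base as the base URL.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the absolute URLs of all links found in
# $html, resolving relative links against $base_url.
sub ExtractLinksFromHTML {
    my ($html, $base_url) = @_;

    # With no callback, HTML::LinkExtor stores the links itself.
    # Passing $base_url makes it return absolute URI objects.
    my $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse($html);
    $parser->eof;

    my @urls;
    for my $link ($parser->links) {
        # Each entry is [$tag, $attr_name => $url, ...]
        my ($tag, %attr) = @$link;
        push @urls, map { "$_" } values %attr;   # stringify URI objects
    }
    return @urls;
}

my @links = ExtractLinksFromHTML(
    '<a href="/about.html">About</a> <img src="logo.gif">',
    'http://www.example.com/dir/',
);
print "$_\n" for @links;
# prints http://www.example.com/about.html then http://www.example.com/dir/logo.gif
```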

Further reading

If you want to install ActivePerl (which includes the PerlScript engine) on your web server, download it free of charge from the ActiveState website. The installation routine creates an extensive library of documentation, including reference guides to the Perl modules and functions described in this article.

There are plenty of online resources for learning Perl, with http://www.perl.com and http://www.perl.org being two of the best starting points. There is also a good introductory article about using Perl with ASP on ASPToday, as well as one on Web Techniques.

You might also like to invest in one of these featured books:

ActivePerl with ASP and ADO
Learning Perl (2nd Edition)
Effective Perl Programming: Writing Better Programs With Perl

Author details

Brett Burridge has worked as a web developer since 1997 and has developed web applications for a range of corporations, start-up businesses and educational establishments.

Brett is presently employed as an Internet developer and technical writer through his own company, Winnersh Triangle Web Solutions Limited. The company produces a number of innovative products, including a range of software documentation tools, which include the ASP Documentation Tool™, the .NET Documentation Tool for VB.NET and C#, and the SQL Server Documentation Tool. Other products include The Website Utility, which functions as a website error checker, search engine optimizer and ASP/ASP.NET search engine builder application.

As well as the ASPAlliance, Brett has written articles for Ariadne.ac.uk, ASPToday and the software documentation portal www.softwaredocumentation.info, and has contributed recipes to the ASP.NET Developer's Cookbook.

Outside web development, Brett is interested in travelling (he has published travel logs from New York, Hong Kong and Tokyo), digital photography (he keeps an online photo gallery), tropical fishkeeping and collecting contemporary works of art by artists such as Doug Hyde.

Contact Brett by emailing

Article history

"Using the HTTP protocol with PerlScript and ASP" originally published on ASPWatch.com on April 26 2000. Republished on ASPAlliance.com on 1 October 2001.

© page content copyright Brett Burridge 1998 - 2008.