... or Creative use of Server Variables
Have you ever wondered how many of the clicks on your hit counter are just
spiders munching everything in sight? Want to know how to pick out the "real people"? I ran into this
recently when I built an all-graphics site that wouldn't register with the search engines because it had no
body text. As a workaround, I built a secondary page that displays relevant keywords to robots
and redirects human visitors to the real site, with the help of a fairly simple function.
The function is called isHuman, and it returns a boolean value: true for human, false
for robot. Pretty simple, if you think about it... but how in the world do you pull off the mask? If you
haven't met before, I'd like to introduce you to the
Server Variables collection.
Many of you already know about these variables, and have used them for everything from
security to personalization. Take notice of the one called HTTP_USER_AGENT; it contains a string which identifies the browser name, version,
and operating system. You may also notice, if you're using a Microsoft browser, that the word "Mozilla" (a Netscape
trademark word) heads up the string. This dates back to the early IE days when a new-to-the-web MS was trying
to make its browser accepted, and the best way to tell the server was to say "It's just like Netscape"... but
that's for another story.
Your user agent is CCBot/1.0 (+http://www.commoncrawl.org/bot.html)
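A line like the one above can be produced by reading the variable straight out of the collection. This is only a minimal sketch of one way to do it; the variable name strAgent is made up for the illustration.

```vbscript
<%
' Read the raw user agent string sent by the client.
Dim strAgent
strAgent = Request.ServerVariables("HTTP_USER_AGENT")

' HTML-encode it before echoing, since the client controls its contents.
Response.Write "Your user agent is " & Server.HTMLEncode(strAgent)
%>
```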
So we can see the browser type. How can we use that? Well, as it turns out,
certain keywords are typical of "human" browsers, and others are typical of robots. The following
function, called isHuman, returns a boolean (true/false) value based on three tests for these keywords.
Function isHuman()
Dim strBrowser, strAccepted, strRejected, strCrawlers
Dim arrAccepted, arrRejected, arrCrawlers
Dim strHumanCookie
Dim intCount
Dim booIsHuman
Dim strRefresh
First we set booIsHuman to determine the default value of the function (True for unknown browsers).
If you wish, you can change this initialization or eliminate it and trap a type mismatch error if the
test is inconclusive (you could also make it a string and return whatever you want, but this way seems cleaner to me).
booIsHuman = True
If the function has been run already during this visit AND if the browser supports cookies, we can save time
and processor cycles by finding the isHuman cookie and bypassing the rest of the function.
strHumanCookie = Request.Cookies("isHuman")
If strHumanCookie = "Y" Then
booIsHuman = True
ElseIf strHumanCookie = "N" Then
booIsHuman = False
Else
Next we retrieve the user agent string. Creating the ServerVariables
collection has considerable overhead, so we'll do it only if the user isn't already labeled.
After that, we'll build the criteria strings for our three tests. FYI, the criteria that ship with this
function are derived from Juan Llibre's latest version of browscap.ini, downloaded from
asp.net.do.
strBrowser = UCase(Request.ServerVariables("HTTP_USER_AGENT"))
' This is a list of keywords that will be found in "human" browsers.
strAccepted = "Mozilla|PRODIGY|NaviPress|Lynx|libwww|amaya|iCab|Cyberdog|Mosaic"
strAccepted = strAccepted & "|O'Reilly|HotJava|Java1|JDK|Nokia|Amiga|IBrowse"
strAccepted = UCase(strAccepted)
' This is a list of keywords that will be found in robot browsers.
strRejected = "Powermarks|BorderManager|NetMind|EZResult|WebWhacker"
strRejected = strRejected & "|Robot|Crawler"
strRejected = UCase(strRejected)
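' This is a list of complete user agent strings for known robot browsers.
' The square brackets that surround section names in browscap.ini are left out,
' because Test 1 compares the whole string and a real user agent has no brackets.
strCrawlers = "Mozilla 4.0|Mozilla/4.x (Win95)"
strCrawlers = strCrawlers & "|Mozilla/5.0 (compatible; MSIE 5.0)"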
strCrawlers = UCase(strCrawlers)
arrAccepted = Split(strAccepted,"|")
arrRejected = Split(strRejected,"|")
arrCrawlers = Split(strCrawlers,"|")
Test 1: check to see if the entire string matches that of a known spider/bot;
this is only for the few that do not contain a specialized bot keyword.
This test will automatically disqualify the agents in arrCrawlers.
For intCount = 0 To UBound(arrCrawlers)
If strBrowser = arrCrawlers(intCount) Then
booIsHuman = False
Exit For
End If
Next
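Test 2: check for the presence of an "accepted browser" keyword;
this will let through a few robot
agents, which will be picked up by the final test. As written, though, booIsHuman is already True
whenever this loop runs, so the test cannot change the result; it matters only if you adapt the
function to treat unknown browsers as robots.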
If booIsHuman = True Then
For intCount = 0 To UBound(arrAccepted)
If InStr(strBrowser,arrAccepted(intCount)) > 0 Then
booIsHuman = True
Exit For
End If
Next
End If
Test 3: check for the presence of a "rejected bot" keyword;
any bots that make it through the Accepted
test will be screened out here.
If booIsHuman = True Then
For intCount = 0 To UBound(arrRejected)
If InStr(strBrowser,arrRejected(intCount)) > 0 Then
booIsHuman = False
Exit For
End If
Next
End If
We're almost done! Set the cookie to label the user for next time. Remember to use Response.Buffer = True,
or your application will yield the "The HTTP headers are already written to the client browser" error.
If booIsHuman = True Then
Response.Cookies("isHuman") = "Y"
ElseIf booIsHuman = False Then
Response.Cookies("isHuman") = "N"
Else
' Not reached while booIsHuman holds True or False; a hook for an "inconclusive" result.
booIsHuman = "Unknown Browser"
End If
End If
Pass the value out the door, and the suspense is over:
you'll be relieved to know that you've been declared human.
isHuman = booIsHuman
End Function
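If you want to experiment with keyword lists before putting the function on a live page, the matching logic can be tried from the command line. The sketch below is not the function above: ClassifyAgent is a hypothetical helper that takes the user agent as a parameter, so it needs no Request or Response object and can be saved as a .vbs file and run with cscript. It uses shortened lists, assumes the crawler strings are stored without brackets, and leaves out Test 2 because, with a default of True, that test cannot change the result.

```vbscript
Option Explicit

' Hypothetical helper: mirrors Test 1 and Test 3 of isHuman, but the
' user agent is passed in instead of being read from the Request object.
Function ClassifyAgent(strAgent)
    Dim arrCrawlers, arrRejected, strUpper, intCount

    strUpper = UCase(strAgent)
    ' Shortened sample lists, for illustration only.
    arrCrawlers = Split(UCase("Mozilla 4.0|Mozilla/4.x (Win95)"), "|")
    arrRejected = Split(UCase("WebWhacker|Robot|Crawler"), "|")

    ' Default: unknown agents count as human.
    ClassifyAgent = True

    ' Test 1: the whole string equals a known robot string.
    For intCount = 0 To UBound(arrCrawlers)
        If strUpper = arrCrawlers(intCount) Then
            ClassifyAgent = False
            Exit Function
        End If
    Next

    ' Test 3: a rejected keyword appears anywhere in the string.
    For intCount = 0 To UBound(arrRejected)
        If InStr(strUpper, arrRejected(intCount)) > 0 Then
            ClassifyAgent = False
            Exit Function
        End If
    Next
End Function

' An ordinary browser string: no test fires, so it stays "human".
WScript.Echo ClassifyAgent("Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)")
' A made-up agent containing the keyword "Crawler": rejected by Test 3.
WScript.Echo ClassifyAgent("ExampleCrawler/1.0")
```

Passing the string in as a parameter also makes it easy to loop over a list of agents pulled from your server logs and see how each one would be labeled.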
Download the entire documented function for
your own use; add it to your library or include it solo.
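As a rough sketch of the "include it solo" route, a keyword page for the all-graphics site described at the start might look like the following. The file names isHuman.asp and main.asp are placeholders chosen for this example, not names from the download.

```vbscript
<%
' Turn buffering on before any output is sent, so the isHuman cookie
' header can still be written.
Response.Buffer = True
%>
<!-- #include file="isHuman.asp" -->
<%
' Send people to the real site; robots fall through to the keywords below.
If isHuman() Then
    Response.Redirect "main.asp"
End If
%>
<html>
<body>
<p>Keywords describing the site go here for the robots to index.</p>
</body>
</html>
```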
peterbrunone@aspalliance.com