Easy .NET Screen Scraping
by Steve Smith, ASPAlliance

I named this article's file 'netscrape', but don't confuse it with the little-used browser that has a similar name. This article is about how to use .NET's built-in class library to "screen scrape": that is, how to have .NET send a web request and return the resulting page to you as a string. Under classic ASP this required third-party components, such as the popular AspHTTP component. In .NET it is remarkably easy. Everything you need for screen scraping in .NET is in the System.Net namespace. In particular, you will want to become familiar with the WebRequest and WebResponse classes, which send a request over HTTP and return the response, respectively.

Remembering all of this whenever I want to grab the contents of a page is a real strain on the brain, so I've created a super simple function that removes any need for thought on my part. Feel free to use it in your applications; donations and/or credit appreciated. Maybe I could pull an Amazon and patent it, like one-click ordering, because it's not immediately obvious... nah, it's pretty darned obvious, as you can see. Almost trivial. But it might save you half an hour of figuring out the new .NET object model, and it certainly saves me the time of remembering it, so it was worth writing an article about. The key point to notice in the code below is the readHtmlPage function; the rest is just the code needed for the example. Also note that you need to import the System.IO namespace (Imports System.IO) to use the StreamReader class.
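A minimal VB.NET sketch of a readHtmlPage-style helper, built only from the pieces described above: WebRequest to send the request, WebResponse to receive it, and a StreamReader that decodes the body as ASCII. It is an illustration under those assumptions, not necessarily the author's listing. The module name ScrapeSketch and the PascalCase ReadHtmlPage are hypothetical. WebRequest is the classic .NET Framework API the article uses; .NET 6 and later mark it obsolete in favor of HttpClient, but it still compiles.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module ScrapeSketch

    ' Hypothetical helper modeled on the readHtmlPage function the article
    ' describes: request the URL and return the whole response body as a string.
    Function ReadHtmlPage(url As String) As String
        ' WebRequest.Create chooses the request type from the URI scheme
        ' (HttpWebRequest for http/https, FileWebRequest for file:, ...).
        Dim objRequest As WebRequest = WebRequest.Create(url)

        ' GetResponse sends the request and blocks until the response arrives.
        ' Using guarantees the response (and its connection) is released.
        Using objResponse As WebResponse = objRequest.GetResponse()
            ' Decode as ASCII, matching the sample encoding the article mentions;
            ' non-ASCII characters will not survive this decoding.
            Using sr As New StreamReader(objResponse.GetResponseStream(), Encoding.ASCII)
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

Calling it is then a one-liner, for example `Dim html As String = ReadHtmlPage("http://www.example.com/")`. The Using blocks close the reader and response even if reading throws, which avoids leaking connections when scraping many pages.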
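A few people have written to tell me that this fails with international character sets. The reason is that my sample code uses ASCII text encoding. If you need to scrape pages that contain non-ASCII characters, pass a different encoding to the StreamReader constructor. A forums post suggests UTF-7, something like this:

Dim sr As New StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.UTF7)

Thanks to Nicholas Wanderer for sending in this fix. Note, however, that most pages are encoded in UTF-8 (System.Text.Encoding.UTF8) or declare their character set in the Content-Type header, so matching the page's actual encoding is usually more reliable than UTF-7, and Encoding.UTF7 is marked obsolete in .NET 5 and later.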
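Rather than hard-coding one encoding, a helper can ask the server which character set it used. This sketch is hypothetical (ReadHtmlPageWithCharset and CharsetSketch are illustrative names, not part of the article). It reads HttpWebResponse.CharacterSet and falls back to UTF-8 when the response is not an HTTP response, the property is empty, or the name is not an encoding .NET recognizes. It does not inspect `<meta charset>` tags inside the HTML, so pages that declare their encoding only there may still need the encoding passed explicitly.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module CharsetSketch

    ' Hypothetical variant of the helper: decode the body with the charset
    ' the server reports, falling back to UTF-8 when none is usable.
    Function ReadHtmlPageWithCharset(url As String) As String
        Dim objRequest As WebRequest = WebRequest.Create(url)
        Using objResponse As WebResponse = objRequest.GetResponse()
            Dim enc As Encoding = Encoding.UTF8

            ' Only HTTP responses carry a Content-Type charset.
            Dim httpResponse As HttpWebResponse = TryCast(objResponse, HttpWebResponse)
            If httpResponse IsNot Nothing AndAlso Not String.IsNullOrEmpty(httpResponse.CharacterSet) Then
                Try
                    enc = Encoding.GetEncoding(httpResponse.CharacterSet)
                Catch ex As ArgumentException
                    ' Unrecognized charset name: keep the UTF-8 fallback.
                End Try
            End If

            Using sr As New StreamReader(objResponse.GetResponseStream(), enc)
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

The fallback is a design choice: UTF-8 is a superset of ASCII, so pages that really are ASCII decode identically, while UTF-8 pages keep their accented and non-Latin characters.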