Capture Links With Regular Expressions

 by Remas Wojciechowski

July 5th, 2001

The function CaptureLinks presented here can be used to capture all hyperlinks from a string.

Description

Use this function to retrieve an array of the hyperlinks found in a given string. For example, if you pass:

this is a test <a href="http://www.string.com">string</a> with a local, unquoted <a href=c:\localdirectory\file.txt>link</a>
to the function, it will return an array of two elements:
http://www.string.com
c:\localdirectory\file.txt
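
As a rough illustration of the example above, the call might look like the sketch below. It assumes that CaptureLinks is defined on the same ASP page (or brought in through an include file) and that the code runs inside a <% ... %> script block. The variable names strHtml, aFound and k are chosen here for illustration only.

```vbscript
' Assumes CaptureLinks (listed under Source Code) is defined on the same page
' or pulled in through an include file.
Dim strHtml, aFound, k

' The sample string from the description; inner quotes are doubled in VBScript.
strHtml = "this is a test <a href=""http://www.string.com"">string</a> " & _
          "with a local, unquoted <a href=c:\localdirectory\file.txt>link</a>"

aFound = CaptureLinks(strHtml)

' Both links are captured here, so aFound is an array with two elements.
For k = LBound(aFound) To UBound(aFound)
  Response.Write Server.HTMLEncode(aFound(k)) & "<br>"
Next
```

Server.HTMLEncode is used so that a captured link containing characters such as & is displayed literally in the browser.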

Note that the function accepts both quoted and unquoted links. However, unquoted links cannot contain whitespace: only the portion up to the first whitespace character is captured. If nothing is captured, the function returns an empty string instead of an array.
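
Because the return value is either an array or an empty string, callers may want to check it with IsArray before looping. The following is a minimal sketch of that check, not part of the original function. WriteLinks is a hypothetical helper introduced here for illustration, and CaptureLinks is again assumed to be available on the page.

```vbscript
' Hypothetical helper (not part of the original article): writes each captured
' link on its own line, or a short notice when CaptureLinks returns "".
Sub WriteLinks(ByVal strSource)
  Dim vResult, k
  vResult = CaptureLinks(strSource)
  If IsArray(vResult) Then
    For k = 0 To UBound(vResult)
      Response.Write Server.HTMLEncode(vResult(k)) & "<br>"
    Next
  Else
    Response.Write "No links found.<br>"
  End If
End Sub

' Unquoted link containing a space: only "c:\my" is captured.
WriteLinks "<a href=c:\my documents\file.txt>link</a>"

' No anchor tags at all: CaptureLinks returns "" and the notice is written.
WriteLinks "plain text without any links"
```

Without the IsArray check, a For loop using UBound on the empty-string result would raise a type mismatch error.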

If you encounter a link the function doesn't capture, please let me know and I'll try to improve it!

Parameters

strSource
The string to be searched for hyperlinks.

Source Code

<%
Function CaptureLinks(ByVal strSource)

  Dim strPattern, strPattern_Quoted, strPattern_Unquoted
  Dim regex
  Dim colMatches
  Dim itemMatch, itemSubMatch
  Dim aLinks()
  Dim i, j
  
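  ' Quoted links: <a href="...">. The captured URL may contain spaces.
  strPattern_Quoted = "<\s*a\s+href\s*=\s*""\s*([\w.;,:/\\?=&~/@+$%#\-_!*'() ]+)\s*""(?:\s|.)*?>"
  ' Unquoted links: <a href=...>. Capture stops at the first character outside
  ' the allowed set, such as whitespace or ">".
  strPattern_Unquoted = "<\s*a\s+href\s*=\s*([\w.;,:/\\?=&~/@+$%#\-!*'()]+)(?:>|(?:\s|.)*?>)"
  ' Try the quoted form first, then the unquoted form.
  strPattern = "(?:" & strPattern_Quoted & "|" & strPattern_Unquoted & ")"
  
  Set regex = New RegExp
  regex.Pattern = strPattern
  regex.IgnoreCase = True   ' match <A HREF=...> as well as <a href=...>
  regex.Multiline = True
  regex.Global = True       ' find every link, not only the first
  Set colMatches = regex.Execute(strSource)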
  If colMatches.Count > 0 Then
    i = 0
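    For Each itemMatch In colMatches
      ' Each match has two submatches (quoted, unquoted); only the
      ' alternative that actually matched is non-empty.
      For Each itemSubMatch In itemMatch.SubMatches
        If itemSubMatch <> "" Then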
          ReDim Preserve aLinks(i)
          aLinks(i) = itemSubMatch
          i = i + 1
        End If
      Next
    Next
    CaptureLinks = aLinks
  Else
    CaptureLinks = ""
  End If
  Set regex = Nothing
End Function

%>