Highlight Keywords

 by Remas Wojciechowski

Fairly often you want to highlight some keywords in a string, e.g. when displaying search results. This article describes how to go about it. There are several ways to accomplish that task. I will discuss three solutions here.

The most straightforward solution seems to be the intrinsic VBScript function Replace().

Solution 1: Replace()

' ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
' Function: string_SnR_Replace
'
'
Function string_SnR_Replace(ByVal strText, ByVal aKeywords, ByVal strHLStart, ByVal strHLEnd, ByVal bCaseSensitive)
    Dim intLB
    Dim intUB
    Dim i
    Dim strTemp
    Dim strHighlighted
    Dim intCompareMode
    intLB = LBound(aKeywords)
    intUB = UBound(aKeywords)
    If bCaseSensitive Then
        intCompareMode = vbBinaryCompare
    Else
        intCompareMode = vbTextCompare
    End If
    strTemp = strText
    For i = intLB To intUB
        strHighlighted = strHLStart & aKeywords(i) & strHLEnd
        strTemp = Replace(strTemp, aKeywords(i), strHighlighted, 1, -1, intCompareMode)
    Next
    string_SnR_Replace = strTemp
End Function

Example: Replace Solution

string_SnR_Replace("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False) yields:

Who is <b>cat</b> Stevens? Does he like <b>cat</b>s? <b>cat</b> could <b>be</b> an abbreviation!

The above solution, however, has some serious drawbacks:

  • For case-insensitive searches it ignores the case of both the string to be searched (good) and the keyword (bad): every match is replaced with the keyword exactly as it was passed in. Say we have the following string: "Who is Cat Stevens?". If the keyword is "cat", the highlighted text in the result reads "Who is cat Stevens?". Since the search was case-insensitive, it is correct that a replacement occurred. It is wrong, though, that the replacement changed the original case.
  • It's impossible to perform an exact (whole-word) keyword search, e.g. there is no way to stop the function from replacing "eve" in "Stevens". Adding a space to the left and to the right of the keyword does not help: the function would then miss a keyword that is followed by a carriage-return character or that is the first or the last word in the string to be searched.
  • In certain scenarios highlighting multiple keywords in one string can go wrong:
    • Assume that keyword 1 is a substring of keyword 2 and that each of them, treated separately, produces a match. If you replace both in the same string and keyword 1 is replaced first, the highlight strings inserted around it break up keyword 2, which is then no longer found. This problem could be alleviated by sorting the keywords array by keyword length in descending order.
    • If a keyword also occurs in one of the highlight strings and a replacement has already occurred, the inserted strings are treated as part of the original text and are replaced as well. E.g. if the string <span class="highlight"> is inserted in front of each match, then a later keyword "class" would match the HTML attribute name class, damaging the HTML document.
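
The sorting step suggested in the list above could look roughly like the sketch below. The helper name array_SortByLengthDesc is made up for this illustration and is not part of the original code. VBScript has no built-in sort, so a simple insertion sort is assumed to be good enough for a handful of keywords.

```vbscript
' Hypothetical helper (an assumption, not part of the original listing):
' returns a copy of aKeywords ordered by keyword length, longest first.
Function array_SortByLengthDesc(ByVal aKeywords)
    Dim aSorted
    Dim strCurrent
    Dim i
    Dim j

    aSorted = aKeywords   ' assigning an array to a variable copies it
    For i = LBound(aSorted) + 1 To UBound(aSorted)
        strCurrent = aSorted(i)
        j = i - 1
        ' Insertion sort: shift shorter keywords one slot to the right.
        ' The length test sits inside the loop because VBScript's And
        ' does not short-circuit.
        Do While j >= LBound(aSorted)
            If Len(aSorted(j)) >= Len(strCurrent) Then Exit Do
            aSorted(j + 1) = aSorted(j)
            j = j - 1
        Loop
        aSorted(j + 1) = strCurrent
    Next
    array_SortByLengthDesc = aSorted
End Function
```

For Array("cat", "cats", "be") the helper returns the keywords in the order "cats", "cat", "be". Sorting only changes the order of the replacements: a shorter keyword can still match inside text that was already highlighted for a longer one, so nested highlight markup remains possible.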

We don't want any of these side effects, do we? It looks like the text has to be analysed more thoroughly. In Solution 2 we will try to perform that analysis by means of functions like InStr(), ...

Solution 2: InStr() & Co.

' ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
' Function: string_SnR_InStr
'
'
Function string_SnR_InStr(ByVal strSource, ByVal aKeywords, ByVal strWrap_Left, ByVal strWrap_Right, ByVal bCaseSensitive)
   Dim strNew
   Dim strTemp
   Dim strKwd
   Dim intPos_New
   Dim intPos_Old
   Dim intKwd_Len
   Dim intSrc_Len
   Dim intSearchMode
   Dim intLB
   Dim intUB
   Dim i
  
   strTemp = strSource
   If bCaseSensitive Then
      intSearchMode = vbBinaryCompare
   Else
      intSearchMode = vbTextCompare
   End If
   intLB = LBound(aKeywords)
   intUB = UBound(aKeywords)
  
   For i = intLB To intUB
      strKwd = aKeywords(i)
      strNew = ""
      intPos_Old = 1
      intPos_New = InStr(intPos_Old, strTemp, strKwd, intSearchMode)
      intKwd_Len = Len(strKwd)
      intSrc_Len = Len(strTemp)
      Do While intPos_New > 0
         strNew = strNew & Mid(strTemp, intPos_Old, intPos_New - intPos_Old)
         strNew = strNew & strWrap_Left
         strNew = strNew & Mid(strTemp, intPos_New, intKwd_Len)
         strNew = strNew & strWrap_Right
         intPos_Old = intPos_New + intKwd_Len
         intPos_New = InStr(intPos_Old, strTemp, strKwd, intSearchMode)
      Loop
      ' Append everything after the last match (the whole string if nothing matched)
      strNew = strNew & Mid(strTemp, intPos_Old)
      strTemp = strNew
   Next
   string_SnR_InStr = strTemp
End Function

Example: InStr Solution

string_SnR_InStr("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False) yields:

Who is <b>Cat</b> Stevens? Does he like <b>cat</b>s? <b>CAT</b> could <b>be</b> an abbreviation!

As you can see, the case of the original string is preserved, so the major drawback of the Replace() approach is gone. This code does not look like efficiency incarnate, though: a lot of loops and string concatenation are involved. Anyway, before we make any judgements about performance, let's have a look at the third solution.

Solution 3: RegExp

This solution employs regular expressions to perform the replacement. Note that you need VBScript Engine version 5.0 or later.

' ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
' Function: string_SnR_RegExp
'
'
Function string_SnR_RegExp(ByVal strText, ByVal aKeywords, ByVal strHighlightStart, ByVal strHighlightEnd, ByVal bCaseSensitive, ByVal bExactSearch)
  Dim objRegExp
  Dim strPattern
  Dim strTemp
  Dim strReplacePattern
  Set objRegExp = New RegExp
  objRegExp.IgnoreCase = Not bCaseSensitive
  If bExactSearch Then
    strPattern = "(\b" & Join(aKeywords, "\b|\b") & "\b)"
  Else
    strPattern = "(" & Join(aKeywords, "|") & ")"
  End If
  strReplacePattern = strHighlightStart & "$1" & strHighlightEnd

  objRegExp.Global = True
  objRegExp.Pattern = strPattern
  strTemp = objRegExp.Replace(strText, strReplacePattern)
  Set objRegExp = Nothing
  string_SnR_RegExp = strTemp
End Function

Example: RegExp Solution

Exact, case-insensitive search

string_SnR_RegExp("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False, True) yields:

Who is <b>Cat</b> Stevens? Does he like cats? <b>CAT</b> could <b>be</b> an abbreviation!

Case-insensitive search, match all occurrences

string_SnR_RegExp("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False, False) yields:

Who is <b>Cat</b> Stevens? Does he like <b>cat</b>s? <b>CAT</b> could <b>be</b> an abbreviation!
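
One thing to keep in mind: because the keywords are joined straight into the pattern, characters such as . + ( or ? inside a keyword are interpreted as regular expression syntax rather than as literal text. A minimal sketch of an escaping helper follows. The name regexp_EscapeKeyword is hypothetical, and the list of escaped characters is an assumption about which characters matter in practice.

```vbscript
' Hypothetical helper (an assumption, not part of the original listing):
' puts a backslash in front of every regular expression metacharacter
' so that the keyword is matched literally.
Function regexp_EscapeKeyword(ByVal strKeyword)
    ' Assumed set of characters that need escaping outside a character class
    Const SPECIAL_CHARS = "\^$.|?*+()[]{}"
    Dim strEscaped
    Dim strChar
    Dim i

    strEscaped = ""
    For i = 1 To Len(strKeyword)
        strChar = Mid(strKeyword, i, 1)
        If InStr(1, SPECIAL_CHARS, strChar, vbBinaryCompare) > 0 Then
            strEscaped = strEscaped & "\"
        End If
        strEscaped = strEscaped & strChar
    Next
    regexp_EscapeKeyword = strEscaped
End Function
```

Each keyword would have to be passed through the helper before the Join() call, e.g. regexp_EscapeKeyword("C++") returns "C\+\+".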

Efficiency, or Which Solution Should You Use?

In this section we will compare the efficiency of Solution 2 and Solution 3. Solution 1 is disregarded because of its drawbacks.

On the one hand, one would intuitively expect the third approach to be more efficient than the second. After all, that's what regular expressions are for, and the second approach uses many loops and a lot of concatenation. On the other hand, a RegExp object needs to be created and disposed of every time the function employing regular expressions is called. Let's see the results of the tests:

Scenario 1:
few keywords (1), all matching, short text to be searched (68 characters)

# Iterations    InStr    RegExp
10                  0        20
100                20       123
1000              183      1245
10000            1826     12371
Results in milliseconds

Winner: InStr Solution

Scenario 2:
many keywords (9), all matching, short text to be searched (68 characters)

# Iterations    InStr    RegExp
10                  7        13
100                87       150
1000              804      1525
10000            8025     14892
Results in milliseconds

Winner: InStr Solution

Scenario 3:
many keywords (9), all matching, long text to be searched (8704 characters)

# Iterations    InStr    RegExp
10              11654       304
100                 -      2894
1000                -         -
10000               -         -
Results in milliseconds

Winner: RegExp Solution

Scenario 4:
many keywords (9), none matching, long text to be searched (8704 characters)

# Iterations    InStr    RegExp
10                110       434
100              1075      4279
1000            10705     42769
10000               -         -
Results in milliseconds

Winner: InStr Solution

Scenario 5:
few keywords (1), all matching, long text to be searched (8704 characters)

# Iterations    InStr    RegExp
10               2236        37
100             22245       374
1000                -      3912
10000               -         -
Results in milliseconds

Winner: RegExp Solution
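
For readers who want to repeat such measurements, a timing loop along the following lines is one way to do it. This is only a sketch under assumptions, not the code that produced the figures above: the iteration count and the Replace() call standing in for the function under test are placeholders.

```vbscript
' Sketch of a timing loop; NOT the harness used for the figures in this article.
Const ITERATIONS = 1000   ' assumed iteration count

Dim sngStart
Dim sngElapsed
Dim lngElapsedMs
Dim strResult
Dim i

sngStart = Timer          ' seconds elapsed since midnight
For i = 1 To ITERATIONS
    ' Placeholder for the function under test
    strResult = Replace("Who is Cat Stevens?", "cat", "<b>cat</b>", 1, -1, vbTextCompare)
Next
sngElapsed = Timer - sngStart
' Timer restarts at midnight, so correct a run that crossed it
If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400
lngElapsedMs = CLng(sngElapsed * 1000)
```

The resolution of Timer is limited, so very short runs can report 0 ms.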

Conclusions:

  • The InStr Solution is superior when the string to be searched is short and there are few matching keywords.
  • The more matching keywords there are, the smaller the advantage of the InStr Solution over the RegExp Solution.
  • The RegExp Solution is superior when the string to be searched is long and there are many matching keywords. In that setup the InStr Solution is very memory-consuming and slow.