by Remas Wojciechowski
Fairly often you want to highlight some keywords in a string, e.g. when displaying search results. This article describes how to go about it.
There are several ways to accomplish that task. I will discuss three solutions here.
The most straightforward solution seems to be the intrinsic VBScript function Replace().
Solution 1: Replace()
    ' ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
    ' Function: string_SnR_Replace
    '
    ' Wraps every occurrence of each keyword in strHLStart / strHLEnd
    ' by means of the intrinsic Replace() function.
    '
    Function string_SnR_Replace(ByVal strText, ByVal aKeywords, ByVal strHLStart, ByVal strHLEnd, ByVal bCaseSensitive)
        Dim intLB
        Dim intUB
        Dim i
        Dim strTemp
        Dim strHighlighted
        Dim intCompareMode

        intLB = LBound(aKeywords)
        intUB = UBound(aKeywords)

        If bCaseSensitive Then
            intCompareMode = vbBinaryCompare
        Else
            intCompareMode = vbTextCompare
        End If

        strTemp = strText

        For i = intLB To intUB
            ' The replacement text is built from the keyword as passed in,
            ' not from the text that was actually matched.
            strHighlighted = strHLStart & aKeywords(i) & strHLEnd
            strTemp = Replace(strTemp, aKeywords(i), strHighlighted, 1, -1, intCompareMode)
        Next

        string_SnR_Replace = strTemp
    End Function
Example: Replace Solution
    string_SnR_Replace("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False)
yields:
    Who is <b>cat</b> Stevens? Does he like <b>cat</b>s? <b>cat</b> could <b>be</b> an abbreviation!
The above solution, however, has some serious drawbacks:
- For case-insensitive searches it ignores the case of both the string to be searched (good) and the keyword (bad). Say we have the following string: "Who is Cat Stevens?". If the keyword is "cat", the function will return "Who is cat Stevens?" (highlight strings omitted). Since the search was case-insensitive, it is correct that a replacement occurred. It is wrong, though, that the replacement changed the original case.
- It's impossible to perform an exact keyword search, e.g. there is no way to stop the function from replacing "eve" in "Stevens". Padding the keyword with a space on either side does not help: the function would then miss a keyword that is followed by a carriage-return character, or that is the first or the last word in the string to be searched.
- In certain scenarios highlighting multiple keywords in one string to be searched can go wrong:
  - Assume that keyword 1 is a substring of keyword 2 and that, treated separately, both produce a match. If you replace them in the same string and keyword 1 is replaced first, the highlight strings inserted around it break up keyword 2, which is then no longer found. This problem can be alleviated by sorting the keywords array by keyword length in descending order.
  - If a keyword occurs in one of the highlight strings and a replacement has already occurred, the inserted highlight strings are treated as part of the original string and are replaced as well. E.g. if the string <span class="highlight"> is inserted in front of each matching keyword, then for the keyword "class" the HTML attribute class would be replaced, damaging the HTML document.
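The sorting workaround mentioned in the list above could look roughly like the following. This is only a sketch: the helper name SortKeywordsByLengthDesc is made up for illustration and is not part of the functions in this article, and WScript.Echo assumes the script runs under Windows Script Host (in ASP you would use Response.Write instead). Note that sorting only ensures the longer keyword is found; the shorter keyword still matches inside the already highlighted longer one, so the highlight strings may end up nested.

```vbscript
Option Explicit

' Hypothetical helper (not part of the original functions):
' returns a copy of aKeywords sorted by keyword length, longest first.
Function SortKeywordsByLengthDesc(ByVal aKeywords)
    Dim aSorted, i, j, strSwap

    aSorted = aKeywords   ' VBScript copies arrays on assignment

    ' Simple selection-style sort; keyword lists are usually short.
    For i = LBound(aSorted) To UBound(aSorted) - 1
        For j = i + 1 To UBound(aSorted)
            If Len(aSorted(j)) > Len(aSorted(i)) Then
                strSwap = aSorted(i)
                aSorted(i) = aSorted(j)
                aSorted(j) = strSwap
            End If
        Next
    Next

    SortKeywordsByLengthDesc = aSorted
End Function

WScript.Echo Join(SortKeywordsByLengthDesc(Array("cat", "cats", "be")), ", ")   ' prints: cats, cat, be
```

The sorted array can then be passed to string_SnR_Replace in place of the original keywords array.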
We don't want that, do we? It looks like the text has to be analysed more thoroughly. In solution 2 we will try to perform that analysis by means of functions like InStr(), ...
Solution 2: InStr() & Co.
    ' ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
    ' Function: string_SnR_InStr
    '
    ' Locates each keyword with InStr() and rebuilds the string piece by
    ' piece, so the matched text keeps its original case.
    '
    Function string_SnR_InStr(ByVal strSource, ByVal aKeywords, ByVal strWrap_Left, ByVal strWrap_Right, ByVal bCaseSensitive)
        Dim strNew
        Dim strTemp
        Dim strKwd
        Dim intPos_New
        Dim intPos_Old
        Dim intKwd_Len
        Dim intSrc_Len
        Dim intSearchMode
        Dim intLB
        Dim intUB
        Dim i

        strTemp = strSource

        If bCaseSensitive Then
            intSearchMode = vbBinaryCompare
        Else
            intSearchMode = vbTextCompare
        End If

        intLB = LBound(aKeywords)
        intUB = UBound(aKeywords)

        For i = intLB To intUB
            strKwd = aKeywords(i)
            strNew = ""
            intPos_Old = 1
            intPos_New = InStr(intPos_Old, strTemp, strKwd, intSearchMode)
            intKwd_Len = Len(strKwd)
            intSrc_Len = Len(strTemp)

            Do While intPos_New > 0
                ' Copy the text before the match, then the match itself
                ' (taken from the source text) wrapped in the highlight strings.
                strNew = strNew & Mid(strTemp, intPos_Old, intPos_New - intPos_Old)
                strNew = strNew & strWrap_Left
                strNew = strNew & Mid(strTemp, intPos_New, intKwd_Len)
                strNew = strNew & strWrap_Right
                intPos_Old = intPos_New + intKwd_Len
                intPos_New = InStr(intPos_Old, strTemp, strKwd, intSearchMode)
            Loop

            ' Append everything after the last match
            ' (the whole text if nothing matched).
            If intPos_Old <= intSrc_Len Then strNew = strNew & Mid(strTemp, intPos_Old)

            strTemp = strNew
        Next

        string_SnR_InStr = strTemp
    End Function

Example: InStr Solution
    string_SnR_InStr("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False)
yields:
    Who is <b>Cat</b> Stevens? Does he like <b>cat</b>s? <b>CAT</b> could <b>be</b> an abbreviation!
Apparently the case of the original string is maintained, so the major drawback of the Replace() approach is eliminated. This code does not look like efficiency incarnate, though: a lot of loops and concatenation are involved. Anyway, before we make any judgements about performance, let's have a look at the third solution.
Solution 3: RegExp
This solution employs regular expressions to perform the replacement. Note that you need VBScript engine version 5.0 or later.
    ' ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
    ' Function: string_SnR_RegExp
    '
    '
    Function string_SnR_RegExp(ByVal strText, ByVal aKeywords, ByVal strHighlightStart, ByVal strHighlightEnd, ByVal bCaseSensitive, ByVal bExactSearch)
        Dim objRegExp
        Dim strPattern
        Dim strTemp
        Dim strReplacePattern

        Set objRegExp = New RegExp
        objRegExp.IgnoreCase = Not bCaseSensitive

        If bExactSearch Then
            strPattern = "(\b" & Join(aKeywords, "\b|\b") & "\b)"
        Else
            strPattern = "(" & Join(aKeywords, "|") & ")"
        End If

        strReplacePattern = strHighlightStart & "$1" & strHighlightEnd

        objRegExp.Global = True
        objRegExp.Pattern = strPattern
        strTemp = objRegExp.Replace(strText, strReplacePattern)

        Set objRegExp = Nothing

        string_SnR_RegExp = strTemp
    End Function
Example: RegExp Solution
Exact, case-insensitive search
    string_SnR_RegExp("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False, True)
yields:
    Who is <b>Cat</b> Stevens? Does he like cats? <b>CAT</b> could <b>be</b> an abbreviation!
Case-insensitive search, match all occurrences
    string_SnR_RegExp("Who is Cat Stevens? Does he like cats? CAT could be an abbreviation!", Array("cat", "be"), "<b>", "</b>", False, False)
yields:
    Who is <b>Cat</b> Stevens? Does he like <b>cat</b>s? <b>CAT</b> could <b>be</b> an abbreviation!
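One thing to keep in mind with the RegExp Solution: the pattern is built by joining the keywords verbatim, so a keyword containing a regular expression metacharacter (such as "." or "+") is interpreted as pattern syntax rather than as literal text. A minimal sketch of a workaround follows. The helper name EscapeRegExp is hypothetical and not part of the function above, and WScript.Echo assumes Windows Script Host.

```vbscript
Option Explicit

' Hypothetical helper (not part of the original function): prefixes every
' regular expression metacharacter in strKeyword with a backslash.
Function EscapeRegExp(ByVal strKeyword)
    Dim aMeta, i, strTemp

    ' The backslash comes first so that the backslashes added
    ' for the other characters are not escaped a second time.
    aMeta = Array("\", "^", "$", ".", "|", "?", "*", "+", "(", ")", "[", "]", "{", "}")

    strTemp = strKeyword
    For i = LBound(aMeta) To UBound(aMeta)
        strTemp = Replace(strTemp, aMeta(i), "\" & aMeta(i))
    Next

    EscapeRegExp = strTemp
End Function

WScript.Echo EscapeRegExp("C++ (v1.0)")   ' prints: C\+\+ \(v1\.0\)
```

Each keyword would be passed through such a helper before the array is handed to string_SnR_RegExp.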
Efficiency, or Which Solution Should You Use?
In this section we will compare the efficiency of Solution 2 and Solution 3. Solution 1, due to its drawbacks, is disregarded.
On the one hand, one would intuitively expect the third approach to be more efficient than the second. After all, that's what regular expressions are for, and the second approach uses many loops and a lot of concatenation. On the other hand, a RegExp object needs to be created and disposed of every time the function employing regular expressions is called. Let's see the results of the tests:
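The harness used for the measurements is not shown here. If you want to run comparable tests yourself, a simple timing loop along the following lines should do. This is a sketch only: TimeHighlight and Highlight_Candidate are hypothetical names, the stand-in function merely makes the block runnable on its own, and WScript.Echo assumes Windows Script Host.

```vbscript
Option Explicit

' Stand-in for the function under test (hypothetical). Replace its body with
' a call to string_SnR_InStr or string_SnR_RegExp to time the real thing.
Function Highlight_Candidate(ByVal strText, ByVal aKeywords)
    Highlight_Candidate = Replace(strText, aKeywords(0), "<b>" & aKeywords(0) & "</b>", 1, -1, vbTextCompare)
End Function

' Calls the named two-argument function intIterations times and returns the
' elapsed time in milliseconds. Timer counts seconds since midnight, so a run
' that crosses midnight would produce a negative value.
Function TimeHighlight(ByVal strFuncName, ByVal strText, ByVal aKeywords, ByVal intIterations)
    Dim fnCandidate, sngStart, i, strResult

    Set fnCandidate = GetRef(strFuncName)

    sngStart = Timer
    For i = 1 To intIterations
        strResult = fnCandidate(strText, aKeywords)
    Next

    TimeHighlight = CLng((Timer - sngStart) * 1000)
End Function

Dim intN
For Each intN In Array(10, 100, 1000)
    WScript.Echo intN & " iterations: " & TimeHighlight("Highlight_Candidate", "Who is Cat Stevens?", Array("cat"), intN) & " ms"
Next
```

Wrapping each solution in a function with the same two-argument signature keeps the loop identical for both candidates, so only the highlighting code itself differs between runs.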
Scenario 1:
few keywords (1), all matching, short text to be searched (68 characters)
| # Iterations | InStr (ms) | RegExp (ms) |
|---|---|---|
| 10 | 0 | 20 |
| 100 | 20 | 123 |
| 1000 | 183 | 1245 |
| 10000 | 1826 | 12371 |
Winner: InStr Solution
Scenario 2:
many keywords (9), all matching, short text to be searched (68 characters)
| # Iterations | InStr (ms) | RegExp (ms) |
|---|---|---|
| 10 | 7 | 13 |
| 100 | 87 | 150 |
| 1000 | 804 | 1525 |
| 10000 | 8025 | 14892 |
Winner: InStr Solution
Scenario 3:
many keywords (9), all matching, long text to be searched (8704 characters)
| # Iterations | InStr (ms) | RegExp (ms) |
|---|---|---|
| 10 | 11654 | 304 |
| 100 |  | 2894 |
| 1000 |  |  |
| 10000 |  |  |
Winner: RegExp Solution
Scenario 4:
many keywords (9), none matching, long text to be searched (8704 characters)
| # Iterations | InStr (ms) | RegExp (ms) |
|---|---|---|
| 10 | 110 | 434 |
| 100 | 1075 | 4279 |
| 1000 | 10705 | 42769 |
| 10000 |  |  |
Winner: InStr Solution
Scenario 5:
few keywords (1), all matching, long text to be searched (8704 characters)
| # Iterations | InStr (ms) | RegExp (ms) |
|---|---|---|
| 10 | 2236 | 37 |
| 100 | 22245 | 374 |
| 1000 |  | 3912 |
| 10000 |  |  |
Winner: RegExp Solution
Conclusions:
- The InStr Solution is superior when the string to be searched is short (Scenarios 1 and 2) or when there are few or no matches (Scenario 4).
- The more matching keywords, the smaller the advantage of the InStr Solution over the RegExp Solution.
- The RegExp Solution is superior when the string to be searched is long and contains many matches (Scenarios 3 and 5). In that setup the InStr Solution is very memory-consuming and slow.