Remove White Space from Text
By Steven Smith
In the course of improving this website's search engine, I wrote a routine that would
extract the text from an article given a URL, strip out the HTML, and then convert every
run of white space, carriage returns and line feeds included, into a single space. This
was done to reduce the size of the text involved, which was then stored in the database
and used for full-text searches.
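The last two steps of that routine can be sketched in a few lines. This is only an illustration in plain JavaScript, not the routine used on the site: the helper names (`stripTags`, `collapseWhiteSpace`, `toSearchText`) are hypothetical, the tag-stripping pattern is a deliberately naive stand-in, and the step that fetches the article from a URL is left out, so the sketch starts from HTML that is already in a string.

```javascript
// Hypothetical helpers; a naive illustration, not the site's actual routine.

// Replace anything that looks like a tag with a space (naive on purpose).
function stripTags(html) {
  return html.replace(/<[^>]*>/g, " ");
}

// Turn every run of whitespace (spaces, tabs, CR, LF) into one space.
function collapseWhiteSpace(text) {
  return text.replace(/\s+/g, " ");
}

// Strip the markup first, then collapse the whitespace that is left.
function toSearchText(html) {
  return collapseWhiteSpace(stripTags(html));
}

const html = "<h1>Title</h1>\r\n<p>First   line\r\n\tsecond line</p>";
const text = toSearchText(html);

console.log(JSON.stringify(text)); // " Title First line second line "
console.log(html.length, "->", text.length);
```

Collapsing after stripping matters here: the spaces left behind by the removed tags merge with the original line breaks, so the stored text is noticeably shorter than the markup it came from.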
To collapse all of the white space in a string, including newline characters, into
single spaces, I used regular expressions (with some help from Remas).
My code was written using ASP and VBScript (version 5.5 for RegExp support), but I'll
show how it can easily be done in ASP.NET. For a very quick intro to RegExp in
ASP.NET, see my previous article,
Replace In ASP.NET.
First, let's look at the source code of the ASP function:
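Function RemoveWhiteSpace(strText)
    Dim RegEx
    Set RegEx = New RegExp
    RegEx.Pattern = "\s+"      ' one or more whitespace characters, line breaks included
    RegEx.Multiline = True     ' only affects how ^ and $ match; it does not change \s+
    RegEx.Global = True        ' replace every match, not just the first one
    strText = RegEx.Replace(strText, " ")
    RemoveWhiteSpace = strText
End Function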
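To see what each setting in that function contributes, here is a small sketch in plain JavaScript, whose regular expression flags behave much like the VBScript properties: the `g` flag plays the role of `Global = True` and the `m` flag the role of `Multiline = True`. The sample string and variable names are made up for the illustration.

```javascript
// Made-up sample: a blank line, two tabs, and a run of four spaces.
const sample = "one\r\n\r\ntwo\t\tthree    four";

// Without the global flag only the first run of whitespace is replaced.
const firstOnly = sample.replace(/\s+/, " ");

// With the global flag every run is replaced.
const all = sample.replace(/\s+/g, " ");

// Adding the multiline flag changes nothing, because the pattern
// contains no ^ or $ anchors for it to act on.
const withM = sample.replace(/\s+/gm, " ");

console.log(JSON.stringify(firstOnly)); // "one two\t\tthree    four"
console.log(JSON.stringify(all));       // "one two three four"
console.log(all === withM);             // true
```

The global setting is the one that does the work; the multiline setting is harmless but not needed for this pattern.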
OK, now let's see how it would be done in ASP.NET. Just to make this article
more interesting, I'll list the code in the three standard languages
of .NET: C#, VB, and JScript.
C#:
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="C#" runat="server">

    void SubmitBtn_Click(Object sender, EventArgs e){
        String strInput;
        String strOutput;
        strInput = Text1.Text;
        // Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
        strOutput = Regex.Replace(strInput, "\\s+", " ");
        output.Text = strOutput;
    }

</script>
<html>
<body>
<a href="/stevesmith/articles/removewhitespace.asp">Return To Article</a>
<form runat="server">
<table width="100%">
    <tr>
        <td valign="top">Add Text, including line breaks, etc.</td>
        <td valign="top">Text, in PRE tags, without whitespace (may scroll right a long way)</td>
    </tr>
    <tr>
        <td valign="top">
            <asp:TextBox TextMode="multiline" id="Text1" width="200px"
                height="80px" runat="server" />
            <asp:Button OnClick="SubmitBtn_Click" Text="Format Text" Runat="server"/>
        </td>
        <td valign="top">
            <pre><asp:label id="output" runat="server" /></pre>
        </td>
    </tr>
</table>
</form>
</body>
</html>
VB:
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="VB" runat="server">

    Sub SubmitBtn_Click(sender As Object, e As EventArgs)
        Dim strInput As String
        Dim strOutput As String
        strInput = Text1.Text
        ' Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
        strOutput = Regex.Replace(strInput, "\s+", " ")
        output.Text = strOutput
    End Sub

</script>
<html>
<body>
<a href="/stevesmith/articles/removewhitespace.asp">Return To Article</a>
<form runat="server">
<table width="100%">
    <tr>
        <td valign="top">Add Text, including line breaks, etc.</td>
        <td valign="top">Text, in PRE tags, without whitespace (may scroll right a long way)</td>
    </tr>
    <tr>
        <td valign="top">
            <asp:TextBox TextMode="multiline" id="Text1" width="200px"
                height="80px" runat="server" />
            <asp:Button OnClick="SubmitBtn_Click" Text="Format Text" Runat="server"/>
        </td>
        <td valign="top">
            <pre><asp:label id="output" runat="server" /></pre>
        </td>
    </tr>
</table>
</form>
</body>
</html>
JScript: coming soon.
The full source of each example is shown above. You can run the example to see how it
works.
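One detail to watch for when running the example: because a run of whitespace at the very start or end of the input is also replaced by a space, the result can begin or end with a single space. The sketch below shows this in plain JavaScript, with a made-up input string standing in for what a multiline text box might deliver. Whether to trim the result is a design choice; the functions in this article do not.

```javascript
// Made-up input with leading and trailing line breaks and spaces.
const input = "\r\n  Hello,\r\n  world!  \r\n";

// Collapsing leaves one space at each end of the string.
const collapsed = input.replace(/\s+/g, " ");

// Trimming afterwards removes those two leftover spaces.
const trimmed = collapsed.trim();

console.log(JSON.stringify(collapsed)); // " Hello, world! "
console.log(JSON.stringify(trimmed));   // "Hello, world!"
```

In the .NET versions, calling `Trim()` on the string returned by `Regex.Replace` would have the same effect.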