ASP.NET provides a suite of validation controls that make validating input on web forms far easier than the same task in legacy (or "classic", if you prefer) ASP. One of the more powerful validators is the RegularExpressionValidator, which, as you might guess, validates input against a regular expression that the input must match. The pattern is specified by setting the control's ValidationExpression property. An example validator for a ZIP code field is shown below:
<asp:RegularExpressionValidator runat="server"
ControlToValidate="ZipCodeTextBox"
ErrorMessage="Invalid ZIP code format; format should be either 12345 or 12345-6789."
ValidationExpression="\d{5}(-\d{4})?" />
A few things to note about the RegularExpressionValidator:
- An empty input never triggers it: the validator treats an empty string as valid. Only the RequiredFieldValidator catches empty strings.
- You do not need to specify beginning of string and end of string matching characters (^ and $)—they are assumed. If you add them, it won't hurt (or change) anything—it's simply unnecessary.
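- As with all validation controls, validation is always performed on the server and, when client script is enabled and the browser supports it, on the client as well. The client-side check runs in the browser's JavaScript (ECMAScript) regular expression engine, so an expression that relies on .NET-only syntax will fail there. To avoid this, either ensure your expression is ECMAScript compliant, or set the control's EnableClientScript property to false so that it validates only on the server.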
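To make the first two rules concrete, here is a small console sketch (C# 9 or later, top-level statements) that approximates the validator's server-side check: blank input passes, and otherwise the first match must start at index 0 and span the entire input. The helper name `ValidatorStyleMatch` is hypothetical, and this is a simplified approximation rather than the control's actual source.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper approximating the RegularExpressionValidator's server-side rules.
static bool ValidatorStyleMatch(string input, string pattern)
{
    // Blank input never fails this check; that case belongs to a RequiredFieldValidator.
    // (This sketch also treats whitespace-only input as blank.)
    if (string.IsNullOrWhiteSpace(input))
        return true;

    // Only the first match is examined: it must begin at index 0 and cover the whole input,
    // which is why explicit ^ and $ anchors are unnecessary in ValidationExpression.
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

const string zipPattern = @"\d{5}(-\d{4})?";
Console.WriteLine(ValidatorStyleMatch("12345", zipPattern));      // True
Console.WriteLine(ValidatorStyleMatch("12345-6789", zipPattern)); // True
Console.WriteLine(ValidatorStyleMatch("123456", zipPattern));     // False: the match stops at "12345"
Console.WriteLine(ValidatorStyleMatch("", zipPattern));           // True: blank input is not this validator's job
```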
Regular Expression API
Outside of the ASP.NET validation controls, most of the time when you're using regular expressions in .NET, you'll use the classes found in the System.Text.RegularExpressions namespace. In particular, the main classes you'll want to become familiar with are Regex, Match, and MatchCollection.
Incidentally, there is some dispute as to whether the shortened version of regular expression, regex, should be pronounced /reg-eks/ or /rej-eks/. Personally I prefer the latter, but there are experts in both pronunciation camps, so pick whichever sounds better to you.
The Regex class has a rich set of methods and properties, which can be rather daunting if you haven't used it before. A summary of the most frequently used methods is included here:
| Method | Description |
|---|---|
| Escape / Unescape | Escape replaces metacharacters in a string with their escape codes so the string can be used as a literal in an expression; Unescape reverses the conversion. |
| IsMatch | Returns true if the regex finds a match in the input string. |
| Match | Returns a Match object describing the first match in the input string; check its Success property to see whether a match was actually found. |
| Matches | Returns a MatchCollection object containing any and all matches found in the input string. |
| Replace | Replaces matches in the input string with a given replacement string. |
| Split | Splits the input string into an array of substrings, using the regex matches as delimiters. |
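The following console sketch (C# 9 or later, top-level statements) exercises each method in the table once via the static Regex methods; the sample strings and patterns are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// IsMatch: a simple yes/no test.
bool hasDigit = Regex.IsMatch("abc123", @"\d");               // true

// Match: always returns a Match object; check Success before using Value.
Match first = Regex.Match("abc123", @"\d+");
Console.WriteLine(first.Success ? first.Value : "no match");  // 123
Match none = Regex.Match("abc", @"\d");                       // none.Success is false

// Matches: every non-overlapping match.
int count = Regex.Matches("a1b22c333", @"\d+").Count;         // 3

// Replace and Split.
string masked = Regex.Replace("PIN 4821", @"\d", "*");        // PIN ****
string[] parts = Regex.Split("one, two;three", @"[,;]\s*");   // "one", "two", "three"

// Escape/Unescape: treat arbitrary text as a literal, then reverse the conversion.
string literal = Regex.Escape("1.5*2");                       // 1\.5\*2
string original = Regex.Unescape(literal);                    // 1.5*2
bool exact = Regex.IsMatch("cost: 1.5*2", literal);           // true

Console.WriteLine(masked);
Console.WriteLine(string.Join("|", parts));                   // one|two|three
Console.WriteLine(literal);
```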
In addition to its many methods, Regex supports a number of options, usually specified in the Regex constructor. The options are members of the RegexOptions flags enumeration, so they can be combined with the bitwise OR operator (|). (Yes, you can have both Multiline and Singleline turned on at the same time.)
| Option | Description |
|---|---|
| Compiled | Compiles the expression to MSIL instead of interpreting it. Matching runs faster, but constructing the Regex takes longer, so this pays off when the same Regex instance performs many match operations (for example, in a loop). |
| Multiline | Has nothing to do with how many lines are in the input string. Rather, it changes the behavior of ^ and $ so that they match at the beginning and end of each line (BOL and EOL), not just at the beginning and end of the entire input string. |
| IgnoreCase | Makes matching case-insensitive. |
| IgnorePatternWhitespace | Ignores unescaped white space in the pattern, so it can be laid out as readably as you like, and enables end-of-line comments that begin with #. (Inline comments of the form (?# comment) work with or without this option.) |
| Singleline | Has nothing to do with how many lines are in the input string. Rather, it makes the . (period) metacharacter match every character, including \n, which it excludes by default. |
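Because the Multiline and Singleline names are easy to misread, a quick console sketch (C# 9 or later, top-level statements) may help; the two-line sample string is made up for illustration. It also shows two flags combined with `|`.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "First line\nSECOND line";

// Default: ^ anchors only at the very start of the input.
int defaultStarts = Regex.Matches(text, @"^\w+").Count;                               // 1
// Multiline: ^ also matches right after each \n.
int multilineStarts = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count;     // 2

// Default: . does not match \n, so the pattern cannot cross the line break.
bool dotDefault = Regex.IsMatch(text, @"line.SECOND");                               // false
// Singleline: . matches \n as well.
bool dotSingleline = Regex.IsMatch(text, @"line.SECOND", RegexOptions.Singleline);   // true

// Options are flags, so they combine with |. Here each whole line must be letters and spaces.
int lowerOnly = Regex.Matches(text, @"^[a-z ]+$", RegexOptions.Multiline).Count;      // 0
int anyCase = Regex.Matches(text, @"^[a-z ]+$",
    RegexOptions.Multiline | RegexOptions.IgnoreCase).Count;                          // 2

Console.WriteLine($"{defaultStarts} {multilineStarts} {dotDefault} {dotSingleline} {lowerOnly} {anyCase}");
```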
Some common things you may use regular expressions for include validating, matching, and replacing. In many cases, these can be accomplished using static methods of the Regex class, without any need to instantiate the Regex class itself. To perform validation, all you need to do is create or find the right expression and apply it to your input string using the IsMatch() method of the Regex class. For example, the following event handler demonstrates how to use a regular expression to validate a ZIP code:
private void ValidateZipButton_Click(object sender, System.EventArgs e)
{
    string zipRegex = @"^\d{5}$";
    if (Regex.IsMatch(ZipTextBox.Text, zipRegex))
    {
        ResultLabel.Text = "ZIP is valid!";
    }
    else
    {
        ResultLabel.Text = "ZIP is invalid!";
    }
}
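The handler above depends on web-form controls (ZipTextBox, ResultLabel), so here is the same check pulled into a plain function that runs in a console app (C# 9 or later, top-level statements). The helper name `IsValidZip` is hypothetical, and the pattern is widened to accept the 12345 or 12345-6789 formats named in the validator's error message earlier. Note that, unlike the validator control, IsMatch does not anchor the pattern for you.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the handler's check without the web-form controls.
// Accepts 12345 or 12345-6789.
static bool IsValidZip(string zip) => Regex.IsMatch(zip, @"^\d{5}(-\d{4})?$");

Console.WriteLine(IsValidZip("12345"));      // True
Console.WriteLine(IsValidZip("12345-6789")); // True
Console.WriteLine(IsValidZip("1234"));       // False

// Without ^ and $, IsMatch succeeds if the pattern appears anywhere in the input.
Console.WriteLine(Regex.IsMatch("ZIP 123456789", @"\d{5}"));   // True
Console.WriteLine(Regex.IsMatch("ZIP 123456789", @"^\d{5}$")); // False
```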
Similarly, the static Replace() method can be used to replace matches with a particular string, as this snippet demonstrates:
String newText = Regex.Replace(inputString, pattern, replacementText);
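Two concrete uses of the static Replace() method are sketched below as a console program (C# 9 or later, top-level statements); the input strings are made-up examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Collapse any run of white space (spaces, tabs, newlines) into a single space.
string tidy = Regex.Replace("too   many \t\n spaces", @"\s+", " ");
Console.WriteLine(tidy);   // too many spaces

// Strip every non-digit character (\D) from a phone number.
string digits = Regex.Replace("(555) 123-4567", @"\D", "");
Console.WriteLine(digits); // 5551234567
```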
Finally, you can iterate through a collection of matches in an input string using code like this:
private void MatchButton_Click(object sender, System.EventArgs e)
{
    MatchCollection matches = Regex.Matches(SearchStringTextBox.Text,
        MatchExpressionTextBox.Text);
    MatchCountLabel.Text = matches.Count.ToString();
    MatchesLabel.Text = "";
    foreach (Match match in matches)
    {
        // Label text is rendered as HTML, so encode the user-supplied match text
        // and put each result on its own line.
        MatchesLabel.Text += "Found " + Server.HtmlEncode(match.Value) + " at position "
            + match.Index + ".<br>";
    }
}
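Stripped of the web controls, the same loop can be tried as a console sketch (C# 9 or later, top-level statements); the input string and pattern are made up. Value is the matched text and Index is its zero-based position.

```csharp
using System;
using System.Text.RegularExpressions;

// Find every run of digits and report where each one starts.
MatchCollection matches = Regex.Matches("a1b22c333", @"\d+");
Console.WriteLine($"{matches.Count} matches");
foreach (Match match in matches)
{
    Console.WriteLine($"Found {match.Value} at position {match.Index}.");
}
// Prints: 3 matches, then 1 at 1, 22 at 3, and 333 at 6.
```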
You'll typically create an instance of the Regex class when you need something beyond the default behavior, in particular when setting options (although the static methods also have overloads that accept a RegexOptions argument), or when you want to reuse the same expression many times. For example, to create an instance of Regex that ignores case and pattern white space, and then retrieve the set of matches for that expression, you would use code like the following:
Regex re = new Regex(pattern,
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
MatchCollection mc = re.Matches(inputString);
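As a runnable sketch of that approach (C# 9 or later, top-level statements), the hypothetical pattern below finds "color" or "colour" as a whole word. The Regex is built once with both options and then reused for several inputs; IgnorePatternWhitespace is what allows the spacing and the trailing # comment inside the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Built once, reused for every input; the options are fixed at construction time.
var re = new Regex(@"
    \b colou?r \b   # 'color' or 'colour' as a whole word
", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

string[] inputs = { "Color and colour", "COLOR-blind; no colors here" };
foreach (string input in inputs)
{
    MatchCollection mc = re.Matches(input);
    Console.WriteLine($"{mc.Count} match(es) in \"{input}\"");
}
// "colors" is not counted, because \b requires a word boundary after the r.
```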
Complete working versions of these samples are included in the download for this article, as simple ASP.NET pages.
Advanced Topics
Two regular expression features that really make me think are named groups and lookaround processing. Since you'll need these only on rare occasions, I'll describe them only briefly here.
With named groups, you can name individual capturing groups and then refer to them by name, both within the expression and programmatically. This can be especially powerful when combined with the Replace method as a way of reformatting an input string by rearranging the order and placement of its elements. For example, suppose you were given a date string in the form MM/DD/YYYY and you wanted it in the form DD-MM-YYYY. You could write an expression to capture the first format, iterate through its Matches collection, parse each string, and use string manipulation to build the replacement string. That would require a fair amount of code and a fair amount of processing. Using named groups, you can accomplish the same thing like so:
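string MDYToDMY(string input)
{
    // Capture month, day, and year as named groups, then emit them in day-month-year order.
    return Regex.Replace(input,
        @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b",
        "${day}-${month}-${year}");
}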
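Here is a console version of the same function (C# 9 or later, top-level statements) with a made-up sample string. It also shows that named groups can be read back from a Match object through its Groups collection.

```csharp
using System;
using System.Text.RegularExpressions;

// ${name} in the replacement string refers to the group captured as (?<name>...).
static string MDYToDMY(string input) =>
    Regex.Replace(input,
        @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b",
        "${day}-${month}-${year}");

Console.WriteLine(MDYToDMY("Due 12/25/2023, paid 1/5/2024")); // Due 25-12-2023, paid 5-1-2024

// Named groups can also be read programmatically from a single Match.
Match m = Regex.Match("7/4/1976", @"(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})");
Console.WriteLine(m.Groups["year"].Value); // 1976
```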
You can refer to groups by number as well as by name; either way, such references are collectively referred to as backreferences. Another common use of backreferences is within matching expressions themselves, such as this expression for finding doubled letters: ([a-z])\1. The parentheses capture a letter as group 1, and \1 requires that same letter to appear again, so the expression matches 'aa', 'bb', 'cc', and so on. It is not the same as [a-z]{2} or [a-z][a-z], which are equivalent to each other and would also accept 'ab', 'ac', or any other two-letter combination. Backreferences allow an expression to remember parts of the input string it has already matched.
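The difference between a backreference and a plain character class is easy to see on a single word. This console sketch (C# 9 or later, top-level statements) uses "bookkeeper" as a made-up input, and then shows a numbered group ($2, $1, $3) used in a replacement string.

```csharp
using System;
using System.Text.RegularExpressions;

// ([a-z])\1 : capture one letter as group 1, then require the same letter again.
MatchCollection doubled = Regex.Matches("bookkeeper", @"([a-z])\1");
foreach (Match d in doubled)
    Console.WriteLine(d.Value); // oo, kk, ee (one per line)

// [a-z]{2} accepts any two letters, so it simply chops the word into pairs.
int anyPairs = Regex.Matches("bookkeeper", @"[a-z]{2}").Count; // 5

// Numbered groups in a replacement: $2 is the second captured group, and so on.
string dmy = Regex.Replace("12/25/2023", @"(\d{1,2})/(\d{1,2})/(\d{4})", "$2-$1-$3");
Console.WriteLine(dmy); // 25-12-2023
```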
"Lookaround processing" refers to positive and negative lookahead and lookbehind capabilities supported by many regular expression engines. Not all regular expression engines support all variations of lookaround processing. These constructs do not consume characters even though they may match them. Some patterns are impossible to describe without lookaround processing, especially ones in which the existence of one part of the pattern depends on the existence of a separate part. The syntax for each flavor of lookaround is described below.
| Syntax | Description |
|---|---|
| (?=...) | Positive Lookahead |
| (?!...) | Negative Lookahead |
| (?<=...) | Positive Lookbehind |
| (?<!...) | Negative Lookbehind |
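One small example per row of the table, as a console sketch (C# 9 or later, top-level statements) with made-up inputs. In each case the lookaround text (" dollars" or "$") is tested but not included in the match. The \b anchors in the two negative cases stop the engine from backtracking into a shorter partial number that would otherwise slip past the check.

```csharp
using System;
using System.Text.RegularExpressions;

// Positive lookahead: digits only if followed by " dollars".
string ahead = Regex.Match("100 dollars and 200 euros", @"\d+(?= dollars)").Value;       // 100
// Negative lookahead: a whole number NOT followed by " dollars".
string notAhead = Regex.Match("100 dollars and 200 euros", @"\b\d+\b(?! dollars)").Value; // 200
// Positive lookbehind: digits only if preceded by "$".
string behind = Regex.Match("$5 and 7", @"(?<=\$)\d+").Value;                            // 5
// Negative lookbehind: a whole number NOT preceded by "$".
string notBehind = Regex.Match("$5 and 7", @"\b(?<!\$)\d+").Value;                       // 7

Console.WriteLine($"{ahead} {notAhead} {behind} {notBehind}"); // 100 200 5 7
```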
One example of where lookaround processing is necessary is password validation. Consider a password restriction where the password must be between 4 and 8 characters long and must contain at least one digit. You could do this by just testing \d for a match and using string operations to test the length, but doing the whole thing in a single regular expression practically requires lookahead (specifically, positive lookahead), as this expression demonstrates: ^(?=.*\d).{4,8}$
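A console sketch of that rule (C# 9 or later, top-level statements); the helper name `IsValidPassword` and the candidate strings are hypothetical. The lookahead scans for a digit without consuming anything, so `.{4,8}$` still has to account for the whole string from position 0.

```csharp
using System;
using System.Text.RegularExpressions;

// (?=.*\d) requires a digit somewhere; .{4,8} then enforces the length on the entire input.
static bool IsValidPassword(string password) =>
    Regex.IsMatch(password, @"^(?=.*\d).{4,8}$");

foreach (string candidate in new[] { "abc1", "abcd", "a1", "a1b2c3d4", "a1b2c3d4e" })
    Console.WriteLine($"{candidate}: {IsValidPassword(candidate)}");
// abc1: True, abcd: False (no digit), a1: False (too short),
// a1b2c3d4: True, a1b2c3d4e: False (too long)
```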
Conclusion
Regular expressions provide a very powerful way to describe patterns in text, making them an excellent resource for string validation and manipulation. The .NET Framework provides first-rate support for regular expressions in its System.Text.RegularExpressions namespace and specifically the Regex class found there. Using the API is simple; coming up with the right regular expression is often the tough part. Luckily, regular expressions are highly reusable, and there are many resources online where you can find expressions designed by others or get help with ones you are struggling to create.