Introduction to Regular Expressions in Java

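A regular expression (regex) is a string that describes a pattern for searching or manipulating text. The String class in Java has built-in regex support through methods such as matches(), split(), replaceFirst() and replaceAll(). Note that replace() treats its argument as literal text, not as a regex.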
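
A minimal sketch of these String methods, assuming the input "Hi there" and the patterns below. All of them are illustrative choices rather than a recovered listing, picked so that the program prints the four lines shown under Output.

```java
import java.util.Arrays;

class StringRegexDemo {
    // Applies four regex-aware String methods to the same input (assumed patterns).
    static String[] run(String text) {
        return new String[] {
            text.split(" ")[0],                 // split on a space, keep the first piece
            String.valueOf(text.matches("Hi")), // matches() must cover the whole string
            text.replaceAll("e", "a"),          // replace every match
            text.replaceFirst("t", "")          // replace only the first match
        };
    }

    public static void main(String[] args) {
        // "Hi there" is an assumed input
        Arrays.stream(run("Hi there")).forEach(System.out::println);
    }
}
```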

Output:

  • Hi
  • false
  • Hi thara
  • Hi here

java.util.regex Package

For advanced regular expression operations, we can use methods of the java.util.regex package.


This package provides 3 classes:

  1. Pattern
  2. Matcher
  3. PatternSyntaxException

PATTERN AND MATCHER CLASSES

A Pattern object is the compiled form of a regular expression. Some methods of the Pattern class are:

  • compile() – Static method that compiles a regular expression into a Pattern
  • matches() – Static convenience method that compiles a regex and matches it against the entire input in one call
  • matcher() – Creates a Matcher object for the given input

Using this Pattern object, a Matcher object is created to perform regex operations.

Some methods of the Matcher class are:

  • matches() – Attempts to match the entire input against the pattern
  • find() – Scans the input for the next subsequence that matches the pattern
  • lookingAt() – Attempts to match the pattern at the beginning of the input; unlike matches(), the match need not cover the entire input

See the program for a better understanding.
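
A minimal sketch of such a program, assuming the pattern [A-Za-z]+ and the inputs "sdAnn" and "12sdAnn". The class name, the helper firstMatch() and both inputs are illustrative choices, not taken from an original listing; with them the program prints the five lines shown under Output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Compile once and reuse; the pattern itself is an assumed example.
    private static final Pattern LETTERS = Pattern.compile("[A-Za-z]+");

    // Returns the first run of letters in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = LETTERS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "12sdAnn"; // assumed input

        // Static shortcut: compile and match the whole input in one call
        System.out.println(Pattern.matches("[A-Za-z]+", "sdAnn")); // true

        // find() + group(): the first matching subsequence
        System.out.println(firstMatch(input));                     // sdAnn

        Matcher m = LETTERS.matcher(input);
        System.out.println(m.matches());   // false: the leading digits are not letters
        m.reset();                         // start scanning from the beginning again
        System.out.println(m.find());      // true: "sdAnn" occurs inside the input
        System.out.println(m.lookingAt()); // false: the input does not start with a letter
    }
}
```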

Output:

  • true
  • sdAnn
  • false
  • true
  • false

PatternSyntaxException CLASS

This is an unchecked exception that indicates syntax errors in the regex pattern.

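Some methods of the PatternSyntaxException class are:

  • getDescription() – Returns a description of the error
  • getIndex() – Returns the approximate index in the pattern where the error occurred, or -1 if it is not known
  • getMessage() – Returns a multi-line message containing the description, the index and the erroneous pattern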
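
As an illustration, the sketch below compiles a deliberately broken pattern and reads the error details. The class name, the helper describeError() and the sample patterns are assumptions made for this example; the exact wording of the messages depends on the JDK, so no output is listed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns null when the regex compiles, otherwise a short error summary.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeError("[a-z]+")); // valid pattern, so null
        System.out.println(describeError("[a-z"));   // character class is never closed

        try {
            Pattern.compile("(abc");                 // group is never closed
        } catch (PatternSyntaxException e) {
            // Full multi-line message: description, index and the pattern itself
            System.out.println(e.getMessage());
        }
    }
}
```

Because the exception is unchecked, the try/catch is optional; it is mainly worth adding when the pattern comes from user input or configuration.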

KEYWORDS USED FOR FORMING REGULAR EXPRESSIONS

COMMON SYMBOLS:

  • .               Any single character (by default, except line terminators)
  • \              Escape character
  • ^regex  Input starts with the mentioned regex
  • regex$  Input ends with the mentioned regex
  • [abc]      Matches a, b or c
  • [^abc]   Matches any character other than a, b and c
  • A|B        Matches A or B
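
The symbols above can be tried out with a short sketch like the following. The class name, the helper found() and the sample strings are assumptions chosen for illustration. Note that inside a Java string literal the backslash itself has to be escaped, so the regex \. is written "\\.".

```java
import java.util.regex.Pattern;

class CommonSymbolsDemo {
    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Hi", "Hi there"));    // true:  input starts with "Hi"
        System.out.println(found("^there", "Hi there")); // false: "there" is not at the start
        System.out.println(found("there$", "Hi there")); // true:  input ends with "there"

        System.out.println("hat".matches("h.t"));        // true:  . matches any character
        System.out.println("hat".matches("h\\.t"));      // false: escaped dot is a literal '.'
        System.out.println("h.t".matches("h\\.t"));      // true

        System.out.println("b".matches("[abc]"));        // true:  b is in the set
        System.out.println("b".matches("[^abc]"));       // false: b is excluded by the negated set
        System.out.println("dog".matches("cat|dog"));    // true:  second alternative matches
    }
}
```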

METACHARACTERS:

  • \d           Any digit
  • \D           Any non-digit
  • \s            Any whitespace character (space, tab, newline, etc.)
  • \S            Any non-whitespace character
  • \w          Any word character (letter, digit or underscore)
  • \W          Any non-word character
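
A small sketch of these character classes in use. The class name, the helpers maskDigits() and words() and the sample inputs are hypothetical, chosen only to illustrate the list above.

```java
class CharClassDemo {
    // Replaces every digit with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // Splits on runs of whitespace (spaces, tabs, newlines).
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println("7".matches("\\d"));   // true:  a digit
        System.out.println("x".matches("\\D"));   // true:  not a digit
        System.out.println("\t".matches("\\s"));  // true:  a tab counts as whitespace
        System.out.println("_".matches("\\w"));   // true:  underscore is a word character
        System.out.println("-".matches("\\W"));   // true:  hyphen is not a word character

        System.out.println(maskDigits("Room 42"));       // Room ##
        System.out.println(words("a \t b\nc").length);   // 3
    }
}
```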

QUANTIFIERS:

A quantifier after a token (such as a character) or group specifies how often that preceding element is allowed to occur. The most commonly used are:

  • *             Occurs 0 or more times
  • +             Occurs 1 or more times
  • {X}          Occurs exactly X times
  • {X,Y}      Occurs between X and Y times (inclusive)
  • ?              Occurs 0 or 1 time(s)
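
Each quantifier can be checked with a one-line match, as in this sketch. The class name, the helper whole() and the sample strings are assumptions made for the example.

```java
class QuantifierDemo {
    // True when the regex matches the entire input.
    static boolean whole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("a*", ""));            // true:  zero occurrences are allowed
        System.out.println(whole("a+", ""));            // false: at least one 'a' is required
        System.out.println(whole("a{3}", "aaa"));       // true:  exactly three
        System.out.println(whole("a{2,3}", "aaaa"));    // false: four is more than the upper bound
        System.out.println(whole("colou?r", "color"));  // true:  the 'u' is optional
        System.out.println(whole("colou?r", "colour")); // true
    }
}
```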

EXAMPLE

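A regular expression for validating a simple email address is shown below. It is written as it would appear inside a Java string literal, which is why each backslash is doubled: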

^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$

Explanation:

  • ^[_A-Za-z0-9-]+   Starts with one or more letters, digits, underscores or hyphens
  • (\\.[_A-Za-z0-9-]+)*    Zero or more groups of a dot followed by one or more of those characters
  • @[A-Za-z0-9]+     @ followed by one or more letters or digits
  • (\\.[A-Za-z0-9]+)*      Zero or more groups of a dot followed by one or more letters or digits
  • (\\.[A-Za-z]{2,})$      Ends with a dot followed by two or more letters
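
To see the expression at work, it can be compiled once and reused, as in the sketch below. The class name, the helper isValid() and the sample addresses are hypothetical. The pattern is deliberately simple: it rejects some addresses that real mail systems accept, such as those with a + in the local part or a hyphen in the domain.

```java
import java.util.regex.Pattern;

class EmailCheck {
    // The expression from the text, compiled once and reused.
    private static final Pattern EMAIL = Pattern.compile(
        "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValid(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.doe@example.com"));  // true
        System.out.println(isValid("user@localhost"));        // false: no dot-separated ending
        System.out.println(isValid("user@example.c"));        // false: ending needs 2+ letters
        System.out.println(isValid(".user@example.com"));     // false: cannot start with a dot
        System.out.println(isValid("user+tag@example.com"));  // false: '+' is not in the allowed set
    }
}
```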
