Python RegEx

Problem Statement: To understand Python RegEx.

RegEx stands for Regular Expression. A regular expression is a way to define a pattern for searching a string or a set of strings.

The re module  

Python has a built-in module, re, that provides support for regular expressions.

Syntax: import re

The re module provides several functions, including search, match, fullmatch, findall, sub, and split.
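As a minimal sketch of how a few of the re module's functions behave, the snippet below runs them on one sample string. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text used only for this illustration.
text = "Order 66 shipped on 2024-01-15, order 67 on 2024-02-03."
date = r"\d{4}-\d{2}-\d{2}"

# re.search returns the first match anywhere in the string, or None.
first = re.search(date, text)
first_date = first.group() if first else None

# re.match only succeeds if the pattern matches at the start of the string.
starts_with_order = re.match(r"Order", text) is not None

# re.findall returns all non-overlapping matches as a list of strings.
all_dates = re.findall(date, text)

# re.sub replaces every match with the replacement text.
masked = re.sub(date, "<date>", text)

# re.split splits the string wherever the pattern matches.
parts = re.split(r",\s*", text)

print(first_date)
print(starts_with_order)
print(all_dates)
print(masked)
print(parts)
```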

Regex Metacharacters

The special characters used in RegEx are known as metacharacters.

Examples include ‘|’ (alternation), ‘+’ (one or more repetitions), and ‘*’ (zero or more repetitions).

The full set of metacharacters is: . ^ $ * + ? { } [ ] \ | ( )
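To make the metacharacters more concrete, here is a small sketch that applies a handful of them with re.findall. The input strings are arbitrary examples chosen for illustration.

```python
import re

# '.' matches any single character except a newline.
dot = re.findall(r"c.t", "cat cot ct")

# '*' means zero or more repetitions, '+' means one or more.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# '?' makes the preceding item optional.
optional = re.findall(r"colou?r", "color colour")

# '|' means either the left or the right alternative.
either = re.findall(r"cat|dog", "cat bird dog")

# '[]' defines a set of allowed characters.
char_set = re.findall(r"[aeiou]", "regex")

# '^' anchors to the start of the string and '$' to the end.
anchored = re.findall(r"^re|ex$", "regex")

# '\' escapes a metacharacter so that it is matched literally.
literal_dots = re.findall(r"\.", "a.b.c")

for result in (dot, star, plus, optional, either, char_set, anchored, literal_dots):
    print(result)
```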

Regex special sequences

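A special sequence consists of a backslash (“\”) followed by a specific character. Each special sequence has its own meaning.

Commonly used special sequences include \d (a digit), \w (a word character: a letter, digit, or underscore), \s (a whitespace character), and \b (a word boundary).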
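The following sketch shows a few special sequences in action. The sample text is invented for illustration, and raw strings (r"...") are used so that Python does not interpret the backslashes itself.

```python
import re

# Hypothetical sample text used only for this illustration.
text = "Room 42, Floor 7"

# \d matches a digit; \d+ matches a run of digits.
digits = re.findall(r"\d+", text)

# \w matches a letter, digit, or underscore.
words = re.findall(r"\w+", text)

# \s matches a whitespace character.
space_count = len(re.findall(r"\s", text))

# \b matches a word boundary, so "is" inside "this" is not matched.
whole_word = re.findall(r"\bis\b", "this is it")

print(digits)
print(words)
print(space_count)
print(whole_word)
```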

Regex flags

The functions of the re module accept an optional flags argument that changes how a pattern is matched. Commonly used flags include re.IGNORECASE (re.I), re.MULTILINE (re.M), and re.DOTALL (re.S).
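As a rough illustration of what flags do, the sketch below runs the same patterns with and without three common flags. The sample text is made up for this example.

```python
import re

# Hypothetical three-line sample text.
text = "Python\npython rocks\nPYTHON"

# Without flags, matching is case-sensitive.
case_sensitive = re.findall(r"python", text)

# re.IGNORECASE makes the match case-insensitive.
ignore_case = re.findall(r"python", text, flags=re.IGNORECASE)

# By default '^' matches only at the start of the whole string.
first_line_only = re.findall(r"^p\w+", text, flags=re.IGNORECASE)

# re.MULTILINE lets '^' match at the start of every line; flags combine with '|'.
every_line = re.findall(r"^p\w+", text, flags=re.IGNORECASE | re.MULTILINE)

# By default '.' does not match a newline; re.DOTALL allows it.
no_dotall = re.search(r"Python.python", text)
with_dotall = re.search(r"Python.python", text, flags=re.DOTALL)

print(case_sensitive)
print(ignore_case)
print(first_line_only)
print(every_line)
print(no_dotall is None, with_dotall is not None)
```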

Regular expressions have several practical uses:

  • Searching and replacing text in files.
  • Validating text input, such as passwords and email addresses.
  • Renaming hundreds of files at once.
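The renaming use case can be sketched with re.sub and a capture group. The file names below are hypothetical, and the snippet only computes the new names; actually renaming files on disk would additionally need something like os.rename.

```python
import re

# Hypothetical file names used only for this illustration.
filenames = ["report_2023.txt", "report_2024.txt", "notes.txt"]

# (\d{4}) captures the year; \1 in the replacement inserts it again.
# Names that do not match the pattern are returned unchanged.
renamed = [
    re.sub(r"^report_(\d{4})\.txt$", r"\1_report.txt", name)
    for name in filenames
]

print(renamed)
```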

Example 1:

Write a program that checks whether a given email address is valid. A valid address must end with com, in, or net.

Solution:

Python Code

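import re

# Local part: letters and digits; domain: letters; ending: com, net, or in.
# The ^ and $ anchors make the pattern cover the whole string.
patt = r"^[a-zA-Z0-9]+@[a-zA-Z]+\.(com|net|in)$"

def isValid(email):
    if re.search(patt, email):
        print("Email is valid")
    else:
        print("Email is Invalid")


# Hypothetical placeholder addresses, used for illustration only.
isValid("user1@example.com")
isValid("user2@example.net")
isValid("user3@example.org")
isValid("user4example.com")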

Output:

Email is valid
Email is valid
Email is Invalid
Email is Invalid

Example 2:

We are given some employee IDs of company XYZ. An employee ID consists of five to seven characters, split into two parts by a space. The first part has two to four characters and the second part has three characters. The first part consists of one or two uppercase letters followed by a digit, optionally followed by one more digit or uppercase letter. The second part consists of a digit followed by two uppercase letters.

E.g.: AB1 2CD, AB2C 1WA.

Write a regex to check whether a given employee ID belongs to company XYZ.

Disclaimer: Don’t jump directly to the solution; try it out yourself first.

Solution :

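We build the pattern piece by piece according to the question: [A-Z]{1,2} matches the one or two leading uppercase letters, [0-9R] matches the digit (this class also accepts the letter R), [0-9A-Z]? matches the optional extra character, a literal space separates the two parts, and [0-9][A-Z]{2} matches a digit followed by two uppercase letters.

[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}                  …. (pattern)

Code:

Python Code

import re

employee_id = [  "AB1 0AA", 
                 "AZ1A 1AA", 
                 "SW1A 2BA", 
                 "BX3 2BB", 
                 "DH98 1BT", 
                 "N1 9UU", 
                 "EEZ3 1TT", 
                 "TIM E52", 
                 "A B1 A22", 
                 "AB34 2DD", 
                 "SE9 2HG", 
                 "N1 11H", 
                 "AC1V 8DS", 
                 "WCC1 9DD", 
                 "B4C 1LK", 
                 "B28 9AD", 
                 "WE12 7RJ", 
                 "AABB 007", 
                ]
# [A-Z] is used instead of [A-z], which would also match lowercase
# letters and the symbols [ \ ] ^ _ `
patt = r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}"

for Eid in employee_id:
    # re.search looks for the pattern anywhere in the string,
    # so an id is accepted if any part of it matches.
    r = re.search(patt, Eid)
    if r:
        print(Eid + " is employee of XYZ")
    else:
        print(Eid + " is not an employee of XYZ")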

Output:

AB1 0AA is employee of XYZ
AZ1A 1AA is employee of XYZ
SW1A 2BA is employee of XYZ
BX3 2BB is employee of XYZ
DH98 1BT is employee of XYZ
N1 9UU is employee of XYZ
EEZ3 1TT is employee of XYZ
TIM E52 is not an employee of XYZ
A B1 A22 is not an employee of XYZ
AB34 2DD is employee of XYZ
SE9 2HG is employee of XYZ
N1 11H is not an employee of XYZ
AC1V 8DS is employee of XYZ
WCC1 9DD is employee of XYZ
B4C 1LK is employee of XYZ
B28 9AD is employee of XYZ
WE12 7RJ is employee of XYZ
AABB 007 is not an employee of XYZ
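In the output above, EEZ3 1TT and WCC1 9DD are accepted even though they start with three letters, because re.search is satisfied when any substring matches (here EZ3 1TT and CC1 9DD). If the whole ID has to follow the format, one option is re.fullmatch. The sketch below is a variant under that assumption, not the article's original solution; the helper name is_xyz_id is hypothetical and [A-Z] is used for the leading letters.

```python
import re

# Assumed stricter variant: the entire string must match the pattern.
ID_PATTERN = re.compile(r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}")

def is_xyz_id(emp_id):
    """Return True only if the whole id matches the pattern."""
    return ID_PATTERN.fullmatch(emp_id) is not None

# A few ids taken from the list in the example above.
results = {emp_id: is_xyz_id(emp_id)
           for emp_id in ["AB1 0AA", "EEZ3 1TT", "WCC1 9DD", "N1 11H"]}

for emp_id, ok in results.items():
    print(emp_id, ok)
```

With this variant only AB1 0AA is accepted from the four sample IDs.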

Special thanks to AYUSH KUMAR for contributing to this article on takeUforward.