10+ basic examples to learn Python RegEx from scratch

In this Python tutorial we will concentrate on regular expressions (regex). A regular expression is a pattern you specify, using special characters to represent combinations of characters, digits, and words.

The first part of this tutorial covers the basics of regular expressions. If you are already familiar with the basics, you can jump directly to the Python regex section.

Introduction to regular expressions

A regular expression can be as simple as a series of characters that match a given word. For example, the following pattern matches the word “hat”; no surprise there.

hat

But what if you wanted to match a larger set of words? For example, let’s say you wanted to match the following combination of letters:

Match a “h” character.
Match any number of “a” characters, but at least one.
Match a “t” character.

Here’s the regular expression that implements these criteria:

ha+t

Here,

Literal characters, such as “h” and “t” in this example, must be matched exactly
The plus sign (+) is a special character.
It does not cause the regular-expression processor to look for a plus sign. Instead, it forms a subexpression, together with “a” that says, “Match one or more ‘a’ characters.”

The pattern ha+t therefore matches any of the following:

hat
haat
haaat
haaaat
..
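To try this pattern in Python, here is a minimal sketch. It uses re.fullmatch, which succeeds only when the whole string fits the pattern. The list of candidate words is an assumption made for illustration.

```python
import re

pattern = r'ha+t'

# Hypothetical sample words: the first three fit the pattern, the last two do not
words = ['hat', 'haat', 'haaaat', 'ht', 'hot']

# re.fullmatch returns a match object only if the entire word fits the pattern
matching = [w for w in words if re.fullmatch(pattern, w)]
print(matching)
```

The word “ht” is rejected because the plus sign requires at least one “a”.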

This was just an overview of regular expressions. Next we will look at the different meta characters that we can use with Python regex.

Different meta characters

These are tools for specifying either a specific character or one of a number of characters, such as “any digit” or “any alphanumeric character.” Each of these characters matches one character at a time.

Character	Name	Description
`.`	Dot (Period)	Matches any one character except a newline. If the DOTALL flag is enabled, it matches any character at all.
`^`	Caret	Matches the beginning of the string. If the MULTILINE flag is enabled, it also matches beginning of lines (any character after a newline).
`$`	Dollar	Matches the end of a string. If the MULTILINE flag is enabled, it matches the end of a line (the last character before a newline or end of string).
`[]`	Square brackets	A set of characters you wish to match.
`\`	Backslash	This is used to escape various characters. One of the functions of escape sequences is to turn a special character back into a literal character.
`expr*`	Wild character (Star)	Modifies the meaning of expression `expr` so that it matches zero or more occurrences rather than one. For example, `a*` matches “`a`”, “`aa`”, and “`aaa`”, as well as an empty string.
`expr+`	Plus	Modifies the meaning of expression `expr` so that it matches one or more occurrences rather than only one. For example, `a+` matches “`a`”, “`aa`”, and “`aaa`”.
`expr{n}`	Curly braces	Modifies the expression so that it matches exactly `n` occurrences of `expr`. For example, `a{3}` matches “`aaa`”.
`expr{m,n}`		Matches a minimum of `m` occurrences of `expr` and a maximum of `n`. For example, `x{2,4}y` matches “`xxy`”, “`xxxy`”, and “`xxxxy`”.
`expr{m,}`		Matches a minimum of `m` occurrences of `expr` with no upper limit to how many can be matched. For example, `x{3,}` finds a match if it can match the pattern “`xxx`” anywhere. But it will match more than three if it can. Therefore `zx{3,}y` matches “`zxxxxxy`”.
`expr{,n}`		Matches a minimum of zero, and a maximum of `n`, instances of the expression `expr`. For example, `ca{,2}t` matches “`ct`”, “`cat`”, and “`caat`” but not “`caaat`”.
`expr1|expr2`	Alternation	Matches a single occurrence of `expr1`, or a single occurrence of `expr2`, but not both. For example, `a|b` matches “`a`” or “`b`”. Note that the precedence of this operator is very low, so `cat|dog` matches “`cat`” or “`dog`”.
`()`	Parentheses	This is used to capture and group sub-patterns
`expr?`	Question mark	Modifies the meaning of expression `expr` so that it matches zero or one occurrence of `expr`. For example, `a?` matches “`a`” or an empty string.

`.` - dot (period)

A dot . matches any one character except a newline character.

Pattern	What does pattern mean?	String	Match?	Description
`c..f`	`c` matches the character `c` literally (case sensitive) `.` matches any character (except for line terminators) `.` matches any character (except for line terminators) `f` matches the character `f` literally (case sensitive)	`cdef`	YES	Exactly two characters between `c` and `f`
		`c12f`	YES	Exactly two characters between `c` and `f`
		ab`cdef`gh	YES	It doesn't matter what the starting or ending characters are. There are exactly two characters between `c` and `f`
		conef	NO	More than two characters between `c` and `f`
		c1f	NO	Fewer than two characters between `c` and `f`
		Cdef	NO	`C` is uppercase while the pattern contains lowercase `c`

`^` - caret

A caret sign ^ matches the beginning of the string.

Pattern	What does Pattern mean?	String	Match?	Description
`^a`	`^` asserts position at start of a line `a` matches the character `a` literally (case sensitive)	`a`pple	YES	The first character `a` of `apple` matches the pattern
		`a`bcd	YES	The first character `a` of `abcd` matches the pattern
		Abcd	NO	The first character of `Abcd` is uppercase `A` while the pattern has lowercase `a`
		Cat	NO	The first character of `Cat` is `C`, which does not match the pattern
`^en`	`^` asserts position at start of a line `en` matches the characters `en` literally (case sensitive)	`en`d	YES	The first two characters of `end` match the pattern `en`
		east	NO	The first two characters of `east` do not match our pattern `en`
		bend	NO	The first two characters of `bend` do not match our pattern `en`
		End	NO	The pattern expects the first two characters to be `en` in lowercase while the string has uppercase `E`

`$` - dollar

The dollar symbol $ is used to check if a string ends with the provided expression.

Pattern	What does Pattern mean?	String	Match?	Description
`e$`	`e` matches the character `e` literally (case sensitive) `$` asserts position at the end of a line	appl`e`	YES	The last character of `apple` matches the pattern
		ABCDE	NO	The last character of `ABCDE` is uppercase while our pattern expects lowercase `e` at the end
		`e`	YES	The provided string contains the single character `e`, which is both the first and the last character, so it is a match
`ee$`	`ee` matches the characters `ee` literally (case sensitive) `$` asserts position at the end of a line	tr`ee`	YES	The last two characters of `tree` match our pattern `ee`
`ee$`		eye	NO	The pattern expects `ee` at the end of the string `eye`; since there is only a single `e`, the match fails

`[]` - square brackets

You can add a set of characters inside square brackets which you wish to match.

Pattern	What does pattern mean?	String	Match?	Description
`BL460C_G[789]_DISK`	`BL460C_G` matches the characters `BL460C_G` literally (case sensitive) Match a single character present in the list `[789]` (case sensitive)	BL460C_G`9`_DISK	YES	`9` is a matching character in the list `[789]` while the other characters also match literally
		BL460C_G`8`_DISK	YES	`8` is a matching character in the list `[789]` while the other characters also match literally
		BL460C_G5_DISK	NO	`5` is not part of the list `[789]`, so even though the rest of the characters match, this is not a match
		BL460C_G78_DISK	NO	Both `7` and `8` are in the list `[789]`, but the set matches only a single character and here there are two, so it is not a match.

`\` - backslash

The backslash can be used to “escape” special characters, making them into literal characters. The backslash can also add special meaning to certain ordinary characters—for example, causing \d to mean “any digit” rather than a “d”. We will learn about these special sequences later in this tutorial

Pattern	What does this pattern mean?	String	Match?	Description
`p@$$w0rd`	`p@` matches the characters `p@` literally (case sensitive) `$` asserts position at the end of a line `$` asserts position at the end of a line `w0rd` matches the characters `w0rd` literally (case sensitive)	p@$$w0rd	NO	In the pattern `$` means end of line, so it won't match `p@$$w0rd` from the string.
`p@\$\$w0rd`	`p@` matches the characters `p@` literally (case sensitive) `\$` matches the character `$` literally (case sensitive) `\$` matches the character `$` literally (case sensitive) `w0rd` matches the characters `w0rd` literally (case sensitive)	`p@$$w0rd`	YES	Because each `$` is escaped with a backslash, it is treated as a literal character instead of a meta character.
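The two rows above can be checked with a short sketch. It also shows re.escape, a helper in the re module that adds the backslashes automatically; using it here is an addition for illustration, not something the table requires.

```python
import re

text = 'p@$$w0rd'

# Unescaped: each $ is an end-of-string anchor, so the pattern cannot match
unescaped = re.search(r'p@$$w0rd', text)

# Escaped: \$ matches a literal dollar sign
escaped = re.search(r'p@\$\$w0rd', text)

# re.escape builds an escaped pattern from the literal text
auto_escaped = re.search(re.escape(text), text)

print(unescaped)
print(escaped.group())
print(auto_escaped.group())
```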

`*` - Wild character (Star)

The asterisk (*) modifies the meaning of the expression immediately preceding it, so in the pattern ca*t the a, together with the *, matches zero or more “a” characters.

Pattern	What does Pattern mean?	String	Match?	Description
`ca*t`	`c` matches the character `c` literally (case sensitive) `a*` matches the character `a` literally (case sensitive) `*` Quantifier - Matches between zero and unlimited times, as many times as possible, giving back as needed `t` matches the character `t` literally (case sensitive)	`cat`	YES	`a` is followed by `t`, where `a` is present in the string zero or more times
		`ct`	YES	`a` can be present zero or more times in our string so this is also a match
		`caat`	YES	`a` can be present zero or more times and here `a` is present twice followed by `t` so this is a match
		`caaat`s	YES	`a` can be present zero or more times and here `a` is present thrice followed by `t` so this is a match
		castle	NO	`a` is present but it is followed by `s` rather than `t`, so not a match
		cart	NO	`a` is present but it is followed by `r` rather than `t`, so not a match

`+` - plus

The plus (+) sign matches one or more occurrences of the preceding expression.

Pattern	What does Pattern mean?	String	Match?	Description
`ca+t`	`c` matches the character `c` literally (case sensitive) `a+` matches the character `a` literally (case sensitive) `+` Quantifier - Matches between one and unlimited times, as many times as possible, giving back as needed `t` matches the character `t` literally (case sensitive)	`cat`	YES	`a` is followed by `t`, where `a` is present in the string one or more times
		`ct`	NO	`a` must be present one or more times in our string so this is not a match
		`caat`	YES	`a` can be present one or more times and here `a` is present twice followed by `t` so this is a match
		`caaat`s	YES	`a` can be present one or more times and here `a` is present thrice followed by `t` so this is a match
		castle	NO	`a` is present one or more times but it is not followed by `t` so not a match
		cart	NO	`a` is present one or more times but it is not followed by `t` so not a match

`?` - question

The question mark means that the preceding expression can be present zero or one times only. This is helpful when a certain character may or may not be present in a string.

Pattern	What does this pattern mean?	String	Match?	Description
`cas?t`	`ca` matches the characters `ca` literally (case sensitive) `s?` matches the character `s` literally (case sensitive) `?` Quantifier - Matches between zero and one times `t` matches the character `t` literally (case sensitive)	`cat`	YES	`s` is present zero times so it is a match
		`cast`	YES	`s` is present one time so it is a match
		casst	NO	`s` is present more than one time so it is not a match

`{}` - curly braces

You can use curly braces {m,n} to match a specified number of occurrences of the preceding expression. Here m is the minimum number of occurrences and n is the maximum number of occurrences to match; {n} on its own matches exactly n occurrences.

Pattern	What does pattern mean?	String	Match?	Description
`al{2}`	`a` matches the character `a` literally (case sensitive) `l{2}` matches the character `l` literally (case sensitive) `{2}` Quantifier - Matches exactly 2 times	c`all`	YES	The `a` character is followed by `l` two times as expected by the pattern
		tale	NO	The `a` character is followed by `l`, but `l` is present only once while `{2}` expects `l` to be present exactly two times
		f`all`s	YES	The `a` character is followed by `l` and `l` is present two times so it is a match
		troll	NO	`l` is present two times as expected by `{2}` but the `a` character is missing before `l` so it is not a match
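The quantifiers covered so far (`*`, `+`, `?`, and `{n}`) can be compared side by side with a small sketch. The candidate strings are hypothetical, and re.fullmatch is used so that the whole string must fit the pattern, which is stricter than the substring matching shown in the tables.

```python
import re

# Hypothetical candidate strings
candidates = ['ct', 'cat', 'caat', 'cast']

# One pattern per quantifier discussed above
patterns = [r'ca*t', r'ca+t', r'cas?t', r'ca{2}t']

# For every pattern, collect the candidates that match it completely
results = {
    pat: [s for s in candidates if re.fullmatch(pat, s)]
    for pat in patterns
}

for pat, matched in results.items():
    print(pat, matched)
```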

`|` - alternation

The alternation operator matches a single occurrence of the first expression, or a single occurrence of the second expression, but not both.

Pattern	What does pattern mean?	String	Match?	Description
`cat|dog`	1st Alternative `cat`: `cat` matches the characters `cat` literally (case sensitive) 2nd Alternative `dog`: `dog` matches the characters `dog` literally (case sensitive)	`cat`tle	YES	The pattern will match `cat` in the string `cattle`
		boggy	NO	As there is no `cat` or `dog` in this string `boggy`, there is no match
		`dog`gy	YES	The pattern will match `dog` in the string `doggy`
		battle	NO	As there is no `cat` or `dog` in this string `battle`, there is no match

`()` - parenthesis (group)

Parentheses cause the regular-expression evaluator to treat everything inside them as a single group. There are two major purposes for doing so. First, a quantifier applies to the expression immediately preceding it; if that expression is a group, the quantifier applies to the entire group. For example, (ab)+ matches “ab”, “abab”, “ababab”, and so on. Second, the text matched by a group is captured, so it can be referred to later, for example in a replacement string.

Pattern	What does pattern mean	String	Match?	Description
`(ac)t`	`ac` matches the characters `ac` literally (case sensitive) `t` matches the character `t` literally (case sensitive)	f`act`	YES	The pattern `ac` is present followed by `t` hence this is a match
`(ac)t`		cat	NO	The characters `ac` must appear in the same order as in the group, so this is not a match.
`(a|c)t`	`a` matches the character `a` literally (case sensitive) `c` matches the character `c` literally (case sensitive)	fa`ct`	YES	As we are using alternation within a group, either `a` or `c` followed by `t` should be present so it is a match
`(a|c)t`		c`at`	YES	Here also the pattern expects either `a` or `c` followed by `t` in the string so this is also a match
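The grouping rows above can be reproduced with a brief sketch. It also prints group(1), the text captured by the first pair of parentheses, to show the capturing role of a group.

```python
import re

# A quantifier after a group repeats the whole group
repeated = re.fullmatch(r'(ab)+', 'ababab')
print(repeated.group())

# Alternation inside a group: either "a" or "c", followed by "t"
in_fact = re.search(r'(a|c)t', 'fact')
in_cat = re.search(r'(a|c)t', 'cat')
print(in_fact.group(), in_fact.group(1))
print(in_cat.group(), in_cat.group(1))

# (ac)t needs "a" then "c" in that order, which "cat" does not contain
ordered = re.search(r'(ac)t', 'cat')
print(ordered)
```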

Special Sequences

These are a set of predefined special sequences which can be used to capture different types of patterns in a string.

NOTE

In all the examples “r” in the beginning is making sure that the string is being treated as a “raw string” so that Python itself does not translate any of the characters; it does not translate \n as a newline.

Special character	Description
`\A`	Matches beginning of a string
`\b`	Word boundary, which returns a match where the specified characters are at the beginning or at the end of a word. For example, `r'at\b'` matches ‘`cat`’ and ‘`at`’ but not ‘`cats`’.
`\B`	Non-word boundary, which returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word. For example, `r'at\B'` matches `bats` and `atlanta` but not `cat`
`\d`	Any digit character. This includes the digit characters 0 through 9.
`\D`	Any character that is not a digit
`\s`	Any whitespace character; may be blank space or any of the following: `\t`, `\n`, `\r`, `\f`, or `\v`
`\S`	Any character that is not a white space, as defined just above.
`\w`	Matches any alphanumeric character (letter or digit) or an underscore (_)
`\W`	Matches any character that is not alphanumeric or an underscore
`\Z`	Matches the end of a string

Examples

Here I have consolidated all these special sequences with different examples to give you an overview of each one:

Regex	Pattern	Test String	Match?	Explanation
`\A`	`r'\Ais'`	`is` this good?	YES	Since `is` is at the start of the string, it is a match
		I hope this is good	NO	Since `is` is not at the start of the string, there is no match
`\b`	`r'\bPython'`	`Python` is easy	YES	`Python` is at the beginning of a word so it is a match
		How easy is `Python`	YES	`Python` is again at the beginning of a word so it is a match. A word boundary applies to a word and not a sentence, so it doesn’t matter that `Python` is not at the start of the sentence
		`Python`2 is easy	YES	`Python` is at the beginning of the word `Python2` so it is a match. It doesn’t matter where `Python2` appears in the sentence.
		Download iPython	NO	`Python` is not at the beginning of the word `iPython` so no match
	`r'Python\b'`	`Python` is easy	YES	`Python` is itself a whole word so it is a match
		How easy is `Python`	YES	Again `Python` itself is a word in the sentence; its position in the sentence doesn’t matter so this is a match
		Python2 is easy	NO	`Python` must be at the end of a word, but here it is followed by `2`, so this is not a match
		Download i`Python`	YES	The word `iPython` ends with `Python` so this is a match
`\B`	`r'\BPython'`	Python is easy	NO	`\B` asserts a position where `\b` does not match
		Python2 is easy	NO
		Download i`Python`	YES
	`r'Python\B'`	Python is easy	NO	`\B` asserts a position where `\b` does not match
		`Python`2 is easy	YES
		Download iPython	NO
`\d`	`r'\d'`	Passw`0`rd	YES	The string contains a numerical digit in the range `0-9`
		Password	NO	There is no numerical digit in the provided string
`\D`	`r'\D'`	`Passw0rd`	YES	`\D` matches any character that’s not a digit (equal to `[^0-9]`)
		12345	NO
`\s`	`r'\s'`	Hello World	YES	`\s` matches any whitespace character (equal to `[\r\n\t\f\v ]`)
		HelloWorld	NO	No whitespace character
`\S`	`r'\S'`	Hello World	YES	`\S` is the opposite of `\s`
		` `	NO
`\w`	`r'\w'`	`Passw0rd_123`	YES	`\w` matches any word character (equal to `[a-zA-Z0-9_]`)
		[{(<>)}]	NO
`\W`	`r'\W'`	Passw0rd_123	NO	`\W` is the opposite of `\w`
		`[{(<>)}]`	YES
`\Z`	`r'Python\Z'`	I like `Python`	YES	`\Z` matches only at the very end of the string
		Python is easy	NO
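A few rows from the table above can be verified with re.search, which is covered later in this tutorial. This is only a spot check of selected rows, not a full reproduction of the table.

```python
import re

# (pattern, test string) pairs taken from the table above
checks = [
    (r'\bPython', 'Download iPython'),  # "Python" is not at the start of a word
    (r'Python\b', 'Download iPython'),  # "Python" is at the end of a word
    (r'\BPython', 'Download iPython'),  # no word boundary before "Python"
    (r'Python\B', 'Python2 is easy'),   # no word boundary after "Python"
    (r'\d', 'Password'),                # no digit in the string
    (r'\W', 'Passw0rd_123'),            # only word characters in the string
    (r'Python\Z', 'I like Python'),     # "Python" is at the end of the string
]

# True means the pattern was found somewhere in the string
outcome = [bool(re.search(pattern, text)) for pattern, text in checks]

for (pattern, text), found in zip(checks, outcome):
    print(pattern, '|', text, '|', 'YES' if found else 'NO')
```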

Python regex

Python's re module supports the meta characters and special sequences covered in the earlier sections.
It also provides a function that corresponds to each method of a regular expression object (findall, match, search, split, sub, and subn) each with an additional first argument, a pattern string that the function implicitly compiles into a regular expression object.
It’s generally preferable to compile pattern strings into regular expression objects explicitly and call the regular expression object’s methods, but sometimes, for a one-off use of a regular expression pattern, calling functions of module re can be slightly handier.

Iterative searching with `re.findall`

One of the most common search tasks is to find all substrings matching a particular pattern. The syntax to use findall would be:

result_list = re.findall(pattern, target_string, flags=0)

In this syntax, pattern is a regular-expression string or precompiled object, target_string is the string to be searched, and flags is optional. The return value of re.findall is a list of strings, each string containing one of the substrings found. These are returned in the order found.

Example-1: Find all the digits in a string

In this example we will search for all digit in the provided string.
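The script for this example is not reproduced in the text, so the following is only a minimal sketch of what it could look like. The sample sentence is a hypothetical stand-in, chosen so that it yields the digits listed in this section.

```python
import re

# Hypothetical sample text (an assumption, not the original input)
text = 'Order 12 has 123 items, 78 shipped and 456 pending'

# \d+ matches each run of one or more consecutive digits
digits = re.findall(r'\d+', text)
print(digits)
```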

Here we are using the \d sequence with re.findall to find all the digits; ‘\d+’ matches each run of one or more consecutive digits in the provided string. Output from this script:

~]# python3 regex-eg-1.py
['12', '123', '78', '456']

Example-2: Find words with 6 or more characters

In this example we will write a sample code to find all the words with 6 or more than 6 characters from the provided string using re.findall.
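The original script is not shown here, so the sketch below is an assumption about its shape. The sample string is hypothetical and was picked so that only three of its words have six or more characters.

```python
import re

# Hypothetical sample text (an assumption, not the original input)
text = 'test testing45 test37 demo testing1456'

# \w{6,} matches runs of six or more word characters
long_words = re.findall(r'\w{6,}', text)
print(long_words)
```

Note that the quantifier is written without a space, as `{6,}`.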

Here we are using re.findall with \w to match alphanumeric characters in combination with {6,} to list words with a minimum of 6 characters. Output from this script:

~]# python3 regex-eg-2.py
['testing45', 'test37', 'testing1456']

Example-3: Split all characters in the string

We have a string which contains numbers and mathematical operators. We want every number and every operator to be treated as a string, and the whole input broken down into a list of these strings.
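The script itself is missing from the text. A minimal sketch, assuming a hypothetical input string and one possible pattern, could look like this:

```python
import re

# Hypothetical input (an assumption, not the original string)
text = '12 15 + 3 100 - *'

# Either a single operator character or a run of digits.
# The minus sign is placed last in the set so it is taken literally.
tokens = re.findall(r'[+*/-]|\d+', text)
print(tokens)
```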

Here we are using re.findall to get a list of strings with individual characters from the provided text. Output from this script:

~]# python3 regex-eg-3.py
['12', '15', '+', '3', '100', '-', '*']

HINT

The minus sign (-) has a special meaning within square brackets unless it appears at the very beginning or end of the character set, which is why we have placed it accordingly in our sample code.

Example-4: Find all the vowels from the string

In this example we will identify all the vowels from the provided string:
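The script for this example is not reproduced in the text. The sketch below shows one way to do it; the sample sentence is a hypothetical stand-in, so the list it prints depends on that sentence.

```python
import re

# Hypothetical sample text (an assumption, not the original input)
text = 'This is not the LAN LED test'

# [aeiou] matches a single lowercase vowel; uppercase vowels are skipped
vowels = re.findall(r'[aeiou]', text)
print(vowels)
```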

Output from this script:

~]# python3 regex-eg-4.py
['12', '15', '+', '3', '100', '-', '*']

Example-5: Find vowels case-insensitive

In the last example we listed the vowels from a string, but the match was case sensitive, so any UPPERCASE vowels in the text would not be matched. To perform a case-insensitive match we need to add the IGNORECASE flag using flags=re.I or flags=re.IGNORECASE.
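As a minimal sketch of the flag in use, assuming a hypothetical sample sentence that contains both lowercase and uppercase vowels:

```python
import re

# Hypothetical sample text (an assumption, not the original input)
text = 'This is not the LAN LED test'

# re.I makes [aeiou] match uppercase vowels as well
all_vowels = re.findall(r'[aeiou]', text, flags=re.I)
print(all_vowels)
```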

Output from this script:

 ~]# python3 regex-eg-5.py
['i', 'i', 'o', 'e', 'A', 'E', 'e']

The `re.split` function

Another way to invoke regular expressions to help analyze text into tokens is to use the re.split function. The general syntax to use re.split would be:

result_list = re.split(pattern, string, maxsplit=0, flags=0)

In this syntax,

pattern is a regular-expression pattern supporting all the grammar shown until now; however, it doesn’t specify a pattern to find but to skip over. All the text in between is considered a token. So the pattern is really representative of token separators, and not the tokens themselves.
The string, as usual, is the target string to split into tokens.
The maxsplit argument specifies the maximum number of splits to perform, so the returned list has at most maxsplit + 1 elements. If this argument is set to 0, the default, then there is no maximum number.

Example-1: Split using whitespace

In this example we have a string where we will split the line using whitespace
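The script is not shown in the text, so here is a minimal sketch. The input line is an assumption, reconstructed from the tokens listed in this section.

```python
import re

# Assumed input line (reconstructed, not the original)
line = 'NAME="/dev/sda" PARTLABEL="" TYPE="disk"'

# \s+ treats every run of whitespace as one separator
fields = re.split(r'\s+', line)
print(fields)
```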

Output from this script:

~]# python3 regex-eg-1.py
['NAME="/dev/sda"', 'PARTLABEL=""', 'TYPE="disk"']

Example-2: Split using whitespace from a file

In this example we will create a list of elements using whitespace as the splitting pattern. We will save the output of the who command into a file who.txt

~]# who > who.txt

Now using our python script we will split each line into a list.

#!/usr/bin/env python3

import re

# Split every line on two or more whitespace characters, or on a tab
with open('who.txt', 'r') as f:
    for eachline in f:
        print(re.split(r'\s\s+|\t', eachline))

We have defined \s\s+ which means at least two whitespace characters, combined through the alternation pipe with \t to match a tab. Output from this script:

~]# python3 regex-eg-2.py
['root', 'pts/0', '2020-11-02 12:07 (10.0.2.2)\n']
['root', 'pts/1', '2020-11-02 19:14 (10.0.2.2)\n']

Each line now ends with a newline character. To strip it we can call rstrip('\n') on the line before splitting, so the updated code would be:
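The updated listing is missing from the text, so the following is a sketch of it rather than the original. To keep it self-contained it works on two sample lines instead of who.txt; the helper name split_who_lines and the exact spacing of the sample lines are assumptions.

```python
import re

def split_who_lines(lines):
    """Drop the trailing newline from each line, then split it on
    two or more whitespace characters or on a tab."""
    return [re.split(r'\s\s+|\t', line.rstrip('\n')) for line in lines]

# Sample lines standing in for the contents of who.txt (spacing assumed)
sample = [
    'root     pts/0        2020-11-02 12:07 (10.0.2.2)\n',
    'root     pts/1        2020-11-02 19:14 (10.0.2.2)\n',
]

rows = split_who_lines(sample)
for row in rows:
    print(row)
```

With the real file, the same helper can be called as split_who_lines(f) inside a `with open('who.txt') as f:` block, because a file object yields its lines one by one.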

Output from this script:

~]# python3 regex-eg-3.py
['root', 'pts/0', '2020-11-02 12:07 (10.0.2.2)']
['root', 'pts/1', '2020-11-02 19:14 (10.0.2.2)']

Replace text using `re.sub()`

Another tool is the ability to replace text—that is, text substitution. We might want to replace all occurrences of a pattern with some other pattern. This almost always involves group tagging, described in the previous section.

The re.sub function performs text substitution.

re.sub(find_pattern, repl, target_str, count=0, flags=0)

In this syntax, find_pattern is the pattern to look for, repl is the regular-expression replacement string, and target_str is the string to be searched. The last two arguments are both optional.

The return value is the new string, which consists of the target string after the requested replacements have been made.

Example-1: Replace multiple spaces with single space

In this example I have a string with multiple whitespace characters where we will use re.sub() to replace multiple whitespace with single whitespace character.
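The script for this example is not included in the text. A minimal sketch, assuming a hypothetical input with uneven spacing:

```python
import re

# Hypothetical input with runs of several spaces (an assumption)
text = 'abc   def    ghi  ktm'

# Replace every run of whitespace with a single space
cleaned = re.sub(r'\s+', ' ', text)
print(cleaned)
```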

Output from this script:

~]# python3 regex-eg-1.py
abc def ghi ktm

Example-2: Replace duplicates

In this example I have a string with multiple duplicate words which I wish to replace with single occurrence of each duplicate word.
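The original script is not shown, so the sketch below is one possible version under stated assumptions: the input sentence is hypothetical, and the pattern pairs a captured word with a backreference to that same word.

```python
import re

# Hypothetical input with repeated words (an assumption)
text = 'This this is is a a Python tutorial'

# (\w+) captures a word, \1 requires the same word again after a space.
# \b keeps the match on whole words; re.I lets "This" equal "this".
deduplicated = re.sub(r'\b(\w+) \1\b', r'\1', text, flags=re.I)
print(deduplicated)
```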

Here the replacement string contains only a reference to the first half of the pattern. That half is a tagged group, so this directs the regular-expression evaluator to note the text matched by the group and use it as the replacement string.

r'\1'

Also, the repeated-word test on “This this” will fail unless the flags argument is set to re.I (or re.IGNORECASE).

HINT

Python string handling attaches a special meaning to \1; therefore, if you don’t specify the replacement text as a raw string, nothing works—unless you use the other way of specifying a literal backslash \\1

Output from this script:

 ~]# python3 regex-eg-2.py
This is a Python tutorial

Searching a string for patterns using `re.search`

In this section we will learn how to find the first substring that matches a pattern. The re.search function performs this task using following syntax:

match_obj = re.search(pattern, target_string, flags=0)

In this syntax, pattern is either a string containing a regular-expression pattern or a precompiled regular-expression object; target_string is the string to be searched. The flags argument is optional and has a default value of 0.

The function produces a match object if successful and None otherwise. This function is close to re.match in the way that it works, except it does not require the match to happen at the beginning of the string.

re.search scans the complete string but returns only the first match. Let us verify this concept. Here I have a text which contains ‘python’ two times, and we will use re.search to find the word ‘python’ in the sentence.

#!/usr/bin/env python3

import re

line = "This is python regex tutorial. We are using python3"
pat = r'\bpython'
print(re.search(pat, line))

Output from this script:

~]# python3 regex-eg-1.py
<_sre.SRE_Match object; span=(8, 14), match='python'>

As you see the re.search function has stopped searching after first match i.e. ‘python’ even when ‘python3’ was also a match for our pattern.

Match Object

If you observe the output from re.search, we get a bunch of information along with the matched text. To extract only the desired information we can call the methods of the match object returned by re.search, such as group(), start(), and end().

#!/usr/bin/env python3

import re

line = "This is python regex tutorial. We are using python3"
pat = r'\bpython'
match_ob = re.search(pat, line)
print('matched from the pattern: ', match_ob.group())
print('starting index: ', match_ob.start())
print('ending index: ', match_ob.end()-1)
print('Length: ', match_ob.end() - match_ob.start())

Here I am printing different information based on the output from re.search using the index position. Output from this script:

 ~]# python3 regex-eg-2.py
matched from the pattern:  python
starting index:  8
ending index:  13
Length:  6

Example-1: Search for a pattern in a log file

Let us take a practical example where we will go through a log file and print all the lines with text CRITICAL
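The script and the log file are not reproduced in the text. The sketch below works on a short list of sample lines instead of a file; the CRITICAL lines are copied from the output in this section, while the INFO and DEBUG lines are hypothetical filler.

```python
import re

# Sample log lines; the INFO and DEBUG entries are hypothetical
log_lines = [
    '2019-11-13T13:03:01Z INFO Starting transaction',
    '2019-11-13T13:03:03Z CRITICAL Error: Unable to find a match',
    '2019-11-13T13:03:14Z CRITICAL Importing GPG key 0x8483C65D:',
    '2019-11-13T13:05:20Z DEBUG Cleaning up',
    '2019-11-13T13:11:06Z CRITICAL Error: No Matches found',
]

# Keep only the lines in which the whole word CRITICAL appears
critical = [line for line in log_lines if re.search(r'\bCRITICAL\b', line)]

for line in critical:
    print(line)
```

To read from a real log file, the list can be replaced by iterating over an open file object and stripping the trailing newline from each line.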

Output from this script:

~]# python3 regex-eg-1.py
2019-11-13T13:03:03Z CRITICAL Error: Unable to find a match
2019-11-13T13:03:14Z CRITICAL Importing GPG key 0x8483C65D:
2019-11-13T13:11:06Z CRITICAL Error: No Matches found

Refining matches with `re.match`

The re.match function returns either a match object, if it succeeds, or the special object None, if it fails. The syntax to use re.match would be:

match_obj = re.match(pattern, target_string, flags=0)

The arguments are the same as for re.search. The match must start at the beginning of target_string; if the pattern does not match there, re.match returns None.

Let us use our example from previous section where I have added an additional python if else block:

#!/usr/bin/env python3

import re

line = "This is python regex tutorial. We are using python3"
pat = r'\bpython'
match_ob = re.match(pat, line)
if match_ob:
    print(match_ob)
    print('matched from the pattern: ', match_ob.group())
    print('starting index: ', match_ob.start())
    print('ending index: ', match_ob.end()-1)
    print('Length: ', match_ob.end() - match_ob.start())
else:
    print('No match found')

Here now instead of re.search we will use re.match to find our pattern in the provided string. Output from this script:

~]# python3 regex-eg-1.py
No match found

Even though we have python in our string, the script prints “No match found”. This is because re.match only tries to match at the beginning of the string. So to get a match we will rephrase our text in the script:

#!/usr/bin/env python3

import re

line = "python regex tutorial. We are using python3"
pat = r'\bpython'
match_ob = re.match(pat, line)
if match_ob:
    print(match_ob)
    print('matched from the pattern: ', match_ob.group())
    print('starting index: ', match_ob.start())
    print('ending index: ', match_ob.end()-1)
    print('Length: ', match_ob.end() - match_ob.start())
else:
    print('No match found')

Output from this script:

 ~]# python3 regex-eg-1.py
<_sre.SRE_Match object; span=(0, 6), match='python'>
matched from the pattern:  python
starting index:  0
ending index:  5
Length:  6

Now re.match was able to match the pattern since the pattern was available at index position 0. So the basic difference between re.search and re.match is that re.match looks for the pattern only at the beginning of the string, while re.search looks for the pattern in the entire string.

Example-1: Match for a telephone number

In this example we will collect a telephone number from the user and use re.match to confirm whether the syntax of the input number is correct or incorrect. In the US, the usual telephone number syntax is:

xxx-xxx-xxxx

Sample script:
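The original listing is not reproduced here, so the following is a minimal sketch built from the pattern explained in this section. It checks two fixed sample numbers instead of prompting the user; the helper name check_number is an assumption.

```python
import re

# Three digits, a hyphen, three digits, a hyphen, then three or four digits
pattern = r'\d{3}[-]\d{3}[-]\d{3,4}'

def check_number(number):
    """Return 'Match' if the number starts with the expected format."""
    return 'Match' if re.match(pattern, number) else 'No Match'

# Sample inputs taken from the outputs in this section
for number in ['123-456-1111', '1234-123-111']:
    print(number, check_number(number))
```

Because re.match only anchors the start of the string, trailing characters after a valid number are not rejected by this sketch.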

Here,

\d{3} matches a digit (equal to [0-9]) where {3} Quantifier — Matches exactly 3 times
Match a single character present in the list below [-] where - matches the character - literally (case sensitive)
\d{3} matches a digit (equal to [0-9]) where {3} Quantifier — Matches exactly 3 times
Match a single character present in the list below [-] where - matches the character - literally (case sensitive)
\d{3,4} matches a digit (equal to [0-9]) where {3,4} Quantifier — Matches between 3 and 4 times, as many times as possible, giving back as needed (greedy)

Output from this script for different inputs:

 ~]# python3 regex-eg-1.py
Match

 ~]# python3 regex-eg-1.py
Enter telephone number: 123-456-1111
Match

 ~]# python3 regex-eg-1.py
Enter telephone number: 1234-123-111
No Match

Using `re.compile`

If you’re going to use the same regular-expression pattern multiple times, it’s a good idea to compile that pattern into a regular-expression object and then use that object repeatedly. The re module provides a function for this purpose called compile with the following syntax:

regex_object_name = re.compile(pattern)

The following example makes this clearer. It uses some of the Python regex functions covered in this tutorial. In the first part the same pattern string is passed to every function call. In the second part we create a compiled pattern object once and then call its methods to perform the same searches.

#!/usr/bin/env python3

import re

line = "This is python regex tutorial using python3"
pat = r'\bpython\d'
print('using re.search: ', re.search(pat, line))
print('using re.findall: ', re.findall(pat, line))
print('using re.match: ', re.match(pat, line))

pat_ob = re.compile(pat)
print('Pattern Object: ', pat_ob)
print('using re.compile with re.search: ', pat_ob.search(line))
print('using re.compile with re.findall: ', pat_ob.findall(line))
print('using re.compile with re.match: ', pat_ob.match(line))

As you can see, the first section without re.compile and the second section with re.compile produce the same results:

 ~]# python3 regex-eg-2.py
using re.search:  <_sre.SRE_Match object; span=(36, 43), match='python3'>
using re.findall:  ['python3']
using re.match:  None
Pattern Object:  re.compile('\\bpython\\d')
using re.compile with re.search:  <_sre.SRE_Match object; span=(36, 43), match='python3'>
using re.compile with re.findall:  ['python3']
using re.compile with re.match:  None

Conclusion

Python regex is a vast topic, but I have tried to cover the areas that are used most often in everyday code. re.search and re.match can be confusing, but remember that re.match looks for the pattern only at the beginning of the string while re.search looks for the pattern in the entire string.

We mostly end up using re.compile, although you could perform these tasks without precompiling a regular-expression object. However, compiling can save execution time if you’re going to use the same pattern more than once. Otherwise, Python may have to rebuild a state machine multiple times when it could have been built only once.
