Python: Using Regular Expressions

September 24, 2014

This is the second in a series of posts I'm writing as I explore some of the cool features of the Python programming language. I've just started learning about regular expressions, and though there is a lot of depth, some parts are quite simple. I only seek to present some of the easy stuff here, so this post will be an incomplete introduction.

I'm scripting in Python v3.4.1, which is the current stable version at the time of writing and is what the code in this post is based on.

Regular expressions are a powerful way to search, replace, and parse complex patterns of text in strings. The re module must be imported at the beginning of the script (or at least before you use it), and then several functions are available for working with patterns in strings. You can read about the available functions for yourself in the Python documentation. Let's look at a simple example and then take it apart to see what it's doing. You can copy and paste this code into a text editor and try to run it.

#!/usr/local/bin/python3.4
import re  # import the re module
text_to_search_through = 'This is just a bit of text with numbers 13579.'
text_to_search_for = '[0-9]'
result_of_search = ''.join(re.findall(text_to_search_for, text_to_search_through))
print(result_of_search)

The result is just the string '13579', as you would probably expect. re.findall returns a list of every match, so if I hadn't used the join method on an empty string I would have gotten a list with five elements, one for each digit that matches the pattern [0-9]. The square brackets define a set of characters, and the pattern matches any single character from that set. In this case I wanted to match the digits 0 through 9. If I wanted to match only digits in the range 2-6 inclusive, I could simply make my pattern [2-6]. Change the pattern to [a-z] and see what happens:

#!/usr/local/bin/python3.4
import re  # import the re module
text_to_search_through = 'This is just a bit of text with numbers 13579.'
text_to_search_for = '[a-z]'
result_of_search = ''.join(re.findall(text_to_search_for, text_to_search_through))
print(result_of_search)

The result 'hisisjustabitoftextwithnumbers' may not be exactly what you expected, but it isn't too hard to figure out what is going on. We are looking for lower case letters, so the initial 'T' doesn't match the pattern and gets left out. The same goes for the numbers and the period at the end. Spaces aren't part of the 26-letter alphabet either, so everything gets mashed together. That's easy enough to understand. Let's say we want to add the capital letters so we pick up the 'T' at the beginning. It seems logical to add [A-Z] to our search. Try it for yourself:

#!/usr/local/bin/python3.4
import re  # import the re module
text_to_search_through = 'This is just a bit of text with numbers 13579.'
text_to_search_for = '[A-Z][a-z]'
result_of_search = ''.join(re.findall(text_to_search_for, text_to_search_through))
print(result_of_search)

Now the result 'Th' might be truly surprising, but remember that re is matching patterns, not just individual characters. Each pair of square brackets matches exactly one character, so [A-Z][a-z] looks for an upper case letter immediately followed by a lower case letter, and 'Th' is the only place in the text where that happens. That may not be what you expected or even what you wanted, but it certainly illuminates a bit more of how re works. So how do we get all the lowercase and uppercase letters? Just put both ranges inside a single pair of square brackets like this: [a-zA-Z]. If you also want to get the numbers, just add the range you want. Do you want to get spaces and even the period at the end? Just add them: [a-zA-Z0-9. ]. Inside square brackets the period is just a literal period.
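To see all of this side by side, here is a small sketch that runs each of the patterns discussed above against the same sentence. The variable names (text, digits, letters, everything) are just ones I picked for illustration; the only library call used is re.findall, and the comments show what I get when I run it.

```python
import re

text = 'This is just a bit of text with numbers 13579.'

# re.findall returns a list holding every non-overlapping match.
digits = re.findall('[0-9]', text)
print(digits)            # ['1', '3', '5', '7', '9']

# Joining on an empty string turns that list into a single string.
print(''.join(digits))   # 13579

# A narrower range only matches the digits 2 through 6.
print(re.findall('[2-6]', text))       # ['3', '5']

# Two sets in a row describe a two-character pattern:
# an upper case letter immediately followed by a lower case letter.
print(re.findall('[A-Z][a-z]', text))  # ['Th']

# One set containing both ranges matches any single letter.
letters = ''.join(re.findall('[a-zA-Z]', text))
print(letters)           # Thisisjustabitoftextwithnumbers

# Adding the digits, the period, and a space covers every character
# in this particular sentence, so joining the matches rebuilds it.
everything = ''.join(re.findall('[a-zA-Z0-9. ]', text))
print(everything == text)  # True
```

The last comparison only comes out True because this sentence happens to contain nothing but letters, digits, spaces, and a period. Any other character, such as a comma, would be dropped from the result.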

We have barely scratched the surface of re, but that should be enough to get a curious character started down the path. A journey of a thousand miles...


All content and graphics copyright (c) 2012-2017 Brian Dentler - all rights reserved.