Using Bash's regular expressions

Bash has quietly made scripting on Unix systems a lot easier with its own regular expressions. If you're still leaning on grep and sed commands to get your scripts to do what you need from them, maybe it's time to look into what bash can do on its own.

Since version 3 (circa 2004), bash has a built-in regular expression comparison operator, represented by =~. A lot of scripting tricks that use grep or sed can now be handled by bash expressions and the bash expressions might just give you scripts that are easier to read and maintain. As with other comparison operators (e.g., -lt or ==), bash will return a zero if an expression like $digit =~ "[[0-9]]" shows that the variable on the left matches the expression on the right and a one otherwise. This example test asks whether the value of $digit matches a single digit.

if [[ $digit =~ [0-9] ]]; then
    echo "$digit is a digit"
    echo "oops"

If you're wondering what is meant by "regular expression", a brief explanation is in order. A regular expression is some sequence of characters that represents a pattern. For example, the [0-9] in the example above will match any single digit where [A-Z] would match any capital letter. [A-Z]+ would match any sequence of capital letters. The expression ^[A-Z]+$ would, on the other hand, match a string that contains only capital letters. Got that? ^ = the beginning of a string, $ = the end of a string and + = more of the same.

You can also check whether a reply to a prompt is numeric with similar syntax:

echo -n "Your answer> "
read REPLY
if [[ $REPLY =~ ^[0-9]+$ ]]; then
    echo Numeric
    echo Non-numeric

Bash's regex can be fairly complicated. In the test below, we're asking whether the value of our $email variable looks like an email address. Notice that the first expression (the account name) can contain letters, digits and some special characters. The + to the right of the first ] means that we can have any number of such characters. We then see the @ sign sitting between the username and the email domain -- and a literal dot (\.) between the primary part of the domain name and the "com", "net", "gov", etc. part. The comparison is then enclosed in double brackets.


read -p "Enter email: " email

if [[ "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$ ]]
    echo "This email address looks fine: $email"
    echo "This email address is flawed: $email"

Similarly, you can construct tests that determine whether the value of variables is in the proper format for an IP address:


if [ $# != 1 ]; then
    echo "Usage: $0 address"
    exit 1

if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
    echo "Looks like an IPv4 IP address"
elif [[ $ip =~ ^[A-Fa-f0-9:]+$ ]]; then
    echo "Could be an IPv6 IP address"
    echo "oops"

Bash also provides for some simplified looping. Want to loop 100 times? Just do something like this:

for n in {1..100}
    echo $n

And you can loop through letters or through various ranges of letters or numbers using expressions such as these. You don't have to start with 1 or a and you can move backwards through the list.


Want to see how these ranges work? You can also just try expanding them with the echo command.

$echo {a..z}
a b c d e f g h i j k l m n o p q r s t u v w x y z
$ echo {5..-1}
5 4 3 2 1 0 -1

What a swell shell!

2-Minute Linux Tip: Learn how to use the history command

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.

Copyright © 2013 IDG Communications, Inc.

The 10 most powerful companies in enterprise networking 2022