Generating regexes and Gmail filters

Mark Gibbs is very impressed with a service that generates code for regular expressions, and he has found that Gmail now supports importing and exporting filters. His happiness knows no bounds.

Before I launch into my main thrust (a word that must be pronounced with a rolling "r"), I have to direct you to a work of near genius: txt2re, an online regular expression code generator.

If you aren't au fait with regular expressions (also called regexes or regexps), they are formal descriptions of patterns to search for in sequences of characters (or "strings"). The searches are carried out by a regular expression processor, a program designed to interpret regexes (see the Wikipedia entry on regular expressions).
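To make that concrete, here is a minimal sketch in Python, whose standard `re` module plays the part of the regular expression processor. The pattern, the `find_date` helper and the sample strings are made up for illustration; the pattern describes a date such as 16/Mar/2009, and the processor finds it inside a longer string.

```python
import re

# Illustrative pattern (not from txt2re): two digits, a slash,
# three letters, a slash, four digits, e.g. "16/Mar/2009".
# Each parenthesized group captures one part of the date.
DATE_PATTERN = re.compile(r"(\d{2})/([A-Za-z]{3})/(\d{4})")

def find_date(text: str):
    """Return (day, month, year) for the first date found in text, or None."""
    match = DATE_PATTERN.search(text)
    return match.groups() if match else None

print(find_date("Logged at 16/Mar/2009:13:14:15"))  # ('16', 'Mar', '2009')
print(find_date("no date here"))                    # None
```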

Regexes are useful for jobs such as mining server logs and searching data files, and txt2re makes it incredibly easy to generate code that performs these searches in Perl, PHP, Python, Java, JavaScript, ColdFusion, C, C++, Ruby, VB, VBScript, J#.net, C#.net, C++.net or VB.net.

To use txt2re, you give the service an example string; it shows you the substrings it recognizes and lets you select which ones you want to include in the output.

I did, however, say "near genius," because txt2re seems to have a bug: it doesn't always correctly identify all of the "findable" substrings. I was using txt2re to generate JavaScript code based on the following example entry from an Apache server access log:

192.168.10.11 - bob [16/Mar/2009:13:14:15 -0800] "GET /gibbs.gif HTTP/1.0" 200 5648

Txt2re failed to offer to treat the digits at the end of the string, which give the data length, as an integer: it only offered them as four individual digits, which would be useless if the data length were five digits long.
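What the generated code needs is a pattern that treats the final field as one integer of any length. Here is a hand-written Python sketch of that idea (it is not txt2re output; the `LOG_LINE` pattern and `parse_log_line` helper are hypothetical names). It assumes the Apache "common" log layout shown above: host, ident, user, bracketed timestamp, quoted request, status code and byte count.

```python
import re

# Hypothetical pattern for one Apache common-log-format line.
# (?P<size>\d+|-) captures the trailing byte count as a whole number
# of any length, rather than as a fixed run of four single digits.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_log_line(line: str) -> dict:
    """Parse one access-log line into a dict; raise ValueError if it doesn't match."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" when no body was sent; treat that as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = '192.168.10.11 - bob [16/Mar/2009:13:14:15 -0800] "GET /gibbs.gif HTTP/1.0" 200 5648'
print(parse_log_line(sample)["size"])  # 5648
```

Because the size group is `\d+`, a five-digit length such as 56480 parses just as well, and the IP address format has no effect on how the last field is read.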

The solution was, oddly enough, to change the IP address in the example string to 1.1.1.1 and voilà! I got the code I needed. Despite this bug, the concept is way cool, and a little creative tweaking of either your example or the generated code will get you the code for exactly the regex search you need. (I also used txt2re to create a telephone number parser in JavaScript.)

So now, on to my main thrust: I love Gmail. Yes, I know there are potential problems. As it's a free service, there's no guarantee that your e-mail won't vanish one day, leaving you with no recourse, and occasionally the service is unavailable. But I've had those same problems with services I've paid for.

Quite some time ago I decided to drop my e-mail service provider, Everyone.net. I'd been using Everyone because the mail service that came with my Web service (hosted by EasyCGI) had no spam filtering at the time.

I dropped Everyone because the company didn't provide support unless I was willing to spend more money, and I also realized I could use Gmail, which has very good spam filtering, for free. My kind of deal. Not only would this cure my spam problem for free, but I'd also get excellent searching of my e-mail and could access it from anywhere, including from Outlook via IMAP.

Anyway, I've now been happily using Gmail for more than a year, and I have set up lots of filters to categorize the e-mail I receive. And when I say "lots," I'm not kidding: I have hundreds. My problem is that I recently realized many of my filters should be combined, and doing this through Gmail's filter page would be, to say the least, painful.

It turns out that on the "Labs" tab under Gmail's settings there's a new option that enables exporting and importing filters in an XML format (when it's enabled, the import and export features are at the bottom of the filters page). This is fantastically useful if you're going to move your filters to a new Gmail account. Now I need to figure out how to efficiently edit the exported filters, which will be the topic of a future column; suggestions are welcome.
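As one possible starting point, here is a Python sketch that finds filters that could be combined. It assumes the export is an Atom feed in which each filter is an `<entry>` carrying `<apps:property name="..." value="..."/>` elements (check that against your own exported file). The hypothetical `merge_from_filters` helper groups filters that differ only in their "from" criterion and joins the senders with OR, assuming Gmail's filter From field accepts its search-style OR operator.

```python
import xml.etree.ElementTree as ET
from typing import Union

# Namespaces assumed for the exported filter feed.
ATOM = "http://www.w3.org/2005/Atom"
APPS = "http://schemas.google.com/apps/2006"

def merge_from_filters(xml_data: Union[str, bytes]) -> dict:
    """Group filters that share every property except 'from'.

    Returns a dict mapping a sorted tuple of the remaining (name, value)
    properties (e.g. the label to apply) to one combined 'from' expression.
    """
    root = ET.fromstring(xml_data)
    groups = {}
    for entry in root.findall(f"{{{ATOM}}}entry"):
        props = {p.get("name"): p.get("value")
                 for p in entry.findall(f"{{{APPS}}}property")}
        sender = props.pop("from", None)
        if sender is None:
            continue  # leave filters without a 'from' criterion alone
        key = tuple(sorted(props.items()))
        groups.setdefault(key, []).append(sender)
    return {key: " OR ".join(senders) for key, senders in groups.items()}

SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:apps='http://schemas.google.com/apps/2006'>
  <entry>
    <category term='filter'></category>
    <apps:property name='from' value='alice@example.com'/>
    <apps:property name='label' value='Friends'/>
  </entry>
  <entry>
    <category term='filter'></category>
    <apps:property name='from' value='bob@example.com'/>
    <apps:property name='label' value='Friends'/>
  </entry>
</feed>"""

for props, senders in merge_from_filters(SAMPLE).items():
    print(dict(props), "<-", senders)
```

For a real export, reading the file as bytes (for example with `pathlib.Path("mailFilters.xml").read_bytes()`, the file name being an assumption) lets the parser honor the file's own encoding declaration. The output only reports candidates for merging; writing a new XML file for re-import is left out of this sketch.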
