Curing Web insanity
There is little doubt that the Web has gone insane. Where once you could meander happily around the 'Net loading pages, you now find pages opening over the page you wanted (pop-ups), under the page you wanted (pop-unders) or opening when you leave the page (more pop-ups). There are also pages that force a refresh, pages with nosy JavaScript, pages with acres of blinking text and countless other pages that just generally tick you off.
So you have two choices - fix the problem or stop using the Web.
The latter being impractical, we'll go for fixing the problem. The solution is a utility - and a free utility at that! - called Proxomitron, the creation of Scott Lemmon. You can find this fantastic tool (do you think we're a little excited?) at the wonderfully named spywaresucks.org.
Proxomitron is a simple idea: It's a proxy server that can parse Web pages and match patterns in the text of the retrieved HTML code to look for code that will do something you don't like.
Here's how it works: When a Web browser requests a URL from the proxy (which runs on your PC or any machine you please), the proxy retrieves the URL contents and attempts to match the text in the contents with rules defined in Proxomitron.
When a pattern match is found (say for a pop-under ad) Proxomitron changes the code into a comment that doesn't get displayed by the browser. Optionally, new code can be added based on the original code.
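To make the idea concrete, here is a minimal Python sketch of that filtering step. It is not Proxomitron's code or one of its rules: the function name neutralize_popups, the SCRIPT_BLOCK pattern and the "window.open(" test are all our own assumptions, standing in for whatever a real pop-up rule would look for.

```python
import re

# Hypothetical stand-in for a pop-up rule: any <script> block whose body
# calls window.open(). Not taken from Proxomitron's own rule set.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>",
                          re.IGNORECASE | re.DOTALL)


def neutralize_popups(html: str) -> str:
    """Wrap offending <script> blocks in an HTML comment, leave the rest alone."""

    def replace(match: re.Match) -> str:
        if "window.open(" not in match.group(1):
            return match.group(0)  # harmless script, pass it through untouched
        # "--" may not appear inside an HTML comment, so break it up first.
        hidden = match.group(0).replace("--", "- -")
        return "<!-- filtered: " + hidden + " -->"

    return SCRIPT_BLOCK.sub(replace, html)


page = '<p>Hi</p><script>window.open("ad.html")</script><script>var x = 1;</script>'
print(neutralize_popups(page))
```

The point of turning the code into a comment rather than deleting it is that you can still view the page source and see exactly what was filtered.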
Before we get into how the tool is configured and how it works with your browser, we should first cover how it matches text patterns.
The tool has its own text-matching language that is a lot like regular expressions (see this column) but with some additional wrinkles. The rules are in several parts, the most important of which are the matching expression and the replacement text. For example, if the matching expression is:
\1 <body> \2 </body> \3
And the replacement text is:
\1 <body><b>All gone!</b> </body> \3
Then the page contents between the body tags would be replaced with "All gone!" in bold text. The "\1", "\2" and so on are variables: Each one stores the text that runs from the start of the input, or from the end of the previous matched string, up to the next matched string.
Thus, if in our last example the requested Web page read:
<html>
<head><title>My page</title></head>
<body>Howdy!</body>
</html>
The output will be:
<html>
<head><title>My page</title></head>
<body><b>All gone!</b></body>
</html>
The \1 variable held the text "<html><head><title>My page</title></head>", the \2 held "Howdy!" (the text between the body tags), and so on. Actually, this is a very primitive rule because the <body> tag could contain an attribute such as <body background="mybg.gif">, which would cause the rule to fail. We can solve that by doing this:
\1 <body (*|)> \2 </body> \3
Here the string "(*|)" in the matching expression means that either any sequence of characters (that's the "*") or no characters at all (the "|" character means "or") can precede the closing ">". You can't use "*" by itself to match any character because the rule will fail - obviously not what we want.
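If Proxomitron's syntax is new to you, it may help to see roughly the same rule written as an ordinary regular expression. The Python sketch below is only an approximation under our own assumptions: BODY_RULE and blank_body are names we made up, "[^>]*" stands in for "(*|)", and Python's numbered groups play the part of \1, \2 and \3. Proxomitron's matcher does not necessarily behave identically.

```python
import re

# Rough regex analogue of:  \1 <body (*|)> \2 </body> \3
# group 1 = text before the body tag, group 2 = body contents,
# group 3 = text after the closing tag. "[^>]*" allows optional attributes.
BODY_RULE = re.compile(r"(.*?)<body\b[^>]*>(.*?)</body>(.*)",
                       re.IGNORECASE | re.DOTALL)


def blank_body(html: str) -> str:
    """Replace whatever sits between the body tags with a bold "All gone!"."""
    return BODY_RULE.sub(r"\1<body><b>All gone!</b></body>\3", html, count=1)


page = ("<html>\n"
        "<head><title>My page</title></head>\n"
        '<body background="mybg.gif">Howdy!</body>\n'
        "</html>")
print(blank_body(page))
```

Note that the replacement discards group 2 (the old body contents) and, like the column's replacement text, also drops any attributes on the original <body> tag.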
So consider a page that contains the dreaded blinking HTML text (to be distinguished from animated GIFs and DHTML tricks that do the same thing). Under Proxomitron, the following rule will find both the opening and closing blink tags (note that a rule will be applied repeatedly to the incoming text):
<\1blink>
Proxomitron's Replacement Text would be:
<\1b>
Thus, "<blink>Isn't this annoying?</blink>" would become "<b>Isn't this annoying?</b>".
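For comparison, here is the same substitution as a short Python sketch. It is an analogue rather than Proxomitron's rule: the names BLINK_TAG and deblink are ours, and the pattern is deliberately narrower than "<\1blink>" because the captured group may only be an optional "/".

```python
import re

# Group 1 captures the optional "/" so one pattern covers both
# the opening <blink> and the closing </blink> tag.
BLINK_TAG = re.compile(r"<(/?)blink>", re.IGNORECASE)


def deblink(html: str) -> str:
    """Turn every blink tag into the matching bold tag."""
    return BLINK_TAG.sub(r"<\1b>", html)


print(deblink("<blink>Isn't this annoying?</blink>"))
```

As with the Proxomitron rule, the substitution is applied to every match in the text, so a page with a dozen blink tags gets them all converted in one pass.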
Next week, we'll delve further into the depths of Proxomitron. Match your text at gearhead@gibbs.com.

