Network World

Brian Egler

The International Dirty Word Database

By Brian Egler on Tue, 05/27/08 - 11:56am.

I recently made a reservation on American Airlines and noticed that my record locator was a six-character alphabetic code. This automatically generated code reminded me of a funny story regarding database design from early in my career. I was working as a consultant for a large multi-national automobile company in England, which shall remain nameless to protect the innocent. We were busy developing a purchasing system that would be used by the company's buyers throughout Europe. It was quite sophisticated for the time (around 1985) and included automatic bid generation and recording using an IMS Database on the IBM MVS platform and supported five native European languages - English, Spanish, Italian, German and Flemish. It was the first project I was on that spent some serious time and effort designing the database before the application. A great foundation.

The multi-lingual aspects of the application made it most interesting, forcing us into a data-driven approach where all form prompts and messages had to be stored on the database in the five languages. When a buyer logged on, they could choose the language to be presented on the application forms, which would be built dynamically at runtime. The original database design stipulated that the Purchase Order number would be a six-character alphabetic code that would be automatically generated by the system, starting with AAAAAA, then AAAAAB, and so on. This Purchase Order number would be sent out to suppliers of the large multi-national company and would be used for tracking with the external companies. All was well and good. The project was running on time, on budget.
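For readers who want to see the numbering scheme concretely, here is a minimal sketch in Python (certainly not what ran on IMS in 1985). It assumes the system kept a simple running counter and rendered it as a fixed-width base-26 string; the function name `po_number` and the counter itself are illustrative assumptions, not details from the original system.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A-Z


def po_number(n: int, width: int = 6) -> str:
    """Render the n-th code (0-based) as a fixed-width base-26 string.

    Hypothetical helper: the original system's actual mechanism is not known.
    """
    if not 0 <= n < len(ALPHABET) ** width:
        raise ValueError("counter out of range for this width")
    chars = []
    for _ in range(width):
        # Peel off the least significant "digit" first.
        n, remainder = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


print(po_number(0))   # AAAAAA
print(po_number(1))   # AAAAAB
print(po_number(26))  # AAAABA
```

Six letters give 26^6 = 308,915,776 possible codes, and every six-letter word in every language is somewhere in that sequence.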

However, one bright spark (maybe it was me) raised the issue of potentially naughty words appearing as purchase orders that would be generated over time. Using a very British example, let's say a purchase order of 'BLOODY' might appear eventually. (I can think of other more fruity examples but I will spare you...). Can you imagine a reputable multi-national company issuing such a purchase order to an external company? Well, we mentioned this to the Project Manager who decided that this would be unacceptable (as well as embarrassing ...). So we were asked to avoid some obvious embarrassing combinations which we could hard-code in the application. However, the application itself was entirely data-driven anyway so the decision was made to create a data store that would contain the offending words. So we would create a "Dirty Word Database."

This would work quite well, because new dirty words might come into vogue and we could quickly eliminate the possibility of them being used as purchase orders. We would have to build a user interface of course so we could record the latest dirty words in the Dirty Word Database. Hang on, this was a multi-lingual system, so we couldn't just concentrate on English dirty words; we also had to exclude rude words in Spanish, Italian, German and Flemish. But the development team was hardly multi-lingual so we had to rely on our colleagues in the other European countries to help us populate this database. We could plan a multi-lingual dirty word conference with representatives from all five countries. Maybe we would meet once a year to record the latest vulgarities to be excluded. Can you imagine being at such a meeting? Translating naughty words into five different languages. Now that sounds like fun. (It reminds me of youth hostelling across Europe when I was younger.) The end deliverable would be an "International Dirty Word Database." Maybe we could resell this "intellectual capital" to other companies? As you can probably guess, after a few minutes of literally crying with laughter, our Project Manager made the unilateral decision that the Purchase Order number would now be a combination of two alphabetic characters, followed by two numbers and another two alphabetics.

Spoil sport.
I wonder what American Airlines does? Keep an eye on those record locators!
Cheers
Brian

Avoiding dirty words the easy way

There is a similar story in the book "Programming as if People Mattered." The programmers argued about coding such a database, and a student intern suggested a simple fix that made the discussion moot: "Use base 31 and leave out the vowels."
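As a rough illustration of the intern's suggestion, the sketch below encodes a counter in base 31 using the digits 0-9 plus the 21 consonants. The choice of Python, the function name `encode_base31`, and the fixed width of six are assumptions for the example; the book's actual implementation is not described here.

```python
import string

VOWELS = set("AEIOU")

# 10 digits + 21 consonants = 31 symbols, none of them a vowel.
ALPHABET = string.digits + "".join(
    c for c in string.ascii_uppercase if c not in VOWELS
)


def encode_base31(n: int, width: int = 6) -> str:
    """Render a non-negative counter as a fixed-width base-31 string.

    Hypothetical helper for illustration only.
    """
    base = len(ALPHABET)
    if not 0 <= n < base ** width:
        raise ValueError("counter out of range for this width")
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, base)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


print(len(ALPHABET))      # 31
print(encode_base31(0))   # 000000
print(encode_base31(30))  # 00000Z
print(encode_base31(31))  # 000010
```

With six positions this yields 31^6 = 887,503,681 codes, more than the 26^6 available from letters alone, and no code can contain a vowel.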

Re: Avoiding dirty words the easy way

Yes, that's a fine solution: A-Z without the 5 vowels and 0-9. But that's not what American Airlines uses - my current reservation has 3 vowels in it (and no numbers). Watch this space...maybe someone from AA.com will respond...
cheers
Brian

Solution still not valid

Don't mean to be a stickler for detail, but two letters, two numbers, and two letters would still let your example dirty word through (bl00dy).
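A small sketch of the point, in Python: the two-letters, two-digits, two-letters format accepts "BL00DY", and mapping look-alike digits back to letters recovers the word. The look-alike table and the one-word blocklist are illustrative assumptions, not anything taken from the original system.

```python
import re

# Two letters, two digits, two letters, as in the revised PO format.
PO_FORMAT = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z]{2}")

# Assumed digit-to-letter look-alikes; real-world lists are longer.
LOOKALIKES = str.maketrans("01345", "OIEAS")

# Hypothetical blocklist holding only the article's example word.
BLOCKLIST = {"BLOODY"}


def is_valid_format(code: str) -> bool:
    """True if the code matches the letter-letter-digit-digit-letter-letter format."""
    return PO_FORMAT.fullmatch(code) is not None


def reads_as_blocked_word(code: str) -> bool:
    """True if the code spells a blocklisted word once digits are read as letters."""
    return code.translate(LOOKALIKES) in BLOCKLIST


print(is_valid_format("BL00DY"))        # True
print(reads_as_blocked_word("BL00DY"))  # True
```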

In the UK there's a very mature trade in number plates that spell out phrases using combinations of letters & numbers in the UK's fixed formats (AAA 000A, then A000 AAA, then AA00 AAA).

It seems one of the other options offered would be more applicable.

Ian.

Re: Solution still not valid

Fair point. I think the Base 31 solution sounds best for this situation.
Thanks for the comment,
Brian

Remove aeiou from a-z0-9 and

Remove aeiou from a-z0-9 and get the base 31 character set, as mentioned also here:
http://www.15seconds.com/howto/pg001041.htm

Though you still get "fyck0ff", "d1rty", "d4mn", and so on.

Going for a structure with mixed characters and numbers seems quite safe (like "aa0aa0"), but for anything typed by humans it is better to remove 0/o and 1/I/l. Upper and lower case might be used though - most people know how to type UPPER and lower case, right? Make sure each order code has at least one upper-case and one lower-case letter, to verify that nobody is typing it all in UPPER or lower case out of carelessness.

http://en.wikipedia.org/wiki/List_of_words_without_vowel_letters

Even if no dirty words without vowels exist today, there might be some in the near future.

About Brian Egler's SQL Server Strategies

Brian D. Egler, MCITP/MCSE/MCT 2009, is currently an instructor with Global Knowledge, teaching various Microsoft training courses. He is a SQL specialist with a focus on SQL Server, Windows, .Net and XML. Egler has been a technical instructor for over 20 years and has more than 10 years experience with SQL Server, data modeling, database design, application development including IMS, DB2, Sybase. Every year he runs the Boston Marathon for cancer research.
