Machine learning and forgery

A new algorithm from University College London researchers forges handwritten text that was never actually written


There’s no doubt that pretty much everything humans do can be sliced, diced, and replicated by algorithms, so it’s not surprising that recent work by Tom S. F. Haines, Oisin Mac Aodha, and Gabriel J. Brostow, researchers at University College London, has toppled yet another bastion of being human: handwriting. How did they do it? Machine learning.

Their paper, "My Text in Your Handwriting," describes software that semi-automatically analyzes a sample of someone’s handwriting, then generates whatever text you want in what looks like the same style as the original sample. UCL’s press release explains:

... the machine learning algorithm is built around glyphs – a specific instance of a character. Authors produce different glyphs to represent the same element of writing – the way one individual writes an “a” will usually be different to the way others write an “a”. Although an individual’s writing has slight variations, every author has a recognisable style that manifests in their glyphs and their spacing. The software learns what is consistent across an individual’s style and reproduces this.
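To make the "learns what is consistent across an individual’s style" idea a bit more concrete, here’s a toy Python sketch of the spacing half of it. It is not the researchers’ method; it simply assumes the gaps between consecutive glyphs in a writer’s sample have already been measured (in pixels, an assumed unit), fits a mean and standard deviation to them, and then draws new gaps that wobble the way the writer’s did. The names SpacingStyle, learn_spacing, and sample_gaps are made up for illustration.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpacingStyle:
    """Hypothetical summary of one writer's horizontal spacing habits."""
    mean_gap: float   # average gap between consecutive glyphs (assumed to be pixels)
    stdev_gap: float  # how much that gap varies from glyph to glyph


def learn_spacing(gaps: list[float]) -> SpacingStyle:
    """Fit a simple Gaussian to measured gaps; statistics.stdev needs at least two values."""
    return SpacingStyle(statistics.mean(gaps), statistics.stdev(gaps))


def sample_gaps(style: SpacingStyle, n: int, seed: Optional[int] = None) -> list[float]:
    """Draw n new gaps that vary the way the writer's did, clamped at zero so glyphs never overlap."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(style.mean_gap, style.stdev_gap)) for _ in range(n)]


# Usage: learn from four measured gaps, then lay out ten new ones.
style = learn_spacing([4.0, 5.0, 6.0, 5.0])
print(style.mean_gap)  # 5.0
print(sample_gaps(style, 10, seed=42))
```

A single Gaussian is the simplest possible model of "consistent but not identical"; a real system would presumably condition spacing on which characters are next to each other.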

The semi-automatic part of the process is an important limitation because, according to the paper:

The automatic segmentation and labeling described saves the user substantial effort. However, mistakes in the output are immediately obvious, such as a wrong letter, a badly rendered line, and unusual ligatures. … For this reason, convincing handwriting requires a human user to perform the one time task of interactively correcting mistakes, as the presented algorithmic approach is not flawless.

Image: University College London

"System diagram showing our processing pipeline, with representative images for each stage. After samples are collected and analyzed, the rendering system selects a glyph to represent each character, e.g. “e” as shown here. If there are many choices, it must choose one that fits the surrounding text. The glyphs are then positioned on the page, and ligatures added if the author uses joined up writing. Two example words are given for these three stages, “quietly” and “queuing.” Finally, the texture is transferred from the original input to the vector output and, if being printed, color correction is applied."
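The "it must choose one that fits the surrounding text" step is the most algorithmic part of that pipeline. Here’s one way it could be done, offered as a hedged sketch rather than the paper’s actual method: assume each stored glyph records the height at which its stroke enters and leaves (hypothetical entry_y and exit_y fields), score each neighboring pair by how badly those heights line up plus a penalty for repeating the exact same glyph back to back, and use Viterbi-style dynamic programming to pick the cheapest sequence. The cost function and the REPEAT_PENALTY value are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    char: str
    glyph_id: str   # hypothetical identifier for one sampled instance of the character
    entry_y: float  # height (0..1) where the stroke enters the glyph
    exit_y: float   # height (0..1) where the stroke leaves the glyph


REPEAT_PENALTY = 1.0  # assumed cost for using the identical glyph twice in a row


def join_cost(a: Glyph, b: Glyph) -> float:
    """Cost of placing b right after a: stroke-height mismatch plus a repeat penalty."""
    cost = abs(a.exit_y - b.entry_y)
    if a.glyph_id == b.glyph_id:
        cost += REPEAT_PENALTY
    return cost


def select_glyphs(text: str, library: dict[str, list[Glyph]]) -> list[Glyph]:
    """Pick one glyph per character minimizing total join cost (dynamic programming).

    Assumes every character in text has at least one glyph in library.
    """
    if not text:
        return []
    # best[i][k] = (cheapest cost of a sequence ending in glyph k at position i, predecessor index)
    best: list[list[tuple[float, int]]] = [[(0.0, -1) for _ in library[text[0]]]]
    for i in range(1, len(text)):
        prev_options = library[text[i - 1]]
        row = []
        for g in library[text[i]]:
            row.append(min(
                (best[i - 1][j][0] + join_cost(p, g), j) for j, p in enumerate(prev_options)
            ))
        best.append(row)
    # Trace back from the cheapest glyph at the final position.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(text) - 1, -1, -1):
        path.append(library[text[i]][k])
        k = best[i][k][1]
    return path[::-1]


# Usage: two geometrically identical "e" samples; the repeat penalty makes them alternate.
library = {"e": [Glyph("e", "e1", 0.5, 0.5), Glyph("e", "e2", 0.5, 0.5)]}
print([g.glyph_id for g in select_glyphs("ee", library)])  # ['e2', 'e1']
```

Dynamic programming fits here because each choice only interacts with its immediate neighbors, so the globally cheapest sequence can be found without trying every combination.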

You can read the paper online, and the source code is supposed to be coming soon. This will, I suspect, result in a tsunami of forged documents. For example, here’s a sample of Sir Arthur Conan Doyle’s actual handwriting:

Sir Arthur Conan Doyle's real handwriting (image: University College London)

And here’s something he never wrote:

Sir Arthur Conan Doyle's fake handwriting (image: University College London)

It's true: Sir Arthur never wrote those words.

This is all very cool, but it will (not "might") become a nightmare for the antiques business and booksellers; for example, just imagine how inscriptions could be added to old books to “enhance” provenance. Of course, the other side of this forgery technology is a deep, algorithmic understanding of handwriting that can be used to detect forgeries …

Given how convincing the computer-generated handwriting is, some may believe the method could help in forging documents, but the team explained it works both ways and could actually help in detecting forgeries.

So, here’s the interesting thing about that assertion: Let’s say you have two systems, one generating fakes and another detecting them. When a fake fails, you present a number of alternative fakes of the same text to the detector and retain the ones that pass (this is a great opportunity to use a genetic algorithm). At some point the faker will get so good that the detector will always be fooled. You, the forger, would then take the now highly refined faker into the real world, and hilarity would ensue.
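That faker-versus-detector loop is easy to sketch. In the toy Python version below, a "fake" is just a short list of numbers standing in for style features, and the detector is a deliberately simple stand-in that flags anything too far from the genuine writer’s profile; neither corresponds to the UCL system or any real forensic tool. Each generation keeps the fakes the detector scored as most genuine, breeds new candidates from them by crossover and mutation (the genetic-algorithm part), and stops as soon as one slips past the detector. evolve_fakes and all of its parameters are hypothetical.

```python
from __future__ import annotations

import random
from typing import List, Optional, Tuple

Features = List[float]  # hypothetical style features extracted from one rendered fake


def detector_score(fake: Features, genuine: Features) -> float:
    """Toy detector: squared distance from the genuine profile (lower looks more genuine)."""
    return sum((f - g) ** 2 for f, g in zip(fake, genuine))


def passes(fake: Features, genuine: Features, threshold: float) -> bool:
    """A fake fools the toy detector if its score is under the threshold."""
    return detector_score(fake, genuine) < threshold


def mutate(fake: Features, rng: random.Random, scale: float = 0.1) -> Features:
    """Nudge every feature by a small Gaussian amount."""
    return [f + rng.gauss(0.0, scale) for f in fake]


def crossover(a: Features, b: Features, rng: random.Random) -> Features:
    """Uniform crossover: take each feature from one parent or the other at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve_fakes(
    genuine: Features,
    threshold: float = 0.05,
    pop_size: int = 50,
    generations: int = 500,
    seed: int = 0,
) -> Tuple[Optional[Features], int]:
    """Evolve fakes against a fixed detector; return (passing fake, generation) or (None, generations)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in genuine] for _ in range(pop_size)]
    for gen in range(generations):
        # Rank candidates by how genuine the detector thinks they look.
        population.sort(key=lambda f: detector_score(f, genuine))
        if passes(population[0], genuine, threshold):
            return population[0], gen
        # Keep the top fifth unchanged (elitism) and breed the rest from them.
        survivors = population[: max(2, pop_size // 5)]
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return None, generations


# Usage: three made-up feature values for the "real" writer.
fake, generation = evolve_fakes([0.3, -0.2, 0.7])
print(fake is not None, generation)
```

Because this detector never learns from the fakes it catches, the forger only has to get lucky once, which is exactly the asymmetry the scenario above relies on.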

Of course, there’s more to document forgery than just handwriting; there are the chemical and physical attributes of whatever you’re writing on and of the ink or other medium used for the writing. Truly high-quality forgery is complex and expensive, but when you consider the value of, for example, celebrity signatures, you can see why the effort would pay off. An article in the UK’s Telegraph looked at the top ten most valuable signatures: at number 10 was Marilyn Monroe, whose John Hancock sold for £6,950, while James Dean’s, at number one, “is said to be worth £18,000 on the open market.”

If you’d like to see more generated samples and test whether you can spot the fakes, check out page 26 of the paper with the additional notes, where the authors present originals and fakes side by side so you can try to figure out which is which. (I spotted the fakes in only 3 of the 10 examples, while the paper claims its test panel could detect the fakes 60% of the time.)

Comments? Thoughts? Send me your handwritten thoughts or comment below, then follow me on Twitter and Facebook.


Copyright © 2016 IDG Communications, Inc.
