Network World

EU helps machine translation with one million sentences

By Peter Sayer, IDG News Service, 01/21/2008

The European Commission is offering translation software developers free access to around one million sentences translated between 22 of the European Union's 23 official languages. It hopes the data will help improve the quality of a variety of language tools, including grammar and spelling checkers, online dictionaries and machine translators -- particularly in less well-served languages such as Latvian or Romanian.

The sentences are mostly drawn from the "Acquis Communautaire," the body of law that must be implemented by all new E.U. member states, and include the treaties, directives and regulations adopted by the E.U., and rulings from the European Court of Justice.

Translated by professional translators, they cover topics such as IT, telecommunications, labor law, agriculture and fishing.

The translations form part of the "translation memory" used by the Commission's permanent staff of 1,750 translators. They are matched up, sentence by sentence, in each of the 22 languages, and tagged with subject classifications.

The matching and tagging make the sentences especially useful for developers of statistical machine translation software, who must amass a corpus of thousands of matched sentences in the languages between which they wish to translate, so that they can calculate the most likely translation for any given expression. Since the matching of sentences has already been done, they will save time -- and the immense size of the Acquis Communautaire will help them make their calculations more accurate.
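To give a rough sense of how a "most likely translation" can be calculated from matched sentences, here is a minimal Python sketch of one classic textbook technique, IBM Model 1. The article does not say which method any particular developer uses, so the choice of algorithm, the function names and the three toy English-German sentence pairs are all assumptions made for illustration.

```python
def train_word_translation(pairs, iterations=10):
    """Estimate t(target_word | source_word) from matched sentence pairs.

    Illustrative IBM Model 1 expectation-maximization loop; not taken from
    the article or from any specific translation product.
    """
    tokenized = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in tokenized for w in tgt}
    # Start with a uniform probability for every word pair seen together.
    t = {(tw, sw): 1.0 / len(tgt_vocab)
         for src, tgt in tokenized for tw in tgt for sw in src}

    for _ in range(iterations):
        count = {}   # expected co-occurrence counts
        total = {}   # expected counts per source word
        for src, tgt in tokenized:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] = count.get((tw, sw), 0.0) + frac
                    total[sw] = total.get(sw, 0.0) + frac
        # Re-estimate probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, src_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus; a real one would hold thousands of sentence pairs.
toy_pairs = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]
model = train_word_translation(toy_pairs)
print(best_translation(model, "book"))
```

With only three sentence pairs the estimates are crude. More matched sentences give the counts more evidence to work from, which is why a corpus the size of the Acquis Communautaire is attractive.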

Until now, developers have typically resorted to scouring the Web for texts translated into several languages, and using other software tools to make a guess at where sentences start and end in order to match them up.
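The guesswork involved can be sketched in a few lines of Python. This is a deliberately naive illustration, not a description of any real alignment tool: the punctuation rule, the length-ratio threshold and the function names are all assumptions.

```python
import re


def split_sentences(text):
    """Guess sentence boundaries: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def naive_align(src_text, tgt_text, max_ratio=2.0):
    """Pair sentences by position, keeping pairs of roughly similar length.

    Returns an empty list when the two texts split into different numbers
    of sentences, since positional matching is then unreliable.
    """
    src = split_sentences(src_text)
    tgt = split_sentences(tgt_text)
    if len(src) != len(tgt):
        return []
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / max(1, min(len(s), len(t)))
        if ratio <= max_ratio:
            pairs.append((s, t))
    return pairs


# Hypothetical example texts.
print(naive_align("The law applies. It is binding.",
                  "Das Gesetz gilt. Es ist bindend."))
# The punctuation rule is only a guess: an abbreviation breaks it.
print(split_sentences("Dr. Smith spoke."))
```

The second call splits "Dr." off as its own sentence, the kind of error that a corpus already matched by professional translators avoids.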

While the release of the data will benefit software developers, the Commission is not being entirely altruistic: it hopes that the availability of better, cheaper automated translation software will help speakers of the E.U.'s minority languages by giving them access to online information currently available only in the more widely spoken languages.

Interested developers can download the texts from the Web site of the Commission's Directorate General of Translation. They will also need the text extraction program and its library.
