Laboratory Life

Dr. Bartholomäus Wloka, a guest from the Centre for Translation Studies, University of Vienna, gave a talk on "Harvesting Parallel Corpora for Machine Translation". The talk was given in English. Machine translation (MT) was used to translate the talk and the Q&A session between English and Japanese.


In this talk, Dr. Wloka gave an overview of machine translation (MT) and introduced projects at the University of Vienna, including CEF.AT and the harvesting of parallel corpora. CEF.AT is an automated translation system trained on English-German language resources, domain-specific data, parallel corpora, and terminology data. Its translation results are fluent and handle morphology and word order well.

Since most MT approaches rely on parallel corpora, the more closely a corpus adheres to a certain domain, the better the result. Quality and coverage are equally important. The Vienna team fully automated a process to harvest parallel corpora from Wikipedia, focusing on English and Japanese. Collecting parallel corpora is a challenging task. Since we are faced with an increasing volume of meaningless data, intelligent and selective automatic methods of collecting data are very important. Wikipedia is one example of a content-rich source, but other sources also need to be explored.