On Monday, September 21, 2020, the Circle presented an online seminar entitled “Artificial Intelligence, Machine Translation and Future Linguists Like You.” The presenter was Professor Graham Neubig from the Carnegie Mellon Language Technologies Institute.
Professor Neubig’s background is quite diverse in that it includes both computer science expertise and experience in interpretation and translation. Fascinated with computers from an early age, he had the opportunity as an adult to live in Japan as a professor in the Agricultural College at Kyoto University. During his tenure at the university he not only taught but also was called upon to act as interpreter/translator for various visiting delegations from other countries who came to tour the college.
In this fascinating presentation, Professor Neubig gave us an overview of machine translation, its technological underpinnings and its strengths and weaknesses. His first example was of a fairly successful machine translation from Japanese into English in which the machine was able to correctly identify the two personal pronouns used in the sentences (he vs. she) based on its ability to correctly associate the proper names in the sentences (Tanaka and Taro) with the appropriate gender.
A second, less successful machine translation involved the Japanese word ko-do, which has three meanings: cord, code and chord. In the English translation, the machine used the wrong word (code instead of chord) in a sentence related to music and the wrong word (chord instead of code) in a sentence that involved computer programming.
The professor then outlined the four most common methods of machine translation, each of which has its strengths and weaknesses. They are:
-Rule-based method: based on linguistic analysis, syntax transformation and word replacement. Limited by the number of linguists available to provide the necessary linguistic inputs in every language.
-Translation memory method: looks up the most similar sentence in the database but cannot generalize to new sentences.
-Phrase-based translation: looks up small chunks of a sentence and combines the chunks into full sentences. May result in sentences that lack fluency and contain mistakes in syntax.
-Neural machine translation: feeds inputs into a probability model that predicts the next word. Better at syntax and fluency but can also make major errors.
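To make the translation memory method concrete, here is a minimal sketch in Python. It is an illustration only, not a system shown in the seminar: the three stored sentence pairs, the `lookup` function and the use of `difflib` for string similarity are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: source sentence -> stored translation.
TRANSLATION_MEMORY = {
    "I play the guitar.": "Je joue de la guitare.",
    "I write code every day.": "J'écris du code tous les jours.",
    "The cord is too short.": "Le cordon est trop court.",
}


def lookup(sentence, memory=TRANSLATION_MEMORY):
    """Return (closest stored sentence, its translation, similarity score)."""

    def similarity(candidate):
        # Character-level similarity between 0.0 and 1.0.
        return SequenceMatcher(None, sentence.lower(), candidate.lower()).ratio()

    best = max(memory, key=similarity)
    return best, memory[best], similarity(best)


# A sentence already in the memory is matched exactly.
exact = lookup("I play the guitar.")

# A new sentence only retrieves its nearest neighbour: the stored translation
# still says "guitare", because nothing here builds a new sentence.
near = lookup("I play the piano.")
print(near)
```

The second call shows the weakness named in the list above: the lookup returns the translation of the closest stored sentence and cannot generalize to a sentence it has never seen.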
It is important to note that all four methods are data-driven. For widely used languages such as French and German, billions of sentences are available to pass through the machine translation process, while less widely used languages have access to significantly less data. The disparity in available data may affect the quality of the resulting translations.
This led to a discussion of Artificial Intelligence, which the professor defined as “the technology that does something that seems intelligent.” He also defined Machine Learning as “technology that learns from data to do something that cannot be done easily otherwise.” With the use of graphs, the professor described the basics of neural machine translation as a sequence of mathematical operations capable of predicting each word in a sentence based on the probability of its use in a particular context.
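The idea of predicting the next word from probabilities can be sketched with a much simpler stand-in. The example below is a count-based bigram model, not a neural network, and it is not taken from the presentation; the three-sentence corpus and the function names are hypothetical. It only shows what "the probability of a word in a particular context" means when the context is the single preceding word.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system would learn from far more data.
CORPUS = [
    "she plays a chord on the piano",
    "he plays a chord on the guitar",
    "she writes code on the computer",
]


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the counts for one context word into probabilities."""
    total = sum(counts[prev].values())
    return {word: count / total for word, count in counts[prev].items()}


counts = train_bigrams(CORPUS)

# After "plays", the only continuation seen in the corpus is "a".
after_plays = next_word_probs(counts, "plays")

# After "the", three nouns are equally likely, so one word of context
# is not enough to choose between them.
after_the = next_word_probs(counts, "the")
print(after_plays, after_the)
```

A neural model replaces the count table with learned mathematical operations and conditions on the whole source sentence and the words produced so far, which is what lets it separate cases that a one-word context cannot.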
In conclusion, Professor Neubig summarized what machine translation can and cannot do at the present time. Machines are good at capturing associations between words but less good in cases where more than one step of reasoning is required. He believes that machine translation is good and will get better. However, it is not perfect, particularly in situations where little data exists, where non-literal translation is involved and where there are cultural implications that ideally would influence the choice of a particular word in a given sentence. In other words, machines are good at memory and speed but not as good as humans at reasoning. In the future, high-quality translation may result from a combination of human and machine translation.
The Gotham would like to thank Professor Neubig for this thought-provoking presentation. Thanks go also to Serene Su, Program Manager, for organizing this fine event.
By: Patricia Stumpp