Center for Informatics Research in Science and Scholarship

Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign


Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques

Convert a query in contemporary English into the English terms used in texts from medieval times to the present.

Principal Investigator: Miles Efron

See Also: Project Web Page, Project Announcement

This project is funded by a Google Digital Humanities Award. The work aims to improve people's ability to find information in large collections of books, such as the collection created by the Google Books project.

In particular, we are focusing on historical language change. Google Books contains millions of books in English, but English is a moving target: fourteenth-century vernacular is very different from its twentieth-century counterpart. As a result, a query issued in modern English will often fail to find related Middle English documents. People researching the history of a proverb such as "many hands make light work," or looking for literary allusions to the Shield of Achilles (a common example of ekphrasis, a poetic trope), can find historically diverse passages only by issuing queries in many forms and styles.

To improve on this situation, we are applying cross-language information retrieval models to the problem of retrieving passages from historically diverse corpora. The primary goal of this project is to propose statistical models (and build software that implements them) that allow a single query to retrieve relevant information from documents spanning a wide variety of English historical periods.
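
As a rough illustration of the general idea, and not a description of the project's actual models, the sketch below treats historical spellings as "translations" of modern query terms, in the spirit of translation-based cross-language retrieval. The translation table, its probabilities, the function names, and the min_prob threshold are all hypothetical, chosen only to make the example concrete.

```python
# Hypothetical translation table: P(historical variant | modern term).
# The variants and probabilities are illustrative, not estimated from data.
TRANSLATION_TABLE = {
    "many": {"many": 0.7, "manye": 0.3},
    "hands": {"hands": 0.7, "handes": 0.3},
    "light": {"light": 0.7, "lyght": 0.3},
    "work": {"work": 0.7, "werk": 0.3},
}


def expand_query(query, table, min_prob=0.15):
    """Map each modern query term to its weighted historical variants."""
    expanded = {}
    for term in query.lower().split():
        # Terms missing from the table simply translate to themselves.
        variants = table.get(term, {term: 1.0})
        expanded[term] = {v: p for v, p in variants.items() if p >= min_prob}
    return expanded


def score_passage(passage, expanded):
    """Sum, over query terms, the weight of the best variant in the passage."""
    tokens = set(passage.lower().split())
    score = 0.0
    for variants in expanded.values():
        score += max((p for v, p in variants.items() if v in tokens), default=0.0)
    return score


expanded = expand_query("many hands make light work", TRANSLATION_TABLE)
for passage in ("manye handes maken lyght werk", "a stitch in time saves nine"):
    print(passage, "->", round(score_passage(passage, expanded), 2))
```

With a table like this, a passage in older spelling still receives a nonzero score for a modern query, whereas exact matching on the modern terms alone would miss it. A real system would need to learn such a table from data and combine it with a full retrieval model.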

CIRSS
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
cirssinfo@cirss.lis.uiuc.edu | (217) 333-1980 | [fax] (217) 244-3302