Computer Reads Old Handwritten Texts

10.11.2015

Handwriting is as unique as people. Nevertheless, computers are now able to decode historical handwritten texts automatically. Within the framework of the EU projects tranScriptorium and READ, researchers are going to make this technology accessible to scholars, archives and the general public. At the same time, this work will further improve the software and the algorithms it uses.

*Computer Reads Old Handwritten Texts (Photo: Screenshot Transkribus)*

Having problems reading your grandfather’s letter written in Kurrent (an old form of German handwriting)? You may soon be able to get digital help. For many years, researchers worldwide have been working on decoding historical documents automatically with computers. “Basic research on handwritten texts has made a lot of progress. Now it is time to make the research results useful to the general public,” says Günter Mühlberger, head of the Digitization and Electronic Archiving group at the University of Innsbruck. With his team, he is leading the development of a service platform aimed specifically at archives and historians. “With algorithms developed by the Technical University of Valencia, Spain, and the National Research Centre in Athens, Greece, we are able to decode 70 to 80 percent of a document automatically.” However, the computer programs have struggled particularly with the complex layout of historical documents, the diversity of handwriting and the different languages, which have themselves changed greatly over time. “First, the computer has to recognize where the text is located and then identify each single line, a technical challenge that should not be underestimated,” says Mühlberger.

Counting on crowd support

Günter Mühlberger’s team wants to make this know-how accessible to humanities scholars and the general public alike and, in doing so, to improve the technologies collaboratively. The team is supported by the European Union, which is funding the continuation of the work started in the tranScriptorium project. The new project, called READ (Recognition and Enrichment of Archival Documents), will build on the work carried out over the last three years and make its results available to several user groups. “Together with research groups from Germany, Finland, France, Greece, Great Britain and Spain, we are going to develop a service platform that can be used by everyone working with historical handwritten texts,” says Mühlberger. “Several archives that provide documents are also among our project partners.” The algorithms have to be trained continuously to improve handwritten text recognition. “That is why we want to invite not only researchers and scholars from the humanities to use the new infrastructure but also members of the general public as volunteers. The more people work with our software for handwritten text recognition, the better the algorithms will become.”

The software, together with the participation of many different users, should eventually enable you to decode your grandfather’s Kurrent letter quickly. Many historical documents, such as land and parish registers, letters, lists of names (immigrants, passengers, etc.) and minutes, will become machine-readable within the next few years. The researchers are also planning a smartphone app that will let users scan a handwritten text directly. To motivate volunteers to participate, handwritten texts of famous people will be collected and made automatically recognizable. “You can then search these digitized handwritten texts on the computer. This will save you from laboriously transcribing texts and will give you direct access to the documents,” says Mühlberger. “Automated text recognition also allows you to search for other handwritten texts by a particular person, which has not been possible so far.”

Establishing a European research infrastructure

Unlike documents held in libraries, those kept in archives are generally unpublished and unique, i.e. only one copy exists. This shows the scope of the project. These documents primarily reflect the daily lives of individuals, for example in the form of a short note in a birth or death register, an entry in a land register, a file from court proceedings or a note in a police report. The goal of the EU-funded project READ, coordinated by the University of Innsbruck, is to give the public and researchers access to these historical treasures. The project is funded by the Horizon 2020 framework program with a total of 8.2 million euros. The project partners are European universities, research institutions and archives. The project starts at the beginning of 2016, runs for three and a half years, and aims to establish a research infrastructure for the European academic community. Because automatic handwritten text recognition on this scale requires enormous computing power, the researchers in Innsbruck will collaborate closely with the Research Centre High Performance Computing at the University of Innsbruck and the Vienna Scientific Cluster. “The existing high performance computing structures and resources at the University of Innsbruck were certainly among the reasons why the EU approved this project,” Mühlberger emphasizes.

Experts and laypeople can register, try out and download an experimental version of the software at transkribus.eu.