Database and Digital Sourcebook

Transkribus models

Update, 15th December 2019: Noscemus GM v1 published

The first Transkribus model of NOSCEMUS, trained by Stefan Zathammer, was published on 15th December 2019. It is available to every registered Transkribus user under the name Noscemus GM v1. The model reads texts set in Antiqua-based typefaces from the 16th, 17th and 18th centuries with a high level of accuracy and consistently outperforms most standard OCR engines. Although it is tailored to transcribing (Neo-)Latin texts, Noscemus GM v1 also yields convincing results for other languages such as French, Italian and English. The Noscemus model can therefore help not only Neo-Latinists but all kinds of researchers dealing with larger text corpora from the Early Modern Period.

The model is based on training data from the project’s Digital Sourcebook, which at present (December 2019) comprises around 1,000 fully corrected pages. To give users maximum freedom, standardization in the transcription process has been kept to a minimum. Normalization has been applied only in the following cases: ligatures (e.g. æ, œ, ct, ff) and abbreviations (e.g. -que, -us, -tur, …mm…, …nn…) are expanded, long s (ſ) is transcribed as a normal s, and small caps are transcribed as majuscules.
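For readers who want to compare the model's output with a transcription made under other conventions, the normalization rules above can be approximated in a few lines of Python. This is only a minimal sketch and not the project's own tooling: the names `EXPANSIONS` and `normalize` are hypothetical, the mapping covers only lowercase ligatures that have their own Unicode code points, and abbreviation signs, the ct ligature and small caps are left out because they cannot be resolved by a simple character mapping.

```python
# Hypothetical character mapping that mirrors part of the stated conventions.
# Assumption: only ligatures with a dedicated Unicode code point are handled.
EXPANSIONS = {
    "æ": "ae",       # ligature ae
    "œ": "oe",       # ligature oe
    "\ufb00": "ff",  # typographic ligature ff
    "\ufb01": "fi",  # typographic ligature fi
    "\ufb02": "fl",  # typographic ligature fl
    "ſ": "s",        # long s becomes a normal s
}

# str.maketrans accepts a dict of single characters mapped to replacement strings.
_TABLE = str.maketrans(EXPANSIONS)


def normalize(text: str) -> str:
    """Expand the mapped ligatures and replace long s in the given text."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(normalize("quæ ſunt pœnæ"))
```

Applying `normalize` to a reference transcription before comparing it with the model's output avoids counting these purely conventional differences as recognition errors.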

In its current state, the model has a handful of known issues: quotation marks are occasionally transcribed inconsistently; the error rate for Greek words or passages is still high; and, to a lesser degree, the same applies to words set in (German) Fraktur.

For more information, see also the post on the Transkribus homepage.

How-to guides: Official Wiki | LaTeX-Ninja (English) | B. Denicolò (German)


Semantic Database
Heffter, Johann Carl, Museum disputatorium, vol. 1, Zittau, 1756.

A tripartite semantic database covering authors, works (the centerpiece) and secondary literature is compiled by all team members and in turn serves all of them as a working tool. Representativeness is ensured by using the categories of era, literary form and scientific discipline as a heuristic grid. The database will keep growing throughout the project and will ultimately comprise c. 1,500 works.

Link to the Database

Digital Sourcebook

From the works listed in the database, c. 200 particularly typical items will be published online, resulting in a digital sourcebook: the first systematic selection of early modern scientific literature in Latin, providing a clear idea of the whole breadth of the field. Each work will be introduced briefly on the basis of the information given in the database. In addition to a facsimile, the text will be converted into a digitally searchable format using the transcription platform Transkribus, run by the Digitisation and Digital Preservation Group (DEA) of the University of Innsbruck. If a freely available translation exists, a link to it will be added. Each item will be cross-referenced to similar records in the database so that it can serve as a starting point for research in a given field. After the end of the project, the database and the sourcebook will remain accessible via the Central Computer Service of the University of Innsbruck and the research data repository Zenodo.

Link to the Digital Sourcebook
