Documentation Federal Monuments Office

The documentation was processed with the workflow illustrated below to create Open Research Data according to the FAIR (Findable, Accessible, Interoperable, Reusable; https://www.go-fair.org) principles. To illustrate how the research data were generated, the processes leading from one data representation to another are shown as thin arrows between the representations. Each process is named with a notation like A. -> B., which denotes the process that converts representation A. (original file system) to representation B. (file system with long-lasting file formats). Thick downward arrows indicate result datasets of the Open Research Data Pilot, such as B.1. ZIP file on Zenodo / DOI.

Workflow BDA

The first process (A. -> B. Conversion) transformed the digital resources of the original Federal Monuments Office documentation into long-lasting archival formats. The files were first checked for their file formats and then converted if necessary. The guidelines of the Archaeology Data Service and IANUS were used to identify preferred file formats for long-term archiving. PDF/A is used for non-structured data such as reports, presentations, graphics or drawings. Graphical representations in other formats were converted to TIFF. Lists in MS Excel format were converted to CSV, and MS Word documents to PDF/A. Autodesk .dwg files were converted to .dxf, and both .dwg and .dxf files were additionally converted to SVG (Scalable Vector Graphics) and PDF/A. For this process a Python program was written that converted the file formats and also created a metadata file with an identifier for every generated file.
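The conversion rules above can be sketched as a lookup from source extension to archival target formats, together with a metadata listing that assigns an identifier to every generated file. This is only an illustrative sketch, not the project's actual program: the mapping table, the raster extensions (.jpg, .png), the identifier scheme (F000001, ...) and the column names are assumptions, and the actual format conversion is left out.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed mapping from source extension to archival target extensions,
# loosely following the rules in the text. ".pdf" stands for PDF/A here.
TARGET_FORMATS = {
    ".xls": [".csv"],
    ".xlsx": [".csv"],
    ".doc": [".pdf"],
    ".docx": [".pdf"],
    ".dwg": [".dxf", ".svg", ".pdf"],
    ".dxf": [".dxf", ".svg", ".pdf"],
    ".jpg": [".tif"],  # assumption: raster images go to TIFF
    ".png": [".tif"],  # assumption: raster images go to TIFF
}


def plan_targets(source: str) -> list[str]:
    """Return the relative paths of the files to generate for one source file."""
    path = PurePosixPath(source)
    suffixes = TARGET_FORMATS.get(path.suffix.lower())
    if suffixes is None:
        # Unknown or already archival formats are kept unchanged.
        return [source]
    return [str(path.with_suffix(suffix)) for suffix in suffixes]


def build_metadata(sources: list[str]) -> list[dict]:
    """Assign a running identifier (hypothetical scheme) to every generated file."""
    records = []
    for source in sorted(sources):
        for target in plan_targets(source):
            records.append(
                {
                    "file_id": f"F{len(records) + 1:06d}",
                    "source": source,
                    "target": target,
                }
            )
    return records


def metadata_as_csv(records: list[dict]) -> str:
    """Serialize the metadata records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["file_id", "source", "target"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    example = ["plans/site.DWG", "lists/finds.xlsx", "reports/report.pdf"]
    print(metadata_as_csv(build_metadata(example)))
```

Because the targets keep the relative path of the source and only change the extension, the generated files end up in the same folder structure as the original documentation.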

The files were generated in the same file structure as the original documentation. A ZIP file was created (B.1. ZIP file on Zenodo / DOI) and uploaded to Zenodo, where it is accessible under a DOI. In addition, the files were stored on a Google Drive under the same file structure. A script was written to retrieve the links to the individual documents, and these links were related to the file ids generated by the Python program. This storage on Google Drive allows access to individual files from the documentation.
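Relating the retrieved links to the file ids can be thought of as a join on the relative file path, since both storage locations share the same file structure. The sketch below assumes that the links have already been retrieved into a dictionary keyed by relative path. The record fields, the function name and the example.org links are hypothetical, and no call to the Google Drive API is shown.

```python
def attach_links(metadata: list[dict], links_by_path: dict[str, str]):
    """Join metadata records to share links via the relative target path.

    Returns the linked records and the ids of files without a link,
    so that gaps in the upload can be spotted.
    """
    linked = []
    missing = []
    for record in metadata:
        link = links_by_path.get(record["target"])
        if link is None:
            missing.append(record["file_id"])
        else:
            # Copy the record so the input list is left unchanged.
            linked.append({**record, "link": link})
    return linked, missing


if __name__ == "__main__":
    metadata = [
        {"file_id": "F000001", "target": "lists/finds.csv"},
        {"file_id": "F000002", "target": "plans/site.svg"},
    ]
    links = {"lists/finds.csv": "https://example.org/share/abc"}  # placeholder link
    print(attach_links(metadata, links))
```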

The next process was the extraction of information according to CIDOC CRM concepts (A. -> C.). In an Excel spreadsheet, five tables were created to represent the Structures (S20), Objects (E19), Research Activities (S4), Stratigraphic Units (A8) and Documents (E31) that were documented explicitly or implicitly in the documentation. The manual assignment and management of identifiers was crucial for relating the different entities represented in various source documents to each other.

The generation of a thesaurus (D. Thesaurus) and the alignment of the assignments to specific categories used in the documentation (A. -> D.) was another necessary step to formalize the documentation and prepare it for further processing. To align the thesaurus with other vocabularies its terms were matched to Getty AAT for topics and PeriodO for chronological information.
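One common way to record such matches is with SKOS mapping properties (skos:exactMatch, skos:closeMatch, skos:broadMatch). The source does not state which properties the project used, so the following is only a sketch under that assumption. It writes N-Triples lines for one thesaurus term; all URIs in the usage example are example.org placeholders, not real AAT or PeriodO identifiers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# SKOS mapping properties accepted by this sketch.
ALLOWED_RELATIONS = {"exactMatch", "closeMatch", "broadMatch"}


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal.

    Line breaks and other control characters are not handled here.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


def alignment_triples(term_uri: str, label: str, lang: str, matches) -> list[str]:
    """Build N-Triples lines for one thesaurus term and its external matches.

    `matches` is a list of (relation, target_uri) pairs, where relation
    is one of the names in ALLOWED_RELATIONS.
    """
    lines = [f'<{term_uri}> <{SKOS}prefLabel> "{escape_literal(label)}"@{lang} .']
    for relation, target_uri in matches:
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unsupported mapping relation: {relation}")
        lines.append(f"<{term_uri}> <{SKOS}{relation}> <{target_uri}> .")
    return lines


if __name__ == "__main__":
    for line in alignment_triples(
        "https://example.org/thesaurus/term-1",  # placeholder term URI
        "Grab",
        "de",
        [("closeMatch", "https://example.org/aat/placeholder")],
    ):
        print(line)
```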

The next step in the workflow, “C., D. -> E. RDF Conversion with SQL and Karma”, was the ingestion into a Postgres database of the “C. Tables for Structures (S20), Objects (E19), Research (S4), Stratigraphy (A8) and Documents (E31)” and the “D. Thesaurus for concepts under BBT and match to AAT and PeriodO”. This step created the URIs for the RDF representation and related resources to each other that had not been explicitly related in the tables. Tables generated in Postgres through SQL commands were used as input for the Karma tool, where the mapping to CIDOC CRM and its extensions took place. The outcome is an integrated RDF representation of the Federal Monuments Office documentation, which was transformed into various data products as results of the ORD Pilot project:

E.1 RDFs for documentation and Thesaurus on Zenodo / DOI
E.2 Text, shp files Zenodo / DOI
E.3 SPARQL Endpoint
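The SQL part of the RDF conversion step, creating URIs and relating resources that are only implicitly connected, can be illustrated with a small join. The sketch below uses Python's built-in sqlite3 module in place of Postgres so that it runs without a database server. The table names, column names and base URI are hypothetical and do not reflect the project's actual schema.

```python
import sqlite3

BASE = "https://example.org/bda/"  # hypothetical base URI


def build_uri_table(objects, units):
    """Create URIs and relate objects to stratigraphic units via a shared label.

    `objects` holds (object_id, unit_label) rows, `units` holds
    (unit_id, label) rows. The relation is not stored explicitly; it is
    derived by joining on the unit label mentioned in the object table.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE objects (object_id TEXT PRIMARY KEY, unit_label TEXT);
            CREATE TABLE units (unit_id TEXT PRIMARY KEY, label TEXT);
            """
        )
        conn.executemany("INSERT INTO objects VALUES (?, ?)", objects)
        conn.executemany("INSERT INTO units VALUES (?, ?)", units)
        rows = conn.execute(
            """
            SELECT ? || 'object/' || o.object_id AS object_uri,
                   ? || 'unit/' || u.unit_id AS unit_uri
            FROM objects AS o
            JOIN units AS u ON u.label = o.unit_label
            ORDER BY o.object_id
            """,
            (BASE, BASE),
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    print(build_uri_table([("O1", "SE 12"), ("O2", "SE 99")], [("U12", "SE 12")]))
```

A result table of this kind, with one column per URI, is the sort of flat input that a mapping tool can then turn into triples. Objects whose unit label has no counterpart (O2 in the usage example) drop out of the inner join and would need to be reviewed separately.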

From this CIDOC CRM RDF representation, the process “E. -> F. Create instances for ARIADNE Catalogue” generated another RDF file containing instances for the ARIADNE Catalogue, represented in the AO-Cat ontology. The resulting RDF was passed on to the ARIADNEplus consortium for integration into the ARIADNE portal (F.1 ARIADNE Portal) and will be available on the portal once it has been processed by the consortium.