Archive Setup
Functional Architecture
The LAC uses the KA³ repository software as its underlying repository framework. Both the LAC repository and the KA³ repository software comply with the tasks and functions of the Reference Model for an Open Archival Information System (OAIS). The KA³ repository software implements open interface standards such as the Open Data Protocol (OData), the International Image Interoperability Framework (IIIF), OAI-PMH, and the Security Assertion Markup Language (SAML) based Shibboleth system. Together with its partner RRZK, the LAC regularly applies for funding to develop the repository software further.
Archival Information Packages (AIPs), together with descriptive and technical metadata in self-contained XML files, are stored in a redundant file system (Archival Storage). Descriptive and technical metadata as well as annotations are represented in an Elasticsearch instance and accessible via an OData-compliant API. Bundles are the main archival objects in the LAC; a bundle typically consists of multiple related digital file objects (media files and annotations). For Data Management, the LAC uses an Elasticsearch database to enable efficient querying, navigation, and checking of referential integrity. Ingest is supported by a simple Administration Interface (currently accessible only to archive managers) with scripts for bulk ingest. Bulk ingest is fed by custom ingest pipelines that generate AIPs from the Submission Information Packages (SIPs) provided by the data producers. Figure 2 documents the LAC ingest procedure.
After an initial advisory session with the depositors, the LAC receives the SIP. An initial sanity check is performed on the submitted files and their descriptive metadata, and a report is generated. If the check fails, i.e. information or files are missing or in the wrong format, this is reported back to the depositor, who updates the SIP accordingly. Once the report indicates no further problems, the SIP is converted via one of the two conversion pipelines (Excel2BLAM, or IMDI2BLAM for legacy data). In the post-processing step, handles are added and the SIP is converted to a pre-OCFL structure. A final quality check is performed via a script, and the dataset is visually inspected for problems in the dev interface. After that, the dataset is ready for archival storage.
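The initial sanity check described above can be illustrated with a minimal sketch. It is not the LAC's actual script: the function name sanity_check, the SanityReport structure and the list of accepted file extensions are assumptions made for illustration only. The sketch compares the files named in the metadata with the files actually submitted and collects any discrepancies in a report.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical list of accepted extensions; the formats the LAC
# actually accepts are not specified in this section.
ACCEPTED_EXTENSIONS = {".wav", ".mp4", ".eaf", ".pdf", ".xml"}


@dataclass
class SanityReport:
    """Illustrative report structure (an assumption, not the LAC format)."""
    missing_files: list = field(default_factory=list)      # described but not submitted
    undescribed_files: list = field(default_factory=list)  # submitted but not described
    wrong_format: list = field(default_factory=list)       # extension not accepted

    @property
    def passed(self) -> bool:
        # The check passes only if no problem of any kind was found.
        return not (self.missing_files or self.undescribed_files or self.wrong_format)


def sanity_check(described_files, submitted_files) -> SanityReport:
    """Compare file names listed in the metadata with the submitted file names."""
    described = set(described_files)
    submitted = set(submitted_files)
    return SanityReport(
        missing_files=sorted(described - submitted),
        undescribed_files=sorted(submitted - described),
        wrong_format=sorted(
            name for name in submitted
            if PurePosixPath(name).suffix.lower() not in ACCEPTED_EXTENSIONS
        ),
    )


if __name__ == "__main__":
    report = sanity_check(
        described_files=["a.wav", "a.eaf", "b.wav"],
        submitted_files=["a.wav", "a.eaf", "notes.docx"],
    )
    print(report)
    print("passed:", report.passed)
```

In this sketch, a report whose passed property is false would be sent back to the depositor, and the check would be repeated on the updated SIP before conversion starts.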
For Access, the LAC relies on the Open Data Protocol compliant Object and Query APIs, which support a web interface for search and presentation. Data consumers have direct access to the archived objects via the web interface, which uses the KA³ media API, based on the International Image Interoperability Framework (IIIF), for presentation, and can download the resources through it. These Dissemination Information Packages (DIPs) consist either of individual datastreams for metadata and data or of a packed archive of the bundle. In addition, the LAC supports harvesting of its metadata via its OAI-PMH interface.
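As an illustration of metadata harvesting, the following sketch builds an OAI-PMH request URL. The verb ListRecords and the metadata prefix oai_dc are defined by the OAI-PMH specification; the base URL https://example.org/oai is a placeholder and not the LAC's actual endpoint, and the helper name oai_pmh_url is hypothetical.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL (hypothetical helper for illustration)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # The verb and its arguments are passed as ordinary query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Placeholder base URL; replace with the repository's real OAI-PMH endpoint.
    print(oai_pmh_url("https://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
```

A harvester would issue an HTTP GET request to the resulting URL and, for large result sets, repeat the request with the resumptionToken returned in the response.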
Responsibilities
As illustrated below, the LAC Administration is in charge of technical coordination, preservation planning, strategic planning, engagement with national and international research data infrastructures, user interaction (with depositors and consumers), public relations, and certification. The LAC management is responsible for data management, curation, archiving, quality assurance, and the conversion of legacy data. The RRZK operates and monitors the repository software, and provides and administers the server infrastructure. Lastly, the DCH operates the frontend and is the technical contact for SAML federations.
Data Management
Data can be submitted in three ways: via Sciebo, a sync-and-share service for universities in North Rhine-Westphalia; via SOFS, the online storage system of the University of Cologne (available only to members of the university); or by dropping off a physical storage medium. The data is then moved to the virtual machine of the Data Center for the Humanities for conversion and quality checks. Next, the data can be staged at the LAC-dev repository and then published at LAC-prod (both managed by the RRZK). Lastly, a copy of the AIP is sent to the Long Term Storage of the RRZK.