Preservation Plan

(version 1.3, 2023-02-09)

This document describes all digital preservation policies and activities relating to the Language Archive Cologne (LAC). The policy outlines a plan to support the sustainable preservation of, and access to, the objects kept by LAC for the foreseeable future. This policy is subject to change.

For the purpose of this policy, preservation will include any activity with the purpose of ensuring access to digital material kept by LAC. These activities include, but are not limited to, provision of repository services, storage, policy development, and methods involved in preserving digital items and the information contained in those items.

Policy Statement

The LAC mission statement defines the archive's goal as “preserving digital audio and video data, annotations, and other digital language data in the long term, and by promoting and disseminating these resources” (https://dch.phil-fak.uni-koeln.de/bestaende/language-archive-cologne/user-guides/mission-statement). The LAC follows an active preservation policy to ensure reliable access to usable data for research and teaching.

Preservation Planning

The responsibility for preservation planning lies with archive management. Archive management determines the needs and priorities for sustainable preservation and updates the Digital Preservation Policy of LAC according to its long-range preservation planning. Preservation planning is conducted in consultation with the Data Centre for the Humanities and under the scientific guidance of the Department of Linguistics.

Implementation of the Digital Preservation Policy

The responsibility for implementing the Digital Preservation Policy lies with archive management. Archive management and archive staff implement this policy in collaboration with the computing center of the University of Cologne.

Identification of Content

The general scope of this policy covers all digital objects archived at LAC.

Auditing

As part of its digital preservation process, LAC checks data integrity on a regular basis. An automated integrity check is manually initiated every quarter. Auditing data integrity is part of the archive management workflow.
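
An integrity check of this kind could look roughly like the following sketch. The policy does not name a checksum algorithm or a storage layout, so the sketch assumes SHA-256 digests kept in a sidecar file named after the data file with `.sha256` appended. The function names `sha256_of` and `audit` are illustrative and do not describe the tooling LAC actually uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root: Path, suffix: str = ".sha256") -> dict[str, list[Path]]:
    """Compare every file under root with its stored checksum.

    Assumption: the checksum for data.wav is the first field of
    data.wav.sha256 in the same directory.
    """
    report: dict[str, list[Path]] = {"ok": [], "mismatch": [], "no_checksum": []}
    for path in sorted(root.rglob("*")):
        # Skip directories and the sidecar files themselves.
        if not path.is_file() or path.name.endswith(suffix):
            continue
        sidecar = path.with_name(path.name + suffix)
        if not sidecar.is_file():
            report["no_checksum"].append(path)
            continue
        fields = sidecar.read_text(encoding="utf-8").split()
        expected = fields[0].lower() if fields else ""
        key = "ok" if sha256_of(path) == expected else "mismatch"
        report[key].append(path)
    return report
```

Files reported under `mismatch` or `no_checksum` would be the candidates for investigation or restoration from a backup copy.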

Levels of Preservation Support

The LAC Digital Preservation Policy recognizes two levels of preservation. The preservation policy covers the formats listed in the LAC File Format Whitelist (https://lac.uni-koeln.de/docs/format-whitelist.html).

Level 1

Level 1 is the highest level of preservation support. At this level, LAC guarantees the full functionality of the original digital object. This includes the integrity of the file (viability), the ability to display the file for viewing (renderability), the availability of information to ensure future intelligibility of the preserved objects (understandability), and migration to successive formats.

Actions

The archive monitors file formats for changes that might warrant transformation or reassessment (obsolescence watch). In case of format obsolescence, migration to a successive format is conducted for all affected objects. Proprietary formats (in particular video formats and PDF) may present challenges to some preservation activities. In the case of video objects, video container formats (MPEG-4 Part 14, QuickTime File Format) are monitored separately from audio and video coding formats (H.264/MPEG-4 AVC, AAC, PCM).

Additionally, Level 1 preservation includes all Level 0 basic preservation activities.

File types

Audio:

  • WAV: LPCM (.wav)

Video:

  • MPEG-4 Part 14: H.264/AAC (.mp4, .m4v)
  • QuickTime File Format: H.264/PCM (.mov)

Annotations:

  • ELAN Annotation Format (.eaf)

Other file types:

  • PDF/A (.pdf)
  • UTF-8 plain text (.txt)

Level 0

At Level 0, LAC pledges its best effort to maintain viability (integrity of the file) and the availability of information to ensure future intelligibility of the preserved objects (understandability). Files at this level will be neither migrated to successive formats nor updated to new standards.

Actions

The archive assigns metadata and a persistent identifier to every object and stores a checksum alongside every AIP file. The archive ensures bitstream maintenance, on-site and off-site backup copies, and periodic refreshment onto new storage media.

File types

Annotations:

  • Praat TextGrid (.TextGrid)
  • EXMARaLDA transcriptions (.exb)
  • TEI (in particular ISO 24624:2016) (.tei, .xml)
  • FLEx XML (.flextext, .lift)
  • Toolbox files (.tbt, .sht, .txt)

Additional metadata:

  • CMDI metadata profiles (other than BLAM) and other XML metadata formats

Image:

  • TIFF (.tiff, .tif)
  • JPEG 2000 (.jp2, .j2k)
  • PNG (.png)
  • JPEG (.jpg, .jpeg)

Other file types:

  • CSV-encoded data and metadata (preferably with W3C Metadata Vocabulary for Tabular Data annotations)
  • XHTML (.xhtml)
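
The two file type lists can be read as a lookup from file extension to preservation level. The sketch below is a first-pass illustration only, and the names `LEVEL_1`, `LEVEL_0` and `candidate_levels` are hypothetical. An extension alone cannot settle the level: `.txt` appears at both levels (UTF-8 text and Toolbox files), and whether a `.pdf` is PDF/A or a `.mp4` carries H.264/AAC requires inspecting the file itself. CSV and CMDI metadata are left out because the policy lists no extension for them.

```python
from pathlib import Path

# Extensions taken from the "File types" lists of this policy.
LEVEL_1 = {".wav", ".mp4", ".m4v", ".mov", ".eaf", ".pdf", ".txt"}
LEVEL_0 = {
    ".textgrid", ".exb", ".tei", ".xml", ".flextext", ".lift",
    ".tbt", ".sht", ".txt",
    ".tiff", ".tif", ".jp2", ".j2k", ".png", ".jpg", ".jpeg",
    ".xhtml",
}


def candidate_levels(filename: str) -> set[int]:
    """Return the preservation levels an extension could belong to.

    An empty set means the extension is not listed at either level;
    two entries mean the extension is ambiguous and needs a content check.
    """
    ext = Path(filename).suffix.lower()  # compare case-insensitively
    levels: set[int] = set()
    if ext in LEVEL_1:
        levels.add(1)
    if ext in LEVEL_0:
        levels.add(0)
    return levels
```

Returning a set rather than a single level keeps the ambiguity visible, so a later content-based check (encoding, codec, PDF/A conformance) can make the final decision.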