Data integrity

To check the integrity of a data stream, the repository generates a SHA-1 checksum and stores it in the database. When a user requests the data stream of a digital object, the checksum is used to retrieve the file from the filesystem.
 
The repository can read and write a file but cannot alter it. Because a checksum is generated for each data file, the file is stored on the filesystem under its checksum as its name. This way, identical content is stored only once on the filesystem. Once a file is stored on the platform, it becomes immutable, and each new version of a file yields its own SHA-1 fingerprint.
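The storage scheme described above can be sketched as a small content-addressed store. This is only an illustration under stated assumptions, not the repository's actual code: the class name ContentStore, its put and get methods, and the flat directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


class ContentStore:
    """Hypothetical store in which each file is named after its SHA-1 checksum."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        """Store the content once and return its checksum."""
        digest = sha1_of(data)
        path = self.root / digest
        # Identical content maps to the same name, so it is written only once
        # and an existing file is never overwritten.
        if not path.exists():
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Retrieve content by checksum and verify its integrity."""
        data = (self.root / digest).read_bytes()
        if sha1_of(data) != digest:
            raise IOError(f"checksum mismatch for {digest}")
        return data
```

Storing the same bytes twice returns the same checksum and leaves a single file on disk, while any change to the content produces a different checksum and therefore a different file.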
 
There are three types of digital objects:
 
  1. A workspace, which contains a root collection and several snapshots (collections locked by the publishing process).
  2. A collection, which may include metadata files and contains elements (collections or data objects).
  3. A data object, which contains a data stream (file) with metadata.
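The three object types can be pictured as a small recursive data model. The sketch below is an assumption made for illustration: the class and field names (DataObject, Collection, Workspace, stream_sha1, and so on) are hypothetical and do not come from the repository's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataObject:
    """A data stream (file) with metadata, both referenced by SHA-1 checksum."""
    name: str
    stream_sha1: str
    metadata: Dict[str, str] = field(default_factory=dict)  # metadata name -> SHA-1


@dataclass
class Collection:
    """May carry metadata files and contains collections or data objects."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    elements: List[Union["Collection", DataObject]] = field(default_factory=list)


@dataclass
class Workspace:
    """A root collection plus the snapshots locked by the publishing process."""
    root: Collection
    snapshots: List[Collection] = field(default_factory=list)


def count_data_objects(collection: Collection) -> int:
    """Count the data objects reachable from a collection, recursively."""
    total = 0
    for element in collection.elements:
        if isinstance(element, DataObject):
            total += 1
        else:
            total += count_data_objects(element)
    return total
```

Because a collection can contain other collections, any traversal of a workspace (counting, listing, publishing) is naturally recursive, as count_data_objects shows.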

Metadata are processed the same way as the data. Clients write the metadata and send them to the repository once they are complete. A digital object can contain one dataset and several sets of metadata. All changes to a digital object are recorded in the database, and these events can be viewed by users and administrators.

A workspace is a set of digital objects. Before publishing, digital objects are not publicly available; only members of the workspace can access them. A workspace member can perform any operation (create, read, update, delete). No versioning exists before publishing. The producer can decide to publish a snapshot of their workspace to make the data available, subject to access control.

A snapshot of a workspace can no longer be modified, and a workspace is made up of the whole set of its snapshots. The publishing workflow generates a new version of each digital object whose data or metadata have changed since the last version.
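The versioning rule above can be sketched as follows. This is a simplified illustration, not the platform's publishing workflow: the publish function, the Fingerprint pair and the dictionary-based history are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Tuple

# Hypothetical fingerprint of an object: (data SHA-1, metadata SHA-1).
Fingerprint = Tuple[str, str]


def publish(
    current: Dict[str, Fingerprint],
    versions: Dict[str, List[Fingerprint]],
) -> Dict[str, int]:
    """Publish a snapshot and return the version number of each object in it.

    `current` maps object paths to their present fingerprints.
    `versions` holds the version history of each object and is updated in place.
    """
    snapshot: Dict[str, int] = {}
    for path, fingerprint in current.items():
        history = versions.setdefault(path, [])
        # A new version is created only if the data or the metadata
        # differ from the last published version.
        if not history or history[-1] != fingerprint:
            history.append(fingerprint)
        snapshot[path] = len(history)
    return snapshot
```

With this rule, publishing twice without touching an object leaves it at the same version, while changing either its data or its metadata between two publications increments its version by one.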

Producers must log in to the platform before depositing a file; this step is mandatory for uploading a file to a workspace. Only members of a workspace can make changes. All modifications are listed in the “Last activities” section of each workspace. Each piece of data added to the repository is assigned an owner (registered in the database), and each event related to a dataset can be reported with its associated owner.
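The membership check and the activity log can be illustrated together. The sketch below assumes an in-memory log; the names Event, WorkspaceLog, record_change and last_activities are hypothetical, and the real platform records these events in its database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class Event:
    """One recorded change, attributed to the user who made it."""
    user: str
    action: str
    target: str
    timestamp: datetime


@dataclass
class WorkspaceLog:
    """Hypothetical in-memory stand-in for the events stored in the database."""
    members: Set[str]
    events: List[Event] = field(default_factory=list)

    def record_change(self, user: str, action: str, target: str) -> Event:
        """Record a change, refusing users who are not workspace members."""
        if user not in self.members:
            raise PermissionError(f"{user} is not a member of this workspace")
        event = Event(user, action, target, datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def last_activities(self, limit: int = 10) -> List[Event]:
        """Return the most recent events, newest first."""
        return list(reversed(self.events[-limit:]))
```

Keeping the user on every event is what allows each change to a dataset to be reported later together with the person responsible for it.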