ORTOLANG Charter

Foreword

ORTOLANG is an excellence project approved under the French government’s Investments for the Future programme. Its aim is to provide a network infrastructure (cf. list of partners) offering a repository of language data (corpora, lexicons, dictionaries, etc.) and tools for language and its processing.

This repository must be readily available and regularly enriched with additions that are as broad as possible, in order to enable:

  • the efficient sharing of research data on language
  • easy use by scientists, teachers or the general public

The exact functionalities and contents of version 1 of the ORTOLANG EquipEx and of the website www.ortolang.fr are described elsewhere (see Annex 1 to this charter).

The operation and success of this infrastructure depend as much on its users as on the infrastructure itself. For this reason, this charter on the use of ORTOLANG has been written; complying with it will enable optimum use of the resources made available to ORTOLANG users.

In the text below we shall use the following terms:

  • ORTOLANG refers to the institutions and personnel in charge of managing the platform and the www.ortolang.fr website, securing and distributing resources and creating or distributing the software and tools which appear on the site.
  • Contributors are the people or institutions that provide digital resources to feed the ORTOLANG repository.
  • Users are the people or institutions that use the resources available on ORTOLANG, for research, teaching or any other purpose, whether by partial or complete downloading or by simple consultation.

Mutual commitments by ORTOLANG, its contributors and users

The ORTOLANG EquipEx is aware that contributors and users may have very varied needs and restrictions, and therefore offers a wide range of configurations for protecting resources and setting the conditions for their use.

The contributors are aware that depositing resources on a public management and archiving platform has a significant cost, and that the contributor must therefore commit to disseminating these resources.

The users of the resources are aware that the investments made by all involved in the corpus creation and dissemination chain must be made visible and thus justified to the institutions which fund the projects. Research must also be valued and so the intellectual property rights of contributors must be respected. To this end, the sources of the resources used must be cited and the use made of these resources must be made widely known.

The application of these principles leads to a mutual dissemination and promotion of all the means implemented by actors in linguistics research, the State (ORTOLANG’s main financial partner), the contributors and users.

ORTOLANG’s functionalities and commitments

  1. Depositing resources is completely free of charge, though it must remain within volume limits linked to ORTOLANG’s capacity (which increases year by year). The resources deposited can cover all language sciences and may be written, oral or multimodal.
  2. ORTOLANG is committed to the secure long-term preservation of digital deposits by means of information technology and the secure storage of resources. Storage is guaranteed for at least as long as ORTOLANG exists, and ORTOLANG’s existence is itself guaranteed by the institutions represented in it (notably French universities and the CNRS, cf. list of partners).
  3. Depositing resources is free of charge for any resources from French research laboratories (that is, those financed by the French state), whatever language they relate to, and for any resources relating to the languages of France (the DGLFLF is competent to determine what counts as a “language of France”), whatever their origin. For resources of other origins, the parties involved can negotiate a specific agreement, which may involve a financial contribution.
  4. ORTOLANG receives, saves and disseminates all resources related to language sciences. The formats of resources that can be processed are only limited by ORTOLANG’s technical capacities.
  5. Certain data formats will benefit from specific additional linguistic processing, particularly language corpora provided in formats which are known and used by the scientific community (the list of known formats may be extended over time). Such formats will benefit from tools such as full-text indexing, automatic syntactic processing and live visualisation. Automatic processing is offered in priority for French-language data.
  6. Resources in formats which are compatible with the requirements for long-term archiving by the French National Archives services will be offered for archiving to specialised institutions through the TGIR Huma-Num’s system. The actual archiving is the sole responsibility of those institutions, and ORTOLANG will play only the role of a contributing service in this process. Once the archiving has been carried out, information on this deposit will be included in the metadata of the resources.
  7. ORTOLANG undertakes to respect the data security limitations defined by contributors (see below) and thus undertakes to control the modification of data by contributors and how they are used.
  8. Deposited resources are given a unique persistent identifier. This makes it possible to circulate a reference that still leads to the resource at any time, even many years after the initial deposit. Uniquely identifying resources requires a versioning system, so each deposited resource is associated with a version number which is part of the permanent identifier. It is possible to make minor modifications to the resources (corrections of errors for example) without changing the version. A major modification may lead to the creation of a new version number. For cost reasons, a new version must contain at least 10% new data.
  9. In the final version of ORTOLANG (end of 2016), ORTOLANG undertakes to use monitoring and statistics tools so that contributors can consult access and download statistics for their resources.
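
As an illustration only, the versioning rule in item 8 above can be sketched in a few lines of Python. The charter does not say how the identifier is formatted or how the share of new data is measured, so the `VersionedId` class, the `base/vN` layout and the `new_fraction` parameter are all hypothetical; only the 10% threshold comes from the charter.

```python
from dataclasses import dataclass

# Threshold stated in item 8: a new version needs at least 10% new data.
MIN_NEW_DATA_FRACTION = 0.10


@dataclass(frozen=True)
class VersionedId:
    """Hypothetical identifier: a stable base plus a version number."""
    base: str
    version: int

    def __str__(self) -> str:
        # The "base/vN" layout is an assumption made for this sketch.
        return f"{self.base}/v{self.version}"


def next_identifier(current: VersionedId, new_fraction: float, major: bool) -> VersionedId:
    """Return the identifier that applies after a modification.

    new_fraction is the share of new data (0.0 to 1.0); how it is
    measured is left open by the charter and is assumed here.
    """
    if not 0.0 <= new_fraction <= 1.0:
        raise ValueError("new_fraction must be between 0.0 and 1.0")
    if not major:
        # Minor modifications (e.g. error corrections) keep the version.
        return current
    if new_fraction < MIN_NEW_DATA_FRACTION:
        raise ValueError("a new version must contain at least 10% new data")
    return VersionedId(current.base, current.version + 1)
```

Under these assumptions, a minor correction returns the same identifier, a major modification with 25% new data moves from version 1 to version 2, and a major modification with only 5% new data is rejected.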

Contributors’ commitments

  1. Contributions to ORTOLANG can only be made by identified persons. Creating an identifier also creates a workspace in which contributions can be deposited, prepared and formatted.
  2. Depositing resources on ORTOLANG requires exact knowledge of the rights of use for the resources and, if the contributor is not the owner of the rights or their legal successor, all the necessary authorizations to deposit from that owner or legal successor.
  3. Thus depositing resources involves:
    • being able to take all decisions concerning the use and dissemination of the corpus (particularly regarding intellectual property rights);
    • possessing all the information about the sources of the corpus and having the consent of the persons recorded or filmed.
  4. ORTOLANG is a public service financed by the State, so depositing resources requires granting a right of use for them. This should include at least the right to use them freely for public scientific research, by downloading or through a specific visualization or usage tool.
    • Additional rights are strongly encouraged, and different rights may be applied to subsets of deposited resources (cf. Annex 2: List of types of identification of users offered). No payment may be charged for deposited resources.
    • It is possible to deposit resources that will only be opened to people other than the contributors at a later date (quarantine system), either so that the contributors can first use the resources and benefit from their work, or for legal reasons.
    • The duration of quarantine periods linked to legal constraints is solely defined by French law and is not within the competence of ORTOLANG. The duration of quarantine periods linked to intellectual or financial rights is limited to the duration of the constraints (thesis grant, ANR funding, etc.) plus a maximum period of two years. Beyond this period, the fixed rights of use apply.
  5. All deposits of resources on ORTOLANG must be accompanied by a minimum set of descriptive metadata in the Dublin Core format along with ORTOLANG’s own specific administrative metadata. To help users construct this metadata, ORTOLANG provides an interactive metadata editor in the workspace when a resource is uploaded. This particularly enables users to give general descriptive information about the resource, the rights related to it and the various contributors to the resource. Depositing a resource implies acceptance of the entirely free distribution of the aforementioned administrative and descriptive metadata.
  6. Deposited resources may be accompanied by instructions for use, including how to cite them. ORTOLANG’s policy is to require that, as a minimum, the source of the resources, i.e. their permanent identifier, be cited. Contributors may also request the citation of works, scientific references or intellectual property rights references, within a limit of three citations per corpus. Honouring these citations is part of the user charter and is binding on all actors involved.
  7. Resource deposits by students or postgraduates who do not belong to a public scientific establishment are possible and indeed desirable. They must be carried out under the responsibility of the laboratory where the depositors work.
  8. The deposit of incomplete resources (resources which are being digitised, audiovisual resources which are yet to be transcribed) is possible and indeed desirable (particularly for reasons of security) provided that the rights of ownership and use of data are clearly defined and described. This type of use makes the implementation of a quarantine system possible. However, the partial unfinalised data deposit will be made public and available for searches at the end of the quarantine period, regardless of the status of the data.
  9. For reasons of storage and operating costs, media type resources (audio, video, physiological data) can only be deposited if associated with linguistic annotations. If the creation of the resources is ongoing (as part of a research project for example), media data can be deposited in advance before the annotations are available as long as the usage rights are already clearly defined.
  10. All resources deposited can only be modified by the original depositor. If the depositor modifies the data or if the transfer of rights is modified, this leads to the creation of a new version number.
  11. Resources deposited by a contributor who does not possess the rights to the data may be removed from ORTOLANG particularly in the event of a legal conflict or proceedings.
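
As an illustration only, the quarantine limit described in item 4 above (the duration of the constraint plus at most two years) can be sketched as a date calculation. The function names are hypothetical, the treatment of 29 February is an assumption, and quarantine periods set by law are deliberately not modelled, since the charter leaves them to French law.

```python
from datetime import date

# Item 4: constraint duration plus a maximum period of two years.
MAX_EXTRA_YEARS = 2


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for 29 February when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def latest_release_date(constraint_end: date) -> date:
    """Latest date on which a quarantined resource must be opened."""
    return add_years(constraint_end, MAX_EXTRA_YEARS)


def is_quarantine_allowed(constraint_end: date, requested_release: date) -> bool:
    """Check a requested opening date against the two-year limit."""
    return requested_release <= latest_release_date(constraint_end)
```

For example, under these assumptions a thesis grant ending on 30 September 2015 allows a quarantine running until 30 September 2017 at the latest.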

Users’ commitments

  1. Users must log in to the site for any use other than that of resources which are free of all rights. This makes it possible to control and apply the rights defined by the contributors.
  2. The use of resources for scientific or personal purposes must respect the constraints defined and implemented by the contributors. It must also respect accepted practices (see the French “Ethics and Big Data” charter).
  3. The distribution and viewing of the resources must respect the original rights. It is therefore not possible to publicly disseminate resources with restricted rights.
  4. Any resource used must be accompanied by its permanent ORTOLANG identifier and all citations requested by the contributors must be respected. The identifiers and citation methods for resources are specified in their metadata. All users of ORTOLANG resources are therefore required to include these in their bibliographic citations, acknowledgements, footnotes or licences depending on the conditions of use for the resources.

ANNEX 1: ORTOLANG’s content and functionalities

Functionalities of the www.ortolang.fr site provided without identification

The bilingual French/English site www.ortolang.fr will provide all users with the following functionalities:

  1. Home page: This presents news about ORTOLANG and its new resources along with varied information on the project (presentation, partners, roadmap, newsletters, the ORTOLANG charter, legal information, access to the community site, etc.).
  2. Presentation of ORTOLANG’s integrated resources with a search window:
    • Corpora
    • Lexicons
    • Tools
    • Integrated projects

For each resource, users can:

    • Access a description of the resource,
    • View a sample extract,
    • Download the resource, subject to its dissemination restrictions.

Functionalities of the www.ortolang.fr site provided for users who log in

After logging in, users can access all their workspaces, which enable them (for each resource) to:

  1. Deposit the files which make up their resource
  2. Access the various files that make up their resource
  3. Edit their presentation metadata
  4. Save their workspace
  5. Follow any ongoing processing
  6. Access the tools which are applicable to their resource
  7. Obtain previews before publication
  8. View a history and a list of the members who have access to this workspace
  9. Submit their resource for publication

ANNEX 2: List of types of identification of users offered

ORTOLANG will offer four identification categories for potential users of a resource:

  1. All potential users with no restrictions,
  2. Members of French higher education and research,
  3. All users registered on the platform,
  4. A specific group of users previously identified on the ORTOLANG platform.
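
As an illustration only, the four categories above can be modelled as an enumeration with a simple access check. The names `Access`, `User` and `can_access` are hypothetical and do not describe the platform’s actual implementation; the sketch also assumes that anonymous visitors can only reach resources in category 1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    """The four identification categories listed in Annex 2."""
    ANYONE = 1
    FRENCH_HIGHER_ED_RESEARCH = 2
    REGISTERED = 3
    SPECIFIC_GROUP = 4


@dataclass(frozen=True)
class User:
    """Hypothetical user record; field names are assumptions."""
    registered: bool = False
    french_higher_ed_research: bool = False
    groups: frozenset = frozenset()


def can_access(user: Optional[User], category: Access, group: Optional[str] = None) -> bool:
    """Return True if the user falls within the resource's category."""
    if category is Access.ANYONE:
        return True
    if user is None:
        # Anonymous visitors only reach unrestricted resources.
        return False
    if category is Access.FRENCH_HIGHER_ED_RESEARCH:
        return user.french_higher_ed_research
    if category is Access.REGISTERED:
        return user.registered
    if category is Access.SPECIFIC_GROUP:
        return group is not None and group in user.groups
    raise ValueError(f"unknown category: {category!r}")
```

A contributor applying different rights to subsets of a deposit would, in this sketch, simply attach a different `Access` value (and, for category 4, a group name) to each subset.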