About ORTOLANG

ORTOLANG (Open Resources and TOols for LANGuage) is an EQUIPEX project selected under the French government's Investissements d'avenir (Investments for the Future) programme.

Its aim is to build a networked infrastructure comprising a repository of language data (corpora, lexicons, dictionaries, etc.) together with readily available, well-documented tools for processing these data. The expected outcomes include:

raising research on the analysis, modelling and automatic processing of the French language to the highest international level through effective pooling of resources;
facilitating the use and transfer of resources and tools developed in public laboratories to industrial partners, notably SMEs, which often cannot afford to develop such language-processing resources and tools themselves given the investment required;
promoting the French language and the regional languages of France by sharing the expertise acquired by public laboratories.

ORTOLANG is a service dedicated to language, complementary to the services offered by the TGIR Huma-Num (Très Grande Infrastructure de Recherche, a very large research infrastructure).

Objectives

ORTOLANG aims to extend and sustain the work of existing digital resource centres for language:

CNRTL (Centre National de Ressources Textuelles et Lexicales)
SLDR (Speech and Language Data Repository)

ORTOLANG also aims to offer:

a technical platform for written and spoken language, supporting the coordination actions of TGIR Huma-Num;
scientific equipment aligned with the initiatives of the DGLFLF and the BNF on the cultural-heritage aspects of the languages spoken in France;
a French node of CLARIN (Common Language Resources and Technology Infrastructure).

Functions

Identification/preparation of data

cataloguing existing resources and tools using standardised metadata sets;
checking and validating resources and tools, and advising authors on standards, norms and current international recommendations (XML, TEI, LMF, MAF and SYNAF);
upgrading resources and tools.

Archiving

storage, maintenance and curation of resources and tools;
long-term preservation within the framework of TGIR Huma-Num, in collaboration with CINES.

Dissemination

assistance and support for users, and setting up procedures that allow them to make use of shared resources and tools wherever they are located.

The ORTOLANG model incorporates the basic entities of the OAIS (Open Archival Information System) reference model and specifies the cycle of correction and data enrichment that archiving makes possible.

Complementary competencies

To achieve this, we have brought together a consortium with complementary competencies in:

language sciences, with ATILF, LPL, MoDyCo and LLL;
computer science, with LORIA and INIST, and to some extent ATILF and LPL;
databases and access to scientific information, with INIST;
linguistic resources, with the two resource centres CNRTL and SLDR.

Beyond bringing together these disciplinary competencies, ORTOLANG aims to federate, around its shared resources and tools, partners representing the full diversity of approaches to the study of language:

linguistic modelling (MoDyCo, LPL, ATILF),
experimental linguistics (LPL, ATILF),
language production and perception (LPL, MoDyCo),
diachronic studies (ATILF, LLL),
sociolinguistics (LLL, MoDyCo),
automatic language processing (LORIA, LPL, ATILF).