GETALP, a long history…

History

The GETALP team grew out of the GEOD and GETA teams of the CLIPS laboratory. It inherits a long history, longer in written language processing (GETA) than in spoken language processing (GEOD), and is now building a new common history.

Research themes

GETA (until 2006)

GETA (Groupe d’Étude pour la Traduction Automatique) is a multidisciplinary team of computer scientists and linguists. Its research covers all theoretical, methodological and practical aspects of CAT (Computer-Assisted Translation) and, more generally, multilingual computing. GETA grew out of CETA (1961-1971), a pioneering MT laboratory in France.

At present, GETA remains active in reviser’s CAT, but since 1988 it has been redirecting its research towards individual CAT, which has two components: the translator’s CAT and the editor’s CAT.

The translator’s CAT offers linguistic office automation tools to translators, whether professional or occasional. Here it is the human who translates. In this field, we cooperate with other research groups that provide linguistic data or tools (lexicons, lemmatisers), and we focus on the software problems of integrating these elements in a form usable by occasional translators, who want to call them from within their usual applications. Together with SITE-Eurolang, we recently developed and proposed the Montaigne project, which aims to make the Eurolang-Optimizer software available to the scientific community over the Internet and to use it to build large terminology databases that can in turn feed automatic systems.

The editor’s CAT is the main objective of the work in progress, which is grouped together in the LIDIA project. The basic idea is to let a monolingual writer write in his or her own language and, at the cost of a standardisation and disambiguation dialogue (which should be made as simple and user-friendly as possible), have the text translated into several languages with little or no revision. This is therefore dialogue-based machine translation (DBMT) with indirect pre-editing, and here it is the machine that translates. A first prototype, LIDIA-1, from French into German, Russian and English, has been specified and built over the last three years.

GEOD (until 2006)

GEOD’s research area is speech and dialogue. Its aim is to design spoken interaction and communication software and to provide systems with a reliable, high-performance language component.

For more than fifteen years, means of communication (mobile phones, the Internet) and media for the electronic dissemination of information (digital radio and television broadcasts) have grown rapidly. Over the same period, digital information processing and computer technology have made enormous progress. These developments have opened up promising prospects for many applications in human-machine and computer-mediated human-human spoken communication, as well as for specific medical applications such as remote monitoring of patients at home (smart homes). Meanwhile, storage has become easier, thanks in part to highly efficient compression algorithms, and collections of audio and video documents continue to grow. Virtually all multimedia information is now available in digital form, and exploiting it opens the way to new applications for indexing and searching documents by content.

In this context, GEOD’s research theme is centred on Oral Interaction and is organised around two main scientific axes: Recognition (speech, audio and speaker) and Dialogue (modelling and understanding). On both axes, a number of obstacles remain that are linked to the genericity of the models. Achieving this genericity remains an essential objective and is central to our long-term research concerns.

For the Recognition axis, GEOD’s research efforts during the period 2001-2005 focused on two sub-themes: the development of multilingual recognition systems for continuous speech and the improvement of their robustness, and the use of speech and sound as a component of multimodal interaction in perceptual spaces. For the Dialogue axis, the objective has been the development of multimodal human-machine dialogue systems.

GEOD maintains close ties with the MICA laboratory (Multimedia, Information, Communication and Applications), one of the CLIPS laboratory’s foreign branches.

Domains

Research on these different types of CAT is centred on computer science, linguistic and ergonomic themes.

Themes with a dominant computer science focus

  • distributed architecture and distributed CAT systems (whiteboard technique)
  • specialized languages of the future, adapted to noisy inputs and interactive approaches
  • design and implementation of a software platform for heterogeneous multilingual lexical databases (BDLMs)
  • multilingualisation of software and encoding of multilingual texts for CAT

Themes with a dominant linguistic focus

  • declarative formalisms for grammatical specifications (“static grammars”) and for representations of utterances (m-structures, typed f-structures)
  • linguistic design and construction of multilingual lexical databases
  • design and testing of MT systems

Themes with a dominant ergonomic focus

  • organisation of standardisation and disambiguation dialogues
  • integration of multimedia, and in particular of multimodal interactive disambiguation techniques, into the editor’s CAT
  • computer-assisted learning aspects of systems based on complex language skills

Groupe d'Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole