Best paper award for GETALP at TALN 2019

At TALN 2019, held in Toulouse from 1 to 5 July jointly with the Plateforme d’Intelligence Artificielle (PFIA), Loïc Vial, Benjamin Lecouteux and Didier Schwab received the best paper award for their paper "Compression de vocabulaire de sens grâce aux relations sémantiques pour la désambiguïsation lexicale" (Sense vocabulary compression through semantic relations for word sense disambiguation).
This paper presents an original method that compensates for the lack of good-quality annotated data and obtains results that substantially outperform the state of the art on all word sense disambiguation evaluation tasks.
As a reminder, Word Sense Disambiguation (WSD) is a task that aims to clarify a text by assigning to each of its words the most appropriate sense label from a predefined sense inventory. For example, in the sentence "La souris mange le fromage" (The mouse eats the cheese), the "rodent" sense of the word souris should be preferred over the "electronic device" sense. The authors use this work in several natural language processing applications, such as machine translation, and to design alternative communication tools, for example for people who have little or no command of the language or for people with multiple disabilities.
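To make the task definition concrete, here is a minimal sketch of a classic baseline, the simplified Lesk algorithm, which picks the sense whose gloss (short definition) shares the most words with the sentence. This is not the method of the award-winning paper; the two-sense inventory for souris, its glosses, and the function names are hypothetical and invented for illustration.

```python
"""Simplified Lesk word sense disambiguation (illustrative baseline only)."""

import re

# Hypothetical toy sense inventory: word -> {sense label: gloss}.
SENSE_INVENTORY = {
    "souris": {
        "rongeur": "petit rongeur qui mange du fromage et des graines",
        "dispositif_electronique": "dispositif de pointage relié à un ordinateur avec des boutons",
    }
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def simplified_lesk(word, sentence, inventory=SENSE_INVENTORY):
    """Return the sense whose gloss overlaps most with the sentence context."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("souris", "La souris mange le fromage"))
```

Here the context words "mange" and "fromage" also appear in the rodent gloss, so that sense wins. A real system would also filter stop words and rely on a full sense inventory rather than two hand-written glosses.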

Two seminars by Steven Bird in January 2019

Wed 9th January at 2pm – room 306, IMAG building
Scalable Methods for Working with Unwritten Languages 1: Interactive Respeaking
Mon 14th January at 2pm – room 306, IMAG building
Scalable Methods for Working with Unwritten Languages 2: Talking about Places and Processes
Steven Bird, Charles Darwin University
Computational methods offer exciting new possibilities for recording and processing low-resource languages. If we are to extend these methods to all languages, we run into a problem: most languages are unwritten. Existing attempts run up against the “transcription bottleneck”: it is extremely onerous to transcribe audio in a language that has no established orthography. These talks describe two new ways to address the transcription bottleneck, by rethinking the tasks and the end products in light of the capacities and motives of speakers and the requirements of the speech and language processing pipeline. Both talks present work in progress, based in a remote Indigenous community in Australia.

Proposal for a CIFRE thesis with the company Ixiade

Ixiade proposes a CIFRE thesis as part of a collaboration with the GETALP team of the Laboratoire d’Informatique de Grenoble (LIG).

Start of thesis: As soon as possible.
Location: the position is based in Grenoble.
The gross annual salary is €28,800.

Description of the subject:

For more than 12 years, Ixiade’s experts have worked closely with teams of companies wishing to enhance and develop their strategic, technological and human assets. They work across the board as part of a participatory innovation process. The aim is to bring innovation to the teams through the transmission of best practices and “tailor-made” recommendations, in particular through upstream analysis of the potential meaning of a new concept or a new product or service.

On this last aspect, Ixiade’s expertise is based on conducting semi-structured oral interviews, transcribing them, and analysing the transcripts in detail. This last step is particularly time-consuming, and the objective of this thesis is to create natural language processing tools that help the expert more easily perceive the underlying opinions of the interviewees about the concept under study.

This thesis will thus consist of studying the extraction of ideas from textual documents, whether these ideas relate to a particular domain (for example, existing practices) or, more interestingly, to the expression of more general and emotionally rich opinions linked to the interviewee’s profile.

The extraction can then be performed in two ways:
– endogenously, based on information obtained from a segmentation of the interview corpus;
– exogenously, using external resources such as the lexical resource JeuxDeMots (a large lexical graph of the French language, built over about ten years through word-association games) or resources learned from a general corpus and/or from a related domain.
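The exogenous option can be illustrated with a minimal sketch: an interview segment is matched against a lexical association graph, and each candidate idea is scored by the association weights of the terms it shares with the segment. The graph below is a tiny hypothetical stand-in; it does not reproduce the actual content, format or API of JeuxDeMots, and the function name `detect_ideas` and the `SEED_WEIGHT` constant are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical toy association graph standing in for an external lexical
# resource: idea -> {associated term: association weight}. Invented values.
ASSOCIATIONS = {
    "confort": {"détente": 50, "douceur": 30, "canapé": 40},
    "sécurité": {"protection": 60, "alarme": 45, "risque": 35},
}

SEED_WEIGHT = 100  # score given to each explicit mention of the idea itself


def detect_ideas(segment, graph=ASSOCIATIONS):
    """Rank the ideas of `graph` evoked by an interview segment.

    Each idea scores SEED_WEIGHT per explicit mention, plus the association
    weight of every associated term found in the segment.
    """
    counts = Counter(re.findall(r"\w+", segment.lower()))
    scores = {}
    for idea, associates in graph.items():
        score = counts[idea] * SEED_WEIGHT
        score += sum(counts[term] * weight for term, weight in associates.items())
        if score > 0:
            scores[idea] = score
    # Highest-scoring ideas first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(detect_ideas("Je cherche surtout la détente et la douceur en rentrant chez moi"))
```

In practice the association graph would be far larger, and matching would need lemmatization so that plural or conjugated forms are also recognized.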

The candidate will be encouraged to publish their progress at major conferences in natural language processing, such as ACL and Interspeech.

The candidate may also benefit from a second thesis starting at the same time in the same application area, but focusing instead on the linguistic analysis of the available corpora.

Candidate profile:
You hold, or are completing, a research Master’s degree (Master 2) in Computer Science or in Natural Language Processing, and you wish to prepare a CIFRE doctorate in a company in partnership with a research laboratory. You are passionate about information and communication technologies. You have training and experience in the study and/or development of natural language processing systems. Knowledge of machine learning would be a plus. You are interested in opinion analysis in spoken productions and wish to discover the working methods of the innovation industry.

Please send a CV, a cover letter and a letter of recommendation to Isabelle Fournié, Didier Schwab and Jérôme Goulian.


Team seminar by Pedro Chahuara on Thursday, May 18 at 2pm

Online Mining of Web Publisher RTB Auctions for Revenue Optimization
In the online advertising market there are two main actors: publishers, who offer advertising space on their websites, and advertisers, who compete in auctions to show their ads in the available spaces. When a user visits a website, an auction starts for each ad space: the user’s profile is sent to the advertisers, who each submit a bid to show an ad to that user. The publisher sets a reserve price, the minimum value at which they accept to sell the space.
In this talk, I will introduce a general setting for this ad market and present an engine that optimizes publisher revenue from second-price auctions, which are widely used to sell online ad spaces through a mechanism called real-time bidding (RTB). The engine is fed with a stream of auctions in a time-varying environment (non-stationary bid distributions, new items to sell, etc.) and predicts, in real time, the optimal reserve price for each auction. This problem is crucial for web publishers, because setting an appropriate reserve price for each auction can significantly increase their revenue.
I consider here a realistic setting where the only available information consists of a user identifier and an ad placement identifier. Once the auction has taken place, we observe censored outcomes: if the auction has been won (i.e., the reserve price is lower than the first bid), we observe the first bid and the closing price of the auction; otherwise, we do not observe any bid value.
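The auction mechanics described above can be sketched in a few lines: a second-price auction with a reserve price that returns only what the publisher observes, i.e. nothing when the space is not sold. The grid search over reserve prices uses full, invented bid data for illustration; that information is precisely what the censored setting withholds, which is why the talk’s engine needs a dedicated model. The names `Outcome`, `run_auction` and `total_revenue` and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    """What the publisher observes after one auction."""
    sold: bool
    first_bid: Optional[float]      # None when the outcome is censored
    closing_price: Optional[float]  # None when the outcome is censored


def run_auction(bids: List[float], reserve: float) -> Outcome:
    """Second-price auction with a reserve price."""
    if not bids:
        return Outcome(False, None, None)
    ranked = sorted(bids, reverse=True)
    first = ranked[0]
    # Following the abstract, the space is sold only if the reserve price is
    # strictly below the highest bid; otherwise no bid value is revealed.
    if first <= reserve:
        return Outcome(False, None, None)
    second = ranked[1] if len(ranked) > 1 else 0.0
    # The winner pays the larger of the second-highest bid and the reserve.
    return Outcome(True, first, max(second, reserve))


def total_revenue(auctions: List[List[float]], reserve: float) -> float:
    """Revenue a fixed reserve price would have earned on past auctions."""
    outcomes = (run_auction(bids, reserve) for bids in auctions)
    return sum(o.closing_price for o in outcomes if o.sold)


# Toy bid history (invented numbers), with all bids visible for illustration.
history = [[1.0, 0.4], [0.8, 0.7], [0.3, 0.2]]
candidates = [0.0, 0.6, 0.75, 0.9]
best = max(candidates, key=lambda r: total_revenue(history, r))
print(best, total_revenue(history, best))
```

On this toy history a reserve of 0.75 beats both no reserve and a higher one: it raises the price paid in the first two auctions without losing them, whereas 0.9 loses the second sale.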
The proposed approach combines two key components: (i) a non-parametric regression model of auction revenue based on dynamic, time-weighted matrix factorization, which implicitly builds adaptive user and placement profiles; and (ii) a non-parametric model that estimates revenue under censoring, based on an online extension of Aalen’s additive model.
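The idea behind the first component can be pictured with a minimal online matrix factorization sketch: each observed auction triggers one stochastic gradient step on a user vector and a placement vector whose dot product predicts revenue. Assumptions: the class name `OnlineMF`, its hyperparameters, and the use of a constant learning rate as a stand-in for explicit time weighting are illustrative choices, not the talk’s engine; censoring and the Aalen additive model are not modelled here.

```python
import random


class OnlineMF:
    """Online matrix factorization: revenue(user, placement) ~ dot(p_user, q_placement).

    Each observation triggers one SGD step. With a constant learning rate,
    recent auctions weigh more than old ones (an exponential-forgetting
    effect), used here as a simple stand-in for time weighting.
    """

    def __init__(self, n_factors=4, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = n_factors, lr, reg
        self.rng = random.Random(seed)
        self.users, self.placements = {}, {}

    def _factors(self, table, key):
        # Cold start: unseen identifiers get small random factors.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user, placement):
        p = self._factors(self.users, user)
        q = self._factors(self.placements, placement)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, placement, revenue):
        """One SGD step on the regularized squared error; returns the error."""
        p = self._factors(self.users, user)
        q = self._factors(self.placements, placement)
        err = revenue - sum(a * b for a, b in zip(p, q))
        for i in range(self.k):
            p_i, q_i = p[i], q[i]  # use pre-update values for both gradients
            p[i] += self.lr * (err * q_i - self.reg * p_i)
            q[i] += self.lr * (err * p_i - self.reg * q_i)
        return err


# Usage on a hypothetical stream whose revenue level drifts over time.
model = OnlineMF()
for _ in range(1000):
    model.update("user_1", "slot_A", 2.0)
for _ in range(1000):
    model.update("user_1", "slot_A", 0.5)
print(round(model.predict("user_1", "slot_A"), 2))
```

Because updates never stop, the prediction follows the drift from the first revenue level to the second, which is the kind of adaptivity a non-stationary bid environment requires.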

Seminar by Lorraine Goeuriot, December 1, 2016

Lorraine Goeuriot, lecturer in the MRIM team, will give the following presentation on December 1st at 2pm in room 306.

Title: Medical Information Retrieval and its evaluation: an overview of the CLEF eHealth evaluation task

Summary: In this talk, I will introduce my research activities in the field of medical information retrieval, and in particular its evaluation.
The use of the Web as a source of health-related information is a widespread phenomenon, and laypeople often have difficulty finding relevant documents. The goal of the CLEF eHealth evaluation challenge is to provide researchers with datasets to improve consumer health search. I will first introduce the task and the datasets built for it. Then I will describe some experiments and the results obtained on these datasets.
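To illustrate what evaluating a search system on such a dataset involves, here is a minimal sketch of two standard rank-based measures, precision at k and nDCG at k, computed from relevance judgments (qrels). The talk summary does not say which measures the task uses, so treat these as representative examples; the document identifiers and judgments are hypothetical, and this nDCG variant uses linear gains.

```python
import math


def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top-k retrieved documents judged relevant (grade > 0)."""
    return sum(1 for doc in ranking[:k] if qrels.get(doc, 0) > 0) / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain: position i (0-based) is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking, qrels, k=10):
    """DCG of the system ranking normalized by the DCG of an ideal ranking."""
    gains = [qrels.get(doc, 0) for doc in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical graded judgments for one query, and one system's ranking.
qrels = {"doc_a": 2, "doc_c": 1, "doc_d": 1}
run = ["doc_a", "doc_b", "doc_c"]
print(precision_at_k(run, qrels, k=10), ndcg_at_k(run, qrels, k=10))
```

Scores are averaged over all queries of the test collection to compare systems; nDCG additionally rewards placing highly relevant documents near the top.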



Groupe d'Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole