Team seminars

The seminars are led by Solange Rossato and Didier Schwab.

List of upcoming seminars

04/04/2018 at 3pm Steven Bird (306)

31/05/2018 Bruno Pouliquen (306)

Steven Bird

Professor, Charles Darwin University, Australia

Sparse Transcription: Rethinking the Processing of Unwritten Languages

Steven Bird is researching new methods for documenting and revitalising the thousands of small languages still spoken in the world today. His career began with a BSc and MSc in computer science at Melbourne University, followed by a PhD in computational linguistics from Edinburgh University, completed in 1990. Since then he has worked at the Universities of Edinburgh, Pennsylvania, Melbourne, and Berkeley, and conducted fieldwork in Australia, West Africa, Melanesia, Amazonia, and Central Asia. He is co-author of a popular textbook in computational linguistics, and recently developed a new computer science curriculum for secondary students, which has been adopted in Australian schools. The Aikuma app developed with his students won the grand prize in the Open Source Software World Challenge.

Laurent Besacier

Professor at LIG, GETALP team

The challenge of discovering linguistic units from raw speech

In this seminar, I will present two collective scientific projects [1,2] that occupied me during the year 2017. What do they have in common? Discovering linguistic units from raw speech without any other supervision. Or almost… [1][2]

Marco Dinarelli – 22 March 2018 at 3:15pm

LaTTiCe-CNRS UMR 8094 – visiting LIG-GETALP

Automatic speech understanding and co-reference chain resolution.
In this seminar I will present the main research areas I have worked on: automatic speech understanding and co-reference chain resolution. I will describe the computer systems, in particular machine-learning-based ones, that have been developed to model these problems.
These systems are based on models ranging from probabilistic finite-state automata (FSA/FST) to neural networks, via conditional random fields (CRF), and hold the state of the art on certain tasks.

Emmanuel Morin – 20 March 2018 at 9:30am

Professor at the University of Nantes (LS2N – Laboratoire des Sciences du Numérique de Nantes)
Extraction of bilingual lexicons from comparable specialized corpora: the general language to the rescue of the specialized language
The extraction of bilingual lexicons from corpora was initially based on texts in translation correspondence (i.e. parallel corpora). However, despite the good results obtained, these corpora remain scarce resources, especially for specialized fields and for language pairs not involving English. In this context, research on bilingual lexicon extraction has turned to other corpora composed of texts sharing various characteristics such as domain, genre, period… without being in translation correspondence (i.e. comparable corpora). The extraction of bilingual lexicons from comparable specialized corpora is strongly constrained by the quantity of data that can be mobilized. To overcome this obstacle, one solution is to associate external resources with the specialized corpora. This solution, although intuitive, runs counter to the mainstream view, since many studies support the idea that adding out-of-domain documents to a specialized corpus decreases the quality of the extracted lexicons. In this presentation we will show how general-language corpora can complement specialized-language corpora. We will present different ways of associating these data by exploiting distributional representations based on vector and neural models.
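The classic comparable-corpora pipeline behind this line of work can be sketched in a few lines: build a context vector for a source word, translate its dimensions through a seed bilingual dictionary, and rank target words by vector similarity. This is a generic sketch of the standard approach, not the speaker's implementation; the corpora and seed dictionary below are toy examples.

```python
from collections import Counter
from math import sqrt

def context_vector(word, corpus, window=2):
    """Count co-occurrences of `word` within a +/- window context."""
    vec = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == word:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

def translate_vector(vec, seed_dict):
    """Map source-language context dimensions into the target language
    via a seed bilingual dictionary; untranslatable dimensions are dropped."""
    out = Counter()
    for w, c in vec.items():
        if w in seed_dict:
            out[seed_dict[w]] += c
    return out

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[k] * b[k] for k in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0
```

Translation candidates for a source word are then the target words whose context vectors are closest to the translated source vector.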

Olivier Kraif – 8 March 2018

Laboratory of Linguistics and Didactics of Foreign and Native Languages
Dependency analysis for automatic extraction of recurring patterns

“Patterns” are recurring constructions that may play a role in textual organization and the structuring of discourse. Patterns, as prefabricated constructions, are also characteristic of highly codified textual genres. Identifying these constructions can be useful in different types of NLP application: document classification, machine translation, writing aids, term search, corpus linguistics tools… After clarifying the linguistic concept, we will review different methods dedicated to the automatic identification of patterns: repeated segments or n-grams, itemset patterns, recurring lexico-syntactic trees. We will detail current research directions concerning the use of syntax (dependency analyses) for the discovery and description of certain classes of patterns.
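As a minimal illustration of the simplest of the methods listed above (repeated segments / n-grams), the sketch below counts every n-gram in a token sequence and keeps the recurring ones. The length bounds and frequency threshold are toy assumptions, not values from the talk.

```python
from collections import Counter

def repeated_ngrams(tokens, n_min=2, n_max=4, min_freq=2):
    """Return the n-grams (n_min..n_max tokens long) that occur
    at least min_freq times in the token sequence."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}
```

Syntax-based variants replace the flat token window with subtrees of a dependency parse, which is precisely the research direction the talk announces.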

Moez Ajili – February 8, 2018

Laboratoire d’Informatique d’Avignon
Reliability of voice comparison for forensic applications

In court proceedings, voice recordings are increasingly being presented as evidence. In general, a scientific expert is called upon to establish whether the voice extract in question was produced by a given suspect (prosecution hypothesis) or not (defence hypothesis). This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the Bayesian approach has become the new “gold standard” in forensic science. In this approach, the expert expresses the result of the analysis in the form of a likelihood ratio (LR). This ratio not only indicates which of the hypotheses (“prosecution” or “defence”) it favours, but also quantifies the weight of that support. Although the LR is theoretically sufficient to synthesize the result, in practice it is subject to certain limitations because of its estimation process. This is particularly true when automatic speaker recognition (ASpR) systems are used. These systems produce a score in all situations, without taking into account the specific conditions of the case at hand. Several factors are almost always ignored by the estimation process, such as the quality and quantity of information in the two voice recordings, the consistency of the information between the two recordings, their phonetic content, or the intrinsic characteristics of the speakers. All these factors call into question the reliability of voice comparison in the judicial context. In this thesis, we address this issue within the framework of automatic systems (ASpR) on two main points.
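The Bayesian LR described above can be made concrete with a toy score-based sketch: assuming the ASpR score follows a Gaussian under each hypothesis (hypothetical distributions, standing in for models calibrated on reference data), the LR is simply the ratio of the two densities at the observed score.

```python
from statistics import NormalDist

# Hypothetical score distributions, as would be estimated from calibration data:
# scores of same-speaker trials (prosecution hypothesis) vs
# different-speaker trials (defence hypothesis).
same_speaker = NormalDist(mu=2.0, sigma=1.0)
diff_speaker = NormalDist(mu=-1.0, sigma=1.0)

def likelihood_ratio(score):
    """LR = p(score | same speaker) / p(score | different speakers).
    LR > 1 supports the prosecution hypothesis, LR < 1 the defence."""
    return same_speaker.pdf(score) / diff_speaker.pdf(score)
```

The thesis's point is precisely that this mapping from score to LR ignores case-specific factors (recording quality, phonetic content, speaker characteristics), which motivates the reliability measures presented next.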

The first is to establish a hierarchical scale of phonetic categories of speech sounds according to the amount of speaker-specific information they contain. This study shows the importance of phonetic content: it highlights interesting differences between phonemes and the strong influence of intra-speaker variability. These results were confirmed by a complementary study on oral vowels based on formant parameters, independent of any speaker recognition system.

The second point is to implement an approach to predict the reliability of the LR from the two recordings of a voice comparison, without using an ASpR system. To this end, we defined a homogeneity measure (NHM) capable of estimating the amount of information and the homogeneity of this information between the two recordings considered. Our hypothesis is thus that homogeneity is directly correlated with the degree of reliability of the LR. The results obtained confirmed this hypothesis, with the NHM measure strongly correlated with the LR reliability measure. Our work also revealed significant differences in NHM behaviour between target and non-target comparisons.

Our work has shown that the “brute force” approach (based on a large number of comparisons) is not sufficient to ensure a good assessment of reliability in FVC. Indeed, certain variability factors can induce local system behaviours linked to particular situations. For a better understanding of the FVC approach and/or of an ASpR system, it is necessary to explore the behaviour of the system at as detailed a scale as possible (the devil is in the details).

Paule-Annick Davoine – November 23, 2017

Professor at Université Grenoble Alpes, Pacte laboratory
Cartography and geovisualization for the representation and analysis of spatialized data for digital humanities

More and more disciplines and research areas in the humanities and social sciences, literature, and languages are interested in the spatial dimension of data or sources: in history, for the representation of geo-historical data needed to understand the evolution of territories or the phenomena affecting them; in literature, for mapping the places in novels or the life stories of authors; in linguistics, to apprehend the spatial diffusion of languages or dialects; in geography, for reconstructing the trajectories and movements of individuals from narratives, or for enhancing ancient cartographic documents… All these needs pose new challenges for cartography and geovisualization, which must deal with semi-structured, multidimensional, multi-platform spatialized data defined at a diversity of observation scales, both geographical and temporal, and with varying levels of quality.
The objective of this talk is to present some of the cartographic and geovisualization issues raised by the processing and representation of spatialized data in the digital humanities, based on research projects conducted within the Steamer team.

Patrick Paroubek – 26 October 2017

Research Engineer CNRS (IR1)

Natural Language Processing for the analysis of scientific publications

The theme will be addressed through an analysis of NLP community publications, based on the NLP4NLP corpus, which covers 50 years of publications from the major conferences and journals in text and speech processing, and on biomedical corpora (MIROR project). The NLP contributions addressed here concern the analysis of trends and networks, as well as the detection of plagiarism or “spin” (embellishment) in scientific publications.

Christian Boitet – October 5, 2017

Professor Emeritus at Université Grenoble Alpes, GETALP-LIG
MT Summit seminar

Maximiliano Duran – May 30, 2017

Peruvian Linguist
The unmarked time and suffixation at four levels in Quechua

Pedro Chahuara – 18 May 2017

Researcher at Xerox European Center (XRCE)
Online Mining of Web Publisher RTB Auctions for Revenue Optimization
In the online advertisement market there are two main actors: the publishers, who offer advertising space on their websites, and the advertisers, who compete in an auction to show their advertisements in the available spaces. When a user accesses a website, an auction starts for each ad space: the user's profile is given to the advertisers and they each bid to show an ad to that user. The publisher sets a reserve price, the minimum value at which they accept to sell the space.
In this talk I will introduce a general setting for this ad market and present an engine to optimize publisher revenue from second-price auctions, which are widely used to sell online ad spaces in a mechanism called real-time bidding. The engine is fed with a stream of auctions in a time-varying environment (non-stationary bid distributions, new items to sell, etc.) and predicts in real time the optimal reserve price for each auction. This problem is crucial for web publishers, because setting an appropriate reserve price on each auction can significantly increase their revenue.
I consider here a realistic setting where the only available information consists of a user identifier and an ad placement identifier. Once the auction has taken place, we can observe censored outcomes: if the auction has been won (i.e. the reserve price is smaller than the first bid), we observe the first bid and the closing price of the auction; otherwise we do not observe any bid value.
The proposed approach combines two key components: (i) a non-parametric regression model of auction revenue based on dynamic, time-weighted matrix factorization, which implicitly builds adaptive user and placement profiles; (ii) a non-parametric model to estimate the revenue under censorship, based on an online extension of Aalen’s additive model.
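The revenue trade-off at the heart of the talk (a higher reserve raises the price when the auction is won, but loses more auctions outright) can be sketched as follows. The grid search over candidate reserves is a deliberate simplification of the engine's online prediction, just to make the mechanics concrete.

```python
def auction_revenue(bids, reserve):
    """Publisher revenue of one second-price auction with a reserve price.

    The winner pays max(second-highest bid, reserve); if even the
    highest bid is below the reserve, the placement goes unsold
    (revenue 0 and, as in the talk, only censored feedback is observed).
    """
    if not bids or max(bids) < reserve:
        return 0.0
    ranked = sorted(bids, reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return max(second, reserve)

def best_reserve(history, candidates):
    """Pick the candidate reserve maximizing empirical revenue
    over a history of past bid vectors (offline simplification)."""
    return max(candidates, key=lambda r: sum(auction_revenue(b, r) for b in history))
```

With bids [5, 3], a reserve of 4 beats both a timid reserve of 2 (which only collects the second price, 3) and a greedy reserve of 6 (which loses the auction), illustrating why reserve-price prediction matters.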

Jean-Pierre Chevrot – March 2, 2017

Professor at Université Grenoble Alpes
Laboratoire de l’Informatique du Parallélisme, Institut rhône-alpin des systèmes complexes, ENS Lyon
Lidilem laboratory, Université Grenoble Alpes
Language acquisition and sociolinguistic uses: social, cognitive and network
Bringing together cognitive and social approaches is often presented as a desirable goal for better understanding the language acquisition process (Hulstijn et al., 2014). However, the question remains how to translate this program into actual research practice.

Although cognitive and social approaches are based on different traditions, attempts to combine the two perspectives in language acquisition research may benefit similar undertakings in other fields, such as social cognition, cognitive sociology, cognitive sociolinguistics, social neurosciences, etc. An examination of these interdisciplinary attempts leads to the identification of three ways of combining the social and the cognitive: the social approach to cognition, the cognitive approach to the social, and the so-called complex individualism approach (Kaufmann and Clément, 2011; Chevrot, Drager & Foulkes, in preparation; Dupuy, 2004).

Of these options, only the latter does not favour either the social and collective level or the cognitive and individual level (Dupuy, 2004). Instead, it emphasizes the interaction and bi-directional causality between them. In this perspective, individuals with specific social and cognitive characteristics interact with each other within general social and cognitive constraints. Individual characteristics may change as a result of interactions between individuals and these changes may in turn change general constraints (Hruschka et al. 2009). In this context, the acquisition of language and its use can be considered as the results of reciprocal influences diffusing in a network of relationships.

We will present projects that can implement this framework, including the DyLNet project – Language Dynamics, Linguistic Learning, and Sociability at Preschool: Benefits of Wireless Proximity Sensors in Collecting Big Data (Nardy, 2017).
Chevrot, J.P., Drager, K. & Foulkes, P. (in preparation). Sociolinguistic Variation and Cognitive Science.

Dupuy, J.-P. (2004). Vers l’unité des sciences sociales autour de l’individualisme méthodologique complexe. Revue du MAUSS, 24(2), 310-328.

Hruschka, D. J., Christiansen, M. H., Blythe, R. A., Croft, W., Heggarty, P., Mufwene, S. S., Pierrehumbert, J. B., & Poplack, S. (2009). Building social cognitive models of language change. Trends in Cognitive Sciences, 13(11), 464–469.

Hulstijn, J. H., Young, R. F., Ortega, L., Bigelow, M., DeKeyser, R., Ellis, N. C., Lantolf, J. P., Mackey, A., Talmy, S. (2014). Bridging the Gap. Studies in Second Language Acquisition, 36(03), 361–421.

Kaufmann, L., & Clément, F. (2011). L’esprit des sociétés. Bilan et perspectives en sociologie cognitive. In L. Kaufmann & F. Clément, La sociologie cognitive, Ophrys (pp. 7–40).

Nardy (2017). DyLNet Project – Language Dynamics, Linguistic Learning, and Sociability at Preschool: Benefits of Wireless Proximity Sensors in Collecting Big Data.

Michael Zock – January 12, 2017

Research Director CNRS at the Laboratoire d’Informatique Fondamentale (LIF), TALEP group in Aix-Marseille University

If all roads lead to Rome, they are not all equal. The problem of lexical access in production

Everyone has encountered the following problem: you are looking for a word (or the name of a person) that you know, without being able to access it in time. The work of psychologists has shown that people in this cognitive state know a great deal about the word being searched for (meaning, number of syllables, origin, etc.), and that the words with which they confuse it resemble it strangely (initial letter, syntactic category, semantic field, etc.). My (long-term) goal is to build a program that takes advantage of this state of affairs to help a speaker or writer (re)find the word they have on the tip of their tongue. To this end, I plan to add to an existing electronic dictionary an association index (collocations encountered in a large corpus). In other words, I propose to build a dictionary similar to that of human beings which, in addition to conventional information (definition, written form, grammatical information), would contain links (associations) making it possible to navigate between ideas (concepts) and their expressions (words). Such a dictionary would therefore allow access to the information sought either by form (lexical level: analysis), by meaning (concepts: production), or by both.
The objective of this presentation is to show how to build such a resource, how to use it, what difficulties its construction raises, and what possibilities it offers.
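A toy sketch of the association-index lookup described above: the dictionary fragment and its weights below are entirely hypothetical, standing in for collocation counts from a large corpus. Given cue words the speaker can produce, candidate target words are ranked by total association strength.

```python
# Hypothetical association index: each dictionary entry points to
# associated words/concepts with a corpus-derived co-occurrence weight.
associations = {
    "harvest": {"grape": 3, "wine": 5, "autumn": 2},
    "grape":   {"wine": 7, "vine": 4},
    "vintage": {"wine": 6, "grape": 3, "year": 3},
}

def candidates_from_cues(cues):
    """Rank dictionary entries by summed association weight with the cues,
    mimicking navigation from accessible ideas to the sought word."""
    scores = {}
    for word, assoc in associations.items():
        s = sum(assoc.get(c, 0) for c in cues)
        if s:
            scores[word] = s
    return sorted(scores, key=scores.get, reverse=True)
```

A speaker blocked on "vintage" who can only produce "wine" and "grape" would see the target ranked first, which is the tip-of-the-tongue scenario the talk targets.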

Lorraine Goeuriot – December 1, 2016

Lecturer at Univ. Grenoble Alpes in the MRIM team of the Laboratoire d’informatique de Grenoble

Medical Information Retrieval and its evaluation: an overview of CLEF eHealth evaluation task

In this talk, I will introduce my research activities in the field of medical information retrieval, and in particular its evaluation.
The use of the Web as a source of health-related information is a widespread phenomenon, and laypeople often have difficulty finding relevant documents. The goal of the CLEF eHealth evaluation challenge is to provide researchers with datasets to improve consumer health search. I will first introduce the task and the datasets built. Then I will describe some experiments and results obtained on this dataset.
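Evaluation campaigns of this kind typically score each system's ranked result list against relevance judgments. As a generic sketch (the metrics are standard IR measures, not necessarily the exact ones used in CLEF eHealth; the document IDs are invented), two common measures look like this:

```python
def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked_docs[:k]
    return sum(1 for d in top if d in relevant) / k

def reciprocal_rank(ranked_docs, relevant):
    """1/rank of the first relevant document, or 0 if none is retrieved."""
    for i, d in enumerate(ranked_docs, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0
```

Averaging such per-query scores over a shared topic set is what lets the datasets mentioned in the talk compare consumer health search systems.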

Fabien Ringeval – October 20, 2016

Lecturer at Univ. Grenoble Alpes in the GETALP team of the Laboratoire d’informatique de Grenoble

Towards the automatic recognition of ecological emotions

Automatic emotion recognition technologies have gained increasing attention in the last decade at both the academic and industrial levels, since they have found many applications in fields as varied as health, education, video games, advertising, and social robotics. Although good performance is reported in the literature for acted emotions, the automatic recognition of spontaneous emotions, as expressed in everyday life, remains an unresolved challenge: these emotions are subtle, and their expression, like their meaning, varies greatly according to many speaker parameters, such as age and gender, but also personality, social role, language, and culture. In this presentation, I will describe current methodologies for affective data acquisition and annotation, and present the latest advances in the automatic recognition of emotions from the speech signal.