Unless otherwise stated, seminars take place on Thursdays at 2 pm in the IMAG building.

List of Upcoming Seminars / Team Seminars

2024-2025 Season

Alice Millour (Université Paris 8), October 30th, Room 306, IMAG building

Variation(s): Producing Resources for Evaluation in NLP

Through several examples, I will discuss why diverse evaluation resources matter for assessing variation and system performance in NLP.
On a task such as named entity recognition in French, most systems show a performance bias that depends on the annotated "text genre". Although genre appears to be the source of the performance variation, it does not explain it from a linguistic point of view. How does variation manifest itself, and how does it concretely affect systems? To answer this, we propose a study of variation based on its modeling as linguistic features, as proposed by Douglas Biber.
With a second example, the automatic transcription of Breton, we will look at the problem from the angle of the cost of producing linguistic resources. What strategy should be adopted when choosing which resources to produce (or have produced) for low-resource languages that show strong variation?

Ikram Belmadani (LIS), June 18th, Room 306, IMAG building

Adapting Medical Knowledge to Large Language Models for Question Answering: Strategies and Comparative Analysis

Adapting large language models (LLMs) to specialized domains, such as the medical domain in French, raises important methodological questions. While domain-adaptive pre-training (DAPT) is being called into question in the English-speaking context, a comparative evaluation of three strategies, namely continued pre-training (CPT), supervised fine-tuning (SFT), and a combination of the two (CPT followed by SFT), makes it possible to analyze its relevance in the French-speaking context. Results on several biomedical question-answering datasets show that adapting a general-purpose model to the medical domain yields significant gains, whereas fine-tuning already specialized models brings more limited benefits. The hybrid CPT+SFT approach offers the best overall performance, but SFT alone remains competitive while requiring fewer resources. These results provide concrete guidance for adapting LLMs to specialized domains in low-resource languages.

Emre Kiciman (Microsoft Research), June 5th, Online

A New Frontier at the Intersection of Causality and LLMs

Correct causal reasoning requires domain knowledge beyond observed data. Consequently, the first step to correctly frame and answer cause-and-effect questions in medicine, science, law, and engineering requires working closely with domain experts and capturing their (human) understanding of system dynamics and mechanisms. This is a labor-intensive practice, limited by expert availability, and a significant bottleneck to widespread application of causal methods.

In this talk, we will delve into the causal capabilities of large language models (LLMs), discussing recent studies and benchmarks of their ability to retrieve and apply causal knowledge, as well as the limitations of their causal reasoning capabilities. Most notably, LLMs present the first instance of general-purpose assistance for constructing causal arguments, including generating causal graphical models and identifying contextual information from natural language. This promises to reduce the necessary human effort and error in end-to-end causal inference and reasoning, broadening their practical usage. Ultimately, by capturing common sense and domain knowledge, we believe LLMs are a catalyst for a new frontier facilitating translation between real world scenarios and causal questions, and formal and data-driven methods to answer them.

Emmanuel Schang and William Havard (Laboratoire Ligérien de Linguistique, Orléans), June 5th, Amphi MJK

As part of the ANR CREAM project, we explore the challenges of documenting creole languages, which are often marginalized in digital resources. We present experiments on automatic speech recognition (ASR) for Haitian Creole (kreyòl ayisyen), as well as measures of linguistic distance between Mauritian Creole (morisien), French, and English, in order to better understand the dynamics of proximity and divergence between these languages.

Tsuneo Kato (Doshisha University, Japan), April 9th, Room 306, IMAG building

Retrieval-Augmented Generation based on Knowledge Graph (KG-RAG) for Providing Accurate Information to Dialogue Systems.

Advances in large language models (LLMs) make it easier to develop interactive information systems. However, LLMs are prone to hallucination, especially about information internal to an organization and information that is frequently updated. We prototyped an interactive information system (a text-chat system) for Doshisha University, focusing on bilingual support and on the correctness of the information supplied by retrieval-augmented generation over a knowledge graph (KG-RAG). We built a knowledge graph (KG) in a Neo4j database and developed a chat system that uses the KG and the OpenAI API for GPTs. By registering text information and its embedding vector in the nodes, the KG enables three types of search: KG search, full-text search, and vector search. The chat system calls the API for GPTs four times: 1) preprocessing, i.e., translating English questions into Japanese and converting colloquial Japanese expressions into formal Japanese text; 2) conversion from natural language to Cypher queries for KG search; 3) response generation based on KG-RAG; and 4) suggestion of a follow-up question to continue the dialogue. We further developed a smart response mechanism for inaccurate user inputs, such as those containing wrong proper nouns. We quantitatively evaluated the system from two points of view: 1) the general accuracy of the information provided by the responses in the two languages and 2) the accuracy of responses to inaccurate inputs. Adding the first LLM call for preprocessing significantly improved general accuracy, and the smart response mechanism significantly improved the accuracy of responses to inaccurate inputs.
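The four-call flow described in this abstract can be sketched roughly as follows. This is only an illustration, not the speaker's implementation: the function `answer`, the prompt wording, and the stand-ins `fake_llm` and `fake_graph` are all hypothetical, and a real system would replace them with actual model calls and a Neo4j query.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt to a completion.
LLM = Callable[[str], str]
# Hypothetical type: any function running a Cypher query and returning text rows.
GraphSearch = Callable[[str], List[str]]


def answer(question: str, llm: LLM, run_cypher: GraphSearch) -> Dict[str, str]:
    """Chain the four LLM calls listed in the abstract (prompts are illustrative)."""
    # 1) Preprocessing: translate to Japanese and normalize colloquial wording.
    normalized = llm(f"Rewrite as formal Japanese: {question}")
    # 2) Natural language to Cypher for the knowledge-graph search.
    cypher = llm(f"Write a Cypher query answering: {normalized}")
    facts = run_cypher(cypher)
    # 3) Response generation grounded in the retrieved facts (KG-RAG).
    context = "\n".join(facts)
    response = llm(f"Answer using only these facts:\n{context}\nQuestion: {normalized}")
    # 4) Suggest a follow-up question to keep the dialogue going.
    follow_up = llm(f"Suggest a next question after: {response}")
    return {"cypher": cypher, "response": response, "follow_up": follow_up}


# Stand-ins so the sketch runs without a model or a database.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"reply-{len(calls)}"


def fake_graph(query: str) -> List[str]:
    return ["Library: opens at 9:00"]


result = answer("When does the library open?", fake_llm, fake_graph)
```

Passing the model and the graph search in as plain callables keeps the control flow testable without network access; it is a design choice of this sketch, not something stated in the abstract.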

Arnault Chatelain (CREST), February 21st, Room 306, IMAG building

No Such Thing as Bad Publicity? The Effect of Sex Scandals on Artists' Commercial Success

This paper examines whether music consumers turn away from artists following public accusations of sexist and sexual violence (VSS, from the French "violences sexistes et sexuelles"). Using data on the recorded music market in France from 2006 to 2022, together with a list of artists publicly accused of VSS, built with regular expressions and named entity detection and then manually validated, the analysis will focus on four case studies. The effects on album sales and online listening (streaming) will be discussed. The presentation will mainly discuss another data source whose analysis could complement this work: YouTube comments.

Angela Potochnik (University of Cincinnati), February 6th, Online

A presentation that will make us take a step back and reflect on the foundations of “Science”. While we practice science on a daily basis, we rarely pause to consider the principles that make it work. In particular, fields like machine learning and explainable AI often aim, explicitly or implicitly, to automate the discovery of scientific explanations. However, many of these approaches are built on a superficial understanding of what truly makes science work. This seminar will be an opportunity for all of us to reflect on these questions.

Benoit Favre (Aix-Marseille Université), January 16th, Room 306, IMAG building

What do multimodal LLMs learn?

Instruct-tuned language models have changed the way we interact with computers through language. A lot of work is now devoted to outlining the envelope of their capabilities and their limitations. In this talk, I will describe my attempt to tackle a subset of this problem in the context of multimodal learning. The first line of work I will present is devoted to understanding how vision-language models (VLMs) represent fine-grained relationships between objects. Previously, we had shown that VLMs trained on image-text matching fail to extract reliable relationships between objects. I will report on an updated set of experiments to see how those capabilities have evolved in recent instruction-tuned multimodal models. The second line of work aims at better understanding the space of tasks that one may ask of an LLM, in order to select better training mixes and adequately evaluate them. I will discuss past work where we tried to decompose the task space into unitary skills from an introspective and benchmark-availability perspective, and more recent work that addresses this problem from a statistical point of view, making it possible to empirically build a hierarchy of task-solving capabilities. Finally, I will describe my ongoing project on building foundation models of multimodal conversation. Drawing from video generation techniques, this project aims at assessing the representation capabilities of models able to continue a human-human conversation, in both the audio and video modalities. I will describe a framework for studying those capabilities by probing the underlying representations on various tasks. This project has applications in studying human conversational behavior, as well as in simulating this behavior in the context of robotic interactions.

Amelie Rochet-Capellan (Gipsa-Lab), December 12th, Room 306, IMAG building

How to Observe and Describe Communication in Individuals with Severe to Profound Disabilities?

Individuals with severe to profound disabilities (SPD) represent a particularly vulnerable and often marginalized minority. They were long considered non-communicative, but recent advancements in legislation concerning the rights of people with disabilities, especially the right to communication and inclusion, compel us to reconsider their communicative potential. However, due to their historical exclusion from communication research and (re)educational programs, we lack methods and models to describe, understand, and assess their communication.
Over the past two decades, clinical research has begun addressing this gap, but it often progresses independently of fundamental research on communication, rarely integrating its theoretical and methodological advancements. Moreover, the growing development and use of augmentative and alternative communication (AAC) tools, such as communication binders or software with voice synthesis using pictogram boards, suggest that individuals with SPD might achieve symbolic communication far more sophisticated than previously thought—or than is still commonly believed. To better understand these capacities, it is essential to engage in meticulous observation and reconsider existing concepts and methods, or even invent new ones.
In this context, the ParticipAACtion research project proposes an approach at the intersection of fundamental research, clinical research, and the expertise of professional or family caregivers. The objective is to integrate these perspectives to establish inclusive standards for describing behaviors and to create a “dictionary” of communicative behaviors of individuals with SPD, including their use of AAC. Our methodology is based on detailed observation and manual annotations of individuals’ behaviors in interactive situations. However, this approach faces significant time constraints, highlighting the need to develop automated methods, potentially leveraging artificial intelligence (AI), to accelerate and enhance these analyses.

Saskia Laora Schröer (University of Liechtenstein), October 10th, Room 306, IMAG building

The cyber threat landscape is constantly evolving, with new tools and technological advancements changing how adversaries conduct cyberattacks. While Artificial Intelligence (AI) has the potential to offer advantages to our society, adversaries are not asleep at the wheel. There is growing evidence that AI is also being used for offensive purposes. In this talk, we explore the innovation potential of hackers, focusing on the specific case of (i) how hackers can leverage AI and (ii) how we (as researchers and defenders) can gain insights into the operations of cybercriminals. For the latter, we discuss the purpose and context of cyber threat intelligence extraction from various data sources, such as social network data, underground forums (from the clearnet and darknet), cyber-crime-related chat channels, and darknet websites. In this regard, I will take a deep dive into the role of NLP, and specifically NLU, in cyber threat intelligence extraction, and discuss the challenges and open problems in the context of cybersecurity.

2023-2024 Season

Wei Zhao (University of Aberdeen, Scotland & University of Heidelberg, Germany), Wednesday, June 19th, 2024, 4 pm, Room 406, IMAG building, and online

A Pathway from Language Change to Computational Lexicography

Language change over time has been researched for decades by historical linguists and, more recently, by computational linguists. Computational modeling of language change enables cross-language comparison of multifaceted changes at a large scale. In this talk, I will start by presenting a method to capture syntactic changes and investigate how similar these changes are in German and English over the past hundred years. Additionally, I will present a method to capture semantic changes, particularly to detect the gained or lost meanings with low frequency over time. I will also showcase the use of our method as a visualization tool to compare cross-language semantic changes in German and English. Lastly, I will present a pathway from computational modeling of language change to computational lexicography by comparing detected word meanings with dictionary sense inventories to identify unrecorded meanings in dictionaries.

Kristina Gligorić (Stanford University), Thursday, May 2nd, 2024, 4:30 pm, online

Large language models for enabling constructive online conversations

NLP systems promise to disrupt society through applications in high-stakes social domains. However, current evaluation and development focus on tasks that are not grounded in specific societal implications, which can lead to societal harm. There is a need to evaluate and mitigate the societal harms and, in doing so, bridge the gap between the realities of application and how models are currently developed.
In this talk, I will present recent work addressing these issues in the domain of online content moderation. In the first part, I will discuss online content moderation to enable constructive conversations about race. Content moderation practices on social media risk silencing the voices of historically marginalized groups. We find that both the most recent models and humans disproportionately flag posts in which users share personal experiences of racism. Not only does this censorship hinder the potential of social media to give voice to marginalized communities, but we also find that witnessing such censorship exacerbates feelings of isolation. We offer a path to reduce censorship through a psychologically informed reframing of moderation guidelines. These findings reveal how automated content moderation practices can help or hinder this effort in an increasingly diverse nation where online interactions are commonplace.
In the second part, I will discuss how identified biases in models can be traced to the use-mention distinction, which is the difference between the use of words to convey a speaker’s intent and mention of words for quoting what someone said or pointing out properties of a word. Computationally modeling the use-mention distinction is crucial for enabling counterspeech to hate and misinformation. Counterspeech that refutes problematic content mentions harmful language but is not harmful itself. We show that even recent language models fail at distinguishing use from mention and that this failure propagates to downstream tasks. We introduce prompting mitigations that teach the use-mention distinction and show that they reduce these errors.
Finally, I will discuss the big picture and other recent efforts to address these issues in different domains beyond content moderation, including education, emotional support, and public discourse about AI. I will reflect on how, by doing so, we can minimize the harms and develop and apply NLP systems for social good.

Karën Fort (Sorbonne Université), Thursday, March 7th, 2024, 10 am, online + Room 406

How Ethics can help us build better systems: Towards a comprehensive evaluation of stereotyped biases in LLMs

In the past 10 years, ethics has become a major topic in NLP, both in terms of community management, with ethics committees for most international conferences, and in research work, with specialized tracks in the most renowned conferences and a very active field of research. I’ll present here my latest work on ethics in NLP, focusing on the evaluation of stereotyped biases in LLMs. I’ll show why we need to improve our models and how ethics can help us do so. I’ll present new resources and tools that we developed to evaluate LLMs, especially adapted for languages other than English. I encourage all the attendees to come with their concerns and questions, so that this time is devoted to open discussion.

Andon Tchechmedjiev (IMT Mines Alès), Thursday, February 15th, 2024, 2 pm, online + Room 306

Scientific Knowledge Graphs in All Their Forms: From Construction to Factuality Verification

Scientific knowledge graphs (SKGs), that is, knowledge graphs representing scientific articles, their metadata, their content, and structured knowledge extracted from that content, are growing at a remarkable pace. In France, both technical initiatives (the HAL knowledge graph) and scientific ones (numerous projects in medical informatics, Covid-on-the-web, ISSA) are multiplying, closely following a trend already well established at the European level. SKGs can help institutions evaluate and analyze research output, help decision-makers steer public research policy, and help scientists discover emerging trends and conduct large-scale systematic reviews. Moreover, with the emergence of generative AI models and increasingly sophisticated falsification techniques, on top of an already very large number of fraudulent publications, the use of SKGs together with computational fact-checking techniques is a key tool in fighting this scourge.
This talk will first cover the basics of knowledge graphs and the semantic web, followed by the specific features of SKGs, the techniques used to build them, and a review of existing initiatives and use cases. We will then focus specifically on the potential use of knowledge graphs (KGs), and SKGs in particular, together with computational verification models to identify erroneous or counterfactual information.

Maxime Peyrard (LIG, UGA), Thursday, December 7th, 2023, 2 pm, online + Room 306

Towards Understanding, Interpreting, and Using Language Models

As a new GetAlp member, I’ll use this presentation to provide a shallow but broad overview of my recent research interests: (i) causal machine learning, a sub-field of ML aiming to understand and rectify issues arising from traditional statistical approaches; (ii) LLM interpretability, which I will argue should be rooted in causal reasoning, the rigorous framework to answer “why” questions; (iii) introducing Thought-Decoding, an emerging paradigm viewing LLMs as producers of semantically meaningful messages (thoughts), where the goal is to use LLMs in interactions with other AI systems, tools, or humans in control flows to iteratively refine and improve the thoughts until a robust answer is found.

Nicolas Ballier (Université Paris Cité), Thursday, November 16th, 2023, 2 pm, online + Room 306

Using Whisper for translation and transcription: an XAI roadmap for audio multilingual LLMs

Whisper (Radford, 2023) is a multilingual audio Large Language Model (LLM) developed by OpenAI and trained on 680,000 hours of labeled audio data to perform transcription and translation into English (as far as the free models released on Hugging Face are concerned) for 96 languages. This talk will present results and current research based on a customised version of Whisper’s C++ implementation (Gerganov, 2022) developed at Université Paris Cité (Younès 2023). We were able to probe the LLM outputs for French, English and Persian and investigate Whisper’s dictionary of subtokens.
First, reporting on a paper presented at SummitMT, I will present the results of a comparison of the six models designed to translate from French and Persian into English (Ballier et al., 2023).
Second, I will show that the transcriptions in English produced by the multilingual medium model are more robust to non-native speech variability than the tiny model transcriptions. In turn, the tiny model transcriptions provide plausible correlates of non-native pronunciation errors (Ballier et al., submitted). The audio LLM might be used as a plausible simulation of misunderstanding scenarios with native speakers. Using the probability of the LLM prediction, we can try to score non-native productions (Ballier et al., in prep).
Then, presenting ongoing experiments with the translation and transcription of Persian data, I will showcase some architectural biases of the multilingual model: hallucinations can be observed in the transcriptions of Persian that can be explained by the absence of some graphemes of Persian in the vocabulary of subtokens. I will also present some of the American-centered biases of the Whisper models for English and discuss some of the consequences for educational uses of these models.
More generally, I will discuss the reverse engineering strategy that we have adopted. With great help from data scientists and phoneticians, we have created scripts to map back the LLM predictions onto the speech signal. We aim to contribute to XAI (Explainable Artificial Intelligence) by trying to match the out of vocabulary subtokens produced by Whisper to discrete prosodic events. We also try to model how some of the LLM transcriptions may or may not correlate with human perception tests.

2022-2023 Season

Rachel Bawden (Inria, Paris), Tuesday, March 28th, 2023, 10:30 am, online + Room 306

From Linguistic to Visual Context in Machine Translation

From the very beginning of natural language processing, one of the most important issues has been ambiguity, when words, phrases or sentences have multiple meanings. For Machine Translation (MT), ambiguity is an issue when those meanings result in different translations. Is French trombone a paperclip or a musical instrument? If the fans are out of order, is there a problem at the football match (supporteurs) or a ventilation problem (ventilateurs)? Traditionally, MT was carried out sentence by sentence, and such sentences could lack the context necessary to distinguish between correct and incorrect translations. Since then, contextual MT has been growing in popularity, with the inclusion of both linguistic and extra-linguistic context. New evaluation strategies have also been developed to target context-dependent phenomena, since the usual metrics are ill-adapted. In this talk, I will start by introducing contextual MT using linguistic context, including some of my earlier work on evaluation and more recent work on the evaluation of the multilingual language model BLOOM. I will then describe recent work on the inclusion of visual context for disambiguation in MT, including a new test set and an adapted multimodal MT approach.

Rodrigo Wilkens (Université catholique de Louvain), Thursday, January 23rd, 2 pm, online + Room 306

Improving the Accuracy of Readability Tools: Introducing FABRA and Dmesure for Annotating and Assessing the Readability of French Texts

Assessing reading skills is a question addressed in the field of readability. Readability formulas developed over the 20th century, moving from manual calculation to automation through natural language processing (NLP) and machine learning techniques. This paradigm shift has improved the accuracy of readability models and facilitated applications in other areas, such as the assessment of written production and automatic text simplification. Several tools are available for English to help researchers compute textual features automatically. However, similar tools for French are rare. This workshop will present FABRA, a text annotation tool that addresses this challenge by automatically obtaining and computing more than 5000 relevant textual features related to readability. Then, a tool (Dmesure) that combines linguistic knowledge and deep learning to support and assess the readability of a written text will be presented and demonstrated. The talk will also explore the challenge of combining linguistic knowledge and deep learning, presenting new ways to improve the accuracy of readability tools.

Martin Rodrigue A. Ongolo & Yannick Yomie Nzeuhang (IDASCO, Université de Yaoundé 1 (UY1)), Thursday, January 12th, 3 pm, online + Room 306

Martin Rodrigue A. Ongolo: Multilingual Embedding Based on Monolingual Mapping for Neural Machine Translation of Low-Resource Languages

Low-resource languages pose a performance problem for neural machine translation because data for them is scarce. In this work, we propose to use multilingual embeddings based on monolingual mapping as a method of representing words as input to a neural machine translation system. This approach is then compared with the monolingual word representation approaches used in the literature for neural machine translation. The results we obtain with multilingual embeddings on a dataset of 7942 French-Ewondo parallel sentences are promising: they are better than those obtained with the representation approaches used so far, in particular the one-hot and word embedding methods. We also highlight the phenomenon of agglutination, which is characteristic of several African languages, Cameroonian ones in particular, and propose as a solution a grouping of words before the tokenization stage.

Yannick Yomie Nzeuhang: Acoustic modeling by self-supervised learning for low-resource languages

Acoustic modeling is the task of determining a mapping function between an audio signal and the corresponding linguistic unit transcription. Initially a distinct part of a global speech recognition architecture (Hidden Markov Model), the acoustic model has been merged into the recent neural network-based architectures. We present the main lines of development of this research, with the aim of finding an exploitable path for low-resource languages.

Nicolas Ballier (Université Paris Cité) & Laurianne Sitbon (Queensland University of Technology (QUT)), Thursday, December 8th, 1 pm, online + Room 306

Nicolas Ballier: Ongoing Work on Neural Machine Translation

First, I will present some results from the Neuroviz project on the analysis of information flows and their effects on gender bias in French/English translation, in particular those related to the subtokenization phase.

Next, I will present the first results of ongoing work on translating French dislocations into English. I will show that systems still duplicate subjects in the English translations as soon as one departs from the canonical form of left dislocation that is very frequent in corpora (of the type “X, c’est …”), particularly in the case of multiple dislocations. I will analyze translations of authentic examples taken from the Corpus de Français Parlé Parisien (CFPP) and show the limits of the systems tested (DeepL, Google Translate, MBART-50). I will detail an ongoing analysis that aims to operationalize the distance to the dislocated element. If the variable to be predicted is translation success (at least in terms of syntactic adequacy and grammaticality of the translations produced), I will examine the variables likely to account for it, whether more classical qualitative variables from corpus linguistics or the distance between the dislocated element and the predicate of the main clause. I will examine how distance is represented in surface structure for the linguist (number of tokens, number of characters), but also in the machine's actual input (number of subtokens after byte pair encoding preprocessing).
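As a rough illustration of the three ways of counting distance mentioned here (characters, tokens, subtokens), the sketch below measures the material between a dislocated element and the main predicate. It rests on several assumptions: the example sentence is made up (it is not from the CFPP), the function names are hypothetical, and `toy_subtokens` is a deliberately crude stand-in for byte pair encoding that simply cuts words into fixed-size chunks.

```python
import re
from typing import Callable, Dict, List


def toy_subtokens(word: str, size: int = 3) -> List[str]:
    """Hypothetical stand-in for BPE: cut a word into fixed-size chunks."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def dislocation_distance(
    sentence: str,
    dislocated: str,
    predicate: str,
    subtokenize: Callable[[str], List[str]] = toy_subtokens,
) -> Dict[str, int]:
    """Count what lies between the dislocated element and the predicate."""
    # The gap starts right after the dislocated element...
    start = sentence.index(dislocated) + len(dislocated)
    # ...and ends where the predicate begins.
    end = sentence.index(predicate, start)
    gap = sentence[start:end]
    # Surface tokens: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", gap)
    # Subtokens: whatever the supplied segmenter produces for each token.
    subtokens = [piece for token in tokens for piece in subtokenize(token)]
    return {
        "characters": len(gap.strip()),
        "tokens": len(tokens),
        "subtokens": len(subtokens),
    }


# Made-up example with two left-dislocated elements before the predicate.
distance = dislocation_distance(
    "Mon frère, sa voiture, elle est rouge", "Mon frère", "est"
)
```

To get counts closer to what a translation system actually receives, the `subtokenize` argument could be replaced by the segmenter of the model under study.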

Finally, I will set out the ideas underlying the MAKE-NMTViz project and spell out what is expected from such an approach: visualizing in order to understand. I will present visualization systems that exist or have existed, and outline what we would like to visualize at the different stages of neural machine translation.

Joint work with Behnoosh Namdarzareh, Guillaume Wisniewski, Jean-Baptiste Yunès, François Yvon, Maria Zimina-Poirot, and Lichao Zhu.

Laurianne Sitbon: Co-designing intelligent systems that can enrich inclusive pictorial communication

Bridging the gap between advances in artificial intelligence and assistive technologies, I will discuss why and how we can co-design pictorial visual communication applications with people with intellectual disability or autism and community members. There is little research that explores the opportunities that multi-modal conversational systems present for people who are not most comfortable communicating linguistically, such as people with intellectual disability. Yet, access to online materials offers unlimited resources to support communication and connection with others. I will present my team’s research on how computers can facilitate rich pictorial communication environments, thereby supporting self-expression and inclusion.

Salima Mdhaffar (LIA, Avignon), Thursday, December 1st, 3 pm, online + Room 406

End-to-end model for named entity recognition from speech without paired training data

Recent work has shown that end-to-end neural approaches are becoming very popular for spoken language understanding (SLU). By end-to-end, we mean the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with semantic annotation. In this work, we propose an approach to build an end-to-end neural model that extracts semantic information in a scenario in which no paired audio data is available. Our approach is based on the use of an external model trained to generate a sequence of vectorial representations from text. These representations mimic the hidden representations that could be generated inside an end-to-end automatic speech recognition (ASR) model by processing a speech signal. An SLU neural module is then trained using these representations as input and the annotated text as output. Finally, the SLU module replaces the top layers of the ASR model to complete the end-to-end model. Our experiments on named entity recognition, carried out on the QUAERO corpus, show that this approach is very promising, giving better results than a comparable cascade approach or the use of synthetic voices.

Amira Barhoumi (LIG, SIGMA) & Martin Lentschat (LIG, SIGMA), September 15th, 2 pm, videoconference + Room 306

Amira Barhoumi : Sentiment analysis: document representation, machine learning methods and corpora

Sentiment Analysis (SA) is a growing field in both industry and academia. It consists of determining the polarity (or sentiment) of a reviewer towards an entity or specific entity aspects in reviews. I will present an overview of the SA field. Firstly, I will show a specific protocol to fix the length of reviews. Secondly, I will present some deep learning methods and highlight how to take into account the specificities of the language in which reviews are written (Arabic language as a use case). Then, I will discuss some limitations of available corpora. Finally, I will present how to explore citation polarities in scientific papers within the framework of the NanoBubbles project.

Martin Lentschat : n-Ary relations extraction and claims in scientific documents

The automatic extraction of knowledge in scientific documents is an important step that could help in the cross-checking, diffusion and exploitation of research data. However, this raises a series of questions related to the language used in each domain, the targeting of specific information, and the representation of knowledge. I will present the challenges specific to those tasks in experimental fields, and a method that uses semantic and lexical resources to extract and represent knowledge related to packaging permeability as n-Ary relations. I will then discuss how it can be transferred to a different domain and my future contributions to the NanoBubbles project.

Season 2021-2022

Eric Le Ferrand (Northern Institute, Charles Darwin University, Australia / LIG, Grenoble), June 16th, videoconference + Room 406

A consistent theme in recent NLP research has been doing more with less. In research on low-resource languages, it is popular to describe new pipelines to solve a wide variety of tasks (POS tagging, ASR, translation, etc.). Yet Indigenous communities around the world are not homogeneous: they have different needs, priorities, and challenges that do not always fit a one-size-fits-all paradigm. In this talk, I will present my journey as an NLP scientist working at the intersection of speech processing and Indigenous research to enable a community-based language documentation framework.

Steven Bird (Northern Institute, Charles Darwin University, Australia), June 2nd, 1:30 pm, videoconference + ground-floor meeting room, bât. IMAG

Local languages, third spaces, and other high-resource scenarios

How can language technology address the diverse situations of the world’s languages? In one view, languages exist on a resource continuum and the challenge is to scale existing solutions, bringing under-resourced languages into the high-resource world. In another view, presented here, the world’s language ecology includes standardised languages, local languages, and contact languages. These are often subsumed under the label of ‘under-resourced languages’ even though they have distinct functions and prospects.

I explore this position and propose some ecologically-aware language technology agendas.

Philip Scales (LIG, Grenoble), May 12th, 2 pm, videoconference + Room 306

Adaptive Social Navigation for Robots operating in Human Environments in real ecological use-cases

Companies and research groups are increasingly aiming to deploy mobile robots to perform various navigation tasks in environments shared with humans. Various social norms have been implemented into social navigation algorithms, but the evaluation of these methods typically studies whether the robot’s navigation is “natural” and “acceptable”, or “uncomfortable”.

Beyond respecting social norms, humans also use their movement and positioning as a means of communication and interaction, often in combination with other communication channels such as speech. Our first goal is to study how the manner in which a robot moves impacts how people perceive it and interact with it. We take inspiration from the study of vocal prosody, where the (often subtle) manner in which a word or sentence is spoken is a communication signal. Our second goal is to use this understanding of what we call “movement prosody” in order to design a social navigation architecture that can alter the manner in which a navigation task is performed according to how the robot should be perceived.

To achieve both of these goals, we propose an iterative design process, alternating between Human-Robot Interaction experiments and Social Navigation Algorithm design. As a first step in our work, we designed an adaptive social navigation architecture that offers a flexible formulation to control all of the robot’s motion parameters. Second, we conducted two perception experiments to characterize the impact of motion and appearance parameters on people’s perceptions of a robot. Based on the preliminary results of these experiments, we extracted the most significant parameters, which will be controlled by our navigation architecture. Two further experiments will be conducted in order to observe how people react to the robot when it uses our algorithm to navigate with various parameter sets.

Jean-François Bonastre (LIA, Avignon), March 17th, 11 am, videoconference + Room 306

The Binary-Attribute Likelihood Ratio estimation (BA-LR) approach

This seminar presents BA-LR, a new approach to automatic voice comparison developed in the framework of the LIAvignon chair and Imen Benamor’s thesis work. This approach is dedicated to explicability and interpretability, in contrast to classical approaches, which focus more on “accuracy”. These aspects are particularly important in voice comparison because the field raises strong societal and ethical issues, mainly related to forensic applications.

The first part of the seminar defines automatic voice comparison and gives a brief introduction to what explicability and interpretability mean in AI. Voice comparison is used here as a case study. The second part of the seminar is devoted to the BA-LR approach. While classical voice comparison approaches produce a single value, the LR, without further explanation, our approach proposes to see the output of the system as the composition of a set of factors related to the voice attributes. Moreover, the behavior of each attribute is described by three easily understood parameters. Finally, in addition to building a voice comparison system that is more explicit and understandable than standard solutions, our approach allows us to stay within the main paradigm of voice comparison, the likelihood ratio, which is known to be the “correct and logical” paradigm for forensic applications. Next, the limits of the current implementation are discussed, along with the remaining steps towards a fully transparent approach, where each piece of information and each decision is defined precisely and is understandable by all. Finally, a conclusion is proposed, keeping in mind the forensic applications of voice comparison.

Aurélien Lamercerie (LIG, Grenoble) & Sylviane Chappuy (LIG, Grenoble), January 20th, 4 pm, by videoconference

Semantic content extraction for system requirements verification
The UNseL project, funded by the DGA from 2019 to 2021, aims to provide tools to support the specification of “systems of systems” (for example, a ground-to-air communication system for an airport, or an emergency braking system). In this context, our work applies a semantic content extraction method in an industrial setting, with the goal of automatically verifying system requirements written in unconstrained natural language. The extraction step uses semantic transduction analysis, implemented on top of the W3C Semantic Web standards. It starts from a semantic representation of the texts, expressed as UNL (Universal Networking Language) graphs with “guaranteed meaning” (obtained through an interactive disambiguation step), and first produces a semi-formal structure that is independent of the source language. The tools we developed then automatically build an OWL ontology from the system specifications, expressed as unconstrained statements. Finally, the requirements are verified automatically using generic SPARQL rules and logical reasoners. The approach was put into practice on requirements extracted from a real specification.

Dr. Jerome R. Bellegarda (Apple, Cupertino, California), December 1st, 3 pm, amphitheater of the Maison Jean Kuntzmann

Input Intelligence on Mobile Devices
Over the past decade, the confluence of sophisticated algorithms and tools, computational infrastructure, and data science has fueled a machine learning revolution across multiple fields, including speech and handwriting recognition, natural language processing, computer vision, social network filtering, and machine translation. Ensuing advances are changing the way we interact with technology in our daily lives. This is particularly salient when it comes to user input on mobile devices, be it speech, handwriting, touch, keyboard, or camera input. Increased input intelligence boosts device responsiveness across languages, improving not only basic abilities like tokenization, named entity recognition and part-of-speech tagging, but also more advanced capabilities like statistical language modeling and question answering. In this talk, I will give selected examples of what we are doing at Apple to impart input intelligence to mobile devices, with two overarching themes as sub-text: (i) enhancing interaction experience through machine learning, and (ii) transforming users’ digital lives without sacrificing their privacy.

Dr. Jerome R. Bellegarda is Apple Distinguished Scientist in Intelligent System Experience at Apple Inc., Cupertino, California, which he joined in 1994. Prior to that, he was a Research Staff Member at the IBM T.J. Watson Center, Yorktown Heights, New York. He received the Ph.D. degree in Electrical Engineering from the University of Rochester, Rochester, New York, in 1987. Among his diverse contributions to speech and language advances over the years, he pioneered the use of tied mixtures in acoustic modeling and latent semantics in language modeling. In addition, he was instrumental to the due diligence process leading to Apple’s acquisition of Siri personal assistant technology and its integration into the Apple ecosystem. His general interests span machine learning applications, statistical modeling algorithms, natural language processing, man-machine communication, multiple input/output modalities, and multimedia knowledge management. In these areas he has written close to 200 publications, and holds over 100 U.S. and foreign patents. He has served on many international scientific committees, review panels, and advisory boards. In particular, he has worked as Expert Advisor on speech and language technologies for both the U.S. National Science Foundation and the European Commission, served on the IEEE Signal Processing Society (SPS) Speech Technical Committee, was Associate Editor for the IEEE Transactions on Audio, Speech and Language Processing, is currently an Editorial Board member for Speech Communication, and will be a 2022 IEEE SPS Distinguished Industry Speaker. He is a Fellow of both IEEE and ISCA (International Speech Communication Association).

Caroline Rossi (ILCEA4, Grenoble), October 21st, 1:30 pm, Room 406

ExTra: Explaining neural machine translation (NMT) to train informed users
Caroline Rossi will begin by defining the type of explanation sought. The explainability of neural systems is often defined by distinguishing three levels: opaque systems (proprietary systems whose inner workings are not described), interpretable systems (also called transparent, or for which the path from input data to system output is well described), and explainable systems (Doran et al. 2017). The present project targets this last level, where explainability depends on our ability to explain, without necessarily assuming that any transparency about how NMT systems work can be achieved. She will then go through the stages of the project, from identifying errors in ACCOLE to developing a software solution that links the detection of probable errors to a database of explanations of these errors.

Season 2020-2021

Oana Goga (LIG, Grenoble), April 29th, 2 pm, online

Investigating paths towards safe online political advertising

In this presentation I will talk about our paper “Facebook Ads Monitor: An Independent Auditing System for Political Ads on Facebook” published at The Web Conference 2020 and followup discussions with civil societies on how to regulate political advertising: https://epd.eu/wp-content/uploads/2020/09/joint-call-for-universal-ads-transparency.pdf.

Paper abstract: The 2016 United States presidential election was marked by the abuse of targeted advertising on Facebook. Concerned with the risk of the same kind of abuse to happen in the 2018 Brazilian elections, we designed and deployed an independent auditing system to monitor political ads on Facebook in Brazil. To do that we first adapted a browser plugin to gather ads from the timeline of volunteers using Facebook. We managed to convince more than 2000 volunteers to help our project and install our tool. Then, we use a Convolution Neural Network (CNN) to detect political Facebook ads using word embeddings. To evaluate our approach, we manually label a data collection of 10k ads as political or non-political and then we provide an in-depth evaluation of proposed approach for identifying political ads by comparing it with classic supervised machine learning methods. Finally, we deployed a real system that shows the ads identified as related to politics. We noticed that not all political ads we detected were present in the Facebook Ad Library for political ads. Our results emphasize the importance of enforcement mechanisms for declaring political ads and the need for independent auditing platforms.

Oana Goga has been a tenured CNRS research scientist in the Laboratoire d’Informatique Grenoble (France) since October 2017. Prior to this, she was a postdoc at the Max Planck Institute for Software Systems and obtained a PhD in 2014 from Pierre et Marie Curie University in Paris. She is the recipient of a young researcher award from the French National Research Agency (ANR). Her research interests are in security and privacy issues that arise in online systems that have user-provided data at their core. Her recent research received an Honorable Mention Award at The Web Conference in 2020, obtained the 2020 CNIL-Inria Award for Privacy Protection, and was runner-up for both the 2019 Caspar Bowden PET Award for outstanding research in privacy enhancing technologies and the 2019 CNIL-Inria Award for Privacy Protection.

Didier Schwab (LIG, Grenoble), January 14th, 2 pm, online

FlauBERT: pre-trained contextualized language models for French
Pre-trained language models are now indispensable for obtaining state-of-the-art results in many NLP tasks. Taking advantage of the huge amount of raw text available, they make it possible to extract continuous word representations that are contextualized at the sentence level. The effectiveness of these representations for solving several NLP tasks has recently been demonstrated for English. In this talk, we present FlauBERT, a set of models trained on a large and heterogeneous French corpus. Models of different complexity were trained using the new CNRS supercomputer Jean Zay. We evaluate our language models on various French tasks (text classification, paraphrasing, natural language inference, syntactic parsing, automatic word sense disambiguation) and show that they often outperform other approaches on the FLUE evaluation benchmark, which is also presented here.

Season 2019-2020

Michael Ustaszewski (University of Innsbruck, Austria), February 20th, 3 pm, Room 206

TransBank – A general-purpose translation repository

The field of translation is no exception when it comes to the increasing importance of data collection and data-driven technologies, be it from the perspective of professional translation services or the scientific inquiry into translation. However, academia and industry usually have different data needs. Consequently, there is no data repository that suits translation industry and academia alike.
The corpus building initiative “TransBank – A Meta-Corpus for Translation Research” (www.transbank.info) addresses this gap by providing an infrastructure for the collaborative collection and sharing of open-access translation data. The main aim of TransBank is to build a general-purpose translation repository, making real-world translation data easily and freely accessible to anyone in the global translation community.
In this talk, the motivation behind the TransBank initiative as well as main features of the repository architecture will be outlined. In addition, major theoretical and technical challenges with regard to data collection, data processing and data provision will be discussed.

Michael Ustaszewski is assistant professor in the Department of Translation Studies at the University of Innsbruck, Austria. He is a trained translator and holds a doctoral degree in linguistics, obtained from the University of Innsbruck, as well as a master’s degree in computational linguistics, obtained from the University of the Basque Country. His research interests lie in the intersection of translation, corpus linguistics and language technology, with a focus on corpus-based translation studies. He is principal investigator of the corpus-building project “TransBank – A Meta-Corpus for Translation Research”.

Gérard Bailly (GIPSA-Lab), February 13th, 2 pm

Characterizing and assessing the reading fluency of young readers
Gérard Bailly (1), Erika Rassat (1,2), Anne-Laure Piat-Marchand (1) & Marie-Line Bosse (2)
(1) GIPSA-Lab
(2) LPNC
According to the Ministry of Education, one in ten young adults (aged 16-25) has reading difficulties, and half of these are illiterate. France was ranked 34/50 in PIRLS 2016, and a decline in the reading performance of these 4th-grade children was also noted.
Yet mastering reading comprehension is a prerequisite for accessing other educational disciplines. Reading requires the maturation and synchronization of a complex cognitive network, involving vision, phonological, semantic and pragmatic processing together with the sensorimotor activation of phonetic representations.
We will present the framework we have developed so far to characterize and assess the reading fluency of young readers.
This work is performed in the context of the e-FRAN Fluence project, where hundreds of primary schoolers are trained via computer-assisted technologies and monitored in a longitudinal study.

Mathieu Mangeot (LIG/GETALP), January 16th, 2 pm

Contributive multilingual lexico-semantic engineering

Computing and the Web are reinventing our culture. What about the dictionary as an object? Is it possible to break free from the electronic copy of the paper format and propose new dictionary structures? Can new resources be created through collaborative and cooperative projects? How can we interact with other tools by exchanging lexical data? The aim of this thesis is to answer these questions by designing lexical database management environments that are accessible to as many people as possible, for consulting and editing lexical data on the Web and for using them in other applications, mainly for text comprehension and language learning.
iPoLex is a lexical data repository that relies on a microstructure guesser to analyze the data and produce a fine-grained description. This description is then used to import the data into Jibiki, an online lexical resource management platform with consultation, editing and programming (API) interfaces. These tools are used in several dictionary-building projects, such as the multilingual lexical database Papillon, the Estonian-French dictionary GDEF, the French-Khmer dictionary MotÀMot, the DiLAF African language-French dictionaries, and the Japanese-French dictionary jibiki.fr. Some of these projects reuse data obtained from optical character recognition, word processors, or the conversion of XML structures. Jibiki’s programming interface makes it possible to equip access to texts with tools, through active reading and language learning environments.

Maximin Coavoux (LIG/GETALP), December 12th, 2 pm

Algorithms for Discontinuous Constituency Parsing

In this seminar, I will present an overview of my work on discontinuous constituency parsing.
Constituency trees represent the syntactic structure of a sentence by a recursive grouping of tokens into labelled phrases such as NP, PP, VP, etc.
However, standard constituency trees are not adequate to model a number of syntactic phenomena related to word order variations, e.g. long distance extractions.
In contrast, *discontinuous* constituency trees use crossing branches to model these phenomena.
Discontinuous structures correspond to derivations of mildly context-sensitive formal grammars, such as Linear Context-Free Rewriting Systems (LCFRS), and are therefore much more difficult to parse than projective constituency trees (CFG derivations).
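
The distinction between continuous and discontinuous constituents can be illustrated with a small sketch. This is not the speaker's parsing algorithm: the `Leaf` and `Node` classes, the function names and the example tree are all assumptions made for illustration. The only idea it shows is that a constituent is discontinuous when the token positions it covers contain a gap.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    index: int   # position of the token in the sentence
    token: str


@dataclass
class Node:
    label: str   # phrase label such as NP, PP, VP
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def yield_indices(tree):
    """Return the sorted token positions covered by a (sub)tree."""
    if isinstance(tree, Leaf):
        return [tree.index]
    return sorted(i for child in tree.children for i in yield_indices(child))


def is_discontinuous(tree):
    """A constituent is discontinuous if its yield has a gap."""
    idx = yield_indices(tree)
    return idx[-1] - idx[0] + 1 != len(idx)


def discontinuous_constituents(tree):
    """List the labels of all discontinuous constituents, in pre-order."""
    if isinstance(tree, Leaf):
        return []
    found = [tree.label] if is_discontinuous(tree) else []
    for child in tree.children:
        found.extend(discontinuous_constituents(child))
    return found


# Hypothetical analysis of "A hearing is scheduled on the issue today":
# the NP "A hearing ... on the issue" is interrupted by "is scheduled".
EXAMPLE = Node("S", [
    Node("NP", [
        Leaf(0, "A"), Leaf(1, "hearing"),
        Node("PP", [Leaf(4, "on"),
                    Node("NP", [Leaf(5, "the"), Leaf(6, "issue")])]),
    ]),
    Node("VP", [
        Leaf(2, "is"),
        Node("VP", [Leaf(3, "scheduled"), Leaf(7, "today")]),
    ]),
])

if __name__ == "__main__":
    print(discontinuous_constituents(EXAMPLE))
```

In this toy tree the outer NP and both VPs have gaps in their yields, whereas a projective (CFG-style) tree would require every constituent to cover a contiguous span.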

In this talk, I will present efficient data-driven parsing algorithms for discontinuous constituency trees, as well as empirical parsing results on English and German discontinuous parsing.

I will also mention the release of wikiparse: Wikipedia automatically annotated with constituency trees, dependency trees and morphological analyses (POS tags, morphological features) for 7 languages.

References:

– https://www.aclweb.org/anthology/E17-1118.pdf
– https://www.aclweb.org/anthology/Q19-1005.pdf
– https://www.aclweb.org/anthology/N19-1018.pdf
– https://hal.archives-ouvertes.fr/tel-02302563
– wikiparse:
– data download: http://www.llf.cnrs.fr/wikiparse/
– scripts: https://github.com/mcoavoux/wiki_parse

Raheel QADER (LIG/GETALP), November 21st, 11 am (note the unusual time)

Neural Natural Language Generation with Limited Annotated Data

In Natural Language Generation (NLG), end-to-end models have recently attracted strong interest. Such models need a large amount of carefully annotated data to reach satisfactory performance. However, acquiring such datasets for every new NLG application is a tedious and time-consuming task. In the first part of my talk, I will present our work on building a new, publicly available company dataset collected from Wikipedia. The dataset consists of around 51K company descriptions that can be used for both concept-to-text and text-to-text generation tasks. We study the performance of several end-to-end models applied to the generation of short company descriptions and discuss the challenge of evaluating models trained on such data. In the second part of the talk, I will present a semi-supervised deep learning scheme that can learn from non-annotated data, and from annotated data when available. It uses end-to-end NLG and Natural Language Understanding (NLU) models that are learned jointly to compensate for the lack of annotation. Our experiments on two benchmark datasets show that, with a limited amount of annotated data, the method can achieve competitive results without any preprocessing or re-scoring tricks. I will also briefly talk about a PyTorch-based sequence-to-sequence model that we developed for this project: https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/seq2seqpytorch

Emmanuelle Esperança-Rodier and Francis Brunet-Manquat (GETALP/LIG), October 17th, 2019, 2 pm, Room 306

ACCOLÉ: Annotation Collaborative d’erreurs de traduction pour COrpus aLignÉs (collaborative annotation of translation errors for aligned corpora)

The initial goal that guided the development of ACCOLÉ was the manual annotation of translation errors according to linguistic criteria. The underlying idea is to help users choose which machine translation (MT) system to use depending on the context (the user’s linguistic and computing skills, their knowledge of the domain of the source document, and the task for which they need the translation). To this end, ACCOLÉ must make it possible to detect which linguistic phenomena are not handled correctly by the MT system under study. We offer, on a single platform, a range of services that meet the needs of translation error analysis. The main features of the ACCOLÉ platform are thus: simplified management of corpora, error typologies, annotators, etc.; error annotation; collaboration and/or supervision during annotation; and the search for error patterns (error types at first, morphosyntactic patterns later) in the annotations. Since error analysis is already a tedious task, it is important that those who carry it out have easy access to the tool and to the corpus they wish to analyze. The platform is therefore available online (http://lig-accole.imag.fr) in a web browser and requires no specific installation.

Fusa Katada (Faculty of Science and Engineering, Waseda University, Tokyo, Japan), October 10th, 2019, 2 pm, Room 306

The Realm of Unknown Oral Languages and Its Interface with Information Technology: A Report from Mindanao

The General Conference of UNESCO (Paris, 1999) approved February 21 as International Mother Language Day. The idea is to promote linguistic and cultural diversity and multilingualism in all aspects of public life, particularly in education. The focus here is on mother-tongue-based education for all children in the world. Subsequently, the UNESCO Ad Hoc Expert Group on Endangered Languages (2003) included “materials for language education and literacy” as one of the nine suggested factors for assessing the vitality of a language. Several issues are raised here with respect to the notions of ‘language of education’, ‘orality vs. literacy’, ‘literacy and information technology’, among others:
Why is mother-tongue-based education important?
What challenges do multilingual communities face?
What changes has the globalized digital age brought to languages in oral tradition?
How foreign is the currently used script to indigenous languages?
What can information technology do to languages that once had their own script?
In this talk, I report on my fieldwork in extraordinarily multilingual communities in Mindanao, the Philippines, and respond to the issues raised above. In particular, I clarify why mother-tongue-based education is important for children’s sustainable development of conceptual thinking, and introduce a challenging enterprise to revive understudied minority languages with the lost ancient script called Baybayin, which is anticipated to become a script of their own. (This research was made possible by my research collaborators in Mindanao since 2014.)

Hady Elsahar (Naver Labs), September 26th, 2019, 2 pm, Room 306

Enabling Dynamic Interactions between Natural Language and Structured Knowledge Bases

The complementary nature of Natural Language and Structured Knowledge Bases has inspired many applications that involve interactions between the two, such as Information Extraction, Question Answering and Natural Language Generation.
The natural language processing models responsible for those interactions are usually designed in a static way; given the abundance of new information continuously published on the web, they require a tedious process of maintenance and adaptation.
I’ll present several contributions from my PhD thesis that tackle this issue. Firstly, T-REx, an architecture for automatically generating alignment datasets between natural language text and structured knowledge bases, along with the largest available dataset of its kind.
Secondly, I’ll present an encoder-decoder neural network architecture for question generation from knowledge base triples in a zero-shot setup; this model, equipped with part-of-speech tag copy actions, has proven capable of generating questions for relation and entity types unseen until test time, and hence has great potential for use in data-augmentation techniques for training Question Answering systems in an evolving way. Finally, I’ll present several delexicalization techniques to generate entity descriptions from structured triples; this method has proven efficient at generating entity summaries in an under-served multilingual setup with limited training datasets.

Season 2018-2019

Maëva Garnier (CNRS researcher, Speech & Cognition department, GIPSA-lab, Grenoble), June 20th, 1:30 pm, Room 306

Speech clarification in face-to-face interaction

According to the H&H theory (Lindblom, 1990), speakers adapt their articulation effort to the level of intelligibility required by the communicative situation.
In the first part of my talk, I will present different studies on speech adaptation to a noisy environment (also called the Lombard effect), showing that the speech modifications observed in such a perturbed situation are not limited to hyper-articulation, but reflect a set of different communicative strategies aiming, among others, at improving speech audibility (detection, segregation from the background noise, depending on the noise type), improving segment (audiovisual) intelligibility and facilitating discourse structure and segmentation (through the enhancement of prosodic cues). I will discuss how such adaptation strategies vary between individuals, in particular how some speakers make use of the visual modality to improve their intelligibility in perturbed conditions, and how they can be affected by the experimental paradigm (reading task vs. interactive game; noise played through headphones vs. loudspeakers).
In the second part of my talk, I will go deeper into the speaker-listener interaction loop and present some recent work on how exactly a speaker estimates the required level of intelligibility and adapts his speech to meet the listener’s needs. I will present a study of the different audiovisual markers of understanding vs. misunderstanding emitted by the listener, from which the speaker can estimate online his actual intelligibility. I will also present an ongoing study on speech correction following a misunderstanding, exploring how the speaker’s clarification depends (or not) on what the listener actually perceived.
I will conclude my talk with some perspectives on speech adaptation and clarification in face-to-face interaction.

Mathieu Loiseau (Univ. Grenoble Alpes), May 16th, 2 pm, Room 306

ICALL or iCALL: questions raised by the integration of NLP into CALL?

Although Natural Language Processing (NLP) and Computer-Assisted Language Learning (CALL) have been closely linked since their origins (or nearly so), this alliance has not been without friction, and language learning systems that embed genuine NLP technologies are still few and far between. Nevertheless, NLP is making its way ever more widely into everyday technologies. In the age of “Big Data”, integrating NLP technologies into CALL-oriented systems raises methodological questions.

Caroline Rossi (ILCEA4 (GREMUTS) Univ. Grenoble Alpes) & Dorothy Kenny (Dublin City University)

Context in machine translation: a variable-geometry concept

Often praised for its democratizing effects (Prado 2010; Boitet et al. 2010; Goltz 2017) and its contribution to maintaining linguistic diversity (Cronin 2013: 59), machine translation (MT) can also be seen as founded on misleading myths (the universality of meaning or the transparency of linguistic forms) and as a symptom of a purely instrumental view of language (see for example Raley 2003, or Cronin 2013). Paradoxically perhaps, it is also often interpreted as supporting the continued cultural hegemony of English (Raley 2003; Poibeau 2017: 168). Finally, while it is a key technology for large globalized companies (Poibeau 2017: 6), it is also implicated in the falling translation rates currently endured by freelance human translators (Moorkens 2017). It is therefore not surprising that much work in translation studies conveys negative representations of MT. Points of contact are nonetheless increasingly numerous, not only because the way MT developers approach their task can both reflect and help build an understanding of language, meaning and translation (Kenny 2012a), but also because it is mainly through the interface with MT that translation studies encounters some of the most pressing questions of our time, tied to the renewed interest in artificial intelligence and the future of human work.
In this talk, we analyze these recent convergences by considering how the notion of context is used in machine translation. After defining three broad types of context (co-text, extended context, and situational context), we describe the main approaches historically adopted in MT and their use of context, to show that statistical and then neural models have allowed context to be taken into account more broadly. We then take a first look at the gains and limits of this broadening through a comparative analysis of statistical and neural MT output (for the English-French language pair) on two frequently noted problems: the handling of anaphora (Voita et al. 2018; Bawden et al. 2018; Hardmeier & Guillou 2018) and of metaphor (Isabelle 2017; Toral & Way 2018).

Véronique Aubergé (LIG-GETALP), March 28th at 2:00 pm, Room 406

The D.A.N.S.E. theory and its applications to human-robot interaction, set within unavoidable societal and ethical issues

The social robot: useful, futile or toxic?

What separates a social robot from a connected smart object is not so much the complexity of the artificial skills we give it as characteristics that we do not yet fully grasp and that make us perceive it as an “other”. The robot thus enlarges our social space, which is entirely new: until now, through tools and technology, we had “desired” only to augment the capacities of the individual (seeing better, crossing the Atlantic in 7 hours, extending one’s memory through writing or the internet, etc.). The fundamental question we must all ask is why this desire for an artefactual other arises today, so suddenly and feverishly, when the first talking statue dates back to 2000 years before ancient Egypt. Vaucanson managed to convince everyone of the autonomy of his automata, yet none of the powerful people who hosted him wished to keep his flute player or his duck in their salon as a companion automaton…

So, even if this break between object and subject cannot yet be objectively tied to technical characteristics (the criterion of autonomy, for example, is not fully discriminating), we must acknowledge that technological objects that we humans perceive as robots trigger in us an empathic illusion: the illusion of an other who is acceptable in our social space, an other with whom we believe we are communicating. Yet communication is as vital to humans as water and food. In the 13th century, an emperor wondered which language a child would spontaneously speak if the people around them never spoke to them; he entrusted 6 babies to 6 nurses, and all the babies died quickly. In Romanian orphanages in 1990, where children were more or less fed and washed but never treated as communicative partners, and recently in a Moroccan orphanage for children born of rape, where the taboo unintentionally made the caregivers uncommunicative, children died or became irreversibly psychotic. Less exceptionally, people over 75 grow frail and die 7 times faster when they feel isolated, cardiac surgery patients aged 35 to 45 deteriorate 5 times more if they feel isolated, and more and more young people in Japan, but above all in Europe, suffer from the social withdrawal syndrome hikikomori.

This means that if our social signals (our words, our gazes, our gestures, the way we move closer or apart, the way we move together) do not let us exchange enough socio-relational nourishment, or carry poor nourishment, we develop intense and not always conscious suffering through this feeling of isolation. We call the social fibers that connect us the socio-affective glue, and we have hypothesized that when we pull on these fibers, as our culture pushes us to do in the name of an individual sense of freedom, we risk breaking them and no longer being able to pass enough useful nourishment through them. As you will have understood, the hypothesis we propose to explain today’s desire for robots is a deficit of nourishment from the other.

Through this empathic illusion, the robot could at first relieve this suffering, just as a mirror placed in front of a healthy arm relieves the phantom pain of an amputated arm. The great danger, of course, is that this relief leads us to believe that the robot will truly nourish a human who is starved of the other, whereas it is only an illusion, a mirror onto which the human projects themselves.

Thus, in our experiments, people with varying feelings of isolation quickly project onto the robot Emox attachments that relieve their pain of isolation. This attachment, this glue, relies on vocal, gestural and proxemic primitives without lexical content, whose prosodic characteristics are fully controlled, both in their content and in the dynamics through which we “manipulate” the nature of the attachment. These relationships fit within the DANSE model (Dynamics of the Affective Network for Social Entities), through an ecological Living Lab method of “Making Thinking” (Fractal) that puts the human at the center of the technological paradigm of interaction, whose goal is indeed to bring humans back to humans until the robot becomes unnecessary.

We have thus shown that these purely prosodic noises attach a person all the faster when they are not isolated, but all the more strongly and lastingly when they suffer from isolation, along axes that we chose so as not to place the human in dominance or submission, by exploring a proposed space of altruism, and within a relationship that the person perceives as positive. The dimensions of fragility/robustness, associated with a targeted evaluation of “tender care”, are explored in particular for every aspect of the robot artefact (its design, its locomotion, its vocal and visual expressions). Beyond exploring this space, the D.A.N.S.E. theory proposes a co-animated dynamics of interaction which, depending on its parameters, can represent different states of the “social body” (i.e. of the emergent “mole” of interactions), by analogy with the liquid, solid and gaseous states of inert matter. Beyond interactions between humans, we want to observe, analyze, model and simulate the entities that humans integrate into their social space. The machine perceived as an entity, i.e. the robot, is moreover an instrumental tool; non-human living entities, in particular pets, make it possible to extend the primitives collected on humans to more generic, potentially even universal, primitives of the intra- and inter-species social bond. These primitives, identified in the vocal space, are sought more generally in all spaces of direct or indirect communication (for example “navigation”).

Our applied goal is to relieve the suffering of isolation so that people regain their ability to maintain and grow their relationships with the other humans in their social sphere, until they no longer have any desire or need for the robot, and reject it.

But how can this mechanism be regulated so that it does not cut people off even further from their human relationships, precisely because they no longer suffer once settled into communication with the robot, a communication that we firmly hypothesize to be artificial and that will eventually leave them in a possibly irreversible isolation?

This capacity for manipulation, which we achieved here with such simple language primitives, raises serious ethical questions about its consequences, all the more so if the enthusiasm for these technologies turned out to stem from growing isolation caused by relationships that are insufficient or, above all, of poor quality for growing well together.

Far beyond the robot designed from the outset as a companion, any service robot whatsoever, by the very fact that we perceive it as a robot, is perceived as communicating. This means that humans will find meaning in signals emitted with no communicative purpose. What happens if this implicit manipulation, not deliberately controlled by the manufacturer (who, after all, did not explicitly set out to build a social robot), has psychological, behavioral and societal effects that prove toxic in the short or medium term? How, and whom, do we hold responsible for these effects?
At this very moment, a European law granting robots the status of electronic person has been frozen, admittedly, but only after long discussions, I would even say battles; its embryo was born at MIT 10 years ago, close to the GAFA, who propose to decide for us and to teach us the proper uses of AI (a proposal made in September 2017). This electronic-person law still appeals, for example in Poland, and a robot has been granted nationality in Saudi Arabia. It is therefore urgent to ask what effects this technology has and to regulate them, so that it is not rejected out of collective anxiety. For if our isolation hypothesis proved correct, the robot would be an effective way of making us aware of it, so that we can develop mechanisms of human reconstruction.

The question that I believe is essential to ask is therefore: is this artefact merely futile, in which case there is no need to think about it? Could it usefully help rebuild a damaged human social space, until it becomes unnecessary? In that case, how can we ensure that a mechanism of human reconstruction is what gets set in motion, and not confinement in the initially pleasant illusion of a relationship with robot artefacts? Or, on the contrary, is our current lack of knowledge, across all the human sciences, of the deep processes of interaction a source of toxicity in the artificial simulation of interaction that triggers the illusion of an other?

Marc Douguet (LITT&ARTS), March 21st at 2:00 pm, Room 306

Automatic processing of theatrical speech

Seventeenth-century French dramatic texts have an extremely rich intrinsic structure: their division into scenes, speeches and lines of verse makes this corpus a collection of interactions with vast potential for the automated analysis of conversational phenomena.
Working at various scales (from the largest units to the shortest), the talk will show how much both literary studies and linguistics stand to gain from automatically extracting the recurring patterns observed in these texts, whether they concern the management of the conversational frame (characters’ entrances and exits), the organization of turn-taking in trilogue or polylogue situations, or the lexical choices that rely on frequently repeated stock phrases.

Fusa Katada (Professor at Waseda University, Tokyo, Japan), February 5th at 2:00 pm, Room 306

Explaining Mora Inclination in Phonological Dyslexia

The neurobiological disability called dyslexia (< Greek dys- ‘impaired’ + lexis ‘word’) is a specific learning disability that affects only literacy skills. It has been generally assumed that the congenital form of dyslexia, termed developmental dyslexia, stems from a particular problem in language acquisition affecting phonological awareness. However, the exact nature of phonological awareness has not yet been made clear.
This study spotlights the seemingly mysterious discrepancy in the prevalence of dyslexic populations between stress-timed English (as high as 20%) and mora-timed Japanese (as low as 1%). Syllable-timed French falls between the two types of languages. On the basis of English dyslexic reading marked by an overproduction of moraic (CV) units in the absence of rhyme (VC) units, the study strengthens the mora-basic hypothesis and shows that the discrepancy is due to differences in prosodic structures between the two languages. For VC-oriented English, the readers must have rhyme awareness depicting the unit rhyme through prosodic restructuring from CV-C to C-VC. A failure to do so manifests as phonological dyslexia. For mora (CV) oriented Japanese, rhyme awareness and prosodic restructuring are irrelevant. Consequently, phonological dyslexia is largely undetected.
From the articulatory phonological point of view, it is suggested that onset consonants are coarticulated with the following vowels. Moras (CVs) are thus formed automatically and essentially free. In contrast, coda consonants are not coarticulated with the preceding vowels. Forming rhymes (VCs) requires a cognitive temporal-spatial decision load, which a dyslexic mind is unable to bear. Mora inclination is explained accordingly.
The study deepens the above view and comes to claim that mora-forming coarticulation is easy because it is a synchronized articulatory behavior, akin to a synchronized human locomotive behavior. This view conforms to a human neurobiological restriction inclined toward synchronized behavior, which is claimed to be acquired in the process of human evolution.
Developmental dyslexia has a serious impact on children’s learning and forms a quite interdisciplinary field of study ranging from clinics, to brain science, to information processing, to linguistics, and to pedagogy, which offers both technical and conceptual research potentials.

Keywords: mora inclination, developmental dyslexia, phonological awareness, rhyme awareness, coarticulation, synchronized articulatory behavior

Claude Roux (Naver Labs Europe), February 14th, 2018 at 2:00 pm, Room 306

Tamgu: a programming language for information extraction

Tamgu means investigation or research in Korean. The language brings together all the basic tools needed to extract and detect textual expressions. In particular, Tamgu makes it possible to combine machine learning approaches with more symbolic ones, for example by integrating general-purpose or user-defined lexicons.

Steven Bird, Charles Darwin University

Mon 14th January at 2pm – room 306 bâtiment IMAG

Scalable Methods for Working with Unwritten Languages 2: Talking about Places and Processes

Lane Schwartz (Univ. of Illinois), January 10th, 2019 at 2:00 pm, Room 306

Intersecting machine learning and linguistic fieldwork: Computational models for St. Lawrence Island Yupik

Marc Dymetman (Principal Scientist, NLP, NAVER Labs Europe), December 6th, 2018 at 2:00 pm, Room 306

Prior knowledge and deep learning: some principles and applications to NLP

In the last few years, neural networks have quickly gained a dominant position in computational linguistics. In application domains where supervised data is abundant, such as Machine Translation between some of the major world languages, the superior learning capabilities of neural networks have produced models with better performance than the previously available techniques. In such abundant data conditions, these models can be trained from raw data, in an end-to-end fashion, without much injection of external knowledge.

However, in less favorable data conditions, prior knowledge continues to play an important role: it allows the neural components to be guided, not only by direct data observations, but also by hypotheses and principles that come from an understanding of the problem at hand.

In my talk, I will try to provide some intuitions about the role of prior knowledge in deep learning for NLP and provide some examples from my own experience with applications such as Language Modelling, NLG, and Semantic Parsing.

Marc Cavazza (Professor at the University of Greenwich, School of Computing & Mathematical Sciences), December 4th, 2018 at 10:30 am, Room séminaire-1

New Applications of Interactive Storytelling Techniques

Interactive Storytelling techniques have developed since the 2000s mainly to provide richer narrative content for interactive media, in applications that were more about entertainment than education. With the development of more sophisticated knowledge representations and more cognitive approaches to narrative, new opportunities are emerging to use narrative techniques in simulation and training. At the narrative level, this trend converges with the growth of the “Serious Games” field. We present several examples of narrative techniques used in non-entertainment applications, based on operator-based or task-based planning techniques.

In patient training or education, converting knowledge models into scripted narrative fragments can be used to create a variety of situations arising from the interaction between generic knowledge and personal data. We will also present a cognitive approach to narrative that aims to control the process of narrative comprehension, and that has been used to explore causal understanding in children.

Denis Paperno (Loria), November 8th, 2018 at 2:00 pm, Room 206.

Limitations in learning an interpreted language with recurrent models

I report work in progress on learning simplified interpreted languages by means of recurrent models. The data is constructed to reflect core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. Preliminary results suggest that LSTM networks do generalise to compositional interpretation, albeit only in the most favorable learning setting, with a well-paced curriculum, extensive training data, and left-to-right (but not right-to-left) composition.
Bio : While I have experience in different subfields of language science such as field linguistics, language typology, and formal semantics, my current work mainly focuses on computational semantic representations for natural language, including word, phrase, and sentence embeddings. My work has included proposing new models such as the Practical Lexical Function model for syntax-driven vector compositionality or the Boolean Distributional Semantic Model for entailment detection, as well as analyzing existing models and evaluating them on new tasks.
I hold an undergraduate degree in Linguistics from Moscow State University and a PhD from the University of California, Los Angeles. After finishing my thesis, I was a postdoc at Marco Baroni’s COMPOSES group at the University of Trento. Since 2016, I have been a researcher (CR CNRS) at the Lorraine laboratory of computer science and its applications (Loria).

Jacob Levy Abitbol, Márton Karsai, Jean-Pierre Chevrot, Jean-Philippe Magué, September 20th, 2018 at 3:30 pm, Room 306.

Socioeconomic Dependencies of Linguistic Patterns in Twitter: A Multivariate Analysis

Our usage of language is not solely reliant on cognition but is arguably determined by myriad external factors leading to a global variability of linguistic patterns. This issue, which lies at the core of sociolinguistics and is backed by many small-scale studies on face-to-face communication, is addressed here by constructing a dataset combining the largest French Twitter corpus to date with detailed socioeconomic maps obtained from national census in France. We show how key linguistic variables measured in individual Twitter streams depend on factors like socioeconomic status, location, time, and the social network of individuals. We found that (i) people of higher socioeconomic status, active to a greater degree during the daytime, use a more standard language; (ii) the southern part of the country is more prone to use more standard language than the northern one, while locally the used variety or dialect is determined by the spatial distribution of socioeconomic status; and (iii) individuals connected in the social network are closer linguistically than disconnected ones, even after the effects of status homophily have been removed. Our results inform sociolinguistic theory and may inspire novel learning methods for the inference of socioeconomic status of people from the way they tweet.

Christian Boitet, September 13th, 2018 at 2:00 pm, Room 306

Professor Emeritus at Université Grenoble Alpes, GETALP-LIG
Seminar devoted to the Coling 2018 conference (http://coling2018.org)

Season 2017-2018

Bruno Pouliquen, May 31st, 2018 at 2:00 pm, Room 306

World Intellectual Property Organization (WIPO)

From SMT to NMT at WIPO

Steven Bird, April 4th, 2018 at 3:00 pm

Professor, Charles Darwin University, Australia
http://www.cdu.edu.au/northern-institute/our-teams/steven-bird

Sparse Transcription: Rethinking the Processing of Unwritten Languages

Steven Bird is researching new methods for documenting and revitalising the thousands of small languages still spoken in the world today. His career began with a BSc and MSc in computer science at Melbourne University, followed by a PhD in computational linguistics from Edinburgh University, completed in 1990. Since then he has worked at the Universities of Edinburgh, Pennsylvania, Melbourne, and Berkeley, and conducted fieldwork in Australia, West Africa, Melanesia, Amazonia, and Central Asia. He is co-author of a popular textbook in computational linguistics, and recently developed a new computer science curriculum for secondary students which has been adopted in Australian schools. The Aikuma app developed with his students took out the grand prize in the Open Source Software World Challenge.

Laurent Besacier

Professor at LIG, GETALP team

The challenge of discovering linguistic units from raw speech

In this seminar, I will present two collective scientific projects [1,2] that occupied me during the year 2017. What do they have in common? Discovering linguistic units from raw speech without any other supervision. Or almost…
[1] https://arxiv.org/pdf/1712.04313.pdf
[2] https://arxiv.org/pdf/1802.05092.pdf

Marco Dinarelli, March 22nd, 2018 at 3:15 pm

LaTTiCe-CNRS UMR 8094, visiting LIG-GETALP

Spoken language understanding and coreference chain resolution.

In this seminar I will talk about the main research areas I have worked on: spoken language understanding and coreference chain resolution. I will describe the computational systems, mostly based on machine learning, that were built to model these problems.

These systems rely on models ranging from probabilistic finite-state automata (FSA/FST) to neural networks, by way of conditional random fields (CRF), and hold the state of the art on some tasks.

Emmanuel Morin, March 20th, 2018 at 9:30 am

Professor at the Université de Nantes (LS2N, Laboratoire des Sciences du Numérique de Nantes)

Bilingual lexicon extraction from specialized comparable corpora: general language to the rescue of specialized language

Bilingual lexicon extraction from corpora was initially carried out using texts that are translations of one another (i.e. parallel corpora). However, despite the good results obtained, such corpora remain scarce resources, especially for specialized domains and for language pairs that do not involve English. In this context, research on bilingual lexicon extraction has turned to other corpora made up of texts that share characteristics such as domain, genre or period, without being translations of one another (i.e. comparable corpora).
Bilingual lexicon extraction from specialized comparable corpora is strongly constrained by the amount of data available. To get around this obstacle, one solution would be to combine external resources with the specialized corpus. This solution, although intuitive, goes against the mainstream, since many studies support the idea that adding out-of-domain documents to a specialized corpus lowers the quality of the extracted lexicons. In this talk we will show how general-language corpora can complement specialized-language corpora. We will present different ways of combining these data by exploiting distributional representations based on vector-space and neural models.

Olivier Kraif, March 8th, 2018

Laboratoire de Linguistique et Didactique des Langues Etrangères et Maternelles
Dependency parsing for the automatic extraction of recurring motifs

We use the term “motifs” for recurring constructions likely to play a role in textual organization and discourse structuring. As prefabricated constructions, motifs are also characteristic of highly codified textual genres. Identifying these constructions can be useful in various NLP applications: document classification, machine translation, writing assistance, term search, tools for corpus linguistics… After clarifying the notion from a linguistic standpoint, we will review different methods for automatically identifying motifs: repeated segments or n-grams, itemset patterns, and recurring lexico-syntactic trees. We will detail current research directions on using syntax (dependency analyses) to discover and describe certain classes of motifs.
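
To make the simplest of the methods listed above concrete, here is a minimal sketch of repeated-segment (n-gram) extraction. It is an illustration only, not the speaker's implementation: the function name `repeated_segments`, its parameters and the toy sentence are all assumptions introduced here.

```python
from collections import Counter


def repeated_segments(tokens, n_min=2, n_max=4, min_count=2):
    """Return the n-grams (as tuples) that occur at least min_count times.

    Hypothetical helper: names and default thresholds are illustrative.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy input invented for this example (not taken from any corpus).
text = "il faut que je parte il faut que tu restes"
segments = repeated_segments(text.split())

# Print the longest repeated segments first.
for gram, count in sorted(segments.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" ".join(gram), count)
```

Plain n-gram counting only captures contiguous surface repetitions; the itemset and lexico-syntactic tree methods mentioned in the abstract are meant to capture discontinuous or structurally defined motifs, which this sketch does not attempt.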

Moez Ajili, February 8th, 2018

Laboratoire d’Informatique d’Avignon
Reliability of voice comparison for forensic applications

In legal proceedings, voice recordings are increasingly presented as evidence. Typically, a scientific expert is called upon to establish whether the voice excerpt in question was spoken by a given suspect (prosecution hypothesis) or not (defence hypothesis). This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the Bayesian approach has become the new “gold standard” in forensic science. In this approach, the expert expresses the result of the analysis as a likelihood ratio (LR). This ratio not only favors one of the hypotheses (“prosecution” or “defence”) but also gives the weight of that decision. Although the LR is theoretically sufficient to summarize the result, in practice it is subject to certain limitations because of how it is estimated. This is particularly true when automatic speaker recognition (ASpR) systems are used. These systems produce a score in every situation without taking into account the specific conditions of the case at hand. Several factors are almost always ignored by the estimation process, such as the quality and quantity of information in the two voice recordings, the consistency of information between the two recordings, their phonetic content, or the intrinsic characteristics of the speakers. All these factors call into question the reliability of forensic voice comparison. In this thesis, we address this issue for automatic (ASpR) systems on two main points.
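
For readers unfamiliar with the likelihood ratio (LR) used in this abstract, the standard Bayesian formulation can be written as follows. The symbols are notation chosen here, not taken from the abstract: $E$ stands for the evidence (the pair of voice recordings), $H_p$ for the prosecution hypothesis and $H_d$ for the defence hypothesis.

```latex
\[
\mathrm{LR} = \frac{p(E \mid H_p)}{p(E \mid H_d)},
\qquad
\frac{P(H_p \mid E)}{P(H_d \mid E)}
  = \mathrm{LR} \times \frac{P(H_p)}{P(H_d)}
\]
```

An LR above 1 supports the prosecution hypothesis and an LR below 1 supports the defence hypothesis; how far the value lies from 1 is the weight of the evidence. The second identity shows that the LR only updates prior odds, which are not the expert's to set.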

The first is to establish a hierarchical scale of the phonetic categories of speech sounds according to the amount of speaker-specific information they contain. This study shows the importance of phonetic content: it highlights interesting differences between phonemes and the strong influence of intra-speaker variability. These results were confirmed by a complementary study of oral vowels based on formant parameters, independently of any speaker recognition system.

The second point is to implement an approach for predicting the reliability of the LR from the two recordings of a voice comparison, without resorting to an ASpR system. To this end, we defined a homogeneity measure (NHM) able to estimate the amount of information and the homogeneity of that information between the two recordings considered. Our hypothesis is that homogeneity is directly correlated with the degree of reliability of the LR. The results confirmed this hypothesis, with the NHM measure strongly correlated with the LR reliability measure. Our work also revealed significant differences in the behavior of NHM between target comparisons and impostor comparisons.

Our work showed that the “brute force” approach (relying on a large number of comparisons) is not enough to ensure a sound evaluation of reliability in FVC. Indeed, some variability factors can induce local system behaviors tied to particular situations. To better understand the FVC approach and/or an ASpR system, the system’s behavior must be explored at as fine a scale as possible (the devil is in the details).

Paule-Annick Davoine – 23 novembre 2017

Professor at Université Grenoble Alpes, Pactes laboratory
Cartography and geovisualization for representing and analysing spatial data in the digital humanities

More and more disciplines and research areas in the humanities and social sciences, literature and languages are taking an interest in the spatial dimension of data or sources: in history, to represent the geo-historical data needed to understand how territories, or the phenomena affecting them, evolve; in literature, to map the places found in novels or in authors' life stories; in linguistics, to understand the spatial spread of languages or dialects; in geography, to reconstruct the trajectories and movements of individuals from narratives, or to showcase old cartographic documents. All these needs raise new challenges for cartography and geovisualization, which must handle spatial data that are semi-structured, multidimensional, multiform, defined over a variety of observation scales, both geographic and temporal, and of variable quality.
The aim of this talk is to present some of the cartographic and geovisualization issues raised by the spatial processing and representation of data from the digital humanities, drawing on research projects carried out within the Steamer team.

Patrick Paroubek – October 26, 2017

CNRS Research Engineer (IR1)

Natural Language Processing for the analysis of scientific publications

The topic will be approached through work analysing the publications of the NLP community, carried out on the NLP4NLP corpus, which covers 50 years of publications from the main conferences and journals in text and speech analysis, and on biomedical corpora (MIROR project). The NLP contributions addressed here concern the analysis of trends and networks, as well as the detection of plagiarism or "spin" (embellishment) in scientific publications.

Christian Boitet – October 5, 2017

Professor Emeritus at Université Grenoble Alpes, GETALP-LIG
Seminar devoted to the MT Summit (http://aamt.info/app-def/S-102/mtsummit/2017/)

Season 2016-2017

Maximiliano Duran – May 30, 2017

Peruvian linguist

Unmarked tense and four-level suffixation in Quechua

Pedro Chahuara – May 18, 2017

Researcher at Xerox Research Centre Europe (XRCE)

Online Mining of Web Publisher RTB Auctions for Revenue Optimization

In the online advertisement market there are two main actors: the publishers, who offer space for advertisements on their websites, and the announcers, who compete in an auction to show their advertisements in the available spaces. When a user accesses a website, an auction starts for each ad space: the user's profile is given to the announcers, and each of them submits a bid to show an ad to that user. The publisher sets a reserve price, the minimum value they will accept to sell the space.

In this talk I will introduce a general setting for this ad market and present an engine that optimizes publisher revenue from second-price auctions, which are widely used to sell online ad spaces through a mechanism called real-time bidding. The engine is fed with a stream of auctions in a time-varying environment (non-stationary bid distributions, new items to sell, etc.) and predicts in real time the optimal reserve price for each auction. This problem is crucial for web publishers, because setting an appropriate reserve price on each auction can significantly increase their revenue.

I consider here a realistic setting where the only available information consists of a user identifier and an ad placement identifier. Once the auction has taken place, we can observe censored outcomes: if the auction has been won (i.e. the reserve price is smaller than the first bid), we observe the first bid and the closing price of the auction; otherwise we do not observe any bid value.
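
To make the censored feedback concrete, here is a minimal Python sketch of a single second-price auction with a reserve price. It is an illustration under stated assumptions, not the engine presented in the talk: the names `Outcome` and `run_auction` are hypothetical, and the closing-price rule (the larger of the reserve price and the second-highest bid) is the usual textbook rule for second-price auctions, assumed here rather than taken from the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Outcome:
    """What the publisher observes after one auction (hypothetical structure)."""
    sold: bool
    first_bid: Optional[float]      # observed only when the auction is won
    closing_price: Optional[float]  # observed only when the auction is won
    revenue: float


def run_auction(bids: Sequence[float], reserve: float) -> Outcome:
    """Simulate one second-price auction with a reserve price."""
    ranked = sorted(bids, reverse=True)
    # The auction is won only if the reserve price is smaller than the first bid.
    if not ranked or reserve >= ranked[0]:
        # Censored outcome: no bid value is observed and nothing is earned.
        return Outcome(sold=False, first_bid=None, closing_price=None, revenue=0.0)
    # Assumed rule: the winner pays the larger of the reserve and the second bid.
    second_bid = ranked[1] if len(ranked) > 1 else 0.0
    closing_price = max(reserve, second_bid)
    return Outcome(sold=True, first_bid=ranked[0],
                   closing_price=closing_price, revenue=closing_price)
```

With bids of 5 and 3, a reserve of 2 yields a closing price of 3, a reserve of 4 raises it to 4, and a reserve of 6 loses the sale and reveals no bid at all, which is why the choice of reserve price matters for revenue.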

The proposed approach combines two key components: (i) a non-parametric regression model of auction revenue based on dynamic, time-weighted matrix factorization, which implicitly builds adaptive user and placement profiles; (ii) a non-parametric model to estimate the revenue under censorship, based on an online extension of Aalen's additive model.

Jean-Pierre Chevrot – March 2, 2017

Professor at Université Grenoble Alpes
Laboratoire de l’Informatique du Parallélisme, Institut rhône-alpin des systèmes complexes, ENS Lyon
Lidilem laboratory, Université Grenoble Alpes
Language acquisition and sociolinguistic usage: the social, the cognitive and the network
Bringing cognitive and social approaches together is often presented as a desirable goal for better understanding the process of language acquisition (Hulstijn et al., 2014). However, the question remains of how to translate this programme into the reality of research practice.

Although cognitive and social approaches are grounded in different traditions, the attempt to combine the two perspectives in language acquisition research can benefit from similar endeavours in other fields, such as social cognition, cognitive sociology, cognitive sociolinguistics, social neuroscience, etc. Examining these interdisciplinary attempts leads to the identification of three ways of combining the social and the cognitive: the social approach to cognition, the cognitive approach to the social, and the approach known as complex individualism (Kaufmann & Clément, 2011; Chevrot, Drager & Foulkes, in preparation; Dupuy, 2004).

Of these options, only the last favours neither the social and collective level nor the cognitive and individual level (Dupuy, 2004). Instead, it emphasizes the interaction and bidirectional causality between the two. From this perspective, individuals with specific social and cognitive characteristics interact with one another within general social and cognitive constraints. The characteristics of individuals can change as a result of their interactions, and these changes can in turn modify the general constraints (Hruschka et al., 2009). Within this framework, language acquisition and language use can be seen as the results of reciprocal influences spreading through a network of relationships.

We will present projects likely to implement this framework, in particular the DyLNet project – Language Dynamics, Linguistic Learning, and Sociability at Preschool: Benefits of Wireless Proximity Sensors in Collecting Big Data (Nardy, 2017).
References
Chevrot, J.-P., Drager, K., & Foulkes, P. (in preparation). Sociolinguistic Variation and Cognitive Science.

Dupuy, J.-P. (2004). Vers l’unité des sciences sociales autour de l’individualisme méthodologique complexe. Revue du MAUSS, 24(2), 310-328.

Hruschka, D. J., Christiansen, M. H., Blythe, R. A., Croft, W., Heggarty, P., Mufwene, S. S., Pierrehumbert, J. B., & Poplack, S. (2009). Building social cognitive models of language change. Trends in Cognitive Sciences, 13(11), 464–469.

Hulstijn, J. H., Young, R. F., Ortega, L., Bigelow, M., DeKeyser, R., Ellis, N. C., Lantolf, J. P., Mackey, A., & Talmy, S. (2014). Bridging the Gap. Studies in Second Language Acquisition, 36(3), 361–421.

Kaufmann, L., & Clément, F. (2011). L’esprit des sociétés. Bilan et perspectives en sociologie cognitive. In L. Kaufmann & F. Clément, La sociologie cognitive, Ophrys (pp. 7–40).

Nardy (2017). DyLNet Project – Language Dynamics, Linguistic Learning, and Sociability at Preschool: Benefits of Wireless Proximity Sensors in Collecting Big Data [https://hal-univ-orleans.archives-ouvertes.fr/hal-01396652]

Michael Zock – January 12, 2017

CNRS Research Director at the Laboratoire d’Informatique Fondamentale (LIF), TALEP group, Aix-Marseille Université

All roads may lead to Rome, but they are not all equal. The problem of lexical access in production

Everyone has faced the following problem: you are looking for a word (or a person's name) that you know, yet you cannot retrieve it in time. Work by psychologists has shown that people in this cognitive state know a great deal about the word they are looking for (meaning, number of syllables, origin, etc.), and that the words they confuse it with bear a striking resemblance to it (initial letter or sound, syntactic category, semantic field, etc.).
My (long-term) goal is to build a program that takes advantage of this state of affairs to help a speaker or writer (re)find the word on the tip of their tongue. To this end, I plan to add an association index (collocations found in a large corpus) to an existing electronic dictionary. In other words, I propose to build a dictionary analogous to the human one which, besides conventional information (definition, written form, grammatical information), would contain links (associations) that make it possible to navigate between ideas (concepts) and their expressions (words). Such a dictionary would thus give access to the sought information by form (lexical: analysis), by meaning (concepts: production), or by both.
The aim of this talk is to show how to build such a resource, how to use it, what difficulties its construction raises, and what possibilities it offers.

Lorraine Goeuriot – December 1, 2016

Associate Professor (Maîtresse de conférences) at Univ. Grenoble Alpes, in the MRIM team of the Laboratoire d’informatique de Grenoble

Medical Information Retrieval and its evaluation: an overview of CLEF eHealth evaluation task

In this talk, I will introduce my research activities in the field of medical information retrieval, and in particular its evaluation.
The use of the Web as a source of health-related information is a widespread phenomenon, and laypeople often have difficulty finding relevant documents. The goal of the CLEF eHealth evaluation challenge is to provide researchers with datasets to improve consumer health search. I will first introduce the task and the datasets built. Then I will describe some experiments and results obtained on these datasets.

Fabien Ringeval – October 20, 2016

Associate Professor (Maître de conférences) at Univ. Grenoble Alpes, in the GETALP team of the Laboratoire d’informatique de Grenoble

Towards the automatic recognition of ecological emotions

Automatic emotion recognition technologies have attracted growing attention over the past decade, in both academia and industry, as they have found many applications in fields as varied as health, education, video games, advertising, and social robotics. Although good performance is reported in the literature for acted emotions, the automatic recognition of spontaneous emotions, as expressed in everyday life, remains an unsolved challenge: these emotions are subtle, and their expression, like their meaning, varies greatly with many speaker-related parameters, such as age and gender, but also personality, social role, language, and culture. In this talk, I will describe current methodologies for acquiring and annotating affective data, and present the latest advances in automatic emotion recognition from the speech signal.