List of upcoming seminars
04/04/2018 at 3pm Steven Bird (306)
31/05/2018 Bruno Pouliquen (306)
Professor, Charles Darwin University, Australia
Sparse Transcription: Rethinking the Processing of Unwritten Languages
Steven Bird is researching new methods for documenting and revitalising the thousands of small languages still spoken in the world today. His career began with a BSc and MSc in computer science at Melbourne University, followed by a PhD in computational linguistics from Edinburgh University, completed in 1990. Since then he has worked at the Universities of Edinburgh, Pennsylvania, Melbourne, and Berkeley, and conducted fieldwork in Australia, West Africa, Melanesia, Amazonia, and Central Asia. He is co-author of a popular textbook in computational linguistics, and recently developed a new computer science curriculum for secondary students which has been adopted in Australian schools. The Aikuma app developed with his students took out the grand prize in the Open Source Software World Challenge.
Professor at LIG, GETALP team
The challenge of discovering linguistic units from raw speech
In this seminar, I will present two collective scientific projects [1,2] that occupied me during the year 2017. What do they have in common? Discovering linguistic units from raw speech without any other supervision. Or almost… [1] https://arxiv.org/pdf/1712.04313.pdf [2] https://arxiv.org/pdf/1802.05092.pdf
Marco Dinaralli – March 22, 2018 at 3:15 pm
LaTTiCe-CNRS UMR 8094 – staying at LIG-GETALP
Emmanuel Morin – March 20, 2018 at 9:30 am
Professor at the University of Nantes (LS2N – Laboratoire des Sciences du Numérique de Nantes)
Olivier Kraif – 8 March 2018
Laboratory of Linguistics and Didactics of Foreign and Native Languages
Dependency analysis for automatic extraction of recurring patterns
“Patterns” (motifs) are recurring constructions that may play a role in the textual organization and structuring of discourse. As prefabricated constructions, patterns are also characteristic of highly codified textual genres. Identifying these constructions can be useful in various NLP applications: document classification, machine translation, writing assistance, term search, corpus linguistics tools… After clarifying the linguistic concept, we will review different methods dedicated to the automatic identification of patterns: repeated segments or n-grams, itemset patterns, recurring lexico-syntactic trees. We will then detail current lines of research on the use of syntax (dependency parsing) for the discovery and description of certain classes of patterns.
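As a rough illustration of the simplest method listed in this abstract (repeated segments, or n-grams), the sketch below counts word n-grams and keeps those that recur. It is not taken from the speaker's work: the function name `repeated_ngrams`, the `min_count` threshold and the sample sentence are all assumptions made for the example, and whitespace tokenization stands in for a real tokenizer.

```python
from collections import Counter

def repeated_ngrams(tokens, n, min_count=2):
    """Return the n-grams that occur at least min_count times, with their counts.

    Hypothetical helper for illustration only; not the speaker's method.
    """
    # Slide a window of size n over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(grams)
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy input (an assumption for the example), split on whitespace.
tokens = "in order to show that we need in order to prove that".split()
print(repeated_ngrams(tokens, 3))
```

Surface n-grams of this kind miss patterns whose elements are not contiguous, which is one motivation for the syntax-based approaches the abstract goes on to mention.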
Moez Avili – February 8, 2018
Laboratoire d’Informatique d’Avignon
Reliability of voice comparison for forensic applications
In court proceedings, voice recordings are increasingly being presented as evidence. In general, a scientific expert is called upon to establish whether the voice excerpt in question was spoken by a given suspect (prosecution hypothesis) or not (defence hypothesis). This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the Bayesian approach has become the new “gold standard” in forensic science. In this approach, the expert expresses the result of the analysis in the form of a likelihood ratio (LR). This ratio not only favours one of the hypotheses (“prosecution” or “defence”) but also provides the weight of this decision. Although the LR is theoretically sufficient to summarize the result, in practice it is subject to certain limitations because of the way it is estimated. This is particularly true when automatic speaker recognition (ASpR) systems are used. These systems produce a score in all situations without taking into account the specific conditions of the case studied. Several factors are almost always ignored by the estimation process, such as the quality and quantity of information in the two voice recordings, the consistency of the information between the two recordings, their phonetic content, or the intrinsic characteristics of the speakers. All these factors call into question the reliability of voice comparison in the judicial context. In this thesis, we address this problem within the framework of automatic systems (ASpR) on two main points.
The first is to establish a hierarchical scale of phonetic categories of speech sounds according to the amount of speaker-specific information they contain. This study shows the importance of phonetic content: it highlights interesting differences between phonemes and the strong influence of intra-speaker variability. These results were confirmed by a complementary study on oral vowels based on formant parameters, independent of any speaker recognition system.
The second point is to implement an approach to predict the reliability of the LR from the two recordings of a voice comparison without using an ASpR system. To this end, we defined a homogeneity measure (NHM) capable of estimating the amount of information and the homogeneity of this information between the two recordings considered. Our hypothesis is that homogeneity is directly correlated with the degree of reliability of the LR. The results obtained confirmed this hypothesis, with the NHM strongly correlated with the LR reliability measure. Our work also revealed significant differences in NHM behaviour between target comparisons and non-target (impostor) comparisons.
Our work has shown that the “brute force” approach (based on a large number of comparisons) is not sufficient to ensure a good assessment of reliability in FVC. Indeed, certain variability factors can induce local system behaviours linked to particular situations. For a better understanding of the FVC approach and/or an ASpR system, it is necessary to explore the behaviour of the system at as detailed a scale as possible (the devil is in the details).
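For readers unfamiliar with the likelihood ratio mentioned in this abstract, its standard textbook form is given below. The symbols are assumptions introduced here, not notation from the thesis: E denotes the evidence (the observed comparison between the two recordings), H_p the prosecution hypothesis and H_d the defence hypothesis.

```latex
\[
  \mathrm{LR} = \frac{p(E \mid H_p)}{p(E \mid H_d)}
\]
```

A value above 1 supports the prosecution hypothesis and a value below 1 supports the defence hypothesis; the further the ratio is from 1, the stronger the support, which is the “weight” referred to above.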
Paule-Annick Davoine – November 23, 2017
Professor at Université Grenoble Alpes, Pactes laboratory
Cartography and geovisualization for the representation and analysis of spatialized data for digital humanities
A growing number of disciplines and research areas in the humanities and social sciences, literature and languages are interested in the spatial dimension of data or sources: in history, for the representation of the geo-historical data needed to understand the evolution of territories or of the phenomena affecting them; in literature, for mapping the places in novels or the life stories of authors; in linguistics, for grasping the spatial diffusion of languages or dialects; in geography, for reconstructing the trajectories and movements of individuals from narratives, or for enhancing ancient cartographic documents… All these needs pose new challenges for mapping and geo-visualization, which must deal with semi-structured, multidimensional, multi-platform spatialized data defined by a diversity of observation scales, both geographical and temporal, and with varying levels of quality.
The aim of this talk is to present some of the cartographic and geo-visualization issues raised by the spatialized processing and representation of data from the field of digital humanities, based on research projects conducted within the Steamer team.
Patrick Paroubek – 26 October 2017
Research Engineer CNRS (IR1)
Natural Language Processing for the analysis of scientific publications
The topic will be addressed through the analysis of NLP community publications, using the NLP4NLP corpus, which covers 50 years of publications from the major conferences and journals in the field of text and speech processing, and through biomedical corpora (MIROR project). The NLP contributions discussed here concern the analysis of trends and networks as well as the detection of plagiarism or “spin” (embellishment) in scientific publications.
Christian Boitet – October 5, 2017
Professor Emeritus at Université Grenoble Alpes, GETALP-LIG
MT summit seminar (http://aamt.info/app-def/S-102/mtsummit/2017/)
Maximiliano Duran – May 30, 2017
Pedro Chahuara – 18 May 2017
Jean-Pierre Chevrot – March 2, 2017
Professor at Université Grenoble Alpes
Laboratoire de l’Informatique du Parallélisme, Institut rhône-alpin des systèmes complexes, ENS Lyon
Lidilem laboratory, Université Grenoble Alpes
Language acquisition and sociolinguistic uses: social, cognitive and network
Bringing cognitive and social approaches closer together is often presented as a desirable goal for better understanding the language acquisition process (Hulstijn et al., 2014). However, the question remains how to translate this program into actual research practice.
Although cognitive and social approaches are based on different traditions, attempts to combine the two perspectives in language acquisition research may benefit from similar undertakings in other fields, such as social cognition, cognitive sociology, cognitive sociolinguistics, social neurosciences, etc. An examination of these interdisciplinary attempts leads to the identification of three ways of combining the social and the cognitive: the social approach to cognition, the cognitive approach to the social, and the so-called complex individualism approach (Kaufmann and Clément, 2011; Chevrot, Drager & Foulkes, in preparation; Dupuy, 2004).
Of these options, only the last does not favour either the social and collective level or the cognitive and individual level (Dupuy, 2004). Instead, it emphasizes the interaction and bi-directional causality between them. In this perspective, individuals with specific social and cognitive characteristics interact with each other within general social and cognitive constraints. Individual characteristics may change as a result of interactions between individuals, and these changes may in turn alter the general constraints (Hruschka et al., 2009). In this context, the acquisition of language and its use can be considered as the results of reciprocal influences diffusing through a network of relationships.
We will present projects that can implement this framework, including the DyLNet project – Language Dynamics, Linguistic Learning, and Sociability at Preschool: Benefits of Wireless Proximity Sensors in Collecting Big Data (Nardy, 2017).
Chevrot, J.P., Drager, K. & Foulkes, P. (in preparation). Sociolinguistic Variation and Cognitive Science.
Dupuy, J.-P. (2004). Vers l’unité des sciences sociales autour de l’individualisme méthodologique complexe. Revue du MAUSS, 24(2), 310-328.
Hruschka, D. J., Christiansen, M. H., Blythe, R. A., Croft, W., Heggarty, P., Mufwene, S. S., Pierrehumbert, J. B., & Poplack, S. (2009). Building social cognitive models of language change. Trends in Cognitive Sciences, 13(11), 464–469.
Hulstijn, J. H., Young, R. F., Ortega, L., Bigelow, M., DeKeyser, R., Ellis, N. C., Lantolf, J. P., Mackey, A., & Talmy, S. (2014). Bridging the Gap. Studies in Second Language Acquisition, 36(3), 361–421.
Kaufmann, L., & Clément, F. (2011). L’esprit des sociétés. Bilan et perspectives en sociologie cognitive. In L. Kaufmann & F. Clément, La sociologie cognitive, Ophrys (pp. 7–40).
Nardy (2017). DyLNet Project – Language Dynamics, Linguistic Learning, and Sociability at Preschool: Benefits of Wireless Proximity Sensors in Collecting Big Data [https://hal-univ-orleans.archives-ouvertes.fr/hal-01396652]
Michael Zock – January 12, 2017
CNRS Research Director at the Laboratoire d’Informatique Fondamentale (LIF), TALEP group, Aix-Marseille University
If all roads lead to Rome, they are not all equal. The problem of lexical access in production
Everyone has encountered the following problem: you are looking for a word (or a person’s name) that you know, but cannot access it in time. Work by psychologists has shown that people in this cognitive state know a great deal about the word they are searching for (meaning, number of syllables, origin, etc.), and that the words they confuse it with resemble it strikingly (initial letter or sound, syntactic category, semantic field, etc.). My (long-term) goal is to build a program that takes advantage of this state of affairs to help a speaker or writer (re)find the word that is on the tip of their tongue. To this end, I plan to add an association index (collocations encountered in a large corpus) to an existing electronic dictionary. In other words, I propose to build a dictionary similar to that of human beings, which, in addition to conventional information (definition, written form, grammatical information), would contain links (associations) making it possible to navigate between ideas (concepts) and their expressions (words). Such a dictionary would therefore allow access to the information sought either by form (lexical: analysis), by meaning (concepts: production), or by both.
The objective of this presentation is to show how to build such a resource, how to use it, what difficulties its construction raises, and what possibilities it offers.
Lorraine Goeuriot – December 1, 2016
Lecturer (maître de conférences) at Univ. Grenoble Alpes, in the MRIM team of the Laboratoire d’informatique de Grenoble
Medical Information Retrieval and its evaluation: an overview of CLEF eHealth evaluation task
In this talk, I will introduce my research activities in the field of medical information retrieval, and in particular its evaluation.
The use of the Web as a source of health-related information is a widespread phenomenon, and laypeople often have difficulty finding relevant documents. The goal of the CLEF eHealth evaluation challenge is to provide researchers with datasets to improve consumer health search. I will first introduce the task and the datasets built. Then I will describe some experiments and the results obtained on these datasets.
Fabien Ringeval – October 20, 2016
Lecturer at Univ. Grenoble Alpes, in the GETALP team of the Laboratoire d’informatique de Grenoble
Towards the automatic recognition of ecological emotions
Automatic emotion recognition technologies have gained increasing attention over the last decade at both the academic and industrial levels, since they have found many applications in fields as varied as health, education, video games, advertising, and social robotics. Although good performance is reported in the literature for acted emotions, the automatic recognition of spontaneous emotions, as expressed in everyday life, remains an unresolved challenge, since these emotions are subtle, and their expression, like their meaning, varies greatly according to many speaker parameters, such as age and gender, but also personality, social role, language, and culture. In this presentation, I will describe the current methodologies in affective data acquisition and annotation, and present the latest advances in the automatic recognition of emotions from the speech signal.