Slide Slam P12 Sandbox Series

The Triplets Task: an open, large-scale, curated benchmark for biological and artificial semantic representations (in the making)

Slide Slam Session P, Thursday, October 7, 2021, 2:30 - 4:30 pm PDT

Valentina Borghesani1,2, Jonathan Armoza1,2, Pierre Bellec1,2, Simona Brambati1,2; 1Centre de recherche de l'Institut universitaire de gériatrie de Montréal, Montréal, QC H3W 1W6, Canada, 2Department of Psychology, Université de Montréal, Montréal QC H3C 3J7, Canada

Current theories of semantic knowledge aim to capture how specialized yet distributed neural representations can encode both experiential (e.g., the word lemon evokes the concept sour) and distributional (e.g., the word lemon often co-occurs with squeezer) information [1,2]. Moreover, given the increasingly widespread adoption of natural language processing (NLP) models as a window onto the neuro-cognitive correlates of human language processing, the field needs appropriate benchmarks for comparing artificial and human semantic representations. We set out to validate a task testing how well different neuro-cognitive and NLP models predict human behavior. Modeled on common neuropsychological tests of associative semantic knowledge, our task elicits human participants' semantic representations by asking which two of three words are more closely associated (e.g., lemon / squeezer / sour). We then generated 10,000 triplets of both abstract and concrete nouns (6,433 unique words) and compared how they would be solved by experiential and distributional models. We selected one neuro-cognitive model embedding concepts onto 11 sensory-motor dimensions (the Lancaster Sensorimotor Norms, LSN [3]) and fourteen NLP models: five GloVe models trained on Wikipedia, four GloVe models trained on Twitter, a sense2vec model trained on Reddit comments, a fastText model trained on Common Crawl, and three fastText models trained on Amazon reviews, Yahoo answers, and Yelp reviews, respectively. Overall, agreement among the NLP models ranged from perfect (100%) to null (0%), with a mean of 40.94% (SD = 25.05), while the NLP models and LSN agreed on only 30% of triplets. We then selected a subset of triplets (n = 2555; 3,630 unique words) for online behavioral validation, choosing those that (1) had been evaluated by at least six models; (2) appeared in the LSN; and (3) showed either low (i.e., < 25%, n = 2078) or high (i.e., > 75%, n = 477) agreement among the NLP models.
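How a distributional model solves one of these triplets can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the toy 3-d vectors and the helper names (`cosine`, `solve_triplet`) are hypothetical. The idea is simply that an embedding model picks the pair of words whose vectors are most similar.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_triplet(words, vectors):
    """Return the pair of words whose embeddings are most similar."""
    return max(
        combinations(words, 2),
        key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]),
    )


# Toy, hand-made 3-d embeddings for illustration only.
vecs = {
    "lemon":    [0.9, 0.8, 0.1],
    "squeezer": [0.8, 0.7, 0.2],
    "sour":     [0.1, 0.2, 0.9],
}

pair = solve_triplet(["lemon", "squeezer", "sour"], vecs)
# With these toy vectors, the model associates "lemon" with "squeezer".
```

In this sketch a purely distributional model pairs lemon with squeezer, whereas an experiential model whose dimensions encode taste could instead pair lemon with sour; inter-model agreement on a triplet is then just whether two models return the same pair.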
To date, we have collected responses from 1292 MTurk workers (555 female; 103 left-handed; mean age 39.66 ± 11.32 years; mean education 15.39 ± 1.8 years). Our preliminary results suggest that the LSN captures human semantic representations better than the NLP models do (percentage agreement: 73.5% vs. 23.68%). We will openly release the full set of triplets, along with the associated code and behavioral data. Overall, we believe our large, carefully curated dataset will be a useful benchmark for both computational and empirical investigations of semantic knowledge. The current results suggest that incorporating sensory-motor, experiential information is critical to achieving human-like semantic representations. [1] Binder 2016 [2] Huth 2016 [3] Lynott 2019
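The agreement figures reported above (e.g., 73.5% vs. 23.68%) can be read as plain percent agreement between two lists of per-triplet answers. A minimal sketch with toy data, assuming each answer is recorded as the chosen word pair (the function name and the example pairs are hypothetical):

```python
def percent_agreement(answers_a, answers_b):
    """Percentage of triplets on which two answer lists choose the same pair."""
    assert len(answers_a) == len(answers_b)
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return 100.0 * matches / len(answers_a)


# Toy data: one chosen pair per triplet; frozenset makes order irrelevant.
human = [frozenset(p) for p in
         [("lemon", "sour"), ("dog", "bark"), ("sky", "blue"), ("ice", "cold")]]
model = [frozenset(p) for p in
         [("lemon", "sour"), ("dog", "bone"), ("sky", "blue"), ("ice", "rink")]]

score = percent_agreement(human, model)  # 2 of 4 triplets match -> 50.0
```

Using unordered pairs (frozensets) matters: "lemon / sour" and "sour / lemon" are the same answer, so order of report must not count as disagreement.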

