SNL 2018 Archive

Poster E23, Saturday, August 18, 3:00 – 4:45 pm, Room 2000AB

Statistical learning and reading: An information-theoretical perspective

Noam Siegelman¹, Victor Kuperman², Ram Frost¹,³,⁴; ¹Hebrew University of Jerusalem, ²McMaster University, ³Haskins Laboratories, ⁴Basque Center on Cognition, Brain and Language (BCBL)

Despite recent evidence tying visual statistical learning (VSL) abilities to reading performance (e.g., Chetail, 2017; Frost et al., 2013), little is known about how sensitivity to regularities in the visual modality eventually leads to high-quality orthographic representations. Here, we adopt an information-theoretical perspective, which proposes that the link between VSL and reading stems from a joint mechanism of information extraction from the visual array. This view relies on recent findings showing that VSL performance can be explained by the amount of information embedded in the visual stream (Siegelman et al., under review). In the current work we take a similar information-theoretical view of reading, which argues that the information presented to readers in different orthographies shapes reading behavior. We first present a corpus analysis examining the information structure of five writing systems (Hebrew, English, Spanish, Finnish, and French). Specifically, we examine how the surprisal (i.e., unpredictability) of letter bigrams, defined as -log(p(B|A)) for a transition from letter A to letter B, unfolds within a word. We show that languages differ from one another in the overall surprisal level of letter transitions (e.g., transitions are generally unpredictable in Hebrew and much more predictable in English). In addition, different orthographies are characterized by different surprisal trajectories (e.g., whereas English shows a flat trajectory, in which all bigrams across a word carry similar surprisal, Hebrew words tend to begin and end predictably, with high surprisal in the middle of the word). Importantly, a simple algorithm can classify word-level surprisal vectors into one of the five languages well above chance level. This shows that the information carried by letter transitions is a stable structural property of a writing system. We then investigate whether this information structure is reflected in reading behavior.
We present data from a cross-linguistic eye-tracking reading experiment, in which 50 native Hebrew speakers (Hebrew University students) and 50 native English speakers (McMaster University students) read excerpts of Wikipedia entries in their native language. We show that the surprisal of letter transitions affects reading times differently in the two languages: whereas readers of English are strongly affected by letter-level predictability (higher surprisal, longer reading times), readers of Hebrew show no such effect. This suggests that, because letter transitions in Hebrew are highly uncertain, Hebrew readers guess the identity of full words from context rather than resorting to letter-by-letter computations. Together, the results suggest that readers are indeed sensitive to the information structure of their writing system. We discuss avenues for future research toward an integrative theory of VSL and reading.
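
The letter-transition surprisal measure described above, -log(p(B|A)), can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say how probabilities were estimated, so the log base (2), the use of raw counts from a word list, the omission of word-boundary symbols, and the names `bigram_surprisal_model`, `surprisal_trajectory`, and `toy_corpus` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_surprisal_model(words):
    """Estimate the surprisal -log2 p(B|A) of each adjacent letter pair.

    Assumption: p(B|A) is the relative frequency of B among all letters
    that follow A in the word list (no smoothing, no boundary symbols).
    """
    pair_counts = Counter()   # how often letter A is followed by letter B
    first_counts = Counter()  # how often letter A is followed by any letter
    for word in words:
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    # log2(N_A / N_AB) equals -log2(N_AB / N_A), i.e. -log2 p(B|A)
    return {
        (a, b): math.log2(first_counts[a] / count)
        for (a, b), count in pair_counts.items()
    }


def surprisal_trajectory(word, model):
    """Surprisal of each letter transition in `word`, left to right.

    Transitions never seen in the corpus are returned as None rather
    than being assigned a guessed value.
    """
    return [model.get((a, b)) for a, b in zip(word, word[1:])]


# Hypothetical toy word list, standing in for a real corpus.
toy_corpus = ["the", "then", "that", "hat"]
model = bigram_surprisal_model(toy_corpus)
trajectory = surprisal_trajectory("that", model)
print(trajectory)
```

For this toy list, the trajectory of "that" is `[0.0, 1.0, 0.0]`: "t" is always followed by "h" and "a" is always followed by "t", while "h" is followed by "a" in only half of its occurrences. A vector of this kind, one value per letter transition, is the sort of per-word input that a language classifier like the one mentioned in the abstract could operate on.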

Topic Area: Perception: Orthographic and Other Visual Processes