The Center for Social Data Analytics Colloquium speaker: Eli Ben-Michael

Time	Thu, Feb 13, 2025, 12:00 pm to 1:00 pm
Location	421 Susan Welch Liberal Arts Building
Presenter(s)	Eli Ben-Michael
Description	Abstract: As language technologies become widespread, text experiments, in which texts are randomly assigned to readers, are increasingly used by social scientists and technology developers to understand how language affects perceptions and behavior. This talk presents a framework for estimating the causal effects of language from such experiments. A key challenge is the high-dimensional nature of language, which leads to positivity violations and low effective sample sizes. We address this by characterizing language-encoded interventions as stochastic interventions that average over the distribution of texts in a corpus. We distinguish between "natural effects," which capture the effect of a language attribute together with its correlated attributes, and "isolated effects," which capture the effect of the attribute while holding the others fixed. We show that natural effects are easily identified and estimated in text experiments. However, because experimental corpora are not always representative of naturally occurring language, we propose a method to generalize from a randomized text corpus to any target corpus. By contrast, isolated effects are challenging to estimate even in randomized text experiments, because doing so requires approximating and adjusting for all non-focal language attributes. We link this problem to learning text representations and use principles of omitted variable bias to evaluate isolated effect estimation along two axes: the fidelity and the overlap of the text representations. Finally, we apply these ideas to large language model (LLM) alignment, showing how to learn generative text models that optimally cause desired impacts and how to de-bias correlational approaches to LLM alignment. Throughout, we demonstrate these approaches in applied settings including political persuasion and the perception of hate speech.
Bio: Eli is an assistant professor in the Department of Statistics & Data Science and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University. His research focuses on developing statistical and computational methods to solve practical issues in public policy and social science research, with a particular interest in bringing together ideas from statistics, optimization, and machine learning to create methods for credible and robust causal inference and data-driven decision making.
