Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources
Lasse Hyldig Hansen (1,2), Nikolaj Andersen (1,2), Jack Gallifant (2,3), Liam G. McCoy (4), James K. Stone (5), Nura Izath (6), Marcela Aguirre-Jerez (7), Danielle S. Bitterman (8,9,10), Judy Gichoya (11), Leo Anthony Celi (2,12,13)
1. Cognitive Science, Aarhus University
2. Laboratory for Computational Physiology, MIT
3. Department of Critical Care, Guy’s & St Thomas’ NHS Trust
4. Division of Neurology, University of Alberta
5. University of Manitoba Max Rady College of Medicine
6. Faculty of Computing, Mbarara University of Science and Technology
7. Digital Health Department, Fundacion Arturo Lopez Perez
8. Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School
9. Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute
10. Computational Health Informatics Program, Boston Children’s Hospital, Harvard Medical School
11. Department of Radiology, Emory University School of Medicine
12. Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center
13. Department of Biostatistics, Harvard T.H. Chan School of Public Health
This study explores how Large Language Models (LLMs) used in healthcare can exhibit biases related to race and gender. By analyzing a vast amount of text from sources such as arXiv, Wikipedia, and Common Crawl, we quantify how often diseases are discussed alongside race and gender markers. Our goal was to identify potential biases that LLMs might learn from these texts.
The results revealed that gender terms are often linked to disease concepts, while racial terms are associated less frequently. We found significant disparities, with mentions of Black race being overrepresented relative to population proportions. These findings emphasize the importance of examining and addressing biases in LLM training data, especially in healthcare, to develop fairer and more accurate models.
The "Seeds of Stereotypes" study investigates how Large Language Models (LLMs) used in healthcare might perpetuate biases related to race and gender. By analyzing a vast amount of text from diverse sources such as Arxiv, Wikipedia, and Common Crawl, researchers examined the contexts in which diseases are discussed alongside racial and gender markers. This exploration is crucial as it highlights potential biases that LLMs could learn from these texts, which may impact their applications in sensitive domains like healthcare.
Workflow diagram illustrating the process for analyzing race and gender co-occurrences with disease terms within online texts.
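To make the pipeline concrete, here is a minimal sketch of this kind of windowed co-occurrence count. The 100-word window matches the study's setup; the abbreviated term lists, tokenization, and function names are our own simplifications for illustration, not the study's full keyword sets or exact code.

# Minimal sketch of a windowed co-occurrence count (illustrative only).
# The 100-word window matches the study's setup; the term lists below
# are abbreviated stand-ins, not the study's full keyword sets.
import re
from collections import Counter

DISEASE_TERMS = {"asthma", "diabetes", "hypertension"}   # assumed subset
DEMOGRAPHIC_TERMS = {                                    # assumed subset
    "male": {"man", "men", "male"},
    "female": {"woman", "women", "female"},
    "white": {"white", "caucasian"},
    "black": {"black"},
    "asian": {"asian"},
    "hispanic": {"hispanic", "latino", "latina"},
}
WINDOW = 100  # words on either side of a disease mention

def count_cooccurrences(text: str) -> Counter:
    """Count (disease, group) pairs where a demographic term falls
    within WINDOW words of a disease mention; mentions with no such
    term are tallied under 'no_demographic'."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word not in DISEASE_TERMS:
            continue
        context = set(words[max(0, i - WINDOW): i + WINDOW + 1])
        matched = False
        for group, terms in DEMOGRAPHIC_TERMS.items():
            if context & terms:
                counts[(word, group)] += 1
                matched = True
        if not matched:
            counts[(word, "no_demographic")] += 1
    return counts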
Analyzing Disease Associations
The study found that gender terms are frequently associated with disease concepts, while racial terms appear less often. Notably, there were significant disparities, with Black race mentions being overrepresented compared to population proportions. These results underscore the importance of critically examining and addressing biases in LLM training data to develop fairer and more accurate models.
Proportional Disease Mentions with Demographic References within a 100-word Contextual Window. Panel A shows the gender-associated mentions of various diseases, with Panel B detailing the mentions in connection with different races. In both panels, yellow bars indicate the proportion of disease mentions occurring without any specific demographic context.
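The yellow "no demographic" bars in the figure correspond to a normalization step: each disease's mentions are split across demographic groups plus an unattributed remainder. A small helper continuing the sketch above (the function name and counting convention are ours):

def mention_proportions(counts: Counter, disease: str) -> dict:
    """Share of a disease's mentions attributed to each group,
    including the 'no_demographic' remainder. A mention with several
    groups in its window is counted once per group, and shares are
    normalized over all counted (disease, group) pairs."""
    by_group = {g: n for (d, g), n in counts.items() if d == disease}
    total = sum(by_group.values())
    return {g: n / total for g, n in by_group.items()} if total else {}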
Model Predictions vs. Real-World Data
Further analysis compared the disease mentions in the training data with real-world prevalence and GPT-4 outputs. The results revealed a mismatch between model predictions and real-world data, suggesting a lack of real-world grounding in these models. For example, Black race mentions are significantly overrepresented in the training data compared to actual prevalence rates, indicating potential bias in how these models learn associations.
Comparison of Disease Mentions by Race Across GPT-4 Estimates, Real-World Prevalence, and Training Data. This figure contrasts the proportional estimates of disease mentions with demographic categorizations in GPT-4, actual prevalence rates, and occurrences in training data, confined to a 100-word context window. The comparison is limited to population health data for four racial categories: White, Black, Asian, and Hispanic. Side-by-side bar graphs facilitate direct visual comparison, illustrating the congruence or disparity between the textual focus on certain diseases and their real-world demographic prevalence.
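One simple way to quantify the mismatch the figure shows is a per-group representation ratio: the group's share of a disease's mentions in the training data divided by its share of real-world prevalence. A sketch with made-up numbers, for illustration only (not the paper's figures):

def representation_ratio(text_shares: dict, prevalence_shares: dict) -> dict:
    """Ratio > 1 means a group is overrepresented in text relative to
    its share of real-world disease prevalence."""
    return {g: text_shares.get(g, 0.0) / p
            for g, p in prevalence_shares.items() if p > 0}

# Hypothetical shares for one disease, for illustration only:
text = {"white": 0.40, "black": 0.45, "asian": 0.08, "hispanic": 0.07}
real = {"white": 0.60, "black": 0.15, "asian": 0.10, "hispanic": 0.15}
print(representation_ratio(text, real))
# e.g. {'white': 0.67, 'black': 3.0, ...}: Black mentions outpace prevalence 3x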
Exploring Solutions and Strategies
The project not only highlights these issues but also explores strategies to mitigate them, including an examination of different alignment strategies and their effectiveness in improving model accuracy and fairness across diverse demographic groups. These efforts are crucial to ensuring that LLMs provide equitable and unbiased information, fostering better healthcare outcomes.
The "Seeds of Stereotypes" study is a step towards understanding and addressing the biases inherent in LLMs, aiming to bridge the gap between model perceptions and real-world data. For more details and to explore our findings further, visit our project site.
Our work builds upon insights into how technology can impact outcomes across subgroups:
Cross-Care: a new benchmark that evaluates how well language model outputs are grounded in the real-world prevalence of diseases across subgroups. We validate this method across different model architectures, sizes, and alignment strategies.
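A Cross-Care-style check can be sketched by scoring templated sentences with a causal language model and comparing the resulting ranking of groups to a prevalence-based ranking. The template wording, model choice, and scoring below are our own illustrative assumptions, not the benchmark's exact protocol:

# Sketch of a Cross-Care-style likelihood check (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the benchmark spans architectures and sizes
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def sentence_logprob(text: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # .loss is the mean negative log-likelihood per predicted token
    return -out.loss.item() * (ids.shape[1] - 1)

groups = ["White", "Black", "Asian", "Hispanic"]
scores = {g: sentence_logprob(f"{g} patients have hypertension.") for g in groups}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # compare against the ranking implied by real-world prevalence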
This article can be cited as follows:
Hyldig Hansen L, Andersen N, Gallifant J, McCoy LG, Stone JK, Izath N, Aguirre-Jerez M, Bitterman DS, Gichoya J, Celi LA. Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources. arXiv e-prints. 2024 May:arXiv-2405.
@article{hyldig2024seeds,
  title   = {Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources},
  author  = {Hyldig Hansen, Lasse and Andersen, Nikolaj and Gallifant, Jack and McCoy, Liam G and Stone, James K and Izath, Nura and Aguirre-Jerez, Marcela and Bitterman, Danielle S and Gichoya, Judy and Celi, Leo Anthony},
  journal = {arXiv e-prints},
  pages   = {arXiv--2405},
  year    = {2024}
}