Hua Shen

Instructor

Hua Shen

Time Mon/Wed 2:15 - 3:30 PM

Location EB227

Zoom https://nyu.zoom.us/my/hua.shen (Passcode: enQb9h)

Welcome! 🤗

This course will provide an overview of (Bidirectional) Human-AI Alignment, emphasizing both how AI systems can be designed to reflect human values and how humans can be empowered to critically engage and collaborate with AI. Topics include human-centered data collection and curation, reinforcement learning from human feedback (RLHF), human-in-the-loop evaluation, and human-AI interaction. By focusing on this two-way alignment, you will be equipped to shape AI systems responsibly while developing the skills to navigate and contribute to both HCI and AI research.

Class Schedule

See NYU Shanghai's Course Syllabus for the tentative schedule, which is subject to change.

Week	Date	Theme	Topics	Reading Materials
1	Sep 1 (M)	Foundations	Overview: Introduction to Human-AI Alignment slides video
1	Sep 3 (W)	Foundations	Overview: Evolving Challenges of AI Alignment and Human's Role 🎓 Project Proposal Start slides video	• Shen, Hua, Tiffany Knearem et al. "Towards bidirectional human-ai alignment: A systematic review for clarifications, framework, and future directions." arXiv:2406.09264. • Anwar, Usman, Abulhair Saparov et al. "Foundational challenges in assuring alignment and safety of large language models." arXiv:2404.09932.
2	Sep 8 (M)	Foundations	Values & Morals in LLMs: Theories and Evaluation slides video	• Gabriel, Iason. "Artificial intelligence, values, and alignment." Minds and Machines, 2020. • Sorensen, Taylor, Jared Moore et al. "A roadmap to pluralistic alignment." ICML 2024. • Ye, Haoran, Jing Jin, Yuhang Xie, Xin Zhang, and Guojie Song. "Large language model psychometrics: A systematic review of evaluation, validation, and enhancement." arXiv:2505.08245.
2	Sep 10 (W)	Foundations	Values & Morals in LLMs: Misalignment Mitigation and Social Impacts slides video	• Feng, Shangbin, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, and Yulia Tsvetkov. "Modular pluralism: Pluralistic alignment via multi-llm collaboration." arXiv:2406.15951. • Shen, Hua, Nicholas Clark, and Tanushree Mitra. "Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?." arXiv:2501.15463. • AlKhamissi, Badr, Muhammad ElNokrashy, Mai AlKhamissi, and Mona Diab. "Investigating cultural alignment of large language models." arXiv:2402.13231. • Rao, Abhinav, Aditi Khandelwal, Kumar Tanmay, Utkarsh Agarwal, and Monojit Choudhury. "Ethical reasoning over moral alignment: A case and framework for in-context ethical policies in LLMs." arXiv:2310.07251. • Santurkar, Shibani, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. "Whose opinions do language models reflect?." ICML 2023. • Kapoor, Sayash, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins et al. "On the societal impact of open foundation models." arXiv:2403.07918.
3	Sep 15 (M)	Foundations	Paper Presentation: Values and Morals in LLMs Belinda's slides
3	Sep 17 (W)	Methods	Data is Gold: Human-in-the-loop Collection and Co-Annotation slides video	• Kim, Sungdong, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, and Minjoon Seo. "Aligning large language models through synthetic feedback." EMNLP 2023. • Li, Minzhi, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, and Diyi Yang. "Coannotating: Uncertainty-guided work allocation between human and large language models for data annotation." EMNLP 2023. • Aher, Gati V., Rosa I. Arriaga, and Adam Tauman Kalai. "Using large language models to simulate multiple humans and replicate human subject studies." ICML 2023.
4	Sep 22 (M)	Methods	Data is Gold: Human Preference Styles and Validation 🎓 Project Proposal Due slides video	• Bai, Yuntao, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen et al. "Constitutional ai: Harmlessness from ai feedback." arXiv:2212.08073. • Shen, Hua, Vicky Zayats, Johann C. Rocholl, Daniel D. Walker, and Dirk Padfield. "Multiturncleanup: A benchmark for multi-turn spoken conversational transcript cleanup." EMNLP 2023. • Huang, Saffron, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli. "Collective constitutional ai: Aligning a language model with public input." FAccT 2024. • Gordon, Mitchell L., Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S. Bernstein. "Jury learning: Integrating dissenting voices into machine learning models." CHI 2022. • Bradley, Herbie, Andrew Dai, Hannah Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Grégory Schott, and Joel Lehman. "Quality-diversity through AI feedback." ICLR 2024. • Zhao, Dora, Jerone TA Andrews, Orestis Papakyriakopoulos, and Alice Xiang. "Position: measure dataset diversity, don't just claim it." ICML 2024.
4	Sep 24 (W)	Methods	Post-Training for Alignment: LLM Prompting and Supervised Fine-Tuning (SFT) slides video	• Schulhoff, Sander, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li et al. "The prompt report: a systematic survey of prompt engineering techniques." arXiv:2406.06608.
5/6	Sep 29 (M) - Oct 8 (W)		💃🏻 National Holiday Day - No Class
7	Oct 13 (M)	Methods	Paper Presentation: Data is Gold
7	Oct 15 (W)	Methods	Post-Training for Alignment: RLHF and similar Approaches slides	• Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang et al. "Training language models to follow instructions with human feedback." NeurIPS 2022. • Rafailov, Rafael, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward model." NeurIPS 2024.
8	Oct 20 (M)	Methods	Post-Training for Alignment: Interactive Alignment and LLM Adaptation 🎯 Assignment 1 Released (LLM Post-Training Alignment) slides	• Petridis, Savvas, Benjamin D. Wedin, James Wexler, Mahima Pushkarna, Aaron Donsbach, Nitesh Goyal, Carrie J. Cai, and Michael Terry. "Constitutionmaker: Interactively critiquing large language models by converting feedback into principles." IUI 2024. • Terry, Michael, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, and Meredith Ringel Morris. "Interactive AI alignment: Specification, process, and evaluation alignment." arXiv:2311.00710.
8	Oct 22 (W)	Methods	Evaluation and Ecosystem: Automatic, Human-in-the-loop Evaluation and Platform slides	• Felkner, Virginia K., Ho-Chun Herbert Chang, Eugene Jang, and Jonathan May. "Winoqueer: A community-in-the-loop benchmark for anti-lgbtq+ bias in large language models." ACL 2023. • Sun, Zhiqing, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, and Chuang Gan. "Principle-driven self-alignment of language models from scratch with minimal human supervision." NeurIPS 2024. • Lee, Noah, Na Min An, and James Thorne. "Can large language models capture dissenting human voices?." EMNLP 2023. • Zhou, Xuhui, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency et al. "Sotopia: Interactive evaluation for social intelligence in language agents." ICLR 2024. • Dubois, Yann, Chen Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy S. Liang, and Tatsunori B. Hashimoto. "Alpacafarm: A simulation framework for methods that learn from human feedback." NeurIPS 2023. • Yuan, Yifu, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, and Yan Zheng. "Uni-rlhf: Universal platform and benchmark suite for reinforcement learning with diverse human feedback." ICLR 2024.
9	Oct 27 (M)	Methods	Paper Presentation: Post-Training for Alignment
9	Oct 29 (W)	Practice	Human-AI Interaction: Design and Development slides	• Lee, Mina, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan et al. "A design space for intelligent and interactive writing assistants." CHI 2024. • Zagalsky, Alexey, Dov Te'eni, Inbal Yahav, David G. Schwartz, Gahl Silverman, Daniel Cohen, Yossi Mann, and Dafna Lewinsky. "The design of reciprocal learning between human and artificial intelligence." CSCW 2021. • Park, Joon Sung, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. "Generative agents: Interactive simulacra of human behavior." IUI 2023. • Friedman, Batya, David G. Hendry, and Alan Borning. "A survey of value sensitive design methods." Foundations and Trends in Human–Computer Interaction, 2017.
10	Nov 3 (M)		💃🏻 EMNLP Conference - No Class
10	Nov 5 (W)		💃🏻 EMNLP Conference - No Class
11	Nov 10 (M)		🎓 Midway Project Showcase
11	Nov 12 (W)		🎓 Midway Project Showcase
12	Nov 17 (M)	Practice	Human-AI Interaction: Evaluation and Intervention slides 🎯 Assignment 1 Due	• Shen, Hua, and Tongshuang Wu. "Parachute: Evaluating interactive human-lm co-writing systems." CHI 2023 In2Writing. • Birhane, Abeba, William Isaac, Vinodkumar Prabhakaran, Mark Diaz, Madeleine Clare Elish, Iason Gabriel, and Shakir Mohamed. "Power to the people? Opportunities and challenges for participatory AI." EAAMO 2022. • Wu, Sherry, Hua Shen, Daniel S. Weld, Jeffrey Heer, and Marco Tulio Ribeiro. "Scattershot: Interactive in-context example curation for text transformation." IUI 2023.
12	Nov 19 (W)	Practice	Human-AI Interaction: Empirical Applications and Use Cases 🎯 Assignment 2 Released (Human-LLM Interactive Alignment System)	• Ma, Qianou, Hua Shen, Kenneth Koedinger, and Sherry Tongshuang Wu. "How to teach programming in the ai era? using llms as a teachable agent for debugging." AIED 2024. • Yang, Diyi, Caleb Ziems, William Held, Omar Shaikh, Michael S. Bernstein, and John Mitchell. "Social skill training with large language models." arXiv:2404.04204.
13	Nov 24 (M)	Practice	LLM Interpretability: Mechanistic Interpretability Approaches	• Bereska, Leonard, and Efstratios Gavves. "Mechanistic interpretability for AI safety--a review." arXiv preprint arXiv:2404.14082 (2024). TMLR 2024. • Sharkey, Lee, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill et al. "Open problems in mechanistic interpretability." arXiv:2501.16496. • Cunningham, Hoagy, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. "Sparse autoencoders find highly interpretable features in language models." ICLR 2024.
13	Nov 26 (W)	Practice	LLM Interpretability: Human Evaluation and Interactive Explanation	• Shen, Hua, Chieh-Yang Huang, Tongshuang Wu, and Ting-Hao Kenneth Huang. "ConvXAI: Delivering heterogeneous AI explanations via conversations to support human-AI scientific writing." CSCW 2023 Demo. • Gebreegziabher, Simret Araya, Zheng Zhang, Xiaohang Tang, Yihao Meng, Elena L. Glassman, and Toby Jia-Jun Li. "Patat: Human-ai collaborative qualitative coding with explainable interactive rule synthesis." CHI 2023.
14	Dec 1 (M)	Methods	Paper Presentation: Evaluation and Ecosystem
14	Dec 3 (W)	Practice	Paper Presentation: LLM Interpretability 🎯 Assignment 2 Due
15	Dec 8 (M)	Practice	Risks, Trust and Safety: Taxonomies and Benchmarks	• Bengio, Yoshua, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari et al. "Managing extreme AI risks amid rapid progress." Science 2024.
15	Dec 10 (W)	Practice	Risks, Trust and Safety: Human-in-the-loop Auditing and Mitigation	• Prabhudesai, Snehal, Ananya Prashant Kasi, Anmol Mansingh, Anindya Das Antar, Hua Shen, and Nikola Banovic. "'Here the GPT made a choice, and every choice can be biased': How Students Critically Engage with LLMs through End-User Auditing Activity." CHI 2025.
16	Dec 15 (M)		🎓 Final Project Presentation
16	Dec 17 (W)		🎓 Final Project Presentation

Overview

Office Hour

Hua Shen: 11:00 AM - 12:00 PM, Tue/Thu; Office: S749; Zoom: https://nyu.zoom.us/my/hua.shen (Passcode: enQb9h)

Prerequisite

Students are expected to (1) be proficient in Python and relevant programming libraries (e.g. HuggingFace, OpenAI API) for completing assignments, and (2) know basic Machine Learning and Human-computer Interaction (optional) concepts to comprehend research papers.

Topics in CS – Human-AI Alignment

(CSCI-SHU 205)

Hua Shen