I am a second-year CS PhD student at the University of California, Davis, advised by Prof. Lifu Huang. My research centers on improving the reasoning, alignment, and interpretability of large language models. Currently, I focus on developing mechanisms for robust reasoning and trustworthy alignment: detecting and mitigating reward hacking through adversarial auditing and inverse reinforcement learning, and designing uncertainty-aware methods to enhance reasoning stability and reduce sycophancy. My goal is to make these models not only more capable, but also more reliable, transparent, and aligned with human intent.
Prior to starting my PhD, I completed my Master's degree at Virginia Tech and obtained my BS from Sharif University of Technology. You can find my CV here.
If you're interested in my research, would like to discuss relevant topics, or want to explore potential collaborations, please feel free to get in touch :) I am best reached by email at mbeigi@ucdavis.edu.
Publications: Pre-Print; EMNLP 2024; ACL 2024.
Service: Reviewer, ICLR 2025.