Selected Projects 2025-2026

USC + Amazon Center on Secure & Trusted Machine Learning
Deployment of Uncertainty Estimation Methods for Hallucination Detection and Effective Test Time Computing
PI and Co-PI: Salman Avestimehr, Professor of Electrical Engineering and Computer Science, Viterbi

Capital One Fellow Student: Duygu Nur Yaldiz (PhD Student)
This project studies why LLMs struggle to reliably express their uncertainty through verbalized confidence, despite often knowing when their answers are correct. Through probing experiments, we show that pretrained LLMs encode a strong latent correctness signal in their hidden representations, which emerges during pretraining and remains largely unchanged by instruction tuning. However, this internal signal is not naturally accessible via zero-shot or in-context verbalized confidence, which we find to be dominated by instruction-following behavior rather than genuine uncertainty elicitation. We systematically demonstrate that zero-shot verbalized confidence is highly prompt-sensitive, weakly correlated with correctness, and poorly aligned with internal correctness representations. In contrast, lightweight supervised fine-tuning explicitly aligns verbalized confidence with the model’s latent correctness signal, enabling confidence outputs that closely match probe-level performance. Crucially, this alignment generalizes only to tasks where the underlying correctness representation itself generalizes. In short, we show that verbalized confidence is not an emergent capability of LLMs but an alignment problem: models possess internal correctness information yet require explicit supervision to expose it through language.
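The probing setup described above can be illustrated with a minimal sketch. Everything here is synthetic and hypothetical: `X` stands in for a frozen LLM's hidden states on question–answer pairs, and the latent "correctness direction" is planted by construction; the actual experiments probe real model representations.

```python
import numpy as np

# Synthetic stand-in for the probing experiment: `X` plays the role of a
# frozen LLM's hidden states on QA pairs, `y` marks whether the answer was
# correct, and the latent correctness direction is planted by construction.
rng = np.random.default_rng(0)
d, n = 64, 2000
latent = rng.normal(size=d)                  # hypothetical correctness direction
y = rng.integers(0, 2, size=n)               # 1 = correct answer, 0 = incorrect
X = rng.normal(size=(n, d)) + np.outer(2.0 * y - 1.0, 0.3 * latent)

X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

# Minimal linear probe: project onto the class-mean difference and threshold.
# High held-out accuracy indicates the representation linearly encodes a
# correctness signal, even when verbalized confidence fails to expose it.
w = X_tr[y_tr == 1].mean(axis=0) - X_tr[y_tr == 0].mean(axis=0)
b = -0.5 * (X_tr[y_tr == 1].mean(axis=0) + X_tr[y_tr == 0].mean(axis=0)) @ w
probe_acc = float(((X_te @ w + b > 0).astype(int) == y_te).mean())
print(f"held-out probe accuracy: {probe_acc:.3f}")
```

On this synthetic data the probe recovers correctness far above chance, which is the signature the real probing experiments look for in LLM hidden states.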

Expected Deliverables
Conference publication (COLM 2026 or EMNLP 2026)

Publications & Conferences
Sungmin Kang, Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates, Salman Avestimehr. “Uncertainty Quantification for Hallucination Detection in Large Language Models: Foundations, Methodology, and Future Directions”, 2025.

Applying topological data analysis to GraphRAG for responsible AI in finance
PI : John Carlsson, Professor of ISE

Capital One Fellow Student: Ke Xu (PhD Student)
This project improves the reliability and interpretability of AI for financial decision-making by combining GraphRAG with Topological Data Analysis (TDA). Financial data are structured into a knowledge graph and embedded using graph neural networks. The mapper algorithm from TDA is then applied to these embeddings to identify clusters, gaps, and ambiguities in retrieved information. By analyzing the topology of relevant knowledge, the system can assess whether an answer is well-grounded or potentially unreliable. If results are ambiguous, the system automatically refines the query or flags hallucination risk. The outcome is a prototype system that reduces hallucinations, improves answer accuracy, and provides interpretable insights for more responsible AI in finance.
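The Mapper step can be made concrete with a toy one-dimensional example. This is a minimal sketch, not the project's implementation: the two synthetic 2-D point clouds below stand in for graph-neural-network embeddings, the lens is simply the first coordinate, and production work would typically use a TDA library rather than this hand-rolled version.

```python
import numpy as np
from itertools import combinations

def mapper_1d(points, lens, n_intervals=5, overlap=0.3, link_dist=0.5):
    """Toy Mapper: cover the lens range with overlapping intervals,
    single-linkage cluster the points inside each interval, and connect
    clusters that share points."""
    lo, hi = float(lens.min()), float(lens.max())
    length = (hi - lo) / n_intervals
    step = length * (1.0 - overlap)
    nodes = []
    start = lo
    while start < hi:
        idx = np.where((lens >= start) & (lens <= start + length))[0]
        # union-find for single-linkage clustering at threshold link_dist
        parent = {int(i): int(i) for i in idx}
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i, j in combinations(parent, 2):
            if np.linalg.norm(points[i] - points[j]) <= link_dist:
                parent[find(i)] = find(j)
        clusters = {}
        for i in parent:
            clusters.setdefault(find(i), set()).add(i)
        nodes.extend(clusters.values())
        start += step
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges

# Demo on two well-separated synthetic clusters (stand-ins for GNN
# embeddings of retrieved knowledge), using the first coordinate as lens.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(50, 2)),
                 rng.normal([10.0, 0.0], 0.1, size=(50, 2))])
nodes, edges = mapper_1d(pts, pts[:, 0])
print(len(nodes), "nodes,", len(edges), "edges")
```

Two disconnected components in the resulting Mapper graph correspond to a gap in the retrieved knowledge, the kind of topological signal the project uses to flag ambiguity or hallucination risk.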

Expected Deliverables
Conference and journal publications, open-source software code

Generative AI for Financial Decision-Making in Noisy, Dynamic, and Agentic Settings
PI : Mahdi Soltanolkotabi & Stephen Tu

Capital One Fellow Student: Mohammad Shahab Sepehri (PhD Student)
This project focuses on generative AI for financial decision-making in noisy and dynamic environments. We study problems governed by stochastic differential equations (SDEs), where decisions must be made over time to optimize expected rewards under uncertainty. We investigate multimodal large language models (MLLMs) that process time series and other structured inputs, with the goal of understanding and improving their ability to reason about stochastic dynamics and make reliable decisions. As an initial step, we benchmark the performance of existing MLLMs on carefully designed synthetic and real-world datasets that require reasoning over SDE-driven dynamics. A representative example is the optimal exercise problem for American options, where the model must identify the optimal stopping time given stochastic asset price trajectories. These benchmarks allow us to systematically assess current limitations in reasoning, temporal understanding, and decision-making under uncertainty. Building on these findings, we plan to improve model performance using reinforcement learning from human feedback (RLHF) and by developing novel algorithms tailored to reasoning over stochastic dynamics. Our approach emphasizes learning structured decision strategies rather than pattern matching, aiming to produce models that can reliably optimize decisions in stochastic environments. Finally, we will evaluate the extent to which the learned reasoning abilities generalize across domains. In particular, we will study whether models trained on financial decision-making tasks can transfer their understanding to other SDE-governed systems, such as problems in physics with different state representations and reward formulations.
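For the American-option example above, a standard classical baseline against which an MLLM's stopping decisions could be compared is the Longstaff–Schwartz least-squares Monte Carlo method. The sketch below uses illustrative parameters, not the project's benchmark code: it prices an American put under geometric Brownian motion by stepping backwards and regressing continuation values on a cubic in the asset price.

```python
import numpy as np

def american_put_lsm(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                     n_steps=50, n_paths=20000, seed=0):
    """Least-squares Monte Carlo (Longstaff-Schwartz) price of an American
    put: simulate GBM paths, then step backwards, regressing continuation
    values on a cubic in the asset price and exercising whenever the
    immediate payoff beats the estimated continuation value."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # GBM: S_{t+dt} = S_t * exp((r - sigma^2/2) dt + sigma sqrt(dt) Z)
    Z = rng.standard_normal((n_paths, n_steps))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma ** 2) * dt
                              + sigma * np.sqrt(dt) * Z, axis=1))
    payoff = np.maximum(K - S, 0.0)
    V = payoff[:, -1].copy()                 # option value at maturity
    for t in range(n_steps - 2, -1, -1):
        V = V * np.exp(-r * dt)              # discount one step back
        itm = payoff[:, t] > 0               # regress on in-the-money paths only
        if itm.sum() > 10:
            coeffs = np.polyfit(S[itm, t], V[itm], 3)
            continuation = np.polyval(coeffs, S[itm, t])
            exercise = payoff[itm, t] > continuation
            V[itm] = np.where(exercise, payoff[itm, t], V[itm])
    return float(np.mean(V) * np.exp(-r * dt))

print(f"LSM American put estimate: {american_put_lsm():.3f}")
```

A benchmark entry would pair trajectories like these with the LSM-derived exercise decisions as a reference policy for evaluating a model's stopping-time reasoning.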

Expected Deliverables
A set of synthetic and real-world benchmarks for evaluating multimodal reasoning and decision-making in stochastic dynamic systems governed by stochastic differential equations (SDEs).

Improved multimodal models and training methods, incorporating reinforcement learning from human feedback (RLHF) and novel reasoning algorithms for decision-making under stochastic dynamics.

An assessment of cross-domain generalization, testing whether learned reasoning strategies transfer from financial tasks to other SDE-governed domains such as physics.

Research outputs, including peer-reviewed publications and, where appropriate, publicly released code and datasets.

Publications & Conferences 
Sepehri, M. S., Tinaz, B., Fabian, Z., & Soltanolkotabi, M. “Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs”. Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2025. https://openreview.net/forum?id=NBU5IJGhCf

Project Resources
GitHub: https://github.com/AIF4S/Hyperphantasia
Hugging Face: https://huggingface.co/datasets/shahab7899/Hyperphantasia

Human-in-the-loop Multi-Agentic AI Systems for Task-oriented Dialogue
PI : Jesse Thomason, Assistant Professor of Computer Science

Capital One Fellow Student: Mohammad Shahab Sepehri (PhD Student)
We propose to empower multi-agentic AI systems with transparency and explainability by surfacing their reasoning processes in natural language, including frictive dialogue with human users to clarify information, agent-agent introspection to overcome single-agent uncertainties, and human-initiated interruption for human-in-the-loop control. Current AI agent research largely considers individual agents acting autonomously, without human intervention, or, at most, in response to single instructions. Even in cases where users attempt to correct or guide model behavior, large language model (LLM)-powered agents are frequently instruction-tuned to produce sycophantic, rather than pragmatic, responses. By introducing human-like frictive dialogue based on our recently developed friction taxonomy, as well as agent-agent introspection that builds on our work in model uncertainty estimation and multi-agent reasoning, our proposed human-in-the-loop, multi-agentic workflow will improve both quantitative task success and qualitative user experience. Such a workflow could be utilized in high-stakes decision-making arenas like finance, where human intervention must be possible, easy, and appropriately enabled.

Publications & Conferences 
Tejas Srinivasan and Jesse Thomason. Adjust for Trust: Mitigating Trust-Induced Inappropriate Reliance on AI Assistance. ACM Conference on Intelligent User Interfaces (IUI), 2026. https://arxiv.org/abs/2502.13321

Causality-Aware Imitation Learning with Applications to Financial Forecasting and Trading
PI : Erdem Biyik, Assistant Professor of Computer Science
Capital One Fellow Student: Yutai Zhou (PhD Student)
Data-driven decision-making systems are prone to latching onto spurious correlations between observations and decisions, rather than learning the actual causal factors behind a decision. Humans are causal reasoners, with gaze being a particularly strong indicator of the part of the visual observation that is most salient for an action taken. Work in robotic imitation learning has shown that human gaze can be used effectively as a learning signal to train decision-making agents that focus on task-relevant aspects of their visual inputs, e.g., an autonomous driving agent focuses on the red traffic signal for braking while ignoring the irrelevant trees on the side of the road. The resulting agents not only achieved significantly improved task performance, but their decision-making processes were also more robust to visual distractors compared to baseline methods. The neural network agents’ learned features were also found to be more interpretable for human evaluators. In contrast to physically embodied robotic tasks, virtual web agents bring a novel set of challenges, such as ill-defined action spaces and navigation structures that are ungrounded in intuitive physical motion. We will apply gaze-based imitation learning techniques to web-based decision-making problems while overcoming the unique challenges of the web domain. Our goal is to enable virtual agents to make effective, causally informed decisions that are grounded in human behavior, robust to irrelevant distractors, and interpretable to humans.
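The gaze-as-learning-signal idea can be sketched as a regularized imitation loss. This is a simplified illustration, not the GABRIL objective itself: a behavior-cloning term plus a KL penalty that pulls a hypothetical spatial attention map toward the recorded human gaze heatmap.

```python
import numpy as np

def gaze_regularized_loss(action_logits, action_label,
                          attention_map, gaze_map, lam=0.5):
    """Illustrative gaze-regularized imitation loss (a simplified sketch,
    not the GABRIL objective): behavior-cloning cross-entropy plus a KL
    term pulling the agent's spatial attention toward the human gaze map."""
    # behavior cloning: negative log-likelihood of the demonstrated action
    logits = action_logits - action_logits.max()          # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    bc_loss = -log_probs[action_label]
    # gaze regularizer: KL(gaze || attention) over the flattened maps,
    # after normalizing both maps into probability distributions
    eps = 1e-8
    g = gaze_map.ravel() / (gaze_map.sum() + eps)
    a = attention_map.ravel() / (attention_map.sum() + eps)
    kl = float(np.sum(g * (np.log(g + eps) - np.log(a + eps))))
    return float(bc_loss + lam * kl)
```

When the agent's attention matches the gaze map the KL term vanishes and the loss reduces to plain behavior cloning; attention on gaze-irrelevant regions (e.g., the trees rather than the traffic signal) is penalized.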

Publications & Conferences 
A. Banayeeanzade, F. Bahrani, Y. Zhou, and E. Bıyık, “GABRIL: Gaze-Based Regularization for Mitigating Causal Confusion in Imitation Learning,” International Conference on Intelligent Robots and Systems (IROS), Oct. 2025.

L. Gong, F. Bahrani, Y. Zhou, A. Banayeeanzade, J. Li, and E. Bıyık, “AutoFocus-IL: VLM-based Saliency Maps for Data-Efficient Visual Imitation Learning without Extra Human Annotations,” Submitted to the 2026 IEEE International Conference on Robotics and Automation (ICRA), May 2026.

Presentations
USC CREDIF Symposium (2025) 

UCLA, Computer Science, Department Seminar (2025) 

Carnegie Mellon University, Guest Lecture at “Human-Robot Interaction” (2025) 

Inha University, Department of Aerospace Engineering (2025) 

CoRL 2025 Workshop on Resource-Rational Robot Learning 

Website
GABRIL: https://liralab.usc.edu/gabril/
AutoFocus-IL: https://autofocus-il.github.io/

A Benchmark Framework for Accelerating Co-design Mixture of Experts Algorithms and Systems
PI : Seo Jin Park, Assistant Professor of Computer Science
Capital One Fellow Student: Shaoyu Wang (PhD Student)
This project proposes a benchmarking framework for co-designing Mixture of Experts (MoE) model architectures and serving systems to maximize accuracy per dollar of cost. The framework addresses the currently fragmented approach to MoE research, where model design is often prematurely compromised to fit existing system limitations, such as expert load imbalance. It will feature a domain-specific language (DSL) for representing MoE models, a small-scale benchmark suite to predict the ultimate accuracy of each model architecture design, an optimal system performance predictor based on token dependency analysis, and automatic generation of tailored serving-system runtimes. This will enable rapid exploration of the MoE design space, guiding researchers towards architectures that achieve optimal performance within cost constraints.
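To make the expert load-imbalance problem concrete, here is a toy top-k router with an imbalance statistic of the kind a serving-system performance predictor must model. This is illustrative only; `gate_w` and the routing rule are stand-ins, not part of the proposed DSL or framework.

```python
import numpy as np

def topk_route(tokens, gate_w, k=2):
    """Toy top-k MoE router (illustrative; `gate_w` is a hypothetical gating
    matrix). Returns per-token expert choices plus a load-imbalance
    statistic: under expert parallelism, the hottest expert bounds serving
    throughput, which a system performance predictor must account for."""
    scores = tokens @ gate_w                      # (n_tokens, n_experts)
    chosen = np.argsort(-scores, axis=1)[:, :k]   # top-k expert ids per token
    load = np.bincount(chosen.ravel(), minlength=gate_w.shape[1])
    imbalance = float(load.max() / load.mean())   # 1.0 == perfectly balanced
    return chosen, load, imbalance

rng = np.random.default_rng(0)
tokens = rng.normal(size=(256, 16))
gate_w = rng.normal(size=(16, 8))
chosen, load, imbalance = topk_route(tokens, gate_w)
print("per-expert load:", load, "imbalance:", round(imbalance, 2))
```

An imbalance well above 1.0 means some experts sit idle while a hot expert saturates, exactly the kind of system limitation that today forces premature compromises in model design.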

Expected Deliverables 
A fast-switching mechanism between expert parallelism and tensor parallelism (EP–TP hot switch)

An overhead-benefit evaluation of the heterogeneous quantization approach

Open-sourced custom CUDA kernels for efficient quantized MoE decoding

A serving performance model for heterogeneously quantized MoE models

An accuracy performance model for heterogeneously quantized MoE models

Publications & Conferences 
Straggler-tolerant large-scale MoE serving with layer-wise continuous batching. Shaoyu Wang, Yizhuo Liang, Guangrong He, Geon-Woo Kim, Yanqi Zhou and Seo Jin Park. (Under submission to SIGCOMM ’26, Aug 2026)

ADPMoE: Adaptive Parallelism for Mixture-of-Expert Models Inference. Shaoyu Wang, Yizhuo Liang and Seo Jin Park. (Under submission to SOSP ’26, Sep 2026)

Mixed-Precision MoE Inference for IO-Efficient MoE Serving. Huanchen Sun, Junzhou He and Seo Jin Park. (Under submission to NSDI ’27, May 2027)

Reasoning-driven and Data-informed LLMs for Financial GenAI Systems
PI : Viktor Prasanna, Professor of Electrical Engineering and Computer Science, Viterbi
Capital One Fellow Student: Yuxin Yang (PhD Student)
Large Language Models (LLMs) have revolutionized AI-driven decision-making in finance, yet they still struggle with challenges such as handling ambiguous queries, reasoning over structured data, and adapting to complex financial tasks. Traditional Retrieval-Augmented Generation (RAG) helps mitigate some of these issues but remains limited when dealing with uncertain user queries and multi-hop reasoning. Additionally, financial applications often require structured inference over interconnected data, which LLMs fail to achieve effectively. To overcome these limitations, we propose a comprehensive AI framework that enhances LLM reasoning through both algorithmic advancements and dataset-driven techniques, integrating Graph Machine Learning (GML) for structured inference: (i) We propose to develop robust AI for user-centric interactive reasoning, where an iterative query-rewriting mechanism clarifies ambiguous inputs and a reasoning module decomposes tasks and interactively requests user input, and (ii) We propose GML-enhanced multi-hop reasoning models to enable the system to traverse structured data, such as transaction networks, to support deeper inference beyond text-based retrieval.
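The proposed interactive reasoning loop of thrust (i) can be sketched at a high level. The `rewrite`, `retrieve`, and `confident` components below are placeholder callables, hypothetical stand-ins for the LLM-backed query rewriter, retriever, and confidence estimator described above.

```python
from typing import Callable, List

def interactive_answer(query: str,
                       rewrite: Callable[[str], str],
                       retrieve: Callable[[str], List[str]],
                       confident: Callable[[str, List[str]], bool],
                       max_rounds: int = 3) -> dict:
    """Sketch of the iterative query-rewriting loop: retrieve for the
    current query, answer if the evidence supports a confident response,
    otherwise rewrite and retry; after max_rounds, fall back to asking
    the user for clarification (the human-in-the-loop step)."""
    for _ in range(max_rounds):
        docs = retrieve(query)
        if confident(query, docs):
            return {"status": "answer", "query": query, "docs": docs}
        query = rewrite(query)               # LLM-backed clarification rewrite
    return {"status": "ask_user", "query": query, "docs": []}
```

The same control flow extends naturally to the GML-enhanced retriever of thrust (ii), with `retrieve` traversing a transaction graph instead of a text index.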


Publications & Conferences 
Submitted: Yuxin Yang, Gangda Deng, Ömer Faruk Akgül, Nima Chitsazan, Yash Govilkar, Akasha Tigalappanavara, Shi-Xiong Zhang, Sambit Sahu, Viktor Prasanna. SPARC-RAG: Adaptive Sequential–Parallel Scaling with Context Management for Retrieval-Augmented Generation. ACL, 2026 (under review). 

Yuxin Yang is a 3rd-year PhD student. Poster presentation at the USC–Capital One CREDIF Research Symposium, 2025.

Benchmarks for Symbolic and Logical Reasoning Abilities of LLMs
PI : Jyo Deshmukh, Associate Professor of Computer Science
Capital One Fellow Student: Yuan Xia (PhD Student)
Large Language Models (LLMs) have demonstrated remarkable abilities in language understanding and generation. Their ability to produce complex and structured answers to many types of queries has led to several claims about their reasoning abilities. Several benchmark problems have been constructed that claim to demonstrate that LLMs can reason, while skeptics have argued through handcrafted examples and theoretical arguments that LLMs cannot reason. In this project, we focus on the problem of systematically generating benchmarks for logical reasoning tasks centered around producing proofs for statements in a formal logic system, with the goal of quantifying how well an LLM can reason. We divide our proposed work into three tasks: (1) We will systematically generate synthetic data containing valid propositional logic formulas and their proofs using a fixed set of axioms and inference rules, sample the synthetic data to obtain few-shot examples, and prompt the LLM with these to produce proofs of validity of a propositional logic formula. (2) We will extend the work in Task 1 to logical statements over decidable theories and use randomly sampled valid formulas and proof traces from SMT solvers as few-shot examples. (3) We will quantify the LLM’s ability to reason over multiple types of examples using an epistemic logic; this will enable quantifying the reasoning abilities of ensemble approaches that use prompt routers for complex reasoning tasks.
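Task 1's notion of validity can be checked mechanically: a propositional formula is valid iff it evaluates to True under every truth assignment. The sketch below encodes formulas as Python callables, a stand-in for the parsed formula objects a benchmark generator would produce, and shows the enumeration check that would label synthetic data.

```python
import itertools

def is_valid(formula, variables):
    """Propositional validity via truth-table enumeration: a formula is
    valid iff it evaluates to True under every assignment. Formulas are
    Python callables here, a stand-in for the benchmark generator's
    parsed formula objects."""
    return all(formula(*vals)
               for vals in itertools.product([False, True],
                                             repeat=len(variables)))

# An instance of the Hilbert axiom schema p -> (q -> p), written with
# the encoding (x -> y) == (not x or y):
ax1 = lambda p, q: (not p) or ((not q) or p)
imp = lambda p, q: (not p) or q   # plain p -> q, which is NOT valid

print(is_valid(ax1, "pq"), is_valid(imp, "pq"))   # → True False
```

Truth-table enumeration is exponential in the number of variables, which is acceptable for generating small labeled benchmark formulas; Task 2's decidable theories require SMT solvers instead.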