Publications

These papers reflect the main threads of my work: learning-to-defer, statistical guarantees for routing decisions, robustness under distribution shift or attack, and allocation across experts, tasks, and models.

* indicates equal contribution.

2026

  1. Learning-to-Defer with Expert-Conditioned Advice.
    Yannis Montreuil, Leina Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We show how learning-to-defer should jointly choose both the expert and the extra advice that expert receives.

    Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, are inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert–advice action space and prove an \(\mathcal{H}\)-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, LLM, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime.
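    The cost-minimizing routing rule that Learning-to-Defer builds on (before the advice extension above) can be sketched in a few lines. This is an illustrative toy, not the paper's augmented surrogate; the cost values are hypothetical stand-ins for a learned cost model:

```python
def defer(expected_costs):
    """Route an input to the expert with the lowest expected cost.

    expected_costs[j] is an estimate of the cost of consulting expert j
    (e.g. predicted error plus a consultation fee).
    """
    return min(range(len(expected_costs)), key=lambda j: expected_costs[j])

# Hypothetical per-expert cost estimates for one input.
costs = [0.40, 0.15, 0.30]
chosen = defer(costs)  # expert 1 has the lowest expected cost
```

    In the paper's setting the same argmin would instead range over composite expert–advice pairs rather than experts alone.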

  2. Learning to Defer in Non-Stationary Time Series via Switching State-Space Models.
    Yannis Montreuil*, Letian Yu*, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We model deferral in non-stationary time series with changing experts and partial feedback.

    We study Learning to Defer for non-stationary time series with partial feedback and time-varying expert availability. At each time step, the router selects an available expert, observes the target, and sees only the queried expert’s prediction. We model signed expert residuals using L2D-SLDS, a factorized switching linear-Gaussian state-space model with context-dependent regime transitions, a shared global factor enabling cross-expert information transfer, and per-expert idiosyncratic states. The model supports expert entry and pruning via a dynamic registry. Using one-step-ahead predictive beliefs, we propose an IDS-inspired routing rule that trades off predicted cost against information gained about the latent regime and shared factor. Experiments show improvements over contextual-bandit baselines and a no-shared-factor ablation.
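    The information-directed flavor of the routing rule can be illustrated with a classic IDS-style ratio: penalize predicted cost, but discount experts whose outcome would be informative about the latent state. This is a hedged sketch of the general idea only; the function name, the squared-cost numerator, and the scalar information-gain inputs are illustrative assumptions, not the paper's exact rule:

```python
def ids_route(expected_costs, info_gains, eps=1e-9):
    """Pick the expert minimizing an information-directed ratio:
    (expected cost)^2 / (information gained about the latent state).

    An expert with slightly higher cost can still be chosen if querying
    it reveals much more about the current regime.
    """
    def ratio(j):
        return expected_costs[j] ** 2 / (info_gains[j] + eps)
    return min(range(len(expected_costs)), key=ratio)

# Expert 1 is cheaper AND more informative here, so it wins.
choice = ids_route(expected_costs=[0.4, 0.2], info_gains=[0.10, 0.05])
```
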

  3. Why Ask One When You Can Ask k? Learning-to-Defer to the Top-k Experts.
    Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We generalize learning-to-defer from sending each query to one expert to sending it to the top-k experts, with a consistent learning objective and better accuracy–cost trade-offs.

    Existing Learning-to-Defer (L2D) frameworks are limited to single-expert deferral, forcing each query to rely on only one expert and preventing the use of collective expertise. We introduce the first framework for Top-\(k\) Learning-to-Defer, which allocates queries to the \(k\) most cost-effective entities. Our formulation unifies and strictly generalizes prior approaches, including the one-stage and two-stage regimes, selective prediction, and classical cascades. In particular, it recovers the usual Top-1 deferral rule as a special case while enabling principled collaboration with multiple experts when \(k > 1\). We further propose Top-\(k(x)\) Learning-to-Defer, an adaptive variant that learns the optimal number of experts per query based on input difficulty, expert quality, and consultation cost. To enable practical learning, we develop a novel surrogate loss that is Bayes-consistent, \(\mathcal{H}_h\)-consistent in the one-stage setting, and \((\mathcal{H}_r,\mathcal{H}_g)\)-consistent in the two-stage setting. Crucially, this surrogate is independent of \(k\), allowing a single policy to be learned once and deployed flexibly across \(k\). Experiments across both regimes show that Top-\(k\) and Top-\(k(x)\) deliver superior accuracy–cost trade-offs, opening a new direction for multi-expert deferral in L2D.
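    To make the Top-k allocation rule concrete: rank experts by estimated per-query cost and keep the k cheapest. A minimal sketch under illustrative assumptions (the cost values and function name are hypothetical; the paper's contribution is the consistent surrogate that trains such a policy, not this selection step):

```python
def defer_top_k(expected_costs, k):
    """Return the indices of the k most cost-effective experts,
    i.e. those with the lowest estimated per-query cost."""
    ranked = sorted(range(len(expected_costs)), key=lambda j: expected_costs[j])
    return ranked[:k]

# Hypothetical cost estimates for one query over four experts.
costs = [0.40, 0.10, 0.30, 0.25]
committee = defer_top_k(costs, k=2)  # experts 1 and 3
```

    An adaptive Top-k(x) variant would additionally choose k per query, e.g. growing the committee while the next expert's marginal cost stays below its expected benefit.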

  4. Online Learning-to-Defer with Varying Experts.
    Yannis Montreuil*, Duy Dang Hoang*, Maxime Meyer*, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We introduce the first online learning-to-defer algorithm that handles bandit feedback and changing expert availability, with regret guarantees.

    Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distributions. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of \(O((n+n_e)T^{2/3})\) in general and \(O((n+n_e)\sqrt{T})\) under a near-realizable condition, where \(T\) is the time horizon, \(n\) the number of labels, and \(n_e\) the number of distinct experts observed across rounds. The analysis builds on novel \(\mathcal{H}\)-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.

  5. Adversarial Robustness in One-Stage Learning-to-Defer.
    Yannis Montreuil*, Letian Yu*, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We give the first robustness framework for one-stage learning-to-defer and design adversarially robust surrogates with theoretical guarantees.

    Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including \(\mathcal{H}\), \((\mathcal{R}, \mathcal{F})\), and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.

  6. Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees.
    Yannis Montreuil*, Yeo Shu Heng*, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We cast extractive QA model routing as learning-to-defer and learn a theoretically grounded policy that improves the accuracy–efficiency trade-off across LLMs.

    Large Language Models (LLMs) excel at generative language tasks but remain unreliable for structured prediction—particularly in extractive question answering (EQA), where success hinges on precise span selection. These challenges are magnified in resource-constrained environments, such as mobile or embedded systems, where deploying high-capacity models is often infeasible. We propose a Learning-to-Defer framework that routes EQA queries across a pool of models with varying capabilities and costs, balancing accuracy against efficiency. Our approach is grounded in statistical decision theory: we define a differentiable surrogate loss whose minimizer provably converges to the Bayes-optimal allocation policy. Experiments on SQuADv1, SQuADv2, and TriviaQA show that our method consistently improves accuracy–efficiency trade-offs relative to static baselines and prior routing heuristics. Our work provides a principled and scalable solution for EQA in both high-performance and on-device deployment settings.

  7. Towards Robust Human-AI Decision-Making via Learning-to-Defer.
    Yannis Montreuil.

    Learning-to-Defer (L2D) facilitates optimal task allocation between AI systems and decision-makers. Despite its potential, we show that current two-stage L2D frameworks are highly vulnerable to adversarial attacks, which can misdirect queries or overwhelm decision agents, significantly degrading system performance. This paper conducts the first comprehensive analysis of adversarial robustness in two-stage L2D frameworks. We introduce two novel attack strategies, untargeted and targeted, that exploit inherent structural vulnerabilities in these systems. To mitigate these threats, we propose SARD, a robust, convex deferral algorithm rooted in Bayes and \((\mathcal{R},\mathcal{G})\)-consistency. Our approach guarantees optimal task allocation under adversarial perturbations for all surrogates in the cross-entropy family. Extensive experiments on classification, regression, and multi-task benchmarks validate the robustness of the approach.

2025

  1. Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees.
    Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We analyze how adversarial perturbations break two-stage learning-to-defer and introduce a robust algorithm with consistency guarantees.

    Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts, supporting reliable decision-making in complex, multi-agent environments. However, existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation, causing costly misrouting or expert overload. We present the first comprehensive study of adversarial robustness in two-stage L2D systems. We introduce two novel attack strategies, untargeted and targeted, which respectively disrupt optimal allocations or force queries to specific agents. To defend against such threats, we propose SARD, a convex learning algorithm built on a family of surrogate losses that are provably Bayes-consistent and \((\mathcal{R}, \mathcal{G})\)-consistent. These guarantees hold across classification, regression, and multi-task settings. Empirical results demonstrate that SARD significantly improves robustness under adversarial attacks while maintaining strong clean performance, marking a critical step toward secure and trustworthy L2D deployment.

  2. A Two-Stage Learning-to-Defer Approach for Multi-Task Learning.
    Yannis Montreuil*, Yeo Shu Heng*, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi.

    TL;DR: We extend two-stage learning-to-defer to joint classification and regression tasks, with theoretical guarantees and multi-task experiments.

    The Two-Stage Learning-to-Defer (L2D) framework has been extensively studied for classification and, more recently, regression tasks. However, many real-world applications require solving both tasks jointly in a multi-task setting. We introduce a novel Two-Stage L2D framework for multi-task learning that integrates classification and regression through a unified deferral mechanism. Our method leverages a two-stage surrogate loss family, which we prove to be both Bayes-consistent and \((\mathcal{G}, \mathcal{R})\)-consistent, ensuring convergence to the Bayes-optimal rejector. We derive explicit consistency bounds tied to the cross-entropy surrogate and the \(L_1\)-norm of agent-specific costs, and extend minimizability gap analysis to the multi-expert two-stage regime. We also make explicit how shared representation learning—commonly used in multi-task models—affects these consistency guarantees. Experiments on object detection and electronic health record analysis demonstrate the effectiveness of our approach and highlight the limitations of existing L2D methods in multi-task scenarios.