4.6 Article

Signal detection models as contextual bandits

Journal

ROYAL SOCIETY OPEN SCIENCE
Volume 10, Issue 6, Pages -

Publisher

ROYAL SOC
DOI: 10.1098/rsos.230157

Keywords

decision theory; signal detection theory; multi-armed bandit; contextual bandit; Softmax; Thompson sampling


Signal detection theory (SDT) is widely used for optimal decision-making under uncertainty, but it assumes decision-makers immediately adopt the appropriate acceptance threshold, which may not be the case in real-world situations that require learning. This study recasts the traditional SDT model into a contextual multi-armed bandit (CMAB), where decision-makers must infer the relationship between a continuous cue and the desirability of a signal while seeking to exploit the acquired information. Various CMAB heuristics are discussed to address the trade-off between estimating the underlying relationship and exploiting it. The results suggest that CMABs provide principled parametric solutions to SDT problems when decision-makers have incomplete information.
Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal-normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers, but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to many classical SDT problems when decision-makers start with incomplete information.
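
The contrast drawn in the abstract, between a fixed SDT acceptance threshold and a Softmax rule applied to learned expected payoffs, can be illustrated with a minimal Python sketch. This is not the authors' implementation; the listing gives none. All function names and parameter values are hypothetical. The sketch assumes an equal-variance normal-normal model, a payoff of `benefit` for accepting a desirable signaller and `-cost` for accepting an undesirable one, a payoff of zero for rejecting, feedback only after an accept, and a logistic-regression estimate of P(desirable | cue) updated by stochastic gradient steps. That learner is an assumption chosen for brevity, not necessarily the estimator used in the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def optimal_threshold(mu_u, mu_d, sigma, p_d, benefit, cost):
    """Full-information SDT threshold (equal-variance normal-normal, mu_d > mu_u).

    Accepting pays off in expectation when the likelihood ratio
    f_d(x) / f_u(x) exceeds beta = (1 - p_d) * cost / (p_d * benefit).
    Solving for the cue x gives the value returned here.
    """
    log_beta = math.log(((1.0 - p_d) * cost) / (p_d * benefit))
    return (mu_u + mu_d) / 2.0 + sigma ** 2 * log_beta / (mu_d - mu_u)


def softmax_accept_prob(ev_accept, ev_reject, temperature):
    """Softmax over two actions: a logistic function of the payoff difference."""
    return sigmoid((ev_accept - ev_reject) / temperature)


def simulate(n_trials=2000, mu_u=0.0, mu_d=2.0, sigma=1.0, p_d=0.5,
             benefit=1.0, cost=1.0, temperature=0.2, lr=0.1, seed=0):
    """Hypothetical learner: estimates P(desirable | cue) while choosing by Softmax."""
    rng = random.Random(seed)
    w0, w1 = 0.0, 0.0  # logistic-regression weights (assumed learner)
    for _ in range(n_trials):
        desirable = rng.random() < p_d
        cue = rng.gauss(mu_d if desirable else mu_u, sigma)
        p_hat = sigmoid(w0 + w1 * cue)
        ev_accept = p_hat * benefit - (1.0 - p_hat) * cost
        # Rejecting is assumed to pay zero and to reveal nothing.
        if rng.random() < softmax_accept_prob(ev_accept, 0.0, temperature):
            err = (1.0 if desirable else 0.0) - p_hat
            w0 += lr * err
            w1 += lr * err * cue
    return w0, w1
```

With equal priors and symmetric payoffs, `optimal_threshold(0.0, 2.0, 1.0, 0.5, 1.0, 1.0)` is the midpoint between the two cue means. The learner in `simulate` starts with no knowledge of that threshold and only approaches it as its weights improve. A higher `temperature` makes accept/reject choices more random (more exploration); a lower one makes them closer to a hard threshold.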
