4.7 Article

Human-aligned trading by imitative multi-loss reinforcement learning

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 234, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.120939

Keywords

Algorithmic trading; Reinforcement learning; Deep Q network; Imitation learning; Human alignment

Research on algorithmic trading using reinforcement learning has gained popularity in recent years. In this paper, we propose a trading model that aims to align machine trading agents with human traders. We introduce a novel multi-loss function combining supervised learning with single-step and multi-step Q-learning, and incorporate imitation learning in the training and trading processes. Our model outperforms a group of baseline models, and ablation results support the inclusion of each feature introduced to align the model with human trader behavior.
Research into algorithmic trading using reinforcement learning has grown increasingly popular in recent years. While most work focuses on solving a particular modelling or data problem with positive results, we believe that in an application as critical as financial trading, aligning the machine with human behaviours is imperative and should be the basis of all further improvements before machine algorithms are free to go their own innovative ways. In this paper, we propose a trading model whose design principles aim to bring a machine trading agent close to a human trader. We study areas where human alignment is necessary and introduce, as a solution, a novel multi-loss function combining supervised learning with single-step and multi-step Q-learning, and we inject the imitation learning paradigm into the training and trading processes. We also introduce a realistic backtesting setup and a holding-position-aware profit calculation scheme, under which the algorithm conducts intra-day trading using minute tick data over a group of U.S. stocks chosen to represent different industrial sectors and liquidity levels. Our model's overall outperformance over a group of baseline models, together with our ablation study results, justifies the inclusion of the individual model features, all of which are introduced to align aspects of the model's behaviour more closely with those of a human trader.
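
The abstract names the three ingredients of the multi-loss (supervised learning, single-step and multi-step Q-learning) but not its exact form. The sketch below is one plausible reading, not the authors' formulation: it borrows the loss structure of Deep Q-learning from Demonstrations (DQfD), where a large-margin supervised term pulls the agent toward an expert's action and is added to one-step and n-step temporal-difference terms. The function names, the weights `w_n` and `w_sup`, the margin value, and the hold/buy/sell action set are all assumptions made for illustration.

```python
from typing import Sequence


def td_target(rewards: Sequence[float], gamma: float, bootstrap_q: float) -> float:
    """Discounted return over len(rewards) steps plus a bootstrapped Q-value."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_q


def margin_loss(q_values: Sequence[float], expert_action: int, margin: float = 0.8) -> float:
    """Large-margin supervised loss (DQfD-style, an assumption here).

    Zero only when the expert's action beats every other action by `margin`.
    """
    augmented = [
        q + (0.0 if a == expert_action else margin)
        for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]


def multi_loss(
    q_values: Sequence[float],   # Q(s, a) for hypothetical actions [hold, buy, sell]
    action: int,                 # action actually taken in state s
    expert_action: int,          # action the (human) demonstrator took in s
    rewards: Sequence[float],    # rewards r_t, ..., r_{t+n-1}
    next_q_max: float,           # max_a Q(s_{t+1}, a) from a target network
    nstep_q_max: float,          # max_a Q(s_{t+n}, a) from a target network
    gamma: float = 0.99,
    w_n: float = 1.0,            # hypothetical weight on the n-step term
    w_sup: float = 1.0,          # hypothetical weight on the supervised term
    margin: float = 0.8,
) -> float:
    """Weighted sum of one-step TD, n-step TD, and supervised losses."""
    q_sa = q_values[action]
    one_step = (td_target(rewards[:1], gamma, next_q_max) - q_sa) ** 2
    n_step = (td_target(rewards, gamma, nstep_q_max) - q_sa) ** 2
    supervised = margin_loss(q_values, expert_action, margin)
    return one_step + w_n * n_step + w_sup * supervised


if __name__ == "__main__":
    loss = multi_loss(
        q_values=[1.0, 2.0, 0.5], action=1, expert_action=1,
        rewards=[1.0, 0.0, 1.0], next_q_max=2.0, nstep_q_max=4.0, gamma=0.5,
    )
    print(f"combined loss: {loss}")
```

In this reading, the supervised term is what carries the imitation signal: it vanishes only when the agent's preferred action matches the demonstrator's by a clear margin, while the two TD terms keep the Q-values consistent with observed rewards over short and longer horizons.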
