Article

Human-aligned trading by imitative multi-loss reinforcement learning

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 234

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.120939

Keywords

Algorithmic trading; Reinforcement learning; Deep Q network; Imitation learning; Human alignment


Research on algorithmic trading using reinforcement learning has gained popularity in recent years. In this paper, we propose a trading model that aims to align machine trading agents with human traders. We introduce a novel multi-loss function combining supervised learning with single-step and multi-step Q-learning, and incorporate imitation learning into the training and trading processes. Our model outperforms baseline models, and the results justify the inclusion of the individual model features that align it with human trader behavior.
Research into algorithmic trading using reinforcement learning has attracted growing interest in recent years. While most work focuses on solving a particular modelling or data problem with positive results, we believe that in an application as critical as financial trading, aligning the machine with human behaviour is imperative and should be regarded as the basis of all further improvements before machine algorithms are free to go their own innovative ways. In this paper, we propose a trading model whose design principles are based on bringing a machine trading agent close to a human trader. We study areas where human alignment is necessary and introduce, as a solution, a novel multi-loss function that combines supervised learning with single-step and multi-step Q-learning; we also bring the imitation learning paradigm into the training and trading processes. We further introduce a realistic backtesting setup and a holding-position-aware profit calculation scheme, under which the machine algorithm conducts intra-day trading using minute tick data over a group of U.S. stocks chosen to represent different industrial sectors and liquidity levels. Our model's overall outperformance of a group of baseline models, together with our ablation study results, justifies the inclusion of the individual model features, all of which are introduced to align aspects of the model's behaviour more closely with those of a human trader.
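
As a rough illustration of how a supervised term and single-step and multi-step Q-learning terms might be combined into one objective, here is a minimal sketch for a single transition. It is not the paper's formulation: the abstract does not give the loss terms or their weights, so the cross-entropy supervised term, the squared TD errors, the weights `w_sup`, `w_one`, `w_multi`, and the function names are all assumptions made for illustration.

```python
import math


def n_step_return(rewards, gamma, bootstrap_value):
    """Discounted sum of the given rewards plus a discounted bootstrap value."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total + (gamma ** len(rewards)) * bootstrap_value


def multi_loss(q_values, action, expert_action, one_step_target, n_step_target,
               w_sup=1.0, w_one=1.0, w_multi=1.0):
    """Weighted sum of three terms for one transition (hypothetical weights).

    q_values        : list of Q estimates, one per action
    action          : index of the action actually taken
    expert_action   : index of the action the imitated trader took
    one_step_target : r + gamma * max_a Q_target(s', a), computed by the caller
    n_step_target   : n-step return, e.g. from n_step_return above
    """
    # Supervised term (assumed): cross-entropy of softmax(Q) against the expert action.
    q_max = max(q_values)
    log_norm = q_max + math.log(sum(math.exp(q - q_max) for q in q_values))
    supervised = log_norm - q_values[expert_action]

    # Single-step and multi-step TD terms (assumed): squared errors on the taken action.
    td_one = (one_step_target - q_values[action]) ** 2
    td_multi = (n_step_target - q_values[action]) ** 2

    return w_sup * supervised + w_one * td_one + w_multi * td_multi


# Example with three hypothetical actions (0 = buy, 1 = hold, 2 = sell).
q = [0.2, -0.1, 0.0]
target_n = n_step_return([0.01, -0.02, 0.03], gamma=0.99, bootstrap_value=0.2)
loss = multi_loss(q, action=0, expert_action=0,
                  one_step_target=0.25, n_step_target=target_n)
```

In a sketch like this, the supervised term pulls the greedy action toward the imitated trader's choice, while the two TD terms keep the Q estimates consistent with observed rewards over short and longer horizons.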

