Article

A Reconfigurable Two-WSe2-Transistor Synaptic Cell for Reinforcement Learning

Journal

ADVANCED MATERIALS
Volume 34, Issue 48, Pages -

Publisher

WILEY-V C H VERLAG GMBH
DOI: 10.1002/adma.202107754

Keywords

2D semiconductors; ferroelectric materials; reinforcement learning; reward-modulated spike-timing-dependent plasticity; synaptic device

Funding

  1. National Key R&D Program of China [2019YFB2205100]
  2. National Natural Science Foundation of China [11864020, 61974051]
  3. Research Grant Council of Hong Kong [15205619]
  4. Hong Kong Polytechnic University [SB4C]


This study demonstrates a two-transistor (2T) synaptic cell based on WSe2 ferroelectric transistors for implementing the R-STDP learning rule. By exploiting the characteristics of ferroelectric polarization, multilevel conductance states and ultralow nonlinearity are achieved. By applying rewards to the 2T cell, a spiking neural network is successfully trained and the classical cart-pole problem is solved.
Reward-modulated spike-timing-dependent plasticity (R-STDP) is a brain-inspired reinforcement learning (RL) rule with potential for decision-making tasks and artificial general intelligence. However, hardware implementation of the reward-modulation process in R-STDP usually requires complicated Si complementary metal-oxide-semiconductor (CMOS) circuit design, which incurs high power consumption and a large footprint. Here, a design with two synaptic transistors (2T) connected in parallel is experimentally demonstrated. The 2T unit, based on WSe2 ferroelectric transistors, exhibits reconfigurable polarity: one channel can be tuned as n-type and the other as p-type through nonvolatile ferroelectric polarization. In this way, opposite synaptic weight-update behaviors with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/-1.23), and a large Gmax/Gmin ratio of 30 are realized. By applying a positive/negative reward to the STDP/anti-STDP component of the 2T cell, R-STDP learning rules are realized for training a spiking neural network, which is demonstrated to solve the classical cart-pole problem, pointing toward low-power (32 pJ per forward process) and highly area-efficient (100 µm²) hardware chips for reinforcement learning.
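The reward-modulation principle the abstract describes can be sketched in software: a spike-timing-dependent eligibility trace is accumulated, then a scalar reward scales (and can flip) that trace before it is written into the weight, mirroring how a positive or negative reward selects the potentiating or depressing branch of the 2T cell. This is a minimal illustrative sketch, not the authors' implementation; all parameter values (`A_PLUS`, `A_MINUS`, `TAU`) and function names are assumptions for illustration.

```python
import math

# Illustrative STDP parameters (assumed values, not from the paper)
A_PLUS, A_MINUS = 0.01, 0.012   # potentiation/depression amplitudes
TAU = 20.0                      # STDP time constant in ms

def stdp_window(dt_ms):
    """Eligibility contribution for one pre/post spike pair.
    dt_ms > 0: pre fires before post (potentiation);
    dt_ms < 0: post fires before pre (depression)."""
    if dt_ms > 0:
        return A_PLUS * math.exp(-dt_ms / TAU)
    return -A_MINUS * math.exp(dt_ms / TAU)

def r_stdp_update(w, spike_pairs, reward, w_min=0.0, w_max=1.0):
    """Reward-modulated STDP: the reward gates the accumulated
    eligibility trace before it updates the synaptic weight."""
    eligibility = sum(stdp_window(dt) for dt in spike_pairs)
    w = w + reward * eligibility
    # Clip to the device's conductance range (Gmin..Gmax analogue)
    return min(max(w, w_min), w_max)
```

For example, a causal spike pair (`dt = 5 ms`) under positive reward potentiates the weight, while the same pair under negative reward depresses it — the sign flip that the parallel n-type/p-type channel pair realizes in hardware.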

