King's College London

Research portal

Stochastic learning in deep neural networks based on nanoscale PCMO device characteristics

Research output: Contribution to journalArticle

Anakha V. Babu, Sandip Lashkare, Udayan Ganguly, Bipin Rajendran

Original languageEnglish
Pages (from-to)227-236
Number of pages10
JournalNEUROCOMPUTING
Volume321
Early online date20 Sep 2018
DOIs
Accepted/In press10 Sep 2018
E-pub ahead of print20 Sep 2018
Published10 Dec 2018

King's Authors

Abstract

Deep Neural Networks (DNN) have proven to be highly effective in extracting high level abstractions of input data using multiple neural network layers. However, the huge training times for DNNs in traditional von-Neumann machines have hindered their ubiquitous adoption in IoT and other mobile computing platforms. Recently, acceleration of DNN with a time complexity of O(1) was proposed using the idea of stochastic weight update with resistive processing units (RPU). However, it has been projected that RPU devices require more than 1000 reliable conductance levels, which is a stringent requirement to realize in memristive devices. Here, we study the optimization of stochastic learning for DNNs for the hand-written digit classification benchmark using the characteristics of non-filamentary Pr0.7Ca0.3MnO3 (PCMO) devices that are fabricated using a standard lithography process. The electrical characteristics of these devices exhibit a linear conductance response with an on-off ratio of 1.8 with 26 levels and significant programming variability. We captured these non-ideal behaviors of experimental PCMO device in the simulations to demonstrate stochastic learning with O(1) time complexity, achieving a test accuracy of 88.1% for the hand-written digit recognition benchmark. While the linearity, dynamic range, bit resolution, programming variability and the reset rate of the device conductance to account for its limited dynamic range have to be co-optimized for improving the training efficiency, we show that programming variability has the paramount role in determining the network performance. We also show that if devices with reduced programming variability (5x smaller compared to our experimental device) can be developed keeping all other parameters constant, it is possible to boost the network performance as high as 95%. We also observe that the performance of stochastic DNNs with memristive synapses is independent of the on-off ratio of the devices for a fixed programming variability. Thus, programming variability represents a new optimization corner for on-chip learning of stochastic DNNs. Further, we also evaluate the performance of stochastic inference engines to noise corrupted input test data as a function of the variability in the memristive devices. We demonstrate that noise-resilient inference engines can be achieved if 100 bits are used for stochastic encoding during inference even though the expensive network training can be done with as few as 10 bits. Thus, our studies emphasize the need for optimization of learning strategies for DNNs based on the non-ideal characteristics of experimental nanoscale devices.

View graph of relations

© 2020 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454