## Efficient Training of the Memristive Deep Belief Net Immune to Non-Idealities of the Synaptic Devices

Wei Wang,\* Barak Hoffer, Tzofnat Greenberg-Toledo, Yang Li, Minhui Zou, Eric Herbelin, Ronny Ronen, Xiaoxin Xu, Yulin Zhao, Jianguo Yang, and Shahar Kvatinsky\*

The tunability of the conductance states of various emerging nonvolatile memristive devices emulates the plasticity of biological synapses, making these devices promising for the hardware realization of large-scale neuromorphic systems. The inference of a neural network can be greatly accelerated by the vector-matrix multiplication (VMM) performed in one step within a crossbar array of memristive devices. Nevertheless, implementing the VMM requires complex peripheral circuits, and the complexity increases further because non-idealities of memristive devices prevent precise conductance tuning (especially for online training) and significantly degrade the performance of deep neural networks (DNNs). Herein, an efficient online training method for the memristive deep belief net (DBN) is presented. The proposed memristive DBN uses stochastically binarized activations, which reduces the complexity of the peripheral circuits, and a contrastive divergence (CD)-based gradient descent learning algorithm. The analog VMM and the digital CD are performed separately in a mixed-signal hardware arrangement, making the memristive DBN highly immune to the non-idealities of synaptic devices. The number of write operations on the memristive devices is reduced by two orders of magnitude. Recognition accuracies of 95–97% are achieved on the MNIST dataset using the pulsed synaptic behaviors of various memristive synaptic devices.

The Andrew and Erna Viterbi Faculty of Electrical and Computer Engineering

Technion-Israel Institute of Technology Haifa 3200003, Israel

E-mail: wei.wang@campus.technion.ac.il; shahar@ee.technion.ac.il

X. Xu, Y. Zhao, J. Yang Institute of Microelectronics Chinese Academy of Sciences Beijing 100029, P. R. China

The ORCID identification number(s) for the author(s) of this article can be found under https://doi.org/10.1002/aisy.202100249.

© 2022 The Authors. Advanced Intelligent Systems published by Wiley-VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

## DOI: 10.1002/aisy.202100249

## 1. Introduction

The separation of memory and computing units in conventional von Neumann computing systems, which causes the memory wall bottleneck, is the main issue preventing artificial neural networks from competing with the human brain in efficiency and intelligence.<sup>[1,2]</sup> Emerging nonvolatile memory devices with tunable resistance, that is, memristive devices, including resistive random-access memory (RRAM),<sup>[3,4]</sup> phase-change memory (PCM),<sup>[5]</sup> and ferroelectric random-access memory, among others, are promising candidates for solving the memory wall issue.<sup>[6-8]</sup> They can store information in an analog manner and process it at the same location, acting as artificial synaptic devices and enabling in-memory computation similar to that of the human brain.<sup>[9,10]</sup> Furthermore, an array of memristive devices can efficiently perform the vector-matrix multiplication (VMM), the computational kernel of a deep neural network (DNN), in one step via Ohm's law and Kirchhoff's current law,<sup>[11-13]</sup> making it a promising way to greatly accelerate DNNs and power future artificial intelligence.<sup>[14-16]</sup>

However, several issues remain before this promise can be fulfilled. First, as memristive VMM operations are conducted in the analog domain, expensive analog-to-digital converters (ADCs) and digital-to-analog converters (DACs), as well as additional circuits for the neurons' nonlinear activation functions, are needed for communication between adjacent layers of the DNN.<sup>[17,18]</sup> To avoid high-precision, expensive ADCs and DACs, novel spiking rate-coded neurons have been proposed.<sup>[19,20]</sup> However, spiking rate-coded neuron circuits are still complex and informationally inefficient.<sup>[21]</sup> Second, online training is usually performed by tuning the conductance of the synaptic devices with a closed-loop write method (iteratively writing and verifying until the target value is reached for a single weight update request<sup>[14,22]</sup>), which is inefficient. The open-loop method, which potentiates the synaptic weight with a single write pulse and depresses it with a single write pulse of the opposite polarity, is preferred.<sup>[23,24]</sup> However, this method falls short because the performance of the neural network is greatly degraded by the non-idealities of the memristive synaptic devices.<sup>[25,26]</sup> Additionally, process variation and stuck-at faults have been widely reported to cause performance degradation, although their effects can be partially compensated by various methods at extra cost.<sup>[27–39]</sup> These issues can generally be attributed to the nonbiological conventional learning algorithm of DNNs, that is, error backpropagation-based gradient descent weight updates,<sup>[40,41]</sup> which require both the VMMs and the conductance tuning to be highly precise.<sup>[42–44]</sup> Novel neural network structures and learning algorithms need to be explored to address these issues.

In this article, we investigate the hardware implementation of a memristive deep belief net (DBN) based on the contrastive divergence (CD) learning algorithm.<sup>[45]</sup> The memristive DBN is composed of stacked restricted Boltzmann machines (RBMs),<sup>[46]</sup> in which the VMM operations have binary inputs and stochastically binarized outputs, so no multi-bit ADCs or DACs are needed in the peripheral circuits. Each RBM is trained by accumulating the CD in a separate digital array and periodically updating the synaptic weights of the memristor array with the open-loop write method. Training the DBN requires neither additional cache memory to store the intermediate states of the hidden layers nor dedicated circuits for nonlinear activation functions. The proposed memristive DBN shows high immunity to the non-idealities of synaptic devices, greatly relaxing the specifications for memristive synaptic devices in multiple dimensions.

## 2. Network Structure and Hardware Design

#### 2.1. The DBN and RBMs

The structure of the investigated DBN is shown in **Figure 1**a, which consists of three stacked RBMs.<sup>[45]</sup> Each RBM has a visible layer, a hidden layer, and a weight matrix (*w*) connecting them. For supervised learning tasks, taking the MNIST dataset as an example in this work,<sup>[40]</sup> the images are fed into the visible layer of the first RBM, and the labels are part of the visible layer of the last RBM. Unlike conventional DNNs based on error backpropagation algorithms, the DBN relies on the consecutive training of each RBM via the CD algorithm.

The CD is obtained by alternating Gibbs sampling between the visible layer and the hidden layer of each RBM, which requires both forward and backward VMMs as well as binary sampling operations. All input signals are binary digital signals, which can easily be generated by digital circuits. For instance, in the first RBM layer (RBM 1, Figure 1a,b), each image in the MNIST dataset is binarized (each pixel value set to either "0" or "1") and converted to a vector representing the states of the visible neurons ($\nu$). After alternating Gibbs sampling (see Experimental Section for more details), the hidden neuron states ($h$), the reconstructed visible neuron states ($\nu'$), and the reconstructed hidden neuron states ($h'$) are obtained; these are the local information needed to calculate the CD and update the weight matrix.
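The alternating Gibbs sampling step can be sketched in software as follows. This is a minimal NumPy illustration of one v → h → v′ → h′ pass for an ideal RBM, assuming logistic-sigmoid firing probabilities and omitting bias terms; the helper names (`gibbs_cd1`, `sample`), the 0.5 binarization threshold, and the random stand-in image are hypothetical, and no memristive hardware effects are modeled.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    """Stochastic binarization: fire (1) with probability p, else 0."""
    return (rng.random(p.shape) < p).astype(np.int8)

def gibbs_cd1(v, w, rng):
    """One alternating Gibbs step v -> h -> v' -> h' (biases omitted for brevity)."""
    h = sample(sigmoid(v @ w), rng)          # forward VMM + sampling
    v_rec = sample(sigmoid(w @ h), rng)      # backward VMM + sampling
    h_rec = sample(sigmoid(v_rec @ w), rng)  # forward VMM on the reconstruction
    return h, v_rec, h_rec

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(784, 500))   # visible x hidden weights (RBM 1 size)
image = rng.random(784)                      # stand-in for a grayscale MNIST image in [0, 1]
v = (image >= 0.5).astype(np.int8)           # binarize pixels (threshold is an assumption)
h, v_rec, h_rec = gibbs_cd1(v, w, rng)
```

The four binary vectors `v`, `h`, `v_rec`, and `h_rec` are exactly the local quantities the CD needs; nothing from other layers has to be stored.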



**Figure 1.** The structure of the memristive DBN and the mixed-signal design of memristive RBM. a) The structure of a typical DBN for the training of the MNIST dataset consisting of three RBMs. b) Illustration of Gibbs sampling between the visible layer and the hidden layer in a single RBM during the training. c) Forward VMM in a memristive crossbar array and binary sampling in the outputs to implement the Gibbs sampling from the visible neurons to the hidden neurons. d) Design of mixed-signal training of single RBM layers in the DBNs. e) Flow chart of the training of a single RBM layer in the greedy learning algorithm of the DBN. The light-blue and light-yellow colored blocks are the procedures of analog operations (VMM, stochastic sampling, and weight updates) and digital operations (CD calculation and accumulation), respectively, which are conducted by the components in d) with the same colors.


After the first RBM layer is trained, the states of its hidden neurons ($h$) serve as the input for training the second RBM layer (RBM 2 in Figure 1a). The last RBM layer (RBM 3) takes both the hidden neuron states of the previous RBM layer and the label vector ($l$) as input (see Experimental Section for more details). The weight matrix of RBM 3 is partitioned into two parts ($w_3$ and $w_4$) for clarity. Training the DBN by consecutively training the stacked RBM layers is called the "greedy learning" method.<sup>[45]</sup>

In a conventional DNN trained with the error backpropagation algorithm, the error propagated from the last layer gradually vanishes, which makes it hard to handle in hardware. In addition, the weight gradient depends on both the input of the layer and the error backpropagated from the last layer, which creates a data dependency: the states of the neurons in all layers must be stored until the backpropagated error arrives and the weights are updated. In the DBN, by contrast, all neuron states are binary ("0" or "1") and the CD elements are ternary ("-1," "0," and "1"; see Equation (10)), which makes them easy to process in hardware. Moreover, the gradient, that is, the CD, depends only on local neuron information, further simplifying the memory requirements and hardware design for training.
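Why the CD elements are ternary can be checked directly: for binary states each outer product has entries in {0, 1}, so their difference lies in {-1, 0, 1}. The sketch below assumes the per-sample form $CD_{ij} = \nu_i h_j - \nu'_i h'_j$ used for the counter array; the function name `cd_matrix` is hypothetical.

```python
import numpy as np

def cd_matrix(v, h, v_rec, h_rec):
    """Per-sample contrastive divergence: CD_ij = v_i h_j - v'_i h'_j (ternary)."""
    return np.outer(v, h).astype(np.int8) - np.outer(v_rec, h_rec).astype(np.int8)

v, h = np.array([1, 0]), np.array([1, 1])
v_rec, h_rec = np.array([0, 1]), np.array([1, 0])
print(cd_matrix(v, h, v_rec, h_rec))
# [[ 1  1]
#  [-1  0]]
```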

#### 2.2. Implement VMM with In Situ Stochastic Activations

The Gibbs sampling operation in an RBM can be performed entirely in hardware by the memristive crossbar array with an additional noise current at each output node, as shown in Figure 1c. The circuit in Figure 1c performs the forward VMM and output sampling from the visible neurons to the hidden neurons (Equations (6) and (8)). The binary states of the visible neurons (the input digital signal) are converted to read voltages $(V_i, i \in 1, 2, ..., m)$ that drive the m-by-n memristive array, which performs the VMM via Ohm's law and Kirchhoff's current law. As the input of the VMM is a binary vector, only a level shifter (i.e., a 1-bit DAC) is needed. The output current of the memristive array can be written as

$$I_j = \sum V_i G_{ij} \tag{1}$$

where $j \in 1, 2, ..., n$ is the column index of the memristive array and $G_{ij}$ is the conductance of the device in the *i*th row and *j*th column. A separate column of memristive devices with a fixed reference conductance ($G_{\rm ref}$) provides the reference current

$$I_{\rm ref} = \sum V_i G_{\rm ref} \tag{2}$$

A noise current ($I_{\rm noise} \in \mathcal{N}(0, I_n^2)$) is injected into each output node of the memristive array. The output current is then converted to a voltage by a transimpedance amplifier (TIA) and compared with the voltage output of the reference column by a comparator (i.e., a 1-bit ADC). The hidden neuron states can thus be written as

$$h_{j} = \begin{cases} 1, & I_{j} - I_{\text{ref}} \ge I_{\text{noise}} \\ 0, & I_{j} - I_{\text{ref}} < I_{\text{noise}} \end{cases} \tag{3}$$

which reproduces Equation (7) with the weight $w_{ij}$ given by $G_{ij} - G_{\rm ref}$. Similarly, the backward VMM and output sampling from the hidden neurons to the visible neurons (Equation (8)) can be implemented by placing the input circuits at the hidden neurons and the noise currents, TIAs, and comparators at the visible neurons. Note that the current design supports only a sigmoid-type activation function, which is the only activation function needed in the memristive RBM and DBN; supporting the other activation functions used in other neural networks requires further investigation. We have separately simulated and verified the circuit functionality of the noise current generation, TIA, comparator, and level shifter. However, simulating the complementary metal-oxide-semiconductor (CMOS) peripheral circuits in a specific technology, including its limitations such as operating voltage and parasitic capacitance, to explore the bandwidth and latency of the designed memristive RBM and DBN requires further investigation and is the next step of this work.
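Equations (1)–(3) can be checked numerically with the sketch below, which models one ideal crossbar column, the reference column, and a comparator with a Gaussian noise current; all conductance, voltage, and noise values are hypothetical. Because $I_{\rm noise}$ is Gaussian, the firing probability is $\Phi[(I_j - I_{\rm ref})/I_n]$, the standard normal cumulative distribution (a probit curve). This is close to a logistic sigmoid (a commonly used approximation is $\Phi(x) \approx \sigma(1.702x)$), which is one way to read the "sigmoid-type" activation above; the effective temperature is set by $I_n$.

```python
import numpy as np
from math import erf, sqrt

def crossbar_sample(v, G, G_ref, V_read, I_n, rng):
    """Forward VMM with in situ stochastic binarization (Eqs. (1)-(3))."""
    V = V_read * v                               # level shifter (1-bit DAC)
    I = V @ G                                    # Eq. (1): I_j = sum_i V_i G_ij
    I_ref = V.sum() * G_ref                      # Eq. (2): reference column
    I_noise = rng.normal(0.0, I_n, size=I.shape)
    return (I - I_ref >= I_noise).astype(np.int8)  # Eq. (3): comparator (1-bit ADC)

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(0)
G = np.array([[2e-6], [1e-6], [3e-6]])   # hypothetical 3x1 conductances (S)
G_ref, V_read, I_n = 1.5e-6, 0.1, 2e-7   # hypothetical reference (S), read voltage (V), noise std (A)
v = np.array([1, 0, 1])                  # binary visible states

# Effective input sum_i V_i (G_ij - G_ref) = 0.1 * (0.5e-6 + 1.5e-6) = 2e-7 A
delta_I = V_read * ((G[:, 0] - G_ref) @ v)
fired = [crossbar_sample(v, G, G_ref, V_read, I_n, rng)[0] for _ in range(10_000)]
p_empirical = np.mean(fired)
p_probit = phi(delta_I / I_n)                                # about 0.841
p_sigmoid = 1.0 / (1.0 + np.exp(-1.702 * delta_I / I_n))     # about 0.846
```

Setting `I_n` to zero turns the comparator into a plain step function, which is the deterministic inference mode discussed later.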

The stochasticity of the visible or hidden neurons can also be provided by the intrinsic read noise of the memristive devices by properly tuning the signal-to-noise ratio,<sup>[47]</sup> which can further simplify the hardware implementation of the DBN. Here, we use an external noise source so that the noise current can be turned off for fast inference.

#### 2.3. Memristive Array and CD Accumulation Array

To make the learning of the memristive DBN tolerant to the non-idealities of synaptic devices, we use a mixed-signal hardware design of the RBM layer composed of an analog memristor array and a signed digital counter array (Figure 1d). The memristor array is a crossbar of memristors with conductances $G_{ij}$ plus reference cells with conductance $G_{\rm ref}$. Only the memristor array participates in the VMM, as detailed in Figure 1c. The forward and backward VMMs and the stochastic excitation yield two sets of binarized visible and hidden neuron states ($\nu$ and $h$; $\nu'$ and $h'$). The digital counter array computes the outer products that form the CD matrix ($CD_{ij} = \nu_i h_j - \nu'_i h'_j$) and accumulates the ternary CD values in its cells, which are signed digital counters. When an accumulated $CD_{ij}$ reaches the threshold ($\geq CD_{\rm th}$) or falls below the negative threshold ($\leq -CD_{\rm th}$), an identical pulse is applied to the corresponding memristor cell to potentiate or depress its weight ($G_{ij}$). This results in a positive or negative conductance change ($\Delta G$) determined by the weight update behavior of the memristive synapse. No verifying read operations are needed. The training procedure of the memristive RBM is shown in Figure 1e, where the analog VMM and neuron state sampling steps, as well as the weight updates, are colored light blue, and the digital CD calculation and accumulation steps are colored light yellow, corresponding to the colored components in Figure 1d.
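The counter-array logic can be sketched as follows. This is a minimal NumPy illustration, not the authors' circuit: it assumes that a counter is reset to zero after it triggers a write pulse and, for simplicity, that each pulse changes the conductance by a fixed step clipped to $[G_{\rm min}, G_{\rm max}]$ (an ideal linear device). The function names `accumulate_cd` and `open_loop_write` and all numbers are hypothetical.

```python
import numpy as np

def accumulate_cd(counter, cd, cd_th):
    """Add one ternary CD matrix to the signed counters (in place).

    Returns a pulse map: +1 = potentiate, -1 = depress, 0 = no write.
    Counters that fire are reset to zero (an assumption of this sketch).
    """
    counter += cd
    pulses = np.zeros_like(counter)
    pulses[counter >= cd_th] = 1
    pulses[counter <= -cd_th] = -1
    counter[pulses != 0] = 0
    return pulses

def open_loop_write(G, pulses, dG, G_min, G_max):
    """Apply one identical pulse per flagged cell; no verify read."""
    return np.clip(G + pulses * dG, G_min, G_max)

counter = np.zeros((2, 2), dtype=np.int32)
cd = np.array([[1, -1], [0, 1]], dtype=np.int32)   # same CD for three samples
G = np.full((2, 2), 5e-6)
for _ in range(3):
    pulses = accumulate_cd(counter, cd, cd_th=3)
    G = open_loop_write(G, pulses, dG=1e-7, G_min=1e-6, G_max=1e-5)
# After the third sample the pulse map is [[1, -1], [0, 1]] and all counters are back to 0
```

Only the pulse map crosses from the digital side to the analog array, which is why the memristor array never needs a read-verify cycle during training.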

The proposed mixed-signal approach to memristive DBN training shares some similarities with state-of-the-art techniques recently proposed to improve DNN training performance,<sup>[48,49]</sup> but it also has distinct features. Ambrogio et al.<sup>[48]</sup> proposed a hybrid synaptic cell composed of nonvolatile memristive devices and volatile capacitor-gated transistors (2PCM + 3T1C) for a DNN implementation. The capacitor-gated transistor branch of the synaptic cell has highly linear weight updates and performs both the VMMs and the weight updates; the accumulated weight updates are periodically transferred to the nonvolatile memristive devices. In our proposed network, by contrast, the CD counter array only accumulates the gradient (the weight update requests), and the memristive array alone performs the VMMs. Nandakumar et al.<sup>[49]</sup> proposed a mixed-precision approach in which each layer of a DNN is composed of a low-precision memristive array and a high-precision digital part. The digital part computes and accumulates the weight update requests as floating-point numbers, and the conductance of an element in the memristive array is updated when its accumulated weight update request reaches a threshold. The memristive array performs the VMMs in the analog domain and must handle the small-valued inputs and outputs of error backpropagation, which requires high-performance DACs and ADCs. This high-precision digital part is more complex than our digital counter array, because in our proposal the weight update request (CD) consists only of integers. A comparison of the learning algorithm and training method with previously reported memristive DNNs is given in Table S1, Supporting Information. According to the literature,<sup>[14]</sup> ADCs and DACs may account for 75% of the area and 87% of the energy consumption of a macro core consisting of the memristive array and its peripheral circuits. Because the proposed design needs no multi-bit ADCs or DACs, a significant reduction in energy consumption compared with a conventional memristive DNN design is therefore expected.

Capacitor gated transistors<sup>[48]</sup> or other emerging electrolyte gate mem-transistors<sup>[50]</sup> with highly linear behaviors may also be used as the CD accumulation cells replacing the digital counters. As the CD accumulation array is only needed in the training stage, it can be powered off at the inference stage and does not require long-term nonvolatility.

## 3. Training and Inference

#### 3.1. Memristive DBN Training

We first use a synaptic behavior with ideally symmetric and linear weight updates (Figure S1a, Supporting Information) to test the applicability of the proposed training algorithm. The memristive DBN (Figure 1a) has a size of 784-500-(500 + 10)-2000, and each RBM is trained consecutively with the greedy learning algorithm for 30 epochs over all 60 000 images in the MNIST training set. Figure 2a shows the reconstruction error of the visible neurons for each RBM during training, defined as the average distance between the original input and the reconstructed input, $\langle |\nu' - \nu| \rangle$. The reconstruction error of the label neurons in the last RBM (RBM 3) is also shown. The gradually decreasing reconstruction error indicates that each RBM successfully learned the input patterns presented to its visible neurons. Three example evolution traces of the CD counter value and the memristor conductance, monitored during one training epoch (60 000 training samples), are shown in Figure 2b. The CD counter changes whenever a training image is input, but most of these changes cancel each other and do not accumulate. When the counter reaches the threshold ($CD_{\rm th} = 64$) or falls below the negative threshold ($-64$), the corresponding memristor conductance is potentiated or depressed, respectively. An animation illustrating the greedy learning process is provided in Movie 1, Supporting Information.

**Figure 2.** Training and inference of the memristive DBN. a) Reconstruction error for each RBM during training as a function of the training epoch. b) Example evolution traces of digital CD counter and analog memristor as a function of the number of input training samples. c) Structure of the reorganized neural network for inference (pattern recognition). d) Comparison of the accuracy between the fast deterministic inference and repeated sampling inference when using the well-trained DBN to recognize the handwritten digit images in the MNIST dataset.

Then, the DBN is fine-tuned with the wake-sleep algorithm<sup>[51]</sup> for 30 epochs (Movie 2 and Figure S2, Supporting Information), which can be performed on the same hardware as the greedy learning (see Experimental Section for more details). Note that replacing the fully connected RBM layers with convolutional RBM layers can effectively reduce the size of both the memristive array and the CD accumulation array and result in better accuracy,<sup>[52]</sup> but this is beyond the scope of the current work. Convolutional RBM layers are also necessary to scale the DBN up to larger datasets, for instance, CIFAR-10.<sup>[53]</sup>

#### 3.2. Inferences with Binarized Activations

The inference of the DBN, that is, pattern recognition from the input to the label, can be implemented by unfolding the last RBM, as shown in Figure 2c. Only the forward VMM and stochastic sampling are needed. For a well-trained DBN, a single sequential pass of forward VMMs and stochastic sampling through the RBM layers yields a relatively low accuracy on the MNIST test images ($\approx$83.73%, Figure 2d). However, the accuracy improves gradually as the sampling inference is repeated ($\approx$97.26% for 50 repetitions). Alternatively, the noise current can be disabled so that the excitation of each neuron is deterministic, that is, deterministic inference. This gives an intermediate accuracy ($\approx$95.17%), but at higher speed and lower power consumption. A trade-off between slow but accurate inference and fast but coarse inference can thus be made according to application needs.
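The two inference modes can be compared with the short sketch below. It assumes an unfolded network given as a list of weight matrices (biases omitted), logistic firing probabilities when the noise is on, and a step function when the noise current is off. For repeated sampling it assumes that the label-neuron outputs of all repeats are summed and the most active label is taken as the prediction; this voting rule, the helper names, and the random weights are hypothetical, not details given in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(v, weights, rng=None):
    """Propagate binary states through the unfolded DBN.

    rng=None mimics the noise current switched off: a neuron fires iff its
    input is >= 0. Otherwise it fires with probability sigmoid(input).
    """
    x = v
    for w in weights:
        a = x @ w
        if rng is None:
            x = (a >= 0).astype(np.int64)
        else:
            x = (rng.random(a.shape) < sigmoid(a)).astype(np.int64)
    return x

def classify(v, weights, n_repeats=1, rng=None):
    """Sum label-neuron outputs over repeats and return the most active label."""
    votes = np.zeros(weights[-1].shape[1], dtype=np.int64)
    for _ in range(n_repeats):
        votes += forward(v, weights, rng)
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
sizes = [784, 500, 500, 2000, 10]   # unfolded 784-500-500-2000 DBN with a 10-unit label layer
weights = [rng.normal(0.0, 0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
v = (rng.random(784) < 0.2).astype(np.int64)

label_fast = classify(v, weights)                          # deterministic, single pass
label_slow = classify(v, weights, n_repeats=50, rng=rng)   # repeated stochastic sampling
```

The deterministic path needs one pass and no noise generation, while the stochastic path costs `n_repeats` passes; this mirrors the speed/accuracy trade-off described above.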

Figure S1b in the Supporting Information shows the test accuracies of deterministic inference and of 50-repetition sampling inference (the metrics reported for all subsequent training results) as a function of the training epoch.

#### 3.3. Effect of the CD Threshold

Figure 3a shows the training performance when $CD_{\rm th}$ in the digital counter array is varied (equivalently, when the bit size of the signed counters is varied). When $CD_{\rm th} = 1$ (which corresponds to writing every nonzero CD directly to the memristor array, i.e., no digital counter array), lower performance is observed. The highest recognition accuracy is obtained with $CD_{\rm th} = 64$ (6-bit counter). Increasing $CD_{\rm th}$ further reduces the recognition accuracy, because some weight update requests remain in the CD accumulation array and are never transferred to the memristive array. Figure 3b shows statistics of the number of write operations on each memristive cell for different values of $CD_{\rm th}$. Without the digital counter array ($CD_{\rm th} = 1$), up to $10^6$ write operations (median $10^3$) are performed on a memristive cell. With $CD_{\rm th} = 64$, the number of write operations drops to a maximum of 200 and a median of 20, and half of the devices are never written at all. The endurance specification for the memristive devices is thus largely relaxed.
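The reduction in write operations can be illustrated with a single counter fed by a synthetic ternary CD stream. The stream below is randomly generated with a small positive drift and is not taken from MNIST training, and the reset-after-firing rule is again an assumption; `count_writes` is a hypothetical helper. With $CD_{\rm th} = 1$ every nonzero CD becomes a write, whereas a larger threshold lets opposite-sign requests cancel inside the counter, and each write then needs at least $CD_{\rm th}$ nonzero requests.

```python
import numpy as np

def count_writes(cd_stream, cd_th):
    """Number of write pulses issued by one signed counter."""
    counter, writes = 0, 0
    for cd in cd_stream:
        counter += int(cd)
        if abs(counter) >= cd_th:
            writes += 1
            counter = 0          # reset after issuing a pulse (assumption)
    return writes

rng = np.random.default_rng(0)
# Synthetic per-cell CD stream for one epoch: mostly 0, with +1 slightly likelier than -1
cd_stream = rng.choice([-1, 0, 1], size=60_000, p=[0.05, 0.89, 0.06])
for cd_th in (1, 8, 64):
    print(cd_th, count_writes(cd_stream, cd_th))
```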

## 4. Immunity to Non-Idealities

To simulate more non-idealities of the memristive synaptic devices, an empirical model capturing conductance levels ( $N_p$ and  $N_{\rm d}$  for potentiation and depression phases, respectively), on/off ratio ( $G_{\rm max}/G_{\rm min}$ ), the nonlinearities ( $\alpha_{\rm p}$  and  $\alpha_{\rm d}$ ), and the asymmetry between potentiation and depression ( $N_{\rm p} \neq N_{\rm d}$ ,  $\alpha_{\rm p} \neq \alpha_{\rm d}$ ) is proposed and shown in **Figure 4**a, which can be written as (without cycle-to-cycle and device-to-device variations)

$$\Delta G_{\rm pot} = \left[ \frac{G_{\rm max} - G_{\rm min}}{1 - e^{-\alpha_{\rm p}}} - (G - G_{\rm min}) \right] (1 - e^{-\alpha_{\rm p}/N_{\rm p}}) \tag{4}$$

and



**Figure 3.** Mixed-signal training of the DBN. a) Test accuracy as a function of the training epoch for different CD<sub>th</sub> using the symmetric and linear weight update behavior. b) Statistical results of the counts of write operations on each memristive cell for different CD<sub>th</sub>.






**Figure 4.** Effect of non-idealities of the synaptic device on the training performance of the memristive DBN. a) An empirical model capturing more non-idealities of memristive synaptic devices: nonlinear weight update, asymmetry between potentiation and depression, and write variation (red lines: model without variations; gray lines: model with variations). Training accuracies as a function of b) conductance levels, c) symmetric nonlinearity, d) asymmetric nonlinearity, e) cycle-to-cycle variation, f) device-to-device variation, g) yield, and h) read noise.

$$\Delta G_{\rm dep} = -\left[\frac{G_{\rm max} - G_{\rm min}}{1 - e^{-\alpha_{\rm d}}} - (G_{\rm max} - G)\right] (1 - e^{-\alpha_{\rm d}/N_{\rm d}}) \tag{5}$$

for potentiation and depression, respectively. Figure S3 in the Supporting Information shows example traces of the conductance evolution obtained from the model when randomly generated potentiation and depression pulses are applied. With this model, we examine the effects of various non-idealities of memristive devices on the performance of the memristive DBN.
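Equations (4) and (5) translate directly into code. In the sketch below, the prefactor $(G_{\rm max} - G_{\rm min})/(1 - e^{-\alpha})$ guarantees that exactly $N_{\rm p}$ potentiation pulses take the device from $G_{\rm min}$ to $G_{\rm max}$ (and $N_{\rm d}$ depression pulses bring it back), which the test checks. The device parameters are hypothetical, $\alpha > 0$ is assumed, and clipping to $[G_{\rm min}, G_{\rm max}]$ is an assumption of this sketch.

```python
import numpy as np

def delta_g_pot(G, G_min, G_max, alpha_p, N_p):
    """Eq. (4): conductance change for one potentiation pulse (alpha_p > 0)."""
    A = (G_max - G_min) / (1.0 - np.exp(-alpha_p))
    return (A - (G - G_min)) * (1.0 - np.exp(-alpha_p / N_p))

def delta_g_dep(G, G_min, G_max, alpha_d, N_d):
    """Eq. (5): conductance change for one depression pulse (alpha_d > 0)."""
    A = (G_max - G_min) / (1.0 - np.exp(-alpha_d))
    return -(A - (G_max - G)) * (1.0 - np.exp(-alpha_d / N_d))

def apply_pulse(G, sign, G_min, G_max, alpha_p, alpha_d, N_p, N_d):
    """One identical pulse (+1 potentiate, -1 depress); clipping is an assumption."""
    if sign > 0:
        G = G + delta_g_pot(G, G_min, G_max, alpha_p, N_p)
    else:
        G = G + delta_g_dep(G, G_min, G_max, alpha_d, N_d)
    return min(max(G, G_min), G_max)

# Hypothetical device: 1-10 uS window, asymmetric nonlinearity and pulse counts
params = dict(G_min=1e-6, G_max=1e-5, alpha_p=3.0, alpha_d=5.0, N_p=50, N_d=40)
G = params["G_min"]
trace = [G]
for _ in range(params["N_p"]):
    G = apply_pulse(G, +1, **params)
    trace.append(G)
for _ in range(params["N_d"]):
    G = apply_pulse(G, -1, **params)
    trace.append(G)
```

Larger `alpha_p` or `alpha_d` makes the early pulses in each phase move the conductance much more than the late ones, which is the nonlinearity studied in Sections 4.2 and 4.3.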

#### 4.1. Limited Conductance Levels

In contrast to ideal analog conductance tunability, most memristive devices show only two conductance levels, that is, a high resistance state (HRS) and a low resistance state (LRS).<sup>[2]</sup> RRAM and PCM devices are generally more promising candidates for multiple conductance states.<sup>[54]</sup> However, these multiple conductance states are usually obtained with external control stimuli, for example, compliance currents or the closed-loop read-write-read verify technique.<sup>[22,55]</sup> Here, we simulate the case in which multiple conductance states are reached by identical potentiation or depression pulses. The number of conductance levels is defined as the number of pulses ($N_p$) needed to switch the device from the HRS to the LRS in the potentiation phase or, vice versa, the number of pulses ($N_d$) needed to switch it from the LRS to the HRS in the depression phase, as shown in Figure 4a.

Figure 4b shows the test accuracy after training as a function of the number of conductance levels. The network works well (accuracy >90%) even when only two conductance levels are available and reaches its best performance (accuracy >97%) with 20–40 conductance levels. The performance deterioration at higher numbers of conductance levels can be compensated by training for more epochs (Figure 4b).



#### 4.2. Nonlinear Weight Update

Nonlinear weight update behavior is another major source of performance loss when memristive synaptic devices are used to train a neural network.<sup>[42,56]</sup> To verify the effect of weight update nonlinearity on the training of the memristive DBN, we vary the nonlinearities of both potentiation and depression in the model ($\alpha_p$ and $\alpha_d$) while keeping them equal (Figure S5a and S5b, Supporting Information). Figure 4c shows that increasing the weight update nonlinearity slightly decreases the training accuracy, and that increasing $CD_{\rm th}$ can partially compensate for this deterioration.

#### 4.3. Asymmetric Weight Updates

The weight updates for potentiation and depression generally do not have the same degree of nonlinearity, that is, asymmetric nonlinear weight updates. To test the effect of asymmetric weight updates, we fix the nonlinearity for the depression phase ( $\alpha_d$ ) and only vary the nonlinearity for the potentiation phase ( $\alpha_p$ ) (Figure S5c and S5d, Supporting Information). Figure 4d shows the performance of the memristive DBN as a function of the asymmetry between the weight update nonlinearities of potentiation and depression phases.

#### 4.4. Write Variations

Another source of performance degradation in memristive neural networks is the cycle-to-cycle and device-to-device variation when writing to memristive devices.<sup>[57,58]</sup> Cycle-to-cycle write variation is modeled by drawing each conductance change from a Gaussian distribution centered on the ideal conductance change, with a standard deviation proportional to the magnitude of that change (Figure S6a and S6b, Supporting Information)

$$\Delta G_{\rm c2c} \in \mathcal{N}(\Delta G, \sigma_{\rm c2c}^2), \quad \sigma_{\rm c2c} = \gamma |\Delta G| \tag{6}$$

The simulation result shows that in the proposed memristive DBN, the cycle-to-cycle write variations only slightly affect the test accuracy of the neural network (Figure 4e).

Device-to-device variation is modeled by assigning each device its own weight update nonlinearity drawn from a Gaussian distribution ($\alpha_{\rm p,d2d} \in \mathcal{N}(\alpha_{\rm p}, \sigma_{\alpha_{\rm p}}^2)$ and $\alpha_{\rm d,d2d} \in \mathcal{N}(\alpha_{\rm d}, \sigma_{\alpha_{\rm d}}^2)$; Figure S6c and S6d, Supporting Information). Figure 4f shows the recognition accuracy of the memristive DBN as a function of the standard deviation of the device nonlinearity. Surprisingly, higher device-to-device variation does not degrade the performance of the neural network; instead, we observe a slight increase in recognition accuracy.
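Both variation models can be sketched in a few lines. Cycle-to-cycle variation is redrawn at every write, following Equation (6) with $\sigma_{\rm c2c} = \gamma|\Delta G|$, whereas device-to-device variation is drawn once per device and then kept fixed. The function names, parameter values, and the small positive floor on the sampled nonlinearity (which keeps Equations (4) and (5) defined) are assumptions of this sketch.

```python
import numpy as np

def c2c_delta(dG_ideal, gamma, rng):
    """Eq. (6): cycle-to-cycle write variation, std = gamma * |dG_ideal|."""
    return rng.normal(dG_ideal, gamma * np.abs(dG_ideal))

def d2d_alphas(alpha_mean, sigma_alpha, shape, rng, alpha_floor=1e-3):
    """Per-device nonlinearity drawn once per device (device-to-device variation).

    Clipping to a small positive floor keeps Eqs. (4)-(5) defined; the floor
    value is an assumption of this sketch.
    """
    return np.maximum(rng.normal(alpha_mean, sigma_alpha, size=shape), alpha_floor)

rng = np.random.default_rng(0)
samples = c2c_delta(np.full(100_000, -2e-7), gamma=0.3, rng=rng)   # noisy depression steps
alpha_p_map = d2d_alphas(3.0, 1.0, shape=(784, 500), rng=rng)      # one alpha_p per device
```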

#### 4.5. Device Yield

In memristive devices, especially RRAM devices, device yield is the major issue preventing their large-scale application in data storage and neuromorphic computation.<sup>[55,59]</sup> A memristive device may fail due to process variation, or a synaptic device may initially work well but become stuck at the HRS or LRS during subsequent write operations. In the simulation shown in Figure 4g, we assume that a percentage of the devices do not work (half of them stuck at the HRS and the other half stuck at the LRS). Figure 4g shows that when the device yield is higher than 90%, the performance of the memristive DBN does not degrade. When the yield is below 90%, however, the accuracy of the memristive DBN training quickly drops to 20%. Two factors cause this accuracy drop at low device yield: 1) low device yield prevents accurate layer-by-layer greedy learning; and 2) the fine-tuning after greedy learning is more sensitive to the non-idealities of the memristive devices and thus ruins the previously learned recognition ability (Figure S7a, Supporting Information).
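A stuck-at fault model of the kind described above can be sketched as follows: a fraction $(1 - \text{yield})$ of the cells is marked faulty, each faulty cell is stuck at the HRS ($G_{\rm min}$) or the LRS ($G_{\rm max}$) with equal probability (so roughly half and half), and later write pulses leave faulty cells unchanged. The function names and values are hypothetical.

```python
import numpy as np

def inject_stuck_faults(G, device_yield, G_min, G_max, rng):
    """Mark a fraction (1 - device_yield) of cells as stuck at HRS or LRS (50/50)."""
    faulty = rng.random(G.shape) >= device_yield
    stuck_hrs = faulty & (rng.random(G.shape) < 0.5)
    stuck_lrs = faulty & ~stuck_hrs
    G = G.copy()
    G[stuck_hrs] = G_min     # stuck at HRS (lowest conductance)
    G[stuck_lrs] = G_max     # stuck at LRS (highest conductance)
    return G, faulty

def masked_write(G, G_new, faulty):
    """Write pulses have no effect on stuck cells."""
    return np.where(faulty, G, G_new)

rng = np.random.default_rng(0)
G0 = rng.uniform(1e-6, 1e-5, size=(784, 500))
G, faulty = inject_stuck_faults(G0, device_yield=0.9, G_min=1e-6, G_max=1e-5, rng=rng)
G_after = masked_write(G, np.minimum(G + 1e-7, 1e-5), faulty)   # try to potentiate every cell once
```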

#### 4.6. Read Noise

Multiple sources of noise can make the reading of memristive devices inaccurate, for instance, flicker noise, random telegraph noise, and white noise.<sup>[60–62]</sup> Read instability can also originate from the sense amplifiers and other peripheral circuits. An inaccurate read current results in an inaccurate VMM output. However, because the proposed memristive DBN has stochastic outputs, read noise can be a beneficial factor that makes the hardware implementation easier. As discussed earlier, the probabilistic behavior of the RBM neurons induced by the noise current injected at each column of the memristive array in Figure 1c can be realized by properly tuning the signal-to-noise ratio of the memristive device readout.<sup>[47]</sup> Here, we test the effect of read noise by adding a noise current to each memristive device and consider two cases: without and with the noise current injected into the neurons. Figure 4i shows the performance of the memristive DBN as a function of the read noise level for the two cases. With the noise current injected into the neurons, that is, with probabilistic neurons as designed earlier, the read noise of the memristive devices slightly lowers the training performance. Without the injected noise current, however, the memristive DBN shows highly degraded recognition accuracy after training when the read noise is small. A certain level of noise is therefore beneficial to the neural network.
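To see how per-device read noise can stand in for the injected noise current, the sketch below assumes independent Gaussian read noise with standard deviation $\sigma_r$ on every device that receives a read voltage, and no noise from unbiased devices; both assumptions, the helper name, and the values are hypothetical. Independent noise adds in quadrature, so a column with $k$ active inputs sees an effective noise of standard deviation $\sigma_r\sqrt{k}$, which plays the role of $I_n$ in Equation (3), although its strength then depends on the input pattern.

```python
import numpy as np

def noisy_column_current(v, G, V_read, sigma_r, rng):
    """Column currents with independent Gaussian read noise on every active device.

    Assumption: a device with zero input voltage carries no current and adds no noise.
    """
    V = V_read * v
    device_noise = rng.normal(0.0, sigma_r, size=G.shape) * v[:, None]
    return V @ G + device_noise.sum(axis=0)

rng = np.random.default_rng(0)
m, k = 100, 36
v = np.zeros(m, dtype=np.int64)
v[:k] = 1                         # 36 active inputs
G = np.full((m, 1), 2e-6)         # hypothetical conductances (S)
sigma_r = 1e-8                    # hypothetical per-device read-noise std (A)
I = np.array([noisy_column_current(v, G, 0.1, sigma_r, rng)[0] for _ in range(5000)])
print(I.std(), sigma_r * np.sqrt(k))   # empirical vs predicted column noise std
```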

#### 4.7. Nonlinear *I-V* Characteristic

Another common nonideal behavior of memristive devices is their nonlinear I-V characteristic.<sup>[13,63]</sup> This prevents the direct implementation of multiplication in the analog domain, as the conductance of the device is not constant across read voltages. Pulse width or pulse number modulation is usually used to represent the analog input of the VMM in neural network implementations.<sup>[18,64]</sup> Methods that operate the devices in a small dynamic range to avoid non-Ohmic conduction, or in the small-signal domain, have also been proposed.<sup>[65,66]</sup> All of these solutions come at the price of a more complex readout circuit for VMM operations. In the proposed memristive DBN, however, the input of the VMM is binary-valued: each device sees either zero or one fixed read voltage, so its effective conductance at that voltage acts as the weight, which makes the design inherently immune to the nonlinear I-V characteristic of memristive devices.
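The immunity argument can be checked with a toy non-Ohmic device. The sketch assumes a hypothetical sinh-shaped I-V curve, $I = G\sinh(\beta V)/\beta$, which is approximately Ohmic only for small $V$. With binary inputs, the column currents equal an exact VMM with effective conductances $I(V_{\rm read})/V_{\rm read}$; with multilevel analog inputs the same linear model fails.

```python
import numpy as np

BETA = 8.0  # hypothetical nonlinearity factor (1/V)

def device_current(V, G):
    """Hypothetical non-Ohmic I-V: I = G * sinh(BETA * V) / BETA (about G * V for small V)."""
    return G * np.sinh(BETA * V) / BETA

def column_currents(V, G):
    """Kirchhoff sum of the nonlinear device currents in each column."""
    return device_current(V[:, None], G).sum(axis=0)

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-5, size=(8, 3))
V_read = 0.2
G_eff = device_current(V_read, G) / V_read     # effective conductance at the single read voltage

v_bin = rng.integers(0, 2, size=8)
exact_binary = np.allclose(column_currents(V_read * v_bin, G), (V_read * v_bin) @ G_eff)

V_analog = rng.uniform(0.0, V_read, size=8)
exact_analog = np.allclose(column_currents(V_analog, G), V_analog @ G_eff)
```

Here `exact_binary` is true while `exact_analog` is false, because an analog input probes the I-V curve at many different voltages.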



## 5. Memristive DBN with Real Memristive Synaptic Devices

The synaptic model in Figure 4a is used to fit the synaptic behaviors of SiGe epiRAM,<sup>[67]</sup> PCMO,<sup>[23]</sup> ECRAM,<sup>[50]</sup> OxRRAM,<sup>[68]</sup> and PCM<sup>[69]</sup> devices (Figure 5a–e and Table S2, Supporting Information). The fitted parameters are then used to validate the training of the memristive DBN. As device-to-device variation and device yield data are not reported in the references, we assume ideal values for these parameters. For the devices that show gradual weight update in only one direction, that is, OxRRAM<sup>[68]</sup> and PCM,<sup>[69]</sup> we use a differential pair in which each synaptic cell contains two devices (Figure S8, Supporting Information). All the memristive synaptic behaviors, obtained with identical potentiation and depression pulses, enable successful training of the memristive DBN, with accuracies ranging from 95% to 97% (Figures 5f and S9, Supporting Information). Note that in Section 4, to verify that the learning algorithm accommodates all synaptic behaviors, the simulations include extreme cases in which the non-idealities of the memristive devices are unusually high. In these cases, fine-tuning could ruin the previously learned recognition ability (Figures S4, S5, and S7, Supporting Information). For the synaptic behaviors of real memristive devices, as shown in Figures 5 and S9 in the Supporting Information, fine-tuning always improves the performance of the neural network.
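The differential-pair idea for devices with one-directional gradual update can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of Figure S8: it assumes the effective weight is G+ − G−, that only gradual potentiation pulses are available (a weight increase potentiates G+, a decrease potentiates G−), and a simple saturating step model; `g_max`, `step`, and the class name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DifferentialSynapse:
    """Two devices per synapse; the effective weight is g_plus - g_minus."""
    g_plus: float = 0.0
    g_minus: float = 0.0
    g_max: float = 1.0   # hypothetical saturation conductance
    step: float = 0.05   # hypothetical fractional step per potentiation pulse

    @staticmethod
    def _potentiate(g: float, g_max: float, step: float) -> float:
        # Saturating (nonlinear) potentiation: the increment shrinks as g approaches g_max.
        return g + step * (g_max - g)

    @property
    def weight(self) -> float:
        return self.g_plus - self.g_minus

    def update(self, direction: int) -> None:
        """direction=+1 increases the weight, -1 decreases it; only potentiation pulses are used."""
        if direction > 0:
            self.g_plus = self._potentiate(self.g_plus, self.g_max, self.step)
        elif direction < 0:
            self.g_minus = self._potentiate(self.g_minus, self.g_max, self.step)


syn = DifferentialSynapse()
for _ in range(10):
    syn.update(+1)
w_after_up = syn.weight
for _ in range(4):
    syn.update(-1)
print(round(w_after_up, 4), round(syn.weight, 4))
```

Because both devices only move upward, they eventually saturate; a practical system would need some periodic reset and reprogramming of the pair, which this sketch omits.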

## 6. Relaxed Specifications for Memristive Synapses

By properly balancing the parameters of the various non-idealities, we obtained a set of specifications for memristive synaptic devices that achieves 95% accuracy on the MNIST dataset, as listed in Table 1. The specifications of memristive devices reported in the literature<sup>[26,42–44]</sup> are also listed in the table for comparison. The proposed memristive DBN greatly relaxes the specifications required of memristive devices. From our simulations of the various non-idealities, we found that the nonlinearity of the weight update is the most important factor to address: reducing the nonlinearity continuously improves the performance of the neural network. The number of conductance levels and the device yield need special care, because too few conductance levels or a low device yield abruptly deteriorates the performance of the neural network. However, once sufficient conductance levels and device yield are available, further improving these metrics does not significantly benefit the neural network's performance. The requirements on the other non-idealities are highly relaxed and can easily be met by most existing memristive device technologies.



**Figure 5.** Training of the memristive DBN with data from memristive synaptic devices. Fitting of the memristive synaptic behaviors by the device model for a) SiGe epiRAM, b) PCMO, c) ECRAM, d) OxRRAM, and e) PCM. Blue points: data from the references; red lines: model without variations; gray lines: model with variations. f) Training accuracy of the memristive DBN using the memristive device behaviors ($\mathrm{CD}_{\mathrm{th}} = 128$).

#### ADVANCED SCIENCE NEWS \_\_\_\_\_ www.advancedsciencenews.com

| Specification              | Gokmen et al. (2016)<sup>[26]</sup> | Chen et al. (2017)<sup>[43]</sup> | Chang et al. (2018)<sup>[42]</sup> | Agarwal et al. (2016)<sup>[44]</sup> | This work        |
|----------------------------|-------------------------------------|-----------------------------------|------------------------------------|--------------------------------------|------------------|
| Conductance levels         | ≥1000                               | ≥64                               | ≥256 (8 bits)                      | -                                    | ≥20              |
| On/off ratio               | ≥8                                  | ≥50                               | -                                  | -                                    | ≥3               |
| Nonlinearity               | -                                   | ≤1.0                              | ≤4.5<sup>b)</sup>                  | ≤5                                   | ≤10              |
| Asymmetry                  | ≤1.05                               | ≤2%                               | -                                  | -                                    | ≤2<sup>a)</sup>  |
| Cycle-to-cycle variation   | -                                   | -                                 | -                                  | ≤0.4%                                | ≤30%             |
| Device-to-device variation | -                                   | ≤1                                | -                                  | -                                    | ≤5               |
| Yield                      | -                                   | -                                 | -                                  | -                                    | ≥90%             |
| Read noise                 | -                                   | ≤20%                              | -                                  | ≤9%                                  | ≤10%             |
| Endurance                  | -                                   | -                                 | -                                  | -                                    | ≥100             |
| DAC/ADC accuracy           | 9 bits                              | -                                 | 8 bits                             | -                                    | 1 bit            |
| Nonlinear activation       | Software                            | Software                          | Software                           | Digital core                         | Not required     |

**Table 1.** Specifications of memristive devices required to achieve 95% accuracy when training on MNIST, compared with the specifications in the literature.

<sup>a)</sup>This requirement can be removed if differential pairs are used; <sup>b)</sup>Converted from the original value because the reference uses a different nonlinearity metric.

## 7. Conclusion

A memristive DBN composed of mixed-signal RBM layers for efficient online training is proposed. Each mixed-signal RBM layer consists of an analog memristive array for the stochastic VMM and a digital counter array for the accumulation of CD. The proposed memristive DBN has stochastically binarized activations, which removes the need for complex peripheral circuits with expensive DACs and ADCs. It shows high immunity to various non-idealities of the memristive synaptic devices, and the endurance requirement on the memristive devices is also highly relaxed.

## 8. Experimental Section

*Training of a Single RBM Layer*: We used first-order CD to train the memristive RBM. The input (the state of the visible units, $v$) was first multiplied by the weight matrix of the first RBM ($w_1$) to obtain the probability of the state of the hidden units ($h$)

$$P(h=1) = \sigma(v w_1) \tag{7}$$

where $\sigma(x) = \frac{1}{1+e^{-x}}$ is the logistic sigmoid function. Then, the sampled states of the hidden units ($h$) were multiplied backward by the transposed weight matrix to obtain the probability of the reconstructed state of the visible units ($v'$)

$$P(v'=1) = \sigma(h w_1^{T}) \tag{8}$$

After that, the reconstructed state of the visible units ($v'$) was again multiplied by the weight matrix to obtain the probability of the reconstructed state of the hidden units ($h'$)

$$P(h'=1) = \sigma(v' w_1) \tag{9}$$

The CD was then calculated as the difference between the outer products of the two sets of visible–hidden neuron states

$$\mathrm{CD} = v \otimes h - v' \otimes h' \tag{10}$$
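Equations (7)–(10) can be sketched in plain Python for one training sample. The sketch assumes binary visible and hidden states, omits bias terms (none appear in Equations 7–9), and performs the stochastic binarization with a software pseudo-random generator rather than the analog noise used in hardware; the function names are hypothetical. Because all four state vectors are binary, every CD entry is an integer in {-1, 0, 1}, which is what makes accumulation in digital counters natural.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Stochastic binarization: each unit is 1 with its given probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def forward(v, w):
    """P(h_j = 1) = sigma(sum_i v_i w_ij), as in Equations (7) and (9)."""
    return [sigmoid(sum(v[i] * w[i][j] for i in range(len(v)))) for j in range(len(w[0]))]


def backward(h, w):
    """P(v_i = 1) = sigma(sum_j h_j w_ij), i.e. multiplication by w^T, Equation (8)."""
    return [sigmoid(sum(h[j] * w_row[j] for j in range(len(h)))) for w_row in w]


def cd1(v, w, rng):
    """Return the integer CD matrix v (x) h - v' (x) h', Equation (10)."""
    h = sample(forward(v, w), rng)
    v_rec = sample(backward(h, w), rng)
    h_rec = sample(forward(v_rec, w), rng)
    return [[v[i] * h[j] - v_rec[i] * h_rec[j] for j in range(len(h))] for i in range(len(v))]


rng = random.Random(42)
w = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(4)]  # 4 visible x 3 hidden
cd = cd1([1, 0, 1, 1], w, rng)
print(cd)
```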

The CD matrix, which serves as the gradient estimate for the weight matrix, was used to update the weight matrix

$$\frac{\partial E}{\partial w_1} \propto \mathrm{CD} \tag{11}$$

where $E$ is the energy of the RBM, which should be minimized. In this work, we do not update the weight matrix $w_1$ directly. Instead, we accumulate the CD matrix (the requests for weight updates) and update an element of the weight matrix only when the corresponding element of the CD accumulation (a signed integer stored in a digital counter) reaches a threshold.
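The counter-based update can be sketched as follows. The sketch assumes that a counter whose magnitude reaches the threshold (128 in Figure 5f) triggers a single potentiation or depression pulse on the corresponding device and is then reset to zero; the reset policy and the names `accumulate_cd` and `apply_pulse` are assumptions for illustration, not the paper's circuit. The write count illustrates why the scheme reduces device writes: a device is programmed only after many update requests of the same sign.

```python
from typing import Callable, List

Matrix = List[List[int]]


def accumulate_cd(counters: Matrix, cd: Matrix, threshold: int,
                  apply_pulse: Callable[[int, int, int], None]) -> int:
    """Add one CD matrix to the signed counters; pulse devices whose |counter| reaches threshold.

    Returns the number of device writes issued in this call.
    """
    writes = 0
    for i, row in enumerate(cd):
        for j, delta in enumerate(row):
            counters[i][j] += delta
            if abs(counters[i][j]) >= threshold:
                direction = 1 if counters[i][j] > 0 else -1  # +1: potentiate, -1: depress
                apply_pulse(i, j, direction)
                counters[i][j] = 0  # assumed reset policy
                writes += 1
    return writes


pulses = []
counters = [[0, 0], [0, 0]]
total_writes = 0
# 300 CD matrices: entry (0,0) is always +1, (1,1) always -1, (0,1) alternates, (1,0) stays idle.
for step in range(300):
    cd = [[1, 1 if step % 2 == 0 else -1], [0, -1]]
    total_writes += accumulate_cd(counters, cd, threshold=128,
                                  apply_pulse=lambda i, j, d: pulses.append((i, j, d)))
print(total_writes, pulses)
```

Here 900 nonzero update requests result in only 4 device writes, and the alternating entry (0,1) never triggers a write.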

*Training of the Last RBM with Label Input*: When training the last RBM, the label target ($l$) in one-hot format was input to the label layer with 10 neurons. The weight matrix was partitioned into two parts ($w_3$ and $w_4$ in Figure 1a). The forward VMM and sampling were performed by inputting both the hidden neuron states of the second RBM to the visible layer and the target label to the label layer; the hidden neurons sum all the currents from both the visible layer and the label layer. The backward VMM and sampling were performed separately, from the hidden layer to the visible layer via weight matrix $w_3$ and from the hidden layer to the label layer via weight matrix $w_4$. The neurons in the visible layer were excited with the probability given by the sigmoid function, as in Equation (8). The neurons in the label layer, however, were excited with the probability given by the softmax function

$$P(l'=1) = \mathrm{SoftMax}(h w_4^{T}) \tag{12}$$

where $\mathrm{SoftMax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$, which ensures that, statistically, only one label neuron is excited. The CD matrix and its accumulation were implemented separately for the weight matrices $w_3$ and $w_4$.

*Wake–Sleep Algorithm for Fine-Tuning*: The fine-tuning of the memristive DBN by the wake–sleep algorithm was performed after the greedy learning of the DBN (pretraining). For fine-tuning, all RBMs in the memristive DBN except the last one were duplicated, with new weight matrices initialized as copies, $w_1' = w_1$ and $w_2' = w_2$ (Figure S2a, Supporting Information). The weight matrices $w_1$, $w_2$, $w_3$, and $w_4$ constitute the recognition path (wake path), and the weight matrices $w_4$, $w_3$, $w_2'$, and $w_1'$ constitute the generation path (sleep path). The input image and label were fed to the recognition path, the top RBM layer performed Gibbs sampling iteratively (20 iterations in our simulation), and the reconstructed visible neuron states were then fed to the generation path to generate the reconstructed image. The neuron states in the wake path and in the sleep path were used to update the weight matrices in the sleep path and in the wake path, respectively (Figure S2b, Supporting Information). In this article, the DBN is pretrained for 30 epochs and fine-tuned for another 30 epochs. The recognition accuracy versus epoch traces in Figures S4–S7 in the Supporting Information show that pretraining always improves the performance of the neural network, whereas fine-tuning further improves it only in some cases. When the non-idealities of the memristive devices are high, fine-tuning does not work well and may even ruin the previously learned recognition ability, which indicates that the fine-tuning procedure is more sensitive to device non-idealities. The highest accuracy in each accuracy versus epoch trace is taken as the metric for the analysis in Figures 3–5.
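To make the wake–sleep update concrete, the sketch below implements the classic local delta rule of the wake–sleep algorithm (Hinton et al., 1995) for a single pair of adjacent layers. In the wake phase, states produced bottom-up by the recognition weights `r` train the generative weights `g` to reconstruct the lower layer. In the sleep phase, states generated top-down train the recognition weights. Whether this work uses exactly this rule, and how it maps onto the counter-based accumulation, is not stated here; biases are omitted, and the learning rate, layer sizes, and names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def probs(x, w):
    """Output probabilities sigma(x . w[:, j]) for a weight matrix w of shape len(x) x n_out."""
    return [sigmoid(sum(x[i] * w[i][j] for i in range(len(x)))) for j in range(len(w[0]))]


def sample(p, rng):
    return [1 if rng.random() < q else 0 for q in p]


def delta_update(w, x, target, lr):
    """Local delta rule: w_ij += lr * x_i * (target_j - p_j), with p = sigma(x . w)."""
    p = probs(x, w)
    for i in range(len(x)):
        for j in range(len(target)):
            w[i][j] += lr * x[i] * (target[j] - p[j])


def wake_step(v, r, g, lr, rng):
    """Sample h bottom-up with r (n_vis x n_hid), then train g (n_hid x n_vis) to reconstruct v."""
    h = sample(probs(v, r), rng)
    delta_update(g, h, v, lr)


def sleep_step(h_top, r, g, lr, rng):
    """Generate a fantasy v top-down with g, then train r to recover h_top from it."""
    v_fantasy = sample(probs(h_top, g), rng)
    delta_update(r, v_fantasy, h_top, lr)


rng = random.Random(1)
n_vis, n_hid = 4, 3
r = [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
g = [[0.0] * n_vis for _ in range(n_hid)]
v = [1, 0, 1, 0]
for _ in range(200):
    wake_step(v, r, g, 0.1, rng)
    sleep_step([1, 0, 1], r, g, 0.1, rng)
print([round(q, 2) for q in probs([1, 1, 1], g)])
```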

*Image Generation by the Trained DBN*: After fine-tuning, the memristive DBN with its generation path can be used to generate images when given only a label input (Figure S10, Supporting Information). As shown in Figure S10a in the Supporting Information, a random noise image and a label are input to the DBN, and the top RBM layer performs Gibbs sampling over multiple iterations. Taking the reconstructed visible neuron states of the top RBM layer as input, the generation path then outputs the correct digit image corresponding to the label (Figure S10b, Supporting Information).
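The label-layer sampling of Equation (12) and the clamped-label Gibbs sampling used for image generation can be sketched together for the top RBM. The sketch assumes binary states, no biases, $w_3$ of shape visible × hidden and $w_4$ of shape labels × hidden (following Figure 1a), the standard softmax $e^{x_i}/\sum_j e^{x_j}$, and a categorical draw so that exactly one label neuron is excited per step. The random initial visible state stands in for a noise image propagated up the recognition path; names, sizes, and the iteration count are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [ei / total for ei in e]


def hidden_probs(v, label, w3, w4):
    """Forward pass: hidden neurons sum currents from the visible layer (w3) and the label layer (w4)."""
    return [sigmoid(sum(v[i] * w3[i][j] for i in range(len(v)))
                    + sum(label[k] * w4[k][j] for k in range(len(label))))
            for j in range(len(w3[0]))]


def visible_probs(h, w3):
    """Backward pass to the visible layer via w3 (sigmoid, as in Equation 8)."""
    return [sigmoid(sum(h[j] * row[j] for j in range(len(h)))) for row in w3]


def sample_label(h, w4, rng):
    """Backward pass to the label layer via w4: one label drawn from SoftMax(h w4^T), Equation (12)."""
    p = softmax([sum(h[j] * row[j] for j in range(len(h))) for row in w4])
    k = rng.choices(range(len(p)), weights=p)[0]
    return [1 if i == k else 0 for i in range(len(p))], p


def gibbs_with_clamped_label(v, label, w3, w4, steps, rng):
    """Alternate hidden and visible sampling in the top RBM while the label layer stays clamped."""
    for _ in range(steps):
        h = [1 if rng.random() < q else 0 for q in hidden_probs(v, label, w3, w4)]
        v = [1 if rng.random() < q else 0 for q in visible_probs(h, w3)]
    return v


rng = random.Random(0)
n_vis, n_hid, n_lab = 6, 4, 10
w3 = [[rng.gauss(0.0, 0.5) for _ in range(n_hid)] for _ in range(n_vis)]
w4 = [[rng.gauss(0.0, 0.5) for _ in range(n_hid)] for _ in range(n_lab)]

one_hot, p = sample_label([1, 0, 1, 1], w4, rng)
v0 = [rng.randint(0, 1) for _ in range(n_vis)]      # random initial top-level visible state
label = [1 if k == 3 else 0 for k in range(n_lab)]  # clamp the label of digit "3"
v_top = gibbs_with_clamped_label(v0, label, w3, w4, steps=20, rng=rng)
print(one_hot, v_top)
```

In the full DBN, `v_top` would then be passed down the generation path ($w_2'$, $w_1'$) to produce the image.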

## Supporting Information

Supporting Information is available from the Wiley Online Library or from the author.

## Acknowledgements

This work was supported by the European Research Council through the European Union's Horizon 2020 Research and Innovation Programme under Grant 757259. This project has also received funding from the European Union's Horizon 2020 Research and Innovation Programme FET-Open NEU-Chip under grant agreement No. 964877. W.W. was supported in part at the Technion by the Aly Kaufman Fellowship. *The Interactive Supporting Information of this article can be found at: https://www.authorea.com/doi/full/10.22541/au.164130702.22387391.*

## **Conflict of Interest**

The authors declare no conflict of interest.

## **Data Availability Statement**

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## **Keywords**

contrastive divergence, deep belief net, memristive synapse, non-ideality, restricted Boltzmann machine

Received: December 2, 2021 Revised: January 26, 2022 Published online:

- [1] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, D. S. Modha, *Science* **2014**, *345*, 668.
- [2] A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, E. Eleftheriou, Nat. Nanotechnol. 2020, 15, 529.
- [3] R. Dittmann, J. P. Strachan, APL Mater. 2019, 7, 110903.
- [4] J. J. Yang, D. B. Strukov, D. R. Stewart, *Nat. Nanotechnol.* 2013, *8*, 13.
- [5] A. Sebastian, M. Le Gallo, G. W. Burr, S. Kim, M. BrightSky, E. Eleftheriou, J. Appl. Phys. 2018, 124, 111101.
- [6] D. Ielmini, H.-S. P. Wong, Nat. Electron. 2018, 1, 333.

#### www.advintellsyst.com

- [7] Z. Wang, H. Wu, G. W. Burr, C. S. Hwang, K. L. Wang, Q. Xia, J. J. Yang, Nat. Rev. Mater. 2020, 5, 173.
- [8] M. A. Zidan, J. P. Strachan, W. D. Lu, Nat. Electron. 2018, 1, 22.
- [9] W. Wang, G. Pedretti, V. Milo, R. Carboni, A. Calderoni, N. Ramaswamy, A. S. Spinelli, D. Ielmini, *Sci. Adv.* 2018, *4*, eaat4752.
- [10] Y. Zhang, Z. Wang, J. Zhu, Y. Yang, M. Rao, W. Song, Y. Zhuo, X. Zhang, M. Cui, L. Shen, R. Huang, J. Joshua Yang, *Appl. Phys. Rev.* **2020**, *7*, 011308.
- [11] M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, Q. Xia, J. P. Strachan, *Adv. Mater.* 2018, *30*, 1705914.
- [12] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Dávila, C. E. Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, Q. Xia, *Nat. Electron.* **2018**, *1*, 52.
- [13] Q. Xia, J. J. Yang, Nat. Mater. 2019, 18, 309.
- [14] P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, H. Qian, *Nature* **2020**, 577, 641.
- [15] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, D. B. Strukov, *Nature* **2015**, *521*, 61.
- [16] Z. Wang, C. Li, W. Song, M. Rao, D. Belkin, Y. Li, P. Yan, H. Jiang, P. Lin, M. Hu, J. P. Strachan, N. Ge, M. Barnell, Q. Wu, A. G. Barto, Q. Qiu, R. S. Williams, Q. Xia, J. J. Yang, *Nat. Electron.* **2019**, *2*, 115.
- [17] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, V. Srikumar, in *Annu. Int. Symp. Comput. Archit.*, IEEE, Piscataway, NJ **2016**, pp. 14–26.
- [18] F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn, W. D. Lu, Nat. Electron. 2019, 2, 290.
- [19] B. Yan, Q. Yang, W. Chen, K. Chang, J. Su, C. Hsu, S. Li, H.-Y. Lee, S.-S. Sheu, M.-S. Ho, Q. Wu, M.-F. Chang, Y. Chen, H. Li, in 2019 Symp. VLSI Technol., IEEE, Piscataway, NJ 2019, pp. T86–T87.
- [20] X. Zhang, Y. Zhuo, Q. Luo, Z. Wu, R. Midya, Z. Wang, W. Song, R. Wang, N. K. Upadhyay, Y. Fang, F. Kiani, M. Rao, Y. Yang, Q. Xia, Q. Liu, M. Liu, J. J. Yang, *Nat. Commun.* **2020**, *11*, 51.
- [21] G. Foffani, M. L. Morales-Botello, J. Aguilar, J. Neurosci. 2009, 29, 5964.
- [22] V. Milo, A. Glukhov, E. Perez, C. Zambelli, N. Lepri, M. K. Mahadevaiah, E. P.-B. Quesada, P. Olivo, C. Wenger, D. Ielmini, *IEEE Trans. Electron Devices* **2021**, *68*, 3832.
- [23] J.-W. Jang, S. Park, G. W. Burr, H. Hwang, Y.-H. Jeong, IEEE Electron Device Lett. 2015, 36, 457.
- [24] D. Ielmini, G. Pedretti, Adv. Intell. Syst. 2020, 2, 2000040.
- [25] G. W. Burr, R. M. Shelby, S. Sidler, C. Di Nolfo, J. Jang, I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. N. Kurdi, H. Hwang, *IEEE Trans. Electron Devices* 2015, 62, 3498.
- [26] T. Gokmen, Y. Vlasov, Front. Neurosci. 2016, 10, 333.
- [27] Z. Song, Y. Sun, L. Chen, T. Li, N. Jing, X. Liang, L. Jiang, IEEE Trans. Comput. Des. Integr. Circuits Syst. 2021, 40, 129.
- [28] D. Joksas, E. Wang, N. Barmpatsalos, W. H. Ng, A. J. Kenyon, G. A. Constantinides, A. Mehonic, ArXiv 2021.
- [29] A. Dorzhigulov, A. P. James, in 2019 IEEE Int. Symp. Circuits Syst., IEEE, Piscataway, NJ 2019, pp. 1–5.
- [30] M. Ansari, A. Fayyazi, A. BanaGozar, M. A. Maleki, M. Kamal, A. Afzali-Kusha, M. Pedram, *IEEE Trans. Comput. Des. Integr. Circuits Syst.* 2018, 37, 1602.
- [31] L. Chen, J. Li, Y. Chen, Q. Deng, J. Shen, X. Liang, L. Jiang, in Autom. Test Eur. Conf. Exhib. (DATE), 2017, IEEE, Piscataway, NJ 2017, pp. 19.
- [32] A. P. James, L. O. Chua, IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 4470.
- [33] Y. Zhu, G. L. Zhang, T. Wang, B. Li, Y. Shi, T.-Y. Ho, U. Schlichtmann, in 2020 Des. Autom. Test Eur. Conf. Exhib., IEEE, Piscataway, NJ 2020, pp. 1590–1593.


- [34] S. Jin, S. Pei, Y. Wang, Futur. Gener. Comput. Syst. 2020, 106, 270.
- [35] S. Pal, S. Bose, W.-H. Ki, A. Islam, IEEE J. Electron Devices Soc. 2019, 7, 701.
- [36] A. Irmanova, A. Maan, A. James, L. Chua, IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 1133.
- [37] T. Li, S. Duan, J. Liu, L. Wang, Neural Comput. Appl. 2018, 30, 1939.
- [38] H. Mostafa, Y. Ismail, IEEE Trans. Semicond. Manuf. 2016, 29, 145.
- [39] B. Liu, H. Li, Y. Chen, X. Li, Q. Wu, T. Huang, in Proc. 52nd Annu. Des. Autom. Conf., ACM: New York, NY, 2015, pp. 1–6.
- [40] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Proc. IEEE 1998, 86, 2278.
- [41] Y. LeCun, Y. Bengio, G. Hinton, Nature 2015, 521, 436.
- [42] C. C. Chang, P. C. Chen, T. Chou, I. T. Wang, B. Hudec, C. C. Chang, C. M. Tsai, T. S. Chang, T. H. Hou, *IEEE J. Emerg. Sel. Top. Circuits Syst.* **2018**, *8*, 116.
- [43] P.-Y. Chen, X. Peng, S. Yu, in 2017 IEEE Int. Electron Devices Meet., IEEE, Piscataway, NJ 2017, pp. 6.1.1–6.1.4.
- [44] S. Agarwal, S. J. Plimpton, D. R. Hughart, A. H. Hsia, I. Richter, J. A. Cox, C. D. James, M. J. Marinella, *Proc. Int. Jt. Conf. Neural Networks 2016*, IEEE, Piscataway, NJ, October 2016, p. 929.
- [45] G. E. Hinton, S. Osindero, Y.-W. Teh, Neural Comput. 2006, 18, 1527.
- [46] N. Zhang, S. Ding, J. Zhang, Y. Xue, Neurocomputing 2018, 275, 1186.
- [47] M. R. Mahmoodi, M. Prezioso, D. B. Strukov, Nat. Commun. 2019, 10, 5113.
- [48] S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, D. Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, G. W. Burr, *Nature* **2018**, *558*, 60.
- [49] S. R. Nandakumar, M. Le Gallo, C. Piveteau, V. Joshi, G. Mariani, I. Boybat, G. Karunaratne, R. Khaddam-Aljameh, U. Egger, A. Petropoulos, T. Antonakopoulos, B. Rajendran, A. Sebastian, E. Eleftheriou, *Front. Neurosci.* 2020, 14, 1.
- [50] J. Tang, D. Bishop, S. Kim, M. Copel, T. Gokmen, T. Todorov, S. Shin, K.-T. Lee, P. Solomon, K. Chan, W. Haensch, J. Rozen, in 2018 IEEE Int. Electron Devices Meet., IEEE, Piscataway, NJ 2018, pp. 13.1.1–13.1.4.
- [51] G. Hinton, P. Dayan, B. Frey, R. Neal, Science 1995, 268, 1158.
- [52] H. Lee, R. Grosse, R. Ranganath, A. Y. Ng, in *Proc. 26th Annu. Int. Conf. Mach. Learn. ICML '09*, ACM Press, New York, NY **2009**, pp. 1–8.

- [53] A. Krizhevsky, G. Hinton, Convolutional Deep Belief Networks On Cifar-10, 2010.
- [54] T. Zhang, K. Yang, X. Xu, Y. Cai, Y. Yang, R. Huang, Phys. Status Solidi RRL 2019, 1900029, 1900029.
- [55] V. Milo, C. Zambelli, P. Olivo, E. Pérez, M. K. Mahadevaiah, O. G. Ossorio, C. Wenger, D. Ielmini, APL Mater. 2019, 7, 081120.
- [56] G. W. Burr, R. M. Shelby, C. di Nolfo, J. W. Jang, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. Kurdi, H. Hwang, in 2014 IEEE Int. Electron Devices Meet., IEEE, Piscataway, NJ 2014, pp. 29.5.1–29.5.4.
- [57] S. Ambrogio, S. Balatti, A. Cubeta, A. Calderoni, N. Ramaswamy, D. Ielmini, *IEEE Trans. Electron Devices* 2014, 61, 2912.
- [58] J. H. Lee, D. H. Lim, H. Jeong, H. Ma, L. Shi, IEEE Trans. Electron Devices 2019, 66, 2172.
- [59] A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, A. J. Kenyon, Front. Neurosci. 2019, 13, 1.
- [60] W. Shim, J. S. Seo, S. Yu, Semicond. Sci. Technol. 2020, 35, 115026.
- [61] S. Ambrogio, S. Balatti, A. Cubeta, A. Calderoni, N. Ramaswamy, D. Ielmini, in 2013 IEEE Int. Electron Devices Meet., IEEE, Piscataway, NJ 2013, pp. 31.5.1–31.5.4.
- [62] S. Ambrogio, S. Balatti, V. McCaffrey, D. Wang, D. Ielmini, in 2014 IEEE Int. Electron Devices Meet., IEEE, Piscataway, NJ 2014, pp. 14.4.1–14.4.4.
- [63] W. Wang, W. Song, P. Yao, Y. Li, J. Van Nostrand, Q. Qiu, D. Ielmini, J. J. Yang, *iScience* **2020**, *23*, 101809.
- [64] P. Yao, H. Wu, B. Gao, S. B. Eryilmaz, X. Huang, W. Zhang, Q. Zhang, N. Deng, L. Shi, H.-S. P. Wong, H. Qian, *Nat. Commun.* 2017, *8*, 15199.
- [65] C. Li, Z. Wang, M. Rao, D. Belkin, W. Song, H. Jiang, P. Yan, Y. Li, P. Lin, M. Hu, N. Ge, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, Q. Xia, *Nat. Mach. Intell.* **2019**, *1*, 49.
- [66] L. Danial, E. Pikhay, E. Herbelin, N. Wainstein, V. Gupta, N. Wald, Y. Roizin, R. Daniel, S. Kvatinsky, *Nat. Electron.* 2019, 2, 596.
- [67] S. Choi, S. H. Tan, Z. Li, Y. Kim, C. Choi, P.-Y. Chen, H. Yeon, S. Yu, J. Kim, Nat. Mater. 2018, 17, 335.
- [68] P. Huang, D. Zhu, S. Chen, Z. Zhou, Z. Chen, B. Gao, L. Liu, X. Liu, J. Kang, *IEEE Trans. Electron Devices* 2017, 64, 614.
- [69] M. Suri, O. Bichler, D. Querlioz, B. Traoré, O. Cueto, L. Perniola, V. Sousa, D. Vuillaume, C. Gamrat, B. DeSalvo, J. Appl. Phys. 2012, 112, 054904.

