Highly Efficient New Neuromorphic Chip for AI on the Edge
The NeuRRAM chip is the first compute-in-memory device to demonstrate a wide range of AI applications while consuming a small fraction of the energy used by other platforms and retaining the same level of accuracy.
A multinational team of researchers has developed NeuRRAM, a novel chip that performs computations directly in memory and can run a wide range of AI applications. It stands out because it accomplishes all of this while using a tiny fraction of the energy consumed by general-purpose AI computing platforms.
With the NeuRRAM neuromorphic chip, AI is one step closer to running on a wide variety of edge devices, independently of the cloud. This means such devices can carry out complex cognitive tasks anytime and anywhere, without needing a network connection to a centralized server. Applications for this technology abound in every part of the world and every aspect of daily life, ranging from smartwatches and VR headsets to smart earbuds, smart sensors in factories, and rovers for deep-space exploration.
The NeuRRAM chip, which belongs to an advanced class of hybrid circuits that execute computations in memory, is not only twice as energy-efficient as state-of-the-art "compute-in-memory" chips, but also delivers results just as accurate as conventional digital processors. Conventional AI platforms are far bulkier and are typically constrained to using massive cloud-based data servers.
The NeuRRAM chip is also highly versatile and supports many different neural network models and architectures. As a result, it can be applied to a wide range of tasks, including image recognition and reconstruction as well as voice recognition.
Weier Wan, a recent Stanford University Ph.D. graduate who worked on the chip while at UC San Diego and was co-advised by Gert Cauwenberghs in the Department of Bioengineering, said, "The conventional wisdom is that the higher efficiency of compute-in-memory is at the cost of versatility, but our NeuRRAM chip obtains efficiency while not sacrificing versatility."
The study's team, which included bioengineers from the University of California, San Diego (UCSD), presented its findings in Nature on August 17.
The architecture of the NeuRRAM chip is novel and has been co-optimized throughout the stack. Credit: University of California, San Diego/David Baillot
At the moment, AI computing is both computationally expensive and power-hungry. Most AI applications on edge devices involve sending data to the cloud, where the AI processes and analyzes it; the results are then sent back to the device. That is because most edge devices are battery-powered and can devote only a limited amount of power to computation.
This NeuRRAM chip could result in more reliable, intelligent, and usable edge devices as well as more intelligent manufacturing by lowering the power consumption required for AI inference at the edge. Given the heightened security risks associated with transferring data from devices to the cloud, it might potentially result in greater data privacy.
One significant bottleneck on AI chips is the transfer of data from memory to compute units.
It's comparable to an eight-hour commute for a two-hour workday, according to Wan.
Researchers used resistive random-access memory (RRAM) to address this data-transfer issue. This type of non-volatile memory allows computation to take place directly within memory rather than in separate processing units. One of the key contributors to this work was H.-S. Philip Wong, whose Stanford lab pioneered the use of RRAM and other emerging memory technologies as synapse arrays for neuromorphic computing. Computing with RRAM devices is not in itself new, but until now it has typically come at the cost of reduced precision in the computations performed on the chip and limited flexibility in the chip's architecture.
Since neuromorphic engineering was first developed more than 30 years ago, compute-in-memory has become a standard technique, according to Cauwenberghs. What is novel about NeuRRAM is that, compared to conventional digital general-purpose computation systems, the exceptional efficiency now coexists with tremendous flexibility for a variety of AI applications.
The work required a carefully constructed methodology, involving several levels of "co-optimization" across the hardware and software abstraction layers, from the design of the chip to its configuration for running different AI tasks. The team also made sure to account for numerous constraints, ranging from memory-device physics to circuit and network design.
According to Siddharth Joshi, an assistant professor of computer science and engineering at the University of Notre Dame who began working on the project as a Ph.D. student and postdoctoral researcher in Cauwenberghs' lab at UCSD, "This chip now provides us with a platform to address these problems across the stack from devices and circuits to algorithms."
Researchers used a metric called the energy-delay product, or EDP, to gauge the chip's energy efficiency. EDP combines the energy consumed for each task with the time it takes to perform that task. By this standard, the NeuRRAM chip outperforms state-of-the-art chips, with an EDP that is 1.6 to 2.3 times lower (lower is better) and a computational density that is 7 to 13 times higher.
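The EDP metric described above is simple to state in code. The sketch below is purely illustrative; the energy and delay figures are hypothetical placeholders, not measurements from the NeuRRAM paper.

```python
# Illustrative sketch of the energy-delay product (EDP):
# EDP multiplies the energy a task consumes by the time it takes,
# so a lower value means a chip is both faster and more frugal.

def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy * delay; lower is better."""
    return energy_joules * delay_seconds

# Hypothetical numbers, for comparison only.
baseline_edp = energy_delay_product(2.0e-6, 1.0e-3)  # a reference chip
neurram_edp = energy_delay_product(1.0e-6, 1.0e-3)   # half the energy, same delay

improvement = baseline_edp / neurram_edp
print(f"EDP improvement factor: {improvement:.1f}x")
```

Because EDP is a product, a chip can improve its score by cutting either energy or latency; a reported "1.6 to 2.3 times lower" EDP therefore reflects a combined speed-and-power advantage.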
Engineers ran several AI tasks on the chip. It was 99% accurate in recognizing handwritten digits, 85.7% accurate in classifying images, and 84.7% accurate in recognizing Google speech commands. On an image-recovery task, the chip also reduced image-reconstruction error by 70%. These results match those of existing digital chips that operate at the same bit precision, but with drastically lower energy consumption.
One of the publication's major contributions, the researchers say, is that all of the reported results were obtained directly on the hardware. In many earlier works on compute-in-memory devices, AI benchmark results were often obtained partly through software simulation.
The next steps involve scaling the design to more advanced technology nodes and enhancing designs and circuits. Additionally, engineers want to work on applications like spiking neural networks.
Rajkumar Kubendran, an assistant professor at the University of Pittsburgh who began work on the project while a Ph.D. student in Cauwenberghs' research group at UCSD, stated that "we can do better at the device level, improve circuit design to implement additional features, and address diverse applications with our dynamic NeuRRAM platform."
Wan also helped form a business that is working to commercialize compute-in-memory technology. As an engineer and researcher, Wan stated that one of his goals is to put laboratory discoveries to use in the real world.
The secret to NeuRRAM's energy efficiency is a novel technique for sensing output in memory. Conventional approaches use voltage as the input and measure current as the output, but this leads to increasingly complex and power-hungry circuits. In NeuRRAM, the team instead designed a neuron circuit that senses voltage and performs energy-efficient analog-to-digital conversion. This voltage-mode sensing can activate all the rows and all the columns of an RRAM array in a single computing cycle, enabling higher parallelism.
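The operation that this full-array parallelism accelerates is a matrix-vector multiply: each RRAM cell stores a weight as a conductance, input voltages drive the rows, and each column accumulates contributions from every cell at once. The following is a schematic software model of that computation, not the chip's actual circuit behavior; the weight values are made up for illustration.

```python
# Schematic model of compute-in-memory matrix-vector multiplication.
# On the chip, every row and column participates in one cycle; here we
# simply reproduce the arithmetic the analog array performs.

def in_memory_mvm(conductances, inputs):
    """conductances: rows x cols weight matrix; inputs: one value per row.
    Returns one accumulated value per column, as the array would compute
    in parallel."""
    num_cols = len(conductances[0])
    outputs = [0.0] * num_cols
    for row_input, row in zip(inputs, conductances):
        for j in range(num_cols):
            outputs[j] += row_input * row[j]  # cell contribution to column j
    return outputs

weights = [[0.5, 1.0],
           [2.0, 0.0]]
result = in_memory_mvm(weights, [1.0, 1.0])  # -> [2.5, 1.0]
```

A digital processor would execute these multiply-accumulates sequentially after fetching the weights from memory; in the analog array, the same sums emerge physically in one step, which is where the energy savings come from.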
In the NeuRRAM design, CMOS neuron circuits are physically interleaved with the RRAM weights. This differs from conventional designs, which typically place the CMOS circuits on the periphery of the RRAM array. The neuron's connections with the RRAM array can be configured to serve as either the neuron's input or its output. This permits neural network inference in various data-flow directions without incurring overheads in area or power, which in turn makes the architecture easier to reconfigure.
Engineers developed a set of hardware-algorithm co-optimization techniques to ensure that the accuracy of the AI computations can be maintained across diverse neural network architectures. The techniques were validated on a variety of neural networks, including convolutional neural networks, long short-term memory networks, and restricted Boltzmann machines.
As a neuromorphic AI chip, NeuRRAM distributes processing across its 48 neurosynaptic cores, which work in parallel. To achieve high versatility and high efficiency at the same time, NeuRRAM supports data-parallelism by mapping one layer of the neural network model onto multiple cores for parallel inference on multiple data. It also supports model-parallelism by mapping different layers of a model onto different cores and performing inference in a pipelined fashion.
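The two mapping strategies above can be sketched in a few lines. This is a toy illustration of the concepts, not the chip's actual scheduler; the core count is taken from the article, but the layer names and mapping data structures are invented for the example.

```python
# Toy sketch of the two core-mapping strategies described above.

NUM_CORES = 48  # NeuRRAM's neurosynaptic core count

def data_parallel_mapping(layer: str, num_cores: int) -> dict:
    """Data-parallelism: replicate one layer across several cores;
    each core processes a different slice of the input batch."""
    return {f"core_{i}": {"layer": layer, "batch_slice": i}
            for i in range(num_cores)}

def model_parallel_mapping(layers: list) -> dict:
    """Model-parallelism: assign successive layers to successive cores,
    forming a pipeline in which core i computes layer i."""
    return {f"core_{i}": layer for i, layer in enumerate(layers)}

# Hypothetical usage: replicate "conv1" over 4 cores, or pipeline 3 layers.
dp = data_parallel_mapping("conv1", 4)
mp = model_parallel_mapping(["conv1", "conv2", "fc"])
```

In the data-parallel case throughput scales with the number of replicas; in the model-parallel case, successive inputs stream through the pipeline so that all mapped cores stay busy at once.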
To support the chip's synaptic functions with both high efficiency and versatility, the UCSD team designed the CMOS circuits that implement the neural functions interfacing with the RRAM arrays. Working closely with the entire team, Wan implemented the design, characterized the chip, trained the AI models, and carried out the experiments. Wan also developed a software toolchain that maps AI applications onto the chip.
At Stanford University, the RRAM synapse array and its operating conditions were extensively characterized and optimized.
At Tsinghua University, the RRAM array was designed and fabricated onto CMOS.
The Notre Dame team contributed to the chip's design and architecture as well as to the development and training of the machine learning models.
The Office of Naval Research Science of AI program, the Semiconductor Research Corporation and DARPA JUMP program, as well as Western Digital Corporation, provided ongoing funding support for the research, which was initially conducted at Penn State University as part of the Expeditions in Computing project on Visual Cortex on Silicon, which was funded by the National Science Foundation.
Weier Wan, Rajkumar Kubendran, Clemens Schaefer, Sukru Burc Eryilmaz, Wenqiang Zhang, Dabin Wu, Stephen Deiss, Priyanka Raina, He Qian, Bin Gao, Siddharth Joshi, Huaqiang Wu, H.-S. Philip Wong, and Gert Cauwenberghs published "A compute-in-memory chip based on resistive random-access memory" in Nature.
University of California, San Diego: Weier Wan, Rajkumar Kubendran, Stephen Deiss, Siddharth Joshi, and Gert Cauwenberghs
Stanford University: Weier Wan, S. Burc Eryilmaz, Priyanka Raina, and H.-S. Philip Wong
By UNIVERSITY OF CALIFORNIA - SAN DIEGO