Alexa and Siri, listen up! Teaching machines to really hear us



Per Sederberg, a cognitive scientist at the University of Virginia, has a fun experiment you can try at home. Take out your smartphone, open a voice assistant such as the one in Google's search app, and say the word "octopus" as slowly as you can.

Your device will struggle to repeat what you just said. It may give you a meaningless response, or it may offer something close but still odd, like "toe pus." Gross!

The point, according to Sederberg, is that despite all the computing power that tech giants like Google, DeepMind, IBM and Microsoft have devoted to the problem, today's artificial intelligence is still a little hard of hearing; it does not take in auditory cues the way humans and other animals do.

For people who struggle with their speech, the results can range from amusing and mildly aggravating to downright alienating.

Collaborative research at UVA, however, has made it feasible to convert existing AI neural networks into technology that can truly hear us, no matter how quickly we talk, by using recent advances in neuroscience as a model.

The deep learning tool, called SITHCon, generalizes its input, allowing it to understand words spoken at rates different from those the network was trained on.

This new capability has the potential to change how artificial neural networks "think," improving the way they process information. It will do more than improve the end user's experience; it could fundamentally alter an industry that is continually seeking to increase processing power, reduce data storage and shrink AI's enormous carbon footprint.

Working with scientists from Boston University and Indiana University, Sederberg, an associate professor of psychology and director of UVA's cognitive science program, teamed with graduate student Brandon Jacques to program a working demonstration of the technology.

According to Jacques, the paper's lead author, "We've shown that we can interpret speech, in particular scaled speech, better than any model we know of."

"We kind of see ourselves as a motley crew of misfits," Sederberg added. "The big teams at Google, DeepMind and Apple couldn't solve this problem, but we did."

The ground-breaking study was presented Tuesday at the prestigious International Conference on Machine Learning (ICML) in Baltimore.

For many years, and especially over the last 20, companies have been building sophisticated artificial neural networks in an effort to mimic how the human brain perceives a changing world. These programs do more than facilitate basic information retrieval and consumption; they also specialize in predicting the stock market, diagnosing medical conditions and monitoring threats to national security, among many other applications.

"In essence, we are looking for meaningful patterns in the world around us," Sederberg said. "Those patterns help us decide how to act and how to fit in with our surroundings, so we can reap as many benefits as possible."

The technology was inspired by the brain from the start, hence the name programmers gave it: "neural networks."

Early AI researchers, according to Sederberg, "took the fundamental characteristics of neurons and how they're related to one another and replicated those using computer code."

However, for hard problems such as teaching machines to "hear" language, programmers unknowingly took a different path from how the brain actually works, he said, and they failed to change course as our understanding of neuroscience advanced.

"The way these big companies deal with the problem is to throw computational resources at it," the professor explained. "So they make the neural networks bigger. A field that was originally inspired by the brain has turned into an engineering problem."

Essentially, programmers train the vast networks by feeding in a variety of voices saying different words at different speeds, using a technique called backpropagation. Because the programmers know the answers they want to reach, they keep feeding the progressively refined results back through the network in a loop. The AI then begins to give appropriate weight to the aspects of the input that will produce accurate answers, converting the sounds into characters of text.
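As a rough illustration of that training loop, here is a minimal, hypothetical sketch in PyTorch; the model, data and hyperparameters are stand-ins of my own, with random tensors in place of real recordings of many voices and speeds:

```python
# Minimal, hypothetical sketch of backpropagation training for a speech
# classifier. Random tensors stand in for recordings of many voices at
# many speeds; nothing here is any company's actual system.
import torch
import torch.nn as nn

class TinySpeechClassifier(nn.Module):
    def __init__(self, n_mels: int = 40, n_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over the time axis
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, mel bins, time frames)
        return self.head(self.encoder(x).squeeze(-1))

model = TinySpeechClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(8, 40, 100)    # a batch of toy "spectrograms"
labels = torch.randint(0, 10, (8,))   # the known, desired answers

for step in range(100):
    logits = model(features)
    loss = loss_fn(logits, labels)    # how far from the known answers?
    optimizer.zero_grad()
    loss.backward()                   # backpropagation: send the error back
    optimizer.step()                  # reweight the network, then loop
```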

Computing speeds have increased and the training data sets used as inputs have grown, and programmers keep adding more layers to detect more nuance and complexity (so-called "deep" or "convolutional" learning), yet the process is still far from perfect.

More than 7,000 languages are in use today. Variations arise from accents and dialects, deeper or higher voices and, of course, faster or slower speech. And every time a competitor builds a better product, a computer has to crunch more data.

That has real-world consequences for the environment. A 2019 study found that the carbon dioxide emissions from the energy required to train a single large deep-learning model equated to the lifetime footprint of five cars.

The phrase "time cells," which describes the phenomena on which this most recent AI study is based, was first used by the late Howard Eichenbaum of Boston University. When the brain processes time-based information, such as music, there are spikes in neuronal activity, according to research by neuroscientists on time cells in mice and later humans. These distinct neurons, which are found in the hippocampus and other regions of the brain, record certain periods of data, which the brain analyzes and interprets in relation. These cells coexist with "place cells," which support our ability to create mental maps.

Time cells help the brain develop a coherent interpretation of sound, no matter how quickly or slowly the information arrives.

"If I say 'oooooooc-toooooo-pussssss,' you've probably never heard 'octopus' uttered at that speed, but you can comprehend it because the way your brain is processing that information is called'scale invariant,'" said Sederberg. What that essentially implies is that if you've heard something and learnt to decode that information at one scale, you'll still understand it if it suddenly comes in a little quicker, a little slower, or even much slower.

The Boston University lab of cognitive researcher Marc Howard continues to build on the time cell findings. Howard, who has collaborated with Sederberg for more than 20 years, studies how human beings understand the events of their lives, then translates that understanding into mathematics.

Howard's equation describing auditory memory involves a timeline, built by time cells firing in sequence. Critically, the equation predicts that the timeline blurs, and blurs in a particular way, as sound recedes into the past. That's because the brain's memory of an event loses precision over time.
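For the mathematically inclined, the published scale-invariant memory models from Howard and colleagues take roughly the following form; this is my paraphrase of equations from that body of work, not something spelled out in this article. A bank of leaky integrators F(s, t) maintains a running Laplace transform of the input signal f(t), and an approximate inverse transform reconstructs a blurred timeline of what happened about tau* seconds ago:

```latex
\frac{dF(s,t)}{dt} = -s\,F(s,t) + f(t), \qquad
\tilde{f}(\tau^{*}\!,\, t) \;\approx\; \frac{(-1)^{k}}{k!}\,
  s^{k+1}\,\frac{\partial^{k} F(s,t)}{\partial s^{k}}\,\Bigg|_{s = k/\tau^{*}}
```

With the tau* values spaced logarithmically, the reconstruction's blur grows in proportion to how far in the past an event lies, which is the fuzziness Sederberg describes next.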

"So there's a certain pattern of firing that codes for what happened at a given time in the past, and information gets fuzzier and fuzzier the further in the past it goes," Sederberg explained. "The cool thing is that Marc and a postdoc working in his lab figured out theoretically how this should look. Then neuroscientists started finding evidence for it in the brain."

Time gives sounds context, and context is part of what gives meaning to the things said to us. The math, according to Howard, breaks down simply.

Sederberg and Howard first recognized about five years ago that these brain-inspired representations might be useful for AI. Sederberg's Computational Memory Lab began building and testing models in collaboration with Howard's lab and in consultation with Zoran Tiganj and colleagues at Indiana University.

About three years ago, Jacques had the key insight that let him code the proof of concept that followed. The approach involves a form of compression that can be unpacked as needed, much as a zip file compresses and stores large files on a computer. To save storage space, the machine keeps a sound's "memory" only at a resolution that will be useful later.

"Because the information is logarithmically compressed, when the input is scaled it doesn't completely disrupt the pattern," Sederberg explained. "It just shifts over."
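A small numerical illustration of that shift (my own sketch, not the SITHCon source code): sample a signal at logarithmically spaced lags, and a time-stretched copy of the signal produces the same pattern of samples, merely translated, because log(a·t) = log(a) + log(t).

```python
# Illustration (hypothetical, not the SITHCon source code): with
# logarithmically spaced taps, stretching a signal in time shifts its
# compressed "memory" pattern instead of distorting it, because
# log(a * t) = log(a) + log(t).
import numpy as np

n_taps = 32
taps = 0.01 * 2.0 ** (np.arange(n_taps) / 4.0)  # log-spaced lags (seconds);
                                                # each step multiplies by 2**(1/4)

def bump(t):
    """A toy 'sound event' centered near t = 1 second."""
    return np.exp(-(np.log(t)) ** 2)

pattern_fast = bump(taps)        # the event at normal speed
pattern_slow = bump(taps / 2.0)  # the same event, played twice as slowly

# Doubling the timescale equals a shift of exactly 4 taps, since
# 2 = (2**(1/4))**4, so the pattern keeps its shape and merely moves over.
print(np.allclose(pattern_fast[:-4], pattern_slow[4:]))  # True
```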

A "temporal convolutional network" is a pre-existing tool that academics may use for free in comparison to the AI training for SITHCon. The objective was to change the network from one that was only trained to hear at certain speeds.

The process started with Morse code, a simple language that uses short and long bursts of sound to represent dots and dashes, and then moved on to an open-source set of English speakers saying the numbers 1 through 9 as input.
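To make the setup concrete, here is a toy generator of Morse-style input in the same spirit (a hypothetical illustration of mine, not the study's data pipeline); note how a single `unit` parameter controls the speaking pace the model must generalize across:

```python
# Toy Morse-style audio generator (a hypothetical illustration, not the
# study's actual data pipeline). Dots are short beeps, dashes long ones.
import numpy as np

MORSE = {"S": "...", "O": "---"}  # tiny stand-in alphabet
RATE = 8000                       # samples per second

def tone(duration_s: float) -> np.ndarray:
    """A 600 Hz beep lasting duration_s seconds."""
    t = np.arange(int(RATE * duration_s)) / RATE
    return np.sin(2 * np.pi * 600.0 * t)

def encode(text: str, unit: float = 0.08) -> np.ndarray:
    """Dots last one unit, dashes three, with one-unit gaps between tones.
    Changing `unit` is exactly the slower or faster 'speaker' the trained
    network must generalize across."""
    gap = np.zeros(int(RATE * unit))
    pieces = []
    for ch in text:
        for sym in MORSE[ch]:
            pieces.append(tone(unit if sym == "." else 3 * unit))
            pieces.append(gap)
    return np.concatenate(pieces)

sos = encode("SOS")              # the message at one pace
sos_slow = encode("SOS", 0.16)   # the same message, twice as slow
```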

In the end, no further training was required. Once the AI had learned to detect the message at one pace, it couldn't be fooled by a speaker who stretched out the words.

"We showed that SITHCon could generalize to speech scaled up or down in speed, whereas other models failed to decode information at speeds they hadn't seen in training," Jacques said.
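One way to picture that test (my hypothetical sketch, not the paper's evaluation code): time-stretch a held-out clip by resampling and check whether a trained model's prediction survives the change in speaking rate. A model without scale invariance will typically change its answer as the factor moves away from 1.0.

```python
# Hypothetical scale-generalization check, not the paper's evaluation
# code: stretch or compress a clip in time and compare the predictions.
import torch
import torch.nn.functional as F

def time_scale(x: torch.Tensor, factor: float) -> torch.Tensor:
    """Resample along the last (time) axis; factor 2.0 = twice as slow."""
    return F.interpolate(x, size=int(x.shape[-1] * factor),
                         mode="linear", align_corners=False)

model = torch.nn.Sequential(            # stand-in for any trained decoder
    torch.nn.Conv1d(16, 10, kernel_size=3, padding=1),
    torch.nn.AdaptiveAvgPool1d(1),      # pool over time -> length-invariant
    torch.nn.Flatten(),
)

clip = torch.randn(1, 16, 100)          # stand-in for one spoken digit
for factor in (0.5, 1.0, 2.0):          # double speed, normal, half speed
    logits = model(time_scale(clip, factor))
    print(factor, logits.argmax(dim=-1).item())  # does the answer hold?
```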

UVA has chosen to make the code freely available in order to advance the field. The team says the approach should work with any neural network that interprets speech.

"Because we believe in open science, we're going to publish and release all the code," Sederberg said. "We're hoping companies will see this, get really excited and offer to fund our continued work. We've tapped into a fundamental aspect of how the brain processes information, combining power with efficiency, and we've only begun to explore what these AI models can do."

Sederberg said he is hopeful that better-hearing AI, like all technology in theory, will be used responsibly.

"Right now, these companies are running into computational bottlenecks as they try to build more powerful and useful tools," he said. "You have to hope the advantages outweigh the disadvantages. For better or worse, if you can offload more of your cognitive processing onto computers, the world is going to become more productive."

University of Virginia 
