Connecting with other people online through voice and video calls is increasingly a part of everyday life. The real-time communication frameworks that make this possible depend on efficient compression techniques, known as codecs, to encode and decode the signals for transmission or storage. As part of the effort to make the best codecs universally available, Google has decided to make Lyra open source, allowing other developers to power their communication applications with it and take Lyra in new directions. Lyra relies on machine learning to enable high-quality voice calls in low-bandwidth situations.
A staple of media applications for decades, codecs have allowed bandwidth-intensive applications to transmit data efficiently. The development of codecs, for both video and audio, therefore poses an ongoing challenge: to provide increasing quality, use less data, and minimize latency for real-time communication. While video may seem much more bandwidth-intensive than audio, modern video codecs can achieve lower bitrates than some of the high-quality voice codecs in use today.
Combining low-bitrate video and voice codecs can provide a high-quality video calling experience even on low-bandwidth networks. Historically, however, the lower the bitrate of an audio codec, the less intelligible and the more robotic the voice signal becomes. Moreover, while some people have access to a consistently high-quality broadband network, this level of connectivity is not universal, and even people living in well-connected areas sometimes face poor network connections, low bandwidth, and congestion.
To solve this problem, Google created Lyra, a high-quality, very low-bitrate voice codec that makes voice communication available even on the slowest networks. To do this, Google applied traditional coding techniques while leveraging advances in machine learning, with models trained on thousands of hours of data, to create a new method of compressing and transmitting speech signals.
Lyra's code is written in C++ for speed, efficiency and interoperability. It uses the Bazel build framework with Abseil, and the GoogleTest framework for comprehensive unit testing. The core API provides an interface for encoding and decoding at the file and packet levels. The complete signal processing toolchain is also provided, and includes various filters and transforms. "Our sample app integrates with the Android NDK to demonstrate how to embed Lyra native code into a Java-based Android app. We also provide the weights and vector quantizers needed to run Lyra," Google said. This release provides the tools developers need to encode and decode audio with Lyra, optimized for the 64-bit ARM Android platform, with a version for Linux.
The features are decoded back into a waveform by a generative model. Generative models are a particular type of machine learning model well suited to recreating a full audio waveform from a limited number of features. Lyra’s architecture is very similar to that of traditional audio codecs, which have been the backbone of Internet communication for decades. While these traditional codecs are based on digital signal processing techniques, Lyra’s key advantage lies in the generative model’s ability to reconstruct a high-quality speech signal.
Google has implemented Lyra in its free Duo video calling app, and said it is making the code open source because it believes Lyra could suit a number of other applications, whether archiving large amounts of speech, saving battery life, or alleviating network congestion in emergency situations. "We look forward to seeing the creativity that characterizes the open source community applied to Lyra to deliver unique and impactful applications," Google said.
Lyra’s architecture is separated into two parts, the encoder and the decoder. When a person speaks into their phone, the encoder captures the distinctive attributes of their speech. These speech attributes, also called features, are extracted every 40 ms, then compressed and sent over the network. The decoder’s job is to convert the features back into an audio waveform that can be played through the speaker of the listener’s phone.
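To give a sense of scale, the following sketch works out what a 40 ms feature interval implies at the 3 kbps bitrate Lyra is designed for. The function names are hypothetical and are not part of Lyra's API, and the figures ignore any packet or transport overhead.

```python
def frames_per_second(frame_ms: float) -> float:
    """Number of feature frames produced each second."""
    return 1000 / frame_ms


def bits_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Bit budget available to encode one frame at a given bitrate."""
    return bitrate_bps * frame_ms / 1000


if __name__ == "__main__":
    # Assumed values taken from the article: 40 ms frames, 3 kbps bitrate.
    fps = frames_per_second(40)
    bits = bits_per_frame(3000, 40)
    print(f"{fps:.0f} frames per second")
    print(f"{bits:.0f} bits ({bits / 8:.0f} bytes) per frame")
```

Under these assumptions, each 40 ms of speech has to fit in roughly 15 bytes, which illustrates why the decoder relies on a generative model rather than on a direct description of the waveform.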
There are also other applications for which Lyra may be particularly well suited, whether archiving large amounts of speech, saving battery by using the computationally inexpensive Lyra encoder, or reducing network congestion in emergency situations where many people are trying to make calls at the same time. All the code needed to run Lyra is available under the Apache license, with the exception of a math kernel, for which a shared library is provided pending a fully open source solution on a larger number of platforms.
Currently, the open source Opus codec is the most widely used codec for WebRTC-based VoIP applications and, at an audio bitrate of 32 kbps, it achieves transparent voice quality, i.e. indistinguishable from the original. However, while Opus can be used in more bandwidth-limited environments down to 6 kbps, its audio quality begins to degrade at such rates. Other codecs (Speex, MELP, AMR) can run at bitrates comparable to Lyra’s, but they all suffer from increased artifacts and produce a robotic-sounding voice.
Lyra is currently designed to operate at 3 kbps. According to Google, listening tests show that Lyra outperforms any other codec at this bitrate and compares favorably to Opus at 8 kbps, which amounts to a bandwidth reduction of over 60%. Lyra can be used wherever the available bandwidth is insufficient for higher bitrates and existing low-bitrate codecs do not provide adequate quality. Making Lyra open source allows other developers to integrate it into their communication applications and take Lyra in new directions.
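The "over 60%" figure can be checked with simple arithmetic. The sketch below compares the bitrates quoted in the article and also estimates the raw size of one hour of encoded speech, which is relevant to the archiving use case. The helper names are hypothetical, and the storage figures cover the encoded payload only, with no container or packet overhead.

```python
def bandwidth_reduction(new_bps: float, old_bps: float) -> float:
    """Fraction of bandwidth saved when moving from old_bps to new_bps."""
    return 1 - new_bps / old_bps


def storage_bytes(bitrate_bps: float, seconds: float) -> float:
    """Raw payload size, in bytes, of audio encoded at a constant bitrate."""
    return bitrate_bps * seconds / 8


if __name__ == "__main__":
    # Bitrates quoted in the article: Lyra at 3 kbps, Opus at 8 and 32 kbps.
    saving = bandwidth_reduction(3000, 8000)
    print(f"Lyra 3 kbps vs Opus 8 kbps: {saving:.1%} less bandwidth")
    for name, bps in [("Lyra 3 kbps", 3000), ("Opus 8 kbps", 8000), ("Opus 32 kbps", 32000)]:
        megabytes = storage_bytes(bps, 3600) / 1e6
        print(f"{name}: {megabytes:.2f} MB per hour of speech")
```

Going from 8 kbps to 3 kbps saves 62.5% of the bandwidth, consistent with the reduction Google reports.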
Source: Google