A TI senior engineer's analysis of voice interfaces

The voice interface has become a new entry point that is changing the way humans interact with computers. How do these systems work, and what hardware does it take to build such a device? As voice control interfaces have grown more popular, an engineer at Texas Instruments has studied the technology in depth and shares that knowledge and perspective here.

What is a voice interface?

Speech recognition technology has been around since the 1950s, when Bell Labs engineers built a system that could recognize single spoken digits. However, speech recognition is only one part of a complete voice interface. A voice interface has all the elements of a traditional user interface: it presents information and gives users a way to control the device. In a voice interface, control, and even some of the presentation of information, is handled by voice. A voice interface can also be offered as an option alongside traditional interface elements such as buttons or displays.

For most people, the first voice interface they encountered was probably on a mobile phone, or a very basic speech-to-text program on a personal computer. However, these were slow, recognized speech inaccurately, and understood only a limited vocabulary.

What turned speech recognition from a side feature into a hot technology in the computing world? First, computing power and algorithm performance have improved significantly (anyone familiar with hidden Markov models will appreciate this intuitively). Second, cloud technology and big data analytics have improved both the speed and the accuracy of recognition.
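For readers who have not met hidden Markov models, the sketch below shows the Viterbi decoding step that classic HMM-based recognizers rely on: given a sequence of observations, it finds the most likely sequence of hidden states. The two-state model (silence vs. speech), the "low"/"high" energy observations, and every probability are made up for illustration; a real recognizer uses phoneme-level states and acoustic feature vectors, and this is not a description of any TI implementation.

```python
import math
from typing import Dict, List, Sequence

def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden-state sequence for `observations`.

    Scores are kept as log-probabilities so long inputs do not underflow.
    All probabilities must be non-zero (math.log(0) would raise).
    """
    # Each column maps state -> (best log-score of a path ending there, previous state).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev_col = columns[-1]
        col = {}
        for s in states:
            # Choose the predecessor that gives the highest score into state s.
            best_prev = max(states, key=lambda p: prev_col[p][0] + math.log(trans_p[p][s]))
            score = prev_col[best_prev][0] + math.log(trans_p[best_prev][s])
            col[s] = (score + math.log(emit_p[s][obs]), best_prev)
        columns.append(col)
    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy model with made-up numbers: is each audio frame silence or speech?
STATES = ("silence", "speech")
START = {"silence": 0.8, "speech": 0.2}
TRANS = {"silence": {"silence": 0.7, "speech": 0.3},
         "speech": {"silence": 0.2, "speech": 0.8}}
EMIT = {"silence": {"low": 0.9, "high": 0.1},
        "speech": {"low": 0.2, "high": 0.8}}
```

With this toy model, decoding the observations low, low, high, high, low yields silence, silence, speech, speech, silence: the transition probabilities let the decoder smooth over individual frames instead of labeling each one in isolation.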

Add speech recognition to your device

People often ask how to add a voice interface to a project. TI offers several product families with voice processing capabilities, including the Sitara™ family of ARM® processors and the C5000™ family of DSPs. Each family has its own strengths and suits different applications.

When choosing between a DSP and an ARM solution, the key question is whether the device can use a cloud voice platform. There are three scenarios: offline, where all processing takes place on the local device; online, where voice processing is handled by a cloud service such as Amazon Alexa, Google Assistant or IBM Watson; and hybrid, a mix of the two.

Offline: Car Voice Control

Current trends suggest that people want everything connected to the Internet. However, in some applications, whether for cost reasons or because reliable connectivity is unavailable, a network connection adds little value. In modern cars, many infotainment systems use offline voice interfaces. These systems typically recognize only a limited set of commands, such as "make a call," "play music," and "turn the volume up or down." Although speech recognition algorithms running on general-purpose processors have improved significantly, their performance is still not satisfactory. In this situation, a DSP such as the C55xx can provide the best performance for the system.
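One reason an offline system can get by with modest resources is that it only has to distinguish a handful of commands. The sketch below shows just the final mapping step, assuming an upstream recognizer has already produced a text transcript; the command IDs and trigger phrases are hypothetical, and a production recognizer would usually constrain its search to such a grammar rather than match free text afterwards.

```python
import re
from typing import Optional

# Hypothetical fixed command grammar for an offline in-car system:
# each command ID maps to the phrases that trigger it.
COMMANDS = {
    "PHONE_CALL": ("make a call", "call"),
    "PLAY_MUSIC": ("play music", "play some music"),
    "VOLUME_UP": ("volume up", "turn it up", "increase the volume"),
    "VOLUME_DOWN": ("volume down", "turn it down", "lower the volume"),
}

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so phrase matching is forgiving."""
    return re.sub(r"[^a-z ]+", "", text.lower()).strip()

def match_command(transcript: str) -> Optional[str]:
    """Return the command ID whose phrase appears in the transcript, else None.

    Longer phrases are tried first so that a specific phrase wins over a
    shorter phrase it contains.
    """
    words = normalize(transcript)
    candidates = sorted(
        ((phrase, cmd) for cmd, phrases in COMMANDS.items() for phrase in phrases),
        key=lambda pc: len(pc[0]),
        reverse=True,
    )
    for phrase, cmd in candidates:
        # Word boundaries stop "call" from matching inside words like "recall".
        if re.search(rf"\b{re.escape(phrase)}\b", words):
            return cmd
    return None
```

Anything outside the grammar returns None, which an offline system can answer with a fixed "command not recognized" prompt instead of attempting open-ended understanding.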

Online: Smart Home Center

Much of the excitement around voice interfaces centers on connected devices such as Google Home and Amazon Alexa devices. Amazon has attracted attention in this area because it lets third parties access its voice processing ecosystem through the Alexa Voice Service. Other cloud services, such as Microsoft Azure, also provide speech recognition and similar functions. Note that the voice processing for these devices all happens in the cloud.

Whether this convenient integration is worth sending your data to a voice service provider is entirely up to the user. Either way, the cloud service provider does the heavy lifting, and the device manufacturer has very little to do. Because speech synthesis also happens in the cloud, an Alexa device only needs to perform the simplest functions: recording audio and playing back audio files. Since no special signal processing is required, an ARM processor is sufficient to handle the interface. This means that if your device already has an ARM processor, you may be able to integrate a cloud-based voice interface.
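To make "record and play back audio files" concrete, the sketch below packs captured 16-bit PCM samples into an in-memory WAV file for upload and unpacks a returned WAV file for playback, using only Python's standard `wave` and `struct` modules. The 16 kHz mono format is an assumption (a common choice for speech), and the actual audio encoding, transport and authentication required by a particular service such as the Alexa Voice Service are not shown and may differ.

```python
import io
import struct
import wave
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000  # assumed capture rate; 16 kHz mono is common for speech

def pcm_to_wav_bytes(samples: Sequence[int], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Pack signed 16-bit mono samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

def wav_bytes_to_pcm(data: bytes) -> Tuple[List[int], int]:
    """Unpack a mono 16-bit WAV file (e.g. a synthesized reply) into samples and rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit audio")
        frames = wav.readframes(wav.getnframes())
        samples = list(struct.unpack(f"<{len(frames) // 2}h", frames))
        return samples, wav.getframerate()
```

Nothing here needs a DSP: the device only moves buffers between the microphone, the network and the speaker, which is why an ARM processor is enough for a purely cloud-based interface.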

It is also important to understand what Alexa does not provide. Alexa does not itself perform any kind of device control or cloud integration. Many Alexa-compatible "smart devices" rely on cloud functionality provided by their developers, which uses Alexa's voice processing to drive existing cloud applications. For example, if you ask Alexa to order a pizza, your favorite pizza shop must first develop an Alexa "skill": code that defines what happens when you order a pizza. Alexa invokes this skill every time you order, and the skill connects to the shop's online ordering system to place the order for you. Similarly, smart home device manufacturers must implement skills that define how Alexa interacts with their local devices and online services. Between the many skills Amazon provides and those from third-party developers, Alexa devices can be very useful even if you never develop a skill yourself.
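The sketch below shows roughly what a skill's request handler does, working directly on the Alexa Skills Kit JSON request and response shapes (`request.type`, `request.intent.name`, `response.outputSpeech`) rather than on any SDK, so as not to depend on a specific SDK API. The intent name `OrderPizzaIntent`, the `Size` slot and the `place_order` function are hypothetical stand-ins for the pizza shop's own definitions and ordering system; hosting, request verification and error handling are omitted.

```python
from typing import Any, Dict

def place_order(size: str) -> str:
    """Hypothetical stand-in for the pizza shop's online ordering system."""
    return f"order-{size}-001"

def speech_response(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Build a response in the Alexa Skills Kit JSON shape (plain-text speech)."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def handle_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch an incoming skill request to the matching handler."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # User opened the skill without a specific request: keep the session open.
        return speech_response("What size pizza would you like?", end_session=False)
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "OrderPizzaIntent":  # hypothetical intent name
            # An unfilled slot has no "value" key, so fall back to a default size.
            size = intent.get("slots", {}).get("Size", {}).get("value", "medium")
            order_id = place_order(size)
            return speech_response(f"Your {size} pizza is on its way. Order {order_id}.")
    return speech_response("Sorry, I can't help with that yet.")
```

Alexa's cloud does the speech recognition and turns "order a large pizza" into the intent and slot values; the skill only sees structured JSON and returns the text that Alexa will speak.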

Hybrid: Connected thermostat

Sometimes basic device features need to keep working even without an Internet connection. For example, it would be a serious problem if a thermostat could not adjust the temperature on its own whenever the Internet connection was down. To avoid this, a good product designer builds in some local voice processing so that the device keeps working seamlessly. This requires both a DSP, such as the C55xx, for local voice processing and an ARM processor to connect the device to the cloud.
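The control flow of such a hybrid design can be sketched as follows. This is an illustrative sketch, not TI's design: `local_recognizer` stands in for the small on-device (DSP) recognizer, `cloud_handler` stands in for the cloud voice service and is assumed to raise `ConnectionError` when the network is down, and the "warmer"/"cooler" command set is hypothetical.

```python
from typing import Callable, Optional

# Commands the local recognizer can handle without a network (hypothetical set),
# mapped to the setpoint change in degrees Celsius.
LOCAL_COMMANDS = {"warmer": +1.0, "cooler": -1.0}

class Thermostat:
    def __init__(self, setpoint_c: float = 21.0) -> None:
        self.setpoint_c = setpoint_c

    def apply(self, command: str) -> bool:
        """Apply a local command; return False if it is not in the local set."""
        delta = LOCAL_COMMANDS.get(command)
        if delta is None:
            return False
        self.setpoint_c += delta
        return True

def handle_utterance(
    audio: bytes,
    thermostat: Thermostat,
    local_recognizer: Callable[[bytes], Optional[str]],
    cloud_handler: Callable[[bytes], str],
) -> str:
    """Handle basic commands locally; send everything else to the cloud if reachable."""
    command = local_recognizer(audio)  # small vocabulary, always available
    if command is not None and thermostat.apply(command):
        return f"Setpoint now {thermostat.setpoint_c:.1f} C"
    try:
        return cloud_handler(audio)  # full vocabulary when the network is up
    except ConnectionError:
        return "That request needs an internet connection"
```

Trying the local path first means the core temperature commands behave identically online and offline and avoid a network round trip; only requests outside the local vocabulary depend on connectivity.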

What is voice triggering?

You may have noticed that we have not yet mentioned the real magic of the new generation of voice assistants: they are always listening for a trigger word. How do they pick up your voice from anywhere in the room, or hear you while the device itself is playing audio? There is nothing magical about it, just some clever software. This software is independent of the cloud voice interface and can also run offline.

The easiest part of this system to understand is the wake word. Wake-word detection is a simple local speech recognition program that continuously samples the incoming audio and looks for a single word. Because most voice services will happily accept audio that does not contain the wake word, the wake word does not need to be tied to any particular voice platform. Since the processing requirements are relatively low, wake-word detection can run on an ARM processor using open-source libraries such as Sphinx or KITT.AI.
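The "continuous sampling" part of wake-word detection can be sketched as a rolling window over audio frames that is re-scored after every new frame. Here `score_window` is a hypothetical placeholder for the acoustic model that a real detector (for example one built on Sphinx or KITT.AI) would supply, and the window length, threshold and refractory period are illustrative values, not tuned ones.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence

def detect_wake_word(
    frames: Iterable[Sequence[float]],
    score_window: Callable[[List[Sequence[float]]], float],
    window_frames: int = 50,
    threshold: float = 0.8,
    refractory_frames: int = 100,
) -> List[int]:
    """Return the frame indices at which the wake word is detected.

    Keeps a rolling window of the most recent frames, scores it after every
    new frame, and ignores further triggers for `refractory_frames` frames
    so that one utterance does not fire the detector repeatedly.
    """
    window = deque(maxlen=window_frames)  # oldest frame drops out automatically
    detections = []
    cooldown = 0
    for i, frame in enumerate(frames):
        window.append(frame)
        if cooldown > 0:
            cooldown -= 1
            continue
        if len(window) == window_frames and score_window(list(window)) >= threshold:
            detections.append(i)
            cooldown = refractory_frames
    return detections
```

Everything after a detection (streaming the following audio to a cloud service, for instance) is gated on this loop, which is why it must be cheap enough to run all the time.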

To hear you from anywhere in the room, a voice assistant uses a technique called beamforming. It determines the direction of a sound source by comparing the differences in the sound's arrival time at each microphone with the known distances between the microphones. Once the direction of the target sound is known, the device applies audio processing techniques such as spatial filtering to further reduce noise and improve signal quality. How beamforming is implemented depends on the microphone layout. True 360-degree coverage requires a non-linear (usually circular) microphone array, while a wall-mounted device needs only two microphones for 180-degree spatial discrimination.
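For two microphones spaced d apart, a sound arriving at angle θ from broadside reaches one microphone earlier by τ = d·sin(θ)/c, where c is the speed of sound, so θ = arcsin(c·τ/d). The sketch below estimates τ by brute-force cross-correlation, converts it to an angle, and then applies the simplest spatial filter, delay-and-sum, which aligns and averages the two channels. It is a minimal two-microphone illustration under assumed values (343 m/s speed of sound, integer-sample delays); real devices typically use FFT-based methods such as GCC-PHAT, fractional delays and more microphones. A single pair also cannot tell front from back, which is consistent with two microphones being enough for a wall-mounted device covering 180 degrees.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def estimate_delay(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) that best aligns `right` with `left`.

    A positive lag means the sound reached the left microphone first.
    """
    def corr(lag: int) -> float:
        # Sum left[n] * right[n + lag] over the indices valid for both signals.
        start = max(0, -lag)
        stop = min(len(left), len(right) - lag)
        return sum(left[n] * right[n + lag] for n in range(start, stop))
    return max(range(-max_lag, max_lag + 1), key=corr)

def direction_of_arrival(lag: int, sample_rate: float, mic_spacing_m: float) -> float:
    """Convert a delay to an arrival angle in degrees (0 = straight ahead)."""
    path_difference = SPEED_OF_SOUND * lag / sample_rate
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))  # clamp noise
    return math.degrees(math.asin(ratio))

def delay_and_sum(left: Sequence[float], right: Sequence[float], lag: int) -> List[float]:
    """Align `right` to `left` by `lag` samples and average the two channels."""
    start = max(0, -lag)
    stop = min(len(left), len(right) - lag)
    return [0.5 * (left[n] + right[n + lag]) for n in range(start, stop)]
```

Sound from the estimated direction adds coherently after alignment, while sound and noise from other directions do not, which is the basic mechanism behind the noise reduction the text describes.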

The final technique in the voice assistant's toolkit is acoustic echo cancellation (AEC). AEC works somewhat like noise-canceling headphones, but in reverse. The algorithm relies on knowing the output audio signal, such as the music being played. Whereas noise-canceling headphones use this kind of approach to cancel external noise, AEC removes the device's own output signal from the microphone input. This lets the device ignore the audio it produces, so it can still hear your voice no matter what the speaker is playing. AEC is computationally intensive, so it runs best on a DSP.
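A common textbook way to implement the idea is an adaptive FIR filter trained with normalized least mean squares (NLMS): the filter learns the echo path from the speaker to the microphone using the known output signal, and its estimate of the echo is subtracted from the microphone input. The article does not say which algorithm TI uses, so this is a generic sketch with assumed parameters (8 taps, step size 0.5); production AEC adds double-talk detection, much longer filters and frequency-domain processing, which is where DSP performance matters.

```python
from typing import List, Sequence

def nlms_echo_canceller(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 8,
    step: float = 0.5,
    eps: float = 1e-6,
) -> List[float]:
    """Remove the echo of `far_end` (the speaker signal) from `mic`.

    An adaptive FIR filter learns the speaker-to-microphone echo path with
    NLMS; the returned error signal is the microphone input with the
    estimated echo subtracted (ideally leaving only the user's voice).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalizing by input power keeps the step size stable at any volume.
        norm = eps + sum(h * h for h in history)
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out
```

Because the filter adapts continuously, it can track changes in the echo path, such as someone moving near the device, without recalibration.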

Implementing all of these functions (wake-word detection, beamforming and AEC) requires an ARM processor working together with a DSP: the DSP handles the signal processing, while the ARM processor controls the device logic and interfaces. The DSP can run the input data pipeline, minimizing processing latency and providing a better user experience, which leaves the ARM processor free to run a high-level operating system such as Linux and control the rest of the device. All of this advanced processing happens locally; if a cloud service is used, it receives only a single audio file containing the final processed result.
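A large part of the latency in such an input pipeline comes from buffering: each stage works on fixed-size frames and cannot start until a frame is full. The sketch below computes that per-frame buffering delay and pushes frames through a chain of stages as soon as they arrive. The stages are hypothetical placeholders for AEC, beamforming and wake-word detection, and the frame sizes are illustrative; real DSP pipelines also overlap stages and use DMA double-buffering.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Frame = List[float]
Stage = Callable[[Frame], Frame]

def frame_latency_ms(frame_samples: int, sample_rate: int) -> float:
    """Buffering delay of one frame: processing cannot start until it is full."""
    return 1000.0 * frame_samples / sample_rate

def frames_from(samples: Sequence[float], frame_samples: int) -> Iterator[Frame]:
    """Split a sample stream into fixed-size frames (a trailing partial frame is dropped)."""
    for start in range(0, len(samples) - frame_samples + 1, frame_samples):
        yield list(samples[start:start + frame_samples])

def run_pipeline(frames: Iterable[Frame], stages: Sequence[Stage]) -> Iterator[Frame]:
    """Push each frame through every stage in order as soon as it arrives."""
    for frame in frames:
        for stage in stages:
            frame = stage(frame)
        yield frame
```

For example, a 256-sample frame at 16 kHz adds 16 ms of buffering before any stage can run, while a 160-sample frame adds 10 ms; smaller frames cut latency but require the DSP to handle more frames per second.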

In conclusion

Voice interfaces have gained a lot of popularity and will appear in our lives in different forms for a long time to come. Although there are many different ways to implement a voice interface, TI can provide the right option for whatever your application requires.

EPON WIFI ONU

There are two types of EPON systems: one uses two wavelengths and the other uses three.
In a two-wavelength system, the downstream wavelength of 1510 nm carries downstream voice, data and digital video services, and the upstream wavelength of 1310 nm carries upstream voice, video-on-demand requests and data download requests. This system runs at 1.25 Gb/s in both directions and can reach 20 km even with a 1:32 split ratio at the optical branching device (OBD).
The three-wavelength system adds a downstream transmission window at 1550 nm (1530 to 1565 nm) to the 1510 nm downstream and 1310 nm upstream wavelengths. The new window carries downstream CATV or DWDM services, and the CATV service can be either an analog video signal or an MPEG-2 digital video signal. With a 1:32 split ratio, this system can reach 18 km.
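The split ratio and reach figures are tied together by the optical loss budget. An ideal 1:N splitter divides power evenly, so each output loses 10·log10(N) dB, about 15.05 dB for N = 32. The sketch below adds fiber attenuation to that figure; the 0.35 dB/km value is an assumed typical attenuation for standard single-mode fiber at 1310 nm, and real budgets must also include splitter excess loss, connectors, splices and margin.

```python
import math

def ideal_split_loss_db(split_ratio: int) -> float:
    """Loss of an ideal 1:N splitter: power is divided evenly among N outputs."""
    return 10 * math.log10(split_ratio)

def link_loss_db(split_ratio: int, length_km: float, fiber_db_per_km: float) -> float:
    """Ideal splitter loss plus fiber attenuation (connectors, splices, excess loss ignored)."""
    return ideal_split_loss_db(split_ratio) + length_km * fiber_db_per_km
```

Under these assumptions a 1:32 split over 20 km gives roughly 15.05 + 7.0 = 22.05 dB, which must fit within the optical power budget of the OLT and ONU optics.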
EPON sits between the service node interface (SNI) and the user network interface (UNI): it connects to the service node through the SNI and to user equipment through the UNI. An EPON system consists mainly of an optical line terminal (OLT), an optical distribution network (ODN) and optical network units (ONUs).
In an EPON system, the OLT is not just a switch or router but a multi-service platform that provides the optical interface to the passive optical network. Following the trend of Ethernet expanding into metropolitan and wide area networks, OLTs will provide 1 Gb/s and 10 Gb/s Ethernet interfaces. In addition to traditional voice, plain telephone lines and other T1/E1 interfaces, the OLT also supports ATM, Frame Relay (FR) and SONET connections at OC-3/12/48/192 rates. An EPON OLT can be fitted with multiple optical line cards as required to connect 16 to 64 ONUs. The maximum OLT-to-ONU distance in EPON is 20 km, and it can be extended with an optical fiber amplifier (active repeater).
