Qualcomm Patent | Augmented Reality Language Translation

Editor: Yingwei (映维) | Category: Qualcomm | August 27, 2020

Patent: Augmented Reality Language Translation

Publication Number: 20200272699

Publication Date: 20200827

Applicants: Qualcomm

Abstract

Methods, systems, and devices for language translation are described. A device (e.g., a user equipment (UE), a pair of Bluetooth earbuds or a Bluetooth headset) may identify a sound signal originating in an augmented reality environment. The sound signal may include a representation in a language (e.g., a language translated from an original language). The device may, in response to reception of the sound signal, determine a set of characteristics of the sound signal based in part on a set of measurements of the sound signal (e.g., an intensity of the sound signal, an angle of arrival of the sound signal, a pitch of the sound signal, a loudness of the sound signal) and apply one or more characteristics from at least one of the set of characteristics to an output of the sound signal to provide a natural rendering of the sound signal at the device.

BACKGROUND

[0001] Wireless communications systems are widely deployed to provide various types of communication content such as voice, video, packet data, messaging, broadcast, and so on. These systems may be capable of supporting communication with multiple users by sharing the available system resources (e.g., time, frequency, and power). Examples of such multiple-access systems include fourth generation (4G) systems such as Long Term Evolution (LTE) systems, LTE-Advanced (LTE-A) systems, or LTE-A Pro systems, and fifth generation (5G) systems which may be referred to as New Radio (NR) systems, as well as wireless local area networks (WLAN), such as Wi-Fi (i.e., Institute of Electrical and Electronics Engineers (IEEE) 802.11) and Bluetooth-related technology. Some examples of wireless communications systems, such as those outlined above, may be capable of supporting an augmented reality system with multiple characters (e.g., users, players).

SUMMARY

[0002] An augmented reality system may support a fully immersive augmented reality experience, a non-immersive augmented reality experience, or a collaborative augmented reality experience. In some examples, an augmented reality environment may have multiple users from different areas of the world sharing in the augmented reality experience. Some examples of an augmented reality system may support language translation methods to further promote collaborative augmented reality experiences. These methods, however, do not support a natural rendering of translated speech. The described techniques disclosed herein support translation techniques, such as speech translation, and more specifically augmented reality language translation to provide a natural rendering of translated speech to a target person in an augmented reality environment, by using one or more characteristics of a sound signal to deliver the natural rendering of the translated speech.

[0003] A method of language translation at a device is described. The method may include identifying a sound signal originating in an augmented reality environment, the sound signal including a representation in a language, determining a set of characteristics of the sound signal based on a set of measurements of the sound signal, applying, to the sound signal, one or more characteristics from at least one of the set of characteristics, and outputting the representation of the sound signal based on applying the one or more characteristics from the at least one of the set of characteristics.

[0004] An apparatus for language translation is described. The apparatus may include a processor, memory in electronic communication with the processor, and instructions stored in the memory. The instructions may be executable by the processor to cause the apparatus to identify a sound signal originating in an augmented reality environment, the sound signal including a representation in a language, determine a set of characteristics of the sound signal based on a set of measurements of the sound signal, apply, to the sound signal, one or more characteristics from at least one of the set of characteristics, and output the representation of the sound signal based on applying the one or more characteristics from the at least one of the set of characteristics.

[0005] Another apparatus for language translation is described. The apparatus may include means for identifying a sound signal originating in an augmented reality environment, the sound signal including a representation in a language, determining a set of characteristics of the sound signal based on a set of measurements of the sound signal, applying, to the sound signal, one or more characteristics from at least one of the set of characteristics, and outputting the representation of the sound signal based on applying the one or more characteristics from the at least one of the set of characteristics.

[0006] A non-transitory computer-readable medium storing code for language translation at a device is described. The code may include instructions executable by a processor to identify a sound signal originating in an augmented reality environment, the sound signal including a representation in a language, determine a set of characteristics of the sound signal based on a set of measurements of the sound signal, apply, to the sound signal, one or more characteristics from at least one of the set of characteristics, and output the representation of the sound signal based on applying the one or more characteristics from the at least one of the set of characteristics.

[0007] In some examples of the method, apparatuses, and non-transitory computer-readable medium described herein, the language includes a second language translated from an original language.

[0008] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for receiving the sound signal from a source of the sound signal in the augmented reality environment, and measuring at least one of an intensity of the sound signal, an angle of arrival of the sound signal, a pitch of the sound signal, a loudness of the sound signal, a distance between the device and the source of the sound signal in the augmented reality environment, a time of arrival of the sound signal at the device, or a time of departure of the sound signal from the source of the sound signal in the augmented reality environment, or a combination thereof based on receiving the sound signal, and where the set of measurements of the sound signal includes the intensity of the sound signal, the angle of arrival of the sound signal, the pitch of the sound signal, the loudness of the sound signal, the distance between the device and the source of the sound signal in the augmented reality environment, the time of arrival of the sound signal at the device, or the time of departure of the sound signal from the source of the sound signal in the augmented reality environment, or a combination thereof.

[0009] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for identifying a time of departure of the sound signal from a source of the sound signal in the augmented reality environment based on the set of measurements of the sound signal, and determining a delay including a difference in time of arrival of the sound signal at a first microphone of the device and time of arrival of the sound signal at a second microphone of the device based on the set of measurements of the sound signal, and where the set of characteristics includes the time of departure of the sound signal and the delay associated with the difference in the times of the arrivals.
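As a minimal sketch of how such a delay between two microphones might be estimated, the following uses a brute-force cross-correlation. Cross-correlation is a common technique substituted here for illustration; the source does not name a specific estimator. The function names, the `max_lag` search window, and the sample rate are assumptions.

```python
def estimate_delay_samples(mic_a, mic_b, max_lag):
    """Return the lag, in samples, at which mic_b best matches mic_a.

    A positive result means the sound reached mic_a first.
    Hypothetical helper: brute-force cross-correlation over +/- max_lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, a in enumerate(mic_a):
            m = n + lag
            if 0 <= m < len(mic_b):
                score += a * mic_b[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_delay_seconds(mic_a, mic_b, sample_rate_hz, max_lag):
    """Convert the sample lag into seconds (sample rate is an assumed input)."""
    return estimate_delay_samples(mic_a, mic_b, max_lag) / sample_rate_hz


# The same short pulse, arriving two samples later at the second microphone.
first_mic = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
second_mic = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = estimate_delay_samples(first_mic, second_mic, max_lag=4)
```

In this toy input `lag` is 2 samples, which at an assumed 16 kHz sample rate corresponds to 125 microseconds.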

[0010] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for determining a difference in intensity associated with the sound signal based on the set of measurements of the sound signal, where the difference in intensity includes a difference between an intensity of the sound signal at a first microphone of the device and an intensity of the sound signal at a second microphone of the device, and where the set of characteristics includes the difference in intensity.
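One way to picture the intensity difference between two microphones is as a level ratio in decibels computed from the root-mean-square (RMS) amplitude of each channel. The sketch below assumes that representation; the source does not specify units or a formula, and the function names and the silence `floor` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def intensity_difference_db(mic_a, mic_b, floor=1e-12):
    """Level of mic_a relative to mic_b in dB; positive means mic_a is louder.

    `floor` is an assumed guard that avoids taking the log of zero on silence.
    """
    return 20.0 * math.log10(max(rms(mic_a), floor) / max(rms(mic_b), floor))


near_mic = [1.0, -1.0, 1.0, -1.0]
far_mic = [0.5, -0.5, 0.5, -0.5]
difference_db = intensity_difference_db(near_mic, far_mic)
```

Here the nearer microphone sees twice the amplitude, so the difference is roughly 6 dB.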

[0011] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for determining an angular offset between the device and a source of the sound signal in the augmented reality environment using a sensor of the device, determining a second set of characteristics that may be based on the angular offset, where the second set of characteristics includes at least one of an intensity of the sound signal, a pitch of the sound signal, a loudness of the sound signal, or a combination thereof, and applying, to the sound signal, one or more characteristics from at least one of the second set of characteristics that may be based on the angular offset, where outputting the representation of the sound signal may be based on applying the one or more characteristics from at least one of the second set of characteristics.

[0012] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for translating a representation of the sound signal from the language into a second language, where outputting the representation of the sound signal includes outputting the translated representation of the sound signal in the second language based on applying the one or more characteristics from the at least one of the set of characteristics.

[0013] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for establishing a connection with a second device based on a connection procedure, and receiving the sound signal from the second device in communication with the device, where identifying the sound signal may be based on receiving the sound signal from the second device in communication with the device.

[0014] Some examples of the method, apparatuses, and non-transitory computer-readable medium described herein may further include operations, features, means, or instructions for receiving the set of measurements of the sound signal from the second device in communication with the device based on the connection, where determining the set of characteristics of the sound signal may be based on receiving the set of measurements of the sound signal.

[0015] In some examples of the method, apparatuses, and non-transitory computer-readable medium described herein, the device includes a pair of Bluetooth earbuds or a Bluetooth headset.

[0016] In some examples of the method, apparatuses, and non-transitory computer-readable medium described herein, the device includes a UE.

[0017] In some examples of the method, apparatuses, and non-transitory computer-readable medium described herein, the representation includes speech in a verbal form or a written form.

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] FIGS. 1 and 2 illustrate examples of a wireless communications system for language translation that supports augmented reality language translation in accordance with aspects of the present disclosure.

[0019] FIGS. 3 and 4 illustrate examples of a process flow that supports augmented reality language translation in accordance with aspects of the present disclosure.

[0020] FIGS. 5 and 6 show block diagrams of devices that support augmented reality language translation in accordance with aspects of the present disclosure.

[0021] FIG. 7 shows a block diagram of a language translation manager that supports augmented reality language translation in accordance with aspects of the present disclosure.

[0022] FIG. 8 shows a diagram of a system including a device that supports augmented reality language translation in accordance with aspects of the present disclosure.

[0023] FIGS. 9 through 11 show flowcharts illustrating methods that support augmented reality language translation in accordance with aspects of the present disclosure.

DETAILED DESCRIPTION

[0024] An augmented reality system may support a fully immersive augmented reality experience, a non-immersive augmented reality experience, or a collaborative augmented reality experience. For example, an augmented reality system may support perception of real and virtual sounds originating in an augmented reality environment, motion tracking to enable interactivity and location-awareness in the augmented reality environment, audio rendering to deliver audio augmented reality content in the augmented reality environment, and spatial rendering to display spatialized augmented reality content in the augmented reality environment. In some examples, an augmented reality environment may have multiple users sharing in the augmented reality experience. Some examples of an augmented reality system may support language translation methods to further promote collaborative augmented reality experiences (e.g., a conversation in a particular language can be translated live (e.g., in real time) to another language). These methods, however, do not support a natural rendering of the translated speech. The described techniques disclosed herein support speech translation techniques, and more specifically augmented reality language translation to provide a natural rendering of translated speech to a target person in an augmented reality environment. In some cases, the translation may include using characteristics (e.g., an intensity, a distance, an angle of arrival, and the like) of a sound signal to deliver the natural rendering of the translated speech.

[0025] To attain the benefits of augmented reality language translation, and more specifically a natural rendering of translated speech, one or more characteristics of a sound signal may be determined, measured, and/or collected (e.g., via sensors). The one or more characteristics of a sound signal may relate to spatial hearing and support augmented reality language translation to provide a natural rendering of translated speech based in part on perception of the sound signal at or related to a target person. For example, measuring at least one of an intensity of a sound signal, an angle of arrival of the sound signal, a pitch of the sound signal, a loudness of the sound signal, a distance between a person and a source of the sound signal in an augmented reality environment, a time of arrival of the sound signal at the person, or a time of departure of the sound signal from the source of the sound signal in the augmented reality environment, or a combination thereof (among other potential parameters or conditions), may support natural rendering of translated speech.
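Several of the listed measurements are related to one another. As a rough illustration, a distance can be derived from a time of departure and a time of arrival, assuming the sound travels through air and that both timestamps share a common clock. Neither assumption is stated in the source, and the constant and function name below are hypothetical.

```python
# Assumed constant: speed of sound in dry air near 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def distance_from_times(time_of_departure_s, time_of_arrival_s):
    """Estimate the source-to-listener distance in meters from two timestamps.

    Assumes both timestamps are measured against the same clock.
    """
    time_of_flight_s = time_of_arrival_s - time_of_departure_s
    if time_of_flight_s < 0:
        raise ValueError("arrival cannot precede departure")
    return SPEED_OF_SOUND_M_PER_S * time_of_flight_s


# A signal that takes 10 ms to arrive travelled about 3.43 m.
distance_m = distance_from_times(0.000, 0.010)
```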

[0026] In some examples, a head-related transfer function, also referred to as an anatomical transfer function, may be a response relating to arrival characteristics of a sound signal and may be used to support a natural rendering of translated speech. A person may observe the spatial position of a sound based on differences between arrival characteristics of the sound signal. For example, a head-related transfer function may be a response that characterizes how an ear receives a sound signal from a point in space (e.g., in an augmented reality environment). The relationship between the spatial position of a sound source of the sound signal and the arrival characteristics of the sound signal at a target person may be represented by a pair of head-related transfer functions. A pair of head-related transfer functions for a person can be used to control the output of a sound signal so that it appears to come from a particular point in space. Thus, in addition to applying one or more characteristics to a sound signal, the sound signal with the applied one or more characteristics may be filtered by a head-related transfer function, as merely one non-limiting example, to output (e.g., render) a representation (e.g., translated speech) of the sound signal at or to a target person, which may result in a natural rendering of the translated speech.
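Filtering by a pair of head-related transfer functions can be sketched in the time domain as convolving a mono signal with one impulse response per ear. The impulse responses below are toy values chosen only to make the arithmetic visible, not measured data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal through a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed): the right ear hears the sound one
# sample later and at half the amplitude, as if the source were on the left.
translated_speech = [1.0, 0.5]
left_out, right_out = render_binaural(translated_speech, [1.0], [0.0, 0.5])
```

With these toy responses the left channel is the unchanged signal and the right channel is a delayed, quieter copy, which is the kind of difference a listener uses to localize a source.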

[0027] Aspects of the disclosure are initially described in the context of a wireless communications system. Aspects of the disclosure are then illustrated by and described with reference to process flows that relate to augmented reality language translation. Aspects of the disclosure are further illustrated by and described with reference to apparatus diagrams, system diagrams, and flowcharts that relate to augmented reality language translation.

[0028] FIG. 1 illustrates a wireless communications system 100 that supports augmented reality language translation in accordance with aspects of the present disclosure. In some examples, the wireless communications system 100 may be a multiple-access wireless communications system, such as a fourth generation (4G) system (e.g., a Long Term Evolution (LTE) system, an LTE-Advanced (LTE-A) system, or an LTE-A Pro system), a fifth generation (5G) system, which may be referred to as a New Radio (NR) system, or a wireless local area network (WLAN), such as Wi-Fi (i.e., Institute of Electrical and Electronics Engineers (IEEE) 802.11), or Bluetooth-related technology. The wireless communications system 100 may include a base station 105, a device 110, a device 115 (which may in some cases be a paired device), a server 125, and a database 130. In some examples, the device 110 may be referred to herein as a listening device, while the device 115 may be referred to herein as a playback device. In some examples, either or both the device 110 and the device 115 may additionally or alternatively perform similar or same operations that support augmented reality language translation.

[0029] The device 110 and the device 115 may be stationary and/or mobile. In some examples, the device 110 may be a personal computing device, a desktop, a laptop, a mobile computing device, or a head mounted display (HMD), etc. The device 110 may additionally, or alternatively, include or be referred to by those skilled in the art as a user equipment (UE), a user device, a smartphone, a BLUETOOTH device, a Wi-Fi device, a mobile station, a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communications device, a remote device, an access terminal, a mobile terminal, a wireless terminal, a remote terminal, a handset, a user agent, a mobile client, a client, and/or some other suitable terminology.

[0030] The device 110 may be configured to allocate graphics resources, handle audio and/or video streams, and/or render multimedia content (e.g., render audio and/or video streams (e.g., augmented reality language translation)) for an augmented reality experience as described herein. For example, the device 110 may communicate one or more frames with the device 115 to provide an augmented reality experience. A frame may be a stereoscopic three dimensional (3D) visualization that is transmitted to the device 115 for presentation.

[0031] In some examples, the device 115 may be an HMD. As an HMD, the device 115 may be worn by a user. In some examples, the device 115 may be configured with one or more sensors to sense a position of the user and/or an environment surrounding the HMD to generate information when the user is wearing the HMD. The information may include movement information, orientation information, angle information, etc. regarding the device 115. In some examples, the device 115 may be configured with a microphone (e.g., a single microphone or an array of microphones) for capturing audio and one or more speakers for broadcasting the audio. The device 115 may also be configured with a set of lenses and a display screen for the user to view and be part of an augmented reality experience in an augmented reality system.

[0032] In some examples, an augmented reality environment may have multiple users from different areas of the world sharing in the augmented reality experience. Some examples of an augmented reality system may support language translation methods to further promote collaborative augmented reality experiences (e.g., a discussion in a particular language can be translated live (e.g., in real time) to another language). These methods, however, may not support a natural rendering of the translated speech. That is, these methods may provide a mechanical translated speech output, rather than a natural rendering of translated speech, which leads to a degraded user experience, among other problems. In addition, these methods pose further challenges when more than one user is speaking in a scene (e.g., frame, plane) in an augmented reality environment. As a result, these methods lack the capability to relate the translated speech to the appropriate person. The described techniques disclosed herein support speech translation, and more specifically augmented reality language translation to provide a natural rendering of translated speech to a target person in the augmented reality environment by using one or more characteristics of a sound signal to deliver the natural rendering of the translated speech.

[0033] To achieve the advantages of natural rendering of augmented reality language translation, the device 110 and/or the device 115 may measure and/or determine one or more characteristics of a sound signal as well as one or more aspects associated with the sound signal at or related to a target person. The one or more characteristics of a sound signal may, in some examples, relate to spatial hearing and support augmented reality language translation to provide a natural rendering of translated speech to a target person in the augmented reality environment. For example, the device 110 and/or the device 115 may measure at least one of an intensity of a sound signal, an angle of arrival of the sound signal, a pitch of the sound signal, a loudness of the sound signal, a distance between the device 110 and/or the device 115 and a source of the sound signal in an augmented reality environment, a time of arrival of the sound signal at the device 110 and/or the device 115, a time of departure of the sound signal from the source of the sound signal in the augmented reality environment, or one or more other characteristics, or a combination thereof.

[0034] In some examples, the device 110 and the device 115 may use a function, such as a head-related transfer function, which may also be referred to as an anatomical transfer function, that may be a response relating to arrival characteristics of a sound signal. A person may observe the spatial position of a sound based on differences between arrival characteristics of the sound signal. For example, a function, including but not limited to a head-related transfer function, may be a response that characterizes how an ear receives a sound signal from a source, such as a point in space (e.g., in an augmented reality environment). The relationship between the spatial position of a sound source of the sound signal and the arrival characteristics of the sound signal at or related to a target person (e.g., of the device 110 and/or the device 115) may be represented by one or more functions, such as a pair of head-related transfer functions. A pair of head-related transfer functions for a target person may, in some cases, be used to synthesize a binaural sound output that seems to come from a particular point in space. Thus, a head-related transfer function may define how a sound signal from a specific point in space will arrive at the target person. In some examples, the device 110 and/or the device 115 may detect the sound signal and determine one or more characteristics of the sound signal at a second location that is different from the location of the target person. For example, the device 110 (e.g., a UE or a first headphone of a pair of headphones) may be at a first location and detect the sound signal and determine one or more characteristics of the sound signal, while the device 115 (e.g., a second headphone of the pair of headphones) may be at a second location different from the first location. Here, the device 110 may perform the processes described herein, while the device 115 may broadcast the processed sound signal, as described herein.

[0035] In some examples, the device 110 and the device 115 may control the sound signal (e.g., its rendering) reaching a listener’s ears. Controlling the ear input signals of the left and the right ear independently may allow the device 110 and the device 115 to encode the one or more characteristics (e.g., intensity, direction, angle) of a sound signal that may evoke the perception and localization of the sound signal in the augmented reality environment. Thus, for spatial sound signal rendering, the device 110 and the device 115 may support channel separation at the ears of the listener to enable the output of these one or more characteristics. By applying one or more characteristics to a sound signal, and outputting a representation (e.g., translated speech) of the sound signal based in part on applying the one or more characteristics, the device 110 and the device 115 may provide a natural rendering of translated speech to a target person in the augmented reality environment.
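A simplified way to encode such characteristics into independent left and right channels is to delay and attenuate the channel for the ear farther from the source. The sketch below is only an illustration of that idea under assumed inputs (a delay in samples, a level difference in dB, and a flag for which side the source is on); it is not the rendering method of the disclosure, and all names are hypothetical.

```python
def apply_interaural_cues(mono, delay_samples, level_difference_db, source_on_left):
    """Return (left, right) channels carrying a delay and a level cue.

    The near-ear channel is the signal as-is; the far-ear channel is
    delayed by `delay_samples` and attenuated by `level_difference_db`.
    Both parameters are assumed to be non-negative.
    """
    if delay_samples < 0 or level_difference_db < 0:
        raise ValueError("delay and level difference must be non-negative")
    far_gain = 10.0 ** (-level_difference_db / 20.0)
    near = [float(s) for s in mono] + [0.0] * delay_samples  # pad to equal length
    far = [0.0] * delay_samples + [s * far_gain for s in mono]
    return (near, far) if source_on_left else (far, near)


# Translated speech placed to the listener's left: the right ear gets a
# copy that is one sample late and 20 dB quieter.
left, right = apply_interaural_cues([1.0, 1.0], 1, 20.0, source_on_left=True)
```

Keeping the two channels separate until playback is what lets each ear receive its own cue, which matches the channel separation the paragraph above describes.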

[0036] The device 115 may include Bluetooth-enabled devices capable of pairing with other Bluetooth-enabled devices (e.g., the device 110), which may include wireless headsets, earbuds, speakers, ear pieces, headphones, display devices (e.g., TVs, computer monitors), microphones, etc. The device 110 and the device 115 may be able to communicate directly with each other (e.g., using a peer-to-peer (P2P) or device-to-device (D2D) protocol, or Bluetooth protocol). By way of example, the device 115 (e.g., headset) may be connected to the device 110 (e.g., mobile phone) over a Bluetooth connection, or the like.

[0037] Bluetooth communications may refer to a short-range communication protocol and may be used to connect and exchange information between the device 110 and the device 115 (e.g., between mobile phones, computers, digital cameras, wireless headsets, speakers, keyboards, mice or other input peripherals, and similar devices). Bluetooth systems (e.g., aspects of the wireless communications system 100) may be organized using a master-slave relationship employing a time-division duplex protocol having, for example, defined time slots of 625 microseconds, in which transmission alternates between the master device (e.g., the device 110) and one or more slave devices (e.g., the device 115). In some examples, the device 110 may generally refer to a master device, and the device 115 may refer to a slave device in the wireless communications system 100. As such, in some examples, a device may be referred to as either the device 110 or the device 115 based on the Bluetooth role configuration of the device. That is, designation of a device as either the device 110 or the device 115 may not necessarily indicate a distinction in device capability, but rather may refer to or indicate roles held by the device in the wireless communications system 100. Generally, the device 110 may refer to a wireless communication device capable of wirelessly exchanging data signals with another device, and the device 115 may refer to a device operating in a slave role, or to a short-range wireless device capable of exchanging data signals with the mobile device (e.g., using Bluetooth communication protocols).
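The slot-based alternation can be illustrated with a little arithmetic. The sketch below assumes the classic Bluetooth convention that the master starts transmitting in even-numbered slots and a slave in odd-numbered slots, and it ignores multi-slot packets; the function names are hypothetical.

```python
SLOT_DURATION_US = 625  # slot length stated in the text, in microseconds


def slot_index(elapsed_us):
    """Index of the slot that contains a given elapsed time."""
    return elapsed_us // SLOT_DURATION_US


def transmitter_for_slot(index):
    """Which role may start transmitting in a slot (single-slot packets only)."""
    return "master" if index % 2 == 0 else "slave"


# 1,000 microseconds into the schedule falls in slot 1, a slave slot.
role = transmitter_for_slot(slot_index(1000))
```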

[0038] A Bluetooth-enabled device may be compatible with certain Bluetooth profiles to use desired services. A Bluetooth profile may refer to a specification regarding an aspect of Bluetooth-based wireless communications between devices. That is, a profile specification may refer to a set of instructions for using the Bluetooth protocol stack in a certain way, and may include information such as suggested user interface formats, particular options and parameters at each layer of the Bluetooth protocol stack, etc. For example, a Bluetooth specification may include various profiles that define the behavior associated with each communication endpoint to implement a specific use case. Profiles may thus generally be defined according to a protocol stack that promotes and allows interoperability between endpoint devices from different manufacturers through enabling applications to discover and use services that other nearby Bluetooth-enabled devices may be offering. The Bluetooth specification defines device role pairs that together form a single use case called a profile. One example profile defined in the Bluetooth specification is the Handsfree Profile (HFP) for voice telephony, in which one device implements an Audio Gateway (AG) role and the other device implements a Handsfree (HF) device role. Another example is the Advanced Audio Distribution Profile (A2DP) for high-quality audio streaming, in which one device (e.g., device 110-a) implements an audio source device (SRC) role and another device (e.g., device 115-a) implements an audio sink device (SNK) role.

[0039] For a commercial Bluetooth-enabled device that implements one role, another device that implements the corresponding role may need to be present within the radio range of the Bluetooth-enabled device. For example, in order for an HF device such as a Bluetooth headset to function according to the Handsfree Profile, a device implementing the AG role (e.g., a cell phone) may have to be present within radio range. Likewise, in order to stream high-quality mono or stereo audio according to the A2DP, a device implementing the SNK role (e.g., Bluetooth headphones or Bluetooth speakers) may have to be within radio range of a device implementing the SRC role (e.g., a stereo music player). A link 132 established between two Bluetooth-enabled devices (e.g., between the device 110 and the device 115) may provide for communications or services (e.g., according to some Bluetooth profile). Other Bluetooth profiles supported by Bluetooth-enabled devices may include Bluetooth Low Energy (BLE) (e.g., providing considerably reduced power consumption and cost while maintaining a similar communication range), human interface device profile (HID) (e.g., providing low latency links with low power requirements), etc.
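The role pairing described for HFP and A2DP can be modeled as a small lookup table. This is only a data-structure sketch of the two profiles named in the text; the table and function names are hypothetical and do not correspond to any Bluetooth stack API.

```python
# Role pairs taken from the text: HFP pairs AG with HF, A2DP pairs SRC with SNK.
PROFILE_ROLES = {
    "HFP": ("AG", "HF"),
    "A2DP": ("SRC", "SNK"),
}


def counterpart_role(profile, role):
    """Return the role the peer device must implement for this profile."""
    first, second = PROFILE_ROLES[profile]
    if role == first:
        return second
    if role == second:
        return first
    raise ValueError(f"{role!r} is not a role of {profile}")


def can_use_profile(profile, local_role, roles_in_range):
    """True if some device in radio range implements the counterpart role."""
    return counterpart_role(profile, local_role) in roles_in_range


# A headset in the HF role needs an AG (e.g., a phone) within range.
headset_ok = can_use_profile("HFP", "HF", {"AG", "SNK"})
```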

[0040] The server 125 may be a computing system or an application that may be an intermediary node in the wireless communications system 100 among the device 110, the device 115, and the database 130. The server 125 may include a data server, a cloud server, a server associated with an augmented reality service provider, a proxy server, a mail server, a web server, an application server (e.g., a gaming application server), a database server, a communications server, a home server, a mobile server, or any combination thereof. The server 125 may also transmit to the device 110 or the device 115 a variety of augmented reality information, such as rendering instructions, configuration information, control instructions, and other information, instructions, or commands relevant to performing augmented reality language translation.

[0041] The database 130 may store data that may include graphics resources, audio and/or video streams, and/or rendered multimedia content (e.g., rendered audio and/or video streams (e.g., frames)) for an augmented reality environment, or commands relevant to augmented reality language translation for the device 110 and/or the device 115. The device 110 and the device 115 may retrieve the stored data from the database 130 via the network 120 using communication links 135. In some examples, the database 130 may be a relational database (e.g., a relational database management system (RDBMS) or a Structured Query Language (SQL) database), a non-relational database, a network database, or an object-oriented database, among others, that stores a variety of information, such as instructions or commands relevant to augmented reality language translation.

[0042] The network 120 may provide encryption, access authorization, tracking, Internet Protocol (IP) connectivity, and other access, computation, and/or modification functions. Examples of network 120 may include any combination of cloud networks, local area networks (LAN), wide area networks (WAN), virtual private networks (VPN), wireless networks (using 802.11, for example), and cellular networks (using 3G, 4G, LTE, or NR (e.g., 5G) systems, for example). Network 120 may include the Internet.

[0043] The base station 105 may wirelessly communicate with the device 110 and the device 115 via one or more base station antennas. Base station 105 described herein may include or may be referred to by those skilled in the art as a base transceiver station, a radio base station, an access point, a radio transceiver, a NodeB, an eNodeB (eNB), a next-generation Node B or giga-nodeB (either of which may be referred to as a gNB), a Home NodeB, a Home eNodeB, or some other suitable terminology. The device 110 and the device 115 described herein may be able to communicate with various types of base stations and network equipment including macro eNBs, small cell eNBs, gNBs, relay base stations, and the like.

[0044] The communication links 135 shown in the wireless communications system 100 may include uplink transmissions from the device 110 and/or the device 115 to the base station 105 or the server 125, and/or downlink transmissions from the base station 105 or the server 125 to the device 110 and the device 115. The downlink transmissions may also be called forward link transmissions, while the uplink transmissions may also be called reverse link transmissions. The communication links 135 may transmit bidirectional communications and/or unidirectional communications. The communication links 135 may include one or more connections, including but not limited to, 345 MHz, Wi-Fi, BLUETOOTH, BLUETOOTH Low-Energy, cellular, Z-WAVE, 802.11, peer-to-peer, LAN, wireless local area network (WLAN), Ethernet, FireWire, fiber optic, and/or other connection types related to wireless communications systems.

[0045] FIG. 2 illustrates an example of a wireless communications system 200 that supports augmented reality language translation in accordance with aspects of the present disclosure. In some examples, the wireless communications system 200 may implement aspects of wireless communications system 100. For example, the wireless communications system 200 may include a device 110-a and a device 115-a, which may be examples of the corresponding devices described with reference to FIG. 1. The wireless communications system 200 may illustrate an augmented reality system, and more specifically FIG. 2 may illustrate the capability of the device 110-a and the device 115-a to localize a sound signal within an augmented reality environment, as well as to provide a natural rendering of the augmented reality language translation of the sound signal.

[0046] In an augmented reality environment, an audio source 205 may output (e.g., transmit, broadcast) a sound signal. In some examples, the audio source 205 may directly or indirectly output a sound signal towards the device 110-a or the device 115-a. For example, an audio source 205 may be another user in the augmented reality environment speaking to a user of the device 110-a and the device 115-a. In some examples, the sound signal emitted by the audio source 205 may be in a language not understood by the user of the device 110-a and the device 115-a. As such, it may be necessary to translate the language into a second language understood by the user, as described further in detail below.

[0047] Alternatively, the audio source 205 may be audible gestures, audio signaling devices, audio playback devices, mechanical systems, and so forth. In the example of FIG. 2, either or both the device 110-a and the device 115-a may receive the sound signal from the audio source 205 and process the sound signal appropriately (e.g., sound localization, augmented reality language translation). A portion or all of the processing of the sound signal may be performed by the device 110-a and/or the device 115-a.

[0048] By way of example, the device 110-a may be a listening device, which may receive the sound signal from the audio source 205. After receiving, or as part of receiving, the sound signal, the device 110-a may localize the sound signal within the augmented reality environment. By localizing the sound signal, the device 110-a may be capable of determining one or more aspects of the sound signal, such as its spatial origin within the augmented reality environment. To localize the sound signal, the device 110-a may measure one or more characteristics of the sound signal, such as at least one of an intensity of the sound signal, an angle of arrival of the sound signal, a pitch of the sound signal, a loudness of the sound signal, a distance between the device 110-a and the audio source 205 of the sound signal in the augmented reality environment, a time of arrival of the sound signal at the device 110-a, or a time of departure of the sound signal from the audio source 205 of the sound signal in the augmented reality environment, or a combination thereof. In further examples, the one or more characteristics may be sampled for desired frequencies (e.g., a range of audible frequencies for humans, such as 20 Hz to 20 kHz).

[0049] In an example, the device 110-a may identify a time of departure of the sound signal from the audio source 205 based in part on a set of measurements (e.g., a time of arrival) of the sound signal at different devices (e.g., microphones of an array of microphones) associated with the device 110-a, and determine a delay including a difference in time of arrival of the sound signal at the different devices. In this example, at least a subset of the set of characteristics of the sound signal may include the delay (e.g., difference in time of arrival of the sound signal). In other examples, the device 110-a may determine a difference in intensity associated with the sound signal based in part on the set of measurements of the sound signal at different devices (e.g., microphones of an array of microphones) associated with the device 110-a. Here, at least a subset of the set of characteristics of the sound signal may include the difference in intensities of the sound signal at different devices (e.g., microphones of an array of microphones) associated with the device 110-a.
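As an illustration of how the delay and the intensity difference described above could feed localization, the following Python sketch converts an inter-microphone delay into an angle of arrival and a pair of intensities into a level difference. It assumes a single two-microphone pair, a far-field source, and a speed of sound of 343 m/s. The function names and the far-field formula are illustrative assumptions and are not taken from the disclosure.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees Celsius


def angle_of_arrival_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle of arrival, in degrees from broadside, for two microphones.

    delay_s is the difference in time of arrival between the two microphones.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


def level_difference_db(intensity_a: float, intensity_b: float) -> float:
    """Difference in intensity between two microphones, expressed in decibels."""
    return 10.0 * math.log10(intensity_a / intensity_b)


# Hypothetical measurements: microphones 0.2 m apart, 0.1 ms delay.
angle = angle_of_arrival_deg(delay_s=1.0e-4, mic_spacing_m=0.2)
level = level_difference_db(intensity_a=2.0, intensity_b=1.0)
```

A larger array would typically combine several such pairs; the sketch keeps one pair so that the relationship between delay and angle stays visible.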

[0050] The device 110-a may use a subset or all of the set of characteristics determined for the sound signal to localize the sound signal in the augmented reality environment. Although the above localization of the sound signal is performed by the device 110-a, the device 115-a may additionally, or alternatively, be capable of performing the localization of the sound signal. Alternatively, the device 110-a may transmit the set of measurements of the sound signal to the device 115-a via communication link 220 (e.g., wired or wireless connection).

[0051] Where the sound signal emitted by the audio source 205 is in a language not understood by the user of the device 110-a and the device 115-a, it may be necessary to translate the language into a language understood by the user, as described in further detail below. The sound signal may include a representation in a language that may be speech in a verbal form or a written form. Thus, the device 110-a may convert the sound signal (e.g., speech) from verbal form to written form (e.g., text). After converting the sound signal from verbal form to written form, the device 110-a may translate the original language of the sound signal to a second language. In some examples, the device 110-a may identify the second language based in part on a preference (e.g., a default language) of the user associated with the device 110-a and the device 115-a. The device 110-a may then convert the translated speech from written form back into verbal form based in part on the preference.
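The three conversion steps in this paragraph (verbal form to written form, translation into the preferred language, written form back to verbal form) can be sketched as a small pipeline. The Python below is a minimal sketch: the class name `TranslationPipeline` and its three callables are hypothetical placeholders for whatever speech recognition, translation, and speech synthesis components a device might use, and the toy stand-ins at the bottom exist only so the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationPipeline:
    """Hypothetical wiring of the three steps; no real engine is implied."""

    speech_to_text: Callable[[bytes], str]        # verbal form -> written form
    translate: Callable[[str, str], str]          # (text, target language) -> text
    text_to_speech: Callable[[str, str], bytes]   # (text, language) -> verbal form
    preferred_language: str                       # e.g., the user's default language

    def run(self, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)
        translated = self.translate(text, self.preferred_language)
        return self.text_to_speech(translated, self.preferred_language)


# Toy stand-ins (assumptions) so the example is runnable end to end.
pipeline = TranslationPipeline(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    translate=lambda text, lang: f"[{lang}] {text}",
    text_to_speech=lambda text, lang: text.encode("utf-8"),
    preferred_language="en",
)
translated_audio = pipeline.run(b"hola")
```

Passing the components in as callables keeps the sketch neutral about where each step runs, for example on the listening device or on a server.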

[0052] In some examples, when the device 110-a is the listening device and the device 115-a is a playback device, the device 110-a may forward the translated representation (e.g., translated speech) of the sound signal to the device 115-a for playback. To provide a natural rendering of the translated speech of the sound signal by the device 115-a, the device 110-a may also transmit additional information (e.g., the set of characteristics of the sound signal) to the device 115-a. For example, the additional information may include the intensity of the sound signal, the angle of arrival of the sound signal, the pitch of the sound signal, the loudness of the sound signal, the distance between the device and the source of the sound signal in the augmented reality environment, or a combination thereof.

[0053] The device 115-a may receive the translated representation (e.g., translated speech) of the sound signal, as well as the additional information. Using the additional information provided by the device 110-a, the device 115-a may determine a comparative delay (e.g., a difference in time of arrival of the sound signal in left and right ears), or a difference in intensity associated with the sound signal (e.g., a difference in intensity of the sound signal in left and right ears), or both. In some examples, the device 115-a may consider base times along with differences in time observed at both channels (e.g., earbuds (e.g., ears of a user) of the device 115-a). For example, a first sentence associated with the sound signal may have been spoken at time x, with a perceived delay of Δx between the left and right ears, while a second sentence associated with the sound signal may have been spoken at time y, with a perceived delay of Δy between the left and right ears. The paired device may use, during the playback, one or more of x, y, Δx, and Δy.
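To make the comparative delay and the difference in intensity concrete, the following Python sketch turns a mono stream of translated speech into left and right channels by delaying and attenuating one side. It is a minimal sketch under stated assumptions: the function name `render_binaural`, the sign conventions, and the use of a whole-sample delay are illustrative and are not described in the disclosure.

```python
from typing import List, Sequence, Tuple


def render_binaural(
    mono: Sequence[float],
    sample_rate_hz: int,
    delay_s: float,
    level_diff_db: float,
) -> Tuple[List[float], List[float]]:
    """Return (left, right) channels for a mono signal.

    Assumed conventions: a positive delay_s means the sound reaches the left
    ear first; a positive level_diff_db means the left ear is louder.
    """
    shift = round(abs(delay_s) * sample_rate_hz)  # delay in whole samples
    pad = [0.0] * shift
    early = list(mono) + pad   # channel that hears the sound first
    late = pad + list(mono)    # channel that hears the sound after the delay
    left, right = (early, late) if delay_s >= 0 else (late, early)

    # Convert the level difference in dB into an amplitude factor for the quieter ear.
    attenuation = 10.0 ** (-abs(level_diff_db) / 20.0)
    if level_diff_db >= 0:
        right = [sample * attenuation for sample in right]
    else:
        left = [sample * attenuation for sample in left]
    return left, right


# Hypothetical values: 2 ms delay and 20 dB level difference at 1 kHz sampling.
left, right = render_binaural([1.0, 0.5], 1000, delay_s=0.002, level_diff_db=20.0)
```

When each sentence carries its own base time and perceived delay, the function could be called once per sentence with that sentence's values.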

[0054] The device 115-a may determine a second set of characteristics of the sound signal that may include a subset or all of the set of characteristics determined by the device 110-a, as well as the comparative delay and/or the difference in intensity determined by the device 115-a. The device 115-a may then apply a subset of the second set or the entire second set of characteristics to the sound signal. The device 115-a may then output (e.g., play back) the translated representation (e.g., translated speech) of the sound signal to the user of the device 115-a. Thus, the device 115-a may be capable of outputting a sound signal (e.g., a translated sound signal) and controlling how it is perceived by a listener by using one or more characteristics of the sound signal, giving the sound signal a natural rendering in the augmented reality environment.

[0055] In some examples, perceived localization of the audio source 205 may be stale (e.g., no longer correct, outdated) when playing back the translated representation (e.g., translated speech) of the sound signal to the user of the device 115-a, due to a slight delay between the original speech and playback after translation. For example, a user wearing the device 115-a (e.g., HMD) may move around in the augmented reality environment. To account for the movement, the device 115-a may use one or more sensors (e.g., a motion sensor, a magneto sensor) to offset an angular placement away from or towards the audio source 205. As such, perceived sound localization may be accurate during playback.

[0056] By way of example, a human speaker, the device 110-a, and the device 115-a may be in a two-dimensional augmented reality environment. The device 110-a may use an array of microphones to determine an angular position, that is, an angular placement of the human speaker relative to the device 110-a. The device 110-a may, in some examples, then determine an absolute angular placement of the human speaker with respect to the magnetic north direction. This absolute angular placement may be communicated to the device 115-a by the device 110-a. As such, the device 115-a may be aware of the actual position of the human speaker by using a sensor (e.g., a magnetometer) and the absolute angular placement.
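The conversion between a relative and an absolute angular placement can be sketched in a few lines of Python. The sketch assumes that headings are measured clockwise from magnetic north in degrees, that relative angles are measured clockwise from the direction a device is facing, and that the two devices are close enough together that the bearing to the speaker is effectively the same for both. The function names are hypothetical.

```python
def absolute_bearing_deg(listener_heading_deg: float, relative_angle_deg: float) -> float:
    """Bearing of the speaker from magnetic north, in [0, 360).

    listener_heading_deg: heading of the listening device (e.g., from a magnetometer).
    relative_angle_deg: speaker angle relative to that device (e.g., from a microphone array).
    """
    return (listener_heading_deg + relative_angle_deg) % 360.0


def relative_bearing_deg(absolute_deg: float, playback_heading_deg: float) -> float:
    """Signed speaker angle, in [-180, 180), relative to the playback device's heading."""
    return (absolute_deg - playback_heading_deg + 180.0) % 360.0 - 180.0


# Hypothetical readings: the listening device faces 350 degrees and hears the
# speaker 30 degrees to its right; the playback device faces 90 degrees.
bearing = absolute_bearing_deg(350.0, 30.0)
offset = relative_bearing_deg(bearing, 90.0)
```

Because the absolute value does not depend on which way either device faces, the playback device only needs its own heading to recover where the speaker should appear to be.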

[0057] The device 115-a may use one or more sensors (e.g., a motion sensor, a magneto sensor) to offset an angular placement away from or towards the audio source 205. As such, perceived sound localization may be accurate during playback. For example, the device 110-a (and/or the device 115-a) may determine an angular offset between the device 110-a (and/or the device 115-a) and the audio source 205 using a sensor of the device 110-a (and/or the device 115-a). The device 110-a (and/or the device 115-a) may adjust, modify, or determine another set of characteristics that are based in part on the angular offset. The set of characteristics may include at least one of an intensity of the sound signal, a pitch of the sound signal, a loudness of the sound signal, or a combination thereof associated with the angular offset of the audio source 205. The device 110-a (and/or the device 115-a) may apply, to the sound signal, one or more characteristics from the set of characteristics that are based in part on the angular offset. In other examples, the audio source 205 may move (e.g., change locations) within the augmented reality environment. In these examples, the device 110-a and/or the device 115-a may use the latest placement samples or the original samples to determine an angular placement of the audio source 205. If the audio source 205 continues to broadcast sound signals (e.g., a user continues to speak), the device 110-a and/or the device 115-a may use the latest placement samples (e.g., location information) for sound localization. Thus, the device 115-a may be capable of outputting a sound signal (e.g., a translated sound signal) and controlling how it is perceived by a listener, even when movement in the augmented reality environment exists, by using one or more characteristics of the sound signal, giving the sound signal a natural rendering in the augmented reality environment.
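One way to picture how an angular offset could change the intensity applied to each ear is a simple panning rule. The Python sketch below updates the offset after the listener turns and then derives left and right gains using constant-power panning. Constant-power panning is a common audio technique chosen here for illustration only; the disclosure does not specify how the characteristics are recomputed, and the function names and the clamping of sources behind the listener are assumptions.

```python
import math
from typing import Tuple


def offset_after_turn(offset_deg: float, yaw_change_deg: float) -> float:
    """Source offset, in [-180, 180), after the listener turns by yaw_change_deg.

    Positive angles are clockwise (to the listener's right).
    """
    return (offset_deg - yaw_change_deg + 180.0) % 360.0 - 180.0


def pan_gains(offset_deg: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for a source at offset_deg.

    Offsets beyond +/-90 degrees are clamped, a simplification that treats
    sources behind the listener as fully to one side.
    """
    clamped = max(-90.0, min(90.0, offset_deg))
    theta = math.radians((clamped + 90.0) / 2.0)  # maps -90..90 to 0..90 degrees
    return math.cos(theta), math.sin(theta)


# Hypothetical case: the source was 30 degrees to the right, then the
# listener turned 30 degrees to the right, so the source is now straight ahead.
new_offset = offset_after_turn(30.0, 30.0)
left_gain, right_gain = pan_gains(new_offset)
```

The squared gains always sum to one, so the overall loudness stays roughly constant while the apparent direction follows the updated offset.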

……
……
……

Article link: https://patent.nweon.com/12913
