Facebook Patent | Texture memory storage

Patent: Texture memory storage

Publication Number: 20210192677

Publication Date: 20210624

Applicant: Facebook

Abstract

In one embodiment, an apparatus, coupled to a computing system, may include a first-level of data bus comprising first-level data lines. The apparatus may include second-level data buses each including second-level data lines. Each second-level data bus may be coupled to a memory unit. The second-level data lines of each second-level data bus may correspond to a subset of the first-level data lines. The apparatus may include third-level data buses each including third-level data lines. Each third-level data bus may be coupled to a sub-level memory unit. The third-level data lines of each third-level data bus may correspond to a subset of the second-level data lines of a second-level data bus along a structural hierarchy. The apparatus may be configured to allow the computing system to load a data block from the first-level data lines to sub-level memory units through the third-level data buses excluding multiplexing operations.

Claims

  1. An apparatus, coupled to a computing system, the apparatus comprising: a first-level of data bus comprising a plurality of first-level data lines; a plurality of second-level data buses each comprising a plurality of second-level data lines, wherein each second-level data bus is coupled to a memory unit, and wherein the plurality of second-level data lines of each second-level data bus corresponds to a subset of the plurality of first-level data lines; and a plurality of third-level data buses each comprising a plurality of third-level data lines, wherein each third-level data bus is coupled to a sub-level memory unit, and wherein the plurality of third-level data lines of each third-level data bus corresponds to a subset of the plurality of second-level data lines of a second-level data bus along a structural hierarchy, wherein the apparatus is configured to allow the computing system to load a data block from the plurality of first-level data lines to a plurality of sub-level memory units through the plurality of third-level data buses excluding multiplexing operations.

  2. The apparatus of claim 1, wherein the data block is associated with a plurality of texels and is retrieved from a memory block, and wherein the plurality of texels is stored in the memory block in a texel order that maps the plurality of texels to respective third-level data buses such that each subset of the plurality of texels that have the same associated low order address bits are read onto the same third-level data bus.

  3. The apparatus of claim 2, wherein the texel order is determined based on a two-dimensional array used to organize the plurality of texels before the plurality of texels is stored in the memory block, and wherein each of the plurality of texels is associated with a low order address bit of a first address associated with a position of that texel in the two-dimensional array.

  4. The apparatus of claim 2, wherein the texel order of the plurality of texels in the memory block is determined based on one or more mapping rules, and wherein the one or more mapping rules map the plurality of texels from a two-dimension array into a one-dimensional array in the texel order.

  5. The apparatus of claim 4, wherein the one or more mapping rules map the plurality of texels from the two-dimensional array into the one-dimensional array using a swizzle pattern.

  6. The apparatus of claim 4, wherein the one or more mapping rules map the plurality of texels from the two-dimensional array into the one-dimensional array using an interleaved swizzle pattern.

  7. The apparatus of claim 2, wherein the texel order of the plurality of texels stored in the memory block maps the plurality of texels to respective third-level data buses using a mapping relationship applicable to a plurality of texel sizes comprising at least an 8-bit texel size, a 16-bit texel size, or a 32-bit texel size.

  8. The apparatus of claim 1, wherein each third-level data bus is associated with a sub-level memory unit of the plurality of sub-level memory units, and wherein each sub-level memory unit is associated with the same low order address bits.

  9. The apparatus of claim 1, wherein the structural hierarchy causes each bit of the data block to be directly loaded into a corresponding sub-level memory unit excluding multiplexing operations.

  10. The apparatus of claim 1, wherein each memory unit comprises a subset of sub-level memory units of the plurality of sub-level memory units.

  11. One or more computer-readable non-transitory storage media, associated with a computing system coupled to an apparatus, embodying software that is operable when executed to: load a data block onto a first-level of data bus comprising a plurality of first-level data lines; transmit, by the first level data bus of the apparatus, the data block to a plurality of second-level data buses each comprising a plurality of second-level data lines, wherein each second-level data bus is coupled to a memory unit, and wherein the plurality of second-level data lines of each second-level data bus corresponds to a subset of the plurality of first-level data lines; and transmit, by each second level data bus of the apparatus, a portion of the data block onto a plurality of third-level data buses each comprising a plurality of third-level data lines, wherein each third-level data bus is coupled to a sub-level memory unit, and wherein the plurality of third-level data lines of each third-level data bus corresponds to a subset of the plurality of second-level data lines of a second-level data bus along a structural hierarchy, wherein the apparatus is configured to allow the computing system to load the data block from the plurality of first-level data lines to a plurality of sub-level memory units through the plurality of third-level data buses excluding multiplexing operations.

  12. The media of claim 11, wherein the data block is associated with a plurality of texels and is retrieved from a memory block, and wherein the plurality of texels is stored in the memory block in a texel order that maps the plurality of texels to respective third-level data buses such that each subset of the plurality of texels that have the same associated low order address bits are read onto the same third-level data bus.

  13. The media of claim 12, wherein the texel order is determined based on a two-dimensional array used to organize the plurality of texels before the plurality of texels is stored in the memory block, and wherein each of the plurality of texels is associated with a low order address bit of a first address associated with a position of that texel in the two-dimensional array.

  14. The media of claim 12, wherein the texel order of the plurality of texels in the memory block is determined based on one or more mapping rules, and wherein the one or more mapping rules map the plurality of texels from a two-dimension array into a one-dimensional array in the texel order.

  15. The media of claim 14, wherein the one or more mapping rules map the plurality of texels from the two-dimensional array into the one-dimensional array using a swizzle pattern.

  16. A method comprising, by a computing system coupled to an apparatus: loading a data block onto a first-level of data bus comprising a plurality of first-level data lines; transmitting, by the first level data bus of the apparatus, the data block to a plurality of second-level data buses each comprising a plurality of second-level data lines, wherein each second-level data bus is coupled to a memory unit, and wherein the plurality of second-level data lines of each second-level data bus corresponds to a subset of the plurality of first-level data lines; and transmitting, by each second level data bus of the apparatus, a portion of the data block onto a plurality of third-level data buses each comprising a plurality of third-level data lines, wherein each third-level data bus is coupled to a sub-level memory unit, and wherein the plurality of third-level data lines of each third-level data bus corresponds to a subset of the plurality of second-level data lines of a second-level data bus along a structural hierarchy, wherein the apparatus is configured to allow the computing system to load the data block from the plurality of first-level data lines to a plurality of sub-level memory units through the plurality of third-level data buses excluding multiplexing operations.

  17. The method of claim 16, wherein the data block is associated with a plurality of texels and is retrieved from a memory block, and wherein the plurality of texels is stored in the memory block in a texel order that maps the plurality of texels to respective third-level data buses such that each subset of the plurality of texels that have the same associated low order address bits are read onto the same third-level data bus.

  18. The method of claim 17, wherein the texel order is determined based on a two-dimensional array used to organize the plurality of texels before the plurality of texels is stored in the memory block, and wherein each of the plurality of texels is associated with a low order address bit of a first address associated with a position of that texel in the two-dimensional array.

  19. The method of claim 17, wherein the texel order of the plurality of texels in the memory block is determined based on one or more mapping rules, and wherein the one or more mapping rules map the plurality of texels from a two-dimension array into a one-dimensional array in the texel order.

  20. The method of claim 19, wherein the one or more mapping rules map the plurality of texels from the two-dimensional array into the one-dimensional array using a swizzle pattern.

Description

PRIORITY

[0001] This application is a continuation under 35 U.S.C. § 120 of U.S. patent application Ser. No. 16/589,655, filed 1 Oct. 2019, which claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application No. 62/755,313, filed 2 Nov. 2018, which is incorporated herein by reference.

TECHNICAL FIELD

[0002] This disclosure generally relates to artificial reality, such as virtual reality and augmented reality.

BACKGROUND

[0003] Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured content (e.g., real-world photographs). The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Artificial reality may be associated with applications, products, accessories, services, or some combination thereof, that are, e.g., used to create content in an artificial reality and/or used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

SUMMARY OF PARTICULAR EMBODIMENTS

[0004] Particular embodiments described herein relate to a method of using a multi-level memory architecture with a number of independent memory blocks in the texel buffer to allow texel data to be loaded from texel memory to the texel buffer without multiplexing operations. The system may use a swizzle pattern to store 2D texel arrays (e.g., a 4×4 texel array, an 8×4 texel array) in one or more texel memory units (e.g., a 256-bit memory unit), each of which can be read out with one read operation. The system may convert input stream data into texels with the required formats and group the texels into texel arrays. The system may generate addresses for the texels in each texel array based on a number of mapping rules which map the texels to memory space in a swizzle pattern or an interleaved swizzle pattern that allows that texel array to be contained within a texel memory unit that can be read out using one read operation. The system may store the texels in one or more memory units in the swizzle pattern as determined by the mapping rules. The system may support a number of texel data formats (e.g., RGBA formats, distance field/color index format) and a number of texel sizes (e.g., 8-bit texel, 16-bit texel, 32-bit texel). The system may retrieve a texel array in parallel from a memory unit (which stores the texel array in the swizzle pattern) using one read operation, no matter which supported format is used by the texels. The system may include a number of logic banks which are mapped to an address space in an interleaved order. The texel data stored in texel memory may be retrieved from the texel memory in 256-bit data units (i.e., 256-bit words), each including four 64-bit sub-units (i.e., 64-bit sub-words) which can be directly routed into four quad texel blocks through 64-bit data buses without multiplexing. Each 64-bit data unit may be divided into four 16-bit data units which can be directly routed into four texel buffer blocks in each quad texel block without multiplexing. Particular embodiments of the system may allow the texel data to be addressed and retrieved from texel memory in smaller data units (e.g., 64-bit data units or sub-words) than a 256-bit unit to reduce wasted read operations and improve memory access efficiency. Particular embodiments of the system reduce the wasted read operations for retrieving texel arrays from memory, minimize the amount of multiplexing required to support multiple formats of texel data, eliminate the multiplexing needed for loading texel data from texel memory into the texel buffer, and provide faster memory reading with reduced power consumption and operation logic usage.

[0005] The embodiments disclosed herein are only examples, and the scope of this disclosure is not limited to them. Particular embodiments may include all, some, or none of the components, elements, features, functions, operations, or steps of the embodiments disclosed above. Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, wherein any feature mentioned in one claim category, e.g. method, can be claimed in another claim category, e.g. system, as well. The dependencies or references back in the attached claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof are disclosed and can be claimed regardless of the dependencies chosen in the attached claims. The subject-matter which can be claimed comprises not only the combinations of features as set out in the attached claims but also any other combination of features in the claims, wherein each feature mentioned in the claims can be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein can be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any of the features of the attached claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] FIG. 1A illustrates an example artificial reality system.

[0007] FIG. 1B illustrates an example eye display system of the headset system.

[0008] FIG. 2 illustrates a system diagram for a display engine.

[0009] FIGS. 3A-3B illustrate an example 2×2 texel array stored in two 32-byte memory blocks in a traditional way.

[0010] FIGS. 4A-4D illustrate an example 4×4 16-bit texel array stored in a 32-byte memory unit with a swizzle pattern.

[0011] FIGS. 5A-5D illustrate an example of two 2×2 8-bit texel arrays stored in an 8-byte memory unit with an interleaved swizzle pattern.

[0012] FIGS. 5E-5F illustrate an example 8×4 8-bit texel array stored in a 32-byte memory unit in an interleaved swizzle pattern.

[0013] FIGS. 6A-6E illustrate an example 2×2 32-bit texel array divided into two 2×2 16-bit texel arrays to be stored in 8-byte memory units in a swizzle pattern.

[0014] FIGS. 6F-6G illustrate an example 4×4 32-bit texel array which is mapped to a 64-byte memory unit in a swizzle pattern.

[0015] FIG. 7 illustrates example 256-byte memory blocks (e.g., 710, 720, 730) for storing 8-bit texels, 16-bit texels, and 32-bit texels, respectively.

[0016] FIG. 8 illustrates four example RGBA texel data formats that are supported by the display engine.

[0017] FIG. 9 illustrates example texel formats for distance field and color index.

[0018] FIG. 10 illustrates example 32-bit RGBA formats that are split into two 16-bit halves of RGBA texel data.

[0019] FIG. 11A illustrates example mask formats that are supported by the system.

[0020] FIG. 11B illustrates example sub-type alpha masks which are stored by replicating each alpha mask.

[0021] FIG. 12A illustrates an example diagram showing logic bank structure of texture memory.

[0022] FIG. 12B illustrates an example pattern for mapping the logic banks to corresponding addresses.

[0023] FIG. 13A illustrates an example diagram for filter blocks of pixel block.

[0024] FIG. 13B illustrates an example diagram for quad buffer block.

[0025] FIGS. 14A-14B illustrate an example 8×8 texel array stored in 16 independent texel buffer blocks to allow any 4×4 texel array to be read in one read operation.

[0026] FIG. 14C illustrates an example process for loading texel data from texel memory to texel buffer blocks without multiplexing.

[0027] FIG. 15 illustrates an example method for storing texels arrays in texel memory in a swizzle pattern.

[0028] FIG. 16 illustrates an example computer system.

DESCRIPTION OF EXAMPLE EMBODIMENTS

[0029] Artificial reality systems may use a swizzle pattern to store a 2D texel array (e.g., 2×2, 4×4, 16×16, etc.) in texel memory. However, when retrieving the 2D texel array from memory, traditional artificial reality systems need excessive multiplexing operations for reading the texel data from texel memory, selecting and assembling texels into appropriate groups, and loading the necessary texels in the appropriate pattern into the texel buffer. The excessive multiplexing operations make the data loading processes very inefficient. Moreover, although the swizzle patterns allow efficient reading of aligned texel arrays, reading unaligned texel arrays may pick up some texels that are not needed, and therefore result in wasted read operations.

[0030] To solve these problems, particular embodiments of the system may use a multi-level memory architecture with a number of independent memory blocks in the texel buffer to allow the texel data to be loaded from texel memory to the texel buffer without multiplexing operations. Particular embodiments of the system may use a swizzle pattern to store 2D texel arrays in one or more memory units (e.g., a 256-bit memory unit), each of which can be read out with one read operation. The texel data stored in texel memory may be retrieved from the texel memory in 256-bit data units (i.e., 256-bit words), each including four 64-bit sub-units (i.e., 64-bit sub-words) which can be directly routed into four quad texel blocks through 64-bit data buses without multiplexing. Each 64-bit data unit may be divided into four 16-bit data units which can be directly routed into four texel buffer blocks in each quad texel block without multiplexing. Particular embodiments of the system may allow the texel data to be addressed and retrieved from texel memory in smaller data units (e.g., a 64-bit data unit or sub-word) than a 256-bit unit to reduce wasted read operations and improve memory access efficiency.
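
The fixed splitting of a 256-bit word described above can be sketched in software as follows. This is only an illustrative model, not the hardware design: the function names are hypothetical, and the assumption that the lowest-order bits go to the first quad texel block and first buffer block is made here for illustration. The point is that every destination receives a fixed bit slice, so no selection (multiplexing) step is involved.

```python
def split_word(word: int, width: int, parts: int) -> list[int]:
    """Split a `width`-bit integer into `parts` equal fields, lowest bits first."""
    field = width // parts
    mask = (1 << field) - 1
    return [(word >> (i * field)) & mask for i in range(parts)]


def route_256bit_word(word: int) -> list[list[int]]:
    """Model the two-stage split: 256 bits -> four 64-bit sub-words -> four 16-bit units each.

    Assumption (not from the source): bit ordering is lowest-first at both levels.
    """
    sub_words = split_word(word, 256, 4)                 # one 64-bit sub-word per quad texel block
    return [split_word(sw, 64, 4) for sw in sub_words]   # one 16-bit unit per texel buffer block


# Build a word whose k-th 16-bit field holds the value k (k = 0..15).
word = sum(k << (16 * k) for k in range(16))
blocks = route_256bit_word(word)
```

Under these assumptions, `blocks[q][b]` is the 16-bit value delivered to buffer block `b` of quad texel block `q`; the mapping depends only on bit position, never on the data.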

[0031] Particular embodiments of the system eliminate multiplexing when loading the texel data from texel memory to the texel buffer through the data bus connecting texel memory to the texel buffer. Particular embodiments of the system minimize the amount of multiplexing required to support multiple formats of texel data. Particular embodiments of the system may reduce the wasted read operations for retrieving texel data from texel memory by allowing the texel data to be addressed and retrieved in small data units (e.g., a 64-bit data unit). Particular embodiments of the system provide faster memory reading to retrieve the texels that are needed for interpolation to determine corresponding pixels and reduce power consumption and operation logic usage for retrieving texels from memory.

[0032] FIG. 1A illustrates an example artificial reality system 100. In particular embodiments, the artificial reality system 100 may include a headset system 110, a body wearable computing system 120, a cloud computing system 132 in a cloud 130, etc. In particular embodiments, the headset system 110 may include a display engine 112 which is connected to two eye display systems 116A and 116B through a data bus 114. The headset system 110 may be a system including a head-mounted display (HMD) which may be mounted on a user’s head to provide artificial reality to the user. The headset system 110 may have a limited amount of power available in its power sources (e.g., batteries). The display engine 112 may provide display data to the eye display systems 116A and 116B through the data bus 114 at relatively high data rates (e.g., 200 Hz to 800 Hz). As will be discussed later, the display engine 112 may include one or more controller blocks, texel memories, transform blocks, pixel blocks, etc. The texels stored in the texel memories may be accessed by pixel blocks and may be provided to the eye display systems 116A and 116B for display.

[0033] In particular embodiments, the body wearable computing system 120 may be worn on the body of a user. In particular embodiments, the body wearable computing system 120 may be a computing system (e.g., a laptop, a desktop, a mobile computing system) that is not worn on a user’s body. The body wearable computing system 120 may include one or more GPUs, one or more smart video decoders, memories, processors, and other modules. The body wearable computing system 120 may have more computational resources than the display engine 112 but may still have a limited amount of power in its power sources (e.g., batteries). The body wearable computing system 120 may be coupled with the headset system 110 through a wireless connection 144. The cloud computing system 132 may be high performance computers (e.g., servers) and may communicate with the body wearable computing system 120 through a wireless connection 142. FIG. 1B illustrates an example eye display system (e.g., 116A or 116B) of the headset system 110. In particular embodiments, the eye display system 116A may include a driver 154, a pupil display 156, etc. The display engine 112 may provide display data to the pupil display 156 through the data bus 114 and the driver 154 at high data rates (e.g., 200 Hz to 800 Hz).

[0034] FIG. 2 illustrates a system diagram for a display engine 112. In particular embodiments, the display engine 112 may include a control block 210, transform blocks 220A and 220B, pixel blocks 230A and 230B, display blocks 240A and 240B, etc. One or more of the components of the display engine 112 may be configured to communicate via a high-speed bus, shared memory, or any other suitable method. As shown in FIG. 2, the control block 210 of display engine 112 may be configured to communicate with the transform blocks 220A and 220B, pixel blocks 230A and 230B, and display blocks 240A and 240B. As explained in further detail herein, this communication may include data as well as control signals, interrupts and other instructions.

[0035] In particular embodiments, the control block 210 may receive input from the body wearable computing system (e.g., 120 in FIG. 1A) and initialize a pipeline in the display engine to finalize the rendering for display. In particular embodiments, the control block 210 may receive data and control packets from the body wearable computing system. The data and control packets may include information such as one or more surfaces comprising texture data and position data and additional rendering instructions. The control block 210 may distribute data as needed to one or more other blocks of the display engine 112. The control block 210 may initiate pipeline processing for one or more frames to be displayed. In particular embodiments, the eye display systems 116A and 116B may each comprise its own control block 210. In particular embodiments, one or more of the eye display systems 116A and 116B may share a control block 210.

[0036] In particular embodiments, the transform blocks 220A and 220B may determine initial visibility information for surfaces to be displayed in the artificial reality scene. In general, the transform blocks 220A and 220B may cast rays from pixel locations on the screen and produce filter commands (e.g., filtering based on bilinear or other types of interpolation techniques) to send to the pixel blocks 230A and 230B. The transform blocks 220A and 220B may perform ray casting from the current viewpoint of the user (e.g., determined using the headset’s inertial measurement units, eye trackers, and/or any suitable tracking/localization algorithms, such as simultaneous localization and mapping (SLAM)) into the artificial scene where surfaces are positioned and may produce results to send to the pixel blocks 230A and 230B.

[0037] In general, the transform blocks 220A and 220B may each comprise a four-stage pipeline, in accordance with particular embodiments. The stages of a transform block 220A or 220B may proceed as follows. A ray caster may issue ray bundles corresponding to arrays of one or more aligned pixels, referred to as tiles (e.g., each tile may include 16×16 aligned pixels). The ray bundles may be warped, before entering the artificial reality scene, according to one or more distortion meshes. The distortion meshes may be configured to correct geometric distortion effects stemming from, at least, the eye display systems 116A and 116B of the headset system 110. The transform blocks 220A and 220B may determine whether each ray bundle intersects with surfaces in the scene by comparing a bounding box of each tile to bounding boxes for the surfaces. If a ray bundle does not intersect with an object, it may be discarded. Tile-surface intersections are detected, and the corresponding tile-surface pair is passed to the pixel blocks 230A and 230B.
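
The bounding-box comparison in the last stage can be illustrated with a minimal sketch. It assumes 2D axis-aligned boxes in a shared coordinate space, which the source does not specify; the `Box` type and the function names are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned 2D bounding box (an assumed representation)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Two axis-aligned boxes overlap when their extents overlap on both axes."""
    return (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max)


def tile_surface_pairs(tiles: dict[str, Box], surfaces: dict[str, Box]) -> list[tuple[str, str]]:
    """Keep only (tile, surface) pairs whose boxes overlap; other combinations are discarded."""
    return [(t, s)
            for t, t_box in tiles.items()
            for s, s_box in surfaces.items()
            if boxes_overlap(t_box, s_box)]


tiles = {"t0": Box(0, 0, 16, 16), "t1": Box(100, 100, 116, 116)}
surfaces = {"s0": Box(8, 8, 40, 40)}
pairs = tile_surface_pairs(tiles, surfaces)
```

Here tile `t1` does not touch surface `s0`, so only the pair for `t0` would be forwarded to the pixel blocks.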

[0038] In general, the pixel blocks 230A and 230B may determine color values from the tile-surface pairs to produce pixel color values, in accordance with particular embodiments. The color values for each pixel may be sampled from the texture data of surfaces received and stored by the control block 210. The pixel blocks 230A and 230B may receive tile-surface pairs from the transform blocks 220A and 220B and may schedule bilinear filtering. For each tile-surface pair, the pixel blocks 230A and 230B may sample color information for the pixels within the tile using color values corresponding to where the projected tile intersects the surface. In particular embodiments, the pixel blocks 230A and 230B may process the red, green, and blue color components separately for each pixel. In particular embodiments, the pixel block 230A of the display engine 112 of the first eye display system 116A may proceed independently, and in parallel with, the pixel block 230B of the display engine 112 of the second eye display system 116B. The pixel block may then output its color determinations to the display block.
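
For readers unfamiliar with bilinear filtering, the following minimal sketch shows how one color component could be interpolated from the four texels surrounding a sample point. It is a generic textbook formulation under stated assumptions (non-negative texel-space coordinates, clamping at the texture edge), not the filter implementation of the pixel blocks; the function name is hypothetical.

```python
def bilinear_sample(texels: list[list[float]], u: float, v: float) -> float:
    """Bilinearly interpolate one color component at texel-space position (u, v).

    `texels` is indexed [row][column]. Assumes u, v >= 0 and inside the array;
    neighbors beyond the last row/column are clamped to the edge.
    """
    x0, y0 = int(u), int(v)                  # top-left texel of the 2x2 neighborhood
    fx, fy = u - x0, v - y0                  # fractional position inside that neighborhood
    x1 = min(x0 + 1, len(texels[0]) - 1)
    y1 = min(y0 + 1, len(texels) - 1)
    top = texels[y0][x0] * (1 - fx) + texels[y0][x1] * fx
    bottom = texels[y1][x0] * (1 - fx) + texels[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


red = [[0.0, 10.0],
       [20.0, 30.0]]
center = bilinear_sample(red, 0.5, 0.5)
```

Because each sample needs a small neighborhood of texels at once, a memory layout that delivers that neighborhood in a single read directly benefits this step.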

[0039] In general, the display blocks 240A and 240B may receive pixel color values from the pixel blocks 230A and 230B, convert the format of the data to be more suitable for the scanline output of the display, apply one or more brightness corrections to the pixel color values, and prepare the pixel color values for output to the display. The display blocks 240A and 240B may convert tile-order pixel color values generated by the pixel blocks 230A and 230B into scanline or row-order data, which may be required by the physical displays. The brightness corrections may include any required brightness correction, gamma mapping, and dithering. The display blocks 240A and 240B may output the corrected pixel color values directly to the physical display (e.g., pupil display 156 in FIG. 1B via the driver 154) or may output the pixel values to a block external to the display engine 112 in a variety of formats. For example, the eye display systems 116A and 116B or headset system 110 may comprise additional hardware or software to further customize backend color processing, to support a wider interface to the display, or to optimize display speed or fidelity.
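
The tile-order to row-order conversion can be pictured with a short sketch. It assumes square tiles delivered in row-major tile order and a frame width that is a whole number of tiles; these assumptions and the function name are illustrative and not taken from the source.

```python
def tiles_to_scanlines(tiles: list[list[list[int]]], tiles_per_row: int, tile_size: int) -> list[list[int]]:
    """Reorder pixel values from tile order into full-width scanlines.

    `tiles` is a list of tile_size x tile_size arrays, listed left to right,
    then top to bottom (an assumed ordering).
    """
    scanlines = []
    for first in range(0, len(tiles), tiles_per_row):
        band = tiles[first:first + tiles_per_row]   # all tiles covering one horizontal band
        for y in range(tile_size):
            line = []
            for tile in band:
                line.extend(tile[y])                # row y of each tile, left to right
            scanlines.append(line)
    return scanlines


tile_a = [[1, 2], [3, 4]]
tile_b = [[5, 6], [7, 8]]
rows = tiles_to_scanlines([tile_a, tile_b], tiles_per_row=2, tile_size=2)
```

With two 2×2 tiles side by side, the result is two scanlines of four pixels each.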

[0040] In particular embodiments, the control block 210 may include a microcontroller 212, a texel memory 214, a memory controller 216, a data bus 217 for I/O communication, a data bus 218 for input stream data 205, etc. The memory controller 216 and the microcontroller 212 may be coupled through the data bus 217 for I/O communication with other modules of the system. The microcontroller 212 may receive control packages such as position data and surface information through the data bus 217. The input stream data 205 may be input to the control block 210 from the body wearable computing system after being set up by the microcontroller 212. The input stream data 205 may be converted to the required texel format and stored into the texel memory 214 by the memory controller 216. In particular embodiments, the texel memory 214 may be static random-access memory (SRAM).

[0041] In particular embodiments, the body wearable computing system may send input stream data 205 to the memory controller 216, which may convert the input stream data into texels with required formats and store the texels with swizzle patterns in the texel memory 214. The texel memory organized in these swizzle patterns may allow the texels (e.g., in 4×4 texel blocks) that are needed for determining at least one color component (e.g., red, green, and/or blue) of every pixel associated with a tile (e.g., “tile” refers to an aligned block of pixels, such as a block of 16×16 pixels) to be retrieved by the pixel blocks 230A and 230B using one read operation. As a result, the headset may avoid the excess multiplexing operations that would be needed for reading and assembling the texel array if it were not stored in such patterns, thereby reducing the computational resource requirements and power consumption of the headset system.
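
To make the idea of a swizzled layout concrete, the sketch below uses Morton (Z-order) bit interleaving inside each aligned 4×4 block of 16-bit texels. This is an assumption chosen for illustration: the actual pattern used by the described system is defined by its figures and mapping rules, which are not reproduced here, and the function names are hypothetical. What the sketch does show is the property the paragraph relies on: all 16 texels of an aligned 4×4 block land in the same 32-byte (256-bit) memory unit.

```python
def morton_index(x: int, y: int, bits: int = 2) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index


def swizzled_location(x: int, y: int, width_texels: int, texel_bytes: int = 2) -> tuple[int, int]:
    """Return (memory unit index, byte offset in unit) for texel (x, y).

    Assumes one 32-byte unit per aligned 4x4 block of 16-bit texels and a
    surface width that is a multiple of 4 texels.
    """
    block_x, block_y = x // 4, y // 4
    unit = block_y * (width_texels // 4) + block_x
    offset = morton_index(x % 4, y % 4) * texel_bytes
    return unit, offset


# All texels of the aligned 4x4 block whose top-left corner is (4, 8) on a 16-texel-wide surface.
locations = [swizzled_location(x, y, width_texels=16)
             for y in range(8, 12) for x in range(4, 8)]
units = {unit for unit, _ in locations}
offsets = sorted(offset for _, offset in locations)
```

In this sketch the whole block maps to a single unit with 16 distinct 2-byte slots, so one 256-bit read would return exactly the block and nothing else.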

[0042] In particular embodiments, the pixel blocks 230A and 230B may generate pixel data for display based on texels retrieved from the texel memory 214. The memory controller 216 may be coupled to the pixel blocks 230A and 230B through two 256-bit data buses 204A and 204B, respectively. The pixel blocks 230A and 230B may receive the tile/surface pair from the respective transform blocks 220A and 220B and may identify the texels that are needed to determine at least one color component of all the pixels associated with the tile. The pixel blocks 230A and 230B may retrieve the identified texels (e.g., a 4×4 texel array) in parallel from the texel memory 214 through the memory controller 216 and the 256-bit data buses 204A and 204B. For example, the 4×4 texel array that is needed to determine at least one color component of all the pixels associated with a tile may be stored in one memory block and may be retrieved from the texel memory 214 using one memory read operation. The pixel blocks 230A and 230B may use multiple sample filter blocks (e.g., one for each color component) to perform interpolation in parallel on different groups of texels to determine the corresponding color component for the corresponding pixels. The pixel values may be sent to the display blocks 240A and 240B for further processing before being displayed by the eye display systems 116A and 116B, respectively.

[0043] In particular embodiments, the system may use one texture memory to store the texture data which is used by the rendering pipelines of both eyes. In particular embodiments, the two pixel blocks 230A and 230B may process data for the two eyes in parallel and may have independent states because the two displays may not be synced. Typically, labels (e.g., fonts, characters, glyphs, etc.) and images may be rendered to both eyes. For example, the GPU-generated images may be rendered to both eyes when the images are far enough away from a viewer’s perspective that the stereo separation is minimal. Since in most cases both eyes will need the same texture data, processing both eyes in the same chip allows that data to be stored once instead of twice. As a result, it is beneficial to use a single texture memory to store the texture data for both eye pipelines. Even for GPU-generated images, separate stereoscopic images may be required only for near objects. If the background is rendered separately, for example, to allow foreground objects to move relative to the background, a stereo background image may not be required in general. In particular embodiments, the system may render an object that requires stereo view using a separate texel array for each eye. In particular embodiments, the system may use a shared texel array for both eye pipelines and each eye pipeline (e.g., pixel block) may access the shared texel array separately since there may be no reliable correlation about where the object may appear in each eye’s field of view.

[0044] A naive storage of texels without using swizzle patterns may yield suboptimal performance. FIGS. 3A-3B illustrate an example 2×2 texel array stored in two 32-byte memory blocks without using swizzle patterns. The 2×2 texel array 330 may include four 8-bit texels corresponding to a 2×2 texel region 320 associated with a surface 310. The texels associated with the surface 310 may be stored in the texel memory 212 by linearly mapping the texel position to the address space of texel memory 212. For example, the 2×2 texel array 330 may be stored in the texel memory 212 (together with other texels associated with the surface 310) with a linear pattern in which the texel at location (0, 0) in the texel array 330 is stored in texel memory 212 at (Row 0, B3), the texel (1, 0) is stored in texel memory 212 at (Row 0, B4), the texel (0, 1) is stored in texel memory 212 at (Row 1, B3), and the texel (1, 1) is stored in texel memory 212 at (Row 1, B4). Each row of the texel memory 212 may correspond to a 256-bit memory unit which can be read out using one read operation. When the texel array 330 is needed (e.g., for interpolation), traditional artificial reality systems need two read operations to read the first and second rows of the texel memory 212, respectively. The systems then need to select the two texels (0, 0) and (1, 0) from the first 256 bits of data read from Row 0 of the texel memory 212, select the other two texels (0, 1) and (1, 1) from the second 256 bits of data read from Row 1 of the texel memory 212, and assemble the four selected texels into a 2×2 texel array. These processes involve many wasted reads. For example, two read operations are needed to read two 256-bit (32-byte) memory rows, but only two bytes out of each 32 bytes are useful data, which means that the system resources (e.g., bandwidth, power, etc.) used for reading the other bits of data are wasted. Therefore, these processes lead to unnecessary excess multiplexing on the memory data bus and to inefficient memory reading. Such shortcomings are overcome by storing the texels in the swizzle patterns described below.
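As an example and not by way of limitation, the cost of the linear layout in the example above can be tallied with a short Python sketch. The function names are hypothetical and the layout assumed is only the one stated in the example (texel (u, v) stored in row v, one byte per texel, 32 bytes per row); it is an illustration, not the disclosed implementation.

```python
ROW_BYTES = 32  # one 256-bit memory row, fetched by a single read operation


def rows_touched_linear(texels):
    """Rows that must be read when texel (u, v) is stored in memory row v."""
    return {v for (_u, v) in texels}


def read_efficiency(texels, bytes_per_texel=1):
    """Return (reads, useful_bytes, fetched_bytes, useful_fraction)."""
    reads = len(rows_touched_linear(texels))
    useful = len(texels) * bytes_per_texel
    fetched = reads * ROW_BYTES
    return reads, useful, fetched, useful / fetched


# The 2x2 array of 8-bit texels from the example.
block_2x2 = [(0, 0), (1, 0), (0, 1), (1, 1)]
reads, useful, fetched, fraction = read_efficiency(block_2x2)
print(reads, useful, fetched, fraction)
```

Under these assumptions the sketch reports two reads and 64 fetched bytes for 4 useful bytes, which matches the two-bytes-per-32-byte-row figure given above.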

[0045] In particular embodiments, the aforementioned transform block may sample surfaces using projected tiles. In particular embodiments, rays may be cast for each pixel in the screen. This, however, may not be computationally efficient, since a scene with sparse surfaces may result in many rays not intersecting anything. Thus, in particular embodiments, the ray caster may instead cast ray bundles or beams from a collection of pixels into the scene so that larger sampling footprints may be examined at once. Conceptually, if a ray bundle does not intersect any surface, then any ray within the bundle would not intersect any surface either. Based on this observation, once it is determined that a ray bundle does not intersect a surface, no further processing needs to be performed against that surface for the pixels from which the ray bundle is cast. If, on the other hand, the ray bundle intersects the surface, then the ray caster may perform finer sampling for each pixel associated with the ray bundle. In particular embodiments, ray bundles may be cast from units of pixel collections. For example, a unit of aligned pixels from which a ray bundle is cast may be referred to herein as a “tile.” For example, each tile may be an aligned block of 16×16 pixels of the screen. The position of a tile may be specified based on any appropriate scheme, such as by specifying the four corners of the tile, a center point and distance to the edges, a mathematical equation, any other suitable method, or any combination thereof. In particular embodiments, the ray bundle that is cast from a tile may be defined by four rays that are cast from the four corners of the tile. Thus, a ray’s position may be specified in the pixel coordinate space of the screen (e.g., the (x, y) pixel position within the screen space of the particular display, such as the left-eye or right-eye display, associated with the transform block).

[0046] When the four corners of a ray bundle intersect a surface, the points of intersection may be transformed from the 3D view space coordinate system into the 2D texture space coordinate system (e.g., specified in (u, v)). Those four sampling points in texture space may be referred to as a projected tile. When the pixel block determines the color for each pixel sample within the projected tile, the size of the projected tile may be used to select a suitable texture resolution (e.g., from a mipmap) such that each texel within the selected texture is approximately the size of each pixel within the projected tile (e.g., the distance between two pixel samples within the projected tile is no more than 2 texels).
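As an example and not by way of limitation, one way such a resolution choice could be made is sketched below in Python. The selection rule is an assumption made for illustration and is not taken from this disclosure: each mip level is assumed to halve the resolution of the previous one, and the hypothetical function `select_mip_level` returns the finest level at which adjacent pixel samples are no more than 2 texels apart.

```python
import math


def select_mip_level(sample_spacing_texels, max_level):
    """Finest mip level whose sample spacing is at most 2 texels.

    sample_spacing_texels: distance between adjacent pixel samples,
        measured in texels of the full-resolution (level 0) texture.
    max_level: coarsest level available in the mipmap (assumed input).
    Assumes spacing at level L equals sample_spacing_texels / 2**L.
    """
    if sample_spacing_texels <= 2.0:
        return 0
    level = math.ceil(math.log2(sample_spacing_texels / 2.0))
    return min(level, max_level)


for spacing in (1.0, 4.0, 9.0, 1000.0):
    print(spacing, select_mip_level(spacing, max_level=5))
```

The clamp to `max_level` means that a very distant surface may still exceed the 2-texel spacing. A real system would need a policy for that case, which this sketch does not attempt to model.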

[0047] To ensure that the texels needed for determining the colors for 2×2 pixel sample locations are within their surrounding block of 4×4 texels, particular embodiments of the display engine may limit the amount of allowed zoom out. When the viewer zooms out, a surface associated with a rendered object may become smaller in size relative to the screen because of the minification effect. When the minified surface is sampled by the same sample locations, the sample locations may cover more texels on the minified surface (in other words, the distance, measured in texels, between sample locations would increase). Consequently, the system may need to access and interpolate more texels beyond the 4×4 block of texels to determine the four pixel values. Therefore, sampling a surface minified by a zoom-out operation could be expensive with respect to computational resources, memory access, and power consumption. In particular embodiments, the system may restrict the minification effect for zoom-out operations to be within a two-time range, and therefore allow the 2×2 pixel sampling points to always fall within a 4×4 texel region. In particular embodiments, the system may store the 4×4 texel region in a memory unit (e.g., a 32-byte memory unit) which could be read out in one read operation, and therefore allow the 4×4 texels to be retrieved from memory in parallel.
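As an example and not by way of limitation, the relationship between the 2-texel spacing limit and the 4×4 texel region can be checked along one axis with the Python sketch below. It assumes bilinear filtering with texel centers at half-integer coordinates, which is a common convention but is not stated in this disclosure; the function names are hypothetical.

```python
import math


def bilinear_footprint(x):
    """Indices of the two texels (one axis) blended when sampling at x.

    Assumes texel i covers [i, i + 1) with its center at i + 0.5.
    """
    lo = math.floor(x - 0.5)
    return lo, lo + 1


def texel_span(x0, spacing):
    """Texels covered along one axis by two samples at x0 and x0 + spacing."""
    first, _ = bilinear_footprint(x0)
    _, last = bilinear_footprint(x0 + spacing)
    return last - first + 1


# Sweep sample positions and spacings from 0 to 2 texels in 1/16 steps.
worst = max(texel_span(i / 16, s / 16) for i in range(64) for s in range(33))
print(worst)                  # widest span seen with spacing <= 2 texels
print(texel_span(1.0, 2.5))   # a spacing above 2 texels can need a fifth texel
```

Under these assumptions, two samples spaced at most 2 texels apart never span more than 4 texels per axis, so a 2×2 group of samples fits in a 4×4 region. The region found this way is not necessarily aligned to a stored 4×4 block, and the sketch does not address alignment.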

[0048] To optimize texel memory reads, particular embodiments of the display engine may convert the input stream data into arrays of texels with required formats and store the texel arrays in texel memory using a 2D swizzle pattern. The swizzle pattern may be used to optimize texel memory access by storing a 2D texel array (e.g., a 4×4 16-bit texel array or an 8×4 8-bit texel array) into a memory unit (e.g., a 32-byte memory unit) that can be read out using one read operation. For example, all texels in a 4×4 texel array that is read out from a 32-byte memory unit may be useful texel data needed for interpolation. This is a significant improvement over the wasted reads in traditional read operations, where only a portion of the data read is useful and the other data is wasted. In particular embodiments, the storage format used in the texel memory may be designed to support 4×4 texel accesses and to minimize the amount of multiplexing required to support multiple formats. In particular embodiments, the swizzle patterns may ensure that each 256-bit memory read can be divided into 16 16-bit values, each of which may always provide data for just one of the 16 texel buffers. This eliminates multiplexing when connecting the texel memory read bus to the texel buffers. In particular embodiments, the system may support multiple texel sizes, for example, 8-bit, 16-bit, 32-bit, etc. By using the swizzle pattern, particular embodiments of the system minimize the amount of multiplexing required to support multiple formats of texel data, provide faster memory reading to retrieve the texels that are needed for interpolation, and reduce power consumption and operation logic usage for retrieving texels from memory.
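As an example and not by way of limitation, the fixed division of one 256-bit read into 16 16-bit values can be pictured with the Python sketch below. The function name is hypothetical, and modeling the texel buffers as list positions is an assumption made only for illustration; the point shown is that lane i always feeds buffer i, so no per-read selection is needed.

```python
def split_into_lanes(read_word: bytes):
    """Split one 32-byte (256-bit) read into 16 two-byte lanes.

    Lane i covers bytes 2*i and 2*i + 1 and is always routed to texel
    buffer i, whatever texel format the memory unit holds.
    """
    if len(read_word) != 32:
        raise ValueError("expected exactly one 32-byte memory unit")
    return [read_word[2 * i: 2 * i + 2] for i in range(16)]


word = bytes(range(32))          # stand-in for the data returned by one read
lanes = split_into_lanes(word)
print(len(lanes), lanes[0], lanes[15])
```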

[0049] FIGS. 4A-4D illustrate an example 4×4 16-bit texel array 410 stored in a 32-byte memory unit 420 with a swizzle pattern. In particular embodiments, the system may store a 2D texel array in a memory unit with a swizzle pattern to allow all the texels in the 2D texel array to be retrieved in parallel from memory using one read operation. As an example and not by way of limitation, the 4×4 texel array 410, as illustrated in FIG. 4A, may include 16 texels as indicated by the (u, v) coordinates and each texel may have 16-bit data. The 16 texels in the texel array 410 may be needed to determine the color of all pixels associated with the sample region 415 (e.g., the four pixel sample locations at the corners of the region 415), and therefore may be needed at the same time during the sampling process. Retrieving the 16 texels in the 4×4 texel array 410 in parallel may result in a faster sampling process and improve system efficiency.

[0050] In particular embodiments, the 4×4 texel array 410 may be stored in the same memory unit, for example, a 32-byte (256-bit) memory unit 420, using a swizzle pattern which is the same as or similar to the pattern illustrated in FIG. 4D. To generate the swizzle pattern, the system may generate the addresses for the texels based on a number of rules which map the texels to corresponding memory space locations. In particular embodiments, the rules may include a specified order in the (U, V) coordinate space of the texels. As an example and not by way of limitation, for a 2×2 texel array in the (U, V) coordinate space, the swizzle order may be described as (0, 0), (1, 0), (0, 1), (1, 1), as illustrated in FIG. 4B. To generate this particular order, the V coordinate may first be kept constant at 0 while the U coordinate increases from 0 to 1, and then the V coordinate may be kept constant at 1 while the U coordinate increases from 0 to 1. The (U, V) coordinates used to define this order may be within a local (U, V) coordinate space which is defined using the texel at the upper-left corner of the 2×2 texel array as the reference texel (0, 0), rather than using the overall (U, V) coordinates for all 16 texels. Therefore, the four texels in each 2×2 texel array may be ordered in this particular order regardless of the actual position of the 2×2 texel array in the overall larger texel region. It is notable that this particular order within a 2×2 texel array is only an example and the order of the texels in a 2×2 texel array is not limited to this order. In particular embodiments, the system may use other orders for the texels within a 2×2 texel array, for example, the order of (0, 0), (0, 1), (1, 0), (1, 1) or any possible order for four texels. In particular embodiments, the system may support any possible order for ordering texels within a 2×2 texel array as long as the four texels in the 2×2 texel array are mapped to the memory address space (e.g., 8-byte memory space for 8-bit texel format). The 2×2 texel array may be stored in the corresponding memory space of the memory unit with a swizzle pattern as defined by the mapping between the texels and memory space.

[0051] FIG. 4B illustrates an example order for mapping a 4×4 16-bit texel array to the memory address space of a 32-byte memory unit. In particular embodiments, the system may conceptually divide a 4×4 texel array into four aligned 2×2 texel arrays based on their relative positions. For example, the 4×4 texel array may be conceptually divided into four 2×2 texel arrays, each of which may occupy a corner 2×2 texel region of the 4×4 texel array. In particular embodiments, each of the four 2×2 texel arrays may be accessed in an order that is similar to the swizzle order within a 2×2 texel array. For example, the four 2×2 texel arrays may be accessed in the order of upper-left, upper-right, lower-left, and lower-right. During that process, each 2×2 texel array may be accessed in the swizzle order as described above. As a result, the 16 texels in the 4×4 texel array may be ordered in a pattern that is the same as or similar to the pattern illustrated in FIG. 4B. The 16 texels may be directly mapped to the memory address space (e.g., B0 to B31) of a 32-byte memory unit 420. Since each texel has 16 bits of data in this example, each texel may correspond to a two-byte memory space. FIG. 4C illustrates example byte addresses for the texels in the 4×4 texel array 410. For example, the texel at (0, 0) is stored in byte addresses B0 and B1, the texel at (1, 0) is stored in byte addresses B2 and B3, and so on. FIG. 4D illustrates an example swizzle pattern in which the 4×4 16-bit texel array is stored in the 32-byte memory unit 420. When the 4×4 texel array 410 is needed, the system may access the 32-byte (256-bit) memory unit 420 using one read operation to retrieve all 16 texels in parallel.
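As an example and not by way of limitation, the order described above (2×2 arrays visited upper-left, upper-right, lower-left, lower-right, with (0, 0), (1, 0), (0, 1), (1, 1) inside each) is a bit-interleaved order of the u and v coordinates and can be sketched in Python as follows. The function names are hypothetical, and the sketch covers only this example order, not the other orders the embodiments may use.

```python
def swizzle_index(u, v):
    """Position (0..15) of texel (u, v) of a 4x4 array in the example order."""
    if not (0 <= u < 4 and 0 <= v < 4):
        raise ValueError("texel outside the 4x4 array")
    # Low bits pick the texel inside a 2x2 array; high bits pick the 2x2 array.
    return (u & 1) | ((v & 1) << 1) | ((u >> 1) << 2) | ((v >> 1) << 3)


def byte_range_16bit(u, v):
    """First and last byte address of a 16-bit texel in the 32-byte unit."""
    first = 2 * swizzle_index(u, v)
    return first, first + 1


order = sorted(((u, v) for v in range(4) for u in range(4)),
               key=lambda t: swizzle_index(*t))
print(order[:8])
print(byte_range_16bit(0, 0), byte_range_16bit(1, 0))
```

The two byte ranges printed at the end correspond to the B0-B1 and B2-B3 assignments given for texels (0, 0) and (1, 0).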

[0052] FIGS. 5A-5D illustrate an example of two 2×2 8-bit texel arrays stored in an 8-byte memory unit with an interleaved swizzle pattern. In particular embodiments, the system may support texels with different bit lengths including, for example, but not limited to, 8 bits, 16 bits, 32 bits, etc. In particular embodiments, the method and rules for mapping and storing the 4×4 16-bit texel arrays into 32-byte memory units may apply to texels with all possible bit lengths (e.g., 8 bits, 16 bits, 32 bits). In particular embodiments, the 32-byte memory unit may be divided into four 8-byte memory units and each 8-byte memory unit may store two 2×2 texel arrays with 8-bit texels using an interleaved swizzle pattern.

[0053] FIG. 5A illustrates an example of two 2×2 8-bit texel arrays 512 and 514 (within an 8×2 texel array 510) that are mapped to an 8-byte memory unit 520. In particular embodiments, the four texels in the 2×2 texel array 512 and the four texels in the 2×2 texel array 514 may be mapped to an 8-byte memory unit’s address space (byte B0 to byte B7). It is notable that this particular order is only an example and the order of the texels of a 2×2 texel array is not limited to this order. In particular embodiments, the four texels of a 2×2 texel array may be ordered using any possible order as long as the same order is used for all other 2×2 texel arrays. Since an 8-byte memory unit could contain two 2×2 texel arrays with 8-bit texels, particular embodiments of the system may map two 2×2 texel arrays into the 8-byte memory unit in an interleaved manner.

[0054] FIG. 5B illustrates an example texel order interleaving two 2×2 8-bit texel arrays 512 and 514. The interleaving order may traverse each texel in the texel array 512 in a swizzle pattern (e.g., (0, 0), (1, 0), (0, 1), (1, 1)), but after each texel (u, v) of the texel array 512, an interleaving texel located at (u+4, v) may be inserted from texel array 514. For example, the texels of the 2×2 texel array 512 may be ordered as (0, 0), (1, 0), (0, 1), (1, 1) with respect to the texels of the same array 512. Using the interleaving rules as described above, the texel order of the two interleaved 2×2 texel arrays may be (0, 0), (4, 0), (1, 0), (5, 0), (0, 1), (4, 1), (1, 1), (5, 1). In other words, the texel order of the two interleaved 2×2 texel arrays may be described by (u+0, v+0), (u+4, v+0), (u+1, v+0), (u+5, v+0), (u+0, v+1), (u+4, v+1), (u+1, v+1), (u+5, v+1) with respect to a reference texel (u, v), which is the texel (0, 0) for the 2×2 texel array 512. The ordered texels may be mapped to the address space (B0 to B7) of the 8-byte memory unit 520. FIG. 5C illustrates example texel addresses that map the two 2×2 8-bit texel arrays 512 and 514 into the 8-byte memory unit 520 in the interleaved swizzle pattern. FIG. 5D illustrates example texels that are stored in the 8-byte memory unit 520 using the interleaved swizzle pattern.
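As an example and not by way of limitation, the interleaving rule described above can be written as a short Python sketch. The function name and the `byte_of` lookup table are hypothetical; the order it produces is the one listed in this paragraph.

```python
def interleaved_order():
    """Texel order for two interleaved 2x2 arrays of 8-bit texels."""
    base = [(0, 0), (1, 0), (0, 1), (1, 1)]   # swizzle order within array 512
    order = []
    for (u, v) in base:
        order.append((u, v))        # texel from array 512
        order.append((u + 4, v))    # interleaved texel from array 514
    return order


# Byte address (B0..B7) of each texel in the 8-byte memory unit.
byte_of = {texel: i for i, texel in enumerate(interleaved_order())}
print(interleaved_order())
print(byte_of[(5, 1)])
```

With this order, each pair of adjacent bytes holds texel (u, v) and its partner (u+4, v), so every 16-bit slice of the memory unit carries two 8-bit texels that share the same position within their respective 2×2 arrays.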

[0055] FIGS. 5E-5G illustrate an example 8×4 8-bit texel array 530 stored in a 32-byte memory unit 540 in an interleaved swizzle pattern. The 8×4 texel array may be divided into eight 2×2 texel arrays 531, 532, 533, 534, 535, 536, 537, 538. Using the method for mapping two 2×2 texel arrays into an 8-byte memory unit, the eight 2×2 texel arrays may be grouped into four pairs, with each pair including two 2×2 texel arrays to be interleaved. The four pairs of 2×2 texel arrays may include, for example, (531, 533), (532, 534), (535, 537), and (536, 538) and may be mapped to the 32-byte memory unit 540 by interleaving each pair of 2×2 texel arrays. FIG. 5F illustrates example texel addresses for the 8×4 texel array. FIG. 5G illustrates example 2×2 texel array pairs stored in the 32-byte memory unit in the interleaved swizzle pattern.
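As an example and not by way of limitation, a byte-address function for the whole 8×4 array might look like the Python sketch below. Two things are assumed because the figures are not reproduced here: that arrays 531 to 534 run left to right across the top two rows and 535 to 538 across the bottom two rows, and that the four pairs are placed in memory in the order listed. The function name is hypothetical.

```python
def byte_address_8bit(u, v):
    """Byte address (0..31) of 8-bit texel (u, v) in an 8x4 array.

    Assumed layout: the left four columns follow the 4x4 swizzle order, and
    each texel's (u + 4, v) partner is stored in the byte right after it.
    """
    if not (0 <= u < 8 and 0 <= v < 4):
        raise ValueError("texel outside the 8x4 array")
    half = u >> 2      # 0 for columns 0..3, 1 for the interleaved columns 4..7
    lu = u & 3         # column within the four-column half
    lane = (lu & 1) | ((v & 1) << 1) | ((lu >> 1) << 2) | ((v >> 1) << 3)
    return 2 * lane + half


addresses = sorted(byte_address_8bit(u, v) for v in range(4) for u in range(8))
print(addresses == list(range(32)))     # every byte is used exactly once
print(byte_address_8bit(2, 0))          # first byte of the second pair
```

Under these assumptions, the 16-bit lane `byte_address // 2` of an 8-bit texel equals the swizzle position of texel (u mod 4, v) in the 16-bit format, which is consistent with each 16-bit slice of a read always feeding the same texel buffer.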

[0056] FIGS. 6A-6E illustrate an example 2×2 32-bit texel array 610 divided into two 2×2 16-bit texel arrays to be stored in 8-byte memory units (e.g., 620) in a swizzle pattern. FIG. 6A illustrates an example 2×2 32-bit texel array which corresponds to a 16-byte memory space. Therefore, the 8-byte memory unit 620 may store only half of the data of the 2×2 texel array 610. In particular embodiments, a 32-bit texel may be divided into a high-half 16-bit texel data and a low-half 16-bit texel data, as illustrated later in FIG. 10. A 2×2 texel array with 32-bit texels may be divided into two 2×2 texel arrays, with each storing 2×2 half 32-bit (i.e., 16-bit) texels. A 2×2 texel array with half 32-bit texels may be stored in an 8-byte memory unit. FIG. 6B illustrates two example 2×2 texel arrays, each of which stores half 32-bit texels corresponding to the 32-bit texels in the 2×2 texel array 610. For example, the 2×2 texel array 612 may store four 16-bit texel data corresponding to the high half of the respective 32-bit texels in the 2×2 texel array 610. The 2×2 texel array 614 may store four 16-bit texel data corresponding to the low half of the respective 32-bit texels in the 2×2 texel array 610. FIG. 6C illustrates an example order to map the 2×2 texel array 612 into an 8-byte memory unit 620. FIG. 6D illustrates example addresses for the half 32-bit texel data in the 2×2 texel array 612. FIG. 6E illustrates an example swizzle pattern by which the half 32-bit texels in the 2×2 texel array 612 or 614 are stored in the 8-byte memory unit 620.

[0057] FIGS. 6F-6G illustrate an example 4×4 32-bit texel array which is mapped to a 64-byte memory unit 640 in a swizzle pattern. FIG. 6F illustrates example addresses for the respective high and low halves of 32-bit texels in the 4×4 texel array 630. Each 32-bit texel may correspond to two addresses which are within two respective 32-byte memory units (e.g., a first 32-byte memory unit corresponding to the address space from B0 to B31 and a second 32-byte memory unit corresponding to the address space from B32 to B63). For example, the high half of the texel (0, 0) may correspond to a first address B<1:0> and the low half of the texel (0, 0) may correspond to a second address B<33:32>. Similarly, all 32-bit texels in the 4×4 texel array 630 may be mapped to the 64-byte memory unit 640, as illustrated in FIG. 6G.
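As an example and not by way of limitation, the split of a 32-bit texel into two 16-bit halves stored in two 32-byte memory units can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that both halves use the same 4×4 swizzle order as the 16-bit format; only the addresses of texel (0, 0) are confirmed by the text above.

```python
def split_texel32(value):
    """Split a 32-bit texel into its (high, low) 16-bit halves."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def addresses_32bit(u, v):
    """Byte ranges of the high and low halves of texel (u, v) in a 64-byte unit.

    Assumes the high halves fill B0..B31 and the low halves fill B32..B63,
    each in the 4x4 swizzle order used for 16-bit texels.
    """
    if not (0 <= u < 4 and 0 <= v < 4):
        raise ValueError("texel outside the 4x4 array")
    idx = (u & 1) | ((v & 1) << 1) | ((u >> 1) << 2) | ((v >> 1) << 3)
    high = (2 * idx, 2 * idx + 1)
    low = (32 + 2 * idx, 32 + 2 * idx + 1)
    return high, low


print(split_texel32(0x12345678))
print(addresses_32bit(0, 0))
```

For texel (0, 0) the sketch yields bytes 0 to 1 for the high half and bytes 32 to 33 for the low half, matching B<1:0> and B<33:32>.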

[0058] FIG. 7 illustrates example 256-byte memory blocks (e.g., 710, 720, 730) for storing 8-bit texels, 16-bit texels, and 32-bit texels, respectively. In particular embodiments, the system may store a 4×4 texel array into a 32-byte memory unit using the processes as described above. The texels in each 4×4 texel array may be 8-bit texels, 16-bit texels, 32-bit texels, or texels with any other suitable bit lengths. In particular embodiments, the system may combine eight 32-byte memory units into a larger memory block, for example, a 256-byte memory block, by stacking the 32-byte memory units linearly, as illustrated in FIG. 7. As an example and not by way of limitation, the 256-byte memory block 710 may include eight 8×4 8-bit texel arrays, each of which may correspond to a 32-byte memory space. The eight 8×4 8-bit texel arrays may be stacked linearly. As another example, the 256-byte memory block 720 may include eight 4×4 16-bit texel arrays, each of which may correspond to a 32-byte memory space. The eight 4×4 texel arrays may be stacked linearly. As another example, the 256-byte memory block 730 may include four 4×4 32-bit texel arrays, each of which may correspond to a 64-byte memory unit. The four 4×4 32-bit texel arrays may be stacked linearly.
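As an example and not by way of limitation, the array counts given for the three 256-byte memory blocks follow from simple arithmetic, sketched here in Python with a hypothetical helper name.

```python
BLOCK_BYTES = 256  # size of one memory block in FIG. 7


def arrays_per_block(width, height, bits_per_texel):
    """Return (bytes per texel array, number of arrays per 256-byte block)."""
    array_bytes = width * height * bits_per_texel // 8
    return array_bytes, BLOCK_BYTES // array_bytes


print(arrays_per_block(8, 4, 8))     # 8x4 array of 8-bit texels
print(arrays_per_block(4, 4, 16))    # 4x4 array of 16-bit texels
print(arrays_per_block(4, 4, 32))    # 4x4 array of 32-bit texels
```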

[0059] FIG. 8 illustrates four example RGBA texel data formats that are supported by the display engine. In particular embodiments, the system may support a variety of standard formats for images and alpha masks, as well as non-standard formats for labels, which may store distance and color index. In particular embodiments, the system may support images in the RGBA formats, for example, but not limited to, 16-bit RGBA 4/4/4/4 format 810, 16-bit RGBA 5/5/5/1 format 820, 32-bit RGBA 8/8/8/8 format 830, 32-bit RGBA 10/10/10/2 format 840, etc. For example, the 16-bit RGBA 4/4/4/4 format 810 may have red, green, blue, and alpha components each corresponding to a 4-bit data space. As another example, the 16-bit RGBA 5/5/5/1 format 820 may have red, green, and blue components each corresponding to a 5-bit data space and the alpha component corresponding to a 1-bit space. As another example, the 32-bit RGBA 8/8/8/8 format 830 may have red, green, blue, and alpha components each corresponding to an 8-bit space. As another example, the 32-bit RGBA 10/10/10/2 format 840 may have red, green, and blue components each corresponding to a 10-bit space and the alpha component corresponding to a 2-bit space. Each of these formats may have at least one bit for alpha to specify transparent pixels and opaque pixels. In particular embodiments, the red, green, blue, and alpha components of the texels may be stored as normalized linear light values or as sRGB encoded values. The normalized numbers may be values in the range of [0, 1] and sRGB encoded numbers may use perceptually equal steps instead of physically equal steps. In particular embodiments, the system may include a key requirement for image formats in which red, green, and blue values may use pre-multiplied alpha. In other words, each color value may be multiplied by its alpha value before being sent to the headset system. As a result, if alpha is equal to zero, all three of the color components may be zero as well.
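As an example and not by way of limitation, unpacking these formats and applying pre-multiplied alpha can be sketched in Python as follows. The bit order within a texel is not given in this description, so the sketch assumes red occupies the most significant bits and alpha the least significant; the function names are hypothetical, and the normalization shown is the linear case only (no sRGB decoding).

```python
def unpack_rgba(texel, bits):
    """Split a packed texel into integer (R, G, B, A) components.

    bits: component widths, e.g. (4, 4, 4, 4) or (10, 10, 10, 2).
    Assumes R is in the most significant bits (not stated in the source).
    """
    shift = sum(bits)
    components = []
    for width in bits:
        shift -= width
        components.append((texel >> shift) & ((1 << width) - 1))
    return tuple(components)


def normalize(components, bits):
    """Map integer components to linear values in the range [0, 1]."""
    return tuple(c / ((1 << w) - 1) for c, w in zip(components, bits))


def premultiply(r, g, b, a):
    """Multiply each color value by alpha, leaving alpha itself unchanged."""
    return (r * a, g * a, b * a, a)


print(unpack_rgba(0xF80F, (4, 4, 4, 4)))
print(unpack_rgba(0xFFFF, (5, 5, 5, 1)))
print(premultiply(1.0, 0.5, 0.25, 0.0))
```

The last call shows the property noted above: with alpha equal to zero, all three pre-multiplied color components are zero.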

……
……
……
