
Samsung Patent | Method and apparatus for multi-task processing

Patent: Method and apparatus for multi-task processing

Patent PDF: 20240013029

Publication Number: 20240013029

Publication Date: 2024-01-11

Assignee: Samsung Electronics

Abstract

A method and apparatus for multi-task processing are disclosed. The method includes obtaining a base output map corresponding to a first layer, restoring an input map corresponding to a second layer, obtaining an output map corresponding to the second layer, obtaining a delta output map corresponding to the second layer, and storing the base output map and the delta output map.

Claims

1. A method performed by at least one processor based on a first neural network and a second neural network, the method comprising: obtaining a base output map corresponding to a first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer; restoring an input map corresponding to a second layer of the second neural network based on a delta input map corresponding to the second layer and the base input map; obtaining an output map corresponding to the second layer by applying, to the second layer, the restored input map corresponding to the second layer; obtaining a delta output map corresponding to the second layer based on a difference between the base output map and the output map corresponding to the second layer; and storing the base output map and the delta output map.

2. The method of claim 1, wherein the storing the base output map and the delta output map comprises: storing the base output map as a base input map corresponding to a subsequent layer of the first layer; and storing the delta output map as a delta input map corresponding to a subsequent layer of the second layer.

3. The method of claim 1, wherein the first neural network is obtained by fine-tuning a pretrained base model based on transfer learning for a first task, and the second neural network is obtained by fine-tuning the base model based on transfer learning for a second task.

4. The method of claim 1, wherein the first layer corresponds to a layer of a base model, and the second layer corresponds to the same layer of the base model as the first layer.

5. The method of claim 1, wherein the restoring the input map corresponding to the second layer comprises: adding the base input map and the delta input map to restore the input map corresponding to the second layer.

6. The method of claim 1, wherein the obtaining the base output map corresponding to the first layer comprises: obtaining an output map corresponding to the first layer by applying the base input map to the first layer; and compressing the output map corresponding to the first layer to obtain the base output map corresponding to the first layer.

7. The method of claim 1, wherein the obtaining the delta output map corresponding to the second layer comprises: compressing the output map corresponding to the second layer; and subtracting, from the base output map, the compressed output map corresponding to the second layer to obtain the delta output map corresponding to the second layer.

8. The method of claim 1, wherein the storing the base output map and the delta output map comprises: compressing the delta output map based on a characteristic of a sparse matrix of the delta output map; and encoding the compressed delta output map and the base output map to store the base output map and the delta output map.

9. The method of claim 1, further comprising: obtaining a base weight corresponding to the first layer and the second layer based on a base model corresponding to the first neural network and the second neural network; obtaining a first delta weight corresponding to the first layer and a second delta weight corresponding to the second layer; restoring the first layer based on the base weight and the first delta weight; and restoring the second layer based on the base weight and the second delta weight.

10. The method of claim 1, further comprising: storing, as a base input map corresponding to a next layer of the first neural network, a first map obtained by applying input data to an initial layer of the first neural network; and storing, as a delta input map corresponding to a next layer of the second neural network, a difference between the first map and a second map obtained by applying the input data to an initial layer of the second neural network.

11. The method of claim 1, wherein the first neural network and the second neural network comprise a sequence of a plurality of layers performing a series of operations on input data, and the first neural network and the second neural network are different in at least a portion of the layers.

12. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

13. An apparatus including a first neural network and a second neural network, the apparatus comprising: at least one processor; and at least one memory including instructions executable by the processor to: obtain a base output map corresponding to a first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer; restore an input map corresponding to a second layer of the second neural network based on a delta input map corresponding to the second layer and the base input map; obtain an output map corresponding to the second layer by applying, to the second layer, the restored input map corresponding to the second layer; obtain a delta output map corresponding to the second layer based on a difference between the base output map and the output map corresponding to the second layer; and store the base output map and the delta output map.

14. The apparatus of claim 13, wherein, when storing the base output map and the delta output map, the instructions are further executable by the processor to: store the base output map as a base input map corresponding to a subsequent layer of the first layer; and store the delta output map as a delta input map corresponding to a subsequent layer of the second layer.

15. The apparatus of claim 13, wherein, when restoring the input map corresponding to the second layer, the instructions are further executable by the processor to: add the base input map and the delta input map to restore the input map corresponding to the second layer.

16. The apparatus of claim 13, wherein, when obtaining the base output map corresponding to the first layer, the instructions are further executable by the processor to: obtain an output map corresponding to the first layer by applying the base input map to the first layer; and compress the output map corresponding to the first layer to obtain the base output map corresponding to the first layer.

17. The apparatus of claim 13, wherein, when obtaining the delta output map corresponding to the second layer, the instructions are further executable by the processor to: compress the output map corresponding to the second layer; and subtract, from the base output map, the compressed output map corresponding to the second layer to obtain the delta output map corresponding to the second layer.

18. The apparatus of claim 13, wherein, when storing the base output map and the delta output map, the instructions are further executable by the processor to: compress the delta output map based on a characteristic of a sparse matrix of the delta output map; and encode the compressed delta output map and the base output map to store the base output map and the delta output map.

19. The apparatus of claim 13, wherein the instructions are further executable by the processor to: obtain a base weight corresponding to the first layer and the second layer based on a base model corresponding to the first neural network and the second neural network; obtain a first delta weight corresponding to the first layer and a second delta weight corresponding to the second layer; restore the first layer based on the base weight and the first delta weight; and restore the second layer based on the base weight and the second delta weight.

20. The apparatus of claim 13, wherein the instructions are further executable by the processor to: store, as a base input map corresponding to a next layer of the first neural network, a first map obtained by applying input data to an initial layer of the first neural network; and store, as a delta input map corresponding to a next layer of the second neural network, a difference between the first map and a second map obtained by applying the input data to an initial layer of the second neural network.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2022-0083717 filed on Jul. 7, 2022, and Korean Patent Application No. 10-2022-0110186 filed on Aug. 31, 2022, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated by reference herein for all purposes.

BACKGROUND

1. Field

The present disclosure relates to a multi-task processing method and apparatus, and more particularly, to a multi-task processing technology for processing multiple related tasks with limited hardware resources.

2. Description of Related Art

The development of deep neural networks (DNNs) has enabled various computer vision (CV) applications. Furthermore, transfer learning has been used to create DNNs exhibiting desirable performance characteristics in various CV tasks. For example, pre-training a model on a large dataset and fine-tuning it for a target task may allow the model to achieve desirable performance in the target task without over-fitting (i.e., without losing performance on related tasks).

Recently, DNNs have been used for artificial intelligence (AI) applications such as augmented reality (AR) and autonomous driving. In some cases, these applications simultaneously perform multiple related tasks using a single input. The tasks may include object classification, depth estimation, surface normal estimation, and the like. However, performing multiple tasks can be computationally intensive, which can limit applicability on edge devices. Accordingly, there is a need in the art to develop technology for simultaneously processing multiple related CV tasks using limited hardware resources.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form, and does not limit the scope of the claimed subject matter. Aspects of the present disclosure provide a multi-task processing technology for efficiently processing a plurality of computer vision (CV) tasks. Embodiments of the disclosure may be implemented in a mobile or edge terminal having limited hardware resources.

In one general aspect, a method is performed by at least one processor based on a first neural network and a second neural network to which the same input data is applied, wherein the method includes: obtaining a base output map corresponding to a first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer; restoring an input map corresponding to a second layer of the second neural network based on a delta input map corresponding to the second layer and the base input map; obtaining an output map corresponding to the second layer by applying, to the second layer, the restored input map corresponding to the second layer; obtaining a delta output map corresponding to the second layer based on a difference between the base output map and the output map corresponding to the second layer; and storing the base output map and the delta output map.

The storing the base output map and the delta output map may include storing the base output map as a base input map corresponding to a subsequent layer of the first layer; and storing the delta output map as a delta input map corresponding to a subsequent layer of the second layer.

The first neural network may include a neural network obtained through fine-tuning of a weight of at least one layer included in a pretrained base model based on transfer learning for a first task.

The second neural network may include a neural network obtained through fine-tuning of a weight of at least one layer included in the base model based on transfer learning for a second task.

The first layer may correspond to any one of layers included in the base model corresponding to the first neural network and the second neural network.

The second layer may correspond to the same layer as the first layer among the layers included in the base model.

The restoring the input map corresponding to the second layer may include adding the base input map and the delta input map to restore the input map corresponding to the second layer.

The obtaining the base output map corresponding to the first layer may include obtaining an output map corresponding to the first layer by applying the base input map to the first layer; and compressing the output map corresponding to the first layer to obtain the base output map corresponding to the first layer.

The obtaining the delta output map corresponding to the second layer may include compressing the output map corresponding to the second layer; and subtracting, from the base output map, the compressed output map corresponding to the second layer to obtain the delta output map corresponding to the second layer.

The storing the base output map and the delta output map may include compressing the delta output map based on a characteristic of a sparse matrix of the delta output map; and encoding the compressed delta output map and the base output map to store the base output map and the delta output map.

The method may further include: obtaining a base weight corresponding to the first layer and the second layer based on the base model corresponding to the first neural network and the second neural network; obtaining a first delta weight corresponding to the first layer and a second delta weight corresponding to the second layer; restoring the first layer based on the base weight and the first delta weight; and restoring the second layer based on the base weight and the second delta weight.

The method may further include: storing, as a base input map corresponding to a next layer of the first neural network, a first map obtained by applying the input data to an initial layer of the first neural network; and storing, as a delta input map corresponding to a next layer of the second neural network, a difference between the first map and a second map obtained by applying the input data to an initial layer of the second neural network.

The first neural network and the second neural network may include a sequence of a plurality of layers performing a series of operations on the input data.

The first neural network and the second neural network may be different in at least a portion of the layers.

In another general aspect, an apparatus processes multiple tasks based on a first neural network and a second neural network to which the same input data is applied, wherein the apparatus includes at least one processor configured to: obtain a base output map corresponding to a first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer; restore an input map corresponding to a second layer of the second neural network based on a delta input map corresponding to the second layer and the base input map; obtain an output map corresponding to the second layer by applying, to the second layer, the restored input map corresponding to the second layer; obtain a delta output map corresponding to the second layer based on a difference between the base output map and the output map corresponding to the second layer; and store the base output map and the delta output map.

When storing the base output map and the delta output map, the processor may store the base output map as a base input map corresponding to a subsequent layer of the first layer; and store the delta output map as a delta input map corresponding to a subsequent layer of the second layer.

When restoring the input map corresponding to the second layer, the processor may add the base input map and the delta input map to restore the input map corresponding to the second layer.

When obtaining the base output map corresponding to the first layer, the processor may obtain an output map corresponding to the first layer by applying the base input map to the first layer; and compress the output map corresponding to the first layer to obtain the base output map corresponding to the first layer.

When obtaining the delta output map corresponding to the second layer, the processor may compress the output map corresponding to the second layer; and subtract, from the base output map, the compressed output map corresponding to the second layer to obtain the delta output map corresponding to the second layer.

When storing the base output map and the delta output map, the processor may compress the delta output map based on a characteristic of a sparse matrix of the delta output map; and encode the compressed delta output map and the base output map to store the base output map and the delta output map.

The processor may further obtain a base weight corresponding to the first layer and the second layer based on a base model corresponding to the first neural network and the second neural network; obtain a first delta weight corresponding to the first layer and a second delta weight corresponding to the second layer; restore the first layer based on the base weight and the first delta weight; and restore the second layer based on the base weight and the second delta weight.

The processor may further store, as a base input map corresponding to a next layer of the first neural network, a first map obtained by applying the input data to an initial layer of the first neural network; and store, as a delta input map corresponding to a next layer of the second neural network, a difference between the first map and a second map obtained by applying the input data to an initial layer of the second neural network.

In one general aspect, a method is performed by at least one processor based on a first neural network and a second neural network, wherein the method includes: obtaining an input image; obtaining a base output map corresponding to a first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer, wherein the base input map is based on the input image; obtaining an output map corresponding to a second layer of the second neural network by applying, to the second layer, a restored input map corresponding to the second layer; computing a difference between the base output map and the output map corresponding to the second layer to obtain a delta output map corresponding to the second layer; performing a first computer vision task based on the base output map; and performing a second computer vision task based on the base output map and the delta output map.

The method may further include restoring an input map corresponding to the second layer based on a delta input map corresponding to the second layer and the base input map to obtain the restored input map. The method may further include storing the base output map and the delta output map in a memory.

The method may further include generating an augmented reality display based on the first computer vision task and the second computer vision task. The method may further include performing a navigation task based on the first computer vision task and the second computer vision task.

In another general aspect, a method is performed by at least one processor based on a first neural network and a second neural network, wherein the method includes: training the first neural network for a first task based on a base neural network; training the second neural network for a second task based on the base neural network; obtaining a base output map corresponding to a first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer; obtaining an output map corresponding to a second layer of the second neural network by applying, to the second layer, a restored input map corresponding to the second layer; computing a difference between the base output map and the output map corresponding to the second layer to obtain a delta output map corresponding to the second layer; and storing the delta output map in a memory.

Other features and aspects will be apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a flowchart of a multi-task processing method.

FIG. 2 illustrates an example of a task-specific model for multi-task processing.

FIG. 3 illustrates an example of a multi-task processing method.

FIG. 4 illustrates an example of a delta weight squeeze (DWS) algorithm.

FIG. 5 illustrates examples of activation maps generated in task-specific models.

FIG. 6 illustrates an example of a multi-task processing method.

FIG. 7 illustrates an example of an apparatus.

FIG. 8 illustrates an example of a hardware architecture of an apparatus performing a multi-task processing method.

FIG. 9 illustrates an example of a data flow in layer operations.

FIG. 10 illustrates a computer vision system.

FIG. 11 illustrates an application of a computer vision system.

Throughout the drawings and the description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.

DETAILED DESCRIPTION

The present disclosure relates to machine learning, and more specifically to compression of a neural network. Embodiments of the disclosure provide a method of compressing an intermediate activation map when processing multiple computer vision (CV) tasks using a deep neural network (DNN).

DNNs have been used for CV applications such as augmented reality (AR) and autonomous driving. In some cases, these applications simultaneously perform multiple related tasks using a single input. The tasks may include object classification, depth estimation, surface normal estimation, and the like. Conventional machine learning models use different models for each task, and process the input independently with each model. Thus, performing multiple tasks using conventional machine learning models can be computationally intensive, which can limit applicability on edge devices.

Accordingly, embodiments of the present disclosure provide an improvement to conventional CV models by processing input for multiple tasks using some shared computational elements. For example, an example embodiment divides intermediate activation values into a base activation and a delta activation. In some cases, the values may be compressed using a sparsity-based compression method, and subsequently restored when performing an operation on the compressed map. Accordingly, embodiments of the present disclosure can perform multiple CV tasks using reduced computation, which enables more complex multi-task applications to be performed on edge devices.

FIG. 1 illustrates a flowchart of a multi-task processing method. In some examples, the method described by FIG. 1 can be implemented by a neural network operating on a device with limited hardware resources. For example, the method can be performed by a computer performing self-driving or safety related driving tasks in a vehicle.

The device may collect input including an image or other sensory data, and provide the input to a machine learning model such as a neural network. The machine learning model processes the input and provides multiple outputs corresponding to different CV tasks such as object detection and depth estimation. These outputs may then be used to perform self-driving tasks or other tasks related to assisting the user (e.g., the driver).

Referring to FIG. 1, a multi-task processing method according to an example embodiment may include operation 110 of obtaining a base output map corresponding to a first layer, operation 120 of restoring an input map corresponding to a second layer, operation 130 of obtaining an output map corresponding to the second layer, operation 140 of obtaining a delta output map corresponding to the second layer, and operation 150 of storing the base output map and the delta output map.

The multi-task processing method may be performed by at least one processor. An example configuration of hardware performing the multi-task processing method is described herein.

The multi-task processing method may be performed by the processor based on one or more artificial neural networks (ANNs) including a first neural network and a second neural network to which the same input data is applied. An ANN is a hardware or a software component that includes a number of connected nodes (i.e., artificial neurons), which loosely correspond to the neurons in a human brain. Each connection, or edge, transmits a signal from one node to another (like the physical synapses in a brain). When a node receives a signal, it processes the signal and then transmits the processed signal to other connected nodes. In some cases, the signals between nodes comprise real numbers, and the output of each node is computed by a function of the sum of its inputs. In some examples, nodes may determine their output using other mathematical algorithms (e.g., selecting the max from the inputs as the output) or any other suitable algorithm for activating the node. Each node and edge is associated with one or more node weights that determine how the signal is processed and transmitted.

During the training process, these weights are adjusted to improve the accuracy of the result (i.e., by minimizing a loss function which corresponds in some way to the difference between the current result and the target result). The weight of an edge increases or decreases the strength of the signal transmitted between nodes. In some cases, nodes have a threshold below which a signal is not transmitted at all. In some examples, the nodes are aggregated into layers. Different layers perform different transformations on their inputs. The initial layer is known as the input layer and the last layer is known as the output layer. In some cases, signals traverse certain layers multiple times.

In some cases, the training is performed in a two-phase process including a first training phase (i.e., the pre-training phase) that is based on or useful for multiple tasks, and a second training phase (i.e., the fine-tuning phase) that is specific to an individual task.

Accordingly, the first neural network may include a neural network obtained through fine-tuning of a weight of at least one layer included in a pretrained base model based on transfer learning for a first task. The second neural network may include a neural network obtained through fine-tuning of a weight of at least one layer included in the base model based on transfer learning for a second task. For example, the first neural network and the second neural network may correspond to task-specific models derived by transfer learning from the same base model.

In some cases, the transfer learning is performed using a supervised learning paradigm. Supervised learning is one of three basic machine learning paradigms, alongside unsupervised learning and reinforcement learning. Supervised learning is a machine learning technique based on learning a function that maps an input to an output based on example input-output pairs. Supervised learning generates a function for predicting labeled data based on labeled training data consisting of a set of training examples. In some cases, each example is a pair consisting of an input object (e.g., a vector or a two-dimensional map) and a desired output value (i.e., a single value, or an output vector, or an output map). A supervised learning algorithm analyzes the training data and produces the inferred function, which can be used for mapping new examples. In some cases, the learning results in a function that correctly determines values or class labels for unseen instances. In other words, the learning algorithm generalizes from the training data to unseen examples.

For example, referring to FIG. 2, a plurality of task-specific models 220 may be derived from a base model 210 pretrained using a relatively large training dataset 201 (i.e., larger than a training set used for fine-tuning). The task-specific models 220 corresponding to respective tasks may be generated through fine-tuning of weights of layers included in the base model 210 for the respective tasks. For example, the base model 210 may be tuned by transfer learning using a relatively small training dataset corresponding to each task to create task-specific models 220. For example, the base model 210 may correspond to a general image processing model, and the task-specific models 220 derived from the base model 210 may include at least one of an object classification model, a depth estimation model, or a surface normal estimation model.

The task-specific models 220 derived from the base model 210 may share the same layered structure as the base model 210. The task-specific models 220 may include a layered structure having the same size and arrangement as those of the layers included in the base model 210. Weights of layers included in the task-specific models 220 may correspond to tuned values of the weights of the layers included in the base model 210. In some cases, weights or layers may be added or removed from a task-specific model prior to or during the fine-tuning.

The task-specific models 220 may further include at least one layer in addition to the layered structure included in the base model 210. For example, a task-specific model 220 for object classification may further include a function layer for classification in addition to the layers included in the base model 210.

According to an example embodiment, an apparatus 230 for multi-task processing may output a result of processing a plurality of tasks in response to input data (e.g., an image), using the task-specific models 220 derived from the base model 210. For example, the apparatus 230 may perform an operation of each of the task-specific models 220 corresponding to an input image 202 by applying the input image 202 to each of the task-specific models 220, and may output a result 203 of performing the operation. For example, the apparatus 230 may simultaneously perform operations for object classification, depth estimation, and surface normal estimation based on the task-specific models 220 in response to the input image 202.

For example, as illustrated in FIG. 2, a hardware-limited device such as a mobile device may take an image as input (e.g., an image of a room) and perform a multi-task CV operation that includes tasks such as classifying the image or identifying objects in the image. In some cases, the output of the machine learning model may be applied to other downstream tasks such as generating an augmented reality (AR) augmentation of the image. Because embodiments of the disclosure enable multiple tasks to be performed efficiently, such tasks can be performed in real time on devices using reduced computation and hardware resources.

According to an example embodiment, the first neural network and the second neural network may each include a sequence of a plurality of layers performing a series of operations on the input data. A layer may be a unit for an operation (or a computational unit), and a neural network may include one or more layers. The layers included in the neural network may each have a preset order, and operations may be performed according to the order. For example, output data of a specific layer may correspond to input data of a subsequent layer of the layer. An input map of an initial layer of the neural network may correspond to input data.

According to an example embodiment, input data and output data of a layer may be provided in the form of a map. For example, the input data of the layer may correspond to an input map corresponding to the layer, and the output data of the layer may correspond to an output map corresponding to the layer. For example, the input map and the output map of the layer may correspond to an activation map. The term activation may refer to a non-linear function for generating output values based on the input values. The multiple values produced by the activation function may collectively be referred to as the activation map.

According to an example embodiment, the first neural network and the second neural network may include layers included in the base model. A weight of a layer included in the first neural network may correspond to a tuned value of a weight of the layer included in the base model, and a weight of a layer included in the second neural network may correspond to a tuned value of the weight of the layer included in the base model.

According to an example embodiment, the first neural network and the second neural network may have some different layers. For example, a weight of a specific layer of the first neural network that is obtained by tuning a weight of the layer of the base model may be a value different from a weight of the layer of the second neural network that is obtained by tuning the weight of the same layer of the base model. For example, at least one layer may be further added to the first neural network and the second neural network in addition to the layers included in the base model. For example, the first neural network may further include an activation function layer in addition to the layers included in the base model, and the second neural network may further include at least one convolution layer in addition to the layers included in the base model. For example, the first neural network and the second neural network may include convolutional neural networks (CNNs).

A CNN is a class of neural network that is commonly used in computer vision or image classification systems. In some cases, a CNN may enable processing of digital images with minimal pre-processing. A CNN may be characterized by the use of convolutional (or cross-correlational) hidden layers. These layers apply a convolution operation to the input before signaling the result to the next layer. Each convolutional node may process data for a limited field of input (i.e., the receptive field). During a forward pass of the CNN, filters at each layer may be convolved across the input volume, computing the dot product between the filter and the input. During the training process, the filters may be modified so that they activate when they detect a particular feature within the input.

Referring back to FIG. 1, operation 110 may include obtaining the base output map corresponding to the first layer of the first neural network by applying, to the first layer, a base input map corresponding to the first layer of the first neural network.

According to an example embodiment, the first neural network may be one selected from among a plurality of task-specific models derived from the base model. For example, the first neural network may be a model predetermined by a user from among the task-specific models, or may be determined based on a criterion predetermined in the process of training the task-specific models. For example, the first neural network may be a task-specific model having a high similarity in an input/output map to other task-specific models, selected from among the task-specific models.

According to an example embodiment, the base input map may be obtained from a memory. The base input map may correspond to data stored in the memory based on an output map of a previous layer of the first layer in the first neural network.

According to an example embodiment, the base output map may be obtained as the output of the operation of the first layer on the base input map. For example, referring to FIG. 3, when a first layer 310 is a convolution layer, a base input map 301 may correspond to data obtained as an operation result of a previous convolution layer of the first layer 310 in a first neural network. A result of performing a convolution operation between the base input map 301 and a weight of the first layer 310 may be obtained as a base output map 302.

Referring back to FIG. 1, operation 110 may include obtaining an output map corresponding to the first layer by applying a base input map to the first layer and compressing the output map corresponding to the first layer to obtain the base output map corresponding to the first layer. The base output map may correspond to a result of compressing the output map obtained by applying the base input map to the first layer. For example, the output map corresponding to the first layer may be compressed through a compression method such as quantization.
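For illustration only (the source text does not include code), the following Python/NumPy sketch shows one way operation 110 could be realized, assuming a single-channel convolution layer and simple uniform quantization as the compression step; the function and variable names are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D convolution of a single-channel map x with kernel w (illustrative only)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def quantize(x, bits=8):
    """Uniform quantization, standing in for the compression step named in operation 110."""
    scale = (np.abs(x).max() + 1e-8) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def obtain_base_output_map(base_input_map, first_layer_weight):
    """Operation 110: apply the first layer, then compress the result into the base output map."""
    output_map = conv2d(base_input_map, first_layer_weight)
    return quantize(output_map)
```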

Also, operation 120 may include restoring an input map corresponding to the second layer based on a delta input map corresponding to the second layer of the second neural network and the base input map.

According to an example embodiment, the first layer 310 may correspond to any one of the layers included in the base model corresponding to the first neural network and the second neural network. The second layer 320 may correspond to the same layer as the first layer 310 among the layers included in the base model. In some examples, the first neural network is the base model, and the second neural network is fine-tuned based on the first neural network.

The first layer 310 and the second layer 320 may correspond to layers derived from the same layer of the base model. The first layer 310 and the second layer 320 may correspond to layers generated by tuning a weight of the same layer of the base model.

According to an example embodiment, the delta input map corresponding to the second layer 320 may be obtained from the memory. The delta input map corresponding to the second layer 320 may correspond to data stored in the memory based on an output map of a previous layer of the second layer 320 in the second neural network.

According to an example embodiment, operation 120 of restoring the input map corresponding to the second layer may include adding up the base input map and the delta input map to restore the input map corresponding to the second layer. For example, referring to FIG. 3, an input map 312 corresponding to a second layer 320 may be restored by adding the base input map 301 and a delta input map 311. For example, the base input map 301 and the delta input map 311 may be maps of the same size. For example, the base input map 301 and the delta input map 311 may be of the size of W×H×C. In this example, W, H, and C may denote the number of rows, columns, and channels of a map, respectively, and may be set to natural numbers. In the example of FIG. 3, the base input map 301 and the delta input map 311 may be of the size of 4×4×1.

For example, adding the base input map 301 and the delta input map 311 may indicate adding up values of elements at the same position in the base input map 301 and the delta input map 311. For example, a value of an element 301-1 at a position (1, 1) in the base input map 301 and a value of an element 311-1 at the position (1, 1) in the delta input map 311 may be added, and a value of an element 312-1 at the position (1, 1) in the input map 312 corresponding to the second layer 320 may thereby be obtained.
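A minimal sketch of operation 120, assuming 4×4×1 maps as in FIG. 3; the arrays below are hypothetical placeholders, and the restoration is simply the element-wise addition described above.

```python
import numpy as np

base_input_map = np.random.randn(4, 4, 1).astype(np.float32)    # stands in for map 301
delta_input_map = np.random.randn(4, 4, 1).astype(np.float32)   # stands in for map 311

# Element-wise addition of values at the same positions restores the input map 312.
restored_input_map = base_input_map + delta_input_map
assert restored_input_map[0, 0, 0] == base_input_map[0, 0, 0] + delta_input_map[0, 0, 0]
```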

Referring back to FIG. 1, operation 130 may include obtaining the output map corresponding to the second layer by applying, to the second layer, the restored input map of the second layer.

According to an example embodiment, the output map corresponding to the second layer may be obtained as an operation result of the second layer and the restored input map corresponding to the second layer. For example, referring to FIG. 3, when the second layer 320 is a convolution layer, a result of performing a convolution operation on the input map 312 corresponding to the second layer 320 and a weight of the second layer 320 may be obtained as an output map 313 corresponding to the second layer 320.

Referring back to FIG. 1, operation 140 may include obtaining the delta output map corresponding to the second layer based on a difference between the base output map and the output map corresponding to the second layer. Also, operation 140 may include subtracting the base output map from the output map corresponding to the second layer to obtain the delta output map corresponding to the second layer. In another example, the output map corresponding to the second layer may be subtracted from the base output map to obtain the delta output map. In other examples, another difference function may be used to obtain the delta output map.

For example, referring to FIG. 3, a delta output map 314 corresponding to the second layer 320 may be obtained by subtracting the output map 313 corresponding to the second layer 320 from the base output map 302. The base output map 302 and the output map 313 corresponding to the second layer 320 may be maps of the same size. For example, the base output map 302 and the output map 313 corresponding to the second layer 320 may be of the size of W×H×C where W, H, and C denote the number of rows, columns, and channels of a map, respectively, and may be set to natural numbers. In the example of FIG. 3, the base output map 302 and the output map 313 may be of the size of 4×4×1. However, any number of rows, columns, and channels may be used.

For example, the subtraction of the output map 313 corresponding to the second layer 320 from the base output map 302 may indicate subtraction between values of elements at the same position in the output map 313 corresponding to the second layer 320 and the base output map 302. For example, a value of an element 313-1 at a position (1, 1) in the output map 313 corresponding to the second layer 320 may be subtracted from a value of an element 302-1 at a position (1, 1) of the base output map 302, and a value of an element 314-1 at the position (1, 1) in the delta output map 314 corresponding to the second layer 320 may thereby be obtained.

Referring back to FIG. 1, operation 140 may include compressing the output map corresponding to the second layer and subtracting the compressed output map corresponding to the second layer from the base output map to obtain the delta output map corresponding to the second layer.

According to an example embodiment, the compressing of the output map corresponding to the second layer may include compressing the output map corresponding to the second layer through quantization.

According to an example embodiment, the base output map and the delta output map corresponding to the second layer may be stored in the memory. For example, the base output map may be stored in relation to the first neural network, and the delta output map may be stored in relation to the second neural network.

According to an example embodiment, operation 150 may include compressing the delta output map based on a characteristic of a sparse matrix of the delta output map and encoding the compressed delta output map and the base output map to store the base output map and the delta output map. The delta output map may correspond to the sparse matrix including a plurality of elements having a value of zero. The delta output map may be compressed based on the characteristic of the sparse matrix including the elements having the value of zero.

For example, the delta output map may be compressed through pruning. In a pruning example, among the elements of the delta output map, elements having a value less than a preset threshold value may be changed to zero. For another example, the delta output map may be compressed by being changed to metadata including positions of non-zero elements and values of the non-zero elements. In some examples, a compressed output map may be stored using a compression format such as a Compressed Sparse Row (CSR) or a Compressed Sparse Column (CSC) format, or using any other compression format.
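As a hedged sketch of the sparse storage described above (the disclosure does not fix a specific format), the following prunes near-zero values and keeps only the positions and values of the non-zero elements; the threshold, the dictionary layout, and the function names are illustrative assumptions.

```python
import numpy as np

def compress_delta_map(delta_map, threshold=1e-2):
    """Prune near-zero elements, then keep only non-zero positions and values (COO-style)."""
    pruned = np.where(np.abs(delta_map) < threshold, 0.0, delta_map)
    positions = np.argwhere(pruned != 0)     # metadata: indices of the non-zero elements
    values = pruned[pruned != 0]             # the surviving (e.g., quantized) values
    return {"shape": delta_map.shape, "positions": positions, "values": values}

def decompress_delta_map(encoded):
    """Rebuild the dense delta map when it is needed for a subsequent layer's restoration."""
    out = np.zeros(encoded["shape"], dtype=np.float32)
    out[tuple(encoded["positions"].T)] = encoded["values"]
    return out
```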

According to an example embodiment, operation 150 may include storing the base output map as a base input map corresponding to a subsequent layer of the first layer, and storing the delta output map as a delta input map corresponding to a subsequent layer of the second layer. Just as the base input map used to obtain the base output map corresponds to data stored as a result of a previous layer of the first layer, the base output map may be stored as the base input map for an operation of the subsequent layer of the first layer. Similarly, just as the delta input map used to obtain the delta output map corresponds to data stored as a result of a previous layer of the second layer, the delta output map may be stored as the delta input map for an operation of the subsequent layer of the second layer.

According to an example embodiment, when the first layer and the second layer are, respectively, initial layers of the first neural network and the second neural network, the multi-task processing method may further include storing a first map obtained by applying the input data to the initial layer of the first neural network as a base input map corresponding to a next layer (which is a subsequent layer of the initial layer) of the first neural network; and storing a difference between the first map and a second map obtained by applying the input data to the initial layer of the second neural network as a delta input map corresponding to a next layer (which is a subsequent layer of the initial layer) of the second neural network. For example, when the first layer is the initial layer of the first neural network, there may be no previous layer of the first layer, which may indicate a state in which no base input map for an operation of the first layer is stored in the memory. In this example, the input data may be used as the base input map for an operation of the initial layer of the first neural network. For example, when the second layer is the initial layer of the second neural network, this may indicate a state in which no delta input map for restoring the input map corresponding to the second layer is stored in the memory. In this example, the restoring of the input map corresponding to the second neural network may be omitted, and the input data may be used as the base input map for an operation of the initial layer of the second neural network.

According to an example embodiment, the multi-task processing method may correspond to a SqueeD algorithm. SqueeD may refer to a combination of two algorithms: a delta weight squeeze (DWS) algorithm and a delta activation squeeze (DAS) algorithm. The DAS algorithm may correspond to the multi-task processing method described with reference to FIG. 1. The DWS algorithm is described with reference to FIG. 4.

According to an example embodiment, the multi-task processing method may further include restoring the first layer and the second layer. In the multi-task processing method, the restoring of the first layer and the second layer may include obtaining a base weight corresponding to the first layer and the second layer based on a base model corresponding to the first neural network and the second neural network; obtaining a first delta weight corresponding to the first layer and a second delta weight corresponding to the second layer; and restoring the first layer based on the base weight and the first delta weight and restoring the second layer based on the base weight and the second delta weight.

The restoring of the first layer and the second layer may be performed based on the base weight and on the delta weights that are compressed and stored for the first neural network and the second neural network, respectively. The base weight may correspond to a weight of a layer included in the base model from which the first neural network and the second neural network are derived. A method of compressing and storing a weight of a layer of a neural network as a delta weight may be referred to as the DWS algorithm. In another example, a first layer of the first neural network corresponds to base weights, and only the second neural network (e.g., the second layer from the second neural network) is obtained based on delta weights. In this case, only weights from the second neural network are restored based on stored delta weights.

FIG. 4 illustrates an example of a delta weight squeeze (DWS) algorithm.

Referring to FIG. 4, the DWS algorithm includes operation 410 of fine-tuning weights of task-specific models using weights 401 of a pretrained base model as a backbone. The first neural network and the second neural network described above may correspond to the task-specific models. In another example, one of the first neural network or the second neural network corresponds to the base model. In some cases, the first neural network or the second neural network includes additional layers in addition to the layers of the base model.

According to an example embodiment, the DWS algorithm may include operation 420 of decomposing the tuned weights of the task-specific models into a weight of the base model and a delta weight. For example, for the weight $W_t$ of the $l$-th layer of a task-specific model tuned for a task $t$ from the pretrained base model weight $W_B$, a delta weight $\delta_w^t \in \Delta_w^t$, which is a parameter corresponding to the task-specific model $W_t$, may be defined as the variation in the weight after the fine-tuning, as represented by Equation 1 below.

$W_t = W_B + \Delta_w^t$.  [Equation 1]

The DWS algorithm may include compressing $\Delta_w^t$ using a compression method such as pruning and quantization. For example, the first delta weight and the second delta weight described above may correspond to the delta weight $\Delta_w^t$.

According to an example embodiment, the DWS algorithm may include operation 430 of performing magnitude-based pruning along with relearning of $\Delta_w^t$. For example, during the relearning, a bit mask matrix $M_t$ may be updated such that $p\%$ of the delta weights have a value of zero. Operation 430 of performing the pruning on $\Delta_w^t$ may be represented by Equation 2 below.

$(W_t)' = W_B + \Delta_w^t \odot M_t$.  [Equation 2]

In Equation 2, $M_t$ denotes a bit mask matrix for the delta weight, and the operator $\odot$ denotes a Hadamard product.

According to an example embodiment, after pruning the delta weight, the DWS algorithm may include operation 440 of performing $q_w$-bit quantization on the remaining delta weights, as represented by Equation 3 below.

$(W_t)'' = W_B + Q_{q_w}\!\left(\Delta_w^t \odot M_t\right)$.  [Equation 3]

For example, many values close to zero (i.e., within a threshold distance of zero) among the elements of the delta weight matrix may have been removed by the pruning. Linearly quantizing the entire range of delta weights may therefore not fully use the limited representation levels available at a short bit-width. Thus, the range of the pruned delta weights may be divided into a negative part $\{-\delta_{w,\max}^t, -\delta_{w,\min}^t\}$ and a positive part $\{+\delta_{w,\min}^t, +\delta_{w,\max}^t\}$. Here, $\delta_{w,\max}^t$ and $\delta_{w,\min}^t$ denote the maximum value and the minimum value, respectively, of the absolute values of the elements of $\Delta_w^t \odot M_t$. Subsequently, each part may be quantized to $2^{q_w-1}$ levels. That is, a $q_w$-bit quantization function $Q_{q_w}(\delta_w^t)$ for the delta weight matrix may be represented by Equation 4 below.

$Q_{q_w}(\delta_w^t) = \operatorname{sign}(\delta_w^t)\left(\left[\left(\left|\delta_w^t\right| - \delta_{w,\min}^t\right) \cdot \frac{S}{R_l}\right] \cdot \frac{R_l}{S} + \delta_{w,\min}^t\right)$  [Equation 4]

In Equation 4, the quantization range may be $R_l = \delta_{w,\max}^t - \delta_{w,\min}^t$, and $S = 2^{q_w-1} - 1$ may be a scalar. For example, histogram 441, shown in FIG. 4, may correspond to a delta weight distribution obtained after the pruning and quantization are performed.
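To make Equations 2 to 4 concrete, the following NumPy sketch prunes a delta weight matrix by magnitude and applies the sign-split linear quantization described above; the pruning ratio p, the bit-width q_w, and the function names are assumptions for illustration, not values given in the disclosure.

```python
import numpy as np

def prune_delta(delta_w, p=0.9):
    """Equation 2 (sketch): zero out the p fraction of smallest-magnitude delta weights."""
    flat = np.abs(delta_w).ravel()
    k = int(p * flat.size)
    thresh = np.sort(flat)[k] if k < flat.size else np.inf
    mask = (np.abs(delta_w) >= thresh).astype(delta_w.dtype)   # bit mask M_t
    return delta_w * mask, mask

def quantize_delta(pruned_delta, q_w=4):
    """Equation 4 (sketch): quantize |delta| in [d_min, d_max] to S levels, keeping the sign."""
    nonzero = pruned_delta[pruned_delta != 0]
    if nonzero.size == 0:
        return pruned_delta
    d_min, d_max = np.abs(nonzero).min(), np.abs(nonzero).max()
    R_l, S = d_max - d_min, 2 ** (q_w - 1) - 1
    mag = np.abs(pruned_delta)
    q = np.sign(pruned_delta) * (np.round((mag - d_min) * S / (R_l + 1e-12)) * R_l / S + d_min)
    return np.where(pruned_delta != 0, q, 0.0)

# Equation 3 (sketch): a task-specific weight is restored as W_t'' = W_B + Q(delta ⊙ M), e.g.
# pruned, mask = prune_delta(delta_w); w_task = w_base + quantize_delta(pruned)
```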

According to an example embodiment, the task-specific models may be retrained to amortize the error introduced by the pruning and quantization. For example, a straight-through estimator may be applied to calculate the gradient of the delta weights modified in backpropagation. The partial derivative of an error function $E$ with respect to $\delta_w^t$ may be represented by Equation 5 below.

$\dfrac{\partial E}{\partial \delta_w^t} = \dfrac{\partial E}{\partial (w_t)''} \cdot m_t$  [Equation 5]

In Equation 5 above, $(w_t)'' \in (W_t)''$ and $m_t \in M_t$.

According to an example embodiment, the DWS algorithm may include operation 450 of encoding the relearned delta weights. For example, the relearned delta weight matrix may be encoded into a sparse form. The encoded delta weights may include metadata and the quantized non-zero values of the relearned delta weights. The metadata may include position information of the non-zero values and quantization parameters such as $R_l/S$ and $\delta_{w,\min}^t$. To encode the delta weights into a compressed form, a compressed sparse row (CSR) or compressed sparse column (CSC) format may be used, and each filter may be stored separately.

FIG. 5 illustrates examples of activation maps generated in task-specific models.

Referring to FIG. 5, activation maps are generated in layers of tuned models for a classification task 501 and a semantic segmentation task 502, respectively.

In some examples, the neural networks comprise a ResNet architecture. A ResNet is a neural network architecture that addresses issues associated with training deep neural networks. It operates by including identity shortcut connections that skip one or more layers of the network. In a ResNet, stacking additional layers doesn't degrade performance or introduce training errors because skipping layers avoids the vanishing gradient problem of deep networks. In other words, the training gradient can follow “shortcuts” through the deep network. Weights are adjusted to “skip” a layer and amplify a previous skipped layer. In an example scenario, weights for an adjacent layer are adjusted and weights are not applied to an upstream layer.
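As a minimal illustration of the identity shortcut described above (generic ResNet behavior, not specific to this disclosure), a residual block adds its input back to the transformed output:

```python
import numpy as np

def residual_block(x, f):
    """Identity-shortcut block: the layers learn a residual f(x) that is added back to x."""
    return np.maximum(f(x) + x, 0.0)   # ReLU(f(x) + x); f must preserve x's shape

# Example with a hypothetical linear transform w of matching shape:
# y = residual_block(x, lambda x: x @ w)
```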

To the two models, the same data 510 may be input. For example, the first neural network and the second neural network that are described above may correspond to the two models (e.g., ResNet-50 models) respectively corresponding to the classification task 501 and the semantic segmentation task 502. An input map corresponding to a layer and an output map corresponding to the layer may correspond to an activation map. An activation map may represent an input map corresponding to a layer or an output map corresponding to the layer.

A graph 520 shown in FIG. 5 represents a similarity of the activation maps generated in the models corresponding to the two tasks according to a layer progress direction, in which a horizontal axis indicates the layer progress direction, and a vertical axis indicates the similarity between two activation maps. The progress direction may represent an order in which operations (or computations) are performed in a model including a plurality of layers. For example, a bar positioned leftmost among bars shown in the graph 520 may indicate a similarity of an activation map generated in an initial layer in an operation order of the models corresponding to the two tasks.

According to an example embodiment, the activation maps generated in the two models may have a high similarity because they are generated from the same input data 510. Further, the two quantized activation maps may be identical in many positions. The graph 520 of FIG. 5 may indicate that, when the activation maps generated in the two models are quantized to 8 bits, on average, for example, 70.1% of the activation values may be identical between the two activation maps. This may be, for example, 1.3 times higher than the average sparsity of an activation map (i.e., the ratio of zeros in an activation map, for example, 53.1%). A compression method, such as the DAS algorithm, may reduce the size of the input/output activations by using the characteristic that the similarity of input/output activation maps generated in task-specific models derived from the same base model is high.
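The statistic plotted in the graph 520 can be approximated by a sketch such as the following, which quantizes two activation maps with a shared 8-bit scale and measures the fraction of identical values and the sparsity; the exact quantization scheme is an assumption, since FIG. 5 does not specify one.

```python
import numpy as np

def activation_overlap(act_task1, act_task2, bits=8):
    """Fraction of positions with identical quantized activations, plus the sparsity of one map."""
    scale = (max(np.abs(act_task1).max(), np.abs(act_task2).max()) + 1e-8) / (2 ** (bits - 1) - 1)
    q1, q2 = np.round(act_task1 / scale), np.round(act_task2 / scale)
    same = np.mean(q1 == q2)        # reported as roughly 70.1% on average in the text above
    sparsity = np.mean(q1 == 0)     # reported as roughly 53.1% on average in the text above
    return same, sparsity
```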

FIG. 6 illustrates an example of a multi-task processing method. Some embodiments of the multi-task processing method include SqueeD-based multi-task processing, including a DAS algorithm and a DWS algorithm.

The DAS algorithm may be similar to the DWS algorithm but may not include additional relearning to amortize the compression error when using quantized networks. Similar to the weights in the DWS algorithm, activation maps in the DAS algorithm may include a base activation map and a delta activation map. The base input map and the base output map that are described above may correspond to the base activation map. The base activation map may represent a base input map or a base output map corresponding to a layer. The delta input map and the delta output map that are described above may correspond to the delta activation map. The delta activation map may represent a delta input map or a delta output map corresponding to a layer.

According to an example embodiment, a task (e.g., a primary task among a plurality of tasks) may be set as a base task. An activation map generated in a selected neural network (e.g., a task-specific model) for processing the base task may be set as the base activation map. The delta activation map may be defined as a difference between the base activation map and an original activation map generated in a neural network for processing another task that is not the base task.

For example, referring to FIG. 6, an input map 601 corresponding to a first layer 610 of a first neural network for processing task 1 may be set as a base input map. In addition, an input map corresponding to a second layer 620 of a second neural network for processing task 2 may be restored by adding the input map 601 and a delta input map 602 corresponding to the second layer 620.

According to an example embodiment, the first layer 610 may be restored based on a base weight and a first delta weight stored in relation to the first layer 610 of the first neural network. For example, the first layer 610 may be restored by summing the base weight and the first delta weight. In addition, the second layer 620 may be restored based on the base weight and a second delta weight stored in relation to the second layer 620 of the second neural network. For example, the second layer 620 may be restored by a sum of the base weight and the second delta weight. In some examples, only one of the first layer 610 or the second layer 620 is restored based on delta weights.
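
For illustration only, a minimal sketch of restoring task-specific layer weights from a shared base weight and per-task delta weights is shown below; the shapes and values are hypothetical, and in practice the delta weights may be stored pruned and quantized as described elsewhere herein:

```python
import numpy as np

def restore_layer_weight(base_weight, delta_weight=None):
    # Restore a task-specific weight by element-wise addition of the shared
    # base weight and the stored delta weight; when no delta weight is stored,
    # the base weight is reused as-is.
    if delta_weight is None:
        return base_weight
    return base_weight + delta_weight

# Hypothetical stored tensors.
base_w = np.random.randn(64, 64).astype(np.float32)
delta_w_task1 = np.zeros_like(base_w)                               # e.g., first layer 610
delta_w_task2 = 0.01 * np.random.randn(64, 64).astype(np.float32)  # e.g., second layer 620

w_layer_610 = restore_layer_weight(base_w, delta_w_task1)
w_layer_620 = restore_layer_weight(base_w, delta_w_task2)
```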

According to an example embodiment, after a series of layer operations 611 and 621 and quantization operations 612 and 622 are performed, output maps 603 and 604 corresponding to the respective tasks may be generated. For example, the output map 603 corresponding to the first layer 610 of the first neural network for processing task 1 may be set as a base output map. A delta output map 605 corresponding to the second layer 620 of the second neural network for processing task 2 may be calculated by subtracting the base output map 603 from the output map 604 corresponding to the second layer 620. The delta output map 605 corresponding to task 2, which is a sparse matrix, may be compressed and stored in a memory (e.g., a dynamic random-access memory (DRAM) 630). The base output map 603 may be compressed based on its sparsity and stored in the memory. The sparsity of the delta output map 605 may vary depending on a task selected as the base task.
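
For illustration only, the per-layer flow described above may be sketched as follows; the layer operation and quantization step are generic stand-ins, and the maps and scales are hypothetical:

```python
import numpy as np

def layer_then_quantize(x, scale=0.05):
    # Stand-in for a layer operation (611/621) followed by quantization (612/622).
    y = np.tanh(x)
    return np.clip(np.round(y / scale), -128, 127).astype(np.int8)

# Hypothetical base input map 601 and sparse delta input map 602.
base_in = np.random.randn(64, 28, 28).astype(np.float32)
delta_in = np.zeros_like(base_in)
delta_in[::8] = 0.01

task2_in = base_in + delta_in                 # restored input map for layer 620

base_out = layer_then_quantize(base_in)       # base output map 603 (task 1)
task2_out = layer_then_quantize(task2_in)     # output map 604 (task 2)

# Delta output map 605: subtract the base output from the task-2 output.
delta_out = task2_out.astype(np.int16) - base_out.astype(np.int16)
print("delta output sparsity:", float(np.mean(delta_out == 0)))
```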

FIG. 7 illustrates an example of an apparatus.

Referring to FIG. 7, an apparatus 700 according to an example embodiment may include a processor 701, a memory 703, and a communication module 705. The apparatus 700 may include an apparatus performing the multi-task processing method described above with reference to FIGS. 1 to 6.

The processor 701 may perform at least one of the operations described above with reference to FIGS. 1 to 6. In some examples, processor 701 executes instructions (i.e., software code) stored in the memory 703. For example, the processor 701 may perform at least one of an operation of obtaining a base output map corresponding to a first layer, an operation of restoring an input map corresponding to a second layer, an operation of obtaining an output map corresponding to the second layer, an operation of obtaining a delta output map corresponding to the second layer, or an operation of storing the base output map and the delta output map.

Processor 701 may be an intelligent hardware device (e.g., a general-purpose processing component, a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device, a discrete gate or transistor logic component, a discrete hardware component, or any combination thereof). In some cases, the processor is configured to operate a memory array using a memory controller. In other cases, a memory controller is integrated into the processor. In some cases, the processor is configured to execute computer-readable instructions stored in a memory to perform various functions. In some embodiments, a processor includes special purpose components for modem processing, baseband processing, digital signal processing, or transmission processing.

Processor 701 may execute software. Software may include code to implement aspects of the present disclosure. Software may be stored in a non-transitory computer-readable medium such as memory 703 or other system memory. In some cases, the software may not be directly executable by the processor but may cause a computer (e.g., when compiled and executed) to perform functions described herein.

The memory 703 may be a volatile memory or a non-volatile memory and may store data related to the multi-task processing method described above with reference to FIGS. 1 to 6. Examples of memory devices include flash memory, random access memory (RAM), read-only memory (ROM), solid state memory, and a hard disk drive. In some examples, memory is used to store computer-readable, computer-executable software including instructions that, when executed, cause a processor to perform various functions described herein. In some cases, the memory contains, among other things, a basic input/output system (BIOS) which controls basic hardware or software operation such as the interaction with peripheral components or devices. In some cases, a memory controller operates memory cells. For example, the memory controller can include a row decoder, column decoder, or both. In some cases, memory cells within a memory store information in the form of a logical state.

For example, the memory 703 may store data generated during the execution of the multi-task processing method or data required to perform the multi-task processing method. For example, the memory 703 may store a base input map corresponding to the first layer and a delta input map corresponding to the second layer. For example, the memory 703 may store a base weight corresponding to the first layer and the second layer, and a first delta weight corresponding to the first layer and a second delta weight corresponding to the second layer.

The communication module 705 may provide a function for the apparatus 700 to communicate with other electronic devices or servers over a network. The apparatus 700 may be connected to an external device (e.g., a user terminal, a server, or a network) through the communication module 705 and may exchange data therewith.

According to an example embodiment, the memory 703 may not be included in the apparatus 700 but may be included in an external device accessible from the apparatus 700. In this case, the apparatus 700 may receive data stored in the memory 703 included in the external device through the communication module 705 and may transmit data to be stored in the memory 703.

According to an example embodiment, the memory 703 may store a program in which the multi-task processing method described above with reference to FIGS. 1 to 6 is implemented. The processor 701 may execute the program stored in the memory 703 and control the apparatus 700. Code of the program executed by the processor 701 may be stored in the memory 703.

The apparatus 700 may further include other components that are not shown. For example, the apparatus 700 may further include an input/output interface including an input device and an output device as a means for interfacing with the communication module 705. For another example, the apparatus 700 may further include other components such as a transceiver, various sensors, a database, or the like.

FIG. 8 illustrates an example of a hardware architecture of an apparatus performing a multi-task processing method.

Referring to FIG. 8, an apparatus 800 performing a multi-task processing method according to an example embodiment may correspond to the apparatus 700 of FIG. 7. The apparatus 800 may perform the multi-task processing method using an algorithm such as, for example, the SqueeD algorithm described above. For example, the apparatus 800 may process multiple tasks using SqueeD, or another algorithm, to reduce off-chip memory access.

The apparatus 800 may include a processing element (PE) array 801 that operates in an output-stationary state and a vector processing unit 802 for activation, normalization, and quantization. The PE array 801 may include multiple (i.e., n) PEs, and a single PE may include multiple (i.e., m) multipliers.

A base activation map may be stored in an input buffer 803, and a delta activation map may be stored in a delta input buffer 804.

A base weight may be stored in a weight buffer 805, and delta weights including a first delta weight and a second delta weight may be stored in a delta weight buffer 806.

The apparatus 800 may include a weight restore unit (WRU) 807 allocated to one PE. The WRU 807 may include a CSC decompressor 807-1 and a dequantizer 807-2 to decode the delta weights, and an element-wise adder 807-3 to add up the base weight and decoded delta weights.
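
For illustration only, the behavior of the WRU 807 may be sketched as below, using scipy.sparse for the CSC representation; the tile size, quantization scale, and stored index arrays are hypothetical assumptions rather than a definitive hardware implementation:

```python
import numpy as np
from scipy.sparse import csc_matrix

def restore_weight_tile(base_tile, delta_data, delta_indices, delta_indptr, scale):
    # CSC-decompress the stored delta weights (cf. 807-1), dequantize them with
    # an assumed per-tile scale (cf. 807-2), and add the base weight tile
    # element-wise (cf. 807-3).
    delta_q = csc_matrix((delta_data, delta_indices, delta_indptr),
                         shape=base_tile.shape).toarray()
    return base_tile + delta_q.astype(np.float32) * scale

# Hypothetical 4x4 tile with two nonzero quantized delta weights.
base_tile = np.zeros((4, 4), dtype=np.float32)
data = np.array([3, -2], dtype=np.int8)     # quantized nonzero deltas
indices = np.array([1, 2])                  # row indices of the nonzeros
indptr = np.array([0, 1, 1, 2, 2])          # column pointers for 4 columns
restored = restore_weight_tile(base_tile, data, indices, indptr, scale=0.05)
```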

An input restore unit (IRU) 808 may be similar to the WRU 807, but a decompressor of the IRU 808 may be configured in the form of a zero value compression (ZVC). The IRU 808 may not include a dequantizer.

An output encoder 809 may extract and compress a delta output map obtained from a layer of a neural network. The output encoder 809 may include a register 809-1 for storing a base output map. The output encoder 809 may include an element-wise subtractor 809-2, and the element-wise subtractor 809-2 may perform subtraction on the base output map and an output map obtained for each task to obtain the delta output map corresponding to each task. The delta output map may be compressed through a quantizer 809-3 and a ZVC compressor 809-4, and the encoded delta output may be stored in a DRAM. The base output map may also be encoded and stored in the DRAM.
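
For illustration only, a zero-value-compression style encode/decode pair for a sparse delta output map is sketched below; the mask packing and value layout are illustrative assumptions rather than a definitive encoder format:

```python
import numpy as np

def zvc_encode(delta_map):
    # Zero-value compression: a packed 1-bit mask of nonzero positions plus the
    # nonzero values, so a highly sparse delta output map occupies little DRAM.
    mask = delta_map != 0
    return np.packbits(mask.ravel()), delta_map[mask], delta_map.shape

def zvc_decode(packed_mask, values, shape, dtype=np.int16):
    # Inverse operation, as might be used on the read path by a ZVC decompressor.
    n = int(np.prod(shape))
    mask = np.unpackbits(packed_mask)[:n].astype(bool)
    out = np.zeros(n, dtype=dtype)
    out[mask] = values
    return out.reshape(shape)

delta = np.zeros((4, 4), dtype=np.int16)
delta[0, 1], delta[2, 3] = 5, -3
packed_mask, values, shape = zvc_encode(delta)
assert np.array_equal(zvc_decode(packed_mask, values, shape), delta)
```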

FIG. 9 illustrates an example of a data flow in layer operations.

A data flow according to an example embodiment may reduce redundant access to off-chip memory by maximally reusing a loaded base weight/input map. In the example of FIG. 9, a weight and an input map for one task are divided into three tiles and two tiles, respectively, and two tasks are simultaneously processed. In the example of FIG. 9, a timeline indicates a tile occupying a PE array at each time point. When a tile occupying a PE is changed, DRAM read access may occur. In the example of FIG. 9, a weight tile is indicated as W_i^t, and an input map tile is indicated as A_i^t.

Referring to FIG. 9, in a data flow 910 according to a conventional multi-task processing method, a weight tile may be read once, but an input tile may be iteratively read N_W times due to tiling. As the number of tasks increases, DRAM read access may increase linearly according to the number of tasks because each task is processed individually. Therefore, the DRAM read access R_c in the conventional data flow 910 may be roughly calculated as represented by Equation 6 below.

R_c ≈ (size(W) + size(A) · N_W) · T  [Equation 6]

In contrast, in a data flow 920 according to embodiments of the present disclosure (e.g., using the SqueeD algorithm), a base weight and an input map may be maximally reused between tasks, and thus the base weight and the base input map may be read a consistent number of times regardless of the number of tasks. A delta weight and a delta input map for each task may be read N_A and N_W times, respectively. Therefore, the ratio ratio_read of the number of DRAM read accesses R_p in the data flow 920 to the number of DRAM read accesses R_c in the data flow 910 may be calculated as represented by Equations 7 and 8 below.

R_p ≈ size(W) + size(A) · N_W + size(ΔW) · N_A · T + size(ΔA) · N_W · (T - 1)  [Equation 7]

ratio_read = R_p / R_c ≈ 1/T + (size(ΔW) · N_A + size(ΔA) · N_W) / (size(W) + size(A) · N_W)  [Equation 8]

Referring to Equations 7 and 8 above, relative read access of the described data flow 920 may decrease as T increases. Accordingly, the ratio can be less than 1 even at T=2.

A ratio ratio_write of the number of DRAM write accesses W_p in the data flow 920 to the number of DRAM write accesses W_c in the data flow 910 may be estimated as represented by Equation 9 below.

ratio_write = W_p / W_c ≈ (size(A) + size(ΔA) · (T - 1)) / (size(A) · T) = 1/T + (size(ΔA) / size(A)) · ((T - 1) / T)  [Equation 9]

Similarly, ratio_write may be less than 1 when T>1, and may decrease as T increases. The estimations described above may be derived from a weight reuse data flow, but the estimation of an input reuse data flow may also reach a similar result.
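
For illustration only, the approximate ratios of Equations 6 to 9 may be evaluated numerically as sketched below; the sizes and tiling counts are arbitrary values chosen only to show the trend as the number of tasks T grows:

```python
def read_ratio(size_W, size_A, size_dW, size_dA, N_W, N_A, T):
    # Equations 6 and 7: DRAM read accesses of the per-task data flow 910 (R_c)
    # and the shared data flow 920 (R_p); the return value follows Equation 8.
    r_c = (size_W + size_A * N_W) * T
    r_p = size_W + size_A * N_W + size_dW * N_A * T + size_dA * N_W * (T - 1)
    return r_p / r_c

def write_ratio(size_A, size_dA, T):
    # Equation 9: relative DRAM write accesses.
    return (size_A + size_dA * (T - 1)) / (size_A * T)

# Hypothetical sizes in arbitrary units.
for T in (2, 3, 4, 8):
    print(T, round(read_ratio(100, 50, 5, 5, N_W=3, N_A=2, T=T), 3),
          round(write_ratio(50, 5, T), 3))
```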

Accordingly, embodiments of the disclosure reduce the size of weights and the size of an activation map by, for example, a factor of 21.9 and a factor of 2.1, respectively, per task by sharing data between tasks. In addition, processing multiple tasks using a hardware architecture for minimizing DRAM access by utilizing the effects of SqueeD may reduce an increment in DRAM access and an increment in energy consumption by a factor of 2.2 and a factor of 1.3, respectively, per task.

The DWS algorithm, which is one of the algorithms according to an example embodiment, may prune and quantize delta weights of a model tuned from a pretrained model, and thus the size of weights stored for a task-specific model may decrease and a ratio of weights to be shared between tasks may increase. In addition, a DAS algorithm may be used for processing a single input image. The DAS algorithm compresses an activation map using a characteristic that intermediate activation maps between different tasks are highly similar.
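
For illustration only, the delta-weight pruning and quantization step mentioned above may be sketched as follows; the pruning ratio, bit width, and weight values are hypothetical assumptions rather than the specific DWS procedure:

```python
import numpy as np

def compress_delta_weights(base_w, task_w, prune_ratio=0.9, n_bits=8):
    # Keep only the delta between the fine-tuned and base weights, prune the
    # smallest-magnitude deltas, and symmetrically quantize the survivors.
    delta = task_w - base_w
    k = min(int(prune_ratio * delta.size), delta.size - 1)
    threshold = np.sort(np.abs(delta).ravel())[k]
    delta[np.abs(delta) < threshold] = 0.0          # magnitude pruning
    max_abs = float(np.max(np.abs(delta)))
    scale = max_abs / (2 ** (n_bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(delta / scale).astype(np.int8), scale

# Hypothetical base and fine-tuned weights.
base_w = np.random.randn(64, 64).astype(np.float32)
task_w = base_w + 0.02 * np.random.randn(64, 64).astype(np.float32)
delta_q, scale = compress_delta_weights(base_w, task_w)
```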

In some cases, it is possible to effectively process multiple different computer vision (CV) tasks using the same hardware architecture. Embodiments of the architecture described herein maximize the reuse of shared weights and/or activation maps between tasks and may thereby reduce off-chip DRAM access required to process multiple models.

FIG. 10 illustrates a computer vision system 1000. The computer vision system 1000 may include a computing device 1005, a server 1010, and a database 1015. The computer vision system 1000 may be capable of generating and applying one or more neural networks capable of performing multiple computer vision tasks on a computing device 1005 with limited hardware resources (e.g., limited processor or memory resources). The computing device 1005 may be an example of the apparatus 700 described with reference to FIG. 7, and may perform the multi-task processing methods described herein.

In some examples, the computing device 1005 is a personal computer, laptop computer, mainframe computer, palmtop computer, personal assistant, mobile device, or any other suitable processing apparatus. The computing device 1005 may include sensors (i.e., one or more cameras) for obtaining images which may be processed for multiple computer vision tasks simultaneously.

In one example, computing device 1005 may generate augmented reality images that depend on processing images in multiple ways, such as performing object detection and depth estimation. In another example, computing device 1005 may be located on a vehicle, and may perform tasks related to driving safety or navigation (e.g., for a self-driving vehicle). The efficiencies derived from using the methods described herein, including the SqueeD algorithm, may enable complex computer vision tasks that depend on multiple computer vision subtasks to be performed on a device with limited hardware resources, such as a personal computing device, a mobile computing device, or a vehicle computing device.

The computing device 1005 may operate one or more neural networks for performing multiple computer vision tasks. The neural networks may be trained at another device, such as on a server 1010. In some cases, parameters for one or more neural networks are trained on the server 1010 and transmitted to the computing device 1005. In other examples, parameters for one or more neural networks are trained prior to manufacturing the computing device 1005.

The server 1010 provides one or more functions to users linked by way of one or more of the various networks. In some cases, the server includes a single microprocessor board, which includes a microprocessor responsible for controlling all aspects of the server. In some cases, a server uses a microprocessor and protocols to exchange data with other devices/users on one or more of the networks via hypertext transfer protocol (HTTP) and simple mail transfer protocol (SMTP), although other protocols such as file transfer protocol (FTP) and simple network management protocol (SNMP) may also be used. In some cases, a server is configured to send and receive hypertext markup language (HTML) formatted files (e.g., for displaying web pages). In various embodiments, a server comprises a general-purpose computing device, a personal computer, a laptop computer, a mainframe computer, a supercomputer, or any other suitable processing apparatus.

In some cases, training data (e.g., training images for one or more computer vision tasks) for training the one or more machine learning models is stored at the database 1015. A database is an organized collection of data. For example, a database stores data in a specified format known as a schema. A database may be structured as a single database, a distributed database, multiple distributed databases, or an emergency backup database. In some cases, a database controller may manage data storage and processing in a database. In some cases, a user interacts with a database controller. In other cases, a database controller may operate automatically without user interaction.

FIG. 11 illustrates an application of a computer vision system. For example, aspects of a computer vision process 1100 may be performed by the computer vision system 1000 described with reference to FIG. 10.

At step 1105, a server may pretrain a base model for performing computer vision tasks. For example, the base model may be trained based on a relatively large amount of training data.

At step 1110, the server may fine-tune the base model for multiple specific tasks. For example, each of the task-specific models may be initialized with parameters from the base model and fine-tuned using a relatively small amount of task-specific training data. The task-specific models may be transmitted to a smaller device such as a mobile device or a vehicle for performing a complex computer vision application such as generating augmented reality (AR) images or navigating a vehicle.

At step 1115, the computing device obtains an input image to be processed and used to perform the computer vision task. In some cases, the input image is used for performing multiple computer vision subtasks simultaneously, such as object detection and depth perception.

At step 1120, the computing device processes the input image using multiple neural networks to obtain multiple task-specific outputs simultaneously. In some cases, the computing device processes the input image by storing values for one of the neural networks in a base form, and storing delta values for another neural network performing another computer vision task. Since many of the values at intermediate layers of the two neural networks are similar, processes of storing and retrieving the values can be performed efficiently by generating and storing delta values.

At step 1125, the computing device may use the output from the multiple computer vision tasks to perform a complex task such as generating AR images or navigating a vehicle (or another vehicle related task). The complex task may utilize inputs from multiple subtasks, such as object detection, classification, or depth perception. Since these tasks may be performed efficiently based on the methods described herein, the computing device may perform the complex task in real time with limited processing and memory resources.

The present description describes additional aspects of the methods, apparatuses, and/or systems related to the disclosure. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order.

Accordingly, the features described herein may be embodied in different forms and are not to be construed as being limited to the example embodiments described herein. Rather, the example embodiments described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application. As used herein, “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B, or C,” “at least one of A, B, and C,” and “at least one of A, B, or C,” may each include any one of the items listed together in the corresponding one of the phrases, or all possible combinations thereof. Terms, such as first, second, and the like, may be used herein to describe components. Each of these terminologies is not used to define an essence, order or sequence of a corresponding component but used merely to distinguish the corresponding component from other component(s). For example, a first component may be referred to as a second component, and similarly the second component may also be referred to as the first component. Throughout the disclosure, when an element is described as “connected to” or “coupled to” another element, it may be directly “connected to” or “coupled to” the other element, or there may be one or more other elements intervening therebetween. In contrast, when an element is described as “directly connected to” or “directly coupled to” another element, there may be no other elements intervening therebetween.

The terminology used herein is for describing various example embodiments only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms “comprises,” “includes,” and “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof.

Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and based on an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Also, in the description of example embodiments, description of structures or functions that are thereby known after an understanding of the disclosure of the present application will be omitted when it is deemed that such description will cause ambiguous interpretation of the example embodiments. Example embodiments are described with reference to the accompanying drawings, and like reference numerals in the drawings refer to like elements throughout.

The examples described herein may be implemented using hardware components, software components and/or combinations thereof. A processing device may be implemented using one or more general-purpose or special purpose computers, such as, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of responding to and executing instructions in a defined manner. The processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device also may access, store, manipulate, process, and create data in response to execution of the software. For purposes of simplicity, the description of a processing device is used as singular; however, one skilled in the art will appreciate that a processing device may include multiple processing elements and multiple types of processing elements. For example, a processing device may include multiple processors or a processor and a controller. In addition, different processing configurations are possible, such as parallel processors.

Software may include a computer program, a piece of code, an instruction, or some combination thereof, to independently or collectively instruct or configure the processing device to operate as desired. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical or virtual equipment, computer storage medium or device, or in a propagated signal wave capable of providing instructions or data to or being interpreted by the processing device. The software also may be distributed over network-coupled computer systems so that the software is stored and executed in a distributed fashion. The software and data may be stored by one or more non-transitory computer-readable recording mediums.

The methods according to the above-described examples may be recorded in non-transitory computer-readable media including program instructions to implement various operations of the above-described examples. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of examples, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM discs, DVDs, and/or Blu-ray discs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory (e.g., USB flash drives, memory cards, memory sticks, etc.), and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.

The above-described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described examples, or vice versa.

While this disclosure includes specific examples, it will be apparent to one of ordinary skill in the art that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.

Therefore, the scope of the disclosure is defined not by the description, but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.
