Liliana Andrade - Multi-Processor System-on-Chip 1


Multi-Processor System-on-Chip 1: summary, description and annotation


A Multi-Processor System-on-Chip (MPSoC) is the key component for complex applications. These applications put huge pressure on memory, communication devices and computing units. This book, presented in two volumes (Architectures and Applications), celebrates the 20th anniversary of MPSoC, an interdisciplinary forum that focuses on multi-core and multi-processor hardware and software systems. It is this interdisciplinarity that has led MPSoC to bring together experts in these fields from around the world over the last two decades.

Multi-Processor System-on-Chip 1 covers the key components of MPSoC: processors, memory, interconnect and interfaces. It describes advanced features of these components and the technologies used to build efficient MPSoC architectures. All the main components are detailed: use of memory and their technology, communication support and consistency, and specific processor architectures for general purposes or for dedicated applications.

Multi-Processor System-on-Chip 1: online excerpt


However, there are also good reasons to aim to reduce the number of processors. Lower cost is a key benefit, which is particularly relevant for low-cost IoT edge devices that are produced in high volumes. The use of fewer processors also reduces design complexity, as it simplifies the interconnect and memory subsystem required to integrate the processors. Furthermore, if multiple interacting functions are combined to be executed on a single processor, then this will limit data movements and reduce the software overhead for communication. An additional benefit for software developers is that a single tool chain can be used. To enable the flexible combination of functions, we need versatile processors that can efficiently execute different types of workloads, including control tasks, DSP and machine learning. Such processors are also referred to as DSP-enhanced RISC cores. They add a broad set of instructions for DSP and machine learning to a RISC core. If done well, the hardware overhead of these additions is small, for example, by sharing the register file and having unified functional units (e.g. a multiplier) for control processing, DSP and machine learning. Today, optimized DSP-enhanced RISC cores are available from IP vendors.

1.2.2. Configurability and extensibility

Integrated circuits for low-power IoT edge devices are often built using off-the-shelf processor IP that can be licensed from IP vendors. Since such licensable processors are multi-purpose by nature, to enable reuse across different customers and applications, they may not be optimal for efficiently implementing a specific set of application functions. However, some of these licensable processors offer support for customization by chip designers, in order to allow the processors to be tailored to the functions they need to perform for a specific application (Dutt and Choi 2003). More specifically, two mechanisms can be used to provide such customization capabilities:

– Configurability: the processor IP is delivered as a parameterized processor that can be configured by the chip designer for the targeted application. More specifically, unnecessary features can be deconfigured and optimal parameters can be selected for various architectural features. This may involve optimization of the compute capabilities, memory organization, external interfaces, etc. For example, the chip designer may configure the memory subsystem with closely coupled memories and/or caches. Configurability allows performance to be optimized for the application at hand, while reducing area and power consumption.

– Extensibility: the processor can be extended with custom instructions to enhance the performance for specific application functions. For the application at hand, the performance may be dominated by specific functions that execute critical code segments. The execution of such code segments may be accelerated dramatically by adding a few custom instructions. A further benefit of using these custom instructions is that the code size is reduced.

Both configurability and extensibility need to be used at design time. This must be supported by a tool chain (i.e. compiler, simulator, debugger) that is automatically enhanced to support the selected configuration and the added custom instructions. For example, the compiler must generate optimal code for the selected configuration while supporting programmers in using the custom instructions. Similarly, simulation models must support the selected configuration and include the custom instructions. If done properly, large performance gains can be achieved while optimizing area, power and code size, with a minimal impact on design time.
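As a minimal sketch of how a custom instruction might be exposed to the programmer, the C fragment below wraps a saturating multiply-accumulate behind a single name. The build flag `HAVE_SAT_MAC_EXT` and the intrinsic `__custom_sat_mac` are hypothetical names chosen for illustration; they do not come from the text or from any particular vendor tool chain. When the flag is absent, a portable C fallback with the same behavior is used, so the calling code is identical for both configurations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build flag: assumed to be defined by the tool chain only
 * when the selected processor configuration includes the extension. */
#ifdef HAVE_SAT_MAC_EXT
/* Hypothetical intrinsic name; a real tool chain would declare its own. */
extern int32_t __custom_sat_mac(int32_t acc, int16_t a, int16_t b);
#define sat_mac(acc, a, b) __custom_sat_mac((acc), (a), (b))
#else
/* Portable fallback: acc + a * b, saturated to the int32_t range. */
static int32_t sat_mac(int32_t acc, int16_t a, int16_t b)
{
    int64_t wide = (int64_t)acc + (int64_t)a * (int64_t)b;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
#endif

/* Dot product of two 16-bit vectors; the inner operation maps to the
 * custom instruction when it is configured, else to the fallback. */
static int32_t dot_s16(const int16_t *x, const int16_t *y, int n)
{
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = sat_mac(acc, x[i], y[i]);
    return acc;
}
```

For example, `dot_s16` applied to the vectors {1, 2, 3} and {4, 5, 6} returns 32. Keeping the intrinsic behind one name is a design choice that lets the same source be compiled for a processor with or without the extension.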

As an example of extensibility, we consider Viterbi decoding, which is a prominent function in an NB-IoT protocol stack for performing forward error correction (FEC) in the receiver. When using a straightforward software implementation on an off-the-shelf processor, this kernel becomes one of the most computationally intensive parts of an NB-IoT modem. Viterbi or similar FEC schemes are used in many communication technologies, especially in the IoT field, and are often a bottleneck in modem design.

In (Petrov-Savchenko and van der Wolf 2018), a processor extension for Viterbi decoding is presented using four custom instructions, which enhance the performance to just a few cycles per decoded bit. The instructions include a reset instruction, two instructions to calculate the path metrics and one instruction for the traceback. The instructions can be conveniently used as intrinsic instructions in the C source code. The resulting implementation reduces the worst-case MHz requirements for the Viterbi decoding function in an NB-IoT protocol stack to less than 1 MHz.
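The division of work into reset, path-metric update and traceback can be illustrated with a plain C software model. This is a sketch under stated assumptions, not the cited extension: it uses a toy rate-1/2 code with constraint length 3 (generators 7 and 5, octal) rather than the code used in NB-IoT, it uses hard decisions, and all function names are hypothetical. The source mentions two path-metric instructions without detailing how the work is split between them, so the model uses a single add-compare-select function for that step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSTATES 4    /* 2^(K-1) states for a toy code with K = 3 */
#define MAXBITS 64   /* capacity of this toy decoder */

typedef struct {
    unsigned pm[NSTATES];             /* path metric per state */
    uint8_t  surv[MAXBITS][NSTATES];  /* surviving predecessor per step */
    int      nsteps;
} vit_t;

/* Two coded bits emitted when leaving state s with input bit u
 * (generator polynomials 7 and 5, octal). */
static unsigned branch_out(unsigned s, unsigned u)
{
    unsigned r  = (u << 2) | s;
    unsigned g0 = ((r >> 2) ^ (r >> 1) ^ r) & 1u;
    unsigned g1 = ((r >> 2) ^ r) & 1u;
    return (g0 << 1) | g1;
}

/* Reference encoder, used only to produce test input. */
static void conv_encode(const uint8_t *bits, int n, uint8_t *sym)
{
    unsigned s = 0;
    for (int i = 0; i < n; i++) {
        sym[i] = (uint8_t)branch_out(s, bits[i]);
        s = ((unsigned)bits[i] << 1) | (s >> 1);
    }
}

/* Stand-in for the reset instruction: the decoder starts in state 0. */
static void vit_reset(vit_t *v)
{
    for (int s = 0; s < NSTATES; s++)
        v->pm[s] = (s == 0) ? 0u : 1000u;
    v->nsteps = 0;
}

/* Stand-in for the path-metric instructions: one add-compare-select
 * step over all states for a received 2-bit symbol rx. */
static void vit_acs(vit_t *v, unsigned rx)
{
    unsigned next[NSTATES];
    assert(v->nsteps < MAXBITS);
    for (unsigned ns = 0; ns < NSTATES; ns++) {
        unsigned u = ns >> 1;            /* input bit that leads to ns */
        unsigned best = 0, best_p = 0;
        for (unsigned b = 0; b < 2; b++) {
            unsigned p = ((ns & 1u) << 1) | b;   /* predecessor state */
            unsigned d = branch_out(p, u) ^ rx;  /* differing bits */
            unsigned m = v->pm[p] + (d & 1u) + (d >> 1);
            if (b == 0 || m < best) { best = m; best_p = p; }
        }
        next[ns] = best;
        v->surv[v->nsteps][ns] = (uint8_t)best_p;
    }
    memcpy(v->pm, next, sizeof next);
    v->nsteps++;
}

/* Stand-in for the traceback instruction: follow the survivors backwards
 * from the best final state and emit the decoded bits. */
static void vit_traceback(const vit_t *v, uint8_t *out)
{
    unsigned s = 0;
    for (unsigned k = 1; k < NSTATES; k++)
        if (v->pm[k] < v->pm[s]) s = k;
    for (int t = v->nsteps - 1; t >= 0; t--) {
        out[t] = (uint8_t)(s >> 1);
        s = v->surv[t][s];
    }
}

/* Decode n received symbols (n <= MAXBITS) into n bits. */
static void vit_decode(const uint8_t *sym, int n, uint8_t *out)
{
    vit_t v;
    vit_reset(&v);
    for (int i = 0; i < n; i++)
        vit_acs(&v, sym[i]);
    vit_traceback(&v, out);
}
```

In this model, the loop body of `vit_acs` is the part that dominates the cycle count in a straightforward software implementation, which is why it is a natural candidate for custom instructions. Encoding a short message with `conv_encode`, flipping one coded bit and passing the result to `vit_decode` recovers the original message, since this toy code can correct isolated single-bit errors.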

We note that the ability to extend the processor with custom instructions is radically different from adding an external hardware accelerator on a system bus. Using an external bus-based hardware accelerator requires data to be moved over a bus, with additional memory and synchronization requirements (e.g. through interrupts), thereby impacting area, cycles, power consumption and code size. When using custom instructions, these can be used directly in the software thread on the processor, accessing data that is available locally on the processor, in local registers or in local memory. Hence, there are no overheads for moving data to/from an accelerator or for performing explicit synchronization. It also greatly simplifies software development.

1.3. Machine learning inference

In this section, we investigate in detail the requirements and processor capabilities for efficient machine learning in low-power IoT edge devices. The common theme in machine learning is the use of algorithms that can learn without being explicitly programmed (Samuel 1959). As shown in Figure 1.1, in machine learning, we distinguish between training and inference.


Figure 1.1. Training and inference in machine learning

Training starts with an untrained model, for example a multi-layered neural network with a chosen graph structure. In these neural networks, each layer transforms input data into output data while applying sets of coefficients or weights. Using a machine learning framework such as Caffe or TensorFlow, the model is trained using a large training dataset. The result of the training is a trained model, for example, a neural network with its weights tuned for classifying input data into certain categories. Such categories may, for example, be the different types of human activity in the activity tracker device mentioned above.

Inference uses the trained model for processing input data captured by sensors to infer the complex patterns it has been trained to recognize. For example, it can check whether the input data matches one of the categories that a neural network has been trained for, such as “walking” or “sitting” in the activity tracker device. Therefore, upon inference, the trained model is applied to new data. Inference is typically performed in the field. In this chapter, we focus on inference rather than on training. More specifically, we address the efficient implementation of machine learning inference on a programmable processor.
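To make the inference step concrete, the following C sketch applies a single fully connected layer to a feature vector and picks the category with the highest score. It is an illustration only: the feature count, the weights and the biases are hypothetical values invented for this example, whereas in practice they would come out of training. Only the category names "walking" and "sitting" are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define N_IN  3   /* hypothetical: one motion feature per accelerometer axis */
#define N_OUT 2   /* number of categories */

static const char *const LABELS[N_OUT] = { "walking", "sitting" };

/* Hypothetical weights and biases standing in for a trained model. */
static const float W[N_OUT][N_IN] = {
    {  1.0f,  1.0f,  1.0f },
    { -1.0f, -1.0f, -1.0f },
};
static const float B[N_OUT] = { -1.5f, 1.5f };

/* One fully connected layer: out = W * in + B. */
static void dense(const float in[N_IN], float out[N_OUT])
{
    for (size_t o = 0; o < N_OUT; o++) {
        float acc = B[o];
        for (size_t i = 0; i < N_IN; i++)
            acc += W[o][i] * in[i];
        out[o] = acc;
    }
}

/* Index of the largest element of v; n must be at least 1. */
static size_t argmax(const float *v, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best]) best = i;
    return best;
}

/* Inference: apply the model to new input and return the category index. */
static size_t classify(const float in[N_IN])
{
    float scores[N_OUT];
    dense(in, scores);
    return argmax(scores, N_OUT);
}
```

With these made-up weights, the input {1.0, 1.0, 1.0} yields scores 1.5 and -1.5, so `LABELS[classify(in)]` is "walking", while the input {0.1, 0.1, 0.1} is classified as "sitting". The multiply-accumulate loop in `dense` is the kind of operation that the DSP and machine learning instructions of a processor would be expected to accelerate.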

The processing requirements of machine learning inference can vary widely for different applications. Some key factors impacting the processing requirements are:

– input data rate: this is the rate at which data samples are captured by the sensor(s). These samples can, for example, be pixels coming from a camera or pulse-code modulation (PCM) samples coming from a microphone. The input data rate can range from tens of samples per second, for example, for human activity recognition with a small number of sensors, to hundreds of millions of samples per second, for advanced computer vision with a high-resolution camera capturing images at a high frame rate;
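The range of input data rates quoted for this factor can be checked with simple arithmetic. The C helpers below are a sketch; the sensor parameters in the usage note (a 3-axis accelerometer sampled at 20 Hz and a 3840 x 2160 camera at 60 frames per second) are assumptions chosen for illustration and do not come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Samples per second for a multi-channel sensor. */
static uint64_t sample_rate(uint64_t channels, uint64_t per_channel_hz)
{
    assert(channels > 0);
    return channels * per_channel_hz;
}

/* Samples per second for a camera, counting one sample per pixel. */
static uint64_t pixel_rate(uint64_t width, uint64_t height, uint64_t fps)
{
    assert(width > 0 && height > 0);
    return width * height * fps;
}
```

Under these assumptions, `sample_rate(3, 20)` gives 60 samples per second, and `pixel_rate(3840, 2160, 60)` gives 497,664,000 samples per second, a difference of almost seven orders of magnitude between the two ends of the range.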
