The performance data for the RISC-V processor published in (Croome 2018) reports a total of 1.5 Mcycles for executing the CIFAR-10 graph on a highly parallel 8-core RISC-V architecture. To estimate the total number of cycles on a single RISC-V core, we consider that the performance is dominated by the cycles spent on the 5x5 convolutions, which constitute more than 98% of the compute operations in this graph. For these 5x5 convolutions, (Croome 2018) reports a speed-up from a 1-core system to an 8-core system of 18.5/2.2 = 8.2. Hence, a reasonable estimate for the total number of cycles on a single RISC-V core is 1.5 x 8.2 = 12.3 Mcycles.
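As a quick sanity check, this estimate can be reproduced from the figures quoted above. The sketch below is illustrative only; the variable names are ours, and the speed-up factor is used as reported rather than rederived.

```python
# Reproduce the single-core cycle estimate for the RISC-V core (processor B),
# using only the figures quoted above from (Croome 2018).

cycles_8core_mcycles = 1.5    # CIFAR-10 graph total on the 8-core RISC-V system
reported_speedup_8core = 8.2  # reported 1-core to 8-core speed-up for the 5x5 convolutions

# The 5x5 convolutions account for more than 98% of the compute operations,
# so scaling the 8-core total by their speed-up approximates the 1-core total.
cycles_1core_mcycles = cycles_8core_mcycles * reported_speedup_8core
print(f"Estimated single-core total: {cycles_1core_mcycles:.1f} Mcycles")  # ~12.3
```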
Table 1.4. Performance data for the CIFAR-10 CNN graph

# | Layer type      | ARC EM9D [Mcycles] | Processor A [Mcycles] | Processor B (RISC-V ISA) [Mcycles]
0 | Permute         | 0.01               | –                     | –
1 | Convolution     | 1.63               | 6.78                  | –
2 | Max Pooling     | 0.14               | 0.34                  | –
3 | Convolution     | 3.46               | 9.25                  | –
4 | Avg Pooling     | 0.09               | 0.09                  | –
5 | Convolution     | 1.76               | 4.88                  | –
6 | Avg Pooling     | 0.07               | 0.04                  | –
7 | Fully-connected | 0.03               | 0.02                  | –
8 | Fully-connected | 0.001              |                       | –
  | Total           | 7.2                | 21.4                  | 12.3
From Table 1.4, we conclude that the ARC EM9D processor spends about 3x fewer cycles than processor A and about 1.7x fewer cycles than the RISC-V core (processor B) for the same machine learning inference task, without using any dedicated accelerators. Thanks to this cycle efficiency, the ARC EM9D processor can be clocked at a low frequency, which helps to save power in a smart IoT edge device.
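The ratios above follow directly from the table totals. The short sketch below recomputes them and adds a purely illustrative clock-frequency calculation; the 10 inferences/s target is our assumption, not a figure from this chapter.

```python
# Recompute the cycle-count ratios from the totals in Table 1.4.
arc_em9d_mcycles = 7.2
processor_a_mcycles = 21.4
processor_b_mcycles = 12.3  # single-core RISC-V estimate

print(f"Processor A vs. ARC EM9D: {processor_a_mcycles / arc_em9d_mcycles:.1f}x cycles")  # ~3.0x
print(f"Processor B vs. ARC EM9D: {processor_b_mcycles / arc_em9d_mcycles:.1f}x cycles")  # ~1.7x

# Illustration of the low-clock-frequency argument: fewer cycles per inference
# translate directly into a lower required clock rate for a given frame rate.
# The 10 inferences/s target below is an assumed example value.
target_inferences_per_s = 10
required_clock_mhz = arc_em9d_mcycles * target_inferences_per_s
print(f"Clock needed for {target_inferences_per_s} inferences/s: ~{required_clock_mhz:.0f} MHz")
```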
Smart IoT edge devices that interact intelligently with their users are appearing in many application areas. These devices have diverse compute requirements, including a mixture of control processing, DSP and machine learning. Versatile processors are required to efficiently execute these different types of workloads. Furthermore, these processors must allow for easy customization to improve their efficiency for a specific application. Configurability and extensibility are two key mechanisms that provide such customization.
Increasingly, IoT edge devices apply machine learning technology for processing captured sensor data, so that smart actions can be taken based on recognized patterns. We presented key processor features and a software library for the efficient implementation of low/mid-end machine learning inference. More specifically, we highlighted several processor capabilities, such as vector MAC instructions and XY memory with advanced AGUs, that are key to the efficient implementation of machine learning inference. The ARC EM9D processor is a universal processor for low-power IoT applications which is both configurable and extensible. The complete and highly optimized embARC MLI library makes effective use of the ARC EM9D processor to efficiently support a wide range of low/mid-end machine learning applications. We demonstrated this efficiency with excellent results for the CIFAR-10 benchmark.
Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Cheng, Q., Chen, G., Chen, J., Chen, J., Chen, Z., Chrzanowski, M., Coates, A., Diamos, G., Ding, K., Du, N., Elsen, E., Engel, J., Fang, W., Fan, L., Fougner, C., Gao, L., Gong, C., Hannun, A., Han, T., Johannes, L.V., Jiang, B., Ju, C., Jun, B., LeGresley, P., Lin, L., Liu, J., Liu, Y., Li, W., Li, X., Ma, D., Narang, S., Ng, A., Ozair, S., Peng, Y., Prenger, R., Qian, S., Quan, Z., Raiman, J., Rao, V., Satheesh, S., Seetapun, D., Sengupta, S., Srinet, K., Sriram, A., Tang, H., Tang, L., Wang, C., Wang, J., Wang, K., Wang, Y., Wang, Z., Wang, Z., Wu, S., Wei, L., Xiao, B., Xie, W., Xie, Y., Yogatama, D., Yuan, B., Zhan, J., Zhu, Z. (2016). Deep speech 2: End-to-end speech recognition in English and Mandarin. Proceedings of the 33rd International Conference on Machine Learning – Volume 48, ICML-16, 173–182.
Croome, M. (2018). Using RISC-V in high computing, ultra-low power, programmable circuits for inference on battery operated edge devices [Online]. Available at: https://content.riscv.org/wp-content/uploads/2018/07/Shanghai-1325_GreenWaves_Shanghai-2018-MC-V2.pdf.
Dutt, N. and Choi, K. (2003). Configurable processors for embedded computing. IEEE Computer, 36(1), 120–123.
embARC Open Software Platform (2019). Available at: https://embarc.org/.
Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A.G., Adam, H., and Kalenichenko, D. (2017). Quantization and training of neural networks for efficient integer-arithmetic-only inference. Computing Research Repository. Available at: http://arxiv.org/abs/1712.05877.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R.B., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. Computing Research Repository. Available at: https://arxiv.org/abs/1408.5093.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
Lai, L., Suda, N., and Chandra, V. (2018). CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs. Computing Research Repository. Available at: http://arxiv.org/abs/1801.06601.
Petrov-Savchenko, A. and van der Wolf, P. (2018). Get smart with NB-IoT: Efficient low-cost implementation of NB-IoT for smart applications. Technical paper, Synopsys [Online]. Available at: https://www.synopsys.com/dw/doc.php/wp/NB_IoT_for_Smart_Applications.pdf.
Q Number Format (2019). Available at: https://en.wikipedia.org/wiki/Q_(number_format).
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9.
For a color version of all figures in this book, see www.iste.co.uk/andrade/multi1.zip.
2
A Qualitative Approach to Many-core Architecture
Benoît DUPONT DE DINECHIN
Kalray S.A., Grenoble, France
We present the design of the Kalray third-generation MPPA many-core processor, whose objectives are to combine the performance scalability of GPGPUs, the energy efficiency of DSP cores and the I/O capabilities of FPGA devices. These objectives are motivated by the consolidation of high-performance and high-integrity functions on a single computing platform for autonomous vehicles. High-performance computing functions, represented by deep learning inference and computer vision, need to execute under soft real-time constraints. High-integrity functions are developed under model-based design, and must meet hard real-time constraints. Finally, the third-generation MPPA processor integrates a hardware root of trust and implements a security architecture, in order to support trusted execution environments.