Albert Chun-Chen Liu - Artificial Intelligence Hardware Design

Title: Artificial Intelligence Hardware Design
Author: Albert Chun-Chen Liu

Artificial Intelligence Hardware Design: summary, description, and annotation

Artificial Intelligence Hardware Design: Challenges and Solutions
Learn foundational and advanced topics in Neural Processing Unit design with real-world examples from leading voices in the field.

Table of Contents

1 Cover

2 Series Page
   IEEE Press
   445 Hoes Lane
   Piscataway, NJ 08854
   IEEE Press Editorial Board
   Ekram Hossain, Editor in Chief
   Jón Atli Benediktsson
   Xiaoou Li
   Jeffrey Reed
   Anjan Bose
   Lian Yong
   Diomidis Spinellis
   David Alan Grier
   Andreas Molisch
   Saeid Nahavandi
   Elya B. Joffe
   Sarah Spurgeon
   Ahmet Murat Tekalp

3 Title Page

4 Copyright Page
   Copyright © 2021 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
   Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
   No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.
   Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
   For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
   Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
   Library of Congress Cataloging‐in‐Publication data applied for:
   ISBN: 9781119810452
   Cover design by Wiley
   Cover image: © Rasi Bhadramani/iStock/Getty Images

5 Author Biographies

6 Preface

7 Acknowledgments

8 Table of Figures

9 1 Introduction
   1.1 Development History
   1.2 Neural Network Models
   1.3 Neural Network Classification
   1.4 Neural Network Framework
   1.5 Neural Network Comparison
   Exercise
   References

10 2 Deep Learning
   2.1 Neural Network Layer
   2.2 Deep Learning Challenges
   Exercise
   References

11 3 Parallel Architecture
   3.1 Intel Central Processing Unit (CPU)
   3.2 NVIDIA Graphics Processing Unit (GPU)
   3.3 NVIDIA Deep Learning Accelerator (NVDLA)
   3.4 Google Tensor Processing Unit (TPU)
   3.5 Microsoft Catapult Fabric Accelerator
   Exercise
   References

12 4 Streaming Graph Theory
   4.1 Blaize Graph Streaming Processor
   4.2 Graphcore Intelligence Processing Unit
   Exercise
   References

13 5 Convolution Optimization
   5.1 Deep Convolutional Neural Network Accelerator
   5.2 Eyeriss Accelerator
   Exercise
   References

14 6 In‐Memory Computation
   6.1 Neurocube Architecture
   6.2 Tetris Accelerator
   6.3 NeuroStream Accelerator
   Exercise
   References

15 7 Near‐Memory Architecture
   7.1 DaDianNao Supercomputer
   7.2 Cnvlutin Accelerator
   Exercise
   References

16 8 Network Sparsity
   8.1 Energy Efficient Inference Engine (EIE)
   8.2 Cambricon‐X Accelerator
   8.3 SCNN Accelerator
   8.4 SeerNet Accelerator
   Exercise
   References

17 9 3D Neural Processing
   9.1 3D Integrated Circuit Architecture
   9.2 Power Distribution Network
   9.3 3D Network Bridge
   9.4 Power‐Saving Techniques
   Exercise
   References

18 Appendix A: Neural Network Topology

19 Index

20 End User License Agreement

List of Tables

1 Chapter 1
   Table 1.1 Neural network framework.

2 Chapter 2
   Table 2.1 AlexNet neural network model.

3 Chapter 3
   Table 3.1 Intel Xeon family comparison.
   Table 3.2 NVIDIA GPU architecture comparison.
   Table 3.3 TPU v1 applications.
   Table 3.4 Tensor processing unit comparison.

4 Chapter 5
   Table 5.1 Efficiency loss comparison.
   Table 5.2 DNN accelerator performance comparison.
   Table 5.3 Eyeriss v2 architectural hierarchy.
   Table 5.4 Eyeriss architecture.

5 Chapter 6
   Table 6.1 Neurocube performance comparison.

6 Chapter 8
   Table 8.1 SeerNet system performance comparison.

List of Illustrations

1 Chapter 1
   Figure 1.1 High‐tech revolution.
   Figure 1.2 Neural network development timeline.
   Figure 1.3 ImageNet challenge.
   Figure 1.4 Neural network model.
   Figure 1.5 Regression.
   Figure 1.6 Clustering.
   Figure 1.7 Neural network top 1 accuracy vs. computational complexity.
   Figure 1.8 Neural network top 1 accuracy density vs. model efficiency [14]....
   Figure 1.9 Neural network memory utilization and computational complexity [1...

2 Chapter 2
   Figure 2.1 Deep neural network AlexNet architecture [1].
   Figure 2.2 Deep neural network AlexNet model parameters.
   Figure 2.3 Deep neural network AlexNet feature map evolution [3].
   Figure 2.4 Convolution function.
   Figure 2.5 Nonlinear activation functions.
   Figure 2.6 Pooling functions.
   Figure 2.7 Dropout layer.
   Figure 2.8 Deep learning hardware issues [1].

3 Chapter 3
   Figure 3.1 Intel Xeon processor E5 2600 family Grantley platform ring archit...
   Figure 3.2 Intel Xeon processor scalable family Purley platform mesh archite...
   Figure 3.3 Two‐socket configuration.
   Figure 3.4 Four‐socket ring configuration.
   Figure 3.5 Four‐socket crossbar configuration.
   Figure 3.6 Eight‐socket configuration.
   Figure 3.7 Sub‐NUMA cluster domains [3].
   Figure 3.8 Cache hierarchy comparison.
   Figure 3.9 Intel multiple sockets parallel processing.
   Figure 3.10 Intel multiple socket training performance comparison [4].
   Figure 3.11 Intel AVX‐512 16 bits FMA operations (VPMADDWD + VPADDD).
   Figure 3.12 Intel AVX‐512 with VNNI 16 bits FMA operation (VPDPWSSD).
   Figure 3.13 Intel low‐precision convolution.
   Figure 3.14 Intel Xeon processor training throughput comparison [2].
   Figure 3.15 Intel Xeon processor inference throughput comparison [2].
   Figure 3.16 NVIDIA Turing GPU architecture.
   Figure 3.17 NVIDIA GPU shared memory.
   Figure 3.18 Tensor core 4 × 4 × 4 matrix operation [9].
   Figure 3.19 Turing tensor core performance [7].
   Figure 3.20 Matrix D thread group indices.
   Figure 3.21 Matrix D 4 × 8 elements computation.
   Figure 3.22 Different size matrix multiplication.
   Figure 3.23 Simultaneous multithreading (SMT).
   Figure 3.24 Multithreading schedule.
   Figure 3.25 GPU with HBM2 architecture.
   Figure 3.26 Eight GPUs NVLink2 configuration.
   Figure 3.27 Four GPUs NVLink2 configuration.
   Figure 3.28 Two GPUs NVLink2 configuration.
   Figure 3.29 Single GPU NVLink2 configuration.
   Figure 3.30 NVDLA core architecture.
   Figure 3.31 NVDLA small system model.
   Figure 3.32 NVDLA large system model.
   Figure 3.33 NVDLA software dataflow.
   Figure 3.34 Tensor processing unit architecture.
   Figure 3.35 Tensor processing unit floorplan.
   Figure 3.36 Multiply–Accumulate (MAC) systolic array.
   Figure 3.37 Systolic array matrix multiplication.
   Figure 3.38 Cost of different numerical format operation.
   Figure 3.39 TPU brain floating‐point format.
   Figure 3.40 CPU, GPU, and TPU performance comparison [15].
   Figure 3.41 Tensor Processing Unit (TPU) v1.
   Figure 3.42 Tensor Processing Unit (TPU) v2.
   Figure 3.43 Tensor Processing Unit (TPU) v3.
   Figure 3.44 Google TensorFlow subgraph optimization.
   Figure 3.45 Microsoft Brainwave configurable cloud architecture.
   Figure 3.46 Torus network topology.
   Figure 3.47 Microsoft Brainwave design flow.
   Figure 3.48 The Catapult fabric shell architecture.
   Figure 3.49 The Catapult fabric microarchitecture.
   Figure 3.50 Microsoft low‐precision quantization [27].
   Figure 3.51 Matrix‐vector multiplier overview.
   Figure 3.52 Tile engine architecture.
   Figure 3.53 Hierarchical decode and dispatch scheme.
   Figure 3.54 Sparse matrix‐vector multiplier architecture.
   Figure 3.55 (a) Sparse Matrix; (b) CSR Format; and (c) CISR Format.