Liliana Andrade - Multi-Processor System-on-Chip 2

Title:
Multi-Processor System-on-Chip 2
Author:
Liliana Andrade
Multi-Processor System-on-Chip 2: summary, description and annotation

A Multi-Processor System-on-Chip (MPSoC) is the key component for complex applications. These applications put huge pressure on memory, communication devices and computing units. This book, presented in two volumes – Architectures and Applications – celebrates the 20th anniversary of MPSoC, an interdisciplinary forum that focuses on multi-core and multi-processor hardware and software systems. This interdisciplinarity has brought together experts in these fields from around the world over the last two decades.

Multi-Processor System-on-Chip 2 covers application-specific MPSoC design, including compilers and architecture exploration. This second volume describes optimization methods and tools for optimizing and porting specific applications to MPSoC architectures. Details on compilation, power consumption and wireless communication are also presented, as well as examples of modeling frameworks and CAD tools. Explanations of specific platforms for automotive and real-time computing are also included.

2.3.3. Polar decoder

As discussed earlier, Polar codes have recently attracted much attention in the context of 5G. Successive Cancelation (SC), Successive Cancelation List (SCL) and BP are the most prominent decoding algorithms for Polar codes. Decoding corresponds to a breadth-first (BP) or depth-first (SC, SCL) traversal of the Polar Factor Tree (PFT) (Alamdar-Yazdi and Kschischang 2011), in which the received log-likelihood ratios from the channel are processed by the tree nodes. BP needs a large number of iterations to achieve the error correction performance of SC and is thus not well suited for very high throughput and low latency (Abbas et al. 2017). SC and SCL, on the other hand, behave sequentially because of the mandatory depth-first tree traversal. To achieve a very high throughput, the tree traversal can be unrolled and pipelined (Giard et al. 2015), very similar to the iteration unrolling in Turbo code and LDPC code decoding described above. Whenever a node is visited during the tree traversal, a corresponding pipeline stage can be instantiated. For a block length of N (= number of leaves in the PFT), the maximum number of pipeline stages is therefore 2 · (2N − 2) + 1, and all N · log N operations are executed in parallel, i.e. P = 1. Obviously, the complexity of the decoding architecture is directly proportional to the size of the PFT. However, for a given code, the tree can be reduced by various transformations. For example, if a subtree represents a repetition code or a parity check code, it can be replaced by a single node. Similarly, we can merge rate-0 and rate-1 nodes into their parent nodes (Sarkis et al. 2014) or use majority logic decoding in subtrees. The achievable tree reduction strongly depends on the underlying Polar code. SC is the best candidate to achieve throughput towards 1 Tbit/s. Here, we consider a (1,024, 512) Polar code that is decoded with the SC algorithm.
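
The node and stage counts above, and the idea of collapsing special subtrees, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the (8, 4) frozen-bit pattern is an example and not the (1,024, 512) code considered here, and each special subtree is simply counted as a single node rather than merged into its parent.

```python
def pft_nodes(n):
    """Nodes of the full binary Polar Factor Tree with n leaves."""
    return 2 * n - 1


def max_pipeline_stages(n):
    """Maximum number of pipeline stages for block length n: 2 * (2n - 2) + 1."""
    return 2 * (2 * n - 2) + 1


def classify(frozen):
    """Classify a subtree by its frozen-bit pattern (True = frozen bit).

    Returns a label if the subtree is one of the special codes named in
    the text, otherwise None.
    """
    if all(frozen):
        return "rate-0"          # no information bits at all
    if not any(frozen):
        return "rate-1"          # only information bits
    if all(frozen[:-1]) and not frozen[-1]:
        return "repetition"      # only the last bit carries information
    if frozen[0] and not any(frozen[1:]):
        return "parity-check"    # only the first bit is frozen
    return None


def pruned_node_count(frozen):
    """Count tree nodes when every special subtree collapses to one node."""
    if len(frozen) == 1 or classify(frozen) is not None:
        return 1
    half = len(frozen) // 2
    return 1 + pruned_node_count(frozen[:half]) + pruned_node_count(frozen[half:])


# Block length used in the text.
N = 1024
full_nodes = pft_nodes(N)
full_stages = max_pipeline_stages(N)

# Hypothetical (8, 4) example: information bits at positions 3, 5, 6, 7.
example_frozen = [True, True, True, False, True, False, False, False]
example_full = pft_nodes(len(example_frozen))
example_pruned = pruned_node_count(example_frozen)
```

For the example pattern, the left half is a repetition code and the right half a parity check code, so the 15-node tree shrinks to a root with two leaf-like nodes. How far a real code shrinks depends on its frozen-bit pattern, as noted above.
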
Again we use the same technology and PVT setup as for the Turbo and LDPC decoders. The unoptimized PFT has 2,047 nodes, corresponding to 4,093 pipeline stages. Performing the aforementioned tree optimization yields a reduction to 385 stages. Implementing this tree in a fully pipelined architecture would require about 310 KB of pipeline memory. This huge memory requirement illustrates a further challenge for high-throughput decoder architectures. Although pipelining enables the highest throughput for decoders, these architectures suffer from large latency and, more importantly, large power consumption. A huge amount of data has to be stored inside the pipelined architecture. These data are stored in registers that have to be driven by a clock tree. If a design contains many registers, the clock tree can become the dominant power consumer. Hence, minimizing the register load is a main optimization goal. On the code level, this can be done with the aforementioned tree optimizations. On the architectural level, register balancing (retiming) is a very efficient technique to further reduce the number of pipeline stages. Advanced retiming with a frequency constraint of 700 MHz results in a partially pipelined architecture with only 105 pipeline stages (85 KB memory). The power consumption of this design is 5.7 W, of which more than 70% is consumed in registers and the clock tree. In a further step, some of the registers can be replaced by latches, which have a much lower load. Latch-based design with some further optimizations reduces the area from 3.14 mm² to 2.79 mm² and the power consumption from 5.7 W to 2.7 W. The final (coded) throughput is 664 Gbit/s. Registers/latches and the clock tree account for only 32% of the overall power. Figure 2.2 (right) shows the layout of this Polar decoder. Each color represents one of the 105 pipeline stages; black areas are memory. The decoder was fully automatically optimized and generated with the framework presented in Kestel et al. (2018b) and Lehnigk-Emden et al. (2019).
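
To put the reported figures into perspective, the following sketch derives a few secondary metrics from them. Only the input numbers come from the text; the derived quantities (energy per bit, area efficiency, power density, implied clock rate) and all names are illustrative. The implied clock rate additionally assumes that the decoder emits one complete 1,024-bit codeword per clock cycle, which the text does not state explicitly.

```python
# Figures quoted in the text for the final latch-based Polar decoder.
THROUGHPUT_GBPS = 664.0   # coded throughput in Gbit/s
POWER_W = 2.7             # total power in W
AREA_MM2 = 2.79           # area in mm^2
BLOCK_LENGTH = 1024       # code bits per codeword


def energy_per_bit_pj(power_w, throughput_gbps):
    """Energy per coded bit in pJ: power divided by bit rate."""
    return power_w / (throughput_gbps * 1e9) * 1e12


def area_efficiency_gbps_per_mm2(throughput_gbps, area_mm2):
    """Throughput per unit area in Gbit/s/mm^2."""
    return throughput_gbps / area_mm2


def power_density_w_per_mm2(power_w, area_mm2):
    """Power per unit area in W/mm^2."""
    return power_w / area_mm2


def implied_clock_mhz(throughput_gbps, block_length):
    """Clock rate in MHz, ASSUMING one codeword leaves the pipeline per cycle."""
    return throughput_gbps * 1e9 / block_length / 1e6


energy_pj = energy_per_bit_pj(POWER_W, THROUGHPUT_GBPS)
area_eff = area_efficiency_gbps_per_mm2(THROUGHPUT_GBPS, AREA_MM2)
density = power_density_w_per_mm2(POWER_W, AREA_MM2)
clock_mhz = implied_clock_mhz(THROUGHPUT_GBPS, BLOCK_LENGTH)

# Pipeline memory per stage before and after retiming (KB per stage).
mem_per_stage_full = 310 / 385
mem_per_stage_retimed = 85 / 105
```

Under these assumptions the design spends roughly 4 pJ per coded bit at close to 1 W/mm², and the memory per pipeline stage stays at about 0.8 KB both before and after retiming, so the memory saving follows almost entirely from the smaller stage count.
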

2.4. Conclusion

We have shown that throughputs towards 1 Tbit/s are feasible for all three code classes by appropriate “unrolling”, using heavy pipelining and spatial parallelism. However, this architectural approach is limited to small block sizes and small numbers of iterations (Turbo and LDPC codes), which negatively impacts the communications performance. Moreover, although pipelining largely increases the throughput and locality, it also increases the latency. All architectures suffer from limited flexibility in terms of block sizes (all three codes), varying number of iterations (Turbo and LDPC codes) and code rate flexibility (LDPC and Polar codes). In summary, the biggest challenge for very high-throughput decoder architectures lies in the improvement of the communications performance, under the aforementioned implementation constraints and providing block size, code rate and algorithmic flexibility. As discussed in the introduction, microelectronic progress will largely contribute to an improved area efficiency but not as much to an increased performance and a reduced power density. Thus, further research is mandatory to keep pace with the increasing requirements on communication systems in terms of throughput, latency, power/energy efficiency, flexibility, cost and communications performance.

2.5. Acknowledgments

We gratefully acknowledge financial support by the EU (project-ID: 760150-EPIC).

2.6. References

Abbas, S.M., Fan, Y., Chen, J., and Tsui, C.Y. (2017). High-throughput and energy-efficient belief propagation polar code decoder. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(3), 1098–1111.

Alamdar-Yazdi, A. and Kschischang, F.R. (2011). A simplified successive-cancellation decoder for polar codes. IEEE Communications Letters, 15(12), 1378–1380.

Amdahl, G.M. (2013). Computer architecture and Amdahl’s law. Computer, 46(12), 38–46.

Arikan, E. (2009). Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55(7), 3051–3073.

Berrou, C., Glavieux, A., and Thitimajshima, P. (1993). Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1. Proceedings of ICC ’93 - IEEE International Conference on Communications, vol. 2, 1064–1070.

EPIC (2020). Enabling Practical Wireless Tb/s Communications with Next Generation Channel Coding [Online]. Available at: https://epic-h2020.eu/results.

Fettweis, G.P. and Matus, E. (2017). Scalable 5G MPSoC architecture. 2017 51st Asilomar Conference on Signals, Systems, and Computers, 613–618.

Gallager, R. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, 8(1), 21–28.

Ghanaatian, R., Balatsoukas-Stimming, A., Müller, T.C., Meidlinger, M., Matz, G., Teman, A., and Burg, A. (2018). A 588-Gb/s LDPC decoder based on finite-alphabet message passing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26(2), 329–340.

Giard, P., Sarkis, G., Thibeault, C., and Gross, W.J. (2015). 237 Gbit/s unrolled hardware polar decoder. Electronics Letters, 51(10), 762–763.

Horowitz, M. (2014). Computing’s Energy Problem: (and what we can do about it). Keynote in 2014 IEEE International Solid-State Circuits Conference. [Online]. Available at: http://eecs.oregonstate.edu/research/vlsi/teaching/ECE471_WIN15/mark_horowitz_ISSCC_2014.pdf

Ilnseher, T., Kienle, F., Weis, C., and Wehn, N. (2012). A 2.15GBit/s turbo code decoder for LTE advanced base station applications. 2012 7th International Symposium on Turbo Codes and Iterative Information Processing (ISTC), 21–25.