The CS/ECE 4/599 Course Blog

Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing

by Sam Shaaban (Blogger), Isaac Lonergan (Presenter), Mykyta "Nick" Syntsia (Presenter), Darren Mai (Scribe), Shubhangi Pandey (Scribe), Nat Rurka, Adam Bobich

Introduction

Methods For Improving Performance

Without the ability to increase clock rates, performance gains mainly come from reducing memory latency and increasing parallelism.

Instruction Level Parallelism (ILP)

Thread Level Parallelism (TLP)

Architecture

Top Level

Scalable Architecture

Designed to scale easily and to allow custom configurations of up to 1024 nodes, where each node is one of the processor or I/O chips described below.

Processor Chips

Processing Node Diagram

I/O Chips

I/O Chip Diagram

Components

Shared L2 Cache

Intra-chip Switch

Protocol Engine

Performance Results

Piranha Performance Comparison to Next-Gen OoO Processors

Class Discussion

Conclusion

Around 5 years after this paper was published, dual-core CPUs began to enter the market. Nowadays, it would be difficult to find anything, even a phone or Chromebook, with fewer than 8 cores. This shows how effective the ideas in this paper were at improving performance, not only for commercial workloads but for desktop workloads as well. Through simulation and modeling, the authors demonstrated the efficacy of a design that scales by adding more cores instead of making each core more complex. Although this approach can hurt performance on workloads that use only a single core, the large potential performance gain has proven worth that cost.
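To make the trade-off between many simple cores and one complex core concrete, here is a minimal sketch based on Amdahl's law. It is not taken from the paper: the function name `chip_speedup`, the relative core speed of 0.5, and the parallel fractions are all hypothetical values chosen only for illustration, and the model ignores memory latency and communication costs entirely.

```python
def chip_speedup(parallel_fraction, cores, core_speed):
    """Speedup over a single baseline core of speed 1.0, per Amdahl's law.

    parallel_fraction: share of the work that can run on all cores at once.
    cores:             number of cores on the chip.
    core_speed:        speed of each core relative to the baseline core
                       (hypothetical; a simpler core is assumed slower).
    """
    serial_time = (1.0 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (core_speed * cores)
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Hypothetical comparison: eight simple cores at half speed
    # versus one complex core at full speed.
    for p in (0.0, 0.5, 0.95):
        many_simple = chip_speedup(p, cores=8, core_speed=0.5)
        one_complex = chip_speedup(p, cores=1, core_speed=1.0)
        print(f"parallel fraction {p:.2f}: "
              f"8 simple cores = {many_simple:.2f}x, "
              f"1 complex core = {one_complex:.2f}x")
```

Under these assumed numbers, the eight simple cores lose on a purely serial workload and win once most of the work can run in parallel, which mirrors the single-core cost and multi-core benefit described above.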

AI Disclosure