Building a Pipelined CPU Core - An Introduction to Computer Architecture and Digital Design

Introduction

Computing has evolved to the point where it is practically impossible to understand fully without several layers of abstraction hiding the complexity. A programmer no longer needs to interact directly with the machine: high-level programming languages (which simply means abstracted programming languages) let you write what you want the computer to do in a way that’s more easily understood by humans, making it significantly easier to express complex algorithms and mathematical ideas in general.

Hardware Abstraction

High-level programming languages are usually transformed into machine code with the help of compilers, assemblers and linkers. In a nutshell, their job is to take your written code and “convert” it into the actual instructions a computer can “understand”. Note that this allows the same piece of code to target different instruction set architectures: you don’t need to rewrite the code itself, you just need to recompile it.

LLVM Compiler Infrastructure

But what is an instruction set architecture? An ISA defines which instructions are available, what their behaviour is and how they are encoded. It also defines the set of registers, what they represent, how you can access them, the memory model and the levels of privilege. You might feel tempted to think the ISA dictates how a given processor works internally, but this is not true. The ISA is just the software/hardware interface. It’s merely a standard that needs to be followed if you want the same compiled programs to work on a wide array of processors with different characteristics, without having to recompile them for each one.

How can CPUs with the same ISA be different then? To answer this, I first want you to think about it in a software way. The action of sorting only has a single valid output (the sorted result), but you can achieve the same thing using different algorithms (bubble sort, merge sort, quick sort, etc). It’s the exact same thing for an ISA. An instruction defines what the output should be for a given input, but it does not define how this output should be computed.
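To make the idea concrete, here is a minimal Python sketch of two ways to compute what a 32-bit add instruction specifies (the sum, wrapped to 32 bits). The function names `add_native` and `add_ripple_carry` are made up for this illustration and do not come from any real core. The second one mimics a chain of 1-bit full adders, which is one of several ways hardware could do it.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as RV32I registers would


def add_native(a: int, b: int) -> int:
    """Let the host do the addition, then wrap the result to 32 bits."""
    return (a + b) & MASK32


def add_ripple_carry(a: int, b: int) -> int:
    """Add bit by bit, passing the carry along like a chain of full adders."""
    result = 0
    carry = 0
    for i in range(32):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i
    return result  # the final carry out is dropped, which gives the wrap-around


if __name__ == "__main__":
    for a, b in [(1, 2), (0xFFFFFFFF, 1), (0x12345678, 0x0FEDCBA9)]:
        assert add_native(a, b) == add_ripple_carry(a, b)
    print("both implementations agree")
```

Software running on top cannot tell which of the two was used. It only sees the result, and that result is the only thing the ISA pins down.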

The field of computer architecture comprises everything related to the design of computing systems at an abstracted level, which means understanding ISAs, pipelines, superscalar CPUs, out-of-order execution, caching, GPUs, TPUs, etc. The field of digital design is responsible for implementing said things, usually through RTL design using an HDL such as VHDL or SystemVerilog.

The following sections will guide you through the implementation of a scalar, in-order, 5-stage pipelined RV32I CPU core written in VHDL. This might sound relatively complicated right now, but if you continue you’ll see it’s actually quite simple.

The RV32I ISA

RISC-V is an open and royalty-free ISA that has been gaining a lot of traction in the last few years. Think of it as a competitor to x86 and ARM (you’ve probably heard those two names several times, right?) that allows universities and companies to innovate freely, building accelerators and custom domain-specific systems. The ISA aims to be the Linux of hardware, and I highly recommend reading my blog post on open-hardware.

Before continuing, I'd like to make it clear that RISC-V is not only relevant because it's royalty-free. The ISA itself is actually very competitive, and the code produced by the compiler is usually not larger than its x86 or ARM equivalents. The following image shows this comparison on SPEC. Please check the entire presentation from Christopher Celio.

RISC-V vs x86 and ARM

Without further ado, I’ll briefly introduce the RISC-V ISA so we can understand what we’ll be implementing in the next sections.

RISC-V is a modular ISA, which means not all instructions (or registers) need to be implemented if your implementation doesn’t need them. The base set with only the integer instructions is called the I subset, and it exists in 32-bit, 64-bit and 128-bit versions: RV32I, RV64I and RV128I, respectively. The I subset defines 32 general purpose registers, but a reduced version for embedded systems exists with only 16, the E subset (RV32E and RV64E).

Extensions to the base set simply get their letters appended to the name, so for example a CPU that implements the multiplication, atomics and compressed instructions extensions on top of RV32E has an RV32EMAC ISA. Some extensions can be “grouped” together to simplify the naming scheme, so instead of having RV64IMAFDC with all extensions explicitly stated, we have RV64GC, where G means general purpose and is shorthand for IMAFD.
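The naming scheme can be sketched as a small Python parser. This is only an illustration under simplifying assumptions: the function name `parse_isa_name` is hypothetical, it only handles the single-letter extensions mentioned in this post, and it treats G as plain IMAFD. Real ISA strings can also carry multi-letter extension names, which this sketch rejects.

```python
import re

G_EXPANSION = "IMAFD"  # assumption: G expands to exactly these letters


def parse_isa_name(name: str) -> dict:
    """Split a name such as 'RV64GC' into width, base set and extensions."""
    # Base letter is I, E or G; the remaining letters must not repeat them.
    match = re.fullmatch(r"RV(32|64|128)([IEG])([A-DFHJ-Z]*)", name.upper())
    if match is None:
        raise ValueError(f"unsupported ISA name: {name!r}")
    width, base, rest = match.groups()
    letters = (base + rest).replace("G", G_EXPANSION)
    extensions = []
    for letter in letters[1:]:
        if letter not in extensions:  # ignore duplicates, keep order
            extensions.append(letter)
    return {"xlen": int(width), "base": letters[0], "extensions": extensions}


if __name__ == "__main__":
    print(parse_isa_name("RV32EMAC"))
    print(parse_isa_name("RV64GC"))
```

With this sketch, RV64GC and RV64IMAFDC parse to the same result, which is the whole point of the G shorthand.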

The list of extensions is always changing and I recommend checking the most recent ISA manual PDF.

Instructions are grouped into encoding formats by the kind of operation they perform, and the least significant bits tell the decoder how long the instruction is (which matters when there’s support for the C extension, since compressed instructions are 16 bits long instead of 32).
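As a rough sketch of that length check in Python: if the two lowest bits are anything other than `11`, the instruction is a 16-bit compressed one, otherwise it is a standard 32-bit instruction. The function name `instruction_length_bits` is made up for this example. The bit patterns the spec sets aside for encodings longer than 32 bits are simply rejected here, since they are out of scope for an RV32I core.

```python
def instruction_length_bits(first_halfword: int) -> int:
    """Return the instruction length given its lowest 16 bits."""
    if first_halfword & 0b11 != 0b11:
        return 16  # compressed instruction (C extension)
    if (first_halfword >> 2) & 0b111 != 0b111:
        return 32  # standard 32-bit instruction
    # Patterns ending in 11111 are set aside for longer encodings.
    raise ValueError("encodings longer than 32 bits are not handled here")


if __name__ == "__main__":
    addi = 0x00500093   # addi x1, x0, 5
    c_nop = 0x0001      # c.nop
    print(instruction_length_bits(addi & 0xFFFF))
    print(instruction_length_bits(c_nop))
```

A core without the C extension, like the one built here, only ever sees the 32-bit case.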