32 Bit Risc Processor
Abstract
This project designs a 32-bit RISC processor and develops VHDL code to implement it on a
Cyclone board. The design is a simple multi-cycle processor, developed with a top-down
approach. The processor is divided into two major parts, the control unit and the datapath,
which are developed independently and then integrated to form the complete processor.
By
G.Dinesh
Roll no:03425
Contents
1. Overview of Microprocessor
2. RISC vs CISC Processor
3. RISC processor Implementation—Single Cycle and Multi Cycle
processor.
4. Processor—top level entity
I. Instruction Format
II. Control unit
III. Datapath
a. ALU
b. ALU controller
c. Memory
d. Register file
e. Mux-32
f. Mux-5
g. Mux_sel
h. Pc Register
i. Instruction Register
j. Instruction split
k. Extender
5. Cyclone EP1C6 devices
Overview of Microprocessor:
The circuit for the microprocessor can be divided into two parts: the datapath and the
control unit as shown in the figure.
The datapath is responsible for the actual execution of all operations performed by the
microprocessor, such as the addition inside the arithmetic logic unit (ALU). The datapath
also includes the registers for the temporary storage of data. The functional units
inside the datapath (ALU, shifter, etc.) and the registers are connected together with
multiplexers and buses to form one unit, the datapath.
Although the datapath is capable of performing all the operations of the microprocessor,
it cannot do so on its own. For the datapath to execute operations automatically, a
control unit is required. The control unit, also known as the controller, controls the
operations of the datapath and therefore of the entire microprocessor. The controller is a
finite state machine (FSM): it executes by moving from one state to another, and it has
only a finite number of states. The controller is made up of three parts: the next-state
logic, the state memory, and the output logic. The state memory remembers the current
state of the FSM. The next-state logic determines what the next state should be. The
output logic generates the control signals that drive the datapath.
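The three parts of the controller can be sketched in VHDL as follows. This is only an
illustrative two-state machine, not the control unit of this processor; the entity name
fsm_sketch, the ports reset, start and busy, and the states idle and run are all
hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-state controller, for illustration only.
entity fsm_sketch is
  port (clk   : in  std_logic;
        reset : in  std_logic;
        start : in  std_logic;   -- status input (assumed)
        busy  : out std_logic);  -- control output (assumed)
end fsm_sketch;

architecture rtl of fsm_sketch is
  type state_t is (idle, run);
  signal state, next_state : state_t;
begin
  -- State memory: remembers the current state.
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= idle;
      else
        state <= next_state;
      end if;
    end if;
  end process;

  -- Next-state logic: combinational function of the state and the inputs.
  process (state, start)
  begin
    case state is
      when idle =>
        if start = '1' then
          next_state <= run;
        else
          next_state <= idle;
        end if;
      when run =>
        next_state <= idle;
    end case;
  end process;

  -- Output logic: control signals derived from the current state.
  busy <= '1' when state = run else '0';
end rtl;
```

Keeping the three parts in separate statements makes it easy to see which logic is clocked
(only the state memory) and which is purely combinational.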
Every digital logic circuit, regardless of whether it is part of the control unit or the
datapath, is categorized as either a combinational circuit or a sequential circuit. A
combinational circuit is one where the output of the circuit is dependent only on the
current inputs to the circuit. For example, an adder circuit is a combinational circuit. It
takes two numbers as inputs. When given the two inputs, the adder outputs the sum of the
two numbers as the output. A sequential circuit, on the other hand, is dependent not only
on the current inputs but also on all the previous inputs. In other words, a sequential
circuit has to remember its past history.
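The difference can be illustrated with a small sketch that places a combinational adder
next to a sequential accumulator. The entity name comb_vs_seq, the 8-bit width and the
port names are assumptions made for this example only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: combinational adder next to a sequential accumulator.
entity comb_vs_seq is
  port (clk  : in  std_logic;
        a, b : in  std_logic_vector(7 downto 0);
        sum  : out std_logic_vector(7 downto 0);   -- depends only on a and b
        acc  : out std_logic_vector(7 downto 0));  -- depends on the history of a
end comb_vs_seq;

architecture rtl of comb_vs_seq is
  signal acc_r : unsigned(7 downto 0) := (others => '0');
begin
  -- Combinational: the output follows the current inputs, no memory involved.
  sum <= std_logic_vector(unsigned(a) + unsigned(b));

  -- Sequential: the register remembers the running total of past inputs.
  process (clk)
  begin
    if rising_edge(clk) then
      acc_r <= acc_r + unsigned(a);
    end if;
  end process;

  acc <= std_logic_vector(acc_r);
end rtl;
```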
Design:
Digital circuits can be designed at any one of several abstraction levels. Designing at the
transistor level, which is the lowest level, we would be dealing with discrete transistors
and connecting them together to form the circuit. The next level up in the abstraction is
the gate level. At this level we are working with logic gates to build the circuit. At the
gate level, we can also specify the circuit using either a truth table or a Boolean equation.
Using logic gates, a designer usually creates combinational and sequential components to
be used in building larger circuits. In this way a very large circuit such as a
microprocessor can be built in a hierarchical fashion. Experience with design methodologies
shows that solving a problem hierarchically is usually easier than trying to solve the
entire problem as a whole. These combinational and sequential components are used at the
register-transfer level in building the datapath and the control unit in the
microprocessor. At the register-transfer level, we are concerned about how the data is
transferred between the various registers and functional units to realize or solve the
problem at hand. Finally, at the highest level, which is the behavioral level, we construct
the circuit by describing the behavior or operation of the circuit using a hardware
description language.
Here we use VHDL to describe the circuit at the register-transfer level.
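As a small illustration of hierarchical design, the sketch below describes a 1-bit full
adder with gate-level Boolean equations and then reuses it structurally to build a 4-bit
adder. The entities full_adder and adder4 are hypothetical and are not part of the
processor developed in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Gate level: a 1-bit full adder given as Boolean equations.
entity full_adder is
  port (a, b, cin : in  std_logic;
        s, cout   : out std_logic);
end full_adder;

architecture gate of full_adder is
begin
  s    <= a xor b xor cin;
  cout <= (a and b) or (cin and (a xor b));
end gate;

library ieee;
use ieee.std_logic_1164.all;

-- One level up: a 4-bit ripple-carry adder built from four full adders.
entity adder4 is
  port (x, y : in  std_logic_vector(3 downto 0);
        sum  : out std_logic_vector(3 downto 0);
        cout : out std_logic);
end adder4;

architecture structural of adder4 is
  signal c : std_logic_vector(4 downto 0);  -- carry chain
begin
  c(0) <= '0';
  gen : for i in 0 to 3 generate
    fa : entity work.full_adder
      port map (a => x(i), b => y(i), cin => c(i),
                s => sum(i), cout => c(i + 1));
  end generate;
  cout <= c(4);
end structural;
```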
RISC vs CISC
A Complex Instruction Set Computer (CISC) provides a large and powerful range of
instructions, but is less flexible to implement. For example, the 8086 microprocessor
family has these jump instructions:
JA Jump if Above
JAE Jump if Above or Equal
JB Jump if Below
...
JPO Jump if Parity Odd
JS Jump if Sign
JZ Jump if Zero
There are 32 jump instructions in the 8086, and the 80386 adds more. The primary goal
of CISC architecture is to complete a task in as few lines of assembly as possible. This is
achieved by building processor hardware that is capable of understanding and executing a
series of operations.
A Reduced Instruction Set Computer (RISC) uses only simple instructions. Because these are
much simpler, they can be implemented directly in silicon and so run at the maximum
possible speed. There are only two jump instructions in the ARM processor: Branch and
Branch with Link.
Most modern CISC processors, such as the Pentium, use a fast RISC core with an
interpreter sitting between the core and the instructions.
In spite of the above advantages, a RISC processor has the disadvantage that programs
tend to be longer. Software for RISC processors must issue more instructions than software
for traditional CISC processors to perform the same task.
In a multi-cycle processor, each instruction is divided into a number of small steps, each
of which is executed in one clock cycle. The datapath is partitioned into roughly
equal-sized chunks to minimize the cycle time. The advantage is that instructions need not
all take the same amount of time. The cycle time is determined by the combinational delay
of the datapath and control path within a single step.
In the following pages, we develop a multi-cycle 32-bit RISC processor using a top-down
approach.
Processor:
It is the top-level design entity, which structurally combines the control unit and the
datapath.
Entity:
Here InstrWr is an active-high signal used to write instructions and data into the
memory asynchronously. Instr_data<31:0> is the data or instruction to be written, and
Instradr<31:0> is the address of the location.
VHDL Code:
--The following package code is given in appendix.
use work.cpu_lib.ALL;--this includes a package, which defines various data types used in
--the program
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity processor is
port(clk:in std_logic;
InstrWr:in std_logic;
Instradr:in std_logic_vector(size downto 0);
Instr_data:in std_logic_vector(size downto 0);
r:out std_logic_vector(size downto 0));--defining the ports
end processor;
architecture Behavioral of processor is
--Datapath component declaration
component datapath is
Port ( IorD : in std_logic;
MemWr : in std_logic;
MemRd : in std_logic;
InstrWr:in std_logic;
IRwr : in std_logic;
RegDst : in std_logic;
RegWr : in std_logic;
ExtOp : in std_logic;
MemtoReg : in std_logic;
ALUSelA : in std_logic;
ALUSelB : in std_logic_vector(1 downto 0);
ALUOp : in std_logic_vector(2 downto 0);
PCSrc: in std_logic;
clock: in std_logic;
Instradr:in std_logic_vector(size downto 0);
Instr_data:in std_logic_vector(size downto 0);
op:out std_logic_vector(5 downto 0);
Eq:out std_logic;
result:out std_logic_vector(size downto 0));
end component;
begin
--Control unit: generates the control signals for the datapath and takes the
--status signals (op, Eq) from the datapath as inputs
a1:control_unit port map(
op,Eq,clk,IorD,MemWr,MemRd,IRwr,RegDst,RegWr,ExtOp,MemtoReg,ALUSelA,ALUSelB,ALUOp,PCSrc
);
end Behavioral;
Schematic:
Instruction Format:
One of the important things to decide before starting the processor design is the
instruction format and the number of instructions. RISC processors have comparatively few
instructions.
The following table indicates the decoding logic for the various opcodes.
For R-type instructions, the operation depends on the last 6 bits, that is, on the funct
bits. The operations are as indicated below.
Control Unit:
The control unit is a sequential circuit whose outputs depend on both its current and past
inputs. This history of past inputs is stored in the state memory and is said to represent
the state of the circuit. The circuit changes from one state to the next when the content
of the state memory changes. Depending on the current state and the input signals, the
next-state logic determines the next state by changing the content of the state memory.
Hence, a sequential circuit executes by going through a sequence of states. Since the state
memory is finite, the total number of different states that the circuit can reach is also
finite. This is not to be confused with the sequence length, which can be infinitely long.
Because it has only a finite number of states, a sequential circuit is also referred to as
a finite-state machine (FSM).
The control unit takes the opcode and equal signals from the datapath as inputs and
generates the control signals required by the datapath.
State Diagram:
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity control_unit is
port(opcode:in std_logic_vector(5 downto 0);
equal:in std_logic;
clk: in std_logic;
IorD : out std_logic;
MemWr : out std_logic;
MemRd : out std_logic;
IRwr : out std_logic;
RegDst :out std_logic;
RegWr : out std_logic;
ExtOp : out std_logic;
MemtoReg :out std_logic;
ALUSelA :out std_logic;
ALUSelB : out std_logic_vector(1 downto 0);
ALUOp : out std_logic_vector(2 downto 0);
PCSrc: out std_logic
);
end control_unit;
case st is
--Fetch the instruction
when "0000" =>
IorD<='0';
MemRd<='1';
MemWr<='0';
IRWr<='1';
RegDst<='0';
RegWr<='0';
Extop<='0';
MemtoReg<='0';
ALUSelA<='0';
ALUSelB<="00";
ALUop<="000";
PCSrc<='1';
st<="0001";
--decode the instruction
end case;
--branch instruction
when "0010" =>
IorD<='0';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='0';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="01";
ALUop<="001";
PCSrc<='0';
--check if both operands are equal
if equal='1' then
st<="0011";
else
st<="0000";
end if;
--Beq increment the pc by the immediate value
when "0011" =>
IorD<='0';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='1';
MemtoReg<='0';
ALUSelA<='0';
ALUSelB<="01";
ALUop<="000";
PCSrc<='1';
st<="0000";
--R-type instruction
when "0100" =>
IorD<='0';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='1';
RegWr<='0';
Extop<='0';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="01";
ALUop<="100";
PCSrc<='0';
st<="0101";
--store the result of the R-type in regfile R[Rd]
when "0101" =>
IorD<='0';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='1';
RegWr<='1';
Extop<='0';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="01";
ALUop<="111";
PCSrc<='0';
st<="0000";
--Or immediate instruction
--zero extend the immediate value and perform or operation
when "0110" =>
IorD<='0';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='0';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="11";
ALUop<="010";
PCSrc<='0';
st<="0111";
--store the result in regfile
st<="0000";
--Load word instruction
--calculate the memory location address
when "1000" =>
IorD<='1';
MemRd<='1';
MemWr<='0';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='1';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="11";
ALUop<="000";
PCSrc<='0';
st<="1001";
--Fetch the data from the memory location
when "1001" =>
IorD<='1';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='0';
MemtoReg<='1';
ALUSelA<='1';
ALUSelB<="01";
ALUop<="111";
PCSrc<='0';
st<="1010";
--write the data to the regfile
when "1010" =>
IorD<='1';
MemRd<='0';
MemWr<='0';
IRWr<='0';
RegDst<='0';
RegWr<='1';
Extop<='0';
MemtoReg<='1';
ALUSelA<='1';
ALUSelB<="01";
ALUop<="111";
PCSrc<='0';
st<="0000";
--store word instruction
--calculate the memory location address
when "1011" =>
IorD<='0';
MemRd<='0';
MemWr<='1';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='1';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="10";
ALUop<="000";
PCSrc<='0';
st<="1100";
--write the value from busB to the memory
when "1100" =>
IorD<='1';
MemRd<='0';
MemWr<='1';
IRWr<='0';
RegDst<='0';
RegWr<='0';
Extop<='0';
MemtoReg<='0';
ALUSelA<='1';
ALUSelB<="01";
ALUop<="111";
PCSrc<='0';
st<="0000";
Schematic:
Datapath:
The datapath is responsible for the manipulation of data. It includes (1) functional units
such as adders, shifters, multipliers, ALUs, and comparators, (2) registers and other
memory elements for the temporary storage of data, and (3) buses and multiplexers for
the transfer of data between the different components in the datapath. External data can
be entered into the datapath through the data input lines.
In order for the datapath to function correctly, appropriate control signals must be
asserted at the right time. Control signals are needed for all the select and control lines for
all the components used in the datapath. This includes all the select lines for multiplexers,
ALU and other functional units having multiple operations, all the read/write enable
signals for registers and register files, address lines for register files, and memory. The
operation of the datapath is determined by which control signals are asserted and at what
time. In a microprocessor, these control signals are generated by the control unit.
In return, the datapath needs to supply status signals back to the control unit in order for
it to operate correctly. These status signals are usually from the output of comparators.
The comparator tests for a given logical condition between two values. These values can
be obtained either from memory elements, directly from the output of functional units, or
hardwired as constants. These status signals provide input information for the control unit
to determine what operation to perform next. For example, in a branch-if-equal instruction,
the status signal tells the control unit whether to take the branch. Since the datapath
performs all the functional operations of a microprocessor, it must be able to perform
every operation required to solve the given problem. Datapath design is also referred to
as register-transfer level (RTL) design.
The datapath is built from the following components:
• ALU
• Multiplexers (32-bit and 5-bit) with 2 inputs and 1 output
• Multiplexer (32-bit) with 4 inputs and 1 output
• Register file consisting of 32 registers, each 32 bits long
• Memory consisting of 32 locations, each 32 bits long
• Extender, which performs either sign extension or zero extension
• Registers to store temporary data
o Instruction register: stores the instruction
o ALU_out: stores the output of the ALU
o Pcreg: stores the program counter
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
--Extender
component extender is
Port ( imm : in std_logic_vector(bsize downto 0);
extop : in std_logic;
y : out std_logic_vector(size downto 0));
end component;
component mux32 is
Port ( a : in std_logic_vector(size downto 0);
b : in std_logic_vector(size downto 0);
s : in std_logic;
y : out std_logic_vector(size downto 0));
end component;
component mux5 is
Port ( a : in std_logic_vector(raddr downto 0);
b : in std_logic_vector(raddr downto 0);
s : in std_logic;
y : out std_logic_vector(raddr downto 0));
end component;
component mux_sel is
Port ( b : in std_logic_vector(size downto 0);
c : in std_logic_vector(size downto 0);
d : in std_logic_vector(size downto 0);
s : in std_logic_vector(1 downto 0);
q : out std_logic_vector(size downto 0));
end component;
--Memory
component memory is
Port ( Memwr : in std_logic;
Memrd : in std_logic;
InstWr: in std_logic;
clk : in std_logic;
Radr : in std_logic_vector(size downto 0);
Wradr : in std_logic_vector(size downto 0);
Instradr: in std_logic_vector(size downto 0);
instr_data:in std_logic_vector(size downto 0);
data_in : in std_logic_vector(size downto 0);
data_out : out std_logic_vector(size downto 0));
end component;
--Instruction Register
component instr_reg is
Port ( a : in std_logic_vector(size downto 0);
IRWr:in std_logic;
clk : in std_logic;
inst : out std_logic_vector(size downto 0));
end component;
component shft is
Port ( a : in std_logic_vector(31 downto 0);
q : out std_logic_vector(31 downto 0));
end component;
component ALUout is
Port ( a : in std_logic_vector(size downto 0);
q : out std_logic_vector(size downto 0));
end component;
component instr_spilt is
Port ( instr : in std_logic_vector(31 downto 0);
op:out std_logic_vector(5 downto 0);
Rs : out std_logic_vector(4 downto 0);
Rt : out std_logic_vector(4 downto 0);
Rd : out std_logic_vector(4 downto 0);
func:out std_logic_vector(5 downto 0);
imm16 : out std_logic_vector(15 downto 0));
end component;
component ALU_control is
Port ( func : in std_logic_vector(5 downto 0);
ALU_op : in std_logic_vector(2 downto 0);
ALU_ctr : out std_logic_vector(2 downto 0));
end component;
component pcreg is
port( pcin :in std_logic_vector(size downto 0);
PcSrc: in std_logic;
clk: in std_logic;
pcout : out std_logic_vector(size downto 0));
end component;
begin
a:pcreg port map (alu_out,PcSrc,clock,pc);
s0:mux32 port map(pc,aluout_r,IorD,m1);
s1: memory port
map(Memwr,MemRd,InstrWr,clock,m1,aluout_r,Instradr,Instr_data,busB,dout);
s2: instr_reg port map(dout,IRWr,clock,instr);
s3: instr_spilt port map(instr,op,Rs,Rt,Rd,func,imm16);
s4: mux5 port map(Rt,Rd,RegDst,Rdd);
s5: regfile port map (Rs,Rt,Rdd,Regwr,clock,busW,busA,busB);
s6: mux32 port map (aluout_r,dout,MemtoReg,busW);
s9: extender port map(imm16,ExtOp,ext_out);
s10: mux32 port map(pc,busA,ALUSelA,alu_ina);
s11: shft port map(ext_out,ext_s_out);
s12: mux_sel port map(busB,ext_s_out,ext_out,ALUSelB,alu_inb);
s13: ALU_control port map(func,ALUop,aluctr);
s14: alu port map(alu_ina,alu_inb,aluctr,Eq,alu_out);
s15: aluout port map (alu_out,aluout_r);
s18:result<=aluout_r;
end Behavioral;
Schematic of Datapath:
Simulation:
ALU
The ALU is the major block in the datapath; it performs all the required operations.
ALUctr Operation
000 Add
001 Subtract
010 Or
011 And
100 Nop
The ALU is implemented using VHDL functions.
It is purely combinational and works without any clock.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity alu is
Port ( a : in std_logic_vector(size downto 0);
b : in std_logic_vector(size downto 0);
aluctr : in std_logic_vector(2 downto 0);
equal : out std_logic;
result : out std_logic_vector(size downto 0));
end alu;
begin
process(a,b,aluctr)
variable t:std_logic_vector(size downto 0); --temporary variable
variable r :std_logic_vector(size downto 0);--result
begin
t:=a-b;
case t is
end case;
case aluctr is
end Behavioral;
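The case bodies are missing from the listing above. A self-contained sketch that follows
the ALUctr table might look like the code below. It is not the original code: it assumes
ieee.numeric_std instead of the cpu_lib functions, a fixed 32-bit width, and an all-zero
result for Nop and unused codes. The entity name alu_sketch is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone ALU following the ALUctr table.
entity alu_sketch is
  port (a, b   : in  std_logic_vector(31 downto 0);
        aluctr : in  std_logic_vector(2 downto 0);
        equal  : out std_logic;
        result : out std_logic_vector(31 downto 0));
end alu_sketch;

architecture rtl of alu_sketch is
begin
  process (a, b, aluctr)
  begin
    -- Status signal for the control unit: '1' when both operands match
    -- (the same condition as a-b being zero).
    if a = b then
      equal <= '1';
    else
      equal <= '0';
    end if;

    case aluctr is
      when "000"  => result <= std_logic_vector(unsigned(a) + unsigned(b)); -- Add
      when "001"  => result <= std_logic_vector(unsigned(a) - unsigned(b)); -- Subtract
      when "010"  => result <= a or b;                                      -- Or
      when "011"  => result <= a and b;                                     -- And
      when others => result <= (others => '0'); -- Nop and unused codes (assumed)
    end case;
  end process;
end rtl;
```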
Schematic:
Simulation:
ALU Controller:
It generates the control signals for the ALU from ALU_op and the func field of the
instruction. It works without a clock.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity ALU_control is
Port ( func : in std_logic_vector(5 downto 0);
ALU_op : in std_logic_vector(2 downto 0);
ALU_ctr : out std_logic_vector(2 downto 0));
end ALU_control;
begin
process(func,ALU_op)
begin
case ALU_op is
end process;
end Behavioral;
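The body of the case statement is not shown in the listing above. The sketch below shows
one way such a decoder could be written. The ALU_op meanings are inferred from the control
unit states ("000" add, "001" subtract, "010" or, "100" R-type, anything else Nop); the
funct codes are MIPS-style values assumed purely for illustration, since the funct table
is not reproduced here. The entity name alu_control_sketch is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical ALU controller; the funct encodings are assumptions.
entity alu_control_sketch is
  port (func    : in  std_logic_vector(5 downto 0);
        alu_op  : in  std_logic_vector(2 downto 0);
        alu_ctr : out std_logic_vector(2 downto 0));
end alu_control_sketch;

architecture rtl of alu_control_sketch is
begin
  process (func, alu_op)
  begin
    case alu_op is
      when "000" => alu_ctr <= "000";           -- add (fetch, address calculation)
      when "001" => alu_ctr <= "001";           -- subtract (branch compare)
      when "010" => alu_ctr <= "010";           -- or (or immediate)
      when "100" =>                             -- R-type: decode the funct field
        case func is
          when "100000" => alu_ctr <= "000";    -- add (assumed encoding)
          when "100010" => alu_ctr <= "001";    -- subtract (assumed encoding)
          when "100101" => alu_ctr <= "010";    -- or (assumed encoding)
          when "100100" => alu_ctr <= "011";    -- and (assumed encoding)
          when others   => alu_ctr <= "100";    -- Nop
        end case;
      when others => alu_ctr <= "100";          -- Nop
    end case;
  end process;
end rtl;
```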
Schematic:
Memory:
The memory consists of 32 locations, each 32 bits long. It has both read and write
signals, plus separate signals for writing instructions into the memory. A memory read
does not require a clock edge, while a memory write takes place on an active clock edge.
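The read and write behaviour described above can be sketched as follows. This is a
simplified stand-alone model, not the original code: it leaves out the instruction-load
port, uses 5-bit addresses, and assumes that writes happen on the rising edge and that the
output is zero when Memrd is low. The entity name memory_sketch is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 32 x 32-bit memory: asynchronous read, synchronous write.
entity memory_sketch is
  port (clk      : in  std_logic;
        memwr    : in  std_logic;
        memrd    : in  std_logic;
        radr     : in  std_logic_vector(4 downto 0);
        wradr    : in  std_logic_vector(4 downto 0);
        data_in  : in  std_logic_vector(31 downto 0);
        data_out : out std_logic_vector(31 downto 0));
end memory_sketch;

architecture rtl of memory_sketch is
  type mem_t is array (0 to 31) of std_logic_vector(31 downto 0);
  signal mem : mem_t := (others => (others => '0'));
begin
  -- Write: needs an active clock edge (rising edge assumed).
  process (clk)
  begin
    if rising_edge(clk) then
      if memwr = '1' then
        mem(to_integer(unsigned(wradr))) <= data_in;
      end if;
    end if;
  end process;

  -- Read: purely combinational, no clock edge needed.
  data_out <= mem(to_integer(unsigned(radr))) when memrd = '1'
              else (others => '0');
end rtl;
```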
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity memory is
Port ( Memwr : in std_logic;
Memrd : in std_logic;
InstWr: in std_logic;
clk : in std_logic;
Radr : in std_logic_vector(size downto 0);
Wradr : in std_logic_vector(size downto 0);
Instradr: in std_logic_vector(size downto 0); --address to write the instruction
instr_data:in std_logic_vector(size downto 0);--Instruction or data to be written
data_in : in std_logic_vector(size downto 0); --Data input to memory
data_out : out std_logic_vector(size downto 0));--data out from memory
end memory;
(others => x"00000000");--all 32 locations are initialised to zero
begin
process(clk,InstWr,Instradr,Instr_data,Radr)
begin
end if;
Schematic:
In the above schematic, “Instradr” and “Instr_data” are omitted because they cause a
synthesis problem related to the use of two clocks.
Regfile:
Like the memory, the register file does not require a clock to read, but requires a clock
to write. A register file of 32 registers, each 32 bits long, is implemented here. Regwr
must be high when writing to a register.
Entity:
VHDL Code:
use work.cpu_lib.all;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity regfile is
Port ( Ra : in std_logic_vector(4 downto 0);
Rb : in std_logic_vector(4 downto 0);
Rw : in std_logic_vector(4 downto 0);
RegWr : in std_logic;
clk : in std_logic;
busW:in std_logic_vector(size downto 0);
busA : out std_logic_vector(size downto 0);
busB : out std_logic_vector(size downto 0));
end regfile;
(others => x"00000000");--all 32 registers are initialised to zero
begin
process(Ra,Rb,Rw,RegWr,clk,busW)
variable addr_a,addr_b,addr_w:integer;
begin
addr_a:=CONV_INTEGER(Ra);
addr_b:=CONV_INTEGER(Rb);
addr_w:=CONV_INTEGER(Rw);
busA<=regarray(addr_a); --Read R[Ra]
busB<=regarray(addr_b);--Read R[Rb]
end Behavioral;
Schematic:
Synthesis produces two RAMs here because three addresses (two read addresses and one write
address) are used to access the 32 registers.
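The write part of the process is not shown in the listing above. A stand-alone sketch of a
register file with two combinational read ports and one clocked write port is given below.
It is not the original code: the rising-edge write and the entity name regfile_sketch are
assumptions, and ieee.numeric_std is used in place of CONV_INTEGER.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 32 x 32-bit register file: two read ports, one write port.
entity regfile_sketch is
  port (clk        : in  std_logic;
        regwr      : in  std_logic;
        ra, rb, rw : in  std_logic_vector(4 downto 0);
        busw       : in  std_logic_vector(31 downto 0);
        busa, busb : out std_logic_vector(31 downto 0));
end regfile_sketch;

architecture rtl of regfile_sketch is
  type reg_array_t is array (0 to 31) of std_logic_vector(31 downto 0);
  signal regs : reg_array_t := (others => (others => '0'));
begin
  -- Write R[rw] on the clock edge when regwr is high (rising edge assumed).
  process (clk)
  begin
    if rising_edge(clk) then
      if regwr = '1' then
        regs(to_integer(unsigned(rw))) <= busw;
      end if;
    end if;
  end process;

  -- Reads need no clock: both ports follow their addresses directly.
  busa <= regs(to_integer(unsigned(ra)));  -- read R[ra]
  busb <= regs(to_integer(unsigned(rb)));  -- read R[rb]
end rtl;
```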
Mux-32:
Mux-32 has two 32-bit inputs and one select line, which chooses the input that is passed
to the output. It does not require any clock edge.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity mux32 is
Port ( a : in std_logic_vector(size downto 0);
b : in std_logic_vector(size downto 0);
s : in std_logic;
y : out std_logic_vector(size downto 0));
end mux32;
architecture Behavioral of mux32 is
begin
process(a,b,s)
begin
case s is
when '0' =>--select a input
y<=a;
when '1' =>--select b input
y<=b;
when others =>
y<="00000000000000000000000000000000";
end case;
end process;
end Behavioral;
Schematic:
Mux 5:
It has two 5 bit inputs and one select line, to select the output. It works without any clock
edge.
Entity:
VHDL Code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity mux5 is
Port ( a : in std_logic_vector(4 downto 0);
b : in std_logic_vector(4 downto 0);
s : in std_logic;
y : out std_logic_vector(4 downto 0));
end mux5;
architecture Behavioral of mux5 is
begin
process(a,b,s)
begin
case s is
when '0' =>
y<=a;
when '1' =>
y<=b;
when others =>
y<="00000";
end case;
end process;
end Behavioral;
Schematic:
Mux_sel:
It selects one of four values using the two select lines (ALUSelB) and passes it to the
output: three data inputs and a hardwired constant ‘1’, which is used to increment the pc.
It does not require any clock edge.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity mux_sel is
Port (
b : in std_logic_vector(size downto 0);
c : in std_logic_vector(size downto 0);
d : in std_logic_vector(size downto 0);
s : in std_logic_vector(1 downto 0);
q : out std_logic_vector(size downto 0));
end mux_sel;
architecture Behavioral of mux_sel is
begin
process(b,c,d,s)
begin
case s is
when "00" =>--output the constant 1, required to increment pc
q<="00000000000000000000000000000001";
when "01" =>--output b, the value on busB
q<=b;
when "10" =>--output c, connected to the output of the shifter
q<=c;
when "11" =>--output d, connected to the output of the extender
q<=d;
when others =>
q<="00000000000000000000000000000000";
end case;
end process;
end Behavioral;
Schematic:
PC Register:
It holds the value of the pc (program counter). The PcSrc signal indicates when the next
value of pc is to be written. The register is updated on the rising edge of the clock.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pcreg is
port(pcin:in std_logic_vector(size downto 0);
PcSrc: in std_logic;
clk: in std_logic;
pcout: out std_logic_vector(size downto 0));
end pcreg;
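architecture Behavioral of pcreg is
signal pc:std_logic_vector(size downto 0);--holds the current pc value
begin
process(clk)
begin
if clk'event and clk='1' then--wait for rising clock edge
if PcSrc='1' then--load the next pc value only when PcSrc is asserted
pc<=pcin;
end if;
end if;
end process;
pcout<=pc;--output the value present in the register
end Behavioral;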
Schematic:
Instruction Register:
The instruction fetched at the beginning of instruction execution must be held until
the execution of the instruction is completed. This is done by the instruction register.
It stores the value from memory on the falling clock edge when IRwr='1'.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity instr_reg is
Port ( a : in std_logic_vector(size downto 0);
IRWr:in std_logic;
clk : in std_logic;
inst : out std_logic_vector(size downto 0));
end instr_reg;
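architecture Behavioral of instr_reg is
signal instrreg:std_logic_vector(size downto 0);--holds the current instruction
begin
process(clk)
begin
if clk'event and clk='0' then--check for falling clock edge
if IRWr='1' then--condition for writing to the register
instrreg<=a;
end if;
end if;
end process;
inst<=instrreg;--output the value present in the register.
end Behavioral;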
Schematic:
Instruction split:
It continuously takes the instruction from the instruction register and splits it into
the fields given in the instruction format. It does not require a clock.
Entity:
VHDL Code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity instr_spilt is
Port ( instr : in std_logic_vector(31 downto 0);
op:out std_logic_vector(5 downto 0);
Rs : out std_logic_vector(4 downto 0);
Rt : out std_logic_vector(4 downto 0);
Rd : out std_logic_vector(4 downto 0);
func:out std_logic_vector(5 downto 0);
imm16 : out std_logic_vector(15 downto 0));
end instr_spilt;
begin
end Behavioral;
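The architecture body is empty in the listing above. A sketch of the field extraction is
given below. The document only states that funct occupies the last 6 bits; the remaining
bit positions follow a MIPS-style layout and are assumptions, as is the entity name
instr_split_sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical instruction splitter; field positions are assumed (MIPS-style).
entity instr_split_sketch is
  port (instr      : in  std_logic_vector(31 downto 0);
        op         : out std_logic_vector(5 downto 0);
        rs, rt, rd : out std_logic_vector(4 downto 0);
        func       : out std_logic_vector(5 downto 0);
        imm16      : out std_logic_vector(15 downto 0));
end instr_split_sketch;

architecture rtl of instr_split_sketch is
begin
  -- Pure wiring: each output is a slice of the instruction word.
  op    <= instr(31 downto 26);  -- opcode (assumed position)
  rs    <= instr(25 downto 21);  -- first source register (assumed position)
  rt    <= instr(20 downto 16);  -- second source / target register (assumed position)
  rd    <= instr(15 downto 11);  -- destination register (assumed position)
  func  <= instr(5 downto 0);    -- funct bits: the last 6 bits
  imm16 <= instr(15 downto 0);   -- 16-bit immediate (assumed position)
end rtl;
```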
Schematic:
Extender:
The extender extends the given 16-bit immediate value to 32 bits, either by sign extension
or by zero extension, depending on the value of Extop (1 for sign extension, 0 for zero
extension). It works without any clock. The output of the extender goes as input to
mux_sel and to the shifter.
Entity:
VHDL Code:
use work.cpu_lib.ALL;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity extender is
Port ( imm : in std_logic_vector(bsize downto 0);
extop : in std_logic;
y : out std_logic_vector(size downto 0));
end extender;
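architecture Behavioral of extender is
begin
process(imm,extop)
variable sign:std_logic;--sign of the given input
variable t:std_logic_vector(size downto 0);--temporary variable
begin
case extop is
when '0' =>--zero extension
y(size downto bsize+1)<="0000000000000000";
y(bsize downto 0)<=imm;
when '1' => --sign extension
sign:=imm(bsize);
for i in size downto bsize+1 loop--replicate the sign bit into the upper bits
t(i):=sign;
end loop;
t(bsize downto 0):=imm;--lower bits are the immediate itself
y<=t;
when others =>
y<="00000000000000000000000000000000";
end case;
end process;
end Behavioral;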
Schematic:
Implementation(Cyclone EP1C6):
The processor designed above is implemented on a Cyclone (EP1C6) board. The following
section describes this device.
EP1C6 Devices:
The EP1C6, a member of the Cyclone device family, provides 6,523 registers; 92,160
memory bits; and 5,980 logic elements. The Cyclone device meets the low-voltage
requirements of 1.5-V applications and supports multiple I/O standards including LVDS
(low-voltage differential signaling), LVTTL, LVCMOS, PCI, SSTL-3 Class I & II,
and SSTL-2 Class I & II.
The EP1C6 is available in 144-pin TQFP packages with 92 I/O pins, 240-pin QFP
packages with 181 I/O pins, and 256-pin FineLine BGA packages (See Note (12)). The
device has 5,980 logic elements grouped into 598 LABs. These LABs are arranged into
20 rows and 32 columns. The embedded memory consists of one column of M4K
memory blocks, containing a total of 92,160 RAM bits. Each M4K block can implement
shift registers and various types of memory with or without parity bits, including dual-
port, true dual-port, and single-port RAM, ROM, FIFO buffers, and shift registers.
Each I/O element contains a bidirectional I/O buffer and three registers for complete
embedded bidirectional single data rate transfer. The I/O element contains individual
input, output, and output enable registers. The input register provides fast setup times, the
output register provides fast clock-to-output times, and the output enable register
provides fast clock-to-output enable times. The EP1C6 also contains four dedicated clock
pins and eight dual-purpose clock pins for large fan-out control signals. In addition, the
EP1C6 contains two phase-locked loops (PLLs), which provide general purpose clocking
with clock multiplication and phase shifting as well as high-speed outputs for high-speed
differential I/O support.
The EP1C6 also supports ICR and JTAG BST. The EP1C6 JTAG Instruction Register
length is 10; the Boundary-Scan Register length is 582; and the JTAG ID code is
0x020820DD.
The following table displays the pin-out information for EP1C6 devices:
Dedicated Programming 25 nCEO - 1 IOC_X0_Y11_N1 H4 32 20
Dedicated Programming 26 nCE - 1 IOC_X0_Y11_N2 J4 33 21
Dedicated Programming 27 MSEL0 - 1 IOC_X0_Y10_N0 J3 34 22
Dedicated Programming 28 MSEL1 - 1 IOC_X0_Y10_N1 J2 35 23
Dedicated Programming 29 DCLK - 1 IOC_X0_Y10_N2 K4 36 24
Row I/O 30 ASDO 19 1 IOC_X0_Y9_N0 K3 37 25
Row I/O 31 PLL1_OUTp 19 1 IOC_X0_Y8_N0 J1 38 26
Row I/O 32 PLL1_OUTn 19 1 IOC_X0_Y8_N1 K2 39 27
Row I/O 33 - 45 1 IOC_X0_Y7_N0 L3 41 -
Row I/O 34 LVDS6p 45 1 IOC_X0_Y7_N1 K1 42 -
Row I/O 35 LVDS6n 45 1 IOC_X0_Y7_N2 L1 43 -
Row I/O 36 LVDS5p 45 1 IOC_X0_Y6_N0 L2 44 -
Row I/O 37 LVDS5n 45 1 IOC_X0_Y6_N1 M1 45 -
Row I/O 38 LVDS4p 45 1 IOC_X0_Y5_N0 N1 46 -
Row I/O 39 LVDS4n 45 1 IOC_X0_Y5_N1 M2 47 -
Row I/O 40 LVDS3p/DQ0L4 45 1 IOC_X0_Y4_N0 N2 48 - DQ
Row I/O 41 LVDS3n/DQ0L5 45 1 IOC_X0_Y4_N1 M3 49 - DQ
Row I/O 42 DPCLK0/DQS1L 45 1 IOC_X0_Y4_N2 L5 50 28
Row I/O 43 LVDS2p/DQ0L6 45 1 IOC_X0_Y3_N0 M4 53 - DQ
Row I/O 44 LVDS2n/DQ0L7 45 1 IOC_X0_Y3_N1 N3 54 - DQ
X0Y3SUB_LOC2 45 VREF2B1 - 1 IOC_X0_Y3_N2 K5 55 31
Row I/O 46 - 45 1 IOC_X0_Y2_N0 L4 56 32
Row I/O 47 LVDS1p 45 1 IOC_X0_Y2_N1 R1 57 33
Row I/O 48 LVDS1n 45 1 IOC_X0_Y2_N2 P2 58 34
Row I/O 49 LVDS0p 45 1 IOC_X0_Y1_N0 P3 59 35
Row I/O 50 LVDS0n 45 1 IOC_X0_Y1_N1 N4 60 36