Design and Performance Analysis of 8-Bit RISC Processor Using Xilinx & Microwind Tool


International Journal of Advances in Science and Technology, Vol. 4, No. 2, 2012

Design and Performance Analysis of 8-bit RISC Processor using Xilinx & Microwind Tool
R. Uma
(Research Scholar, Department of Computer Science, Pondicherry University, Pondicherry)

Abstract
RISC, or Reduced Instruction Set Computer, is a design philosophy that has become mainstream in scientific and engineering applications. The increasing performance and gate capacity of recent FPGA devices permit complex logic systems to be implemented on a single programmable device. In an FPGA the design is hardwired, whereas an ASIC-based implementation has the flexibility to minimize gate count and delay. The main objective of this paper is to design and implement an 8-bit Reduced Instruction Set Computer (RISC) processor using the Xilinx and Microwind tools and to analyze its performance. The processor is deliberately simple and supports a load/store architecture. Its main components are the Arithmetic Logic Unit, shifter, rotator and control unit. Module functionality and performance measures such as area, power dissipation and propagation delay are analyzed at 90 nm process technology, using the Xilinx tool with a SPARTAN 3E XCS500E device for the FPGA and the Microwind tool for the ASIC design.

Keywords: RISC, 90 nm technology, Load/store architecture, Pipeline, Uniform bit-stream.

1. Introduction


Nowadays, computers are indispensable tools for most everyday activities. With the rapid development of silicon technology and the decreasing cost of integrated circuits, RISC processors are increasingly used in every field. The design philosophy rests on the architectural principles of the Reduced Instruction Set Computer (RISC). The simple design provides exceptional performance and is ideal for use in a broad family of cost-effective, compatible systems. Typical applications include commercial data processing, computation-intensive scientific and engineering applications, and real-time control.

The main features of a RISC processor are as follows. The instruction set can be hardwired to speed instruction execution, and no microcode is needed for single-cycle execution. All instructions are one word (a fixed number of bits) in length, which simplifies the instruction fetch mechanism because the location of instruction boundaries does not depend on the instruction type. The processor has a small number of addressing modes. Only load and store instructions access memory; no computational instruction accesses memory, and load/store instructions operate between memory and a register. Control hardware is simplified and the machine cycle time is minimized. The instructions are designed to be easily divisible into parts; this, together with their fixed size, allows the instructions to be easily pipelined. RISC provides a flexible and expandable architecture that maximizes performance from any given semiconductor technology, and extensions to the basic RISC concepts help achieve a given level of performance at significantly lower cost than other systems.

In the present work, the design of an 8-bit data width Reduced Instruction Set Computer (RISC) processor is presented; it was developed with simplicity and implementation efficiency in mind.
It has a complete instruction set, program and data memories, general purpose registers and a simple Arithmetic Logic Unit (ALU) for basic operations. In this design, most instructions are of uniform length and similar structure, arithmetic operations are restricted to CPU registers, and only separate load and store instructions access memory. The instruction cycle consists of three stages, namely fetch, decode and execute. After every instruction fetch, the control unit generates signals for the selected instruction. The architecture supports 16 instructions covering arithmetic, logical, shift and rotate operations.

The remainder of this paper is organized as follows. Section 2 explains the architecture of the 8-bit RISC processor. Section 3 presents the design of the ALU, control unit and general purpose register modules in both FPGA and ASIC. Section 4 presents the simulation results for the FPGA implementation and for the ASIC implementation in advanced 90 nm process technology. Section 5 summarizes the implementation of the RISC design topology. The final section presents the conclusion.

February Issue, ISSN 2229 5216

2. Architecture of 8-bit RISC Processor


The architecture of the 8-bit RISC processor is shown in Figure (1). It consists of an arithmetic logic unit, a control unit, a shifter and a rotator. The processor is designed with a load/store (Von Neumann) architecture: one shared memory holds both instructions (program) and data, with one data bus and one address bus between processor and memory. Instructions and data are fetched in sequential order so that the latency incurred between machine cycles is reduced. Three pipeline stages, fetch, decode and execute, are incorporated in the design to increase the speed of operation. In fetch, the instruction and the necessary data are read from memory. In decode, the instruction and data read from memory are separated, activating the components and the data path needed for execution. Finally, in execute, the instruction is performed, the data is manipulated and the result is stored.
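As a rough illustration of how the fetch, decode and execute stages overlap, the following Python sketch lists which instruction occupies each stage in every clock cycle. It assumes an ideal pipeline with no stalls or hazards, which the paper does not discuss; the function name `pipeline_schedule` and the instruction labels are hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(instructions):
    """Return one dict per clock cycle mapping stage name -> instruction.

    Assumes an ideal pipeline: one instruction enters per cycle and
    nothing ever stalls (an assumption, not a claim about the design).
    """
    n = len(instructions)
    total_cycles = n + len(STAGES) - 1 if n else 0
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that has reached this stage
            row[stage] = instructions[index] if 0 <= index < n else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["ADD", "NOR", "SUB", "XOR"])):
        print(cycle, row)
```

Under these assumptions, n instructions finish in n + 2 cycles rather than the 3n cycles needed when the stages do not overlap.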

[Figure 1 block diagram. Blocks: Control Unit, Instruction Register, Instruction Decoder, Register A, Register B, ALU, Universal Shift Register, Barrel Shift Register, Accumulator]

Figure 1. Architecture of 8-bit RISC processor

The control unit reads the opcode and instruction bits and generates control signals that trigger the respective components and data path to perform the desired task. It has two instruction decoders that decode the instruction bits; the decoded output is fed as a control signal to the arithmetic logic unit (ALU), the universal shifter or the barrel shift rotator. The ALU receives its operands from register A and register B and, depending on the control signal, performs either an arithmetic or a logic operation. After the instruction executes, the result is stored in the accumulator register. For shift operations, the input is taken from source register A and is either loaded or shifted right or left according to the control lines activated by the control unit; the shifted data is saved in the destination register, which is the accumulator. For rotate operations, the input from source register A is rotated N times according to the opcode supplied by the control unit, and the rotated data is stored in the accumulator register.


3. Module Design of 8-bit RISC Processor


This section presents the design of the individual modules: the control unit, ALU, universal shift register, barrel shift rotator and general purpose registers.

A. Control unit

The control unit is designed as a finite state machine, as depicted in Figure (2). The state machine performs the logical, arithmetic, shift and rotate operations. For example, if the instruction bits are 0010, a NOR operation is performed; if the next opcode is 1001, the machine remains in the same state, otherwise it makes a transition to the next state determined by the opcode it receives. The overall operation is shown in Table (1). The control unit consists of two decoders: the first generates the opcodes for logical and arithmetic operations and the second generates the opcodes for shift and rotate operations. The top blocks of the decoder circuit are shown in Figures (3a & 3b). The circuit is simulated in the Microwind and Xilinx environments, and the simulation results are shown in Figure (4).

Figure 2. State Diagram of Controller

Figure 3a. Top block of Controller in FPGA

Table 1. Operation of control unit

Each select code drives exactly one decoder output high; all other outputs of both decoders (d0-d7 and Z0-Z7) stay low.

Select lines    Decoder output    Function
S3 S2 S1 S0     driven high       performed
0  0  0  0      d0                AND
0  0  0  1      d1                NAND
0  0  1  0      d2                NOR
0  0  1  1      d3                OR
0  1  0  0      d4                XOR
0  1  0  1      d5                XNOR
0  1  1  0      d6                SUB
0  1  1  1      d7                ADD
1  0  0  0      Z0                NOT
1  0  0  1      Z1                NO CHANGE
1  0  1  0      Z2                SHIFT-RIGHT
1  0  1  1      Z3                SHIFT-LEFT
1  1  0  0      Z4                ROTATE 1-BIT
1  1  0  1      Z5                ROTATE 3-BIT
1  1  1  0      Z6                ROTATE 5-BIT
1  1  1  1      Z7                ROTATE 7-BIT
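The two-decoder scheme of Table 1 can be sketched in Python as below. This is a behavioural illustration only: it assumes that S3 selects the decoder and that S2 to S0 pick one of its eight outputs, with the operations in the order the table lists them, which agrees with the examples in the text (0010 for NOR, 0111 for addition). The function name `decode` and the tuple layout are hypothetical.

```python
LOGIC_ARITH_OPS = ("AND", "NAND", "NOR", "OR", "XOR", "XNOR", "SUB", "ADD")
SHIFT_ROTATE_OPS = (
    "NOT", "NO CHANGE", "SHIFT-RIGHT", "SHIFT-LEFT",
    "ROTATE 1-BIT", "ROTATE 3-BIT", "ROTATE 5-BIT", "ROTATE 7-BIT",
)


def decode(opcode):
    """Decode a 4-bit opcode (S3 S2 S1 S0) into (decoder, one_hot, operation).

    Assumed mapping: S3 selects the decoder ('d' for logic/arithmetic,
    'Z' for shift/rotate) and S2..S0 select which of its eight outputs
    goes high. one_hot lists outputs 0 to 7 in that order.
    """
    if not 0 <= opcode <= 0b1111:
        raise ValueError("opcode must fit in 4 bits")
    index = opcode & 0b0111
    one_hot = tuple(1 if i == index else 0 for i in range(8))
    if opcode & 0b1000:
        return "Z", one_hot, SHIFT_ROTATE_OPS[index]
    return "d", one_hot, LOGIC_ARITH_OPS[index]


if __name__ == "__main__":
    for code in (0b0010, 0b0111, 0b1001):
        print(format(code, "04b"), decode(code))
```

A lookup table keeps the sketch close to the hardware, where each decoder is a fixed combinational mapping from select lines to a single active output.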


Figure 3b. Top block of Controller in ASIC

Figure 4. Timing diagram of control unit in FPGA and ASIC

B. Arithmetic Logic Unit

The ALU design comprises two units. One unit performs the logic operations and contains eight-bit logic gates for AND, NAND, OR, NOR, XOR and XNOR; the other performs the arithmetic operations ADD and SUBTRACT. In the arithmetic unit, the control input Cin selects between addition and subtraction: with Cin low the input data are added, and with Cin high they are subtracted. The complete ALU design in FPGA and ASIC is shown in Figures (5a & 5b), the internal carry look-ahead adder/subtractor submodule of the arithmetic unit is shown in Figure (6), and the simulated timing waveforms for the arithmetic unit and the CLA, obtained with the Microwind and Xilinx tools, are shown in Figures (7 and 7.1).
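The Cin-controlled add/subtract behaviour can be sketched in Python using the generate and propagate signals on which carry look-ahead adders are built. Two points are assumptions rather than details taken from the paper: that subtraction is done in two's complement by inverting B and feeding Cin as the initial carry, and the function name `cla_add_sub`. The loop evaluates the look-ahead recurrence bit by bit, whereas hardware expands it so that all carries are produced in parallel.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def cla_add_sub(a, b, cin):
    """8-bit adder/subtractor using generate/propagate signals.

    cin = 0 computes a + b; cin = 1 computes a - b as a + ~b + 1
    (two's complement, an assumed implementation detail).
    Returns (result, carry_out).
    """
    a &= MASK
    b &= MASK
    cin &= 1
    if cin:
        b ^= MASK  # assumed: B is inverted when subtracting
    carry = cin
    result = 0
    for i in range(WIDTH):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi               # generate: this bit creates a carry
        p = ai ^ bi               # propagate: this bit passes a carry on
        result |= (p ^ carry) << i
        carry = g | (p & carry)   # c[i+1] = g[i] OR (p[i] AND c[i])
    return result, carry


if __name__ == "__main__":
    print(cla_add_sub(5, 3, 0))  # addition
    print(cla_add_sub(5, 3, 1))  # subtraction
```

With this convention a carry out of 1 on subtraction means no borrow occurred, and a carry out of 0 means the result wrapped around.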

Figure 5a. Top block of 8 bit arithmetic and logic unit in FPGA

Figure 5b. Top block of 8 bit arithmetic and logic unit in ASIC

Figure 6. Carry look ahead adder/subtractor in FPGA and ASIC


Figure 7. Simulated timing diagram of ALU in FPGA and ASIC

Figure 7.1 Simulated timing diagram of carry look ahead in FPGA and ASIC

C. Universal Shift Register

The universal shift register supports four modes: load, right shift, left shift and no change. The design uses eight 4x1 multiplexers and nine basic gates and is shown in Figure (8) for FPGA and ASIC. The eight-bit input is loaded when control lines S0 and S1 are both low. The input is shifted right when S0 is high and S1 is low, and shifted left when S0 is low and S1 is high. When S0 and S1 are both high, the output remains low. The complete operation is given in Table (2), and Figure (9) shows the simulated result of the universal shift register in FPGA and ASIC.

Figure 8. Top block of universal shift register in FPGA & ASIC

Table 2. Operation of the universal shift register

Input A7-A0   Cin   S1 S0   Output Q7-Q0   Cout   Operation performed
1111 0000     0     0  0    1111 0000      0      Load
1111 0000     1     0  0    1111 0000      0      Load
1111 0000     0     0  1    0111 1000      1      Right shift
1111 0000     1     0  1    0111 1111      1      Right shift
1111 0000     0     1  0    0111 1000      0      Left shift
1111 0000     1     1  0    1111 1000      0      Left shift
1111 0000     0     1  1    0000 0000      0      No change
1111 0000     1     1  1    0000 0000      0      No change
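The four modes described in the text can be sketched in Python as follows. Treating Cin as the serial input and Cout as the bit shifted out is an assumption made for illustration, as is the function name `universal_shift`; the sketch follows the textbook behaviour of a shift register and has not been reconciled with the simulated values in Table 2, which may follow a different convention.

```python
def universal_shift(data, s1, s0, serial_in=0):
    """Return (output, bit_shifted_out) for an 8-bit universal shift register.

    Select lines follow the text: S1 S0 = 00 load, 01 shift right,
    10 shift left, 11 output held low. The roles of serial_in (Cin)
    and bit_shifted_out (Cout) are assumptions.
    """
    data &= 0xFF
    serial_in &= 1
    mode = ((s1 & 1) << 1) | (s0 & 1)
    if mode == 0b00:                      # load
        return data, 0
    if mode == 0b01:                      # shift right, serial bit enters at Q7
        return (serial_in << 7) | (data >> 1), data & 1
    if mode == 0b10:                      # shift left, serial bit enters at Q0
        return ((data << 1) & 0xFF) | serial_in, (data >> 7) & 1
    return 0, 0                           # both select lines high: output low


if __name__ == "__main__":
    for s1, s0 in ((0, 0), (0, 1), (1, 0), (1, 1)):
        out, shifted = universal_shift(0b11110000, s1, s0)
        print(s1, s0, format(out, "08b"), shifted)
```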

Figure 9. Simulated timing diagram of universal shift register in FPGA & ASIC


D. Barrel Shift Rotator

The design consists of eight 8x1 multiplexers. The output of one multiplexer is connected as an input to the next, so that the input data is shifted at each multiplexer and the rotation is performed. The number of positions rotated depends on the select lines. With all select lines low, no rotation is performed (Table 3 lists this case as Zero). With S0 high a 1-bit rotation takes place, with S1 high a 2-bit rotation takes place, and the rotation count continues to increase until all select lines are high. The rotation of the input data for the different select lines is shown in Table (3), the simulated timing diagram in FPGA and ASIC is shown in Figure (11), and Figure (10) shows the top block of the barrel shift rotator in FPGA and ASIC.
Table 3. Operations of barrel rotator

Input A7-A0   S2 S1 S0   Output Q7-Q0   Function performed
1111 0000     0  0  0    0111 0000      Zero
1111 0000     0  0  1    0110 0001      1 Bit Rotate
1111 0000     0  1  0    0100 0011      2 Bit Rotate
1111 0000     0  1  1    0000 0111      3 Bit Rotate
1111 0000     1  0  0    0000 1111      4 Bit Rotate
1111 0000     1  0  1    0001 1110      5 Bit Rotate
1111 0000     1  1  0    0011 1100      6 Bit Rotate
1111 0000     1  1  1    0111 1000      7 Bit Rotate
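A multiplexer-based rotator can be sketched in Python as below. The sketch uses one 8:1 multiplexer per output bit, a common arrangement that is not necessarily the cascaded wiring the paper describes, and it assumes rotation to the left. The names `mux8` and `barrel_rotate_left` are hypothetical. For the input 1111 0000 it reproduces the 4-bit to 7-bit rows of Table 3; it has not been reconciled with the remaining rows.

```python
def mux8(inputs, select):
    """8-to-1 multiplexer: pick one of eight input bits."""
    return inputs[select & 0b111]


def barrel_rotate_left(data, select):
    """Rotate an 8-bit value left by `select` (0-7) positions.

    Output bit Q[i] comes from its own 8:1 mux whose inputs are wired
    so that select = n picks A[(i - n) mod 8].
    """
    bits = [(data >> i) & 1 for i in range(8)]   # bits[0] is A0
    out = 0
    for i in range(8):
        mux_inputs = [bits[(i - n) % 8] for n in range(8)]
        out |= mux8(mux_inputs, select) << i
    return out


if __name__ == "__main__":
    for n in range(8):
        print(n, format(barrel_rotate_left(0b11110000, n), "08b"))
```

Because every output bit has a dedicated multiplexer, any rotation amount is produced in a single pass through the logic rather than by repeated 1-bit shifts.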

Figure 10. Top block of barrel shift rotator in FPGA and ASIC

Figure 11. Timing Diagram of Barrel shift rotator in FPGA and ASIC

E. General Purpose Register

The eight-bit input data is stored in this register, which acts as a source register. It consists of eight D flip-flops and eight AND gates. The gate-level view of the register is given in Figure (12). Initially RESET is set high to clear the register. With RESET low and READ high, the data is stored in the register whether CLOCK is low or high. The conditions under which the data is stored are shown in Table (4), and the simulated timing waveform is shown in Figure (13).

Table 4. Operations of general purpose register

CLK   RESET   RD   Input Q7-Q0   Output Q7-Q0
0     1       0    0000 0001     0000 0000
1     0       1    0000 0001     0000 0001
0     0       1    0000 0001     0000 0001
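The register behaviour described above can be sketched in Python as a small class. The clock is left out because the text states that data is stored with CLOCK either low or high; holding the previous value when READ is low is an assumption, and the class and method names are hypothetical.

```python
class GeneralPurposeRegister:
    """8-bit register with RESET (clear) and RD (load enable)."""

    def __init__(self):
        self.q = 0

    def update(self, data, reset, rd):
        """Apply one set of inputs and return the stored value.

        RESET high clears the register; otherwise RD high stores `data`.
        With both low the previous value is kept (assumed behaviour).
        """
        if reset:
            self.q = 0
        elif rd:
            self.q = data & 0xFF
        return self.q


if __name__ == "__main__":
    reg = GeneralPurposeRegister()
    print(format(reg.update(0b00000001, reset=1, rd=0), "08b"))
    print(format(reg.update(0b00000001, reset=0, rd=1), "08b"))
```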


Figure 12. Top block of general purpose register in FPGA and ASIC

Figure 13. Timing diagram of general purpose register

Figure 14. Top block of 8-bit processor in FPGA

Figure 15. Timing diagram of RISC processor in ASIC

Table 5. Delay vs. area of 8-bit processor in Xilinx

Sub blocks are listed in topology order: Control Unit, ALU, Universal Shift Reg, Barrel Shift Reg, GPR. Unit shown in the table header: 10^-9 W.

Sub block   Delay (ns)   Slices utilized (area)   AT       AT2     Power dissipation   PD
Decoder     6.275        8                        50.2     315.0   0.081               0.508
AND         5.753        4                        23.012   132.3   0.081               0.463
NAND        5.753        4                        23.012   132.3   0.081               0.463
NOR         5.753        4                        23.012   132.3   0.081               0.463
XOR         5.753        4                        23.012   132.3   0.081               0.463
CLA         7.732        5                        38.66    298.9   0.081               0.626
Inverter    6.034        4                        24.136   144.6   0.081               0.488
AND         6.546        4                        26.184   171.4   0.081               0.530
OR          7.508        14                       105.11   789.1   0.081               0.608
MUX         6.582        9                        59.238   389.9   0.081               0.533
MUX         7.198        16                       115.16   828.9   0.081               0.583
D-FF        6.546        4                        26.184   171.4   0.081               0.530
TOTAL       77.43        80                       536.92   3638                        6.258
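The derived columns of Table 5 appear to follow the usual figures of merit: AT as area times delay, AT2 as area times delay squared, and PD as power times delay. This reading is inferred from the numbers rather than stated in the paper. The short Python sketch below, with the hypothetical helper `figures_of_merit`, recomputes the Decoder row from its delay, slice count and power entries.

```python
def figures_of_merit(delay_ns, slices, power):
    """Return (AT, AT2, PD) for one sub-block.

    AT  = area * delay
    AT2 = area * delay squared
    PD  = power * delay
    The formulas are inferred from the tabulated values.
    """
    at = slices * delay_ns
    at2 = at * delay_ns
    pd = power * delay_ns
    return at, at2, pd


if __name__ == "__main__":
    # Decoder row: delay 6.275 ns, 8 slices, power entry 0.081
    at, at2, pd = figures_of_merit(delay_ns=6.275, slices=8, power=0.081)
    print(round(at, 1), round(at2, 1), round(pd, 3))
```

The rounded results agree with the 50.2, 315.0 and 0.508 entries tabulated for the Decoder.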

Table 5a. Delay vs. area of 8-bit processor in Microwind

Topology              Sub block   Transistor count   Rise delay (ns)   Fall delay (ns)   Power dissipation (mW)
Control Unit          Decoder     158                0.062             0.056             2.6854
ALU                   AND         48                 0.028             0.025             3.7224
                      NAND        32                 0.019             0.016             7.5252
                      NOR         48                 0.008             0.005             8.2032
                      OR          32                 0.006             0.001             11.899
                      XOR         48                 0.008             0.004             10.699
                      XNOR        48                 0.020             0.015             1.8804
                      CLA         404                0.490             0.486             6.2088
Universal Shift Reg   MUX         504                0.759             0.755             15.734
Barrel Shift Reg      MUX         1440               0.408             0.393             15.098
GPR                   D-FF        192                0.628             0.618             17.992
TOTAL                             2954               2.436             2.374             101.64

F. The instruction set format rules

The instruction set of the RISC processor has been designed following several rules. All instructions are executed in just one clock cycle, which makes the processor simpler, smaller, faster and easier to understand. The instruction code is received at the beginning of each cycle, all operations are executed during the clock period, and results are stored at the end of it. ALU operations take two operands from registers and store the result in one of them. External read and write operations are synchronous.

4. Results
The performance of the RISC processor has been evaluated using the Xilinx and Microwind tools. The design meets the need for a high-performance logic solution for high-volume, very low cost, consumer-oriented applications. The RISC processor designed in the Xilinx tool employs multi-voltage, multi-standard SelectIO interface pins with voltage levels of 3.3 V, 2.5 V, 1.8 V, 1.5 V and 1.2 V at a data transfer rate of 622+ Mb/s. It operates over a frequency range of 5 MHz to 300 MHz. The Microwind tool integrates the traditionally separate front-end and back-end chip design steps into a single flow, accelerating the design cycle and reducing design complexity. It tightly integrates mixed-signal implementation with digital implementation, circuit simulation, transistor-level extraction and verification. The Microwind implementation of the RISC processor uses 0.12 µm CMOS technology. The simulations are carried out with VDD = 1.2 V, an I/O supply voltage of 2.5 V and a room temperature of 27 °C, using the empirical level 3 device model and Monte-Carlo simulation, with the MOSFET model parameters for each module as given below.
* n-MOS Model
* low leakage
.MODEL N1 NMOS LEVEL = 3 VTO = 0.40 UO = 600.000 TOX = 2.0E-9
+LD = 0.000 THETA = 0.500 GAMMA = 0.400
+PHI = 0.200 KAPPA = 0.060 VMAX = 120.00K
+CGSO = 100.0p CGDO = 100.0
+CGBO = 60.0p CJSW = 240.0P

* p-MOS Model
* low leakage
.MODEL P1 PMOS LEVEL = 3 VTO = 0.45 UO = 200.000 TOX = 2.0E-9
+LD = 0.000 THETA = 0.300 GAMMA = 0.400
+PHI = 0.200 KAPPA = 0.060 VMAX = 110.00K
+CGSO = 100.0p CGDO = 100.0
+CGBO = 60.0p CJSW = 240.0P

The overall design of the 8-bit processor is shown in Figure 14, and the simulation of the overall execution of the RISC processor is shown in Figure 15. The processor has two eight-bit input signals, A7 - A0 and B7 - B0, which are taken externally and loaded into registers A and B respectively. The memory interface signal is READ (RD); it indicates that the selected memory location is to be read and its data placed on the data bus. The various operations are synchronized by the CLK signal. The processor has two control signals, RD and RESET. If RESET is high, the processor performs no operation and stays in the idle state. If RESET is low and RD is high, the data is placed on the data bus and the corresponding values are loaded into general purpose registers A and B. Depending on the opcode provided by the control unit, the particular operation is performed as stated in Table (1). This 8-bit RISC processor executes each instruction in one clock cycle. CLK is the external clock; in the simulation it is held at one, triggering the inputs and producing the desired output. RD triggers the registers through which data is passed into the internal registers A and B. I0 to I3 specify the opcode that enables the operation. For example, if the opcode value is 0111, the operation performed is addition.
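The top-level behaviour described above can be approximated by the following Python sketch. It is a behavioural illustration under several assumptions: the opcode assignments follow Table 1 read in ascending order, 1011 is taken to be SHIFT-LEFT, NO CHANGE is modelled as passing register A through, shifts fill with zero, rotations are to the left, and the result always lands in the accumulator. The names `Risc8`, `step` and `rotl` are hypothetical.

```python
MASK = 0xFF


def rotl(value, n):
    """Rotate an 8-bit value left by n positions."""
    n %= 8
    return ((value << n) | (value >> (8 - n))) & MASK


# Opcode (I3 I2 I1 I0) -> operation on registers A and B (assumed mapping).
OPERATIONS = {
    0b0000: lambda a, b: a & b,              # AND
    0b0001: lambda a, b: ~(a & b) & MASK,    # NAND
    0b0010: lambda a, b: ~(a | b) & MASK,    # NOR
    0b0011: lambda a, b: a | b,              # OR
    0b0100: lambda a, b: a ^ b,              # XOR
    0b0101: lambda a, b: ~(a ^ b) & MASK,    # XNOR
    0b0110: lambda a, b: (a - b) & MASK,     # SUB
    0b0111: lambda a, b: (a + b) & MASK,     # ADD
    0b1000: lambda a, b: ~a & MASK,          # NOT
    0b1001: lambda a, b: a,                  # NO CHANGE (assumed: pass A)
    0b1010: lambda a, b: a >> 1,             # SHIFT-RIGHT
    0b1011: lambda a, b: (a << 1) & MASK,    # SHIFT-LEFT
    0b1100: lambda a, b: rotl(a, 1),         # ROTATE 1-BIT
    0b1101: lambda a, b: rotl(a, 3),         # ROTATE 3-BIT
    0b1110: lambda a, b: rotl(a, 5),         # ROTATE 5-BIT
    0b1111: lambda a, b: rotl(a, 7),         # ROTATE 7-BIT
}


class Risc8:
    """Behavioural model: registers A and B feed one operation per step."""

    def __init__(self):
        self.a = 0
        self.b = 0
        self.acc = 0

    def step(self, a_in, b_in, opcode, rd, reset):
        """Model one clock cycle and return the accumulator."""
        if reset:
            return self.acc                  # idle: nothing is performed
        if rd:
            self.a = a_in & MASK             # RD high loads the inputs
            self.b = b_in & MASK
        self.acc = OPERATIONS[opcode & 0xF](self.a, self.b)
        return self.acc


if __name__ == "__main__":
    cpu = Risc8()
    print(cpu.step(5, 3, 0b0111, rd=1, reset=0))  # opcode 0111: addition
```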

5. Summary
This section presents the overall performance of the 8-bit RISC processor obtained from the Xilinx and Microwind tools. Table (6) compares the designed processor in terms of delay, area and power dissipation.

Table 6. Overall performance of 8-bit RISC in FPGA & ASIC

RISC   Delay (ns)   Area           Power dissipation
FPGA   77.43        80 (slices)    6.258 W
ASIC   5.39         2954 (gates)   101.64 mW

It is observed that the overall delay of the processor is 77.43 ns in FPGA and 5.39 ns in ASIC. The overall power dissipation is observed to be 6.258 W in FPGA and 101.64 mW in ASIC. The power dissipation could be reduced further if the circuit were designed with adiabatic logic.

6. Conclusion
An 8-bit RISC processor with a 16-instruction set has been designed. Every instruction is executed in one clock cycle with 3-stage pipelining. The design is verified through exhaustive simulations, and its performance is compared using the Xilinx and Microwind tools. This processor can be used as a systolic core to perform mathematical computations such as solving polynomial and differential equations. It can also be used in portable gaming kits.

AUTHOR PROFILE
She graduated with a B.E. (EEE) from Bharathiyar University, Coimbatore, in 1998 and completed an M.E. (VLSI Design) at Anna University, Chennai, in 2004. She is currently working as Assistant Professor in Electronics and Communication Engineering at Rajiv Gandhi College of Engineering and Technology, Puducherry. She teaches VLSI Design, Embedded Systems, and Microprocessors and Microcontrollers to PG and UG students. She has authored books on VLSI Design and has published several papers at national conferences and symposia. She is a guest faculty member at Pondicherry University for M.Tech Electronics and has been actively guiding PG and UG students in the areas of VLSI, embedded systems and image processing. She received the best teacher award for the years 2006 and 2007. Her research interests are Analog VLSI Design, Low Power VLSI Design, Testing of VLSI Circuits, Embedded Systems and Image Processing. She is a member of ISTE.
