On Edge project

"On Edge" DLX Computer Project

Objective

Methodology

Block Diagrams

Processor Simulation

Using the Project SOF file on the Altera UP1 Board

Downloads Available:

Cmpe 4500 Report(Word 97 Format--viewer available from Microsoft Corporation)
DLX Altera VHD Project Files and SOF
Tim DLX Source and Bubble Sort

The main objective for the project was to produce a fully pipelined DLX processor. This objective was broken into many smaller pieces for simplification. The first milestone was to produce a machine language version of the bubble sort algorithm, since the processor can only read machine code. Once this was accomplished, the next milestone was to produce a block diagram of the architecture. The following milestone was to write a small program that would take the machine code from the first milestone and import it into the source code for the processor. A test program was then written in Assembly language, converted into machine language, and inserted into the processor. The next milestone was to successfully simulate an ADD instruction. Once this was completed, the full bubble sort program was successfully simulated. After simulation, the last milestone was to load the source code onto a programmable gate array and successfully emulate the program on the chip.

The next step was to enhance the DLX processor with two more modules. To serve that purpose, another level was added to the fully pipelined DLX processor. This level consisted of three different modules. The central module was TOP-DLX, which was responsible for assembling the TOP-SPIM module (written in the first part of the project) with the two additional modules. The first of those modules was a keyboard component. The keyboard module is an acquisition module for a PS/2 Keyboard.

Along with the keyboard, there was the addition of a video module, in which memory is allocated in the form of RAM.

Methodology

Instruction set

Before the program could be written, an instruction set had to be defined. This set was specified to be a subset of the DLX instruction set. Once the instruction set was defined, the instruction format was created. Figure 1 is a summary of the instructions used in the bubble sort program. It resembles the format of the DLX architecture.

ADD

        ADD: DEF OPSPECIAL,BLANK5,BLANK5,BLANK5,B#00000,FADD
       OPSPECIAL: EQU Q#00
        ADD: EQU Q#40

Ex : ADD R1, R2, R3
Adds R2 and R3 in R1

0 5 6 10 11 15 16 20 21 25 26 31

reg dest=R1

operand=R2

operand=R3

00000

ADDI

ADDI: DEF OPADDI,BLANK5,BLANK5,BLANK16
OPADDI: EQU Q#06

Ex : ADDI R1, R2, H#0003
Adds R2 and value at H#0003 in R1

0 5 6 10 11 15 16 31

reg dest=R1

reg src=R2

Immediate

LW and SW

        LW: DEF OPLW,BLANK5,BLANK5,BLANK16
        SW: DEF OPSW,BLANK5,BLANK5,BLANK16
        OPLW: EQU Q#43
        OPSW: EQU Q#53

EX : LW R1, R2, H#0000

0 5 6 10 11 15 16 31

reg dest=R1

reg src=R2

immediate

BEQZ and BNEZ

        BEQZ: DEF OPBEQZ,BLANK5,NULL5,BLANK16
        BNEZ: DEF OPBNEZ,BLANK5,NULL5,BLANK16
        OPBEQZ: EQU Q#04
        OPBNEZ: EQU Q#05

Ex : BEQZ R2, 06
Jumps to instruction 06 if value in R2 = 0

0 5 6 10 11 15 16 31

reg used=R2

00000

instruction address

J and JAL

        J: DEF OPJ,26VB#00000000000000000000000000
        JAL: DEF OPJAL,26VB#00000000000000000000000000
        OPJ: EQU Q#02
        OPJAL: EQU Q#03
Ex : J <value>
        Unconditional jump to instruction <value>

0 5 6 31

value

JAL test
Jumps to instruction labeled as test

0 5 6 31

test

JR: DEF OPJR,BLANK5,NULL16,B#01000
OPJR: EQU Q#20
Ex: JR R1

0 5 6 10 11 26 31

reg used=R1

0000000000000000

01000

SLL

        SLL: DEF OPSPECIAL,BLANK5,BLANK5,BLANK5,BLANK5,FSLL
        OPSPECIAL: EQU Q#00
        FSLL: EQU Q#04

Ex : SLL R2
Shifts the value in register 2 by 2, therefore multiplying by 4

0 5 6 10 11 26 31

reg used=R2

0000000000000000

SLT

         SLT: DEF OPSPECIAL,BLANK5,BLANK5,BLANK5,B#00000,FSLT
        OPSPECIAL: EQU Q#00
        FSLT: EQU Q#52

EX : SLT R3,R2,R1
Sets R3 to be zero if R2 is less than R1

0 5 6 10 11 15 16 20 21 25 26 31

reg used=R3

operand=R2

operand=R3

00000

NOP

NOP: DEF OPADDI,BLANK5,BLANK5,BLANK16
OPNOP: EQU Q#77

0 5 6 10 11 15 16 31

00000

0000000000000000

Figure 1. Summary of instruction formats used in bubble sort

The addition of a video and keyboard module had repercussions on many stages of the program, including the instruction set. At first, there was a need for a more complete instruction set. However, only one more instruction was necessary, known as PutChar. Here is the format of this instruction as well as its usage:

0 5 6 10 11 15 16 23 24 31

101100

Char-Mode

Reg value

0000000

00000000

The instruction format is : OpCode, Blank5, Blank5, 00000000, Blank8
The format of the instrcution is similar to other instructions so that forwarding is implemented the same way..
The Opcode is defined as 101100. The next 5 bits correspond to the selection of the type of outputs. Char_Mode is a 5 bit signal that enables us to select the type. (Only 2 bits are used at the moment, but more may be required later on).
Bits 12 to 16 give the value of the register that needs to be printed to the screen is located.

Types of outputs:

00 : In this case, the most significant hex digit of the register passed will be outputted to the screen. ( There is the assumption that we are displaying two hex numbers).

01 Now, the least least significant hex digits are selected, and outputted to the screen.

10 Used when there is a need to write a character passed by the compiler (at the assembly level). This would select the last 8 bits from the instruction. When the first 2 bits are ‘00’, the remaining 6 bits are the ascii code for the character that will be outputted. When ‘01’, this corresponds to the enter key, therefore there will be no ouput to the screen. When ‘10’, a ‘backspace’ key is passed, so there will be no output either.

11 This is never suppose to be passed, but assigning 00000000 to avoid powering up in an unknown state.

Block Diagrams

The next milestone was to draw a block diagram of the system based on the DLX diagrams in Patterson and Hennesey. This proved to be a vital part of the project. Due to the complexity of the final diagram, the schematic is provided in different stages. The first figure is the simplest version of the architecture, showing the processor as a single stage (non-pipelined) version. It shows the basic components that make up the processor, including the instruction memory, the register files, the ALU, and the data memory. The following block diagram shows the same processor broken into five stages. The control unit was also added to show the different control signals controlling the various units, such as the ALU and the data memory. The hazard detection unit was added to the decode unit and is responsible for determining whether possible hazards exist. Although hazard detection is one of the disadvantages to running a pipelined system, it is possible that a result is needed in an instruction register value that has not been written to the register yet. In this case, the hazard detection unit would, if possible, forward the correct value; otherwise, it would stall the pipeline long enough to avoid the problem. The last two diagrams are very similar, but the flushing signals added to the last block diagram shows which units had to be stalled when the forwarding unit could not resolve the hazard. This occurred in two instances: whenever the instruction is a branch, or whenever a LW instruction was followed by an instruction needing that value. For example, LW R5, R2, H0002 followed by ADD R1, R5, R6 would cause a hazard. In the case of the branch, the branch is not determined immediately. By the time it is determined, the next instruction has already been loaded into the processor. The control signals are responsible for flushing that instruction and correctly branching to the new instruction. Originally, the branch was determined in the execute stage of the pipeline; therefore, branch flushing would cause two instructions to be flushed from the pipeline. By calculating the branch address in the decode stage, only one instruction would have to be flushed. In the current implementation, the need for forwarding is determined in the decode stage; however, the forwarding values are not needed until later stages. Because the branch address is now calculated in the decode stage, it was necessary to provide forwarding for the branch and zero calculation in the decode module.

The hazard detection and forwarding units are also needed to prevent problems with the LW instruction. Because the value from the LW is not determined until the instruction reaches the data memory, the instruction following the LW must wait until the value has been loaded from memory. This requires a stall after the LW, which must be implemented by the compiler. Since this project did not include a compiler, it was inserted manually. This last block diagram represents the final version of the block diagram, which includes the forwarding unit. All result signals are routed back to the ALU in case they were needed for calculations.

With the addition of the keyboard and video modules, a third level was added to the DLX processor. The combination of these two modules with TOP-DLX enables the input of characters through the processor, and visualization of the characters on a screen. Here is a representation of the project with the three different levels.

Implementation

After the instruction set was defined, the bubble program, which was given in the programming language C (see Figure 2), had to be written in Assembly language.

            Int v[111 102 14 –13 17 6 5 0];
            Sort(int v[], int n)
            {
                int I, j;
                for (I = 0; I < n ; I ++)
                {
                    for (j = I – 1 ; j >= 0 && v[j]>v[j+1] ; j --)
                      {
                            swap(v,j);
                       }
                }
              }

Figure 2. C code for sort program

Using the example in "Computer Architecture A Quantitative Approach" by Patterson and Hennessy, the bubble sort program was written in Assembly language. In order to convert the assembly language file into machine language, a macro file and a meta-assembler program called TIM were used. The meta-assembler creates a machine language with instruction program counters using any assembly language source file.

It was then necessary to write a program that would convert the machine code produced by TIM into the instruction code for the processor. This was done using the Progress 4GL programming language. The "convmif.p" program converts the "sort.lst" file generated from TIM into an Altera formatted MIF (Memory Initialization File). The MIF was then used to initialize the ROM defined in the Instruction Fetch module. This program provided an easy way to write test programs, run them through the TIM meta-assembler, and convert them into MIF files for the Altera program.

This program provided the instruction set that was needed in the MIF format. All instructions starting with a ‘B’ are PutChar instructions, instructions that were added to test the functionality of the third level of the DLX processor. Here is the link for the instruct.mif file.

Processor Simulation

Using the instruction set for the bubble sort algorithm, the VHDL code was compiled and simulated.

This waveform simulation shows that the memory locations 1-8 are initialized to 6F, 66, 0F, F3, 11, 06, 05 and 00 respectively. The intermediary clock cycles show the swapping of the memory locations. For example, after a number of clock cycles, 6F and 66 are swapped because 66 is smaller than 6F. Then 6F and 0E are swapped, followed by 66 and 0E. This algorithm was repeated until the numbers were sorted in memory in ascending order. Therefore, the final ordering of the number was as follows: "F3, 00, 05, 06, 0E, 11, 66, 6F", as the ouput to the screen shows.

00 :	In this case, the most significant hex digit of the register passed will be outputted to the screen. ( There is the assumption that we are displaying two hex numbers).
01	Now, the least least significant hex digits are selected, and outputted to the screen.
10	Used when there is a need to write a character passed by the compiler (at the assembly level). This would select the last 8 bits from the instruction. When the first 2 bits are ‘00’, the remaining 6 bits are the ascii code for the character that will be outputted. When ‘01’, this corresponds to the enter key, therefore there will be no ouput to the screen. When ‘10’, a ‘backspace’ key is passed, so there will be no output either.
11	This is never suppose to be passed, but assigning 00000000 to avoid powering up in an unknown state.