The MIPS integer unit implements a load/store architecture with
single-cycle ALU operations (logical, shift, add, sub) and an
autonomous multiply/divide unit. The 64-bit register resources include
32 general-purpose orthogonal integer registers, the HI/LO result
registers for the integer multiply/divide unit, and the program
counter. In addition, the on-chip floating-point co-processor adds
32 floating-point registers and a floating-point control/status
register.
The RC4650 has thirty-two general-purpose 64-bit registers.
These registers are used for scalar integer operations and address
calculation. The register file has two read ports and one
write port and is fully bypassed to minimize operation latency in the
pipeline. Figure 1 illustrates the RC4650 Register File.
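To make "fully bypassed" concrete, the C sketch below models a register-file read that forwards a result still in flight instead of returning the stale stored value. It is an illustrative software model under stated assumptions, not the RC4650's actual pipeline: the types `regfile_t` and `pending_write_t` and the functions `read_operand`, `read_sources`, and `write_back` are hypothetical names. Register 0 reads as zero, as in every MIPS implementation.

```c
#include <stdint.h>

/* Illustrative model only: 32 x 64-bit general-purpose registers. */
typedef struct {
    uint64_t gpr[32];
} regfile_t;

/* Hypothetical pipeline latch holding a result that has been computed
 * but not yet written back to the register file. */
typedef struct {
    int      valid;
    unsigned dest;   /* destination register number */
    uint64_t value;  /* computed result */
} pending_write_t;

/* One read port. If an in-flight result targets the same register,
 * forward it (the bypass path) instead of the stale stored value. */
static uint64_t read_operand(const regfile_t *rf,
                             const pending_write_t *pw, unsigned reg)
{
    if (reg == 0)
        return 0;                      /* r0 always reads as zero */
    if (pw->valid && pw->dest == reg)
        return pw->value;              /* bypass */
    return rf->gpr[reg];               /* normal register-file read */
}

/* Two read ports: both source operands are obtained in the same cycle. */
static void read_sources(const regfile_t *rf, const pending_write_t *pw,
                         unsigned rs, unsigned rt,
                         uint64_t *a, uint64_t *b)
{
    *a = read_operand(rf, pw, rs);
    *b = read_operand(rf, pw, rt);
}

/* One write port: at most one register is updated per cycle. */
static void write_back(regfile_t *rf, pending_write_t *pw)
{
    if (pw->valid && pw->dest != 0)
        rf->gpr[pw->dest] = pw->value; /* writes to r0 are discarded */
    pw->valid = 0;
}
```

Without the bypass check in `read_operand`, an instruction that reads a register one cycle after it was computed would see the old value or would have to wait for `write_back`. That wait is the latency the bypass network removes.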
The RC4650 ALU consists of the integer adder and the logic unit.
The adder performs address calculations in addition to arithmetic
operations, and the logic unit performs all logical and shift
operations. Each of these units is highly optimized and can perform an
operation in a single pipeline cycle.
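The address calculation that the adder performs for loads and stores follows the standard MIPS rule: effective address = base register + sign-extended 16-bit offset. The minimal C sketch below reproduces that arithmetic. The function names `sign_extend16` and `effective_address` are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the 16-bit immediate field of a load/store instruction. */
static int64_t sign_extend16(uint16_t imm)
{
    return (int64_t)(int16_t)imm;
}

/* Effective address for a MIPS load/store such as "LW rt, offset(base)".
 * The same integer adder used for ADD/SUB computes base + offset;
 * unsigned arithmetic gives the required modulo-2^64 wraparound. */
static uint64_t effective_address(uint64_t base, uint16_t offset)
{
    return base + (uint64_t)sign_extend16(offset);
}
```

For example, `effective_address(0x1000, 0xFFFC)` yields `0xFFC`, because the offset field `0xFFFC` encodes -4.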
The RC4650 uses a dedicated integer multiply/divide unit, optimized
for high-speed multiply and multiply-accumulate operations.
Table 1 shows the performance, expressed in pipeline
clocks, achieved by the RC4650 integer multiply unit.
The MIPS-III architecture specifies that the results of a multiply or
divide operation are placed in the HI and LO registers. The values
can then be transferred to the general-purpose register file using
the MFHI/MFLO instructions.
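The C model below shows how a signed 32 x 32-bit MULT splits its 64-bit product across HI and LO, and how MFHI/MFLO copy those halves back out. It assumes the usual MIPS-III 64-bit behavior, in which each 32-bit half is sign-extended into its 64-bit register. The names `hilo_t`, `sext32`, `mult`, `mfhi`, and `mflo` are hypothetical.

```c
#include <stdint.h>

/* The multiply/divide unit's result registers. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} hilo_t;

/* Sign-extend a 32-bit word into a 64-bit register value. */
static uint64_t sext32(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* MULT rs, rt: signed 32 x 32 -> 64-bit product.
 * LO receives the low word, HI the high word, each sign-extended. */
static void mult(hilo_t *acc, uint64_t rs, uint64_t rt)
{
    int64_t prod = (int64_t)(int32_t)rs * (int64_t)(int32_t)rt;
    acc->lo = sext32((uint32_t)prod);
    acc->hi = sext32((uint32_t)((uint64_t)prod >> 32));
}

/* MFHI rd / MFLO rd: move HI or LO into a general-purpose register. */
static uint64_t mfhi(const hilo_t *acc) { return acc->hi; }
static uint64_t mflo(const hilo_t *acc) { return acc->lo; }
```

Multiplying `0x10000` by `0x10000` gives 2^32, so LO holds 0 and HI holds 1. Software that needs the full product must therefore issue both MFHI and MFLO.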
Opcode          Operand Size  Latency  Repeat Rate  Stall Cycles
MULT/U, MAD/U   16 bit        3        2            0
                32 bit        4        3            0
MUL             16 bit        3        2            1
                32 bit        4        3            2
DMULT, DMULTU   any           6        5            0
DIV, DIVU       any           36       36           0
DDIV, DDIVU     any           68       68           0
Table 1: RC4650 Integer Multiply Operation
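As a rough illustration of how latency and repeat rate interact, the sketch below transcribes Table 1 into C and estimates when the last of n independent operations finishes. It assumes a new operation issues every repeat-rate cycles and the final one completes one latency after it issues, which gives (n - 1) x repeat + latency. This is a simplified model that ignores stalls and dependencies. The names `mul_timing_t`, `TABLE1`, `lookup`, and `cycles_for_sequence` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One row of Table 1; all times are in pipeline cycles. */
typedef struct {
    const char *op;      /* "MULT" stands for MULT/MULTU/MAD/MADU,
                            "DMULT" for DMULT/DMULTU, and so on */
    int size;            /* 16, 32, or 0 for "any" */
    int latency;
    int repeat_rate;
    int stall_cycles;
} mul_timing_t;

static const mul_timing_t TABLE1[] = {
    {"MULT", 16,  3,  2, 0}, {"MULT", 32,  4,  3, 0},
    {"MUL",  16,  3,  2, 1}, {"MUL",  32,  4,  3, 2},
    {"DMULT", 0,  6,  5, 0},
    {"DIV",   0, 36, 36, 0},
    {"DDIV",  0, 68, 68, 0},
};

/* Find the row for an opcode family and operand size (NULL if none). */
static const mul_timing_t *lookup(const char *op, int size)
{
    for (size_t i = 0; i < sizeof TABLE1 / sizeof TABLE1[0]; i++) {
        const mul_timing_t *t = &TABLE1[i];
        if (strcmp(t->op, op) == 0 && (t->size == 0 || t->size == size))
            return t;
    }
    return NULL;
}

/* Simplified estimate: n independent operations issued at the peak
 * issue rate; the last one finishes one latency after it issues. */
static int cycles_for_sequence(const mul_timing_t *t, int n)
{
    if (n <= 0)
        return 0;
    return (n - 1) * t->repeat_rate + t->latency;
}
```

Under this model, four independent 32-bit MULTs finish after 3 x 3 + 4 = 13 cycles, and two back-to-back DIVs take 72 cycles because the divider's repeat rate equals its latency.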
The RC4650 adds a new multiply instruction, “MUL”, which specifies
that the multiply result bypasses the “Lo” register and is placed
immediately in the primary register file. By avoiding the explicit
“Move-from-Lo” (MFLO) instruction required when the result goes through
“Lo”, MUL increases the throughput of multiply-intensive operations.
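The C sketch below contrasts the two-instruction MULT + MFLO sequence with the single MUL instruction. It assumes that MUL delivers the same sign-extended low-order word that MFLO would return after a MULT. The model deliberately does not say whether MUL also updates HI/LO, because this section does not state that. The function names `mult_then_mflo` and `mul` are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 32-bit word into a 64-bit register value. */
static uint64_t sext32(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* MULT rs, rt ; MFLO rd: two instructions, result staged through LO. */
static uint64_t mult_then_mflo(uint64_t rs, uint64_t rt, uint64_t *lo)
{
    int64_t prod = (int64_t)(int32_t)rs * (int64_t)(int32_t)rt;
    *lo = sext32((uint32_t)prod);   /* MULT writes the low word to LO */
    return *lo;                     /* MFLO copies LO into rd */
}

/* MUL rd, rs, rt: one instruction, low word delivered straight to rd
 * (assumed to match the value MFLO would return). */
static uint64_t mul(uint64_t rs, uint64_t rt)
{
    int64_t prod = (int64_t)(int32_t)rs * (int64_t)(int32_t)rt;
    return sext32((uint32_t)prod);
}
```

Both paths produce the same register value. MUL saves one instruction issue per product, which adds up in loops dominated by multiplies.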
An additional enhancement offered by the RC4650 is an atomic
“multiply-add” operation, MAD, used to perform multiply-accumulate
operations. This instruction multiplies two numbers and adds the
product to the current contents of the HI and LO registers. This
operation is used in numerous DSP algorithms and allows the RC4650 to
reduce the cost of systems requiring a mix of DSP and control functions.
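The C model below treats HI:LO as one 64-bit accumulator and uses a MAD-style update to compute a dot product, which is the inner loop of an FIR filter. It assumes the accumulator is formed from the low 32 bits of HI and LO and that the product is signed, in the manner of the later MIPS32 MADD instruction. The names `hilo_get`, `hilo_set`, `mad`, and `dot` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } hilo_t;

/* Sign-extend a 32-bit word into a 64-bit register value. */
static uint64_t sext32(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* Read HI:LO as a single signed 64-bit accumulator. */
static int64_t hilo_get(const hilo_t *a)
{
    return (int64_t)(((uint64_t)(uint32_t)a->hi << 32) | (uint32_t)a->lo);
}

/* Split a 64-bit value back into sign-extended HI and LO words. */
static void hilo_set(hilo_t *a, int64_t v)
{
    a->lo = sext32((uint32_t)v);
    a->hi = sext32((uint32_t)((uint64_t)v >> 32));
}

/* MAD rs, rt: HI:LO <- HI:LO + rs * rt (signed 32 x 32 product).
 * Unsigned addition gives the modulo-2^64 wraparound of the hardware. */
static void mad(hilo_t *a, uint64_t rs, uint64_t rt)
{
    int64_t prod = (int64_t)(int32_t)rs * (int64_t)(int32_t)rt;
    hilo_set(a, (int64_t)((uint64_t)hilo_get(a) + (uint64_t)prod));
}

/* Dot product with one MAD per tap, as in an FIR filter. */
static int64_t dot(const int32_t *x, const int32_t *h, size_t n)
{
    hilo_t acc = {0, 0};
    for (size_t i = 0; i < n; i++)
        mad(&acc, (uint64_t)(int64_t)x[i], (uint64_t)(int64_t)h[i]);
    return hilo_get(&acc);
}
```

Without MAD, each tap would need a MULT, moves out of HI/LO, and a separate 64-bit add. With MAD, the running sum stays in HI:LO until the loop ends.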
Finally, aggressive implementation techniques provide low latency
for these operations, along with pipelining that allows new operations
to be issued before a previous one has fully completed. Table 1 also
shows the repeat rate (peak issue rate), latency, and number of
processor stalls required for the various operations. The RC4650
automatically detects operand size, so that 16-bit operands complete
with the shorter timings shown in Table 1, and implements hardware
interlocks to prevent overrun, allowing this high performance to be
achieved with simple programming.
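The sketch below illustrates both mechanisms in this paragraph. Operand size detection selects the Table 1 MULT latency, and a hardware interlock stalls an early MFLO until the result is ready, so software does not have to insert NOPs. The detection rule used here is a hypothetical assumption, since this section does not define it: an operand counts as "16 bit" if it fits in the signed 16-bit range. The interlock model (ready cycle = issue cycle + latency) is likewise a simplification, and the names `fits16`, `mult_latency`, and `mflo_stall` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical size-detection rule: an operand is "16 bit" if it fits
 * in a signed 16-bit range. The actual hardware rule is not given here. */
static int fits16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* MULT latency from Table 1, chosen by the detected operand size. */
static int mult_latency(int32_t rs, int32_t rt)
{
    return (fits16(rs) && fits16(rt)) ? 3 : 4;
}

/* Simplified interlock: an MFLO issued before the product is ready
 * stalls for the remaining cycles instead of reading a stale LO. */
static int mflo_stall(int mult_issue_cycle, int latency, int mflo_issue_cycle)
{
    int ready = mult_issue_cycle + latency;
    return (mflo_issue_cycle >= ready) ? 0 : ready - mflo_issue_cycle;
}
```

In this model, an MFLO placed immediately after a 32-bit MULT (issue cycles 0 and 1) stalls for 3 cycles. Placing independent work between the two hides the stall, but correctness never depends on doing so.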
The RC4650 incorporates an entire single-precision floating-point
co-processor on chip, including a floating-point register file and
execution units. The floating-point co-processor forms a “seamless”
interface with the integer unit, decoding and executing instructions in
parallel with the integer unit.
The RC4650’s floating-point unit directly implements single-precision
floating-point operations. This enables the RC4650 to perform
functions such as graphics rendering without requiring extensive die
area or power consumption.
The RC4650 does not directly implement the double-precision
operations found in the RC4700. However, to maintain software
compatibility, the RC4650 signals a trap when a double-precision
operation is initiated, allowing the requested function to be emulated
in software. Alternatively, the system architect can use a software
library emulation of double-precision functions, selected at compile
time, to eliminate the overhead associated with trapping and emulation.
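A trap-and-emulate handler first has to recognize the offending instruction. The C sketch below shows that classification step using the standard MIPS encoding: major opcode COP1 (0x11) in bits 31..26, and a format field in bits 25..21 where 16 means single (.S) and 17 means double (.D). The function `is_double_cop1` and the surrounding helpers are hypothetical. The sketch also assumes that only .D-format COP1 instructions need emulation. A real handler would also decode the operation and operands, perform the arithmetic in software, and advance the exception PC.

```c
#include <stdint.h>

#define OP_COP1 0x11u   /* major opcode for coprocessor 1 (the FPU) */
#define FMT_S   16u     /* fmt field value for single precision */
#define FMT_D   17u     /* fmt field value for double precision */

static unsigned major_opcode(uint32_t insn) { return insn >> 26; }
static unsigned fmt_field(uint32_t insn)    { return (insn >> 21) & 0x1Fu; }

/* Nonzero for a COP1 instruction in double-precision (.D) format, which
 * the RC4650 does not execute directly and which a hypothetical trap
 * handler would pass to a software emulation routine. */
static int is_double_cop1(uint32_t insn)
{
    return major_opcode(insn) == OP_COP1 && fmt_field(insn) == FMT_D;
}
```

For example, `add.d $f0, $f2, $f4` encodes as `0x46241000` and is classified as double precision, while `add.s $f0, $f2, $f4` (`0x46041000`) is not. The compile-time library alternative avoids this decode step entirely, because no .D instructions are ever emitted.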
The RC4650 floating-point execution units perform single-precision
arithmetic, as specified in IEEE Standard 754. The execution
unit is divided into a separate multiply unit and a combined
add/convert/divide/square-root unit. Multiplies can overlap with adds
and subtracts. The multiplier is partially pipelined, allowing a new
multiply to begin every 6 cycles.
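The single-precision format these units operate on is the IEEE 754 binary32 layout: 1 sign bit, an 8-bit exponent biased by 127, and a 23-bit fraction. The C sketch below extracts those fields from a host `float`, assuming the host also uses IEEE 754 binary32 (true on mainstream platforms). The names `f32_fields_t` and `decode_f32` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits, implicit leading 1 not stored */
} f32_fields_t;

static f32_fields_t decode_f32(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret without aliasing UB */
    f32_fields_t r;
    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For instance, -2.5 = -1.01b x 2^1 decodes to sign 1, biased exponent 128, and fraction `0x200000`.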
As in the IDT79RC4700, the RC4650 maintains fully precise
floating-point exceptions while allowing both overlapped and pipelined
operations. Precise exceptions are extremely important in
mission-critical environments, such as Ada applications, and highly
desirable for debugging in any environment.