IDT79R4650 COMMERCIAL TEMPERATURE RANGE
Integer Multiply/Divide
The R4650 uses a dedicated integer multiply/divide unit,
optimized for high-speed multiply and multiply-accumulate
operations. Table 1 shows the performance, expressed in
terms of pipeline clocks, achieved by the R4650 integer
multiply unit.
The MIPS-III architecture defines that the results of a
multiply or divide operation are placed in the HI and LO
registers. The values can then be transferred to the general
purpose register file using the MFHI/MFLO instructions.
The R4650 adds a new multiply instruction, MUL, which
specifies that the multiply result bypasses the LO register
and is placed immediately in the primary register file. By
avoiding the explicit Move-from-LO (MFLO) instruction
required when using LO, throughput of multiply-intensive
operations is increased.
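The difference between the two sequences can be sketched as a small C model. This is an illustration only: the `hilo_t` type and the function names are hypothetical, operands are assumed to be 32-bit signed values, and the model assumes MUL leaves HI and LO untouched, which the text above does not state.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair (32 bits each). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_t;

/* MULT rs, rt: the 64-bit signed product is written to HI:LO. */
static void mult(hilo_t *r, int32_t rs, int32_t rt)
{
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rt);
    r->hi = (uint32_t)(p >> 32);
    r->lo = (uint32_t)p;
}

/* MFLO rd: a second instruction is needed to copy LO into a GPR. */
static int32_t mflo(const hilo_t *r)
{
    return (int32_t)r->lo;
}

/* MUL rd, rs, rt: the low 32 bits of the product go straight to rd. */
static int32_t mul(int32_t rs, int32_t rt)
{
    return (int32_t)(uint32_t)((int64_t)rs * (int64_t)rt);
}
```

In this model, `mult` followed by `mflo` and a single `mul` produce the same low word; the benefit of MUL is that the second instruction is not needed.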
An additional enhancement offered by the R4650 is an
atomic “multiply-add” operation, MAD, used to perform
multiply-accumulate operations. This instruction multiplies
two numbers and adds the product to the current contents
of the HI and LO registers. This operation is used in
numerous DSP algorithms, and allows the R4650 to reduce
the cost of systems requiring a mix of DSP and control
functions.
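A typical use of multiply-accumulate is the inner loop of a dot product or FIR filter. The following C sketch models it under the assumption that HI:LO is treated as one 64-bit accumulator that wraps on overflow; the names `mad` and `dot_product` are hypothetical and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of MAD rs, rt: HI:LO <- HI:LO + rs * rt.
 * Assumption: HI:LO behaves as a single 64-bit accumulator.
 * Unsigned arithmetic is used so that overflow wraps. */
static void mad(uint64_t *hilo, int32_t rs, int32_t rt)
{
    *hilo += (uint64_t)((int64_t)rs * (int64_t)rt);
}

/* Dot product of two 16-bit sample vectors, one MAD per element. */
static int64_t dot_product(const int16_t *x, const int16_t *h, size_t n)
{
    uint64_t hilo = 0;              /* accumulator cleared first */
    for (size_t i = 0; i < n; i++)
        mad(&hilo, x[i], h[i]);
    return (int64_t)hilo;
}
```

With 16-bit samples each product fits in 32 bits, so the 64-bit accumulator leaves ample headroom for long sums.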
Finally, aggressive implementation techniques provide
low latency for these operations, along with pipelining that
allows new operations to be issued before a previous one
has fully completed. Table 1 also shows the repeat rate
(peak issue rate), latency, and number of processor stalls
required for the various operations. The R4650 performs
automatic operand size detection to determine the size of
the operand, and implements hardware interlocks to
prevent overrun, allowing this high performance to be
achieved with simple programming.
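The latency and repeat figures can be combined into a rough cycle estimate for a burst of back-to-back operations. The sketch below encodes the Table 1 values and applies a common first-order pipelining estimate, (n - 1) * repeat + latency. The formula, the `mul_timing_t` type, and the `burst_cycles` helper are assumptions made for illustration, not figures from this data sheet, and the stall column is carried but not used.

```c
#include <assert.h>

/* One row of Table 1 (values in pipeline clocks). */
typedef struct {
    const char *name;
    unsigned latency;
    unsigned repeat;
    unsigned stall;
} mul_timing_t;

enum { MAD16, MAD32, MUL16, MUL32, DMULT, DIV, DDIV, N_TIMINGS };

static const mul_timing_t TABLE1[N_TIMINGS] = {
    [MAD16] = {"MULT/U, MAD/U 16 bit",  3,  2, 0},
    [MAD32] = {"MULT/U, MAD/U 32 bit",  4,  3, 0},
    [MUL16] = {"MUL 16 bit",            3,  2, 1},
    [MUL32] = {"MUL 32 bit",            4,  3, 2},
    [DMULT] = {"DMULT, DMULTU",         6,  5, 0},
    [DIV]   = {"DIV, DIVU",            36, 36, 0},
    [DDIV]  = {"DDIV, DDIVU",          68, 68, 0},
};

/* First-order estimate: a new operation issues every `repeat`
 * clocks, and the last one needs `latency` clocks to complete. */
static unsigned long burst_cycles(const mul_timing_t *t, unsigned long n)
{
    assert(t != 0);
    if (n == 0)
        return 0;
    return (n - 1) * t->repeat + t->latency;
}
```

Under this model, eight back-to-back 16-bit MAD operations take about 7 * 2 + 3 = 17 clocks, compared with 8 * 3 = 24 clocks if each had to complete before the next could issue.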
Floating-Point Co-Processor
The R4650 incorporates an entire single-precision
floating-point co-processor on chip, including a floating-
point register file and execution units. The floating-point co-
processor forms a “seamless” interface with the integer
unit, decoding and executing instructions in parallel with
the integer unit.
The floating-point unit of the R4650 directly implements
single-precision floating-point operations. This enables the
R4650 to perform functions such as graphics rendering,
without requiring extensive die area or power consumption.
The single-precision unit of the R4650 is directly
compatible with the single-precision operation of the
R4600, and features the same latencies and repeat rates.
The R4650 does not directly implement the double-
precision operations found in the R4600. However, to
maintain software compatibility, the R4650 will signal a trap
when a double-precision operation is initiated, allowing the
requested function to be emulated in software.
Alternatively, the system architect could use a software
library emulation of double-precision functions, selected at
compile time, to eliminate the overhead associated with
trap and emulation.
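Where the application only needs single precision, the trap overhead can also be avoided at the source level by keeping arithmetic in single precision. The C sketch below illustrates the point; the function names are hypothetical, and whether a given compiler actually emits double-precision operations for the second form depends on the compiler and its options.

```c
/* Stays in single precision: the constant carries an f suffix,
 * so the multiply is a float-by-float operation. */
static float scale_single(float x)
{
    return x * 0.5f;
}

/* The unsuffixed constant has type double, so C promotes x to
 * double before the multiply. On a single-precision-only FPU this
 * form may require trapping or library emulation. */
static float scale_promoted(float x)
{
    return (float)(x * 0.5);
}
```

Both functions return the same value here; the difference lies only in the precision of the intermediate operation the compiler may generate.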
Floating-Point Units
The R4650 floating-point execution units perform single
precision arithmetic, as specified in the IEEE Standard 754.
The execution unit is broken into a separate multiply unit
and a combined add/convert/divide/square root unit.
Overlap of multiplies and add/subtract is supported. The
multiplier is partially pipelined, allowing a new multiply to
begin every 6 cycles.
As in the IDT79R4600, the R4650 maintains fully precise
floating-point exceptions while allowing both overlapped
and pipelined operations. Precise exceptions are extremely
important in mission-critical environments, such as those
programmed in Ada, and highly desirable for debugging in
any environment.
The floating-point unit’s operation set includes floating-
point add, subtract, multiply, divide, square root,
conversion between fixed-point and floating-point format,
conversion among floating-point formats, and floating-point
compare. These operations comply with IEEE Standard
754. Double-precision operations are not directly
supported; attempts to execute double-precision floating-
point operations, or to refer directly to double-precision
registers, result in the R4650 signalling a “trap” to the CPU,
enabling emulation of the requested function.
Opcode          Operand Size   Latency   Repeat   Stall
MULT/U, MAD/U   16 bit         3         2        0
                32 bit         4         3        0
MUL             16 bit         3         2        1
                32 bit         4         3        2
DMULT, DMULTU   any            6         5        0
DIV, DIVU       any            36        36       0
DDIV, DDIVU     any            68        68       0
Table 1: R4650 Integer Multiply Operation