Introduction: Background and Aim of Project

1 Outline of the problem to be tackled

The stated aim of this project is to "investigate techniques for efficient software-based graphics APIs". To this end, I intend to design and produce an efficient 3-D graphics library for the ARM microprocessor, with RISC OS being the target operating system. This platform is typified by a lack of hardware graphics acceleration, making it an ideal candidate for a software graphics renderer.

Although I shall only implement a 'toy' library that is capable of quickly throwing polygons around the screen (e.g. for 3-D games), ideally the graphics API adopted will be general enough to also be used for applications where advanced graphics rendering is required.

I shall begin by describing the hardware and software background for the project.

2 The ARM microprocessor

2.1 Background

In 1979 Dr Hermann Hauser and Dr Chris Curry founded Acorn Computers Ltd. During the early 1980s Acorn produced a series of home computers - the kit-built Atom, the BBC Microcomputer and its cheaper cousin the Electron. All were based upon the 6502, a popular off-the-shelf 8-bit processor from MOS Technology. Over 150,000 BBC Micros were sold in the first two years, virtually monopolising the British schools market.

For their next-generation computer Acorn wanted a windowing system, such as that pioneered by the Apple Lisa. This meant a move to a new architecture, but the 16-bit processors that they looked at all had very complex instruction sets and poor interrupt response. However, the experimental microprocessor built by graduate students at Berkeley's RISC project looked much more promising. [6]

2.2 The RISC design philosophy

At this point I should probably explain the basic tenets of the RISC philosophy of processor design:

A conventional CPU such as the Intel 80x86 has a large number of instructions, many of which are quite powerful (multimedia extensions etc.). These CPUs are therefore known as CISC (Complex Instruction Set Computer) processors, and because of the complexity of their design they execute code relatively slowly.

RISC (Reduced Instruction Set Computer) processors such as ARM or Sun's SPARC are designed on the principle that CISC processors spend most of their time executing only a small subset of simple instructions. By providing only these commonly used instructions, a RISC processor can run code more quickly and efficiently.

On the occasions where a CISC processor does use complex instructions, a RISC processor may need a number of instructions to perform the equivalent task. However, performance may still be comparable since a single slow instruction is being replaced by several fast ones. [3]

RISC processors are also sometimes called "load-store" architectures. This refers to the characteristic that memory access is restricted to dedicated load and store operations. The reasoning behind this is that memory access is relatively slow compared to register access, so the majority of instructions should operate on registers rather than memory. [11]

In general, RISC processors have the following characteristics:

They take fewer man-hours to design and are cheaper to manufacture.
They are high performance, particularly when this is measured in terms of MIPS (millions of instructions per second).
They have far lower power consumption than equivalent CISC chips, and produce less waste heat (no cooling fan needed).

The RISC philosophy is a software-oriented rather than a hardware-oriented approach. Libraries of efficient routines are necessary to give programmers access to such niceties as complex mathematical functions. Good compilers for high-level languages such as C/C++ are also an asset, allowing the programmer to use concepts not directly supported by the CPU.

2.3 The Acorn RISC Machine

The Acorn RISC Machine was designed by Steve Furber with Sophie Wilson (who had developed BBC BASIC). They started work in late 1983, with Furber designing the architecture, and Wilson developing the instruction set.

Unsurprisingly, the instruction set has a passing resemblance to that of the 6502. It is straightforward to hand-code ARM assembly language, unlike some RISC processors which rely on sophisticated compilers to manage complicated instruction interdependencies. [3]

The first ARM ran on April 26th 1985, making it arguably the first commercial RISC processor ("MIPS for the masses"). The ARM was only part of a family of custom chips developed for Acorn's first RISC computer, the Archimedes. These included MEMC (memory controller), VIDC (video & sound controller) and IOC (timing, interrupts, peripherals). [11]

Key features of the ARM processor:

Load/store architecture based on Reduced Instruction Set Computer principles.
32-bit data bus, can access data in words (32-bit) or bytes (8-bit).
Combined PC/PSR¹ means that the address bus is only 26 bits wide.
Simple 3-stage instruction pipeline (fetch, decode, execute).
All instructions are 32 bits long, and must be word aligned.
Every instruction is predicated, using a 4-bit condition code. Another bit indicates whether condition codes should be set, so that intervening instructions may be prevented from changing them.
Four modes: User (program), FIQ (fast interrupt), IRQ (general interrupt), SVC (supervisor).
16 registers visible to programmer. All registers are general-purpose except for R15 (program counter/status) and R14 (link register). Some registers are 'shadowed' in interrupt modes, for fast response.
'Barrel shifter' operates on second operand of most ALU² instructions, allowing shifts to be combined with most operations.
Support for external co-processors, such as a floating point arithmetic unit.

The fact that every instruction is conditionally executed allows many branches to be eliminated entirely, speeding execution. The ARM's barrel shifter is another distinctive feature, which allows the equivalent of two or more instructions to be combined into one.

Because of these features, ARM code is both efficient and dense compared to that of other RISC processors. Despite relatively low clock speeds and a short pipeline, in operation the ARM matches the performance of much more complex and power-hungry processors. Low power consumption and a high MIPS-per-watt ratio make it ideal as an embedded processor, e.g. for hand-held devices.

Early ARM processors such as the ARM1 and ARM2 had no cache. The ARM3 was the first to include one, featuring a 4 KB on-chip cache. More recent CPUs such as the ARM610/710 have a cache as standard.

In later versions of the ARM architecture, the program counter was extended to a full 32 bits, with the program status register moving to a dedicated register - the CPSR (current PSR), with an SPSR (saved PSR) for each privileged mode. [3]
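As a rough illustration of the original combined PC/PSR arrangement, the following Python sketch packs and unpacks a 26-bit-mode R15. The bit positions (flags N, Z, C, V, I, F in the top six bits, the mode in the bottom two, and the word-aligned address in between) are those of the original 26-bit ARM architecture rather than anything stated above, and the helper names pack_r15 and unpack_r15 are hypothetical.

```python
# Sketch of the combined PC/PSR register (R15) on 26-bit ARM processors.
# Helper names are hypothetical; this is a model, not emulator code.

FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}
MODES = {0: "User", 1: "FIQ", 2: "IRQ", 3: "SVC"}
PC_MASK = 0x03FFFFFC  # bits 25..2 hold the word-aligned address


def pack_r15(pc, flags="", mode=0):
    """Combine a word-aligned address, a set of flag letters and a mode number."""
    if pc & ~PC_MASK:
        raise ValueError("address must be word aligned and below 64 MB")
    value = pc | (mode & 3)
    for name in flags:
        value |= 1 << FLAG_BITS[name]
    return value


def unpack_r15(value):
    """Split R15 back into (address, flag letters that are set, mode name)."""
    flags = "".join(n for n, bit in FLAG_BITS.items() if (value >> bit) & 1)
    return value & PC_MASK, flags, MODES[value & 3]


r15 = pack_r15(0x8000, flags="ZC", mode=3)
print(hex(r15), unpack_r15(r15))
```

Because only 24 bits of word address remain once the flags and mode are packed in, a program can address at most 64 MB, which is the limitation that the later 32-bit program counter removed.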

Apart from incremental increases in clock speeds, other improvements have included the addition of DSP-like fast 64-bit multiply instructions and the availability of hardware floating point systems. The ARM processors around which current RISC OS machines are built range in clock speed from the 56 MHz ARM7500FE at the low end to the 300 MHz StrongARM.

In 1990 Acorn's microprocessor group was spun out as a separate venture, 'Advanced RISC Machines', backed by Apple and chip manufacturer VLSI Technology. Today, this company dominates the world market for embedded processors: "ARM processors are teetering on the verge of ubiquity in widgets of all shapes, sizes, and functions. Nintendo's Game Boy Advance, mobile phones too numerous to mention..." [12]

3 The RISC OS operating system

The first operating system to be known as RISC OS was RISC OS 2 of 1988, so called because it was in fact the second operating system for the Archimedes computer. The first had been "Arthur", which was essentially a hasty port of the BBC Microcomputer's OS, bundled with a primitive desktop written in BASIC(!).

By contrast, RISC OS 2 provided a proper desktop environment in which multiple applications could run simultaneously, exchanging data between themselves and with the Filer by a drag-and-drop user interface. Subsequent releases have gradually improved the OS both aesthetically and internally, and updated it for new hardware, but it has never undergone a large-scale overhaul. [13]

RISC OS has the following characteristics:

The kernel contains core routines. This is not replaceable, but to some extent behaviour may be changed by claiming software vectors.
System extension modules add facilities such as filing systems, a window manager, a font manager and an internet stack.
It is easy to extend the OS by adding new modules, or by replacing existing modules with alternatives that provide equivalent facilities.
The functional interface of a module is defined in terms of SWIs or SoftWare Interrupts. These routines are accessed by invoking an allocated SWI number, through the ARM's SWI instruction.
Most of RISC OS (including the kernel) is written in ARM assembler, which restricts it to ARM-based hardware; only a few peripheral components are written in C.
RISC OS is substantially dependent on Acorn's custom VIDC/IOMD chipset, although specific modifications have been made to keep up with hardware improvements.
The multi-tasking scheme implemented is co-operative rather than pre-emptive, meaning that tasks must be well behaved and relinquish control at regular intervals.
At approximately 4 MB, RISC OS is relatively small compared to other operating systems. This allows it to be supplied on ROM chips.
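The co-operative scheme mentioned in the list above can be illustrated with a toy round-robin scheduler. This is only a sketch in Python using generators: the names task and run are hypothetical, and it makes no attempt to reproduce how RISC OS itself switches between tasks. The point is simply that a task keeps the processor until it chooses to give it up.

```python
from collections import deque


def task(name, steps, log):
    """A well-behaved task: does one unit of work, then relinquishes control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary hand-back to the scheduler


def run(tasks):
    """Round-robin scheduler: it regains control only when a task yields."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run the task until its next yield
            queue.append(current)  # still alive, so requeue it
        except StopIteration:
            pass                   # task finished, so drop it


log = []
run([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

A task that looped without ever reaching a yield would starve every other task, which is why co-operative multi-tasking depends on tasks being well behaved.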

The fact that RISC OS is supplied on ROM rather than disc has a number of advantages: it cannot be damaged or lost by viruses or accident, and since it does not need to be loaded into memory it is much faster to start up. When running it does not take a significant proportion of the computer's memory. Admittedly it is harder to upgrade ROMs, but this can generally be done by soft-loading replacement modules from disc. [3]

Because it is ROM-based, RISC OS is ideal for embedded systems and network computers. It was designed to be usable on low-end Archimedes computers that had no hard disc and low-resolution monitors, which makes it an obvious fit for TV set-top box products. In particular the quality of RISC OS's font rendering at low resolutions has attracted praise.

Acorn were involved in Oracle's network computer project, and brought at least one NC to market. Pace Micro Technology (the new copyright holders) are currently using RISC OS in their set-top boxes and in consumer products such as the Bush Internet TV.

4 Conventions used in this document

The following typographical conventions are followed in the rest of this document:

Where direct quotations or information from other authors is included, the source is attributed using a reference number in square brackets, e.g. [5]. This can be looked up in the numbered list of references given at the end of this document.

When computer software is referred to, the name is italicised rather than bracketed by quotation marks, e.g. "TechWriter is based on EasiWriter, with the addition of a powerful equation editor."

Equations, variables and other mathematical expressions also appear in an italicised font, e.g. "The tangent at the point where , cuts the -axis at ."

The general convention for any quoted code, function or SWI name is that it appears in a monospaced font, e.g. "Changes made to array data between the execution of glBegin and the corresponding execution of glEnd may affect calls to glArrayElement..."

Where larger code examples are given to illustrate a technical point, the section is additionally highlighted by a grey background:

LDR R9,[R10,#12]                 ; Ship they are attacking
CMP R9,R5                        ; Old dead ship ?
  BNE war_attackingoldshiploop%  ; No - next ship

Most code examples are given either in BBC BASIC or ARM assembly language (as above). The former is generally used for client program code whilst the latter is generally used for module code. Whilst a full explanation of either language is beyond the scope of this report, the following sections give a (very) basic grounding in the syntax of some of the commands used.

5 BBC BASIC syntax

Since many of the code examples involve calling SWIs (see section 3), it may be helpful to know the syntax of BASIC's SYS statement:

SYS "<swi-name>" [,<input-expr>]^ [TO <output-var>[,<output-var>]^ [;<flags-var>]]

A comma-separated list of expressions may follow the SWI name, each an argument to be passed in one of the ARM registers R0-R7. Numbers are converted to integers and placed directly into a register whilst strings are passed by pointer. Any registers omitted from the list (indicated by ,,) are zeroed. After the optional TO, a similar list specifies output variables in which the returned values of registers are to be stored. Again, registers may be omitted from the list. Finally, a trailing semicolon and variable can be used to retrieve the state of the processor flags on exit from the SWI.

For example, SYS "OS_Find",&40,"foo" TO handle% would call the OS_Find SWI with &40 in register 0 (meaning open existing file with read access) and register 1 pointing to the string "foo". The return value of register 0 (a file handle) would be stored in the BASIC integer variable handle%.
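The way SYS assigns its input arguments to registers can be modelled roughly as follows. This is a Python sketch of the rules described above, not BASIC's actual implementation: the function name marshal_sys_args is hypothetical, None stands in for an argument omitted with ",,", and the string addresses are invented purely so that a pointer value can be shown.

```python
# Model of how SYS input arguments are assigned to registers R0-R7.
# All names and the string addresses are illustrative assumptions.

def marshal_sys_args(args, string_pool):
    """Return a list of eight register values for the given argument list."""
    if len(args) > 8:
        raise ValueError("at most eight input registers (R0-R7)")
    registers = [0] * 8                  # omitted registers are zeroed
    for index, arg in enumerate(args):
        if arg is None:                  # argument skipped with ",,"
            continue
        if isinstance(arg, str):         # strings are passed by pointer
            fake_address = 0x9000 + 0x100 * len(string_pool)
            registers[index] = string_pool.setdefault(arg, fake_address)
        else:                            # numbers are converted to integers
            registers[index] = int(arg)
    return registers


pool = {}
# Corresponds to: SYS "OS_Find",&40,"foo" TO handle%
print(marshal_sys_args([0x40, "foo"], pool))
```

Here R0 receives &40 directly, R1 receives the (invented) address of the string "foo", and R2-R7 are left at zero.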

Another BASIC keyword commonly used in this document is REM, which simply means that the rest of the line is a comment, to be ignored by the interpreter.

6 Assembly language syntax

In general the instruction mnemonics and options used follow the de facto standard set by Peter Cockerell's book [5]. The general syntax of an assembler source line is as follows:

[<label-part>:] [<instruction>[<cond>] <operands>] [;<comment-part>]

An address label at the beginning of the line is terminated by a colon, followed by the instruction. Comment text is prefixed by a semicolon, and is ignored by the assembler. The label, instruction and comment are all optional parts of the source line.

ARM instructions are referred to by a mnemonic such as LDR (load) or MLA (multiply with accumulate). Any instruction mnemonic may be suffixed with a condition code that must be satisfied for the instruction to be executed (see section 2.3). Examples might be RSBMI (reverse subtract, if negative flag set) or SWIVC (software interrupt, if overflow flag clear).

Operands are specified as a list of comma-separated registers, which are referred to by number as R0-R15. Alternative names for R13-R15 are SP (stack pointer), LR (link register) and PC (program counter), reflecting their usual roles. The commonest (though by no means the only) format for instruction operands is as follows:

<destination>, <operand1>, <operand2>

For example, ADD R0,R1,R2 would perform R0=R1+R2. To complicate matters, for many instructions <operand2> may be either a constant value, a register, a register shifted by a register, or a register shifted by a constant value:

<operand2> = #<const> | <reg> | <reg>, ASL|LSL|LSR|ASR|ROR <reg>|#<const> | <reg>, RRX

For example, ADD R0,R0,R0,ASL#1 would multiply R0 by 3 (R0 = R0 + R0×2)³.
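The effect of a shifted second operand can be checked with a small Python model of 32-bit registers. This is only a sketch: the function names shift and add are hypothetical, shift amounts are assumed to lie in the range 1 to 31, and neither RRX nor the setting of flags is modelled.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def shift(value, kind, amount):
    """Apply one barrel-shifter operation to a 32-bit value (amount 1..31)."""
    value &= MASK
    if kind in ("LSL", "ASL"):           # the two are synonymous
        return (value << amount) & MASK
    if kind == "LSR":                    # zeros are shifted in at the top
        return value >> amount
    if kind == "ASR":                    # the sign bit is replicated at the top
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK
    if kind == "ROR":                    # bits leaving the bottom re-enter at the top
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError("unknown shift type: " + kind)


def add(op1, op2, kind=None, amount=0):
    """Model ADD Rd,Rn,Rm[,<shift>#<amount>] without setting flags."""
    if kind is not None:
        op2 = shift(op2, kind, amount)
    return (op1 + op2) & MASK


print(add(7, 7, "ASL", 1))               # ADD R0,R0,R0,ASL#1 with R0=7 gives 21
print(hex(shift(0xFFFFFFF8, "ASR", 1)))  # -8 ASR 1 gives 0xfffffffc, i.e. -4
print(hex(shift(0xFFFFFFF8, "LSR", 1)))  # -8 LSR 1 gives 0x7ffffffc
```

The last two lines show why ASR and LSR must be distinguished for negative values, whereas a left shift gives the same bit pattern under either name.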

Finally, a convention that I use to aid readability of ARM code is that conditionally executed instructions are indented. This is analogous to the way loops and other constructions are indented in well-formatted C source code. Groups of instructions dependent on different condition codes are indented by slightly different amounts:

CMN R6,R0,ASL#1 ; compare with -2*R0
  MOVLT R1,R6 ; if less than then let R1=R6
   MOVEQ R1,#0 ; if equal then let R1=0...
   SUBEQ R6,R6,#1 ; ..and decrement R6
    MOVGT R1,R0,ASR#2 ; if greater than then let R1=R0/4
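To make the behaviour of this sequence concrete, the following Python sketch models the flags set by CMN and the three condition codes used. It rests on the standard ARM definitions (LT when N and V differ, EQ when Z is set, GT when Z is clear and N equals V); the function names are hypothetical, and the carry flag is omitted because none of these conditions tests it.

```python
MASK = 0xFFFFFFFF


def signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def cmn_flags(a, b):
    """Return the N, Z and V flags set by CMN a,b (the flags of a + b)."""
    a &= MASK
    b &= MASK
    result = (a + b) & MASK
    n = bool(result & 0x80000000)
    z = result == 0
    # Overflow: both operands share a sign that the result does not have
    v = bool(~(a ^ b) & (a ^ result) & 0x80000000)
    return n, z, v


def run_example(r0, r6, r1=0):
    """Model the five-instruction sequence; returns the final (R1, R6)."""
    n, z, v = cmn_flags(r6, r0 << 1)     # CMN R6,R0,ASL#1
    if n != v:                           # LT
        r1 = r6                          #   MOVLT R1,R6
    if z:                                # EQ
        r1 = 0                           #    MOVEQ R1,#0
        r6 = r6 - 1                      #    SUBEQ R6,R6,#1
    if not z and n == v:                 # GT
        r1 = signed(r0) >> 2             #     MOVGT R1,R0,ASR#2
    return signed(r1), signed(r6)


print(run_example(3, -10))  # R6 < -2*R0, so R1 = R6
print(run_example(3, -6))   # R6 = -2*R0, so R1 = 0 and R6 is decremented
print(run_example(8, 5))    # R6 > -2*R0, so R1 = R0/4
```

Since none of the conditional instructions sets the flags, all three groups are tested against the result of the single CMN, and exactly one group executes.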

Furthermore, where a sequence of complementary comparisons is made, the indentation is cumulative. The following code branches on the condition R1=10 or R2=20 or R3=30:

CMP R1,#10 ; is R1 = 10?
  CMPNE R2,#20 ; if no, then is R2 = 20?
    CMPNE R3,#30 ; if neither, then is R3 = 30?
      BEQ ten_or_twenty_thirty ; one of the above is true

It is probably obvious that this indentation scheme is less than foolproof, but generally conditional constructions follow a few well-known patterns (the cumulative AND, the cumulative OR etc.) and the indentation works nicely.
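The cumulative OR pattern shown above can be checked against the condition it is meant to implement. The Python sketch below tracks only the Z flag, which is all that the NE and EQ conditions test; the function names are hypothetical.

```python
def cmp_z(a, b):
    """Z flag after CMP a,b: set (True) when the 32-bit operands are equal."""
    return ((a - b) & 0xFFFFFFFF) == 0


def branch_taken(r1, r2, r3):
    """Model the CMP / CMPNE / CMPNE / BEQ sequence."""
    z = cmp_z(r1, 10)          # CMP   R1,#10
    if not z:                  # NE: the first comparison failed
        z = cmp_z(r2, 20)      # CMPNE R2,#20
    if not z:                  # NE: both comparisons so far failed
        z = cmp_z(r3, 30)      # CMPNE R3,#30
    return z                   # BEQ is taken when Z is set


# Compare the model with the intended condition over a few sample values
for r1 in (9, 10):
    for r2 in (19, 20):
        for r3 in (29, 30):
            assert branch_taken(r1, r2, r3) == (r1 == 10 or r2 == 20 or r3 == 30)
print("cumulative OR pattern agrees with R1=10 or R2=20 or R3=30")
```

Each CMPNE is skipped as soon as an earlier comparison has succeeded, leaving Z set for the final BEQ. This is the same short-circuit behaviour as the or operator in the assertion.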

¹ The program counter (address from which instructions are fetched) and processor status (flags and current mode/interrupt state) are packed into a single register, R15. Therefore fewer than the full 32 bits are available for the PC address.

² Arithmetic and logic unit.

³ Note that the assembler treats ASL and LSL as synonymous, since shifting a two's complement signed value n bits to the left multiplies it by 2ⁿ whether or not it is negative. The same is not true of the right-shifts ASR and LSR.
