This article describes some optimization techniques for ANSI C code for all 8-bit vintage systems, i.e., computers, consoles, hand-helds and scientific calculators from the end of the '70s until the mid '90s, and in particular for the systems based on the following architectures (and their derived and compatible architectures):
- Intel 8080 (*)
- MOS 6502
- Motorola 6809
- Zilog Z80 (*)
(*) The Zilog Z80 is an extension of the Intel 8080. Therefore an Intel 8080 binary is compatible with a Z80-based system but not the other way round.
Most of the techniques presented here remain valid on other 8-bit architectures such as the COSMAC 1802 and the Intel 8051.
The goal of this article is two-fold:
- present general techniques to optimize C code for all 8-bit systems;
- present general techniques to write portable C code, i.e., valid and compatible code for all 8-bit systems, including systems that are not natively and explicitly supported by C compilers.
This article is neither an introduction nor a manual for the C language and has the following preconditions:
- knowledge of the C language;
- knowledge of structured and object-oriented programming;
- familiarity with compilers and linkers.
Moreover, this article does not cover in depth some advanced topics such as:
- coding in specific domains such as graphics, sound, input/output;
- interaction between C and Assembly.
These advanced topics are very important and would require separate articles.
In this article we will refer to system, target and architecture with the following meanings:
- A system is any kind of processor-equipped machine such as computers, consoles, hand-helds, scientific calculators, embedded systems, etc.
- A target of a compiler is any kind of system supported by the compiler, i.e., a system for which the compiler provides specific support with libraries and with a generator of binaries in a format that is usable by an emulator and/or real hardware.
- An architecture is a processor family (e.g., Intel 8080, MOS 6502, etc.). Therefore a target has an architecture that corresponds to its processor. So it has only one architecture unless it has two or more processors belonging to different families (such as the Commodore 128, which has both a Z80 and a 6502-derived processor).
In order to produce binaries from source code we recommend multi-target cross-compilers (i.e., compilers that are run on a modern PC and that produce binaries for multiple targets).
We do not recommend the use of native compilers because they would be inconvenient (even if used inside an accelerated emulator) and would never produce the same kind of optimized code due to the limited resources of the 8-bit system.
In particular we will refer to the following multi-target cross-compilers:
Architecture | Compiler/Dev-Kit | Web Page |
---|---|---|
Intel 8080 | ACK | https://github.com/davidgiven/ack |
MOS 6502 | CC65 | https://github.com/cc65/cc65 |
Motorola 6809 | CMOC | https://perso.b2b2c.ca/~sarrazip/dev/cmoc.html |
Zilog Z80 | SCCZ80/ZSDCC (Z88DK) | https://github.com/z88dk/z88dk |
We mention other multi-target cross-compilers that we do not cover here but for which most of the described general techniques are valid:
- SDCC (http://sdcc.sourceforge.net/) for several architectures including the 8-bit Zilog Z80 and Intel 8051;
- LCC1802 (https://sites.google.com/site/lcc1802/) for the COSMAC 1802 8-bit processor;
- GCC-6809 (https://github.com/bcd/gcc) for the Motorola 6809 (GCC adaptation);
- GCC-6502 (https://github.com/itszor/gcc-6502-bits) for the MOS 6502 (GCC adaptation);
- SmallC-85 (https://github.com/ncb85/SmallC-85) for the Intel 8080/8085;
- devkitSMS (https://github.com/sverx/devkitSMS) for Sega Z80-based consoles (Sega Master System, Sega Game Gear, Sega SG-1000).
We remark that the Z88DK dev-kit provides two compilers:
- the more reliable SCCZ80 that also offers fast compilation,
- the experimental ZSDCC (Z80-only optimized SDCC version) that can produce faster and more compact code than SCCZ80 at the cost of much slower compilation and the risk of introducing erratic behavior.
Almost all of the considered compilers generate code for just one architecture (they are mono-architecture) even though they are multi-target. ACK is an exception because it is multi-architecture (Intel 8080, Intel 8088/8086, I386, 68K, MIPS, PDP11, etc.).
This article is neither an introduction nor a manual for these compilers and it will not cover the following topics:
- compiler installation
- basic usage of the compiler
For details on these topics we refer to the compiler’s manuals and web pages.
A sub-set of ANSI C
In this article, by ANSI C we mean a large sub-set of the C89 standard where `float` and `long long` are optional but pointers to functions and pointers to `struct` are present.
We will not consider previous versions such as C in K&R syntax.
Why should we use C to code for vintage 8-bit systems?
Traditionally these systems were coded in either Assembly or interpreted BASIC or a mix of the two.
Given the limited resources, Assembly was often necessary. BASIC was convenient for its simplicity and because an interpreter was often present on the system’s ROM.
If we limit our comparison to just Assembly, BASIC and C, the following table summarizes the reasons for using C:
language | simplicity | portability | efficiency |
---|---|---|---|
Assembly | low | no | optimal |
BASIC | high | low | low |
C | high | high | good |
In particular ANSI C allows us:
- to ease porting between different architectures
- to write “universal” code, that is valid code for different targets without any necessary modification
Some people see C as a sort of universal Assembly language. I do not fully agree with this statement because optimally-written C will never beat optimally-written Assembly.
Nevertheless, C is the closest language to Assembly among the languages that allow high level programming.
One not fully rational reason for not using C in this context is that coding in C provides a less vintage experience than BASIC and Assembly, because C was less common on the home computers of the '80s (although it was common on 8-bit business computers, such as those running the CP/M operating system).
On the other hand, I believe that a good reason for coding in C is that C allows us to code for any 8-bit system.
Writing easily portable code or even directly compilable code for different architectures is possible in C through different strategies:
- Write code that is hardware-agnostic through abstract interfaces (i.e., hardware-independent APIs)
- Use different implementations of the interfaces and select them at compilation-time (by using pre-compiler directives or by providing different files at linking-time)
This is trivial if our multi-target dev-kit provides a multi-target library or if we just use standard C libraries (e.g., stdio, stdlib, etc.). Under these conditions we just need to recompile our code. The multi-target library will do the “magic” for us.
Unfortunately only CC65 and Z88DK provide significant multi-target libraries for input and output other than standard C libraries:
Dev-Kit | Architecture | multi-target libraries |
---|---|---|
Z88DK | Zilog Z80 | standard C lib, conio, vt52, vt100, software sprites, UDG, bitmap |
CC65 | MOS 6502 | standard C lib, conio, tgi (bitmap) |
CMOC | Motorola 6809 | standard C lib |
ACK | Intel 8080 | standard C lib |
In particular Z88DK has very powerful libraries for multi-target graphics and even provides APIs for software sprites (https://github.com/z88dk/z88dk/wiki/monographics) and redefined characters for most of its 80 targets.
Example: The code of the game H-Tron (https://sourceforge.net/projects/h-tron/) uses Z88DK’s APIs for low resolution bitmap graphics for a multitude of Z80-based targets.
Therefore if we were to use exclusively the standard C libraries we could compile our code with ACK, CMOC, CC65 and Z88DK. If we used conio we could compile the code with CC65 and Z88DK (maybe with minor adaptations).
In all other cases, if we want to write portable code on different architectures and systems, we would need to write a “hardware abstraction layer” that allows us to separate:
- the code that does not depend on the hardware (e.g., the logic part)
- the code that depends on the hardware (e.g., input/output in a videogame)

This pattern is very common in modern programming and it is not exclusive to C. C provides a set of tools to implement it, i.e., to select at compilation-time the different portions of code required by each hardware.
In particular C provides a powerful pre-compiler with directives such as:
- `#define` -> to define a macro
- `#if` … `defined(...)` … `#elif` … `#else` -> to select code portions that depend on the existence or value of a given macro

Moreover all compilers provide the option `-D` to pass a macro to the pre-compiler. Some compilers such as CC65 implicitly define a macro that depends on the selected target (e.g., `__VIC20__`).
In our code we may have something like:
```c
...
#elif defined(__PV1000__)
    #define XSize 28
#elif defined(__OSIC1P__) || defined(__G800__) || defined(__RX78__)
    #define XSize 24
#elif defined(__VIC20__)
    #define XSize 22
...
```
When we compile for the Vic 20 target, the pre-compiler will select for us the Vic 20-specific definition of `XSize`.
This also allows us to select specific options for the configuration of the target (additional memory, video card, video mode, debug compilation, etc.).
As our main example we refer to the Cross-Chase project:
https://github.com/Fabrizio-Caruso/CROSS-CHASE
The code of Cross-Chase provides an example of how to write universal code for any system and architecture:
- the code of the game (src/chase directory) is hardware-independent
- the code of the crossLib library (src/cross_lib directory) implements all the hardware-specific details
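The split between hardware-independent code and a hardware-specific layer can be sketched in a few lines. The following is a minimal, hypothetical example (not taken from Cross-Chase): `printCharAt()` is an assumed abstraction function, implemented with CC65's `cputcxy()` from `conio.h` when one of the `__VIC20__` or `__C64__` target macros is defined, and with a generic `putchar()` fallback otherwise; the default `XSize` of 40 is also an assumption.

```c
#include <stdio.h>

/* Screen width selected at compilation-time (values from the XSize example;
   the default of 40 is a hypothetical generic value). */
#if defined(__VIC20__)
    #define XSize 22
#elif defined(__PV1000__)
    #define XSize 28
#else
    #define XSize 40
#endif

/* ---- Hardware-specific layer: one implementation per target ---- */
#if defined(__VIC20__) || defined(__C64__)
    #include <conio.h>
    static void printCharAt(unsigned char x, unsigned char y, char ch)
    {
        cputcxy(x, y, ch); /* CC65 conio: print ch at column x, row y */
    }
#else
    /* Generic fallback (hypothetical): ignore the coordinates, use stdio. */
    static void printCharAt(unsigned char x, unsigned char y, char ch)
    {
        (void) x;
        (void) y;
        putchar(ch);
    }
#endif

/* ---- Hardware-independent code: it only uses the interface above ---- */
unsigned char drawHorizontalWall(unsigned char y)
{
    unsigned char x;

    for (x = 0; x < XSize; ++x)
    {
        printCharAt(x, y, '#');
    }
    return x; /* number of characters drawn, i.e., XSize */
}
```

Recompiling for another target only changes which implementation of `printCharAt()` is selected; `drawHorizontalWall()` stays untouched.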
The dev-kits under our consideration support a list of targets for each architecture by providing specific libraries. Nevertheless it is possible to exploit these dev-kits for other systems with the same architecture but we will have to implement all the hardware-specific code:
- the necessary code for input/output (e.g., graphics, sounds, keyboard, joystick, etc.)
- the necessary code for correct machine initialization
Alternatively, it is possible to extend a dev-kit to support new targets.
In many cases, we can use the ROM routines to do this (see the section on the ROM routines).
Moreover we may have to convert the binary to a format that can be accepted by the system.
Therefore, we can indeed write portable code for even these unsupported systems.
For example CC65 does not support the BBC Micro, nor the Atari 7800 and CMOC does not support the Olivetti Prodest PC128. Yet, it is possible to use these dev-kits to produce binaries for such targets:
- Cross Chase (https://github.com/Fabrizio-Caruso/CROSS-CHASE) supports (theoretically) any architecture, even the unsupported ones such as the Olivetti Prodest PC128.
- The game Robotsfindskitten has been compiled for the Atari 7800 with CC65 (https://sourceforge.net/projects/rfk7800/files/rfk7800/).
- The BBC Micro has already been added unofficially as an experimental new target in CC65 (https://github.com/dominicbeesley/cc65).

We give a list of compilation options for generic targets for each dev-kit. These options tell the compiler to produce code without any dependence on a specific target. For more details we refer to the manuals of the dev-kits.
Architecture | Dev-Kit | Option(s) |
---|---|---|
Intel 8080 | ACK | (*) |
MOS 6502 | CC65 | +none |
Motorola 6809 | CMOC | --nodefaultlibs |
Zilog Z80 | SCCZ80/ZSDCC (Z88DK) | +test , +embedded (new lib), +cpm (generic CP/M target) |
(*) ACK officially only supports CP/M-80 for the Intel 8080 architecture. It is possible to use ACK to build generic Intel 8080 binaries but this is not very simple because ACK uses a sequence of commands to produce intermediate results (including the “EM” byte-code):
- `ccp.ansi`: C pre-compiler
- `em_cemcom.ansi`: compiles pre-compiled C code into “EM” byte-code
- `em_opt`: optimizes “EM” byte-code
- `cpm/ncg`: generates Intel 8080 Assembly from “EM” byte-code
- `cpm/as`: generates Intel 8080 binary from Assembly
- `em_led`: links object files
We describe some general rules to improve the code that do not depend on whether the architecture is 8-bit or not.
In general, in whatever programming language we want to code, it is important to avoid code duplication and unnecessary code.
We have to examine each function in order to find common portions that we can factor out into sub-functions that the original functions can call.
However we must take into account that, beyond a certain limit, excessive code granularity has negative effects because each function call has a computational and memory cost.
If two functions do the same thing on different objects, then we should simply use the same function and pass it the specific object as a parameter.
In other cases, some portions of the code differ only by an applied function. In such cases, we should write one function to which we pass a pointer to the specific function we want to apply.
Not everyone is familiar with the C syntax for pointers to functions. Therefore we give here a simple example in which we define `sumOfSomething(range, something)` that sums `something(i)` on values of `i` from 0 to `range-1`:
```c
unsigned short sumOfSomething(unsigned char range, unsigned short (* something) (unsigned char))
{
    unsigned char i;
    unsigned short res = 0;

    for(i=0;i<range;++i)
    {
        res+=something(i); /* call the function passed as a parameter */
    }
    return res;
}
```
Hence given the two functions:
```c
unsigned short square(unsigned char val)
{
    return val*val;
}

unsigned short next(unsigned char val)
{
    return ++val;
}
```
we can use `sumOfSomething` on either of the two:
```c
printf("%d\n",sumOfSomething(4,square));
```
prints 14, i.e., the sum of the squares of 0, 1, 2, 3.
```c
printf("%d\n",sumOfSomething(4,next));
```
prints 10, i.e., the sum of 0+1, 1+1, 2+1, 3+1.
In some cases it is possible to generalize the code by passing a parameter to avoid writing very similar functions.
An advanced example is in https://github.com/Fabrizio-Caruso/CROSS-CHASE/blob/master/src/chase/character.h where, given a `struct` with two fields `_x` and `_y`, we want to be able to change the value of one field or the other in different situations:
```c
struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    ...
};
typedef struct CharacterStruct Character;
```
We avoid two different functions for `_x` and `_y` by creating one function to which we pass an offset to select the field:
```c
unsigned char moveCharacter(Character* hunterPtr, unsigned char offset)
{
    if((unsigned char) * ((unsigned char*)hunterPtr+offset) < ... )
    {
        ++(*((unsigned char *) hunterPtr+offset));
    }
    else if((unsigned char) *((unsigned char *) hunterPtr+offset) > ... )
    {
        --(*((unsigned char *) hunterPtr+offset));
    }
    ...
}
```
In this case, we use the fact that the field `_y` is exactly one byte after the field `_x`. Therefore with `offset=0` we access `_x` and with `offset=1` we access `_y`.
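Hard-coding the offsets 0 and 1 relies on the layout of the `struct`. A minimal sketch, assuming a simplified two-field `Character` and a hypothetical `moveTowards()` helper (not the Cross-Chase code), lets the compiler compute the offsets with the standard `offsetof` macro from `stddef.h`: the offsets are still compile-time constants, but the code keeps working if the fields are reordered.

```c
#include <stddef.h>

struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
};
typedef struct CharacterStruct Character;

/* Offsets computed by the compiler instead of hard-coded 0 and 1. */
#define X_OFFSET offsetof(Character, _x)
#define Y_OFFSET offsetof(Character, _y)

/* Hypothetical simplified moveCharacter(): move the selected coordinate
   one step towards target. Returns 1 if it moved, 0 otherwise. */
unsigned char moveTowards(Character *charPtr, unsigned char offset, unsigned char target)
{
    unsigned char *fieldPtr = (unsigned char *) charPtr + offset;

    if (*fieldPtr < target)
    {
        ++(*fieldPtr);
        return 1;
    }
    else if (*fieldPtr > target)
    {
        --(*fieldPtr);
        return 1;
    }
    return 0;
}
```

A call such as `moveTowards(&ghost, Y_OFFSET, playerY)` then reads better than passing a bare `1`.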
Warning: We must always remember that adding a parameter has a cost and we must verify that the cost of the parameter is lower than the cost of an extra function (e.g., by looking at the size of the obtained binary).
We can do even better and use the same code on objects that are not identical but share some common features by using offsets in `struct`, pointers to functions, etc. In general this is possible through object-oriented programming, whose light-weight implementation for 8-bit systems is described in a subsequent section of this article.
We must avoid post-increment/decrement operators (`i++`, `i--`) when they are not needed, i.e., when we do not need the original value, and replace them with pre-increment/decrement operators (`++i`, `--i`). The reason is that the post-increment operator requires at least an extra operation to save the original value.
Remark: It is totally useless to use a post-increment in the increment expression of a `for` loop.
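To illustrate when a post-increment is genuinely needed and when it is not, here is a minimal sketch with a hypothetical 8-byte buffer: `append()` uses the old value of `len` as an index, so `len++` is justified, while the loop in `sumBuffer()` discards the old value, so `++i` is the right choice.

```c
#define BUFFER_SIZE 8 /* hypothetical buffer size */

static unsigned char buffer[BUFFER_SIZE];
static unsigned char len;

/* Post-increment IS needed: the old value of len is the write index. */
void append(unsigned char value)
{
    if (len < BUFFER_SIZE)
    {
        buffer[len++] = value;
    }
}

/* Post-increment is NOT needed: the old value of i is never used. */
unsigned short sumBuffer(void)
{
    unsigned char i;
    unsigned short sum = 0;

    for (i = 0; i < len; ++i)
    {
        sum += buffer[i];
    }
    return sum;
}
```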
Any architecture will perform better if variables are replaced by constants.
Therefore if a variable has a known value at compilation-time, it is important to replace it with a constant.
If its value depends on some compilation option, then we should use a macro to set its value.
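A minimal sketch of a compile-time constant with a default value that can be overridden with the `-D` option (e.g., `-DGHOSTS_NUMBER=4`); `GHOSTS_NUMBER`, `ghostsAlive` and `countAlive()` are hypothetical names used only for illustration.

```c
/* Default value, overridable at compilation-time (e.g., -DGHOSTS_NUMBER=4). */
#ifndef GHOSTS_NUMBER
    #define GHOSTS_NUMBER 8
#endif

/* Both the array size and the loop bound are constants, not variables. */
unsigned char ghostsAlive[GHOSTS_NUMBER];

unsigned char countAlive(void)
{
    unsigned char i;
    unsigned char count = 0;

    for (i = 0; i < GHOSTS_NUMBER; ++i)
    {
        if (ghostsAlive[i])
        {
            ++count;
        }
    }
    return count;
}
```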
For single pass compilers (the majority of 8-bit cross-compilers, e.g., CC65), it is important to help the compiler decide whether a given expression is a constant.
Example (from https://www.cc65.org/doc/coding.html):
A single pass compiler may evaluate the following expression from left to right and miss the fact that `OFFS+3` is a constant:
```c
#define OFFS 4
int i;
i = i + OFFS + 3;
```
In this case it would be better to re-write `i = i + OFFS+3` as `i = OFFS+3+i` or `i = i + (OFFS+3)`:
```c
#define OFFS 4
int i;
i = OFFS + 3 + i;
```
The C language has both high level constructs (such as `struct`, functions as parameters, etc.) and low level constructs (such as pointers, bitwise operators, etc.).
This is not enough to make C a programming language well-suited for programming 8-bit systems.
Most probably we will need to read and write single bytes from and to specific memory locations. In old BASIC this was done through the `PEEK` and `POKE` commands. In C we must do this through pointers, whose syntax is not very readable. In order to make our code more readable we can create the following macros:
```c
#define POKE(addr,val) (*(unsigned char*) (addr) = (val))
#define PEEK(addr) (*(unsigned char*) (addr))
```
Remark: The compilers will produce optimal code when we use constants as parameters of these macros.
For more details we refer to: https://github.com/cc65/wiki/wiki/PEEK-and-POKE
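A minimal usage sketch of the two macros with a constant address, the Vic 20 screen and border color register at $900F (36879). On the real target the macros from the text are used as-is; the `simulatedMemory` array is a hypothetical stand-in that only exists so the sketch can also be compiled and run on a PC.

```c
#if defined(__VIC20__)
    /* Real target: direct access to absolute addresses (macros from the text). */
    #define POKE(addr,val) (*(unsigned char*) (addr) = (val))
    #define PEEK(addr) (*(unsigned char*) (addr))
#else
    /* Host stand-in (hypothetical, for testing on a PC only):
       a 64 KB array simulates the 8-bit address space. */
    static unsigned char simulatedMemory[0x10000UL];
    #define POKE(addr,val) (simulatedMemory[(addr)] = (val))
    #define PEEK(addr) (simulatedMemory[(addr)])
#endif

/* Vic 20 screen and border color register ($900F = 36879). */
#define VIC_COLORS 0x900F

/* Constant address: the compiler can emit a direct store. */
void setScreenColors(unsigned char value)
{
    POKE(VIC_COLORS, value); /* e.g., 8 = black screen and black border */
}

unsigned char getScreenColors(void)
{
    return PEEK(VIC_COLORS);
}
```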
First of all we must take into account that we have the following situation:
- all arithmetic operations are just 8-bit
- most other operations use 8 bits while some may use 16 bits and none uses 32 bits
- `signed` operations are slower than `unsigned` operations
- the hardware does not support floating point operations
The C language provides `signed` integer types (`char`, `short`, `int`, `long`, `long long`, etc.) and their `unsigned` counterparts.
Most cross compilers (but not CC65) support the `float` type (for floating point numbers), which we do not cover here. We only remark that `float` numbers on 8-bit architectures are always software floats and therefore have a high computational cost. Hence we should only use them when strictly necessary.
Since the 8-bit architectures under consideration do NOT handle `signed` types well, we must avoid them whenever possible.
The size of the standard integer types depends on the compiler and architecture and not on the C standard.
The C99 standard introduced some types that have an unambiguous size (e.g., `uint8_t` for an 8-bit `unsigned` integer).
In order to use these types in our code we should include `stdint.h` with:
```c
#include <stdint.h>
```
Not all 8-bit cross compilers support these types.
Fortunately for most 8-bit compilers we have the following situation:
type | number of bits | stdint.h | alternative name |
---|---|---|---|
unsigned char | 8 | uint8_t | byte |
unsigned short | 16 | uint16_t | word |
unsigned int | 16 | uint16_t | word |
unsigned long | 32 | uint32_t | dword |
Therefore we must:
- use `unsigned char` (or `uint8_t`) for arithmetic operations whenever possible;
- use `unsigned char` (or `uint8_t`) and `unsigned short` (or `uint16_t`) for all other operations and avoid all 32-bit operations.
Remark: When the fixed-size types are not available we can introduce them by using `typedef`:
```c
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;
```
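When we introduce these types ourselves, it is worth checking the size assumptions of the table at compilation-time. A minimal C89-compatible sketch, assuming a hypothetical `NO_STDINT` macro passed with `-D` on compilers without `stdint.h`: a `typedef` of an array with negative size does not compile, so a wrong assumption stops the build. `toWord()` is a hypothetical helper that combines two bytes into a 16-bit word.

```c
#if defined(NO_STDINT)
    /* Hypothetical macro: define it (-DNO_STDINT) when stdint.h is missing. */
    typedef unsigned char uint8_t;
    typedef unsigned short uint16_t;
    typedef unsigned long uint32_t;
#else
    #include <stdint.h>
#endif

/* Compile-time checks: a negative array size is a compilation error. */
typedef char check_uint16_size[(sizeof(uint16_t) == 2) ? 1 : -1];
typedef char check_uint32_size[(sizeof(uint32_t) == 4) ? 1 : -1];

/* Combine a high byte and a low byte into a 16-bit word. */
uint16_t toWord(uint8_t high, uint8_t low)
{
    return (uint16_t) (((uint16_t) high << 8) | low);
}
```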
When writing code for an 8-bit architecture we must avoid inefficient operations or operations that force us to use inefficient types (such as `signed` or 32-bit types).
In particular, it is often possible to rewrite the code in a way that avoids subtractions or, when this is not possible, that at least does not produce negative results.
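A minimal sketch of both ideas, with hypothetical helper names: instead of testing `x - y > 0`, which may involve a negative intermediate value, we compare directly; and the distance between two coordinates is computed so that the subtraction never goes below zero, so `unsigned char` is enough.

```c
/* Instead of: if (x - y > 0) ... which may involve a negative result */
unsigned char isRightOf(unsigned char x, unsigned char y)
{
    return (unsigned char) (x > y);
}

/* The larger value is always the minuend: the result is never negative. */
unsigned char distance(unsigned char a, unsigned char b)
{
    return (a > b) ? (unsigned char) (a - b) : (unsigned char) (b - a);
}
```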
None of the architectures under consideration, with the only exception of the Motorola 6809, has a multiplication instruction between two 8-bit values. Therefore, if possible, we should avoid products or limit ourselves to products and divisions by powers of 2, which we can implement with the `<<` and `>>` operators:
```c
unsigned char foo, bar;
...
foo <<= 2; // multiply by 2^2=4
bar >>= 1; // divide by 2^1=2
```
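A typical use of shifts is the computation of a screen offset. A minimal sketch for a hypothetical text screen with 32 columns (a power of 2), where `y * 32` becomes `y << 5`; `screenOffset()` and `half()` are illustrative names.

```c
/* Hypothetical 32-column text screen: y * 32 becomes y << 5. */
#define SCREEN_COLUMNS_SHIFT 5 /* 2^5 = 32 columns */

unsigned short screenOffset(unsigned char x, unsigned char y)
{
    return ((unsigned short) y << SCREEN_COLUMNS_SHIFT) + x;
}

/* Halving with a right shift instead of a division. */
unsigned char half(unsigned char value)
{
    return value >> 1;
}
```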
Other operations such as modulo can be rewritten in a more efficient way for the 8-bit systems by using bit-wise operators because the compiler is not always capable of optimizing these operations:
```c
unsigned char foo;
...
if(foo&1) // equivalent to foo%2
{
    ...
}
```
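The same trick generalizes to any power of 2: for an `unsigned` value, `value % 2^k` equals `value & (2^k - 1)`. A minimal sketch with hypothetical macro names, e.g., to cycle through 8 animation frames:

```c
/* For unsigned values only: value % 2^k == value & (2^k - 1) */
#define IS_ODD(value) ((value) & 1) /* same as (value) % 2 */
#define MOD8(value)   ((value) & 7) /* same as (value) % 8 */

/* Hypothetical usage: select one of 8 animation frames from a tick counter. */
unsigned char animationFrame(unsigned char tick)
{
    return MOD8(tick);
}
```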
One of the greatest limitations of the MOS 6502 architecture is not the lack of registers, as one may think, but the small size of its hardware stack (in page one: $0100-01FF), which is unusable in C for managing the scope of variables and parameter passing.
Therefore a C compiler for the MOS 6502 may have to use a software stack:
- to manage the scope of local variables,
- to manage parameter passing.
The other 8-bit architectures under our consideration may suffer less from this problem, but the scope of local variables and parameter passing have a cost even when a hardware stack can be used.
One way to mitigate this problem is to reduce the use of local variables and passed parameters. This is clearly an anti-pattern, and if we were to apply it to all our code we would get spaghetti code.
We must therefore wisely choose which variables deserve to be local and which variables can be declared as global.
We would then have less re-usable code but we would gain in efficiency. I am NOT suggesting using only global variables and giving up all function parameters.
The CC65 compiler for the MOS 6502 architecture provides the `-Cl` option that tells the compiler to treat all local variables as `static`, i.e., to allocate them statically like global variables.
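A minimal sketch of this compromise, with hypothetical names: the player position is read and written by many functions, so it is kept global and `movePlayerRight()` needs no parameters at all, while `isAt()` keeps its parameters because they make it reusable.

```c
/* Hypothetical global state, used by many functions. */
unsigned char playerX;
unsigned char playerY;

/* No parameters: nothing has to be passed on the (software) stack. */
void movePlayerRight(void)
{
    ++playerX;
}

/* Parameters kept where they make the function genuinely reusable. */
unsigned char isAt(unsigned char x, unsigned char y)
{
    return (unsigned char) (playerX == x && playerY == y);
}
```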
This has the effect of avoiding the use of the software stack for their scope. This also has the effect of making all the functions non-reentrant.
In practice this prevents us from using recursive functions. This is not a serious loss because recursion would be a costly operation, which we should avoid on 8-bit systems.
Standard C provides the `register` keyword to give a hint to the compiler to use a register for a given variable.
Most modern compilers simply ignore this keyword because their optimizers can choose better than the programmer.
This is also true for most of the compilers under consideration but not for CC65, which uses this keyword to tell the compiler to use page zero for a given variable. The MOS 6502 can access this page more efficiently than any other memory area. The operating system already uses this page but the CC65 compiler leaves a few available bytes for the programmer. By default CC65 reserves 6 bytes in page zero for variables declared as `register`.
One may think that all variables should be declared as `register` but things are NOT so simple because everything has a cost. In order to store a variable in page zero, some extra operations are required. Hence, page zero provides an advantage only for variables that are heavily used.
In practice the two most common scenarios where this is the case are:
- parameters of type pointer to `struct` that are used at least 3 times within the function scope;
- variables inside a loop that is repeated at least about 100 times.
A reference with more details is: https://www.cc65.org/doc/cc65-8.html
My personal advice is to compile and verify if the produced binary is shorter/faster.
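A minimal sketch of the first scenario, assuming a simplified three-field `Character` and a hypothetical `resetCharacter()`: the pointer parameter is dereferenced three times, so it is a candidate for `register`, which CC65 maps to page zero while other compilers may simply ignore it. As said above, whether it actually helps should be verified on the produced binary.

```c
struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    unsigned char _status;
};
typedef struct CharacterStruct Character;

/* The pointer is used 3 times: under CC65 "register" asks for page zero;
   standard C allows "register" on parameters, other compilers may ignore it. */
void resetCharacter(register Character *characterPtr, unsigned char x, unsigned char y)
{
    characterPtr->_x = x;
    characterPtr->_y = y;
    characterPtr->_status = 1;
}
```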
If our program uses data in a specific memory area, it would be better to have the data already stored in the binary and have the load process of the binary copy the data at the expected locations without any code to do actual copying of the data.
If the data is in the source code instead, we will have to copy them and we will also end up having them twice in memory.
The most common case is the data for sprites and redefined characters or tiles.
Different compilers provide different tools to define the final binary structure.
It is possible to configure CC65’s linker through a .cfg file that describes the structure of the binary that we want to produce.
This is not very simple and a description of the linker would go beyond the scope of this article. For details we refer to
https://cc65.github.io/doc/ld65.html
We advise reading the manual and starting by modifying the default .cfg file in order to adapt it to one’s use-case.
In some cases we may have graphics data in a memory area far from the code and have them both on the same binary. If we do this, we may end up with a “hole” between the two areas.
A common example is provided by the C64 where graphics data may be in higher memory than the code.
In this case I recommend the exomizer tool to compress the binary: https://bitbucket.org/magli143/exomizer/wiki/Home
Z88DK makes our life easier: its powerful appmake tool automatically builds binaries in the correct format for most scenarios.
Z88DK also allows the user to define memory sections and to redefine the binary “packaging” but doing this is quite complicated.
This topic is treated in detail in:
z88dk/z88dk#860
Usually separating the source code into multiple files is a good practice but it may produce poorer code because 8-bit optimizers do not perform link-time optimization, i.e., they cannot optimize code between two or more files and only optimize each file separately.
For example if we have a function that is called only once and the function is defined in the same file where it is invoked, then the optimizer may be able to inline it but this would never be possible if the function were defined and invoked in different files.
My advice is not to create one or few huge files but to take into account how separating the code into multiple files can affect the optimization.
The C compiler usually produces a unique binary that contains both code and data, which will be loaded in specific memory locations (even with non contiguous memory areas).
In many architectures some RAM areas are used as buffers for the ROM routines or are used only in some special cases (e.g., some graphics modes).
My advice is to study the memory map. For example for the Vic 20 we would have to look at:
http://www.zimmers.net/cbmpics/cbm/vic/memorymap.txt
In particular we should look for:
- cassette buffer, keyboard buffer, printer buffer, disk buffer, etc.
- memory used by ROM routines and in particular by BASIC routines
- memory areas used by special graphics modes
- small portions of free memory that are not usually used by code because they are not contiguous with the main code memory area.
These memory areas could be used by our code if they do not serve their standard purpose in our use-case, e.g., if we do not intend to use the tape after the program has been loaded (including from the tape), then we can use the tape buffer in our code to store some variables.
Useful cases
We list some of these useful memory areas for some systems including many with very limited RAM:
computer | description | area |
---|---|---|
Commodore 16/116/+4 | BASIC input buffer | $0200-0258 |
Commodore 16/116/+4 | tape buffer | $0333-03F2 |
Commodore Pet | system input buffer | $0200-0250 |
Commodore Pet | tape buffer | $033A-03F9 |
Commodore 64 & Vic 20 | BASIC input buffer | $0200-0258 |
Commodore 64 & Vic 20 | tape buffer | $033C-03FB |
Galaksija | variable a-z | $2A00-2A68 |
Sinclair Spectrum 16K/48K | printer buffer | $5B00-5BFF |
Mattel Aquarius | random number space | $381F-3844 |
Mattel Aquarius | input buffer | $3860-38A8 |
Oric | alternate charset | $B800-BB7F |
Oric | grabable hires memory | $9800-B3FF |
Oric | Page 4 | $0400-04FF |
Sord M5 | RAM for ROM routines (*) | $7000-73FF |
TRS-80 Model I/III/IV | RAM for ROM routines (*) | $4000-41FF |
VZ200 | printer buffer & misc | $7930-79AB |
VZ200 | BASIC line input buffer | $79E8-7A28 |
(*): Multiple buffers and auxiliary RAM for ROM routines. For more details please refer to:
http://m5.arigato.cz/m5sysvar.html and http://www.trs-80.com/trs80-zaps-internals.htm
In standard C the only way to place data at specific memory locations is through pointers (and arrays accessed through them) initialized with fixed addresses.
In the following we give a theoretical example of how to define some of these pointer and array variables at addresses starting at `0xC000` where, given a 5-byte `struct` type `Character`, we want to also handle the following variables:
- `player` of type `Character`,
- `ghosts`, an array with 8 `Character` elements (40=$28 bytes),
- `bombs`, an array with 4 `Character` elements (20=$14 bytes).
```c
Character *ghosts = (Character *) 0xC000;
Character *bombs = (Character *) (0xC000+0x28);
Character *player = (Character *) (0xC000+0x28+0x14);
```
This generic solution with pointers does not always produce optimal code because it forces us to dereference our pointers and creates pointer variables (usually 2 bytes per pointer) that the compiler has to allocate in memory.
No standard solution exists to store any other type of variables in a specific memory area but the CC65 and Z88DK linkers provide a special syntax to do this and let us save hundreds or even thousands of precious bytes. Some examples are in
https://github.com/Fabrizio-Caruso/CROSS-CHASE/tree/master/src/cross_lib/memory
In particular we will have to create an Assembly file: a .s file (under CC65) or .asm file (under Z88DK) that we will link to our binary. In this file we will be able to assign each variable to a specific memory area.
Remark: We need to add an underscore prefix to each variable.
CC65 syntax (Commodore Vic 20 example)
```asm
.export _ghosts;
_ghosts = $33c
.export _bombs;
_bombs = _ghosts + $28
.export _player;
_player = _bombs + $14
```
Z88DK syntax (Galaksija example)
```asm
PUBLIC _ghosts, _bombs, _player
defc _ghosts = 0x2A00
defc _bombs = _ghosts + $28
defc _player = _bombs + $14
```
CMOC provides the `--data=<address>` option to allocate all writable global variables at a given starting memory address.
ACK documentation does not say anything about this. We could nevertheless define pointer and array types at given free memory locations through the generic standard syntax.
Contrary to common belief, object-oriented programming is possible in ANSI C and can help us produce more compact code in certain situations. There are complete object-oriented frameworks for ANSI C (e.g., Gnome is written with GObject, which is one of these frameworks).
We can implement classes, polymorphism and inheritance very efficiently even for memory-limited 8-bit systems.
A detailed description of object-oriented programming goes beyond the purpose of this article.
Here we describe how to implement its main features:
- Use pointers to functions to implement *polymorphic* methods, i.e., methods with dynamic binding, whose behavior is defined at run-time. It is possible to avoid the implementation of a vtable if we limit ourselves to classes with just one polymorphic method.
- Use pointers to `struct` and composition to implement sub-classes: given a `struct` A, we implement a sub-class with a `struct` B whose first field is of type A. The C language guarantees that there is no padding before the first field of a `struct` and that a pointer to a `struct`, suitably cast, points to its first field. Therefore a pointer to B can be cast into a pointer to A.
Example (taken from https://github.com/Fabrizio-Caruso/CROSS-CHASE/tree/master/src/chase)
Let us define `Item` as a sub-class of `Character` to which we add some variables and a polymorphic method `_effect()`:
```c
struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    unsigned char _status;
    Image* _imagePtr;
};
typedef struct CharacterStruct Character;
...
struct ItemStruct
{
    Character _character;
    void (*_effect)(void);
    unsigned short _coolDown;
    unsigned char _blink;
};
typedef struct ItemStruct Item;
```
We can then pass a pointer to `Item` as if it were a pointer to `Character` (by performing a simple cast):
```c
Item *myItem;
void foo(Character * aCharacter);
...
foo((Character *)myItem);
```
Why can we save memory by doing this?
Because we may treat different, yet similar, objects with the same code and so avoid code duplication.
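A minimal runnable sketch of both features, assuming a simplified `Character` without the `_imagePtr` field and hypothetical effect functions and globals (`lives`, `score`): `placeCharacter()` is written once for the base class and reused on an `Item` through a cast, while `pickUp()` calls whatever `_effect()` the item carries.

```c
struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    unsigned char _status;
};
typedef struct CharacterStruct Character;

/* "Sub-class": the base struct is the FIRST field. */
struct ItemStruct
{
    Character _character;
    void (*_effect)(void); /* polymorphic method (dynamic binding) */
    unsigned short _coolDown;
};
typedef struct ItemStruct Item;

/* Hypothetical game state modified by the effects. */
unsigned char lives = 3;
unsigned char score = 0;

void extraLifeEffect(void) { ++lives; }
void bonusPointsEffect(void) { score += 10; }

/* Written once for the base class, usable on any sub-class via a cast. */
void placeCharacter(Character *characterPtr, unsigned char x, unsigned char y)
{
    characterPtr->_x = x;
    characterPtr->_y = y;
    characterPtr->_status = 1;
}

/* Same code for all items: the behavior is selected at run-time. */
void pickUp(Item *itemPtr)
{
    itemPtr->_character._status = 0;
    itemPtr->_effect();
}
```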
We won’t cover exhaustively all compilation options of the cross-compilers under our consideration. We refer to their respective manuals for the details.
Here we give a list of options to produce optimized code with our compilers.
The following options will apply the highest optimizations to produce faster and above all more compact code:
Architecture | Compiler | Options |
---|---|---|
Intel 8080 | ACK | -O6 |
Zilog Z80 | SCCZ80 (Z88DK) | -O3 |
Zilog Z80 | ZSDCC (Z88DK) | -SO3 --max-alloc-node20000 |
MOS 6502 | CC65 | -O -Cl |
Motorola 6809 | CMOC | -O2 |
The most common problem of many 8-bit systems is the small amount of memory available for code and data. Usually optimizing for speed also reduces memory usage, but this is not always the case. In other cases our goal is speed, even at the cost of extra memory. Some compilers provide options to specify our preference between speed and memory:
Architecture | Compiler | Options | Description |
---|---|---|---|
Zilog Z80 | ZSDCC (Z88DK) | --opt-code-size | Optimize memory |
Zilog Z80 | SCCZ80 (Z88DK) | --opt-code-speed | Optimize speed |
MOS 6502 | CC65 | -Oi, -Os | Optimize speed |
Known problems
- CC65: `-Cl` prevents the use of recursive functions.
- ZSDCC: it has bugs that do not depend on the options, as well as specific bugs that are triggered by `-SO3` when the `--max-alloc-node20000` option is not provided.
To avoid these problems and reduce compilation time, we recommend using only SCCZ80 for the Z80 during development and debugging, and resorting to ZSDCC only for the final optimization and tests.
Our compilers will not always be able to detect and remove unused and useless code from the binary. Therefore we must avoid including it in the first place.
With some of the compilers we can do even better by instructing them not to include some standard libraries, or even portions of the libraries, that we are sure we will not use.
Avoiding the standard library can reduce the size of the generated code. This has a significant impact when using ACK to produce CP/M-80 binaries. When compiling with ACK, whenever possible, we should try to replace functions such as `printf` and `scanf` with just `getchar()` and `putchar(c)`.
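As a minimal sketch of this replacement (the function names are hypothetical), the routines below print strings and 16-bit unsigned numbers with `putchar()` only, so that `printf` and its format parser need not be linked. How much is actually saved depends on the compiler and on its libraries, so it is worth comparing the binary sizes.

```c
#include <stdio.h>

/* Convert n into decimal digits in buf and return the number of digits.
   Assumes a 16-bit unsigned short: buf needs at least 6 bytes
   (up to 5 digits plus the terminating '\0'). */
unsigned char ushortToStr(unsigned short n, char *buf)
{
    char tmp[5];
    unsigned char len = 0;
    unsigned char i;

    do /* extract digits from the least significant one */
    {
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n);

    for (i = 0; i < len; ++i) /* reverse them into buf */
    {
        buf[i] = tmp[len - 1 - i];
    }
    buf[len] = '\0';
    return len;
}

/* Replacement for printf("%s", str) */
void printStr(const char *str)
{
    while (*str)
    {
        putchar(*str++);
    }
}

/* Replacement for printf("%u", n) */
void printUShort(unsigned short n)
{
    char buf[6];

    ushortToStr(n, buf);
    printStr(buf);
}
```

For example, `printStr("SCORE: "); printUShort(score); putchar('\n');` replaces `printf("SCORE: %u\n", score);` for a hypothetical `unsigned short score`.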
Z88DK provides several pragma commands to instruct the compiler and linker not to include some useless code.
For example:
- `#pragma printf = "%c %u"` includes only the `%c` and `%u` converters and excludes all the others;
- `#pragma-define:CRT_INITIALIZE_BSS=0` does not generate code to initialize the BSS memory area;
- `#pragma output CRT_ON_EXIT = 0x10001`: the program does not return when it exits (e.g., to BASIC);
- `#pragma output CLIB_MALLOC_HEAP_SIZE = 0`: no heap memory (i.e., no `malloc` is possible);
- `#pragma output CLIB_STDIO_HEAP_SIZE = 0` removes the stdio heap (no file can be opened).
More examples are in: https://github.com/Fabrizio-Caruso/CROSS-CHASE/blob/master/src/cross_lib/cfg/z88dk
Most 8-bit systems (almost all computers) have plenty of routines in ROM. It is important to know about them and to use them when they are needed. To use them explicitly, we may have to write some inline Assembly in our C code (or use separate Assembly routines). How to do this differs from one dev-kit to another, and we refer to their respective manuals for more details.
This is very important for systems that are not natively supported by the compilers and for which all input/output routines have to be written.
Example (taken from https://github.com/Fabrizio-Caruso/CROSS-CHASE/blob/master/src/cross_lib/display/display_macros.c)
To display characters on the screen of the Thomson MO5, MO6 and Olivetti Prodest PC128 (which are not supported by CMOC), we can use the ROM routine with a little inline Assembly code:

```c
/* Load ch into register B, then call the ROM monitor with SWI:
   the byte that follows SWI (here 2) selects the routine to run */
void PUTCH(unsigned char ch)
{
    asm
    {
        ldb ch
        swi
        .byte 2
    }
}
```
Luckily, we often use ROM routines implicitly, just by using the libraries provided by the dev-kit. This saves a lot of RAM because the code is already stored in ROM.
Nevertheless, we must take into account that using a ROM routine may add some constraints to our code, because we cannot modify the routine and it may use some auxiliary RAM locations (e.g., buffers) that we will not be allowed to use.
When information on the ROM routines of a lesser-known system is scant, or we do not know the entry points (start addresses) of such routines, we may resort to BASCK (https://github.com/z88dk/z88dk/blob/master/support/basck/basck.c, developed by Stefano Bodrato), which is distributed as part of Z88DK. BASCK takes as input ROM files of Z80 and 6502-based systems and searches for known patterns of ROM routines. Once the routines and their entry points are found, using them is not always simple, but in some cases it is trivial.
Example
- Let us say we are looking for the PRINT routine in the ROM: we run BASCK and filter its output with the “PRS” string (e.g., with the Unix `grep` command):

```
> basck -map romfile.rom | grep PRS
PRS = $AAAA ; Create string entry and print it
```

This gives us the address of the PRINT routine.
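- Now we can write C or Assembly code to use it:

```c
/* Z88DK-specific declaration: call the ROM routine at the address found
   by BASCK (0xAAAA in this example), passing str in a register (fastcall) */
extern void rom_prs(char * str) __z88dk_fastcall @0xAAAA;

int main(void)
{
    rom_prs("Hello WORLD !");
    while (1) {} /* never return (e.g., to BASIC) */
}
```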
As seen in the previous section, even if we code in C, we should not forget about the specific hardware. In some cases the hardware can help us write more compact and faster code. In particular, the graphics chip can help us save lots of RAM.
Example (TI VDP chip such as the TMS9918A used in the MSX, Spectravideo, Memotech MTX, Sord M5, etc.)
In some cases we could exploit a special text mode (Mode 1) in which the color of a character is implicit, being defined once for each group of characters. In this case a single byte is sufficient to define a character and its color. The 8-bit Atari computers have a similar text mode (graphics mode 1+16, Antic mode 6).
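As a sketch of why this saves memory (the function names are hypothetical, and the copy to VRAM is left to the dev-kit's own routines), in the TMS9918A Graphics I mode each byte of the 32-byte color table sets the foreground (high nibble) and background (low nibble) colors of a group of 8 consecutive characters. The colors of a whole 32x24 screen therefore cost 32 bytes instead of one attribute byte per cell (768 bytes).

```c
/* TMS9918A Graphics I ("Mode 1"): color table entry k sets the colors
   of the 8 characters 8*k .. 8*k+7 */
#define COLOR_TABLE_SIZE 32

/* Index of the color-table entry that colors character ch */
unsigned char colorGroup(unsigned char ch)
{
    return (unsigned char)(ch >> 3);
}

/* Pack foreground and background (0..15 each) into one color byte:
   foreground in the high nibble, background in the low nibble */
unsigned char colorByte(unsigned char fg, unsigned char bg)
{
    return (unsigned char)(((fg & 0x0F) << 4) | (bg & 0x0F));
}

/* Give the same colors to all the groups containing characters
   first..last in a RAM copy of the color table; writing the table to
   VRAM is platform-specific and not shown */
void setColorRange(unsigned char table[COLOR_TABLE_SIZE],
                   unsigned char first, unsigned char last,
                   unsigned char fg, unsigned char bg)
{
    unsigned char k;

    for (k = colorGroup(first); k <= colorGroup(last); ++k)
    {
        table[k] = colorByte(fg, bg);
    }
}
```

For instance, `setColorRange(table, '0', '9', fg, bg)` colors all the digits at once; on the real hardware the 32 bytes of `table` would then be written to the VDP color table.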
Example (VIC chip used in the Commodore Vic 20)
The Commodore Vic 20 is a special case, both because of its hardware limits (total RAM: 5K, RAM available for code: 3.5K) and because of some of the tricks it provides to reduce the impact of these limits.
One surprising feature of this chip is its ability to map just a subset of its characters to RAM while mapping the rest to ROM. If we only need n (<= 64) redefined characters, we can map just 64 of them onto RAM with `POKE(0x9005,0xFF);`. In this way we can also redefine fewer than 64 of them.
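The trick can be sketched in C as follows, assuming an unexpanded Vic 20 where writing 0xFF to 0x9005 places the character generator at 0x1C00 (keeping the screen at 0x1E00), so that characters 0 to 63 are read from RAM (0x1C00-0x1DFF). The function names are hypothetical; with CC65 the register write can also be expressed as `POKE(0x9005,0xFF);` through `<peekpoke.h>`. `useRamCharset()` and `redefineChar()` write to absolute addresses and are meant to run on the target only.

```c
#include <string.h>

/* Assumptions (unexpanded Vic 20): 0xFF in the VIC register 0x9005 moves
   the character generator to 0x1C00; characters 0..63 then come from RAM */
#define VIC_CHARSET_REG  0x9005
#define RAM_CHARSET_BASE 0x1C00
#define MAX_RAM_CHARS    64

/* Address of the 8-byte bitmap of character n in the RAM charset */
unsigned short charBitmapAddr(unsigned char n)
{
    return (unsigned short)(RAM_CHARSET_BASE + n * 8);
}

/* Same effect as POKE(0x9005,0xFF); (target only) */
void useRamCharset(void)
{
    *(volatile unsigned char *)VIC_CHARSET_REG = 0xFF;
}

/* Redefine character n with an 8-byte bitmap (target only).
   Returns 0 when n is not one of the characters mapped to RAM. */
unsigned char redefineChar(unsigned char n, const unsigned char bitmap[8])
{
    if (n >= MAX_RAM_CHARS)
    {
        return 0;
    }
    memcpy((unsigned char *)(size_t)charBitmapAddr(n), bitmap, 8);
    return 1;
}
```

Only the bitmaps of the characters we actually redefine need to be written, so the rest of that RAM area can be left alone.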
Moreover, in some cases we can use the separate video RAM that some graphics chips have access to (e.g., the TI VDP, the MOS VDC of the C128, etc.) for purposes other than graphics, such as storing data. This is possible, but it has a very high computational cost because the CPU can only access this separate RAM indirectly, through the graphics chip.