libnds v2 is a milestone for our Nintendo DS homebrew support, featuring major low-level refactoring. At its core is calico, a brand-new environment that provides operating-system-like facilities.
Quick porting checklist:
- Update makefiles to the latest versions provided by the examples
- Change the `while (1)` main loop to `while (pmMainLoop())`
- Revise timer usage: avoid timers 2/3; prefer using `TickTask` instead
- Revise LCD interrupts: if using HBlank/VCount interrupts, you need to explicitly enable them (see `lcdSetHBlankIrq`, `lcdSetVCountCompare`)
- Revise usage of interrupt routines (`irqSet`); generally prefer using worker threads instead
- Remove custom ARM7 cores if possible; otherwise, port FIFO code to PXI
If you run into issues updating your homebrew for these library updates, please ask us for help. Please don't just stick with old releases or hostile, incompatible forks, or recommend that other people do the same. We've done what we can to ensure that most projects only require minor changes to make use of this updated set of libraries.
Threading
Calico introduces threading support inspired by Horizon, Nintendo's OS for the 3DS and Switch. Specifically, it implements a priority-based preemptive scheduling algorithm without timeslicing, as is typical of real-time operating systems (sometimes known as `SCHED_FIFO`).
Through preemption, the system ensures that the running thread is always the highest-priority runnable thread. However, this also means that if higher-priority threads never perform blocking operations, lower-priority background threads will be starved of CPU time. Likewise, since there is no timeslicing, threads of the same priority need to be carefully designed to avoid starving each other.
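To make the scheduling rule concrete, here is a minimal C model of how a priority-based preemptive scheduler without timeslicing picks the thread to run. It is a sketch under stated assumptions, not calico's implementation: the `Thread` struct and `pick_next` function are hypothetical, the array is assumed to be kept in ready-queue order, and it assumes (as in Horizon) that a lower number means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Conceptual model only, not calico's data structures.
 * Assumption: lower prio value = higher priority (Horizon-style). */
typedef struct {
    const char *name;
    int prio;       /* lower value = higher priority (assumed convention) */
    bool runnable;  /* false while blocked on a mutex, IRQ, mailbox, ... */
} Thread;

/* Return the index of the thread that should be running: the highest-priority
 * runnable thread. Ties go to the earliest entry (FIFO order within a priority
 * level, assuming the array is kept in ready-queue order). With no timeslicing,
 * a tied thread only gets the CPU once the one ahead of it blocks.
 * Returns -1 if every thread is blocked, i.e. the CPU could go idle. */
static int pick_next(const Thread *threads, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!threads[i].runnable)
            continue;
        if (best < 0 || threads[i].prio < threads[best].prio)
            best = (int)i;
    }
    return best;
}
```

With `main` and `audio` both runnable at priority 28, a priority-40 `worker` is never selected; it only runs once both of them block, which is the starvation risk described above.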
Standard threading APIs (POSIX threads, C threads, C++ `std::thread`) can now be used on the Nintendo DS. Thread-local storage (TLS) is also available, but currently only on the ARM9.
Synchronization between threads can be carried out using typical primitives such as mutexes (with priority inheritance), condition variables, mailboxes, and others. These primitives are all implemented using a list of blocked threads, and each thread can be tagged with additional information that is used to select which subset of threads to wake.
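The tagging idea can be sketched as a wait list whose entries carry a tag, with a wake operation that unlinks only the waiters whose tag matches. The `Waiter`/`WaitList` types and function names below are hypothetical, and using an interrupt bitmask as the tag is just one illustrative choice (it mirrors waking every thread that waits for a given IRQ); calico's actual structures may differ. The sketch is not itself thread-safe: a kernel would typically manipulate such a list with interrupts masked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wait-list model; not calico's real structures. */
typedef struct Waiter {
    struct Waiter *next;
    uint32_t tag;   /* e.g. a bitmask of the interrupts this thread waits for */
    int id;         /* stand-in for a thread handle */
} Waiter;

typedef struct {
    Waiter *head;
} WaitList;

/* Append at the tail so matching waiters are woken in arrival order. */
static void waitlist_add(WaitList *list, Waiter *w)
{
    Waiter **pp = &list->head;
    while (*pp)
        pp = &(*pp)->next;
    w->next = NULL;
    *pp = w;
}

/* Unlink every waiter whose tag shares at least one bit with `mask`,
 * storing their ids in `woken` (at most `max`). Returns how many woke. */
static size_t waitlist_wake_matching(WaitList *list, uint32_t mask,
                                     int *woken, size_t max)
{
    size_t n = 0;
    Waiter **pp = &list->head;
    while (*pp) {
        Waiter *w = *pp;
        if ((w->tag & mask) && n < max) {
            *pp = w->next;          /* remove from the list */
            woken[n++] = w->id;
        } else {
            pp = &w->next;          /* keep it, move on */
        }
    }
    return n;
}
```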
A practical example of how to use the new threading system in a DS homebrew game: the main thread synchronizes to VBlank and handles graphics updates and input, while low-priority worker threads load or stream assets from the filesystem, perform computationally intensive tasks, decode video/audio, etc.
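Because standard threading APIs are available, the worker-thread pattern can be sketched with plain POSIX threads. The example below builds a small bounded mailbox from a mutex and two condition variables and uses it to pass results from a hypothetical asset-loading thread to the main thread. `Mailbox`, `loader_thread` and `run_loader` are illustrative names, not calico's mailbox API. On a DS the main thread would more likely check for results once per frame after waiting for VBlank instead of blocking, and the loader would be given a lower priority; both details are omitted so the sketch also runs on a desktop host.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Bounded mailbox from a POSIX mutex + condition variables.
 * Models the role of a mailbox primitive; NOT calico's mailbox API. */
#define MAILBOX_SLOTS 4u

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
    uint32_t slots[MAILBOX_SLOTS];
    unsigned head, count;
} Mailbox;

static void mailbox_init(Mailbox *mb)
{
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
    pthread_cond_init(&mb->not_full, NULL);
    mb->head = mb->count = 0;
}

static void mailbox_destroy(Mailbox *mb)
{
    pthread_cond_destroy(&mb->not_full);
    pthread_cond_destroy(&mb->not_empty);
    pthread_mutex_destroy(&mb->lock);
}

static void mailbox_send(Mailbox *mb, uint32_t msg)   /* blocks while full */
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == MAILBOX_SLOTS)
        pthread_cond_wait(&mb->not_full, &mb->lock);
    mb->slots[(mb->head + mb->count) % MAILBOX_SLOTS] = msg;
    mb->count++;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

static uint32_t mailbox_recv(Mailbox *mb)             /* blocks while empty */
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    uint32_t msg = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_SLOTS;
    mb->count--;
    pthread_cond_signal(&mb->not_full);
    pthread_mutex_unlock(&mb->lock);
    return msg;
}

static Mailbox results;

/* Hypothetical worker: "loads" assets 1..n, posts a fake result for each,
 * then posts 0 as an end-of-work sentinel. */
static void *loader_thread(void *arg)
{
    unsigned n_assets = *(const unsigned *)arg;
    for (unsigned id = 1; id <= n_assets; id++)
        mailbox_send(&results, id * 100u);
    mailbox_send(&results, 0);
    return NULL;
}

/* Main-thread side: start the worker, consume results until the sentinel. */
static uint32_t run_loader(unsigned n_assets)
{
    pthread_t worker;
    uint32_t total = 0, msg;

    mailbox_init(&results);
    if (pthread_create(&worker, NULL, loader_thread, &n_assets) != 0) {
        mailbox_destroy(&results);
        return 0;
    }
    while ((msg = mailbox_recv(&results)) != 0)
        total += msg;
    pthread_join(worker, NULL);
    mailbox_destroy(&results);
    return total;
}
```

With only four slots, the loader blocks whenever the consumer falls behind, which is the back-pressure a bounded mailbox provides.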
In a nutshell, the addition of a threading system allows for more complex use cases that previously required fragile workarounds, such as nested interrupt routines or even usage of the ARM7 processor for parallel processing.
Timers
Another important new feature is the tick system. Calico now takes ownership of timers 2 and 3 to provide both a 64-bit monotonically increasing tick counter and a mechanism for scheduling timed events (both periodic and one-shot). Multiple tick tasks can be registered at the same time (all sharing the same resources), and they will be scheduled by calico to run at the appropriate times. Timers 0 and 1 can still be used for especially timing-critical code that requires the lowest response times possible.
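Conceptually, multiplexing many tick tasks onto shared timer hardware amounts to keeping pending tasks sorted by deadline on the 64-bit tick counter, programming the timer for the earliest one, and re-arming periodic tasks after they run. The C model below sketches that idea; `TickTaskModel`, `tick_schedule` and `tick_run_due` are hypothetical names, and this is not calico's `TickTask` interface or implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of multiplexing timed tasks onto one hardware timer. */
typedef struct TickTaskModel {
    struct TickTaskModel *next;   /* pending list, sorted by deadline */
    uint64_t deadline;            /* absolute time on the 64-bit tick counter */
    uint64_t period;              /* 0 = one-shot, otherwise re-arm interval */
    void (*fn)(struct TickTaskModel *task);
    unsigned fired;               /* bookkeeping for the example */
} TickTaskModel;

static TickTaskModel *pending;    /* earliest deadline first */

static void tick_schedule(TickTaskModel *t)
{
    TickTaskModel **pp = &pending;
    while (*pp && (*pp)->deadline <= t->deadline)
        pp = &(*pp)->next;        /* equal deadlines keep insertion order */
    t->next = *pp;
    *pp = t;
}

/* The only value the hardware timer needs: when the next task is due. */
static uint64_t tick_next_deadline(void)
{
    return pending ? pending->deadline : UINT64_MAX;
}

/* Called when the timer fires: run every due task, re-arming periodic ones. */
static void tick_run_due(uint64_t now)
{
    while (pending && pending->deadline <= now) {
        TickTaskModel *t = pending;
        pending = t->next;
        t->fired++;
        if (t->fn)
            t->fn(t);
        if (t->period) {
            t->deadline += t->period;
            tick_schedule(t);
        }
    }
}
```

Since tick tasks run as interrupt handlers in calico, callbacks registered this way should stay short.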
Interrupts
Waiting for interrupts is closely integrated into calico's threading system. Any number of threads can wait for interrupts using `threadIrqWait`, which has the same parameters and semantics as the classic BIOS interrupt waiting routines. Calico automatically wakes up all threads waiting for a particular interrupt when it occurs, and when all threads are blocked, it puts the CPU into low-power mode. The classic `swiWaitForVBlank` and `swiIntrWait` functions have been made aliases of `threadIrqWait`, allowing existing code to work seamlessly with the new system.
Interrupt service routines (set by `irqSet`) now run exclusively in IRQ mode and do not nest. This includes timer callbacks, tick tasks and PXI (FIFO) callbacks. Less critical or heavier processing is expected to be done in a companion thread, which the ISR can wake up. This greatly improves system reliability, leaving ISRs free to respond quickly to timing-critical events.
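A common way to keep an ISR short is to have it only record the event in a single-producer/single-consumer ring buffer and then wake the companion thread, which drains the buffer and does the heavy work. The sketch below shows the ring itself using C11 atomics; `EventRing`, `ring_push` and `ring_pop` are hypothetical names, and the step that actually wakes the thread (which in calico would go through its own threading primitives) is deliberately left out. The atomics keep the sketch correct even if producer and consumer run concurrently; with an ISR on a single core they are simply a conservative choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* SPSC ring: ISR (producer) records events, companion thread (consumer)
 * processes them later. Generic illustration, not a calico API.
 * RING_SIZE must be a power of two so index wrap-around stays consistent. */
#define RING_SIZE 8u

typedef struct {
    uint32_t data[RING_SIZE];
    atomic_uint head;   /* written only by the producer (ISR) */
    atomic_uint tail;   /* written only by the consumer (thread) */
} EventRing;

/* Producer side, cheap enough for interrupt context: never blocks,
 * drops the event and returns false if the ring is full. */
static bool ring_push(EventRing *r, uint32_t ev)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->data[head % RING_SIZE] = ev;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side, run from the companion thread once the ISR has woken it. */
static bool ring_pop(EventRing *r, uint32_t *ev)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *ev = r->data[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```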
`irqEnable`/`irqDisable` no longer give special treatment to certain IRQs. The affected IRQs (VBlank, HBlank, PXI sync) must be enabled manually in their respective registers beforehand (i.e. `REG_DISPSTAT`, `REG_PXI_CNT`).
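As an illustration of what enabling these IRQs in `REG_DISPSTAT` involves, the function below computes a `DISPSTAT` value with the VCount-match IRQ enabled for a given scanline, following the bit layout documented in GBATEK (bit 3 VBlank IRQ enable, bit 4 HBlank IRQ enable, bit 5 VCount IRQ enable, bit 7 holding bit 8 of the compare line, bits 8 to 15 holding its low byte). It works on a plain value so it runs on a host; the macro names are local to this sketch, and the `lcdSetHBlankIrq`/`lcdSetVCountCompare` helpers remain the recommended route in real code.

```c
#include <assert.h>
#include <stdint.h>

/* DS DISPSTAT bit layout per GBATEK; macro names are local to this sketch. */
#define DISPSTAT_VBLANK_IRQ  (1u << 3)
#define DISPSTAT_HBLANK_IRQ  (1u << 4)
#define DISPSTAT_VCOUNT_IRQ  (1u << 5)
#define DISPSTAT_VCOUNT_MSB  (1u << 7)   /* holds bit 8 of the compare line */
#define DS_SCANLINES         263u        /* lines 0..262 */

/* Return a DISPSTAT value with the VCount-match IRQ enabled for `line`,
 * preserving the other IRQ enable bits already set in `dispstat`. */
static uint16_t dispstat_with_vcount(uint16_t dispstat, unsigned line)
{
    assert(line < DS_SCANLINES);
    dispstat &= (uint16_t)~(0xff00u | DISPSTAT_VCOUNT_MSB); /* clear old line */
    dispstat |= (uint16_t)((line & 0xffu) << 8);            /* low 8 bits -> 8..15 */
    if (line & 0x100u)
        dispstat |= DISPSTAT_VCOUNT_MSB;                     /* bit 8 -> bit 7 */
    return (uint16_t)(dispstat | DISPSTAT_VCOUNT_IRQ);
}
```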
Environment
Power management and application state handling have been consolidated into a new `pm` API resembling the 3DS `apt`, Wii U `ProcUI` and Switch `applet` APIs. This means user code must call `pmMainLoop()` in its main loop. This new function explicitly handles events such as entering sleep mode (closing the lid) or exiting (tapping the DSi power button), allowing your application to respond to them gracefully and mitigating issues such as 3D engine lockups or the loss of unsaved data.
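The porting change boils down to the loop shape sketched below: `while (1)` becomes a loop on `pmMainLoop()`, and anything that must not be lost, such as save data, runs once the loop ends. Since `pmMainLoop()` needs DS hardware, the sketch substitutes a hypothetical `fake_pm_main_loop()` that simply returns false after a given number of frames. It assumes, as with `aptMainLoop()` on 3DS, that a false return means the application should wrap up and return from `main`; `AppState` and `run_app` are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for pmMainLoop(): returns false once an exit
 * has been "requested" (here: after a fixed number of frames). */
static bool exit_requested;
static unsigned frames_until_exit;

static bool fake_pm_main_loop(void)
{
    if (frames_until_exit == 0)
        exit_requested = true;
    else
        frames_until_exit--;
    return !exit_requested;
}

typedef struct {
    unsigned frames;
    bool saved;
} AppState;

/* Shape of a ported main loop: `while (1)` becomes `while (pmMainLoop())`,
 * and data that must survive an exit is flushed after the loop. */
static void run_app(AppState *app, unsigned frames_before_exit)
{
    exit_requested = false;
    frames_until_exit = frames_before_exit;
    app->frames = 0;
    app->saved = false;

    while (fake_pm_main_loop()) {
        /* On hardware: wait for VBlank, read input, update graphics... */
        app->frames++;
    }
    app->saved = true;   /* e.g. write save data before returning from main */
}
```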
Homebrew now runs exclusively in low exception vector mode. This means the ARM9 BIOS is bypassed during interrupt and exception handling, and calico's own enhanced handlers run instead. It is still possible to invoke BIOS routines using `svc` (formerly `swi`) instructions, as calico forwards SVCs to the ARM9 BIOS by default. Exception handling on the ARM9 can now distinguish between prefetch and data aborts, and Guru Meditations now contain extra useful information, such as the current thread (and more).
Inter-processor communication (PXI) has been completely redesigned. The protocol now offers 31 separate PXI channels, each of which supports two message formats: simple 26-bit immediates, and extended messages consisting of a 16-bit immediate plus a data payload (up to 32 words). In addition, messages can be replied to with a 26-bit immediate (these replies are handled separately from regular messages). Incoming PXI messages can either be forwarded to a calico mailbox or handled by a custom callback (called from calico's PXI interrupt handler).
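One way to see how those field widths can fit a 32-bit PXI word: 31 channels need 5 bits, and 5 channel bits + 1 format flag + 26 immediate bits = 32. The packing functions below follow a purely hypothetical layout built on that arithmetic (including a 6-bit field for the 0 to 32 word payload length); calico's actual wire format may well differ, so treat this only as a worked example of bit packing, not as a description of the protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL layout for illustration only (not calico's wire format):
 *   bits 0-4    channel (0..30)
 *   bit  5      0 = simple message, 1 = extended message header
 *   simple:     bits 6-31  26-bit immediate
 *   extended:   bits 6-11  payload length in words (0..32)
 *               bits 16-31 16-bit immediate                          */
#define PXI_NUM_CHANNELS 31u
#define PXI_EXT_FLAG     (1u << 5)

static uint32_t pxi_pack_simple(unsigned ch, uint32_t imm)
{
    assert(ch < PXI_NUM_CHANNELS && imm < (1u << 26));
    return ch | (imm << 6);
}

static uint32_t pxi_pack_ext_header(unsigned ch, uint16_t imm, unsigned words)
{
    assert(ch < PXI_NUM_CHANNELS && words <= 32);
    return ch | PXI_EXT_FLAG | ((uint32_t)words << 6) | ((uint32_t)imm << 16);
}

static unsigned pxi_channel(uint32_t w)    { return w & 0x1fu; }
static bool     pxi_is_ext(uint32_t w)     { return (w & PXI_EXT_FLAG) != 0; }
static uint32_t pxi_simple_imm(uint32_t w) { return w >> 6; }
static unsigned pxi_ext_words(uint32_t w)  { return (w >> 6) & 0x3fu; }
static uint16_t pxi_ext_imm(uint32_t w)    { return (uint16_t)(w >> 16); }
```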
The classic libnds FIFO APIs are no longer available, so existing code that uses custom FIFO channels needs to be refactored to use PXI instead. An example of how to use calico's PXI API can be found under `pxi` in `nds-examples`. Given the improvements in threading and in the ARM7 driver stack, it may also be worth re-evaluating whether custom ARM7 code is needed at all.
Access to the DS gamecard hardware and the GBA slot is now brokered by the new `ntrcard` and `gbacart` APIs. These routines can be used on either the ARM7 or the ARM9, and they check whether the hardware is in use by the other CPU, returning failure if it is. Direct manipulation of `REG_EXMEMCNT` (including via `sysSetCartOwner`, `sysSetCardOwner` or `sysSetBusOwners`) is now strictly deprecated and should be avoided whenever possible.
Existing Slot-2 Expansion Pak APIs (`rumble`, `paddle`, `piano`, `guitarGrip`) are unchanged, but it is now necessary to explicitly enable the GBA slot (`gbacartOpen`) before using them.
Changes and optimizations have been made to low-level system support code such as crt0 (initialization) and linker scripts.
MPU settings have been made much stricter in order to improve early detection of programming errors:
- ITCM has been moved to `0x1ff8000`, and DTCM has been moved to `0x2ff0000`.
- There are no longer separate cached/uncached views of main RAM:
  - It is now always cacheable, and `memCached`/`memUncached` have been removed.
  - The upper area of main RAM shared between the ARM7 and ARM9 (at `0x2ff4000` onwards) is now a separate uncacheable region.
- All memory except for ITCM, BIOS and main RAM is now marked as non-executable.
- The GBA slot, shared IWRAM and DSi main RAM mirror regions are no longer mapped by default. The corresponding accessor functions (e.g. `gbacartOpen`) need to be used in order to obtain access.
Filesystem
All available block devices (DLDI, DSi SD/eMMC) have been consolidated into a common orthogonal API (`blkdev`), implemented on the ARM7 and also accessible from the ARM9. DLDI drivers now run on the ARM7, executing from its fast internal work RAM within a background-priority thread. The DSi SD/eMMC driver has been written from scratch, taking full advantage of the new threading and interrupt handling functionality.
While block device drivers run on the ARM7, filesystem drivers still run on the ARM9. libfat and libfilesystem have been superseded by the new libdvm (short for disk/volume management). libdvm can handle multi-partition disks and presents a pluggable filesystem driver interface based around devkitPro devoptabs. It ships with bundled FAT/NitroFS devoptabs and libfat/libfilesystem-compatible APIs, so existing code requires zero changes.
FAT is now handled by a customised version of the well-known FatFs library.
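Since libdvm exposes FAT and NitroFS through devoptabs, ordinary C stdio code keeps working. The helper below is plain, portable C that reads a whole file into memory. On a DS it would presumably be called after the libfat-compatible `fatInitDefault()`, with paths such as `sd:/data/level1.bin` (DSi SD) or `fat:/data/level1.bin` (DLDI); those prefixes follow libfat conventions and are assumptions here, and `read_whole_file` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer (caller frees *out).
 * Returns the size in bytes, or -1 on any error. Uses only standard stdio,
 * which is what a devoptab-based filesystem layer serves. */
static long read_whole_file(const char *path, unsigned char **out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return -1; }

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return -1; }
    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = buf;
    return size;
}
```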
libdvm also implements a per-disk cache based on a tweaked MRU algorithm that favors partial accesses (e.g. cluster chain tables, directory metadata), while also providing a fast path that bypasses the cache for large reads/writes.
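The fast-path idea can be sketched as follows: requests that cover at least a threshold of whole sectors go straight to the device, while small or partial accesses are served through a sector cache. This is a minimal model under stated assumptions, not libdvm's code: eviction is plain LRU as a stand-in for libdvm's tweaked MRU policy, the 8-sector threshold is hypothetical, only reads are modelled (a real cache must also keep cached sectors coherent with writes that bypass it), and the "device" is an in-memory array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE    512u
#define DEV_SECTORS    64u
#define CACHE_LINES    4u
#define BYPASS_SECTORS 8u   /* hypothetical fast-path threshold */

static uint8_t device[DEV_SECTORS][SECTOR_SIZE];  /* in-memory "disk" */
static unsigned device_reads;                     /* sectors read from device */

typedef struct {
    int64_t sector;         /* -1 = empty line */
    uint64_t last_use;
    uint8_t data[SECTOR_SIZE];
} CacheLine;

static CacheLine cache[CACHE_LINES];
static uint64_t clock_now;

static void cache_reset(void)
{
    for (unsigned i = 0; i < CACHE_LINES; i++) {
        cache[i].sector = -1;
        cache[i].last_use = 0;
    }
    device_reads = 0;
    clock_now = 0;
}

static void dev_read(uint32_t sector, uint32_t count, uint8_t *dst)
{
    memcpy(dst, device[sector], (size_t)count * SECTOR_SIZE);
    device_reads += count;
}

/* Return cached sector data, refilling the least recently used line on a miss. */
static const uint8_t *cache_get(uint32_t sector)
{
    CacheLine *victim = &cache[0];
    for (unsigned i = 0; i < CACHE_LINES; i++) {
        if (cache[i].sector == sector) {
            cache[i].last_use = ++clock_now;
            return cache[i].data;                 /* hit */
        }
        if (cache[i].last_use < victim->last_use)
            victim = &cache[i];
    }
    dev_read(sector, 1, victim->data);            /* miss */
    victim->sector = sector;
    victim->last_use = ++clock_now;
    return victim->data;
}

/* Large sector-aligned reads bypass the cache; everything else goes through it. */
static void disk_read(uint64_t offset, size_t len, uint8_t *dst)
{
    if (offset % SECTOR_SIZE == 0 && len % SECTOR_SIZE == 0 &&
        len / SECTOR_SIZE >= BYPASS_SECTORS) {
        dev_read((uint32_t)(offset / SECTOR_SIZE),
                 (uint32_t)(len / SECTOR_SIZE), dst);
        return;                                   /* no cache pollution */
    }
    while (len > 0) {
        uint32_t sector = (uint32_t)(offset / SECTOR_SIZE);
        size_t in_sector = (size_t)(offset % SECTOR_SIZE);
        size_t chunk = SECTOR_SIZE - in_sector;
        if (chunk > len)
            chunk = len;
        memcpy(dst, cache_get(sector) + in_sector, chunk);
        dst += chunk;
        offset += chunk;
        len -= chunk;
    }
}
```

The large read leaves the small, frequently reused sectors (the stand-in for FAT tables and directory metadata) in the cache.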
Experimental testing shows that the new filesystem stack is up to 4x faster for DSi SD access (~7.5 MiB/s) and up to 2x faster for DLDI access (~5.0 MiB/s).
Wireless networking
The networking stack has undergone a major overhaul too. New wireless management (`wlmgr`) and network packet (`netbuf`) APIs have been added, which allow ARM9 code to interact with the wireless drivers running on the ARM7, as well as to send and receive raw network packets. The word "drivers" is plural because, in addition to a rewritten DS-mode Mitsumi Wi-Fi driver, a DSi-mode Atheros Wi-Fi driver with WPA2 AES encryption support has finally been added to the stack.
The most appropriate driver is selected depending on the environment. Support for active access point scanning (Probe Requests) has been added, which finally allows hidden SSIDs to work with homebrew applications.
A refactored dswifi v2 builds upon the new wireless infrastructure to provide the high-level WFC and TCP/IP socket layer. New `wfc` APIs have been added to load Wi-Fi access point settings and launch the connection state machine. sgIP has received some minor refactoring and now runs in its own thread. Future versions of dswifi will continue to improve sgIP's feature set, including integration with devkitPro devoptabs.
Experimental testing with ftpd shows transfer speeds of around 300 KiB/s in DSi mode, and around 100 KiB/s in DS mode.
Sound
Calico also provides a revamped basic sound API that exposes all features of the DS sound hardware mixer, including sound capture and special output modes for reverb. It is no longer necessary to write custom ARM7 code to access these features.
Maxmod has been refactored to work alongside calico and its sound API. Everything besides the playback/mixing core has been rewritten in C for better maintainability. Automatic audio streaming is now handled in a background thread instead of within an interrupt routine.
Calico also provides a new microphone driver, which supports DMA on DSi. DMA reduces ARM7 CPU load, although at the expense of only supporting a handful of fixed sampling frequencies. Attempting to use microphone DMA in DS mode gracefully falls back to timer-driven CPU sampling, so it is advisable to always use DMA for maximum efficiency. CPU sampling has also improved thanks to the threading system: recorded audio no longer contains distortion during high ARM7 CPU load or when the screen is touched.