-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathmmix-doc.w
3369 lines (3066 loc) · 150 KB
/
mmix-doc.w
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
\def\title{MMIX}
\input epsf % input macros for dvips to include METAPOST illustrations
\def\MMIX{\.{MMIX}}
\def\NNIX{\hbox{\mc NNIX}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
\def\beginword{\vcenter\bgroup\let\\=\wordrule\halign\bgroup&\hfil##\hfil\cr}
\def\endword{\noalign{\vskip\baselineskip}\egroup\egroup
\advance\belowdisplayskip-\baselineskip}
\def\wordrule{\vrule height 9.5pt depth 4.5pt width .4pt}
\newdimen\bitwd \bitwd=6.6pt
\def\field#1#2{\vrule depth 3pt width 0pt \hbox to#1\bitwd{\hss$#2$\hss}}
\def\XF{\\{XF}}\def\XM{\\{XM}}\def\XD{\\{XD}} % these not in \tt
\def\PC{\\{PC}}
\def\Jx{\.{J}} % conversely, I type J_ to get J in \tt
\def\s{{\rm s}}
\def\rX{{\rm\$X}} \def\rY{{\rm\$Y}} \def\rZ{{\rm\$Z}}
\def\mm{{\rm M}} \def\xx{{\rm X}} \def\yy{{\rm Y}} \def\zz{{\rm Z}}
%\def\ll{{\rm L}} \def\gg{{\rm G}}
\def\ll{L} \def\gg{G}
\def\?{\mkern-1mu}
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
@* Introduction to MMIX.
Thirty-eight years have passed since the \.{MIX} computer was designed, and
computer architecture has been converging during those years
towards a rather different
style of machine. Therefore it is time to replace \.{MIX} with a new
computer that contains even less saturated fat than its predecessor.
Exercise 1.3.1--25 in the third edition of
{\sl Fundamental Algorithms\/} speaks of an extended
\.{MIX} called MixMaster, which is upward compatible with the old version.
But MixMaster itself is hopelessly obsolete; although it allows for
several gigabytes of memory, we can't even use it with {\mc ASCII} code to
get lowercase letters. And ouch, the standard subroutine calling convention
of \.{MIX} is irrevocably based on self-modifying code! Decimal arithmetic
and self-modifying code were popular in 1962, but they sure have disappeared
quickly as machines have gotten bigger and faster. A completely new
design is called for, based on the principles of RISC architecture as
expounded in {\sl Computer Architecture\/} by Hennessy and Patterson
(Morgan Kaufmann, 1996). % first ed was "Morgan Kaufman"! but now "nn" is legit
@^Hennessy, John LeRoy@>
@^Patterson, David Andrew@>
So here is \MMIX, a computer that will totally replace \.{MIX}
in the ``ultimate'' editions of {\sl The Art of Computer Programming},
Volumes 1--3, and in the first editions of the remaining volumes.
I~must confess that
I~can hardly wait to own a computer like this.
How do you pronounce \MMIX? I've been saying ``em-mix'' to myself,
because the first `\.M' represents a new millennium. Therefore I~use
the article ``an'' instead of~``a'' before the name \MMIX\
in English phrases like ``an \MMIX\ simulator.''
Incidentally, the {\sl Dictionary of American Regional English\/ \bf3} (1996)
lists ``mommix'' as a common dialect word used both as a noun and a verb;
to mommix something means to botch it, to bollix it. Only time will
tell whether I~have mommixed the definition of \MMIX.
@ The original \.{MIX} computer could be operated without an operating
system; you could bootstrap it with punched cards or paper tape and do
everything yourself. But nowadays such power is no longer in the hands of
ordinary users. The \MMIX\ hardware, like all other computing machines
made today, relies on an operating system to get jobs
started in their own address spaces and to provide I/O capabilities.
Whenever anybody has asked if I will be writing about operating systems,
my reply has always been ``Nix.'' Therefore the name of\/ \MMIX's operating
system, \NNIX, will come as no surprise.
@:NNIX}{\NNIX\ operating system@>
@^operating system@>
From time to time I will necessarily have to refer to things that \NNIX\ does
for its users, but I am unable to build \NNIX\ myself. Life is
too short. It would be wonderful if some expert in operating system design
became inspired to write a book that explains exactly how to construct a nice,
clean \NNIX\ kernel for an \MMIX\ chip.
@ I am deeply grateful to the many people who have helped me shape the behavior
of\/ \MMIX. In particular, John Hennessy and (especially) Dick Sites
have made significant contributions.
@^Hennessy, John LeRoy@>
@^Sites, Richard Lee@>
@ A programmer's introduction to \MMIX\ appears in ``Volume~1, Fascicle~1,''
@^Fascicle 1@>
a booklet containing tutorial material that will ultimately appear in the
fourth edition of {\sl The Art of Computer Programming}.
The description in the following sections is rather different, because
we are concerned about a complete implementation, including all of the
features used by the operating system and invisible to normal programs.
Here it is important to emphasize exceptional cases that were glossed over
in the tutorial, and~to consider
nitpicky details about things that might go wrong.
@* MMIX basics.
\MMIX\ is a 64-bit RISC machine with at least 256 general-purpose registers
and a 64-bit address space.
Every instruction is four bytes long and has the form
$$\vcenter{\offinterlineskip
\def\\#1&{\omit&}
\hrule
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
height 9pt depth4pt&OP&&X&&Y&&Z&\cr}
\hrule}\,.$$
The 256 possible OP codes fall into a dozen or so easily remembered
@^OP codes@>
categories; an instruction usually means, ``Set register X to the
result of\/ Y~OP~Z\null.'' For example,
$$\vcenter{\offinterlineskip
\def\\#1&{\omit&}
\hrule
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
height 9pt depth4pt&32&&1&&2&&3&\cr}
\hrule}$$
sets register~1 to the sum of registers 2 and 3.
A few instructions combine the Y and Z bytes into
a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field.
But the three bytes X, Y, Z usually have three-pronged significance
independent of each other.
Instructions are usually represented in a symbolic form corresponding
to the \MMIX\ assembly language, in which each operation code has a mnemonic
name. For example, operation~32 is \.{ADD}, and the instruction above
might be written `\.{ADD} \.{\$1,\$2,\$3}'; a dollar sign `\.\$' symbolizes
a register number. In general, the instruction
\.{ADD}~\.{\$X,\$Y,\$Z} is the operation of setting $\rX=\rY+\rZ$.
An assembly language instruction with two commas has three operand
fields X, Y,~Z; an instruction with one comma has two operand fields
X,~YZ; an instruction with no comma has one operand field,~XYZ;
an instruction with no operands has $\xx=\yy=\zz=0$.
\def\0{\$Z\char'174Z}
Most instructions have two forms, one in which the Z field stands for
register \$Z, and one in which Z is an unsigned ``immediate'' constant.
@^immediate operands@>
Thus, for example, the command `\.{ADD} \.{\$X,\$Y,\$Z}' has a counterpart
`\.{ADD} \.{\$X,\$Y,Z}', which sets $\rX=\rY+\zz$. Immediate constants
are always nonnegative.
In the descriptions
below we will introduce such pairs of instructions
by writing just `\.{ADD}~\.{\$X,\$Y,\0}' instead of naming both
cases explicitly.
The operation code for \.{ADD}~\.{\$X,\$Y,\$Z} is 32, but the operation
code for \.{ADD}~\.{\$X,\$Y,Z} is~33. The \MMIX\ assembler chooses the correct
code by noting whether the third argument is a register number or~not.
Register numbers and constants can be given symbolic names; for example, the
assembly language instruction `\.x~\.{IS}~\.{\$1}' makes \.x an
abbreviation for register number~1. Similarly, `\.{FIVE}~\.{IS}~\.5'
makes \.{FIVE} an abbreviation for the constant~5.
After these abbreviations have been specified, the instruction
\.{ADD}~\.{x,x,FIVE} increases \$1 by~5, using opcode~33, while
the instruction \.{ADD}~\.{x,x,x} doubles \$1 using opcode~32.
Symbolic names that stand for register numbers
conventionally begin with a lowercase letter, while names that stand
for constants conventionally begin with an uppercase letter.
This convention is not actually enforced by the assembler,
but it tends to reduce a programmer's confusion.
@ A {\it nybble\/} is a 4-bit quantity, often used to denote a decimal
or hexadecimal digit.
A {\it byte\/} is an 8-bit quantity, often used to denote an alphanumeric
character in {\mc ASCII} code. The Unicode standard extends {\mc ASCII} to
@^Unicode@>
@^ASCII@>
essentially all the world's languages by using 16-bit-wide characters called
{\it wydes\/}. (Weight watchers know that two nybbles make one byte,
but two bytes make one wyde.)
In the discussion below we use the term
{\it tetrabyte\/} or ``tetra'' for a 4-byte quantity, and the similar term
@^nybble@>
@^byte@>
@^wyde@>
@^tetrabyte@>
@^octabyte@>
{\it octabyte\/} or ``octa'' for an 8-byte quantity. Thus, a tetra is
two wydes, an octa is two tetras; an octabyte has 64~bits. Each \MMIX\
register can be thought of as containing one octabyte, or two tetras,
or four wydes, or eight bytes, or sixteen nybbles.
When bytes, wydes, tetras, and octas represent numbers they are said to be
either {\it signed\/} or {\it unsigned}. An unsigned byte is a number between
0~and $2^8-1=255$ inclusive; an unsigned wyde lies, similarly, between
0~and $2^{16}-1=65535$; an unsigned tetra lies between
0~and $2^{32}-1=4{,}294{,}967{,}295$; an unsigned octa lies between
0~and $2^{64}-1=18{,}446{,}744{,}073{,}709{,}551{,}615$.
Their signed counterparts use the
conventions of two's complement notation, by subtracting respectively $2^8$,
$2^{16}$, $2^{32}$, or~$2^{64}$ times the most significant bit. Thus,
the unsigned bytes 128 through 255 are regarded as the numbers $-128$
through~$-1$ when they are evaluated as signed bytes; a signed byte therefore
lies between $-128$ and $+127$, inclusive. A signed wyde is a number
between $-32768$ and $+32767$; a signed tetra lies between
$-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$; a signed octa lies between
$-9{,}223{,}372{,}036{,}854{,}775{,}808$ and $+9{,}223{,}372{,}036{,}854{,}775{,}807$.
The virtual memory of\/ \MMIX\ is an array M of $2^{64}$ bytes. If $k$ is any
unsigned octabyte, M[$k$]~is a 1-byte quantity. \MMIX\ machines do not
actually have such vast memories, but programmers can act as if $2^{64}$ bytes
are indeed present, because \MMIX\ provides address translation mechanisms by
which an operating system can maintain this illusion.
We use the notation $\mm_{2^t}[k]$ to stand for a number consisting of
$2^t$~consecutive bytes starting at location~$k\land\nobreak(2^{64}-2^t)$.
(The notation $k\land(2^{64}-2^t)$ means that the least
significant $t$ bits of~$k$ are set to~0, and only the least 64~bits
@^bit stuffing@>
of the resulting address are retained. Similarly, the notation
$k\lor(2^t-1)$ means that the least significant $t$ bits of~$k$ are set to~1.)
All accesses to $2^t$-byte quantities by \MMIX\ are {\it aligned}, in the sense
that the first byte is a multiple of~$2^t$.
Addressing is always ``big-endian.'' In other words, the
@^big-endian versus little-endian@>
@^little-endian versus big-endian@>
most significant (leftmost) byte of $\mm_{2^t}[k]$ is
$\mm_1[k\land\nobreak(2^{64}-2^t)]$
and the least significant (rightmost) byte is $\mm_1[k\lor(2^t-1)]$.
We use the notation $\s(\mm_{2^t}[k])$ when we want to regard
this $2^t$-byte number as a {\it signed\/} integer.
Formally speaking, if $l=2^t$,
@^signed integers@>
$$\s(\mm_l[k])=\bigl(\mm_1[k\land(-l)]\,\mm_1[k\land(-l)+1]\,\ldots\,
\mm_1[k\lor(l-1)]\bigr)_{256}
-2^{8l}[\mm_1[k\land(-l)]\!\ge\!128].$$
@* Loading and storing.
Several instructions can be used to get information from memory into
registers. For example, the ``load tetra unsigned'' instruction
\.{LDTU} \.{\$1,\$4,\$5}
puts the four bytes $\mm_4[\$4+\$5]$ into register~1 as an unsigned
integer;
the most significant four bytes of register~1 are set to zero.
The similar instruction \.{LDT} \.{\$1,\$4,\$5}, ``load tetra,'' sets
\$1 to the {\it signed\/} integer $\s(\mm_4[\$4+\$5])$.
(Instructions generally treat numbers as
@^signed integers@>
signed unless the operation code specifically calls them
unsigned.) In the signed case, the most significant four bytes of the
register will be copies of the most significant bit of the tetrabyte
loaded; thus they will be all~0s or all~1s, depending on whether the
number is $\ge0$ or $<0$.
\def\bull{\smallbreak\textindent{$\bullet$}}
\def\bul{\par\textindent{$\bullet$}}
\def\<#1 #2 {\.{#1}~\.{#2} }
\def\>{\hfill\break}
\bull\<LDB \$X,\$Y,\0 `load byte'.\>
@.LDB@>
Byte $\s(\mm[\rY+\rZ])$ or $\s(\mm[\rY+\zz])$ is loaded into register~X as a
signed number between $-128$ and $+127$, inclusive.
\bull\<LDBU \$X,\$Y,\0 `load byte unsigned'.@>
@.LDBU@>
Byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$ is loaded into register~X as an
unsigned number between $0$ and $255$, inclusive.
\bull\<LDW \$X,\$Y,\0 `load wyde'.\>
@.LDW@>
Bytes $\s(\mm_2[\rY+\rZ])$ or $\s(\mm_2[\rY+\zz])$
are loaded into register~X as a signed number between $-32768$ and $+32767$,
inclusive. As mentioned above, our notation $\mm_2[k]$ implies that
the least significant bit of the address $\rY+\rZ$ or $\rY+\zz$ is
ignored and assumed to be~0.
@^bit stuffing@>
\bull\<LDWU \$X,\$Y,\0 `load wyde unsigned'.@>
@.LDWU@>
Bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$ are loaded
into register~X as an unsigned number between $0$ and $65535$, inclusive.
\bull\<LDT \$X,\$Y,\0 `load tetra'.\>
@.LDT@>
Bytes $\s(\mm_4[\rY+\rZ])$ or $\s(\mm_4[\rY+\zz])$
are loaded into register~X as a signed number between $-2{,}147{,}483{,}648$ and
$+2{,}147{,}483{,}647$, inclusive.
As mentioned above, our notation $\mm_4[k]$ implies that
the two least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
ignored and assumed to be~0.
@^bit stuffing@>
\bull\<LDTU \$X,\$Y,\0 `load tetra unsigned'.\>
@.LDTU@>
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$
are loaded into register~X as an unsigned number between 0 and
4{,}294{,}967{,}296, inclusive.
\bull\<LDO \$X,\$Y,\0 `load octa'.\>
@.LDO@>
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
register~X\null.
As mentioned above, our notation $\mm_8[k]$ implies that
the three least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
ignored and assumed to be~0.
@^bit stuffing@>
\bull\<LDOU \$X,\$Y,\0 `load octa unsigned'.\>
@.LDOU@>
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
register~X\null. There is in fact no difference between the behavior of
\.{LDOU} and~\.{LDO}, since
an octabyte can be regarded as either signed or unsigned. \.{LDOU} is included
in \MMIX\ just for completeness and consistency, in spite of the fact that
a foolish consistency is the hobgoblin of little minds.
@^Emerson, Ralph Waldo@>
(Niklaus Wirth made a strong plea for such consistency in his early critique
of System/360; see {\sl JACM\/ \bf15} (1967), 37--74.)
@^Wirth, Niklaus Emil@>
@^System/360@>
\bull\<LDHT \$X,\$Y,\0 `load high tetra'.\>
@.LDHT@>
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$ are loaded into the most
significant half of register~X, and the least significant half is
cleared to zero. (One use of ``high tetra arithmetic'' is to detect
overflow easily when tetrabytes are added or subtracted.)
\bull\<LDA \$X,\$Y,\0 `load address'.\>
The address $\rY+\rZ$ or $\rY+\zz$ is loaded into register~X. This
instruction is simply another name for the \.{ADDU} instruction
discussed below; it can
be used when the programmer is thinking of memory addresses
instead of numbers.
The \MMIX\ assembler converts \.{LDA} into the same OP-code as \.{ADDU}.
@.LDA@>
@.ADDU@>
@ Another family of instructions goes the other way, storing registers into
memory. For example, the ``store octa immediate'' command
\<STO \$3,\$2,17 puts the current contents of register~3
into $\mm_8[\$2+17]$.
\bull\<STB \$X,\$Y,\0 `store byte'.\>
@.STB@>
The least significant byte of register~X is stored into
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. An integer overflow exception occurs if
@^overflow@>
\$X is not between $-128$ and $+127$. (We will discuss overflow and other
kinds of exceptions later.)
\bull\<STBU \$X,\$Y,\0 `store byte unsigned'.@>\>
@.STBU@>
The least significant byte of register~X is stored into
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. \.{STBU} instructions are the same
as \.{STB} instructions, except that no test for overflow is made.
\bull\<STW \$X,\$Y,\0 `store wyde'.\>
@.STW@>
The two least significant bytes of register~X are stored into
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
An integer overflow exception occurs if
\$X is not between $-32768$ and $+32767$.
\bull\<STWU \$X,\$Y,\0 `store wyde unsigned'.@>\>
@.STWU@>
The two least significant bytes of register~X are stored into
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
\.{STWU} instructions are the same
as \.{STW} instructions, except that no test for overflow is made.
\bull\<STT \$X,\$Y,\0 `store tetra'.\>
@.STT@>
The four least significant bytes of register~X are stored into
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
An integer overflow exception occurs if
\$X is not between $-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$.
\bull\<STTU \$X,\$Y,\0 `store tetra unsigned'.\>
@.STTU@>
The four least significant bytes of register~X are stored into
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
\.{STTU} instructions are the same
as \.{STT} instructions, except that no test for overflow is made.
\bull\<STO \$X,\$Y,\0 `store octa'.\>
@.STO@>
Register X is stored into bytes $\mm_8[\rY+\rZ]$ or
$\mm_8[\rY+\zz]$.
\bull\<STOU \$X,\$Y,\0 `store octa unsigned'.\>
@.STOU@>
Identical to \.{STO} \.{\$X,\$Y,\0}.
\bull\<STCO X,\$Y,\0 `store constant octabyte'.\>
@.STCO@>
An octabyte whose value is the unsigned byte X is stored into
$\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$.
\bull\<STHT \$X,\$Y,\0 `store high tetra'.\>
The most significant four bytes of register~X are stored into
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
@.STHT@>
@* Adding and subtracting.
Once numbers are in registers, we can compute with them. Let's consider
addition and subtraction first.
\bull\<ADD \$X,\$Y,\0 `add'.\>
@.ADD@>
The sum $\rY+\rZ$ or $\rY+\zz$ is placed into register~X using signed,
two's complement arithmetic.
An integer overflow exception occurs if the sum is $\ge2^{63}$ or $<-2^{63}$.
(We will discuss overflow and other kinds of exceptions later.)
@^overflow@>
\bull\<ADDU \$X,\$Y,\0 `add unsigned'.\>
@.ADDU@>
The sum $(\rY+\rZ)\bmod2^{64}$ or $(\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
These instructions are the same
as \.{ADD}~\.{\$X,\$Y,\0} commands
except that no test for overflow is made.
(Overflow could be detected if desired by using the command
\<CMPU ovflo,\$X,\$Y after addition, where \.{CMPU} means
@.CMPU@>
``compare unsigned''; see below.)
\bull\<2ADDU \$X,\$Y,\0 `times 2 and add unsigned'.\>
@.2ADDU@>
The sum $(2\rY+\rZ)\bmod2^{64}$ or $(2\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
\bull\<4ADDU \$X,\$Y,\0 `times 4 and add unsigned'.\>
@.4ADDU@>
The sum $(4\rY+\rZ)\bmod2^{64}$ or $(4\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
\bull\<8ADDU \$X,\$Y,\0 `times 8 and add unsigned'.\>
@.8ADDU@>
The sum $(8\rY+\rZ)\bmod2^{64}$ or $(8\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
\bull\<16ADDU \$X,\$Y,\0 `times 16 and add unsigned'.\>
@.16ADDU@>
The sum $(16\rY+\rZ)\bmod2^{64}$ or $(16\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
\bull\<SUB \$X,\$Y,\0 `subtract'.\>
@.SUB@>
The difference $\rY-\rZ$ or $\rY-\zz$ is placed into register~X using
signed, two's complement arithmetic.
An integer overflow exception occurs if the difference is $\ge2^{63}$ or
$<-2^{63}$.
\bull\<SUBU \$X,\$Y,\0 `subtract unsigned'.\>
@.SUBU@>
The difference $(\rY-\rZ)\bmod2^{64}$ or $(\rY-\zz)\bmod2^{64}$
is placed into register~X\null.
These two instructions are the same
as \.{SUB}~\.{\$X,\$Y,\0} except that no test for overflow is made.
\bull\<NEG \$X,Y,\0 `negate'.\>
@.NEG@>
The value $\yy-\rZ$ or $\yy-\zz$ is placed into register~X using
signed, two's complement arithmetic.
An integer overflow exception occurs if the result is greater
than~$2^{63}-\nobreak1$.
(Notice that in this case \MMIX\ works with the ``immediate'' constant~Y,
not register~Y\null. \.{NEG} commands are analogous to the immediate variants
of other commands, because they save us from having to put one-byte
constants into a register. When $\yy=0$, overflow occurs if and
only if $\rZ=-2^{63}$. The instruction \<NEG \$X,1,2 has exactly the
same effect as \.{NEG}~\.{\$X,0,1}.)
\bull\<NEGU \$X,Y,\0 `negate unsigned'.\>
@.NEGU@>
The value $(\yy-\rZ)\bmod2^{64}$ or $(\yy-\zz)\bmod2^{64}$
is placed into register~X\null.
\.{NEGU} instructions are the same
as \.{NEG} instructions, except that no test for overflow is made.
@* Bit fiddling.
Before looking at multiplication and division, which take longer than
addition and subtraction, let's look at some of the other things that
\MMIX\ can do fast. There are eighteen instructions for bitwise
logical operations on unsigned numbers.
\bull\<AND \$X,\$Y,\0 `bitwise and'.\>
@.AND@>
Each bit of register Y is logically anded with the corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bits of the operands are both~1;
in symbols, $\rX=\rY\land\rZ$ or $\rX=\rY\land\zz$.
This means in particular that \<AND \$X,\$Y,Z always zeroes out the seven
most significant bytes of register~X, because 0s are prefixed to the
constant byte~Z\null.
\bull\<OR \$X,\$Y,\0 `bitwise or'.\>
@.OR@>
Each bit of register Y is logically ored with the corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~0 if and only if the
corresponding bits of the operands are both~0;
in symbols, $\rX=\rY\lor\rZ$ or $\rX=\rY\lor\zz$.
In the special case $\zz=0$, the immediate variant of
this command simply copies register~Y to
register~X\null. The \MMIX\ assembler allows us to write
`\.{SET}~\.{\$X,\$Y}' as a convenient abbreviation for
`\.{OR}~\.{\$X,\$Y,0}'.
@.SET@>
\bull\<XOR \$X,\$Y,\0 `bitwise exclusive-or'.\>
@.XOR@>
Each bit of register Y is logically xored with the corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~0 if and only if the
corresponding bits of the operands are equal;
in symbols, $\rX=\rY\oplus\rZ$ or $\rX=\rY\oplus\zz$.
\bull\<ANDN \$X,\$Y,\0 `bitwise and-not'.\>
@.ANDN@>
Each bit of register Y is logically anded with the complement of the
corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bit of register~Y is~1 and the other corresponding bit is~0;
in symbols, $\rX=\rY\setminus\rZ$ or $\rX=\rY\setminus\zz$.
(This is the {\it logical difference\/} operation; if the operands
are bit strings representing sets, we are computing the elements that
lie in one set but not the other.)
\bull\<ORN \$X,\$Y,\0 `bitwise or-not'.\>
@.ORN@>
Each bit of register Y is logically ored with the complement of the
corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bit of register~Y is greater than or equal to the other corresponding bit;
in symbols, $\rX=\rY\lor\overline\rZ$
or $\rX=\rY\lor\overline\zz$.
(This is the complement of $\rZ\setminus\rY$ or $\zz\setminus\rY$.)
\bull\<NAND \$X,\$Y,\0 `bitwise not-and'.\>
@.NAND@>
Each bit of register Y is logically anded with the corresponding bit of
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
In other words, a bit of register~X is set to~0 if and only if the
corresponding bits of the operands are both~1;
in symbols, $\rX=\rY\mathbin{\overline\land}\rZ$ or
$\rX=\rY\mathbin{\overline\land}\zz$.
\bull\<NOR \$X,\$Y,\0 `bitwise not-or'.\>
@.NOR@>
Each bit of register Y is logically ored with the corresponding bit of
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bits of the operands are both~0;
in symbols, $\rX=\rY\mathbin{\overline\lor}\rZ$ or
$\rX=\rY\mathbin{\overline\lor}\zz$.
\bull\<NXOR \$X,\$Y,\0 `bitwise not-exclusive-or'.\>
@.NAND@>
Each bit of register Y is logically xored with the corresponding bit of
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bits of the operands are equal;
in symbols, $\rX=\rY\mathbin{\overline\oplus}\rZ$ or
$\rX=\rY\mathbin{\overline\oplus}\zz$.
\bull\<MUX \$X,\$Y,\0 `bitwise multiplex'.\>
@.MUX@>
For each bit position~$j$, the $j$th bit of register~X is set either to
bit~$j$ of register~Y
or to bit~$j$ of the other operand \$Z~or~Z, depending on
whether bit~$j$ of the special {\it mask register\/}~rM is 1 or 0:
@^rM@>
if ${\rm M}_j$ then $\yy_j$ else~$\zz_j$.
In symbols, $\rm\rX=(\rY\land rM)\lor(\rZ\land\overline{rM})$ or
$\rm\rX=(\rY\land rM)\lor(\zz\land\overline{rM})$.
(\MMIX\ has several such special registers, associated with instructions that
need more than two inputs or produce more than one output.)
@ Besides the eighteen bitwise operations, \MMIX\ can also perform unsigned
bytewise and biggerwise operations that are somewhat more exotic.
\bull\<BDIF \$X,\$Y,\0 `byte difference'.\>
@.BDIF@>
For each byte position~$j$, the $j$th byte of register~X is set to byte~$j$ of
register~Y minus byte~$j$ of the other operand \$Z~or~Z, unless that
difference is negative; in the latter case, byte~$j$ of~\$X is set to zero.
\bull\<WDIF \$X,\$Y,\0 `wyde difference'.\>
@.WDIF@>
For each wyde position~$j$, the $j$th wyde of register~X is set to wyde~$j$ of
register~Y minus wyde~$j$ of the other operand \$Z~or~Z, unless that
difference is negative; in the latter case, wyde~$j$ of~\$X is set to zero.
\bull\<TDIF \$X,\$Y,\0 `tetra difference'.\>
@.TDIF@>
For each tetra position~$j$, the $j$th tetra of register~X is set to tetra~$j$ of
register~Y minus tetra~$j$ of the other operand \$Z~or~Z, unless that
difference is negative; in the latter case, tetra~$j$ of~\$X is set to zero.
\bull\<ODIF \$X,\$Y,\0 `octa difference'.\>
@.ODIF@>
Register~X is set to register~Y minus the other operand \$Z~or~Z, unless
\$Z~or~Z exceeds register~Y; in the latter case,
\$X~is set to zero. The operands are treated as unsigned integers.
\smallskip
The \.{BDIF} and \.{WDIF} commands are useful
in applications to graphics or video; \.{TDIF} and \.{ODIF} are also
present for reasons of consistency. For example, if \.a and \.b are
registers containing
8-byte quantities, their bytewise maxima~\.c and
bytewise minima~\.d are computed by
$$\hbox{\tt BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;}$$
similarly, the individual ``pixel differences'' \.e, namely the absolute
values of the differences of corresponding bytes, are computed by
$$\hbox{\tt BDIF x,a,b; BDIF y,b,a; OR e,x,y.}$$
To add individual
bytes of \.a and \.b while clipping all sums to 255 if they don't fit
in a single byte, one can say
$$\hbox{\tt NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;}$$
in other words, complement \.a, apply \.{BDIF}, and complement the result.
The operations can also be used to construct efficient operations on
strings of bytes or wydes.
@^graphics@>
@^pixels@>
@^saturating arithmetic@>
@^nybble@>
Exercise: Implement a ``nybble difference'' instruction that operates
in a similar way on sixteen nybbles at a time.
Answer: {\tt\spaceskip=.5em minus .3em
AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m;
BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx} where register \.m contains
the mask \Hex{0f0f0f0f0f0f0f0f}.
(The \.{ANDN} operation can be regarded as
a ``bit difference'' instruction that operates
in a similar way on 64 bits at a time.)
@ Three more pairs of bit-fiddling instructions round out the collection of exotics.
\bull\<SADD \$X,\$Y,\0 `sideways add'.\>
@.SADD@>
Each bit of register Y is logically anded with the complement of the
corresponding bit of
register~Z or of the constant~Z, and the number of 1~bits in the
result is placed in register~X\null.
In other words, register~X is set to the number of bit positions
in which register~Y has a~1 and the other operand has a~0;
in symbols, $\rX=\nu(\rY\setminus\rZ)$ or $\rX=\nu(\rY\setminus\zz)$.
When the second operand is zero this operation is sometimes called
``population counting,'' because it counts the number of 1s in register~Y\null.
@^population counting@>
@^counting ones@>
\bull\<MOR \$X,\$Y,\0 `multiple or'.\>
@.MOR@>
Suppose the 64 bits of register Y are indexed as
$$y_{00}y_{01}\ldots y_{07}y_{10}y_{11}\ldots y_{17}\ldots
y_{70}y_{71}\ldots y_{77};$$
in other words, $y_{ij}$ is the $j$th bit of the $i$th byte, if we
number the bits and bytes from 0 to 7 in big-endian fashion from left to right.
Let the bits of the other operand, \$Z or~Z, be indexed similarly:
$$z_{00}z_{01}\ldots z_{07}z_{10}z_{11}\ldots z_{17}\ldots
z_{70}z_{71}\ldots z_{77}.$$
The \.{MOR} operation replaces each bit $x_{ij}$ of register~X by the bit
$$ y_{0j}z_{i0}\lor y_{1j}z_{i1}\lor \cdots \lor y_{7j}z_{i7}.$$
Thus, for example, if register Z contains the constant
\Hex{0102040810204080},
\.{MOR} reverses the order of the bytes in register~Y, converting between
little-endian and big-endian addressing.
@^big-endian versus little-endian@>
@^little-endian versus big-endian@>
(The $i$th byte of~\$X depends on the bytes of~\$Y as specified by the
$i$th byte of~\$Z or~Z\null. If we regard
64-bit words as $8\times8$ Boolean matrices, with one byte per column,
this operation computes the
Boolean product $\rX=\rY\,\rZ$ or $\rX=\rY\,\zz$. Alternatively, if we
regard 64-bit words as $8\times8$ matrices with one byte per~{\it row},
\.{MOR} computes the Boolean product $\rX=\rZ\,\rY$ or $\rX=\zz\,\rY$
with operands in the opposite order. The immediate form
\<MOR \$X,\$Y,Z always sets the leading seven bytes of register~X
to zero; the other byte is set to the bitwise or of whatever bytes of
register~Y are specified by the immediate operand~Z\null.)
Exercise: Explain how to compute a mask \.m that is \Hex{ff} in byte
positions where \.a exceeds \.b, \Hex{00} in all other bytes.
Answer: \.{BDIF}~\.{x,a,b;} \.{MOR}~\.{m,minusone,x;}
here \.{minusone} is a register consisting of all 1s. (Moreover,
if we \.{AND} this result
with \Hex{8040201008040201}, then \.{MOR} with $\zz=255$, we get
a one-byte encoding~of~\.m.)
\bull\<MXOR \$X,\$Y,\0 `multiple exclusive-or'.\>
@.MXOR@>
This operation is like the Boolean multiplication just discussed, but
exclusive-or is used to combine the bits. Thus we obtain a matrix
product over the field of two elements instead of a Boolean matrix product.
This operation can be used to construct hash functions, among many other things.
(The hash functions aren't bad, but they are not ``universal'' in the
sense of {\sl Sorting and Searching}, exercise 6.4--72.)
@^matrices of bits@>
@^Boolean multiplication@>
@ Sixteen ``immediate wyde'' instructions are available for the common
case that a 16-bit constant is needed. In this case the Y~and~Z fields
of the instruction are regarded as a single 16-bit unsigned number~YZ\null.
@^immediate operands@>
\bull\<SETH \$X,YZ `set to high wyde';
@.SETH@>
\<SETMH \$X,YZ `set to medium high wyde';
@.SETMH@>
\<SETML \$X,YZ `set to medium low wyde';
@.SETML@>
\<SETL \$X,YZ `set to low wyde'.\>
@.SETL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, and placed into register~X\null.
Thus, for example, \.{SETML} inserts
a given value into the second-least-significant wyde of register~X and
sets the other three wydes to zero.
\bull\<INCH \$X,YZ `increase by high wyde';
@.INCH@>
\<INCMH \$X,YZ `increase by medium high wyde';
@.INCMH@>
\<INCML \$X,YZ `increase by medium low wyde';
@.INCML@>
\<INCL \$X,YZ `increase by low wyde'.\>
@.INCL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, and added to register~X,
ignoring overflow; the result is placed back into register~X\null.
If YZ is the hexadecimal constant \Hex{8000}, the command \<INCH \$X,YZ
complements the most significant bit of register~X\null. We will see
below that this can be used to negate a floating point number.
@^negation, floating point@>
\bull\<ORH \$X,YZ `bitwise or with high wyde';
@.ORH@>
\<ORMH \$X,YZ `bitwise or with medium high wyde';
@.ORMH@>
\<ORML \$X,YZ `bitwise or with medium low wyde';
@.ORML@>
\<ORL \$X,YZ `bitwise or with low wyde'.\>
@.ORL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, and ored with register~X;
the result is placed back into register~X\null.
Notice that any desired 4-wyde constant \.{GH} \.{IJ} \.{KL} \.{MN}
can be inserted into a register with a sequence of four instructions
such as
$$\hbox{\tt SETH \$X,GH; INCMH \$X,IJ; INCML \$X,KL; INCL \$X,MN;}$$
any of these \.{INC} instructions could also be replaced by \.{OR}.
\bull\<ANDNH \$X,YZ `bitwise and-not high wyde';
@.ANDNH@>
\<ANDNMH \$X,YZ `bitwise and-not medium high wyde';\>
@.ANDNMH@>
\<ANDNML \$X,YZ `bitwise and-not medium low wyde';
@.ANDNML@>
\<ANDNL \$X,YZ `bitwise and-not low wyde'.\>
@.ANDNL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, then
complemented and anded with register~X;
the result is placed back into register~X\null.
If YZ is the hexadecimal
constant \Hex{8000}, the command \<ANDNH \$X,YZ forces the most significant
bit of register~X to be~0. This can be used to compute the absolute value of
a floating point number.
@^absolute value, floating point@>
@ \MMIX\ knows several ways to shift a register left or right
by any number of bits.
\bull\<SL \$X,\$Y,\0 `shift left'.\>
@.SL@>
The bits of register~Y are shifted left by \$Z or Z places, and 0s
are shifted in from the right; the result is placed in register~X\null.
Register~Y is treated as a signed number, but
the second operand is treated as an unsigned number.
The effect is the same as multiplication by
$2^{\mkern1mu\rZ}$ or by $2^\zz$; an integer overflow exception occurs if the
result is $\ge2^{63}$ or $<-2^{63}$.
In particular, if the second operand is 64 or~more, register~X will
become entirely zero, and integer overflow will be signaled unless
register~Y was zero.
\bull\<SLU \$X,\$Y,\0 `shift left unsigned'.\>
@.SLU@>
The bits of register~Y are shifted left by \$Z or Z places, and 0s
are shifted in from the right; the result is placed in register~X\null.
Both operands are treated as unsigned numbers. The \.{SLU} instructions
are equivalent to \.{SL}, except that no test for overflow is made.
\bull\<SR \$X,\$Y,\0 `shift right'.\>
@.SR@>
The bits of register~Y are shifted right by \$Z or Z places, and copies
of the leftmost bit (the sign bit) are shifted in from the left; the result is
placed in register~X\null.
Register~Y is treated as a signed number, but
the second operand is treated as an unsigned number.
The effect is the same as division by $2^{\mkern1mu\rZ}$ or by
$2^\zz$ and rounding down. In particular, if the second operand is 64 or~more,
register~X will become zero if \$Y was nonnegative, $-1$ if \$Y was negative.
\bull\<SRU \$X,\$Y,\0 `shift right unsigned'.\>
@.SRU@>
The bits of register~Y are shifted right by \$Z or Z places, and 0s
are shifted in from the left; the result is placed in register~X\null.
Both operands are treated as unsigned numbers.
The effect is the same as unsigned division of a 64-bit number
by $2^{\mkern1mu\rZ}$ or by~$2^\zz$;
if the second operand is 64 or~more, register~X will become entirely~zero.
@* Comparisons.
Arithmetic and logical operations are nice,
but computer programs also need to compare numbers
and to change the course of a calculation depending on what they find.
\MMIX\ has four comparison instructions to facilitate such decision-making.
\bull\<CMP \$X,\$Y,\0 `compare'.\>
@.CMP@>
Register X is set to $-1$ if register Y is less than register Z or less than
the unsigned immediate value~Z, using the conventions of signed
arithmetic; it is set to 0 if register~Y is equal to register Z or equal to
the unsigned immediate value~Z; otherwise it is set to~1.
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
\bull\<CMPU \$X,\$Y,\0 `compare unsigned'.\>
@.CMPU@>
Register X is set to $-1$ if register Y is less than register Z or less than
the unsigned immediate value Z, using the conventions of unsigned
arithmetic; it is set to 0 if register Y is equal to register Z or equal to
the unsigned immediate value~Z; otherwise it is set to~1.
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
@ There also are 32 conditional instructions, which choose quickly between
two alternative courses of action.
\bull\<CSN \$X,\$Y,\0 `conditionally set if negative'.\>
@.CSN@>
If register Y is negative (namely if its most significant bit is~1),
register~X is set to the contents of register~Z or to the
unsigned immediate value~Z. Otherwise nothing happens.
\bull\<CSZ \$X,\$Y,\0 `conditionally set if zero'.
@.CSZ@>
\bul\<CSP \$X,\$Y,\0 `conditionally set if positive'.
@.CSP@>
\bul\<CSOD \$X,\$Y,\0 `conditionally set if odd'.
@.CSOD@>
\bul\<CSNN \$X,\$Y,\0 `conditionally set if nonnegative'.
@.CSNN@>
\bul\<CSNZ \$X,\$Y,\0 `conditionally set if nonzero'.
@.CSNZ@>
\bul\<CSNP \$X,\$Y,\0 `conditionally set if nonpositive'.
@.CSNP@>
\bul\<CSEV \$X,\$Y,\0 `conditionally set if even'.\>
@.CSEV@>
These instructions are entirely analogous to \.{CSN}, except
that register~X changes only if register~Y is respectively zero, positive,
odd, nonnegative, nonzero, nonpositive, or nonodd.
\bull\<ZSN \$X,\$Y,\0 `zero or set if negative'.\>
@.ZSN@>
If register Y is negative (namely if its most significant bit is~1),
register~X is set to the contents of register~Z or to the
unsigned immediate value~Z. Otherwise register~X is set to zero.
\bull\<ZSZ \$X,\$Y,\0 `zero or set if zero'.
@.ZSZ@>
\bul\<ZSP \$X,\$Y,\0 `zero or set if positive'.
@.ZSP@>
\bul\<ZSOD \$X,\$Y,\0 `zero or set if odd'.
@.ZSOD@>
\bul\<ZSNN \$X,\$Y,\0 `zero or set if nonnegative'.
@.ZSNN@>
\bul\<ZSNZ \$X,\$Y,\0 `zero or set if nonzero'.
@.ZSNZ@>
\bul\<ZSNP \$X,\$Y,\0 `zero or set if nonpositive'.
@.ZSNP@>
\bul\<ZSEV \$X,\$Y,\0 `zero or set if even'.\>
@.ZSEV@>
These instructions are entirely analogous to \.{ZSN}, except
that \$X is set to \$Z or~Z if register~Y is respectively zero, positive,
odd, nonnegative, nonzero, nonpositive, or even; otherwise
\$X is set to zero.
Notice that the two instructions \<CMPU r,s,0 and \<ZSNZ r,s,1 have
the same effect. So do the two instructions \<CSNP r,s,0 and \.{ZSP} \.{r,s,r}.
So do \<AND r,s,1 and \.{ZSOD}~\.{r,s,1}.
@* Branches and jumps.
\MMIX\ ordinarily executes instructions in sequence, proceeding from
an instruction in tetrabyte M$_4[\lambda]$ to the instruction in
M$_4[\lambda+4]$. But there are several ways to interrupt
the normal flow of control, most of which use the Y and Z fields of
an instruction as a combined 16-bit YZ field.
For example, \<BNZ \$3,@@+4000 (branch if nonzero)
is typical: It means that control should skip ahead 1000 instructions
to the command that appears $4000$ bytes after the
\.{BNZ}, if register~3 is not equal to zero.
There are eight branch-forward instructions, corresponding to the
eight conditions in the \.{CS} and \.{ZS} commands that we discussed earlier.
And there are eight similar branch-backward instructions; for example,
\<BOD \$2,@@-4000 (branch if odd) takes control to the
instruction that appears $4000$ bytes {\it before\/}
this \.{BOD} command, if register~2 is odd. The numeric OP-code when branching
backward is one greater than the OP-code when branching
forward; the assembler takes care of this automatically, just as it takes
cares of changing \.{ADD} from 32 to 33 when necessary.
Since branches are relative to the current location, the \MMIX\ assembler
treats branch instructions in a special way.
Suppose a programmer writes `\.{BNZ} \.{\$3,Case5}',
where \.{Case5} is the address of an instruction in location~$l$.
If this instruction appears in location~$\lambda$, the assembler first
computes the displacement $\delta=\lfloor(l-\lambda)/4\rfloor$. Then if
$\delta$ is nonnegative, the quantity~$\delta$
is placed in the YZ field of a \.{BNZ}
command, and it should be less than $2^{16}$; if $\delta$ is negative,
the quantity $2^{16}+\delta$ is placed in the YZ field of a \.{BNZ}
command with OP-code increased by~1,
and $\delta$ should not be less than $-2^{16}$.
The symbol \.{@@} used in our examples of
\.{BNZ} and \.{BOD} above is interpreted by the
assembler as an abbreviation for ``the location of the current
instruction.'' In the following
notes we will define pairs of branch commands by writing, for example,
`\.{BNZ}~\.{\$X,@@+4*YZ[-262144]}'; this stands for a branch-forward
command that
branches to the current location plus four times~YZ, as well as for
a branch-backward command that branches to the current
location plus four times $(\rm YZ-65536)$.
\bull\<BN \$X,@@+4*YZ[-262144] `branch if negative'.
@.BN@>
\bul\<BZ \$X,@@+4*YZ[-262144] `branch if zero'.
@.BZ@>
\bul\<BP \$X,@@+4*YZ[-262144] `branch if positive'.
@.BP@>
\bul\<BOD \$X,@@+4*YZ[-262144] `branch if odd'.
@.BOD@>
\bul\<BNN \$X,@@+4*YZ[-262144] `branch if nonnegative'.
@.BNN@>
\bul\<BNZ \$X,@@+4*YZ[-262144] `branch if nonzero'.
@.BNZ@>
\bul\<BNP \$X,@@+4*YZ[-262144] `branch if nonpositive'.
@.BNP@>
\bul\<BEV \$X,@@+4*YZ[-262144] `branch if even'.\>
@.BEV@>
If register X is respectively negative, zero, positive, odd, nonnegative,
nonzero, nonpositive, or even, and if this instruction appears in memory
location $\lambda$, the next instruction is taken from memory location
$\lambda+4{\rm YZ}$ (branching forward) or $\lambda+4({\rm YZ}-2^{16})$
(branching backward). Thus one can go from location~$\lambda$ to any location
between $\lambda-262{,}144$ and $\lambda+262{,}140$, inclusive.
\smallskip
Sixteen additional branch instructions called {\it probable branches\/}
are also provided. They have exactly the same meaning as ordinary
branch instructions; for example, \<PBOD \$2,@@-4000 and \<BOD \$2,@@-4000 both
go backward 4000 bytes if register~2 is odd. But they differ in running time:
On some implementations of\/ \MMIX,
a branch instruction takes longer when the branch is taken, while a
probable branch takes longer when the branch is {\it not\/} taken.
Thus programmers should use a \.B instruction when they think branching is
relatively unlikely, but they should use \.{PB} when they expect
branching to occur more often than not. Here is a list of the
probable branch commands, for completeness:
\bull\<PBN \$X,@@+4*YZ[-262144] `probable branch if negative'.
@.PBN@>
\bul\<PBZ \$X,@@+4*YZ[-262144] `probable branch if zero'.
@.PBZ@>
\bul\<PBP \$X,@@+4*YZ[-262144] `probable branch if positive'.
@.PBP@>
\bul\<PBOD \$X,@@+4*YZ[-262144] `probable branch if odd'.
@.PBOD@>
\bul\<PBNN \$X,@@+4*YZ[-262144] `probable branch if nonnegative'.
@.PBNN@>
\bul\<PBNZ \$X,@@+4*YZ[-262144] `probable branch if nonzero'.
@.PBNZ@>
\bul\<PBNP \$X,@@+4*YZ[-262144] `probable branch if nonpositive'.
@.PBNP@>