<!DOCTYPE html>
<html lang="en">
<body>
<a href="https://github.com/jrte/ribose" target="_blank"><b>Ribose</b></a> is about inversion of
control for high-volume text analysis and information extraction and transformation. For a more
detailed presentation see <a href="https://github.com/jrte/ribose#readme" target="_blank">
https://github.com/jrte/ribose</a>.
<br><br><b>How Do I Work This?</b><br><br>
The ribose model compiler builds transducers from <b><a href="https://github.com/ntozubod/ginr"
target="_blank">ginr</a></b> automata and packages them in persistent models for use in
the Java runtime. A copy of the ginr <a href="https://github.com/jrte/ribose/wiki/Ginr"
target="_blank">user guide</a> is published, with a sidebar index, on the ribose wiki.
<br><br>
Ginr automata for ribose are 3-tape finite state transducers (FSTs) compiled
from 3-dimensional regular patterns that express the source syntax on the input tape and
unambiguously map it onto parametric effectors, expressed by a target, that extract and
assimilate input data into the target domain. Ginr encodes Unicode literal characters and
tokens, e.g. 'Σ', 'Π', `ΣΠ`, as multibyte UTF-8 sequences, and the encoded form is used
throughout ribose. This is largely transparent to end users, but care must be taken when
pasting non-ASCII text in ribose patterns. For example, ('ΣΠ', paste) and ('Π', paste)
paste only the last byte of the encoded 'Π', whereas ('ΣΠ' @@ PasteAny) pastes every byte.
Unless all input is known to be 7-bit ASCII, prefer {@code PasteAny} when the intention
is to paste entire sequences of UTF-8 bytes.
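<br><br>
The byte counts behind this caveat can be checked with plain JDK calls. The sketch below
is only an illustration and uses no ribose or ginr API; the class name {@code Utf8Bytes}
is made up for the example. It shows that 'Σ' and 'Π' each encode to two UTF-8 bytes, so
a paste that fires after the last byte of 'Π' sees only the second of them.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: standard JDK calls, no ribose or ginr API is involved.
class Utf8Bytes {
    // Render the UTF-8 encoding of a string as space-separated hex bytes.
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() != 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(hex("\u03A3"));        // Sigma: CE A3
        System.out.println(hex("\u03A0"));        // Pi: CE A0
        System.out.println(hex("\u03A3\u03A0"));  // Sigma Pi: CE A3 CE A0
    }
}
```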
<br><br>
In the Java runtime, compiled ribose models are presented through the {@code IModel} and
{@code ITransductor} interfaces. A transductor provides a Java thread with a transducer stack,
an input stack and a {@code run()} method that drives input through the transducer stack to
{@code IEffector<T extends ITarget>} effectors expressed by an {@code ITarget} instance.
The transductor invokes the target effectors in response to syntactic cues in the input,
and the effectors act on the target to get stuff done. Transduction ends when the input or
transducer stack dries up, but the transductor persists and can be reused for subsequent
transductions.
<br><br><b><i>Ribose Transducer Patterns</i></b><br><br>
So, to begin, you will need to prepare some ribose transducer patterns, compile them to
FSTs, and save the FSTs to a directory to be compiled into a ribose model file. You will
likely want to prepare a ginr source file, or prologue, that defines <a
href="https://github.com/jrte/ribose/blob/master/patterns/alpha.inr" target="_blank">
common character classes</a> and other patterns that are generally used throughout your
other transducer patterns. For example, the ribose prologue {@code alpha.inr} defines
{@code PasteAny=(byte,paste)*}, which is essential for transcribing multibyte UTF-8
character encodings. Prepend your prologue to your other ginr source files for ginr
compilation.
<br><br>
For example, the ribose compiler model {@code TCompile.model} includes an {@code Automaton}
transducer pattern, shown <a href="https://github.com/jrte/ribose/blob/master/patterns/ribose/Automaton.inr"
target="_blank">here</a>, that is compiled with the {@code rinr} shell script as follows:
<br><pre>./rinr patterns/alpha.inr patterns/ribose patterns/after.inr</pre> to produce
{@code build/compiler/Automaton.dfa}. The ribose model compiler then transduces each FST
(*.dfa) file in the {@code build/automata} directory to assemble ribose transducers
and saves them in {@code TCompile.model}.
<br><pre>./ribose compile build/automata build/TCompile.model</pre>
The target in this case is the {@code ModelCompiler}, which expresses three
effectors to extract {@code Header} and {@code Transition} records and finally
assemble a ribose transducer ({@code Automaton}) and persist it in {@code
TCompile.model}. A {@code build/TCompile.map} file is also produced, listing
the model target class name along with names and ordinals for the transducers,
effectors, parameters, fields and signals defined or contained in the model.
The {@code TCompile} model contains only one transducer, which can be decompiled
by the model decompiler:
<br><pre>./ribose decompile build/TCompile.model Automaton</pre>
More examples of ginr patterns for ribose transducers can be found in the {@code
patterns/test} directory, with some discussion in the <a
href="https://github.com/jrte/ribose#more-examples" target="_blank">ribose wiki</a>.
These test patterns are compiled into {@code build/Test.model} for CI testing, with
test inputs extracted from {@code patterns/test/inputs.zip}. The test model target
class is contained in {@code jars/ribose-0.0.2-test.jar}, which must be specified with
{@code --target-path} (this is not necessary for the {@code ModelCompiler} and
{@code SimpleTarget} target classes, as they are contained in the ribose jar file).
To build the test model from the test patterns, proceed as follows:
<br><pre>-: targetpath="--target-path jars/ribose-0.0.2-test.jar"
-: ./rinr patterns/alpha.inr patterns/test patterns/after.inr
-: ./ribose compile $targetpath --target com.characterforming.jrte.test.TestTarget build/automata build/Test.model</pre>
To test the <a href="https://github.com/jrte/ribose/blob/master/patterns/test/Fibonacci.inr"
target="_blank">Fibonacci</a> transducer:
<br><pre>-: unzip -p patterns/test/inputs.zip fib.txt | ./ribose run $targetpath --nil build/Test.model Fibonacci -
1
1
11
111
11111
11111111
1111111111111
111111111111111111111
1111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
-: ./ribose decompile $targetpath build/Test.model Fibonacci
Fields
0:
1: *
2: q
3: r
4: p
Input equivalents (equivalent: input...)
0: #101
1: #a
2: #0-#9 #b-/ 1-#100 #102 #104-#108
3: #103
4: 0
State transitions (from equivalent -> to effect...)
0 0 -> 1 clear[ `~*` ]
0 3 -> 0 stop
1 1 -> 0 paste out signal[ `!nil` ]
1 4 -> 2 select[ `~q` ] paste[ `1` ]
2 1 -> 0 paste out signal[ `!nil` ]
2 4 -> 2 select[ `~r` ] cut[ `~p` ] select[ `~p` ] copy[ `~q` ] select[ `~q` ] cut[ `~r` ]</pre>
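One way to read the decompiled transitions above is as a small state machine over the
fields p, q and r. The sketch below replays that reading in plain Java. It is not ribose
code, and the effector semantics are assumptions inferred from the listing and the sample
output: that select chooses the field that receives data, paste[1] appends a literal 1,
copy appends the named field to the selected field, cut does the same and then clears the
named field, and each input line is a run of 0 characters. The class name
{@code FibonacciSketch} is hypothetical.

```java
// Assumed reading of the decompiled Fibonacci transitions; not ribose code.
class FibonacciSketch {
    // Returns the unary value left in field q after one input line of zeros.
    static String transduce(String zeros) {
        StringBuilder p = new StringBuilder();
        StringBuilder q = new StringBuilder();
        StringBuilder r = new StringBuilder();
        boolean first = true;
        for (char c : zeros.toCharArray()) {
            if (c != '0') {
                throw new IllegalArgumentException("expected only '0' characters");
            }
            if (first) {
                // 1 4 -> 2: select[~q] paste[1]
                q.append('1');
                first = false;
            } else {
                // 2 4 -> 2: select[~r] cut[~p]
                r.append(p);
                p.setLength(0);
                // select[~p] copy[~q]
                p.append(q);
                // select[~q] cut[~r]
                q.append(r);
                r.setLength(0);
            }
        }
        return q.toString();
    }

    public static void main(String[] args) {
        // Lines of 1 to 6 zeros yield 1, 1, 11, 111, 11111, 11111111.
        for (int n = 1; n != 7; n++) {
            System.out.println(transduce("0".repeat(n)));
        }
    }
}
```

Under these assumptions each additional 0 appends the previous value to the current one,
which is the Fibonacci recurrence in unary and agrees with the first lines of the output
shown above.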
<br><b><i>Transduction Targets and Effectors</i></b><br><br>
Every ribose model is associated with a specific transduction target class that
expresses a collection of effectors. A transduction target class is any class that
has a public default constructor and implements the {@code ITarget} interface, which
exposes two methods: {@code String getName()} and {@code IEffector<?>[] getEffectors()}.
The name can be whatever you like, and the target can express as many effectors as
required. All interactions between transductor and target are directed by the involved
transducers and mediated by the target effectors. Targets can be fat like {@code
ModelCompiler} or thin like {@code SimpleTarget}, which is just a facade for the
ribose transductor as target. Use whatever works best for your situation. The transductor
knows nothing about target semantics; it just presents input to transducers that
engage very specific target effectors as they recognize specific input patterns.
Stuff happens in the target, maybe. That's all the transductor knows about the target.
<br><br>
Effectors are typically implemented as private inner classes of the target class,
which allows them to access the private members of the target. They all receive a
reference to the target instance and an {@code IOutput} instance that supports
conversion of transducer fields to Java primitive values or {@code byte[]} or
{@code char[]} arrays. There are three types of effectors: simple effectors,
parametric effectors and receptor effectors. Simple effectors extend {@code
IEffector<T extends ITarget>}, which expresses a niladic {@code invoke()} method
that is called when the effector fires in a running transduction. Parametric
effectors extend {@code IParametricEffector<T extends ITarget, P> extends
IEffector<T>} and contain an array {@code P[]} of parametric objects compiled
from lists of effector tokens. Parametric effectors inherit {@code IEffector.invoke()}
and add a monadic {@code invoke(int)} method, where the {@code int} argument selects
a specific {@code P} instance to use. Receptor effectors are specialized parametric
effectors that extend {@code BaseReceptorEffector<T extends ITarget> extends
BaseParametricEffector<T, Receiver[]>}. Receptor effectors express public fields
that receive data decoded from transducer fields.
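<br><br>
The shape of a target with simple and parametric effectors can be sketched as follows.
This is a minimal illustration, not the ribose API: {@code Effector},
{@code ParametricEffector} and {@code Target} are simplified, non-generic stand-ins that
mirror only the method shapes described above, and {@code CounterTarget} with its
effectors and parameter values is hypothetical. The real interfaces are generic and
declare more than is shown here.

```java
// Simplified stand-ins; these are NOT the ribose interfaces.
class EffectorSketch {
    interface Effector {
        void invoke();
    }

    interface ParametricEffector extends Effector {
        void invoke(int parameterIndex);
    }

    interface Target {
        String getName();
        Effector[] getEffectors();
    }

    // Hypothetical target with a public default constructor.
    static final class CounterTarget implements Target {
        private int count; // private state, reachable from the inner effector classes

        public CounterTarget() {
        }

        public String getName() {
            return "CounterTarget";
        }

        public Effector[] getEffectors() {
            return new Effector[] { new Increment(), new Add() };
        }

        int count() {
            return count;
        }

        // Simple effector: only the niladic invoke().
        private final class Increment implements Effector {
            public void invoke() {
                count += 1;
            }
        }

        // Parametric effector: invoke(int) selects one precompiled parameter.
        private final class Add implements ParametricEffector {
            private final int[] amounts = { 10, 100 }; // hypothetical parameters

            public void invoke() {
                // The niladic form is treated as a no-op in this stand-in.
            }

            public void invoke(int parameterIndex) {
                count += amounts[parameterIndex];
            }
        }
    }

    public static void main(String[] args) {
        CounterTarget target = new CounterTarget();
        Effector[] effectors = target.getEffectors();
        effectors[0].invoke();                          // count becomes 1
        ((ParametricEffector) effectors[1]).invoke(1);  // count becomes 101
        System.out.println(target.count());             // prints 101
    }
}
```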
<br><br>
The transductor is itself a target expressing a suite of core effectors that are
sufficient for transducers that transform input and write raw bytes to an output
stream. For UTF-8 input these transforms are effected without decoding or widening
input; only UTF-8 bytes extracted to transducer fields are decoded. These core
effectors, which are listed in the {@code ITransductor} documentation, are injected
into every target and are available for use in any ribose transducer. A {@code
SimpleTarget} target implementation that expresses only the {@code ITransductor}
effectors is provided. Ribose models that are bound to {@code SimpleTarget} are
called <i>simple</i> models.
<br><br>
Specialized targets can define additional, domain-specific effectors. Composite
targets, composed of two or more {@code ITarget} instances, can be presented for transductor
binding. In that case, one target is selected to gather effectors from all participating
targets into a single array that it presents to the transductor in its {@code
getEffectors()} method. In any case, each effector receives a reference to its
respective target. Models that are associated with a specialized target are referred
to as <i>fancy</i> models.
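<br><br>
The gathering step for a composite target might look like the sketch below. As before,
{@code Effector} and {@code Target} are simplified stand-ins rather than the ribose
interfaces, and {@code LeafTarget} and {@code CompositeTarget} are hypothetical names.
The composite concatenates the effector arrays of its participating targets, and each
effector still acts on the target that created it.

```java
import java.util.Arrays;

// Simplified stand-ins; these are NOT the ribose interfaces.
class CompositeSketch {
    interface Effector {
        void invoke();
    }

    interface Target {
        String getName();
        Effector[] getEffectors();
    }

    // Hypothetical participating target with a single effector.
    static final class LeafTarget implements Target {
        int hits;

        public LeafTarget() {
        }

        public String getName() {
            return "LeafTarget";
        }

        public Effector[] getEffectors() {
            // The anonymous effector keeps a reference to this LeafTarget.
            return new Effector[] { new Effector() {
                public void invoke() {
                    hits += 1;
                }
            } };
        }
    }

    // Hypothetical composite that presents the effectors of all its parts.
    static final class CompositeTarget implements Target {
        final LeafTarget first = new LeafTarget();
        final LeafTarget second = new LeafTarget();

        public CompositeTarget() {
        }

        public String getName() {
            return "CompositeTarget";
        }

        public Effector[] getEffectors() {
            Effector[] all = new Effector[0];
            for (Target part : new Target[] { first, second }) {
                Effector[] own = part.getEffectors();
                int offset = all.length;
                all = Arrays.copyOf(all, offset + own.length);
                System.arraycopy(own, 0, all, offset, own.length);
            }
            return all;
        }
    }

    public static void main(String[] args) {
        CompositeTarget composite = new CompositeTarget();
        Effector[] all = composite.getEffectors();
        all[1].invoke(); // fires the effector that belongs to the second part
        System.out.println(all.length + " " + composite.first.hits + " " + composite.second.hits); // prints 2 0 1
    }
}
```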
<br><br><b><i>Building and Using Ribose Models</i></b><br><br>
The {@code Ribose} class provides a {@code main()} method that enables models to be
compiled, run and decompiled, as described in the {@code Ribose} class documentation.
It can be executed via the java command line or the provided {@code ribose} shell
script. The runtime features are encapsulated in the {@code IModel} interface. For further
information please refer to the package and interface documentation contained herein.
See the {@link com.characterforming.ribose} package for ribose interfaces and the runnable
{@link com.characterforming.ribose.Ribose} class. The {@link com.characterforming.ribose.base}
package includes ribose base classes for extension and use in other domains.
</body>
</html>