-
Notifications
You must be signed in to change notification settings - Fork 23
/
CHANGES.xml
255 lines (243 loc) · 9.82 KB
/
CHANGES.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
<document xmlns="http://maven.apache.org/changes/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/changes/1.0.0 http://maven.apache.org/xsd/changes-1.0.0.xsd">
<properties>
<title>Norconex Importer Project</title>
<author email="info@norconex.com">Norconex Inc.</author>
</properties>
<body>
<release version="3.1.0-SNAPSHOT" date="2024-??-??" description="Minor release.">
<action dev="essiembre" type="update">
Minimum Java Version is now 11.
</action>
</release>
<release version="3.0.1" date="2023-07-09">
<action dev="essiembre" type="add" issue="76">
New DOMPreserveTransformer.
</action>
<action dev="essiembre" type="update">
Maven dependency updates: norconex-commons-maven-parent 1.0.2-SNAPSHOT.
</action>
<action dev="essiembre" type="fix">
Fix RegexTagger not picking up XML-configured "fieldMatcher".
</action>
</release>
<release version="3.0.0" date="2022-01-02"
description="Major release. NOT a drop-in replacement for 2.x.">
<!-- 3.0.0 (GA) -->
<action dev="essiembre" type="update">
Updated transitive dependencies with known vulnerabilities.
</action>
<action dev="essiembre" type="update">
Updated dependencies to avoid logging library detection conflict.
</action>
<!-- 3.0.0-RC1 -->
<action dev="essiembre" type="update">
Maven dependency updates: Apache Tika 1.27 (and its many transitive
dependencies), UCAR jj2000 5.4, Opencsv 5.5.2,
JAI Image-IO jpeg2000 1.4.0, JBIG2 ImageIO 2.0.
</action>
<action dev="essiembre" type="fix">
Fixed invalid configuration in POM "maven-dependency-plugin".
</action>
<!-- 3.0.0-M2 -->
<action dev="essiembre" type="add">
Handlers now support XML "flow", which adds supports for
if/ifNot/condition/then/else tags in XML configuration.
</action>
<action dev="essiembre" type="add">
New "condition" classes for XML "flow" configuration: BlankCondition,
DateCondition, DOMCondition, NumericCondition, ReferenceCondition,
ScriptCondition, and TextCondition.
</action>
<action dev="essiembre" type="add">
New RejectFilter.
</action>
<action dev="essiembre" type="add">
New CharsetUtil#firstNonBlankOrUTF8(...) methods.
</action>
<action dev="essiembre" type="add">
When not already set, an attempt to detect document character encoding
is now always made before invoking handlers.
</action>
<action dev="essiembre" type="add">
New CommonMatchers class.
</action>
<!-- 3.0.0-M1 -->
<action dev="essiembre" type="add">
New ImageTransformer class.
</action>
<action dev="essiembre" type="add">
New NoContentTransformer class.
</action>
<action dev="essiembre" type="add">
New -f or "outputMetaFormat" command-line argument for saving
exported metadata fields in alternate formats.
</action>
<action dev="essiembre" type="add">
New TextFilter class.
</action>
<action dev="essiembre" type="add">
New ReferenceFilter class.
</action>
<action dev="essiembre" type="add">
New ExternalHandler class.
</action>
<action dev="essiembre" type="add">
New DOMFilter class.
</action>
<action dev="essiembre" type="add">
New EmptyFilter class.
</action>
<action dev="essiembre" type="add">
New RegexTagger class.
</action>
<action dev="essiembre" type="add">
New URLExtractorTagger class.
</action>
<action dev="essiembre" type="add">
New DOMDeleteTransformer class.
</action>
<action dev="essiembre" type="add">
New XMLStreamSplitter class.
</action>
<action dev="essiembre" type="add">
New HandlerDoc to ease handler implementations.
</action>
<action dev="essiembre" type="add">
Importer now uses an EventManager and triggers several events:
IMPORTER_HANDLER_BEGIN, IMPORTER_HANDLER_END, IMPORTER_HANDLER_ERROR,
IMPORTER_PARSER_BEGIN, IMPORTER_PARSER_END, IMPORTER_PARSER_ERROR
</action>
<action dev="essiembre" type="add">
New ImporterDocument#getStreamFactory() method.
</action>
<action dev="essiembre" type="add">
ReplaceTagger now has the option to discard values that are unchanged
after replacement.
</action>
<action dev="essiembre" type="add">
New options on CharacterCaseTagger: "wordsFully", "stringFully",
"sentences", and "sentencesFully".
</action>
<action dev="essiembre" type="add">
Most configurable classes adding/setting metadata values now have
an extra "onSet" option for dictating how values are set:
append, prepend, replace, optional.
</action>
<action dev="essiembre" type="add">
New DocInfo class.
</action>
<action dev="essiembre" type="add">
New ImporterRequest class.
</action>
<action dev="essiembre" type="add">
New option in DOMTagger to delete elements matched by a selector.
</action>
<action dev="essiembre" type="add">
Added time zone support to DateMetadataFilter.
</action>
<action dev="essiembre" type="add">
Added support for Webp image format.
</action>
<action dev="essiembre" type="update">
Now requires Java 8 or higher.
</action>
<action dev="essiembre" type="update">
Importer#importDocument(...) now expects an ImporterRequest or a Doc.
</action>
<action dev="essiembre" type="update">
Default allocated memory for caching of document content was increased
by a factor of 10 (100MB max per document, 1GB max total).
</action>
<action dev="essiembre" type="update">
XML configuration of handlers had their XML tag names changed from
"filter", "tagger", "transformer, "splitter" to simply "handler".
</action>
<action dev="essiembre" type="update">
JBIG2 image support now included under apache license.
</action>
<action dev="essiembre" type="update">
Logging now using SLF4J.
</action>
<action dev="essiembre" type="update">
Maven dependency updates: Norconex Commons Lang 2.0.0,
Apache Tika 1.22, Apache Commons CLI 1.4, Junit 5.
</action>
<action dev="essiembre" type="update">
RegexFieldExtractor and RegexUtil have been deprecated in favor
of Norconex Commons Lang FieldValueExtractor and Regex.
</action>
<action dev="essiembre" type="update">
RegexContentFilter and RegexMetadataFilter have been deprecated in
favor of TextFilter.
</action>
<action dev="essiembre" type="update">
RegexReferenceFilter has been deprecated in favor of ReferenceFilter.
</action>
<action dev="essiembre" type="update">
DOMContentFilter has been deprecated in favor of DOMFilter.
</action>
<action dev="essiembre" type="update">
EmptyMetadataFilter has been deprecated in favor of EmptyFilter.
</action>
<action dev="essiembre" type="update">
TextPatternTagger has been deprecated in favor of RegexTagger.
</action>
<action dev="essiembre" type="update">
TextBetweenTagger now has "inclusive" and "caseSensitive" options
configurable for each "between" details.
</action>
<action dev="essiembre" type="update">
Now using Path instead of File in many cases.
</action>
<action dev="essiembre" type="update">
Parsing no longer attempted on zero-length content.
</action>
<action dev="essiembre" type="update">
List of PropertyMatcher replaced with PropertyMatchers.
</action>
<action dev="essiembre" type="update">
ContentTypeDetector methods are now static.
</action>
<action dev="essiembre" type="update">
Eliminated Apache Tika log warnings on startup when missing specific
optional libraries not package due to licensing
(e.g. JPEG 2000, jbig2).
</action>
<action dev="essiembre" type="update">
Occurrences of accessors for overwrite="[false|true]" and
onConflict="..." have been deprecated in favor of
new onSet="...".
</action>
<action dev="essiembre" type="update">
Most places where regular expressions could be used now also
support "basic" matching and "wildcard" as well as being able to
ignore diacritical marks (e.g., accents).
</action>
<action dev="essiembre" type="update">
Most occurrences of "caseSensitive" or "caseInsensitive" configuration
options are now replaced with "ignoreCase".
</action>
<action dev="essiembre" type="update">
Filters implementing AbstractStringFilter will now have their
isStringContentMatching(...) method invoked at least once, even
if there is no document content.
</action>
<action dev="essiembre" type="update">
"parsed" boolean arguments were replaced by ParseState.PRE and
ParseState.POST.
</action>
<action dev="essiembre" type="update">
Many methods with a combinations of reference, input stream, and
metadata were updated to now accept a Doc instance instead.
</action>
<action dev="essiembre" type="remove">
Removed some of the methods deprecated in previous releases.
</action>
<action dev="essiembre" type="remove">
Removed SplittableDocument.
</action>
</release>
</body>
</document>