Releases: openpreserve/jhove
JHOVE 1.6
XML HANDLER AND TEXT HANDLER
- The default version of MIX is now 2.0. In earlier versions it was 0.2.
However, MIX 2.0 still isn't supported in the text handler, so it will
produce 1.0 output by default. The XML handler will produce MIX 2.0
output.
TIFF MODULE
- JHOVE threw a "String index out of range: 4" exception during
TIFF validation of a file containing an empty (not NULL) date/time
field. This has been corrected so that a date/time field with
the wrong length won't be parsed but will report an error instead.
- If text tags contain characters which aren't printable ASCII, these
are now output as escape sequences so that invalid XML isn't
output.
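
The date/time fix amounts to checking the field length before any
parsing is attempted. A minimal sketch in Java, assuming the TIFF 6.0
DateTime layout "YYYY:MM:DD HH:MM:SS" (19 characters); the class and
method names are hypothetical and this is not JHOVE's actual code:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper for illustration; not part of JHOVE.
final class TiffDateTimeCheck {
    // TIFF 6.0 DateTime: "YYYY:MM:DD HH:MM:SS", 19 characters before the NUL.
    private static final int EXPECTED_LENGTH = 19;

    /** Returns the parsed date, or null if the field should be reported as an error. */
    static Date parseOrNull(String field) {
        if (field == null || field.length() != EXPECTED_LENGTH) {
            return null; // wrong length: never index into the string
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss", Locale.ROOT);
        fmt.setLenient(false); // reject impossible dates such as month 13
        try {
            return fmt.parse(field);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

An empty field, the case that triggered the original exception, yields
null here, and the caller can then report an error.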
UTF-8 MODULE
- Updated to Unicode 6.0.0.
JHOVE 1.5
PDF MODULE
- An ArrayIndexOutOfBoundsException was thrown on a PDF with an invalid
object number in the cross-reference stream. In JHOVE 1.5, this is
correctly reported as a violation of well-formedness.
UTF-8 MODULE
- With some very simple UTF-8 files, JHOVE handlers would throw an exception
processing them, and the GUI would fail silently. This happened with files
using no UTF-8 blocks. This has been fixed.
TEXTMD (multiple modules)
- TextMD metadata can now optionally be reported. To get this, it's
necessary to edit jhove.conf. TextMD can be enabled on a per-module
basis for HtmlModule, AsciiModule, Utf8Module, and XmlModule.
The <module> element for each chosen module must contain the element
<param>withtextmd=true</param> (no spaces).
- The TextMD feature was added by Thomas Ledoux.
JHOVE 1.4
PDF MODULE
- The PDF/A profile has been updated to the final version of
ISO 19005-1:2005(E) and made more thorough. Among the changes:
  a. The set-state and no-op actions disqualify a PDF/A candidate.
  b. The ASCIIHexDecode and ASCII85Decode filters no longer
     disqualify a candidate.
  c. Checking of outlines has been added.
  d. Additional checking of Type 1 fonts and symbolic fonts.
  e. Bug fix in checking type 2 subfonts.
  f. An LZW filter in an image object disqualifies a candidate.
  g. The xpacket processing instruction is checked for attributes
     which disqualify from PDF/A.
  h. Conformity to implementation limits is checked as a condition
     of PDF/A conformity.
JPEG2000 MODULE
- The pathological case of an image with no components is checked so
it won't cause a crash.
XML HANDLER
- A reset() function has been added so that if the handler is reused,
it will return to a valid initial state.
JHOVE 1.3
GENERAL
- The build.xml files now force compilation to Java 1.4, preventing
accidental distributions that aren't 1.4-compatible.
- Spaces are allowed in file paths on Windows, if the path is
enclosed in quotes. This fix had been in version 1.1i, and had been
lost since then.
PDF MODULE
- According to the PDF 1.6 specification, table 3.4, parameters for a
stream filter can be either a dictionary or the null object. The null
object was treated as an error; it is now allowed.
- Object stream handling was seriously buggy, causing rejection of
well-formed and valid files; it has been improved.
- In PDF 1.4, an outline dictionary unconditionally must have a "First"
and a "Last" entry. JHOVE followed this requirement, declaring a file
invalid if it wasn't met. However, PDF 1.6 relaxes the requirement,
applying it only "if there are any open or closed outline entries."
Thus, an empty outline dictionary with no "First" or "Last" entry
is valid. It is now accepted (for all PDF versions).
- If a page number tree in a PDF file is missing an expected "Nums"
entry, this was being reported as an invalid date. A more appropriate
error message is now given.
TIFF MODULE
- TIFF tag 33723 (IPTC-NAA) was considered valid only if the data
type is ASCII or LONG. But according to Aware Systems, the valid
types are UNDEFINED and BYTE. All four types are now accepted.
XML HANDLER
- Omissions in MIX 1.0 and 2.0 output have been fixed.
JHOVE 1.2
GENERAL
- A bug has been fixed in CountedInputStream, which could potentially
have caused infinite recursion in some modules.
HTML MODULE
- An incompatibility with Java 1.6 has been fixed.
PDF MODULE
- A null pointer exception would be thrown for PDF documents without a
document root tree. This has been fixed.
- A source of possible false positives in PDF profiles has been fixed.
- Certain checks weren't being done to Type 2 fonts, and some PDF/A
profile violations might have been missed as a result. This has
been fixed.
WAVE MODULE
- Sub-chunks of the 'adtl' chunk are now constrained to even byte
boundaries.
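
In RIFF files, a chunk whose declared size is odd is followed by one
pad byte that is not counted in the size field, so the next chunk
starts on an even offset. A minimal sketch of that rounding; the class
and method names are hypothetical and not taken from the JHOVE source:

```java
// Hypothetical helper for illustration; not part of JHOVE.
final class RiffPadding {
    /** Bytes a chunk body occupies on disk: the declared size rounded up to an even count. */
    static long paddedSize(long declaredSize) {
        // Adds 1 when the low bit is set (odd size), 0 otherwise.
        return declaredSize + (declaredSize & 1L);
    }
}
```

A reader that skips paddedSize(size) bytes after each sub-chunk header
stays aligned with the next sub-chunk of the 'adtl' list.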
XML HANDLER
- MIX 2.0 is now supported.
- The URL for the MIX 0.2 schema has changed to reflect the change
on the LOC MIX site.
- The handler was sometimes incorrectly reporting whether the
AESAudioMetadata property had an empty value or not. This has
been fixed.
JHOVE 1.1
COMMAND-LINE INTERFACE
- Allow filenames with internal spaces if they are quoted on the
command line.
- Corrected error setting the classpath in the Windows shell script
(jhove.bat).
- Corrected error opening the configuration file using the default
GCJ parser in the GNU Java Runtime Environment.
GUI (SWING) INTERFACE (JHOVE VIEW)
- AES metadata properties displayed in the RepInfo window rearranged
slightly to make their ordering consistent with the Text and XML
handlers.
- The JhoveView.main() method will now accept a "-c configFile" option
on the command line. The GUI can now be invoked by:
    java -jar bin/JhoveView.jar -c configFile
- Corrected error opening the configuration file using the default
GCJ parser in the GNU Java Runtime Environment.
- Corrected recurrent problems with reading the configuration file on
Windows installations.
AIFF MODULE
- Corrected the value of the first sample offset by including the
non-zero offset defined in the SSND chunk.
- Do not report bitrate reduction data for PCM data.
- All non-final instance fields and methods are protected, rather than
private.
ASCII MODULE
- A minimal file containing no line-end characters no longer
produces an empty ASCIIMetadata property, which is invalid against
the JHOVE schema.
- Zero-length files are considered not well-formed.
- Issue informative message if file contains no printable characters.
- All non-final instance fields and methods are protected, rather than
private.
BYTESTREAM MODULE
- All non-final instance fields and methods are protected, rather than
private.
GIF MODULE
- All non-final instance fields and methods are protected, rather than
private.
HTML MODULE
- The HTMLMetadata block in the module output is only produced if
there is at least one actual metadata property to report.
- All non-final instance fields and methods are protected, rather than
private.
JPEG MODULE
- The JPEG module reports the X and Y sampling frequency for files
meeting the JFIF profile.
- The JPEG module reports the pixel aspect ratio for JFIF profile
files for which it is defined.
- File handles were not being properly closed when processing embedded
EXIF metadata. In cases where JHOVE was invoked against large
numbers of objects this was causing a premature crash due to the
resource leak.
- All non-final instance fields and methods are protected, rather than
private.
- Correct parsing of the EXIF "subsecTimeOriginal" (37521) and
"subsecTimeDigitized" (37522) properties.
- Validation errors in embedded EXIF metadata were not being fully
reported.
JPEG 2000 MODULE
- All non-final instance fields and methods are protected, rather than
private.
- Files generated by the LuraWave codec are no longer incorrectly identified
as having unrecognized QCC marker segments.
PDF MODULE
- Date strings are now parsed with strict conformance to the ASN.1
syntax.
- Destinations defined by indirect references to non-existent objects
are assumed to have the value "null". Files containing such
destinations are reported as "well-formed, but not valid".
- No attempt is made to display encrypted outline item title strings.
- Catch error if the Info key of the trailer dictionary is not an
indirect reference.
- Read entire page tree structure, regardless of its internal
organization. The previous behavior may have caused the underreporting
of page resources, such as fonts and images.
- The NISO Compression Scheme for all images using the CCITTFaxDecode
compression filter is now reported properly; previously, the scheme
was always reported as CCITT 1D even if the actual compression
algorithm was CCITT Group 3 or 4.
- Properly parse UTF-16 escape characters encoded in double-byte form.
- The module properly stops looking for the header comment after 1024
bytes.
- All non-final instance fields and methods are protected, rather than
private.
- The number of incremental updates is now reported correctly, rather than
the total number of file trailers, which is one greater than the number
of updates.
- Only up to 1000 fonts will be reported. After that, an informative
message will be generated. The limit can be set using the parameter
"nxxxx" in the module-specific section of the configuration file:
    <module>
      <class>edu.harvard.hul.ois.jhove.module.PdfModule</class>
      <param>n2000</param>
    </module>
- Subfonts of Type 0 are now being properly reported.
- PDF/A-1b profile is now being properly reported.
- Permit trailer info key to be optional.
- Additional correction for outline recursion.
- Fix treatment of indirect object of Actions.
- Correctly handle trailer dictionary without Info entry.
- Ignore comments within dictionaries.
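
The PDF module's font limit parameter has the form "n" followed by a
number, for example "n2000". A minimal sketch of how such a value could
be read, with the documented default of 1000. The class and method names
are hypothetical, and falling back to the default on malformed input is
an assumption, not documented JHOVE behavior:

```java
// Hypothetical helper for illustration; not part of JHOVE.
final class FontLimitParam {
    static final int DEFAULT_MAX_FONTS = 1000;

    /** Parses a parameter such as "n2000"; returns the default if it does not match. */
    static int maxFonts(String param) {
        if (param == null || param.length() < 2 || param.charAt(0) != 'n') {
            return DEFAULT_MAX_FONTS;
        }
        try {
            int limit = Integer.parseInt(param.substring(1));
            // Assumption: non-positive limits are treated as malformed.
            return limit > 0 ? limit : DEFAULT_MAX_FONTS;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_FONTS;
        }
    }
}
```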
TIFF MODULE
- Corrected error parsing pyramidal TIFF using the SubIFDs tag with a
type of IFD (13) rather than LONG (4).
- Correct parsing of the EXIF "subsecTimeOriginal" (37521) and
"subsecTimeDigitized" (37522) properties.
- All sub-IFDs of a pyramidal TIFF are now properly parsed.
- The EXIF GainControl tag (41991) is now correctly identified as
a SHORT, not a RATIONAL, value.
- Corrected error in which valid files were reported as being only
well-formed due to an incorrect parsing of the DateTime (306) tag.
- Byte-aligned offsets can be considered well-formed if the module
parameter "byteoffset=true" is set in the configuration file:
    <module>
      <class>edu.harvard.hul.ois.jhove.module.TiffModule</class>
      <param>byteoffset=true</param>
    </module>
- All non-final instance fields and methods are protected, rather than
private.
- Using the "-s" option, the TIFF module was incorrectly reporting
signature matches for text files starting with "II".
- Validation errors in embedded EXIF metadata were not being fully
reported.
UTF8 MODULE
- Corrected error under which malformed UTF-8 files containing encoding
sequences starting with a byte value in the range 0xF8 through 0xFF
were reported as well-formed and valid.
- Zero-length files are considered not well-formed.
- Issue informative message if file contains no printable characters.
- All non-final instance fields and methods are protected, rather than
private.
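
The lead-byte correction can be illustrated by classifying the first
byte of each sequence. The sketch below uses the ranges from RFC 3629,
which are stricter than the 0xF8 through 0xFF range named in the note
above; the class and method names are hypothetical and this is not the
module's actual code:

```java
// Hypothetical helper for illustration; not part of JHOVE.
final class Utf8LeadByte {
    /** Returns the sequence length a lead byte announces, or -1 if it cannot start a sequence. */
    static int sequenceLength(int b) {
        b &= 0xFF; // accept both signed bytes and unsigned ints
        if (b <= 0x7F) return 1;              // ASCII
        if (b >= 0xC2 && b <= 0xDF) return 2; // two-byte sequence
        if (b >= 0xE0 && b <= 0xEF) return 3; // three-byte sequence
        if (b >= 0xF0 && b <= 0xF4) return 4; // four-byte sequence
        // Continuation bytes (0x80..0xBF), overlong leads (0xC0, 0xC1),
        // and 0xF5..0xFF, which includes the 0xF8..0xFF range.
        return -1;
    }
}
```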
WAVE MODULE
- BWF files now set the correct start time in the AES metadata.
- All non-final instance fields and methods are protected, rather than
private.
- "cue " and "adtl" chunks are now properly read.
XML MODULE
- The DTD is assumed to be the first DOCTYPE system ID in the file with a
".dtd" extension.
- All non-final instance fields and methods are protected, rather than
private.
- The module correctly handles schemaLocation attributes that do not
provide two whitespace-separated URIs.
TEXT HANDLER
- AES audio metadata properties rearranged slightly to make their
ordering consistent with the XML schema.
XML HANDLER
- Correct sample rate formatting in AES Time Code Format (TCF)
temporal references.
- Correct face IDREF in AES metadata.
- Disallowed control characters are removed from content.
- Null property values no longer generate empty elements.
- Image technical metadata can be reported in terms of the MIX 1.0 schema,
as opposed to the default reporting against MIX 0.2. To specify the
1.0 schema include the directive:
    <mixVersion>1.0</mixVersion>
in the configuration file.
JHOVE API
- The process() and processFile() methods of the JhoveBase class are now
public, to permit direct access to the API by applications.
- Checksum calculations now use buffered I/O uniformly for improved
performance.
- All non-final fields and methods in the JhoveBase class are
protected, rather than private.
- When invoked with the "-s" option, JHOVE now reports the
signature-matched format and MIME type.
- The processing of files in a directory is now performed in an
alphabetically sorted order.
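
Sorted directory processing can be sketched as follows. The class and
method names are hypothetical, and the natural String ordering used here
is an assumption: the note does not say which comparison JHOVE applies
(for example, whether case is ignored).

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of JHOVE.
final class SortedDirectoryWalk {
    /** Returns a sorted copy of the given names, leaving the input untouched. */
    static String[] sorted(String[] names) {
        String[] copy = names.clone();
        // Natural String order compares UTF-16 code units, so "B" sorts before "a".
        Arrays.sort(copy);
        return copy;
    }

    /** Returns the entry names of dir in sorted order, or an empty array if dir cannot be listed. */
    static String[] sortedNames(File dir) {
        String[] names = dir.list(); // null if dir is not a directory or an I/O error occurs
        return names == null ? new String[0] : sorted(names);
    }
}
```

Sorting before iterating makes the order of the output records
reproducible across platforms, since File.list() makes no ordering
guarantee.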
ADUMP UTILITY
- Display the field values of known chunks.
TDUMP UTILITY
- New format that sorts all tag definitions by their byte offset and
also displays the byte ranges for image data.
- Command line flags permit the suppression of BYTE data display (-b)
and subIFD parsing (-s).
USERHOME UTILITY
- A new utility program, UserHome, is available to determine the value
of the Java user.home property needed to know where to place the
configuration file. This utility can be invoked by the driver scripts
"userhome" (Bourne shell) or "userhome.bat" (Windows).