Skip to content

Latest commit

 

History

History
24 lines (20 loc) · 1.23 KB

File metadata and controls

24 lines (20 loc) · 1.23 KB

Proj3000: Only use UTF-8 without BOM

The byte-order mark (BOM) is a particular usage of the special Unicode character code, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text.

[...] Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information. [...]

The use of BOM for UTF-8 however, can be problematic:

  1. Files that hold no text are no longer empty because they always contain the BOM.
  2. Files that hold text within the ASCII subset of UTF-8 are no longer themselves ASCII because the BOM is not ASCII. This can cause existing, often hard to replace, (legacy) tools to break down.
  3. It is not possible to concatenate several files together (without changes) because each file now has a BOM at the beginning.
  4. Multiple standards (including CSS, JSON, YAML) explicitly disallow the use of UTF-8 BOM.

Therefor, this rule will warn when a file is stored with an UTF-8 BOM.