XML and HTML character entity references

In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters. Each character can either appear directly (representing itself) or be represented by a series of characters called a character reference. There are two types of character reference: a _numeric character reference_ and a _character entity reference_. This article lists the character entity references that are valid in HTML and XML documents.

A character entity reference refers to the content of a named entity. An entity is declared in a Document Type Definition (DTD) using the `<!ENTITY name "value">` syntax.
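
As a minimal sketch using only Python's standard library, the snippet below parses a small, made-up XML document whose internal DTD subset declares an entity named `eacute`. The same character is written directly, as decimal and hexadecimal numeric character references, and as a character entity reference. The document, its element names and the entity name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares an entity named
# "eacute" whose replacement text is U+00E9. Each <word> spells that
# character differently: directly, as a decimal reference, as a hexadecimal
# reference, and as a character entity reference.
DOC = """<!DOCTYPE words [
  <!ENTITY eacute "&#233;">
]>
<words>
  <word>café</word>
  <word>caf&#233;</word>
  <word>caf&#xE9;</word>
  <word>caf&eacute;</word>
</words>"""

root = ET.fromstring(DOC)

# After parsing, the references have been expanded into ordinary characters.
texts = [w.text for w in root.iter("word")]
print(texts)
```

All four spellings should yield the same string: the reference forms exist only in the markup, not in the parsed character data.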

Character reference overview

A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and uses the format:

`&#`_nnnn_`;`

or

`&#x`_hhhh_`;`

where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.
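
A rough sketch of how a decoder might turn numeric character references back into characters. The function name `decode_ncrs` is hypothetical, and the pattern deliberately accepts only a lowercase `x`, following the XML rule above; an HTML-oriented decoder would also accept `X`.

```python
import re

# &#nnnn; (decimal digits) or &#xhhhh; (hex digits in either case).
# Only a lowercase "x" is accepted, following the XML rule.
_NCR = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")


def decode_ncrs(text: str) -> str:
    """Replace every numeric character reference in text with its character."""

    def replace(m: re.Match) -> str:
        if m.group(1) is not None:
            code_point = int(m.group(1), 10)  # leading zeros are harmless
        else:
            code_point = int(m.group(2), 16)
        if code_point > 0x10FFFF:
            return m.group(0)  # beyond Unicode's range: leave the reference as-is
        return chr(code_point)

    return _NCR.sub(replace, text)


print(decode_ncrs("caf&#233; / caf&#xE9; / caf&#x00e9;"))
```

This sketch does not reject code points that XML forbids in documents (such as U+0000 or surrogates); a conforming XML parser would report those as errors.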

In contrast, a character entity reference refers to a character by the name of an entity which has the desired character as its replacement text. The entity must either be predefined (built into the markup language) or explicitly declared in a Document Type Definition (DTD). The format is the same as for any entity reference:

`&`_name_`;`

where _name_ is the case-sensitive name of the entity. The semicolon is required.
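
Named references can be sketched in the same way, as a lookup in a table of entity names. This sketch assumes only the five entities that XML predefines (`amp`, `lt`, `gt`, `quot`, `apos`). `expand_entity_refs` is a hypothetical helper, and its name pattern is a simplified ASCII subset of what XML actually permits.

```python
import re

# The five entities XML predefines; any others would come from a DTD.
XML_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Simplified name pattern (ASCII letters and digits only); real XML names allow more.
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entity_refs(text: str, entities: dict = XML_PREDEFINED) -> str:
    """Replace &name; with its replacement text; unknown names are left alone."""

    def replace(m: re.Match) -> str:
        # Exact dictionary lookup, so "&AMP;" does not match "amp".
        return entities.get(m.group(1), m.group(0))

    return _ENTITY_REF.sub(replace, text)


print(expand_entity_refs("a &lt; b &amp;&amp; c"))
```

Because the lookup is an exact match, `&AMP;` stays untouched, reflecting the case sensitivity of entity names, and a reference without its closing semicolon is not recognized. For the full HTML5 set, Python's standard library provides `html.unescape`.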

Standard public entity sets for characters

ISO Entity Sets: SGML supplied a comprehensive set of entity declarations for characters widely used in Western technical and reference publishing, covering Latin, Greek and Cyrillic scripts. The American Mathematical Society also contributed entities for mathematical characters.

HTML Entity Sets: Early versions of HTML built in small subsets of these, covering the characters found in three Western 8-bit fonts.

MathML Entity Sets: The W3C developed a set of entity declarations for MathML characters.

XML Entity Sets: The W3C MathML Working Group took over maintenance of the ISO public entity sets, combined them with the MathML entity sets, and documents the result in _XML Entity Definitions for Characters_. This combined set is intended to support the requirements of XHTML and MathML and to serve as input to future versions of HTML.

HTML5: HTML5 adopts the XML entities as named character references; however, it restates them without reference to their sources and does not group them into sets. The HTML5 specification additionally provides a JSON mapping from the names to Unicode character sequences.
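
A sketch of how such a JSON mapping might be consumed in Python. It assumes the shape used by the WHATWG's `entities.json` file, where each key is the full reference text (including `&` and, usually, `;`) and each value gives `codepoints` and `characters`. The two-entry excerpt is hand-written for illustration rather than copied from the file, and `load_named_refs` is a hypothetical name.

```python
import json

# Hand-written excerpt in the assumed shape: each key is the full reference
# text, and each value lists the code points plus the resulting string.
EXCERPT = """
{
  "&amp;":    {"codepoints": [38],  "characters": "&"},
  "&eacute;": {"codepoints": [233], "characters": "\\u00e9"}
}
"""


def load_named_refs(json_text: str) -> dict:
    """Map each reference (e.g. '&eacute;') to its replacement string."""
    table = json.loads(json_text)
    result = {}
    for ref, entry in table.items():
        chars = "".join(chr(cp) for cp in entry["codepoints"])
        # The two fields should agree; a mismatch suggests corrupted data.
        if chars != entry["characters"]:
            raise ValueError(f"inconsistent entry for {ref}")
        result[ref] = chars
    return result


refs = load_named_refs(EXCERPT)
print(refs["&eacute;"])
```

The code points are stored as a list because some names map to a sequence of more than one character.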

Numerous other entity sets have been developed for special requirements, and for major and minority scripts. However, the advent of Unicode has largely superseded them.

...