Design Principles
This section explains the design principles behind LinqToXsd.
LINQ to XSD relies on a systematic mapping of schema to object types that meets the following constraints:
- The mapping covers all of XML Schema.
- The mapping is predictable and comprehensible.
- The mapping facilitates round-tripping of instance data.
- The mapping does not rely on any customization per default.
- The mapping conveys most schema intents into the object models, where possible.
- The mapping aims to derive classes that are close to the expectations of an OO programmer.
Note: The current release of LINQ to XSD only approximates these constraints.
The following list of mapping rules summarizes the canonical mapping that is assumed by LINQ to XSD. There is a separate document describing the mapping in detail. There are also separate resources motivating the mapping. These documents are listed in the introduction.
- XML namespaces are mapped to CLR namespaces.
- XML names are mapped to CLR names subject to lexical conversions and clash resolution.
- Global element declarations are mapped to (top-level) classes.
- Complex-type definitions are mapped to (top-level) classes.
- Local element/attribute declarations as well as references are mapped to properties.
- Anonymous complex types for local elements are mapped to inner classes, by default.
- Named and anonymous simple types are not mapped to classes, by default.
- Complex-type derivation (by both extension and restriction) is mapped to OO subclassing.
- Substitution grouping is mapped to OO subclassing.
- Simple-type restrictions are mapped to preconditions on properties for elements of these types.
- Global attribute declarations and attribute-group definitions are inlined per reference.
- Redefinitions are carried out before mapping (according to System.Xml.Schema rules).
- Simple-type references are mapped to the use of CLR value types or string.
Instances of LINQ to XSD types may be referred to as ‘XML objects’. This choice of a term is meant to emphasize that the generated classes do not facilitate ‘plain fields’. Instead, these classes model typed views on untyped XML trees. That is, the classes use properties that reach into untyped XML trees. Technically, instances of LINQ to XSD classes are wrappers around instances of the LINQ to XML class XElement. All LINQ to XSD classes have a common base: XTypedElement.
public class XTypedElement
{
    private XElement xElement;
    // Remainder of XTypedElement omitted
}
A global element declaration is mapped to a subclass of XTypedElement.
public class PurchaseOrder : XTypedElement
{
    // API omitted
}
Objects for typed XML trees can be constructed in these ways:
- By the default constructor (followed by DML operations; to be discussed later).
- By a static method Load; here is one overload for Load:
public static PurchaseOrder Load(string xmlFile);
- By the coercion from an untyped XML tree:
public static explicit operator PurchaseOrder(XElement xe);
The full LINQ to XML API is accessible through a redirection property:
// Part of XTypedElement API
public XElement Untyped { get; set; }
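The wrapper idea can be illustrated with a small hand-written sketch. The class names TypedElementSketch and OrderSketch are hypothetical, and the code is not what LINQ to XSD generates; it only approximates the pattern (a typed property that reads and writes the wrapped tree, a coercion from XElement, and the Untyped redirection) using the plain LINQ to XML API.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical stand-in for XTypedElement: it only holds the untyped tree.
public class TypedElementSketch
{
    public XElement Untyped { get; set; }

    protected TypedElementSketch(XElement element)
    {
        Untyped = element ?? throw new ArgumentNullException(nameof(element));
    }
}

// Hypothetical typed view. No field stores CustId; the property
// reaches into the wrapped XElement on every access.
public class OrderSketch : TypedElementSketch
{
    public OrderSketch() : base(new XElement("PurchaseOrder")) { }

    private OrderSketch(XElement element) : base(element) { }

    public string CustId
    {
        // The cast yields null if the child element is absent.
        get { return (string)Untyped.Element("CustId"); }
        // SetElementValue adds, updates, or (for null) removes the child.
        set { Untyped.SetElementValue("CustId", value); }
    }

    // Coercion from an untyped tree; no name check is done in this sketch.
    public static explicit operator OrderSketch(XElement xe)
    {
        return new OrderSketch(xe);
    }
}
```

Because the typed object and the XElement share the same tree, a change made through CustId is visible through Untyped and vice versa.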
For instance, one could invoke the element-name-based (and untyped) descendant axis on typed XML trees:
batch.Untyped.Descendants("Item")
If necessary, the result could be cast back into the typed world:
(Item)batch.Untyped.Descendants("Item").First()
Several untyped XML-programming idioms are complemented by typed variations:
- Most notably, the generated typed properties complement these untyped properties:
// Untyped services covered by generated classes
public XElement Element(XName name);
public IEnumerable<XElement> Elements(XName name);
public XAttribute Attribute(XName name);
- There are type-driven (in addition to element-name-based) descendant and ancestor axes:
// Part of XTypedElement API
public IEnumerable<T> Descendants<T>() where T : XTypedElement;
public IEnumerable<T> Ancestors<T>() where T : XTypedElement;
The XPath notation “./foo” directly maps to OO member access “myObject.foo”.
Hence, the typed properties of LINQ to XSD facilitate an XPath-like child axis.
In particular:
- Recurrence of an element name in a content model maps to a single property.
- The element particles of a sequence and a choice can be both queried in the same way.
- There is 1:1 correspondence between element names in a content model and properties.
- There is no 1:1 correspondence between element particles and properties (in general).
This style is also called instance-oriented mapping.
The following examples demonstrate the design principle of XPath alignment. Consider the following schema fragment for US addresses with a choice group highlighted:
<xs:complexType name="USAddress">
  <xs:sequence>
    <xs:choice>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="POBox" type="xs:int"/>
    </xs:choice>
    <xs:element name="City" type="xs:string"/>
    <xs:element name="Zip" type="xs:int"/>
    <xs:element name="State" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
Regardless of such a schema, XML programming may simply test for the presence of certain elements to organize the problem logic. This will be illustrated with functionality for formatting US addresses, say for use in a printed letter. Here is an example instance of a US address and the corresponding rendering of the address, as we wish to compute:
<Address>
  <POBox>423788</POBox>
  <City>London</City>
  <Zip>12345</Zip>
  <State>CA</State>
</Address>
PO Box 423788
London, CA 12345
The untyped XML processing code for formatting looks as follows:
static string Format(XElement a)
{
    string variablePart = null;
    if (a.Element("Street") != null)
        variablePart = (string)a.Element("Street");
    else if (a.Element("POBox") != null)
        // Prefix POBox with "PO Box"
        variablePart = "PO Box " + (string)a.Element("POBox");
    return
        variablePart + "\n" // new line for rest
        + (string)a.Element("City") + ", "
        + (string)a.Element("State") + " "
        + (string)a.Element("Zip");
}
The corresponding LINQ to XSD code uses the same instance-oriented style and the same sort of presence tests as the LINQ to XML code. (The discoverability of choices is still supported by means of appropriate tool tips.)
static string Format(USAddress a)
{
    string variablePart = null;
    if (a.Street != null)
        variablePart = a.Street;
    else if (a.POBox != null)
        variablePart = "PO Box " + a.POBox;
    return
        variablePart + "\n"
        + a.City + ", "
        + a.State + " "
        + a.Zip;
}
Here is another schema type for addresses; this time we exercise ‘recurring element names’.
That is, a street address may have an optional second line.
<xs:complexType name="USStreetAddress">
  <xs:sequence>
    <xs:element name="Street" type="xs:string"/>
    <xs:element name="Street" type="xs:string" minOccurs="0"/>
    <xs:element name="City" type="xs:string"/>
    <xs:element name="Zip" type="xs:int"/>
    <xs:element name="State" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
The generated class comprises a single property Street only.
public IList<string> Street { get; set; }
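In untyped terms, a single list-valued Street property corresponds to querying all Street children at once. The following is a minimal sketch using plain LINQ to XML; the helper name StreetLinesSketch is hypothetical, and unlike the generated property it returns a copy of the values rather than a live, mutable view on the tree.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StreetLinesSketch
{
    // Collects every Street child in document order, regardless of
    // whether the instance carries one street line or two.
    public static List<string> StreetLines(XElement address)
    {
        return address.Elements("Street")
                      .Select(s => (string)s)
                      .ToList();
    }
}
```

Both element particles named Street in the content model end up in the same result, which is the sense in which element names, not particles, correspond to properties.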
Per redirection to the untyped tree, all DML capabilities of LINQ to XML are available.
In addition, LINQ to XSD classes provide typed DML as follows:
- All typed properties provide setters.
- All list-typed properties use mutable lists (i.e., IList as opposed to IEnumerable).
Consider an element e that is expected to carry a child named Salary. The content of the Salary node is to be multiplied by a factor.
The stated DML problem is modeled in LINQ to XML as follows:
var s = e.Element(ns + "Salary");
s.SetValue((double)s * factor);
The stated DML problem is modeled in LINQ to XSD as follows:
e.Salary *= factor;
Given a PurchaseOrder element o, another Item i is to be added to o.
The DML problem is modeled in LINQ to XSD as follows:
var o = PurchaseOrder.Load(...);
Item i = new Item { ... };
o.Item.Add(i);
The following rules apply to setters whose property type is not a list type.
All setters provide either ‘insert’ or ‘append’ mode. These modes cover the case where the value to be set is not null, and where the relevant element is not yet present in the tree to be modified. The actual choice depends on the underlying content model.
The ‘insert’ semantics models that the new element is inserted into the actual content in a ‘valid’ position with regard to order. Insert semantics is restricted to simpler content models such as content models without recurrent element names and without nested compositors.
The ‘append’ semantics is the general fall-back. The relevant element is simply appended to the end of the current content. Clearly, different orders of invoking a number of append setters may have an impact on validity. To this end, tool tips highlight setters with append semantics and list the regular expression for the content model.
Note: the current release of LINQ to XSD is not yet aggressive in providing the more attractive ‘insert’ semantics for all possible content models and properties. The intention is that the ‘append’ semantics is eventually only left for those content models and properties that are intrinsically ambiguous when using an instance-oriented mapping.
All setters provide two additional modes:
- ‘delete’ – for value == null, and the relevant element is already present.
- ‘update’ – for value != null, and the relevant element is already present.
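The setter modes can be sketched against the untyped API. The helper SetterSketch.SetChild below is hypothetical and is not the generated setter logic; it distinguishes ‘delete’, ‘update’, and ‘append’, and it does not attempt the order-aware ‘insert’ mode, which would require knowledge of the content model.

```csharp
using System.Xml.Linq;

public static class SetterSketch
{
    // Applies a setter-like operation to the child `name` of `parent`
    // and reports which mode was taken.
    public static string SetChild(XElement parent, XName name, string value)
    {
        XElement existing = parent.Element(name);

        if (existing != null && value == null)
        {
            existing.Remove();           // 'delete': present, value is null
            return "delete";
        }
        if (existing != null)
        {
            existing.Value = value;      // 'update': present, value is not null
            return "update";
        }
        if (value != null)
        {
            // 'append': absent, value is not null; added at the end of the content
            parent.Add(new XElement(name, value));
            return "append";
        }
        return "none";                   // absent and null: nothing to do
    }
}
```

Since the append case always adds at the end, calling it for several children in a different order yields a different element order, which is why append semantics can affect validity.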
The dichotomy of ‘insert’ and ‘append’ semantics also applies to properties of a list type.
In this case, these two different semantics are concerned with the IList operations Add and Insert.
LINQ to XML facilitates functional constructors for the construction of untyped XML trees.
That is, all attributes and child elements are listed as arguments of the XElement constructor.
VB 9.0 also provides XML literals as a more language-embedded approach to construction.
By contrast, LINQ to XSD facilitates default constructors and DML for the construction of XML objects.
To this end, the C# 3.0 / VB 9.0 expression-oriented object-initializer syntax can be used.
Consider the following XML fragment:
<PurchaseOrder xmlns="http://www.example.com/Orders">
  <CustId>0815</CustId>
  <Item>
    <ProdId>1234</ProdId>
    <Price>37</Price>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <ProdId>5678</ProdId>
    <Price>1.5</Price>
    <Quantity>3</Quantity>
  </Item>
</PurchaseOrder>
With LINQ to XML, this fragment is constructed by functional constructors as follows:
XNamespace ns = "http://www.example.com/Orders";
XElement o = new XElement(ns + "PurchaseOrder",
    new XElement(ns + "CustId", "0815"),
    new XElement(ns + "Item",
        new XElement(ns + "ProdId", "1234"),
        new XElement(ns + "Price", "37"),
        new XElement(ns + "Quantity", "2")
    ),
    new XElement(ns + "Item",
        new XElement(ns + "ProdId", "5678"),
        new XElement(ns + "Price", "1.5"),
        new XElement(ns + "Quantity", "3")
    )
);
With LINQ to XSD, the same fragment is constructed by object initializers as follows:
var o = new PurchaseOrder {
    CustId = "0815",
    Item = new Item[] {
        new Item {
            ProdId = "1234",
            Price = 37,
            Quantity = 2
        },
        new Item {
            ProdId = "5678",
            Price = 1.5,
            Quantity = 3
        }
    }
};
Object initialization syntax should only be used on object types with ‘insert semantics’; see previous section.
As a fall-back, statement-oriented (as opposed to expression-oriented) initialization can always be used.
var o = new PurchaseOrder();
o.CustId = "0815";
var i = new Item();
i.ProdId = "1234";
i.Price = 37;
i.Quantity = 2;
o.Item.Add(i);
i = new Item();
i.ProdId = "5678";
i.Price = 1.5;
i.Quantity = 3;
o.Item.Add(i);
The generated classes rule out several programming mistakes statically (when compared to the basic LINQ to XML style of programming) because several XSD constraints are readily captured by leveraging the CLR type system. Most obviously, misspelling of element and attribute names is ruled out. However, it is important to note that (the current release of) LINQ to XSD does not support any sort of ‘full validation’ or ‘valid at all times’ contract. Separate validation of the input and the output of LINQ to XSD functionality should be considered. In the following, the validation contract for the various operations in typed XML programming is explained.
Given an XElement instance, e, we can attempt to cast e to a specific subclass of XTypedElement. The validation contract varies depending on whether the subclass corresponds to a global element declaration or a complex-type definition. In the case of a class corresponding to a global element declaration, the tag of e is checked to agree with the element name that is associated with the class. Subtyping (in the sense of substitution groups) is taken into account in this context. In the case of a class corresponding to a complex-type definition, e must be a subtree in a readily typed XML tree such that the subtree position is known (per XSD) to be of the relevant type. Again, subtyping is taken into account. For instance, assume that the file XMLFile1.xml contains an invoice. Then, the following statement sequence throws because the cast to a purchase order is infeasible:
var untypedOrder = XElement.Load("XMLFile1.xml");
var order = (PurchaseOrder)untypedOrder;
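The name check for classes that correspond to global element declarations can be approximated as follows. This is a sketch only: the helper CastSketch.CheckedAs is hypothetical, the exception type is an assumption (the text above does not say which exception LINQ to XSD throws), and substitution groups are not modeled.

```csharp
using System;
using System.Xml.Linq;

public static class CastSketch
{
    // Returns `e` unchanged if its tag matches the expected element name;
    // otherwise throws, mimicking a failed cast into the typed world.
    public static XElement CheckedAs(XElement e, XName expected)
    {
        if (e.Name != expected)
        {
            throw new InvalidCastException(
                "Element '" + e.Name + "' cannot be treated as '" + expected + "'.");
        }
        return e;
    }
}
```

Under these assumptions, an element named Invoice passed with the expected name PurchaseOrder is rejected, while a PurchaseOrder element passes through.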
Loading into a specific type, say PurchaseOrder, can be seen as an untyped load followed by a cast; hence, cast-time validation applies. The following code throws if the file XMLFile1.xml contains an invoice.
var order = PurchaseOrder.Load("XMLFile1.xml");
Constraints for required particles are enforced. That is, a getter for a required element throws if the relevant element is not present in the queried tree. However, it is important to notice that this check is tied to the specific getter; the absence of a required element is not uncovered by any other operation. Data-type constraints are enforced similarly. For instance, the getter for a local element declaration with xs:int as element type throws if the inner text of the relevant element cannot be parsed to the CLR counterpart for xs:int.
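Getter-time validation can be sketched with plain LINQ to XML. The helper GetterSketch.RequiredInt is hypothetical, and the exception types are assumptions: the sketch throws InvalidOperationException for a missing element and lets XmlConvert.ToInt32 throw FormatException for unparsable text, which need not match what the generated getters throw.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class GetterSketch
{
    // Mimics a getter for a required child element of type xs:int.
    public static int RequiredInt(XElement parent, XName name)
    {
        XElement child = parent.Element(name);
        if (child == null)
        {
            // Required-particle check, performed only when this getter runs.
            throw new InvalidOperationException(
                "Required element '" + name + "' is missing.");
        }
        // Data-type check: fails if the text is not a valid integer.
        return XmlConvert.ToInt32(child.Value);
    }
}
```

Note that an address lacking Zip causes no error until this particular getter is invoked; reading City or State on the same tree is unaffected.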
- Update-time validation: Data-type constraints are enforced.
- Insert-time validation: Data-type constraints are enforced.
- Delete-time validation: No validation is performed.
- Save-time validation: No validation is performed.
Note: the current release of LINQ to XSD does not fully comply with the above contract. Also, extra forms of validation are desirable. In particular, complex-type restrictions may be checked, and minOccurs/maxOccurs constraints may be checked more precisely.
Recurrent challenges in dealing with programming against complex schemas are these:
- What are the possible rooting types for complete XML trees in the input?
- When constructing valid instances, again, what are the rooting types to start with?
- For any given content model, is there any additional schema-level documentation available?
- When constructing and querying complex content models, what is its structure anyhow?
LINQ to XSD addresses these challenges by deriving discoverable object models:
- A LINQ to XSD project provides a special helper class, XRoot, to be discussed below.
- Similarly, each namespace provides a special helper class, XRootNamespace.
- Tool tips for types comprise schema-level documentation where available.
- Likewise, tool tips for properties leverage element-particle documentation, if available.
- The regular expressions for content models are integrated into tool tips.
- Append semantics is pointed out per tool tip.
Semantically, the XRoot class is the typed variation on LINQ to XML’s XDocument class. The class is generated from the XML-schema set of a LINQ to XSD project. The class adds some typed services for exploring the schemas in the project. Per mapped CLR namespace, there is also an XRootNamespace class, which provides exactly the same services as XRoot, but limited to the scope of a namespace.
Note: the XRoot class and the overall support for discoverable object models are in a relatively experimental state in the current release of LINQ to XSD. In particular, more guidance on object construction is desirable.
The XRoot class provides constructors and getters (per root element declaration of the original XML schema). These services help in creating and observing global element declarations. We recall that complete XML trees are necessarily of the type of a global element declaration (which is therefore also called root-element declaration). Hence, the XRoot class makes it easier to construct and explore complete XML trees without the need to ‘guess on class names’ and without confusing auxiliary classes for complex-type definitions with the more essential classes for root-element declarations.
Here is the interface of the XRoot class for the project from the Quick overview.
public class XRoot {
    // Load/Parse/Save methods; only one overload shown
    public static XRoot Load(string xmlFile);
    public static XRoot Parse(string text);
    public virtual void Save(string fileName);
    // Constructors per root element
    public XRoot(Batch root);
    public XRoot(PurchaseOrder root);
    public XRoot(Item root);
    // Getters per root element
    public Batch Batch { get; }
    public PurchaseOrder PurchaseOrder { get; }
    public Item Item { get; }
    // Handle on document
    public XDocument XDocument { get; }
}
The use of the XRoot services is illustrated below.
You have a document and you wish to start querying into it. Rather than guessing what the type at hand could be, you go through the member list of the XRoot class so as to get suggestions for possible roots that are admitted by the schema(s) in the project. The getters of the XRoot class serve this purpose. Of course, your choice (batch vs. purchase order vs. item) should be backed up by an inspection of a concrete XML instance, where available. You can also check for the null value returned by a getter, and thereby dispatch on all possible root-element declarations.
You want to quickly get started with the construction of an instance for the XML schema(s) in the project. There is an easy way of doing this by going through the constructors of the XRoot class.