Skip to content

Latest commit

 

History

History
216 lines (179 loc) · 9.66 KB

File metadata and controls

216 lines (179 loc) · 9.66 KB

7.1 XML

XML is a commonly used data communication format in web services. Today, it's assuming a more and more important role in web development. In this section, we're going to introduce how to work with XML through Go's standard library.

I will not make any attempts to teach XML's syntax or conventions. For that, please read more documentation about XML itself. We will only focus on how to encode and decode XML files in Go.

Suppose you work in IT, and you have to deal with the following XML configuration file:

<?xml version="1.0" encoding="utf-8"?>
<servers version="1">
    <server>
        <serverName>Shanghai_VPN</serverName>
        <serverIP>127.0.0.1</serverIP>
    </server>
    <server>
        <serverName>Beijing_VPN</serverName>
        <serverIP>127.0.0.2</serverIP>
    </server>
</servers>

The above XML document contains two kinds of information about your server: the server name and IP. We will use this document in our following examples.

Parse XML

How do we parse this XML document? We can use the Unmarshal function in Go's xml package to do this.

func Unmarshal(data []byte, v interface{}) error

the data parameter receives a data stream from an XML source, and v is the structure you want to output the parsed XML to. It is an interface, which means you can convert XML to any structure you desire. Here, we'll only talk about how to convert from XML to the struct type since they share similar tree structures.

Sample code:

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "os"
)

type Recurlyservers struct {
    XMLName     xml.Name `xml:"servers"`
    Version     string   `xml:"version,attr"`
    Svs         []server `xml:"server"`
    Description string   `xml:",innerxml"`
}

type server struct {
    XMLName    xml.Name `xml:"server"`
    ServerName string   `xml:"serverName"`
    ServerIP   string   `xml:"serverIP"`
}

func main() {
    file, err := os.Open("servers.xml") // For read access.     
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }
    defer file.Close()
    data, err := ioutil.ReadAll(file)
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }
    v := Recurlyservers{}
    err = xml.Unmarshal(data, &v)
    if err != nil {
        fmt.Printf("error: %v", err)
        return
    }

    fmt.Println(v)
}

XML is actually a tree data structure, and we can define a very similar structure using structs in Go, then use xml.Unmarshal to convert from XML to our struct object. The sample code will print the following content:

{{ servers} 1 [{{ server} Shanghai_VPN 127.0.0.1} {{ server} Beijing_VPN 127.0.0.2}]
<server>
    <serverName>Shanghai_VPN</serverName>
    <serverIP>127.0.0.1</serverIP>
</server>
<server>
    <serverName>Beijing_VPN</serverName>
    <serverIP>127.0.0.2</serverIP>
</server>
}

We use xml.Unmarshal to parse the XML document to the corresponding struct object. You should see that we have something like xml:"serverName" in our struct. This is a feature of structs called struct tags for helping with reflection. Let's see the definition of Unmarshal again:

func Unmarshal(data []byte, v interface{}) error

The first argument is an XML data stream. The second argument is storage type and supports the struct, slice and string types. Go's XML package uses reflection for data mapping, so all fields in v should be exported. However, this causes a problem: how does it know which XML field corresponds to the mapped struct field? The answer is that the XML parser parses data in a certain order. The library will try to find the matching struct tag first. If a match cannot be found then it searches through the struct field names. Be aware that all tags, field names and XML elements are case sensitive, so you have to make sure that there is a one-to-one correspondence for the mapping to succeed.

Go's reflection mechanism allows you to use this tag information to reflect XML data to a struct object. If you want to know more about reflection in Go, please read the package documentation on struct tags and reflection.

Here are some rules when using the xml package to parse XML documents to structs:

  • If the field type is a string or []byte with the tag ",innerxml", Unmarshal will assign raw XML data to it, like Description in the above example:

    Shanghai_VPN127.0.0.1Beijing_VPN127.0.0.2

  • If a field is called XMLName and its type is xml.Name, then it gets the element name, like servers in above example.

  • If a field's tag contains the corresponding element name, then it gets the element name as well, like servername and serverip in the above example.

  • If a field's tag contains ",attr", then it gets the corresponding element's attribute, like version in above example.

  • If a field's tag contains something like "a>b>c", it gets the value of the element c of node b of node a.

  • If a field's tag contains "=", then it gets nothing.

  • If a field's tag contains ",any", then it gets all child elements which do not fit the other rules.

  • If the XML elements have one or more comments, all of these comments will be added to the first field that has the tag that contains ",comments". This field type can be a string or []byte. If this kind of field does not exist, all comments are discarded.

These rules tell you how to define tags in structs. Once you understand these rules, mapping XML to structs will be as easy as the sample code above. Because tags and XML elements have a one-to-one correspondence, we can also use slices to represent multiple elements on the same level.

Note that all fields in structs should be exported (capitalized) in order to parse data correctly.

Produce XML

What if we want to produce an XML document instead of parsing one. How do we do this in Go? Unsurprisingly, the xml package provides two functions which are Marshal and MarshalIndent, where the second function automatically indents the marshalled XML document. Their definition as follows:

func Marshal(v interface{}) ([]byte, error)
func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)

The first argument in both of these functions is for storing a marshalled XML data stream.

Let's look at an example to see how this works:

package main

import (
    "encoding/xml"
    "fmt"
    "os"
)

type Servers struct {
    XMLName xml.Name `xml:"servers"`
    Version string   `xml:"version,attr"`
    Svs     []server `xml:"server"`
}

type server struct {
    ServerName string `xml:"serverName"`
    ServerIP   string `xml:"serverIP"`
}

func main() {
    v := &Servers{Version: "1"}
    v.Svs = append(v.Svs, server{"Shanghai_VPN", "127.0.0.1"})
    v.Svs = append(v.Svs, server{"Beijing_VPN", "127.0.0.2"})
    output, err := xml.MarshalIndent(v, "  ", "    ")
    if err != nil {
        fmt.Printf("error: %v\n", err)
    }
    os.Stdout.Write([]byte(xml.Header))

    os.Stdout.Write(output)
}

The above example prints the following information:

<?xml version="1.0" encoding="UTF-8"?>
<servers version="1">
<server>
    <serverName>Shanghai_VPN</serverName>
    <serverIP>127.0.0.1</serverIP>
</server>
<server>
    <serverName>Beijing_VPN</serverName>
    <serverIP>127.0.0.2</serverIP>
</server>
</servers>

As we've previously defined, the reason we have os.Stdout.Write([]byte(xml.Header)) is because both xml.MarshalIndent and xml.Marshal do not output XML headers on their own, so we have to explicitly print them in order to produce XML documents correctly.

Here we can see that Marshal also receives a v parameter of type interface{}. So what are the rules when marshalling to an XML document?

  • If v is an array or slice, it prints all elements like a value.
  • If v is a pointer, it prints the content that v is pointing to, printing nothing when v is nil.
  • If v is a interface, it deal with the interface as well.
  • If v is one of the other types, it prints the value of that type.

So how does xml.Marshal decide the elements' name? It follows the ensuing rules:

  • If v is a struct, it defines the name in the tag of XMLName.
  • The field name is XMLName and the type is xml.Name.
  • Field tag in struct.
  • Field name in struct.
  • Type name of marshal.

Then we need to figure out how to set tags in order to produce the final XML document.

  • XMLName will not be printed.
  • Fields that have tags containing "-" will not be printed.
  • If a tag contains "name,attr", it uses name as the attribute name and the field value as the value, like version in the above example.
  • If a tag contains ",attr", it uses the field's name as the attribute name and the field value as its value.
  • If a tag contains ",chardata", it prints character data instead of element.
  • If a tag contains ",innerxml", it prints the raw value.
  • If a tag contains ",comment", it prints it as a comment without escaping, so you cannot have "--" in its value.
  • If a tag contains "omitempty", it omits this field if its value is zero-value, including false, 0, nil pointer or nil interface, zero length of array, slice, map and string.
  • If a tag contains "a>b>c", it prints three elements where a contains b and b contains c, like in the following code:
FirstName string   `xml:"name>first"`
LastName  string   `xml:"name>last"`

<name>
<first>Asta</first>
<last>Xie</last>
</name>

You may have noticed that struct tags are very useful for dealing with XML, and the same goes for the other data formats we'll be discussing in the following sections. If you still find that you have problems with working with struct tags, you should probably read more documentation about them before diving into the next section.

Links