Parse xml as schema

Supported in: Batch, Streaming

Parses XML strings according to the given schema definition, ignoring any fields that are not in the schema.

Expression categories: File, Struct

Declared arguments

  • Schema - Schema definition used when parsing the XML strings.
    Type<Struct>
  • Xml - The XML string to parse.
    Expression<String>
  • optional Attribute prefix - Prefix for attributes on tags.
    Literal<String>
  • optional Ignore namespace - If set, ignores the namespace on elements and attributes, so that a namespace-qualified element or attribute name is treated as if it were just the unqualified name. Defaults to false.
    Literal<Boolean>
  • optional Value tag - The field name used for an element's text value when the element has attributes but no child elements.
    Literal<String>

Output type: Struct
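
The behavior described above (keep only the fields named in the schema, return null for anything missing) can be illustrated with a minimal Python sketch. This is not the expression's implementation. The function names, the dict-based schema representation, and the choice to return every leaf as a string are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def parse_xml_as_schema(xml_text, schema):
    """Project an XML string onto a schema (illustrative sketch).

    `schema` is a hypothetical representation: a dict mapping each field
    name to None for a leaf field, or to a nested dict for a struct field.
    """
    root = ET.fromstring(xml_text)
    return _project(root, schema)


def _project(element, schema):
    result = {}
    for name, subschema in schema.items():
        # Only direct children named in the schema are looked up;
        # every other child element is ignored.
        child = element.find(name)
        if child is None:
            # Assumption: a missing field, leaf or struct, becomes None.
            result[name] = None
        elif subschema is None:
            result[name] = child.text
        else:
            result[name] = _project(child, subschema)
    return result


xml_text = """
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
 </airport>
</airline>
"""

# <miles> is absent from the input, so it comes back as None.
full = parse_xml_as_schema(xml_text, {"id": None, "airport": {"id": None, "miles": None}})

# <airport> is not in this schema, so it is dropped from the result.
small = parse_xml_as_schema(xml_text, {"id": None})
```

Here `full` holds the nested airport struct with `miles` set to None, and `small` holds only the `id` field.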

Examples

Example 1: Base case

Argument values:

  • Schema: Struct<id, airport<id, miles>>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: null
  • Value tag: null
xml:
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
  <miles>2000</miles>
 </airport>
</airline>

Output:
{
  airport: {
    id: JFK,
    miles: 2000,
  },
  id: XB-112,
}

Example 2: Ignore namespace

Description: When Ignore namespace is set, parsing ignores namespaces in the data. Note that a schema field name that includes a namespace never matches a key, because namespaces are filtered out before matching.

Argument values:

  • Schema: Struct<name, email, address<nevermatches:street, city, state, zip>>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: true
  • Value tag: null
xml:
<ns1>
<ns1>John Doe</ns1>
<ns1>john.doe@example.com</ns1...

Output:
{
  address: {
    city: Exampleville,
    nevermatches: null,...
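
To make the namespace note concrete, here is a small Python sketch of stripping namespaces before matching names. It is an assumption about how such filtering could work, not the expression's actual code. The helper names, the sample document, and its namespace URIs are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(qualified):
    """Drop the '{uri}' part that ElementTree puts in front of namespaced names."""
    if qualified.startswith("{"):
        return qualified.split("}", 1)[1]
    return qualified


def strip_namespaces(element):
    """Rename every element and attribute in the tree to its local name, in place."""
    for node in element.iter():
        node.tag = local_name(node.tag)
        renamed = {local_name(key): value for key, value in node.attrib.items()}
        node.attrib.clear()
        node.attrib.update(renamed)
    return element


# Hypothetical input using two namespace prefixes.
xml_text = (
    '<p:person xmlns:p="http://example.com/p" xmlns:a="http://example.com/a">'
    '<p:name a:lang="en">John Doe</p:name>'
    "</p:person>"
)

root = strip_namespaces(ET.fromstring(xml_text))
child_names = [child.tag for child in root]
name_attributes = dict(root.find("name").attrib)

# After stripping, only unqualified names remain, so a schema field written
# with a prefix (for example "p:name") has nothing left to match.
prefixed_field_matches = "p:name" in child_names
```

In this sketch the schema would need to ask for `name`, not `p:name`, which mirrors why the `nevermatches:street` field in the example above is null.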

Example 3: Null case

Description: When a field requested in the schema is missing from the input XML, the field becomes null.

Argument values:

  • Schema: Struct<id, airport<id, miles>>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: null
  • Value tag: null
xml:
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
 </airport>
</airline>

Output:
{
  airport: {
    id: JFK,
    miles: null,
  },
  id: XB-112,
}

Example 4: Schema subset

Description: When the schema covers only some of the fields in the XML, only the fields in the schema are parsed.

Argument values:

  • Schema: Struct<id>
  • Xml: xml
  • Attribute prefix: null
  • Ignore namespace: null
  • Value tag: null
xml:
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
 </airport>
</airline>

Output:
{
  id: XB-112,
}

Example 5: Attribute prefix

Description: You can read attributes by putting the attribute prefix in front of the attribute name in the schema.

Argument values:

  • Schema: Struct<id, airport<_id, miles>>
  • Xml: xml
  • Attribute prefix: _
  • Ignore namespace: null
  • Value tag: null
xml:
<airline>
 <id>XB-112</id>
 <airport id="JFK">
  <miles>2000</miles>
 </airport>
</airline>

Output:
{
  airport: {
    _id: JFK,
    miles: 2000,
  },
  id: XB-112,
}
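
The roles of Attribute prefix and Value tag can be sketched in Python as follows. This is an illustration under stated assumptions, not the expression's implementation. The function `element_to_fields` is hypothetical, all values are kept as strings, and the Value tag behavior shown is one reading of the argument description above.

```python
import xml.etree.ElementTree as ET


def element_to_fields(element, attribute_prefix, value_tag):
    """Turn one element into a dict of fields (illustrative sketch)."""
    # Attributes become fields named <attribute_prefix><attribute name>.
    fields = {attribute_prefix + key: value for key, value in element.attrib.items()}
    children = list(element)
    for child in children:
        if len(child) or child.attrib:
            # The child has structure of its own, so it becomes a nested dict.
            fields[child.tag] = element_to_fields(child, attribute_prefix, value_tag)
        else:
            # A plain leaf: just its text.
            fields[child.tag] = child.text
    if not children and element.attrib:
        # Attributes but no child elements: the element's text needs a
        # field name of its own, which is what the value tag supplies.
        fields[value_tag] = element.text
    return fields


# Same input as Example 5: the airport id is an attribute.
airline = ET.fromstring(
    '<airline><id>XB-112</id><airport id="JFK"><miles>2000</miles></airport></airline>'
)
with_prefix = element_to_fields(airline, attribute_prefix="_", value_tag="value")

# Hypothetical input where the element has an attribute and text, but no children.
airport = ET.fromstring('<airport id="JFK">2000</airport>')
with_value_tag = element_to_fields(airport, attribute_prefix="_", value_tag="miles")
```

In the first call the attribute surfaces as `_id` next to the `miles` child, matching Example 5. In the second call the element text is stored under the chosen value tag, `miles`.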