Parse xml as schema

Supported in: Batch, Streaming

Parses XML strings according to the given schema definition. Fields that are not in the schema are ignored.

Expression categories: File, Struct

Declared arguments

  • Schema - Schema definition used when parsing the XML strings.
    Type<Struct>
  • Xml - The XML string to parse.
    Expression<String>
  • optional Attribute prefix - Prefix for attributes on tags.
    Literal<String>
  • optional Value tag - The tag that holds an element's text value when the element has attributes but no child elements.
    Literal<String>

Output type: Struct

Examples

Example 1: Base case

Argument values:

  • Schema: Struct<id, airport<id, miles>>
  • Xml: xml
  • Attribute prefix: null
  • Value tag: null

xml:
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
  <miles>2000</miles>
 </airport>
</airline>

Output:
{
 airport: {
  id: JFK,
  miles: 2000,
 },
 id: XB-112,
}

Example 2: Null case

Description: When a field requested in the schema is missing from the input XML, the field becomes null.

Argument values:

  • Schema: Struct<id, airport<id, miles>>
  • Xml: xml
  • Attribute prefix: null
  • Value tag: null

xml:
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
 </airport>
</airline>

Output:
{
 airport: {
  id: JFK,
  miles: null,
 },
 id: XB-112,
}

Example 3: Schema with fewer fields than the input

Description: When the schema lists only some of the fields present in the XML, only the fields in the schema are parsed.

Argument values:

  • Schema: Struct<id>
  • Xml: xml
  • Attribute prefix: null
  • Value tag: null

xml:
<airline>
 <id>XB-112</id>
 <airport>
  <id>JFK</id>
 </airport>
</airline>

Output:
{
 id: XB-112,
}
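
The behavior shown in Examples 1 to 3 can be imitated outside the tool with a short Python sketch. This is not the expression's implementation: the function name `parse_xml_as_schema` and the nested-dict schema format (a field maps to None for a leaf, or to another dict for a struct) are hypothetical and chosen only for illustration. The sketch also assumes that a missing struct becomes null as a whole and that all values stay strings.

```python
import xml.etree.ElementTree as ET


def parse_xml_as_schema(schema, xml):
    """Hypothetical helper: keep only the fields named in `schema`.

    `schema` maps a field name to None (leaf) or to a nested dict (struct).
    Fields missing from the XML become None; fields absent from the schema
    are never read, so they are dropped from the result.
    """
    def read(element, fields):
        result = {}
        for name, sub_fields in fields.items():
            child = element.find(name)  # first direct child with this tag
            if child is None:
                result[name] = None  # missing field -> null (assumption for structs)
            elif sub_fields is None:
                result[name] = child.text  # leaf value, kept as a string
            else:
                result[name] = read(child, sub_fields)
        return result

    return read(ET.fromstring(xml), schema)


xml = (
    "<airline>"
    "<id>XB-112</id>"
    "<airport><id>JFK</id></airport>"
    "</airline>"
)

# Example 2: `miles` is requested but missing, so it becomes None.
full = parse_xml_as_schema({"id": None, "airport": {"id": None, "miles": None}}, xml)

# Example 3: the schema only asks for `id`, so `airport` is ignored.
small = parse_xml_as_schema({"id": None}, xml)
```

Driving the traversal from the schema rather than from the XML is what makes both cases fall out naturally: absent fields are filled with null, and unrequested fields are never visited.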

Example 4: Attributes

Description: To read an attribute, put the attribute prefix in front of the attribute name in the schema.

Argument values:

  • Schema: Struct<id, airport<_id, miles>>
  • Xml: xml
  • Attribute prefix: _
  • Value tag: null

xml:
<airline>
 <id>XB-112</id>
 <airport id="JFK">
  <miles>2000</miles>
 </airport>
</airline>

Output:
{
 airport: {
  _id: JFK,
  miles: 2000,
 },
 id: XB-112,
}
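
As a rough illustration of how the Attribute prefix and Value tag arguments interact, the Python sketch below flattens a single element into a dict. It reflects one reading of the argument descriptions above, not the expression's implementation. The helper `read_element`, the default values `_` and `_VALUE`, and the sample text `Kennedy` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def read_element(element, attribute_prefix="_", value_tag="_VALUE"):
    """Hypothetical helper: flatten one element into a dict.

    Attributes are stored under attribute_prefix + name. If the element has
    attributes but no child elements, its text is stored under value_tag.
    Default values here are assumptions, not documented defaults.
    """
    record = {attribute_prefix + key: value for key, value in element.attrib.items()}
    children = list(element)
    for child in children:
        record[child.tag] = child.text  # one level only, to keep the sketch short
    text = (element.text or "").strip()
    if element.attrib and not children and text:
        record[value_tag] = text
    return record


# Same shape as Example 4: an attribute plus a child element.
with_children = read_element(
    ET.fromstring('<airport id="JFK"><miles>2000</miles></airport>')
)

# Attributes but no child: the text needs a field name, supplied by value_tag.
leaf_only = read_element(ET.fromstring('<airport id="JFK">Kennedy</airport>'))
```

In the first call the result holds `_id` and `miles`, matching the airport struct in Example 4. In the second call the element's text has no tag of its own, which is the situation the Value tag argument is described as covering.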