Text segmentation

Supported in: Batch, Streaming

Extract a series of text segments using sliding window segmentation.

Expression categories: String

Declared arguments

  • Expression - The body of text that is to be segmented.
    Expression<String>
  • Length - The length, in words, of each segment the text is broken into.
    Expression<Integer>
  • optional Overflow - The number of words a segment can share with another segment.
    Expression<Integer>

Output type: Array<String>
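The behavior described by the arguments above can be sketched in a few lines of Python. This is a minimal illustration, not the expression's actual implementation: the function name segment_text is hypothetical, and the sketch assumes that words are separated by whitespace, that each segment starts (Length - Overflow) words after the previous one, that a null Overflow behaves like 0, and that a null Expression or Length yields a null result. These assumptions are inferred from the examples on this page.

```python
from typing import List, Optional


def segment_text(text: Optional[str],
                 length: Optional[int],
                 overflow: Optional[int] = None) -> Optional[List[str]]:
    """Split text into windows of `length` words (hypothetical helper).

    Consecutive windows share `overflow` words; a negative overflow
    skips words between windows instead.
    """
    # Assumption: a null text or null length produces a null output.
    if text is None or length is None:
        return None
    # Assumption: a missing overflow is treated as 0 (no shared words).
    if overflow is None:
        overflow = 0

    # Each window starts `stride` words after the previous one.
    stride = length - overflow
    if length <= 0 or stride <= 0:
        raise ValueError("length must be positive and greater than overflow")

    # Assumption: words are delimited by whitespace.
    words = text.split()
    return [" ".join(words[start:start + length])
            for start in range(0, len(words), stride)]


if __name__ == "__main__":
    print(segment_text("hello world this is a test string", 3, 1))
```

Under these assumptions the final window may be shorter than Length, because the loop continues for as long as a start position falls inside the text.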

Examples

Example 1: Base case

Description: This test shows the ability of the transform to segment a short text, where the trailing word forms its own segment as well.

Argument values:

  • Expression: string
  • Length: 3
  • Overflow: 1

string | Output
hello world this is a test string | [ hello world this, this is a, a test string, string ]

Example 2: Base case

Description: Test with negative overflow.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow

string | length | overflow | Output
She sells sea shells by | 2 | -1 | [ She sells, shells by ]
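A negative overflow can be read as a gap between segments. The short calculation below reproduces this example's output, assuming each segment starts (length - overflow) words after the previous one. That rule is an inference from the output shown here, not a documented guarantee, and the variable names are illustrative only.

```python
# Worked arithmetic for the negative-overflow example (illustrative only).
words = "She sells sea shells by".split()
length, overflow = 2, -1

# Assumption: the window advances by length - overflow words each step.
stride = length - overflow

# Start index of each segment within the word list.
starts = list(range(0, len(words), stride))

# Each segment takes up to `length` words from its start index.
segments = [" ".join(words[s:s + length]) for s in starts]
print(stride, starts, segments)
```

With a stride of 3 and a length of 2, the word "sea" falls between the two windows and appears in neither segment.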

Example 3: Base case

Description: A larger test with overflow and a smaller segment at the end.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow

string | length | overflow | Output
hello world this is a larger test with overlap, the nature of the human spirit is strange as such i ... | 10 | 3 | [ hello world this is a larger test with overlap, the, with overlap, the nature of the human spirit ...

Example 4: Base case

Description: Test a string where overflow is set to 0 and the last segment is shorter than a full length.

Argument values:

  • Expression: string
  • Length: 3
  • Overflow: null

string | Output
hello world this is a test string | [ hello world this, is a test, string ]

Example 5: Base case

Description: Test with no overflow where the segments are perfectly divided by length.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow

string | length | overflow | Output
hello world this is a test string without overlap | 3 | 0 | [ hello world this, is a test, string without overlap ]

Example 6: Null case

Description: Test with all arguments null; the output is null.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow

string | length | overflow | Output
null | null | null | null

Example 7: Null case

Description: Test with a null string and a non-null length; the output is null.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow

string | length | overflow | Output
null | 1 | null | null

Example 8: Null case

Description: Test with a non-null string and a null length; the output is null.

Argument values:

  • Expression: string
  • Length: length
  • Overflow: overflow

string | length | overflow | Output
Hello world | null | null | null