Other

Collections

  • array(*cols)
  • array_contains(col, value)
  • size(col)
  • sort_array(col, asc=True)
  • struct(*cols)
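
The ordering rule of sort_array() is easy to miss: per the Spark documentation, null elements go first in ascending order and last in descending order. The sketch below is a plain-Python model of that rule, not PySpark code; the name sort_array_model is hypothetical and None stands in for null.

```python
def sort_array_model(values, asc=True):
    """Model of sort_array() ordering for one array value (assumption: None = null)."""
    nulls = [v for v in values if v is None]
    rest = sorted((v for v in values if v is not None), reverse=not asc)
    # Nulls lead in ascending order and trail in descending order.
    return nulls + rest if asc else rest + nulls


print(sort_array_model([3, None, 1]))
print(sort_array_model([3, None, 1], asc=False))
```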

Sorting

  • asc(col)
  • desc(col)

Binary

  • bitwiseNOT(col)
  • shiftLeft(col, numBits)
  • shiftRight(col, numBits)
  • shiftRightUnsigned(col, numBits)
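
The difference between shiftRight() and shiftRightUnsigned() only shows up for negative values. The following is a plain-Python model, assuming the column holds 32-bit integers and that the functions follow JVM int semantics (shift distance taken modulo 32, results wrapped to 32 bits). The helper names are hypothetical and this is not PySpark code.

```python
def to_signed32(x):
    """Wrap an integer into the signed 32-bit range, like a JVM int."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def bitwise_not(x):
    # Flip every bit, then wrap back to 32 bits.
    return to_signed32(~x)


def shift_left(x, num_bits):
    # Bits shifted past position 31 are lost.
    return to_signed32(x << (num_bits & 31))


def shift_right(x, num_bits):
    # Arithmetic shift: the sign bit is copied in from the left.
    return to_signed32(x) >> (num_bits & 31)


def shift_right_unsigned(x, num_bits):
    # Logical shift: zeros are shifted in from the left.
    return to_signed32((x & 0xFFFFFFFF) >> (num_bits & 31))


print(shift_right(-8, 1), shift_right_unsigned(-8, 1))
```

For non-negative inputs the two right shifts agree; for -8 shifted by one bit, the signed shift gives -4 while the unsigned shift gives a large positive number.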

Dealing with null values

  • coalesce(*cols)
  • isnan(col)
  • isnull(col)
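
Null and NaN are different things: isnull() does not flag NaN, isnan() does not flag null, and coalesce() skips only nulls. The sketch below models those rules for single values in plain Python, with None standing in for null; the function names are hypothetical stand-ins, not the PySpark functions themselves.

```python
import math


def coalesce_value(*values):
    """Return the first value that is not None, or None if all are None."""
    for v in values:
        if v is not None:
            return v
    return None


def isnull_value(v):
    return v is None


def isnan_value(v):
    # Only a float NaN counts; None is not NaN.
    return isinstance(v, float) and math.isnan(v)


# NaN is not null, so coalesce does not skip it.
print(coalesce_value(None, float("nan"), 1.0))
```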

Columns

  • col(col) or column(col)
  • create_map(*cols)
  • explode(col)
  • expr(str)
  • hash(*cols)
  • input_file_name()
  • posexplode(col)
  • sha1(col)
  • sha2(col, numBits)
  • soundex(col)
  • spark_partition_id()
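
For sha1() and sha2(), the result is the digest as a hexadecimal string, and numBits selects the SHA-2 variant (224, 256, 384 or 512, with 0 treated as 256). A rough plain-Python equivalent using hashlib is sketched below for string input. The helper names are hypothetical, and returning None for an unsupported numBits is an assumption meant to mirror a null result.

```python
import hashlib

# numBits -> hashlib constructor; 0 is treated as 256.
_SHA2_VARIANTS = {
    0: hashlib.sha256,
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha1_hex(text):
    """Hex SHA-1 digest of a UTF-8 encoded string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def sha2_hex(text, num_bits):
    """Hex SHA-2 digest, or None for an unsupported bit length (assumption)."""
    algo = _SHA2_VARIANTS.get(num_bits)
    if algo is None:
        return None
    return algo(text.encode("utf-8")).hexdigest()


print(sha1_hex("abc"))
print(sha2_hex("abc", 256))
```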

JSON

  • from_json(col, schema, options={})
  • get_json_object(col, path)
  • json_tuple(col, *fields)
  • to_json(col, options={})

Checkpoints

  • checkpoint(eager=True)
  • localCheckpoint(eager=True)

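Unlike the functions listed above, checkpoint() and localCheckpoint() are methods of the DataFrame itself. checkpoint() saves the DataFrame to the checkpoint directory set with SparkContext.setCheckpointDir() and truncates its logical plan, whereas localCheckpoint() keeps the data in executor storage, which is faster but less reliable. The eager parameter sets whether the DataFrame is checkpointed immediately (the default, True) or only when the first action runs on it.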
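
A minimal sketch of both calls follows. It assumes an existing SparkSession is passed in as spark and that checkpoint_dir is a writable location (typically an HDFS path on a cluster). The function name, the sample DataFrame and both parameters are hypothetical.

```python
def checkpoint_examples(spark, checkpoint_dir):
    """Checkpoint one DataFrame reliably and one locally.

    `spark` is assumed to be an existing SparkSession and `checkpoint_dir`
    a writable path; both are hypothetical inputs.
    """
    # checkpoint() needs a checkpoint directory to be set first.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    df = spark.range(1000).selectExpr("id", "id % 10 AS bucket")

    # Eager (the default): the data is computed and saved right away.
    reliable = df.checkpoint(eager=True)

    # Lazy: the local checkpoint is materialized by the first action.
    local = df.localCheckpoint(eager=False)
    local.count()  # action that triggers the lazy checkpoint

    return reliable, local
```

With a real session this could be called as checkpoint_examples(SparkSession.builder.getOrCreate(), "/tmp/checkpoints"), where the path is only a placeholder. Both calls return a new DataFrame, so keep working with the returned object rather than the original.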