A filesystem object for reading and writing raw dataset files in Spark transforms.
For lightweight, single-node transforms, see transforms.api.FoundryDataSidecarFileSystem.
files()

Creates a DataFrame containing the paths accessible within this dataset.
The DataFrame is partitioned by file size: each partition contains file paths whose combined size is at most spark.files.maxPartitionBytes bytes, or a single file if that file is larger than spark.files.maxPartitionBytes. The size of a file is calculated as its on-disk size plus spark.files.openCostInBytes.
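The size-based packing can be pictured with a small stand-alone sketch. This is not Spark's implementation — just a minimal first-fit-decreasing analogue, assuming each partition has a fixed byte budget:

```python
def pack_first_fit_decreasing(file_sizes, max_partition_bytes):
    """Toy analogue of size-based partition packing (not Spark's code).

    Each partition holds files whose combined size stays within
    max_partition_bytes; a file larger than the budget gets its own
    partition, matching the behaviour described above.
    """
    partitions = []  # each entry: [remaining_bytes, [file sizes]]
    for size in sorted(file_sizes, reverse=True):  # "decreasing"
        for part in partitions:
            if part[0] >= size:  # "first fit": first partition with room
                part[0] -= size
                part[1].append(size)
                break
        else:
            # No existing partition has room: open a new one.
            partitions.append([max_partition_bytes - size, [size]])
    return [sizes for _, sizes in partitions]
```

A wfd variant would instead place each file into the partition with the most remaining capacity; as the heuristic notes explain, it is much faster on very large file counts at the cost of a less even distribution.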
The packing heuristic may be given as ffd (first fit decreasing) or wfd (worst fit decreasing). While wfd tends to produce a less even distribution, it is much faster, so wfd is recommended for datasets containing a very large number of files. If a heuristic is not specified, one is selected automatically.

hadoop_path

Fetches the Hadoop path of the dataset, which can be used for code that requires direct Hadoop IO.
ls()

Recurses through all directories and lists all files matching the given patterns, starting from the root directory of the dataset.
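The recursive listing can be pictured with a standard-library stand-in. The helper and the namedtuple below are hypothetical illustrations of the described semantics (glob matching against filenames, hidden files prefixed with "." or "_" filtered out), not the FoundryFS implementation:

```python
import fnmatch
import os
from collections import namedtuple

# Hypothetical stand-in for the FileStatus records this method returns;
# the real class lives in the transforms API.
FileStatus = namedtuple("FileStatus", ["path", "size", "modified"])

def toy_ls(root, glob="*", show_hidden=False):
    """Recursively list files under root whose names match glob."""
    for dirpath, dirnames, filenames in os.walk(root):
        if not show_hidden:
            # Prune hidden directories and skip hidden files.
            dirnames[:] = [d for d in dirnames if not d.startswith((".", "_"))]
            filenames = [f for f in filenames if not f.startswith((".", "_"))]
        for name in fnmatch.filter(filenames, glob):
            full = os.path.join(dirpath, name)
            stat = os.stat(full)
            yield FileStatus(
                path=os.path.relpath(full, root),     # logical path
                size=stat.st_size,                    # bytes
                modified=int(stat.st_mtime * 1000))   # ms since epoch
```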
Returns FileStatus objects containing the logical path, the file size (bytes), and the modified timestamp (ms since January 1, 1970 UTC).

open()

Opens a FoundryFS file in the given mode. Any remaining keyword arguments are passed through to io.open().
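The pass-through contract can be demonstrated with a plain-Python sketch. The helper below is hypothetical (it stands in for the filesystem's open, using a local path instead of a logical dataset path), but the kwargs behaviour it shows — encoding, newline, errors, buffering flowing straight to io.open() — is exactly what is documented:

```python
import io
import tempfile

def open_with_passthrough(path, mode="r", **kwargs):
    """Hypothetical helper mirroring the documented contract: after the
    path is resolved, remaining keyword arguments go straight to io.open().
    """
    return io.open(path, mode, **kwargs)

# A temporary file standing in for a dataset file.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as f:
    f.write("caf\u00e9")

# encoding= flows through to io.open() unchanged.
with open_with_passthrough(sample_path, "r", encoding="utf-8") as fh:
    text = fh.read()
```

Opening the same path in "rb" mode instead returns raw bytes, just as with io.open() directly.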