TransformInput object
The interface for low-level operations on a Foundry dataset.
spark.df()
data.frame()
fileSystem()
TransformOutput object
The interface for low-level write operations on a Foundry dataset.
write.spark.df(df, partition_cols=NULL, bucket_cols=NULL, bucket_count=NULL, sort_by=NULL)
Writes the given DataFrame to the output dataset.
write.data.frame(rdf)
fileSystem()
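As a minimal sketch of how these two objects might be used together, the function below reads an input as an R data.frame, filters it, and writes the result. The `$` call syntax follows the `fileSystem()` example in the disk_optimal section of this page. `keep_positive`, `mock_input`, `mock_output`, and the `value` column are hypothetical names, and the mocks are stand-ins so the sketch runs outside Foundry; they are not part of the API.

```r
# Hypothetical transform body: read, filter, write.
keep_positive <- function(input, output) {
  rdf <- input$data.frame()                     # read the input as an R data.frame
  result <- rdf[rdf$value > 0, , drop = FALSE]  # keep rows with a positive value
  output$write.data.frame(result)               # write the filtered rows
  invisible(NULL)
}

# Stand-ins (assumptions) that mimic only the two methods used above.
written <- new.env()
mock_input <- list(
  data.frame = function() data.frame(id = 1:4, value = c(-1, 2, 0, 5))
)
mock_output <- list(
  write.data.frame = function(rdf) assign("rdf", rdf, envir = written)
)

keep_positive(mock_input, mock_output)
print(written$rdf)
```

In a real transform, the input and output objects would be supplied by Foundry rather than constructed by hand.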
FileSystem object
ls(glob=NULL, regex='.*', show_hidden=FALSE)
Lists all files matching the given pattern (either glob or regex), relative to the root directory of the dataset.
Returns: An R array of FileStatus named tuples (path, size, modified): the logical path, the file size in bytes, and the modified timestamp in milliseconds since January 1, 1970 UTC.
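The listing can be filtered after the call, for example by file size. The sketch below assumes that each FileStatus entry exposes its fields as `$path` and `$size`; the source only says the entries are named tuples, so that accessor style is an assumption. `large_files` and `mock_fs` are hypothetical names, and the mock exists only so the sketch runs outside Foundry.

```r
# Return the logical paths of listed files larger than `min_bytes`.
# Assumption: each FileStatus entry can be accessed as s$path and s$size.
large_files <- function(fs, glob, min_bytes) {
  statuses <- fs$ls(glob = glob)
  paths <- vapply(statuses, function(s) s$path, character(1))
  sizes <- vapply(statuses, function(s) s$size, numeric(1))
  paths[sizes > min_bytes]
}

# Stand-in (assumption) that mimics only the ls() signature and return shape.
mock_fs <- list(
  ls = function(glob = NULL, regex = ".*", show_hidden = FALSE) {
    list(
      list(path = "a.csv", size = 120, modified = 1700000000000),
      list(path = "b.csv", size = 5000, modified = 1700000100000)
    )
  }
)

print(large_files(mock_fs, "*.csv", 1000))
```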
open(path, open='r', disk_optimal=FALSE, encoding=default)
Opens a FoundryFS file in the given mode.
Returns: An R connection object.
get_path(path, open='r', disk_optimal=FALSE, encoding=default)
For a given FoundryFS (remote) path, returns the local temporary path.
Returns: str
upload(local_path, remote_path)
Uploads the file at the local path to the remote path. Write only.
Returns: None
disk_optimal setting
In the FileSystem methods open() and get_path(), the disk_optimal argument controls how file input and output (I/O) is handled.
By default, disk_optimal is set to FALSE in both open() and get_path(). In this mode, files are guaranteed to be downloaded before they are accessed.
If you set disk_optimal to TRUE, files are downloaded while the code executes rather than beforehand. The temporary local path must be opened via fifo() in order to be read correctly. Note that not all libraries support reading this type of file.
You may choose to set disk_optimal to TRUE when the file you are reading is very large.
For example, suppose you have a very large text file and only want to read the first 10 lines. The code below prints only those 10 lines, without reading the entire file.
    disk_optimal_example <- function(large_txt_file) {
      fs <- large_txt_file$fileSystem()
      ## Open a connection with fifo()
      ## The text file is titled large_txt_file.txt
      conn <- fs$open("large_txt_file.txt", "r", disk_optimal = TRUE)
      A <- readLines(conn, n = 10)
      print(A)
      return(NULL)
    }
If you want to use an R TransformOutput to write a file and then read it back, disk_optimal must be set to FALSE.
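A minimal sketch of that write-then-read pattern follows, using upload() and then open() with disk_optimal = FALSE. `write_then_read`, `make_mock_output`, and the file name `notes.txt` are hypothetical; the mock copies files into a temporary directory and exists only so the sketch runs outside Foundry. It does not reproduce how Foundry actually stores files.

```r
# Write `lines` to a file in the output dataset, then read them back.
write_then_read <- function(output, lines, remote_path) {
  fs <- output$fileSystem()
  local_path <- tempfile(fileext = ".txt")
  writeLines(lines, local_path)
  fs$upload(local_path, remote_path)
  # disk_optimal must be FALSE when reading a file written by the same output
  conn <- fs$open(remote_path, "r", disk_optimal = FALSE)
  on.exit(close(conn))
  readLines(conn)
}

# Stand-in (assumption): a temporary directory plays the role of the dataset.
make_mock_output <- function() {
  root <- tempfile("mock_dataset_")
  dir.create(root)
  fs <- list(
    upload = function(local_path, remote_path) {
      file.copy(local_path, file.path(root, remote_path), overwrite = TRUE)
      invisible(NULL)
    },
    open = function(path, open = "r", disk_optimal = FALSE) {
      file(file.path(root, path), open = open)
    }
  )
  list(fileSystem = function() fs)
}

result <- write_then_read(make_mock_output(), c("first", "second"), "notes.txt")
print(result)
```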