TransformInput object
The interface for low-level operations on a Foundry dataset.
spark.df()
data.frame()
fileSystem()
TransformOutput object
The interface for low-level write operations on a Foundry dataset.
write.spark.df(df, partition_cols=NULL, bucket_cols=NULL, bucket_count=NULL, sort_by=NULL)
Write the given DataFrame to the output dataset.
| Parameters |
|
write.data.frame(rdf)
fileSystem()
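As a minimal sketch of how the two objects fit together, the function below reads an input as an R data.frame, filters it, and writes the result. It assumes the methods are called with `$` on the objects (as in the fileSystem() example elsewhere on this page); the names `filter_positive_rows`, `input_dataset`, `output_dataset`, and the `value` column are hypothetical, and how the objects are passed to your function depends on your transform setup.

```r
# Sketch only: `input_dataset` is assumed to be a TransformInput and
# `output_dataset` a TransformOutput; the `value` column is hypothetical.
filter_positive_rows <- function(input_dataset, output_dataset) {
  # Load the input dataset as an R data.frame
  rdf <- input_dataset$data.frame()

  # Keep rows whose `value` is present and strictly positive
  kept <- rdf[!is.na(rdf$value) & rdf$value > 0, , drop = FALSE]

  # Write the filtered data.frame to the output dataset
  output_dataset$write.data.frame(kept)
  invisible(NULL)
}
```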
FileSystem object
ls(glob=NULL, regex='.*', show_hidden=FALSE)
Lists all files matching the given pattern (either glob or regex), with respect to the root directory of the dataset.
| Parameters |
|
| Returns | R array of FileStatus named tuples (path, size, modified): the logical path, the file size in bytes, and the modified timestamp in milliseconds since January 1, 1970 UTC |
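A hedged sketch of listing files with a glob pattern and collecting their logical paths follows. It assumes each returned FileStatus can be indexed by name as `status$path`, which this page does not confirm; `csv_paths` and `input_dataset` are hypothetical names.

```r
# Sketch only: collect the logical paths of all CSV files in a dataset.
# Assumption: each FileStatus element exposes its fields by name (status$path).
csv_paths <- function(input_dataset) {
  fs <- input_dataset$fileSystem()

  # List files matching the glob, relative to the dataset root
  statuses <- fs$ls(glob = "*.csv")

  # Pull the `path` field out of every FileStatus
  vapply(statuses, function(status) as.character(status$path), character(1))
}
```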
open(path, open='r', disk_optimal=FALSE, encoding=default)
Open a FoundryFS file in the given mode.
| Parameters |
|
| Returns | An R connection object |
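For illustration, a small text file could be read in full through the returned connection roughly as follows. This is a sketch using the default disk_optimal = FALSE; `read_text_file`, `input_dataset`, and `remote_path` are hypothetical names.

```r
# Sketch only: read every line of a text file stored in a dataset.
read_text_file <- function(input_dataset, remote_path) {
  fs <- input_dataset$fileSystem()

  # Open the file for reading; disk_optimal is left at its default (FALSE)
  conn <- fs$open(remote_path, "r")

  # Close the connection when the function exits, even on error
  on.exit(close(conn), add = TRUE)

  readLines(conn)
}
```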
get_path(path, open='r', disk_optimal=FALSE, encoding=default)
For a given FoundryFS (remote) path, returns the local temporary path.
| Parameters |
|
| Returns | A character string (the local temporary path) |
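Because get_path() returns a local path rather than a connection, it can be handed to functions that expect a file name. The sketch below assumes a CSV file and the default disk_optimal = FALSE; `read_csv_via_local_path`, `input_dataset`, and `remote_path` are hypothetical names.

```r
# Sketch only: resolve a remote file to a local temporary path, then parse it.
read_csv_via_local_path <- function(input_dataset, remote_path) {
  fs <- input_dataset$fileSystem()

  # Default disk_optimal = FALSE, so the file is downloaded before it is read
  local_path <- fs$get_path(remote_path, "r")

  utils::read.csv(local_path, stringsAsFactors = FALSE)
}
```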
upload(local_path, remote_path)
Upload the file from the local path to the remote path. Write only.
| Parameters |
|
| Returns | None |
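One possible use of upload() is to write a file locally and then copy it into the output dataset. The sketch below assumes the FileSystem comes from a TransformOutput; `upload_summary_csv`, `output_dataset`, and the default remote path `summary.csv` are hypothetical.

```r
# Sketch only: write a data.frame to a local temporary CSV, then upload it.
upload_summary_csv <- function(output_dataset, rdf, remote_path = "summary.csv") {
  # Write the data to a temporary file on local disk
  local_path <- tempfile(fileext = ".csv")
  utils::write.csv(rdf, local_path, row.names = FALSE)

  # Copy the local file to the given path in the output dataset
  fs <- output_dataset$fileSystem()
  fs$upload(local_path, remote_path)
  invisible(NULL)
}
```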
disk_optimal setting
In the FileSystem methods open() and get_path(), the disk_optimal argument controls how file input and output (I/O) is handled.
By default, disk_optimal is set to FALSE in both open() and get_path(). In this mode, files are guaranteed to be downloaded before they are accessed.
If you set disk_optimal to TRUE, files are downloaded while the code executes. The temporary local path must be opened via fifo() in order to be read correctly. Note that not all libraries support reading this type of file.
You may choose to set disk_optimal to TRUE when the file you are reading is very large.
For example, suppose we have a very large text file and only want to read its first 10 lines. The code below prints only the first 10 lines without reading the entire file.
```r
disk_optimal_example <- function(large_txt_file) {
  fs <- large_txt_file$fileSystem()

  ## Open a connection with fifo()
  ## The text file is titled large_txt_file.txt
  conn <- fs$open("large_txt_file.txt", "r", disk_optimal = TRUE)
  A <- readLines(conn, n = 10)
  print(A)

  return(NULL)
}
```
If you want to use an R TransformOutput to write a file and then read it back, disk_optimal must be set to FALSE.