Foundry has three products available for writing code-based data transformations: Code Workbook, Code Workspaces, and Code Repositories. While there is some feature overlap between these products, each is geared toward distinct workflows and user types. The guide below is intended to help you determine which tool is best suited to your needs.
Code Repositories is recommended for creating robust production pipelines and supporting workflows that require an additional layer of governance and scrutiny. With Code Repositories, data engineers can create efficient pipelines in bulk. Example workflows that are a good fit for Code Repositories include:
Code Workspaces is recommended for quick and efficient exploratory analyses. It provides JupyterLab® and RStudio® Workbench, combining familiar IDEs with the benefits of the Foundry platform, such as data security, branching, build scheduling, and resource management. Example workflows that are a good fit for Code Workspaces include:
Code Workbook is recommended for performing code-based analyses on data at a scale that would not be suitable for Code Workspaces. These analyses can be for one-time use or can produce an artifact that is updated on a recurring basis. Code Workbook can also be used to prototype pipelines, which can then be promoted to Code Repositories. Example workflows that are a good fit for Code Workbook include:
| | Code Repositories | Code Workspaces | Code Workbook |
---|---|---|---|
Features | Advanced pipelines | Exploratory analysis | Advanced analysis |
| | Enables complex workflows in long-lasting data pipelines with flexibility in performance optimization and code generation. | Enables interactive exploratory workflows using familiar IDEs tied with Foundry primitives. | Enables data analysis workflows with support for common analytical languages and visualization libraries. |
Languages supported | Python, SQL, Java, Mesa | Python, R | Python, R, SQL |
Environments supported | All environments | Kubernetes environments only | All environments |
Batch pipeline support | Yes | Yes | Yes |
Incremental computation | Yes | No | No |
Transform generation | Yes | No | No |
Multi-output transforms | Yes | Yes | No |
Filesystem access | Yes | Yes | Yes |
Visualization support | No | Yes | Yes |
Iteration cycle | Iterate on code logic | Iterate on data discovery and analysis | Iterate on insight generation |
| | Designed to help iterate on code logic. A runtime debugger and previews can assist in validating transform logic. Data can be analyzed in Foundry after building. | Designed to help rapidly iterate on data discovery and analysis using widely known tools that integrate with the rest of Foundry. | Designed to help generate insights from data: all transforms run on the full input data, an interactive console enables ad hoc queries, and the Spark execution model is optimized for quick iteration. |
Full data preview | Preview data sample, with the ability to pre-filter the input sample | Full data preview | Full data preview |
Debugger | Yes | No | No |
Console support | In debug mode | Yes | Yes |
Spark module management | Spark modules initiated at the job level | Spark-less environment for fast feedback loop | Spark modules kept warm for immediate interactivity, and initiated at the workbook level |
Operations | Data pipeline management | Data exploration management | Data analysis management |
| | Supports Foundry data management libraries and publishing custom Python libraries. | Fully adjustable environment that can consume pip, CRAN, and conda libraries, including those published from Code Repositories. | Can consume custom libraries published from Code Repositories; users can save pieces of logic as code templates, enabling point-and-click analysis by other users. |
Data Expectations | Yes | No | No |
Publish custom libraries | Yes | No | No |
Consume custom libraries | Yes | Yes | Yes, for some environments |
Point-and-click code templates | No | No | Yes |
Change management | Governance | Flexibility | Rapid changes |
| | Prioritizes change traceability and governance to ensure that critical pipelines remain secure and robust; offers advanced review and approval workflows and complete changelogs. | Prioritizes rapid and flexible iteration with full branching support and automatic Git versioning. | Prioritizes rapid iteration and collaboration with a lightweight branching workflow; does not require CI checks or unit testing. |
Full Git workflow | Yes | Yes | No |
Copy data after merge | No | No | Yes |
Administer and remove security markings | Yes | No | No |
Impact analysis views | Yes | No | No |
Advanced code review workflows | Yes | No | No |
Unit testing | Yes | No | No |
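
One way to use the table above is as a lookup from required capabilities to the tools that offer them. The following is a minimal sketch in plain Python, not a Foundry API: the `FEATURES` dictionary, its key names, and the `candidate_tools` helper are hypothetical names introduced for illustration, and the dictionary transcribes only a subset of the Yes/No rows above.

```python
from typing import Dict, List

# Hypothetical feature matrix transcribed from a subset of the Yes/No rows
# in the comparison table above (True = "Yes", False = "No").
FEATURES: Dict[str, Dict[str, bool]] = {
    "Code Repositories": {
        "incremental_computation": True,
        "transform_generation": True,
        "multi_output_transforms": True,
        "visualization_support": False,
        "debugger": True,
        "data_expectations": True,
        "publish_custom_libraries": True,
        "point_and_click_templates": False,
        "full_git_workflow": True,
        "unit_testing": True,
    },
    "Code Workspaces": {
        "incremental_computation": False,
        "transform_generation": False,
        "multi_output_transforms": True,
        "visualization_support": True,
        "debugger": False,
        "data_expectations": False,
        "publish_custom_libraries": False,
        "point_and_click_templates": False,
        "full_git_workflow": True,
        "unit_testing": False,
    },
    "Code Workbook": {
        "incremental_computation": False,
        "transform_generation": False,
        "multi_output_transforms": False,
        "visualization_support": True,
        "debugger": False,
        "data_expectations": False,
        "publish_custom_libraries": False,
        "point_and_click_templates": True,
        "full_git_workflow": False,
        "unit_testing": False,
    },
}


def candidate_tools(required: List[str]) -> List[str]:
    """Return the tools that support every feature named in `required`."""
    known = FEATURES["Code Repositories"].keys()
    unknown = [name for name in required if name not in known]
    if unknown:
        raise KeyError(f"Unknown feature(s): {unknown}")
    return [
        tool
        for tool, supported in FEATURES.items()
        if all(supported[name] for name in required)
    ]


if __name__ == "__main__":
    # A workflow that needs visualizations and a full Git workflow.
    print(candidate_tools(["visualization_support", "full_git_workflow"]))
```

An empty result means that no single tool covers every requested capability, which may suggest splitting the workflow, for example by prototyping in Code Workbook and promoting the pipeline to Code Repositories. The qualitative rows (iteration cycle, change management priorities, environment support) are not captured by this lookup and still need to be weighed by hand.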
JupyterLab® is a registered trademark of NumFOCUS. RStudio® is a trademark of Posit™.