Comparison: Code Repositories vs. Code Workspaces vs. Code Workbook

Foundry has three products available for writing code-based data transformations: Code Workbook, Code Workspaces, and Code Repositories. While there is some feature overlap between these products, each is geared toward distinct workflows and user types. The guide below is intended to help you determine which tool is best suited to your needs.

Code Repositories is recommended for creating robust production pipelines and supporting workflows that require an additional layer of governance and scrutiny. With Code Repositories, data engineers can create efficient pipelines in bulk. Example workflows that are a good fit for Code Repositories include:

  • A daily pipeline at high data scale which requires incremental compute.
  • A high-visibility pipeline with strict governance requirements, such as the ability to revert to previous versions of the code or to gate code changes on passing unit tests.
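
As a rough illustration of the two workflows above, the following plain-Python sketch shows the idea behind incremental compute (process only the input batches that were not seen on a previous run) together with a unit test of the kind that could gate a code change. It does not use the Foundry transforms API; `IncrementalState`, `run_incremental`, and `clean_row` are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalState:
    """Hypothetical bookkeeping: which input batches were already processed."""
    processed_batches: set = field(default_factory=set)
    output_rows: list = field(default_factory=list)


def clean_row(row: dict) -> dict:
    """Transform logic under test: normalize the name and keep the amount."""
    return {"name": row["name"].strip().lower(), "amount": row["amount"]}


def run_incremental(state: IncrementalState, batches: dict) -> int:
    """Process only batches not seen on a previous run; return rows processed."""
    processed = 0
    for batch_id in sorted(batches):
        if batch_id in state.processed_batches:
            continue  # handled on an earlier run, so skip the recompute
        for row in batches[batch_id]:
            state.output_rows.append(clean_row(row))
            processed += 1
        state.processed_batches.add(batch_id)
    return processed


def test_clean_row():
    """A unit test that a repository could require to pass before merging."""
    assert clean_row({"name": "  Alice ", "amount": 3}) == {"name": "alice", "amount": 3}
```

Calling `run_incremental` a second time with the same batches plus one new batch processes only the new batch, which is the property that keeps a daily pipeline affordable at high data scale.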

Code Workspaces is recommended for quick and efficient exploratory analyses. It brings JupyterLab® and RStudio® Workbench into Foundry, combining familiar IDEs with the benefits of the Foundry platform, such as data security, branching, build scheduling, and resource management. Example workflows that are a good fit for Code Workspaces include:

  • Running a cell-by-cell data analysis and exporting its contents to a shareable report.
  • Prototyping a data transformation pipeline or a machine learning model.

Code Workbook is recommended for performing code-based analyses on high-scale data that would not otherwise be suitable for Code Workspaces. These analyses can be for one-time use or could produce an artifact that is updated on a recurring basis. Code Workbook can also be used to prototype pipelines, which can then be promoted to repositories. Example workflows that are a good fit for Code Workbook include:

  • Investigating the results of a clinical trial by testing out different p-values.
  • Creating interactive visualizations to share with others.

Comparison summary

| | Code Repositories | Code Workspaces | Code Workbook |
| --- | --- | --- | --- |
| Features | Advanced pipelines | Exploratory analysis | Advanced analysis |
| | Enables complex workflows in long-lasting data pipelines with flexibility in performance optimization and code generation. | Enables interactive exploratory workflows using familiar IDEs tied with Foundry primitives. | Enables data analysis workflows with support for common analytical languages and visualization libraries. |
| Languages supported | Python, SQL, Java, Mesa | Python, R | Python, R, SQL |
| Environments supported | All environments | Kubernetes environments only | All environments |
| Batch pipeline support | Yes | Yes | Yes |
| Incremental computation | Yes | No | No |
| Transform generation | Yes | No | No |
| Multi-output transforms | Yes | Yes | No |
| Filesystem access | Yes | Yes | Yes |
| Visualization support | No | Yes | Yes |
| Iteration cycle | Iterate on code logic | Iterate on data discovery and analysis | Iterate on insight generation |
| | Designed to help iterate on code logic. Runtime debugger and previews can assist in validating transform logic. Data can be analyzed in Foundry after building. | Designed to help rapidly iterate on data discovery and analysis using widely known tools that seamlessly integrate with the rest of Foundry. | Designed to help generate insights from data; all transforms run on the full input data, interactive console enables ad-hoc queries, and Spark execution model is optimized for quick iteration. |
| Full data preview | Preview data sample, with the ability to pre-filter the input sample | Full data preview | Full data preview |
| Debugger | Yes | No | No |
| Console support | In debug mode | Yes | Yes |
| Spark module management | Spark modules initiated at the job level | Spark-less environment for fast feedback loop | Spark modules kept warm for immediate interactivity, and initiated at the workbook level |
| Operations | Data pipeline management | Data exploration management | Data analysis management |
| | Supports Foundry data management libraries and publishing custom Python libraries | Fully adjustable environment that can consume pip, CRAN, and conda libraries, including those published from Code Repositories | Can consume custom libraries published from Code Repositories; users can save pieces of logic as code templates, enabling point-and-click analysis by other users. |
| Data Expectations | Yes | No | No |
| Publish custom libraries | Yes | No | No |
| Consume custom libraries | Yes | Yes | Yes, for some environments |
| Point-and-click code templates | No | No | Yes |
| Change management | Governance | Flexibility | Rapid changes |
| | Prioritizes change traceability and governance to ensure that critical pipelines remain secure and robust; advanced review and approval workflows and complete changelogs. | Prioritizes rapid and flexible iteration with full branching support and automatic Git versioning. | Prioritizes rapid iteration and collaboration with a lightweight branching workflow; does not require CI checks or unit testing. |
| Full Git workflow | Yes | Yes | No |
| Copy data after merge | No | No | Yes |
| Administer and remove security markings | Yes | No | No |
| Impact analysis views | Yes | No | No |
| Advanced code review workflows | Yes | No | No |
| Unit testing | Yes | No | No |

Table summary
Code Repositories features
  • Code Repositories features advanced pipelines and enables complex workflows in long-lasting data pipelines with flexibility in performance optimization and code generation.
  • Languages supported in Code Repositories include Python, SQL, Java, and Mesa.
  • Code Repositories supports incremental computation, transform generation, multi-output transforms, and filesystem access.
  • Code Repositories does not support visualizations.
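
To make "transform generation" and "multi-output transforms" more concrete, here is a minimal plain-Python sketch of both ideas. It is not the Foundry transforms API: `make_region_filter`, `split_by_region`, and the `REGIONS` list are hypothetical names, and rows are modeled as simple dictionaries for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def make_region_filter(region: str) -> Callable[[List[Row]], List[Row]]:
    """Build one transform function for one region (hypothetical example)."""
    def transform(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row["region"] == region]
    transform.__name__ = f"filter_{region}"
    return transform


# Transform generation: build one transform per region in a loop
# instead of hand-writing each one.
REGIONS = ["emea", "apac", "amer"]
GENERATED = {region: make_region_filter(region) for region in REGIONS}


def split_by_region(rows: List[Row]) -> Dict[str, List[Row]]:
    """Multi-output step: one pass over the input, several named outputs."""
    outputs: Dict[str, List[Row]] = {region: [] for region in REGIONS}
    for row in rows:
        if row["region"] in outputs:
            outputs[row["region"]].append(row)
    return outputs
```

The factory function binds `region` once per generated transform, which avoids the common late-binding pitfall of defining closures directly inside a loop.
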
Code Workspaces features
  • Code Workspaces features quick and efficient exploratory workflows with embedded support for JupyterLab® and RStudio® Workbench in Foundry.
  • Languages supported in Code Workspaces include Python and R.
  • Code Workspaces supports filesystem access and provides full flexibility for notebook-based analyses.
  • Code Workspaces does not support distributed Spark, and is therefore better suited for data that can fit within the workspace's compute limits.
Code Workbook features
  • Code Workbook features advanced analysis and enables data analysis workflows with support for common analytical languages and visualization libraries.
  • Languages supported in Code Workbook include Python, R, and SQL.
  • Code Workbook supports filesystem access and visualization.
  • Code Workbook does not support incremental computation, transform generation, or multi-output transforms.
Code Repositories iteration cycle
  • Code Repositories is designed to help iterate on code logic. Data can be analyzed in Foundry after building.
  • Code Repositories supports data sample previews to validate transform logic, with the ability to pre-filter the input sample.
  • Code Repositories supports debugging at runtime.
  • In Code Repositories, Spark modules are initiated at the job level.
Code Workspaces iteration cycle
  • Code Workspaces is designed to help explore and analyze data. Results can then be shared, published to dashboards, turned into re-usable transforms, or exported to production-ready pipeline tools such as Code Repositories or Pipeline Builder.
  • Code Workspaces offers the full flexibility of the JupyterLab® and RStudio® Workbench IDEs, including full code and data previews.
  • Code Workspaces provides cell-by-cell iteration for instant feedback on code execution.
  • In Code Workspaces, no Spark modules are required and a fully customizable kernel is available for ad-hoc adjustments of the environment.
Code Workbook iteration cycle
  • Code Workbook is designed to help generate insights from data. All transforms run on the full input data, and Spark execution models are optimized for quick iteration.
  • Code Workbook supports full data previews.
  • Code Workbook provides console support for ad-hoc analysis of transforms.
  • In Code Workbook, Spark modules are kept warm for immediate interactivity and initiated at the workbook level.
Code Repositories operations
  • Code Repositories supports Foundry data management libraries and custom Python libraries.
  • Code Repositories supports data expectations, publishing custom libraries, and consuming custom libraries.
  • Code Repositories does not support point-and-click code templates.
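
The general idea behind data expectations (declaring conditions that output data must satisfy, and failing when they do not hold) can be sketched in plain Python as follows. This is an assumption-laden illustration, not Foundry's Data Expectations API: `ExpectationFailed` and `check_expectations` are hypothetical names, and the only rule shown is a non-null, unique key.

```python
from typing import Dict, List


class ExpectationFailed(Exception):
    """Raised when output data violates a declared expectation."""


def check_expectations(rows: List[Dict[str, object]], key: str) -> None:
    """Fail if `key` is missing, None, or duplicated in any row."""
    seen = set()
    for index, row in enumerate(rows):
        value = row.get(key)
        if value is None:
            raise ExpectationFailed(f"row {index}: {key!r} is null")
        if value in seen:
            raise ExpectationFailed(f"row {index}: duplicate {key!r} value {value!r}")
        seen.add(value)
```

Running such a check as part of a build lets a pipeline stop before bad data reaches downstream consumers, rather than relying on someone to notice the problem later.
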
Code Workspaces operations
  • Code Workspaces can consume pip, CRAN, and conda libraries, including those published from Code Repositories, and environments can be modified quickly.
  • Code Workspaces does not support data expectations or publishing custom libraries.
  • Code Workspaces does not support point-and-click code templates.
Code Workbook operations
  • Code Workbook can consume custom libraries published from Code Repositories, and users can save pieces of logic as code templates, enabling point-and-click analysis by other users.
  • Code Workbook does not support data expectations or publishing custom libraries.
  • Code Workbook can consume custom libraries, but only for some Spark environments.
  • Code Workbook supports point-and-click templates.
Code Repositories change management
  • Code Repositories prioritizes change traceability and governance to ensure that critical pipelines remain secure and robust.
  • Code Repositories provides complete changelogs.
  • Code Repositories provides a full Git workflow, security marking administration and removal, impact analysis views, advanced code review workflows, and unit testing.
  • Code Repositories does not support copying data after merging.
Code Workspaces change management
  • Code Workspaces prioritizes rapid and flexible iteration with full branching support and automatic Git versioning.
  • Code Workspaces is fully backed by Code Repositories and benefits from its full Git workflow.
  • Code Workspaces does not support copying data after merging.
  • Code Workspaces stores safe checkpoints of its notebook's contents for 30 days, allowing users to safely retain and retrieve any given state, while also providing the opportunity to permanently store backups of the code in the Git repository.
Code Workbook change management
  • Code Workbook prioritizes rapid iteration and collaboration with a lightweight branching workflow. Code Workbook does not require CI checks or unit testing.
  • Code Workbook supports copying data after merging.
  • Code Workbook does not provide a full Git workflow, security marking administration or removal, impact analysis views, advanced code review workflows, or unit testing.

JupyterLab® is a registered trademark of NumFOCUS. RStudio® is a trademark of Posit™.