Python functions are currently in a beta state and may not be available on all enrollments.
You can write Python functions in Code Repositories and reuse them across Pipeline Builder and ontology-based applications like Workshop.
If you cannot manipulate your data with existing transformation options in Pipeline Builder, want to incorporate external Python libraries, or have complex logic you want to reuse across pipelines, you can create your own Python user-defined function (UDF).
Python functions also enable you to write logic that can be executed quickly for Workshop, Slate, and other ontology-based applications to empower decision-making processes.
Python functions can be reused between Workshop and Pipeline Builder as long as both their input and output types are supported in each application. For example, objects, and functions that use objects as inputs or outputs, are not supported in Pipeline Builder.
To create a Python functions repository, first navigate to a Project in which you would like to save the repository.
Then, select New and choose Code Repository. Select Functions -> Python Functions and then Initialize repository.
Open the `my_function.py` file from the default repository template. There, you will see a function that looks like:
```python
from functions.api import function, String

@function
def my_function() -> String:
    return "Hello World!"
```
Notice that the function adheres to the following constraints:

- It is decorated with `@function` from the `functions.api` package, which is required for it to be recognized as a Python function. You may have multiple Python files with multiple functions in each file, but only the functions with this decorator will be registered as Python functions.
- Its types are declared with type annotations. Here, the return type is declared as `String` from the functions API, but it may also be declared as the corresponding Python type `str`. Even if you declare the type of an argument with the API type (for example, `String`), your function will be passed the corresponding Python type at runtime (in this example, `str`).
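A minimal sketch of what this means in practice, using a hypothetical function named `greet`: the parameter is annotated with the API type `String`, yet the body calls ordinary `str` methods on it. The `try`/`except ImportError` fallback is our own assumption, not part of the functions API; it only lets the same code run as plain Python outside a functions repository, for example in a quick local check.

```python
try:
    # Inside a Python functions repository, the real decorator and types are used.
    from functions.api import function, String
except ImportError:
    # Hypothetical stand-ins (NOT part of the functions API) so this sketch also
    # runs as plain Python outside Foundry.
    def function(fn):
        return fn

    String = str


@function
def greet(name: String) -> String:
    # `name` is annotated with the API type String, but at runtime it arrives as a
    # plain Python str, so str methods such as .strip() and .title() work directly.
    return f"Hello, {name.strip().title()}!"
```

For example, `greet("  ada lovelace ")` returns `"Hello, Ada Lovelace!"`.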
Below is the full list of currently supported functions API types, their corresponding Python types, and whether that type can be declared using its corresponding Python type instead of the functions API type:
Functions API type | Can declare as Python type? | Corresponding Python type |
---|---|---|
Array | Yes | list |
Binary | Yes | bytes |
Boolean | Yes | bool |
Byte | No | int* |
Date | Yes | datetime.date |
Decimal | Yes | decimal.Decimal |
Double | No | float* |
Float | Yes | float |
Integer | Yes | int |
Long | No | int* |
Map | Yes | dict |
Set | Yes | set |
Short | No | int* |
String | Yes | str |
Timestamp | Yes | datetime.datetime |
Although both `Integer` and `Long` correspond to the Python type `int`, any field annotated as `int` directly in your function signature will be registered with type `Integer`. Therefore, to control how integer values are registered, we recommend declaring them with the `Integer` or `Long` type from the API rather than with `int`. Similar guidelines apply to `Float` and `Double`: a field annotated with the Python type `float` is registered as `Float` by default. This is why the types marked with * in the table above cannot be declared with their corresponding Python type.
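A minimal sketch of this recommendation, using a hypothetical `millis_to_hours` function: annotating the input as `Long` and the output as `Double` registers those types explicitly, whereas plain `int` and `float` annotations would be registered as `Integer` and `Float`. As before, the import fallback is an assumption that exists only so the logic can run outside a functions repository.

```python
try:
    from functions.api import function, Double, Long
except ImportError:
    # Hypothetical stand-ins (NOT part of the functions API) for running locally.
    def function(fn):
        return fn

    Double = float
    Long = int


@function
def millis_to_hours(elapsed_millis: Long) -> Double:
    # Long (not int) registers the input as Long rather than Integer;
    # Double (not float) registers the output as Double rather than Float.
    # At runtime, elapsed_millis is still a plain Python int.
    return elapsed_millis / 3_600_000
```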
Another example function with inputs is shown below:
```python
from functions.api import function, Long, String, Timestamp

@function
def get_end_day_of_week(start_time: Timestamp, elapsed_millis: Long) -> String:
    # function logic here
    pass
```
As the table above shows, this function could also be declared using only built-in Python types (note that `elapsed_millis: int` is then registered as `Integer` rather than `Long`):
```python
from functions.api import function
from datetime import datetime

@function
def get_end_day_of_week(start_time: datetime, elapsed_millis: int) -> str:
    # function logic here
    pass
```
Or using a combination of built-in and API types:
```python
from functions.api import function, Long, String
from datetime import datetime

@function
def get_end_day_of_week(start_time: datetime, elapsed_millis: Long) -> String:
    # function logic here
    pass
```
Lastly, you could use custom types:
```python
from dataclasses import dataclass
from functions.api import Float, Integer, String, function

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: String
    unit_price: Float
    quantity_on_hand: Integer = 0

    def total_cost(self) -> Float:
        return self.unit_price * self.quantity_on_hand

@function
def custom_type_with_init_from_decorator(inventory_item: InventoryItem) -> Float:
    return inventory_item.total_cost()
```
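The `get_end_day_of_week` examples above leave the body as a stub. One possible body is sketched below, assuming the intent is to return the English name of the weekday on which `start_time` plus `elapsed_millis` milliseconds falls. That interpretation, the `DAY_NAMES` list, and the import fallback are assumptions, not part of the original example.

```python
from datetime import datetime, timedelta

try:
    from functions.api import function, Long, String, Timestamp
except ImportError:
    # Hypothetical stand-ins (NOT part of the functions API) for running locally.
    def function(fn):
        return fn

    Long = int
    String = str
    Timestamp = datetime

# Fixed English names indexed by datetime.weekday() (Monday == 0), so the result
# does not depend on the process locale the way strftime("%A") does.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


@function
def get_end_day_of_week(start_time: Timestamp, elapsed_millis: Long) -> String:
    # At runtime, start_time is a datetime.datetime and elapsed_millis is an int.
    end_time = start_time + timedelta(milliseconds=elapsed_millis)
    return DAY_NAMES[end_time.weekday()]
```

For example, `get_end_day_of_week(datetime(2024, 1, 1, 23, 0), 2 * 60 * 60 * 1000)` returns `"Tuesday"`. Because adding a `timedelta` preserves the `tzinfo` of `start_time`, the weekday is determined in whatever time zone the input timestamp carries.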
If you want to use Python libraries in your code, you can add the name of the library to the `requirements` → `run` section of the `meta.yaml` file. Alternatively, navigate to the Libraries section of the sidebar, search for the library, and select Add & install.
For your functions to be usable, you must first tag a version of them for release. Commit your work, select Tag version, and then choose a name for this version of your functions.
Once you select Tag and Release, select View in the pop-up to monitor the progress of the release. Alternatively, you can select the Branches tab, then select Tags and releases. If your release succeeds, you should be able to view all of the functions published here.
If the release fails, you can inspect the failed build to see the error that occurred.
To import your functions into Pipeline Builder, go to Reusables > User-defined functions > Import UDF. Select your function(s) from the list and select Add.
Your functions should now be visible in the transform picker alongside built-in transforms to be used in your pipeline as normal.
To use your function in Workshop, you need to deploy it. From your published functions under Tags and Releases, select Open in Ontology Manager. In Ontology Manager, select the version of the function repository you want to use in applications, then select Create and start deployment.
In Workshop, search for the function as usual. For deployed functions, a status icon showing one of three states is displayed for both the function and the function version.
Only one version of the function's repository is hosted at a given time. To make changes to functions with no downtime, we recommend adding a new function (like `function_v1`) that contains the changes, then tagging a new version as outlined above. From your published functions, under Tags and Releases, select Open in Ontology Manager. In Ontology Manager, select the version of the function repository you want to use in applications, then select Upgrade.

This allows you to have `function_v0` and `function_v1` available at the same time while you update downstream applications. When `function_v0` is no longer used, you can delete it.
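A minimal sketch of that pattern, with hypothetical function names: the old and new versions live side by side in the same repository, so both are served by the same deployment until downstream applications have moved to the `_v1` name. The import fallback is again only a local stand-in, not part of the functions API.

```python
try:
    from functions.api import function, String
except ImportError:
    # Hypothetical stand-ins (NOT part of the functions API) for running locally.
    def function(fn):
        return fn

    String = str


@function
def full_name_v0(first: String, last: String) -> String:
    # Original behavior: kept unchanged while downstream applications still call it.
    return f"{first} {last}"


@function
def full_name_v1(first: String, last: String) -> String:
    # Changed behavior under a new name. Once nothing calls full_name_v0,
    # delete it and tag a new release.
    return f"{last}, {first}"
```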
If your function is not working as expected in Workshop, first determine whether the issue lies in the function's logic or in its responsiveness. If there is an issue with the logic, inspect the source code in the backing code repository. If the function is unresponsive or throws an error, follow the steps below.
If the status icon shows Not deployed or Upgrading, hover over the function's information icon and select Configure. This takes you to Ontology Manager, where you can select Start Deployment to get your function running again.

Deploying a function allocates a cluster of nodes that serve all functions from your repository for a given version.