Debugging

This guide provides an overview of debugging techniques available in Transforms Python. More information about errors and exceptions can be found in the Python documentation.

Using the Debugger

A useful tool for debugging Python transforms is the Code Repositories Debugger. Learn more about the Debugger.

Reading a Python traceback

A traceback in Python is equivalent to a stack trace in Java. In Python, any unhandled exception results in a traceback, which includes the stack trace and an error message. Most Transforms Python runtime failures surface as tracebacks, so it is important to understand how to read them.

Consider the following code example:

    class Stats(object):

        nums = []

        def add(self, n):
            self.nums.append(n)

        def sum(self):
            return sum(self.nums)

        def avg(self):
            return self.sum() / len(self.nums)


    def main():
        stats = Statistics()
        stats.add(1)
        stats.add(2)
        stats.add(3)
        print(stats.avg())

Running this code results in the following traceback:

    Traceback (most recent call last):
      File "test.py", line 26, in <module>
        main()
      File "test.py", line 16, in main
        stats = Statistics()
    NameError: global name 'Statistics' is not defined

Unlike Java stack traces, Python tracebacks show the most recent call last. Reading from the bottom up, the traceback shows:

  • The exception name: NameError. There are many built-in Python exception classes, but it’s also possible for code to define its own exception classes.
  • The exception message: global name 'Statistics' is not defined. This message contains the most useful information for debugging purposes.
  • The sequence of function calls leading up to the raised exception: File "test.py", line 26, in <module> (the module-level call to main()), followed by File "test.py", line 16, in main, which is the line of code that raised the exception.

Using this traceback, we can see that the exception occurs at line 16 of test.py in the main function. Specifically, the line of code causing the error is stats = Statistics(), and the exception raised is NameError. From this, we can deduce that the name “Statistics” is not defined. Looking back at our example code, it appears we meant to use the name “Stats” instead of “Statistics”.
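The three pieces of information described above (exception name, message, and call sequence) can also be extracted in code with the standard library traceback module. The following is a minimal sketch, assuming Python 3 (where the exception object carries `__traceback__`). The `describe_failure` helper is hypothetical and only for illustration, and the `Stats` class is a trimmed version of the example above.

```python
import traceback


class Stats(object):
    def __init__(self):
        self.nums = []

    def add(self, n):
        self.nums.append(n)

    def avg(self):
        return sum(self.nums) / len(self.nums)


def main():
    stats = Statistics()  # deliberate bug: the class is named Stats
    stats.add(1)
    return stats.avg()


def describe_failure(func):
    """Hypothetical helper: run func and summarize the exception if it fails."""
    try:
        func()
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        # Frames are ordered oldest call first, like the printed traceback,
        # so the last entry is where the exception was raised.
        calls = [(frame.name, frame.lineno) for frame in frames]
        return type(exc).__name__, str(exc), calls
    return None


name, message, calls = describe_failure(main)
print("Exception name:", name)
print("Exception message:", message)
for func_name, lineno in calls:
    print("  in", func_name, "at line", lineno)
```

The last entry in `calls` corresponds to the bottom of a printed traceback, which is usually the first place to look.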

Logging

Use the standard Python logging module.

See the section on reading logs for more details.

Only INFO-level logs and higher are saved.

The following code example shows how you can output logs to help with debugging:

    from transforms.api import transform_df, Input, Output
    from myproject.datasets import utils
    import logging

    log = logging.getLogger(__name__)


    @transform_df(
        Output("/path/to/output/dataset"),
        my_input=Input("/path/to/input/dataset"),
    )
    def my_compute_function(my_input):
        log.info("Input size: %d", my_input.count())
        return my_input
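To see how log levels interact with a threshold like the INFO cutoff mentioned above, the behavior can be approximated locally with the standard logging module alone. This is a sketch under stated assumptions: `ListHandler`, the logger name, and `safe_ratio` are hypothetical and stand in for whatever the platform does when it saves logs. It also shows `log.exception`, which records the current traceback alongside the message at ERROR level.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical in-memory handler that mimics an INFO-and-above cutoff."""

    def __init__(self):
        super().__init__(level=logging.INFO)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


log = logging.getLogger("debugging_example")
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the example's output confined to our handler
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log.addHandler(handler)


def safe_ratio(numerator, denominator):
    # DEBUG is below the handler's INFO threshold, so this record is dropped.
    log.debug("Computing %s / %s", numerator, denominator)
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # log.exception logs at ERROR level and appends the traceback.
        log.exception("Could not compute ratio for denominator=%s", denominator)
        return None


log.info("Ratio: %s", safe_ratio(6, 3))
safe_ratio(1, 0)

for message in handler.messages:
    print(message)
```

Only the INFO and ERROR records reach the handler. The ERROR record includes the full traceback text, which can then be read as described in the traceback section.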