Analyzer Interface

analyzer_interface

column_automap

check_name_hint(name, hint)

Returns true if every word in the hint (split by spaces) is present in the name, in a case-insensitive manner.

Source code in analyzer_interface/column_automap.py (lines 52-57)
def check_name_hint(name: str, hint: str):
    """
    Returns true if every word in the hint (split by spaces) is present in the name,
    in a case insensitive manner.
    """
    return all(word.lower().strip() in name.lower() for word in hint.split(" "))
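
For illustration, a couple of hypothetical calls (the column names are made up):

from analyzer_interface.column_automap import check_name_hint

check_name_hint("Author Screen Name", "author name")  # True: both "author" and "name" appear
check_name_hint("post_url", "author name")            # False: "author" does not appear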

column_automap(user_columns, input_schema_columns)

Matches user-provided columns to the expected columns based on the name hints.

The resulting dictionary is keyed by the expected input column name.

Source code in analyzer_interface/column_automap.py (lines 12-49)
def column_automap(
    user_columns: list[UserInputColumn], input_schema_columns: list[InputColumn]
):
    """
    Matches user-provided columns to the expected columns based on the name hints.

    The resulting dictionary is keyed by the expected input column name.
    """
    matches: dict[str, str] = {}
    for input_column in input_schema_columns:
        max_score = None
        best_match_user_column = None
        for user_column in user_columns:
            current_score = get_data_type_compatibility_score(
                input_column.data_type, user_column.data_type
            )

            # Don't consider type-incompatible columns
            if current_score is None:
                continue

            # Boost the score if we have a name hint match such that
            # - among similarly compatible matches, those with name hints are preferred
            # - among name hint matches, those with the best data type compatibility are preferred
            if any(
                check_name_hint(user_column.name, hint)
                for hint in input_column.name_hints
            ):
                current_score += 10

            if max_score is None or current_score > max_score:
                max_score = current_score
                best_match_user_column = user_column

        if best_match_user_column is not None:
            matches[input_column.name] = best_match_user_column.name

    return matches
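
A minimal sketch of the automapping with made-up columns. The exact constructor fields and the module that defines UserInputColumn are assumptions here, but the resulting mapping follows the scoring functions shown on this page:

from analyzer_interface.column_automap import column_automap
from analyzer_interface.interface import InputColumn
# UserInputColumn is assumed importable from the analyzer_interface package;
# its exact module is not shown on this page.

user_columns = [
    UserInputColumn(name="tweet_id", data_type="identifier"),
    UserInputColumn(name="author screen name", data_type="text"),
]
expected_columns = [
    InputColumn(name="message_id", data_type="identifier", name_hints=["tweet id", "post id"]),
    InputColumn(name="author", data_type="text", name_hints=["author name", "screen name"]),
]

# Each expected column finds a type-compatible user column whose name matches a hint:
# {"message_id": "tweet_id", "author": "author screen name"}
print(column_automap(user_columns, expected_columns))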

context

AssetsReader

Bases: ABC

Source code in analyzer_interface/context.py (lines 116-122)
class AssetsReader(ABC):
    @abstractmethod
    def table(self, output_id: str) -> "TableReader":
        """
        Gets the table reader for the specified output.
        """
        pass
table(output_id) abstractmethod

Gets the table reader for the specified output.

Source code in analyzer_interface/context.py (lines 117-122)
@abstractmethod
def table(self, output_id: str) -> "TableReader":
    """
    Gets the table reader for the specified output.
    """
    pass

BaseDerivedModuleContext

Bases: ABC, BaseModel

Common interface for the runtime contexts of secondary analyzers and web presenters.

Source code in analyzer_interface/context.py (lines 48-84)
class BaseDerivedModuleContext(ABC, BaseModel):
    """
    Common interface for the runtime contexts of secondary analyzers and web presenters.
    """

    temp_dir: str
    """
  Gets the temporary directory that the module can freely write content to
  during its lifetime. This directory will not persist between runs.
  """

    @property
    @abstractmethod
    def base_params(self) -> dict[str, ParamValue]:
        """
        Gets the primary analysis parameters.
        """
        pass

    @property
    @abstractmethod
    def base(self) -> "AssetsReader":
        """
        Gets the base primary analyzer's context, which lets you inspect and load its
        outputs.
        """
        pass

    @abstractmethod
    def dependency(
        self, secondary_interface: SecondaryAnalyzerInterface
    ) -> "AssetsReader":
        """
        Gets the context of a secondary analyzer the current module depends on, which
        lets you inspect and load its outputs.
        """
        pass
base abstractmethod property

Gets the base primary analyzer's context, which lets you inspect and load its outputs.

base_params abstractmethod property

Gets the primary analysis parameters.

temp_dir instance-attribute

Gets the temporary directory that the module can freely write content to during its lifetime. This directory will not persist between runs.

dependency(secondary_interface) abstractmethod

Gets the context of a secondary analyzer the current module depends on, which lets you inspect and load its outputs.

Source code in analyzer_interface/context.py (lines 76-84)
@abstractmethod
def dependency(
    self, secondary_interface: SecondaryAnalyzerInterface
) -> "AssetsReader":
    """
    Gets the context of a secondary analyzer the current module depends on, which
    lets you inspect and load its outputs.
    """
    pass

FactoryOutputContext

Bases: BaseModel

Output interface for both factory and api_factory functions for web presenters.

Source code in analyzer_interface/context.py (lines 187-209)
class FactoryOutputContext(BaseModel):
    """
    Output interface for both factory and api_factory functions for web
    presenters.
    """

    shiny: Optional[ShinyContext] = None
    """
    Factory output for Shiny dashboards
    """

    api: Optional[dict[str, Any]] = None
    """
    API factory output for React dashboard REST API
    """

    data_frames: Optional[dict[str, DataFrame]] = None
    """
    API factory dataframe output for React dashboard REST API
    """

    class Config:
        arbitrary_types_allowed = True
api = None class-attribute instance-attribute

API factory output for React dashboard REST API

data_frames = None class-attribute instance-attribute

API factory dataframe output for React dashboard REST API

shiny = None class-attribute instance-attribute

Factory output for Shiny dashboards
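
A rough sketch of what a web presenter factory might return. The panel contents, the server handler body, and the summary data frame are placeholders, and treating DataFrame as a polars frame is an assumption:

import polars as pl
from shiny import ui
from analyzer_interface.context import FactoryOutputContext, ShinyContext

def server_handler(input, output, session):
    # Assumed ServerCallback signature; register reactive outputs here.
    ...

def factory(context) -> FactoryOutputContext:
    summary_df = pl.DataFrame({"metric": ["posts"], "value": [123]})  # placeholder data
    return FactoryOutputContext(
        shiny=ShinyContext(
            panel=ui.nav_panel("My Presenter", "Panel content goes here"),
            server_handler=server_handler,
        ),
        data_frames={"summary": summary_df},
    )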

InputTableReader

Bases: TableReader

Source code in analyzer_interface/context.py (lines 139-149)
class InputTableReader(TableReader):
    @abstractmethod
    def preprocess[
        PolarsDataFrameLike
    ](self, df: PolarsDataFrameLike) -> PolarsDataFrameLike:
        """
        Given the manually loaded user input dataframe, apply column mapping and
        semantic transformations to give the input dataframe that the analyzer
        expects.
        """
        pass
preprocess(df) abstractmethod

Given the manually loaded user input dataframe, apply column mapping and semantic transformations to give the input dataframe that the analyzer expects.

Source code in analyzer_interface/context.py (lines 140-149)
@abstractmethod
def preprocess[
    PolarsDataFrameLike
](self, df: PolarsDataFrameLike) -> PolarsDataFrameLike:
    """
    Given the manually loaded user input dataframe, apply column mapping and
    semantic transformations to give the input dataframe that the analyzer
    expects.
    """
    pass

PrimaryAnalyzerContext

Bases: ABC, BaseModel

Source code in analyzer_interface/context.py (lines 15-45)
class PrimaryAnalyzerContext(ABC, BaseModel):
    temp_dir: str
    """
  Gets the temporary directory that the module can freely write content to
  during its lifetime. This directory will not persist between runs.
  """

    @abstractmethod
    def input(self) -> "InputTableReader":
        """
        Gets the input reader context.

        **Note that this is in function form** even though one input is expected,
        in anticipation that we may want to support multiple inputs in the future.
        """
        pass

    @property
    @abstractmethod
    def params(self) -> dict[str, ParamValue]:
        """
        Gets the analysis parameters.
        """
        pass

    @abstractmethod
    def output(self, output_id: str) -> "TableWriter":
        """
        Gets the output writer context for the specified output ID.
        """
        pass
params abstractmethod property

Gets the analysis parameters.

temp_dir instance-attribute

Gets the temporary directory that the module can freely write content to during its lifetime. This directory will not persist between runs.

input() abstractmethod

Gets the input reader context.

Note that this is in function form even though one input is expected, in anticipation that we may want to support multiple inputs in the future.

Source code in analyzer_interface/context.py (lines 22-30)
@abstractmethod
def input(self) -> "InputTableReader":
    """
    Gets the input reader context.

    **Note that this is in function form** even though one input is expected,
    in anticipation that we may want to support multiple inputs in the future.
    """
    pass
output(output_id) abstractmethod

Gets the output writer context for the specified output ID.

Source code in analyzer_interface/context.py (lines 40-45)
@abstractmethod
def output(self, output_id: str) -> "TableWriter":
    """
    Gets the output writer context for the specified output ID.
    """
    pass
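
Putting these pieces together, a primary analyzer entry point might look like the following sketch; the column names, parameter ID, and output ID are made up:

import polars as pl

def main(context: PrimaryAnalyzerContext):
    input_reader = context.input()
    df = input_reader.preprocess(pl.read_parquet(input_reader.parquet_path))

    min_count = context.params.get("min_count", 1)  # hypothetical parameter
    result = (
        df.group_by("author_id")  # hypothetical input column
        .len()
        .filter(pl.col("len") >= min_count)
    )

    # The output ID must match one declared in the analyzer's interface.
    result.write_parquet(context.output("author_counts").parquet_path)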

SecondaryAnalyzerContext

Bases: BaseDerivedModuleContext

Source code in analyzer_interface/context.py (lines 107-113)
class SecondaryAnalyzerContext(BaseDerivedModuleContext):
    @abstractmethod
    def output(self, output_id: str) -> "TableWriter":
        """
        Gets the output writer context
        """
        pass
output(output_id) abstractmethod

Gets the output writer context

Source code in analyzer_interface/context.py (lines 108-113)
@abstractmethod
def output(self, output_id: str) -> "TableWriter":
    """
    Gets the output writer context
    """
    pass
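
A secondary analyzer reads from its base primary analyzer (and any declared secondary dependencies) and writes its own outputs; a sketch with made-up IDs and columns:

import polars as pl

def main(context: SecondaryAnalyzerContext):
    # Read an output of the base primary analyzer (output ID is hypothetical).
    base_df = pl.read_parquet(context.base.table("author_counts").parquet_path)

    top = base_df.sort("len", descending=True).head(10)  # hypothetical column
    top.write_parquet(context.output("top_authors").parquet_path)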

ShinyContext

Bases: BaseModel

Output interface for Shiny dashboards

Source code in analyzer_interface/context.py (lines 168-184)
class ShinyContext(BaseModel):
    """
    Output interface for Shiny dashboards
    """

    panel: NavPanel = None
    """
    UI navigation panel to be added to shiny dashboard
    """

    server_handler: Optional[ServerCallback] = None
    """
    Server handler callback to be called by the shiny application instance
    """

    class Config:
        arbitrary_types_allowed = True
panel = None class-attribute instance-attribute

UI navigation panel to be added to shiny dashboard

server_handler = None class-attribute instance-attribute

Server handler callback to be called by the shiny application instance

TableReader

Bases: ABC

Source code in analyzer_interface/context.py (lines 125-133)
class TableReader(ABC):
    @property
    @abstractmethod
    def parquet_path(self) -> str:
        """
        Gets the path to the table's parquet file. The module should expect a parquet
        file here.
        """
        pass
parquet_path abstractmethod property

Gets the path to the table's parquet file. The module should expect a parquet file here.

TableWriter

Bases: ABC

Source code in analyzer_interface/context.py (lines 152-160)
class TableWriter(ABC):
    @property
    @abstractmethod
    def parquet_path(self) -> str:
        """
        Gets the path to the table's parquet file. The module should write a parquet
        file to it.
        """
        pass
parquet_path abstractmethod property

Gets the path to the table's parquet file. The module should write a parquet file to it.

WebPresenterContext

Bases: BaseDerivedModuleContext

Source code in analyzer_interface/context.py (lines 87-104)
class WebPresenterContext(BaseDerivedModuleContext):
    dash_app: Dash
    """
  The Dash app that is being built.
  """

    @property
    @abstractmethod
    def state_dir(self) -> str:
        """
        Gets the directory where the web presenter can store state that persists
        between runs. This state space is unique for each
        project/primary analyzer/web presenter combination.
        """
        pass

    class Config:
        arbitrary_types_allowed = True
dash_app instance-attribute

The Dash app that is being built.

state_dir abstractmethod property

Gets the directory where the web presenter can store state that persists between runs. This state space is unique for each project/primary analyzer/web presenter combination.

data_type_compatibility

data_type_mapping_preference = {'text': [['text'], ['identifier', 'url']], 'integer': [['integer']], 'float': [['float', 'integer']], 'boolean': [['boolean']], 'datetime': [['datetime']], 'time': [['time'], ['datetime']], 'identifier': [['identifier'], ['url', 'datetime'], ['integer'], ['text']], 'url': [['url']]} module-attribute

For each data type, a list of lists of data types that are considered compatible with it. The first list is the most preferred, the last list is the least. The items in each list are considered equally compatible.

get_data_type_compatibility_score(expected_data_type, actual_data_type)

Returns a score for the compatibility of the actual data type with the expected data type. Higher (less negative) scores are better. None means the data types are not compatible.

Source code in analyzer_interface/data_type_compatibility.py (lines 20-37)
def get_data_type_compatibility_score(
    expected_data_type: DataType, actual_data_type: DataType
):
    """
    Returns a score for the compatibility of the actual data type with the
    expected data type. Higher (less negative) scores are better.
    `None` means the data types are not compatible.
    """
    if expected_data_type == actual_data_type:
        return 0

    for i, preference_list in enumerate(
        data_type_mapping_preference[expected_data_type]
    ):
        if actual_data_type in preference_list:
            return -(i + 1)

    return None
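
For example, following the preference table above:

from analyzer_interface.data_type_compatibility import get_data_type_compatibility_score

get_data_type_compatibility_score("text", "text")        # 0    (exact match)
get_data_type_compatibility_score("text", "identifier")  # -2   (second preference group for "text")
get_data_type_compatibility_score("identifier", "text")  # -4   (fourth preference group for "identifier")
get_data_type_compatibility_score("integer", "text")     # None (not compatible)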

declaration

AnalyzerDeclaration

Bases: AnalyzerInterface

Source code in analyzer_interface/declaration.py (lines 17-52)
class AnalyzerDeclaration(AnalyzerInterface):
    entry_point: Callable[[PrimaryAnalyzerContext], None]
    default_params: Callable[[PrimaryAnalyzerContext], dict[str, ParamValue]]
    is_distributed: bool

    def __init__(
        self,
        interface: AnalyzerInterface,
        main: Callable,
        *,
        is_distributed: bool = False,
        default_params: Callable[[PrimaryAnalyzerContext], dict[str, ParamValue]] = (
            lambda _: dict()
        )
    ):
        """Creates a primary analyzer declaration

        Args:
          interface (AnalyzerInterface): The metadata interface for the primary analyzer.

          main (Callable):
            The entry point function for the primary analyzer. This function should
            take a single argument of type `PrimaryAnalyzerContext` and should ensure
            that the outputs specified in the interface are generated.

          is_distributed (bool):
            Set this explicitly to `True` once the analyzer is ready to be shipped
            to end users; it will make the analyzer available in the distributed
            executable.
        """
        super().__init__(
            **interface.model_dump(),
            entry_point=main,
            default_params=default_params,
            is_distributed=is_distributed
        )
__init__(interface, main, *, is_distributed=False, default_params=lambda _: dict())

Creates a primary analyzer declaration

Parameters:

interface (AnalyzerInterface, required)
    The metadata interface for the primary analyzer.

main (Callable, required)
    The entry point function for the primary analyzer. This function should take a single argument of type PrimaryAnalyzerContext and should ensure that the outputs specified in the interface are generated.

is_distributed (bool, default False)
    Set this explicitly to True once the analyzer is ready to be shipped to end users; it will make the analyzer available in the distributed executable.
Source code in analyzer_interface/declaration.py (lines 22-52)
def __init__(
    self,
    interface: AnalyzerInterface,
    main: Callable,
    *,
    is_distributed: bool = False,
    default_params: Callable[[PrimaryAnalyzerContext], dict[str, ParamValue]] = (
        lambda _: dict()
    )
):
    """Creates a primary analyzer declaration

    Args:
      interface (AnalyzerInterface): The metadata interface for the primary analyzer.

      main (Callable):
        The entry point function for the primary analyzer. This function should
        take a single argument of type `PrimaryAnalyzerContext` and should ensure
        that the outputs specified in the interface are generated.

      is_distributed (bool):
        Set this explicitly to `True` once the analyzer is ready to be shipped
        to end users; it will make the analyzer available in the distributed
        executable.
    """
    super().__init__(
        **interface.model_dump(),
        entry_point=main,
        default_params=default_params,
        is_distributed=is_distributed
    )
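
A sketch of a declaration; `interface` is assumed to be an AnalyzerInterface built elsewhere, and `main` is an entry point like the PrimaryAnalyzerContext sketch earlier on this page:

from analyzer_interface.declaration import AnalyzerDeclaration

analyzer = AnalyzerDeclaration(
    interface=interface,                               # AnalyzerInterface defined elsewhere
    main=main,                                         # entry point taking a PrimaryAnalyzerContext
    default_params=lambda _context: {"min_count": 5},  # hypothetical parameter default
    is_distributed=False,                              # flip to True when ready to ship
)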

SecondaryAnalyzerDeclaration

Bases: SecondaryAnalyzerInterface

Source code in analyzer_interface/declaration.py (lines 55-69)
class SecondaryAnalyzerDeclaration(SecondaryAnalyzerInterface):
    entry_point: Callable[["SecondaryAnalyzerContext"], None]

    def __init__(self, interface: SecondaryAnalyzerInterface, main: Callable):
        """Creates a secondary analyzer declaration

        Args:
          interface (SecondaryAnalyzerInterface): The metadata interface for the secondary analyzer.

          main (Callable):
            The entry point function for the secondary analyzer. This function should
            take a single argument of type `SecondaryAnalyzerContext` and should ensure
            that the outputs specified in the interface are generated.
        """
        super().__init__(**interface.model_dump(), entry_point=main)
__init__(interface, main)

Creates a secondary analyzer declaration

Parameters:

interface (SecondaryAnalyzerInterface, required)
    The metadata interface for the secondary analyzer.

main (Callable, required)
    The entry point function for the secondary analyzer. This function should take a single argument of type SecondaryAnalyzerContext and should ensure that the outputs specified in the interface are generated.
Source code in analyzer_interface/declaration.py (lines 58-69)
def __init__(self, interface: SecondaryAnalyzerInterface, main: Callable):
    """Creates a secondary analyzer declaration

    Args:
      interface (SecondaryAnalyzerInterface): The metadata interface for the secondary analyzer.

      main (Callable):
        The entry point function for the secondary analyzer. This function should
        take a single argument of type `SecondaryAnalyzerContext` and should ensure
        that the outputs specified in the interface are generated.
    """
    super().__init__(**interface.model_dump(), entry_point=main)
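
Declaring a secondary analyzer follows the same pattern; `secondary_interface` is assumed to be a SecondaryAnalyzerInterface defined elsewhere:

from analyzer_interface.declaration import SecondaryAnalyzerDeclaration

secondary = SecondaryAnalyzerDeclaration(
    interface=secondary_interface,  # SecondaryAnalyzerInterface defined elsewhere
    main=main,                      # entry point taking a SecondaryAnalyzerContext
)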

WebPresenterDeclaration

Bases: WebPresenterInterface

Source code in analyzer_interface/declaration.py (lines 72-111)
class WebPresenterDeclaration(WebPresenterInterface):
    factory: Callable[["WebPresenterContext"], Union[FactoryOutputContext, None]]
    shiny: bool
    server_name: str

    def __init__(
        self,
        interface: WebPresenterInterface,
        factory: Callable,
        name: str,
        shiny: bool,
    ):
        """Creates a web presenter declaration

        Args:
          interface (WebPresenterInterface): The metadata interface for the web presenter.

          factory (Callable):
            The factory function that creates a Dash app for the web presenter. It should
            modify the Dash app in the context to add whatever plotting interface
            the web presenter needs.

          server_name (str):
            The server name for the Dash app. Typically, you will use the global
            variable `__name__` here.

            If your web presenter has assets like images, CSS or JavaScript files,
            you can put them in a folder named `assets` in the same directory
            as the file where `__name__` is used. The Dash app will serve these
            files at the `/assets/` URL, using the python module name in `__name__`
            to determine the absolute path to the assets folder.

            See Dash documentation for more details: https://dash.plotly.com
            See also Python documentation for the `__name__` variable:
            https://docs.python.org/3/tutorial/modules.html

        """
        super().__init__(
            **interface.model_dump(), factory=factory, server_name=name, shiny=shiny
        )
__init__(interface, factory, name, shiny)

Creates a web presenter declaration

Parameters:

interface (WebPresenterInterface, required)
    The metadata interface for the web presenter.

factory (Callable, required)
    The factory function that creates a Dash app for the web presenter. It should modify the Dash app in the context to add whatever plotting interface the web presenter needs.

server_name (str, required)
    The server name for the Dash app. Typically, you will use the global variable __name__ here.

    If your web presenter has assets like images, CSS or JavaScript files, you can put them in a folder named assets in the same directory as the file where __name__ is used. The Dash app will serve these files at the /assets/ URL, using the Python module name in __name__ to determine the absolute path to the assets folder.

    See the Dash documentation for more details: https://dash.plotly.com
    See also the Python documentation for the __name__ variable: https://docs.python.org/3/tutorial/modules.html
Source code in analyzer_interface/declaration.py (lines 77-111)
def __init__(
    self,
    interface: WebPresenterInterface,
    factory: Callable,
    name: str,
    shiny: bool,
):
    """Creates a web presenter declaration

    Args:
      interface (WebPresenterInterface): The metadata interface for the web presenter.

      factory (Callable):
        The factory function that creates a Dash app for the web presenter. It should
        modify the Dash app in the context to add whatever plotting interface
        the web presenter needs.

      server_name (str):
        The server name for the Dash app. Typically, you will use the global
        variable `__name__` here.

        If your web presenter has assets like images, CSS or JavaScript files,
        you can put them in a folder named `assets` in the same directory
        as the file where `__name__` is used. The Dash app will serve these
        files at the `/assets/` URL, using the python module name in `__name__`
        to determine the absolute path to the assets folder.

        See Dash documentation for more details: https://dash.plotly.com
        See also Python documentation for the `__name__` variable:
        https://docs.python.org/3/tutorial/modules.html

    """
    super().__init__(
        **interface.model_dump(), factory=factory, server_name=name, shiny=shiny
    )
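
A sketch of a web presenter declaration; `web_interface` is assumed to be a WebPresenterInterface defined elsewhere, and the Dash layout is a placeholder:

from dash import html
from analyzer_interface.declaration import WebPresenterDeclaration

def factory(context):
    # Add a placeholder layout to the Dash app being built.
    context.dash_app.layout = html.Div("Hello from the presenter")

presenter = WebPresenterDeclaration(
    interface=web_interface,  # WebPresenterInterface defined elsewhere
    factory=factory,
    name=__name__,            # stored as server_name; an assets/ folder next to this file is served at /assets/
    shiny=False,
)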

interface

DataType = Literal['text', 'integer', 'float', 'boolean', 'datetime', 'identifier', 'url', 'time'] module-attribute

The semantic data type for a data column. This is not quite the same as structural data types like polars or pandas or even arrow types, but they represent how the data is intended to be interpreted.

  • text is expected to be free-form, human-readable text content.
  • integer and float are meant to be manipulated arithmetically.
  • boolean is a binary value.
  • datetime represents time and is meant to be manipulated as a time value.
  • time represents time within a day, not including the date information.
  • identifier is a unique identifier for a record. It is not expected to be manipulated in any way.
  • url is a string that represents a URL.

AnalyzerInterface

Bases: BaseAnalyzerInterface

Source code in analyzer_interface/interface.py (lines 122-138)
class AnalyzerInterface(BaseAnalyzerInterface):
    input: AnalyzerInput
    """
  Specifies the input data schema for the analyzer.
  """

    params: list[AnalyzerParam] = []
    """
  A list of parameters that the analyzer accepts.
  """

    outputs: list["AnalyzerOutput"]
    """
  Specifies the output data schema for the analyzer.
  """

    kind: Literal["primary"] = "primary"
input instance-attribute

Specifies the input data schema for the analyzer.

outputs instance-attribute

Specifies the output data schema for the analyzer.

params = [] class-attribute instance-attribute

A list of parameters that the analyzer accepts.

AnalyzerOutput

Bases: BaseModel

Source code in analyzer_interface/interface.py (lines 85-119)
class AnalyzerOutput(BaseModel):
    id: str
    """
  Uniquely identifies the output data schema for the analyzer. The analyzer
  must include this key in the output dictionary.
  """

    name: str
    """The human-friendly for the output."""

    description: Optional[str] = None

    columns: list["OutputColumn"]

    internal: bool = False

    def get_column_by_name(self, name: str):
        for column in self.columns:
            if column.name == name:
                return column
        return None

    def transform_output(self, output_df: pl.LazyFrame | pl.DataFrame):
        output_columns = output_df.lazy().collect_schema().names()
        return output_df.select(
            [
                pl.col(col_name).alias(
                    output_spec.human_readable_name_or_fallback()
                    if output_spec
                    else col_name
                )
                for col_name in output_columns
                if (output_spec := self.get_column_by_name(col_name)) or True
            ]
        )
id instance-attribute

Uniquely identifies the output data schema for the analyzer. The analyzer must include this key in the output dictionary.

name instance-attribute

The human-friendly name for the output.
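
For illustration, transform_output renames known columns to their human-readable names; the exact OutputColumn constructor fields used below are assumptions based on the methods referenced above:

import polars as pl

# Hypothetical output spec; OutputColumn fields are assumed.
output = AnalyzerOutput(
    id="author_counts",
    name="Author counts",
    columns=[
        OutputColumn(name="author_id", data_type="identifier", human_readable_name="Author ID"),
        OutputColumn(name="len", data_type="integer", human_readable_name="Post count"),
    ],
)

df = pl.DataFrame({"author_id": ["a", "b"], "len": [3, 1]})
print(output.transform_output(df))  # columns become "Author ID" and "Post count"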

AnalyzerParam

Bases: BaseModel

Source code in analyzer_interface/interface.py (lines 42-82)
class AnalyzerParam(BaseModel):
    id: str
    """
    The name of the parameter. This becomes the key in the parameters dictionary
    that is passed to the analyzer.
    """

    human_readable_name: Optional[str] = None
    """
    The human-friendly name for the parameter. This is used in the UI to
    represent the parameter.
    """

    description: Optional[str] = None
    """
    A short description of the parameter. This is used in the UI to represent
    the parameter.
    """

    type: ParamType
    """
    The type of the parameter. This is used for validation and for customizing
    the UX for parameter input.
    """

    default: Optional[ParamValue] = None
    """
    Optional: define a static default value for this parameter. A parameter
    without a default will need to be chosen explicitly by the user.
    """

    backfill_value: Optional[ParamValue] = None
    """
    Recommended if this is a parameter that is newly introduced in a previously
    released analyzer. The backfill shows what this parameter was before it
    became customizable.
    """

    @property
    def print_name(self):
        return self.human_readable_name or self.id
backfill_value = None class-attribute instance-attribute

Recommended if this is a parameter that is newly introduced in a previously released analyzer. The backfill shows what this parameter was before it became customizable.

default = None class-attribute instance-attribute

Optional: define a static default value for this parameter. A parameter without a default will need to be chosen explicitly by the user.

description = None class-attribute instance-attribute

A short description of the parameter. This is used in the UI to represent the parameter.

human_readable_name = None class-attribute instance-attribute

The human-friendly name for the parameter. This is used in the UI to represent the parameter.

id instance-attribute

The name of the parameter. This becomes the key in the parameters dictionary that is passed to the analyzer.

type instance-attribute

The type of the parameter. This is used for validation and for customizing the UX for parameter input.
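
A hypothetical parameter declaration using the param types documented later on this page:

from analyzer_interface.interface import AnalyzerParam
from analyzer_interface.params import IntegerParam

min_count_param = AnalyzerParam(
    id="min_count",
    human_readable_name="Minimum post count",
    description="Authors with fewer posts than this are excluded.",
    type=IntegerParam(min=1, max=1000),
    default=5,
)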

BaseAnalyzerInterface

Bases: BaseModel

Source code in analyzer_interface/interface.py (lines 9-35)
class BaseAnalyzerInterface(BaseModel):
    id: str
    """
  The static ID for the analyzer that, with the version, uniquely identifies the
  analyzer and will be stored as metadata as part of the output data.
  """

    version: str
    """
  The version ID for the analyzer. In future, we may choose to support output
  migration between versions of the same analyzer.
  """

    name: str
    """
  The short human-readable name of the analyzer.
  """

    short_description: str
    """
  A short, one-liner description of what the analyzer does.
  """

    long_description: Optional[str] = None
    """
  A longer description of what the analyzer does that will be shown separately.
  """
id instance-attribute

The static ID for the analyzer that, with the version, uniquely identifies the analyzer and will be stored as metadata as part of the output data.

long_description = None class-attribute instance-attribute

A longer description of what the analyzer does that will be shown separately.

name instance-attribute

The short human-readable name of the analyzer.

short_description instance-attribute

A short, one-liner description of what the analyzer does.

version instance-attribute

The version ID for the analyzer. In future, we may choose to support output migration between versions of the same analyzer.

DerivedAnalyzerInterface

Bases: BaseAnalyzerInterface

Source code in analyzer_interface/interface.py (lines 141-154)
class DerivedAnalyzerInterface(BaseAnalyzerInterface):
    base_analyzer: AnalyzerInterface
    """
  The base analyzer that this secondary analyzer extends. This is always a primary
  analyzer. If your module depends on other secondary analyzers (which must have
  the same base analyzer), you can specify them in the `depends_on` field.
  """

    depends_on: list["SecondaryAnalyzerInterface"] = []
    """
  A list of secondary analyzers that must be run before this secondary analyzer
  is run. These secondary analyzers must have the same primary base.
  """
base_analyzer instance-attribute

The base analyzer that this secondary analyzer extends. This is always a primary analyzer. If your module depends on other secondary analyzers (which must have the same base analyzer), you can specify them in the depends_on field.

depends_on = [] class-attribute instance-attribute

A list of secondary analyzers that must be run before this secondary analyzer is run. These secondary analyzers must have the same primary base.

InputColumn

Bases: Column

Source code in analyzer_interface/interface.py (lines 198-207)
class InputColumn(Column):
    name_hints: list[str] = []
    """
  Specifies a list of space-separated words that are likely to be found in the
  column name of the user-provided data. This is used to help the user map the
  input columns to the expected columns.

  Any individual hint matching is sufficient for a match to be called. The hint
  in turn is matched if every word matches some part of the column name.
  """
name_hints = [] class-attribute instance-attribute

Specifies a list of space-separated words that are likely to be found in the column name of the user-provided data. This is used to help the user map the input columns to the expected columns.

Any individual hint matching is sufficient for a match to be called. The hint in turn is matched if every word matches some part of the column name.
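
For example, a hypothetical expected column; any one of these hints matching the user's column name is enough for the automapper to call it a match:

from analyzer_interface.interface import InputColumn

author_column = InputColumn(
    name="author",
    data_type="identifier",
    name_hints=["author id", "user id", "screen name"],
)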

SecondaryAnalyzerInterface

Bases: DerivedAnalyzerInterface

Source code in analyzer_interface/interface.py (lines 157-163)
class SecondaryAnalyzerInterface(DerivedAnalyzerInterface):
    outputs: list[AnalyzerOutput]
    """
  Specifies the output data schema for the analyzer.
  """

    kind: Literal["secondary"] = "secondary"
outputs instance-attribute

Specifies the output data schema for the analyzer.

params

IntegerParam

Bases: BaseModel

Represents an integer value

The corresponding value will be of type int.

Source code in analyzer_interface/params.py (lines 16-25)
class IntegerParam(BaseModel):
    """
    Represents an integer value

    The corresponding value will be of type `int`.
    """

    type: Literal["integer"] = "integer"
    min: int
    max: int

TimeBinningParam

Bases: BaseModel

Represents a time bin.

The corresponding value will be of type TimeBinningValue.

Source code in analyzer_interface/params.py (lines 28-35)
class TimeBinningParam(BaseModel):
    """
    Represents a time bin.

    The corresponding value will be of type `TimeBinningValue`.
    """

    type: Literal["time_binning"] = "time_binning"

TimeBinningValue

Bases: BaseModel

Source code in analyzer_interface/params.py (lines 38-87)
class TimeBinningValue(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    unit: TimeBinningUnit
    amount: int

    def to_polars_truncate_spec(self) -> str:
        """
        Converts the value to a string that can be used in Polars truncate spec.
        See https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html
        """
        amount = self.amount
        unit = self.unit
        if unit == "year":
            return f"{amount}y"
        if unit == "month":
            return f"{amount}mo"
        if unit == "week":
            return f"{amount}w"
        if unit == "day":
            return f"{amount}d"
        if unit == "hour":
            return f"{amount}h"
        if unit == "minute":
            return f"{amount}m"
        if unit == "second":
            return f"{amount}s"

        raise ValueError("Invalid time binning value")

    def to_human_readable_text(self) -> str:
        amount = self.amount
        unit = self.unit

        if unit == "year":
            return f"{amount} year{'s' if amount > 1 else ''}"
        if unit == "month":
            return f"{amount} month{'s' if amount > 1 else ''}"
        if unit == "week":
            return f"{amount} week{'s' if amount > 1 else ''}"
        if unit == "day":
            return f"{amount} day{'s' if amount > 1 else ''}"
        if unit == "hour":
            return f"{amount} hour{'s' if amount > 1 else ''}"
        if unit == "minute":
            return f"{amount} minute{'s' if amount > 1 else ''}"
        if unit == "second":
            return f"{amount} second{'s' if amount > 1 else ''}"

        raise ValueError("Invalid time binning value")
to_polars_truncate_spec()

Converts the value to a string that can be used in Polars truncate spec. See https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html

Source code in analyzer_interface/params.py (lines 44-66)
def to_polars_truncate_spec(self) -> str:
    """
    Converts the value to a string that can be used in Polars truncate spec.
    See https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html
    """
    amount = self.amount
    unit = self.unit
    if unit == "year":
        return f"{amount}y"
    if unit == "month":
        return f"{amount}mo"
    if unit == "week":
        return f"{amount}w"
    if unit == "day":
        return f"{amount}d"
    if unit == "hour":
        return f"{amount}h"
    if unit == "minute":
        return f"{amount}m"
    if unit == "second":
        return f"{amount}s"

    raise ValueError("Invalid time binning value")
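
For illustration, a six-hour bin and its use with the polars truncate expression (the DataFrame is made up):

from datetime import datetime
import polars as pl
from analyzer_interface.params import TimeBinningValue

bin_value = TimeBinningValue(unit="hour", amount=6)
bin_value.to_polars_truncate_spec()  # "6h"
bin_value.to_human_readable_text()   # "6 hours"

df = pl.DataFrame({"ts": [datetime(2024, 1, 1, 3), datetime(2024, 1, 1, 9)]})
df = df.with_columns(pl.col("ts").dt.truncate(bin_value.to_polars_truncate_spec()))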