Analyzer Interface
analyzer_interface
Modules:
Name | Description |
---|---|
column_automap | |
context | |
data_type_compatibility | |
declaration | |
interface | |
params | |
Modules
column_automap
Functions:
Name | Description |
---|---|
check_name_hint | Returns true if every word in the hint (split by spaces) is present in the name, in a case-insensitive manner. |
column_automap | Matches user-provided columns to the expected columns based on the name hints. |
Functions
check_name_hint(name, hint)
Returns true if every word in the hint (split by spaces) is present in the name, in a case-insensitive manner.
Source code in analyzer_interface/column_automap.py
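The matching rule described for check_name_hint can be illustrated with a minimal sketch. The function name `check_name_hint_sketch` is hypothetical, and this is a reading of the documented behavior, not the library's source.

```python
def check_name_hint_sketch(name: str, hint: str) -> bool:
    """Return True if every whitespace-separated word of `hint` occurs in `name`.

    The comparison ignores case and treats each word as a substring,
    following the documented description (not the actual source code).
    """
    lowered_name = name.lower()
    return all(word.lower() in lowered_name for word in hint.split())


print(check_name_hint_sketch("Post_Created_Time", "created time"))  # True
print(check_name_hint_sketch("Post_Created_Time", "updated time"))  # False
```

Word order in the hint does not matter in this sketch, since each word is checked independently.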
column_automap(user_columns, input_schema_columns)
Matches user-provided columns to the expected columns based on the name hints.
The resulting dictionary is keyed by the expected input column name.
Source code in analyzer_interface/column_automap.py
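As a rough illustration of hint-based mapping, the sketch below pairs each expected column with the first user column that satisfies any of its name hints. The `ExpectedColumn` class, the `automap_sketch` function, and the "first match wins" rule are assumptions made for this example; the real function may also weigh data types and resolve conflicts differently.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedColumn:
    """Hypothetical stand-in for an expected input column with name hints."""
    name: str
    name_hints: list[str] = field(default_factory=list)


def matches_hint(name: str, hint: str) -> bool:
    # Every word of the hint must appear somewhere in the name, ignoring case.
    lowered = name.lower()
    return all(word.lower() in lowered for word in hint.split())


def automap_sketch(
    user_column_names: list[str], expected_columns: list[ExpectedColumn]
) -> dict[str, str]:
    """Map expected column name -> user column name (keyed by the expected name)."""
    mapping: dict[str, str] = {}
    for expected in expected_columns:
        for user_name in user_column_names:
            # Any single hint matching is enough to call it a match.
            if any(matches_hint(user_name, hint) for hint in expected.name_hints):
                mapping[expected.name] = user_name
                break  # assumption: the first matching user column wins
    return mapping


mapping = automap_sketch(
    ["Tweet ID", "Full Text", "Created At"],
    [
        ExpectedColumn("message_text", ["text", "message body"]),
        ExpectedColumn("timestamp", ["created at", "time"]),
        ExpectedColumn("author", ["user name"]),
    ],
)
print(mapping)
```

Expected columns with no matching user column (here `author`) are simply absent from the result in this sketch.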
context
Classes:
Name | Description |
---|---|
AssetsReader | |
BaseDerivedModuleContext | Common interface for the runtime contexts of secondary analyzers and web presenters. |
FactoryOutputContext | Output interface for both factory and api_factory functions for web presenters. |
InputTableReader | |
PrimaryAnalyzerContext | |
SecondaryAnalyzerContext | |
ShinyContext | Output interface for Shiny dashboards |
TableReader | |
TableWriter | |
WebPresenterContext | |
Classes
AssetsReader
Bases: ABC
Methods:
Name | Description |
---|---|
table | Gets the table reader for the specified output. |
Source code in analyzer_interface/context.py
table(output_id)
abstractmethod
Gets the table reader for the specified output.
Source code in analyzer_interface/context.py
BaseDerivedModuleContext
pydantic-model
Bases: ABC, BaseModel
Common interface for the runtime contexts of secondary analyzers and web presenters.
Fields:
- temp_dir (str)
Source code in analyzer_interface/context.py
base
abstractmethod
property
Gets the base primary analyzer's context, which lets you inspect and load its outputs.
base_params
abstractmethod
property
Gets the primary analysis parameters.
temp_dir
pydantic-field
Gets the temporary directory that the module can freely write content to during its lifetime. This directory will not persist between runs.
dependency(secondary_interface)
abstractmethod
Gets the context of a secondary analyzer the current module depends on, which lets you inspect and load its outputs.
Source code in analyzer_interface/context.py
FactoryOutputContext
pydantic-model
Bases: BaseModel
Output interface for both factory and api_factory functions for web presenters.
Fields:
- shiny (Optional[ShinyContext])
- api (Optional[dict[str, Any]])
- data_frames (Optional[dict[str, DataFrame]])
Source code in analyzer_interface/context.py
api = None
pydantic-field
API factory output for React dashboard REST API
data_frames = None
pydantic-field
API factory dataframe output for React dashboard REST API
shiny = None
pydantic-field
Factory output for Shiny dashboards
InputTableReader
Bases: TableReader
Methods:
Name | Description |
---|---|
preprocess | Given the manually loaded user input dataframe, apply column mapping and semantic transformations to give the input dataframe that the analyzer expects. |
Source code in analyzer_interface/context.py
preprocess(df)
abstractmethod
Given the manually loaded user input dataframe, apply column mapping and semantic transformations to give the input dataframe that the analyzer expects.
Source code in analyzer_interface/context.py
PrimaryAnalyzerContext
pydantic-model
Bases: ABC, BaseModel
Fields:
- temp_dir (str)
Source code in analyzer_interface/context.py
params
abstractmethod
property
Gets the analysis parameters.
temp_dir
pydantic-field
Gets the temporary directory that the module can freely write content to during its lifetime. This directory will not persist between runs.
input()
abstractmethod
Gets the input reader context.
Note that this is in function form even though one input is expected, in anticipation that we may want to support multiple inputs in the future.
Source code in analyzer_interface/context.py
output(output_id)
abstractmethod
Gets the output writer context for the specified output ID.
Source code in analyzer_interface/context.py
SecondaryAnalyzerContext
pydantic-model
Bases: BaseDerivedModuleContext
Fields:
- temp_dir (str)
Source code in analyzer_interface/context.py
output(output_id)
abstractmethod
Gets the output writer context for the specified output ID.
Source code in analyzer_interface/context.py
ShinyContext
pydantic-model
Bases: BaseModel
Output interface for Shiny dashboards
Fields:
- panel (NavPanel)
- server_handler (Optional[ServerCallback])
Source code in analyzer_interface/context.py
panel = None
pydantic-field
UI navigation panel to be added to shiny dashboard
server_handler = None
pydantic-field
Server handler callback to be called by the shiny application instance
TableReader
Bases: ABC
Attributes:
Name | Type | Description |
---|---|---|
parquet_path | str | Gets the path to the table's parquet file. The module should expect a parquet file here. |
Source code in analyzer_interface/context.py
parquet_path
abstractmethod
property
Gets the path to the table's parquet file. The module should expect a parquet file here.
TableWriter
Bases: ABC
Attributes:
Name | Type | Description |
---|---|---|
parquet_path | str | Gets the path to the table's parquet file. The module should write a parquet file to it. |
Source code in analyzer_interface/context.py
parquet_path
abstractmethod
property
Gets the path to the table's parquet file. The module should write a parquet file to it.
WebPresenterContext
pydantic-model
Bases: BaseDerivedModuleContext
Fields:
Source code in analyzer_interface/context.py
dash_app
pydantic-field
The Dash app that is being built.
state_dir
abstractmethod
property
Gets the directory where the web presenter can store state that persists between runs. This state space is unique for each project/primary analyzer/web presenter combination.
data_type_compatibility
Functions:
Name | Description |
---|---|
get_data_type_compatibility_score | Returns a score for the compatibility of the actual data type with the expected data type. |
Attributes:
Name | Type | Description |
---|---|---|
data_type_mapping_preference | dict[DataType, list[list[DataType]]] | For each data type, a list of lists of data types that are considered compatible with it. |
Attributes
data_type_mapping_preference = {'text': [['text'], ['identifier', 'url']], 'integer': [['integer']], 'float': [['float', 'integer']], 'boolean': [['boolean']], 'datetime': [['datetime']], 'time': [['time'], ['datetime']], 'identifier': [['identifier'], ['url', 'datetime'], ['integer'], ['text']], 'url': [['url']]}
module-attribute
For each data type, a list of lists of data types that are considered compatible with it. The first list is the most preferred, the last list is the least. The items in each list are considered equally compatible.
Functions
get_data_type_compatibility_score(expected_data_type, actual_data_type)
Returns a score for the compatibility of the actual data type with the expected data type. Higher (less negative) scores are better. None means the data types are not compatible.
Source code in analyzer_interface/data_type_compatibility.py
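One way to read the tiered preference table is sketched below. The mapping is copied from the documented attribute value; the scoring rule (0 for the first tier, then -1, -2, and so on) and the function name `compatibility_score_sketch` are assumptions chosen to fit "higher (less negative) scores are better", so the real function may use different numbers.

```python
from typing import Optional

# Copied from the documented value of data_type_mapping_preference.
data_type_mapping_preference = {
    "text": [["text"], ["identifier", "url"]],
    "integer": [["integer"]],
    "float": [["float", "integer"]],
    "boolean": [["boolean"]],
    "datetime": [["datetime"]],
    "time": [["time"], ["datetime"]],
    "identifier": [["identifier"], ["url", "datetime"], ["integer"], ["text"]],
    "url": [["url"]],
}


def compatibility_score_sketch(expected: str, actual: str) -> Optional[int]:
    """Score `actual` against `expected`; None means not compatible."""
    for tier_index, tier in enumerate(data_type_mapping_preference[expected]):
        if actual in tier:
            # Assumption: the first (most preferred) tier scores 0,
            # and each later tier scores one point lower.
            return -tier_index
    return None


print(compatibility_score_sketch("text", "text"))        # 0
print(compatibility_score_sketch("identifier", "text"))  # -3
print(compatibility_score_sketch("integer", "float"))    # None
```

Items within one tier receive the same score, which matches the statement that they are considered equally compatible.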
declaration
Classes:
Name | Description |
---|---|
AnalyzerDeclaration | |
SecondaryAnalyzerDeclaration | |
WebPresenterDeclaration | |
Classes
AnalyzerDeclaration
pydantic-model
Bases: AnalyzerInterface
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
- input (AnalyzerInput)
- params (list[AnalyzerParam])
- outputs (list[AnalyzerOutput])
- kind (Literal['primary'])
- entry_point (Callable[[PrimaryAnalyzerContext], None])
- default_params (Callable[[PrimaryAnalyzerContext], dict[str, ParamValue]])
- is_distributed (bool)
Source code in analyzer_interface/declaration.py
__init__(interface, main, *, is_distributed=False, default_params=lambda _: dict())
Creates a primary analyzer declaration
Parameters:
Name | Type | Description | Default |
---|---|---|---|
interface | AnalyzerInterface | The metadata interface for the primary analyzer. | required |
main | Callable | The entry point function for the primary analyzer. This function should take a single argument of type | required |
is_distributed | bool | Set this explicitly to | False |
Source code in analyzer_interface/declaration.py
SecondaryAnalyzerDeclaration
pydantic-model
Bases: SecondaryAnalyzerInterface
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
- base_analyzer (AnalyzerInterface)
- depends_on (list[SecondaryAnalyzerInterface])
- outputs (list[AnalyzerOutput])
- kind (Literal['secondary'])
- entry_point (Callable[[SecondaryAnalyzerContext], None])
Source code in analyzer_interface/declaration.py
__init__(interface, main)
Creates a secondary analyzer declaration
Parameters:
Name | Type | Description | Default |
---|---|---|---|
interface | SecondaryAnalyzerInterface | The metadata interface for the secondary analyzer. | required |
main | Callable | The entry point function for the secondary analyzer. This function should take a single argument of type | required |
Source code in analyzer_interface/declaration.py
WebPresenterDeclaration
pydantic-model
Bases: WebPresenterInterface
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
- base_analyzer (AnalyzerInterface)
- depends_on (list[SecondaryAnalyzerInterface])
- kind (Literal['web'])
- factory (Callable[[WebPresenterContext], Union[FactoryOutputContext, None]])
- shiny (bool)
- server_name (str)
Source code in analyzer_interface/declaration.py
__init__(interface, factory, name, shiny)
Creates a web presenter declaration
Parameters:
Name | Type | Description | Default |
---|---|---|---|
interface | WebPresenterInterface | The metadata interface for the web presenter. | required |
factory | Callable | The factory function that creates a Dash app for the web presenter. It should modify the Dash app in the context to add whatever plotting interface the web presenter needs. | required |
server_name | str | The server name for the Dash app. Typically, you will use the global variable If your web presenter has assets like images, CSS or JavaScript files, you can put them in a folder named See Dash documentation for more details: https://dash.plotly.com See also Python documentation for the | required |
Source code in analyzer_interface/declaration.py
interface
Classes:
Name | Description |
---|---|
AnalyzerInterface | |
AnalyzerOutput | |
AnalyzerParam | |
BaseAnalyzerInterface | |
DerivedAnalyzerInterface | |
InputColumn | |
SecondaryAnalyzerInterface | |
Attributes:
Name | Type | Description |
---|---|---|
DataType | | The semantic data type for a data column. |
Attributes
DataType = Literal['text', 'integer', 'float', 'boolean', 'datetime', 'identifier', 'url', 'time']
module-attribute
The semantic data type for a data column. This is not quite the same as structural data types such as polars, pandas, or even arrow types; instead, it represents how the data is intended to be interpreted.
- text is expected to be free-form, human-readable text content.
- integer and float are meant to be manipulated arithmetically.
- boolean is a binary value.
- datetime represents time and is meant to be manipulated as a time value.
- time represents time within a day, not including the date information.
- identifier is a unique identifier for a record. It is not expected to be manipulated in any way.
- url is a string that represents a URL.
Classes
AnalyzerInterface
pydantic-model
Bases: BaseAnalyzerInterface
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
- input (AnalyzerInput)
- params (list[AnalyzerParam])
- outputs (list[AnalyzerOutput])
- kind (Literal['primary'])
Source code in analyzer_interface/interface.py
input
pydantic-field
Specifies the input data schema for the analyzer.
outputs
pydantic-field
Specifies the output data schema for the analyzer.
params = []
pydantic-field
A list of parameters that the analyzer accepts.
AnalyzerOutput
pydantic-model
Bases: BaseModel
Fields:
Source code in analyzer_interface/interface.py
id
pydantic-field
Uniquely identifies the output data schema for the analyzer. The analyzer must include this key in the output dictionary.
name
pydantic-field
The human-friendly name for the output.
AnalyzerParam
pydantic-model
Bases: BaseModel
Fields:
- id (str)
- human_readable_name (Optional[str])
- description (Optional[str])
- type (ParamType)
- default (Optional[ParamValue])
- backfill_value (Optional[ParamValue])
Source code in analyzer_interface/interface.py
backfill_value = None
pydantic-field
Recommended if this is a parameter that is newly introduced in a previously released analyzer. The backfill value shows what this parameter was before it became customizable.
default = None
pydantic-field
Optional: define a static default value for this parameter. A parameter without a default will need to be chosen explicitly by the user.
description = None
pydantic-field
A short description of the parameter. This is used in the UI to represent the parameter.
human_readable_name = None
pydantic-field
The human-friendly name for the parameter. This is used in the UI to represent the parameter.
id
pydantic-field
The name of the parameter. This becomes the key in the parameters dictionary that is passed to the analyzer.
type
pydantic-field
The type of the parameter. This is used for validation and for customizing the UX for parameter input.
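To make the role of backfill_value more concrete, here is a small sketch of how a host application might fill in parameters that are missing from an analysis saved before the parameter existed. The `ParamSpec` class and `resolve_stored_params` function are hypothetical, and the resolution rule is an assumption based on the field descriptions above, not the library's documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ParamSpec:
    """Hypothetical, simplified stand-in for a parameter definition."""
    id: str
    default: Optional[Any] = None
    backfill_value: Optional[Any] = None


def resolve_stored_params(
    specs: list[ParamSpec], stored: dict[str, Any]
) -> dict[str, Any]:
    """Return the stored parameters, adding backfill values for missing keys.

    Assumption: a saved analysis that predates a parameter behaved as if the
    parameter had its backfill value, so that value is reported for it.
    """
    resolved = dict(stored)
    for spec in specs:
        if spec.id not in resolved and spec.backfill_value is not None:
            resolved[spec.id] = spec.backfill_value
    return resolved


specs = [ParamSpec(id="window_size", default=5, backfill_value=10)]
print(resolve_stored_params(specs, {}))                  # {'window_size': 10}
print(resolve_stored_params(specs, {"window_size": 3}))  # {'window_size': 3}
```

In this reading, default applies to new analyses where the user has not chosen a value, while backfill_value only describes runs made before the parameter was introduced.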
BaseAnalyzerInterface
pydantic-model
Bases: BaseModel
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
Source code in analyzer_interface/interface.py
id
pydantic-field
The static ID for the analyzer that, with the version, uniquely identifies the analyzer and will be stored as metadata as part of the output data.
long_description = None
pydantic-field
A longer description of what the analyzer does that will be shown separately.
name
pydantic-field
The short human-readable name of the analyzer.
short_description
pydantic-field
A short, one-liner description of what the analyzer does.
version
pydantic-field
The version ID for the analyzer. In the future, we may choose to support output migration between versions of the same analyzer.
DerivedAnalyzerInterface
pydantic-model
Bases: BaseAnalyzerInterface
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
- base_analyzer (AnalyzerInterface)
- depends_on (list[SecondaryAnalyzerInterface])
Source code in analyzer_interface/interface.py
base_analyzer
pydantic-field
The base analyzer that this secondary analyzer extends. This is always a primary analyzer. If your module depends on other secondary analyzers (which must have the same base analyzer), you can specify them in the depends_on field.
depends_on = []
pydantic-field
A list of secondary analyzers that must be run before the current secondary analyzer is run. These secondary analyzers must have the same primary base.
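The ordering constraint implied by depends_on can be sketched with a topological sort. The analyzer IDs below are made up, and using the standard library's `graphlib` is an illustrative choice; how the host application actually schedules secondary analyzers is not stated here.

```python
from graphlib import TopologicalSorter


def run_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return an order in which every analyzer runs after its dependencies.

    `depends_on` maps a (hypothetical) secondary analyzer ID to the IDs of
    the secondary analyzers it depends on.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(
    {
        "hashtag_counts": [],
        "hashtag_report": ["hashtag_counts"],
        "summary": ["hashtag_report", "hashtag_counts"],
    }
)
print(order)  # ['hashtag_counts', 'hashtag_report', 'summary']
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies form a cycle, which would be one way to reject an invalid declaration early.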
InputColumn
pydantic-model
Bases: Column
Fields:
- name (str)
- human_readable_name (Optional[str])
- description (Optional[str])
- data_type (DataType)
- name_hints (list[str])
Source code in analyzer_interface/interface.py
name_hints = []
pydantic-field
Specifies a list of space-separated words that are likely to be found in the column name of the user-provided data. This is used to help the user map the input columns to the expected columns.
Any individual hint matching is sufficient for a match to be called. The hint in turn is matched if every word matches some part of the column name.
SecondaryAnalyzerInterface
pydantic-model
Bases: DerivedAnalyzerInterface
Fields:
- id (str)
- version (str)
- name (str)
- short_description (str)
- long_description (Optional[str])
- base_analyzer (AnalyzerInterface)
- depends_on (list[SecondaryAnalyzerInterface])
- outputs (list[AnalyzerOutput])
- kind (Literal['secondary'])
Source code in analyzer_interface/interface.py
outputs
pydantic-field
Specifies the output data schema for the analyzer.
params
Classes:
Name | Description |
---|---|
IntegerParam | Represents an integer value |
TimeBinningParam | Represents a time bin. |
TimeBinningValue | |
Classes
IntegerParam
pydantic-model
Bases: BaseModel
Represents an integer value.
The corresponding value will be of type int.
Fields:
- type (Literal['integer'])
- min (int)
- max (int)
Source code in analyzer_interface/params.py
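A minimal sketch of how the min and max fields might be used to validate a user-supplied value follows. `IntegerParamSketch` and `validate_integer` are hypothetical names, and treating both bounds as inclusive is an assumption; the page does not say how the bounds are enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerParamSketch:
    """Hypothetical stand-in mirroring the min/max fields listed above."""
    min: int
    max: int


def validate_integer(param: IntegerParamSketch, value: object) -> int:
    """Return `value` if it is an int within [min, max], else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    # Assumption: both bounds are inclusive.
    if not param.min <= value <= param.max:
        raise ValueError(f"{value} is outside [{param.min}, {param.max}]")
    return value


window = IntegerParamSketch(min=1, max=10)
print(validate_integer(window, 5))  # 5
```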
TimeBinningParam
pydantic-model
Bases: BaseModel
Represents a time bin.
The corresponding value will be of type TimeBinningValue.
Fields:
- type (Literal['time_binning'])
Source code in analyzer_interface/params.py
TimeBinningValue
pydantic-model
Bases: BaseModel
Config:
- arbitrary_types_allowed: True
Fields:
- unit (TimeBinningUnit)
- amount (int)
Source code in analyzer_interface/params.py
to_polars_truncate_spec()
Converts the value to a string that can be used as a Polars truncate spec. See https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html
Source code in analyzer_interface/params.py
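The conversion can be pictured as joining the amount with a Polars duration suffix. In the sketch below, the unit names on the left of the mapping and the class `TimeBinSketch` are assumptions (the members of TimeBinningUnit are not listed on this page); the suffixes on the right are the ones used by the Polars duration string language.

```python
from dataclasses import dataclass

# Hypothetical unit names mapped to Polars duration-string suffixes.
_POLARS_SUFFIX = {
    "year": "y",
    "month": "mo",
    "week": "w",
    "day": "d",
    "hour": "h",
    "minute": "m",
    "second": "s",
}


@dataclass(frozen=True)
class TimeBinSketch:
    """Simplified stand-in for a time bin with a unit and an amount."""
    unit: str
    amount: int

    def to_truncate_spec(self) -> str:
        # For example, 6 hours becomes "6h" and 1 month becomes "1mo".
        return f"{self.amount}{_POLARS_SUFFIX[self.unit]}"


print(TimeBinSketch("hour", 6).to_truncate_spec())  # 6h
```

A string like this could then be passed to a truncate call such as `pl.col("timestamp").dt.truncate("6h")`, where the column name is only an example.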