Configuration
The protea.config subpackage provides tuning parameters and
configuration helpers used across the PROTEA stack.
Tuning parameters
protea.config.tuning exposes TuningSettings, a Pydantic model that
groups knobs for batch sizes, timeouts, and algorithm parameters.
Values come from environment variables of the form
PROTEA_TUNING__<group>__<field>, then from the tuning: section of
protea/config/system.yaml, and finally from the documented defaults.
Workers instantiate a single TuningSettings object at startup;
operations receive it via dependency injection rather than importing
it directly, which keeps them independently testable.
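A minimal sketch of the dependency-injection pattern described above. `LoadAnnotations` and `OperationTuningStub` are hypothetical names used only for illustration; the real operations, and how they receive their settings, may look different.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class OperationTuningStub:
    """Hypothetical stand-in for OperationTuning (only the field used here)."""
    annotation_chunk_size: int = 10000


class LoadAnnotations:
    """Hypothetical operation that receives its tuning through the constructor
    instead of importing a module-level singleton."""

    def __init__(self, tuning: OperationTuningStub) -> None:
        self._tuning = tuning

    def chunks(self, rows: Sequence[str]) -> Iterator[Sequence[str]]:
        # Slice the input into batches of annotation_chunk_size rows.
        size = self._tuning.annotation_chunk_size
        for start in range(0, len(rows), size):
            yield rows[start:start + size]


# In a test, inject a tiny chunk size instead of patching globals.
op = LoadAnnotations(OperationTuningStub(annotation_chunk_size=2))
print([list(c) for c in op.chunks(["a", "b", "c", "d", "e"])])
# [['a', 'b'], ['c', 'd'], ['e']]
```

Because the operation never reaches for global state, a test can exercise edge cases (tiny or huge chunks) by constructing a different settings object.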
Runtime tuning settings (T-CONF.2).
Externalises hardcoded module-level constants from protea/ so an
operator can tune throughput, retry policy and timeouts per
deployment target (dev, prod-cloud, hpc-bsc, hpc-airgap) without
touching code.
Hierarchy (lowest to highest priority):
1. Defaults baked into the pydantic models below.
2. The tuning: section in protea/config/system.yaml.
3. Environment variables of the form PROTEA_TUNING__<group>__<field>.
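The precedence order above can be illustrated with plain dictionaries. This is not the module's loader (its implementation is not shown on this page); `resolve`, `DEFAULTS`, the case-insensitive key matching and the numeric coercion are all assumptions made for the sketch.

```python
from typing import Any, Dict, Mapping

# Hypothetical defaults for one group; the real values live in the pydantic models.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "queue": {"publisher_max_attempts": 12, "publisher_base_delay": 1.0},
}
ENV_PREFIX = "PROTEA_TUNING__"


def _coerce(raw: str, current: Any) -> Any:
    """Convert an env-var string to the numeric type it overrides (bools omitted)."""
    if isinstance(current, (int, float)) and not isinstance(current, bool):
        return type(current)(raw)
    return raw


def resolve(yaml_tuning: Mapping[str, Mapping[str, Any]],
            env: Mapping[str, str]) -> Dict[str, Dict[str, Any]]:
    """Merge layers in priority order: defaults < YAML tuning: section < env vars."""
    merged = {group: dict(fields) for group, fields in DEFAULTS.items()}
    # Layer 2: the tuning: section of system.yaml, assumed already parsed to a dict.
    for group, fields in yaml_tuning.items():
        merged.setdefault(group, {}).update(fields)
    # Layer 3: PROTEA_TUNING__<group>__<field>, matched case-insensitively.
    for key, raw in env.items():
        if not key.upper().startswith(ENV_PREFIX):
            continue
        parts = key[len(ENV_PREFIX):].lower().split("__")
        if len(parts) != 2:
            continue  # not a <group>__<field> pair
        group, name = parts
        fields = merged.setdefault(group, {})
        fields[name] = _coerce(raw, fields.get(name))
    return merged


settings = resolve(
    yaml_tuning={"queue": {"publisher_max_attempts": 20}},
    env={"PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS": "5"},
)
print(settings["queue"])  # {'publisher_max_attempts': 5, 'publisher_base_delay': 1.0}
```

The YAML value (20) overrides the default (12), and the environment variable (5) overrides both; fields that no layer mentions keep their defaults.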
TuningSettings currently composes the QueueTuning, WorkerTuning,
OperationTuning and APILimits groups. The remaining category from
docs/CONFIG_INVENTORY.md, ResearchKnobs, follows the same pattern and
will be added incrementally.
Example:

    from protea.config.tuning import get_tuning

    settings = get_tuning()
    for attempt in range(settings.queue.publisher_max_attempts):
        ...
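The loop in the example can be fleshed out as a retry helper. This is a sketch, not the publisher's code: `publish_with_retry`, the use of `ConnectionError`, and the exponential schedule `base_delay * 2**attempt` are assumptions; only the two knobs (`publisher_max_attempts`, `publisher_base_delay`) come from QueueTuning. In real code they would be passed as `settings.queue.publisher_max_attempts` and `settings.queue.publisher_base_delay`.

```python
import time
from typing import Callable


def publish_with_retry(publish: Callable[[], None],
                       max_attempts: int,
                       base_delay: float,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call ``publish`` until it succeeds; return the 1-based attempt that worked.

    Assumes exponential backoff (base_delay * 2**attempt) between attempts.
    """
    for attempt in range(max_attempts):
        try:
            publish()
            return attempt + 1
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be >= 1")


outcomes = iter([True, True, False])  # True = this attempt fails


def flaky_publish() -> None:
    if next(outcomes):
        raise ConnectionError("broker unavailable")


delays: list = []
print(publish_with_retry(flaky_publish, max_attempts=12, base_delay=1.0,
                         sleep=delays.append))  # 3
print(delays)  # [1.0, 2.0]
```

Injecting `sleep` keeps the helper testable without real waiting, in the same spirit as injecting the settings object.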
- class protea.config.tuning.APILimits(*, max_fasta_bytes: Annotated[int, Ge(ge=1024)] = 52428800, max_comment_length: Annotated[int, Ge(ge=1)] = 500, recent_limit: Annotated[int, Ge(ge=1)] = 20, page_limit: Annotated[int, Ge(ge=1)] = 100)
Bases: BaseModel
HTTP boundary limits enforced at the FastAPI router layer.
Sources: api/routers/{annotate,query_sets,support}.py (see docs/CONFIG_INVENTORY.md §D).
- max_comment_length: int
- max_fasta_bytes: int
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- page_limit: int
- recent_limit: int
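A router could apply these limits roughly as follows. `APILimitsStub`, `check_fasta_upload` and `clamp_page_size` are hypothetical helpers; raising `ValueError` stands in for whatever HTTP error (for example 413 for an oversized body) the real routers return. The default `max_fasta_bytes` of 52428800 bytes is exactly 50 MiB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class APILimitsStub:
    """Hypothetical stand-in mirroring some APILimits defaults."""
    max_fasta_bytes: int = 52_428_800  # 50 * 1024 * 1024 = 50 MiB
    max_comment_length: int = 500
    page_limit: int = 100


def check_fasta_upload(body: bytes, limits: APILimitsStub) -> None:
    """Reject oversized FASTA payloads before parsing them."""
    if len(body) > limits.max_fasta_bytes:
        raise ValueError(
            f"FASTA payload is {len(body)} bytes; limit is {limits.max_fasta_bytes}"
        )


def clamp_page_size(requested: Optional[int], limits: APILimitsStub) -> int:
    """Default to page_limit when no size is given; never go below 1 or above it."""
    if requested is None:
        return limits.page_limit
    return max(1, min(requested, limits.page_limit))


limits = APILimitsStub()
check_fasta_upload(b">P12345\nMKVLAA\n", limits)  # well under 50 MiB: no error
print(clamp_page_size(500, limits))  # 100
```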
- class protea.config.tuning.OperationTuning(*, annotation_chunk_size: Annotated[int, Ge(ge=100)] = 10000, stream_chunk_size: Annotated[int, Ge(ge=100)] = 2000, store_chunk_size: Annotated[int, Ge(ge=500)] = 10000, numpy_query_chunk: Annotated[int, Ge(ge=10)] = 500, ref_cache_freshness_seconds: Annotated[int, Ge(ge=0)] = 300, aspect_knn_workers: Annotated[int, Ge(ge=1)] = 3)
Bases: BaseModel
Module-level chunk and batch sizes used inside operations.
HTTP retry policy and per-source timeouts live inside their respective pydantic payloads (InsertProteinsPayload, LoadGoaAnnotationsPayload, etc.) because the caller picks them per job. The values here are infrastructure-level: they control how work is sliced so that it stays within memory and broker-pressure constraints.
Sources: core/feature_enricher.py, core/knn_search.py, core/operations/{predict_go_terms,training_dump_helpers}.py (see docs/CONFIG_INVENTORY.md §C).
- annotation_chunk_size: int
- aspect_knn_workers: int
- model_config = {}
- numpy_query_chunk: int
- ref_cache_freshness_seconds: int
- store_chunk_size: int
- stream_chunk_size: int
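One plausible reading of `ref_cache_freshness_seconds` is a maximum age for a cached reference set. The sketch below assumes exactly that (with 0 meaning "reload on every call"); `FreshnessCache` is a hypothetical class, and the page does not document how the real reference cache behaves.

```python
import itertools
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class FreshnessCache(Generic[T]):
    """Single-slot cache that reloads its value once it is max_age seconds old.

    Hypothetical: treats ref_cache_freshness_seconds as a maximum age, so 0
    means "reload on every call".
    """

    def __init__(self, loader: Callable[[], T], max_age: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._max_age = max_age
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._value = self._loader()  # empty or stale: reload
            self._loaded_at = now
        return self._value  # type: ignore[return-value]


# Fake clock readings (seconds) and a loader that returns a new version each reload.
ticks = iter([0.0, 100.0, 301.0])
versions = itertools.count(1)
cache = FreshnessCache(lambda: next(versions), max_age=300, clock=lambda: next(ticks))
print(cache.get(), cache.get(), cache.get())  # 1 1 2
```

With the default of 300 s, the read at t=100 reuses the cached value and the read at t=301 triggers a reload.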
- class protea.config.tuning.QueueTuning(*, publisher_max_attempts: Annotated[int, Ge(ge=1)] = 12, publisher_base_delay: Annotated[float, Ge(ge=0.0)] = 1.0, oom_max_retries: Annotated[int, Ge(ge=0)] = 5, oom_base_delay: Annotated[int, Ge(ge=0)] = 5, oom_max_delay: Annotated[int, Ge(ge=1)] = 300, amqp_heartbeat: Annotated[int, Ge(ge=0)] = 600)
Bases: BaseModel
RabbitMQ publisher/consumer retry and dispatch knobs.
Sources: infrastructure/queue/publisher.py and infrastructure/queue/consumer.py (see docs/CONFIG_INVENTORY.md §A).
- amqp_heartbeat: int
- model_config = {}
- oom_base_delay: int
- oom_max_delay: int
- oom_max_retries: int
- publisher_base_delay: float
- publisher_max_attempts: int
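The three OOM knobs suggest a capped backoff: a base delay, a ceiling, and a retry budget. The doubling-with-cap formula below is an assumption, as are seconds as the unit and the helper name `oom_delays`; only the knob names and defaults (5 retries, base 5, cap 300) come from QueueTuning.

```python
from typing import List


def oom_delays(max_retries: int = 5, base_delay: int = 5, max_delay: int = 300) -> List[int]:
    """Delay before each OOM retry, assuming doubling capped at max_delay."""
    return [min(base_delay * 2 ** n, max_delay) for n in range(max_retries)]


print(oom_delays())               # [5, 10, 20, 40, 80]
print(oom_delays(max_retries=8))  # [5, 10, 20, 40, 80, 160, 300, 300]
```

Under this assumption the defaults never reach the cap and add up to 155 s of waiting; the cap only matters once `oom_max_retries` is raised to 7 or more.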
- class protea.config.tuning.TuningSettings(*, queue: QueueTuning = <factory>, worker: WorkerTuning = <factory>, operation: OperationTuning = <factory>, api: APILimits = <factory>)
Bases: BaseModel
Root tuning model that composes per-category sub-models.
- api: APILimits
- model_config = {}
- operation: OperationTuning
- queue: QueueTuning
- worker: WorkerTuning
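The `<factory>` defaults in the signature indicate that each sub-model is built by a default factory rather than shared. The dataclass analogue below illustrates why that matters; the `*Sketch` classes are hypothetical and carry only a couple of fields.

```python
from dataclasses import dataclass, field


@dataclass
class QueueTuningSketch:
    publisher_max_attempts: int = 12
    publisher_base_delay: float = 1.0


@dataclass
class APILimitsSketch:
    page_limit: int = 100


@dataclass
class TuningSettingsSketch:
    # default_factory builds a fresh sub-model for every root instance,
    # so mutating one settings object never leaks into another.
    queue: QueueTuningSketch = field(default_factory=QueueTuningSketch)
    api: APILimitsSketch = field(default_factory=APILimitsSketch)


a, b = TuningSettingsSketch(), TuningSettingsSketch()
a.queue.publisher_max_attempts = 3
print(b.queue.publisher_max_attempts)  # 12

# Override one group; the others keep their defaults.
custom = TuningSettingsSketch(queue=QueueTuningSketch(publisher_max_attempts=4))
print(custom.queue.publisher_max_attempts, custom.api.page_limit)  # 4 100
```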
- class protea.config.tuning.WorkerTuning(*, db_pool_size: Annotated[int, Ge(ge=1)] = 20, db_pool_max_overflow: Annotated[int, Ge(ge=0)] = 40, db_pool_recycle_seconds: Annotated[int, Ge(ge=60)] = 3600, model_cache_max: Annotated[int, Ge(ge=1)] = 1, ref_cache_max: Annotated[int, Ge(ge=1)] = 1, reaper_main_timeout_seconds: Annotated[int, Ge(ge=300)] = 21600, reaper_default_timeout_seconds: Annotated[int, Ge(ge=300)] = 3600, reaper_stall_seconds: Annotated[int, Ge(ge=60)] = 1800, worker_shutdown_grace_seconds: Annotated[int, Ge(ge=1)] = 30, job_heartbeat_interval_seconds: Annotated[int, Ge(ge=5)] = 30, max_lease_requeues: Annotated[int, Ge(ge=0)] = 3, api_cache_default_ttl_seconds: Annotated[float, Ge(ge=1.0)] = 300.0)
Bases: BaseModel
Pool sizes, in-process caches and reaper timeouts.
Sources: infrastructure/database/engine.py, infrastructure/operations/{compute_embeddings,predict_go_terms}.py, workers/stale_job_reaper.py, api/cache.py (see docs/CONFIG_INVENTORY.md §B).
- api_cache_default_ttl_seconds: float
- db_pool_max_overflow: int
- db_pool_recycle_seconds: int
- db_pool_size: int
- job_heartbeat_interval_seconds: int
- max_lease_requeues: int
- model_cache_max: int
- model_config = {}
- reaper_default_timeout_seconds: int
- reaper_main_timeout_seconds: int
- reaper_stall_seconds: int
- ref_cache_max: int
- worker_shutdown_grace_seconds: int
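The reaper knobs can be read as two independent checks: a wall-clock timeout and a heartbeat-stall threshold. The sketch assumes that interpretation, and that "main" selects the longer timeout for some class of jobs; `ReaperKnobs` and `reap_reason` are hypothetical. With the defaults, the 1800 s stall threshold equals 60 heartbeat intervals of 30 s, so a single delayed heartbeat should not trigger a reap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReaperKnobs:
    """Hypothetical subset of WorkerTuning used by this sketch (values in seconds)."""
    reaper_main_timeout_seconds: int = 21600    # 6 h
    reaper_default_timeout_seconds: int = 3600  # 1 h
    reaper_stall_seconds: int = 1800            # 30 min


def reap_reason(started_at: float, last_heartbeat: float, now: float,
                is_main_job: bool, knobs: ReaperKnobs) -> Optional[str]:
    """Return why a running job should be reaped, or None if it looks healthy."""
    limit = (knobs.reaper_main_timeout_seconds if is_main_job
             else knobs.reaper_default_timeout_seconds)
    if now - started_at > limit:
        return "timeout"   # ran longer than its allowed wall-clock time
    if now - last_heartbeat > knobs.reaper_stall_seconds:
        return "stalled"   # still within its timeout but heartbeats stopped
    return None


print(reap_reason(0.0, 0.0, 3601.0, is_main_job=True, knobs=ReaperKnobs()))  # stalled
```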
- protea.config.tuning.get_tuning() → TuningSettings
Load and cache the tuning settings.
Cache reset (mostly for tests): get_tuning.cache_clear()
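The `cache_clear()` method suggests `get_tuning` is wrapped with `functools.lru_cache` (or `functools.cache`). The sketch assumes that; `SettingsSketch` and `get_tuning_sketch` are hypothetical stand-ins that show why tests clear the cache after changing environment variables.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical stand-in for TuningSettings."""
    publisher_max_attempts: int = 12


@lru_cache(maxsize=1)
def get_tuning_sketch() -> SettingsSketch:
    # Body runs once per process until cache_clear() is called.
    return SettingsSketch()


first = get_tuning_sketch()
assert get_tuning_sketch() is first      # cached: same object on every call
get_tuning_sketch.cache_clear()          # e.g. in a test after changing env vars
assert get_tuning_sketch() is not first  # rebuilt on the next call
```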
See also
Configuration Reference: full environment-variable reference.
Infrastructure: protea.infrastructure.settings for the database and AMQP connection strings.