domino_admin_toolkit.checks.test_dmm_redis_queues module
DMM Redis Queue Health Checks
Data source: Redis (dmm-redis-ha instance) — LLEN and LRANGE on the
pending_jobs and running_jobs lists managed by dmm-plier and
dmm-compute.
Question answered: “Is the DMM job queue draining, and are there stale running-locks from crashed compute pods?”
- What this check does NOT cover:
Per-job stuck conditions or age — see test_dmm_ingestion_jobs_status.
DMM pod readiness — see test_dmm_pods_list.
Spark executor state — see info/test_dmm_spark.
Two checks live here, sharing a single Redis fetch per pytest session:
- test_dmm_redis_pending_jobs: WARN at depth > 10, FAIL at depth > 30. A growing pending queue means compute can’t keep up or is wedged.
- test_dmm_redis_running_jobs: WARN at depth > 1. dmm-compute is single-threaded, so any depth above 1 means at least one crashed compute pod left a stale lock behind that will block new jobs from starting.
Thresholds are inherited from RE-3125’s earlier analyzer. The per-job analyzer was the wrong place for cluster-wide queue depth; this module is the right place.
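The thresholds above amount to plain comparisons, sketched below. This is a minimal illustration, not the module's implementation: the constant names QUEUE_DEPTH_WARN, QUEUE_DEPTH_FAIL, and RUNNING_QUEUE_EXPECTED_MAX are the ones referenced on this page, while the helper functions and the string statuses are hypothetical stand-ins for whatever the analyzers actually return.

```python
# Threshold values as documented on this page.
QUEUE_DEPTH_WARN = 10
QUEUE_DEPTH_FAIL = 30
RUNNING_QUEUE_EXPECTED_MAX = 1


def classify_pending(depth: int) -> str:
    """Hypothetical helper: map a pending_jobs depth to a status string."""
    if depth > QUEUE_DEPTH_FAIL:
        return "FAIL"
    if depth > QUEUE_DEPTH_WARN:
        return "WARN"
    return "PASS"


def classify_running(depth: int, expected_max: int = RUNNING_QUEUE_EXPECTED_MAX) -> str:
    """Hypothetical helper: running_jobs has a WARN threshold only, no FAIL."""
    return "WARN" if depth > expected_max else "PASS"
```

Both comparisons are strict, so a pending depth of exactly 10 or a running depth of exactly 1 still passes.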
- pydantic model domino_admin_toolkit.checks.test_dmm_redis_queues.DmmPendingQueueDepthAnalyzer
Bases: AnalyzerBase[QueueDepthRow]
WARN at depth > QUEUE_DEPTH_WARN, FAIL at depth > QUEUE_DEPTH_FAIL.
- analyze(data)
Analyzes one row and returns a list of CheckResult instances.
- Return type:
- Args:
data: One row dict (TRow). The Runner calls this once per DataFrame row.
- Returns:
List[CheckResult]: A list containing the results of the analysis.
- Raises:
NotImplementedError: If this method is not implemented by subclasses.
- name: ClassVar[str] = 'DmmPendingQueueDepthAnalyzer'
- pydantic model domino_admin_toolkit.checks.test_dmm_redis_queues.DmmRunningQueueAnalyzer
Bases: AnalyzerBase[QueueDepthRow]
WARN at depth > RUNNING_QUEUE_EXPECTED_MAX (stale locks).
- Fields:
- field expected_max: int = 1
Expected ceiling on running_jobs depth. dmm-compute is single-threaded.
- analyze(data)
Analyzes one row and returns a list of CheckResult instances.
- Return type:
- Args:
data: One row dict (TRow). The Runner calls this once per DataFrame row.
- Returns:
List[CheckResult]: A list containing the results of the analysis.
- Raises:
NotImplementedError: If this method is not implemented by subclasses.
- name: ClassVar[str] = 'DmmRunningQueueAnalyzer'
- class domino_admin_toolkit.checks.test_dmm_redis_queues.QueueDepthRow
Bases: TypedDict
Per-row shape passed to the queue analyzers.
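The fields of QueueDepthRow are not listed on this page. As a rough sketch, assuming the row mirrors the two DataFrame columns described for dmm_redis_queue_depths (queue and depth), it could look like the following; the field types are assumptions.

```python
from typing import TypedDict


class QueueDepthRow(TypedDict):
    # Field names are inferred from the documented DataFrame columns;
    # the types are assumptions made for this sketch.
    queue: str  # "pending_jobs" or "running_jobs"
    depth: int  # LLEN result for that list


row: QueueDepthRow = {"queue": "pending_jobs", "depth": 4}
```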
- domino_admin_toolkit.checks.test_dmm_redis_queues.dmm_redis_queue_depths(k8s_client)
Single Redis hit per session: LLEN on both DMM queues. Returns a two-row DataFrame (queue ∈ {pending_jobs, running_jobs}, depth) so each test can filter the slice its analyzer reasons over.
Skip semantics mirror the rest of the DMM checks: Redis unreachable (RedisError or low-level connection errors) → skip. Anything more surprising (auth, protocol) propagates as a test ERROR. We do NOT return an empty DataFrame on failure: that would present as a misleading PASS-with-no-data via Runner’s on_empty path.
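The fetch described here can be sketched as one LLEN call per queue. The sketch below is an illustration under stated assumptions, not the fixture itself: fetch_queue_depths and FakeRedis are hypothetical names, the client is anything exposing an llen(name) method (redis-py clients do), and the rows are returned as a list of dicts rather than a DataFrame to keep the example dependency-free. The real fixture takes k8s_client and handles the skip path, which is omitted here.

```python
from typing import Any

QUEUES = ("pending_jobs", "running_jobs")


def fetch_queue_depths(client: Any) -> list[dict[str, Any]]:
    """Return one row per queue; `client` is anything with an llen(name) method."""
    # Deliberately no try/except that returns []: an empty result would
    # read as PASS-with-no-data, so errors are left to propagate.
    return [{"queue": name, "depth": int(client.llen(name))} for name in QUEUES]


class FakeRedis:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self, lists: dict[str, list[str]]) -> None:
        self._lists = lists

    def llen(self, name: str) -> int:
        # Redis LLEN returns 0 for a key that does not exist.
        return len(self._lists.get(name, []))


rows = fetch_queue_depths(FakeRedis({"pending_jobs": ["a", "b", "c"]}))
```

Always emitting both rows, even when a list is empty, keeps the result at two rows so each test can select its own queue by name.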
- domino_admin_toolkit.checks.test_dmm_redis_queues.test_dmm_redis_pending_jobs(dmm_redis_queue_depths, runner)
- Description:
Reports the depth of the DMM pending_jobs Redis list.
- Result:
PASS: depth ≤ 10.
WARN: 10 < depth ≤ 30. Queue may be backing up; investigate compute health.
FAIL: depth > 30. Queue is growing without draining.
SKIP: DMM Redis unavailable.
- Thresholds:
WARN: depth > 10
FAIL: depth > 30
- Required Permissions:
Platform admin (kubectl exec on dmm-redis-ha for manual LRANGE).
- domino_admin_toolkit.checks.test_dmm_redis_queues.test_dmm_redis_running_jobs(dmm_redis_queue_depths, runner)
- Description:
Reports the depth of the DMM running_jobs Redis list. Because dmm-compute is single-threaded, this list should hold at most one entry at a time. Anything above 1 means a previous compute pod crashed mid-job and left a stale lock behind.
- Result:
PASS: depth ≤ 1.
WARN: depth > 1. Stale locks present; investigate crashed compute pods.
SKIP: DMM Redis unavailable.
- Thresholds:
WARN: depth > 1
- Required Permissions:
Platform admin (kubectl exec on dmm-redis-ha for manual LRANGE / DEL).