domino_admin_toolkit.checks.test_filetask_queue_status module

Filetask Queue Monitoring Test

This module provides comprehensive monitoring of the Domino filetask queue to detect:
  • Queue blockages and task accumulation

  • Tasks stuck in Created/Started states for extended periods

  • Imbalanced task type distribution that may indicate processing issues

  • Overall queue health metrics and performance indicators

  • Recently failed tasks with error pattern detection

  • Dispatch delay and concurrency saturation issues

  • Data plane distribution for multi-plane deployments

The test connects to the filetask PostgreSQL database and analyzes pending tasks to ensure the filetask service is processing tasks efficiently.
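As a rough illustration of the "pending tasks" query, the sketch below selects Created/Started rows from a `tasks` table and returns None when that table is missing, which corresponds to the SKIP outcome. It is a hypothetical stand-in: the real store is PostgreSQL and its column names are assumed here (`id`, `type`, `status`, `created_at`, `started_at`, `updated_at`), so `sqlite3` is used only to keep the example self-contained. `PENDING_QUERY` and `fetch_pending_tasks` are illustrative names, not the toolkit's API.

```python
import sqlite3

# Hypothetical query and schema: the real filetask store is PostgreSQL and its
# "tasks" table may use different columns; sqlite3 stands in so this runs locally.
PENDING_QUERY = """
    SELECT id, type, status, created_at, started_at, updated_at
    FROM tasks
    WHERE status IN ('Created', 'Started')
    ORDER BY created_at
"""


def fetch_pending_tasks(conn):
    """Return pending tasks as dicts, or None when the tasks table is missing."""
    # sqlite-specific existence check; on PostgreSQL, to_regclass('tasks')
    # returning NULL would play the same role.
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'tasks'"
    ).fetchone()
    if has_table is None:
        return None  # the check would report SKIP in this case
    cursor = conn.execute(PENDING_QUERY)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Passing an in-memory connection that has no `tasks` table returns None, so the caller can distinguish "database reachable but nothing to analyze" from "nothing pending" (an empty list).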

pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskFailurePatternAnalyzer

Bases: AnalyzerBase

Analyzes recent failure patterns to detect systemic issues.

Checks for high failure rates, disk threshold errors, and missing job errors from the last 24 hours of failed tasks.

Fields:
field disk_error_threshold: int = 3

Number of disk threshold errors before FAIL

field max_failures_24h: int = 10

Maximum failures in the last 24 hours before FAIL

field missing_job_threshold: int = 2

Number of missing-job errors before WARN
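The three fields above can be read as simple counters over the last 24 hours of failure messages. The sketch below is a hypothetical approximation, not the analyzer's actual code: the substring markers used to recognize disk and missing-job errors are invented, and the comparison operators (strictly greater than for total failures, "at least" for the two error categories) are assumptions drawn from the field wording.

```python
from dataclasses import dataclass

# Hypothetical message fragments; the analyzer's real matching rules are not documented.
DISK_ERROR_MARKERS = ("disk threshold",)
MISSING_JOB_MARKERS = ("job not found",)


@dataclass(frozen=True)
class FailureThresholds:
    max_failures_24h: int = 10      # FAIL when total failures exceed this
    disk_error_threshold: int = 3   # FAIL when disk errors reach this
    missing_job_threshold: int = 2  # WARN when missing-job errors reach this


def classify_failures(error_messages, thresholds=FailureThresholds()):
    """Map the last 24h of failure messages to (status, message) findings."""
    messages = [m.lower() for m in error_messages]
    # Count messages that contain any marker of each category.
    disk = sum(any(k in m for k in DISK_ERROR_MARKERS) for m in messages)
    missing = sum(any(k in m for k in MISSING_JOB_MARKERS) for m in messages)
    findings = []
    if len(messages) > thresholds.max_failures_24h:
        findings.append(("FAIL", f"{len(messages)} failures in 24h"))
    if disk >= thresholds.disk_error_threshold:
        findings.append(("FAIL", f"{disk} disk threshold errors"))
    if missing >= thresholds.missing_job_threshold:
        findings.append(("WARN", f"{missing} missing-job errors"))
    return findings or [("PASS", "no failure pattern detected")]
```

The thresholds dataclass is frozen so that its use as a default argument cannot be mutated between calls.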

analyze(data)

Analyzes the provided data and returns a list of CheckResult instances.

Return type:

list[CheckResult]

Args:

data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.

Returns:

List[CheckResult]: A list containing the results of the analysis.

Raises:

NotImplementedError: If this method is not implemented by subclasses.

name: ClassVar[str] = 'FiletaskFailurePatternAnalyzer'

pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskQueueAnalyzer

Bases: AnalyzerBase

Analyzes overall filetask queue health and performance.

Monitors total queue size, task aging, and accumulation of old tasks to detect processing bottlenecks and service degradation.

Fields:
field max_old_tasks: int = 5

Maximum number of tasks older than threshold

field max_pending_tasks: int = 20

Maximum number of tasks in Created or Started state

field max_task_age_minutes: int = 15

Maximum age of tasks in minutes

field warning_pending_tasks: int = 10

Warning threshold for pending tasks
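A minimal sketch of how these four fields might combine, assuming the input is a list of task ages in minutes. It treats a task as "old" when it exceeds `max_task_age_minutes`, which is one reading of the field descriptions; the Thresholds list in the test docstring quotes a separate 720-minute old-task threshold, so the real analyzer may use a different cutoff. `evaluate_queue` is a hypothetical name.

```python
def evaluate_queue(task_ages_minutes, max_pending_tasks=20, warning_pending_tasks=10,
                   max_task_age_minutes=15, max_old_tasks=5):
    """Return (status, message) findings for a list of pending-task ages.

    Assumption: "old" means older than max_task_age_minutes.
    """
    findings = []
    pending = len(task_ages_minutes)

    # Queue size: FAIL above the hard limit, WARN above the warning level.
    if pending > max_pending_tasks:
        findings.append(("FAIL", f"{pending} pending tasks (max {max_pending_tasks})"))
    elif pending > warning_pending_tasks:
        findings.append(("WARN", f"{pending} pending tasks (warn at {warning_pending_tasks})"))

    # Accumulation of old tasks: FAIL when more than max_old_tasks are too old.
    old = sum(1 for age in task_ages_minutes if age > max_task_age_minutes)
    if old > max_old_tasks:
        findings.append(("FAIL", f"{old} tasks older than {max_task_age_minutes} min"))

    return findings or [("PASS", f"{pending} pending, {old} older than {max_task_age_minutes} min")]
```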

analyze(data)

Analyzes the provided data and returns a list of CheckResult instances.

Return type:

list[CheckResult]

Args:

data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.

Returns:

List[CheckResult]: A list containing the results of the analysis.

Raises:

NotImplementedError: If this method is not implemented by subclasses.

name: ClassVar[str] = 'FiletaskQueueAnalyzer'

pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskStuckTaskAnalyzer

Bases: AnalyzerBase

Analyzes tasks stuck in specific states to distinguish dispatcher issues from worker issues.

Checks:
  • Created tasks stuck too long (dispatcher problem)

  • Dispatch delay (time between created_at and started_at)

  • Concurrency saturation (all slots used)

  • Deadline proximity (approaching the K8s activeDeadlineSeconds)

  • Zombie tasks: Started tasks with a stale updated_at (the K8s Job is likely gone, but the task still occupies a concurrency slot)

Fields:
field concurrency_limit: int = 8

Filetask concurrency limit

field deadline_warning_minutes: int = 300

Running-time warning threshold in minutes (5h of the 6h K8s deadline)

field max_created_age_minutes: int = 30

Maximum age, in minutes, of a Created task before FAIL

field max_dispatch_delay_minutes: float = 15.0

Average dispatch delay, in minutes, that triggers a warning

field zombie_task_threshold_minutes: int = 120

Minutes since updated_at after which a Started task is treated as a zombie
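The five checks listed for this analyzer map naturally onto timestamp arithmetic. The sketch below uses the documented field defaults but is otherwise hypothetical: tasks are plain dicts with `status`, `created_at`, `started_at` and `updated_at` keys, the severities other than the Created-task FAIL are assumptions, and `now` is passed in explicitly so results are deterministic.

```python
from datetime import datetime, timedelta, timezone


def _minutes(start: datetime, end: datetime) -> float:
    """Elapsed minutes between two timezone-aware datetimes."""
    return (end - start).total_seconds() / 60.0


def check_stuck_tasks(tasks, now: datetime, concurrency_limit=8, deadline_warning_minutes=300,
                      max_created_age_minutes=30, max_dispatch_delay_minutes=15.0,
                      zombie_task_threshold_minutes=120):
    """Return (status, message) findings for hypothetical task dicts."""
    created = [t for t in tasks if t["status"] == "Created"]
    started = [t for t in tasks if t["status"] == "Started"]
    findings = []

    # Created too long: the dispatcher never picked the task up.
    stuck = [t for t in created if _minutes(t["created_at"], now) > max_created_age_minutes]
    if stuck:
        findings.append(("FAIL", f"{len(stuck)} Created task(s) older than {max_created_age_minutes} min"))

    # Average dispatch delay over Started tasks (created_at -> started_at).
    delays = [_minutes(t["created_at"], t["started_at"]) for t in started if t.get("started_at")]
    if delays:
        avg = sum(delays) / len(delays)
        if avg > max_dispatch_delay_minutes:
            findings.append(("WARN", f"average dispatch delay {avg:.1f} min"))

    # Concurrency saturation: every slot is occupied.
    if len(started) >= concurrency_limit:
        findings.append(("WARN", f"concurrency saturated ({len(started)}/{concurrency_limit})"))

    # Deadline proximity: running long enough to approach activeDeadlineSeconds.
    near = [t for t in started
            if t.get("started_at") and _minutes(t["started_at"], now) > deadline_warning_minutes]
    if near:
        findings.append(("WARN", f"{len(near)} task(s) running over {deadline_warning_minutes} min"))

    # Zombies: still Started, but updated_at has gone stale.
    zombies = [t for t in started if _minutes(t["updated_at"], now) > zombie_task_threshold_minutes]
    if zombies:
        findings.append(("WARN", f"{len(zombies)} zombie task(s) with stale updated_at"))

    return findings or [("PASS", "no stuck or saturated tasks")]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = [{"status": "Created", "created_at": now - timedelta(minutes=45),
               "started_at": None, "updated_at": now - timedelta(minutes=45)}]
    print(check_stuck_tasks(sample, now))
```

Separating the Created-age check from the Started-task checks mirrors the dispatcher versus worker distinction: a backlog of Created tasks points at dispatch, while long-running or stale Started tasks point at the workers.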

analyze(data)

Analyzes the provided data and returns a list of CheckResult instances.

Return type:

list[CheckResult]

Args:

data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.

Returns:

List[CheckResult]: A list containing the results of the analysis.

Raises:

NotImplementedError: If this method is not implemented by subclasses.

name: ClassVar[str] = 'FiletaskStuckTaskAnalyzer'

pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskTypeDistributionAnalyzer

Bases: AnalyzerBase

Analyzes task type distribution to detect processing imbalances.

Monitors accumulation of specific task types (copy, download, sizing, delete, import-blobs, render) which may indicate processing bottlenecks or service failures for particular operation types.

Fields:
field max_copy_tasks: int = 15

Maximum number of copy/copy-v2 tasks

field max_delete_tasks: int = 20

Maximum number of delete/delete-v2 tasks

field max_download_tasks: int = 20

Maximum number of download/download-v2 tasks

field max_import_blobs_tasks: int = 10

Maximum number of import-blobs tasks

field max_render_tasks: int = 10

Maximum number of render tasks

field max_sizing_tasks: int = 10

Maximum number of sizing tasks
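Because several limits cover both a base type and its `-v2` variant (copy/copy-v2, download/download-v2, delete/delete-v2), a per-type check has to fold the variants together before counting. The sketch below assumes exactly that folding rule and reuses the field defaults; `TYPE_LIMITS`, `normalize_type` and `types_over_limit` are hypothetical names, and task types outside the six tracked ones are simply ignored.

```python
from collections import Counter

# Limits mirror the field defaults; folding "-v2" variants into their base
# type is an assumption based on the field descriptions.
TYPE_LIMITS = {"copy": 15, "download": 20, "sizing": 10,
               "delete": 20, "import-blobs": 10, "render": 10}


def normalize_type(task_type):
    """Treat e.g. 'copy-v2' as 'copy'."""
    return task_type[:-3] if task_type.endswith("-v2") else task_type


def types_over_limit(task_types, limits=TYPE_LIMITS):
    """Return {type: count} for every tracked type whose count exceeds its limit."""
    counts = Counter(normalize_type(t) for t in task_types)
    return {t: n for t, n in counts.items() if t in limits and n > limits[t]}
```

For example, 10 `copy` tasks plus 6 `copy-v2` tasks total 16, which exceeds the copy limit of 15 even though neither variant does on its own.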

analyze(data)

Analyzes the provided data and returns a list of CheckResult instances.

Return type:

list[CheckResult]

Args:

data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.

Returns:

List[CheckResult]: A list containing the results of the analysis.

Raises:

NotImplementedError: If this method is not implemented by subclasses.

name: ClassVar[str] = 'FiletaskTypeDistributionAnalyzer'

domino_admin_toolkit.checks.test_filetask_queue_status.filetask_failure_summary(filetask_recent_failures)

Generates summary of recent failure patterns.

domino_admin_toolkit.checks.test_filetask_queue_status.filetask_queue_data(k8s_client)

Pytest fixture for filetask queue data.

domino_admin_toolkit.checks.test_filetask_queue_status.filetask_queue_summary(filetask_queue_data)

Generates aggregated summary statistics from filetask queue data.
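One plausible shape for such a summary is a count per status and per task type plus the age of the oldest pending task. The sketch below is illustrative only: the input keys (`status`, `type`, `created_at`) and the output keys are hypothetical and are not taken from the real fixture.

```python
from collections import Counter
from datetime import datetime


def summarize_queue(tasks, now: datetime):
    """Aggregate pending task dicts into summary statistics.

    The task keys and summary keys are hypothetical; they illustrate the
    aggregation, not the real fixture output.
    """
    ages = [(now - t["created_at"]).total_seconds() / 60.0 for t in tasks]
    return {
        "total_pending": len(tasks),
        "by_status": dict(Counter(t["status"] for t in tasks)),
        "by_type": dict(Counter(t["type"] for t in tasks)),
        "oldest_age_minutes": max(ages, default=0.0),  # 0.0 for an empty queue
    }
```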

domino_admin_toolkit.checks.test_filetask_queue_status.filetask_recent_failures(k8s_client)

Pytest fixture for recently failed filetask tasks (last 24h).

domino_admin_toolkit.checks.test_filetask_queue_status.get_filetask_failure_summary(failure_data)

Regular function version of filetask_failure_summary for unit testing.

domino_admin_toolkit.checks.test_filetask_queue_status.get_filetask_queue_data()

Regular function version of filetask_queue_data for unit testing.

domino_admin_toolkit.checks.test_filetask_queue_status.get_filetask_queue_summary(queue_data)

Regular function version of filetask_queue_summary for unit testing.

domino_admin_toolkit.checks.test_filetask_queue_status.test_filetask_queue_status(filetask_queue_data, filetask_queue_summary, filetask_recent_failures, filetask_failure_summary)
Description:

Monitors the Domino filetask queue to detect processing bottlenecks and service degradation. Analyzes pending tasks in Created/Started states, checks for task accumulation, validates task type distribution, identifies tasks exceeding age thresholds, detects dispatcher issues, monitors concurrency saturation, and analyzes recent failure patterns.

Related support article: https://support.domino.ai/support/s/article/Is-filetask-stuck-datasets-admin-page-showsDeletionsInProgress-Users-see-sizing-in-Pending

Result:

  • PASS: Filetask queue is processing efficiently with no accumulation or old tasks

  • WARN: Queue approaching thresholds, concurrency saturated, or dispatch delays

  • FAIL: Queue blocked, excessive task accumulation, stuck Created tasks, or high failure rate

  • SKIP: If filetask database is unavailable or tasks table doesn’t exist
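A common way to collapse several analyzers' findings into one PASS/WARN/FAIL/SKIP outcome is "worst result wins", with SKIP taking precedence when the database cannot be queried. The sketch below assumes that rule; the document does not state how the test actually combines its analyzers, and `SEVERITY` and `overall_status` are hypothetical names.

```python
# Assumption: the overall outcome is the most severe individual finding,
# and SKIP short-circuits everything when the database cannot be queried.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def overall_status(findings, db_available=True):
    """Collapse (status, message) findings into a single PASS/WARN/FAIL/SKIP."""
    if not db_available:
        return "SKIP"
    statuses = [status for status, _ in findings]
    return max(statuses, key=SEVERITY.__getitem__, default="PASS")
```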

Thresholds:
  • Max pending tasks: 20 (warning at 10)

  • Max task age: 15 minutes

  • Old task threshold: 720 minutes (max 5 old tasks)

  • Task type limits: Copy(15), Download(20), Sizing(10), Delete(20), Import-blobs(10), Render(10)

  • Max Created age: 30 minutes (dispatcher issue indicator)

  • Max dispatch delay: 15 minutes average

  • Concurrency limit: 8 active tasks

  • Deadline warning: 300 minutes (5h of 6h K8s deadline)

  • Zombie task threshold: 120 minutes (Started with stale updated_at)

  • Max failures (24h): 10