domino_admin_toolkit.checks.test_filetask_queue_status module
Filetask Queue Monitoring Test
This module provides comprehensive monitoring of the Domino filetask queue to detect:
- Queue blockages and task accumulation
- Tasks stuck in Created/Started states for extended periods
- Imbalanced task type distribution that may indicate processing issues
- Overall queue health metrics and performance indicators
- Recently failed tasks with error pattern detection
- Dispatch delay and concurrency saturation issues
- Data plane distribution for multi-plane deployments
The test connects to the filetask PostgreSQL database and analyzes pending tasks to ensure the filetask service is processing tasks efficiently.
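The kind of query this implies can be sketched as follows. This is an illustration only: SQLite stands in for PostgreSQL so the example runs without a server, and the column names `task_type` and `created_at`, the finished-state name `Succeeded`, and the helper `count_pending` are assumptions rather than the real filetask schema or toolkit code. Only the `tasks` table name and the Created/Started states come from this page.

```python
import sqlite3

# Hypothetical minimal schema; the real filetask "tasks" table may differ.
# SQLite is used here as a stand-in for the filetask PostgreSQL database.
SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    task_type TEXT,
    state TEXT,
    created_at TEXT
)
"""

# Pending tasks are those still in the Created or Started state.
PENDING_QUERY = """
SELECT state, COUNT(*) AS n
FROM tasks
WHERE state IN ('Created', 'Started')
GROUP BY state
ORDER BY state
"""


def count_pending(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the number of pending tasks per state (hypothetical helper)."""
    return {state: n for state, n in conn.execute(PENDING_QUERY)}


# Small in-memory demo with one task per state.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO tasks (task_type, state, created_at) VALUES (?, ?, ?)",
    [
        ("copy", "Created", "2024-01-01T00:00:00"),
        ("sizing", "Started", "2024-01-01T00:05:00"),
        ("delete", "Succeeded", "2024-01-01T00:01:00"),  # assumed finished-state name
    ],
)
print(count_pending(conn))
```

Grouping by state in SQL keeps the transfer small even when the queue is large; the analyzers below then work on aggregated counts and per-task timestamps.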
- pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskFailurePatternAnalyzer
Bases: AnalyzerBase
Analyzes recent failure patterns to detect systemic issues.
Checks for high failure rates, disk threshold errors, and missing job errors from the last 24 hours of failed tasks.
- analyze(data)
Analyzes the provided data and returns a list of CheckResult instances.
- Return type:
List[CheckResult]
- Args:
data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.
- Returns:
List[CheckResult]: A list containing the results of the analysis.
- Raises:
NotImplementedError: If this method is not implemented by subclasses.
- name: ClassVar[str] = 'FiletaskFailurePatternAnalyzer'
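A plain-Python sketch of the failure-pattern idea follows. The regular expressions, the `summarize_failures` helper, and the choice of "more than 10" as the trigger are all assumptions for illustration. The real analyzer's matching rules are not documented here, and this is not its pydantic implementation. Only the 24-hour failure limit of 10 comes from the Thresholds list.

```python
import re
from collections import Counter
from dataclasses import dataclass

MAX_FAILURES_24H = 10  # "Max failures (24h)" from the Thresholds list

# Hypothetical patterns; the analyzer's actual matching rules are not documented.
PATTERNS = {
    "disk_threshold": re.compile(r"disk.*threshold", re.IGNORECASE),
    "missing_job": re.compile(r"job.*not found", re.IGNORECASE),
}


@dataclass
class FailureSummary:
    total: int
    by_pattern: Counter
    high_failure_rate: bool


def summarize_failures(error_messages: list[str]) -> FailureSummary:
    """Count known error patterns in failures from the last 24 hours."""
    counts: Counter = Counter()
    for message in error_messages:
        for name, pattern in PATTERNS.items():
            if pattern.search(message):
                counts[name] += 1
    return FailureSummary(
        total=len(error_messages),
        by_pattern=counts,
        # Assumption: the limit is exceeded only when failures are strictly above 10.
        high_failure_rate=len(error_messages) > MAX_FAILURES_24H,
    )
```

Counting by pattern, and not only by total, lets a small number of failures still surface a systemic cause such as a full disk.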
- pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskQueueAnalyzer
Bases: AnalyzerBase
Analyzes overall filetask queue health and performance.
Monitors total queue size, task aging, and accumulation of old tasks to detect processing bottlenecks and service degradation.
- analyze(data)
Analyzes the provided data and returns a list of CheckResult instances.
- Return type:
List[CheckResult]
- Args:
data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.
- Returns:
List[CheckResult]: A list containing the results of the analysis.
- Raises:
NotImplementedError: If this method is not implemented by subclasses.
- name: ClassVar[str] = 'FiletaskQueueAnalyzer'
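The queue-health checks can be sketched against the documented thresholds: a maximum of 20 pending tasks with a warning at 10, a 15-minute maximum task age, and at most 5 tasks older than 720 minutes. Several details here are assumptions. The `queue_status` helper is hypothetical. Treating a task over 15 minutes old as a WARN is assumed. The pending-count warning is assumed to start at 10 inclusive.

```python
from datetime import datetime, timedelta

# Values from the Thresholds list for test_filetask_queue_status.
MAX_PENDING = 20
WARN_PENDING = 10
MAX_TASK_AGE = timedelta(minutes=15)
OLD_TASK_AGE = timedelta(minutes=720)
MAX_OLD_TASKS = 5


def queue_status(created_times: list[datetime], now: datetime) -> str:
    """Return PASS, WARN, or FAIL for the creation times of pending tasks."""
    pending = len(created_times)
    ages = [now - t for t in created_times]
    old_tasks = sum(1 for age in ages if age > OLD_TASK_AGE)

    # Hard failures: too many pending tasks, or too many very old ones.
    if pending > MAX_PENDING or old_tasks > MAX_OLD_TASKS:
        return "FAIL"
    # Assumed soft signals: queue near its limit, or any task past the age limit.
    if pending >= WARN_PENDING or any(age > MAX_TASK_AGE for age in ages):
        return "WARN"
    return "PASS"
```

For example, 21 tasks created one minute ago yield FAIL on size alone, while a single task created 30 minutes ago yields WARN.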
- pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskStuckTaskAnalyzer
Bases: AnalyzerBase
Analyzes tasks stuck in specific states to distinguish dispatcher vs worker issues.
Checks:
- Created tasks stuck too long (dispatcher problem)
- Dispatch delay (time between created_at and started_at)
- Concurrency saturation (all slots used)
- Deadline proximity (approaching K8s activeDeadlineSeconds)
- Zombie tasks: Started tasks with stale updated_at (K8s Job likely gone, blocking concurrency)
- analyze(data)
Analyzes the provided data and returns a list of CheckResult instances.
- Return type:
List[CheckResult]
- Args:
data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.
- Returns:
List[CheckResult]: A list containing the results of the analysis.
- Raises:
NotImplementedError: If this method is not implemented by subclasses.
- name: ClassVar[str] = 'FiletaskStuckTaskAnalyzer'
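The five stuck-task checks can be expressed with the documented thresholds. These are a 30-minute Created age, a 15-minute average dispatch delay, 8 concurrent tasks, a 300-minute deadline warning, and a 120-minute zombie threshold. The `Task` dataclass, the finding labels, and the `stuck_task_findings` helper are hypothetical. Treating exactly 8 Started tasks as saturated is an assumption, and the code is not the analyzer's actual logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Values from the Thresholds list.
MAX_CREATED_AGE = timedelta(minutes=30)
MAX_AVG_DISPATCH_DELAY = timedelta(minutes=15)
CONCURRENCY_LIMIT = 8
DEADLINE_WARNING = timedelta(minutes=300)
ZOMBIE_THRESHOLD = timedelta(minutes=120)


@dataclass
class Task:
    """Hypothetical task row; field names follow the timestamps named above."""
    state: str  # "Created" or "Started"
    created_at: datetime
    started_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None


def stuck_task_findings(tasks: list[Task], now: datetime) -> list[str]:
    findings = []
    created = [t for t in tasks if t.state == "Created"]
    started = [t for t in tasks if t.state == "Started"]

    # Created tasks waiting too long point at the dispatcher.
    if any(now - t.created_at > MAX_CREATED_AGE for t in created):
        findings.append("created_stuck")

    # Average time from created_at to started_at for running tasks.
    delays = [t.started_at - t.created_at for t in started if t.started_at]
    if delays and sum(delays, timedelta()) / len(delays) > MAX_AVG_DISPATCH_DELAY:
        findings.append("dispatch_delay")

    # Assumption: saturation means every concurrency slot is in use.
    if len(started) >= CONCURRENCY_LIMIT:
        findings.append("concurrency_saturated")

    # Running long enough to approach the 6h K8s activeDeadlineSeconds.
    if any(t.started_at and now - t.started_at > DEADLINE_WARNING for t in started):
        findings.append("near_deadline")

    # Started but not updated recently: the K8s Job is likely gone.
    if any(t.updated_at and now - t.updated_at > ZOMBIE_THRESHOLD for t in started):
        findings.append("zombie")

    return findings
```

Separating Created-age findings from Started-task findings mirrors the dispatcher vs worker distinction the class description draws.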
- pydantic model domino_admin_toolkit.checks.test_filetask_queue_status.FiletaskTypeDistributionAnalyzer
Bases: AnalyzerBase
Analyzes task type distribution to detect processing imbalances.
Monitors accumulation of specific task types (copy, download, sizing, delete, import-blobs, render) which may indicate processing bottlenecks or service failures for particular operation types.
- analyze(data)
Analyzes the provided data and returns a list of CheckResult instances.
- Return type:
List[CheckResult]
- Args:
data (Dict[str, Any]): The data to be analyzed. The structure depends on the analyzer’s implementation.
- Returns:
List[CheckResult]: A list containing the results of the analysis.
- Raises:
NotImplementedError: If this method is not implemented by subclasses.
- name: ClassVar[str] = 'FiletaskTypeDistributionAnalyzer'
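The per-type limits from the Thresholds list can be applied with a simple counter. This sketch assumes that task types arrive as lowercase strings matching the names below. It also assumes that a type is flagged only when its count is strictly above the limit. The `over_limit_types` helper is hypothetical.

```python
from collections import Counter

# Per-type limits from the "Task type limits" threshold entry.
# Assumption: task types are stored as these lowercase strings.
TASK_TYPE_LIMITS = {
    "copy": 15,
    "download": 20,
    "sizing": 10,
    "delete": 20,
    "import-blobs": 10,
    "render": 10,
}


def over_limit_types(task_types: list[str]) -> dict[str, int]:
    """Return {task_type: pending_count} for each type above its limit."""
    counts = Counter(task_types)
    return {
        task_type: counts[task_type]
        for task_type, limit in TASK_TYPE_LIMITS.items()
        if counts[task_type] > limit
    }
```

Types without a configured limit are ignored, so an unexpected type never triggers this particular check.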
- domino_admin_toolkit.checks.test_filetask_queue_status.filetask_failure_summary(filetask_recent_failures)
Generates a summary of recent failure patterns.
- domino_admin_toolkit.checks.test_filetask_queue_status.filetask_queue_data(k8s_client)
Pytest fixture for filetask queue data.
- domino_admin_toolkit.checks.test_filetask_queue_status.filetask_queue_summary(filetask_queue_data)
Generates aggregated summary statistics from filetask queue data.
- domino_admin_toolkit.checks.test_filetask_queue_status.filetask_recent_failures(k8s_client)
Pytest fixture for recently failed filetask tasks (last 24h).
- domino_admin_toolkit.checks.test_filetask_queue_status.get_filetask_failure_summary(failure_data)
Non-fixture equivalent of filetask_failure_summary that unit tests can call directly.
- domino_admin_toolkit.checks.test_filetask_queue_status.get_filetask_queue_data()
Non-fixture equivalent of filetask_queue_data that unit tests can call directly.
- domino_admin_toolkit.checks.test_filetask_queue_status.get_filetask_queue_summary(queue_data)
Non-fixture equivalent of filetask_queue_summary that unit tests can call directly.
- domino_admin_toolkit.checks.test_filetask_queue_status.test_filetask_queue_status(filetask_queue_data, filetask_queue_summary, filetask_recent_failures, filetask_failure_summary)
- Description:
Monitors the Domino filetask queue to detect processing bottlenecks and service degradation. Analyzes pending tasks in the Created and Started states, checks for task accumulation, validates the task type distribution, identifies tasks exceeding age thresholds, detects dispatcher issues, monitors concurrency saturation, and analyzes recent failure patterns. For background, see https://support.domino.ai/support/s/article/Is-filetask-stuck-datasets-admin-page-showsDeletionsInProgress-Users-see-sizing-in-Pending
- Result:
PASS: Filetask queue is processing efficiently with no accumulation or old tasks
WARN: Queue approaching thresholds, concurrency saturated, or dispatch delays
FAIL: Queue blocked, excessive task accumulation, stuck Created tasks, or high failure rate
SKIP: If the filetask database is unavailable or the tasks table doesn't exist
- Thresholds:
Max pending tasks: 20 (warning at 10)
Max task age: 15 minutes
Old task threshold: 720 minutes (max 5 old tasks)
Task type limits: Copy(15), Download(20), Sizing(10), Delete(20), Import-blobs(10), Render(10)
Max Created age: 30 minutes (dispatcher issue indicator)
Max dispatch delay: 15 minutes average
Concurrency limit: 8 active tasks
Deadline warning: 300 minutes (5h of 6h K8s deadline)
Zombie task threshold: 120 minutes (Started with stale updated_at)
Max failures (24h): 10
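The following sketch shows one way the PASS/WARN/FAIL/SKIP levels listed under Result could collapse into a single outcome. It assumes that the most severe finding wins and that SKIP applies only when the database cannot be reached. The `Status` enum and the `overall_status` helper are hypothetical, and the toolkit's own CheckResult handling may differ.

```python
from enum import IntEnum
from typing import Iterable


class Status(IntEnum):
    """Hypothetical severity ordering: a higher value is more severe."""
    PASS = 0
    WARN = 1
    FAIL = 2


def overall_status(findings: Iterable[Status], db_available: bool = True) -> str:
    """Collapse individual findings into one result label."""
    if not db_available:
        # Mirrors the documented SKIP case: nothing can be checked.
        return "SKIP"
    # Assumption: the worst individual finding decides; no findings means PASS.
    return max(findings, default=Status.PASS).name
```

With this ordering, a single saturated-concurrency WARN alongside otherwise clean checks reports WARN, and any FAIL dominates.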