domino_admin_toolkit.checks.test_node_ephemeral_storage module

pydantic model domino_admin_toolkit.checks.test_node_ephemeral_storage.NodeEphemeralStorageAnalyzer

Bases: AnalyzerBase

Validates node ephemeral storage usage is within acceptable thresholds.

Fields:
field usage_threshold_pct: float = 80.0

analyze(data)

Evaluate a single node’s ephemeral storage usage against the configured threshold.

Return type:

list[CheckResult]

name: ClassVar[str] = 'NodeEphemeralStorageAnalyzer'
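The per-node comparison that analyze describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the toolkit's implementation. NodeStorageResult and evaluate_node are hypothetical names, and the "PASS"/"FAIL" labels are assumed stand-ins for CheckResult, whose fields are not documented here. The sketch also assumes usage arrives as a percentage of the root filesystem and that only usage strictly above the threshold fails.

```python
from dataclasses import dataclass


@dataclass
class NodeStorageResult:
    """Hypothetical stand-in for CheckResult; the real fields are not documented here."""
    node: str
    used_pct: float
    status: str  # assumed labels: "PASS" or "FAIL"
    message: str


def evaluate_node(
    node: str, used_pct: float, usage_threshold_pct: float = 80.0
) -> list[NodeStorageResult]:
    """Compare one node's root-filesystem usage against the threshold.

    Returns a list to mirror analyze()'s list[CheckResult] return type.
    """
    if used_pct > usage_threshold_pct:
        status = "FAIL"
        message = f"{node}: ephemeral storage at {used_pct:.1f}% exceeds {usage_threshold_pct:.1f}%"
    else:
        status = "PASS"
        message = f"{node}: ephemeral storage at {used_pct:.1f}% is within {usage_threshold_pct:.1f}%"
    return [NodeStorageResult(node, used_pct, status, message)]


if __name__ == "__main__":
    usage = {"node-a": 91.2, "node-b": 42.0, "node-c": 80.0}
    # Walk the nodes worst-first, the same ordering the check's table uses.
    for name, pct in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        for result in evaluate_node(name, pct):
            print(result.status, result.message)
```

Returning a one-element list keeps the sketch's shape aligned with the documented return type. It also leaves room for emitting extra results per node, such as a warning tier.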

domino_admin_toolkit.checks.test_node_ephemeral_storage.node_ephemeral_storage_data(prometheus_client_v2)

Collect node ephemeral storage usage from Prometheus.

Return type:

DataFrame
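The query behind node_ephemeral_storage_data and the interface of the prometheus_client_v2 fixture are not documented here. The sketch below therefore assumes the cluster runs node_exporter and derives root-filesystem usage from its node_filesystem_avail_bytes and node_filesystem_size_bytes metrics. It parses a response in the Prometheus HTTP API /api/v1/query instant-vector format into rows sorted worst-first. ROOT_FS_USAGE_QUERY and parse_usage_response are hypothetical names. Reading the node name from a "node" label is also an assumption, because label naming depends on scrape relabeling. The documented function returns a pandas DataFrame, but this sketch stops at plain (node, percent) rows so that it has no dependencies.

```python
import json

# Assumed PromQL: percent of the root filesystem in use, from node_exporter metrics.
ROOT_FS_USAGE_QUERY = (
    '100 * (1 - node_filesystem_avail_bytes{mountpoint="/"}'
    ' / node_filesystem_size_bytes{mountpoint="/"})'
)


def parse_usage_response(payload: dict) -> list[tuple[str, float]]:
    """Turn an /api/v1/query instant-vector response into (node, used_pct) rows, worst first."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = []
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        # "node" is an assumed label; fall back to the scrape target's "instance".
        node = labels.get("node") or labels.get("instance", "<unknown>")
        _timestamp, value = sample["value"]  # [unix_ts, "number as string"]
        rows.append((node, float(value)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    sample = json.loads("""
    {"status": "success",
     "data": {"resultType": "vector",
              "result": [
                {"metric": {"instance": "10.0.0.5:9100"}, "value": [1700000000, "41.5"]},
                {"metric": {"node": "node-b", "instance": "10.0.0.6:9100"},
                 "value": [1700000000, "87.25"]}
              ]}}
    """)
    for node, pct in parse_usage_response(sample):
        print(f"{node}: {pct:.1f}%")
```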

domino_admin_toolkit.checks.test_node_ephemeral_storage.test_node_ephemeral_storage(node_ephemeral_storage_data, runner)

Description: Checks actual ephemeral storage usage on cluster nodes.

Failure Conditions: Any node's root filesystem usage exceeds 80% (matching the analyzer's default usage_threshold_pct).

Troubleshooting Steps:

  1. Identify affected nodes from the table (sorted worst-first)

  2. Check what is filling the disk: SSH to the node and run du -sh /* 2>/dev/null | sort -rh | head

  3. Common causes: accumulated container logs, cached image layers, and core dumps
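The du | sort | head pipeline in step 2 can also be approximated in Python, for example from a debug pod that has the host filesystem mounted. This is a minimal sketch and not part of the toolkit. largest_top_level_dirs and tree_size are hypothetical helpers. They sum apparent file sizes (st_size), so totals can differ from du for sparse or hard-linked files. Unlike the plain du command, the sketch stays on the filesystem of the root it is given, similar to du -x.

```python
import os


def _on_device(path: str, device: int) -> bool:
    """True if path (not following a final symlink) lives on the given filesystem."""
    try:
        return os.lstat(path).st_dev == device
    except OSError:
        return False


def tree_size(path: str, device: int) -> int:
    """Sum apparent sizes (st_size) under path without leaving the given filesystem."""
    if os.path.islink(path) or not os.path.isdir(path):
        try:
            st = os.lstat(path)
        except OSError:
            return 0
        return st.st_size if st.st_dev == device else 0
    total = 0
    # onerror swallows permission errors, much like du ... 2>/dev/null.
    for dirpath, dirnames, filenames in os.walk(path, onerror=lambda err: None):
        # Prune subdirectories mounted from other filesystems (similar to du -x).
        dirnames[:] = [d for d in dirnames if _on_device(os.path.join(dirpath, d), device)]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable
    return total


def largest_top_level_dirs(root: str = "/", limit: int = 10) -> list[tuple[str, int]]:
    """Return (path, bytes) for root's top-level entries on root's filesystem, largest first."""
    device = os.lstat(root).st_dev
    sizes = []
    for entry in os.scandir(root):
        if _on_device(entry.path, device):
            sizes.append((entry.path, tree_size(entry.path, device)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    for path, size in largest_top_level_dirs("/"):
        print(f"{size / 2**30:8.2f} GiB  {path}")
```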

Resolution Steps:
  1. Prune unused container images: crictl rmi --prune

  2. Rotate or truncate large logs in /var/log

  3. If the node pool is ‘compute’, its nodes are ephemeral: cordon and replace the affected node
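Before rotating or truncating anything in /var/log, it helps to see which files are actually large. The sketch below only reports candidates and changes nothing. large_log_files and its 100 MiB default cutoff are illustrative assumptions, not toolkit settings. Deleting a log that a process still holds open does not free its space until the writer closes it, so truncation or the runtime's own log rotation is usually the safer option for active logs.

```python
import os
import stat


def large_log_files(log_dir: str = "/var/log", min_bytes: int = 100 * 2**20) -> list[tuple[str, int]]:
    """List regular files under log_dir of at least min_bytes, largest first.

    Read-only: nothing is rotated, truncated, or deleted.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(log_dir, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file rotated away or unreadable
            # Skip symlinks such as /var/log/containers/*, which point into
            # /var/log/pods, so each log file is counted once.
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size >= min_bytes:
                found.append((path, st.st_size))
    found.sort(key=lambda item: item[1], reverse=True)
    return found


if __name__ == "__main__":
    for path, size in large_log_files():
        print(f"{size / 2**20:10.1f} MiB  {path}")
```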

Required Permissions: Node SSH access, cluster admin