domino_admin_toolkit.checks.test_k8s_platform_sizing module
- class domino_admin_toolkit.checks.test_k8s_platform_sizing.TestK8sPlatformSizing
Bases: object
- test_k8s_platform_pod_sizing(container_memory_df, container_cpu_df, df_key, analyzers, sort_col, column_order, top_n)
- Description: Validates per-pod memory and CPU usage on platform nodes over the last hour.
When node pool labels are present, uses node-pool-scoped queries (all pods on platform nodes, regardless of namespace); otherwise falls back to querying pods in the platform namespace.
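A minimal sketch of the scope choice, assuming cAdvisor series carry a `node` label and that the platform node pool is exposed through kube-state-metrics' kube_node_labels as `label_dominodatalab_com_node_pool` (a hypothetical label name). platform_memory_query is an illustrative helper, not part of domino_admin_toolkit, and the toolkit's actual queries may differ.

```python
# Hypothetical kube-state-metrics label for a node label such as
# dominodatalab.com/node-pool; the label the toolkit actually matches may differ.
NODE_POOL_LABEL = "label_dominodatalab_com_node_pool"


def platform_memory_query(namespace: str, use_node_pool: bool, pool: str = "platform") -> str:
    """Build a PromQL query for each pod's average working-set memory over 1h (sketch)."""
    if use_node_pool:
        # Node-pool scope: no namespace matcher. Multiplying by kube_node_labels
        # (value 1) keeps only series on nodes carrying the platform pool label.
        # Assumes the cAdvisor series carry a `node` label.
        selector = 'container_memory_working_set_bytes{container!=""}'
        inner = (
            f"avg_over_time({selector}[1h])"
            " * on (node) group_left()"
            f' kube_node_labels{{{NODE_POOL_LABEL}="{pool}"}}'
        )
    else:
        # Fallback: namespace scope.
        selector = f'container_memory_working_set_bytes{{namespace="{namespace}", container!=""}}'
        inner = f"avg_over_time({selector}[1h])"
    # One series per pod: sum across the pod's containers.
    return f"sum by (namespace, pod) ({inner})"
```

The caller decides use_node_pool from whether node pool labels exist; the node-pool branch deliberately has no namespace matcher so that platform pods outside the platform namespace are still counted.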
- Failure Conditions:
memory-excess: Any pod’s average memory usage exceeds its memory request by more than 2 GiB.
memory-oom: Any pod has OOM kill events (container_oom_events_total > 0) in the last hour.
cpu-throttling: Any pod’s CPU throttle percentage exceeds 50%.
cpu-excess: Any pod’s average CPU usage exceeds its CPU request by more than 2 cores.
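The four failure conditions reduce to simple threshold comparisons. The sketch below is illustrative only: PodUsage, throttle_pct and failures are hypothetical names, and the throttle percentage is assumed to be throttled CFS periods divided by total CFS periods (as derived from container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total), which may differ from how the check computes it.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3
MEMORY_EXCESS_BYTES = 2 * GIB  # memory-excess threshold
CPU_EXCESS_CORES = 2.0         # cpu-excess threshold
THROTTLE_PCT_LIMIT = 50.0      # cpu-throttling threshold


@dataclass
class PodUsage:
    """Hypothetical one-hour summary for a single pod; field names are illustrative."""
    pod: str
    avg_memory_bytes: float
    memory_request_bytes: float
    oom_events: float
    avg_cpu_cores: float
    cpu_request_cores: float
    throttled_periods: float  # assumed: increase of throttled CFS periods over 1h
    total_periods: float      # assumed: increase of total CFS periods over 1h


def throttle_pct(u: PodUsage) -> float:
    # Assumed definition: percentage of CFS periods in which the pod was throttled.
    return 100.0 * u.throttled_periods / u.total_periods if u.total_periods else 0.0


def failures(u: PodUsage) -> List[str]:
    """Return the failure-condition names this pod triggers ("more than" is strict)."""
    found = []
    if u.avg_memory_bytes - u.memory_request_bytes > MEMORY_EXCESS_BYTES:
        found.append("memory-excess")
    if u.oom_events > 0:
        found.append("memory-oom")
    if throttle_pct(u) > THROTTLE_PCT_LIMIT:
        found.append("cpu-throttling")
    if u.avg_cpu_cores - u.cpu_request_cores > CPU_EXCESS_CORES:
        found.append("cpu-excess")
    return found
```

For example, a pod averaging 5 GiB against a 2 GiB request and 3.5 cores against a 1-core request triggers memory-excess and cpu-excess; a pod exactly 2 GiB over its request does not, because the thresholds are "more than".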
- Troubleshooting Steps:
Check pod resource usage: kubectl top pods -n <namespace>
Review container resource requests in the deployment spec
Check for memory leaks or unexpected load patterns in the Grafana sizing dashboard
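The plain-text output of `kubectl top pods -n <namespace>` can be converted into numbers for comparison with container requests and the 2-core / 2 GiB thresholds. A sketch, assuming single-namespace output with NAME, CPU(cores) and MEMORY(bytes) columns and binary memory suffixes (Ki, Mi, Gi, Ti); parse_top_pods and its helpers are hypothetical, and the sample pod names in any usage are illustrative.

```python
from typing import Dict, Tuple

# Binary suffixes as printed by kubectl; decimal suffixes (K, M, G) are not handled here.
_BINARY = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}


def parse_cpu(q: str) -> float:
    """Convert a CPU quantity such as '250m' (millicores) or '2' to cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)


def parse_memory(q: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def parse_top_pods(output: str) -> Dict[str, Tuple[float, int]]:
    """Map pod name -> (cores, bytes) from single-namespace `kubectl top pods` text."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, mem = line.split()[:3]
        usage[name] = (parse_cpu(cpu), parse_memory(mem))
    return usage
```

With `--all-namespaces`, kubectl prepends a NAMESPACE column, so the column unpacking would need adjusting.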
- Resolution Steps:
Adjust container memory/CPU requests and limits in the Helm values
For OOM kills: increase memory limits or investigate a possible memory leak
For CPU throttling: increase CPU limits or reduce the workload
- Required Permissions: Platform admin access
- domino_admin_toolkit.checks.test_k8s_platform_sizing.container_cpu_df(prometheus_client_v2, platform_namespace)
Per-pod CPU metrics for platform nodes, collected once per session.
- Return type:
- domino_admin_toolkit.checks.test_k8s_platform_sizing.container_memory_df(prometheus_client_v2, platform_namespace)
Per-pod memory metrics for platform nodes, collected once per session.
- Return type:
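Because the test method receives container_memory_df and container_cpu_df as parameters, the two collectors are presumably pytest fixtures, with session scope providing the collect-once behaviour. The sketch below leaves the fixture wiring out and shows one plausible reshaping step: turning Prometheus instant-vector results (the HTTP API's `metric`/`value` format, where sample values arrive as strings) into per-pod rows and joining usage with requests. vector_to_rows and merge_rows are hypothetical helpers; the `_df` suffix suggests the real fixtures return DataFrames, whereas this sketch uses plain dicts to stay dependency-free.

```python
from typing import Dict, List


def vector_to_rows(result: List[dict], value_name: str) -> List[Dict[str, object]]:
    """Flatten a Prometheus instant-vector result into one row per (namespace, pod)."""
    rows = []
    for sample in result:
        labels = sample["metric"]
        _timestamp, raw_value = sample["value"]  # the value is a string in the API response
        rows.append({
            "namespace": labels.get("namespace", ""),
            "pod": labels.get("pod", ""),
            value_name: float(raw_value),
        })
    return rows


def merge_rows(*tables: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Join several per-pod tables on (namespace, pod); later tables add columns."""
    merged: Dict[tuple, Dict[str, object]] = {}
    for table in tables:
        for row in table:
            key = (row["namespace"], row["pod"])
            merged.setdefault(key, {}).update(row)
    return list(merged.values())
```

Joining on (namespace, pod) rather than pod alone keeps the rows distinct when node-pool scope pulls in identically named pods from different namespaces.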