domino_admin_toolkit.checks.pre_upgrade.test_rabbitmq_stream_health module

Pre-upgrade RabbitMQ stream health check.

Two complementary surfaces are queried for the same streams: the Management API (authoritative for per-stream state, members, online, and leader) and Prometheus (Raft-internal coordinator state and per-stream membership, which the Management API can misreport when its cache is stale). A silent-hang upgrade incident was reproduced when these two surfaces disagreed about the leader, so the check queries both and requires that they agree.
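The agreement requirement can be illustrated with a minimal sketch. It assumes each surface has already been reduced to a plain mapping of stream name to leader node; the function name leader_disagreements, the mapping shapes, and the node names are hypothetical and are not taken from the toolkit's actual implementation.

```python
def leader_disagreements(mgmt_leaders, prom_leaders):
    """Return {stream: (mgmt_leader, prom_leader)} where the two views differ.

    Both arguments are assumed to be dicts of stream name -> leader node name.
    A stream missing from one view is reported with None on that side.
    """
    disagreements = {}
    for stream in sorted(set(mgmt_leaders) | set(prom_leaders)):
        mgmt = mgmt_leaders.get(stream)
        prom = prom_leaders.get(stream)
        if mgmt != prom:
            disagreements[stream] = (mgmt, prom)
    return disagreements


# Hypothetical inputs: the Management API reports a leader that Prometheus
# no longer sees for workload_status.
mgmt_view = {
    "data-plane.resources": "rabbit@rabbitmq-ha-0",
    "workload_status": "rabbit@rabbitmq-ha-1",
}
prom_view = {
    "data-plane.resources": "rabbit@rabbitmq-ha-0",
    "workload_status": "rabbit@rabbitmq-ha-2",
}
print(leader_disagreements(mgmt_view, prom_view))
```

An empty result means the two surfaces agree on every stream; any entry names a stream whose leader should be investigated before upgrading.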

domino_admin_toolkit.checks.pre_upgrade.test_rabbitmq_stream_health.rabbitmq_stream_data(k8s_client)

Per-stream rows from the RabbitMQ Management API.

Return type: DataFrame

domino_admin_toolkit.checks.pre_upgrade.test_rabbitmq_stream_health.rabbitmq_stream_prometheus_data(prometheus_client_v2)

Three stream-health DataFrames from Prometheus: coordinator, leader presence, segments/lag.

Return type: dict[str, DataFrame]

domino_admin_toolkit.checks.pre_upgrade.test_rabbitmq_stream_health.test_known_streams_present(rabbitmq_stream_data, runner)

Description:

Asserts that the two Nucleus-consumed streams (data-plane.resources and workload_status) are present in the RabbitMQ broker. This is a separate failure surface from test_rabbitmq_stream_health, so the operator can tell “stream missing: topology was reset, Nucleus will redeclare” apart from “stream present but unhealthy”.

Failure Conditions:

One or both of the expected streams is absent from /api/queues.
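The presence test reduces to a set difference between the expected stream names and the names the broker reports. The sketch below assumes the /api/queues response has been parsed into a list of dicts, each carrying a "name" key; the helper missing_streams and the sample rows are hypothetical, not the toolkit's own code.

```python
EXPECTED_STREAMS = {"data-plane.resources", "workload_status"}


def missing_streams(queues, expected=EXPECTED_STREAMS):
    """Return the sorted expected stream names not present in `queues`.

    `queues` is assumed to be a list of dicts parsed from /api/queues,
    each carrying at least a "name" key.
    """
    present = {q.get("name") for q in queues}
    return sorted(set(expected) - present)


# Hypothetical broker state: only one of the two streams exists.
rows = [
    {"name": "workload_status", "type": "stream"},
    {"name": "some.classic.queue", "type": "classic"},
]
print(missing_streams(rows))
```

An empty list means both streams are present; any returned name corresponds to the failure condition above.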

Troubleshooting Steps:

Confirm RabbitMQ pods are Running: kubectl -n <platform-ns> get pods -l app=rabbitmq-ha.
Restart the Nucleus dispatcher (kubectl -n <platform-ns> rollout restart deploy/nucleus-dispatcher) to trigger stream redeclaration if topology was reset intentionally.

Resolution Steps:

If the streams should already exist (no recent PVC reset), reset and re-form the RabbitMQ stream PVCs.
If the streams legitimately do not exist yet (fresh install or post-reset), starting Nucleus will declare them; this check is expected to pass on the next run.

Required Permissions:

Platform admin access to read the RabbitMQ admin secret and to restart Nucleus dispatcher.

See also:

test_rabbitmq_stream_health — companion check in this file that validates per-stream leader and Management-API health once the streams exist.

domino_admin_toolkit.checks.pre_upgrade.test_rabbitmq_stream_health.test_rabbitmq_stream_health(rabbitmq_stream_data, rabbitmq_stream_prometheus_data, runner)