bluejay-infra/docs/runbooks/openvoxserver-quadlet-durability.md
2026-05-17 23:18:03 -05:00

openvoxserver Quadlet Durability

This runbook documents the noc1 openvoxserver durability fix for the Puppet control-repo deploy path. The service is a noc1 host artifact, not an ArgoCD application, so discovery always starts on noc1 rather than in apps/*.

Current State

As of the Sprint 32 Cx-12 apply on 2026-05-17:

  • /etc/containers/systemd/openvoxserver.container has a GIT_SSH_COMMAND environment entry that points at the persisted serverdata deploy key.
  • /etc/systemd/system/openvoxserver-safeconfig.service is enabled and active, and reapplies git config --global --add safe.directory '*' inside the running container. The asterisk is quoted so that it reaches git literally if the command is ever run from a shell.
  • /opt/puppet/r10k-deploy.sh self-heals before each fetch by setting safe.directory, the repo-local core.sshCommand, and the persisted known_hosts file when needed.
  • puppet-deploy.service exits 0/SUCCESS after the apply and the control repo reports HEAD == origin/master.
  • systemctl cat openvoxserver does not currently resolve to a generated unit on noc1. The container is running through Podman with restart=always, which restarts an existing container but cannot recreate one that has been removed, so the destructive recreate smoke test must not run until the generated unit is present.
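
To see why the generated unit is missing, it can help to ask the Quadlet generator what it would emit. The following is a minimal sketch, not part of the Cx-12 change: the function name report_unit_state is hypothetical, and the generator path is an assumption that varies by distribution and Podman version. The --dryrun flag prints the units the generator would write, plus any parse errors, without changing anything.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a unit resolves, and if not, show the Quadlet
# generator's dry-run output so parse errors in the .container file are visible.
# Assumption: the generator lives at this path on noc1; adjust if it does not.
GENERATOR="${GENERATOR:-/usr/lib/systemd/system-generators/podman-system-generator}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

report_unit_state() {
  local unit="$1"
  if "$SYSTEMCTL" cat "$unit" >/dev/null 2>&1; then
    echo "present"
  else
    echo "missing"
    # Dry run writes nothing; it only prints what would be generated.
    "$GENERATOR" --dryrun 2>&1 || true
  fi
}

# Example (run as root on noc1):
#   report_unit_state openvoxserver
# After fixing the .container file, `systemctl daemon-reload` reruns the generator.
```

If the dry run shows a valid openvoxserver.service, a daemon-reload is usually all that is needed for systemctl cat to start resolving it.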

Discovery

Run every command through noc1 as fcadmin; do not assume BLUEJAY-WS can reach container-local surfaces directly.

ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "hostname && sudo -n true"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo find /etc/containers/systemd /usr/share/containers/systemd /etc/systemd/system -name 'openvoxserver*' 2>/dev/null"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo sed -n '1,220p' /etc/containers/systemd/openvoxserver.container"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl cat puppet-deploy.service"

If a future noc1 profile manages these files, update the Puppet control repo and let puppet-deploy.service apply the change. On 2026-05-17, Puppet was not installed on the noc1 host itself, so Cx-12 used a direct noc1 host edit.

Durable Fix Shape

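The Quadlet keeps the deploy key as a path reference only. Because the value contains spaces, the whole assignment is wrapped in double quotes; systemd-style Environment= parsing would otherwise split it at the first space:

Environment="GIT_SSH_COMMAND=ssh -i /opt/puppetlabs/server/data/puppetserver/.puppet-deploy-key -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o UserKnownHostsFile=/opt/puppetlabs/server/data/puppetserver/.known_hosts"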

The safeconfig service is intentionally independent of openvoxserver.service until the generated unit exists. It waits for the openvoxserver container name and then runs:

/usr/bin/podman exec openvoxserver git config --global --add safe.directory '*'
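
The wait-then-apply behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of the real service: the function names and the 60-second default timeout are hypothetical, and podman container exists only confirms that the name is known to Podman, not that the container is running.

```shell
#!/usr/bin/env bash
# Sketch only: poll for a container name, then reapply safe.directory inside it.
PODMAN="${PODMAN:-podman}"

wait_for_container() {
  # Poll once per second until the container name exists or the timeout expires.
  local name="$1" timeout="${2:-60}" waited=0
  until "$PODMAN" container exists "$name"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for container $name" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

apply_safe_directory() {
  local name="$1"
  wait_for_container "$name" "${2:-60}" || return 1
  # The quoted asterisk is passed to git literally; podman exec fails if the
  # container exists but is not running.
  "$PODMAN" exec "$name" git config --global --add safe.directory '*'
}

# Example: apply_safe_directory openvoxserver 120
```

Because --add appends a new entry on every run, the global git config inside the container accumulates duplicate safe.directory lines over time; they are harmless but explain the repetition if the file is inspected.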

The deploy script self-heals inside the container before it fetches the control repo:

git config --global --add safe.directory "*" 2>/dev/null || true
DEPLOY_KEY="/opt/puppetlabs/server/data/puppetserver/.puppet-deploy-key"
KNOWN_HOSTS="/opt/puppetlabs/server/data/puppetserver/.known_hosts"
REPO="/etc/puppetlabs/code/environments/production"
export GIT_SSH_COMMAND="ssh -i $DEPLOY_KEY -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o UserKnownHostsFile=$KNOWN_HOSTS"
git -C "$REPO" config core.sshCommand "$GIT_SSH_COMMAND" 2>/dev/null || true

Validation

Non-destructive validation:

ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo grep -n 'GIT_SSH_COMMAND' /etc/containers/systemd/openvoxserver.container"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl status openvoxserver-safeconfig.service --no-pager -l"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl start puppet-deploy.service && sudo systemctl status puppet-deploy.service --no-pager -l"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo podman exec openvoxserver git -C /etc/puppetlabs/code/environments/production config --get core.sshCommand"
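
The HEAD == origin/master condition listed under Current State can also be checked directly. The helper below is a sketch with a hypothetical name, repo_in_sync; it compares the two commit IDs in whatever environment has git and the repository available, which on noc1 means inside the openvoxserver container.

```shell
#!/usr/bin/env bash
# Sketch only: succeed when HEAD and origin/master point at the same commit.
# Returns 0 if in sync, 1 if they differ, 2 if either ref cannot be resolved.
repo_in_sync() {
  local repo="$1" head upstream
  head="$(git -C "$repo" rev-parse HEAD)" || return 2
  upstream="$(git -C "$repo" rev-parse origin/master)" || return 2
  [ "$head" = "$upstream" ]
}

# Example, inside the container:
#   repo_in_sync /etc/puppetlabs/code/environments/production && echo "in sync"
```

The comparison uses the local remote-tracking ref, so it only reflects the remote as of the last fetch; run it after puppet-deploy.service has completed.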

Destructive recreate smoke is opt-in only:

scp -i ~/.ssh/fcadmin_ed25519 scripts/monitoring/openvox-recreate-smoke.sh fcadmin@10.0.56.10:/tmp/openvox-recreate-smoke.sh
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "chmod +x /tmp/openvox-recreate-smoke.sh && sudo OPENVOX_RECREATE_SMOKE=1 /tmp/openvox-recreate-smoke.sh"

Do not run the smoke during normal sprint work. It stops and removes the production container before starting it again through systemd, and it now refuses to continue unless systemctl cat openvoxserver succeeds.
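
The two safeguards described here, the explicit opt-in and the generated-unit check, amount to a preflight like the one sketched below. This is an illustration of the guard logic only; the function name smoke_preflight and the messages are hypothetical and are not taken from openvox-recreate-smoke.sh.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to proceed unless the operator opted in and systemd
# has a unit that can start the container again after it is removed.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

smoke_preflight() {
  local unit="${1:-openvoxserver}"
  if [ "${OPENVOX_RECREATE_SMOKE:-0}" != "1" ]; then
    echo "refusing: set OPENVOX_RECREATE_SMOKE=1 to opt in" >&2
    return 1
  fi
  if ! "$SYSTEMCTL" cat "$unit" >/dev/null 2>&1; then
    echo "refusing: systemctl cat $unit failed, nothing would recreate the container" >&2
    return 1
  fi
}

# Example: smoke_preflight openvoxserver || exit 1
```

Checking the unit before any stop or remove step keeps a failed preflight from leaving production without a running container.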

Credential Rotation Note

When rotating the Puppet deploy key, update the persisted serverdata copy on noc1:

sudo install -m 0600 -o root -g root <new-deploy-key> /opt/puppet/serverdata/.puppet-deploy-key
sudo podman exec openvoxserver sh -c "ssh-keyscan github.com > /opt/puppetlabs/server/data/puppetserver/.known_hosts"
sudo systemctl start openvoxserver-safeconfig.service
sudo systemctl start puppet-deploy.service

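Never commit the deploy key or print it in logs. ssh-keyscan output is unauthenticated, so compare the scanned host keys against GitHub's published SSH fingerprints before relying on the refreshed known_hosts file.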
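
After a rotation, the installed key can be checked without ever reading its contents. The sketch below uses a hypothetical helper, key_mode_ok, and assumes GNU stat (the -c option), which is typical on Linux hosts but is not confirmed for noc1.

```shell
#!/usr/bin/env bash
# Sketch only: confirm the key file exists with mode 0600, without printing it.
key_mode_ok() {
  local path="$1" mode
  [ -f "$path" ] || return 1
  # %a prints the permission bits in octal, e.g. 600.
  mode="$(stat -c '%a' "$path")" || return 1
  [ "$mode" = "600" ]
}

# Example (as root on noc1):
#   key_mode_ok /opt/puppet/serverdata/.puppet-deploy-key || echo "fix key permissions" >&2
```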