[DPE-10450] Skip Patroni REST API call in member_inactive when snap is down #1793

Open
taurus-forever wants to merge 2 commits into 16/edge from alutay/speedup_patroni_off
@taurus-forever (Contributor) commented Jun 19, 2026

Issue

When update-status fires before start after a server restart, the hook spends about one extra minute retrying the Patroni REST API for nothing.
See juju/juju#22688
This also affects every cluster recovery case, because update-status is a real event that can fire at unpredictable times.

Example:

unit-postgresql-0: 23:20:47 DEBUG unit.postgresql/0.juju-log root:Emitting Juju event update_status.
unit-postgresql-0: 23:20:47 DEBUG unit.postgresql/0.juju-log root:Starting root trace with id='356b6169143c7f5e150efdd273a59559'.
unit-postgresql-0: 23:20:47 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:20:54 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:01 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:08 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:15 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:22 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:29 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:36 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:43 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:50 DEBUG unit.postgresql/0.juju-log urllib3.connectionpool:Starting new HTTPS connection (1): 10.69.235.205:8008
unit-postgresql-0: 23:21:50 INFO unit.postgresql/0.juju-log __main__:restarted PostgreSQL because it was not running
unit-postgresql-0: 23:21:50 DEBUG unit.postgresql/0.juju-log cluster:Restarting Patroni...
unit-postgresql-0: 23:21:51 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-0: 23:21:51 INFO juju.worker.uniter reboot detected; triggering implicit start hook to notify charm
unit-postgresql-0: 23:21:51 DEBUG unit.postgresql/0.juju-log ops 3.7.1 up and running.
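
To make the cost in the log above concrete, here is a small sketch that recomputes the delay from the ten connection-attempt timestamps. The timestamps are copied from the log. The log does not show whether the 7-second spacing comes from a request timeout, a retry wait, or both.

```python
from datetime import datetime

# Timestamps of the ten "Starting new HTTPS connection" lines in the log above.
attempts = [
    "23:20:47", "23:20:54", "23:21:01", "23:21:08", "23:21:15",
    "23:21:22", "23:21:29", "23:21:36", "23:21:43", "23:21:50",
]

times = [datetime.strptime(t, "%H:%M:%S") for t in attempts]

# Seconds between consecutive attempts.
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

# Seconds between the first and the last attempt.
total = (times[-1] - times[0]).total_seconds()

print(f"{len(attempts)} attempts, gaps={sorted(set(gaps))}, total={total:.0f}s")
```

Nine gaps of 7 seconds add up to 63 seconds, which matches the "1 extra minute" described in the issue.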

Solution

Guard member_inactive with is_patroni_running() (mirroring member_started) so that when the Patroni snap service is not active we return True immediately instead of spending ~60s retrying the Patroni REST API.
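
A minimal sketch of the guard described above, not the charm's actual code. The two callables, the `ACTIVE_STATES` tuple, and the `"state"` key are assumptions made for illustration; in the charm, the service check and the REST call are methods whose exact signatures are not shown in this PR description.

```python
from typing import Callable, Dict

# Hypothetical set of states treated as "active"; illustration only.
ACTIVE_STATES = ("running", "streaming", "starting", "restarting")


def member_inactive(
    is_patroni_running: Callable[[], bool],
    fetch_member_status: Callable[[], Dict[str, str]],
) -> bool:
    """Return True when the member should be treated as inactive.

    is_patroni_running: cheap local check of the Patroni service (assumed).
    fetch_member_status: slow call to the Patroni REST API that may retry
        for a long time before raising (assumed).
    """
    # Guard: if the service is down, the REST API cannot answer,
    # so skip the slow call entirely.
    if not is_patroni_running():
        return True

    try:
        status = fetch_member_status()
    except ConnectionError:
        # The API stayed unreachable; treat the member as inactive.
        return True

    return status.get("state") not in ACTIVE_STATES
```

With this ordering, the expensive call is only reached when the local service check passes, so a stopped service costs one cheap check instead of a full retry cycle.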

Assisted-by: Claude:claude-4.8-opus

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

…s down

Guard member_inactive with is_patroni_running() (mirroring member_started)
so that when the Patroni snap service is not active we return True
immediately instead of spending ~60s retrying the Patroni REST API.

Assisted-by: Claude:claude-4.8-opus
@github-actions (bot) added the label "Libraries: Out of sync" (The charm libs used are out-of-sync) Jun 19, 2026
@taurus-forever added the label "bug" (Something isn't working as expected) Jun 19, 2026
update-status could crash with FileNotFoundError when it fired before a
replica finished bootstrapping: the cluster is already initialised (so
_can_run_on_update_status passes) and the unit IP is in members_ips, but
the PostgreSQL data directory has not been created yet. member_inactive
returns True because Patroni is not running, so os.listdir was called on
a non-existent path.

Return early (no action) when the data directory does not exist yet,
since the member has not been initialised and there is no frozen process
to recover.
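
The early return described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the charm's code: `list_data_directory` is a hypothetical helper, and the real handler does more than list the directory.

```python
import os
from typing import List, Optional


def list_data_directory(data_dir: str) -> Optional[List[str]]:
    """List the PostgreSQL data directory, or return None if it is missing.

    Returning None signals "member not initialised yet, nothing to do",
    instead of letting os.listdir raise FileNotFoundError.
    """
    # Guard: a replica that has not finished bootstrapping has no data
    # directory yet, so there is no frozen process to recover.
    if not os.path.isdir(data_dir):
        return None
    return os.listdir(data_dir)
```

A caller would treat `None` as "return early, no action" and only continue with its recovery checks when a list comes back.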

Assisted-by: Claude:claude-4.8-opus
@taurus-forever taurus-forever marked this pull request as ready for review June 22, 2026 21:16
@taurus-forever taurus-forever requested a review from a team as a code owner June 22, 2026 21:16
@taurus-forever taurus-forever requested review from carlcsaposs-canonical, dragomirp, juju-charm-bot and marceloneppel and removed request for a team June 22, 2026 21:16

Labels

bug (Something isn't working as expected)
Libraries: Out of sync (The charm libs used are out-of-sync)


3 participants