Investigate RPi hardware watchdog for autonomous reboot


## Context

Pyronear stations run in the field, often in quite remote areas. When a station freezes or crashes (kernel panic, deadlock, full crash), recovery currently relies on a physical power cut, which risks filesystem corruption and requires either human intervention or, more usually, a hardware timer relay.

The RPi has a **built-in hardware watchdog** (`/dev/watchdog`) that can reboot the board automatically if the system stops responding: no relay, no external power cut.

## How it works

The watchdog runs a hardware timer. A process must periodically write to `/dev/watchdog` to reset it ("petting the dog"). If nothing writes for `N` seconds → the chip triggers a reboot.

It's deliberately dumb: it doesn't know *why* nothing wrote — freeze, crash, deadlock — it just reboots.
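As a rough illustration of the petting loop described above, here is a minimal Python sketch. The function name `pet_forever` and its `interval_s` / `max_pets` parameters are hypothetical (`max_pets` exists only so the loop can be exercised against a plain file); on a real station the process would need permission to open `/dev/watchdog`, and the interval would have to stay below the configured timeout.

```python
import os
import time
from typing import Optional

WATCHDOG_DEVICE = "/dev/watchdog"  # device path mentioned in this issue


def pet_forever(device_path: str = WATCHDOG_DEVICE,
                interval_s: float = 5.0,
                max_pets: Optional[int] = None) -> int:
    """Open the watchdog device and write to it every `interval_s` seconds.

    Opening the device arms the timer; each write resets it.
    `max_pets` is a hypothetical knob for testing; None means loop forever.
    Returns the number of pets written.
    """
    fd = os.open(device_path, os.O_WRONLY)
    pets = 0
    try:
        while max_pets is None or pets < max_pets:
            os.write(fd, b"\0")  # any write counts as a pet
            pets += 1
            time.sleep(interval_s)
    finally:
        # Closing without the magic character leaves the timer armed,
        # so a crash of this process still ends in a reboot.
        os.close(fd)
    return pets
```

If this process hangs or dies, the writes stop and the hardware timer expires on its own, which is the behaviour the issue is after.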

## Two approaches to explore

**1. System-level only (`watchdog` daemon)**
- Activate via `dtparam=watchdog=on` in `/boot/config.txt` (`/boot/firmware/config.txt` on recent Raspberry Pi OS releases)
- Let the `watchdog` Linux daemon handle the petting
- ✅ Simple, no code change
- ⚠️ Only covers full OS/kernel freeze — won't catch a live-but-broken capture pipeline

**2. Application-level (from the main Python script)**
- Pet the watchdog only when the system is actually healthy (camera alive, recent frame, model responding)
- If a check fails → stop petting → reboot triggered after timeout
- ✅ Covers application-level failures too
- ⚠️ Slightly more complex; the watchdog thread must itself be robust
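Approach 2 could be sketched roughly as follows. All names here (`all_healthy`, `frame_is_recent`, `pet_if_healthy`) are hypothetical and not taken from the Pyronear codebase; the actual health checks are still an open question. The sketch only shows the gating logic: the pet happens only if every check passes, and a check that raises is treated as a failure.

```python
import time
from typing import Callable, Iterable, Optional


def all_healthy(checks: Iterable[Callable[[], bool]]) -> bool:
    """Return True only if every check returns True without raising."""
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            # A crashing check is treated as an unhealthy system.
            return False
    return True


def frame_is_recent(last_frame_ts: float, max_age_s: float,
                    now: Optional[float] = None) -> bool:
    """Hypothetical check: was the last frame captured recently enough?

    Timestamps are assumed to come from time.monotonic().
    """
    current = time.monotonic() if now is None else now
    return (current - last_frame_ts) <= max_age_s


def pet_if_healthy(write_pet: Callable[[], None],
                   checks: Iterable[Callable[[], bool]]) -> bool:
    """Call `write_pet` (e.g. a write to /dev/watchdog) only when healthy.

    Returns True if the watchdog was petted, False if the pet was withheld.
    """
    if all_healthy(checks):
        write_pet()
        return True
    return False
```

Passing `write_pet` in as a callable keeps the gating logic testable without the real device; in production it would wrap the write to `/dev/watchdog`.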

## Questions to answer

- [ ] Does approach 1 alone cover enough failure modes for our use case? It may make sense to start there.
- [ ] What health checks are meaningful to gate the pet on (approach 2)?
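- [ ] What's the right timeout value (balance between false reboots and slow recovery)? 10 or 30 min? To check: the RPi's hardware timer is reported to cap at roughly 15 s, so a window that long would have to be enforced by the petting logic rather than by the hardware timeout.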
- [ ] Impact on clean shutdown or a long update: the magic character `V` needs to be written to `/dev/watchdog` before closing it, otherwise the timer keeps running. Since deployments/updates are managed via Ansible, a natural solution is to add explicit steps around the update tasks (disable the watchdog before the update, re-enable it after).
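For the clean-shutdown question, a minimal sketch of the "magic close" from the application side, assuming approach 2 where our own process holds the device open. The helper name `disarm_watchdog` is hypothetical. Whether the `V` write actually stops the timer depends on the kernel and driver (for example, a kernel built with `CONFIG_WATCHDOG_NOWAYOUT` ignores it), so this would need to be verified on the target image.

```python
import os


def disarm_watchdog(fd: int) -> None:
    """Write the magic character 'V' and close the watchdog descriptor.

    `fd` is assumed to be the descriptor the application already holds
    on /dev/watchdog. The 'V' must be the last thing written before close.
    """
    os.write(fd, b"V")
    os.close(fd)
```

With approach 1, the equivalent step would presumably be stopping the `watchdog` daemon from the Ansible playbook before the update rather than calling anything like this.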
