Complete task 15.2: Graceful shutdown with maintenance mode

Implements a coordinated shutdown in which an admin sets a
maintenance time, users see a countdown, and the system shuts
down safely once running jobs have finished or a 5-minute timeout expires.

New files:
- maskinrommet/src/maintenance.rs: MaintenanceState with atomic
  flags and a shutdown coordinator (wait for scheduled_at → block
  new jobs/LiveKit → wait for running jobs → exit)
- frontend/src/routes/admin/+page.svelte: admin panel for
  maintenance with a status view and active sessions

Changes:
- jobs.rs: checks maintenance.is_active() before dequeue
- intentions.rs: new endpoints (initiate/cancel/status); blocks
  join_communication during maintenance
- main.rs: MaintenanceState in AppState, new routes
- api.ts: client functions for the maintenance API
- adminpanelet.md: documents the implemented endpoints
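
The `jobs.rs` gate listed above can be pictured with a small synchronous sketch. It is not the project's worker, which is async and polls `job_queue` in Postgres. `MaintenanceFlag`, `try_dequeue` and the in-memory `VecDeque` are hypothetical stand-ins that only show the idea: while the shared flag is set, nothing is taken from the queue.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the shared maintenance flag.
/// Cloning shares the same underlying boolean.
#[derive(Clone, Default)]
pub struct MaintenanceFlag {
    active: Arc<AtomicBool>,
}

impl MaintenanceFlag {
    pub fn set_active(&self, value: bool) {
        self.active.store(value, Ordering::Relaxed);
    }

    pub fn is_active(&self) -> bool {
        self.active.load(Ordering::Relaxed)
    }
}

/// Takes the next job unless maintenance is active.
/// Jobs that are already running are not affected by this gate.
pub fn try_dequeue(flag: &MaintenanceFlag, queue: &mut VecDeque<String>) -> Option<String> {
    if flag.is_active() {
        return None; // queue is paused; the job stays where it is
    }
    queue.pop_front()
}
```

Because the check happens before the dequeue, a job is either left untouched in the queue or fully handed to the worker; the flag never interrupts a job midway.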

Flow: admin → GET /admin/maintenance_status (view active sessions)
→ POST /intentions/initiate_maintenance → announcement broadcast via STDB
→ frontend countdown → scheduled_at reached → active=true → job queue
paused + LiveKit blocked → wait at most 5 min → process::exit(0)
→ systemd restarts maskinrommet.
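
The flow above is driven by two boolean flags, `initiated` and `active`. As a reading aid, here is a minimal sketch of how they combine into a lifecycle phase. The `Phase` enum and both functions are hypothetical names introduced for illustration and do not appear in the commit.

```rust
/// Hypothetical summary of the two flags in the flow above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Phase {
    /// No maintenance planned.
    Normal,
    /// Maintenance initiated, scheduled time not yet reached.
    Scheduled,
    /// Scheduled time reached: new work is refused, running jobs drain.
    ShuttingDown,
}

/// `active` wins over `initiated`, matching the order in which the
/// admin panel picks its status badge.
pub fn phase(initiated: bool, active: bool) -> Phase {
    if active {
        Phase::ShuttingDown
    } else if initiated {
        Phase::Scheduled
    } else {
        Phase::Normal
    }
}

/// Only the final phase pauses the job queue and blocks LiveKit.
pub fn blocks_new_work(p: Phase) -> bool {
    matches!(p, Phase::ShuttingDown)
}
```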

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vegard 2026-03-18 03:31:32 +00:00
parent 9d405f90e3
commit d8e44fe57e
8 changed files with 758 additions and 5 deletions

@@ -62,6 +62,21 @@ publishing.
5. Wait for active jobs to complete (with timeout)
6. Restart
#### Implemented (task 15.2)
- **Backend:** `maskinrommet/src/maintenance.rs`: `MaintenanceState` with atomic flags
  and a background shutdown coordinator
- **API endpoints:**
  - `POST /intentions/initiate_maintenance`: starts the countdown to a given time
  - `POST /intentions/cancel_maintenance`: cancels planned maintenance
  - `GET /admin/maintenance_status`: shows status and running jobs
- **Frontend:** `/admin`: maintenance panel with a status view, active sessions,
  and initiate/cancel buttons
- **Job queue:** `jobs.rs` checks `maintenance.is_active()` before dequeue
- **LiveKit:** `join_communication` refuses new tokens during maintenance
- **Shutdown flow:** wait until `scheduled_at` → set `active` → wait for jobs
  (5 min timeout) → `process::exit(0)` → systemd restarts
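
The "wait for jobs (5 min timeout)" step in the shutdown flow is a poll-until-empty loop with a deadline. The sketch below shows that shape in synchronous form, assuming a caller-supplied closure that reports how many jobs are still running. `wait_for_drain` and `DrainOutcome` are hypothetical names; the real coordinator is async and queries `job_queue`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// How the drain phase ended.
#[derive(Debug, PartialEq, Eq)]
pub enum DrainOutcome {
    /// No running jobs remained.
    Drained,
    /// The deadline passed while jobs were still running.
    TimedOut { still_running: usize },
}

/// Polls `running_jobs` until it reports zero or `timeout` has elapsed.
pub fn wait_for_drain<F>(
    mut running_jobs: F,
    timeout: Duration,
    poll_interval: Duration,
) -> DrainOutcome
where
    F: FnMut() -> usize,
{
    let start = Instant::now();
    loop {
        let count = running_jobs();
        if count == 0 {
            return DrainOutcome::Drained;
        }
        if start.elapsed() >= timeout {
            return DrainOutcome::TimedOut { still_running: count };
        }
        thread::sleep(poll_interval);
    }
}
```

With the values from the flow this would be called as `wait_for_drain(count_fn, Duration::from_secs(300), Duration::from_secs(5))`, where `count_fn` is whatever counts running jobs. On `TimedOut` the process exits anyway, so jobs that outlive the deadline are interrupted.
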
#### Announcement node
```jsonc

@@ -668,3 +668,59 @@ export function resolveRetranscription(
choices
});
}
// =============================================================================
// Maintenance mode (task 15.2)
// =============================================================================

export interface RunningJob {
  id: string;
  job_type: string;
  started_at: string | null;
  collection_node_id: string | null;
}

export interface MaintenanceStatus {
  initiated: boolean;
  active: boolean;
  scheduled_at: string | null;
  announcement_node_id: string | null;
  initiated_by: string | null;
  running_jobs: RunningJob[];
}

/** Fetch maintenance status (active sessions, running jobs). */
export async function fetchMaintenanceStatus(accessToken: string): Promise<MaintenanceStatus> {
  const res = await fetch(`${BASE_URL}/admin/maintenance_status`, {
    headers: { Authorization: `Bearer ${accessToken}` }
  });
  if (!res.ok) {
    const body = await res.text();
    throw new Error(`maintenance_status failed (${res.status}): ${body}`);
  }
  return res.json();
}

export interface InitiateMaintenanceRequest {
  scheduled_at: string;
}

export interface InitiateMaintenanceResponse {
  announcement_node_id: string;
  scheduled_at: string;
}

/** Initiate planned maintenance with a countdown. */
export function initiateMaintenance(
  accessToken: string,
  data: InitiateMaintenanceRequest
): Promise<InitiateMaintenanceResponse> {
  return post(accessToken, '/intentions/initiate_maintenance', data);
}

/** Cancel planned maintenance. */
export function cancelMaintenance(
  accessToken: string
): Promise<{ cancelled: boolean }> {
  return post(accessToken, '/intentions/cancel_maintenance', {});
}

@@ -0,0 +1,229 @@
<script lang="ts">
  /**
   * Admin: maintenance mode (task 15.2)
   *
   * Shows active sessions (running jobs) and lets an admin initiate
   * or cancel planned maintenance with graceful shutdown.
   */
  import { page } from '$app/stores';
  import {
    fetchMaintenanceStatus,
    initiateMaintenance,
    cancelMaintenance,
    type MaintenanceStatus
  } from '$lib/api';

  const session = $derived($page.data.session as Record<string, unknown> | undefined);
  const accessToken = $derived(session?.accessToken as string | undefined);

  let status = $state<MaintenanceStatus | null>(null);
  let loading = $state(false);
  let error = $state<string | null>(null);

  // Maintenance time: defaults to 5 minutes from now
  let scheduledMinutes = $state(5);

  // Poll status every 5 seconds
  $effect(() => {
    if (!accessToken) return;
    loadStatus();
    const interval = setInterval(loadStatus, 5000);
    return () => clearInterval(interval);
  });

  async function loadStatus() {
    if (!accessToken) return;
    try {
      status = await fetchMaintenanceStatus(accessToken);
      error = null;
    } catch (e) {
      error = String(e);
    }
  }

  async function handleInitiate() {
    if (!accessToken || loading) return;
    loading = true;
    error = null;
    try {
      const scheduledAt = new Date(Date.now() + scheduledMinutes * 60 * 1000).toISOString();
      await initiateMaintenance(accessToken, { scheduled_at: scheduledAt });
      await loadStatus();
    } catch (e) {
      error = String(e);
    } finally {
      loading = false;
    }
  }

  async function handleCancel() {
    if (!accessToken || loading) return;
    loading = true;
    error = null;
    try {
      await cancelMaintenance(accessToken);
      await loadStatus();
    } catch (e) {
      error = String(e);
    } finally {
      loading = false;
    }
  }

  // Countdown display: `now` ticks once per second so the text re-renders
  let now = $state(Date.now());
  $effect(() => {
    const interval = setInterval(() => { now = Date.now(); }, 1000);
    return () => clearInterval(interval);
  });

  function countdown(isoDate: string): string {
    const diff = new Date(isoDate).getTime() - now;
    if (diff <= 0) return 'now';
    const s = Math.floor(diff / 1000);
    const m = Math.floor(s / 60);
    const h = Math.floor(m / 60);
    if (h > 0) return `${h}h ${m % 60}m ${s % 60}s`;
    if (m > 0) return `${m}m ${s % 60}s`;
    return `${s}s`;
  }
</script>
<div class="min-h-screen bg-gray-50">
  <header class="border-b border-gray-200 bg-white">
    <div class="mx-auto flex max-w-3xl items-center justify-between px-4 py-3">
      <div class="flex items-center gap-3">
        <a href="/" class="text-sm text-gray-500 hover:text-gray-700">Home</a>
        <h1 class="text-lg font-semibold text-gray-900">Maintenance</h1>
      </div>
    </div>
  </header>
  <main class="mx-auto max-w-3xl px-4 py-6">
    <!-- Error message: rendered outside the status check so a failed first load is visible -->
    {#if error}
      <div class="mb-4 rounded-lg border border-red-200 bg-red-50 p-3 text-sm text-red-700">
        {error}
      </div>
    {/if}
    {#if !accessToken}
      <p class="text-sm text-gray-400">Log in for access.</p>
    {:else if !status}
      <p class="text-sm text-gray-400">Loading status...</p>
    {:else}
      <!-- Maintenance status -->
      <section class="mb-6 rounded-lg border border-gray-200 bg-white p-5 shadow-sm">
        <h2 class="mb-3 text-base font-semibold text-gray-800">Status</h2>
        <div class="space-y-2 text-sm">
          <div class="flex items-center gap-2">
            <span class="font-medium text-gray-600">Mode:</span>
            {#if status.active}
              <span class="rounded-full bg-red-100 px-2.5 py-0.5 text-xs font-bold text-red-700">
                ACTIVE: shutting down
              </span>
            {:else if status.initiated}
              <span class="rounded-full bg-amber-100 px-2.5 py-0.5 text-xs font-bold text-amber-700">
                Scheduled
              </span>
            {:else}
              <span class="rounded-full bg-green-100 px-2.5 py-0.5 text-xs font-bold text-green-700">
                Normal operation
              </span>
            {/if}
          </div>
          {#if status.scheduled_at}
            <div class="flex items-center gap-2">
              <span class="font-medium text-gray-600">Time:</span>
              <span class="text-gray-800">
                {new Date(status.scheduled_at).toLocaleString('nb-NO')}
              </span>
              <span class="rounded-full bg-gray-100 px-2 py-0.5 text-xs font-mono text-gray-600">
                in {countdown(status.scheduled_at)}
              </span>
            </div>
          {/if}
        </div>
      </section>
      <!-- Active sessions / running jobs -->
      <section class="mb-6 rounded-lg border border-gray-200 bg-white p-5 shadow-sm">
        <h2 class="mb-3 text-base font-semibold text-gray-800">
          Active sessions
          <span class="ml-1 text-sm font-normal text-gray-400">
            ({status.running_jobs.length} running jobs)
          </span>
        </h2>
        {#if status.running_jobs.length === 0}
          <p class="text-sm text-gray-400">No running jobs.</p>
        {:else}
          <div class="space-y-2">
            {#each status.running_jobs as job (job.id)}
              <div class="flex items-center gap-3 rounded border border-gray-100 bg-gray-50 px-3 py-2 text-sm">
                <span class="h-2 w-2 shrink-0 rounded-full bg-blue-500" title="Running"></span>
                <span class="font-mono text-xs text-gray-500">{job.id.slice(0, 8)}</span>
                <span class="font-medium text-gray-700">{job.job_type}</span>
                {#if job.started_at}
                  <span class="ml-auto text-xs text-gray-400">
                    started {new Date(job.started_at).toLocaleTimeString('nb-NO')}
                  </span>
                {/if}
              </div>
            {/each}
          </div>
        {/if}
      </section>
      <!-- Actions -->
      <section class="rounded-lg border border-gray-200 bg-white p-5 shadow-sm">
        <h2 class="mb-3 text-base font-semibold text-gray-800">Actions</h2>
        {#if !status.initiated}
          <div class="flex items-end gap-3">
            <div>
              <label for="minutes" class="mb-1 block text-xs font-medium text-gray-600">
                Minutes until maintenance
              </label>
              <input
                id="minutes"
                type="number"
                min="1"
                max="1440"
                bind:value={scheduledMinutes}
                class="w-24 rounded border border-gray-300 px-2 py-1.5 text-sm"
              />
            </div>
            <button
              onclick={handleInitiate}
              disabled={loading}
              class="rounded-lg bg-red-600 px-4 py-2 text-sm font-medium text-white hover:bg-red-700 disabled:opacity-50"
            >
              {loading ? 'Initiating...' : 'Start maintenance'}
            </button>
          </div>
          <p class="mt-2 text-xs text-gray-400">
            This sends an announcement to all users and shuts the system down after the countdown.
            New LiveKit rooms are blocked and the job queue is stopped at the scheduled time.
            Running jobs are allowed to finish (max 5 min timeout), then maskinrommet restarts.
          </p>
        {:else}
          <div class="flex items-center gap-3">
            <button
              onclick={handleCancel}
              disabled={loading || status.active}
              class="rounded-lg bg-gray-600 px-4 py-2 text-sm font-medium text-white hover:bg-gray-700 disabled:opacity-50"
            >
              {loading ? 'Cancelling...' : 'Cancel maintenance'}
            </button>
            {#if status.active}
              <span class="text-sm text-red-600 font-medium">
                Shutdown in progress, cannot be cancelled
              </span>
            {/if}
          </div>
        {/if}
      </section>
    {/if}
  </main>
</div>
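
For readers following the backend in Rust, the arithmetic of the page's `countdown` helper can be restated as a small pure function. This is a sketch only: `format_countdown` is a hypothetical name, it takes the remaining time in milliseconds rather than an ISO date, and the unit labels are English.

```rust
/// Formats a remaining duration (in milliseconds) as a short countdown.
/// Zero or negative values mean the scheduled time has been reached.
pub fn format_countdown(remaining_ms: i64) -> String {
    if remaining_ms <= 0 {
        return "now".to_string();
    }
    let s = remaining_ms / 1000; // whole seconds
    let m = s / 60; // whole minutes
    let h = m / 60; // whole hours
    if h > 0 {
        format!("{}h {}m {}s", h, m % 60, s % 60)
    } else if m > 0 {
        format!("{}m {}s", m, s % 60)
    } else {
        format!("{}s", s)
    }
}
```

For example, 3,725,000 ms is 3725 whole seconds, which is 62 whole minutes and 1 whole hour, so the result is `1h 2m 5s`.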

@@ -3319,6 +3319,11 @@ pub async fn join_communication(
        })?
        .unwrap_or_else(|| "Unknown".to_string());

    // Block new LiveKit rooms during maintenance (task 15.2)
    if state.maintenance.is_active() {
        return Err(bad_request("New rooms are blocked: maintenance in progress"));
    }

    // Determine role
    let role_str = req.role.as_deref().unwrap_or("publisher");
    let lk_role = match role_str {
@@ -3907,6 +3912,95 @@ pub async fn expire_announcement(
    Ok(Json(ExpireAnnouncementResponse { expired: true }))
}
// =============================================================================
// Maintenance mode (task 15.2)
// =============================================================================

#[derive(Deserialize)]
pub struct InitiateMaintenanceRequest {
    /// Maintenance time (ISO 8601 / RFC 3339).
    pub scheduled_at: String,
}

#[derive(Serialize)]
pub struct InitiateMaintenanceResponse {
    pub announcement_node_id: Uuid,
    pub scheduled_at: String,
}
/// POST /intentions/initiate_maintenance
///
/// Starts the countdown to maintenance. Creates a critical announcement
/// that is shown to all clients, and starts the background coordinator,
/// which blocks new jobs/LiveKit rooms and finally exits the process so
/// that systemd restarts it.
///
/// Call GET /admin/maintenance_status first to see active sessions.
pub async fn initiate_maintenance(
    State(state): State<AppState>,
    user: AuthUser,
    Json(req): Json<InitiateMaintenanceRequest>,
) -> Result<Json<InitiateMaintenanceResponse>, (StatusCode, Json<ErrorResponse>)> {
    let scheduled_at = chrono::DateTime::parse_from_rfc3339(&req.scheduled_at)
        .map_err(|_| bad_request("scheduled_at must be valid ISO 8601 (RFC 3339)"))?
        .with_timezone(&chrono::Utc);

    if scheduled_at < chrono::Utc::now() {
        return Err(bad_request("scheduled_at cannot be in the past"));
    }

    // The authenticated user is recorded as the initiator.
    let user_id = user.node_id;
    let announcement_id = state
        .maintenance
        .initiate(&state.db, &state.stdb, scheduled_at, user_id)
        .await
        .map_err(|e| bad_request(&e))?;

    Ok(Json(InitiateMaintenanceResponse {
        announcement_node_id: announcement_id,
        scheduled_at: scheduled_at.to_rfc3339(),
    }))
}
#[derive(Serialize)]
pub struct CancelMaintenanceResponse {
    pub cancelled: bool,
}

/// POST /intentions/cancel_maintenance
///
/// Cancels planned maintenance. Removes the announcement and stops
/// the countdown task.
pub async fn cancel_maintenance(
    State(state): State<AppState>,
    _user: AuthUser,
) -> Result<Json<CancelMaintenanceResponse>, (StatusCode, Json<ErrorResponse>)> {
    state
        .maintenance
        .cancel(&state.db, &state.stdb)
        .await
        .map_err(|e| bad_request(&e))?;

    Ok(Json(CancelMaintenanceResponse { cancelled: true }))
}
/// GET /admin/maintenance_status
///
/// Returns the maintenance status, including running jobs.
/// Used by the admin panel to show active sessions before confirmation.
pub async fn maintenance_status(
    State(state): State<AppState>,
    _user: AuthUser,
) -> Result<Json<crate::maintenance::MaintenanceStatus>, (StatusCode, Json<ErrorResponse>)> {
    let status = state
        .maintenance
        .status(&state.db)
        .await
        .map_err(|e| internal_error(&format!("Error fetching maintenance status: {e}")))?;

    Ok(Json(status))
}
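
The time check in `initiate_maintenance` and the wait computed later by the coordinator amount to one question: how long is it from now until `scheduled_at`, and is that a non-negative duration? A minimal sketch with the standard library clock types instead of `chrono` follows. `delay_until` and `ScheduleError` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// Why a requested maintenance time was rejected.
#[derive(Debug, PartialEq, Eq)]
pub enum ScheduleError {
    /// The requested time lies before `now`.
    InPast,
}

/// Returns how long to wait before maintenance becomes active.
/// A time equal to `now` is accepted and yields a zero delay.
pub fn delay_until(scheduled_at: SystemTime, now: SystemTime) -> Result<Duration, ScheduleError> {
    // `duration_since` fails exactly when `scheduled_at` is earlier than `now`.
    scheduled_at
        .duration_since(now)
        .map_err(|_| ScheduleError::InPast)
}
```

Taking `now` as a parameter keeps the function deterministic, so the boundary cases (exactly now, one second in the past) can be tested without sleeping.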
// =============================================================================
// Tests
// =============================================================================

@@ -12,6 +12,7 @@ use crate::agent;
 use crate::ai_edges;
 use crate::audio;
 use crate::cas::CasStore;
+use crate::maintenance::MaintenanceState;
 use crate::publishing;
 use crate::stdb::StdbClient;
 use crate::summarize;
@@ -231,7 +232,10 @@ async fn handle_render_index(
 /// Starts the worker loop that polls job_queue.
 /// Runs as a background task in tokio.
-pub fn start_worker(db: PgPool, stdb: StdbClient, cas: CasStore) {
+///
+/// Respects maintenance mode: when `maintenance.is_active()` is true,
+/// the worker stops dequeuing new jobs (running jobs are allowed to finish).
+pub fn start_worker(db: PgPool, stdb: StdbClient, cas: CasStore, maintenance: MaintenanceState) {
     let whisper_url = std::env::var("WHISPER_URL")
         .unwrap_or_else(|_| "http://faster-whisper:8000".to_string());
@@ -239,6 +243,13 @@ pub fn start_worker(db: PgPool, stdb: StdbClient, cas: CasStore) {
         tracing::info!("Job queue worker started (poll interval: 2s)");
         loop {
+            // Check maintenance mode: do not dequeue new jobs
+            if maintenance.is_active() {
+                tracing::debug!("Maintenance mode active, job queue paused");
+                tokio::time::sleep(std::time::Duration::from_secs(5)).await;
+                continue;
+            }
             match dequeue(&db).await {
                 Ok(Some(job)) => {
                     tracing::info!(

@@ -7,6 +7,7 @@ mod custom_domain;
 mod intentions;
 pub mod jobs;
 pub mod livekit;
+pub mod maintenance;
 pub mod pruning;
 mod queries;
 pub mod publishing;
@@ -38,6 +39,7 @@ pub struct AppState {
     pub cas: CasStore,
     pub index_cache: publishing::IndexCache,
     pub dynamic_page_cache: publishing::DynamicPageCache,
+    pub maintenance: maintenance::MaintenanceState,
 }
 #[derive(Serialize)]
@@ -135,8 +137,11 @@ async fn main() {
         .expect("Could not create CAS directory");
     tracing::info!(root = %cas_root, "CAS initialized");
+    // Maintenance state (task 15.2)
+    let maintenance = maintenance::MaintenanceState::new();
     // Start the job queue worker in the background
-    jobs::start_worker(db.clone(), stdb.clone(), cas.clone());
+    jobs::start_worker(db.clone(), stdb.clone(), cas.clone(), maintenance.clone());
     // Start periodic CAS pruning in the background
     pruning::start_pruning_loop(db.clone(), cas.clone());
@@ -149,7 +154,7 @@ async fn main() {
     let index_cache = publishing::new_index_cache();
     let dynamic_page_cache = publishing::new_dynamic_page_cache();
-    let state = AppState { db, jwks, stdb, cas, index_cache, dynamic_page_cache };
+    let state = AppState { db, jwks, stdb, cas, index_cache, dynamic_page_cache, maintenance };
     // Routes: /health is public, /me requires a valid JWT
     let app = Router::new()
@@ -190,6 +195,10 @@ async fn main() {
         // System announcements (task 15.1)
         .route("/intentions/create_announcement", post(intentions::create_announcement))
         .route("/intentions/expire_announcement", post(intentions::expire_announcement))
+        // Maintenance mode (task 15.2)
+        .route("/intentions/initiate_maintenance", post(intentions::initiate_maintenance))
+        .route("/intentions/cancel_maintenance", post(intentions::cancel_maintenance))
+        .route("/admin/maintenance_status", get(intentions::maintenance_status))
         .route("/query/audio_info", get(intentions::audio_info))
         .route("/pub/{slug}/feed.xml", get(rss::generate_feed))
         .route("/pub/{slug}", get(publishing::serve_index))

@@ -0,0 +1,340 @@
// Graceful shutdown: maintenance mode with coordinated shutdown.
//
// Flow:
// 1. Admin calls initiate_maintenance with a time
// 2. The system creates a system announcement → the frontend shows a countdown
// 3. A background task waits until the maintenance time
// 4. Sets maintenance_active → blocks new LiveKit rooms; the job queue stops dequeuing
// 5. Waits for running jobs to finish (with timeout)
// 6. Exits the process → systemd restarts it
//
// Ref: docs/concepts/adminpanelet.md, task 15.2

use chrono::{DateTime, Utc};
use serde::Serialize;
use sqlx::PgPool;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use tokio::sync::Mutex;
use uuid::Uuid;

use crate::stdb::StdbClient;
/// Shared maintenance state; cloned into AppState.
#[derive(Clone)]
pub struct MaintenanceState {
    /// Set to true once the maintenance time has been reached.
    /// While true, the job queue stops dequeuing and new LiveKit tokens are refused.
    pub active: Arc<AtomicBool>,
    /// Set to true once an admin has initiated maintenance (the time
    /// need not have been reached yet). Used to display status.
    pub initiated: Arc<AtomicBool>,
    /// Maintenance time and announcement node id.
    inner: Arc<Mutex<MaintenanceInner>>,
}

struct MaintenanceInner {
    scheduled_at: Option<DateTime<Utc>>,
    announcement_node_id: Option<Uuid>,
    initiated_by: Option<Uuid>,
    /// Handle for aborting the scheduled shutdown task.
    abort_handle: Option<tokio::task::AbortHandle>,
}

/// Status response for the admin panel.
#[derive(Serialize)]
pub struct MaintenanceStatus {
    pub initiated: bool,
    pub active: bool,
    pub scheduled_at: Option<String>,
    pub announcement_node_id: Option<Uuid>,
    pub initiated_by: Option<Uuid>,
    pub running_jobs: Vec<RunningJob>,
}

#[derive(Serialize)]
pub struct RunningJob {
    pub id: Uuid,
    pub job_type: String,
    pub started_at: Option<String>,
    pub collection_node_id: Option<Uuid>,
}
impl MaintenanceState {
    pub fn new() -> Self {
        Self {
            active: Arc::new(AtomicBool::new(false)),
            initiated: Arc::new(AtomicBool::new(false)),
            inner: Arc::new(Mutex::new(MaintenanceInner {
                scheduled_at: None,
                announcement_node_id: None,
                initiated_by: None,
                abort_handle: None,
            })),
        }
    }

    /// Is maintenance mode active? (The scheduled time has been reached.)
    pub fn is_active(&self) -> bool {
        self.active.load(Ordering::Relaxed)
    }

    /// Has maintenance been initiated? (Planned, but possibly not reached yet.)
    pub fn is_initiated(&self) -> bool {
        self.initiated.load(Ordering::Relaxed)
    }

    /// Fetch the full status, including running jobs.
    pub async fn status(&self, db: &PgPool) -> Result<MaintenanceStatus, sqlx::Error> {
        let inner = self.inner.lock().await;
        let running_jobs = fetch_running_jobs(db).await?;

        Ok(MaintenanceStatus {
            initiated: self.is_initiated(),
            active: self.is_active(),
            scheduled_at: inner.scheduled_at.map(|dt| dt.to_rfc3339()),
            announcement_node_id: inner.announcement_node_id,
            initiated_by: inner.initiated_by,
            running_jobs,
        })
    }
    /// Initiate maintenance: set the time, create the announcement, start the countdown.
    ///
    /// Creates a system_announcement of type `critical` with `scheduled_at`.
    /// Starts a background task that waits until the scheduled time, activates
    /// maintenance mode, waits for jobs, and exits the process.
    pub async fn initiate(
        &self,
        db: &PgPool,
        stdb: &StdbClient,
        scheduled_at: DateTime<Utc>,
        initiated_by: Uuid,
    ) -> Result<Uuid, String> {
        // Check that maintenance has not already been initiated
        if self.is_initiated() {
            return Err("Maintenance has already been initiated".to_string());
        }

        // Create the system announcement
        let node_id = Uuid::now_v7();
        let node_id_str = node_id.to_string();
        let created_by_str = initiated_by.to_string();
        let metadata = serde_json::json!({
            "announcement_type": "critical",
            "scheduled_at": scheduled_at.to_rfc3339(),
            "blocks_new_sessions": true,
            "maintenance_shutdown": true,
        });
        let metadata_str = metadata.to_string();

        // STDB: immediate broadcast to all clients
        stdb.create_node(
            &node_id_str,
            "system_announcement",
            "Scheduled maintenance",
            "The system is shutting down for maintenance. Save your work.",
            "open",
            &metadata_str,
            &created_by_str,
        )
        .await
        .map_err(|e| format!("STDB error: {e}"))?;

        // PG: persistent storage
        sqlx::query(
            r#"INSERT INTO nodes (id, node_kind, title, content, visibility, metadata, created_by)
               VALUES ($1, 'system_announcement', 'Scheduled maintenance',
                       'The system is shutting down for maintenance. Save your work.',
                       'open', $2, $3)"#,
        )
        .bind(node_id)
        .bind(&metadata)
        .bind(initiated_by)
        .execute(db)
        .await
        .map_err(|e| format!("PG error: {e}"))?;

        tracing::info!(
            announcement_id = %node_id,
            scheduled_at = %scheduled_at,
            initiated_by = %initiated_by,
            "Maintenance initiated"
        );

        // Start the background task that coordinates the shutdown
        let state = self.clone();
        let db2 = db.clone();
        let stdb2 = stdb.clone();
        let handle = tokio::spawn(async move {
            shutdown_coordinator(state, db2, stdb2, scheduled_at, node_id).await;
        });

        // Store state
        let mut inner = self.inner.lock().await;
        inner.scheduled_at = Some(scheduled_at);
        inner.announcement_node_id = Some(node_id);
        inner.initiated_by = Some(initiated_by);
        inner.abort_handle = Some(handle.abort_handle());
        self.initiated.store(true, Ordering::Relaxed);

        Ok(node_id)
    }
    /// Cancel planned maintenance.
    pub async fn cancel(
        &self,
        db: &PgPool,
        stdb: &StdbClient,
    ) -> Result<(), String> {
        if !self.is_initiated() {
            return Err("No maintenance has been initiated".to_string());
        }

        let mut inner = self.inner.lock().await;

        // Abort the background task
        if let Some(handle) = inner.abort_handle.take() {
            handle.abort();
        }

        // Delete the announcement from STDB and PG
        if let Some(nid) = inner.announcement_node_id.take() {
            let nid_str = nid.to_string();
            if let Err(e) = stdb.delete_node(&nid_str).await {
                tracing::warn!("Could not delete announcement from STDB: {e}");
            }
            if let Err(e) = sqlx::query("DELETE FROM nodes WHERE id = $1")
                .bind(nid)
                .execute(db)
                .await
            {
                tracing::warn!("Could not delete announcement from PG: {e}");
            }
        }

        inner.scheduled_at = None;
        inner.initiated_by = None;
        self.initiated.store(false, Ordering::Relaxed);
        self.active.store(false, Ordering::Relaxed);

        tracing::info!("Maintenance cancelled");
        Ok(())
    }
}
/// Fetch running jobs from job_queue.
async fn fetch_running_jobs(db: &PgPool) -> Result<Vec<RunningJob>, sqlx::Error> {
    let rows = sqlx::query_as::<_, (Uuid, String, Option<chrono::DateTime<Utc>>, Option<Uuid>)>(
        "SELECT id, job_type, started_at, collection_node_id FROM job_queue WHERE status = 'running'"
    )
    .fetch_all(db)
    .await?;

    Ok(rows.into_iter().map(|(id, job_type, started_at, collection_node_id)| {
        RunningJob {
            id,
            job_type,
            started_at: started_at.map(|dt| dt.to_rfc3339()),
            collection_node_id,
        }
    }).collect())
}
/// Background task that coordinates the shutdown.
///
/// 1. Waits until scheduled_at
/// 2. Sets maintenance_active (blocks new LiveKit rooms and the job queue)
/// 3. Waits for running jobs to finish (max 5 min timeout)
/// 4. Exits the process (systemd restarts it)
async fn shutdown_coordinator(
    state: MaintenanceState,
    db: PgPool,
    stdb: StdbClient,
    scheduled_at: DateTime<Utc>,
    announcement_id: Uuid,
) {
    // Wait until the maintenance time
    let wait_duration = (scheduled_at - Utc::now()).to_std().unwrap_or_default();
    if !wait_duration.is_zero() {
        tracing::info!(
            seconds = wait_duration.as_secs(),
            "Waiting until the maintenance time"
        );
        tokio::time::sleep(wait_duration).await;
    }

    // Activate maintenance mode
    state.active.store(true, Ordering::Relaxed);
    tracing::warn!("Maintenance mode ACTIVE: new jobs and LiveKit rooms are blocked");

    // Update the announcement to reflect that maintenance is in progress
    let active_meta = serde_json::json!({
        "announcement_type": "critical",
        "scheduled_at": scheduled_at.to_rfc3339(),
        "blocks_new_sessions": true,
        "maintenance_shutdown": true,
        "maintenance_active": true,
    });
    let nid_str = announcement_id.to_string();
    let _ = stdb.update_node(
        &nid_str,
        "system_announcement",
        "Maintenance in progress",
        "The system is shutting down. Please wait until maintenance is finished.",
        "open",
        &active_meta.to_string(),
    ).await;

    // Wait for running jobs to finish (max 5 minutes)
    let timeout = std::time::Duration::from_secs(300);
    let start = std::time::Instant::now();
    loop {
        match fetch_running_jobs(&db).await {
            Ok(jobs) if jobs.is_empty() => {
                tracing::info!("No running jobs, ready for restart");
                break;
            }
            Ok(jobs) => {
                tracing::info!(
                    count = jobs.len(),
                    "Waiting for {} running jobs",
                    jobs.len()
                );
            }
            Err(e) => {
                tracing::error!("Error checking running jobs: {e}");
            }
        }
        if start.elapsed() > timeout {
            tracing::warn!("Timeout (5 min): forcing shutdown with jobs still running");
            break;
        }
        tokio::time::sleep(std::time::Duration::from_secs(5)).await;
    }

    // Delete the announcement (clients will see the connection disappear)
    let _ = stdb.delete_node(&nid_str).await;
    if let Err(e) = sqlx::query("DELETE FROM nodes WHERE id = $1")
        .bind(announcement_id)
        .execute(&db)
        .await
    {
        tracing::warn!("Could not delete announcement from PG: {e}");
    }

    tracing::warn!("Exiting the process for maintenance; systemd will restart it");

    // Give the last log lines a moment to be written
    tokio::time::sleep(std::time::Duration::from_millis(500)).await;

    // Exit the process; systemd will restart maskinrommet
    std::process::exit(0);
}
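
In `initiate`, the `is_initiated()` check and the later `initiated.store(true, ...)` are separate steps, so two overlapping requests could in principle both pass the check. One common way to close that window is to claim the flag with a single atomic compare-and-exchange. This is shown only as a sketch and is not what the commit does: `try_claim`, `release` and `count_successful_claims` are hypothetical helpers.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Tries to flip the flag from false to true in one atomic step.
/// Returns true for exactly one caller until the flag is released.
pub fn try_claim(initiated: &AtomicBool) -> bool {
    initiated
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

/// Releases the claim, for example after a cancel or a failed setup step.
pub fn release(initiated: &AtomicBool) {
    initiated.store(false, Ordering::Release);
}

/// Spawns `callers` threads that all try to claim the same flag and
/// returns how many of them succeeded.
pub fn count_successful_claims(callers: usize) -> usize {
    let initiated = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let flag = Arc::clone(&initiated);
            thread::spawn(move || try_claim(&flag))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("claim thread panicked"))
        .filter(|won| *won)
        .count()
}
```

A caller that wins the claim but then fails a later step (for instance the announcement insert) would need to call `release` on its error path, otherwise the state stays stuck at "initiated".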

@@ -164,8 +164,7 @@ Independent phases can still be picked up.
 ## Phase 15: Admin panel
 - [x] 15.1 System announcements: announcement node (`node_kind='system_announcement'`) with type (info/warning/critical), countdown, and expiry. The frontend shows a banner/toast to all active clients via STDB. Ref: `docs/concepts/adminpanelet.md`.
-- [~] 15.2 Graceful shutdown: admin sets a maintenance time → countdown in the frontend → new LiveKit rooms are blocked → job queue stops → wait for active jobs → restart. Show active sessions before confirmation.
-  > Started: 2026-03-18T03:22
+- [x] 15.2 Graceful shutdown: admin sets a maintenance time → countdown in the frontend → new LiveKit rooms are blocked → job queue stops → wait for active jobs → restart. Show active sessions before confirmation.
 - [ ] 15.3 Job queue overview: admin UI for active, pending, and failed jobs. Filter by type/collection/status. Manual retry and cancel.
 - [ ] 15.4 AI Gateway configuration: admin UI for model overview, API keys (encrypted), routing rules per job type, fallback chains, usage overview per collection. Ref: `docs/infra/ai_gateway.md`.
 - [ ] 15.5 Resource management: priority rules between job types, resource limits per worker, a resource governor for automatic deprioritization during active LiveKit sessions, disk status with alerting.