Complete task 15.6: Server health dashboard

New admin dashboard for real-time server health, with four main parts:

1. Service status: Parallel health checks for all 7 services
   (PG, STDB, Caddy, Authentik, LiteLLM, Whisper, LiveKit) with
   latency measurement and status reporting (up/down/degraded).

2. System metrics: CPU load via /proc/loadavg, memory via
   /proc/meminfo, disk via statvfs, uptime via /proc/uptime.
   Shown with progress bars and color-coded thresholds.

3. PG statistics: Active connections, max connections,
   database size, and active queries.

4. Log access: Filterable view of logs from all services.
   Uses journalctl for systemd services and docker logs for
   containers. Configurable number of lines per service.
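
As a rough illustration of item 2, parsing the two /proc files could look like the sketch below. It is not the code from health.rs: the function names are made up for this example, and the functions take the file contents as a string so the logic can be tried without a Linux host.

```rust
/// Parse the three load averages from the text of /proc/loadavg.
/// Missing or malformed fields fall back to 0.0 (an assumption of this sketch).
fn parse_load_avg(content: &str) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for (slot, field) in out.iter_mut().zip(content.split_whitespace()) {
        *slot = field.parse().unwrap_or(0.0);
    }
    out
}

/// Parse (total, used, available) in bytes from the text of /proc/meminfo.
/// The file reports kB, so values are multiplied by 1024.
/// "Used" is derived as MemTotal - MemAvailable.
fn parse_meminfo(content: &str) -> (u64, u64, u64) {
    let kb = |rest: &str| -> u64 {
        rest.split_whitespace()
            .next()
            .and_then(|v| v.parse().ok())
            .unwrap_or(0)
    };
    let (mut total, mut available) = (0u64, 0u64);
    for line in content.lines() {
        if let Some(rest) = line.strip_prefix("MemTotal:") {
            total = kb(rest) * 1024;
        } else if let Some(rest) = line.strip_prefix("MemAvailable:") {
            available = kb(rest) * 1024;
        }
    }
    (total, total.saturating_sub(available), available)
}

/// Load-based CPU estimate: 1-minute load divided by core count, capped at 100 %.
/// This is an approximation, not a measurement of actual CPU time.
fn cpu_percent_from_load(load_1m: f32, cores: usize) -> f32 {
    (load_1m / cores.max(1) as f32 * 100.0).min(100.0)
}
```

On a live host the inputs would come from `std::fs::read_to_string("/proc/loadavg")` and `std::fs::read_to_string("/proc/meminfo")`.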

Backend: health.rs, using tokio::join! for parallel checks.
Frontend: /admin/health, auto-polling every 10 seconds.
The backup check reports ok/stale/missing (no backup is set up yet).
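
The ok/stale/missing rule can be stated as a small pure function. This is a sketch under assumptions: `BackupStatus` and `classify_backup` are hypothetical names, and the 24-hour limit is passed in as a parameter rather than hard-coded.

```rust
use std::time::Duration;

/// The three backup states reported by the dashboard.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BackupStatus {
    Ok,
    Stale,
    Missing,
}

impl BackupStatus {
    /// String form as it would appear in the JSON payload.
    fn as_str(self) -> &'static str {
        match self {
            BackupStatus::Ok => "ok",
            BackupStatus::Stale => "stale",
            BackupStatus::Missing => "missing",
        }
    }
}

/// `newest_age` is the age of the most recent backup file,
/// or None when no backup file was found at all.
fn classify_backup(newest_age: Option<Duration>, max_age: Duration) -> BackupStatus {
    match newest_age {
        None => BackupStatus::Missing,
        Some(age) if age < max_age => BackupStatus::Ok,
        Some(_) => BackupStatus::Stale,
    }
}
```

With `max_age` set to `Duration::from_secs(86_400)`, a one-hour-old dump classifies as ok and a two-day-old dump as stale.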

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vegard 2026-03-18 04:12:54 +00:00
parent 9d4902ec48
commit 56b7df8bf8
6 changed files with 945 additions and 2 deletions


@@ -107,6 +107,27 @@ Real-time overview of system state.
- **Log access:** Latest errors and warnings from all services, filterable
- **Backup status:** Last successful backup per type, next scheduled run
#### Implemented (task 15.6)
- **Backend:** `maskinrommet/src/health.rs`: parallel health checks for all
  services, system metrics via `/proc`, PG statistics, backup check, log access
- **API endpoints:**
  - `GET /admin/health`: complete health dashboard (service status, CPU/memory/disk,
    PG stats, backup status)
  - `GET /admin/health/logs?service=&lines=`: log access per service, or for all services
- **Frontend:** `/admin/health`: dashboard with service cards (up/down/degraded with
  latency), system metrics with progress bars, PG connections and DB size,
  backup status, and a filterable log view
- **Service checks:** PG (SQL ping), STDB (no-op call), Caddy (admin API),
  Authentik (health endpoint), LiteLLM/Whisper/LiveKit (HTTP health). All run
  in parallel; the HTTP checks use a 5 s timeout
- **Metrics:** CPU load via `/proc/loadavg`, memory via `/proc/meminfo`,
  disk via `statvfs`, uptime via `/proc/uptime`
- **Logs:** Systemd journal for native services (maskinrommet, caddy),
  `docker logs` for containers. Filterable per service, configurable number of lines
- **Backup:** Checks the standard backup directories for PG dumps and CAS files.
  Reports status as ok/stale/missing based on the file's age
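
The service-check rule in the list above (a response counts as up, including 401/403; any other response is degraded; no response is down) can be sketched as follows. The commit runs its checks with `tokio::join!`; to stay dependency-free, this sketch swaps in std scoped threads instead, and `Status`, `classify_probe`, and `run_checks_in_parallel` are hypothetical names.

```rust
use std::thread;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Status {
    Up,
    Degraded,
    Down,
}

/// `http_status` is Some(code) when a response arrived in time,
/// None when the request failed or timed out.
fn classify_probe(http_status: Option<u16>) -> Status {
    match http_status {
        // 401/403 still prove the service is answering; only auth is missing.
        Some(code) if (200..300).contains(&code) || code == 401 || code == 403 => Status::Up,
        Some(_) => Status::Degraded,
        None => Status::Down,
    }
}

/// Run every probe on its own scoped thread and collect results in input order.
/// A probe that panics is reported as Down.
fn run_checks_in_parallel(
    probes: Vec<(&'static str, fn() -> Option<u16>)>,
) -> Vec<(&'static str, Status)> {
    thread::scope(|s| {
        let handles: Vec<_> = probes
            .into_iter()
            .map(|(name, probe)| (name, s.spawn(move || classify_probe(probe()))))
            .collect();
        handles
            .into_iter()
            .map(|(name, handle)| (name, handle.join().unwrap_or(Status::Down)))
            .collect()
    })
}
```

Because all probes start before any is joined, the total wall time is roughly that of the slowest probe rather than the sum.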
### 5. User and access overview
- **Active users:** Who is logged in now, last activity


@@ -945,3 +945,96 @@ export function deleteAiRouting(
): Promise<{ success: boolean }> {
return post(accessToken, '/admin/ai/delete_routing', { job_type: jobType });
}
// =============================================================================
// Server health dashboard (task 15.6)
// =============================================================================
export interface ServiceStatus {
name: string;
status: 'up' | 'down' | 'degraded';
latency_ms: number | null;
details: string | null;
}
export interface SystemMetrics {
cpu_usage_percent: number;
cpu_cores: number;
load_avg: [number, number, number];
memory_total_bytes: number;
memory_used_bytes: number;
memory_available_bytes: number;
memory_usage_percent: number;
disk: {
mount_point: string;
total_bytes: number;
used_bytes: number;
available_bytes: number;
usage_percent: number;
alert_level: string | null;
};
uptime_seconds: number;
}
export interface BackupInfo {
backup_type: string;
last_success: string | null;
path: string | null;
status: 'ok' | 'missing' | 'stale';
}
export interface PgStats {
active_connections: number;
max_connections: number;
database_size_bytes: number;
active_queries: number;
}
export interface HealthDashboard {
services: ServiceStatus[];
metrics: SystemMetrics;
backups: BackupInfo[];
pg_stats: PgStats;
}
export interface LogEntry {
timestamp: string;
service: string;
level: string;
message: string;
}
export interface LogsResponse {
entries: LogEntry[];
}
/** Fetch the complete server health dashboard. */
export async function fetchHealthDashboard(accessToken: string): Promise<HealthDashboard> {
const res = await fetch(`${BASE_URL}/admin/health`, {
headers: { Authorization: `Bearer ${accessToken}` }
});
if (!res.ok) {
const body = await res.text();
throw new Error(`health dashboard failed (${res.status}): ${body}`);
}
return res.json();
}
/** Fetch logs for one service (or for all services). */
export async function fetchHealthLogs(
accessToken: string,
params: { service?: string; lines?: number } = {}
): Promise<LogsResponse> {
const qs = new URLSearchParams();
if (params.service) qs.set('service', params.service);
if (params.lines) qs.set('lines', String(params.lines));
const query = qs.toString();
const res = await fetch(`${BASE_URL}/admin/health/logs${query ? `?${query}` : ''}`, {
headers: { Authorization: `Bearer ${accessToken}` }
});
if (!res.ok) {
const body = await res.text();
throw new Error(`health logs failed (${res.status}): ${body}`);
}
return res.json();
}


@@ -0,0 +1,308 @@
<script lang="ts">
/**
* Admin: server health dashboard (task 15.6)
*
* Shows service status, system metrics (CPU, memory, disk),
* backup status, and log access for all services in the stack.
*/
import { page } from '$app/stores';
import {
fetchHealthDashboard,
fetchHealthLogs,
type HealthDashboard,
type LogsResponse
} from '$lib/api';
const session = $derived($page.data.session as Record<string, unknown> | undefined);
const accessToken = $derived(session?.accessToken as string | undefined);
let dashboard = $state<HealthDashboard | null>(null);
let logs = $state<LogsResponse | null>(null);
let error = $state<string | null>(null);
let logsError = $state<string | null>(null);
// Log filter
let selectedService = $state<string>('');
let logLines = $state(50);
let showLogs = $state(false);
// Poll the dashboard every 10 seconds
$effect(() => {
if (!accessToken) return;
loadDashboard();
const interval = setInterval(loadDashboard, 10_000);
return () => clearInterval(interval);
});
async function loadDashboard() {
if (!accessToken) return;
try {
dashboard = await fetchHealthDashboard(accessToken);
error = null;
} catch (e) {
error = String(e);
}
}
async function loadLogs() {
if (!accessToken) return;
logsError = null;
try {
logs = await fetchHealthLogs(accessToken, {
service: selectedService || undefined,
lines: logLines
});
} catch (e) {
logsError = String(e);
}
}
function statusColor(status: string): string {
if (status === 'up' || status === 'ok') return 'text-green-400';
if (status === 'down' || status === 'missing') return 'text-red-400';
if (status === 'degraded' || status === 'stale') return 'text-yellow-400';
return 'text-neutral-400';
}
function statusDot(status: string): string {
if (status === 'up') return 'bg-green-400';
if (status === 'down') return 'bg-red-400';
if (status === 'degraded') return 'bg-yellow-400';
return 'bg-neutral-400';
}
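function formatBytes(bytes: number): string {
if (!Number.isFinite(bytes) || bytes <= 0) return '0 B';
const units = ['B', 'KB', 'MB', 'GB', 'TB'];
// Clamp the index so fractional and very large values still map to a valid unit
const i = Math.min(Math.max(Math.floor(Math.log(bytes) / Math.log(1024)), 0), units.length - 1);
return `${(bytes / Math.pow(1024, i)).toFixed(1)} ${units[i]}`;
}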
function formatUptime(seconds: number): string {
const days = Math.floor(seconds / 86400);
const hours = Math.floor((seconds % 86400) / 3600);
const mins = Math.floor((seconds % 3600) / 60);
if (days > 0) return `${days}d ${hours}t ${mins}m`;
if (hours > 0) return `${hours}t ${mins}m`;
return `${mins}m`;
}
function logLevelColor(level: string): string {
if (level === 'error') return 'text-red-400';
if (level === 'warn') return 'text-yellow-400';
if (level === 'info') return 'text-blue-400';
return 'text-neutral-500';
}
const allServices = ['maskinrommet', 'caddy', 'postgres', 'spacetimedb', 'authentik', 'litellm', 'whisper', 'livekit'];
</script>
<div class="min-h-screen bg-neutral-950 text-neutral-100 p-4 sm:p-8">
<div class="max-w-6xl mx-auto space-y-6">
<div class="flex items-center justify-between">
<h1 class="text-2xl font-bold">Serverhelse</h1>
<a href="/admin" class="text-sm text-neutral-400 hover:text-white">Tilbake til admin</a>
</div>
{#if error}
<div class="bg-red-900/30 border border-red-700 rounded-lg p-4 text-red-300 text-sm">
{error}
</div>
{/if}
{#if !dashboard}
<p class="text-neutral-500">Laster...</p>
{:else}
<!-- Service status -->
<section>
<h2 class="text-lg font-semibold mb-3">Tjenester</h2>
<div class="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-3 xl:grid-cols-4 gap-3">
{#each dashboard.services as svc}
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="flex items-center gap-2 mb-1">
<span class="w-2.5 h-2.5 rounded-full {statusDot(svc.status)}"></span>
<span class="font-medium">{svc.name}</span>
</div>
<div class="text-sm {statusColor(svc.status)} capitalize">{svc.status}</div>
{#if svc.latency_ms !== null}
<div class="text-xs text-neutral-500 mt-1">{svc.latency_ms} ms</div>
{/if}
{#if svc.details}
<div class="text-xs text-neutral-600 mt-1 truncate" title={svc.details}>{svc.details}</div>
{/if}
</div>
{/each}
</div>
</section>
<!-- System metrics -->
<section>
<h2 class="text-lg font-semibold mb-3">System</h2>
<div class="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-3">
<!-- CPU -->
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400 mb-1">CPU</div>
<div class="text-2xl font-mono">{dashboard.metrics.cpu_usage_percent.toFixed(0)}%</div>
<div class="text-xs text-neutral-500 mt-1">
{dashboard.metrics.cpu_cores} kjerner | Load: {dashboard.metrics.load_avg.map(v => v.toFixed(2)).join(', ')}
</div>
</div>
<!-- Memory -->
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400 mb-1">Minne</div>
<div class="text-2xl font-mono">{dashboard.metrics.memory_usage_percent.toFixed(0)}%</div>
<div class="text-xs text-neutral-500 mt-1">
{formatBytes(dashboard.metrics.memory_used_bytes)} / {formatBytes(dashboard.metrics.memory_total_bytes)}
</div>
<!-- Bar -->
<div class="mt-2 h-1.5 bg-neutral-800 rounded-full overflow-hidden">
<div
class="h-full rounded-full transition-all {dashboard.metrics.memory_usage_percent > 90 ? 'bg-red-500' : dashboard.metrics.memory_usage_percent > 70 ? 'bg-yellow-500' : 'bg-green-500'}"
style="width: {Math.min(dashboard.metrics.memory_usage_percent, 100)}%"
></div>
</div>
</div>
<!-- Disk -->
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400 mb-1">Disk</div>
<div class="text-2xl font-mono">{dashboard.metrics.disk.usage_percent.toFixed(1)}%</div>
<div class="text-xs text-neutral-500 mt-1">
{formatBytes(dashboard.metrics.disk.used_bytes)} / {formatBytes(dashboard.metrics.disk.total_bytes)}
</div>
{#if dashboard.metrics.disk.alert_level}
<div class="text-xs text-red-400 mt-1 uppercase">{dashboard.metrics.disk.alert_level}</div>
{/if}
<div class="mt-2 h-1.5 bg-neutral-800 rounded-full overflow-hidden">
<div
class="h-full rounded-full transition-all {dashboard.metrics.disk.usage_percent > 90 ? 'bg-red-500' : dashboard.metrics.disk.usage_percent > 80 ? 'bg-yellow-500' : 'bg-green-500'}"
style="width: {Math.min(dashboard.metrics.disk.usage_percent, 100)}%"
></div>
</div>
</div>
<!-- Uptime -->
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400 mb-1">Oppetid</div>
<div class="text-2xl font-mono">{formatUptime(dashboard.metrics.uptime_seconds)}</div>
<div class="text-xs text-neutral-500 mt-1">Siden siste reboot</div>
</div>
</div>
</section>
<!-- PG statistics -->
<section>
<h2 class="text-lg font-semibold mb-3">PostgreSQL</h2>
<div class="grid grid-cols-2 sm:grid-cols-4 gap-3">
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400">Tilkoblinger</div>
<div class="text-xl font-mono">{dashboard.pg_stats.active_connections} / {dashboard.pg_stats.max_connections}</div>
</div>
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400">DB-størrelse</div>
<div class="text-xl font-mono">{formatBytes(dashboard.pg_stats.database_size_bytes)}</div>
</div>
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400">Aktive spørringer</div>
<div class="text-xl font-mono">{dashboard.pg_stats.active_queries}</div>
</div>
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="text-sm text-neutral-400">Frie tilkoblinger</div>
<div class="text-xl font-mono">{dashboard.pg_stats.max_connections - dashboard.pg_stats.active_connections}</div>
</div>
</div>
</section>
<!-- Backup status -->
<section>
<h2 class="text-lg font-semibold mb-3">Backup</h2>
<div class="grid grid-cols-1 sm:grid-cols-2 gap-3">
{#each dashboard.backups as backup}
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4">
<div class="flex items-center gap-2">
<span class="w-2 h-2 rounded-full {statusDot(backup.status === 'ok' ? 'up' : backup.status === 'stale' ? 'degraded' : 'down')}"></span>
<span class="font-medium">{backup.backup_type}</span>
</div>
<div class="text-sm {statusColor(backup.status)} mt-1 capitalize">{backup.status}</div>
{#if backup.last_success}
<div class="text-xs text-neutral-500 mt-1">Siste: {new Date(backup.last_success).toLocaleString('nb-NO')}</div>
{:else}
<div class="text-xs text-neutral-500 mt-1">Ingen backup funnet</div>
{/if}
</div>
{/each}
</div>
</section>
<!-- Log access -->
<section>
<div class="flex items-center justify-between mb-3">
<h2 class="text-lg font-semibold">Logger</h2>
<button
class="text-sm text-neutral-400 hover:text-white"
onclick={() => { showLogs = !showLogs; if (showLogs) loadLogs(); }}
>
{showLogs ? 'Skjul' : 'Vis logger'}
</button>
</div>
{#if showLogs}
<div class="bg-neutral-900 border border-neutral-800 rounded-lg p-4 space-y-3">
<!-- Filters -->
<div class="flex flex-wrap gap-3 items-center">
<select
bind:value={selectedService}
class="bg-neutral-800 border border-neutral-700 rounded px-3 py-1.5 text-sm"
onchange={() => loadLogs()}
>
<option value="">Alle tjenester</option>
{#each allServices as svc}
<option value={svc}>{svc}</option>
{/each}
</select>
<select
bind:value={logLines}
class="bg-neutral-800 border border-neutral-700 rounded px-3 py-1.5 text-sm"
onchange={() => loadLogs()}
>
<option value={25}>25 linjer</option>
<option value={50}>50 linjer</option>
<option value={100}>100 linjer</option>
<option value={200}>200 linjer</option>
</select>
<button
class="bg-neutral-800 hover:bg-neutral-700 px-3 py-1.5 rounded text-sm"
onclick={() => loadLogs()}
>
Oppdater
</button>
</div>
{#if logsError}
<div class="text-red-400 text-sm">{logsError}</div>
{/if}
{#if logs}
<div class="max-h-96 overflow-y-auto font-mono text-xs space-y-0.5">
{#each logs.entries as entry}
<div class="flex gap-2 py-0.5 border-b border-neutral-800/50">
<span class="text-neutral-600 shrink-0 w-40 truncate">{entry.timestamp}</span>
<span class="text-neutral-500 shrink-0 w-24">{entry.service}</span>
<span class="shrink-0 w-12 {logLevelColor(entry.level)}">{entry.level}</span>
<span class="text-neutral-300 break-all">{entry.message}</span>
</div>
{/each}
{#if logs.entries.length === 0}
<div class="text-neutral-500 py-4 text-center">Ingen logger funnet</div>
{/if}
</div>
{:else}
<div class="text-neutral-500 text-sm">Laster logger...</div>
{/if}
</div>
{/if}
</section>
{/if}
</div>
</div>

maskinrommet/src/health.rs Normal file

@@ -0,0 +1,518 @@
// Server health dashboard: service status, metrics, backup status, log access.
//
// Checks all services in the stack (PG, STDB, Caddy, Authentik, LiteLLM,
// Whisper, LiveKit) and collects system metrics (CPU, memory, disk).
//
// Ref: docs/concepts/adminpanelet.md § 4 "Serverhelse", task 15.6
use axum::{extract::{Query, State}, http::StatusCode, Json};
use serde::{Deserialize, Serialize};
use sqlx::PgPool;
use crate::auth::AuthUser;
use crate::AppState;
// =============================================================================
// Types
// =============================================================================
#[derive(Serialize)]
pub struct ServiceStatus {
pub name: String,
pub status: String, // "up", "down", "degraded"
pub latency_ms: Option<u64>,
pub details: Option<String>,
}
#[derive(Serialize)]
pub struct SystemMetrics {
pub cpu_usage_percent: f32,
pub cpu_cores: usize,
pub load_avg: [f32; 3],
pub memory_total_bytes: u64,
pub memory_used_bytes: u64,
pub memory_available_bytes: u64,
pub memory_usage_percent: f32,
pub disk: crate::resources::DiskStatus,
pub uptime_seconds: u64,
}
#[derive(Serialize)]
pub struct BackupInfo {
pub backup_type: String,
pub last_success: Option<String>,
pub path: Option<String>,
pub status: String, // "ok", "missing", "stale"
}
#[derive(Serialize)]
pub struct LogEntry {
pub timestamp: String,
pub service: String,
pub level: String,
pub message: String,
}
#[derive(Serialize)]
pub struct HealthDashboard {
pub services: Vec<ServiceStatus>,
pub metrics: SystemMetrics,
pub backups: Vec<BackupInfo>,
pub pg_stats: PgStats,
}
#[derive(Serialize)]
pub struct PgStats {
pub active_connections: i64,
pub max_connections: i64,
pub database_size_bytes: i64,
pub active_queries: i64,
}
#[derive(Deserialize)]
pub struct LogsQuery {
pub service: Option<String>,
pub lines: Option<usize>,
}
#[derive(Serialize)]
pub struct LogsResponse {
pub entries: Vec<LogEntry>,
}
// =============================================================================
// Service checks
// =============================================================================
async fn check_pg(db: &PgPool) -> ServiceStatus {
let start = std::time::Instant::now();
match sqlx::query_scalar::<_, i32>("SELECT 1").fetch_one(db).await {
Ok(_) => ServiceStatus {
name: "PostgreSQL".to_string(),
status: "up".to_string(),
latency_ms: Some(start.elapsed().as_millis() as u64),
details: None,
},
Err(e) => ServiceStatus {
name: "PostgreSQL".to_string(),
status: "down".to_string(),
latency_ms: None,
details: Some(format!("{e}")),
},
}
}
async fn check_stdb(stdb: &crate::stdb::StdbClient) -> ServiceStatus {
let start = std::time::Instant::now();
match stdb.delete_node("__healthcheck_nonexistent__").await {
Ok(()) => ServiceStatus {
name: "SpacetimeDB".to_string(),
status: "up".to_string(),
latency_ms: Some(start.elapsed().as_millis() as u64),
details: None,
},
Err(e) => ServiceStatus {
name: "SpacetimeDB".to_string(),
status: "down".to_string(),
latency_ms: None,
details: Some(format!("{e}")),
},
}
}
/// Check an HTTP service, with a timeout.
async fn check_http_service(name: &str, url: &str) -> ServiceStatus {
let client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(5))
.build()
.unwrap();
let start = std::time::Instant::now();
match client.get(url).send().await {
Ok(resp) => {
let latency = start.elapsed().as_millis() as u64;
let status_code = resp.status();
if status_code.is_success() || status_code.as_u16() == 401 || status_code.as_u16() == 403 {
// 401/403 means the service is running; only auth is missing
ServiceStatus {
name: name.to_string(),
status: "up".to_string(),
latency_ms: Some(latency),
details: None,
}
} else {
ServiceStatus {
name: name.to_string(),
status: "degraded".to_string(),
latency_ms: Some(latency),
details: Some(format!("HTTP {}", status_code)),
}
}
}
Err(e) => ServiceStatus {
name: name.to_string(),
status: "down".to_string(),
latency_ms: None,
details: Some(format!("{e}")),
},
}
}
async fn check_caddy() -> ServiceStatus {
// Caddy runs locally; check its admin API
check_http_service("Caddy", "http://localhost:2019/config/").await
}
async fn check_authentik() -> ServiceStatus {
// Authentik via Caddy
check_http_service("Authentik", "https://auth.sidelinja.org/-/health/ready/").await
}
async fn check_litellm() -> ServiceStatus {
let url = std::env::var("AI_GATEWAY_URL")
.unwrap_or_else(|_| "http://localhost:4000".to_string());
check_http_service("LiteLLM", &format!("{url}/health")).await
}
async fn check_whisper() -> ServiceStatus {
let url = std::env::var("WHISPER_URL")
.unwrap_or_else(|_| "http://localhost:8000".to_string());
check_http_service("Whisper", &format!("{url}/health")).await
}
async fn check_livekit() -> ServiceStatus {
let url = std::env::var("LIVEKIT_URL")
.unwrap_or_else(|_| "http://localhost:7880".to_string());
check_http_service("LiveKit", &url).await
}
// =============================================================================
// System metrics
// =============================================================================
fn read_cpu_usage() -> f32 {
// Approximation: this does not sample /proc/stat. For simplicity it derives
// a CPU percentage from the 1-minute load average divided by the core count.
let cores = num_cpus();
let load = read_load_avg();
// load / cores * 100, capped at 100
((load[0] / cores as f32) * 100.0).min(100.0)
}
fn num_cpus() -> usize {
std::thread::available_parallelism()
.map(|n| n.get())
.unwrap_or(1)
}
fn read_load_avg() -> [f32; 3] {
let content = std::fs::read_to_string("/proc/loadavg").unwrap_or_default();
let parts: Vec<f32> = content
.split_whitespace()
.take(3)
.filter_map(|s| s.parse().ok())
.collect();
[
parts.first().copied().unwrap_or(0.0),
parts.get(1).copied().unwrap_or(0.0),
parts.get(2).copied().unwrap_or(0.0),
]
}
fn read_memory_info() -> (u64, u64, u64) {
// Read /proc/meminfo
let content = std::fs::read_to_string("/proc/meminfo").unwrap_or_default();
let mut total: u64 = 0;
let mut available: u64 = 0;
for line in content.lines() {
if let Some(val) = line.strip_prefix("MemTotal:") {
total = parse_meminfo_kb(val) * 1024;
} else if let Some(val) = line.strip_prefix("MemAvailable:") {
available = parse_meminfo_kb(val) * 1024;
}
}
let used = total.saturating_sub(available);
(total, used, available)
}
fn parse_meminfo_kb(s: &str) -> u64 {
s.trim().split_whitespace().next()
.and_then(|v| v.parse().ok())
.unwrap_or(0)
}
fn read_uptime() -> u64 {
let content = std::fs::read_to_string("/proc/uptime").unwrap_or_default();
content.split_whitespace().next()
.and_then(|v| v.parse::<f64>().ok())
.map(|v| v as u64)
.unwrap_or(0)
}
fn collect_metrics() -> SystemMetrics {
let load = read_load_avg();
let cores = num_cpus();
let cpu = read_cpu_usage();
let (mem_total, mem_used, mem_available) = read_memory_info();
let mem_percent = if mem_total > 0 {
(mem_used as f64 / mem_total as f64 * 100.0) as f32
} else {
0.0
};
let cas_root = std::env::var("CAS_ROOT")
.unwrap_or_else(|_| "/srv/synops/media/cas".to_string());
let disk = crate::resources::check_disk_usage(&cas_root)
.unwrap_or_else(|_| crate::resources::check_disk_usage("/").unwrap_or(
crate::resources::DiskStatus {
mount_point: "/".to_string(),
total_bytes: 0,
used_bytes: 0,
available_bytes: 0,
usage_percent: 0.0,
alert_level: None,
}
));
SystemMetrics {
cpu_usage_percent: cpu,
cpu_cores: cores,
load_avg: load,
memory_total_bytes: mem_total,
memory_used_bytes: mem_used,
memory_available_bytes: mem_available,
memory_usage_percent: mem_percent,
disk,
uptime_seconds: read_uptime(),
}
}
// =============================================================================
// PG statistics
// =============================================================================
async fn collect_pg_stats(db: &PgPool) -> PgStats {
let active = sqlx::query_scalar::<_, i64>(
"SELECT count(*) FROM pg_stat_activity WHERE state = 'active'"
)
.fetch_one(db)
.await
.unwrap_or(0);
let max_conn = sqlx::query_scalar::<_, i64>(
"SELECT setting::bigint FROM pg_settings WHERE name = 'max_connections'"
)
.fetch_one(db)
.await
.unwrap_or(100);
let db_size = sqlx::query_scalar::<_, i64>(
"SELECT pg_database_size(current_database())"
)
.fetch_one(db)
.await
.unwrap_or(0);
let queries = sqlx::query_scalar::<_, i64>(
"SELECT count(*) FROM pg_stat_activity WHERE state = 'active' AND query NOT LIKE '%pg_stat_activity%'"
)
.fetch_one(db)
.await
.unwrap_or(0);
PgStats {
active_connections: active,
max_connections: max_conn,
database_size_bytes: db_size,
active_queries: queries,
}
}
// =============================================================================
// Backup status
// =============================================================================
fn check_backups() -> Vec<BackupInfo> {
// Look for PG dumps in the standard backup directories
let backup_paths = [
"/srv/synops/backups",
"/srv/synops/data/backups",
"/var/backups/synops",
];
let mut backups = Vec::new();
// PG dump
let mut pg_backup = BackupInfo {
backup_type: "PostgreSQL dump".to_string(),
last_success: None,
path: None,
status: "missing".to_string(),
};
// Modification time of the newest matching file seen so far
let mut newest: Option<std::time::SystemTime> = None;
for dir in &backup_paths {
if let Ok(entries) = std::fs::read_dir(dir) {
for entry in entries.flatten() {
let name = entry.file_name().to_string_lossy().to_string();
if name.contains("pg") || name.ends_with(".sql") || name.ends_with(".dump") {
if let Ok(meta) = entry.metadata() {
if let Ok(modified) = meta.modified() {
// Keep only the most recent file; directory order is arbitrary
if newest.map_or(false, |n| modified <= n) {
continue;
}
newest = Some(modified);
let age = modified.elapsed().unwrap_or_default();
let ts = chrono::DateTime::<chrono::Utc>::from(modified);
pg_backup.last_success = Some(ts.to_rfc3339());
pg_backup.path = Some(entry.path().to_string_lossy().to_string());
// Newer than 24 hours is "ok"; anything older is "stale"
pg_backup.status = if age.as_secs() < 86400 {
"ok".to_string()
} else {
"stale".to_string()
};
}
}
}
}
}
}
backups.push(pg_backup);
// CAS (media) backup: no check is implemented yet, so it is always reported as missing
backups.push(BackupInfo {
backup_type: "CAS media".to_string(),
last_success: None,
path: None,
status: "missing".to_string(),
});
backups
}
// =============================================================================
// Log access
// =============================================================================
fn read_service_logs(service: &str, max_lines: usize) -> Vec<LogEntry> {
// Use journalctl for systemd services, docker logs for containers
let cmd = match service {
"maskinrommet" | "caddy" | "sveltekit" => {
format!("journalctl -u {service} --no-pager -n {max_lines} --output=short-iso 2>/dev/null")
}
"postgres" | "spacetimedb" | "authentik" | "litellm" | "whisper" | "livekit" => {
let container = match service {
"postgres" => "sidelinja-postgres-1",
"spacetimedb" => "sidelinja-spacetimedb-1",
"authentik" => "sidelinja-authentik-server-1",
"litellm" => "sidelinja-ai-gateway-1",
"whisper" => "sidelinja-faster-whisper-1",
"livekit" => "sidelinja-livekit-1",
_ => return Vec::new(),
};
format!("docker logs --tail {max_lines} --timestamps {container} 2>&1")
}
_ => return Vec::new(),
};
let output = std::process::Command::new("bash")
.arg("-c")
.arg(&cmd)
.output();
match output {
Ok(out) => {
let text = String::from_utf8_lossy(&out.stdout);
text.lines()
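.rev() // newest first
.take(max_lines)
.map(|line| {
// Split off the leading timestamp token. journalctl (short-iso) and
// `docker logs --timestamps` use different timestamp widths, and a fixed
// byte offset could also panic inside a multi-byte UTF-8 character.
let (ts, msg) = match line.split_once(' ') {
Some((first, rest)) if first.starts_with(|c: char| c.is_ascii_digit()) => {
(first.to_string(), rest.trim().to_string())
}
_ => (String::new(), line.to_string()),
};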
let level = if msg.contains("ERROR") || msg.contains("error") || msg.contains("ERR") {
"error"
} else if msg.contains("WARN") || msg.contains("warn") {
"warn"
} else if msg.contains("INFO") || msg.contains("info") {
"info"
} else {
"debug"
};
LogEntry {
timestamp: ts,
service: service.to_string(),
level: level.to_string(),
message: msg,
}
})
.collect()
}
Err(_) => Vec::new(),
}
}
// =============================================================================
// API handlers
// =============================================================================
/// GET /admin/health: complete server health dashboard.
pub async fn health_dashboard(
State(state): State<AppState>,
_user: AuthUser,
) -> Result<Json<HealthDashboard>, (StatusCode, Json<crate::intentions::ErrorResponse>)> {
// Run all service checks in parallel
let (pg, stdb, caddy, authentik, litellm, whisper, livekit) = tokio::join!(
check_pg(&state.db),
check_stdb(&state.stdb),
check_caddy(),
check_authentik(),
check_litellm(),
check_whisper(),
check_livekit(),
);
let services = vec![pg, stdb, caddy, authentik, litellm, whisper, livekit];
let metrics = collect_metrics();
let backups = check_backups();
let pg_stats = collect_pg_stats(&state.db).await;
Ok(Json(HealthDashboard {
services,
metrics,
backups,
pg_stats,
}))
}
/// GET /admin/health/logs?service=maskinrommet&lines=50
pub async fn health_logs(
_user: AuthUser,
Query(params): Query<LogsQuery>,
) -> Json<LogsResponse> {
let max_lines = params.lines.unwrap_or(50).min(200);
let entries = if let Some(service) = &params.service {
read_service_logs(service, max_lines)
} else {
// All services: the latest lines from each
let services = ["maskinrommet", "caddy", "postgres", "spacetimedb", "authentik", "litellm", "whisper", "livekit"];
let per_service = (max_lines / services.len()).max(10);
let mut all = Vec::new();
for svc in &services {
all.extend(read_service_logs(svc, per_service));
}
// Sort by timestamp (newest first)
all.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
all.truncate(max_lines);
all
};
Json(LogsResponse { entries })
}
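
One way to turn a raw log line into a timestamp, a level, and a message is sketched below. It assumes that both `journalctl --output=short-iso` and `docker logs --timestamps` put a single timestamp token without spaces at the start of each line; `ParsedLine` and `parse_log_line` are hypothetical names, not part of health.rs.

```rust
/// One parsed log line. The fields mirror LogEntry in the commit,
/// minus the service name.
#[derive(Debug, PartialEq, Eq)]
struct ParsedLine {
    timestamp: String,
    level: &'static str,
    message: String,
}

fn parse_log_line(line: &str) -> ParsedLine {
    // Treat the first space-delimited token as a timestamp only if it starts
    // with a digit; otherwise keep the whole line as the message.
    let (timestamp, message) = match line.split_once(' ') {
        Some((first, rest)) if first.starts_with(|c: char| c.is_ascii_digit()) => {
            (first.to_string(), rest.trim().to_string())
        }
        _ => (String::new(), line.trim().to_string()),
    };
    // Keyword heuristic for the level, checked from most to least severe.
    let lower = message.to_ascii_lowercase();
    let level = if lower.contains("error") || message.contains("ERR") {
        "error"
    } else if lower.contains("warn") {
        "warn"
    } else if lower.contains("info") {
        "info"
    } else {
        "debug"
    };
    ParsedLine { timestamp, level, message }
}
```

Splitting on the first space rather than at a fixed byte offset handles timestamps of different lengths and cannot cut through a multi-byte character. The level detection stays a heuristic: a message that merely mentions the word "error" is still classified as an error.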


@@ -12,6 +12,7 @@ pub mod maintenance;
pub mod pruning;
mod queries;
pub mod publishing;
pub mod health;
pub mod resources;
mod rss;
mod serving;
@@ -228,6 +229,9 @@ async fn main() {
.route("/admin/ai/delete_provider", post(ai_admin::delete_provider))
.route("/admin/ai/update_routing", post(ai_admin::update_routing))
.route("/admin/ai/delete_routing", post(ai_admin::delete_routing))
// Server health dashboard (task 15.6)
.route("/admin/health", get(health::health_dashboard))
.route("/admin/health/logs", get(health::health_logs))
.route("/query/audio_info", get(intentions::audio_info))
.route("/pub/{slug}/feed.xml", get(rss::generate_feed))
.route("/pub/{slug}", get(publishing::serve_index))
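
For the `/admin/health/logs` route, the handling of the `lines` query parameter can be summarized in two small functions. This is a sketch: the helper and constant names are hypothetical, while the default of 50, the cap of 200, and the floor of 10 lines per service are taken from the handler in this commit.

```rust
const DEFAULT_LINES: usize = 50;
const MAX_LINES: usize = 200;
const MIN_PER_SERVICE: usize = 10;

/// Resolve the `lines` parameter: fall back to the default, then cap it.
fn effective_lines(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LINES).min(MAX_LINES)
}

/// When no service is selected, split the budget evenly across services,
/// but never read fewer than MIN_PER_SERVICE lines from each.
fn per_service_budget(max_lines: usize, service_count: usize) -> usize {
    (max_lines / service_count.max(1)).max(MIN_PER_SERVICE)
}
```

With 8 services and the default of 50 lines, each service contributes up to 10 lines, so the merged list can hold 80 entries before it is cut back to 50.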


@@ -168,8 +168,7 @@ Independent phases can still be picked up.
- [x] 15.3 Job queue overview: admin UI for active, pending, and failed jobs. Filter by type/collection/status. Manual retry and cancel.
- [x] 15.4 AI Gateway configuration: admin UI for model overview, API keys (encrypted), routing rules per job type, fallback chains, usage overview per collection. Ref: `docs/infra/ai_gateway.md`.
- [x] 15.5 Resource management: priority rules between job types, resource limits per worker, resource governor for automatic deprioritization during active LiveKit sessions, disk status with alerting.
- [~] 15.6 Server health dashboard: service status (PG, STDB, Caddy, Authentik, LiteLLM, Whisper, LiveKit), metrics (CPU, memory, disk), backup status, log access.
> Started: 2026-03-18T04:03
- [x] 15.6 Server health dashboard: service status (PG, STDB, Caddy, Authentik, LiteLLM, Whisper, LiveKit), metrics (CPU, memory, disk), backup status, log access.
- [ ] 15.7 Resource usage logging: `resource_usage_log` table in PG. Maskinrommet logs AI tokens (in/out, model level), Whisper time (sec), TTS characters, CAS storage (bytes), LiveKit time (participant-min). Bandwidth via Caddy log parsing. Ref: `docs/features/ressursforbruk.md`.
- [ ] 15.8 Usage overview in admin: aggregated view per collection, per resource type, per time period. Drill-down to job type and model level.
- [ ] 15.9 User-visible usage: each user sees their own usage in profile/settings. Per-node usage visible in node details for owners.