synops-backup: PG dump + CAS file list + metadata snapshot (task 28.6)

New Rust CLI tool that replaces scripts/backup-pg.sh:
- pg_dump -Fc via docker exec (consistent snapshot)
- CAS manifest: a list of all files with hash and size
- Metadata snapshot (JSON) with timestamp, mode, and statistics
- --full / --incremental / --payload-json for the job queue
- Rotation of old dumps (30 days, only with --full)

Output: structured JSON with the backup path and details.
vegard 2026-03-18 20:45:02 +00:00
parent 0f7fdb75e0
commit f1e6355037
6 changed files with 3006 additions and 7 deletions


@@ -17,22 +17,29 @@ PostgreSQL (authoritative source)
 └──→ Real-time via PG LISTEN/NOTIFY → WebSocket
 ```
-## 1. PG dump (daily)
+## 1. Backup via `synops-backup`
-**Script:** `scripts/backup-pg.sh`
+**Tool:** `tools/synops-backup/` (Rust CLI, replaces `scripts/backup-pg.sh`)
 **Cron:** `/etc/cron.d/synops-backup` (`0 3 * * *`)
-**Log:** `/srv/synops/logs/backup-pg.log`
 **Dumps:** `/srv/synops/backup/pg/`
+**Manifests:** `/srv/synops/backup/manifests/`
 Process:
 1. Checks that the PG container is running
 2. `pg_dump -Fc` (custom format, compressed) — a consistent snapshot with no downtime
 3. Verifies that the dump file is not empty
-4. Deletes dumps older than 30 days
+4. Builds the CAS manifest (a list of all files with hash and size)
+5. Writes a metadata snapshot (JSON) with timestamp, mode, and statistics
+6. Rotates dumps older than 30 days (only with `--full`)
+Modes:
+- `--full` — full PG dump + complete CAS file list + metadata
+- `--incremental` — PG dump + only CAS files new since the previous manifest
+- `--payload-json '{"mode":"full"}'` — job-queue dispatch
 Manual run:
 ```bash
-/home/vegard/synops/scripts/backup-pg.sh
+synops-backup --full
 ```
 Verify the dump:


@@ -380,8 +380,7 @@ which model is used for what.
 - [x] 28.4 `synops-notify`: send a notification via email (synops-mail), WebSocket push, or both. Input: `--to <node_id> --message <text> [--channel email|ws|both]`. Used by orchestrations and vaktmesteren.
 - [x] 28.5 `synops-validate`: check that a node matches the expected schema for its node_kind. Input: `--node-id <uuid>`. Output: a list of deviations. Used by the validation phase and as a pre-commit check.
-- [~] 28.6 `synops-backup`: PG dump + CAS file list + metadata snapshot. Input: `[--full | --incremental]`. Output: backup path. Replaces the cron script from 12.2.
+- [x] 28.6 `synops-backup`: PG dump + CAS file list + metadata snapshot. Input: `[--full | --incremental]`. Output: backup path. Replaces the cron script from 12.2.
-> Started: 2026-03-18T20:40
 - [ ] 28.7 `synops-health`: check the status of all services (PG, Caddy, vaktmesteren, LiteLLM, Whisper, LiveKit). Output: JSON with status per service. Used by the admin dashboard and monitoring.
 ## Phase 29: Universal input — all modalities become nodes


@@ -26,6 +26,7 @@ or the maskinrommet API. Available in PATH via symlink or direct invocation.
 | `synops-mail` | Send email via msmtp (vaktmester@synops.no) | Done (awaiting SMTP credentials) |
 | `synops-notify` | Send a notification via email, WebSocket push, or both | Done |
 | `synops-validate` | Validate that a node matches the expected schema for its node_kind | Done |
+| `synops-backup` | PG dump + CAS file list + metadata snapshot (`--full` / `--incremental`) | Done |
 ## Shared library

tools/synops-backup/Cargo.lock (generated, 2447 additions)

File diff suppressed because it is too large


tools/synops-backup/Cargo.toml (new file)
@@ -0,0 +1,17 @@
[package]
name = "synops-backup"
version = "0.1.0"
edition = "2024"
[[bin]]
name = "synops-backup"
path = "src/main.rs"
[dependencies]
clap = { version = "4", features = ["derive", "env"] }
synops-common = { path = "../synops-common" }
chrono = { version = "0.4", features = ["serde"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"


tools/synops-backup/src/main.rs (new file)
@@ -0,0 +1,528 @@
// synops-backup — PG dump + CAS file list + metadata snapshot.
//
// Replaces scripts/backup-pg.sh (the cron script from task 12.2).
// Does three things:
// 1. pg_dump -Fc via docker exec (consistent snapshot)
// 2. Lists all CAS files with hash and size
// 3. Writes a metadata snapshot (JSON) with timestamp, mode, and statistics
//
// Usage:
// synops-backup --full # full backup
// synops-backup --incremental # incremental (only changes since the previous run)
// synops-backup --payload-json '{"mode":"full"}' # job-queue dispatch
//
// Output: backup path on stdout (JSON).
// Errors: stderr + exit code != 0.
//
// Ref: docs/infra/backup.md, docs/retninger/unix_filosofi.md
use chrono::Utc;
use clap::Parser;
use serde::Serialize;
use std::path::{Path, PathBuf};
/// Backup mode.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
Full,
Incremental,
}
/// PG dump + CAS file list + metadata snapshot.
#[derive(Parser)]
#[command(name = "synops-backup", about = "Backup: PG dump, CAS file list, and metadata snapshot")]
struct Cli {
/// Full backup (PG dump + CAS list + metadata)
#[arg(long, conflicts_with = "incremental")]
full: bool,
/// Incremental backup (only CAS files newer than the previous backup)
#[arg(long, conflicts_with = "full")]
incremental: bool,
/// Backup directory (default: /srv/synops/backup)
#[arg(long, default_value = "/srv/synops/backup")]
backup_dir: String,
/// PG Docker container (default: sidelinja-postgres-1)
#[arg(long, default_value = "sidelinja-postgres-1")]
pg_container: String,
/// Database user (default: sidelinja)
#[arg(long, default_value = "sidelinja")]
db_user: String,
/// Database name (default: sidelinja)
#[arg(long, default_value = "sidelinja")]
db_name: String,
/// Number of days to keep old backups (default: 30)
#[arg(long, default_value = "30")]
retain_days: u32,
/// Payload from the job queue (JSON). Overrides the other arguments.
#[arg(long)]
payload_json: Option<String>,
}
/// Result of a backup run.
#[derive(Serialize)]
struct BackupResult {
ok: bool,
mode: String,
timestamp: String,
backup_dir: String,
#[serde(skip_serializing_if = "Option::is_none")]
pg_dump: Option<PgDumpResult>,
#[serde(skip_serializing_if = "Option::is_none")]
cas_manifest: Option<CasManifestResult>,
#[serde(skip_serializing_if = "Option::is_none")]
metadata_file: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
rotation: Option<RotationResult>,
#[serde(skip_serializing_if = "Option::is_none")]
error: Option<String>,
}
#[derive(Serialize)]
struct PgDumpResult {
path: String,
size_bytes: u64,
size_human: String,
}
#[derive(Serialize)]
struct CasManifestResult {
path: String,
file_count: u64,
total_size_bytes: u64,
total_size_human: String,
}
#[derive(Serialize)]
struct RotationResult {
deleted_count: u32,
retain_days: u32,
}
/// Metadata snapshot written to the backup directory.
#[derive(Serialize)]
struct MetadataSnapshot {
timestamp: String,
mode: String,
pg_dump_size_bytes: Option<u64>,
cas_file_count: Option<u64>,
cas_total_size_bytes: Option<u64>,
hostname: String,
db_name: String,
}
#[tokio::main]
async fn main() {
synops_common::logging::init("synops_backup");
let cli = Cli::parse();
// Resolve mode from CLI args or --payload-json
let (mode, backup_dir, pg_container, db_user, db_name, retain_days) =
if let Some(ref json_str) = cli.payload_json {
let payload: serde_json::Value = serde_json::from_str(json_str).unwrap_or_else(|e| {
eprintln!("Invalid --payload-json: {e}");
std::process::exit(1);
});
let mode = match payload["mode"].as_str().unwrap_or("full") {
"incremental" => Mode::Incremental,
_ => Mode::Full,
};
let backup_dir = payload["backup_dir"]
.as_str()
.unwrap_or("/srv/synops/backup")
.to_string();
let pg_container = payload["pg_container"]
.as_str()
.unwrap_or("sidelinja-postgres-1")
.to_string();
let db_user = payload["db_user"]
.as_str()
.unwrap_or("sidelinja")
.to_string();
let db_name = payload["db_name"]
.as_str()
.unwrap_or("sidelinja")
.to_string();
let retain_days = payload["retain_days"].as_u64().unwrap_or(30) as u32;
(mode, backup_dir, pg_container, db_user, db_name, retain_days)
} else {
let mode = if cli.incremental {
Mode::Incremental
} else {
Mode::Full // default
};
(
mode,
cli.backup_dir,
cli.pg_container,
cli.db_user,
cli.db_name,
cli.retain_days,
)
};
let timestamp = Utc::now().format("%Y%m%d_%H%M%S").to_string();
let timestamp_iso = Utc::now().to_rfc3339();
// Create the backup directories
let pg_dir = PathBuf::from(&backup_dir).join("pg");
let manifest_dir = PathBuf::from(&backup_dir).join("manifests");
for dir in [&pg_dir, &manifest_dir] {
if let Err(e) = tokio::fs::create_dir_all(dir).await {
let result = BackupResult {
ok: false,
mode: mode_str(mode).to_string(),
timestamp: timestamp_iso.clone(),
backup_dir: backup_dir.clone(),
pg_dump: None,
cas_manifest: None,
metadata_file: None,
rotation: None,
error: Some(format!("Could not create directory {}: {e}", dir.display())),
};
println!("{}", serde_json::to_string_pretty(&result).unwrap());
std::process::exit(1);
}
}
let mut all_ok = true;
let mut pg_dump_result = None;
let mut cas_manifest_result = None;
let mut rotation_result = None;
let mut error_msg = None;
// 1. PG dump
tracing::info!(mode = mode_str(mode), "Starting backup");
match run_pg_dump(&pg_container, &db_user, &db_name, &pg_dir, &timestamp).await {
Ok(r) => {
tracing::info!(path = %r.path, size = r.size_human, "PG dump finished");
pg_dump_result = Some(r);
}
Err(e) => {
tracing::error!(error = %e, "PG dump failed");
all_ok = false;
error_msg = Some(e);
}
}
// 2. CAS file list
let cas_root = synops_common::cas::root();
match build_cas_manifest(&cas_root, &manifest_dir, &timestamp, mode).await {
Ok(r) => {
tracing::info!(
files = r.file_count,
size = r.total_size_human,
"CAS manifest finished"
);
cas_manifest_result = Some(r);
}
Err(e) => {
tracing::error!(error = %e, "CAS manifest failed");
all_ok = false;
if let Some(ref mut existing) = error_msg {
existing.push_str(&format!("; CAS: {e}"));
} else {
error_msg = Some(e);
}
}
}
// 3. Metadata snapshot
let hostname = std::env::var("HOSTNAME")
.or_else(|_| {
std::process::Command::new("hostname")
.output()
.map(|o| String::from_utf8_lossy(&o.stdout).trim().to_string())
})
.unwrap_or_else(|_| "unknown".to_string());
let metadata = MetadataSnapshot {
timestamp: timestamp_iso.clone(),
mode: mode_str(mode).to_string(),
pg_dump_size_bytes: pg_dump_result.as_ref().map(|r| r.size_bytes),
cas_file_count: cas_manifest_result.as_ref().map(|r| r.file_count),
cas_total_size_bytes: cas_manifest_result.as_ref().map(|r| r.total_size_bytes),
hostname,
db_name: db_name.clone(),
};
let metadata_path = manifest_dir.join(format!("backup_{timestamp}.json"));
let metadata_json = serde_json::to_string_pretty(&metadata).unwrap();
if let Err(e) = tokio::fs::write(&metadata_path, &metadata_json).await {
tracing::error!(error = %e, "Could not write metadata snapshot");
all_ok = false;
}
// 4. Rotation (full backups only)
if mode == Mode::Full && all_ok {
match rotate_old_backups(&pg_dir, retain_days).await {
Ok(r) => {
if r.deleted_count > 0 {
tracing::info!(
deleted = r.deleted_count,
retain_days = r.retain_days,
"Rotation finished"
);
}
rotation_result = Some(r);
}
Err(e) => {
tracing::warn!(error = %e, "Rotation failed (the backup itself is OK)");
// Rotation failures are not critical — the backup has already been taken
}
}
}
// Output
let result = BackupResult {
ok: all_ok,
mode: mode_str(mode).to_string(),
timestamp: timestamp_iso,
backup_dir,
pg_dump: pg_dump_result,
cas_manifest: cas_manifest_result,
metadata_file: Some(metadata_path.to_string_lossy().to_string()),
rotation: rotation_result,
error: error_msg,
};
println!("{}", serde_json::to_string_pretty(&result).unwrap());
if !all_ok {
std::process::exit(1);
}
}
fn mode_str(mode: Mode) -> &'static str {
match mode {
Mode::Full => "full",
Mode::Incremental => "incremental",
}
}
/// Run pg_dump via docker exec. Returns the dump file path and size.
async fn run_pg_dump(
container: &str,
db_user: &str,
db_name: &str,
pg_dir: &Path,
timestamp: &str,
) -> Result<PgDumpResult, String> {
// Check that the container is running
let inspect = tokio::process::Command::new("docker")
.args(["inspect", container, "--format", "{{.State.Running}}"])
.output()
.await
.map_err(|e| format!("Could not run docker inspect: {e}"))?;
let running = String::from_utf8_lossy(&inspect.stdout).trim().to_string();
if running != "true" {
return Err(format!("PostgreSQL container {container} is not running"));
}
let dump_file = pg_dir.join(format!("{db_name}_{timestamp}.dump"));
// pg_dump via docker exec; stdout is captured and then written to a file
let output = tokio::process::Command::new("docker")
.args([
"exec", container, "pg_dump", "-U", db_user, "-Fc", db_name,
])
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.output()
.await
.map_err(|e| format!("Could not run pg_dump: {e}"))?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
return Err(format!("pg_dump failed: {stderr}"));
}
// Write the dump to a file
tokio::fs::write(&dump_file, &output.stdout)
.await
.map_err(|e| format!("Could not write the dump file: {e}"))?;
let size_bytes = output.stdout.len() as u64;
if size_bytes < 100 {
// Delete the empty/corrupt dump
let _ = tokio::fs::remove_file(&dump_file).await;
return Err(format!(
"Dump file is too small ({size_bytes} bytes); something went wrong"
));
}
Ok(PgDumpResult {
path: dump_file.to_string_lossy().to_string(),
size_bytes,
size_human: human_size(size_bytes),
})
}
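The "not too small" threshold above is deliberately crude. A stricter variant could also check the file's magic bytes: pg_dump's custom archive format (`-Fc`) starts with the literal prefix `PGDMP`. A sketch, not part of the tool (the helper name is invented):

```rust
/// Hypothetical stricter check: size threshold plus the `PGDMP` magic
/// prefix that pg_dump's custom archive format begins with.
fn dump_looks_valid(bytes: &[u8]) -> bool {
    bytes.len() >= 100 && bytes.starts_with(b"PGDMP")
}

fn main() {
    // Right magic but far too small — still rejected.
    assert!(!dump_looks_valid(b"PGDMP"));
    // A captured HTML error page would fail the magic check.
    assert!(!dump_looks_valid(&[b'<'; 4096]));
    let mut fake = b"PGDMP".to_vec();
    fake.resize(4096, 0);
    assert!(dump_looks_valid(&fake));
}
```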
/// Build the CAS manifest: a list of all files with hash and size.
///
/// Full mode: lists all files.
/// Incremental mode: lists only files newer than the previous manifest.
async fn build_cas_manifest(
cas_root: &str,
manifest_dir: &Path,
timestamp: &str,
mode: Mode,
) -> Result<CasManifestResult, String> {
let cas_path = Path::new(cas_root);
if !cas_path.exists() {
return Err(format!("CAS directory {cas_root} does not exist"));
}
// Incremental mode: find the previous manifest. Its *path* (not a parsed
// timestamp) is what `find -newer` takes as a modification-time reference.
let last_manifest = if mode == Mode::Incremental {
find_last_manifest_time(manifest_dir).await
} else {
None
};
// Use find(1) to list CAS files efficiently
let mut args = vec![cas_root.to_string(), "-type".to_string(), "f".to_string()];
if let Some(ref manifest_path) = last_manifest {
// Only files modified after the previous manifest was written
args.extend(["-newer".to_string(), manifest_path.clone()]);
}
let output = tokio::process::Command::new("find")
.args(&args)
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.output()
.await
.map_err(|e| format!("Could not list CAS files: {e}"))?;
let file_list = String::from_utf8_lossy(&output.stdout);
let mut file_count: u64 = 0;
let mut total_size: u64 = 0;
let mut manifest_lines = Vec::new();
for line in file_list.lines() {
let line = line.trim();
if line.is_empty() {
continue;
}
let path = Path::new(line);
let size = match tokio::fs::metadata(path).await {
Ok(m) => m.len(),
Err(_) => continue,
};
// Extract the hash from the file name (the last path component)
let hash = path
.file_name()
.map(|n| n.to_string_lossy().to_string())
.unwrap_or_default();
manifest_lines.push(format!("{hash}\t{size}"));
file_count += 1;
total_size += size;
}
let manifest_name = if mode == Mode::Incremental {
format!("cas_incremental_{timestamp}.tsv")
} else {
format!("cas_full_{timestamp}.tsv")
};
let manifest_path = manifest_dir.join(&manifest_name);
// Write the manifest (TSV: hash\tsize)
let mode_label = mode_str(mode);
let header = format!("# CAS manifest — {mode_label} {timestamp}\n# hash\tsize_bytes\n");
let content = format!("{}{}\n", header, manifest_lines.join("\n"));
tokio::fs::write(&manifest_path, &content)
.await
.map_err(|e| format!("Could not write the CAS manifest: {e}"))?;
Ok(CasManifestResult {
path: manifest_path.to_string_lossy().to_string(),
file_count,
total_size_bytes: total_size,
total_size_human: human_size(total_size),
})
}
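The incremental mode above hinges on `find`'s `-newer` predicate: the previous manifest file itself serves as the modification-time reference, so no timestamp parsing is needed. The argument construction can be checked in isolation (a sketch mirroring the logic above; `find_args` is a made-up helper):

```rust
/// Hypothetical helper mirroring the argument construction above:
/// full mode lists every regular file; incremental mode adds `-newer`
/// with the previous manifest as the mtime reference.
fn find_args(cas_root: &str, last_manifest: Option<&str>) -> Vec<String> {
    let mut args = vec![cas_root.to_string(), "-type".to_string(), "f".to_string()];
    if let Some(manifest) = last_manifest {
        args.extend(["-newer".to_string(), manifest.to_string()]);
    }
    args
}

fn main() {
    assert_eq!(
        find_args("/srv/synops/cas", None),
        ["/srv/synops/cas", "-type", "f"]
    );
    assert_eq!(
        find_args("/srv/synops/cas", Some("/srv/synops/backup/manifests/cas_full_x.tsv")),
        ["/srv/synops/cas", "-type", "f", "-newer",
         "/srv/synops/backup/manifests/cas_full_x.tsv"]
    );
}
```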
/// Find the most recent manifest file; its path is used as the
/// `find -newer` cutoff for incremental mode.
async fn find_last_manifest_time(manifest_dir: &Path) -> Option<String> {
let mut entries = match tokio::fs::read_dir(manifest_dir).await {
Ok(e) => e,
Err(_) => return None,
};
let mut latest_path: Option<PathBuf> = None;
let mut latest_modified = std::time::SystemTime::UNIX_EPOCH;
while let Ok(Some(entry)) = entries.next_entry().await {
let name = entry.file_name().to_string_lossy().to_string();
if name.starts_with("cas_") && name.ends_with(".tsv") {
if let Ok(meta) = entry.metadata().await {
if let Ok(modified) = meta.modified() {
if modified > latest_modified {
latest_modified = modified;
latest_path = Some(entry.path());
}
}
}
}
}
latest_path.map(|p| p.to_string_lossy().to_string())
}
/// Delete PG dumps older than retain_days.
async fn rotate_old_backups(pg_dir: &Path, retain_days: u32) -> Result<RotationResult, String> {
let cutoff = std::time::SystemTime::now()
- std::time::Duration::from_secs(retain_days as u64 * 86400);
let mut entries = tokio::fs::read_dir(pg_dir)
.await
.map_err(|e| format!("Could not read the backup directory: {e}"))?;
let mut deleted_count = 0u32;
while let Ok(Some(entry)) = entries.next_entry().await {
let name = entry.file_name().to_string_lossy().to_string();
if !name.ends_with(".dump") {
continue;
}
if let Ok(meta) = entry.metadata().await {
if let Ok(modified) = meta.modified() {
if modified < cutoff {
if tokio::fs::remove_file(entry.path()).await.is_ok() {
tracing::info!(file = %name, "Deleted old dump");
deleted_count += 1;
}
}
}
}
}
Ok(RotationResult {
deleted_count,
retain_days,
})
}
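The retention cutoff above is plain `SystemTime` arithmetic: anything modified before `now - retain_days` is deleted. A minimal sketch of just that comparison, with an invented helper name:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: would a file with this mtime be rotated out?
fn should_rotate(modified: SystemTime, now: SystemTime, retain_days: u64) -> bool {
    let cutoff = now - Duration::from_secs(retain_days * 86400);
    modified < cutoff
}

fn main() {
    let now = UNIX_EPOCH + Duration::from_secs(100 * 86400); // "day 100"
    let old = UNIX_EPOCH + Duration::from_secs(60 * 86400); // 40 days old
    let fresh = UNIX_EPOCH + Duration::from_secs(80 * 86400); // 20 days old
    assert!(should_rotate(old, now, 30));
    assert!(!should_rotate(fresh, now, 30));
}
```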
/// Format a byte count as a human-readable size.
fn human_size(bytes: u64) -> String {
const UNITS: &[&str] = &["B", "KiB", "MiB", "GiB", "TiB"];
let mut size = bytes as f64;
for unit in UNITS {
if size < 1024.0 {
return format!("{size:.1} {unit}");
}
size /= 1024.0;
}
format!("{size:.1} PiB")
}
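`human_size` divides by 1024 until the value drops below one unit step; the boundary behavior is easy to pin down with a self-contained copy of the function:

```rust
/// Verbatim copy of human_size above, for checking in isolation.
fn human_size(bytes: u64) -> String {
    const UNITS: &[&str] = &["B", "KiB", "MiB", "GiB", "TiB"];
    let mut size = bytes as f64;
    for unit in UNITS {
        if size < 1024.0 {
            return format!("{size:.1} {unit}");
        }
        size /= 1024.0;
    }
    format!("{size:.1} PiB")
}

fn main() {
    assert_eq!(human_size(0), "0.0 B");
    assert_eq!(human_size(1023), "1023.0 B"); // stays in bytes up to 1023
    assert_eq!(human_size(1536), "1.5 KiB");
    assert_eq!(human_size(3 * 1024 * 1024), "3.0 MiB");
}
```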