Designing BackupScout: Scan Your Server for Critical Data (Part 1)

Hello, I’m Shrijith Venkatramana. I’m building LiveReview, a private AI code review tool that runs on your LLM key (OpenAI, Gemini, etc.) with highly competitive pricing — built for small teams. Do check it out and give it a try!

Servers run dozens—or hundreds—of processes at any given time. Some are critical for your applications or data, while others are ephemeral system threads. BackupScout is a tool designed to automatically identify the processes that matter and classify them by backup relevance.

By the end of this post, you’ll understand how BackupScout:

  • Enumerates processes
  • Classifies them into categories
  • Flags them as High, Medium, or Low relevance
  • Produces a JSON file ready for review or further automation

The heavy lifting is powered by AI Studio’s Gemini API, which classifies processes based on minimal metadata like name, binary path, and parent process. No manual rules needed.

What BackupScout Produces

Here’s an example of the JSON output:

[
  {
    "pid": 330,
    "name": "mysqld",
    "category": "Database",
    "backup_relevance": "High"
  },
  {
    "pid": 451,
    "name": "nginx",
    "category": "Web Server",
    "backup_relevance": "Medium"
  },
  {
    "pid": 22,
    "name": "kworker/1",
    "category": "Kernel Thread",
    "backup_relevance": "Low"
  }
]

This gives immediate insight into what processes are worth backing up or monitoring.

Step 1: Enumerating Processes

We use Python’s psutil library for process enumeration. It provides cross-platform access to process metadata.

import psutil

# Collect only the fields the classifier needs: pid, name, binary path, parent pid
processes = []
for proc in psutil.process_iter(['pid', 'name', 'exe', 'ppid']):
    try:
        processes.append(proc.info)  # .info holds the pre-fetched attribute dict
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue  # process exited or access was denied; skip it

for p in processes[:5]:  # preview the first few processes, one per line
    print(p)

Sample Output:

{'pid': 1, 'name': 'systemd', 'exe': '/usr/lib/systemd/systemd', 'ppid': 0}
{'pid': 330, 'name': 'mysqld', 'exe': '/usr/sbin/mysqld', 'ppid': 1}
{'pid': 451, 'name': 'nginx', 'exe': '/usr/sbin/nginx', 'ppid': 1}

This metadata is enough for AI classification without exposing full process environments.

Step 2: Classifying Processes with AI

BackupScout uses Gemini via AI Studio for classification. The AI model assigns:

  • Category (Database, Web App, Web Server, Caching, Kernel Thread, System)
  • Backup relevance (High, Medium, Low)

For example, mysqld → Database → High, nginx → Web Server → Medium, kworker/1 → Kernel Thread → Low.

The AI handles these assignments automatically, so you don’t need to maintain a ruleset.
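Here is a minimal sketch of what such a classification call can look like, using the google-generativeai SDK that AI Studio provides. The model name, prompt wording, and the classify_batch helper are illustrative assumptions, not BackupScout's final prompt:

import json
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_KEY")  # assumption: API key from AI Studio
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

def classify_batch(procs):
    """Hypothetical helper: ask Gemini to classify a batch of process dicts."""
    prompt = (
        "For each process, assign a category (Database, Web App, Web Server, "
        "Caching, Kernel Thread, System) and a backup_relevance (High, Medium, Low). "
        "Respond with only a JSON array of objects with keys "
        "pid, name, category, backup_relevance.\n" + json.dumps(procs)
    )
    response = model.generate_content(prompt)
    # Assumes the model returns bare JSON; production code should strip
    # markdown fences and validate before parsing.
    return json.loads(response.text)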

Step 3: Batching and Incremental Processing

For servers with hundreds of processes, sending them all at once can exceed the AI input limit. BackupScout handles this by:

  • Processing in batches (e.g., 10–20 processes per request; see the chunking sketch below)
  • Saving results incrementally to disk
  • Retrying failed batches automatically (a retry sketch closes this step)
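A simple way to form those batches is to slice the process list into fixed-size chunks. A minimal sketch follows; the batch size of 15 is an arbitrary pick within the 10–20 range mentioned above:

def chunked(items, size=15):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage: classify one batch at a time
# for batch in chunked(processes):
#     batch_results = classify_batch(batch)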

Example of incremental saving:

import json
import os

results_file = "process_classification.json"

# Resume from a previous partial run if the file already exists
if os.path.exists(results_file):
    with open(results_file) as f:
        all_results = json.load(f)
else:
    all_results = []

# After processing a batch (batch_results comes from the classification step)
all_results.extend(batch_results)
with open(results_file, "w") as f:
    json.dump(all_results, f, indent=2)

This ensures partial results are never lost during long scans or network failures.
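Retries can be a small wrapper around the classification call with exponential backoff. A minimal sketch, assuming the hypothetical classify_batch helper from Step 2 and that API or network failures surface as ordinary exceptions:

import time

def classify_with_retry(batch, attempts=3, base_delay=2.0):
    """Retry a failed batch with exponential backoff before giving up."""
    for attempt in range(attempts):
        try:
            return classify_batch(batch)  # hypothetical helper from Step 2
        except Exception:  # network or API error
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller log and move on
            time.sleep(base_delay * (2 ** attempt))  # back off: 2s, 4s, ...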

Step 4: Reviewing Results

Once the JSON is ready, you can filter High-relevance processes using jq:

jq '[.[] | select(.backup_relevance=="High")]' process_classification.json

Or get a quick PID + name + category list:

jq -r '.[] | select(.backup_relevance=="High") | "\(.pid)\t\(.name)\t\(.category)"' process_classification.json

Sample Output:

330	mysqld	Database

This makes it easy to identify critical processes at a glance.

Step 5: Handling Large Servers

BackupScout is designed to work on servers with hundreds of processes. Key strategies:

  • Batch AI requests to avoid hitting token limits
  • Incremental saving to maintain progress
  • Retries to handle network or API errors

These make the tool robust and reliable, even in real-world server environments.

Step 6: Practical Use Cases

BackupScout helps in:

  • Prioritizing backups for databases and web apps
  • Preparing server snapshots for disaster recovery
  • Identifying high-impact processes for monitoring

Because AI classification is already done, your workflow is mostly orchestration and review—you don’t need to maintain manual rules.

Where We Go Next

In the next part of the series, we’ll combine everything into a full working BackupScout script:

  • Enumerating processes
  • Batching AI calls
  • Incremental saving
  • Automatic retries

The goal: a ready-to-run tool for scanning any server and discovering critical data automatically.

Reference: psutil documentation (https://psutil.readthedocs.io/)

