CI/CD & Automation
Soteria uses a split-deployment architecture and automated pipelines to keep the intelligence engine and the security dashboard synchronized, tested, and performant. The automation strategy centers on "Continuous Security": the ML model is consistently retrained on new data without manual intervention.
Automated Deployment Pipeline
Soteria’s infrastructure is divided into two distinct delivery paths managed via GitHub Actions:
1. Frontend: React & Vite (Vercel)
The Cyber Sentinel dashboard is automatically deployed to Vercel upon every push to the main branch.
- Trigger: Merges into the production branch.
- Environment: Node.js with Vite.
- Automatic Previews: Pull requests generate unique preview URLs to test UI changes before they are merged.
2. Backend: Flask & ML Engine (Render via Docker)
The Intelligence Engine, which handles AST analysis and model inference, is containerized using Docker and hosted on Render.
- Containerization: The `backend/` directory is packaged into a Docker image to ensure Python 3.13+ dependency parity across environments.
- Webhooks: Render listens for successful GitHub Actions runs to trigger a fresh image build and deployment.
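The webhook step can be sketched in a few lines. This is a minimal, hypothetical illustration using only the standard library: it assumes the `RENDER_DEPLOY_HOOK` variable from the Environment Configuration table holds the secret webhook URL, and Render's deploy hooks accept a bare POST with no body.

```python
# Hypothetical sketch: trigger a Render deploy from a CI step.
import os
import urllib.request


def build_deploy_request(hook_url: str) -> urllib.request.Request:
    # Render deploy hooks are fired with a plain POST and no payload.
    return urllib.request.Request(hook_url, method="POST")


def trigger_deploy() -> None:
    # The hook URL is a secret, so it is read from the CI environment.
    hook_url = os.environ["RENDER_DEPLOY_HOOK"]
    with urllib.request.urlopen(build_deploy_request(hook_url)) as resp:
        print("Render responded with", resp.status)
```

In practice the same effect is achieved by a single `curl -X POST "$RENDER_DEPLOY_HOOK"` step at the end of the workflow.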
GitHub Actions Workflow
The project uses GitHub Actions to maintain code quality and pipeline integrity. The primary workflow performs the following steps:
- Linting & Formatting: Checks Python and TypeScript code against project standards.
- Security Audit: Scans for hardcoded secrets and known vulnerabilities in dependencies.
- AST Validation: Runs a suite of tests on `dataPipeline_AST.py` to ensure the `ast.NodeTransformer` is correctly anonymizing code snippets without losing structural integrity.
- Model Smoke Test: Verifies that the serialized `acidModel.pkl` can still be loaded and return predictions.
```yaml
# Example Workflow Logic
name: Soteria CI/CD
on: [push, pull_request]

jobs:
  test-backend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.13"
      - name: Run AST Logic Tests
        run: pytest backend/tests/
```
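The "Model Smoke Test" step boils down to two assertions: the pickle still loads, and the loaded object still answers `predict`. Since the real `acidModel.pkl` and its class are not shown here, this sketch substitutes a stand-in model pickled in-memory; only the load-and-predict contract is what the CI step actually checks.

```python
# Sketch of the model smoke test, using a stand-in for the real ensemble.
import pickle


class StubModel:
    """Hypothetical stand-in for the serialized classifier."""

    def predict(self, samples):
        # Trivial placeholder rule: flag anything containing "exec".
        return [1 if "exec" in s else 0 for s in samples]


def smoke_test(model_bytes: bytes) -> bool:
    model = pickle.loads(model_bytes)                  # does the pickle load?
    preds = model.predict(["print('hi')", "exec(p)"])  # does it predict?
    return len(preds) == 2


blob = pickle.dumps(StubModel())
assert smoke_test(blob)
```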
Intelligence Automation
A unique feature of Soteria is its automated data-to-model pipeline. This ensures the classifier stays ahead of new obfuscation techniques.
Automated Model Retraining
The watch_data.py background process monitors the /backend/data directory. When new samples (clean or corrupted) are added:
- Normalization: The `codeNormalizer` automatically strips variable names and constants.
- Deduplication: A SHA-256 hash is generated for the structural DNA. If the fingerprint already exists in the dataset, the sample is discarded to prevent training bias.
- Retraining: The system triggers the `trainerModel_Hybrid.py` script to update the ensemble model with the new structural patterns.
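The normalize-fingerprint-deduplicate flow above can be sketched as follows. The normalizer here is a deliberately simplified stand-in for `codeNormalizer`: it only renames every variable to `_` via `ast.NodeTransformer`, so two snippets that differ solely in identifier names collide on the same SHA-256 fingerprint.

```python
# Sketch: structural fingerprinting for deduplication (simplified).
import ast
import hashlib


class _Anonymize(ast.NodeTransformer):
    """Replace every variable name with "_" to erase identifier noise."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)


def structural_fingerprint(code: str) -> str:
    # Hash the dumped AST, not the raw text, so renames do not matter.
    tree = _Anonymize().visit(ast.parse(code))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


seen: set[str] = set()


def ingest(code: str) -> bool:
    """Return True if the sample is structurally new and was kept."""
    fp = structural_fingerprint(code)
    if fp in seen:
        return False  # duplicate structural DNA: discard to avoid bias
    seen.add(fp)
    return True
```

With this scheme, `x = y + 1` and `a = b + 1` produce identical fingerprints, so only the first copy of that structure enters the training set.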
Dataset Purity Automation
Soteria automates the ingestion of external data from sources like Hugging Face. The pipeline includes an automatic "Markdown Stripper" and "Syntax Validator" to ensure that only valid, parseable Python code enters the training matrix.
````python
# Automatic ingestion and cleaning logic in dataPipeline_AST.py
if raw_code.strip().startswith("```"):
    # Automated removal of markdown formatting before AST parsing
    lines = raw_code.strip().splitlines()[1:-1]
    raw_code = "\n".join(lines)
````
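The "Syntax Validator" half of the purity pipeline amounts to attempting a parse and rejecting anything that fails. A minimal sketch (the helper name is illustrative, not the actual function in `dataPipeline_AST.py`):

```python
# Sketch: only parseable Python enters the training matrix.
import ast


def is_valid_python(raw_code: str) -> bool:
    try:
        ast.parse(raw_code)
        return True
    except SyntaxError:
        return False
```

Samples that fail this check after markdown stripping are dropped rather than repaired, keeping the dataset free of half-extracted snippets.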
Environment Configuration
To configure the automation for your own fork, ensure the following environment variables are set in your CI/CD provider:
| Variable | Description |
| :--- | :--- |
| VITE_API_URL | The URL of your deployed Render backend. |
| SECRET_KEY | Used by Flask for JWT signing and session security. |
| RENDER_DEPLOY_HOOK | The webhook URL provided by Render to trigger builds. |
| DATABASE_URL | SQLite path or PostgreSQL connection string for user persistence. |
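A fork can fail fast when any variable from the table above is unset. This is a hedged sketch, not part of Soteria itself; the helper name is hypothetical.

```python
# Sketch: verify required CI/CD variables before deploying.
import os

REQUIRED = ["VITE_API_URL", "SECRET_KEY", "RENDER_DEPLOY_HOOK", "DATABASE_URL"]


def missing_vars(env=os.environ) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

Running this as the first CI step and aborting when the returned list is non-empty surfaces misconfiguration immediately, instead of mid-deploy.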