CI/CD & Automation
Soteria uses a split-deployment architecture and automated pipelines to keep the intelligence engine and the security dashboard synchronized, tested, and performant. The automation strategy centers on "Continuous Security": the ML model is consistently retrained on new data without manual intervention.
Automated Deployment Pipeline
Soteria’s infrastructure is divided into two distinct delivery paths managed via GitHub Actions:
1. Frontend: React & Vite (Vercel)
The Cyber Sentinel dashboard is automatically deployed to Vercel upon every push to the main branch.
- Trigger: Merges into the production branch.
- Environment: Node.js with Vite.
- Automatic Previews: Pull requests generate unique preview URLs to test UI changes before they are merged.
2. Backend: Flask & ML Engine (Render via Docker)
The Intelligence Engine, which handles AST analysis and model inference, is containerized using Docker and hosted on Render.
- Containerization: The `backend/` directory is packaged into a Docker image to ensure Python 3.13+ dependency parity across environments.
- Webhooks: Render listens for successful GitHub Actions runs to trigger a fresh image build and deployment.
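The webhook step can be sketched in a few lines. This is a minimal, hypothetical illustration using only the standard library: it assumes the `RENDER_DEPLOY_HOOK` variable from the Environment Configuration table holds the secret webhook URL, and Render's deploy hooks accept a bare POST with no body.

```python
# Hypothetical sketch: trigger a Render deploy from a CI step.
import os
import urllib.request


def build_deploy_request(hook_url: str) -> urllib.request.Request:
    # Render deploy hooks are fired with a plain POST and no payload.
    return urllib.request.Request(hook_url, method="POST")


def trigger_deploy() -> None:
    # The hook URL is a secret, so it is read from the CI environment.
    hook_url = os.environ["RENDER_DEPLOY_HOOK"]
    with urllib.request.urlopen(build_deploy_request(hook_url)) as resp:
        print("Render responded with", resp.status)
```

In practice the same effect is achieved by a single `curl -X POST "$RENDER_DEPLOY_HOOK"` step at the end of the workflow.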
GitHub Actions Workflow
The project uses GitHub Actions to maintain code quality and pipeline integrity. The primary workflow performs the following steps:
- Linting & Formatting: Checks Python and TypeScript code against project standards.
- Security Audit: Scans for hardcoded secrets and known vulnerabilities in dependencies.
- AST Validation: Runs a suite of tests on `dataPipeline_AST.py` to ensure the `ast.NodeTransformer` is correctly anonymizing code snippets without losing structural integrity.
- Model Smoke Test: Verifies that the serialized `acidModel.pkl` can still be loaded and return predictions.
```yaml
# Example Workflow Logic
name: Soteria CI/CD
on: [push, pull_request]

jobs:
  test-backend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.13"
      - name: Run AST Logic Tests
        run: pytest backend/tests/
```
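The "Model Smoke Test" step boils down to two assertions: the pickle still loads, and the loaded object still answers `predict`. Since the real `acidModel.pkl` and its class are not shown here, this sketch substitutes a stand-in model pickled in-memory; only the load-and-predict contract is what the CI step actually checks.

```python
# Sketch of the model smoke test, using a stand-in for the real ensemble.
import pickle


class StubModel:
    """Hypothetical stand-in for the serialized classifier."""

    def predict(self, samples):
        # Trivial placeholder rule: flag anything containing "exec".
        return [1 if "exec" in s else 0 for s in samples]


def smoke_test(model_bytes: bytes) -> bool:
    model = pickle.loads(model_bytes)                  # does the pickle load?
    preds = model.predict(["print('hi')", "exec(p)"])  # does it predict?
    return len(preds) == 2


blob = pickle.dumps(StubModel())
assert smoke_test(blob)
```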
Intelligence Automation
A unique feature of Soteria is its automated data-to-model pipeline. This ensures the classifier stays ahead of new obfuscation techniques.
Automated Model Retraining
The watch_data.py background process monitors the /backend/data directory. When new samples (clean or corrupted) are added:
- Normalization: The `codeNormalizer` automatically strips variable names and constants.
- Deduplication: A SHA-256 hash is generated for the structural DNA. If the fingerprint already exists in the dataset, the sample is discarded to prevent training bias.
- Retraining: The system triggers the `trainerModel_Hybrid.py` script to update the ensemble model with the new structural patterns.
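The normalize-fingerprint-deduplicate flow above can be sketched as follows. The normalizer here is a deliberately simplified stand-in for `codeNormalizer`: it only renames every variable to `_` via `ast.NodeTransformer`, so two snippets that differ solely in identifier names collide on the same SHA-256 fingerprint.

```python
# Sketch: structural fingerprinting for deduplication (simplified).
import ast
import hashlib


class _Anonymize(ast.NodeTransformer):
    """Replace every variable name with "_" to erase identifier noise."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)


def structural_fingerprint(code: str) -> str:
    # Hash the dumped AST, not the raw text, so renames do not matter.
    tree = _Anonymize().visit(ast.parse(code))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


seen: set[str] = set()


def ingest(code: str) -> bool:
    """Return True if the sample is structurally new and was kept."""
    fp = structural_fingerprint(code)
    if fp in seen:
        return False  # duplicate structural DNA: discard to avoid bias
    seen.add(fp)
    return True
```

With this scheme, `x = y + 1` and `a = b + 1` produce identical fingerprints, so only the first copy of that structure enters the training set.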
Dataset Purity Automation
Soteria automates the ingestion of external data from sources like Hugging Face. The pipeline includes an automatic "Markdown Stripper" and "Syntax Validator" to ensure that only valid, parseable Python code enters the training matrix.
````python
# Automatic ingestion and cleaning logic in dataPipeline_AST.py
if raw_code.strip().startswith("```"):
    # Automated removal of markdown formatting before AST parsing
    lines = raw_code.strip().splitlines()[1:-1]
    raw_code = "\n".join(lines)
````
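The "Syntax Validator" half of the purity pipeline amounts to attempting a parse and rejecting anything that fails. A minimal sketch (the helper name is illustrative, not the actual function in `dataPipeline_AST.py`):

```python
# Sketch: only parseable Python enters the training matrix.
import ast


def is_valid_python(raw_code: str) -> bool:
    try:
        ast.parse(raw_code)
        return True
    except SyntaxError:
        return False
```

Samples that fail this check after markdown stripping are dropped rather than repaired, keeping the dataset free of half-extracted snippets.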
Environment Configuration
To configure the automation for your own fork, ensure the following environment variables are set in your CI/CD provider:
| Variable | Description |
| :--- | :--- |
| VITE_API_URL | The URL of your deployed Render backend. |
| SECRET_KEY | Used by Flask for JWT signing and session security. |
| RENDER_DEPLOY_HOOK | The webhook URL provided by Render to trigger builds. |
| DATABASE_URL | SQLite path or PostgreSQL connection string for user persistence. |
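A fork can fail fast when any variable from the table above is unset. This is a hedged sketch, not part of Soteria itself; the helper name is hypothetical.

```python
# Sketch: verify required CI/CD variables before deploying.
import os

REQUIRED = ["VITE_API_URL", "SECRET_KEY", "RENDER_DEPLOY_HOOK", "DATABASE_URL"]


def missing_vars(env=os.environ) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

Running this as the first CI step and aborting when the returned list is non-empty surfaces misconfiguration immediately, instead of mid-deploy.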