# Docker & Containerization
To ensure consistent performance of the machine learning models and the Intelligence Engine across different environments, Soteria utilizes Docker for backend containerization. This approach encapsulates the Python 3.13+ runtime, complex ML dependencies (PyTorch, Scikit-Learn), and the Flask middleware into a single, deployable unit.
## Intelligence Engine Containerization
The backend service, which handles AST analysis and vulnerability scoring, is designed to run as a containerized application. This is particularly important given the specific versions of data science libraries required for the Hybrid Stacking Model.
### Prerequisites
- Docker Desktop (or Docker Engine on Linux)
- Backend Environment Variables: a `.env` file containing your production secrets.
### Local Build and Execution
To replicate the production environment locally or prepare for a manual deployment, follow these steps:
- Create a Dockerfile (if not already present) in the root or `/backend` directory:

  ```dockerfile
  FROM python:3.13-slim

  # Set working directory
  WORKDIR /app

  # Install system dependencies for ML libraries
  RUN apt-get update && apt-get install -y \
      build-essential \
      && rm -rf /var/lib/apt/lists/*

  # Copy requirements first to leverage Docker cache
  COPY backend/requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt

  # Copy source code
  COPY . .

  # Expose the Flask port
  EXPOSE 5001

  # Start the Intelligence Engine
  CMD ["python", "middleware/app.py"]
  ```
- Build the Image:

  ```shell
  docker build -t soteria-backend .
  ```
- Run the Container:

  ```shell
  docker run -p 5001:5001 \
    -e SECRET_KEY="your_secure_key" \
    -e SQLALCHEMY_DATABASE_URI="sqlite:///kyber.db" \
    soteria-backend
  ```
## Environment Configuration
The containerized Intelligence Engine relies on specific environment variables to manage security and database connections. Ensure these are injected via the `docker run` command or your CI/CD pipeline:
| Variable | Description | Default/Example |
| :--- | :--- | :--- |
| `SECRET_KEY` | Used for signing JWT tokens. | `kyber-dev-secret-key` |
| `SQLALCHEMY_DATABASE_URI` | Database connection URI (SQLite by default). | `sqlite:///kyber.db` |
| `PORT` | The port the Flask app listens on. | `5001` |
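As an illustration, a Flask app can read these variables with the documented defaults. The `Config` class below is a hypothetical sketch — the attribute names follow Flask and Flask-SQLAlchemy conventions, not necessarily Soteria's actual code:

```python
import os


class Config:
    """Hypothetical configuration object sourced from the environment
    variables in the table above. Each setting falls back to the
    documented development default when the variable is unset."""

    SECRET_KEY = os.environ.get("SECRET_KEY", "kyber-dev-secret-key")
    SQLALCHEMY_DATABASE_URI = os.environ.get(
        "SQLALCHEMY_DATABASE_URI", "sqlite:///kyber.db"
    )
    PORT = int(os.environ.get("PORT", "5001"))
```

In a Flask app this would typically be wired up with `app.config.from_object(Config)`, so the same image behaves correctly in development and production depending only on what is injected at runtime.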
## Deployment to Render
Soteria is pre-configured for deployment on Render using the "Web Service" type with Docker.
- Link Repository: Connect your GitHub repository to Render.
- Select Environment: Choose Docker as the runtime.
- Specify Context: Ensure the Docker build context is set to the root of the project so the container can access both the `/middleware` and `/backend` directories.
- Model Persistence: Since the project uses serialized models (`acidModel.pkl`, `acidModel_neural.pt`), ensure these files are tracked in Git or downloaded into the container during the build process to enable immediate scanning capabilities.
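One way to catch a missing-model build error early is a startup check that verifies the serialized files are present before the service accepts scan requests. The paths and function name below are illustrative assumptions, not Soteria's actual code:

```python
import os

# Hypothetical locations of the serialized models inside the image;
# adjust to wherever the build actually places them.
MODEL_PATHS = ("backend/acidModel.pkl", "backend/acidModel_neural.pt")


def models_available(paths=MODEL_PATHS):
    """Return True only if every serialized model file exists."""
    return all(os.path.isfile(p) for p in paths)
```

Calling this during startup and failing fast when it returns `False` surfaces a broken build immediately, rather than at the first scan request.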
## Data Persistence & Volumes
The Intelligence Engine uses a local SQLite database (`kyber.db`) for user authentication and scan history.
- For Development: Standard container storage is sufficient.
- For Production: If you require persistent user data across container restarts on platforms like Render or AWS, you must mount a Persistent Disk/Volume to the directory containing `kyber.db`. Alternatively, update `SQLALCHEMY_DATABASE_URI` to point to a managed PostgreSQL instance.
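That fallback order can be sketched as a small resolver. The function name and the `/data` mount point below are assumptions for illustration only:

```python
import os


def database_uri(data_dir="/data"):
    """Resolve the database URI in order of preference: an explicitly
    injected URI, a SQLite file on a mounted persistent disk, then
    ephemeral container-local storage."""
    # An explicitly injected URI (e.g. a managed PostgreSQL DSN) wins.
    explicit = os.environ.get("SQLALCHEMY_DATABASE_URI")
    if explicit:
        return explicit
    # If a persistent disk is mounted (assumed here at /data), keep the
    # SQLite file there so it survives container restarts.
    if os.path.isdir(data_dir):
        return "sqlite:///" + os.path.join(data_dir, "kyber.db")
    # Fallback: container-local storage, lost when the container is replaced.
    return "sqlite:///kyber.db"
```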
## Automated Retraining (Watchdog)
The `watch_data.py` process can be run as a sidecar container or a background process within the main container. It monitors the `/backend/data` directory for new training samples. When new malicious patterns are identified, the containerized pipeline automatically triggers `trainerModel_Hybrid.py` to update the "Structural DNA" matrix and retrain the classifier.
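A minimal polling sketch of such a watch loop is shown below. This is an assumption about the mechanism — the real `watch_data.py` may use filesystem events instead of polling, and the helper names here are illustrative:

```python
import os
import subprocess
import time


def detect_new_samples(data_dir, seen):
    """Return (new files, updated snapshot) for data_dir."""
    current = set(os.listdir(data_dir))
    return current - seen, current


def watch(data_dir="backend/data", interval=30):
    """Poll data_dir and retrain whenever new training files appear."""
    seen = set(os.listdir(data_dir))
    while True:
        time.sleep(interval)
        new, seen = detect_new_samples(data_dir, seen)
        if new:
            # Retrain the Hybrid Stacking Model on the updated dataset.
            subprocess.run(
                ["python", "backend/trainerModel_Hybrid.py"], check=True
            )
```

Running this as a sidecar keeps retraining off the request path, so scans continue to be served by the last-good model while a new one is trained.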