OctopusNet is a modular neural network that learns without global backpropagation. Four independent modules process the same image at different resolutions, each trained locally with Hinton's Forward-Forward algorithm, and a central coordinator aggregates their outputs via attention. The result: 68.65% on CIFAR-10 with zero global gradients between modules — and a resilience floor of 67.03% when any single module fails.
The design is inspired by the octopus nervous system, where ~2/3 of neurons live in the arms and compute locally before sending signals to the brain. Each module here is an arm.
Undergraduate thesis: Erick Arriola Aguillón, 2026.
Centralized networks are fragile. When any component fails, the system collapses.
| Model | Normal accuracy | Single module fails | Two modules fail | Degradation (single failure) |
|---|---|---|---|---|
| CNN (backprop) | 90.96% | 10.00% (random chance) | — | −80.96 pts |
| OctopusNet (FF) | 52.50% | 41.72% | ~30% | −10.78 pts |
| OctopusNet + Channel Grouping (A18b) | 64.17% | 41.47% | 22.32% | −22.70 pts |
| OctopusNet + CG + Module Dropout (A6b) | 64.34% | 61.12% | 52.87% | −3.22 pts |
| OctopusNet + Stride Conv + ModDrop p=0.5 (A21) | 69.22% | 66.28% | 47.69% | −2.94 pts |
| OctopusNet + Stride Conv + ModDrop p=0.7 (A21b) | 68.65% | 67.03% | 56.03% | −1.62 pts |
FF standard had one catastrophic failure point — losing M1 dropped accuracy to 13.89%, near random chance. Channel grouping eliminates that. Stride conv + Module Dropout p=0.7 (A21b) goes further: every single-module failure stays above 67%, every double-module failure stays above 56%. The floor is structural, not lucky.
A21b improves on A6b across all three metrics simultaneously: +4.31 pts accuracy, +5.91 pts single-failure floor, +3.16 pts double-failure floor. No tradeoff.
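Module Dropout is, as I read it, ordinary dropout applied at module granularity: during training, entire module outputs are zeroed so the coordinator learns to cope with missing arms. A minimal sketch, assuming the `p` flag is a per-module drop probability (the function name and the keep-at-least-one guard are mine, not the repo's API):

```python
import torch

def module_dropout(features: list[torch.Tensor], p: float = 0.7,
                   training: bool = True) -> list[torch.Tensor]:
    """Zero out each module's output independently with probability p.

    Dropping whole modules during training forces the coordinator to
    tolerate missing arms, which is what lifts the failure floor.
    """
    if not training or p == 0.0:
        return features
    keep = torch.rand(len(features)) >= p
    if not keep.any():  # never drop every module in one batch
        keep[torch.randint(len(features), (1,))] = True
    return [f if k else torch.zeros_like(f) for f, k in zip(features, keep)]
```

At p=0.7 most modules are silenced on any given step, which matches the intuition that the high-p variant (A21b) trades a little clean accuracy for a much higher failure floor.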
This matters for robotics, IoT, autonomous vehicles, and embedded systems where a sensor can fail at any time.
OctopusNet is a neural network that learns without global backprop. Instead of one big network trained end-to-end, it uses N independent processing modules (any differentiable architecture) that each learn locally using Hinton's Forward-Forward algorithm. A central coordinator aggregates their outputs via attention. Current implementation uses CNNs with heterogeneous kernel sizes.
Inspired loosely by the octopus nervous system, where ~2/3 of neurons live in the arms and process information locally before sending signals to the brain.
Key features: multiscale input (each module sees a different resolution), Fourier label overlay (labels encoded as frequency patterns instead of pixel patches), and two training modes: standard backprop coordinator or fully local SFF.
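The Fourier label overlay can be pictured as adding a class-specific sinusoid to the image, so the label lives in the frequency domain instead of corner pixels. A hedged sketch, assuming each class maps to a distinct horizontal frequency (the actual encoding follows Codellaro et al. and may differ in detail):

```python
import torch

def fourier_label_overlay(images: torch.Tensor, labels: torch.Tensor,
                          amp: float = 0.2) -> torch.Tensor:
    """Overlay a class-specific 2D sinusoid on each image.

    Hypothetical encoding: class k gets horizontal frequency k + 1,
    superimposed at low amplitude on every channel.
    """
    b, c, h, w = images.shape
    x = torch.linspace(0, 2 * torch.pi, w)
    freqs = (labels.float() + 1.0).view(b, 1, 1, 1)   # (B,1,1,1)
    wave = torch.sin(freqs * x.view(1, 1, 1, w))      # (B,1,1,W)
    return images + amp * wave.expand(b, c, h, w)
```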
Each module learns to distinguish positive samples (image + correct label overlay) from negative samples (image + wrong label) using a local goodness score. No gradients flow between modules.
| Mode | Accuracy | Epochs | Notes |
|---|---|---|---|
| FF modules + backprop coordinator | 52.75% | 100 | Standard mode |
| FF modules + SFF local coordinator | 53.16% | 100 | 100% local learning |
| Simple ensemble average (SFF) | 53.59% | 100 | Best fully local result |
| Channel Grouping + coordinator (A18b) | 64.17% | 30 | Floor 41.47% |
| Channel Grouping + Module Dropout p=0.5 (A6b) | 64.34% | 30 | Floor 61.12% |
| CG + Stride Conv + ModDrop p=0.5 (A21) | 69.22% | 30 | Single-failure floor 66.28%, double-failure floor 47.69% |
| CG + Stride Conv + ModDrop p=0.7 (A21b) | 68.65% | 30 | Best overall: single-failure floor 67.03%, double-failure floor 56.03% |
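Channel grouping (A18b, after Ortiz Torres et al.) can be read as partitioning a layer's channels into one group per class and taking each group's goodness as that class's score, so classification needs no label overlay at read-out. A sketch under that assumption (shapes and the equal-split are mine):

```python
import torch

def group_goodness(h: torch.Tensor, num_classes: int = 10) -> torch.Tensor:
    """Per-class goodness via channel grouping (sketch of the A18b idea).

    h: (B, C, H, W) feature map with C divisible by num_classes.
    Channels split into num_classes groups; the mean squared activation
    of group k is read as evidence for class k.
    """
    b, c, hgt, wid = h.shape
    groups = h.view(b, num_classes, c // num_classes, hgt, wid)
    return groups.pow(2).mean(dim=(2, 3, 4))  # (B, num_classes)
```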
Each module specializes in different classes:
| Module | airplane | auto | bird | cat | deer | dog | frog | horse | ship | truck |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 54% | 48% | 47% | 37% | 52% | 58% | 60% | 55% | 51% | 44% |
| M2 | 52% | 65% | 46% | 38% | 54% | 51% | 54% | 55% | 64% | 56% |
| M3 | 53% | 55% | 50% | 41% | 57% | 55% | 61% | 58% | 55% | 50% |
| M4 | 53% | 58% | 47% | 39% | 53% | 53% | 57% | 55% | 60% | 62% |
Coordinator competition mechanisms compared (A10):

| Mechanism | Accuracy | Notes |
|---|---|---|
| Soft attention | 43.72% | Best for N=4 modules |
| Top-K (K=2) | 42.32% | Good for N>>4 |
| Gumbel-softmax | 39.33% | Hard selection, needs more modules |
| Top-K (K=1) | 38.09% | Too sparse for small N |
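Soft attention over modules can be sketched as a learned softmax-weighted sum of module features. A minimal version (the repo's Coordinator may score modules differently; the scorer here is a single linear layer):

```python
import torch
import torch.nn as nn

class SoftAttentionCoordinator(nn.Module):
    """Aggregate module outputs with learned soft attention (sketch).

    Each module emits a (B, D) feature; a small scorer assigns one
    weight per module and the output is the softmax-weighted sum.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(feats, dim=1)            # (B, N, D)
        scores = self.scorer(stacked).squeeze(-1)      # (B, N)
        weights = scores.softmax(dim=1).unsqueeze(-1)  # (B, N, 1)
        return (weights * stacked).sum(dim=1)          # (B, D)
```

A Top-K variant would keep only the K highest-scoring modules before the softmax, which is why hard selection needs more modules to work well: with N=4, zeroing two of four weights throws away too much signal.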
Best resilience configuration (A21b): stride conv compression + Module Dropout p=0.7. 68.65% accuracy, single-failure floor 67.03%, double-failure floor 56.03%.

```bash
python train.py --channel_grouping --module_dropout 0.7 --stride_compress --epochs 30
```

A6b configuration: 64.34% accuracy, single-failure floor 61.12%.

```bash
python train.py --channel_grouping --module_dropout 0.5 --epochs 30
```

Standard mode (FF modules + backprop coordinator):

```bash
python train.py --dataset cifar10 --epochs 50
```

Fully local SFF mode:

```bash
python train.py --use_sff --dataset cifar10 --epochs 50
```

In SFF mode, an AuxClassifier attaches to each module's feature map and a LogitCoordinator learns attention over their logits. No global backprop anywhere.
| Flag | Type / values | Description |
|---|---|---|
| `--dataset` | `cifar10` \| `cifar100` \| `mnist` | dataset (default: `cifar10`) |
| `--epochs` | int | default: 30 |
| `--batch_size` | int | default: 128 |
| `--bottleneck` | int | default: 64 |
| `--use_sff` | flag | 100% local SFF mode |
| `--no_channel_grouping` | flag | disable CGCNNModule (on by default) |
| `--no_stride_compress` | flag | disable stride conv, use pooling (A6b mode) |
| `--module_dropout` | float | module dropout probability (default: 0.7 = A21b) |
| `--no_multiscale` | flag | disable multiscale input |
| `--seed` | int | default: 42 |
| `--device` | `cuda` \| `cpu` | auto-detected |
```python
from config import OctopusNetConfig
from octopusnet import OctopusNet
from train import train

config = OctopusNetConfig(
    dataset="cifar10",
    epochs=50,
    device="cuda",
)

model, history = train(config)                # standard mode
model, history = train(config, use_sff=True)  # 100% local
```

Alternatively, upload OctopusNet_Colab.ipynb to Colab and run the cells. It includes all experiments, visualizations, and ablations.
| File | Description |
|---|---|
| `config.py` | Model hyperparameters |
| `modules.py` | CNN modules + ModuleDecoder |
| `nerve_ring.py` | Cross-attention lateral communication |
| `coordinator.py` | Coordinator + AuxClassifier + LogitCoordinator |
| `octopusnet.py` | Full model |
| `data.py` | Dataset loaders |
| `train.py` | Training loop (standard + SFF) |
| `experiments.py` | Ablation experiments |
| `OctopusNet_Colab.ipynb` | Interactive notebook |
| ID | What | Key Finding |
|---|---|---|
| A1 | Number of modules (2, 4, 8, 16) | 4 modules optimal |
| A2 | Bottleneck size (8–128) | 64 best accuracy/size tradeoff |
| A6 | Module resilience (FF) | Floor 41.72%, one catastrophic point at 13.89% |
| A7 | With/without feedback | Feedback adds ~0.5% |
| A8 | With/without nerve ring | Nerve ring adds ~1% |
| A9 | Homogeneous vs heterogeneous | Heterogeneous kernels help |
| A10 | GWT competition mechanism | Soft attention wins for N=4 |
| A15b | SFF local coordinator | 53.16% fully local; simple ensemble averaging reaches 53.59% |
| A18b | Channel grouping (Ortiz Torres) | 64.17%: eliminates catastrophic failures, floor 41.47% |
| A6b | Channel grouping + Module Dropout | 64.34%: floor jumps to 61.12% — +19.65 pts vs A18b, no accuracy cost |
| A17 | Iterative nerve ring (N rounds) | Rounds=1 optimal — more rounds homogenize representations, hurt accuracy |
| A19 | CGCNNModule + ResBlocks | 63.24%: FF doesn't scale in depth — pool destroys what ResBlocks build |
| A20 | Pool 4×4 → 6×6 | 61.49%: larger pool is worse — spatial pooling was not the bottleneck |
| A21 | Pool → stride conv (learned compression) | 69.22%: +4.88 pts over A6b, but double-failure floor 47.69% (M1+M2 co-specialized) |
| A21b | Stride conv + Module Dropout p=0.7 | 68.65%: best overall; single-failure floor 67.03%, double-failure floor 56.03%, no catastrophic pairs |
| A22 | goodness pre-pool vs post-pool | 62.96%: goodness location doesn't matter — not the bottleneck |
Forward-Forward
- Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations
- Krotov & Hopfield (2023). Training CNNs with the Forward-Forward Algorithm. arXiv:2312.14924
- Krutsylo (2025). Scalable Forward-Forward (SFF). arXiv:2501.03176: basis for SFF local mode
- Ortiz Torres et al. (2025). On Advancements of the Forward-Forward Algorithm. arXiv:2504.21662: 84.7% CIFAR-10, channel grouping technique
- ASGE (2025). Adaptive Spatial Goodness Encoding. arXiv:2509.12394
- SCFF (2025). Self-Contrastive Forward-Forward. Nature Communications: 98.70% MNIST, 80.75% CIFAR-10
- Codellaro et al. (2025). Training CNNs with Forward-Forward: Fourier spatial label encoding. Scientific Reports: basis for Fourier label overlay
Global Workspace & Coordination
- Goyal et al. (ICLR 2022). Coordination Among Neural Modules Through a Shared Global Workspace
- Baars, B. (1988). A Cognitive Theory of Consciousness: original GWT theory
Octopus Neuroscience
- Sumbre, G. et al.: Autonomous arm movements in octopus
- Gutnick, T. et al.: Information flow between brain and arms in octopus
- Hochner, B. (2012). An Embodied View of Octopus Neurobiology. Current Biology
If you're running experiments with this or building something on top of it, I'd genuinely like to know. Email me at erickarriolaaguillon123@gmail.com or open a Discussion.
If you use OctopusNet in your research:
```bibtex
@misc{octopusnet2026,
  author = {Arriola Aguill\'{o}n, Erick},
  title = {OctopusNet: Bio-inspired Distributed Neural Architecture},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/ErickUser1/OctopusNet}
}
```

License: MIT
