OctopusNet is a modular neural network that learns without global backpropagation. Four independent modules process the same image at different resolutions, each trained locally with Hinton's Forward-Forward algorithm, and a central coordinator aggregates their outputs via attention. The result: 68.65% on CIFAR-10 with zero global gradients between modules — and a resilience floor of 67.03% when any single module fails.
The design is inspired by the octopus nervous system, where ~2/3 of neurons live in the arms and compute locally before sending signals to the brain. Each module here is an arm.
Undergraduate thesis: Erick Arriola Aguillón, 2026.
Centralized networks are fragile. When any component fails, the system collapses.
| Model | Normal accuracy | Single module fails | Two modules fail | Degradation (single failure) |
|---|---|---|---|---|
| CNN (backprop) | 90.96% | 10.00% (random chance) | — | −80.96 pts |
| OctopusNet (FF) | 52.50% | 41.72% | ~30% | −10.78 pts |
| OctopusNet + Channel Grouping (A18b) | 64.17% | 41.47% | 22.32% | −22.70 pts |
| OctopusNet + CG + Module Dropout (A6b) | 64.34% | 61.12% | 52.87% | −3.22 pts |
| OctopusNet + Stride Conv + ModDrop p=0.5 (A21) | 69.22% | 66.28% | 47.69% | −2.94 pts |
| OctopusNet + Stride Conv + ModDrop p=0.7 (A21b) | 68.65% | 67.03% | 56.03% | −1.62 pts |
FF standard had one catastrophic failure point — losing M1 dropped accuracy to 13.89%, near random chance. Channel grouping eliminates that. Stride conv + Module Dropout p=0.7 (A21b) goes further: every single-module failure stays above 67%, every double-module failure stays above 56%. The floor is structural, not lucky.
A21b improves on A6b across all three metrics simultaneously: +4.31 pts accuracy, +5.91 pts single-failure floor, +3.16 pts double-failure floor. No tradeoff.
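Module Dropout is, as I read it, ordinary dropout applied at module granularity: during training, entire module outputs are zeroed so the coordinator learns to cope with missing arms. A minimal sketch, assuming the `p` flag is a per-module drop probability (the function name and the keep-at-least-one guard are mine, not the repo's API):

```python
import torch

def module_dropout(features: list[torch.Tensor], p: float = 0.7,
                   training: bool = True) -> list[torch.Tensor]:
    """Zero out each module's output independently with probability p.

    Dropping whole modules during training forces the coordinator to
    tolerate missing arms, which is what lifts the failure floor.
    """
    if not training or p == 0.0:
        return features
    keep = torch.rand(len(features)) >= p
    if not keep.any():  # never drop every module in one batch
        keep[torch.randint(len(features), (1,))] = True
    return [f if k else torch.zeros_like(f) for f, k in zip(features, keep)]
```

At p=0.7 most modules are silenced on any given step, which matches the intuition that the high-p variant (A21b) trades a little clean accuracy for a much higher failure floor.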
This matters for robotics, IoT, autonomous vehicles, and embedded systems where a sensor can fail at any time.
OctopusNet is a neural network that learns without global backprop. Instead of one big network trained end-to-end, it uses N independent processing modules (any differentiable architecture) that each learn locally using Hinton's Forward-Forward algorithm. A central coordinator aggregates their outputs via attention. Current implementation uses CNNs with heterogeneous kernel sizes.
Inspired loosely by the octopus nervous system, where ~2/3 of neurons live in the arms and process information locally before sending signals to the brain.
Key features: multiscale input (each module sees a different resolution), Fourier label overlay (labels encoded as frequency patterns instead of pixel patches), and two training modes: standard backprop coordinator or fully local SFF.
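The Fourier label overlay can be pictured as adding a class-specific sinusoid to the image, so the label lives in the frequency domain instead of corner pixels. A hedged sketch, assuming each class maps to a distinct horizontal frequency (the actual encoding follows Codellaro et al. and may differ in detail):

```python
import torch

def fourier_label_overlay(images: torch.Tensor, labels: torch.Tensor,
                          amp: float = 0.2) -> torch.Tensor:
    """Overlay a class-specific 2D sinusoid on each image.

    Hypothetical encoding: class k gets horizontal frequency k + 1,
    superimposed at low amplitude on every channel.
    """
    b, c, h, w = images.shape
    x = torch.linspace(0, 2 * torch.pi, w)
    freqs = (labels.float() + 1.0).view(b, 1, 1, 1)   # (B,1,1,1)
    wave = torch.sin(freqs * x.view(1, 1, 1, w))      # (B,1,1,W)
    return images + amp * wave.expand(b, c, h, w)
```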
Each module learns to distinguish positive samples (image + correct label overlay) from negative samples (image + wrong label) using a local goodness score. No gradients flow between modules.
| Mode | Accuracy | Epochs | Notes |
|---|---|---|---|
| FF modules + backprop coordinator | 52.75% | 100 | Standard mode |
| FF modules + SFF local coordinator | 53.16% | 100 | 100% local learning |
| Simple ensemble average (SFF) | 53.59% | 100 | Best fully local result |
| Channel Grouping + coordinator (A18b) | 64.17% | 30 | Floor 41.47% |
| Channel Grouping + Module Dropout p=0.5 (A6b) | 64.34% | 30 | Floor 61.12% |
| CG + Stride Conv + ModDrop p=0.5 (A21) | 69.22% | 30 | Single-failure floor 66.28%, double-failure floor 47.69% |
| CG + Stride Conv + ModDrop p=0.7 (A21b) | 68.65% | 30 | Best overall: single-failure floor 67.03%, double-failure floor 56.03% |
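Channel grouping (A18b, after Ortiz Torres et al.) can be read as partitioning a layer's channels into one group per class and taking each group's goodness as that class's score, so classification needs no label overlay at read-out. A sketch under that assumption (shapes and the equal-split are mine):

```python
import torch

def group_goodness(h: torch.Tensor, num_classes: int = 10) -> torch.Tensor:
    """Per-class goodness via channel grouping (sketch of the A18b idea).

    h: (B, C, H, W) feature map with C divisible by num_classes.
    Channels split into num_classes groups; the mean squared activation
    of group k is read as evidence for class k.
    """
    b, c, hgt, wid = h.shape
    groups = h.view(b, num_classes, c // num_classes, hgt, wid)
    return groups.pow(2).mean(dim=(2, 3, 4))  # (B, num_classes)
```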
Each module specializes in different classes:
| Module | airplane | auto | bird | cat | deer | dog | frog | horse | ship | truck |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 54% | 48% | 47% | 37% | 52% | 58% | 60% | 55% | 51% | 44% |
| M2 | 52% | 65% | 46% | 38% | 54% | 51% | 54% | 55% | 64% | 56% |
| M3 | 53% | 55% | 50% | 41% | 57% | 55% | 61% | 58% | 55% | 50% |
| M4 | 53% | 58% | 47% | 39% | 53% | 53% | 57% | 55% | 60% | 62% |
Coordinator competition mechanisms compared (A10):

| Mechanism | Accuracy | Notes |
|---|---|---|
| Soft attention | 43.72% | Best for N=4 modules |
| Top-K (K=2) | 42.32% | Good for N>>4 |
| Gumbel-softmax | 39.33% | Hard selection, needs more modules |
| Top-K (K=1) | 38.09% | Too sparse for small N |
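Soft attention over modules can be sketched as a learned softmax-weighted sum of module features. A minimal version (the repo's Coordinator may score modules differently; the scorer here is a single linear layer):

```python
import torch
import torch.nn as nn

class SoftAttentionCoordinator(nn.Module):
    """Aggregate module outputs with learned soft attention (sketch).

    Each module emits a (B, D) feature; a small scorer assigns one
    weight per module and the output is the softmax-weighted sum.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(feats, dim=1)            # (B, N, D)
        scores = self.scorer(stacked).squeeze(-1)      # (B, N)
        weights = scores.softmax(dim=1).unsqueeze(-1)  # (B, N, 1)
        return (weights * stacked).sum(dim=1)          # (B, D)
```

A Top-K variant would keep only the K highest-scoring modules before the softmax, which is why hard selection needs more modules to work well: with N=4, zeroing two of four weights throws away too much signal.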
Best resilience configuration (A21b): stride conv compression + Module Dropout p=0.7. 68.65% accuracy, single-failure floor 67.03%, double-failure floor 56.03%.

```bash
python train.py --channel_grouping --module_dropout 0.7 --stride_compress --epochs 30
```

A6b configuration: 64.34% accuracy, single-failure floor 61.12%.

```bash
python train.py --channel_grouping --module_dropout 0.5 --epochs 30
```

Standard mode (FF modules + backprop coordinator):

```bash
python train.py --dataset cifar10 --epochs 50
```

Fully local SFF mode:

```bash
python train.py --use_sff --dataset cifar10 --epochs 50
```

In SFF mode, an AuxClassifier attaches to each module's feature map and a LogitCoordinator learns attention over their logits. No global backprop anywhere.
| Flag | Type / values | Description |
|---|---|---|
| `--dataset` | `cifar10` \| `cifar100` \| `mnist` | dataset (default: `cifar10`) |
| `--epochs` | int | default: 30 |
| `--batch_size` | int | default: 128 |
| `--bottleneck` | int | default: 64 |
| `--use_sff` | flag | 100% local SFF mode |
| `--no_channel_grouping` | flag | disable CGCNNModule (on by default) |
| `--no_stride_compress` | flag | disable stride conv, use pooling (A6b mode) |
| `--module_dropout` | float | module dropout probability (default: 0.7 = A21b) |
| `--no_multiscale` | flag | disable multiscale input |
| `--seed` | int | default: 42 |
| `--device` | `cuda` \| `cpu` | auto-detected |
```python
from config import OctopusNetConfig
from octopusnet import OctopusNet
from train import train

config = OctopusNetConfig(
    dataset="cifar10",
    epochs=50,
    device="cuda",
)

model, history = train(config)                # standard mode
model, history = train(config, use_sff=True)  # 100% local
```

Alternatively, upload OctopusNet_Colab.ipynb to Colab and run the cells. It includes all experiments, visualizations, and ablations.
| File | Description |
|---|---|
| `config.py` | Model hyperparameters |
| `modules.py` | CNN modules + ModuleDecoder |
| `nerve_ring.py` | Cross-attention lateral communication |
| `coordinator.py` | Coordinator + AuxClassifier + LogitCoordinator |
| `octopusnet.py` | Full model |
| `data.py` | Dataset loaders |
| `train.py` | Training loop (standard + SFF) |
| `experiments.py` | Ablation experiments |
| `OctopusNet_Colab.ipynb` | Interactive notebook |
| ID | What | Key Finding |
|---|---|---|
| A1 | Number of modules (2, 4, 8, 16) | 4 modules optimal |
| A2 | Bottleneck size (8–128) | 64 best accuracy/size tradeoff |
| A6 | Module resilience (FF) | Floor 41.72%, one catastrophic point at 13.89% |
| A7 | With/without feedback | Feedback adds ~0.5% |
| A8 | With/without nerve ring | Nerve ring adds ~1% |
| A9 | Homogeneous vs heterogeneous | Heterogeneous kernels help |
| A10 | GWT competition mechanism | Soft attention wins for N=4 |
| A15b | SFF local coordinator | 53.16% fully local; simple ensemble averaging reaches 53.59% |
| A18b | Channel grouping (Ortiz Torres) | 64.17%: eliminates catastrophic failures, floor 41.47% |
| A6b | Channel grouping + Module Dropout | 64.34%: floor jumps to 61.12% — +19.65 pts vs A18b, no accuracy cost |
| A17 | Iterative nerve ring (N rounds) | Rounds=1 optimal — more rounds homogenize representations, hurt accuracy |
| A19 | CGCNNModule + ResBlocks | 63.24%: FF doesn't scale in depth — pool destroys what ResBlocks build |
| A20 | Pool 4×4 → 6×6 | 61.49%: larger pool is worse — spatial pooling was not the bottleneck |
| A21 | Pool → stride conv (learned compression) | 69.22%: +4.88 pts over A6b, but double-failure floor 47.69% (M1+M2 co-specialized) |
| A21b | Stride conv + Module Dropout p=0.7 | 68.65%: best overall; single-failure floor 67.03%, double-failure floor 56.03%, no catastrophic pairs |
| A22 | goodness pre-pool vs post-pool | 62.96%: goodness location doesn't matter — not the bottleneck |
Forward-Forward
- Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations
- Krotov & Hopfield (2023). Training CNNs with the Forward-Forward Algorithm. arXiv:2312.14924
- Krutsylo (2025). Scalable Forward-Forward (SFF). arXiv:2501.03176: basis for SFF local mode
- Ortiz Torres et al. (2025). On Advancements of the Forward-Forward Algorithm. arXiv:2504.21662: 84.7% CIFAR-10, channel grouping technique
- ASGE (2025). Adaptive Spatial Goodness Encoding. arXiv:2509.12394
- SCFF (2025). Self-Contrastive Forward-Forward. Nature Communications: 98.70% MNIST, 80.75% CIFAR-10
- Codellaro et al. (2025). Training CNNs with Forward-Forward: Fourier spatial label encoding. Scientific Reports: basis for Fourier label overlay
Global Workspace & Coordination
- Goyal et al. (ICLR 2022). Coordination Among Neural Modules Through a Shared Global Workspace
- Baars, B. (1988). A Cognitive Theory of Consciousness: original GWT theory
Octopus Neuroscience
- Sumbre, G. et al.: Autonomous arm movements in octopus
- Gutnick, T. et al.: Information flow between brain and arms in octopus
- Hochner, B. (2012). An Embodied View of Octopus Neurobiology. Current Biology
If you're running experiments with this or building something on top of it, I'd genuinely like to know. Email me at erickarriolaaguillon123@gmail.com or open a Discussion.
If you use OctopusNet in your research:
```bibtex
@misc{octopusnet2026,
  author = {Arriola Aguill\'{o}n, Erick},
  title = {OctopusNet: Bio-inspired Distributed Neural Architecture},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/ErickUser1/OctopusNet}
}
```

License: MIT
