Updates to increase the flexibility of the influence calculator #4
Closed
alexanderbates wants to merge 1 commit into
These updates attempt to solve a suite of medium-level issues:

- Enabling the calculator to take more diverse input, not just .sqlite files
- Providing a bundled function to "adjust" an influence score as we describe in our paper
- Surfacing lambda as a value the user can alter, which turns out to be necessary to analyse e.g. the *C. elegans* connectome
- Not hard-coding NT assignment; we should leave this up to the user
- Corresponding documentation for these changes, including in the README
- Packaging *C. elegans* data so users can quickly get started running examples with real connectome data

Here is the detail on these changes:

# ConnectomeInfluenceCalculator — Update Notes

A summary of the conceptual changes made on the working tree (no commits yet). Each entry gives the change, the reason for it, and the files touched.

---

## 1. DataFrame / CSV / Parquet / Feather / NumPy constructors

**Change.** The library now accepts pandas DataFrames and several common on-disk formats in addition to the original SQLite path:

- `InfluenceCalculator.from_dataframes(edgelist_df, meta_df=None, ...)`
- `InfluenceCalculator.from_csv(edgelist_path, meta_path=None, ...)`
- `InfluenceCalculator.from_parquet(...)`, `from_feather(...)`
- `InfluenceCalculator.from_numpy(adjacency_matrix, neuron_ids=None, ...)`

The old `InfluenceCalculator(filename, ...)` SQLite path still works.

**Why.** The original API forced every caller to package their connectome into a SQLite file with a specific schema, which is awkward for ad-hoc exploration, for users coming from R / pandas pipelines, and for the worked example in this repo. The DataFrame constructor accepts the same columns as the SQLite schema (`pre`, `post`, `count`, optional `norm`) plus a metadata frame with `root_id` and (when relevant) `top_nt`.
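For illustration, the column contract can be sketched in plain Python, with dicts standing in for the DataFrames the constructors accept. `prepare_edgelist` is a hypothetical helper for this sketch, not the library's API; `norm` is derived as the per-postsynaptic fraction `count / sum_per_post`:

```python
# Sketch of the edgelist contract: required columns, plus derivation of a
# 'norm' fraction when it is absent. Hypothetical helper, not library code.
REQUIRED = {"pre", "post", "count"}

def prepare_edgelist(rows):
    """rows: list of dicts with 'pre', 'post', 'count' (optionally 'norm')."""
    missing = REQUIRED - set(rows[0])
    if missing:
        raise ValueError(f"edgelist is missing columns: {sorted(missing)}")
    if all("norm" in r for r in rows):
        return rows
    # norm = count / total synapse count onto the postsynaptic neuron
    totals = {}
    for r in rows:
        totals[r["post"]] = totals.get(r["post"], 0) + r["count"]
    return [{**r, "norm": r["count"] / totals[r["post"]]} for r in rows]

edges = prepare_edgelist([
    {"pre": "AVAL", "post": "AVDL", "count": 3},
    {"pre": "AVAR", "post": "AVDL", "count": 1},
])
```

A malformed input (say, a missing `count` column) raises a `ValueError` naming the absent columns, mirroring the fail-fast validation described above.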
**Files.** `InfluenceCalculator/InfluenceCalculator.py` — added `from_*` classmethods plus two module-level helpers, `_validate_meta` and `_validate_and_prepare_edgelist`, that enforce the column requirements with descriptive error messages.

---

## 2. Bundled C. elegans dataset

**Change.** A small *C. elegans* connectome (300 neurons, 3,539 chemical edges, 20,672 synapses) ships with the package and is exposed as:

```python
from InfluenceCalculator.data import celegans_edgelist, celegans_meta

edges = celegans_edgelist()  # pre, post, count, norm
meta = celegans_meta()       # root_id, top_nt, super_class, neuron_class, body_part
```

**Why.** Tests and examples need a real connectome they can build a calculator from without external downloads. The previous toy `tests/toy_network_example.sqlite` was opaque and not documented; the *C. elegans* graph is small, public, and well-annotated, which lets the worked example double as a tutorial against a famous connectome.

**Files.** `InfluenceCalculator/data/__init__.py` (new — uses `importlib.resources.files()` so it works whether installed or run in-tree); `InfluenceCalculator/data/celegans_edgelist.csv` and `InfluenceCalculator/data/celegans_meta.csv` (new); `pyproject.toml` gains a `[tool.setuptools.package-data]` entry so the CSVs are included in the wheel.

> **Provenance.** The data was taken from the
> [OpenWorm project](https://openworm.org/) distribution of the
> *C. elegans* hermaphrodite chemical connectome (accessed
> February 2026), which aggregates the original electron-microscopy
> reconstructions of White et al. 1986 and Cook et al. 2019. The
> README "Data source" section carries the full citation block;
> downstream users redistributing the bundled CSVs should cite both
> primary sources and the OpenWorm aggregation.

---

## 3. Module-level `adjust_influence`

**Change.** A new function `adjust_influence(df, const=24, signif=6)` is exported alongside `InfluenceCalculator`. It takes the DataFrame returned by `calculate_influence`, groups by `(target, seed)`, sums within each group, and returns three columns:

- `adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)`
- `adjusted_influence_norm_by_targets`
- `adjusted_influence_norm_by_sources_and_targets`

**Why.** Raw influence scores span many orders of magnitude — the strongest direct paths can be ten billion times larger than the weakest distal trickle — so a linear-scaled heatmap shows the top of the distribution and nothing else. `adjust_influence` does three things to make the output legible:

- **`log(...)`** compresses the dynamic range so weak and strong paths are visible side by side.
- **`max(|x|, exp(-const))`** is a junk-node floor: anything weaker than `exp(-const)` is treated as "essentially zero", which keeps `log(0)` from producing `-inf` and stops the colormap getting hijacked by numerical noise.
- **`+ const`** then shifts everything so the smallest meaningful score sits at exactly 0 and a colormap can be anchored there without losing sign information.

Mirrors `adjust_influence` in the R sibling package [`natverse/influencer`](https://github.com/natverse/influencer).

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (function defined at module scope); `InfluenceCalculator/__init__.py` (exported in `__all__`); `tests/test_influence_calculator.py` (`test_adjust_influence_basics`, `test_adjust_influence_threshold_floor`, `test_adjust_influence_preserves_sign`).

---

## 4. Sign-preserving signed mode

**Change.** When `signed=True` is set, the influence DataFrame now returns the real part of the steady-state vector rather than its magnitude — so a target dominated by inhibitory paths gets a negative `Influence_score_(signed)`. `adjust_influence` propagates the sign through the log transform.
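The scalar transform can be sketched directly. This is a minimal stand-in: the packaged `adjust_influence` additionally performs the `(target, seed)` grouping, the two normalised output columns, and the `signif` handling, all omitted here; `adjusted` is an illustrative name:

```python
import math

def adjusted(x, const=24):
    """Sign-preserving log compression: magnitudes weaker than exp(-const)
    are floored (treated as essentially zero), and adding const shifts
    the floor itself to exactly 0."""
    if x == 0:
        return 0.0
    sign = math.copysign(1.0, x)
    return sign * (math.log(max(abs(x), math.exp(-const))) + const)

# A strong direct path and a sub-floor distal trickle both stay finite
# and plottable on the same colormap axis.
strong = adjusted(1.0)     # log(1) + 24 = 24
trickle = adjusted(1e-30)  # below the exp(-24) floor, maps to ~0
```

Note how the sign survives the transform: an inhibition-dominated score of `-1.0` maps to `-24`, symmetric with `+1.0` mapping to `+24`, which is exactly the property section 4 relies on.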
**Why.** Without sign preservation, "signed" mode silently degenerated to unsigned: every output was a non-negative magnitude, and the only remaining effect of GABA-pre-neuron negation was to reduce magnitudes where positive and negative paths cancelled. That cancellation is real but undetectable from the output, which makes the signed flag indistinguishable from a perturbed unsigned run.

While auditing this we also found a **pre-existing bug** in `_create_sparse_W`. To make `signed=True` produce negative weights for inhibitory pre-neurons, the original code multiplied the relevant rows of the edgelist's `count` column by `-1`. But the matrix `W` is populated from the `norm` column (the fraction `count / sum_per_post`), not from `count`. Flipping the sign of `count` left `norm` positive, so every entry of `W` ended up positive regardless of how `signed=` was set — the signed setting silently built the same matrix as the unsigned setting. The fix multiplies whichever column is actually used to build `W` (held in the `syn_weight_measure` variable, which defaults to `norm`), so the negation now reaches the matrix.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` — `_build_influence_dataframe` now branches on `self.W_signed`; `_create_sparse_W` negates `syn_weight_measure` instead of `'count'`.

---

## 5. Externalised neurotransmitter assignment (`inhibitory_nts`, `excluded_nts`)

**Change.** The old hardcoded `NEG_NEUROTRANSMITTERS = {'glutamate', 'gaba', 'serotonin', 'octopamine'}` constant has been removed. Two keyword arguments now pass through the constructor and every classmethod:

- `inhibitory_nts={'gaba', ...}` — `top_nt` values whose pre-neurons receive a negative sign in `signed=True` mode.
- `excluded_nts={'dopamine', 'serotonin', ...}` — `top_nt` values whose pre-neurons contribute zero outgoing weight (their columns of `W` are empty), in either signed or unsigned mode.

**Why.**

1. *Species-dependence.* Glutamate is net excitatory in mammals and most *Drosophila* circuits but is dual-action in *C. elegans* (GluCl chloride channels make it inhibitory at e.g. AWC→AIY). Hardcoding the sign in the library forces every user into the same biological assumption.
2. *Receptor-mix uncertainty.* Modulators like dopamine, serotonin, and octopamine can be excitatory or inhibitory at a given target depending on the receptor mix. `excluded_nts` lets users silence pre-neurons whose net effect cannot be assigned a single sign, rather than forcing a wrong one.
3. *Library hygiene.* Per a direct user instruction: "the library should not pre-empt how the user wishes to assign transmitters." Sets are still demonstrated in the README and worked example as defaults users can copy-paste.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (constants removed; new kwargs threaded through `__init__` and all `from_*` classmethods; validation raises `ValueError` if `signed=True` is set without `inhibitory_nts` or if `excluded_nts` is used without a `meta_df` containing `top_nt`); `tests/test_influence_calculator.py` gains `test_excluded_nts_removes_edges` and `test_excluded_nts_requires_top_nt`.

---

## 6. Exposed `lambda_max` as a documented parameter

**Change.** The target largest real eigenvalue of the rescaled connectivity matrix W̃, previously hardcoded to `0.99` inside `_normalize_W`, is now a constructor argument:

```python
ic = InfluenceCalculator.from_dataframes(edges, meta, lambda_max=0.5)
```

Default remains `0.99` for backwards compatibility. `_normalize_W` now *always* rescales to `lambda_max` exactly (rather than only capping when the natural eigenvalue exceeds it), so the parameter is a true control knob over leading-mode amplification rather than a stability ceiling.

**Why.** Think of `lambda_max` as a **reverb knob** on the network. Near 1, a signal injected at the seed echoes around the graph many times before fading — the gain along the dominant recurrent loop is `1/(1-lambda_max)`, so `100×` at `0.99` versus `2×` at `0.5`. Crank it to the max and the dominant loop drowns out finer differences between targets: every column of the heatmap ends up with nearly the same shape. Turn it down and the signal mostly travels along short paths, exposing per-target specificity at the cost of attenuating long polysynaptic effects.

Which value is "right" depends on the connectome. The default `0.99` is calibrated for a whole-CNS *Drosophila* graph (BANC-scale, ~130k neurons), where you want maximum sensitivity to weak distal influence. On a small graph like the *C. elegans* connectome the same setting puts the leading mode in charge of the entire heatmap; `0.5` is a more useful starting point. The point of exposing the parameter is that this is a knob users should be turning, not a hidden constant.

**Trade-off shown by the worked example sweep** (28 canonical sensory→interneuron pairs from the *C. elegans* literature, ranked within each seed's column; mean column std as a leading-mode dominance proxy):

| `lambda_max` | canonical mean rank-frac | mean col std (info) |
|--------------|--------------------------|---------------------|
| 0.10 | 0.931 | 0.160 |
| 0.30 | 0.929 | 0.162 |
| 0.50 | 0.932 | 0.149 |
| 0.70 | 0.933 | 0.131 |
| 0.90 | 0.937 | 0.105 |
| 0.99 | 0.921 | **0.042** |

Canonical-pair scores barely move (the strongest direct paths win at any λ), but column differentiation collapses by ~4× between λ=0.90 and λ=0.99 — the leading-mode dominance signature. The worked example defaults to `lambda_max=0.5` as the balance: canonical hits intact, columns clearly differentiated, some polysynaptic integration retained.
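The always-rescale behaviour can be sketched with stdlib power iteration on a toy matrix (a stand-in for a proper sparse eigensolver; `leading_eigenvalue` and `rescale` are illustrative names, not library functions):

```python
import math

def leading_eigenvalue(W, iters=200):
    """Power-iteration estimate of the leading eigenvalue of a small
    nonnegative matrix (toy stand-in for a sparse eigensolver)."""
    n = len(W)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # norm of W v for unit v
        v = [x / lam for x in w]
    return lam

def rescale(W, lambda_max=0.99):
    """W~ = (lambda_max / leading_eigenvalue(W)) * W: always rescales to
    the target exactly, as described above (sketch, not library code)."""
    scale = lambda_max / leading_eigenvalue(W)
    return [[scale * x for x in row] for row in W]

W = [[2.0, 1.0], [1.0, 2.0]]          # leading eigenvalue is 3
W_tilde = rescale(W, lambda_max=0.5)  # leading eigenvalue becomes 0.5
```

The reverb-knob gain quoted above is the geometric series over loop traversals, Σ λ^k = 1/(1−λ): 100× at λ = 0.99 versus 2× at λ = 0.5.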
**Files.** `InfluenceCalculator/InfluenceCalculator.py` (parameter added and validated on every constructor; `_normalize_W` reads `self.lambda_max`); `examples/celegans_worked_example.py` (defines `LAMBDA_MAX = 0.5` and surfaces it in the heatmap title).

---

## 7. Worked example — `examples/celegans_worked_example.py`

**Change.** A self-contained script that loads the bundled *C. elegans* graph, computes per-seed influence from every sensory neuron (83 seeds, summed into 46 cell classes after collapsing bilateral pairs) onto every non-sensory target (187 → 136 cell classes), log-adjusts the per-(target_class, seed_class) raw scores via `adjust_influence`, and renders two heatmaps in `docs/images/`:

- `influence_heatmap_unsigned.png` (sequential greyscale, [0, max])
- `influence_heatmap_signed.png` (diverging blue→white→red, [−bound, +bound])

The seed and target axes are grouped by `body_part` (body / head / tail; the pharyngeal nervous system is excluded as it is essentially isolated from the rest of the graph) and clustered within each group by average-linkage hierarchical clustering. The matrix is transposed so seed classes index the rows.

Bilateral pairs are summed into cell classes (`AVAL/AVAR → AVA`, `AVDL/AVDR → AVD`, `IL2DL/IL2DR → IL2D`) on both axes via a regex that strips the trailing L/R only when it follows a capital letter (deliberately *not* including `DL|DR|VL|VR` as alternatives — Python's leftmost-first alternation would otherwise turn `AVDL` into `AV` instead of `AVD`).

The matrix shows the **raw** `adjusted_influence` values directly with no per-row min-max rescaling; with `lambda_max = 0.5` the leading mode is damped enough that per-target seed specificity is already legible, so a min-max normalisation step is unnecessary.

**Why.** A connectome library without a worked example is hard to evaluate. The *C. elegans* example is small enough to run in seconds, recognised enough that the resulting heatmap can be eyeballed against the literature (sensory → command-interneuron paths, body-touch → ventral cord motor blocks, phasmid → AVA/AVD), and structured enough to demonstrate the full library API end-to-end.

**`const` auto-calibration.** Rather than hardcoding `const=24`, the example computes `const = -log(min_nonzero |raw|)` over the per-row influence scores so the smallest non-zero magnitude maps exactly to 0 after the log transform, eliminating an arbitrary floor and adapting cleanly to different `lambda_max` choices.

**Files.** `examples/celegans_worked_example.py` (new); `docs/images/influence_heatmap_unsigned.png` and `docs/images/influence_heatmap_signed.png` (regenerated each run).

---

## 8. Tests overhaul

**Change.** `tests/test_influence_calculator.py` is rewritten as discrete pytest functions, fed by `tests/conftest.py` fixtures that use `importlib.resources.as_file()` to expose the bundled CSVs as filesystem paths. New tests:

- `test_format_equivalence` — `from_dataframes`, `from_csv`, and `from_numpy` agree on neuron count, matrix size, and ID universe.
- `test_adjust_influence_basics` / `_threshold_floor` / `_preserves_sign` — covers the new module-level transform.
- `test_input_validation_missing_columns` / `_signed_no_top_nt`.
- `test_excluded_nts_removes_edges` / `_requires_top_nt`.
- `test_norm_auto_computation` — `'norm'` is computed when absent.
- `test_round_trip_smoke` — full pipeline from CSV → `calculate_influence` → `adjust_influence`, gated on PETSc/SLEPc availability.

PETSc/SLEPc-dependent tests use `pytest.importorskip` so the suite is runnable in environments without those libraries.

**Files.** `tests/conftest.py` (new); `tests/test_influence_calculator.py` (rewritten); `pyproject.toml` (`[tool.pytest.ini_options]` plus a `test` extra).

---

## 9. `pyproject.toml` modernisation

**Change.** Bumped `setuptools >= 77` (so the SPDX-string `license = "BSD-3-Clause"` syntax from PEP 639 is accepted), set `requires-python >= 3.10`, version `0.2.0`, declared optional extras (`parquet`, `test`, `examples`, `dev`), added a `package-data` block for the bundled CSVs, and a `[tool.pytest.ini_options]` section pointing to `tests/`.

**Why.** The pre-existing setup could not install on a machine with newer setuptools, which warns about the dual-purpose `license` field; the new form is the documented PEP 639 spelling. The optional extras mean a CI image can install just what it needs (`pip install .[test]`) rather than every parquet dependency.

**Files.** `pyproject.toml`.

---

## 10. README — restructured around the new knobs

**Change.** The README is reorganised so the three things a user actually tunes — `inhibitory_nts` / `excluded_nts`, `lambda_max`, and `const` for `adjust_influence` — each have one canonical home:

- The **Description** section now derives `W̃ = (λ / λ_max(W)) · W` with `λ` as a tuneable target (the `lambda_max` argument), and retains the explicit gloss *"where λ_max(W) is the largest real eigenvalue of W, and λ is the desired largest real eigenvalue of W̃"* matching the original phrasing. It carries both the technical explanation (gain = `1/(1-lambda_max)` along the leading recurrent mode) and a short "reverb knob" metaphor: high `lambda_max` makes the network echo signals through long indirect paths, low `lambda_max` keeps the signal local. Includes inline species guidance — `0.99` *(seems appropriate for the whole-CNS Drosophila BANC connectome and larger graphs)*, near `0.5` *(more appropriate for the C. elegans connectome, where the graph is small enough that the leading mode otherwise washes the heatmap out)*.
- The **"How W is filled"** sentence in the Description was rewritten to make the input normalisation explicit. The original said the matrix is "filled with the number of synaptic connections that a presynaptic neuron projects onto a postsynaptic neuron", which described `syn_weight_measure='count'` rather than the actual default (`'norm'`). The new wording makes clear that each entry is *the fraction of a postsynaptic neuron's total drive that comes from a given upstream partner* and explains the biological rationale (per-edge weights need to be comparable across neurons that vary widely in size and total input count).
- A new **"`adjust_influence`: log-compression and grouping"** section explains why the function exists (raw scores span ten orders of magnitude), the `const` floor as a junk-node cutoff, and the difference between the three output columns (`adjusted_influence` vs the two normalised variants) — borrowing framing from the R sibling package's documentation. Includes the `adjusted_influence_vs_traversal.jpg` figure with an expanded caption that defines the x-axis (graph-traversal depth = mean number of synaptic hops in shortest-path BFS) and reads off the intuition: each polysynaptic step costs ≈ 1.3 units of `adjusted_influence`, so the score maps directly onto effective polysynaptic distance.
- A new **"Worked example: *C. elegans* connectome"** section embeds the two regenerated heatmaps and shows a minimum-viable end-to-end snippet. A short knobs table cross-references the Description and `adjust_influence` sections rather than re-explaining each parameter. Detailed biology (the *Drosophila* vs *C. elegans* NT-set comparison, the cholinergic-fraction callout that explains the wide blank band on the signed heatmap's seed axis) lives as comments in `examples/celegans_worked_example.py` rather than in the README.
- A short **"Data source"** subsection attributes the bundled CSVs to the OpenWorm project distribution (accessed February 2026) with prose citations to White et al. 1986 and Cook et al. 2019. Full BibTeX lives in the docstring of `InfluenceCalculator/data/__init__.py` (so `help(.data)` surfaces it) rather than cluttering the README.
- The **BANC Dataset** section now lists, alongside the existing Dataverse DOI, the lab's public Google Cloud Storage path for the Feather-formatted edge list (`gs://lee-lab_brain-and-nerve-cord-fly-connectome/compiled_data/banc_888/banc_888_edgelist_simple_v2.feather`), and notes that it loads directly through `from_feather`.
- The **Usage** section now also lists the alternative constructors (`from_dataframes`, `from_csv`, `from_parquet`, `from_feather`, `from_numpy`) alongside the original SQLite path, names the required edgelist columns (`pre`, `post`, `count` or `weight`, optional `norm`) and metadata columns (`root_id`, plus `top_nt` when `signed=True` or `excluded_nts` is set), and explicitly states that **missing columns raise a `ValueError` that names the required columns and lists the columns the user actually passed** — fail-fast with an actionable message rather than a silent bad result.
- A one-line cross-link to [`natverse/influencer`](https://github.com/natverse/influencer) appears at the top of the Description section.
- Four images are embedded inline:

| image | location | role |
|---|---|---|
| `seed_to_targets_diagram.jpg` | top of Description | conceptual schematic of source → targets propagation |
| `linear_dynamical_model.png` | next to the ODE | annotated breakdown of the linear-dynamics equation (terms + BANC-scale dimensions) |
| `neural_network_dynamics.gif` | after the steady-state equation | 12-second propagation animation on a 28-node toy graph showing convergence to steady state (auto-renders inline; converted from a source `.mp4` via two-pass palette `ffmpeg`, source deleted) |
| `adjusted_influence_vs_traversal.jpg` | in the `adjust_influence` section | scatter of adjusted_influence vs graph-traversal depth on BANC, showing the near-linear scaling (R² = 0.94) |

The `seed_to_targets` and `adjusted_influence_vs_traversal` images are pulled from the R sibling package [`natverse/influencer`](https://github.com/natverse/influencer); `linear_dynamical_model.png` and `neural_network_dynamics.gif` are bespoke for this repo.

**Files.** `README.md`; `InfluenceCalculator/data/__init__.py` (BibTeX moved into module docstring); `examples/celegans_worked_example.py` (Drosophila / *C. elegans* NT comparison + cholinergic-fraction comment absorbed); `docs/images/seed_to_targets_diagram.jpg`, `docs/images/linear_dynamical_model.png`, `docs/images/neural_network_dynamics.gif`, `docs/images/adjusted_influence_vs_traversal.jpg` (new).

---

## 11. `.gitignore`

**Change.** Added `__pycache__/`, `.pytest_cache/`, `.venv/`, `Influence/` (test-output directory), and `CLAUDE.md` (working notes, not for distribution). `.DS_Store` was already there.

**Files.** `.gitignore`.

---

## Files affected — index

| File | Status | Summary |
|------|--------|---------|
| `InfluenceCalculator/InfluenceCalculator.py` | modified | constructors, `adjust_influence`, sign preservation, `lambda_max`, NT externalisation, signed-mode bug fix |
| `InfluenceCalculator/__init__.py` | modified | export `adjust_influence` |
| `InfluenceCalculator/data/__init__.py` | new | `celegans_edgelist()`, `celegans_meta()`; module docstring carries OpenWorm + White 1986 + Cook 2019 BibTeX |
| `InfluenceCalculator/data/celegans_edgelist.csv` | new | bundled edge list |
| `InfluenceCalculator/data/celegans_meta.csv` | new | bundled metadata |
| `examples/celegans_worked_example.py` | new | worked example, generates heatmaps |
| `docs/images/influence_heatmap_unsigned.png` | new | example output (regenerated) |
| `docs/images/influence_heatmap_signed.png` | new | example output (regenerated) |
| `docs/images/seed_to_targets_diagram.jpg` | new | source → targets schematic (Description) |
| `docs/images/linear_dynamical_model.png` | new | annotated linear-dynamics ODE (Description) |
| `docs/images/neural_network_dynamics.gif` | new | propagation-to-steady-state animation (Description) |
| `docs/images/adjusted_influence_vs_traversal.jpg` | new | adjusted_influence vs graph-traversal depth (`adjust_influence` section) |
| `tests/conftest.py` | new | importlib.resources fixtures |
| `tests/test_influence_calculator.py` | rewritten | 11 discrete pytest functions |
| `pyproject.toml` | modified | setuptools ≥ 77, extras, package-data, pytest config |
| `.gitignore` | modified | cache and working-note ignores |
| `README.md` | modified | `lambda_max` reverb-knob explanation in Description; `adjust_influence` section; worked-example section; data citations; `natverse/influencer` cross-link |
| `update.md` | new | this document |
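As a footnote to the worked-example entry: the capital-letter-gated L/R stripping described there can be written with a lookbehind rather than alternation, sidestepping the leftmost-first trap that would turn `AVDL` into `AV` (illustrative pattern, not necessarily the script's exact regex):

```python
import re

def collapse_bilateral(name):
    """Strip a trailing L or R only when it follows a capital letter,
    so AVDL -> AVD and IL2DR -> IL2D rather than AVDL -> AV."""
    return re.sub(r"(?<=[A-Z])[LR]$", "", name)

classes = {n: collapse_bilateral(n) for n in ["AVAL", "AVAR", "AVDL", "IL2DR", "AVA"]}

# The alternation the text warns against: "DL" matches at the leftmost
# position, so AVDL collapses one letter too far.
too_greedy = re.sub(r"(DL|DR|VL|VR|L|R)$", "", "AVDL")
```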
These updates attempt to solve a suite of medium-level issues. This will help improve the repo's usability, I feel!
Here is the detail on these changes:
ConnectomeInfluenceCalculator — Update Notes
A summary of the conceptual changes made on the working tree (no commits yet). Each entry gives the change, the reason for it, and the files touched.
1. DataFrame / CSV / Parquet / Feather / NumPy constructors
Change. The library now accepts pandas DataFrames and several common on-disk formats in addition to the original SQLite path:
InfluenceCalculator.from_dataframes(edgelist_df, meta_df=None, ...)InfluenceCalculator.from_csv(edgelist_path, meta_path=None, ...)InfluenceCalculator.from_parquet(...),from_feather(...)InfluenceCalculator.from_numpy(adjacency_matrix, neuron_ids=None, ...)The old
InfluenceCalculator(filename, ...)SQLite path still works.Why. The original API forced every caller to package their connectome into a SQLite file with a specific schema, which is awkward for ad-hoc exploration, for users coming from R / pandas pipelines, and for the worked example in this repo. The DataFrame constructor accepts the same columns as the SQLite schema (
pre,post,count, optionalnorm) plus a metadata frame withroot_idand (when relevant)top_nt.Files.
InfluenceCalculator/InfluenceCalculator.py— addedfrom_*classmethods plus two module-level helpers,_validate_metaand_validate_and_prepare_edgelist, that enforce the column requirements with descriptive error messages.2. Bundled C. elegans dataset
Change. A small C. elegans connectome (300 neurons, 3,539 chemical edges, 20,672 synapses) ships with the package and is exposed as:
Why. Tests and examples need a real connectome they can build a calculator from without external downloads. The previous toy
tests/toy_network_example.sqlitewas opaque and not documented; the C. elegans graph is small, public, and well-annotated, which lets the worked example double as a tutorial against a famous connectome.Files.
InfluenceCalculator/data/__init__.py(new — usesimportlib.resources.files()so it works whether installed or run in-tree);InfluenceCalculator/data/celegans_edgelist.csvandInfluenceCalculator/data/celegans_meta.csv(new);pyproject.tomlgains a[tool.setuptools.package-data]entry so the CSVs are included in the wheel.3. Module-level
adjust_influenceChange. A new function
adjust_influence(df, const=24, signif=6)is exported alongsideInfluenceCalculator. It takes the DataFrame returned bycalculate_influence, groups by(target, seed), sums within each group, and returns three columns:adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)adjusted_influence_norm_by_targetsadjusted_influence_norm_by_sources_and_targetsWhy. Raw influence scores span many orders of magnitude — the strongest direct paths can be ten billion times larger than the weakest distal trickle — so a linear-scaled heatmap shows the top of the distribution and nothing else.
adjust_influencedoes three things to make the output legible:log(...)compresses the dynamic range so weak and strong paths are visible side by side.max(|x|, exp(-const))is a junk-node floor: anything weaker thanexp(-const)is treated as "essentially zero", which keepslog(0)from producing-infand stops the colormap getting hijacked by numerical noise.+ constthen shifts everything so the smallest meaningful score sits at exactly 0 and a colormap can be anchored there without losing sign information.Mirrors
adjust_influencein the R sibling packagenatverse/influencer.Files.
InfluenceCalculator/InfluenceCalculator.py(function defined at module scope);InfluenceCalculator/__init__.py(exported in__all__);tests/test_influence_calculator.py(test_adjust_influence_basics,test_adjust_influence_threshold_floor,test_adjust_influence_preserves_sign).4. Sign-preserving signed mode
Change. When
signed=Trueis set, the influence DataFrame now returns the real part of the steady-state vector rather than its magnitude — so a target dominated by inhibitory paths gets a negativeInfluence_score_(signed).adjust_influencepropagates the sign through the log transform.Why. Without sign preservation, "signed" mode silently degenerated to unsigned: every output was a non-negative magnitude, and the only remaining effect of GABA-pre-neuron negation was to reduce magnitudes where positive and negative paths cancelled. That cancellation is real but undetectable from the output, which makes the signed flag indistinguishable from a perturbed unsigned run.
While auditing this we also found a pre-existing bug in
_create_sparse_W. To makesigned=Trueproduce negative weights for inhibitory pre-neurons, the original code multiplied the relevant rows of the edgelist'scountcolumn by-1. But the matrixWis populated from thenormcolumn (the fractioncount / sum_per_post), not fromcount. Flipping the sign ofcountleftnormpositive, so every entry ofWended up positive regardless of howsigned=was set — the signed setting silently built the same matrix as the unsigned setting. The fix multiplies whichever column is actually used to buildW(held in thesyn_weight_measurevariable, which defaults tonorm), so the negation now reaches the matrix.Files.
InfluenceCalculator/InfluenceCalculator.py—_build_influence_dataframenow branches onself.W_signed;_create_sparse_Wnegatessyn_weight_measureinstead of'count'.5. Externalised neurotransmitter assignment (
inhibitory_nts,excluded_nts)Change. The old hardcoded
`NEG_NEUROTRANSMITTERS = {'glutamate', 'gaba', 'serotonin', 'octopamine'}` constant has been removed. Two keyword arguments now pass through the constructor and every classmethod:

- `inhibitory_nts={'gaba', ...}` — `top_nt` values whose pre-neurons receive a negative sign in `signed=True` mode.
- `excluded_nts={'dopamine', 'serotonin', ...}` — `top_nt` values whose pre-neurons contribute zero outgoing weight (their columns of `W` are empty), in either signed or unsigned mode.

**Why.** `excluded_nts` lets users silence pre-neurons whose net effect cannot be assigned a single sign, rather than forcing a wrong one.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (constant removed; new kwargs threaded through `__init__` and all `from_*` classmethods; validation raises `ValueError` if `signed=True` is set without `inhibitory_nts`, or if `excluded_nts` is used without a `meta_df` containing `top_nt`); `tests/test_influence_calculator.py` gains `test_excluded_nts_removes_edges` and `test_excluded_nts_requires_top_nt`.

---

## 6. Exposed `lambda_max` as a documented parameter

**Change.** The target largest real eigenvalue of the rescaled connectivity matrix W̃, previously hardcoded to `0.99` inside `_normalize_W`, is now a constructor argument, `lambda_max`. The default remains `0.99` for backwards compatibility. `_normalize_W` now always rescales to `lambda_max` exactly (rather than only capping when the natural eigenvalue exceeds it), so the parameter is a true control knob over leading-mode amplification rather than a stability ceiling.

**Why.** Think of `lambda_max` as a reverb knob on the network. Near 1, a signal injected at the seed echoes around the graph many times before fading — the gain along the dominant recurrent loop is `1/(1 - lambda_max)`, so `100×` at `0.99` versus `2×` at `0.5`. Crank it to the max and the dominant loop drowns out finer differences between targets: every column of the heatmap ends up with nearly the same shape. Turn it down and the signal mostly travels along short paths, exposing per-target specificity at the cost of attenuating long polysynaptic effects.

Which value is "right" depends on the connectome. The default `0.99` is calibrated for a whole-CNS Drosophila graph (BANC-scale, ~130k neurons), where you want maximum sensitivity to weak distal influence. On a small graph like the C. elegans connectome the same setting puts the leading mode in charge of the entire heatmap; `0.5` is a more useful starting point. The point of exposing the parameter is that this is a knob users should be turning, not a hidden constant.

The trade-off is shown by the worked example sweep (28 canonical sensory→interneuron pairs from the C. elegans literature, ranked within each seed's column; mean column std as a leading-mode dominance proxy):
| `lambda_max` | canonical mean rank-frac | mean col std (info) |
|--------------|--------------------------|---------------------|
| 0.10         | 0.931                    | 0.160               |
| 0.30         | 0.929                    | 0.162               |
| 0.50         | 0.932                    | 0.149               |
| 0.70         | 0.933                    | 0.131               |
| 0.90         | 0.937                    | 0.105               |
| 0.99         | 0.921                    | 0.042               |
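The reverb-knob arithmetic behind these numbers (gain = `1/(1 - lambda_max)` along the dominant loop) can be sketched directly. This is illustrative arithmetic only, not library code; `leading_mode_gain` is a hypothetical helper name:

```python
# Illustrative arithmetic (not library code): the steady-state gain along the
# dominant recurrent loop is the geometric series
#   1 + lam + lam**2 + ... = 1 / (1 - lam),
# the "reverb" factor described above.
def leading_mode_gain(lam: float) -> float:
    """Echo gain along the dominant recurrent loop for a given lambda_max."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lambda_max must lie in [0, 1) for the series to converge")
    return 1.0 / (1.0 - lam)

print(leading_mode_gain(0.99))  # ~100x: long polysynaptic echoes dominate
print(leading_mode_gain(0.50))  # 2x: signal stays mostly on short paths
```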
Canonical-pair scores barely move (the strongest direct paths win at any λ), but column differentiation collapses by ~4× between λ = 0.90 and λ = 0.99 — the leading-mode dominance signature. The worked example defaults to `lambda_max=0.5` as the balance: canonical hits intact, columns clearly differentiated, some polysynaptic integration retained.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (parameter added and validated on every constructor; `_normalize_W` reads `self.lambda_max`); `examples/celegans_worked_example.py` (defines `LAMBDA_MAX = 0.5` and surfaces it in the heatmap title).

---

## 7. Worked example — `examples/celegans_worked_example.py`

**Change.** A self-contained script that loads the bundled C. elegans graph, computes per-seed influence from every sensory neuron (83 seeds, summed into 46 cell classes after collapsing bilateral pairs) onto every non-sensory target (187 → 136 cell classes), log-adjusts the per-(target_class, seed_class) raw scores via `adjust_influence`, and renders two heatmaps in `docs/images/`:

- `influence_heatmap_unsigned.png` (sequential greyscale, [0, max])
- `influence_heatmap_signed.png` (diverging blue→white→red, [−bound, +bound])

The seed and target axes are grouped by `body_part` (body / head / tail; the pharyngeal nervous system is excluded, as it is essentially isolated from the rest of the graph) and clustered within each group by average-linkage hierarchical clustering. The matrix is transposed so seed classes index the rows.

Bilateral pairs are summed into cell classes (`AVAL`/`AVAR` → `AVA`, `AVDL`/`AVDR` → `AVD`, `IL2DL`/`IL2DR` → `IL2D`) on both axes via a regex that strips the trailing L/R only when it follows a capital letter (deliberately not including `DL|DR|VL|VR` as alternatives — Python's leftmost-first alternation would otherwise turn `AVDL` into `AV` instead of `AVD`). The matrix shows the raw `adjusted_influence` values directly with no per-row min-max rescaling; with `lambda_max = 0.5` the leading mode is damped enough that per-target seed specificity is already legible, so a min-max normalisation step is unnecessary.

**Why.** A connectome library without a worked example is hard to evaluate. The C. elegans example is small enough to run in seconds, recognised enough that the resulting heatmap can be eyeballed against the literature (sensory → command-interneuron paths, body-touch → ventral cord motor blocks, phasmid → AVA/AVD), and structured enough to demonstrate the full library API end-to-end.

**`const` auto-calibration.** Rather than hardcoding `const=24`, the example computes `const = -log(min_nonzero |raw|)` over the per-row influence scores, so the smallest non-zero magnitude maps exactly to 0 after the log transform — eliminating an arbitrary floor and adapting cleanly to different `lambda_max` choices.

**Files.** `examples/celegans_worked_example.py` (new); `docs/images/influence_heatmap_unsigned.png` and `docs/images/influence_heatmap_signed.png` (regenerated each run).

---

## 8. Tests overhaul
**Change.** `tests/test_influence_calculator.py` is rewritten as discrete pytest functions, fed by `tests/conftest.py` fixtures that use `importlib.resources.as_file()` to expose the bundled CSVs as filesystem paths. New tests:

- `test_format_equivalence` — `from_dataframes`, `from_csv`, and `from_numpy` agree on neuron count, matrix size, and ID universe.
- `test_adjust_influence_basics` / `_threshold_floor` / `_preserves_sign` — covers the new module-level transform.
- `test_input_validation_missing_columns` / `_signed_no_top_nt`.
- `test_excluded_nts_removes_edges` / `_requires_top_nt`.
- `test_norm_auto_computation` — `'norm'` is computed when absent.
- `test_round_trip_smoke` — full pipeline from CSV → `calculate_influence` → `adjust_influence`, gated on PETSc/SLEPc availability.

PETSc/SLEPc-dependent tests use `pytest.importorskip` so the suite is runnable in environments without those libraries.

**Files.** `tests/conftest.py` (new); `tests/test_influence_calculator.py` (rewritten); `pyproject.toml` (`[tool.pytest.ini_options]` plus a `test` extra).

---

## 9. `pyproject.toml` modernisation

**Change.** Bumped `setuptools >= 77` (so the SPDX-string `license = "BSD-3-Clause"` syntax from PEP 639 is accepted), set `requires-python >= 3.10` and version `0.2.0`, declared optional extras (`parquet`, `test`, `examples`, `dev`), added a `package-data` block for the bundled CSVs, and added a `[tool.pytest.ini_options]` section pointing to `tests/`.

**Why.** The pre-existing setup could not install on a machine with newer setuptools, which emits warnings about the dual-purpose `license` field; the new form is the documented PEP 639 spelling. The optional extras mean a CI image can install just what it needs (`pip install .[test]`) rather than every parquet dependency.

**Files.** `pyproject.toml`.

---

## 10. README — restructured around the new knobs
**Change.** The README is reorganised so the three things a user actually tunes — `inhibitory_nts`/`excluded_nts`, `lambda_max`, and `const` for `adjust_influence` — each have one canonical home:

- The Description section now derives `W̃ = (λ / λ_max(W)) · W` with λ as a tuneable target (the `lambda_max` argument), and retains the explicit gloss "where λ_max(W) is the largest real eigenvalue of W, and λ is the desired largest real eigenvalue of W̃", matching the original phrasing. It carries both the technical explanation (gain = `1/(1 - lambda_max)` along the leading recurrent mode) and a short "reverb knob" metaphor: high `lambda_max` makes the network echo signals through long indirect paths, low `lambda_max` keeps the signal local. It includes inline species guidance — `0.99` seems appropriate for the whole-CNS Drosophila BANC connectome and larger graphs; near `0.5` is more appropriate for the C. elegans connectome, where the graph is small enough that the leading mode otherwise washes the heatmap out.

- The "How W is filled" sentence in the Description was rewritten to make the input normalisation explicit. The original said the matrix is "filled with the number of synaptic connections that a presynaptic neuron projects onto a postsynaptic neuron", which described `syn_weight_measure='count'` rather than the actual default (`'norm'`). The new wording makes clear that each entry is the fraction of a postsynaptic neuron's total drive that comes from a given upstream partner, and explains the biological rationale (per-edge weights need to be comparable across neurons that vary widely in size and total input count).

- A new "`adjust_influence`: log-compression and grouping" section explains why the function exists (raw scores span ten orders of magnitude), the `const` floor as a junk-node cutoff, and the difference between the three output columns (`adjusted_influence` vs the two normalised variants) — borrowing framing from the R sibling package's documentation. It includes the `adjusted_influence_vs_traversal.jpg` figure with an expanded caption that defines the x-axis (graph-traversal depth = mean number of synaptic hops in shortest-path BFS) and reads off the intuition: each polysynaptic step costs ≈ 1.3 units of `adjusted_influence`, so the score maps directly onto effective polysynaptic distance.

- A new "Worked example: C. elegans connectome" section embeds the two regenerated heatmaps and shows a minimum-viable end-to-end snippet. A short knobs table cross-references the Description and `adjust_influence` sections rather than re-explaining each parameter. Detailed biology (the Drosophila vs C. elegans NT-set comparison, the cholinergic-fraction callout that explains the wide blank band on the signed heatmap's seed axis) lives as comments in `examples/celegans_worked_example.py` rather than in the README.

- A short "Data source" subsection attributes the bundled CSVs to the OpenWorm project distribution (accessed February 2026) with prose citations to White et al. 1986 and Cook et al. 2019. Full BibTeX lives in the docstring of `InfluenceCalculator/data/__init__.py` (so `help(.data)` surfaces it) rather than cluttering the README.

- The BANC Dataset section now lists, alongside the existing Dataverse DOI, the lab's public Google Cloud Storage path for the Feather-formatted edge list (`gs://lee-lab_brain-and-nerve-cord-fly-connectome/compiled_data/banc_888/banc_888_edgelist_simple_v2.feather`), and notes that it loads directly through `from_feather`.

- The Usage section now also lists the alternative constructors (`from_dataframes`, `from_csv`, `from_parquet`, `from_feather`, `from_numpy`) alongside the original SQLite path, names the required edgelist columns (`pre`, `post`, `count` or `weight`, optional `norm`) and metadata columns (`root_id`, plus `top_nt` when `signed=True` or `excluded_nts` is set), and explicitly states that missing columns raise a `ValueError` that names the required columns and lists the columns the user actually passed — fail-fast with an actionable message rather than a silent bad result.

- A one-line cross-link to `natverse/influencer` appears at the top of the Description section.

Four images are embedded inline:
| image | location | role |
|---|---|---|
| `seed_to_targets_diagram.jpg` | top of Description | conceptual schematic of source → targets propagation |
| `linear_dynamical_model.png` | next to the ODE | annotated breakdown of the linear-dynamics equation (terms + BANC-scale dimensions) |
| `neural_network_dynamics.gif` | after the steady-state equation | 12-second propagation animation on a 28-node toy graph showing convergence to steady state (auto-renders inline; converted from a source `.mp4` via two-pass palette ffmpeg, source deleted) |
| `adjusted_influence_vs_traversal.jpg` | in the `adjust_influence` section | scatter of adjusted_influence vs graph-traversal depth on BANC, showing the near-linear scaling (R² = 0.94) |

The `seed_to_targets` and `adjusted_influence_vs_traversal` images are pulled from the R sibling package `natverse/influencer`; `linear_dynamical_model.png` and `neural_network_dynamics.gif` are bespoke for this repo.

**Files.**
`README.md`; `InfluenceCalculator/data/__init__.py` (BibTeX moved into module docstring); `examples/celegans_worked_example.py` (Drosophila / C. elegans NT comparison + cholinergic-fraction comment absorbed); `docs/images/seed_to_targets_diagram.jpg`, `docs/images/linear_dynamical_model.png`, `docs/images/neural_network_dynamics.gif`, `docs/images/adjusted_influence_vs_traversal.jpg` (new).

---

## 11. `.gitignore`

**Change.** Added `__pycache__/`, `.pytest_cache/`, `.venv/`, `Influence/` (test-output directory), and `CLAUDE.md` (working notes, not for distribution). `.DS_Store` was already there.

**Files.** `.gitignore`.

---

## Files affected — index

- `InfluenceCalculator/InfluenceCalculator.py` — `adjust_influence`, sign preservation, `lambda_max`, NT externalisation, signed-mode bug fix
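As a footnote to the bilateral-pair collapse in the worked-example section, the leftmost-first alternation pitfall is easy to reproduce. A minimal sketch (hypothetical helper names; the exact expression in `examples/celegans_worked_example.py` may differ in detail):

```python
import re

# Strip a trailing L/R only when it follows a capital letter, as described in
# the worked-example notes. The lookbehind keeps AVDL -> AVD.
def collapse_pair(name: str) -> str:
    return re.sub(r'(?<=[A-Z])[LR]$', '', name)

# The tempting alternation form shows the pitfall: Python's regex alternation
# is leftmost-first, so 'DL' matches before the bare 'L' can, and AVDL
# collapses to AV instead of AVD.
def collapse_pair_naive(name: str) -> str:
    return re.sub(r'(DL|DR|VL|VR|L|R)$', '', name)

print(collapse_pair('AVDL'))        # AVD  (intended cell class)
print(collapse_pair_naive('AVDL'))  # AV   (wrong: 'DL' wins leftmost-first)
print(collapse_pair('IL2DL'))       # IL2D
```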