Updates to increase the flexibility of the influence calculator #4
Closed
alexanderbates wants to merge 1 commit into
These updates attempt to solve a suite of medium-level issues:

- Enabling the calculator to take more diverse input, not just .sqlite files
- Providing a bundled function to "adjust" an influence score as we describe in our paper
- Surfacing lambda as a value the user can alter, which turns out to be necessary to analyse e.g. the *C. elegans* connectome
- Not hard-coding NT assignment; we should leave this up to the user
- Corresponding documentation for these changes, including in the README
- Packaging *C. elegans* data so users can quickly get started running examples with real connectome data

Here is the detail on these changes:

# ConnectomeInfluenceCalculator — Update Notes

A summary of the conceptual changes made on the working tree (no commits yet). Each entry gives the change, the reason for it, and the files touched.

---

## 1. DataFrame / CSV / Parquet / Feather / NumPy constructors

**Change.** The library now accepts pandas DataFrames and several common on-disk formats in addition to the original SQLite path:

- `InfluenceCalculator.from_dataframes(edgelist_df, meta_df=None, ...)`
- `InfluenceCalculator.from_csv(edgelist_path, meta_path=None, ...)`
- `InfluenceCalculator.from_parquet(...)`, `from_feather(...)`
- `InfluenceCalculator.from_numpy(adjacency_matrix, neuron_ids=None, ...)`

The old `InfluenceCalculator(filename, ...)` SQLite path still works.

**Why.** The original API forced every caller to package their connectome into a SQLite file with a specific schema, which is awkward for ad-hoc exploration, for users coming from R / pandas pipelines, and for the worked example in this repo. The DataFrame constructor accepts the same columns as the SQLite schema (`pre`, `post`, `count`, optional `norm`) plus a metadata frame with `root_id` and (when relevant) `top_nt`.
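For illustration, the column contract can be sketched in plain Python, with dicts standing in for the DataFrames the constructors accept. `prepare_edgelist` is a hypothetical helper for this sketch, not the library's API; `norm` is derived as the per-postsynaptic fraction `count / sum_per_post`:

```python
# Sketch of the edgelist contract: required columns, plus derivation of a
# 'norm' fraction when it is absent. Hypothetical helper, not library code.
REQUIRED = {"pre", "post", "count"}

def prepare_edgelist(rows):
    """rows: list of dicts with 'pre', 'post', 'count' (optionally 'norm')."""
    missing = REQUIRED - set(rows[0])
    if missing:
        raise ValueError(f"edgelist is missing columns: {sorted(missing)}")
    if all("norm" in r for r in rows):
        return rows
    # norm = count / total synapse count onto the postsynaptic neuron
    totals = {}
    for r in rows:
        totals[r["post"]] = totals.get(r["post"], 0) + r["count"]
    return [{**r, "norm": r["count"] / totals[r["post"]]} for r in rows]

edges = prepare_edgelist([
    {"pre": "AVAL", "post": "AVDL", "count": 3},
    {"pre": "AVAR", "post": "AVDL", "count": 1},
])
```

A malformed input (say, a missing `count` column) raises a `ValueError` naming the absent columns, mirroring the fail-fast validation described above.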
**Files.** `InfluenceCalculator/InfluenceCalculator.py` — added `from_*` classmethods plus two module-level helpers, `_validate_meta` and `_validate_and_prepare_edgelist`, that enforce the column requirements with descriptive error messages.

---

## 2. Bundled C. elegans dataset

**Change.** A small *C. elegans* connectome (300 neurons, 3,539 chemical edges, 20,672 synapses) ships with the package and is exposed as:

```python
from InfluenceCalculator.data import celegans_edgelist, celegans_meta

edges = celegans_edgelist()  # pre, post, count, norm
meta = celegans_meta()       # root_id, top_nt, super_class, neuron_class, body_part
```

**Why.** Tests and examples need a real connectome they can build a calculator from without external downloads. The previous toy `tests/toy_network_example.sqlite` was opaque and not documented; the *C. elegans* graph is small, public, and well-annotated, which lets the worked example double as a tutorial against a famous connectome.

**Files.** `InfluenceCalculator/data/__init__.py` (new — uses `importlib.resources.files()` so it works whether installed or run in-tree); `InfluenceCalculator/data/celegans_edgelist.csv` and `InfluenceCalculator/data/celegans_meta.csv` (new); `pyproject.toml` gains a `[tool.setuptools.package-data]` entry so the CSVs are included in the wheel.

> **Provenance.** The data was taken from the
> [OpenWorm project](https://openworm.org/) distribution of the
> *C. elegans* hermaphrodite chemical connectome (accessed
> February 2026), which aggregates the original electron-microscopy
> reconstructions of White et al. 1986 and Cook et al. 2019. The
> README "Data source" section carries the full citation block;
> downstream users redistributing the bundled CSVs should cite both
> primary sources and the OpenWorm aggregation.

---

## 3. Module-level `adjust_influence`

**Change.** A new function `adjust_influence(df, const=24, signif=6)` is exported alongside `InfluenceCalculator`. It takes the DataFrame returned by `calculate_influence`, groups by `(target, seed)`, sums within each group, and returns three columns:

- `adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)`
- `adjusted_influence_norm_by_targets`
- `adjusted_influence_norm_by_sources_and_targets`

**Why.** Raw influence scores span many orders of magnitude — the strongest direct paths can be ten billion times larger than the weakest distal trickle — so a linear-scaled heatmap shows the top of the distribution and nothing else. `adjust_influence` does three things to make the output legible:

- **`log(...)`** compresses the dynamic range so weak and strong paths are visible side by side.
- **`max(|x|, exp(-const))`** is a junk-node floor: anything weaker than `exp(-const)` is treated as "essentially zero", which keeps `log(0)` from producing `-inf` and stops the colormap getting hijacked by numerical noise.
- **`+ const`** then shifts everything so the smallest meaningful score sits at exactly 0 and a colormap can be anchored there without losing sign information.

Mirrors `adjust_influence` in the R sibling package [`natverse/influencer`](https://github.com/natverse/influencer).

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (function defined at module scope); `InfluenceCalculator/__init__.py` (exported in `__all__`); `tests/test_influence_calculator.py` (`test_adjust_influence_basics`, `test_adjust_influence_threshold_floor`, `test_adjust_influence_preserves_sign`).

---

## 4. Sign-preserving signed mode

**Change.** When `signed=True` is set, the influence DataFrame now returns the real part of the steady-state vector rather than its magnitude — so a target dominated by inhibitory paths gets a negative `Influence_score_(signed)`. `adjust_influence` propagates the sign through the log transform.
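The scalar transform can be sketched directly. This is a minimal stand-in: the packaged `adjust_influence` additionally performs the `(target, seed)` grouping, the two normalised output columns, and the `signif` handling, all omitted here; `adjusted` is an illustrative name:

```python
import math

def adjusted(x, const=24):
    """Sign-preserving log compression: magnitudes weaker than exp(-const)
    are floored (treated as essentially zero), and adding const shifts
    the floor itself to exactly 0."""
    if x == 0:
        return 0.0
    sign = math.copysign(1.0, x)
    return sign * (math.log(max(abs(x), math.exp(-const))) + const)

# A strong direct path and a sub-floor distal trickle both stay finite
# and plottable on the same colormap axis.
strong = adjusted(1.0)     # log(1) + 24 = 24
trickle = adjusted(1e-30)  # below the exp(-24) floor, maps to ~0
```

Note how the sign survives the transform: an inhibition-dominated score of `-1.0` maps to `-24`, symmetric with `+1.0` mapping to `+24`, which is exactly the property section 4 relies on.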
**Why.** Without sign preservation, "signed" mode silently degenerated to unsigned: every output was a non-negative magnitude, and the only remaining effect of GABA-pre-neuron negation was to reduce magnitudes where positive and negative paths cancelled. That cancellation is real but undetectable from the output, which makes the signed flag indistinguishable from a perturbed unsigned run.

While auditing this we also found a **pre-existing bug** in `_create_sparse_W`. To make `signed=True` produce negative weights for inhibitory pre-neurons, the original code multiplied the relevant rows of the edgelist's `count` column by `-1`. But the matrix `W` is populated from the `norm` column (the fraction `count / sum_per_post`), not from `count`. Flipping the sign of `count` left `norm` positive, so every entry of `W` ended up positive regardless of how `signed=` was set — the signed setting silently built the same matrix as the unsigned setting. The fix multiplies whichever column is actually used to build `W` (held in the `syn_weight_measure` variable, which defaults to `norm`), so the negation now reaches the matrix.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` — `_build_influence_dataframe` now branches on `self.W_signed`; `_create_sparse_W` negates `syn_weight_measure` instead of `'count'`.

---

## 5. Externalised neurotransmitter assignment (`inhibitory_nts`, `excluded_nts`)

**Change.** The old hardcoded `NEG_NEUROTRANSMITTERS = {'glutamate', 'gaba', 'serotonin', 'octopamine'}` constant has been removed. Two keyword arguments now pass through the constructor and every classmethod:

- `inhibitory_nts={'gaba', ...}` — `top_nt` values whose pre-neurons receive a negative sign in `signed=True` mode.
- `excluded_nts={'dopamine', 'serotonin', ...}` — `top_nt` values whose pre-neurons contribute zero outgoing weight (their columns of `W` are empty), in either signed or unsigned mode.

**Why.**

1. *Species-dependence.* Glutamate is net excitatory in mammals and most *Drosophila* circuits but is dual-action in *C. elegans* (GluCl chloride channels make it inhibitory at e.g. AWC→AIY). Hardcoding the sign in the library forces every user into the same biological assumption.
2. *Receptor-mix uncertainty.* Modulators like dopamine, serotonin, and octopamine can be excitatory or inhibitory at a given target depending on the receptor mix. `excluded_nts` lets users silence pre-neurons whose net effect cannot be assigned a single sign, rather than forcing a wrong one.
3. *Library hygiene.* Per a direct user instruction: "the library should not pre-empt how the user wishes to assign transmitters." Sets are still demonstrated in the README and worked example as defaults users can copy-paste.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (constants removed; new kwargs threaded through `__init__` and all `from_*` classmethods; validation raises `ValueError` if `signed=True` is set without `inhibitory_nts` or if `excluded_nts` is used without a `meta_df` containing `top_nt`); `tests/test_influence_calculator.py` gains `test_excluded_nts_removes_edges` and `test_excluded_nts_requires_top_nt`.

---

## 6. Exposed `lambda_max` as a documented parameter

**Change.** The target largest real eigenvalue of the rescaled connectivity matrix W̃, previously hardcoded to `0.99` inside `_normalize_W`, is now a constructor argument:

```python
ic = InfluenceCalculator.from_dataframes(edges, meta, lambda_max=0.5)
```

Default remains `0.99` for backwards compatibility. `_normalize_W` now *always* rescales to `lambda_max` exactly (rather than only capping when the natural eigenvalue exceeds it), so the parameter is a true control knob over leading-mode amplification rather than a stability ceiling.

**Why.** Think of `lambda_max` as a **reverb knob** on the network. Near 1, a signal injected at the seed echoes around the graph many times before fading — the gain along the dominant recurrent loop is `1/(1-lambda_max)`, so `100×` at `0.99` versus `2×` at `0.5`. Crank it to the max and the dominant loop drowns out finer differences between targets: every column of the heatmap ends up with nearly the same shape. Turn it down and the signal mostly travels along short paths, exposing per-target specificity at the cost of attenuating long polysynaptic effects.

Which value is "right" depends on the connectome. The default `0.99` is calibrated for a whole-CNS *Drosophila* graph (BANC-scale, ~130k neurons), where you want maximum sensitivity to weak distal influence. On a small graph like the *C. elegans* connectome the same setting puts the leading mode in charge of the entire heatmap; `0.5` is a more useful starting point. The point of exposing the parameter is that this is a knob users should be turning, not a hidden constant.

**Trade-off shown by the worked example sweep** (28 canonical sensory→interneuron pairs from the *C. elegans* literature, ranked within each seed's column; mean column std as a leading-mode dominance proxy):

| `lambda_max` | canonical mean rank-frac | mean col std (info) |
|--------------|--------------------------|---------------------|
| 0.10 | 0.931 | 0.160 |
| 0.30 | 0.929 | 0.162 |
| 0.50 | 0.932 | 0.149 |
| 0.70 | 0.933 | 0.131 |
| 0.90 | 0.937 | 0.105 |
| 0.99 | 0.921 | **0.042** |

Canonical-pair scores barely move (the strongest direct paths win at any λ), but column differentiation collapses by ~4× between λ=0.90 and λ=0.99 — the leading-mode dominance signature. The worked example defaults to `lambda_max=0.5` as the balance: canonical hits intact, columns clearly differentiated, some polysynaptic integration retained.
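The always-rescale behaviour can be sketched with stdlib power iteration on a toy matrix (a stand-in for a proper sparse eigensolver; `leading_eigenvalue` and `rescale` are illustrative names, not library functions):

```python
import math

def leading_eigenvalue(W, iters=200):
    """Power-iteration estimate of the leading eigenvalue of a small
    nonnegative matrix (toy stand-in for a sparse eigensolver)."""
    n = len(W)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # norm of W v for unit v
        v = [x / lam for x in w]
    return lam

def rescale(W, lambda_max=0.99):
    """W~ = (lambda_max / leading_eigenvalue(W)) * W: always rescales to
    the target exactly, as described above (sketch, not library code)."""
    scale = lambda_max / leading_eigenvalue(W)
    return [[scale * x for x in row] for row in W]

W = [[2.0, 1.0], [1.0, 2.0]]          # leading eigenvalue is 3
W_tilde = rescale(W, lambda_max=0.5)  # leading eigenvalue becomes 0.5
```

The reverb-knob gain quoted above is the geometric series over loop traversals, Σ λ^k = 1/(1−λ): 100× at λ = 0.99 versus 2× at λ = 0.5.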
**Files.** `InfluenceCalculator/InfluenceCalculator.py` (parameter added and validated on every constructor; `_normalize_W` reads `self.lambda_max`); `examples/celegans_worked_example.py` (defines `LAMBDA_MAX = 0.5` and surfaces it in the heatmap title).

---

## 7. Worked example — `examples/celegans_worked_example.py`

**Change.** A self-contained script that loads the bundled *C. elegans* graph, computes per-seed influence from every sensory neuron (83 seeds, summed into 46 cell classes after collapsing bilateral pairs) onto every non-sensory target (187 → 136 cell classes), log-adjusts the per-(target_class, seed_class) raw scores via `adjust_influence`, and renders two heatmaps in `docs/images/`:

- `influence_heatmap_unsigned.png` (sequential greyscale, [0, max])
- `influence_heatmap_signed.png` (diverging blue→white→red, [−bound, +bound])

The seed and target axes are grouped by `body_part` (body / head / tail; the pharyngeal nervous system is excluded as it is essentially isolated from the rest of the graph) and clustered within each group by average-linkage hierarchical clustering. The matrix is transposed so seed classes index the rows.

Bilateral pairs are summed into cell classes (`AVAL/AVAR → AVA`, `AVDL/AVDR → AVD`, `IL2DL/IL2DR → IL2D`) on both axes via a regex that strips the trailing L/R only when it follows a capital letter (deliberately *not* including `DL|DR|VL|VR` as alternatives — Python's leftmost-first alternation would otherwise turn `AVDL` into `AV` instead of `AVD`).

The matrix shows the **raw** `adjusted_influence` values directly with no per-row min-max rescaling; with `lambda_max = 0.5` the leading mode is damped enough that per-target seed specificity is already legible, so a min-max normalisation step is unnecessary.

**Why.** A connectome library without a worked example is hard to evaluate. The *C. elegans* example is small enough to run in seconds, recognised enough that the resulting heatmap can be eyeballed against the literature (sensory → command-interneuron paths, body-touch → ventral cord motor blocks, phasmid → AVA/AVD), and structured enough to demonstrate the full library API end-to-end.

**`const` auto-calibration.** Rather than hardcoding `const=24`, the example computes `const = -log(min_nonzero |raw|)` over the per-row influence scores so the smallest non-zero magnitude maps exactly to 0 after the log transform, eliminating an arbitrary floor and adapting cleanly to different `lambda_max` choices.

**Files.** `examples/celegans_worked_example.py` (new); `docs/images/influence_heatmap_unsigned.png` and `docs/images/influence_heatmap_signed.png` (regenerated each run).

---

## 8. Tests overhaul

**Change.** `tests/test_influence_calculator.py` is rewritten as discrete pytest functions, fed by `tests/conftest.py` fixtures that use `importlib.resources.as_file()` to expose the bundled CSVs as filesystem paths. New tests:

- `test_format_equivalence` — `from_dataframes`, `from_csv`, and `from_numpy` agree on neuron count, matrix size, and ID universe.
- `test_adjust_influence_basics` / `_threshold_floor` / `_preserves_sign` — covers the new module-level transform.
- `test_input_validation_missing_columns` / `_signed_no_top_nt`.
- `test_excluded_nts_removes_edges` / `_requires_top_nt`.
- `test_norm_auto_computation` — `'norm'` is computed when absent.
- `test_round_trip_smoke` — full pipeline from CSV → `calculate_influence` → `adjust_influence`, gated on PETSc/SLEPc availability.

PETSc/SLEPc-dependent tests use `pytest.importorskip` so the suite is runnable in environments without those libraries.

**Files.** `tests/conftest.py` (new); `tests/test_influence_calculator.py` (rewritten); `pyproject.toml` (`[tool.pytest.ini_options]` plus a `test` extra).

---

## 9. `pyproject.toml` modernisation

**Change.** Bumped `setuptools >= 77` (so the SPDX-string `license = "BSD-3-Clause"` syntax from PEP 639 is accepted), set `requires-python >= 3.10`, version `0.2.0`, declared optional extras (`parquet`, `test`, `examples`, `dev`), added a `package-data` block for the bundled CSVs, and a `[tool.pytest.ini_options]` section pointing to `tests/`.

**Why.** The pre-existing setup could not install on a machine with newer setuptools, which warns about the dual-purpose `license` field; the new form is the documented PEP 639 spelling. The optional extras mean a CI image can install just what it needs (`pip install .[test]`) rather than every parquet dependency.

**Files.** `pyproject.toml`.

---

## 10. README — restructured around the new knobs

**Change.** The README is reorganised so the three things a user actually tunes — `inhibitory_nts` / `excluded_nts`, `lambda_max`, and `const` for `adjust_influence` — each have one canonical home:

- The **Description** section now derives `W̃ = (λ / λ_max(W)) · W` with `λ` as a tuneable target (the `lambda_max` argument), and retains the explicit gloss *"where λ_max(W) is the largest real eigenvalue of W, and λ is the desired largest real eigenvalue of W̃"* matching the original phrasing. It carries both the technical explanation (gain = `1/(1-lambda_max)` along the leading recurrent mode) and a short "reverb knob" metaphor: high `lambda_max` makes the network echo signals through long indirect paths, low `lambda_max` keeps the signal local. Includes inline species guidance — `0.99` *(seems appropriate for the whole-CNS Drosophila BANC connectome and larger graphs)*, near `0.5` *(more appropriate for the C. elegans connectome, where the graph is small enough that the leading mode otherwise washes the heatmap out)*.
- The **"How W is filled"** sentence in the Description was rewritten to make the input normalisation explicit. The original said the matrix is "filled with the number of synaptic connections that a presynaptic neuron projects onto a postsynaptic neuron", which described `syn_weight_measure='count'` rather than the actual default (`'norm'`). The new wording makes clear that each entry is *the fraction of a postsynaptic neuron's total drive that comes from a given upstream partner* and explains the biological rationale (per-edge weights need to be comparable across neurons that vary widely in size and total input count).
- A new **"`adjust_influence`: log-compression and grouping"** section explains why the function exists (raw scores span ten orders of magnitude), the `const` floor as a junk-node cutoff, and the difference between the three output columns (`adjusted_influence` vs the two normalised variants) — borrowing framing from the R sibling package's documentation. Includes the `adjusted_influence_vs_traversal.jpg` figure with an expanded caption that defines the x-axis (graph-traversal depth = mean number of synaptic hops in shortest-path BFS) and reads off the intuition: each polysynaptic step costs ≈ 1.3 units of `adjusted_influence`, so the score maps directly onto effective polysynaptic distance.
- A new **"Worked example: *C. elegans* connectome"** section embeds the two regenerated heatmaps and shows a minimum-viable end-to-end snippet. A short knobs table cross-references the Description and `adjust_influence` sections rather than re-explaining each parameter. Detailed biology (the *Drosophila* vs *C. elegans* NT-set comparison, the cholinergic-fraction callout that explains the wide blank band on the signed heatmap's seed axis) lives as comments in `examples/celegans_worked_example.py` rather than in the README.
- A short **"Data source"** subsection attributes the bundled CSVs to the OpenWorm project distribution (accessed February 2026) with prose citations to White et al. 1986 and Cook et al. 2019. Full BibTeX lives in the docstring of `InfluenceCalculator/data/__init__.py` (so `help(.data)` surfaces it) rather than cluttering the README.
- The **BANC Dataset** section now lists, alongside the existing Dataverse DOI, the lab's public Google Cloud Storage path for the Feather-formatted edge list (`gs://lee-lab_brain-and-nerve-cord-fly-connectome/compiled_data/banc_888/banc_888_edgelist_simple_v2.feather`), and notes that it loads directly through `from_feather`.
- The **Usage** section now also lists the alternative constructors (`from_dataframes`, `from_csv`, `from_parquet`, `from_feather`, `from_numpy`) alongside the original SQLite path, names the required edgelist columns (`pre`, `post`, `count` or `weight`, optional `norm`) and metadata columns (`root_id`, plus `top_nt` when `signed=True` or `excluded_nts` is set), and explicitly states that **missing columns raise a `ValueError` that names the required columns and lists the columns the user actually passed** — fail-fast with an actionable message rather than a silent bad result.
- A one-line cross-link to [`natverse/influencer`](https://github.com/natverse/influencer) appears at the top of the Description section.
- Four images are embedded inline:

| image | location | role |
|---|---|---|
| `seed_to_targets_diagram.jpg` | top of Description | conceptual schematic of source → targets propagation |
| `linear_dynamical_model.png` | next to the ODE | annotated breakdown of the linear-dynamics equation (terms + BANC-scale dimensions) |
| `neural_network_dynamics.gif` | after the steady-state equation | 12-second propagation animation on a 28-node toy graph showing convergence to steady state (auto-renders inline; converted from a source `.mp4` via two-pass palette `ffmpeg`, source deleted) |
| `adjusted_influence_vs_traversal.jpg` | in the `adjust_influence` section | scatter of adjusted_influence vs graph-traversal depth on BANC, showing the near-linear scaling (R² = 0.94) |

The `seed_to_targets` and `adjusted_influence_vs_traversal` images are pulled from the R sibling package [`natverse/influencer`](https://github.com/natverse/influencer); `linear_dynamical_model.png` and `neural_network_dynamics.gif` are bespoke for this repo.

**Files.** `README.md`; `InfluenceCalculator/data/__init__.py` (BibTeX moved into module docstring); `examples/celegans_worked_example.py` (Drosophila / *C. elegans* NT comparison + cholinergic-fraction comment absorbed); `docs/images/seed_to_targets_diagram.jpg`, `docs/images/linear_dynamical_model.png`, `docs/images/neural_network_dynamics.gif`, `docs/images/adjusted_influence_vs_traversal.jpg` (new).

---

## 11. `.gitignore`

**Change.** Added `__pycache__/`, `.pytest_cache/`, `.venv/`, `Influence/` (test-output directory), and `CLAUDE.md` (working notes, not for distribution). `.DS_Store` was already there.

**Files.** `.gitignore`.

---

## Files affected — index

| File | Status | Summary |
|------|--------|---------|
| `InfluenceCalculator/InfluenceCalculator.py` | modified | constructors, `adjust_influence`, sign preservation, `lambda_max`, NT externalisation, signed-mode bug fix |
| `InfluenceCalculator/__init__.py` | modified | export `adjust_influence` |
| `InfluenceCalculator/data/__init__.py` | new | `celegans_edgelist()`, `celegans_meta()`; module docstring carries OpenWorm + White 1986 + Cook 2019 BibTeX |
| `InfluenceCalculator/data/celegans_edgelist.csv` | new | bundled edge list |
| `InfluenceCalculator/data/celegans_meta.csv` | new | bundled metadata |
| `examples/celegans_worked_example.py` | new | worked example, generates heatmaps |
| `docs/images/influence_heatmap_unsigned.png` | new | example output (regenerated) |
| `docs/images/influence_heatmap_signed.png` | new | example output (regenerated) |
| `docs/images/seed_to_targets_diagram.jpg` | new | source → targets schematic (Description) |
| `docs/images/linear_dynamical_model.png` | new | annotated linear-dynamics ODE (Description) |
| `docs/images/neural_network_dynamics.gif` | new | propagation-to-steady-state animation (Description) |
| `docs/images/adjusted_influence_vs_traversal.jpg` | new | adjusted_influence vs graph-traversal depth (`adjust_influence` section) |
| `tests/conftest.py` | new | importlib.resources fixtures |
| `tests/test_influence_calculator.py` | rewritten | 11 discrete pytest functions |
| `pyproject.toml` | modified | setuptools ≥ 77, extras, package-data, pytest config |
| `.gitignore` | modified | cache and working-note ignores |
| `README.md` | modified | `lambda_max` reverb-knob explanation in Description; `adjust_influence` section; worked-example section; data citations; `natverse/influencer` cross-link |
| `update.md` | new | this document |
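As a footnote to the worked-example entry: the capital-letter-gated L/R stripping described there can be written with a lookbehind rather than alternation, sidestepping the leftmost-first trap that would turn `AVDL` into `AV` (illustrative pattern, not necessarily the script's exact regex):

```python
import re

def collapse_bilateral(name):
    """Strip a trailing L or R only when it follows a capital letter,
    so AVDL -> AVD and IL2DR -> IL2D rather than AVDL -> AV."""
    return re.sub(r"(?<=[A-Z])[LR]$", "", name)

classes = {n: collapse_bilateral(n) for n in ["AVAL", "AVAR", "AVDL", "IL2DR", "AVA"]}

# The alternation the text warns against: "DL" matches at the leftmost
# position, so AVDL collapses one letter too far.
too_greedy = re.sub(r"(DL|DR|VL|VR|L|R)$", "", "AVDL")
```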
These updates attempt to solve a suite of medium-level issues. This will help improve the repo's usability, I feel!
Here is the detail on these changes:
ConnectomeInfluenceCalculator — Update Notes
A summary of the conceptual changes made on the working tree (no commits yet). Each entry gives the change, the reason for it, and the files touched.
1. DataFrame / CSV / Parquet / Feather / NumPy constructors
Change. The library now accepts pandas DataFrames and several common on-disk formats in addition to the original SQLite path:
InfluenceCalculator.from_dataframes(edgelist_df, meta_df=None, ...)InfluenceCalculator.from_csv(edgelist_path, meta_path=None, ...)InfluenceCalculator.from_parquet(...),from_feather(...)InfluenceCalculator.from_numpy(adjacency_matrix, neuron_ids=None, ...)The old
InfluenceCalculator(filename, ...)SQLite path still works.Why. The original API forced every caller to package their connectome into a SQLite file with a specific schema, which is awkward for ad-hoc exploration, for users coming from R / pandas pipelines, and for the worked example in this repo. The DataFrame constructor accepts the same columns as the SQLite schema (
pre,post,count, optionalnorm) plus a metadata frame withroot_idand (when relevant)top_nt.Files.
InfluenceCalculator/InfluenceCalculator.py— addedfrom_*classmethods plus two module-level helpers,_validate_metaand_validate_and_prepare_edgelist, that enforce the column requirements with descriptive error messages.2. Bundled C. elegans dataset
Change. A small C. elegans connectome (300 neurons, 3,539 chemical edges, 20,672 synapses) ships with the package and is exposed as:
Why. Tests and examples need a real connectome they can build a calculator from without external downloads. The previous toy
tests/toy_network_example.sqlitewas opaque and not documented; the C. elegans graph is small, public, and well-annotated, which lets the worked example double as a tutorial against a famous connectome.Files.
InfluenceCalculator/data/__init__.py(new — usesimportlib.resources.files()so it works whether installed or run in-tree);InfluenceCalculator/data/celegans_edgelist.csvandInfluenceCalculator/data/celegans_meta.csv(new);pyproject.tomlgains a[tool.setuptools.package-data]entry so the CSVs are included in the wheel.3. Module-level
adjust_influenceChange. A new function
adjust_influence(df, const=24, signif=6)is exported alongsideInfluenceCalculator. It takes the DataFrame returned bycalculate_influence, groups by(target, seed), sums within each group, and returns three columns:adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)adjusted_influence_norm_by_targetsadjusted_influence_norm_by_sources_and_targetsWhy. Raw influence scores span many orders of magnitude — the strongest direct paths can be ten billion times larger than the weakest distal trickle — so a linear-scaled heatmap shows the top of the distribution and nothing else.
adjust_influencedoes three things to make the output legible:log(...)compresses the dynamic range so weak and strong paths are visible side by side.max(|x|, exp(-const))is a junk-node floor: anything weaker thanexp(-const)is treated as "essentially zero", which keepslog(0)from producing-infand stops the colormap getting hijacked by numerical noise.+ constthen shifts everything so the smallest meaningful score sits at exactly 0 and a colormap can be anchored there without losing sign information.Mirrors
adjust_influencein the R sibling packagenatverse/influencer.Files.
InfluenceCalculator/InfluenceCalculator.py(function defined at module scope);InfluenceCalculator/__init__.py(exported in__all__);tests/test_influence_calculator.py(test_adjust_influence_basics,test_adjust_influence_threshold_floor,test_adjust_influence_preserves_sign).4. Sign-preserving signed mode
Change. When
signed=Trueis set, the influence DataFrame now returns the real part of the steady-state vector rather than its magnitude — so a target dominated by inhibitory paths gets a negativeInfluence_score_(signed).adjust_influencepropagates the sign through the log transform.Why. Without sign preservation, "signed" mode silently degenerated to unsigned: every output was a non-negative magnitude, and the only remaining effect of GABA-pre-neuron negation was to reduce magnitudes where positive and negative paths cancelled. That cancellation is real but undetectable from the output, which makes the signed flag indistinguishable from a perturbed unsigned run.
While auditing this we also found a pre-existing bug in
_create_sparse_W. To makesigned=Trueproduce negative weights for inhibitory pre-neurons, the original code multiplied the relevant rows of the edgelist'scountcolumn by-1. But the matrixWis populated from thenormcolumn (the fractioncount / sum_per_post), not fromcount. Flipping the sign ofcountleftnormpositive, so every entry ofWended up positive regardless of howsigned=was set — the signed setting silently built the same matrix as the unsigned setting. The fix multiplies whichever column is actually used to buildW(held in thesyn_weight_measurevariable, which defaults tonorm), so the negation now reaches the matrix.Files.
InfluenceCalculator/InfluenceCalculator.py—_build_influence_dataframenow branches onself.W_signed;_create_sparse_Wnegatessyn_weight_measureinstead of'count'.5. Externalised neurotransmitter assignment (
inhibitory_nts,excluded_nts)Change. The old hardcoded
`NEG_NEUROTRANSMITTERS = {'glutamate', 'gaba', 'serotonin', 'octopamine'}` constant has been removed. Two keyword arguments now pass through the constructor and every classmethod:

- `inhibitory_nts={'gaba', ...}` — `top_nt` values whose pre-neurons receive a negative sign in `signed=True` mode.
- `excluded_nts={'dopamine', 'serotonin', ...}` — `top_nt` values whose pre-neurons contribute zero outgoing weight (their columns of `W` are empty), in either signed or unsigned mode.

**Why.** `excluded_nts` lets users silence pre-neurons whose net effect cannot be assigned a single sign, rather than forcing a wrong one.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (constant removed; new kwargs threaded through `__init__` and all `from_*` classmethods; validation raises `ValueError` if `signed=True` is set without `inhibitory_nts`, or if `excluded_nts` is used without a `meta_df` containing `top_nt`); `tests/test_influence_calculator.py` gains `test_excluded_nts_removes_edges` and `test_excluded_nts_requires_top_nt`.

---

## 6. Exposed `lambda_max` as a documented parameter

**Change.** The target largest real eigenvalue of the rescaled connectivity matrix W̃, previously hardcoded to `0.99` inside `_normalize_W`, is now a constructor argument, `lambda_max`. The default remains `0.99` for backwards compatibility. `_normalize_W` now always rescales to `lambda_max` exactly (rather than only capping when the natural eigenvalue exceeds it), so the parameter is a true control knob over leading-mode amplification rather than a stability ceiling.

**Why.** Think of `lambda_max` as a reverb knob on the network. Near 1, a signal injected at the seed echoes around the graph many times before fading — the gain along the dominant recurrent loop is `1/(1 - lambda_max)`, so `100×` at `0.99` versus `2×` at `0.5`. Crank it to the max and the dominant loop drowns out finer differences between targets: every column of the heatmap ends up with nearly the same shape. Turn it down and the signal mostly travels along short paths, exposing per-target specificity at the cost of attenuating long polysynaptic effects.

Which value is "right" depends on the connectome. The default `0.99` is calibrated for a whole-CNS Drosophila graph (BANC-scale, ~130k neurons), where you want maximum sensitivity to weak distal influence. On a small graph like the C. elegans connectome the same setting puts the leading mode in charge of the entire heatmap; `0.5` is a more useful starting point. The point of exposing the parameter is that this is a knob users should be turning, not a hidden constant.

The trade-off is shown by the worked example sweep (28 canonical sensory→interneuron pairs from the C. elegans literature, ranked within each seed's column; mean column std as a leading-mode dominance proxy):
| `lambda_max` | canonical mean rank-frac | mean col std (info) |
|--------------|--------------------------|---------------------|
| 0.10         | 0.931                    | 0.160               |
| 0.30         | 0.929                    | 0.162               |
| 0.50         | 0.932                    | 0.149               |
| 0.70         | 0.933                    | 0.131               |
| 0.90         | 0.937                    | 0.105               |
| 0.99         | 0.921                    | 0.042               |
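The reverb-knob arithmetic behind these numbers (gain = `1/(1 - lambda_max)` along the dominant loop) can be sketched directly. This is illustrative arithmetic only, not library code; `leading_mode_gain` is a hypothetical helper name:

```python
# Illustrative arithmetic (not library code): the steady-state gain along the
# dominant recurrent loop is the geometric series
#   1 + lam + lam**2 + ... = 1 / (1 - lam),
# the "reverb" factor described above.
def leading_mode_gain(lam: float) -> float:
    """Echo gain along the dominant recurrent loop for a given lambda_max."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lambda_max must lie in [0, 1) for the series to converge")
    return 1.0 / (1.0 - lam)

print(leading_mode_gain(0.99))  # ~100x: long polysynaptic echoes dominate
print(leading_mode_gain(0.50))  # 2x: signal stays mostly on short paths
```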
Canonical-pair scores barely move (the strongest direct paths win at any λ), but column differentiation collapses by ~4× between λ = 0.90 and λ = 0.99 — the leading-mode dominance signature. The worked example defaults to `lambda_max=0.5` as the balance: canonical hits intact, columns clearly differentiated, some polysynaptic integration retained.

**Files.** `InfluenceCalculator/InfluenceCalculator.py` (parameter added and validated on every constructor; `_normalize_W` reads `self.lambda_max`); `examples/celegans_worked_example.py` (defines `LAMBDA_MAX = 0.5` and surfaces it in the heatmap title).

---

## 7. Worked example — `examples/celegans_worked_example.py`

**Change.** A self-contained script that loads the bundled C. elegans graph, computes per-seed influence from every sensory neuron (83 seeds, summed into 46 cell classes after collapsing bilateral pairs) onto every non-sensory target (187 → 136 cell classes), log-adjusts the per-(target_class, seed_class) raw scores via `adjust_influence`, and renders two heatmaps in `docs/images/`:

- `influence_heatmap_unsigned.png` (sequential greyscale, [0, max])
- `influence_heatmap_signed.png` (diverging blue→white→red, [−bound, +bound])

The seed and target axes are grouped by `body_part` (body / head / tail; the pharyngeal nervous system is excluded, as it is essentially isolated from the rest of the graph) and clustered within each group by average-linkage hierarchical clustering. The matrix is transposed so seed classes index the rows.

Bilateral pairs are summed into cell classes (`AVAL`/`AVAR` → `AVA`, `AVDL`/`AVDR` → `AVD`, `IL2DL`/`IL2DR` → `IL2D`) on both axes via a regex that strips the trailing L/R only when it follows a capital letter (deliberately not including `DL|DR|VL|VR` as alternatives — Python's leftmost-first alternation would otherwise turn `AVDL` into `AV` instead of `AVD`). The matrix shows the raw `adjusted_influence` values directly with no per-row min-max rescaling; with `lambda_max = 0.5` the leading mode is damped enough that per-target seed specificity is already legible, so a min-max normalisation step is unnecessary.

**Why.** A connectome library without a worked example is hard to evaluate. The C. elegans example is small enough to run in seconds, recognised enough that the resulting heatmap can be eyeballed against the literature (sensory → command-interneuron paths, body-touch → ventral cord motor blocks, phasmid → AVA/AVD), and structured enough to demonstrate the full library API end-to-end.

**`const` auto-calibration.** Rather than hardcoding `const=24`, the example computes `const = -log(min_nonzero |raw|)` over the per-row influence scores, so the smallest non-zero magnitude maps exactly to 0 after the log transform — eliminating an arbitrary floor and adapting cleanly to different `lambda_max` choices.

**Files.** `examples/celegans_worked_example.py` (new); `docs/images/influence_heatmap_unsigned.png` and `docs/images/influence_heatmap_signed.png` (regenerated each run).

---

## 8. Tests overhaul
**Change.** `tests/test_influence_calculator.py` is rewritten as discrete pytest functions, fed by `tests/conftest.py` fixtures that use `importlib.resources.as_file()` to expose the bundled CSVs as filesystem paths. New tests:

- `test_format_equivalence` — `from_dataframes`, `from_csv`, and `from_numpy` agree on neuron count, matrix size, and ID universe.
- `test_adjust_influence_basics` / `_threshold_floor` / `_preserves_sign` — covers the new module-level transform.
- `test_input_validation_missing_columns` / `_signed_no_top_nt`.
- `test_excluded_nts_removes_edges` / `_requires_top_nt`.
- `test_norm_auto_computation` — `'norm'` is computed when absent.
- `test_round_trip_smoke` — full pipeline from CSV → `calculate_influence` → `adjust_influence`, gated on PETSc/SLEPc availability.

PETSc/SLEPc-dependent tests use `pytest.importorskip` so the suite is runnable in environments without those libraries.

**Files.** `tests/conftest.py` (new); `tests/test_influence_calculator.py` (rewritten); `pyproject.toml` (`[tool.pytest.ini_options]` plus a `test` extra).

---

## 9. `pyproject.toml` modernisation

**Change.** Bumped `setuptools >= 77` (so the SPDX-string `license = "BSD-3-Clause"` syntax from PEP 639 is accepted), set `requires-python >= 3.10` and version `0.2.0`, declared optional extras (`parquet`, `test`, `examples`, `dev`), added a `package-data` block for the bundled CSVs, and added a `[tool.pytest.ini_options]` section pointing to `tests/`.

**Why.** The pre-existing setup could not install on a machine with newer setuptools, which emits warnings about the dual-purpose `license` field; the new form is the documented PEP 639 spelling. The optional extras mean a CI image can install just what it needs (`pip install .[test]`) rather than every parquet dependency.

**Files.** `pyproject.toml`.

---

## 10. README — restructured around the new knobs
**Change.** The README is reorganised so the three things a user actually tunes — `inhibitory_nts`/`excluded_nts`, `lambda_max`, and `const` for `adjust_influence` — each have one canonical home:

- The Description section now derives `W̃ = (λ / λ_max(W)) · W` with λ as a tuneable target (the `lambda_max` argument), and retains the explicit gloss "where λ_max(W) is the largest real eigenvalue of W, and λ is the desired largest real eigenvalue of W̃", matching the original phrasing. It carries both the technical explanation (gain = `1/(1 - lambda_max)` along the leading recurrent mode) and a short "reverb knob" metaphor: high `lambda_max` makes the network echo signals through long indirect paths, low `lambda_max` keeps the signal local. It includes inline species guidance — `0.99` seems appropriate for the whole-CNS Drosophila BANC connectome and larger graphs; near `0.5` is more appropriate for the C. elegans connectome, where the graph is small enough that the leading mode otherwise washes the heatmap out.

- The "How W is filled" sentence in the Description was rewritten to make the input normalisation explicit. The original said the matrix is "filled with the number of synaptic connections that a presynaptic neuron projects onto a postsynaptic neuron", which described `syn_weight_measure='count'` rather than the actual default (`'norm'`). The new wording makes clear that each entry is the fraction of a postsynaptic neuron's total drive that comes from a given upstream partner, and explains the biological rationale (per-edge weights need to be comparable across neurons that vary widely in size and total input count).

- A new "`adjust_influence`: log-compression and grouping" section explains why the function exists (raw scores span ten orders of magnitude), the `const` floor as a junk-node cutoff, and the difference between the three output columns (`adjusted_influence` vs the two normalised variants) — borrowing framing from the R sibling package's documentation. It includes the `adjusted_influence_vs_traversal.jpg` figure with an expanded caption that defines the x-axis (graph-traversal depth = mean number of synaptic hops in shortest-path BFS) and reads off the intuition: each polysynaptic step costs ≈ 1.3 units of `adjusted_influence`, so the score maps directly onto effective polysynaptic distance.

- A new "Worked example: C. elegans connectome" section embeds the two regenerated heatmaps and shows a minimum-viable end-to-end snippet. A short knobs table cross-references the Description and `adjust_influence` sections rather than re-explaining each parameter. Detailed biology (the Drosophila vs C. elegans NT-set comparison, the cholinergic-fraction callout that explains the wide blank band on the signed heatmap's seed axis) lives as comments in `examples/celegans_worked_example.py` rather than in the README.

- A short "Data source" subsection attributes the bundled CSVs to the OpenWorm project distribution (accessed February 2026) with prose citations to White et al. 1986 and Cook et al. 2019. Full BibTeX lives in the docstring of `InfluenceCalculator/data/__init__.py` (so `help(.data)` surfaces it) rather than cluttering the README.

- The BANC Dataset section now lists, alongside the existing Dataverse DOI, the lab's public Google Cloud Storage path for the Feather-formatted edge list (`gs://lee-lab_brain-and-nerve-cord-fly-connectome/compiled_data/banc_888/banc_888_edgelist_simple_v2.feather`), and notes that it loads directly through `from_feather`.

- The Usage section now also lists the alternative constructors (`from_dataframes`, `from_csv`, `from_parquet`, `from_feather`, `from_numpy`) alongside the original SQLite path, names the required edgelist columns (`pre`, `post`, `count` or `weight`, optional `norm`) and metadata columns (`root_id`, plus `top_nt` when `signed=True` or `excluded_nts` is set), and explicitly states that missing columns raise a `ValueError` that names the required columns and lists the columns the user actually passed — fail-fast with an actionable message rather than a silent bad result.

- A one-line cross-link to `natverse/influencer` appears at the top of the Description section.

Four images are embedded inline:
| image | location | role |
|---|---|---|
| `seed_to_targets_diagram.jpg` | top of Description | conceptual schematic of source → targets propagation |
| `linear_dynamical_model.png` | next to the ODE | annotated breakdown of the linear-dynamics equation (terms + BANC-scale dimensions) |
| `neural_network_dynamics.gif` | after the steady-state equation | 12-second propagation animation on a 28-node toy graph showing convergence to steady state (auto-renders inline; converted from a source `.mp4` via two-pass palette ffmpeg, source deleted) |
| `adjusted_influence_vs_traversal.jpg` | in the `adjust_influence` section | scatter of adjusted_influence vs graph-traversal depth on BANC, showing the near-linear scaling (R² = 0.94) |

The `seed_to_targets` and `adjusted_influence_vs_traversal` images are pulled from the R sibling package `natverse/influencer`; `linear_dynamical_model.png` and `neural_network_dynamics.gif` are bespoke for this repo.

**Files.**
`README.md`; `InfluenceCalculator/data/__init__.py` (BibTeX moved into module docstring); `examples/celegans_worked_example.py` (Drosophila / C. elegans NT comparison + cholinergic-fraction comment absorbed); `docs/images/seed_to_targets_diagram.jpg`, `docs/images/linear_dynamical_model.png`, `docs/images/neural_network_dynamics.gif`, `docs/images/adjusted_influence_vs_traversal.jpg` (new).

---

## 11. `.gitignore`

**Change.** Added `__pycache__/`, `.pytest_cache/`, `.venv/`, `Influence/` (test-output directory), and `CLAUDE.md` (working notes, not for distribution). `.DS_Store` was already there.

**Files.** `.gitignore`.

---

## Files affected — index

- `InfluenceCalculator/InfluenceCalculator.py` — `adjust_influence`, sign preservation, `lambda_max`, NT externalisation, signed-mode bug fix
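As a footnote to the bilateral-pair collapse in the worked-example section, the leftmost-first alternation pitfall is easy to reproduce. A minimal sketch (hypothetical helper names; the exact expression in `examples/celegans_worked_example.py` may differ in detail):

```python
import re

# Strip a trailing L/R only when it follows a capital letter, as described in
# the worked-example notes. The lookbehind keeps AVDL -> AVD.
def collapse_pair(name: str) -> str:
    return re.sub(r'(?<=[A-Z])[LR]$', '', name)

# The tempting alternation form shows the pitfall: Python's regex alternation
# is leftmost-first, so 'DL' matches before the bare 'L' can, and AVDL
# collapses to AV instead of AVD.
def collapse_pair_naive(name: str) -> str:
    return re.sub(r'(DL|DR|VL|VR|L|R)$', '', name)

print(collapse_pair('AVDL'))        # AVD  (intended cell class)
print(collapse_pair_naive('AVDL'))  # AV   (wrong: 'DL' wins leftmost-first)
print(collapse_pair('IL2DL'))       # IL2D
```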