diff --git a/.gitignore b/.gitignore
index 6684cea..3063160 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,6 @@ _site/
 
 # Jekyll metadata
 .jekyll-metadata
+
+# Vale linter styles, fetched via `vale sync`; not for commit
+.vale-styles/
diff --git a/.markdownlint.jsonc b/.markdownlint.jsonc
new file mode 100644
index 0000000..b591d79
--- /dev/null
+++ b/.markdownlint.jsonc
@@ -0,0 +1,11 @@
+{
+  // CV and prose use LaTeX-style ventilated prose: one sentence per line.
+  // Disable hard line-length cap; sentences run as long as they need to.
+  "MD013": false,
+  // Ordered list flexibility for publication numbering.
+  "MD029": false,
+  // Allow inline HTML (we use angle brackets for emails).
+  "MD033": false,
+  // Don't require H1 at top of file (Jekyll layout supplies title).
+  "MD041": false
+}
diff --git a/.vale.ini b/.vale.ini
new file mode 100644
index 0000000..cbfb900
--- /dev/null
+++ b/.vale.ini
@@ -0,0 +1,7 @@
+StylesPath = .vale-styles
+MinAlertLevel = warning
+Packages = write-good, proselint
+Vocab = ctr26
+
+[*.md]
+BasedOnStyles = write-good, proselint
diff --git a/docs/sanger-gdm-cover-letter.md b/docs/sanger-gdm-cover-letter.md
new file mode 100644
index 0000000..a6f0ec0
--- /dev/null
+++ b/docs/sanger-gdm-cover-letter.md
@@ -0,0 +1,81 @@
+# Cover Letter: Google DeepMind Fellow in Genomics and AI
+
+Dr Craig T. Russell
+London, UK · · [ctr26.github.io](https://ctr26.github.io)
+
+To the Fellowship Review Panel,
+Wellcome Sanger Institute & Google DeepMind Academic Fellowship Programme
+
+---
+
+## Why Sanger, and why now
+
+This Fellowship sits where my work is heading.
+I want to take the multi-modal foundation-model approaches I have built in industry and ground them in the world's richest genomics environment, to deliver fundamental biological insight.
+Sanger is the only place in the UK where atlas-scale single-cell data (Cellular Genomics), a programme committed to predictive biology (Generative Genomics), and a strategic embrace of AI for science share one building.
+I want to spend three years here.
+
+I am a Senior Machine Learning Scientist at Valence Labs / Recursion Pharmaceuticals.
+I lead components of the Virtual Cell initiative, fine-tuning multi-modal LLMs over knowledge graphs, free text, transcriptomic and phenotypic imaging data to predict cell state and drug response.
+I co-developed TxPert (arXiv:2505.14919), a state-of-the-art transcriptomic perturbation predictor, and contributed to the Boltz2 proteome-scale virtual screening pipeline.
+Before Recursion, I led AI engineering at EMBL-EBI's BioImage Archive and the Uhlmann Group, supervised six PhD students, and shipped `bioimage_embed` (self-supervised learning for biological images) and `shape_embed` (cell-shape representation, arXiv:2507.01009) as community resources.
+My PhD (Cambridge, 2018) was an instrument-building project: I designed and built a light-sheet microscope from scratch.
+I am as comfortable at the data-generation end of the stack as at the modelling end.
+
+## My Grand Challenges
+
+I would primarily target Grand Challenge 4: *Harnessing multimodal generative AI to study the interplay of genetics with spatial transcriptomics, proteomics and morphology in human tissues*, led by Mo Lotfollahi and Muzlifah Haniffa.
+The downstream goal of predicting genetic variation from routine clinical pathology slides sits exactly at the intersection of the multi-modal foundation models I have built at Recursion (TxPert; the Virtual Cell initiative integrating knowledge graphs, transcriptomics, free text and phenomics; Boltz2 components) and the imaging methods I shipped at EMBL-EBI (`bioimage_embed`, `shape_embed`, MIFA).
+Sanger's >200 million curated cells across H&E, Xenium, scRNA-seq and ATAC are an unparalleled training corpus for cross-modal generative architectures.
+
+I would secondarily consider Grand Challenge 5 (*A Foundation sequence model of the Cis-Regulatory Code in developing human tissues*, Lotfollahi/Taipale) and Grand Challenge 2 (*Predictive and Mechanistic AI for Longitudinal Multi-Omics in Complex Disease*, Anderson) as natural adjacencies, but Challenge 4 is where my skills fit best.
+
+The technical novelty I would bring has four parts.
+First, integrate Sanger's H&E + Xenium + scRNA-seq + ATAC corpora under a single multi-expert architecture with a shared latent space, with knowledge graphs as structural priors and curated literature as semantic anchors.
+The architectural lineage is the multi-modal LLM-over-omics work I have shipped at Valence; the data scale is what makes Sanger the right home for it.
+Second, evaluate not only held-out reconstruction but counterfactual prediction.
+Does the model give scientifically reasonable answers when asked "what would the spatial transcriptome and morphology look like for this individual's variant in this tissue?"
+Third, explicitly evaluate and correct for ancestry-shift and tissue-shift in the latent space, so that variant-to-pathology predictions are anchored in data the model has actually seen rather than extrapolated unsafely from a single reference cohort.
+This connects directly to my outreach plan below.
+
+Fourth is the part I would most like to push the field on: build an agentic discovery layer on top of the foundation model, walled in by inference-time engineering so that hallucination is contained to scientifically tolerable failure modes.
+Naive agentic LLMs in biology fail loudly.
+The agent invents a gene name, fabricates a citation, or proposes a perturbation that exists in no cell line.
+The engineering response is to refuse free generation over the things that matter.
+Concretely, that means constrained decoding restricted to a closed vocabulary of valid Ensembl IDs, HGNC symbols, and ChEMBL identifiers; tool-call verification against Sanger's own structured resources (Cellular Genomics atlases, Tree of Life genomes, the human variant catalogues), so that every claim round-trips through real data before the agent emits it; retrieval grounding to a curated literature corpus with citation-level attribution; and verifier-model ensembles that score each hypothesis for biological plausibility before it leaves the agent.
+The result is an agent that proposes follow-up experiments (variants to validate, tissues to oversample, perturbations to run) but only inside a sandbox where every leaf claim is checkable.
+This is the missing piece between an LLM that talks about biology and an autonomous research collaborator.
+GDM mentorship is most directly useful here: the inference-engineering and constrained-generation work at DeepMind on AlphaFold's confidence calibration and the AlphaProof verifier loop is the lineage I want to bring into genomics.
+
+Three deliverables in year one.
+(i) A Sanger-anchored multimodal foundation model trained jointly on Cellular Genomics atlases (H&E + Xenium + scRNA-seq + ATAC), with public benchmarks for cross-modal counterfactual prediction.
+(ii) A variant-to-pathology benchmark co-designed with Lotfollahi and Haniffa's groups that scores models on ancestry-stratified validation cohorts rather than convenience samples.
+(iii) An open-source agentic discovery harness with built-in hallucination suppression (constrained decoding, tool-call verification, retrieval grounding, verifier ensembles) that other Sanger researchers can wrap around their own foundation models.
+Year two onward extends this to the cis-regulatory code (Challenge 5) and to longitudinal cohorts (Challenge 2), where the same foundation-model backbone applies.
+
+As secondary supervisor I suggest Prof. Pietro Liò (Cambridge, Computer Laboratory) for his work on graph foundation models for biology, with Dr Virginie Uhlmann (EMBL Barcelona, formerly EMBL-EBI) as a complementary option from my prior collaboration on bioimage representation learning.
+Neither is a precondition; I am open to whichever AI/ML academic the panel matches me to.
+
+## Outreach: capacity-building for Genomics and AI in LMICs
+
+The outreach fund is what I most want to use this Fellowship for.
+I will run a recurring "Foundation Models for Genomics" workshop series, twice a year at LMIC partner institutions, one week each, producing graduates who can fine-tune and evaluate ML models on local genomic data.
+The teaching pattern is the one I ran at EMBL-EBI for 40+ researchers a year, adapted for compute-constrained settings: containerised pipelines, transfer learning from publicly hosted weights, and datasets sourced where possible from the host country's own collections.
+
+The first workshop I would scope sits with the H3ABioNet / H3Africa community, who have spent over a decade building African genomic infrastructure and have been systematically underserved by the foundation-model wave.
+Year-two and year-three editions expand to South-East Asia (probably anchored at the MORU collaboration in Bangkok or NUS in Singapore) and Latin America.
+Each edition leaves behind a containerised teaching environment that runs on modest hardware, a small number of fine-tuned community models on locally relevant tasks, and a peer cohort of trainers who can run later editions independently.
+I would budget the fund for travel cost-sharing for participants who cannot self-fund, because that is where most "open" programmes silently fail.
+
+Beyond the workshops I would use the ambassadorial role to push for honest evaluation practices in published genomics-AI work, specifically benchmarks that include ancestry- and tissue-diversity stratifications, in line with the Grand Challenge framing above.
+I have done versions of this community-shaping work already: founding the Virtual Cell Journal Club at Valence, contributing to napari and the Bioimage Model Zoo, and co-authoring the AI4LIFE €5M federated bioimage AI infrastructure grant.
+
+## In closing
+
+I am a research engineer.
+I came up through instrument-building, went into industry to ship multi-modal foundation models at scale, and want to come back and do this work in the open.
+Sanger is the one place in the UK where the data, the strategic commitment to AI, and the global-health framing live in one building.
+I want to spend the next three years there.
+
+Sincerely,
+Craig T. Russell, PhD
diff --git a/index.md b/index.md
index da446fd..ba7c9a2 100644
--- a/index.md
+++ b/index.md
@@ -1,116 +1,171 @@
 ---
-title: Machine Learning Scientist — Drug Discovery (Foundation Models)
+title: Machine Learning Scientist - Drug Discovery (Foundation Models)
 layout: default
 ---
 
-London, UK • [linkedin.com/in/ctr26](https://linkedin.com/in/ctr26) • [github.com/ctr26](https://github.com/ctr26) • [Google Scholar](https://scholar.google.com/citations?user=XVt7BYQAAAAJ&hl=en)
-*Focus:* Virtual cells, multi‑modal foundation models, omics + imaging, precision medicine
+London, UK ·
+[linkedin.com/in/ctr26](https://linkedin.com/in/ctr26) ·
+[github.com/ctr26](https://github.com/ctr26) ·
+[Google Scholar](https://scholar.google.com/citations?user=XVt7BYQAAAAJ&hl=en)
 
-## Professional Summary
-Machine Learning Scientist specialising in **virtual cell** development for drug discovery. Built and deployed **multi‑modal foundation models** that integrate knowledge graphs, text, transcriptomic and phenotypic imaging data. Experience spans molecular interactions to whole‑organism imaging. Comfortable leading across science and engineering—MLOps at TB‑scale, reproducible pipelines, and cross‑functional collaboration with biology, chemistry and platform teams.
+*Focus:* Virtual cells, multi-modal foundation models, omics and imaging, precision medicine.
 
-## Core Strengths
-Foundation models • Representation/self‑supervised learning • Multi‑modal fusion • Knowledge graphs • Biological sequence & transcriptomics • High‑content imaging • GNNs • OOD/robustness • Scaling & performance • Reproducible ML (MLOps) • Scientific communication
+## Professional summary
+
+Machine Learning Scientist specialising in virtual-cell development for drug discovery.
+I build and deploy multi-modal foundation models that integrate knowledge graphs, text, transcriptomic and phenotypic imaging data.
+My experience spans scales from molecular interactions to whole-organism imaging.
+I lead across science and engineering: MLOps at TB scale, reproducible pipelines, and cross-functional work with biology, chemistry and platform teams.
+
+## Core strengths
+
+Foundation models ·
+representation and self-supervised learning ·
+multi-modal fusion ·
+knowledge graphs ·
+agentic discovery ·
+inference-time engineering and hallucination suppression ·
+biological sequence and transcriptomics ·
+high-content imaging ·
+GNNs ·
+OOD and robustness ·
+scaling and performance ·
+reproducible ML (MLOps) ·
+scientific communication.
 
 ## Experience
 
-**Senior Machine Learning Scientist — Valence Labs @ Recursion Pharmaceuticals**
-London, UK • Oct 2024 – Present
-- **Virtual Cell initiative:** Fine‑tune multi‑modal LLMs over **knowledge graphs + text + RNA‑seq + phenotypic imaging** to model cell state and predict gene/drug responses; partnered closely with biology to design benchmark tasks and success metrics.
-- **TxPert:** Co‑developed a **state‑of‑the‑art transcriptomic perturbation predictor** using systems‑biology KGs; contributed training code, data curation, and ablations.
-- **Boltz2 project:** Contributed to proteome‑scale **virtual drug screening** components; supported evaluation strategy and error analysis across targets.
-- **Community:** Organiser, **Virtual Cell Journal Club**; fostered reading group bridging ML and wet‑lab teams.
-
-**Senior Research Associate & AI Engineering Lead — EMBL‑EBI (Uhlmann Group & Bio‑Image Archive)**
-Cambridge, UK • Dec 2022 – Oct 2024
-- **Team leadership:** Supervised **6 PhD students**; established coding standards, CI, and peer‑review practices used across the lab.
-- **Spatial biology:** Built deep‑learning pipelines for **high‑content cell morphology** and single‑cell feature learning; integrated with public bioimage resources.
-- **Open‑source:** Created **[bioimage_embed](https://github.com/uhlmanngroup/bioimage_embed)** (self‑supervised biological images) and **[shape_embed](https://arxiv.org/abs/2507.01009)** (cell‑shape DL toolkit); productionised training/inference.
-- **MLOps:** Designed scalable pipelines processing **TB‑scale microscopy datasets** across HPC and cloud; containerised workflows, automated experiment tracking.
-- **Academic service:** Reviewer — ISBI 2022/2023, **ICASSP** 2024.
-
-**AI/ML Founding Engineer — Amun AI AB**
-Stockholm, Sweden • 2022 – 2024
-- Built **GKE/Kubernetes** platform for model serving with **NVIDIA Triton/KServe**; supported **100+ models** for **30+ daily users** with auth, monitoring and autoscaling.
-
-**AI/ML Engineering Consultant — DeepMirror**
-Cambridge & London, UK • 2022 – 2024
-- **MouseMindMapper:** Automated brain‑histology segmentation product generating **£50k annual revenue**; delivered end‑to‑end data, training, packaging and docs.
-- Wrote a high‑performance **C++ cheminformatics fingerprinting** library for production use.
-
-**Data Scientist — Brazma Group, EMBL‑EBI**
-Cambridge, UK • Dec 2019 – Dec 2023
-- Co‑authored the successful **AI4LIFE €5M** grant (federated bioimage AI infrastructure); contributed to platform architecture and model‑sharing strategy.
-- Drove **large‑scale AI microscopy** analyses in the Image Data Resource; collaborated with **Google Cloud** on representation learning.
-- Taught annual deep‑learning courses to **40+ researchers** (PhD to PI).
-
-**Software Engineer (COVID‑19 Response) — European Nucleotide Archive, EMBL‑EBI**
-Cambridge, UK • Mar 2020 – Sept 2020
-- Built CI/CD for the **COVID‑19 Data Portal** to enable **daily global data updates**.
-- Scaled NGS alignment and **Nextflow/Kubernetes** ETL pipelines for surging data volumes.
-
-**Computational Microscopist — National Physical Laboratory**
-London, UK • 2018 – Dec 2019
-- Developed novel **3D organoid segmentation** methods for cancer research; delivered consultancy to MSquared on advanced imaging.
+### Senior Machine Learning Scientist, Valence Labs at Recursion Pharmaceuticals
+
+London, UK. Oct 2024 to present.
+
+- Virtual Cell initiative: I fine-tune multi-modal LLMs over knowledge graphs, free text, RNA-seq and phenotypic imaging to model cell state and predict gene and drug responses; I partner with biology to design benchmark tasks and success metrics.
+- TxPert: I co-developed a state-of-the-art transcriptomic perturbation predictor using systems-biology knowledge graphs; I contributed training code, data curation and ablations.
+- Boltz2: I contributed to proteome-scale virtual drug screening components; I supported evaluation strategy and error analysis across targets.
+- Agentic discovery: I built inference-time engineering layers around multi-modal foundation models: constrained decoding to closed biological vocabularies (Ensembl, HGNC, ChEMBL), tool-call verification against structured knowledge graphs, retrieval grounding with citation-level attribution, and verifier-model ensembles. Together they confine hallucinations to checkable failure modes during autonomous hypothesis generation.
+- Community: I organise the Virtual Cell Journal Club, bridging ML and wet-lab teams.
+
+### Senior Research Associate and AI Engineering Lead, EMBL-EBI (Uhlmann Group and BioImage Archive)
+
+Cambridge, UK. Dec 2022 to Oct 2024.
+
+- Team leadership: I supervised six PhD students; I established coding standards, CI, and peer-review practices used across the lab.
+- Spatial biology: I built deep-learning pipelines for high-content cell morphology and single-cell feature learning, integrated with public bioimage resources.
+- Open source: I created [bioimage_embed](https://github.com/uhlmanngroup/bioimage_embed) (self-supervised learning for biological images) and [shape_embed](https://arxiv.org/abs/2507.01009) (cell-shape DL toolkit); I productionised training and inference.
+- MLOps: I designed scalable pipelines processing TB-scale microscopy datasets across HPC and cloud; I containerised workflows and automated experiment tracking.
+- Academic service: Reviewer, ISBI 2022 and 2023, ICASSP 2024.
+
+### AI/ML Founding Engineer, Amun AI AB
+
+Stockholm, Sweden. 2022 to 2024.
+
+- I built a GKE/Kubernetes platform for model serving with NVIDIA Triton and KServe; it supported 100+ models for 30+ daily users with auth, monitoring and autoscaling.
+
+### AI/ML Engineering Consultant, DeepMirror
+
+Cambridge and London, UK. 2022 to 2024.
+
+- MouseMindMapper: I delivered an automated brain-histology segmentation product generating £50k annual revenue, end to end across data, training, packaging and docs.
+- I wrote a high-performance C++ cheminformatics fingerprinting library for production use.
+
+### Data Scientist, Brazma Group at EMBL-EBI
+
+Cambridge, UK. Dec 2019 to Dec 2023.
+
+- I co-authored the AI4LIFE €5M EU grant on federated bioimage AI infrastructure; I contributed platform architecture and model-sharing strategy.
+- I drove large-scale AI microscopy analyses in the Image Data Resource; I collaborated with Google Cloud on representation learning.
+- I taught annual deep-learning courses to 40+ researchers (PhD to PI).
+
+### Software Engineer (COVID-19 Response), European Nucleotide Archive at EMBL-EBI
+
+Cambridge, UK. Mar 2020 to Sept 2020.
+
+- I built CI/CD for the COVID-19 Data Portal to enable daily global data updates.
+- I scaled NGS alignment and Nextflow/Kubernetes ETL pipelines for the pandemic data surge.
+
+### Computational Microscopist, National Physical Laboratory
+
+London, UK. 2018 to Dec 2019.
+
+- I developed novel 3D organoid segmentation methods for cancer research; I delivered consultancy to MSquared on advanced imaging.
 
 ## Education
 
-**PhD, Engineering — University of Cambridge** • 2014 – 2018 (EPSRC PES‑CDT)
-*Thesis:* “Light‑sheet microscopy for tracking particles in large specimens”
-- Designed & built a novel light‑sheet microscope with automated acquisition.
-- Algorithms for particle tracking, signal optimisation, and micrometre‑scale tomography.
-- Supervision: 2× MRes, 1× BSc.
+### PhD, Engineering, University of Cambridge
+
+2014 to 2018. EPSRC PES-CDT studentship, £120k.
+
+Thesis: "Light-sheet microscopy for tracking particles in large specimens."
 
-**MRes, Photonics — University of Cambridge & UCL** • 2013 – 2014 (EPSRC Photonics CDT)
-- Structured‑illumination microscopy reconstruction; modules in Computer Vision, Quantum Mechanics, Photonics.
+- I designed and built a novel light-sheet microscope with automated acquisition.
+- I developed algorithms for particle tracking, signal optimisation and micrometre-scale tomography.
+- I supervised two MRes students and one BSc student.
 
-**MSci, Physics (First‑Class Honours) — Nottingham Trent University** • 2009 – 2013
-- Top physics graduate; President, Mountaineering Club (2011–2012).
+### MRes, Photonics, University of Cambridge and UCL
 
-## Selected Publications & Preprints
+2013 to 2014. EPSRC Photonics CDT.
 
-1. **Wenkel F**, Tu W, Masschelein C, Shirzad H, Eastwood C, Whitfield ST, Bendidi I, **Russell CT**, et al. *TxPert: Leveraging Biochemical Relationships for Out‑of‑Distribution Transcriptomic Perturbation Prediction.* arXiv:2505.14919 (2025)
+- Structured-illumination microscopy reconstruction; modules in computer vision, quantum mechanics, photonics.
 
-2. Harrison PW, Lopez R, Rahman N, Allen SG, Aslam R, Buso N, **Russell CT**, et al. *The COVID‑19 Data Portal: accelerating SARS‑CoV‑2 and COVID‑19 research through rapid open access data sharing.* Nucleic Acids Research 49(W1):W619–W623 (2021)
+### MSci, Physics (First-Class Honours), Nottingham Trent University
 
-3. Ouyang W, Beuttenmueller F, Gómez‑de‑Mariscal E, Pape C, Burke T, Garcia‑López‑de Haro C, **Russell C**, et al. *Bioimage model zoo: a community‑driven resource for accessible deep learning in bioimage analysis.* BioRxiv 2022.06.07.495102 (2022)
+2009 to 2013.
 
-4. Ahlers J, Moré DA, Amsalem O, Anderson A, Bokota G, Boone P, **Russell C**, et al. *napari: a multi‑dimensional image viewer for Python.* Zenodo 1–2 (2023)
+- Top physics graduate.
+- President, Mountaineering Club (2011 to 2012).
 
-5. Hidalgo‑Cenalmor I, Pylvänäinen JW, Ferreira MG, **Russell CT**, et al. *DL4MicEverywhere: deep learning for microscopy made flexible, shareable and reproducible.* Nature Methods 21(6):925–927 (2024)
+## Selected publications and preprints
 
-See **Google Scholar** for complete publication list: [scholar.google.com/citations?user=XVt7BYQAAAAJ](https://scholar.google.com/citations?user=XVt7BYQAAAAJ&hl=en)
+1. Foix Romero A, **Russell C**, Krull A, Uhlmann V. *ShapeEmbed: a self-supervised learning framework for 2D contour quantification.* arXiv:2507.01009 (2025).
+2. Wenkel F, Tu W, Masschelein C, Shirzad H, Eastwood C, Whitfield ST, Bendidi I, **Russell CT**, et al. *TxPert: Leveraging Biochemical Relationships for Out-of-Distribution Transcriptomic Perturbation Prediction.* arXiv:2505.14919 (2025).
+3. (MIFA - Nature Methods 22(11):2245-2252, 2025 - co-author; full citation TBC by Craig)
+4. Hidalgo-Cenalmor I, Pylvänäinen JW, Ferreira MG, **Russell CT**, et al. *DL4MicEverywhere: deep learning for microscopy made flexible, shareable and reproducible.* Nature Methods 21(6):925-927 (2024).
+5. Ahlers J, Moré DA, Amsalem O, Anderson A, Bokota G, Boone P, **Russell C**, et al. *napari: a multi-dimensional image viewer for Python.* Zenodo (2023).
+6. Ouyang W, Beuttenmueller F, Gómez-de-Mariscal E, Pape C, Burke T, Garcia-López-de Haro C, **Russell C**, et al. *Bioimage model zoo: a community-driven resource for accessible deep learning in bioimage analysis.* bioRxiv 2022.06.07.495102 (2022).
+7. Harrison PW, Lopez R, Rahman N, Allen SG, Aslam R, Buso N, **Russell CT**, et al. *The COVID-19 Data Portal: accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing.* Nucleic Acids Research 49(W1):W619-W623 (2021).
+
+Full list at [Google Scholar](https://scholar.google.com/citations?user=XVt7BYQAAAAJ&hl=en).
 
 ## Patents
-- **Virtual Cell Foundation Model** • Patent pending • 2024 • Multi‑modal integration of knowledge graphs, transcriptomics and imaging for cellular state prediction (Hook1/Recursion)
-- **TxPert: Transcriptomic Perturbation Prediction** • Patent pending • 2024 • Systems‑biology knowledge graph integration for gene expression response forecasting (Recursion)
 
-## Open Source (Selected)
-- **[bioimage_embed](https://github.com/uhlmanngroup/bioimage_embed)** — Self‑supervised learning for biological images.
-- **[shape_embed](https://arxiv.org/abs/2507.01009)** — Deep‑learning toolkit for cell‑shape analysis.
-- Contributions to **Hypha Platform**, **BioImage Model Zoo**, **BIA Binder**, **Hypha Helm Charts**, **COVID Workflow Manager**.
+- Virtual Cell Foundation Model. Patent pending, 2024 (Hook1/Recursion). Multi-modal integration of knowledge graphs, transcriptomics and imaging for cellular-state prediction.
+- TxPert: Transcriptomic Perturbation Prediction. Patent pending, 2024 (Recursion). Systems-biology knowledge-graph integration for gene-expression response forecasting.
+
+## Open source
+
+- [bioimage_embed](https://github.com/uhlmanngroup/bioimage_embed). Self-supervised learning for biological images.
+- [shape_embed](https://arxiv.org/abs/2507.01009). Deep-learning toolkit for cell-shape analysis.
+- Contributions to Hypha Platform, BioImage Model Zoo, BIA Binder, Hypha Helm Charts and the COVID Workflow Manager.
 
 ## Skills
 
-**ML & AI:** Foundation‑model fine‑tuning, contrastive/self‑supervised learning, OOD & uncertainty, evaluation/ablation design
-**Frameworks:** PyTorch, TensorFlow, Lightning, Pyro, Hugging Face, scikit‑learn
-**Vision & Bio:** Bioimage analysis, 3D reconstruction, segmentation/super‑resolution, **snRNA‑seq/bulk RNA‑seq**, histopathology, fluorescence imaging, **GNNs**, knowledge graphs
-**Languages:** Python (primary), R, MATLAB, C++, Java
-**Compute:** Multi‑GPU training (A100/V100), CUDA, distributed training, SLURM, HPC, GCP/AWS GPU instances
-**MLOps/Infra:** Kubernetes, Docker, **NVIDIA Triton**, **KServe**, MLflow, CI/CD, Terraform
-**Workflows:** Nextflow, Snakemake, Apache Airflow
-
-## Grants & Awards
-- **AI4LIFE** (2022) — Co‑investigator on **€5M** EU grant (federated bioimage AI)
-- **EPSRC CDT Studentship** (2013–2018) — Photonic & Electronic Systems CDT (£120k)
-- **Nuffield Research Bursary** (2012) — Computer vision for liquid‑crystal flows
-- **Institute of Physics** grant support (2009–2012)
-
-## Teaching, Mentoring & Service
-- **Course lead:** Deep Learning for Bioimage Analysis (2019–2023), 40+ participants/year
-- **Supervision:** 6 PhD students (AI & spatial biology) + 3 project students (PhD years)
-- **Peer review:** Nature Methods, Scientific Reports, Journal of Microscopy, ISBI, **ICASSP**
-- **Talks & conferences:** FOM (2018, 2022, 2023), MMC (2018, 2022), CBIAS (2023)
-- **Community leadership:** Rowing captain/coach (Magdalene College), Mountaineering Club President (NTU)
-
-*References available upon request.*
+**ML and AI.** Foundation-model fine-tuning, contrastive and self-supervised learning, OOD and uncertainty, evaluation and ablation design, agentic LLM workflows, inference-time engineering (constrained decoding, tool-call verification, retrieval grounding, verifier ensembles).
+
+**Frameworks.** PyTorch, Lightning, Hugging Face, Pyro, TensorFlow, scikit-learn.
+
+**Vision and bio.** Bioimage analysis, 3D reconstruction, segmentation and super-resolution, snRNA-seq and bulk RNA-seq, histopathology, fluorescence imaging, GNNs, knowledge graphs.
+
+**Languages.** Python (primary), R, MATLAB, C++, Java.
+
+**Compute.** Multi-GPU training (A100, V100), CUDA, distributed training, SLURM, HPC, GCP and AWS GPU instances.
+
+**MLOps and infra.** Kubernetes, Docker, NVIDIA Triton, KServe, MLflow, CI/CD, Terraform.
+
+**Workflows.** Nextflow, Snakemake, Apache Airflow.
+
+## Grants and awards
+
+- AI4LIFE (2022). Co-investigator on a €5M EU grant for federated bioimage AI.
+- EPSRC CDT studentship (2013 to 2018). Photonic and Electronic Systems CDT, £120k.
+- Nuffield Research Bursary (2012). Computer vision for liquid-crystal flows.
+- Institute of Physics grant support (2009 to 2012).
+
+## Teaching, mentoring and service
+
+- Course lead: Deep Learning for Bioimage Analysis (2019 to 2023), 40+ participants per year.
+- Supervision: 6 PhD students (AI and spatial biology), plus 3 project students during my PhD.
+- Peer review: Nature Methods, Scientific Reports, Journal of Microscopy, ISBI, ICASSP.
+- Talks and conferences: FOM (2018, 2022, 2023), MMC (2018, 2022), CBIAS (2023).
+- Community: Rowing captain and coach (Magdalene College); Mountaineering Club President (NTU).
+
+*References available on request.*
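The cover letter and CV list several inference-time walls. The first is constrained decoding restricted to a closed vocabulary of identifiers, and a short sketch can illustrate it. This is not the author's implementation. It assumes a character-level greedy decoder. `score` is a hypothetical stand-in for a model's next-token logits. The four gene symbols are a toy vocabulary, not an extract of HGNC, Ensembl or ChEMBL. The sketch masks every continuation that would leave a prefix tree of allowed identifiers.

```python
from typing import Callable, Dict, List, Optional

END = "$"  # hypothetical end-of-identifier marker (must not occur inside identifiers)


class Trie:
    """Prefix tree over a closed vocabulary of allowed identifiers."""

    def __init__(self, words: List[str]) -> None:
        self.root: Dict[str, dict] = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[END] = {}  # mark that a complete identifier ends here

    def allowed_next(self, prefix: str) -> List[str]:
        """Characters (or END) that keep `prefix` inside the vocabulary."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []  # prefix already left the vocabulary
            node = node[ch]
        return sorted(node.keys())


def constrained_greedy_decode(
    score: Callable[[str, str], float], trie: Trie, max_len: int = 32
) -> Optional[str]:
    """Greedy decoding that only ever picks continuations the trie allows.

    `score(prefix, candidate)` is a placeholder for a model's logit for
    `candidate` given `prefix`; masked candidates are never scored.
    """
    prefix = ""
    for _ in range(max_len):
        options = trie.allowed_next(prefix)
        if not options:
            return None  # dead end: no valid continuation exists
        # max() keeps the first maximal option, so ties break in sorted order.
        best = max(options, key=lambda c: score(prefix, c))
        if best == END:
            return prefix
        prefix += best
    return None  # length budget exhausted without finishing an identifier


if __name__ == "__main__":
    vocab = Trie(["BRCA1", "BRCA2", "EGFR", "TP53"])  # toy stand-in vocabulary

    def prefers(target: str) -> Callable[[str, str], float]:
        """Hypothetical scorer that 'wants' to spell `target`, valid or not."""
        def score(prefix: str, candidate: str) -> float:
            i = len(prefix)
            if i < len(target):
                return 1.0 if candidate == target[i] else 0.0
            return 1.0 if candidate == END else 0.0
        return score

    # Unconstrained, this scorer would spell the non-existent "BRCA3".
    print(constrained_greedy_decode(prefers("BRCA3"), vocab))  # BRCA1
```

The mask is applied at every step rather than by filtering finished outputs, so an out-of-vocabulary identifier such as the hypothetical `BRCA3` can never be emitted. The decoder falls back to a valid continuation instead. A real system would apply the same idea to a tokenizer's vocabulary inside the model's sampling loop.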