diff --git a/get-started/sample-datasets/index.mdx b/get-started/sample-datasets/index.mdx
index 7e65f005e..d1447644d 100644
--- a/get-started/sample-datasets/index.mdx
+++ b/get-started/sample-datasets/index.mdx
@@ -7,6 +7,8 @@ title: 'Tutorials and example datasets'
doc_type: 'landing-page'
---
+import { SampleDatasetExplorer } from '/snippets/components/SampleDatasetExplorer/SampleDatasetExplorer.jsx'
+
These tutorials work with any ClickHouse deployment, including [ClickHouse Cloud](/get-started/setup/cloud).
@@ -20,39 +22,4 @@ In addition, the sample datasets provide a great experience on working with Clic
learning important techniques and tricks, and seeing how to take advantage of the many powerful
functions in ClickHouse. The sample datasets include:
-{/* The following table is automatically generated at build time
-by https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/autogenerate-table-of-contents.sh */}
-
-{/*AUTOGENERATED_START*/}
-| Page | Description |
-|-----|-----|
-| [Amazon customer review](/get-started/sample-datasets/amazon-reviews) | Over 150M customer reviews of Amazon products |
-| [AMPLab Big Data Benchmark](/get-started/sample-datasets/amplab-benchmark) | A benchmark dataset used for comparing the performance of data warehousing solutions. |
-| [Analyzing Stack Overflow data with ClickHouse](/get-started/sample-datasets/stackoverflow) | Analyzing Stack Overflow data with ClickHouse |
-| [Anonymized web analytics](/get-started/sample-datasets/anon-web-analytics-metrica) | Dataset consisting of two tables containing anonymized web analytics data with hits and visits |
-| [Brown University Benchmark](/get-started/sample-datasets/brown-benchmark) | A new analytical benchmark for machine-generated log data |
-| [COVID-19 open data](/get-started/sample-datasets/covid19) | COVID-19 Open-Data is a large, open-source database of COVID-19 epidemiological data and related factors like demographics, economics, and government responses |
-| [dbpedia dataset](/get-started/sample-datasets/dbpedia) | Dataset containing 1 million articles from Wikipedia and their vector embeddings |
-| [Environmental sensors data](/get-started/sample-datasets/environmental-sensors) | Over 20 billion records of data from Sensor.Community, a contributors-driven global sensor network that creates Open Environmental Data. |
-| [Foursquare places](/get-started/sample-datasets/foursquare-os-places) | Dataset with over 100 million records containing information about places on a map, such as shops, restaurants, parks, playgrounds, and monuments. |
-| [Geo data using the cell tower dataset](/get-started/sample-datasets/cell-towers) | Learn how to load OpenCelliD data into ClickHouse, connect Apache Superset to ClickHouse and build a dashboard based on data |
-| [GitHub events dataset](/get-started/sample-datasets/github-events) | Dataset containing all events on GitHub from 2011 to Dec 6 2020, with a size of 3.1 billion records. |
-| [Hacker News dataset](/get-started/sample-datasets/hacker-news) | Dataset containing 28 million rows of hacker news data. |
-| [Hacker News vector search dataset](/get-started/sample-datasets/hacker-news-vector-search) | Dataset containing 28+ million Hacker News postings & their vector embeddings |
-| [LAION 5B dataset](/get-started/sample-datasets/laion5b) | Dataset containing 100 million vectors from the LAION 5B dataset |
-| [Laion-400M dataset](/get-started/sample-datasets/laion) | Dataset containing 400 million images with English image captions |
-| [New York Public Library "What's on the Menu?" dataset](/get-started/sample-datasets/menus) | Dataset containing 1.3 million records of historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices. |
-| [New York taxi data](/get-started/sample-datasets/nyc-taxi) | Data for billions of taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009 |
-| [NOAA Global Historical Climatology Network](/get-started/sample-datasets/noaa) | 2.5 billion rows of climate data for the last 120 yrs |
-| [NYPD complaint data](/get-started/sample-datasets/nypd-complaint-data) | Ingest and query Tab Separated Value data in 5 steps |
-| [OnTime](/get-started/sample-datasets/ontime) | Dataset containing the on-time performance of airline flights |
-| [Star Schema Benchmark (SSB, 2009)](/get-started/sample-datasets/star-schema) | The Star Schema Benchmark (SSB) data set and queries |
-| [Taiwan historical weather datasets](/get-started/sample-datasets/tw-weather) | 131 million rows of weather observation data for the last 128 yrs |
-| [Terabyte click logs from Criteo](/get-started/sample-datasets/criteo) | A terabyte of click logs from Criteo |
-| [The UK property prices dataset](/get-started/sample-datasets/uk-price-paid) | Learn how to use projections to improve the performance of queries that you run frequently using the UK property dataset, which contains data about prices paid for real-estate property in England and Wales |
-| [TPC-DS (2012)](/get-started/sample-datasets/tpcds) | The TPC-DS benchmark data set and queries. |
-| [TPC-H (1999)](/get-started/sample-datasets/tpch) | The TPC-H benchmark data set and queries. |
-| [WikiStat](/get-started/sample-datasets/wikistat) | Explore the WikiStat dataset containing 0.5 trillion records. |
-| [Writing queries in ClickHouse using GitHub data](/get-started/sample-datasets/github) | Dataset containing all of the commits and changes for the ClickHouse repository |
-| [YouTube dataset of dislikes](/get-started/sample-datasets/youtube-dislikes) | A collection of dislikes of YouTube videos. |
-{/*AUTOGENERATED_END*/}
+
diff --git a/images/sample-datasets-grid/benchmarks-dark.jpg b/images/sample-datasets-grid/benchmarks-dark.jpg
new file mode 100644
index 000000000..a28104aa0
Binary files /dev/null and b/images/sample-datasets-grid/benchmarks-dark.jpg differ
diff --git a/images/sample-datasets-grid/benchmarks-light.jpg b/images/sample-datasets-grid/benchmarks-light.jpg
new file mode 100644
index 000000000..f2d1840bb
Binary files /dev/null and b/images/sample-datasets-grid/benchmarks-light.jpg differ
diff --git a/images/sample-datasets-grid/geo-location-dark.jpg b/images/sample-datasets-grid/geo-location-dark.jpg
new file mode 100644
index 000000000..000687358
Binary files /dev/null and b/images/sample-datasets-grid/geo-location-dark.jpg differ
diff --git a/images/sample-datasets-grid/geo-location-light.jpg b/images/sample-datasets-grid/geo-location-light.jpg
new file mode 100644
index 000000000..d05d15e97
Binary files /dev/null and b/images/sample-datasets-grid/geo-location-light.jpg differ
diff --git a/images/sample-datasets-grid/public-records-dark.jpg b/images/sample-datasets-grid/public-records-dark.jpg
new file mode 100644
index 000000000..dcb7ae9c8
Binary files /dev/null and b/images/sample-datasets-grid/public-records-dark.jpg differ
diff --git a/images/sample-datasets-grid/public-records-light.jpg b/images/sample-datasets-grid/public-records-light.jpg
new file mode 100644
index 000000000..56bb5e1e6
Binary files /dev/null and b/images/sample-datasets-grid/public-records-light.jpg differ
diff --git a/images/sample-datasets-grid/time-series-sensors-dark.jpg b/images/sample-datasets-grid/time-series-sensors-dark.jpg
new file mode 100644
index 000000000..82b83c404
Binary files /dev/null and b/images/sample-datasets-grid/time-series-sensors-dark.jpg differ
diff --git a/images/sample-datasets-grid/time-series-sensors-light.jpg b/images/sample-datasets-grid/time-series-sensors-light.jpg
new file mode 100644
index 000000000..1c6550330
Binary files /dev/null and b/images/sample-datasets-grid/time-series-sensors-light.jpg differ
diff --git a/images/sample-datasets-grid/vector-search-dark.jpg b/images/sample-datasets-grid/vector-search-dark.jpg
new file mode 100644
index 000000000..d4a0ae256
Binary files /dev/null and b/images/sample-datasets-grid/vector-search-dark.jpg differ
diff --git a/images/sample-datasets-grid/vector-search-light.jpg b/images/sample-datasets-grid/vector-search-light.jpg
new file mode 100644
index 000000000..45cc12dce
Binary files /dev/null and b/images/sample-datasets-grid/vector-search-light.jpg differ
diff --git a/images/sample-datasets-grid/web-social-analytics-dark.jpg b/images/sample-datasets-grid/web-social-analytics-dark.jpg
new file mode 100644
index 000000000..da4d635ac
Binary files /dev/null and b/images/sample-datasets-grid/web-social-analytics-dark.jpg differ
diff --git a/images/sample-datasets-grid/web-social-analytics-light.jpg b/images/sample-datasets-grid/web-social-analytics-light.jpg
new file mode 100644
index 000000000..1fc7c46de
Binary files /dev/null and b/images/sample-datasets-grid/web-social-analytics-light.jpg differ
diff --git a/snippets/components/SampleDatasetExplorer/SampleDatasetExplorer.jsx b/snippets/components/SampleDatasetExplorer/SampleDatasetExplorer.jsx
new file mode 100644
index 000000000..0bde32521
--- /dev/null
+++ b/snippets/components/SampleDatasetExplorer/SampleDatasetExplorer.jsx
@@ -0,0 +1,273 @@
+// SampleDatasetExplorer
+// A 3x2 grid of sample-dataset *categories*. Clicking a category expands it into
+// a grid of cards for that category's child dataset pages, with an animated
+// (staggered fade/scale) transition between the two views.
+//
+// Child pages don't have their own images yet, so they render as icon Cards.
+//
+// NOTE: Mintlify evaluates ONLY the exported component function, so every constant
+// (ACCENT, CATEGORIES) and helper MUST live inside the component body — module-level
+// declarations are not in scope at render time and throw "X is not defined".
+
+export const SampleDatasetExplorer = ({ categories }) => {
+ const ACCENT = '#FAFF69';
+
+ // Each category: id, title (also baked into the banner image), an icon used for
+ // its child cards, the two banner images, and the child dataset pages.
+ const CATEGORIES = [
+ {
+ id: 'benchmarks',
+ title: 'Benchmarks',
+ icon: 'gauge',
+ imgLight: '/images/sample-datasets-grid/benchmarks-light.jpg',
+ imgDark: '/images/sample-datasets-grid/benchmarks-dark.jpg',
+ datasets: [
+ { title: 'AMPLab Big Data Benchmark', href: '/get-started/sample-datasets/amplab-benchmark' },
+ { title: 'Brown University Benchmark', href: '/get-started/sample-datasets/brown-benchmark' },
+ { title: 'Star Schema Benchmark (SSB)', href: '/get-started/sample-datasets/star-schema' },
+ { title: 'TPC-DS', href: '/get-started/sample-datasets/tpcds' },
+ { title: 'TPC-H', href: '/get-started/sample-datasets/tpch' },
+ ],
+ },
+ {
+ id: 'geo-location',
+ title: 'Geo & location',
+ icon: 'map-pin',
+ imgLight: '/images/sample-datasets-grid/geo-location-light.jpg',
+ imgDark: '/images/sample-datasets-grid/geo-location-dark.jpg',
+ datasets: [
+ { title: 'Cell towers (OpenCelliD)', href: '/get-started/sample-datasets/cell-towers' },
+ { title: 'Foursquare places', href: '/get-started/sample-datasets/foursquare-os-places' },
+ { title: 'New York taxi data', href: '/get-started/sample-datasets/nyc-taxi' },
+ ],
+ },
+ {
+ id: 'public-records',
+ title: 'Public records & open data',
+ icon: 'landmark',
+ imgLight: '/images/sample-datasets-grid/public-records-light.jpg',
+ imgDark: '/images/sample-datasets-grid/public-records-dark.jpg',
+ datasets: [
+ { title: 'COVID-19 open data', href: '/get-started/sample-datasets/covid19' },
+ { title: 'NYPD complaint data', href: '/get-started/sample-datasets/nypd-complaint-data' },
+ { title: 'OnTime (airline flights)', href: '/get-started/sample-datasets/ontime' },
+ { title: 'UK property prices', href: '/get-started/sample-datasets/uk-price-paid' },
+ { title: "What's on the Menu? (NYPL)", href: '/get-started/sample-datasets/menus' },
+ ],
+ },
+ {
+ id: 'time-series-sensors',
+ title: 'Time series & sensors',
+ icon: 'activity',
+ imgLight: '/images/sample-datasets-grid/time-series-sensors-light.jpg',
+ imgDark: '/images/sample-datasets-grid/time-series-sensors-dark.jpg',
+ datasets: [
+ { title: 'Environmental sensors data', href: '/get-started/sample-datasets/environmental-sensors' },
+ { title: 'NOAA Global Historical Climatology Network', href: '/get-started/sample-datasets/noaa' },
+ { title: 'Taiwan historical weather', href: '/get-started/sample-datasets/tw-weather' },
+ ],
+ },
+ {
+ id: 'vector-search',
+ title: 'Vector search and embeddings',
+ icon: 'search',
+ imgLight: '/images/sample-datasets-grid/vector-search-light.jpg',
+ imgDark: '/images/sample-datasets-grid/vector-search-dark.jpg',
+ datasets: [
+ { title: 'dbpedia dataset', href: '/get-started/sample-datasets/dbpedia' },
+ { title: 'Hacker News vector search', href: '/get-started/sample-datasets/hacker-news-vector-search' },
+ { title: 'LAION 5B dataset', href: '/get-started/sample-datasets/laion5b' },
+ { title: 'Laion-400M dataset', href: '/get-started/sample-datasets/laion' },
+ ],
+ },
+ {
+ id: 'web-social',
+ title: 'Web and social analytics',
+ icon: 'globe',
+ imgLight: '/images/sample-datasets-grid/web-social-analytics-light.jpg',
+ imgDark: '/images/sample-datasets-grid/web-social-analytics-dark.jpg',
+ datasets: [
+ { title: 'Amazon customer reviews', href: '/get-started/sample-datasets/amazon-reviews' },
+ { title: 'Analyzing Stack Overflow data', href: '/get-started/sample-datasets/stackoverflow' },
+ { title: 'Anonymized web analytics', href: '/get-started/sample-datasets/anon-web-analytics-metrica' },
+ { title: 'Criteo terabyte click logs', href: '/get-started/sample-datasets/criteo' },
+ { title: 'GitHub events dataset', href: '/get-started/sample-datasets/github-events' },
+ { title: 'Hacker News dataset', href: '/get-started/sample-datasets/hacker-news' },
+ { title: 'Querying GitHub data', href: '/get-started/sample-datasets/github' },
+ { title: 'WikiStat', href: '/get-started/sample-datasets/wikistat' },
+ { title: 'YouTube dataset of dislikes', href: '/get-started/sample-datasets/youtube-dislikes' },
+ ],
+ },
+ ];
+
+ const cats = categories || CATEGORIES;
+
+ const [selectedId, setSelectedId] = useState(null);
+ const selected = cats.find((c) => c.id === selectedId) || null;
+
+ // Theme visibility is handled by explicit `.dark` descendant selectors in the
+ //