26 commits
34a3012
[maven-release-plugin] prepare for next development iteration
namedgraph Apr 6, 2026
cbc5e7a
Post-release version bump
namedgraph Apr 6, 2026
e20d1c1
Fallback to synthesized predicate description when proxy unavailable
namedgraph Apr 8, 2026
00f0d60
Proxy URIs from namespace (#285)
namedgraph Apr 9, 2026
11da65e
Fix CORS response headers (#286)
namedgraph Apr 10, 2026
f33b394
Enable gzip compression in nginx for RDF and JSON content types (#290)
namedgraph Apr 11, 2026
c901a0d
Add 3D Linked Data browser (#288)
namedgraph Apr 11, 2026
11f2c33
gzip fix for RDF documents
namedgraph Apr 11, 2026
fdd5a72
Scope nginx gzip to static locations only
namedgraph Apr 12, 2026
b96f2fd
Fix connection pool exhaustion from proxy requests (#292)
namedgraph Apr 13, 2026
224e7c3
Remove debug xsl:message statements from 3D graph XSL
namedgraph Apr 13, 2026
73d540f
README update
namedgraph Apr 20, 2026
bb28205
Document tabs (#294)
namedgraph May 9, 2026
420142c
Build static resource URLs from dataspace origin
namedgraph May 9, 2026
747b6ec
Move ldh:view block injection client-side under Saxon-JS (#295)
namedgraph May 10, 2026
4ec275a
Surface document and primary topic first in client-side rdf:RDF rende…
namedgraph May 11, 2026
300488f
Generalise object metadata loading (#297)
namedgraph May 11, 2026
ae475fa
Tolerate SPARQL failures when loading server-side object metadata
namedgraph May 11, 2026
e765a60
HTTP test for accept param on non-existent dataspaces
namedgraph May 11, 2026
6435b32
Apply ?accept override before app matching so 404 on unknown dataspac…
namedgraph May 11, 2026
23efdb3
Render proxied RDF responses in tab panes and tidy navbar templates f…
namedgraph May 11, 2026
d216841
Client side property/object metadata (#298)
namedgraph May 12, 2026
26d1d94
Move http:Response bs2:Header templates from layout.xsl to document.xsl
namedgraph May 12, 2026
c4124b4
Align ac:property-label cache lookup with documentPool key shape
namedgraph May 12, 2026
94bc699
Load metadata in ontology-view block render chain; rename for clarity
namedgraph May 12, 2026
cae32c5
Preserve mode query param when navigating from links on proxied pages
namedgraph May 13, 2026
35 changes: 32 additions & 3 deletions CLAUDE.md
@@ -80,9 +80,18 @@ find ./document-hierarchy/ -name '*.sh' -exec bash {} \;
- `ServiceContext` decouples HTTP infrastructure from `Service`, holding dataspace and service metadata separately
- Dataspace metadata and service metadata are split in configuration; types for `lapp:endUserApplication`/`lapp:adminApplication` are inferred on the fly from `system.trig`

### Dataspaces
Since v5.1.0, a single LDH instance supports multiple **dataspaces**, each identified by a distinct subdomain (origin). Each dataspace is a pair of applications: an end-user app (`<subdomain>`) and an admin app (`admin.<subdomain>`), routed by nginx via wildcard subdomain matching.

Configuration is split across two files (see the sketch below):
- `config/dataspaces.trig` — public metadata: origins (`lapp:origin`), ontologies (`ldt:ontology`), stylesheets (`ac:stylesheet`)
- `config/system.trig` — internal wiring: maps apps to SPARQL services (`ldt:service`) and assigns types (`lapp:AdminApplication`/`lapp:EndUserApplication`)

Multiple dataspaces can share the same backend SPARQL service.
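
A minimal sketch of the public half, assuming hypothetical resource URIs (the exact graph layout and prefix declarations should be copied from the shipped `config/dataspaces.trig`):

```trig
# Sketch only: app URIs, origin, and ontology/stylesheet URLs are placeholders;
# prefix declarations (lapp, ldt, ac) are omitted for brevity.
<urn:linkeddatahub:apps/demo#end-user>
    lapp:origin <https://demo.localhost:4443> ;
    ldt:ontology <https://demo.localhost:4443/ns#> ;
    ac:stylesheet <https://demo.localhost:4443/static/demo.xsl> .

<urn:linkeddatahub:apps/demo#admin>
    lapp:origin <https://admin.demo.localhost:4443> .
```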

### Service Architecture
The application runs as a multi-container setup:
- **nginx**: Reverse proxy and SSL termination
- **nginx**: Reverse proxy and SSL termination (wildcard subdomain routing for dataspaces)
- **linkeddatahub**: Main Java application (Tomcat)
- **fuseki-admin/fuseki-end-user**: Separate SPARQL stores
- **varnish-frontend/varnish-admin/varnish-end-user**: Caching layers
@@ -91,8 +100,28 @@ The application runs as a multi-container setup:
1. Requests come through nginx proxy
2. Varnish provides caching layer
3. LinkedDataHub application handles business logic
4. Data persisted to appropriate Fuseki triplestore
5. XSLT transforms data for client presentation
4. RDF data is read/written via the **Graph Store Protocol** — each document in the hierarchy corresponds to a named graph in the triplestore; the document URI is the graph name (see the sketch below)
5. Data persisted to appropriate Fuseki triplestore
6. XSLT transforms data for client presentation
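
As an illustration of step 4, a hedged curl sketch (the `people/` document and base URL are hypothetical; the certificate variables follow the `http-tests` conventions): because the document URI doubles as the graph name, an RDF-typed GET on a document reads the corresponding named graph.

```bash
# Read the named graph behind a document by requesting the document URI
# with an RDF media type. URL and certificate variables are placeholders.
curl -k -f -s \
    -E "$AGENT_CERT_FILE":"$AGENT_CERT_PWD" \
    -H "Accept: text/turtle" \
    "https://localhost:4443/people/"
```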

### Linked Data Proxy and Client-Side Rendering

LDH includes a Linked Data proxy that dereferences external URIs on behalf of the browser. The original design rendered proxied resources identically to local ones — server-side RDF fetch + XSLT. This created a DDoS/resource-exhaustion vector: scraper bots routing arbitrary external URIs through the proxy would trigger a full server-side pipeline (HTTP fetch → XSLT rendering) per request, exhausting HTTP connection pools and CPU.

The current design splits rendering by request origin (a curl sketch follows this list):

- **Browser requests** (`Accept: text/html`): `ProxyRequestFilter` bypasses the proxy entirely. The server returns the local application shell. Saxon-JS then issues a second, RDF-typed request (`Accept: application/rdf+xml`) from the browser.
- **RDF requests** (API clients, Saxon-JS second pass): `ProxyRequestFilter` fetches the external RDF, parses it, and returns it to the caller. No XSLT happens server-side.
- **Client-side rendering**: Saxon-JS receives the raw RDF and applies the same XSLT 3 templates used server-side (shared stylesheet), so proxied resources look almost identical to local ones.
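
A curl sketch of the two request shapes (the DBpedia URI and base URL are placeholders; the `?uri=` parameter shape matches the proxy tests):

```bash
# 1. Browser navigation: HTML Accept, so ProxyRequestFilter bypasses the proxy
#    and the local application shell is returned.
curl -k -s -G \
    -H "Accept: text/html" \
    --data-urlencode "uri=https://dbpedia.org/resource/Copenhagen" \
    "https://localhost:4443/"

# 2. RDF request (the second pass Saxon-JS issues): the filter dereferences
#    the external URI and returns the parsed RDF to the caller.
curl -k -s -G \
    -H "Accept: application/rdf+xml" \
    --data-urlencode "uri=https://dbpedia.org/resource/Copenhagen" \
    "https://localhost:4443/"
```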

Key implementation files:
- `ProxyRequestFilter.java` — intercepts `?uri=` and `lapp:Dataset` proxy requests; HTML bypass; forwards external `Link` headers
- `ApplicationFilter.java` — registers external proxy target URI in request context (`AC.uri` property) as authoritative proxy marker
- `ResponseHeadersFilter.java` — skips local-only hypermedia links (`sd:endpoint`, `ldt:ontology`, `ac:stylesheet`) for proxy requests; external ones are forwarded by `ProxyRequestFilter`
- `client.xsl` (`ldh:rdf-document-response`) — receives the RDF proxy response client-side; extracts `sd:endpoint` from `Link` header; stores it in `LinkedDataHub.endpoint`
- `functions.xsl` (`sd:endpoint()`) — returns `LinkedDataHub.endpoint` when set (external proxy), otherwise falls back to the local SPARQL endpoint

The SPARQL endpoint forwarding chain ensures ContentMode blocks (charts, maps) query the **remote** app's SPARQL endpoint, not the local one. `LinkedDataHub.endpoint` is reset to the local endpoint by `ldh:HTMLDocumentLoaded` on every HTML page navigation, so there is no stale state when navigating back to local documents.
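
A simplified sketch of the `sd:endpoint()` fallback (not the shipped code; the `$local-endpoint` parameter and its default stand in for however `functions.xsl` resolves the local endpoint):

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sd="http://www.w3.org/ns/sparql-service-description#"
    xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT">

    <!-- placeholder for the local SPARQL endpoint -->
    <xsl:param name="local-endpoint" as="xs:anyURI"
        select="xs:anyURI('https://localhost:4443/sparql')"/>

    <xsl:function name="sd:endpoint" as="xs:anyURI">
        <!-- LinkedDataHub.endpoint is set by ldh:rdf-document-response for
             proxied documents and reset on every HTML page navigation -->
        <xsl:variable name="proxied"
            select="ixsl:get(ixsl:window(), 'LinkedDataHub.endpoint')"/>
        <xsl:sequence
            select="if (exists($proxied)) then xs:anyURI($proxied) else $local-endpoint"/>
    </xsl:function>

</xsl:stylesheet>
```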

### Key Extension Points
- **Vocabulary definitions** in `com.atomgraph.linkeddatahub.vocabulary`
2 changes: 2 additions & 0 deletions Dockerfile
@@ -109,6 +109,8 @@ ENV MAX_TOTAL_CONN=40

ENV MAX_REQUEST_RETRIES=3

ENV CONNECTION_REQUEST_TIMEOUT=30000

ENV IMPORT_KEEPALIVE=

ENV MAX_IMPORT_THREADS=10
8 changes: 7 additions & 1 deletion README.md
@@ -153,7 +153,13 @@ The following tools are required for CLI scripts in the `bin/` directory:

### Dataspaces

Dataspaces are configured in [`config/system.trig`](https://github.com/AtomGraph/LinkedDataHub/blob/master/config/system.trig). Relative URIs will be resolved against the base URI configured in the `.env` file.
Since version 5.1.0, a single LinkedDataHub instance supports multiple **dataspaces**, each identified by a distinct subdomain (origin). Each dataspace consists of a pair of applications: an end-user app (e.g. `https://northwind-traders.demo.localhost:4443`) and an admin app on the `admin.` subdomain (e.g. `https://admin.northwind-traders.demo.localhost:4443`).

Dataspace configuration is split across two files:
- [`config/dataspaces.trig`](https://github.com/AtomGraph/LinkedDataHub/blob/master/config/dataspaces.trig) — public metadata: origins (`lapp:origin`), ontologies, stylesheets
- [`config/system.trig`](https://github.com/AtomGraph/LinkedDataHub/blob/master/config/system.trig) — internal wiring: SPARQL service bindings and application types (`lapp:AdminApplication`/`lapp:EndUserApplication`)

To add a new dataspace, add corresponding entries to both files. Relative URIs will be resolved against the base URI configured in the `.env` file.
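
For the `config/system.trig` side, a minimal sketch following the `urn:` recommendation below (identifiers and service URIs are placeholders; prefix declarations are omitted):

```trig
# Sketch only: type each app and bind it to its SPARQL service.
<urn:linkeddatahub:apps/new-dataspace#end-user> a lapp:EndUserApplication ;
    ldt:service <urn:linkeddatahub:services/new-dataspace-end-user> .

<urn:linkeddatahub:apps/new-dataspace#admin> a lapp:AdminApplication ;
    ldt:service <urn:linkeddatahub:services/new-dataspace-admin> .
```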

_:warning: Do not use blank nodes to identify applications or services. We recommend using the `urn:` URI scheme, since LinkedDataHub application resources are not accessible under their own dataspace._

49 changes: 47 additions & 2 deletions docker-compose.yml
@@ -65,6 +65,7 @@ services:
- SIGN_UP_CERT_VALIDITY=180
- MAX_CONTENT_LENGTH=${MAX_CONTENT_LENGTH:-2097152}
- ALLOW_INTERNAL_URLS=${ALLOW_INTERNAL_URLS:-}
- CONNECTION_REQUEST_TIMEOUT=${CONNECTION_REQUEST_TIMEOUT:-}
- NOTIFICATION_ADDRESS=LinkedDataHub <notifications@localhost>
- MAIL_SMTP_HOST=email-server
- MAIL_SMTP_PORT=25
@@ -204,6 +205,20 @@ configs:
ssl_verify_client ${NGINX_SSL_VERIFY_CLIENT:-optional_no_ca};

location / {
add_header Access-Control-Allow-Origin "*" always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS" always;
add_header Access-Control-Allow-Headers "Accept, Content-Type, Authorization" always;
add_header Access-Control-Expose-Headers "Link, Content-Location, Location" always;

if ($$request_method = OPTIONS) {
add_header Access-Control-Allow-Origin "*";
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS";
add_header Access-Control-Allow-Headers "Accept, Content-Type, Authorization";
add_header Access-Control-Expose-Headers "Link, Content-Location, Location";
add_header Access-Control-Max-Age "1728000";
return 204;
}

proxy_pass http://linkeddatahub;
#proxy_cache backcache;
limit_req zone=linked_data burst=30 nodelay;
@@ -215,11 +230,14 @@

proxy_set_header Client-Cert '';
proxy_set_header Client-Cert $$ssl_client_escaped_cert;

# add_header Cache-Control "public, max-age=86400";
}

location ^~ /uploads/ {
gzip on;
gzip_proxied any;
gzip_types *;
gzip_min_length 1024;

proxy_pass http://linkeddatahub;
limit_req zone=static_files burst=20 nodelay;

@@ -235,9 +253,15 @@
}

location ^~ /static/ {
gzip on;
gzip_proxied any;
gzip_types *;
gzip_min_length 1024;

proxy_pass http://linkeddatahub;
limit_req zone=static_files burst=50 nodelay;

add_header Access-Control-Allow-Origin "*" always;
add_header Cache-Control "public, max-age=604800, immutable";
}
}
@@ -253,6 +277,20 @@
ssl_verify_client optional_no_ca;

location / {
add_header Access-Control-Allow-Origin "*" always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS" always;
add_header Access-Control-Allow-Headers "Accept, Content-Type, Authorization" always;
add_header Access-Control-Expose-Headers "Link, Content-Location, Location" always;

if ($$request_method = OPTIONS) {
add_header Access-Control-Allow-Origin "*";
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS";
add_header Access-Control-Allow-Headers "Accept, Content-Type, Authorization";
add_header Access-Control-Expose-Headers "Link, Content-Location, Location";
add_header Access-Control-Max-Age "1728000";
return 204;
}

proxy_pass http://linkeddatahub;
#proxy_cache backcache;
limit_req zone=linked_data burst=30 nodelay;
@@ -267,8 +305,15 @@
}

location ^~ /static/ {
gzip on;
gzip_proxied any;
gzip_types *;
gzip_min_length 1024;

proxy_pass http://linkeddatahub;
limit_req zone=static_files burst=50 nodelay;

add_header Access-Control-Allow-Origin "*" always;
}
}

18 changes: 18 additions & 0 deletions http-tests/dataspaces/non-existent-dataspace-accept-param.sh
@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -euo pipefail

# Regression: ?accept= param must be honoured even when the dataspace does not exist

# admin app
content_type=$(curl -k -s -G -w "%{content_type}" -o /dev/null \
--data-urlencode "accept=text/turtle" \
"https://admin.non-existing.localhost:4443/")

echo "$content_type" | grep -q "text/turtle"

# end-user app
content_type=$(curl -k -s -G -w "%{content_type}" -o /dev/null \
--data-urlencode "accept=text/turtle" \
"https://non-existing.localhost:4443/")

echo "$content_type" | grep -q "text/turtle"
2 changes: 1 addition & 1 deletion http-tests/misc/cors-jaxrs.sh
@@ -7,7 +7,7 @@ purge_cache "$END_USER_VARNISH_SERVICE"
purge_cache "$ADMIN_VARNISH_SERVICE"
purge_cache "$FRONTEND_VARNISH_SERVICE"

# Test JAX-RS CORSFilter on dynamic content (GET request)
# Test nginx CORS headers on dynamic content (GET request)

response=$(curl -i -k -s \
-H "Origin: https://example.com" \
18 changes: 18 additions & 0 deletions http-tests/misc/gzip-sefjson.sh
@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -euo pipefail

# Test that nginx gzip compression is active for static JSON (SEF file)

response=$(curl -k -s -D - -o /dev/null \
-H "Accept-Encoding: gzip" \
"${END_USER_BASE_URL}static/com/atomgraph/linkeddatahub/xsl/client.xsl.sef.json")

if ! echo "$response" | grep -qi "Content-Encoding: gzip"; then
echo "Content-Encoding: gzip not found on client.xsl.sef.json"
exit 1
fi

if ! echo "$response" | grep -q "HTTP/.* 200"; then
echo "client.xsl.sef.json did not return 200 OK"
exit 1
fi
39 changes: 39 additions & 0 deletions http-tests/proxy/GET-proxied-accept-forwarded.sh
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
set -euo pipefail

initialize_dataset "$END_USER_BASE_URL" "$TMP_END_USER_DATASET" "$END_USER_ENDPOINT_URL"
initialize_dataset "$ADMIN_BASE_URL" "$TMP_ADMIN_DATASET" "$ADMIN_ENDPOINT_URL"
purge_cache "$END_USER_VARNISH_SERVICE"
purge_cache "$ADMIN_VARNISH_SERVICE"
purge_cache "$FRONTEND_VARNISH_SERVICE"

# add agent to the readers group to be able to read documents

add-agent-to-group.sh \
-f "$OWNER_CERT_FILE" \
-p "$OWNER_CERT_PWD" \
--agent "$AGENT_URI" \
"${ADMIN_BASE_URL}acl/groups/readers/"

# Regression: ProxyRequestFilter must forward the client's Accept header verbatim to the
# upstream, NOT substitute its own readable-types list. Previously the filter built its
# outbound Accept from MediaTypes.getReadable(Model.class) + getReadable(ResultSet.class)
# (everything Jena could ingest, all q=1.0), discarding what the client actually asked for.
# The upstream then content-negotiated against that broad list and could legally pick any
# RDF format — e.g. application/rdf+thrift — even when the client (e.g. SaxonJS document())
# explicitly requested application/rdf+xml or application/xml.
#
# Verify by requesting one specific RDF type and asserting the response matches it.

for accept in 'application/rdf+xml' 'text/turtle' 'application/n-triples'; do
content_type=$(curl -k -f -s -G -w "%{content_type}" -o /dev/null \
-E "$AGENT_CERT_FILE":"$AGENT_CERT_PWD" \
-H "Accept: $accept" \
--data-urlencode "uri=${END_USER_BASE_URL}" \
"$ADMIN_BASE_URL")

case "$content_type" in
"$accept"*) ;;
*) exit 1 ;;
esac
done
40 changes: 40 additions & 0 deletions http-tests/proxy/GET-proxied-accept-html-not-preferred.sh
@@ -0,0 +1,40 @@
#!/usr/bin/env bash
set -euo pipefail

initialize_dataset "$END_USER_BASE_URL" "$TMP_END_USER_DATASET" "$END_USER_ENDPOINT_URL"
initialize_dataset "$ADMIN_BASE_URL" "$TMP_ADMIN_DATASET" "$ADMIN_ENDPOINT_URL"
purge_cache "$END_USER_VARNISH_SERVICE"
purge_cache "$ADMIN_VARNISH_SERVICE"
purge_cache "$FRONTEND_VARNISH_SERVICE"

# add agent to the readers group to be able to read documents

add-agent-to-group.sh \
-f "$OWNER_CERT_FILE" \
-p "$OWNER_CERT_PWD" \
--agent "$AGENT_URI" \
"${ADMIN_BASE_URL}acl/groups/readers/"

# Regression: when a client lists application/xhtml+xml (or text/html) in Accept at a
# LOWER q-value than another supported type, the proxy must treat the request as
# API-client intent and forward — not as browser navigation that wants the app shell.
# Previously, ProxyRequestFilter bypassed on anyMatch(HTML or XHTML in Accept) without
# checking q-rank, so it false-fired on any Accept that mentioned HTML at all and
# returned the local app shell instead of the proxied response.
#
# Discriminator is HTTP status — content-type cannot tell bypass from forward because
# admin and end-user share writer configs (same Accept → same negotiated type on both).
# A UUID-named path that doesn't exist on either origin disambiguates:
# - bypass: ApplicationFilter strips ?uri= → request URI becomes admin root → 200
# - forward: proxy forwards the actual UUID path to end-user → 404

accept_header='application/xml, text/xml;q=0.9, application/xhtml+xml;q=0.8, */*;q=0.7'
non_existing_uri="${END_USER_BASE_URL}$(cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen)/"

status=$(curl -k -s -G -o /dev/null -w "%{http_code}" \
-E "$AGENT_CERT_FILE":"$AGENT_CERT_PWD" \
-H "Accept: $accept_header" \
--data-urlencode "uri=${non_existing_uri}" \
"$ADMIN_BASE_URL")

[ "$status" = "$STATUS_NOT_FOUND" ] || exit 1
1 change: 1 addition & 0 deletions http-tests/proxy/GET-proxied-external-502.sh
@@ -19,6 +19,7 @@ add-agent-to-group.sh \

curl -k -w "%{http_code}\n" -o /dev/null -s \
-G \
-H "Accept: application/n-triples" \
-E "$AGENT_CERT_FILE":"$AGENT_CERT_PWD" \
--data-urlencode "uri=http://f1d2d4cf-90bb-4f5b-ae4b-921e584b6edd.org" \
"$END_USER_BASE_URL" \
66 changes: 66 additions & 0 deletions http-tests/proxy/GET-proxied-ontology-ns.sh
@@ -0,0 +1,66 @@
#!/usr/bin/env bash
set -euo pipefail

initialize_dataset "$END_USER_BASE_URL" "$TMP_END_USER_DATASET" "$END_USER_ENDPOINT_URL"
initialize_dataset "$ADMIN_BASE_URL" "$TMP_ADMIN_DATASET" "$ADMIN_ENDPOINT_URL"
purge_cache "$END_USER_VARNISH_SERVICE"
purge_cache "$ADMIN_VARNISH_SERVICE"
purge_cache "$FRONTEND_VARNISH_SERVICE"

# add agent to the readers group to be able to read documents

add-agent-to-group.sh \
-f "$OWNER_CERT_FILE" \
-p "$OWNER_CERT_PWD" \
--agent "$AGENT_URI" \
"${ADMIN_BASE_URL}acl/groups/readers/"

# use a made-up hash-based namespace: not mapped as a static file, not a registered app
namespace_uri="http://made-up-test-ns.example/ns"
class1="${namespace_uri}#ClassOne"
class2="${namespace_uri}#ClassTwo"
ontology_doc="${ADMIN_BASE_URL}ontologies/namespace/"
namespace="${END_USER_BASE_URL}ns#"

# add two classes with URIs in the made-up namespace to the app's ontology

add-class.sh \
-f "$OWNER_CERT_FILE" \
-p "$OWNER_CERT_PWD" \
-b "$ADMIN_BASE_URL" \
--uri "$class1" \
--label "Class One" \
"$ontology_doc"

add-class.sh \
-f "$OWNER_CERT_FILE" \
-p "$OWNER_CERT_PWD" \
-b "$ADMIN_BASE_URL" \
--uri "$class2" \
--label "Class Two" \
"$ontology_doc"

# clear the in-memory ontology so the new classes are present on next request

clear-ontology.sh \
-f "$OWNER_CERT_FILE" \
-p "$OWNER_CERT_PWD" \
-b "$ADMIN_BASE_URL" \
--ontology "$namespace"

# request the namespace document URI (without fragment) via ?uri= proxy.
# the namespace document is not DataManager-mapped and not a registered app,
# so ProxyRequestFilter falls through to the OntModel DESCRIBE path, which
# returns descriptions of all #-fragment terms in that namespace.

response=$(curl -k -f -s \
-G \
-E "$AGENT_CERT_FILE":"$AGENT_CERT_PWD" \
-H "Accept: application/n-triples" \
--data-urlencode "uri=${namespace_uri}" \
"$END_USER_BASE_URL")

# verify both class descriptions are present in the response

echo "$response" | grep -q "$class1"
echo "$response" | grep -q "$class2"