
feat(s3): add a blob-backed S3-compatible API#53

Draft
pthmas wants to merge 12 commits into main from pthmas/s3-blob-export

Conversation

pthmas (Collaborator) commented Apr 28, 2026

Summary

  • Add SQLite-backed S3-compatible API for bucket and object CRUD
  • Celestia as verification layer, not storage: PutObject submits a small JSON CommitmentEnvelope (~200 bytes) containing SHA256 + object metadata to Celestia, not raw data
  • SQLite caches raw object data; GetObject serves from the local cache; users can prove data authenticity by hashing a downloaded object and comparing against the on-chain SHA256 record
  • All write operations (CreateBucket, DeleteBucket, PutObject, DeleteObject) are blocked with 405 Method Not Allowed when no Celestia submitter is configured (read-only mode)
  • SigV4 AWS authentication with ±15 minute timestamp-skew enforcement; optional, and the server warns at startup if unconfigured
  • parsePath uses r.URL.RawPath to avoid double URL-decoding of %2F in object keys
  • DeleteBucket wrapped in a transaction to prevent TOCTOU race between empty-check and delete
  • maxObjectSize limit (2 MB) is independent of Celestia's blob size limit, since only the envelope is submitted
  • Migrations 004 and 005 squashed into a single clean migration with the final schema (includes sha256 column from the start, no AUTOINCREMENT overhead, single composite index)

Architecture

PUT /bucket/key
  → read body (max 2 MB)
  → compute SHA256 + MD5 (ETag)
  → marshal CommitmentEnvelope JSON
  → submit envelope to Celestia → get height + commitment
  → store raw data + metadata in SQLite

GET /bucket/key
  → serve raw data from SQLite cache
  → client can verify: sha256sum(body) == obj.SHA256 (from Celestia)

CommitmentEnvelope (submitted to Celestia)

{
  "version": 1,
  "bucket": "my-bucket",
  "key": "path/to/file.txt",
  "content_type": "text/plain",
  "size": 1234,
  "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "etag": "d41d8cd98f00b204e9800998ecf8427e"
}

Test plan

  • just test passes with -race
  • PUT object → verify obj.SHA256 non-empty → sha256sum of downloaded body matches
  • PUT with no submitter → 405 Method Not Allowed
  • PUT object > 2 MB → 413 Request Entity Too Large
  • PUT object with key > 1024 bytes → 400 Bad Request
  • Bucket name validation (3-63 chars, lowercase alphanum+hyphen, no IP)
  • TestService_PutObject_WithSubmitter verifies submitted blob is a CommitmentEnvelope, not raw data
  • TestObjectStore_ObjectCRUD verifies SHA256 round-trips through SQLite

🤖 Generated with Claude Code

@pthmas pthmas self-assigned this Apr 28, 2026
coderabbitai Bot commented Apr 28, 2026

Review skipped: draft detected. To trigger a single review, invoke the @coderabbitai review command.

pthmas changed the title from "feat(s3): add e2e coverage for blob-backed API" to "feat(s3): add a blob-backed S3-compatible API" on Apr 29, 2026
pthmas and others added 9 commits on April 29, 2026 at 10:59
Write operations (CreateBucket, DeleteBucket, PutObject, DeleteObject)
now return ErrReadOnly immediately when no submitter is wired in.
Previously PutObject would silently write to SQLite with no Celestia
anchor (height=0, empty commitments), producing orphaned data.

ErrReadOnly maps to 405 MethodNotAllowed in the HTTP error handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
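The guard described in this commit can be sketched as below. ErrReadOnly is named in the commit message; the Service shape, statusFor helper, and error text are illustrative assumptions:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrReadOnly is the sentinel returned by write operations when no
// Celestia submitter is wired in.
var ErrReadOnly = errors.New("read-only: no Celestia submitter configured")

// Service is a sketch; the real type carries more dependencies.
type Service struct {
	submitter any // nil when running in read-only mode
}

func (s *Service) PutObject() error {
	if s.submitter == nil {
		return ErrReadOnly // fail fast instead of writing an unanchored row
	}
	// ...submit envelope to Celestia, then persist to SQLite...
	return nil
}

// statusFor maps service errors onto HTTP status codes.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrReadOnly):
		return http.StatusMethodNotAllowed // 405
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	svc := &Service{} // no submitter configured
	fmt.Println(statusFor(svc.PutObject())) // 405
}
```

Checking the submitter before touching SQLite is what prevents the height=0, empty-commitment orphans the commit describes.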
- object.go: escape LIKE wildcards (%, _, \) in ListObjects prefix to
  prevent unintended pattern matching on user-supplied keys
- object.go: populate Object.Namespace from o.ns on read instead of
  scanning per-row DB value, eliminating stale-config drift
- server.go: remove dead handleBucket wrapper, route GET bucket
  directly to handleListObjects
- server.go: clamp max-keys to 1000 per S3 spec
- server.go: apply http.MaxBytesReader before reading PUT body so
  oversized requests are rejected at the network layer, not after
  buffering the full payload in memory
- service.go: detect http.MaxBytesError from MaxBytesReader and map
  to ErrObjectTooLarge
- auth.go: validate X-Amz-Date is within ±15 min to prevent replay
  attacks with captured signed requests
- tests: update to use mockSubmitter for write ops, fix stale
  hardcoded SigV4 timestamp, add TestService_ReadOnly coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
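The LIKE-escaping fix above can be sketched with a strings.Replacer; the helper name is illustrative, and the query must pair it with `... WHERE key LIKE ? ESCAPE '\'`:

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper backslash-escapes the three SQL LIKE metacharacters so a
// user-supplied prefix matches literally. Backslash is listed first so
// pre-existing backslashes are escaped too; Replacer makes a single
// pass, so inserted backslashes are not re-processed.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

func escapeLIKE(prefix string) string {
	return likeEscaper.Replace(prefix)
}

func main() {
	// A prefix query appends its own trailing %:
	fmt.Println(escapeLIKE("logs_2026/%") + "%") // logs\_2026/\%%
}
```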
Empty PUTs skip Celestia submission and store locally with Height=0
and no commitments. This is intentional to preserve S3 tool
compatibility (e.g. folder placeholder keys like "prefix/").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical fixes:
- server.go: parsePath now uses r.URL.RawPath to avoid double-decoding;
  percent-encoded characters in keys (e.g. %2F) are preserved through
  path splitting and decoded per-segment
- server.go: remove query-param priority over HTTP method in bucket
  router; DELETE/PUT/HEAD on a bucket with query params now routes
  correctly instead of falling through to handleListObjects
- object.go: wrap DeleteBucket count check and delete in a single
  transaction to close TOCTOU race where a concurrent write could sneak
  in between the two separate queries

Medium fixes:
- object.go: remove redundant GetBucket call from PutObject; the SQLite
  FK constraint enforces bucket existence and is detected via
  isSQLiteFKConstraint → ErrBucketNotFound
- migrations/005: drop idx_s3_objects_bucket; the composite index on
  (bucket, key) already covers all bucket-only lookups
- service.go: validate bucket names (3-63 chars, lowercase alphanum +
  hyphen, no leading/trailing hyphen, not an IP address) and key length
  (max 1024 bytes); new ErrInvalidBucketName and ErrKeyTooLong errors
  map to 400 in the HTTP layer
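The bucket-name rules above can be sketched as one regexp plus an IP check; the names here are illustrative, not the branch's:

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// bucketNameRE encodes the listed rules: 3-63 chars, lowercase letters,
// digits, and hyphens, with no leading or trailing hyphen.
var bucketNameRE = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$`)

func validBucketName(name string) bool {
	if !bucketNameRE.MatchString(name) {
		return false
	}
	// Defensive IP check per the listed rule; with dots already excluded
	// by the charset, dotted-quad names fail the regexp first anyway.
	return net.ParseIP(name) == nil
}

func main() {
	fmt.Println(validBucketName("my-bucket"))   // true
	fmt.Println(validBucketName("-bad"))        // false: leading hyphen
	fmt.Println(validBucketName("192.168.0.1")) // false
}
```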

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- auth.go: refactor authenticateRequest into focused helpers
  (parseAuthorizationHeader, validateCredentialScope, validateAmzDate,
  payloadHashFromRequest) for readability and testability
- server.go: additional hardening from review
- service.go: minor fix
- integration_test.go: expand S3 integration coverage
- object.go: additional store hardening
- object_test.go: expand ObjectStore test coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Celestia now serves as a verification layer, not storage. PutObject
computes SHA256 of the object, builds a ~200-byte CommitmentEnvelope
(bucket, key, size, sha256, etag), and submits that JSON to Celestia.
Raw object data remains cached in SQLite and served from there.

- Add CommitmentEnvelope struct and SHA256 field to Object
- Update ObjectStore.PutObject interface to carry sha256 param
- Squash migrations 004+005 into single final schema with sha256 column
- Update store and all tests for new interface and envelope assertions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Verify X-Amz-Content-Sha256 header matches actual body on PutObject
  (SigV4 compliance; prevents body substitution after signing)
- Handle content-length as canonical header for SigV4 compatibility
- Check bucket existence before submitting to Celestia to avoid
  orphaned on-chain commitments for non-existent buckets
- Add readRequestBody helper to buffer+restore body for hash check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>