S3

The s3 driver is DuckDB with the httpfs extension preconfigured. PlotPress queries object-storage paths as if they were tables, and DuckDB pushes column projections and predicates down into its Parquet reader.

The DSN encodes the S3 endpoint, region, and credentials. The path component is the default prefix all view= lookups resolve under.

events:
  driver: s3
  dsn: s3://${S3_KEY}:${S3_SECRET}@s3.eu-central-1.amazonaws.com/data-warehouse?region=eu-central-1
  allowed_users: [analysts]
  timeout: 60s

The driver works with any S3-compatible store: AWS, Cloudflare R2, MinIO, Backblaze B2. For non-AWS endpoints, set endpoint=... explicitly:

dsn: s3://${R2_KEY}:${R2_SECRET}@/data-warehouse?endpoint=7db3edf0...r2.cloudflarestorage.com&region=auto
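To make the DSN layout concrete, here is a minimal Python sketch that splits a DSN of the shape shown above into its parts. It is an illustration only, not PlotPress's actual parser: the function name parse_s3_dsn, the returned field names, the percent-decoding of credentials, and the rule that endpoint= overrides the host are all assumptions, and the R2-style endpoint in the usage lines is a made-up placeholder.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_s3_dsn(dsn: str) -> dict:
    """Split an s3:// DSN into credentials, endpoint, region, and prefix.

    Hypothetical helper: mirrors the DSN shape in the examples above,
    not PlotPress's real parsing rules.
    """
    parts = urlsplit(dsn)
    if parts.scheme != "s3":
        raise ValueError(f"expected an s3:// DSN, got scheme {parts.scheme!r}")

    # Keep the last value if a query parameter is repeated.
    query = {key: values[-1] for key, values in parse_qs(parts.query).items()}

    return {
        # Assumption: credentials are percent-encoded when they contain
        # URL-reserved characters such as "/" or "@".
        "key": unquote(parts.username or ""),
        "secret": unquote(parts.password or ""),
        # Assumption: an explicit endpoint= wins over the host part,
        # which is empty in the non-AWS form.
        "endpoint": query.get("endpoint") or parts.hostname,
        "region": query.get("region"),
        "prefix": parts.path.lstrip("/"),
    }


aws = parse_s3_dsn(
    "s3://AKIAEXAMPLE:s3cr3t@s3.eu-central-1.amazonaws.com/data-warehouse"
    "?region=eu-central-1"
)
r2 = parse_s3_dsn(
    "s3://KEY:SECRET@/data-warehouse?endpoint=account.r2.example.com&region=auto"
)
print(aws["endpoint"], aws["region"], aws["prefix"])
print(r2["endpoint"], r2["region"], r2["prefix"])
```

In both forms the path yields the same default prefix; only the source of the endpoint differs.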

Each provider needs read-only credentials, scoped to the prefix the dashboard reads:

  • AWS: an IAM user/role with s3:GetObject on the prefix and s3:ListBucket on the bucket, restricted to that prefix with an s3:prefix condition.
  • R2: a token scoped to the bucket, with Object Read.
  • MinIO: a service account with a policy allowing s3:GetObject/s3:ListBucket on the bucket.
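As an illustration of the AWS case, the following Python sketch builds a read-only policy document for one bucket and prefix. The function name read_only_policy and the Sid labels are hypothetical, and the sketch assumes data-warehouse is the bucket and invoices the prefix, as the read_parquet path in the view example suggests. Treat it as a starting point to review against your own account setup, not a vetted policy.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build a minimal read-only IAM policy for one bucket prefix.

    Hypothetical helper for illustration; review before attaching it
    to a real user or role.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetObject is granted on object ARNs under the prefix.
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # ListBucket applies to the bucket ARN; the condition
                # narrows listing to keys under the prefix.
                "Sid": "ListPrefixOnly",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {
                    "StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}
                },
            },
        ],
    }


print(json.dumps(read_only_policy("data-warehouse", "invoices"), indent=2))
```

The two statements are separate because s3:GetObject is an object-level action while s3:ListBucket is a bucket-level one, so they take different resource ARNs.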

Define a view in dashboards/<name>/views/ as a small SQL file that DuckDB resolves on demand. The file’s basename is the view name:

-- dashboards/sales/views/monthly_revenue.sql
SELECT
  date_trunc('month', invoice_date) AS month,
  sum(amount) AS revenue,
  currency
FROM read_parquet('s3://data-warehouse/invoices/year=*/*.parquet')
GROUP BY 1, 3
ORDER BY 1;

Plot.barY(data, { x: "month", y: "revenue" })

DuckDB reads only the Parquet column chunks and row groups it needs. Wide tables stay cheap.

Why a SQL file under views/ rather than the queries/ fallback? Because there’s no database to push the view definition into. With S3, the dashboard is the source of truth for “what counts as monthly_revenue.” That is expected behavior, not a workaround.

For one-off slices, drop a parameterised SQL file in queries/ and reference it with query=. The syntax is identical to the Postgres case.

  • Listing cost. read_parquet('s3://.../year=*/*.parquet') triggers LIST calls. Prefer Hive-partitioned layouts and reference partitions explicitly when possible.
  • Region matters. Cross-region reads from EC2/EKS to a different-region bucket are slow and expensive. Co-locate.
  • Schema drift. Parquet files in the same prefix must agree on column types. Drift produces type errors at query time, not at load.
  • No live mutation. The driver is read-only by construction; INSERT and friends are blocked at the parser level even with read_only: false.
  • Credentials in DSN. PlotPress expands ${VAR} from the environment at boot; the resolved DSN never lands on disk.
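The listing-cost advice above (reference partitions explicitly) can be sketched as a small Python helper that generates a read_parquet(...) call over named year partitions instead of a year=* wildcard. The helper names are hypothetical and the year=<n> layout is taken from the view example; the sketch relies on DuckDB's read_parquet accepting a list of paths. Each per-partition glob still lists objects, but only under that one partition's prefix.

```python
def partition_paths(base: str, years: list[int]) -> list[str]:
    """Return one glob per Hive-style year partition under base."""
    base = base.rstrip("/")
    return [f"{base}/year={year}/*.parquet" for year in years]


def read_parquet_sql(paths: list[str]) -> str:
    """Render a DuckDB read_parquet call over an explicit list of paths."""
    # Escape single quotes by doubling them, as SQL string literals require.
    quoted = ", ".join("'" + path.replace("'", "''") + "'" for path in paths)
    return f"read_parquet([{quoted}])"


# Only the 2023 and 2024 partitions are listed and read.
paths = partition_paths("s3://data-warehouse/invoices", [2023, 2024])
print(f"SELECT sum(amount) AS revenue FROM {read_parquet_sql(paths)};")
```

The generated expression can be pasted into a view file in place of the wildcard path when a dashboard only ever needs a known set of partitions.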