@faceless-photolib/backend-webgpu

The WebGPU backend for faceless-photolib, a headless, color-managed, GPU-accelerated image-editing engine.

It lowers the engine's frozen render-graph IR to WebGPU: per-pass WGSL codegen plus device-independent pipeline / bind-group descriptors. The GPU work is authored once in WGSL behind a thin, structural `GpuBackend` port, so the same code runs on browser native WebGPU, Node (Dawn via the `webgpu` package), and Expo/React Native (react-native-wgpu/Dawn). When no usable adapter is present, the backend returns `backend-unavailable`, never a silent CPU fallback.

Install

```sh
pnpm add @faceless-photolib/backend-webgpu
```

Usage

```typescript
import {
  createBackend,
  generateWgsl,
  planGraph,
  renderPipelineFor,
  INTERMEDIATE_FORMAT,
} from "@faceless-photolib/backend-webgpu";
import type { CompiledRenderGraph } from "@faceless-photolib/render-graph";

// A graph compiled upstream by @faceless-photolib/render-graph.
declare const graph: CompiledRenderGraph;

// Acquire a device and build the live backend. Absence of an adapter is a
// `backend-unavailable` Result, not a throw or a CPU fallback.
const result = await createBackend();
if (result.kind === "ok") {
  const backend = result.value; // GpuBackend: { kind, info(), render(), dispose() }
  // backend.render(graph) returns a Resource<RenderResult> (loading / error / ready).
}

// Codegen + descriptors are computed without a device, so they are fully
// inspectable and testable. Lower one pass to a complete WGSL module:
const pass = graph.passes[0];
if (pass !== undefined) {
  const wgsl = generateWgsl(pass); // Result<string>: ok, or `not-implemented` per pass
  if (wgsl.kind === "ok") {
    const descriptor = renderPipelineFor(pass, wgsl.value);
    // descriptor.targetFormat === INTERMEDIATE_FORMAT ("rgba16float")
  }
}

// Or lower the whole graph at once (validates every pass is lowerable first):
const plan = planGraph(graph); // Result<RenderPlan>
```

API

| Export | Description |
| --- | --- |
| `createBackend()` | Acquires a WebGPU device and builds the live `GpuBackend`; missing adapter → `backend-unavailable`. |
| `createUninitializedBackend()` | A synchronous `GpuBackend` handle that reports unavailable until a device is acquired. |
| `acquireBackend()` | The underlying device-acquisition routine returning `Result<GpuBackend>`. |
| `makeDeviceBackend(device, state)` | Builds the live `GpuBackend` over an already-acquired device. |
| `planGraph(graph)` | Lowers every pass of a `CompiledRenderGraph` to its pipeline descriptor, returning `Result<RenderPlan>`. |
| `executePlanUnverified(device, plan, state)` | The device-UNVERIFIED encode path (pipeline build per pass); returns a `Resource` that the engine-runtime orchestrator can await. |
| `generateWgsl(pass)` | Lowers one `RenderPass` to a complete WGSL module (`Result<string>`). |
| `generateSourceWgsl` / `generateBlendWgsl` / `generateAdjustmentWgsl` / `generateConversionWgsl` / `generateLut3dWgsl` / `generateOutputTransformWgsl` | Per-pass WGSL generators for each render-graph node kind. |
| `renderPipelineFor(pass, code)` | Builds a device-independent `RenderPipelineDescriptor` for a pass + its WGSL. |
| `bindGroupLayoutFor(pass)` | Builds the `BindGroupLayoutDescriptor` matching the WGSL `@group(0)` bindings. |
| `inputCount(pass)` | The number of distinct image inputs a pass consumes. |
| `lut3dTextureData(descriptor)` | Packs a 3D-LUT descriptor into the `rgba16float` texture-upload data. |
| `probeGpuApi()` | Detects a usable `navigator.gpu`, returning a present/absent probe result. |
| `INTERMEDIATE_FORMAT` | The intermediate connection-space render-target format (`"rgba16float"`). |
| `transferFnDecls` / `collectTransferDecls` / `encodeFnName` / `decodeFnName` / `transferFnTag` | WGSL transfer-function (EOTF/OETF) declaration helpers. |

Types are re-exported for the conformance harness and engine-runtime orchestrator: RenderPlan, RenderPipelineDescriptor, BindGroupLayoutDescriptor, BindGroupLayoutEntry, and the structural WebGPU surface (GpuApi, GpuAdapter, GpuDevice, GpuNavigator, GpuApiProbe).

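The widened v1 passes `fill`, `text`, `colorTransform`, `mask`, and `clip` are surfaced loudly as `not-implemented` rather than rendering a silent identity; the CPU reference backend realizes them today. `resample` now lowers to WGSL but stays gated in the gate ledger until a real device conformance run promotes it. For the ACES output transform, `aces-1.x` lowers through a documented degraded convert (with a `degraded` beacon) and `aces-2.0` is rejected as `not-implemented`.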

License

MIT

API reference

135 public exports · 121 documented · generated from source.

acquireBackend (function)

acquireBackend(): Promise<Result<GpuBackend>>

Acquire a WebGPU adapter + device and build the backend. The absence of an adapter is `backend-unavailable` (never a throw, never a CPU fallback). A device that is acquired but immediately lost is wired so the next render reports `error`. Built on `acquireDevice` (acquire.ts, task 2.1) — the injectable, mockable acquisition module the engine binding and the browser-vm e2e drive directly; this wrapper maps its outcome union onto the frozen `Result<GpuBackend>` contract.

acquireDevice (function)

acquireDevice(apiOverride?: GpuApi | undefined): Promise<DeviceAcquisition>

Acquire a WebGPU adapter + device and capture its capabilities. Pass a {@link GpuApi} to inject a specific entry point (a vitest mock, or an explicitly-chosen `navigator.gpu`); omit it to probe `globalThis`. Never throws: every failure mode is an `unavailable` variant.
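Because the entry point is injectable, the unavailable paths can be exercised without hardware. A minimal sketch of that pattern follows; `MiniGpuApi`, `MiniAdapter`, `Acquisition`, `acquireWith`, and the `reason` strings are hypothetical stand-ins, not the package's real `GpuApi` / `DeviceAcquisition` types.

```typescript
// Hypothetical, simplified stand-ins for the structural WebGPU port types.
interface MiniDevice {
  readonly label: string;
}
interface MiniAdapter {
  requestDevice(): Promise<MiniDevice>;
}
interface MiniGpuApi {
  // Like navigator.gpu.requestAdapter(), this may resolve to null.
  requestAdapter(): Promise<MiniAdapter | null>;
}

type Acquisition =
  | { kind: "acquired"; device: MiniDevice }
  | { kind: "unavailable"; reason: "no-api" | "no-adapter" | "device-request-failed" };

// Never throws: each failure mode maps to an `unavailable` variant.
async function acquireWith(api: MiniGpuApi | undefined): Promise<Acquisition> {
  if (api === undefined) return { kind: "unavailable", reason: "no-api" };
  let adapter: MiniAdapter | null;
  try {
    adapter = await api.requestAdapter();
  } catch {
    adapter = null;
  }
  if (adapter === null) return { kind: "unavailable", reason: "no-adapter" };
  try {
    return { kind: "acquired", device: await adapter.requestDevice() };
  } catch {
    return { kind: "unavailable", reason: "device-request-failed" };
  }
}

// A mock entry point whose adapter request resolves to null (no GPU present).
const noAdapterApi: MiniGpuApi = { requestAdapter: async () => null };

// A mock whose adapter exists but whose device request rejects.
const failingDeviceApi: MiniGpuApi = {
  requestAdapter: async () => ({
    requestDevice: async () => {
      throw new Error("device request rejected");
    },
  }),
};
```

A test double like `noAdapterApi` is the kind of value the real function's `apiOverride` parameter is described as accepting.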

adjustmentParamBlock (function)

adjustmentParamBlock(desc: AdjustmentDescriptor): Result<AdjustmentParamBlock>

Validate + serialize an adjustment descriptor's params into its uniform block. Uniform-parameterized effects yield the packed vec4 data (the executor sizes the GPU buffer from `data.byteLength` and rewrites it on the param-only fast path); `curves` yields `baked`; an unknown effect is `not-implemented` + beacon, out-of-range params are `invalid-request` — exactly mirroring `generateAdjustmentWgsl`, so pack and codegen can never disagree about which effects exist.

alignedBytesPerRow (function)

alignedBytesPerRow(width: number): number

The 256-aligned bytesPerRow for an `rgba16float` row of `width` pixels.
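The alignment comes from WebGPU's copy rule: `bytesPerRow` in a texture-to-buffer copy must be a multiple of 256. A minimal sketch of the arithmetic, assuming 8 bytes per `rgba16float` texel; `alignedBytesPerRowSketch` and `paddedRowHalves` are illustrative names, not package exports.

```typescript
// Assumption: rgba16float = 4 channels × 2 bytes (f16) = 8 bytes per texel.
const RGBA16F_BYTES_PER_TEXEL = 8;
// WebGPU requires bytesPerRow in texture-to-buffer copies to be a multiple of 256.
const COPY_BYTES_PER_ROW_ALIGNMENT = 256;

function alignedBytesPerRowSketch(width: number): number {
  const unpadded = width * RGBA16F_BYTES_PER_TEXEL;
  // Round up to the next multiple of the copy alignment.
  return Math.ceil(unpadded / COPY_BYTES_PER_ROW_ALIGNMENT) * COPY_BYTES_PER_ROW_ALIGNMENT;
}

// Readback later strips the padding: of each padded row, only the first
// width * 4 half-float elements are pixel data.
function paddedRowHalves(width: number): number {
  return alignedBytesPerRowSketch(width) / 2; // u16 elements per padded row
}
```

For example, a 100-pixel row needs 800 bytes but is copied with a 1024-byte stride.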

assembleTile (function)

assembleTile(out: Float32Array<ArrayBufferLike>, outWidth: number, origin: { readonly x: number; readonly y: number; }, tile: { readonly width: number; readonly height: number; }, pixels: Float32Array<...>): void

Copy one read-back tile into the assembled output buffer. `origin` is the tile's top-left in OUTPUT coordinates (tile canvas position minus the export viewport origin). Pure and exported for direct unit testing.
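The copy is a row-by-row blit from the tile's packed buffer into the wider output buffer. A sketch of that indexing, assuming both buffers are row-major RGBA f32 as described; `assembleTileSketch` is an illustrative stand-in and, unlike a production helper might, performs no bounds validation.

```typescript
interface TileOrigin {
  readonly x: number;
  readonly y: number;
}
interface TileSize {
  readonly width: number;
  readonly height: number;
}

// Copy one tile's row-major RGBA f32 pixels into the assembled output buffer.
// `out` holds outWidth × outHeight × 4 floats; `origin` is in OUTPUT coordinates.
// This sketch does no bounds validation.
function assembleTileSketch(
  out: Float32Array,
  outWidth: number,
  origin: TileOrigin,
  tile: TileSize,
  pixels: Float32Array,
): void {
  const rowFloats = tile.width * 4;
  for (let row = 0; row < tile.height; row++) {
    const src = row * rowFloats; // start of this row in the tile
    const dst = ((origin.y + row) * outWidth + origin.x) * 4; // same row in the output
    out.set(pixels.subarray(src, src + rowFloats), dst);
  }
}
```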

bindGroupLayoutFor (function)

bindGroupLayoutFor(pass: RenderPass): BindGroupLayoutDescriptor

Build the bind-group-layout descriptor for a pass, matching the `@group(0)` binding numbers the WGSL codegen emits:

- single-input fragment passes: 0 = src texture, 1 = sampler
- blend: 0 = backdrop texture, 1 = sampler, 2 = source texture
- lut3d: 0 = src texture, 1 = sampler, 2 = 3D LUT texture, 3 = LUT sampler
- resample: 0 = src texture, 1 = sampler, 2 = resample params uniform (the inverse matrix + perspective flag; see `wgsl/resample.ts`)

`outputTransform` has no lowerable WGSL (the pass rejects), so it has no pipeline; we still describe its single-input layout for symmetry, but callers never build a pipeline for it.

binIndexFor (function)

binIndexFor(v: number): number

The bin index a straight channel value falls into: the shared TS mirror of both the histogram WGSL's `to_bin` and the CPU reference's `toBin` (clamp both ends, round half UP). Conformance and unit tests compare against this single definition so the binning domain cannot drift between backends.
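A minimal sketch of that binning rule, assuming 256 bins per channel (consistent with the 256×4 bin buffer) and a [0, 1] domain mapped by `v · 255`. `binIndexForSketch` is an illustrative name; the real mirror may differ in its scale or its NaN handling, which this sketch does not address.

```typescript
// Assumption: 256 bins per channel over the [0, 1] straight-value domain.
const HISTOGRAM_BINS = 256;

function binIndexForSketch(v: number): number {
  // Clamp both ends so out-of-range (e.g. HDR) values land in the edge bins.
  const clamped = Math.min(Math.max(v, 0), 1);
  // For non-negative x, Math.floor(x + 0.5) rounds half UP.
  return Math.floor(clamped * (HISTOGRAM_BINS - 1) + 0.5);
}
```

Round-half-up is the detail that matters for cross-backend agreement: `0.5 · 255 = 127.5` lands in bin 128, whereas banker's rounding would pick 128 too but `126.5` would go to 126 instead of 127.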

collectTransferDecls (function)

collectTransferDecls(fns: readonly ({ kind: "linear"; } | { kind: "gamma"; exponent: number; } | { kind: "sRGB"; } | { kind: "rec709"; } | { kind: "rec2020"; } | { kind: "pq"; } | { kind: "hlg"; } | { kind: "logC3"; } | { kind: "logC4"; } | { ...; } | { ...; } | { ...; } | { ...; })[]): string

Emit the deduplicated transfer-fn declarations for a set of transfer fns. The `fp_log10` prelude is emitted exactly once and each unique curve tag's decode+encode helper pair exactly once — so combining several conversions into one WGSL module (e.g. the adjustment sandwich's two conversions) cannot produce a duplicate-definition compile error.

configureCanvasContext (function)

configureCanvasContext(context: GpuCanvasContext, device: GpuDevice, format: GpuTextureFormat): Result<{ readonly configured: true; }>

Configure a WebGPU canvas context for presentation from `device` in the given format (call once when the canvas mounts; reconfigure on format change). Alpha is premultiplied — the compositor convention for RGBA output.

conjugateMat3ByScale (function)

conjugateMat3ByScale(m: [number, number, number, number, number, number, number, number, number], sx: number, sy: number): [number, number, number, number, number, number, number, number, number]

Conjugate a row-major Mat3 by `S = diag(sx, sy, 1)`: `M' = S · M · S⁻¹`.
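Because `S` is diagonal, the conjugation reduces to an elementwise rescale: `M'[i][j] = s_i · M[i][j] / s_j` with `s = (sx, sy, 1)`. Linear terms scale by `s_i / s_j`, the translation column by `s_i`, and the bottom (perspective) row by `1 / s_j`. A sketch under that reading; `conjugateMat3ByScaleSketch` is an illustrative name.

```typescript
// Row-major 3×3: [m00, m01, m02, m10, m11, m12, m20, m21, m22].
type Mat3Tuple = [number, number, number, number, number, number, number, number, number];

// M' = S · M · S⁻¹ with S = diag(sx, sy, 1): element (i, j) becomes s_i · m_ij / s_j.
function conjugateMat3ByScaleSketch(m: Mat3Tuple, sx: number, sy: number): Mat3Tuple {
  const s = [sx, sy, 1];
  const out = [...m] as Mat3Tuple;
  for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      out[i * 3 + j] = (s[i]! * m[i * 3 + j]!) / s[j]!;
    }
  }
  return out;
}
```

For example, a translation by (10, 20) conjugated by `diag(2, 3, 1)` becomes a translation by (20, 60), while the identity stays the identity.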

consultGateLedger (function)

consultGateLedger(ledger: { feature: string; category: "base" | "blend" | "adjustment" | "conversion" | "outputTransform" | "lut"; status: { kind: "gated"; } | { kind: "promoted"; measuredAt: string; environment: string; fixture: string; maxDeltaE: number; psnr: { ...; } | { ...; }; }; }[], passes: readonly RenderPass[]): Result<...>

Consult the ledger for every feature the given passes require. DENY BY DEFAULT: a feature that is `gated` — or absent from the ledger entirely — is refused with `rejected:not-implemented` NAMING every offending feature (the gpu-preview spec's "refused, not approximated" scenario). Returns the promoted feature ids on success (useful for diagnostics/tests).
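Deny-by-default is easiest to see on plain feature ids. A simplified sketch: it takes feature ids directly (the real function derives them from passes, e.g. via `featureIdsForPass`), and `LedgerEntrySketch`, `GateOutcome`, `consultGateLedgerSketch`, and any example ids are hypothetical.

```typescript
type GateStatusSketch = { kind: "gated" } | { kind: "promoted"; measuredAt: string };

interface LedgerEntrySketch {
  readonly feature: string;
  readonly status: GateStatusSketch;
}

type GateOutcome =
  | { kind: "ok"; value: readonly string[] } // the promoted feature ids
  | { kind: "rejected"; reason: "not-implemented"; message: string };

function consultGateLedgerSketch(
  ledger: readonly LedgerEntrySketch[],
  requiredFeatures: readonly string[],
): GateOutcome {
  const statusOf = new Map(ledger.map((e) => [e.feature, e.status] as const));
  const promoted: string[] = [];
  const refused: string[] = [];
  for (const feature of new Set(requiredFeatures)) {
    // DENY BY DEFAULT: only an explicit `promoted` entry lets a feature through;
    // `gated` and "absent from the ledger" are both refusals.
    if (statusOf.get(feature)?.kind === "promoted") promoted.push(feature);
    else refused.push(feature);
  }
  if (refused.length > 0) {
    // Name every offending feature, not just the first one found.
    return {
      kind: "rejected",
      reason: "not-implemented",
      message: `refused (gated or unknown): ${refused.join(", ")}`,
    };
  }
  return { kind: "ok", value: promoted };
}
```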

createBackend (function)

createBackend(): Promise<Result<GpuBackend>>

Acquire a WebGPU device + build the backend. Absence of an adapter is `backend-unavailable`.

createPresenter (function)

createPresenter(device: GpuDevice, format: GpuTextureFormat): Result<Presenter>

Build the blit pipeline for one canvas format (once per device+format).

createRegionPresenter (function)

createRegionPresenter(device: GpuDevice, format: GpuTextureFormat, region: PresentRegion): Result<Presenter>

Build a blit pipeline presenting only `region` (normalized UV) of the source — the presentation-side realization of the engine's crop viewport (gpu-preview-pipeline task 3.2: the GPU path renders in canvas space and the viewport window is applied at the blit, mirroring the CPU backend's `extractViewport`). A malformed region is a typed `invalid-request`, never a silently clamped window. The returned {@link Presenter} drops into {@link presentToCanvas} unchanged.

createUninitializedBackend (function)

createUninitializedBackend(): GpuBackend

A synchronous handle that reports unavailable until a device is acquired (port shape only).

decodeF16 (function)

decodeF16(data: Uint16Array<ArrayBufferLike>): Float32Array<ArrayBufferLike>

Decode IEEE 754 half floats to f32 — the exact inverse of `encodeF16`.
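For reference, the IEEE 754 binary16 layout is 1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits. A minimal decoding sketch, written as an illustrative stand-in (`decodeHalf`, `decodeF16Sketch`) rather than the package's implementation:

```typescript
// Decode one binary16 bit pattern: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
function decodeHalf(bits: number): number {
  const sign = bits & 0x8000 ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f;
  const fraction = bits & 0x3ff;
  if (exponent === 0) {
    // Zero or subnormal: fraction · 2^-24 (= fraction/1024 · 2^-14).
    return sign * fraction * 2 ** -24;
  }
  if (exponent === 0x1f) {
    // All-ones exponent: infinity when the fraction is zero, otherwise NaN.
    return fraction === 0 ? sign * Infinity : NaN;
  }
  // Normal number: implicit leading 1.
  return sign * (1 + fraction / 1024) * 2 ** (exponent - 15);
}

function decodeF16Sketch(data: Uint16Array): Float32Array {
  const out = new Float32Array(data.length);
  for (let i = 0; i < data.length; i++) out[i] = decodeHalf(data[i]!);
  return out;
}
```

Every binary16 value is exactly representable in f32, so decoding is lossless; the lossy direction is encoding, where f32 values must be rounded to 11 significant bits.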

decodeFnName (function)

decodeFnName(fn: { kind: "linear"; } | { kind: "gamma"; exponent: number; } | { kind: "sRGB"; } | { kind: "rec709"; } | { kind: "rec2020"; } | { kind: "pq"; } | { kind: "hlg"; } | { kind: "logC3"; } | { kind: "logC4"; } | { ...; } | { ...; } | { ...; } | { ...; }): string

Name of the generated decode helper for a transfer fn.

disposePreparedHistogram (function)

disposePreparedHistogram(prepared: PreparedHistogram): void

Release the histogram pass's device resources.

disposePreparedPlan (function)

disposePreparedPlan(prepared: PreparedPlan): void

Release every device resource a prepared plan holds.

encodeF16 (function)

encodeF16(data: Float32Array<ArrayBufferLike>): Uint16Array<ArrayBufferLike>

Encode f32 samples to IEEE 754 half floats (rgba16float texture upload).

encodeFnName (function)

encodeFnName(fn: { kind: "linear"; } | { kind: "gamma"; exponent: number; } | { kind: "sRGB"; } | { kind: "rec709"; } | { kind: "rec2020"; } | { kind: "pq"; } | { kind: "hlg"; } | { kind: "logC3"; } | { kind: "logC4"; } | { ...; } | { ...; } | { ...; } | { ...; }): string

Name of the generated encode helper for a transfer fn.

executePlanUnverified (function)

executePlanUnverified(device: GpuDevice, plan: RenderPlan, state: DeviceState): Promise<Resource<{ width: number; height: number; colorSpace: string & $brand<"ColorSpaceId">; pixels: Float32Array<...>; }>>

The device-UNVERIFIED GPU execution: encode each pass's pipeline into ping-pong `rgba16float` targets and read back the output as f32 RGBA. NOT executed on this host. Returns a `Resource` so a future async port (or the engine-runtime orchestrator) can adopt it; here it is awaited by no one and exists to make the encode path real rather than a stub.

featureIdsForPass (function)

featureIdsForPass(pass: RenderPass): readonly string[]

The gate feature id(s) a pass requires. Exhaustive over the frozen pass union — a new `RenderPass` variant is a compile error here until it declares its feature identity.

generateAdjustmentWgsl (function)

generateAdjustmentWgsl(desc: AdjustmentDescriptor): Result<string>

Generate the full WGSL fragment-shader module for an adjustment pass. The fragment runs ONLY the effect math on the source buffer (already in the working space — the surrounding `colorConversion` passes are separate IR nodes the render-graph compiler emits; re-wrapping here would double-convert). Alpha passes through untouched (these are tonal/color RGB adjustments). Uniform-parameterized effects produce PARAM-INDEPENDENT WGSL (the fast-path invariant tested in `adjustment.test.ts`); curves bake their control points. Unknown effect → `rejected("not-implemented")` + beacon; out-of-range params → `invalid-request` (the forwarded failure). Never a silent identity.

generateBlendWgsl (function)

generateBlendWgsl(desc: BlendDescriptor): string

Generate the full WGSL fragment-shader module for a blend pass. Reads two inputs (backdrop = group 0 binding 0, source = binding 2) as premultiplied RGBA and writes the premultiplied composite. `desc.opacity`/`desc.fillOpacity` are baked as constants (they are part of the pass identity / Merkle key). Fires the same `warnDegraded` beacon the CPU reference does for an uncalibrated Special-8 Fill response (D5) — never silently.

generateConversionWgsl (function)

generateConversionWgsl(desc: ColorConversionDescriptor): string

Generate a full WGSL fragment-shader module for a standalone colorConversion pass. The fragment reads the single input texture, converts the RGB, and writes straight RGBA (alpha passed through). Used by the per-pass pipeline.

generateHistogramWgsl (function)

generateHistogramWgsl(): string

generateLut3dWgsl (function)

generateLut3dWgsl(desc: Lut3dDescriptor): string

Generate the full WGSL fragment-shader module for a lut3d pass. The 3D LUT is bound at group 0 binding 2 (`texture_3d<f32>`), the source at binding 0. Domain min/max are baked numerically.

generateOutputTransformWgsl (function)

generateOutputTransformWgsl(desc: OutputTransformDescriptor): Result<string>

generateResampleWgsl (function)

generateResampleWgsl(desc: ResampleDescriptor): Result<string>

Generate the full WGSL fragment-shader module for a `resample` pass. Reads the single input texture, maps each destination pixel through the pass's inverse transform (uniform-driven, see module doc), and reconstructs per `quality`. A non-invertible `matrix` is `rejected("degenerate", …)` — the WGSL text itself does not embed the numeric inverse (it is uniform-driven), but this validates it up front so a plan is never built for a descriptor the CPU reference would refuse, mirroring `resampleParamBlock`'s own check (they can never disagree, same principle as `adjustment.ts`'s double-validate).

generateSourceWgsl (function)

generateSourceWgsl(window?: SourceWindow | undefined): string

generateWgsl (function)

generateWgsl(pass: RenderPass): Result<string>

Per-pass WGSL dispatcher (backend-webgpu; D2). Lowers one backend-agnostic `RenderPass` from the frozen render-graph IR into a complete WGSL module (vertex + fragment). One module + one pipeline per pass, matching the IR's one-effect-per-pass structure.

Returns `Result<string>` uniformly: `source`, `blend`, `colorConversion`, and `lut3d` always lower (wrapped in `ok`); `adjustment`, `outputTransform`, and `resample` already return `Result`. In no case is the result a silent identity:

- an unknown adjustment effect surfaces as `not-implemented` + beacon;
- `outputTransform` lowers `aces-1.x` via a documented degraded convert (`ok` + a `degraded` beacon, mirroring the CPU reference) and rejects `aces-2.0` as `not-implemented`;
- `resample` lowers a uniform-driven inverse-transform sampler (see `resample.ts`) and rejects only a non-invertible (degenerate) matrix.

`match().exhaustive()` on `pass.node` means adding a new `RenderPass` variant to the frozen union is a compile error here until it is handled, never a silent miss.

`resample` NOW LOWERS (see `resample.ts`), but the `gated("resample","base")` entry in `gate-ledger.ts` is UNCHANGED: a real device conformance run is still required before the executor's DEFAULT gate mode will execute it. This is the same "complete WGSL, still gated" state `blend:dissolve` is already in.

The remaining widened v1 passes (`fill`, `text`, `colorTransform`, `mask`, `clip`) are realized first in the CPU reference backend (the golden source). Their WGSL lowering is a later phase (GPU execution is out of scope on the GPU-less host this was built on), so they return `not-implemented` + a beacon here rather than a silent identity; the CPU backend renders them today.

gridTiles (function)

gridTiles(region: ExportTile, maxEdge: number): Result<readonly ExportTile[]>

Split an integer region into a row-major grid of tiles whose EDGES are at most `maxEdge` — covering the region exactly with no gaps or overlap. Unlike `render-graph`'s pixel-budget `planTiles`, this planner bounds each EDGE (the constraint device texture limits actually impose): an elongated region within a pixel budget can still exceed `maxTextureDimension2D` on one axis. A region already within the edge cap yields the single-tile plan (the degenerate "untiled" case).
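A minimal sketch of an edge-bounded grid, assuming fixed `maxEdge` steps with a smaller remainder tile at the right and bottom (the real planner may distribute the remainder differently). `ExportTileSketch`, `TilesResult`, `gridTilesSketch`, and the rejection for non-positive or non-integer input are hypothetical.

```typescript
interface ExportTileSketch {
  readonly x: number;
  readonly y: number;
  readonly width: number;
  readonly height: number;
}

type TilesResult =
  | { kind: "ok"; value: readonly ExportTileSketch[] }
  | { kind: "rejected"; reason: "invalid-request"; message: string };

function gridTilesSketch(region: ExportTileSketch, maxEdge: number): TilesResult {
  const ints = [region.x, region.y, region.width, region.height, maxEdge];
  if (!ints.every(Number.isInteger) || region.width <= 0 || region.height <= 0 || maxEdge <= 0) {
    return { kind: "rejected", reason: "invalid-request", message: "region and maxEdge must be positive integers" };
  }
  const tiles: ExportTileSketch[] = [];
  // Row-major: walk rows top to bottom, tiles left to right within each row.
  for (let y = 0; y < region.height; y += maxEdge) {
    for (let x = 0; x < region.width; x += maxEdge) {
      tiles.push({
        x: region.x + x,
        y: region.y + y,
        // The last tile on each axis takes whatever remains (at most maxEdge).
        width: Math.min(maxEdge, region.width - x),
        height: Math.min(maxEdge, region.height - y),
      });
    }
  }
  return { kind: "ok", value: tiles };
}
```

A 5000×3000 region with `maxEdge = 4096` fits a 16-megapixel budget, yet its width still exceeds the edge cap; this planner splits it into a 4096-wide tile and a 904-wide tile.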

guardExportSize (function)

guardExportSize(size: { readonly width: number; readonly height: number; }, maxPixels?: number): Result<{ readonly ok: true; }>

Guard an export's output dimensions against the documented size limit (task 5.3). A typed `rejected("unsupported", …)` NAMES the limit and the requested size — never a silently downscaled or truncated export.

histogramWorkgroupCounts (function)

histogramWorkgroupCounts(size: { readonly width: number; readonly height: number; }): Result<WorkgroupCounts>

Dispatch sizing from the texture dims: ceil-divide each axis by the workgroup edge so every texel is covered exactly once (the shader's bounds check drops the over-hang invocations on the ragged edge).
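The ceil-divide is one expression per axis. A sketch assuming a 16×16 workgroup (a common choice; the real edge is whatever the histogram WGSL declares in `@workgroup_size`), with an illustrative name and without the `Result` wrapping the real function returns:

```typescript
// Assumption: the histogram shader declares @workgroup_size(16, 16).
const WORKGROUP_EDGE = 16;

interface WorkgroupCountsSketch {
  readonly x: number;
  readonly y: number;
}

// Ceil-divide each axis so every texel gets exactly one invocation; the shader's
// bounds check discards the over-hang invocations on the ragged right/bottom edge.
function histogramWorkgroupCountsSketch(
  width: number,
  height: number,
  edge: number = WORKGROUP_EDGE,
): WorkgroupCountsSketch {
  return { x: Math.ceil(width / edge), y: Math.ceil(height / edge) };
}

// With a real compute-pass encoder the counts feed straight into:
//   pass.dispatchWorkgroups(counts.x, counts.y);
```

For a 1920×1080 texture this gives 120×68 workgroups; the last row covers 1088 lines, and the 8 extra rows of invocations fail the bounds check.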

inputCount (function)

inputCount(pass: RenderPass): number

The number of distinct *image inputs* a pass consumes (used by the render orchestrator to wire the right intermediate buffers in). `source` reads its uploaded asset (1 external input), `blend` reads backdrop + source (2), single-input effects read 1, and the `lut3d` pass additionally binds a 3D LUT texture (not counted here — it is a resource, not a graph-edge input).

isUniformParameterizedAdjustment (function)

isUniformParameterizedAdjustment(effect: string): boolean

Whether this adjustment effect reads its params from the uniform buffer.

lut3dTextureData (function)

lut3dTextureData(desc: Lut3dDescriptor): Float32Array<ArrayBufferLike>

Build the row-major `Float32Array` upload buffer for the 3D LUT texture. The descriptor `data` is already R-fastest RGBA in exactly the order WebGPU's `writeTexture` expects for a 3D texture of extent (size, size, size) with R→x, G→y, B→z — so this is a faithful copy with a length assertion.
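The R-fastest layout means the RGBA texel for lattice point (r, g, b) starts at float offset `((b · size + g) · size + r) · 4`. A sketch of that indexing plus the length-checked copy; `Lut3dSketch`, `lutTexelOffset`, `lut3dTextureDataSketch`, and `identityLut` are hypothetical helpers, and throwing on a length mismatch stands in for whatever assertion the real function uses.

```typescript
interface Lut3dSketch {
  readonly size: number; // lattice points per axis
  readonly data: Float32Array; // size³ RGBA texels, R fastest
}

// R → x (fastest), G → y, B → z: the texel order for a (size, size, size) 3D texture.
function lutTexelOffset(size: number, r: number, g: number, b: number): number {
  return ((b * size + g) * size + r) * 4;
}

// Faithful copy guarded by a length assertion.
function lut3dTextureDataSketch(desc: Lut3dSketch): Float32Array {
  const expected = desc.size ** 3 * 4;
  if (desc.data.length !== expected) {
    throw new Error(`LUT data has ${desc.data.length} floats; expected size³·4 = ${expected}`);
  }
  return desc.data.slice();
}

// An identity LUT: each lattice point maps to its own normalized coordinate.
function identityLut(size: number): Lut3dSketch {
  const data = new Float32Array(size ** 3 * 4);
  const step = 1 / (size - 1);
  for (let b = 0; b < size; b++) {
    for (let g = 0; g < size; g++) {
      for (let r = 0; r < size; r++) {
        const o = lutTexelOffset(size, r, g, b);
        data[o] = r * step;
        data[o + 1] = g * step;
        data[o + 2] = b * step;
        data[o + 3] = 1;
      }
    }
  }
  return { size, data };
}
```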

makeDeviceBackend (function)

makeDeviceBackend(device: GpuDevice, state: DeviceState): GpuBackend

Build the live backend over an acquired device. `render` is synchronous (the frozen port shape), so it cannot await GPU readback; it validates the graph by lowering it, surfaces a lost device as `Resource.error`, and otherwise reports the work as in-flight (`loading`) after building the plan. The actual encode/submit/map-readback is device-UNVERIFIED and lives in `executePlanUnverified`, dispatched here but not awaited by the sync port.

noSourceTextures (function)

noSourceTextures(): GpuSourceTextures

The no-sources default: every lookup fails loud (mirrors `unresolvedSource`).

passParamBlock (function)

passParamBlock(pass: RenderPass): Result<PassParamBlock>

Pack the uniform param block for any pass. `adjustment` delegates to its effect table (uniform-parameterized effects → data; baked `curves` → none); `resample` ALWAYS packs a uniform block (the inverse matrix + perspective flag, `resample.ts`'s param-only fast path — a live drag/resize/rotate never rebuilds the pipeline); every other pass bakes its parameters into its WGSL today (structural), so it carries no uniform block. Exhaustive over the pass union — a new node kind must decide its param story here before it can execute.

planGraph (function)

planGraph(graph: CompiledRenderGraph): Result<RenderPlan>

Lower every pass of a compiled graph to its pipeline descriptor. A pass that cannot be lowered (an unsupported adjustment effect, an `aces-2.0` output transform, a degenerate `resample` matrix, or a widened v1 pass without WGSL lowering yet) fails the whole plan loudly with the forwarded failure, never a silently dropped pass.

planShapeKey (function)

planShapeKey(plan: RenderPlan): string

The plan's SHAPE fingerprint: node kinds + edge topology (as pass INDICES — pass ids are Merkle hashes over params, so they change on every edit) + the generated WGSL of every pass. Two plans with equal shape keys differ at most in uniform-read params — exactly the fast-path precondition.
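A minimal sketch of such a fingerprint over a simplified, hypothetical pass shape (`PlanPassSketch`; the real `RenderPlan` differs). Edges are re-expressed as pass indices, so a param-only edit, which changes the Merkle ids and the uniform data but not the WGSL, leaves the key unchanged. `planShapeKeySketch` and `canTakeParamFastPath` are illustrative names.

```typescript
interface PlanPassSketch {
  readonly id: string; // Merkle hash over params: changes on every edit
  readonly kind: string; // node kind, e.g. "adjustment"
  readonly inputs: readonly string[]; // ids of upstream passes
  readonly wgsl: string; // generated module text (baked params live here)
}

function planShapeKeySketch(passes: readonly PlanPassSketch[]): string {
  // Re-express edges as pass INDICES so the key survives id churn.
  const indexOf = new Map(passes.map((p, i) => [p.id, i] as const));
  return JSON.stringify(
    passes.map((p) => [p.kind, p.inputs.map((id) => indexOf.get(id) ?? -1), p.wgsl]),
  );
}

// The fast-path precondition: equal keys mean only uniform-read params differ.
function canTakeParamFastPath(
  prev: readonly PlanPassSketch[],
  next: readonly PlanPassSketch[],
): boolean {
  return planShapeKeySketch(prev) === planShapeKeySketch(next);
}
```

A baked-param change (blend, conversion, curves) alters the WGSL text and therefore the key, which is why `updatePlanParams` refuses such plans and asks for a re-prepare.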

prepareHistogram (function)

prepareHistogram(device: GpuDevice, source: HistogramSource, options: { readonly limits: Readonly<Record<string, number>>; }): Result<PreparedHistogram>

Prepare the histogram compute pass for one composited texture: shader module + compute pipeline, the 256×4 `atomic<u32>` bin storage buffer, and the texture+bins bind group. Dispatch counts are sized from the texture dims; the bin buffer is validated against the CAPTURED device limits at setup (absent names stay unvalidated — the capture omits what the device did not report, and the WebGPU spec minimums are far above 4KB).

preparePlan (function)

preparePlan(device: GpuDevice, plan: RenderPlan, options: { readonly width: number; readonly height: number; readonly sources?: GpuSourceTextures | undefined; readonly gates?: GateConsultation | undefined; }): Result<...>

Prepare a lowered plan on the device: pipelines (deduped by WGSL), one render-target texture per pass, bind groups wiring the graph edges, LUT texture uploads, and the uniform param buffers (written once here; rewritten by `updatePlanParams` on the fast path). Pure resource construction — no draw is encoded (that is `submitPlan`).

presentRegionWgsl (function)

presentRegionWgsl(region: PresentRegion): string

The region-blit WGSL: fullscreen triangle sampling only the given normalized sub-rectangle of the source (uv' = offset + uv·scale). The region is BAKED as shader constants — presentation stays a zero-buffer-write hot path (no per-frame uniform upload); a region change (crop commit — rare, never per-frame) builds a new presenter via {@link createRegionPresenter}.

presentToCanvas (function)

presentToCanvas(device: GpuDevice, presenter: Presenter, sourceView: GpuTextureView, context: GpuCanvasContext): Result<{ readonly presented: true; }>

Present one frame: blit `sourceView` (the prepared plan's output view) into the canvas context's CURRENT texture and submit. Render-to-texture only — no readback, no JS pixel loop (gpu-preview spec scenario "No readback during interaction").

probeComputeCapability (function)

probeComputeCapability(device: GpuDevice, capabilities: GpuCapabilities, required: { readonly storageBufferBytes: number; }): ComputeCapability

Probe whether an acquired device can run the compute passes this engine needs (gpu-histogram D5): `createComputePipeline` and `beginComputePass` present on the RUNTIME surface (a boundary object may lack them regardless of the structural type), and the granted `maxStorageBufferBindingSize` (when captured; the spec minimum is far above any bin buffer) large enough for `required.storageBufferBytes`. Called at acquire time and again after a device-loss re-acquire; a failing probe maps to the typed histogram `unavailable{reason:"compute-unsupported"}` surface — the engine never discovers missing compute support as a dispatch-time crash.

probeGpuApi (function)

probeGpuApi(): GpuApiProbe

Probe the host for a WebGPU entry point.

readbackBuffer (function)

readbackBuffer(device: GpuDevice, source: GpuBuffer, byteLength: number): Promise<Result<Uint32Array<ArrayBufferLike>>>

Read a SMALL non-pixel GPU buffer back to JS as u32 words — BIN DATA ONLY (gpu-histogram D6): built for the 256×4 histogram bin buffer and capped at {@link READBACK_BUFFER_MAX_BYTES}. Awaits the map, so the returned words reflect all previously submitted GPU work on the buffer (the accumulation dispatch). Same conventions as `readbackTexture`: duck-probed copy/map surface, copy out of the mapped range BEFORE unmap, staging destroyed, and every failure a canonical `Result` — never an escaping exception.

readbackTexture (function)

readbackTexture(device: GpuDevice, texture: GpuTexture, size: { readonly width: number; readonly height: number; }): Promise<Result<Float32Array<ArrayBufferLike>>>

Read an `rgba16float` texture back to straight f32 RGBA (row-major, `width·height·4` floats, copy-row padding stripped). Awaits the map, so the returned pixels reflect all previously submitted GPU work on the texture. Failure modes are canonical `Result`s: a surface without readback members is `rejected:not-implemented`; a throwing/rejecting device is `backend-unavailable` (beaconed) — never an escaping exception.

renderGraphTiled (function)

renderGraphTiled(device: GpuDevice, graph: CompiledRenderGraph, options: RenderGraphTiledOptions): Promise<Result<{ width: number; height: number; colorSpace: string & $brand<...>; pixels: Float32Array<...>; }>>

Render a compiled graph's viewport at FULL graph resolution, tiled within device limits, and read the assembled pixels back (the export boundary). The graph must be pre-scaled (`viewport.scale === 1`, integer viewport) — use `scaleGraphResolution` to realize a source-resolution target first.

renderPipelineFor (function)

renderPipelineFor(pass: RenderPass, code: string): RenderPipelineDescriptor

Build the render-pipeline descriptor for a pass given its generated WGSL. The full-screen triangle (`vs_main`) + the pass fragment (`fs_main`) write to a single `rgba16float` target.

resampleParamBlock (function)

resampleParamBlock(desc: ResampleDescriptor): Result<Float32Array<ArrayBufferLike>>

Pack a resample descriptor's per-instance uniform block: the forward matrix's INVERSE (row-major, 3 vec4-padded rows) + the perspective flag (a 4th vec4), 16 floats total. A non-invertible matrix is `rejected("degenerate", …)`, forwarded from `invertMat3` — never a silently clamped/identity fallback.
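A sketch of the packing, assuming the perspective flag is set whenever the forward matrix's bottom row is not `(0, 0, 1)` and using a small determinant epsilon for the degeneracy test; both are assumptions. `invertMat3Sketch` and `resampleParamBlockSketch` are illustrative stand-ins for `invertMat3` / `resampleParamBlock`, and the inverse is computed as the adjugate (transposed cofactor matrix) divided by the determinant.

```typescript
// Row-major forward matrix: [a, b, c, d, e, f, g, h, i].
type ForwardMat3 = readonly [number, number, number, number, number, number, number, number, number];

type PackResult =
  | { kind: "ok"; value: Float32Array }
  | { kind: "rejected"; reason: "degenerate"; message: string };

// Assumption: the degeneracy threshold; the real check may differ.
const DET_EPSILON = 1e-12;

function invertMat3Sketch(m: ForwardMat3): number[] | null {
  const [a, b, c, d, e, f, g, h, i] = m;
  // First-row cofactors, reused for the determinant (Laplace expansion).
  const c00 = e * i - f * h;
  const c01 = -(d * i - f * g);
  const c02 = d * h - e * g;
  const det = a * c00 + b * c01 + c * c02;
  if (!Number.isFinite(det) || Math.abs(det) < DET_EPSILON) return null;
  const k = 1 / det;
  return [
    c00 * k, -(b * i - c * h) * k, (b * f - c * e) * k,
    c01 * k, (a * i - c * g) * k, -(a * f - c * d) * k,
    c02 * k, -(a * h - b * g) * k, (a * e - b * d) * k,
  ];
}

// 16 floats: three vec4-padded rows of the INVERSE, then a vec4 carrying the flag.
function resampleParamBlockSketch(forward: ForwardMat3): PackResult {
  const inv = invertMat3Sketch(forward);
  if (inv === null) {
    return { kind: "rejected", reason: "degenerate", message: "forward matrix is not invertible" };
  }
  const block = new Float32Array(16); // zero-initialized, so each .w pad is 0
  for (let row = 0; row < 3; row++) {
    block.set(inv.slice(row * 3, row * 3 + 3), row * 4);
  }
  // Assumption: perspective iff the bottom row differs from (0, 0, 1); flag in .x.
  const [, , , , , , g, h, i] = forward;
  block[12] = g !== 0 || h !== 0 || i !== 1 ? 1 : 0;
  return { kind: "ok", value: block };
}
```

The vec4 padding matches WGSL's uniform layout, where a `vec3<f32>` row occupies 16 bytes.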

scaleGraphResolution (function)

scaleGraphResolution(graph: CompiledRenderGraph, target: ResolutionTarget): Result<CompiledRenderGraph>

Re-express a compiled graph at an exact target canvas resolution. The input viewport's own `scale` is treated as already REALIZED by the target choice (callers derive `target` from it), so the output viewport always carries `scale: 1`. Identity targets return the graph unchanged (no re-key), so the scale-1 path is byte-identical to today's.

submitHistogram (function)

submitHistogram(device: GpuDevice, prepared: PreparedHistogram): Result<{ readonly workgroups: WorkgroupCounts; }>

Zero-clear the bins (D1) and dispatch one accumulation over the source texture — ONE command buffer, run once per idle preview frame after the plan submit on the same queue. The bin readback (`readbackBuffer`) is the caller's next step, awaited off the render loop.

submitPlan (function)

submitPlan(device: GpuDevice, prepared: PreparedPlan): Result<{ readonly passesEncoded: number; }>

Encode every prepared pass in plan order — fullscreen triangle into the pass's own render target — and submit ONE command buffer. Allocation-free; run once per frame (after `updatePlanParams` on the fast path).

transferFnDecls (function)

transferFnDecls(fn: { kind: "linear"; } | { kind: "gamma"; exponent: number; } | { kind: "sRGB"; } | { kind: "rec709"; } | { kind: "rec2020"; } | { kind: "pq"; } | { kind: "hlg"; } | { kind: "logC3"; } | { kind: "logC4"; } | { ...; } | { ...; } | { ...; } | { ...; }): string

Emit the WGSL declarations (the log10 prelude + the named decode/encode helpers) for a transfer fn. `linear` still emits real `return x;` helpers (no silent omission) so a conversion always has a callable function.

transferFnTag (function)

transferFnTag(fn: { kind: "linear"; } | { kind: "gamma"; exponent: number; } | { kind: "sRGB"; } | { kind: "rec709"; } | { kind: "rec2020"; } | { kind: "pq"; } | { kind: "hlg"; } | { kind: "logC3"; } | { kind: "logC4"; } | { ...; } | { ...; } | { ...; } | { ...; }): string

A stable WGSL identifier suffix for a transfer fn (so two different curves yield two different helper names, but the same curve always reuses one).

updatePlanParams (function)

updatePlanParams(device: GpuDevice, prepared: PreparedPlan, newPlan: RenderPlan): Result<{ readonly buffersUpdated: number; }>

The param-only FAST PATH (D5): rewrite the uniform buffers from a shape-identical plan without touching pipelines, textures, or bind groups. A plan whose shape key differs (structural edit, or a baked-param change — blend/conversion/curves) is a typed `conflict` refusal: re-prepare instead.

windowedPlan (function)

windowedPlan(plan: RenderPlan, window: ExportTile, canvas: { readonly width: number; readonly height: number; }): Result<RenderPlan>

Re-window a lowered plan to a canvas sub-rectangle: `source` passes get the windowed sampling WGSL (normalized window baked as constants — export is one-shot, not the hot path); per-pixel passes are untouched (their tile-sized inputs/outputs already correspond 1:1). A full-canvas window returns the plan unchanged (the exact promoted WGSL). A `resample` pass under a REAL window is refused: its input tile does not contain the pixels an arbitrary inverse mapping may read.

BindGroupLayoutDescriptor (interface)

interface BindGroupLayoutDescriptor

A bind-group-layout descriptor for one pass.

DeviceLossWatch (interface)

interface DeviceLossWatch

A watch over `device.lost`, wired at acquisition. `status()` is the pollable union; `onLost` registers a listener (fired immediately if the device is already lost, so late subscribers cannot miss the event).
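A minimal sketch of that watch over a `device.lost`-style promise (WebGPU's `GPUDevice.lost` resolves once with a `GPUDeviceLostInfo` carrying a `reason`). `DeviceLossWatchSketch` and `LossStatus` are hypothetical names, not the package's implementation.

```typescript
type LossStatus = { kind: "alive" } | { kind: "lost"; reason: string };

class DeviceLossWatchSketch {
  private current: LossStatus = { kind: "alive" };
  private readonly listeners: Array<(reason: string) => void> = [];

  // `lost` stands in for GPUDevice.lost, which resolves once when the device is lost.
  constructor(lost: Promise<{ readonly reason: string }>) {
    void lost.then((info) => {
      this.current = { kind: "lost", reason: info.reason };
      // Drain and notify everyone who subscribed before the loss.
      for (const listener of this.listeners.splice(0)) listener(info.reason);
    });
  }

  // The pollable union.
  status(): LossStatus {
    return this.current;
  }

  // Late subscribers fire immediately, so they cannot miss an earlier loss.
  onLost(listener: (reason: string) => void): void {
    const now = this.current;
    if (now.kind === "lost") listener(now.reason);
    else this.listeners.push(listener);
  }
}
```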

ExportTile (interface)

interface ExportTile

An integer output tile: `x/y` are canvas coordinates, `width/height` its size.

GpuAdapter (interface)

interface GpuAdapter

Adapter handle; `requestDevice` resolves to a usable `GpuDevice` or rejects.

GpuApi (interface)

interface GpuApi

The `navigator.gpu` entry point. `requestAdapter` may resolve to `null`.

GpuBindGroup (interface)

interface GpuBindGroup

GpuBuffer (interface)

interface GpuBuffer

GpuCanvasContext (interface)

interface GpuCanvasContext

The WebGPU canvas context (`canvas.getContext("webgpu")`): configure once, then render into `getCurrentTexture()` each frame — the loop-free presentation path (gpu-preview spec). No readback surface, by design.

GpuCapabilities (interface)

interface GpuCapabilities

Adapter + device capabilities captured at acquisition time (plain data).

GpuCommandBuffer (interface)

interface GpuCommandBuffer

A finished, submittable command buffer.

GpuCommandEncoder (interface)

interface GpuCommandEncoder

Minimal command-encoder surface (render + compute passes). `beginComputePass` is optional because this is a boundary type we don't own: every real WebGPU encoder has it, but exotic ports/doubles may not. The capability probe (`acquire.ts`) and the histogram executor duck-check it and surface a typed `compute-unsupported` instead of a TypeError.

GpuComputePassEncoder (interface)

interface GpuComputePassEncoder

The compute-pass encoder slice the histogram executor drives (gpu-histogram D1): pipeline + bind group + `dispatchWorkgroups`. Like the render-pass slice, NO readback/copy surface — a compute pass writes device buffers only; the ≤4KB bin readback lives in `readback.ts` beside the pixel readback.

GpuComputePipeline (interface)

interface GpuComputePipeline

GpuDevice (interface)

interface GpuDevice

Minimal device surface. Methods accept the descriptor shapes in `pipeline.ts`.

GpuNavigator (interface)

interface GpuNavigator

A `navigator`-shaped object that MAY expose `.gpu`.

GpuQueue (interface)

interface GpuQueue

Minimal queue surface: submit + data uploads (uploads only, no readback).

GpuRenderPassEncoder (interface)

interface GpuRenderPassEncoder

The render-pass encoder slice the executor drives: pipeline + bind group + fullscreen-triangle draw. Deliberately NO readback/copy surface — the hot path renders texture-to-texture only (gpu-preview spec: loop-free presentation; readback is export/conformance-only and lives elsewhere).

GpuRenderPipeline (interface)

interface GpuRenderPipeline

Minimal pipeline handles.

GpuSourceTextures (interface)

interface GpuSourceTextures

Resolve a `source`/`mask` pass's uploaded asset texture view. The executor is pure device plumbing — decoding bytes and uploading them is the engine layer's job (it owns the asset store), so the texture view is injected, the same way the CPU backend takes a `SourceResolver` at construction. A hash the host cannot supply is a canonical failure, never fabricated pixels.

GpuTexture (interface)

interface GpuTexture

Minimal texture / view / buffer / sampler / bind-group handles.

GpuTextureView (interface)

interface GpuTextureView

HistogramSource (interface)

interface HistogramSource

The composited texture one histogram pass accumulates over.

PreparedHistogram (interface)

interface PreparedHistogram

One prepared histogram pass: pipeline + bind group + the atomic bin buffer.

PreparedPass (interface)

interface PreparedPass

One prepared pass: its pipeline, bind group, render target, and params.

PreparedPlan (interface)

interface PreparedPlan

A fully prepared plan: reusable across frames until the shape changes.

Presenter (interface)

interface Presenter

The presenter's device resources, built once per (device, canvas format).

PresentRegion (interface)

interface PresentRegion

A normalized UV sub-rectangle of the plan output to present (the engine's crop viewport realized at presentation time). All coordinates are in [0, 1] texture space; `{x:0, y:0, width:1, height:1}` is the full frame.

RenderGraphTiledOptions (interface)

interface RenderGraphTiledOptions

Options for `renderGraphTiled`.

RenderPipelineDescriptor (interface)

interface RenderPipelineDescriptor

A render-pipeline descriptor (device-independent; structurally WebGPU-ish).

RenderPlan (interface)

interface RenderPlan

The lowered plan for a graph: one pipeline descriptor per pass, in execution order, plus the output pass id. Because it is built without a device, a whole graph can be validated, checking that every pass is lowerable, before any GPU work is done.

ResampleDescriptor (interface)

interface ResampleDescriptor

A resample pass's WGSL-relevant fields (mirrors the `RenderPass` "resample" variant).

ResolutionTarget (interface)

interface ResolutionTarget

The exact target canvas resolution to realize the graph at.

SourceWindow (interface)

interface SourceWindow

A sub-rectangle of the canvas in normalized [0,1] canvas UV.
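Assuming a `SourceWindow` is simply an integer tile's pixel rectangle divided by the canvas size (the package may treat pixel centers or rounding differently), the conversion and the full-canvas test can be sketched as below. `tileToWindow` and `isFullWindow` are hypothetical helpers.

```typescript
// Integer tile in canvas pixels (ExportTile-shaped).
interface PixelTile {
  readonly x: number;
  readonly y: number;
  readonly width: number;
  readonly height: number;
}

// Normalized [0,1] canvas-UV rectangle (SourceWindow-shaped).
interface UvRect {
  readonly x: number;
  readonly y: number;
  readonly width: number;
  readonly height: number;
}

// Hypothetical mapping: divide each pixel coordinate by the canvas extent.
function tileToWindow(tile: PixelTile, canvasWidth: number, canvasHeight: number): UvRect {
  return {
    x: tile.x / canvasWidth,
    y: tile.y / canvasHeight,
    width: tile.width / canvasWidth,
    height: tile.height / canvasHeight,
  };
}

// Exact comparison is safe here: 0 / n === 0 and n / n === 1 in IEEE-754.
const isFullWindow = (w: UvRect): boolean =>
  w.x === 0 && w.y === 0 && w.width === 1 && w.height === 1;
```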

WorkgroupCounts (interface)

interface WorkgroupCounts

Workgroup counts for one histogram dispatch.
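A sketch of the workgroup arithmetic, assuming one compute invocation per texel with the documented `@workgroup_size(16, 16)`: round up in each dimension so partial edge blocks are covered (the shader must then bounds-check invocations that fall outside the texture). `workgroupsFor` is a hypothetical helper; its counts would feed `GPUComputePassEncoder.dispatchWorkgroups(x, y, z)`.

```typescript
const WORKGROUP_EDGE = 16; // mirrors HISTOGRAM_WORKGROUP_SIZE

interface Dispatch {
  readonly x: number;
  readonly y: number;
  readonly z: number;
}

// One invocation per texel (assumed), rounded up so edge texels are covered.
function workgroupsFor(width: number, height: number): Dispatch {
  return {
    x: Math.ceil(width / WORKGROUP_EDGE),
    y: Math.ceil(height / WORKGROUP_EDGE),
    z: 1,
  };
}
```

For a 1920 × 1080 source this gives 120 × 68 × 1 workgroups, since 1080 / 16 = 67.5 rounds up.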

AcquisitionUnavailableReason (type)

type AcquisitionUnavailableReason

Why acquisition could not produce a device.

AdjustmentParamBlock (type)

type AdjustmentParamBlock

The uniform param block a pass's descriptor serializes to (executor-facing): `uniform` carries the packed vec4 data the pipeline's binding 2 reads; `baked` marks the structural effects (curves) whose params live in the WGSL itself and rebuild the pipeline on change.
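A sketch of packing scalar parameters into whole `vec4<f32>` slots, which sidesteps WGSL's 16-byte alignment rules for uniform arrays. The `ParamBlock` shape and `packUniform` are hypothetical stand-ins for `AdjustmentParamBlock`; the parameter order a given effect's WGSL expects is not shown here.

```typescript
// Hypothetical shape mirroring the description: packed vec4 data, or "baked".
type ParamBlock =
  | { readonly kind: "uniform"; readonly data: Float32Array }
  | { readonly kind: "baked" };

// Pack scalars into whole vec4<f32> slots (16 bytes each), zero-padding the tail.
// This sketch always reserves at least one slot.
function packUniform(params: readonly number[]): ParamBlock {
  const slots = Math.max(1, Math.ceil(params.length / 4));
  const data = new Float32Array(slots * 4);
  data.set(params);
  return { kind: "uniform", data };
}
```

`packUniform([0.5, 1.2, -0.3, 2, 0.1])` produces 8 floats (32 bytes), the last three zero; the result would then be uploaded to the buffer at binding 2 (for example with `GPUQueue.writeBuffer`).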

BindGroupLayoutEntry (type)

type BindGroupLayoutEntry

A single bind-group-layout entry (subset of GPUBindGroupLayoutEntry).

ComputeCapability (type)

type ComputeCapability

The compute capability probe outcome (gpu-histogram D5).

DeviceAcquisition (type)

type DeviceAcquisition

Every acquisition outcome, as a discriminated union. `unavailable` carries a machine-checkable reason plus a human detail — the engine binding maps it to the canonical `backend-unavailable` result and the site maps THAT to the designed "GPU required" state.

DeviceLossStatus (type)

type DeviceLossStatus

The queryable device-loss state.

GateCategory (type)

type GateCategory

GateConsultation (type)

type GateConsultation

How the executor consults the gates:

- `enforce`: the production mode. Every feature the passes require must be `promoted` in the given ledger, or the plan is refused, naming the features.
- `measure`: the conformance-harness-ONLY bypass that lets the device gates execute not-yet-promoted features in order to measure them. Never use it on a user-facing render path.
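The two modes can be sketched as below, assuming a simplified ledger of `{ feature, status }` entries and treating a feature missing from the ledger as not promoted. `consultGates`, `LedgerEntry`, and the verdict variant names are hypothetical; the package's own ledger is validated by the `Gate*Schema` constants.

```typescript
// Hypothetical simplified ledger shapes.
type GateStatusLike = { readonly kind: "gated" } | { readonly kind: "promoted" };

interface LedgerEntry {
  readonly feature: string;
  readonly status: GateStatusLike;
}

type Consultation = "enforce" | "measure";

type GateVerdict =
  | { readonly kind: "allowed" }
  | { readonly kind: "refused"; readonly features: readonly string[] };

function consultGates(
  ledger: readonly LedgerEntry[],
  required: readonly string[],
  mode: Consultation,
): GateVerdict {
  // Conformance-harness-only bypass: run unpromoted features so they can be measured.
  if (mode === "measure") return { kind: "allowed" };
  const promoted = new Set(
    ledger.filter((e) => e.status.kind === "promoted").map((e) => e.feature),
  );
  // Features absent from the ledger count as not promoted.
  const missing = required.filter((f) => !promoted.has(f));
  return missing.length === 0 ? { kind: "allowed" } : { kind: "refused", features: missing };
}
```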

GateEntry (type)

type GateEntry

GateLedger (type)

type GateLedger

GateStatus (type)

type GateStatus

GpuApiProbe (type)

type GpuApiProbe

Read `navigator.gpu` off `globalThis` without assuming a browser. Node 22 exposes a global `navigator` (without `.gpu` unless a Dawn binding installs one), so we probe `globalThis.navigator?.gpu` rather than referencing the `navigator` global directly (which would be a ReferenceError where absent). Returns a named union variant — never a bare `null`/`undefined` crossing the package's own API surface.
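A sketch of the probe pattern, with the scope injectable so it can be exercised without a real `navigator`. `probeGpuApi` and the three variant names are hypothetical; the point is that property reads off `globalThis` never throw, and every outcome is a named variant rather than `null`/`undefined`.

```typescript
// Hypothetical result union; the package's GpuApiProbe variants may be named differently.
type GpuApiProbeLike =
  | { readonly kind: "available"; readonly gpu: object }
  | { readonly kind: "no-navigator" }
  | { readonly kind: "no-gpu" };

function probeGpuApi(scope: object = globalThis): GpuApiProbeLike {
  // Reading a property off `scope` never throws, unlike a bare `navigator` reference.
  const nav: unknown = (scope as { navigator?: unknown }).navigator;
  if (typeof nav !== "object" || nav === null) return { kind: "no-navigator" };
  const gpu: unknown = (nav as { gpu?: unknown }).gpu;
  return typeof gpu === "object" && gpu !== null
    ? { kind: "available", gpu }
    : { kind: "no-gpu" };
}
```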

GpuTextureFormat (type)

type GpuTextureFormat

A WebGPU texture format (subset used by the float pipeline).

MeasuredPsnr (type)

type MeasuredPsnr

PassParamBlock (type)

type PassParamBlock

A pass's uniform param block, packed for upload.

ADJUSTMENT_PARAMS_BINDING (const)

ADJUSTMENT_PARAMS_BINDING: 2

The bind-group binding the adjustment param uniform buffer occupies.

CAPTURED_LIMIT_NAMES (const)

CAPTURED_LIMIT_NAMES: readonly string[]

The device limits captured at acquisition (the ones this engine consults).

DEFAULT_EXPORT_TILE_EDGE (const)

DEFAULT_EXPORT_TILE_EDGE: 2048

Default tile long edge in device pixels (bounded further by device limits).

DEFAULT_GATE_LEDGER (const)

const DEFAULT_GATE_LEDGER

EXPORT_MAX_PIXELS (const)

EXPORT_MAX_PIXELS: 268435456

Documented export size guard: the maximum number of output pixels a single export may produce. 2^28 = 268,435,456 px (about 268 MP, exactly 64 tiles of 2048²) is far above any current camera sensor, yet bounded so that a runaway request cannot exhaust device or host memory tile by tile.

GateCategorySchema (const)

const GateCategorySchema

The feature categories the ledger tracks (mirrors the render-graph pass union).

GateEntrySchema (const)

const GateEntrySchema

One ledger entry: a feature id, its category, and its gate status.

GateLedgerSchema (const)

const GateLedgerSchema

The whole ledger: feature ids must be unique, since duplicate entries would make gate consultation ambiguous.

GateStatusSchema (const)

const GateStatusSchema

Gate status: explicitly gated, or promoted with recorded measurement evidence.

HISTOGRAM_BINS (const)

HISTOGRAM_BINS: 256

Bins per channel (matches the CPU reference's 256-bin histograms).

HISTOGRAM_BINS_BINDING (const)

HISTOGRAM_BINS_BINDING: 1

HISTOGRAM_BUFFER_BYTE_LENGTH (const)

HISTOGRAM_BUFFER_BYTE_LENGTH: number

Byte length of the bin storage buffer: 1,024 u32 counts × 4 bytes = 4,096 bytes, exactly the ≤ 4 KB D6 budget.

HISTOGRAM_CHANNELS (const)

HISTOGRAM_CHANNELS: 4

Channel count: R, G, B, Rec.709 luma — in this order in the bin buffer.

HISTOGRAM_ENTRY_POINT (const)

HISTOGRAM_ENTRY_POINT: "cs_main"

The compute entry point name.

HISTOGRAM_TEXTURE_BINDING (const)

HISTOGRAM_TEXTURE_BINDING: 0

`@group(0)` binding numbers the WGSL declares.

HISTOGRAM_TOTAL_BINS (const)

HISTOGRAM_TOTAL_BINS: number

Total `atomic<u32>` bins in the one storage buffer (256 × 4).

HISTOGRAM_WORKGROUP_SIZE (const)

HISTOGRAM_WORKGROUP_SIZE: 16

Square workgroup edge: `@workgroup_size(16, 16)` (D1).

INTERMEDIATE_FORMAT (const)

INTERMEDIATE_FORMAT: GpuTextureFormat

The intermediate render-target format for connection-space buffers (D3).

MeasuredPsnrSchema (const)

const MeasuredPsnrSchema

The observed PSNR of a promotion measurement. `identical` means the device output matched the CPU golden bit-for-bit (MSE 0 — PSNR has no finite value); `finite` carries the measured decibels. A union, never an Infinity sentinel.
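A sketch of producing that union from two equally sized float buffers, assuming samples normalized to a peak of 1.0, so PSNR = 10 · log10(peak² / MSE). `measurePsnr` and the `decibels` field name are hypothetical.

```typescript
type PsnrLike =
  | { readonly kind: "identical" }
  | { readonly kind: "finite"; readonly decibels: number };

function measurePsnr(device: Float32Array, golden: Float32Array, peak = 1): PsnrLike {
  if (device.length !== golden.length || device.length === 0) {
    throw new RangeError("buffers must be non-empty and the same length");
  }
  let sumSq = 0;
  for (let i = 0; i < device.length; i++) {
    const d = device[i] - golden[i];
    sumSq += d * d;
  }
  const mse = sumSq / device.length;
  // MSE 0: every sample matched, PSNR has no finite value, so report a variant, not Infinity.
  return mse === 0
    ? { kind: "identical" }
    : { kind: "finite", decibels: 10 * Math.log10((peak * peak) / mse) };
}
```

A uniform error of 0.1 on every sample gives MSE = 0.01 and therefore a PSNR of about 20 dB.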

PRESENT_BLIT_WGSL (const)

PRESENT_BLIT_WGSL: string

The static blit WGSL: fullscreen triangle + straight texture sample.

READBACK_BUFFER_MAX_BYTES (const)

READBACK_BUFFER_MAX_BYTES: 4096

Max bytes `readbackBuffer` maps per call — the gpu-histogram D6 bin budget.

READBACK_BUFFER_SIZE_ALIGNMENT (const)

READBACK_BUFFER_SIZE_ALIGNMENT: 4

Buffer→buffer copy sizes (and map ranges) are 4-byte units in WebGPU.

READBACK_ROW_ALIGNMENT (const)

READBACK_ROW_ALIGNMENT: 256

WebGPU's required bytesPerRow alignment for texture→buffer copies.
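Texture→buffer readback has to respect these alignments: each row in the staging buffer is padded up to a multiple of 256 bytes, and the padding is stripped again after mapping. A sketch assuming `rgba16float` pixels (8 bytes each); `readbackLayout`, `stripRowPadding`, and `copySizeFor` are hypothetical helpers.

```typescript
const ROW_ALIGNMENT = 256; // mirrors READBACK_ROW_ALIGNMENT
const SIZE_ALIGNMENT = 4; // mirrors READBACK_BUFFER_SIZE_ALIGNMENT

const alignUp = (n: number, alignment: number): number => Math.ceil(n / alignment) * alignment;

// Buffer->buffer copy sizes (and map ranges) round up to whole 4-byte units.
const copySizeFor = (byteLength: number): number => alignUp(byteLength, SIZE_ALIGNMENT);

interface ReadbackLayout {
  readonly unpaddedBytesPerRow: number;
  readonly bytesPerRow: number; // padded to a multiple of 256
  readonly bufferSize: number;
}

// rgba16float is 8 bytes per pixel (4 channels x 2 bytes).
function readbackLayout(width: number, height: number, bytesPerPixel = 8): ReadbackLayout {
  const unpaddedBytesPerRow = width * bytesPerPixel;
  const bytesPerRow = alignUp(unpaddedBytesPerRow, ROW_ALIGNMENT);
  // A multiple of 256 is already a multiple of 4, so the total needs no extra rounding.
  return { unpaddedBytesPerRow, bytesPerRow, bufferSize: bytesPerRow * height };
}

// Copy each padded row's meaningful prefix into a tightly packed output.
function stripRowPadding(padded: Uint8Array, layout: ReadbackLayout, height: number): Uint8Array {
  const out = new Uint8Array(layout.unpaddedBytesPerRow * height);
  for (let row = 0; row < height; row++) {
    const start = row * layout.bytesPerRow;
    out.set(padded.subarray(start, start + layout.unpaddedBytesPerRow), row * layout.unpaddedBytesPerRow);
  }
  return out;
}
```

A 100 px wide row is 800 bytes of pixel data but occupies 1,024 bytes in the staging buffer.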

REC709_LUMA_WEIGHTS (const)

REC709_LUMA_WEIGHTS: { readonly r: 0.2126; readonly g: 0.7152; readonly b: 0.0722; }

Rec.709 luma weights (D2) — the same weights the CPU reference luma uses.
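A CPU sketch of what the histogram pass accumulates, useful for cross-checking the GPU bins. The binning rule (`floor(v · 256)` clamped to [0, 255]) and the channel-major layout (R bins, then G, B, and luma, each 256 wide) are assumptions consistent with the constants above, not confirmed details of the WGSL; `histogramReference` is hypothetical.

```typescript
const BINS = 256; // mirrors HISTOGRAM_BINS
const CHANNELS = 4; // R, G, B, luma (HISTOGRAM_CHANNELS)
const LUMA = { r: 0.2126, g: 0.7152, b: 0.0722 } as const; // REC709_LUMA_WEIGHTS

// Assumed quantization: [0,1] -> 256 equal bins, out-of-range values clamped.
const binOf = (v: number): number => Math.min(BINS - 1, Math.max(0, Math.floor(v * BINS)));

// Assumed channel-major layout: channel c's bins occupy [c * 256, (c + 1) * 256).
function histogramReference(rgba: Float32Array): Uint32Array {
  const bins = new Uint32Array(BINS * CHANNELS); // 1,024 u32 counts = 4,096 bytes
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const r = rgba[i];
    const g = rgba[i + 1];
    const b = rgba[i + 2];
    const luma = LUMA.r * r + LUMA.g * g + LUMA.b * b;
    bins[0 * BINS + binOf(r)] += 1;
    bins[1 * BINS + binOf(g)] += 1;
    bins[2 * BINS + binOf(b)] += 1;
    bins[3 * BINS + binOf(luma)] += 1;
  }
  return bins;
}
```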

RESAMPLE_PARAMS_BINDING (const)

RESAMPLE_PARAMS_BINDING: 2

The bind-group binding the resample param uniform buffer occupies.
