Add path and partial CAR response support to the [trustless-gateway].
[trustless-gateway] solves the verifiability problem in HTTP contexts, and allows for inexpensive retrieval and caching in a way that is not tied to a specific service, library or IPFS implementation.
This IPIP improves the performance related to trustless HTTP gateways by adding additional capabilities to requests for CAR files.
The goal is to enable a client capable of translating/decoding CAR files to make a single request to a trustless gateway that in most case allows them to render the same output generated via a request to a trusted gateway (or, if not in a single request, as few requests as possible).
Save round-trips, allow more efficient resume and parallel downloads.
The solution is to allow the [trustless-gateway] to support partial responses by:
allowing for requesting sub-paths within a DAG, and getting blocks necessary for traversing all path segments for end-to-end verification
dag-scope parameter that allows for narrowing down returned blocks
entity (a logical IPLD entity, such as a file, directory,
CBOR document), or
entity-bytes parameter that allows for returning only a subset of blocks
within a logical IPLD entity
Details are in [trustless-gateway].
The approach here is pragmatic: we add a minimum set of predefined CAR export
scopes based on real world needs, as a product of the rough consensus across
key stakeholders: gateway operators, Project Saturn, Project Rhea (
dweb.link), light clients such as Capyloon and IPFS in Chromium, and
gateway implementation in
boxo/gateway library (Go).
Terse rationale for each feature:
Including blocks necessary for traversing parents when a sub-path is present makes a better default, as it produces verifiable archive that does not require any follow-up requests. The response is always enough to verify end-to-end and reject any unexpected / invalid blocks.
The ability to narrow down CAR response based on logical scope or specific byte range within an entity comes directly from the types of requests existing path gateways need to handle.
dag-scope=block allows for resolving content paths to the final CID, and
learn its type (unixfs file/directory, or a custom codec)
dag-scope=entity covers the majority of website hosting needs (returning a
file, enumerating directory contents, or any other IPLD entity)
dag-scope=all returns all blocks in a DAG: was the existing behavior and
remains the implicit default
entity-bytes=from:to enables efficient, verifiable analog to HTTP Range Requests
(resuming downloads or seeking within bigger files, such as videos)
to match the behavior of HTTP Range Requests.
Trustless HTTP Clients will be able to fetch a CAR with a file, byte range, or a directory enumeration using a way lower number of HTTP requests, which will translate to improved resource utilization, longer battery time on mobile, and lower latency due to lower number of round trips.
CAR files downloaded from HTTP Gateways will always be end-to-end verifiable. In the past, user had to manually ensure they have blocks for all path segments. With this IPIP, the CAR will always include parent blocks when a file located on a sub-path is requested.
Creating a standard way of fetching partial CAR over HTTP enables a diverse set of clients and gateway services to interoperate, and reuse libraries:
Trustless Gateway is solidified as the ecosystem wide standard.
IPIP tests added to gateway-conformance test suite and fixtures listed at the end of this IPIP make it easier to implement or operate a conformant gateway, and reduce maintenance costs.
End users are empowered with primitives and tools that reduce retrieval cost, encourage self-hosting, or make validation of conformance claims of free or commercial gateways possible.
In order to serve CAR requests for content paths other than just a CID root in a trustless manner, we are requiring the gateway to return intermediate blocks from the CID root to the path terminus as part of the returned CAR file.
HTTP Gateway implementations are currently only returning blocks starting at the end of the content path, which means an implementation of this IPIP will introduce additional blocks required for verifying.
As long the client was written in a trustless manner, and follows ring and was discarding unexpected blocks, this will be a backward-compatible change.
These parameters are opt-in, which means no breaking changes.
Gateways ignore unknown URL parameters. A client sending them to a gateway that does not implement this IPIP will get all blocks for the requested DAG.
As of 2023-06-20, the behavior of the
roots CAR field remains an unresolved item within the CARv1 specification:
Regarding the roots property of the Header block:
- The current Go implementation assumes at least one CID when creating a CAR
- The current Go implementation requires at least one CID when reading a CAR
- Current usage of the CAR format in Filecoin requires exactly one CID
It is unresolved how the roots array should be constrained. It is recommended that only a single root CID be used in this version of the CAR format.
A work-around for use-cases where the inclusion of a root CID is difficult but needing to be safely within the "at least one" recommendation is to use an empty CID:
\x01\x55\x00\x00(zero-length "identity" multihash with "raw" codec). Since current implementations for this version of the CAR specification don't check for the existence of root CIDs (see Root CID block existence), this will be safe as far as CAR implementations are concerned. However, there is no guarantee that applications that use CAR files will correctly consume (ignore) this empty root CID.
Due to the inconsistent and non-deterministic nature of CAR implementations, the gateway specification faces limitations in providing specific recommendations. Nevertheless, it is crucial for implementations to refrain from making implicit assumptions based on the legacy behavior of individual CAR implementations.
Due to this, gateway specification changes introduced in this IPIP clarify that:
roots behavior is out of scope and flags that clients MAY ignore it.
This IPIP allows clients to narrow down the amount of data returned as a CAR, and introduces a need for defensive programming when the feature set of the remote gateway is unknown.
To avoid denial of service, and resource starvation, clients should probe if the gateway supports features described in this IPIP before requesting data, to avoid fetching big DAGs when only a small subset of blocks is expected.
Following the robustness principle, invalid, duplicate or unexpected blocks should be discarded.
Below are alternate designs that were considered, but decided to be out of scope for this IPIP.
Passing arbitrary selectors was rejected due to the implementation complexity, risks, and weak value proposition, as discussed during IPFS Thing 2022
A request for
returns all blocks required for enumeration of the big HAMT
and then an additional request for
index.html needs to be issued.
Website hosting use case could be made more efficient if gateway returned a CAR
index.html instead of all blocks for directory enumeration. The server
already did the work: it knows the entity is a directory, already parsed it, it
knows it has child entity named
index.html, and everyone would pay a lower cost due
to lower number of blocks being returned in a single round-trip, instead of two.
Rhea/Saturn projects requested this to be out of scope for now, but this "web" entity scope could be added in the future, as a follow-up optimization IPIP.
Blindly requesting specific DAG depth did not translate to any type of
requests web gateways like
ipfs.io or one in Brave browser have to handle.
It is impossible to know if some entity on a sub-path is a file or a directory, without sending a probe for the root block, which introduces one round-trip overhead per entity.
This problem is not present in the case of
dag-scope=entity, which shifts the
decision to the server, and allows for fetching unknown UnixFS entity with a
Relevant tests were added to
gateway-conformance test suite
in #56 and
A detailed list of compliance checks for
entity-bytes can be found in
v0.2.0/trustless_gateway_car_test.go or later.
Below are CIDs, CARs, and short summary of each fixture.
The main utility of this scope is saving round-trips when retrieving a specific entity as a member of a bigger DAG.
To test, request a small file that fits in a single block from a sub-path. The returned CAR MUST include both the block with the file data and all blocks necessary for traversing from the root CID to the terminating element (all parents, root CID and a subdirectory below it).
(UnixFS file in a subdirectory)
(UnixFS file on a path within HAMT-sharded parent directory)
(UnixFS file on a path with DAG-CBOR root CID)
The main utility of this scope is resolving content paths. This means a CAR response with blocks related to path traversal, and the root block of the terminating entity.
To test real world use, request UnixFS
file or a
directory from a sub-path.
The returned CAR MUST include blocks required for path traversal and ONLY the
root block of the terminating entity.
/ipfs/dag-pb-cid/subdir/ascii.txt?format=car&dag-scope=block (UnixFS file in a subdirectory)
/ipfs/dag-pb-cid?format=car&dag-scope=block (UnixFS directory)
(UnixFS multi-block file on a path within HAMT-sharded parent directory)
(UnixFS HAMT-sharded directory)
The main utility of this scope is retrieving all blocks related to a meaningful IPLD entity. Currently, the most popular entity types are:
(blocks for all chunks with file data)
(blocks for the directory node, allowing its enumeration;
no root blocks for any of the child entities).
(block with raw data or DAG-CBOR document, potentially linking to other CIDs)
/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity (UnixFS multi-block file in a subdirectory)
/ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity (UnixFS directory)
(UnixFS multi-block file on a path within HAMT-sharded parent directory, returned blocks MUST be enough to deserialize the file)
(UnixFS HAMT-sharded directory, response MUST include the minimal set of blocks required for enumeration of directory contents, and no blocks that belong to child entities)
(DAG-CBOR document with IPLD Links must return all necessary blocks to verify the path, the document itself, but not the content behind any of the child entity IPLD Links)
This is the implicit default used when
dag-scope is not present,
and used in the context of deserialized UnixFS TAR responses from [ipip-0288].
/ipfs/dag-pb-cid/subdir?format=car&dag-scope=all (path parents and the entire UnixFS subdirectory, returned recursively)
/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=all (path parents and the entire UnixFS multi-block file)
This type of CAR response is used for facilitating HTTP Range Requests and byte seek within bigger entities.
Properly testing this type of response requires synthetic DAG that is only partially retrievable. This ensures systems that perform internal caching won't pass the test due to the entire DAG being precached, or fetched in full.
Use of the below fixture is highly recommended:
/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity&entity-bytes=0:* (path blocks and all the blocks for the multi-block UnixFS file)
multiblock.txt?format=car&dag-scope=entity&entity-bytes=512:1023 (path blocks and all the blocks for the the range request within multi-block UnixFS file)
multiblock.txt?format=car&dag-scope=entity&entity-bytes=512:-256 (path blocks and all the blocks for the the range request within multi-block UnixFS file)
/ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity&entity-bytes=0:* (path blocks and all the blocks to enumerate UnixFS directory)
/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=0:1000 (only the blocks needed to fulfill the request, MUST succeed despite the fact that a block after the range is not retrievable)
/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=2200:* (only the blocks needed to fulfill the request, MUST succeed despite the fact that a block before the range is not retrievable)
Copyright and related rights waived via CC0.