This proposal introduces configuration profiles for CIDs that represent files and directories using UnixFS. The legacy profiles table also documents non-UnixFS implementations for reference.
While CIDs and UnixFS DAGs are cryptographically verifiable, the same file or directory can produce different CIDs across UnixFS implementations, because DAG construction parameters like chunk size, DAG width, and layout vary between tools. Often, these parameters are not even configurable by users.
This creates two problems:
A potential solution is to define configuration profiles: well-known parameter presets that implementations can adopt when common conventions for DAG creation are desired.
See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507
The following UnixFS parameters were identified as factors that affect the resulting CID:
base32
sha2-256

- balanced: builds a balanced tree where all leaf nodes are at the same depth. Optimized for random access, seeking, and range requests within files (e.g., video).
- balanced-packed: variant of balanced that may produce a different tree structure for large files. See Balanced DAG layout variants below.
- trickle: builds a tree optimized for on-the-fly one-time streaming, where data can be consumed before the entire file is available. Useful for logs and other append-only data structures where random access is not important.

File node)

Directory size before converting to HAMTDirectory, based on PBNode.Links count or estimated serialized dag-pb size. See Historical inconsistency in HAMT sharding below.
- links-count: PBNode.Links length (child count). Simple but ignores varying entry sizes.
- links-bytes: sum of PBNode.Links[].Name and PBNode.Links[].Hash byte lengths. Underestimates actual size by ignoring UnixFS Data, Tsize, and protobuf overhead.
- block-bytes: full serialized dag-pb node size. Most accurate, accounts for varint Tsize and optional metadata such as mode or mtime.

Directory with link to the file.

Tsize (correct UnixFS has Tsize of child sub-DAGs).

Balanced DAG layout variants

The balanced DAG layout has implementation variants that affect CID determinism for large files. CID mismatches have been observed and investigated when comparing kubo and Singularity outputs for files exceeding 1 GiB. This IPIP introduces the name balanced-packed to distinguish Singularity's variant from the original balanced layout.
Implementations adopting a profile SHOULD specify which balanced variant they use. The unixfs-v1-2025 profile uses balanced for maximum compatibility with existing implementations.
balanced

The original balanced layout used by kubo/boxo, helia, and others in the ecosystem. Builds the tree incrementally as chunks stream in:

boxo/ipld/unixfs/importer/balanced/builder.go

balanced-packed

Name introduced by this IPIP for Singularity's variant. Groups pre-computed links in batch:

singularity/pack/packutil/util.go AssembleFileFromLinks()

According to Singularity issue #525, "in Singularity's DAG, the last leaf node is not at the same distance from the root as the others." This structural difference causes CID mismatches for files larger than chunk_size * dag_width (e.g., >1 GiB with 1 MiB chunks and 1024 links per node), even when all other parameters match.
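The size at which a second tree level becomes necessary follows directly from chunk_size * dag_width. A minimal sketch of that arithmetic; `single_level_capacity` is a hypothetical helper name used only for illustration, and the second call uses the legacy Kubo defaults of 256 KiB chunks and 174 links per node:

```python
KIB = 1024
MIB = 1024 * KIB


def single_level_capacity(chunk_size: int, dag_width: int) -> int:
    """Largest file size (bytes) whose chunks all fit under a single root node."""
    return chunk_size * dag_width


# 1 MiB chunks, 1024 links per node: files above this size need a second level.
v1_limit = single_level_capacity(1 * MIB, 1024)

# 256 KiB chunks, 174 links per node.
v0_limit = single_level_capacity(256 * KIB, 174)

print(v1_limit)  # 1073741824 bytes (1 GiB)
print(v0_limit)  # 45613056 bytes (43.5 MiB)
```

Files at or below these sizes have a single-level DAG, so the two balanced variants would only be expected to differ above them.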
Historical inconsistency in HAMT sharding

The IPFS ecosystem was never fully consistent in HAMT directory sharding behavior. This section documents the implementation history to explain why standardization through profiles is necessary.
Timeline of Go implementation changes:
2017-03: kubo#3042 introduced HAMT sharding with a global Experimental.ShardingEnabled flag. When enabled, all directories were sharded regardless of size. This is why historical snapshots like /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze (Wikipedia) have HAMTDirectory nodes even for parent directories with few entries.
2021-05: go-unixfs#91 introduced HAMTShardingSize threshold for automatic sharding based on estimated directory size, using links-bytes estimation. This was part of the work tracked in kubo#8106.
2021-11: go-unixfs#94 added size-based unsharding (switching from HAMT back to basic directory), completing bidirectional automatic sharding with >= comparison. The go-unixfs repository has since been archived; its code now lives in boxo.
2021-12: go-ipfs v0.11.0 (now Kubo) shipped with automatic HAMT autosharding, deprecating the global Experimental.ShardingEnabled flag.
2023-03: boxo created via Über Migration, inheriting the >= comparison behavior from go-unixfs.
2026-01: boxo#1088 fixed the threshold comparison from >= to >, aligning with the JS implementation and the documentation. Shipped in Kubo 0.40.
Timeline of JavaScript implementation changes:
2017-03: js-ipfs-unixfs#14 added HAMT data types to UnixFS protobuf definitions.
2018-12: js-ipfs#1734 added HAMT sharding support to MFS, using entry count threshold (shardSplitThreshold, default 1000 entries).
2019-01: js-ipfs v0.34.0 shipped with HAMT support in MFS. The threshold was based on entry count, not size, which differed from Go's size-based approach.
2022-10: Helia created as the successor to js-ipfs.
2023-02: js-ipfs-unixfs#171 changed from entry count to DAGNode size threshold (shardSplitThresholdBytes, default 256 KiB), aligning with Go implementation. Uses > comparison. This was tracked in js-ipfs-unixfs#149. The js-ipfs-unixfs library remains active and is used by Helia.
2023-05: js-ipfs archived; Helia became the recommended JS implementation.
The JavaScript implementation in Helia uses size > threshold (strictly greater than) in is-over-shard-threshold.ts, consistent with Go after the 2026 fix.
These inconsistencies between Go and JS implementations over the years, combined with differing threshold methods (entry count vs size) and comparison operators (>= vs >), meant cross-implementation CID determinism for large directories was never reliably achievable. The unixfs-v1-2025 profile addresses this by standardizing on block-bytes estimation and explicit > comparison.
We analyzed the default settings across the most popular UnixFS implementations in the ecosystem. The table below documents the divergences that prevent deterministic CID generation today:
| Parameter | kubo (CIDv0) | helia | storacha | kubo (CIDv1) | singularity | dasl | pinata | filebase |
|---|---|---|---|---|---|---|---|---|
| Based on | v0.39 (unixfs-v0-2015) | @helia/unixfs 6.0.4 | w3cli 7.12.0 | v0.39 (test-cid-v1 profile) | v0.6.0-RC4 (454b630) | spec 2025-12 | ? | add via rpc |
| CID version | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | ? | CIDv0 |
| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | ? | sha2-256 |
| Chunking algorithm | fixed-size | fixed-size | fixed-size | fixed-size | fixed-size | N/A | ? | fixed-size |
| Max chunk size | 256KiB | 1MiB | 1MiB | 1MiB | 1MiB | N/A | ? | 256KiB |
| DAG layout | balanced | balanced | balanced | balanced | balanced-packed | N/A | ? | ? |
| DAG width (children per node) | 174 | 1024 | 1024 | 174 | 1024 | N/A | ? | ? |
| HAMTDirectory fanout | 256 blocks | 256 blocks | 256 blocks | 256 blocks | 256 blocks (boxo) | N/A | ? | ? |
| HAMTDirectory threshold | 256KiB (links-bytes) | 256KiB (links-bytes) | 1000 (links-count) | 256KiB (links-bytes) | 256KiB (links-bytes) (boxo) | N/A | ? | ? |
| HAMT switch comparison | >= | > | > | >= | >= (boxo) | N/A | ? | ? |
| Leaves | dag-pb | raw | raw | raw | raw | N/A | ? | ? |
| Empty directories | included | included | excluded | included | included | N/A | ? | ? |
| Hidden entities | excluded (opt-in) | excluded (opt-in) | excluded (opt-in) | excluded (opt-in) | included (rclone) | N/A | ? | ? |
| Symlinks | preserved | followed | followed | preserved | skipped (rclone) | N/A | ? | ? |
| Mode (permissions) | excluded (opt-in) | excluded (opt-in) | not supported | excluded (opt-in) | not supported | N/A | ? | ? |
| Mtime (modification time) | excluded (opt-in) | excluded (opt-in) | not supported | excluded (opt-in) | not supported | N/A | ? | ? |
Terminology:
- included: Always included in the DAG (no option to exclude)
- excluded: Always excluded from the DAG (no option to include)
- opt-in: Excluded by default; implementations provide a flag to include (e.g., --hidden in Kubo/Storacha, hidden: true in Helia)
- opt-out: Included by default; implementations provide a flag to exclude
- preserved: Symlinks stored as UnixFS Type=4 nodes with target path (per UnixFS spec). Note: Kubo (v0.39) --dereference-args only follows symlinks passed as CLI arguments; symlinks found during recursive traversal are always preserved.
- followed: Symlinks dereferenced and treated as target files/directories
- skipped: Symlinks ignored during traversal (not included in DAG)
- (rclone): Singularity delegates file traversal to rclone; values shown reflect rclone defaults
- (boxo): Singularity overrides some boxo defaults but relies on implicit boxo defaults for these values
- ?: Service did not provide implementation details when queried in ipfs/specs#499

We introduce a set of named configuration profiles, each specifying the complete set of parameters for generating UnixFS CIDs. When implementations use these profiles, they guarantee that the same input, processed with the same profile, will yield the same CID across different tools and implementations.
unixfs-v1-2025 modern profile

Based on the research above, we define unixfs-v1-2025 as an opinionated profile for implementations that want to adopt deterministic CID generation for UnixFS DAGs with CIDv1.
| Parameter | unixfs-v1-2025 |
|---|---|
| CID version | CIDv1 |
| Hash function | sha2-256 |
| Chunking algorithm | fixed-size |
| Max chunk size | 1MiB |
| DAG layout | balanced |
| DAG width (children per node) | 1024 |
| HAMTDirectory fanout | 256 blocks |
| HAMTDirectory threshold | 256KiB (block-bytes) |
| HAMT switch comparison | > |
| Leaves | raw |
| Empty directories | included (opt-out) |
| Hidden entities | excluded (opt-in) |
| Symlinks | preserved |
| Mode (permissions) | excluded (opt-in) |
| Mtime (modification time) | excluded (opt-in) |
unixfs-v0-2015 legacy profile

This profile documents the default UnixFS DAG construction parameters used by Kubo through version 0.39 when producing CIDv0. It is provided for users who depend on CIDv0 identifiers generated by Kubo and need to reproduce them with other implementations, or verify content against existing CIDv0 references. The year 2015 in the name indicates that the majority of these parameters were picked a decade ago, when the initial go-ipfs alpha software was implemented, and these defaults have not been contested since.
Note: this profile is a best-effort approximation of historical behavior. It produces deterministic CIDs for files and smaller directories. However, as documented in Historical inconsistency in HAMT sharding, there is a risk of divergence when directories exceed the HAMT sharding threshold, due to differences in threshold comparison operators and estimation methods across software versions. In such cases, the only recourse is to identify which version of software originally created the content and manually adjust import parameters to match those historic settings.
| Parameter | unixfs-v0-2015 |
|---|---|
| CID version | CIDv0 |
| Hash function | sha2-256 |
| Chunking algorithm | fixed-size |
| Max chunk size | 256KiB |
| DAG layout | balanced |
| DAG width (children per node) | 174 |
| HAMTDirectory fanout | 256 blocks |
| HAMTDirectory threshold | 256KiB (links-bytes) |
| HAMT switch comparison | > |
| Leaves | dag-pb |
| Empty directories | included |
| Hidden entities | excluded (opt-in) |
| Symlinks | preserved |
| Mode (permissions) | excluded (opt-in) |
| Mtime (modification time) | excluded (opt-in) |
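The two profile tables can be read as plain configuration data. The sketch below shows one way an implementation might hold them; the class name, field names, and `differing_parameters` helper are hypothetical and not taken from any existing library, and the file-traversal parameters (empty directories, hidden entities, symlinks, mode, mtime) are omitted for brevity:

```python
from dataclasses import dataclass, fields

KIB = 1024
MIB = 1024 * KIB


@dataclass(frozen=True)
class UnixFSProfile:
    """Hypothetical container for the DAG parameters in the profile tables."""
    cid_version: int
    hash_function: str
    chunker: str
    max_chunk_size: int    # bytes
    dag_layout: str
    dag_width: int         # max children per file node
    hamt_fanout: int
    hamt_threshold: int    # bytes
    hamt_estimation: str   # "links-bytes" or "block-bytes"
    hamt_comparison: str
    raw_leaves: bool


UNIXFS_V1_2025 = UnixFSProfile(
    cid_version=1, hash_function="sha2-256", chunker="fixed-size",
    max_chunk_size=1 * MIB, dag_layout="balanced", dag_width=1024,
    hamt_fanout=256, hamt_threshold=256 * KIB,
    hamt_estimation="block-bytes", hamt_comparison=">", raw_leaves=True,
)

UNIXFS_V0_2015 = UnixFSProfile(
    cid_version=0, hash_function="sha2-256", chunker="fixed-size",
    max_chunk_size=256 * KIB, dag_layout="balanced", dag_width=174,
    hamt_fanout=256, hamt_threshold=256 * KIB,
    hamt_estimation="links-bytes", hamt_comparison=">", raw_leaves=False,
)


def differing_parameters(a: UnixFSProfile, b: UnixFSProfile) -> list:
    """Names of the fields whose values differ between two profiles."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


print(differing_parameters(UNIXFS_V1_2025, UNIXFS_V0_2015))
```

Keeping each profile immutable reflects the intent that a profile name pins every parameter at once, so a user selects a name rather than tuning individual values.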
Profiles provide key advantages for working with content-addressed data:
Predictable, deterministic behavior: Profiles restore intuitive hash-like behavior: identical input data always produces identical CIDs, regardless of which implementation generates them.
Lightweight verification: Users can verify content without needing to rely on additional merkle proofs or CAR files.
Simplified workflow: Users can select a profile and automatically get consistent CIDs across all implementations, without needing to configure or understand the underlying parameters.
Improved efficiency: The unixfs-v1-2025 profile uses 1 MiB chunks with 1024 links per node, compared to the legacy 256 KiB chunks with 174 links. This results in:
UnixFS data encoded with the CID profiles defined in this IPIP remains fully compatible with existing implementations, since it conforms to the UnixFS specification.
To generate CIDs in compliance with this IPIP, implementations MUST support the unixfs-v1-2025 profile. The unixfs-v0-2015 profile is provided for backward compatibility and MAY be supported by implementations that need to produce CIDs matching historical Kubo output.
Implementations SHOULD allow users to inspect default values and adjust configuration options related to CID generation.
block-bytes estimation for unixfs-v1-2025

The unixfs-v1-2025 profile uses block-bytes instead of links-bytes for HAMT threshold estimation because links-bytes has fundamental accuracy problems that undermine CID determinism.
What links-bytes ignores:
- Tsize field: the cumulative size of child sub-DAGs stored in each link. This varint-encoded field can add 1-10 bytes per link.
- mode field: optional POSIX file permissions. When present, adds a varint to the serialized size.
- mtime field: optional modification timestamp. When present, adds an embedded message with seconds (varint) and optional nanoseconds (fixed32).

Problems caused by underestimation:
Non-deterministic threshold crossing: a directory estimated at 250 KiB by links-bytes might actually serialize to 270 KiB. If another implementation using accurate estimation sees the true size exceeds the threshold, it converts to HAMT, producing a different CID for identical content.
Block size limit risks: near the 1 MiB or 2 MiB block size limits used by various transports, underestimation can produce blocks that exceed limits, causing failures or requiring implementation-specific workarounds.
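The gap between the estimation methods can be made concrete by computing each one for the same set of links. This is a rough sketch rather than any implementation's actual code: the function names are hypothetical, links are modelled as (name, CID bytes, Tsize) tuples, and the block-bytes calculation assumes every link carries all three dag-pb fields (Hash, Name, Tsize) and that the UnixFS Data bytes are passed in already serialized:

```python
def varint_len(n: int) -> int:
    """Number of bytes protobuf uses to encode n as an unsigned varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def links_count(links) -> int:
    """links-count: number of entries in PBNode.Links."""
    return len(links)


def links_bytes(links) -> int:
    """links-bytes: sum of Name and Hash byte lengths only."""
    return sum(len(name.encode("utf-8")) + len(cid) for name, cid, _tsize in links)


def block_bytes(links, data: bytes) -> int:
    """block-bytes: size of the serialized dag-pb node (links plus Data)."""
    total = 0
    for name, cid, tsize in links:
        name_b = name.encode("utf-8")
        link = (
            1 + varint_len(len(cid)) + len(cid)          # Hash: tag + length + bytes
            + 1 + varint_len(len(name_b)) + len(name_b)  # Name: tag + length + bytes
            + 1 + varint_len(tsize)                      # Tsize: tag + varint
        )
        total += 1 + varint_len(link) + link             # Links entry: tag + length + link
    if data:
        total += 1 + varint_len(len(data)) + len(data)   # Data: tag + length + bytes
    return total


# One entry named "a" pointing at a 34-byte CIDv0, with Tsize 10.
example = [("a", bytes(34), 10)]
directory_data = b"\x08\x01"  # UnixFS Data with Type=Directory and no metadata

print(links_count(example), links_bytes(example), block_bytes(example, directory_data))
```

For this single entry, links-bytes reports 35 bytes while the serialized node is 47 bytes, and the per-link difference accumulates as entries are added.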
Why links-bytes exists in unixfs-v0-2015:
The legacy profile documents historical behavior. The links-bytes estimation was the original implementation in early go-ipfs, chosen for simplicity. Since many existing CIDv0 DAGs were created with this estimation, the unixfs-v0-2015 profile preserves this behavior for users who need to reproduce legacy CIDs.
> (strictly greater than) for HAMT threshold

The HAMT threshold comparison uses > rather than >=. A directory with estimated size exactly equal to the threshold (262144 bytes) remains a basic directory; only when size exceeds the threshold does it convert to HAMT.
Rationale:
Consistency with other profile limits: all threshold-like values in the UnixFS profile use the same > pattern, making the limits represent the maximum allowed value (inclusive). Conversion to a more complex structure happens when the count exceeds the limit:
- estimatedSize > HAMTShardingSize - directory converts to HAMT when size exceeds 256 KiB
- linkCount > maxLinks - directory converts to HAMT when link count exceeds MaxLinks
- childCount > maxLinks - file node creates new tree level when child count exceeds FileDAGWidth

This consistency means implementers only need to understand one rule: limit values are the maximum allowed, conversion happens only when exceeding.
Implementation alignment: the Helia (JavaScript) implementation uses size > threshold. Kubo/boxo documentation also specified >, but the actual Go implementation used >= until boxo#1088 fixed it. This divergence between documentation and implementation is another example of why links-bytes never achieved true cross-implementation determinism, and why the unixfs-v1-2025 profile with block-bytes provides an opportunity to establish a proper standard.
Threshold semantics: the threshold value represents the maximum allowed size for a basic directory, not the minimum size for HAMT. A directory at exactly the threshold is still within the allowed range for basic representation.
Simpler representation preferred: at the exact boundary, basic directory is simpler (single flat node vs HAMT tree). When both representations are valid, preferring the simpler one reduces DAG complexity.
Deterministic boundary behavior: edge cases are where CID mismatches most likely occur. Explicitly specifying that the threshold value stays basic eliminates ambiguity.
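The boundary behavior described above can be captured in a few lines. In this sketch the function names are hypothetical; `should_shard_inclusive` is included only to illustrate the >= behavior that Go implementations had before boxo#1088:

```python
HAMT_SHARDING_SIZE = 256 * 1024  # 262144 bytes


def should_shard(estimated_size: int, threshold: int = HAMT_SHARDING_SIZE) -> bool:
    """Profile rule: convert to HAMT only when the size exceeds the threshold."""
    return estimated_size > threshold


def should_shard_inclusive(estimated_size: int, threshold: int = HAMT_SHARDING_SIZE) -> bool:
    """Inclusive variant (>=), shown for comparison only."""
    return estimated_size >= threshold


for size in (262143, 262144, 262145):
    print(size, should_shard(size), should_shard_inclusive(size))
```

The two rules disagree only at exactly 262144 bytes, which is the single input where the choice of operator changes the resulting CID.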
As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID.
Test fixtures allow implementations to verify profile compliance by comparing CIDs produced for identical input data.
ipfs get <CID>)

unixfs-v0-2015 profile

| CID | Description |
|---|---|
| QmWmRj3dFDZdb6ABvbmKhEL6TmPbAfBZ1t5BxsEyJrcZhE | File at chunk size: 262144 bytes, single dag-pb block with no links |
| QmYyLxtzZyW22zpoVAtKANLRHpDjZtNeDjQdJrcQNWoRkJ | File over chunk size: 262145 bytes, root with 2 dag-pb leaf links |
| QmUbBALi174SnogsUzLpYbD4xPiBSFANF4iztWCsHbMKh2 | File at max links: 174 × 256 KiB chunks, single-level DAG with 174 links |
| QmV81WL765sC8DXsRhE5fJv2rwhS4icHRaf3J9Zk5FdRnW | File over max links: 174 × 256 KiB + 1 byte, rebalanced to 2-level DAG |
| QmX5GtRk3TSSEHtdrykgqm4eqMEn3n2XhfkFAis5fjyZmN | Directory at HAMT threshold: links-bytes size = 262144 (basic directory) |
| QmeMiJzmhpJAUgynAcxTQYek5PPKgdv3qEvFsdV3XpVnvP | Directory over HAMT threshold: links-bytes size = 262145 (HAMT sharded) |
unixfs-v1-2025 profile

| CID | Description |
|---|---|
| bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e | Small file: hello world string, raw leaf |
| bafkreiacndfy443ter6qr2tmbbdhadvxxheowwf75s6zehscklu6ezxmta | File at chunk size: 1048576 bytes (1 MiB), single raw leaf block |
| bafybeigmix7t42i6jacydtquhet7srwvgpizfg7gjbq7627d35mjomtu64 | File over chunk size: 1048577 bytes, root with 2 raw leaf links |
| bafybeihmf37wcuvtx4hpu7he5zl5qaf2ineo2lqlfrapokkm5zzw7zyhvm | File at max links: 1024 × 1 MiB chunks, single-level DAG with 1024 links |
| bafybeibdsi225ugbkmpbdohnxioyab6jsqrmkts3twhpvfnzp77xtzpyhe | File over max links: 1024 × 1 MiB + 1 byte, rebalanced to 2-level DAG |
| bafybeic3h7rwruealwxkacabdy45jivq2crwz6bufb5ljwupn36gicplx4 | Directory at HAMT threshold: block-bytes size = 262144 (basic directory) |
| bafybeiegvuterwurhdtkikfhbxcldohmxp566vpjdofhzmnhv6o4freidu | Directory over HAMT threshold: block-bytes size = 262145 (HAMT sharded) |
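The small-file fixture for unixfs-v1-2025 can be reproduced with only the Python standard library, since a file that fits in one chunk becomes a single raw block. A minimal sketch, assuming the fixture input is the 11-byte ASCII string hello world without a trailing newline; `cidv1_raw_sha256` is a hypothetical helper name:

```python
import base64
import hashlib


def cidv1_raw_sha256(data: bytes) -> str:
    """Base32 CIDv1 string for a single raw block hashed with sha2-256."""
    digest = hashlib.sha256(data).digest()
    # 0x01 = CID version 1, 0x55 = raw codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes).
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # "b" is the multibase prefix for lowercase base32


print(cidv1_raw_sha256(b"hello world"))
# prints bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e
```

This shortcut only covers inputs no larger than one chunk; larger files and directories require building dag-pb nodes according to the profile parameters.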
The HAMT threshold comparison uses > (strictly greater than), not >=. A directory with estimated size exactly equal to the threshold (262144 bytes) remains a basic directory. Only when the size exceeds the threshold does it convert to HAMT.
Both profiles use the balanced layout, where every leaf is the same number of hops from the root.

    balanced:                       trickle:
             root                                 root
            /    \                        /    /   |    \    \
        node      node              leaf   node   leaf   node ...
       /  |  \    / | \                     |             |
    leaf leaf leaf leaf ...                node          node
                                          /    \        /    \
                                       leaf    leaf  leaf    leaf
    ^^^^^^^^^^^^^^^^^^^^^^
        uniform depth               ^^^^ ^^^^^^
                                    varying leaf depth
Implementations MAY use the CIDs below to verify DAG layout handling. The trickle layout is not part of the official profiles, but is included for verifying that UnixFS readers can parse non-standard DAG layouts produced by other software in the ecosystem:
| CID | Description |
|---|---|
| QmV81WL765sC8DXsRhE5fJv2rwhS4icHRaf3J9Zk5FdRnW | Balanced (v0): 174 × 256 KiB + 1 byte, 175 chunks, depth 2 |
| bafybeibdsi225ugbkmpbdohnxioyab6jsqrmkts3twhpvfnzp77xtzpyhe | Balanced (v1): 1024 × 1 MiB + 1 byte, 1025 chunks, depth 2 |
| QmbqR6g7YCpndLcCZZXkGyj13uMcSrwDYUvZ47vJqBVjUH | Trickle (v0): 45 MiB file, 180 leaves at varying depths (1-2) |
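The chunk counts and depths quoted for the balanced fixtures can be checked with simple arithmetic. The sketch below assumes fixed-size chunking and a balanced tree in which every non-leaf node holds at most dag_width children; `balanced_shape` is a hypothetical helper, and depth is counted as hops from the root to a leaf:

```python
KIB = 1024
MIB = 1024 * KIB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def balanced_shape(file_size: int, chunk_size: int, dag_width: int) -> tuple:
    """Return (chunk_count, depth) for a balanced DAG over fixed-size chunks."""
    chunks = max(1, ceil_div(file_size, chunk_size))
    depth = 0
    nodes = chunks
    # Group nodes under parents until a single root remains.
    while nodes > 1:
        nodes = ceil_div(nodes, dag_width)
        depth += 1
    return chunks, depth


print(balanced_shape(174 * 256 * KIB + 1, 256 * KIB, 174))  # (175, 2)
print(balanced_shape(1024 * MIB + 1, 1 * MIB, 1024))        # (1025, 2)
```

A file of exactly 174 × 256 KiB gives (174, 1) under the same function, matching the single-level fixture in the unixfs-v0-2015 table.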
For the following cases, see existing test vectors in the UnixFS spec:
- QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn (v0) and bafybeiczsscdsbs7ffqz55asqdf3smv6klcw3gofszvwlyarci47bgf354 (v1)
- QmWvY6FaqFMS89YAQ9NAPjVP4WZKA1qbHbicc9HeSKQTgt (v0)

Copyright and related rights waived via CC0.
We gratefully acknowledge the following individuals for their valuable contributions, ranging from minor suggestions to major insights, which have shaped and improved this specification.