IPIP-0512: Limit Identity CID Size to 128 Bytes in UnixFS Contexts

Related Issues
ipfs/boxo/pull/1018
multiformats/cid/issues/21
multiformats/multihash/issues/130

1. Summary

This IPIP establishes a maximum digest size of 128 bytes for identity CIDs (multihash code 0x00) in UnixFS contexts, to prevent abuse and to clarify when identity CIDs are appropriate.

2. Motivation

Identity CIDs are unique in that they inline data directly into the CID itself rather than hashing it. Without clear limits, this creates several problems:

  1. Resource Exhaustion: Poorly written clients could encode large payloads as identity CIDs and propagate them through the network, consuming bandwidth and resources without providing value.

  2. Security Vulnerabilities: Identity CIDs provide no integrity verification and are vulnerable to bit flips. Large identity CIDs amplify this risk.

  3. Unclear Boundaries: The ecosystem lacks clear guidelines on when identity CIDs are appropriate, leading to potential misuse.

  4. CIDs as Data Containers: Without limits, identity CIDs could embed arbitrary amounts of data, effectively turning CIDs from content addresses into data containers.

As discussed in ipfs/boxo#1018, the community consensus is that large identity CIDs are problematic and a reasonable limit is needed.

3. Detailed design

This IPIP adds a new section to the UnixFS specification documenting the 128-byte digest size limit for identity CIDs.

3.1 Changes to UnixFS Specification

Add new section "Identity CID Size Limit" that specifies:

3.2 Test Fixtures

Add an invalid test case: an identity CID with a 129-byte digest, which implementations MUST reject.
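To illustrate what such boundary cases look like at the byte level, the sketch below assembles binary CIDv1 identity CIDs by hand. It assumes the raw codec (0x55), and the helper names `encode_varint` and `identity_cid_bytes` are hypothetical; the actual fixtures may use a different codec or encoding.

```python
IDENTITY = 0x00  # multihash code for the identity "hash"
RAW = 0x55       # multicodec code for raw bytes (assumed for this sketch)


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def identity_cid_bytes(payload: bytes, codec: int = RAW) -> bytes:
    """Build a binary CIDv1 whose multihash inlines `payload` verbatim."""
    header = encode_varint(1) + encode_varint(codec)  # CID version, codec
    multihash = encode_varint(IDENTITY) + encode_varint(len(payload)) + payload
    return header + multihash


# Boundary cases: a 128-byte digest (at the limit) and a 129-byte digest (over it).
at_limit = identity_cid_bytes(b"a" * 128)
over_limit = identity_cid_bytes(b"a" * 129)
```

The limit applies to the digest, not to the encoded CID: a digest length of 128 already needs a two-byte varint (0x80 0x01), so `at_limit` is 133 bytes long under these assumptions.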

4. Design rationale

The 128-byte limit was chosen based on several factors:

  1. Alignment with Existing Constraints: The limit matches DefaultMaxDigestSize already used for cryptographic hashes in the ecosystem. 128 bytes is a sensible limit that accommodates the digest sizes of the longest popular hash functions (e.g., SHA-512 produces 64-byte digests), while preventing unbounded growth.

  2. Community Consensus: Key maintainers expressed support for this limit:

    • @rvagg: "128 seems reasonable to me. I'm happy to have them squished down their happy-path use to a size where they're more likely being used for their size-saving utility"
    • @vmx: "I'm not a fan of large identity CIDs... 128 bytes sound reasonable to me"
    • @achingbrain: "It looks fine at first glance 👍" (confirming Helia compatibility)

  3. Practical Usage: 128 bytes is sufficient for legitimate use cases (small inline data) while preventing abuse.

  4. Implementation Precedent: This limit has been implemented in:

4.1 User benefit

4.2 Compatibility

Identity CIDs have always been marked as experimental, and this change does not impact users who used default settings in software like Kubo or Helia, which never produced identity CIDs by default.

This is a breaking change only for existing identity CIDs with digest sizes exceeding 128 bytes. However:

Implementations upgrading to support this IPIP will need to:

  1. Add validation to reject oversized identity CIDs when reading
  2. Prevent creation of identity CIDs exceeding the limit
  3. Consider automatic conversion to regular blocks when data grows
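The three steps above can be sketched as follows. This is a minimal illustration, not the API of any existing library: the names `check_identity_limit` and `choose_multihash` are hypothetical, and falling back to sha2-256 for larger data is an assumption made for this example.

```python
import hashlib
from typing import Tuple

IDENTITY = 0x00                 # multihash code for identity
SHA2_256 = 0x12                 # multihash code for sha2-256
MAX_IDENTITY_DIGEST_SIZE = 128  # limit established by this IPIP


def check_identity_limit(mh_code: int, digest: bytes) -> None:
    """Step 1: reject an identity multihash whose digest exceeds the limit."""
    if mh_code == IDENTITY and len(digest) > MAX_IDENTITY_DIGEST_SIZE:
        raise ValueError(
            f"identity digest of {len(digest)} bytes exceeds "
            f"{MAX_IDENTITY_DIGEST_SIZE}-byte limit"
        )


def choose_multihash(data: bytes) -> Tuple[int, bytes]:
    """Steps 2 and 3: inline small data, hash anything over the limit.

    Returns (multihash code, digest). Data that is hashed must be stored
    as a regular block, since it is no longer carried inside the CID.
    """
    if len(data) <= MAX_IDENTITY_DIGEST_SIZE:
        return IDENTITY, data
    return SHA2_256, hashlib.sha256(data).digest()
```

With this shape, a writer that appends to inlined data would call `choose_multihash` again after each change and switch to a regular block once the data no longer fits.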

4.3 Security

This change improves security by:

  1. Preventing Unbounded Resource Consumption: Limits the amount of data that can be inlined in CIDs
  2. Reducing Attack Surface: Smaller identity CIDs reduce the impact of bit flip vulnerabilities
  3. Clear Security Boundaries: Explicit limits help security audits and threat modeling
  4. Mitigating Known Vulnerabilities: The go-car library previously had a vulnerability (GHSA-9x4h-8wgm-8xfg) where decoding user-controlled identity CIDs could cause excessive memory allocation, leading to denial of service. While go-car mitigated this by capping allocations at 1 MiB, establishing a 128-byte limit at the UnixFS specification level ensures all implementations are protected from this class of vulnerability by default.
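The class of problem described in item 4 can be avoided by checking the declared digest length before reading or allocating anything for it. The following is a minimal sketch of that idea and not go-car's actual fix; `read_varint` and `read_multihash` are hypothetical helpers, and only the identity limit is enforced here (a real decoder would also bound other hash functions).

```python
import io

IDENTITY = 0x00
MAX_IDENTITY_DIGEST_SIZE = 128


def read_varint(stream, max_bytes: int = 9) -> int:
    """Read an unsigned varint, refusing encodings longer than max_bytes."""
    result = 0
    for shift in range(0, 7 * max_bytes, 7):
        chunk = stream.read(1)
        if not chunk:
            raise ValueError("unexpected end of input in varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
    raise ValueError("varint too long")


def read_multihash(stream):
    """Read (code, digest), validating the length before reading the digest."""
    code = read_varint(stream)
    length = read_varint(stream)
    # The attacker-controlled length is checked BEFORE any read or allocation.
    if code == IDENTITY and length > MAX_IDENTITY_DIGEST_SIZE:
        raise ValueError(
            f"identity digest length {length} exceeds "
            f"{MAX_IDENTITY_DIGEST_SIZE}-byte limit"
        )
    digest = stream.read(length)
    if len(digest) != length:
        raise ValueError("truncated digest")
    return code, digest
```

For example, an input that declares a 1 GiB identity digest is rejected after reading only its six header bytes, regardless of how much data follows.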

4.4 Alternatives

Several alternatives were considered:

  1. No Limit: Rejected due to resource exhaustion and abuse potential
  2. Smaller Limit (32-64 bytes): Would break more existing use cases
  3. Larger Limit (256+ bytes): As noted by @rvagg, "the higher you go, the harder it is to justify their use"
  4. Complete Deprecation: Too disruptive; identity CIDs have legitimate uses for tiny data

5. Test fixtures

5.1 Valid Identity CID (128 bytes)

5.2 Invalid Identity CID (129 bytes)

A. References

[rfc2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119

B. Acknowledgments

We gratefully acknowledge the following individuals for their valuable contributions, ranging from minor suggestions to major insights, which have shaped and improved this specification.

Editor
Marcin Rataj (Interplanetary Shipyard)
Special Thanks
Rod Vagg
Volker Mische
Alex Potsides (Interplanetary Shipyard)