CID (Content IDentifier)

status: permanent

CID is a format for referencing content in distributed information systems, like IPFS. It leverages content addressing, cryptographic hashing, and self-describing formats. It is the core identifier used by IPFS and IPLD. It uses a multicodec to indicate its version, making it fully self-describing.

1. What is it?

A CID is a self-describing content-addressed identifier. It uses cryptographic hashes for content addressing and several multiformats for flexible self-description, namely:

  1. multihash for content-addressed hashing,
  2. multicodec to type that addressed content, and
  3. optionally, multibase to encode the binary CID as a string.

The first two form a self-contained binary identifier; the third is added only when the CID is written as text.

Concretely, it's a typed content address: a tuple of (content-type, content-address).

2. How does it work?

Current version: CIDv1.

CIDv1 is a binary format composed of unsigned varints prefixing a hash digest to form a self-describing "content address":

<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>

# example: a CIDv1 addressing the raw bytes "hello", in hex
01 55 12 20 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
# 01: cidv1 | 55: raw | 12 20: sha2-256, 32 bytes | 2cf2...: sha2-256 digest of "hello"

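Where
  • <multicodec-cidv1> is the CID version as a multicodec (0x01 for CIDv1), which keeps the format upgradable.
  • <multicodec-content-type> is a multicodec code for the content type or format of the addressed data, such as raw (0x55).
  • <multihash-content-address> is a multihash, <hash-code><digest-length><digest>, carrying the cryptographic hash of the content.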
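To make the layout concrete, here is a minimal Python sketch that assembles the binary CIDv1 from the example above. The helper names encode_uvarint and make_cidv1 are hypothetical, and the sketch assumes sha2-256 (multihash code 0x12) is the only hash function in use.

```python
import hashlib


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)  # high bit clear: last byte
            return bytes(out)


def make_cidv1(content_type: int, data: bytes) -> bytes:
    """Binary CIDv1 for data, hashed with sha2-256 (multihash code 0x12)."""
    digest = hashlib.sha256(data).digest()
    multihash = encode_uvarint(0x12) + encode_uvarint(len(digest)) + digest
    return encode_uvarint(0x01) + encode_uvarint(content_type) + multihash


cid = make_cidv1(0x55, b"hello")  # 0x55 = raw
print(cid.hex())
# 015512202cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Every field except the digest goes through the same varint encoder; codes below 0x80 happen to fit in a single byte, which is why the example reads like plain hex.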

3. Stringified Form

Since CIDs have many applications outside binary-only contexts, a CID may need to be base-encoded for different consumers or transports. In such applications, CIDs are expressed as a Unicode string with a multibase prefix. The multibase prefix identifies the string encoding but is not part of the CID itself; the same binary CID can appear in different bases depending on context and needs such as string length and case-sensitivity. The full string form is:

<cidv1-str> ::= <multibase-prefix><multibase-encoding(<multicodec-cidv1><multicodec-content-type><multihash-content-address>)>

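Where
  • <multibase-prefix> is the multibase code naming the string encoding, for example b for base32 or z for base58btc.
  • <multibase-encoding(...)> is the binary CIDv1 encoded in that base.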

IPFS implementations SHOULD support at minimum base58btc (z), base32 (b), base16 (f), and base36 (k, for ed25519 keys in IPNS Records).
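The sketch below shows how one binary CID maps to several of these string forms. It assumes the RFC 4648 base32 alphabet in lowercase without padding for b, lowercase hex for f, and big-integer conversion for z (Bitcoin alphabet) and k (lowercase 0-9a-z), where each leading zero byte becomes the alphabet's first character. to_multibase is a hypothetical helper, not a multiformats library API.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
B36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def _encode_radix(data: bytes, alphabet: str) -> str:
    """Big-integer radix encoding; each leading zero byte becomes alphabet[0]."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, len(alphabet))
        digits.append(alphabet[rem])
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return alphabet[0] * zeros + "".join(reversed(digits))


def to_multibase(cid: bytes, base: str) -> str:
    """Prefix the multibase code, then encode the binary CID in that base."""
    if base == "base32":  # b: RFC 4648 base32, lowercase, no padding
        return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")
    if base == "base16":  # f: lowercase hex
        return "f" + cid.hex()
    if base == "base58btc":  # z: Bitcoin alphabet
        return "z" + _encode_radix(cid, B58_ALPHABET)
    if base == "base36":  # k: lowercase base36
        return "k" + _encode_radix(cid, B36_ALPHABET)
    raise ValueError(f"unsupported base: {base}")


hello_cid = bytes.fromhex(
    "015512202cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
)
print(to_multibase(hello_cid, "base32"))
# bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
```

Only the prefix and the alphabet change between forms; the underlying bytes are identical, which is why the prefix is not part of the CID itself.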

4. Design Considerations

The design of CIDs takes into account many difficult tradeoffs encountered while building IPFS. Most of these come from the multiformats project.

5. Versions

5.1 CIDv0

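CIDv0 is a backwards-compatible version: a bare sha2-256 multihash, as IPFS used before CIDs existed. Its version (0) and content type (dag-pb, 0x70) are implicit, and its string form is base58btc with no multibase prefix:

<cidv0> ::= <multihash-content-address>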
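Because a CIDv0 carries no version or content-type field, the equivalent CIDv1 only needs those two implicit values written out. The sketch below illustrates this with hypothetical helpers (base58btc, cidv0_from_digest, cidv0_to_cidv1); the digest is a placeholder sha2-256 of arbitrary bytes standing in for a real encoded dag-pb block.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Big-integer base58btc; each leading zero byte becomes "1"."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def cidv0_from_digest(digest: bytes) -> bytes:
    # The whole CIDv0 is the sha2-256 multihash: 0x12, 0x20 (32), digest.
    if len(digest) != 32:
        raise ValueError("a CIDv0 needs a 32-byte sha2-256 digest")
    return bytes([0x12, 0x20]) + digest


def cidv0_to_cidv1(cidv0: bytes) -> bytes:
    # Write out the implicit fields: version 0x01, content type dag-pb 0x70.
    return bytes([0x01, 0x70]) + cidv0


# Placeholder digest; a real CIDv0 hashes an encoded dag-pb block.
v0 = cidv0_from_digest(hashlib.sha256(b"placeholder block").digest())
v0_str = base58btc(v0)  # CIDv0 strings carry no multibase prefix
print(len(v0), len(v0_str), v0_str[:2])  # 34 46 Qm
print(cidv0_to_cidv1(v0)[:4].hex())  # 01701220
```

Any 34-byte sha2-256 multihash encodes to a 46-character base58btc string beginning with Qm, which is the property the string decoder in section 6.2 relies on.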

5.2 CIDv1

See the section: How does it work?

<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>

6. Decoding Algorithm

Apart from the hash digest, which is raw bytes, the fields of a binary CID are unsigned varints. A binary CID has one of two shapes, told apart by its leading bytes:

Leading bytes    Shape
0x01             a CIDv1; this first varint is the version field
0x12 0x20        a CIDv0: a bare 34-byte sha2-256 multihash
anything else    not a CID; a decoder MUST reject it

A CIDv0 is identified by the two-byte prefix 0x12 0x20, not by the leading byte 0x12 alone: 0x12 is the sha2-256 multihash code and 0x20 is its digest length, 32. A CIDv0 has no version field.

6.1 Decoding a binary CID

To decode a binary CID given as a byte sequence bytes:

  1. If bytes is exactly 34 bytes long and begins with 0x12 0x20, it is a CIDv0, a bare sha2-256 multihash (cidv0 ::= <multihash-content-address>):
    1. The 34 bytes are 0x12 (the sha2-256 code), 0x20 (the digest length, 32), and a 32-byte digest.
    2. The content type is implicitly dag-pb (0x70) and is not encoded.
  2. Otherwise, read the leading varint of bytes.
  3. If the leading varint is 0x01, it is a CIDv1 (<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>):
    1. The <multicodec-cidv1> version field is the 0x01 just read.
    2. Read the next varint as the <multicodec-content-type>, which types the content.
    3. Read the <multihash-content-address> that follows, structured as <hash-code><digest-length><digest>.
    4. The <digest> MUST be exactly <digest-length> bytes long and MUST end the input; a decoder MUST reject a truncated digest or any trailing bytes.
  4. Otherwise, a decoder MUST reject bytes. No other leading value is a CID. In particular, reject a leading 0x12 that is not the 0x12 0x20 prefix of a 34-byte input, and reject 0x00, 0x02 (reserved for the never-deployed CIDv2), and 0x03 (reserved for the never-deployed CIDv3).
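The steps above can be sketched in Python as follows. decode_cid, read_uvarint, and the Cid tuple are hypothetical names. The varint reader assumes the unsigned-varint rules of a 9-byte maximum and minimal encoding, and is not meant as a complete validator.

```python
from typing import NamedTuple, Tuple


class Cid(NamedTuple):
    version: int
    codec: int
    hash_code: int
    digest: bytes


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned varint at pos; return (value, next position)."""
    value = 0
    for i in range(9):  # assumed 9-byte cap
        if pos + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if byte == 0 and i > 0:
                raise ValueError("non-minimal varint")
            return value, pos + i + 1
    raise ValueError("varint too long")


def decode_cid(data: bytes) -> Cid:
    # Step 1: a CIDv0 is exactly 34 bytes beginning 0x12 0x20.
    if len(data) == 34 and data[0] == 0x12 and data[1] == 0x20:
        return Cid(0, 0x70, 0x12, data[2:])  # content type implicitly dag-pb
    # Step 2: read the leading varint.
    version, pos = read_uvarint(data, 0)
    # Step 4: any leading value other than 0x01 is rejected.
    if version != 0x01:
        raise ValueError(f"not a CID: leading varint {version:#x}")
    # Step 3: content type, then <hash-code><digest-length><digest>.
    codec, pos = read_uvarint(data, pos)
    hash_code, pos = read_uvarint(data, pos)
    length, pos = read_uvarint(data, pos)
    digest = data[pos:]
    if len(digest) != length:
        raise ValueError("digest truncated or followed by trailing bytes")
    return Cid(1, codec, hash_code, digest)


hello = bytes.fromhex(
    "015512202cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
)
cid = decode_cid(hello)
print(cid.version, hex(cid.codec), hex(cid.hash_code), len(cid.digest))
# 1 0x55 0x12 32
```

Checking the CIDv0 shape before reading any varint matters: 0x12 would otherwise be read as a version number and rejected.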

6.2 Decoding a CID string

To decode a CID string (ASCII or UTF-8):

  1. Convert the string to binary bytes:
    • If it is 46 characters long and begins with Qm, it is a CIDv0; decode it as base58btc.
    • Otherwise, decode it by its multibase prefix.
  2. Decode bytes as a binary CID (above) and return the result.
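Step 1 can be sketched as below; the returned bytes are then decoded as a binary CID per section 6.1. cid_string_to_bytes is a hypothetical helper that, as an assumption of this sketch, only handles the b, f, k, and z multibase prefixes plus the bare Qm form of CIDv0.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
B36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def _decode_radix(text: str, alphabet: str) -> bytes:
    """Big-integer radix decoding; each leading alphabet[0] is one zero byte."""
    n = 0
    for ch in text:
        digit = alphabet.find(ch)
        if digit < 0:
            raise ValueError(f"invalid character {ch!r}")
        n = n * len(alphabet) + digit
    zeros = len(text) - len(text.lstrip(alphabet[0]))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def cid_string_to_bytes(s: str) -> bytes:
    # CIDv0: 46 base58btc characters starting with "Qm", no multibase prefix.
    if len(s) == 46 and s.startswith("Qm"):
        return _decode_radix(s, B58_ALPHABET)
    prefix, rest = s[:1], s[1:]
    if prefix == "z":  # base58btc
        return _decode_radix(rest, B58_ALPHABET)
    if prefix == "k":  # base36, lowercase
        return _decode_radix(rest, B36_ALPHABET)
    if prefix == "b":  # base32, lowercase and unpadded: restore case and padding
        return base64.b32decode(rest.upper() + "=" * (-len(rest) % 8))
    if prefix == "f":  # base16, lowercase hex
        return bytes.fromhex(rest)
    raise ValueError(f"unsupported multibase prefix {prefix!r}")


data = cid_string_to_bytes("bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq")
print(data[:4].hex())  # 01551220
```

The Qm check runs first because a CIDv0 string has no multibase prefix; read as a prefix, its leading Q would not name any supported base.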

7. Appendices

These sections are not part of the specification and are provided only for extra context.

7.1 FAQ

Q. I have questions on multicodec, multibase, multihash, or unsigned-varint.

Please check their repositories: multicodec, multibase, multihash, unsigned-varint.

Q. Why does CID exist?

IPFS originally used base58btc-encoded multihashes, but the need to support multiple data formats via IPLD revealed limitations of bare multihashes as identifiers. CIDs were created to provide a self-describing, versioned, typed content address. The history of this format is documented at: https://github.com/ipfs/specs/issues/130

Q. Is the use of multicodec similar to file extensions?

Yes. Like a file extension, the multicodec in a CID tells consumers how to interpret the bytes. And just like file extensions, most users will never change it, but you can swap the codec to change how the same bytes are parsed.
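To illustrate, the hypothetical helper below swaps the content type of a CIDv1 while copying the multihash unchanged. The result is a well-formed CID even though the bytes "hello" are not valid dag-cbor; only a consumer that tries to parse the content as dag-cbor would notice.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        out.append(low | 0x80 if n else low)  # continuation bit on all but last
        if not n:
            return bytes(out)


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def swap_content_type(cid: bytes, new_type: int) -> bytes:
    version, pos = read_uvarint(cid, 0)
    if version != 0x01:
        raise ValueError("only a CIDv1 carries an explicit content type")
    _old_type, pos = read_uvarint(cid, pos)  # skip the old content type
    # The multihash (the content address itself) is copied unchanged.
    return encode_uvarint(version) + encode_uvarint(new_type) + cid[pos:]


raw = bytes.fromhex(
    "015512202cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
)
as_cbor = swap_content_type(raw, 0x71)  # dag-cbor
print(as_cbor[:4].hex())  # 01711220
```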

Q. What formats (multicodec codes) does CID support?

CID can reference content of any type registered in the multicodec table. In practice, IPFS primarily uses dag-pb (0x70), raw (0x55), dag-cbor (0x71), dag-json (0x0129), and libp2p-key (0x72).
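Multicodec codes are written into a CID as unsigned varints, so most of these fit in one byte while dag-json (0x0129) takes two. A quick sketch, using a hand-copied illustrative table (IPFS_CODECS is a hypothetical name) rather than the registry itself:

```python
IPFS_CODECS = {
    "dag-pb": 0x70,
    "raw": 0x55,
    "dag-cbor": 0x71,
    "dag-json": 0x0129,
    "libp2p-key": 0x72,
}


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        out.append(low | 0x80 if n else low)  # continuation bit on all but last
        if not n:
            return bytes(out)


for name, code in IPFS_CODECS.items():
    print(f"{name:<10} {code:#06x} -> {encode_uvarint(code).hex()}")
# dag-pb     0x0070 -> 70
# raw        0x0055 -> 55
# dag-cbor   0x0071 -> 71
# dag-json   0x0129 -> a902
# libp2p-key 0x0072 -> 72
```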

Q. What is the process for updating CID specification (e.g., adding a new version)?

CIDs are a well-established standard. IPFS uses CIDs for content addressing and IPNS. Changing such a core protocol requires careful review, including feedback from implementers and stakeholders across the ecosystem.

For this reason, changes to the CID specification MUST be submitted as an improvement proposal to the ipfs/specs repository (a PR with an IPIP document), and MUST follow the IPIP process described there.

7.2 Historical Design Decisions

You can read an in-depth discussion on why this format was needed in IPFS and the original CIDv1 proposal.

7.3 Human-Readable Form

This is design guidance for tools that present a human-readable CID inspector to a user, such as a debugger, a block explorer, or the diagnostic UI at https://cid.ipfs.tech. It is not a wire format: nothing produces or parses it, and it carries no information beyond what the CID already encodes.

When such a UI needs to show what a CID contains, it can expand the already self-describing CID into a labelled listing of its parts:

<hr-cid> ::= <hr-mbc> "-" <hr-cid-mc> "-" <hr-mc> "-" <hr-mh>

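Where each sub-component is taken from the relevant multiformats registry:
  • <hr-mbc> is the name of the multibase, e.g. base32.
  • <hr-cid-mc> is the CID version, e.g. cidv1.
  • <hr-mc> is the name of the content-type multicodec, e.g. raw.
  • <hr-mh> is the multihash, written as the hash function name, the digest length in bits, and the digest in hex, e.g. sha2-256-256- followed by the digest.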

For example, the CIDv1 for the raw bytes hello:

# example CID
bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
# corresponding human readable CID
base32 - cidv1 - raw - sha2-256-256-2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

These names come from the multiformats registries and are provisional labels for human eyes only; only the numeric codes are stable. A UI should show the codes next to the names, and no consumer should rely on a name staying the same. See: https://cid.ipfs.tech/#bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
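A sketch of such an inspector, assuming a tiny hand-copied subset of the registries and handling only the b and f multibase prefixes and CIDv1; human_readable and the name tables are hypothetical. Following the example above, it prints names only, whereas a real UI would also show the numeric codes next to them.

```python
import base64
from typing import Tuple

# Illustrative subset of the multiformats registries; only the codes are stable.
MULTIBASE_NAMES = {"b": "base32", "f": "base16"}
CODEC_NAMES = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor", 0x0129: "dag-json", 0x72: "libp2p-key"}
HASH_NAMES = {0x12: "sha2-256"}


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def human_readable(cid_str: str) -> str:
    prefix, rest = cid_str[0], cid_str[1:]
    if prefix == "b":
        data = base64.b32decode(rest.upper() + "=" * (-len(rest) % 8))
    elif prefix == "f":
        data = bytes.fromhex(rest)
    else:
        raise ValueError("this sketch only handles the b and f prefixes")
    version, pos = read_uvarint(data, 0)
    if version != 1:
        raise ValueError("this sketch only handles CIDv1")
    codec, pos = read_uvarint(data, pos)
    hash_code, pos = read_uvarint(data, pos)
    length, pos = read_uvarint(data, pos)
    digest = data[pos:pos + length]
    # <hr-mh>: hash name, digest length in bits, hex digest.
    hr_mh = f"{HASH_NAMES[hash_code]}-{length * 8}-{digest.hex()}"
    return " - ".join([MULTIBASE_NAMES[prefix], f"cidv{version}", CODEC_NAMES[codec], hr_mh])


print(human_readable("bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq"))
```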

A. References

[rfc2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119

B. Acknowledgments

We gratefully acknowledge the following individuals for their valuable contributions, ranging from minor suggestions to major insights, which have shaped and improved this specification.

Editors
Marcin Rataj (Interplanetary Shipyard)
Robin Berjon (IPFS Foundation) Email: robin@berjon.com
Former Editor
Juan Benet
Special Thanks
Steven Allen
Rod Vagg
bumblefudge
Volker Mische
Joel Thorstensson
Oli Evans