CID is a format for referencing content in distributed information systems, like IPFS. It leverages content addressing, cryptographic hashing, and self-describing formats. It is the core identifier used by IPFS and IPLD. It uses a multicodec to indicate its version, making it fully self-describing.
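A CID is a self-describing content-addressed identifier. It uses cryptographic hashes for content addressing and several multiformats for flexible self-description, namely:

1. multihash, to hash the content being addressed,
2. multicodec, to type the addressed content, and
3. multibase, to encode the binary CID as a string.

The first two form a self-contained binary identifier; the third is added only when the CID is written as text.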
Concretely, it's a typed content address: a tuple of (content-type, content-address).
Current version: CIDv1.
CIDv1 is a binary format composed of unsigned varints prefixing a hash digest to form a self-describing "content address":
<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>
# example: a CIDv1 addressing the raw bytes "hello", in hex
01 55 12 20 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
# 01: cidv1 | 55: raw | 12 20: sha2-256, 32 bytes | 2cf2...: sha2-256 digest of "hello"
Where
- <multicodec-cidv1> is a multicodec representing the version of CID, here for upgradability purposes.
- <multicodec-content-type> is a multicodec code representing the content type or format of the data being addressed.
- <multihash-content-address> is a multihash value, which uses a registry of hash function abbreviations to prefix a cryptographic hash of the content being addressed, thus making it self-describing.

Since CIDs have many applications outside binary-only contexts, a CID may need to be base-encoded for different consumers or transports. In such applications, CIDs are expressed as a Unicode string with a multibase prefix. The multibase prefix identifies the string encoding but is not part of the CID itself; the same binary CID can appear in different bases depending on context and needs such as string length and case-sensitivity. The full string form is:
<cidv1-str> ::= <multibase-prefix><multibase-encoding(<multicodec-cidv1><multicodec-content-type><multihash-content-address>)>
Where
- <multibase-prefix> is a multibase prefix (1 Unicode code point) that makes the string self-describing for conversion back to binary.

IPFS implementations SHOULD support at minimum base58btc (z), base32 (b), base16 (f), and base36 (k, for ed25519 keys in IPNS Records).
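The binary and string layouts above can be reproduced with a few lines of Python. This is a minimal sketch rather than a reference implementation: the helper names (`encode_varint`, `cidv1_bytes`, `cidv1_base32`) are made up for illustration, and only sha2-256 and the base32 multibase are covered.

```python
import base64
import hashlib

CIDV1 = 0x01      # <multicodec-cidv1>
RAW = 0x55        # <multicodec-content-type> for raw bytes
SHA2_256 = 0x12   # multihash code for sha2-256


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, low group first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cidv1_bytes(content: bytes, codec: int = RAW) -> bytes:
    """Build <multicodec-cidv1><multicodec-content-type><multihash-content-address>."""
    digest = hashlib.sha256(content).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(CIDV1) + encode_varint(codec) + multihash


def cidv1_base32(cid: bytes) -> str:
    """Prefix the multibase code 'b' to the lowercase, unpadded base32 encoding."""
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


cid = cidv1_bytes(b"hello")
print(cid.hex())          # binary CID, in hex
print(cidv1_base32(cid))  # the same CID as a base32 string
```

The same binary CID could be rendered in any other multibase; only the string changes, not the identifier.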
The design of CIDs takes into account many difficult tradeoffs encountered while building IPFS. Most of these come from the multiformats project.
CIDv0 is a backwards-compatible version, where:
- multibase of the string representation is always base58btc and implicit (prefix z not present)
- multicodec is always dag-pb (0x70) and implicit (not written)
- cid-version is always cidv0 (0) and implicit (not written)
- multihash is written as is but is always a full (length 32) sha2-256 (0x12) hash.

cidv0 ::= <multihash-content-address>
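Because everything a CIDv0 leaves implicit has a fixed value, making those values explicit yields a CIDv1 for the same content. The sketch below illustrates this at the binary level; `cidv0_to_cidv1` is a hypothetical helper name, and the input digest is an arbitrary placeholder rather than the hash of a real dag-pb block.

```python
import hashlib

DAG_PB = 0x70  # the content type a CIDv0 always implies


def cidv0_to_cidv1(cidv0: bytes) -> bytes:
    """Rewrite a binary CIDv0 as a binary CIDv1 with dag-pb and the same multihash."""
    if len(cidv0) != 34 or cidv0[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0: expected a 34-byte sha2-256 multihash")
    # 0x01 and 0x70 are both below 0x80, so each fits in a single varint byte.
    return bytes([0x01, DAG_PB]) + cidv0


# Placeholder input: any 32-byte sha2-256 digest serves for the illustration.
digest = hashlib.sha256(b"example block").digest()
cidv0 = b"\x12\x20" + digest
cidv1 = cidv0_to_cidv1(cidv0)
print(cidv1.hex())
```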
See the section: How does it work?
<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>
The binary fields of a CID are unsigned varints. Two rules hold for every CID:
A binary CID has one of two shapes, told apart by its leading bytes:
| Leading bytes | Shape |
|---|---|
| 0x01 | a CIDv1, where this first varint is the version field |
| 0x12 0x20 | a bare 34-byte sha2-256 multihash: a CIDv0 |
| anything else | not a CID; a decoder MUST reject it |
A CIDv0 is identified by the two-byte prefix 0x12 0x20, not by the leading byte 0x12 alone: 0x12 is the sha2-256 multihash code and 0x20 is its digest length, 32. A CIDv0 has no version field.
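A decoder that tells the two shapes apart and parses each might look like the following sketch. The `Cid` tuple and the function names are illustrative, not part of any library. The varint reader assumes the limits of the unsigned-varint format (at most 9 bytes, minimal encoding), and the CIDv1 branch follows the `<cidv1>` grammar, treating the multihash as a hash code, a digest length, and a digest that fills the rest of the input.

```python
from typing import NamedTuple, Tuple


class Cid(NamedTuple):
    version: int
    codec: int
    hash_code: int
    digest: bytes


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one unsigned varint at buf[pos]; return (value, next position)."""
    value, shift = 0, 0
    for i in range(pos, min(pos + 9, len(buf))):  # assumption: 9-byte cap
        byte = buf[i]
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # no continuation bit: this is the last byte
            if byte == 0 and i > pos:
                raise ValueError("varint is not minimally encoded")
            return value, i + 1
        shift += 7
    raise ValueError("truncated or overlong varint")


def decode_binary_cid(data: bytes) -> Cid:
    # Shape 1: a bare 34-byte sha2-256 multihash is a CIDv0 (dag-pb implied).
    if len(data) == 34 and data[:2] == b"\x12\x20":
        return Cid(version=0, codec=0x70, hash_code=0x12, digest=data[2:])
    # Shape 2: a leading 0x01 is the CIDv1 version field.
    if data[:1] != b"\x01":
        raise ValueError("not a CID")
    codec, pos = read_varint(data, 1)
    hash_code, pos = read_varint(data, pos)
    length, pos = read_varint(data, pos)
    digest = data[pos:]
    if len(digest) != length:
        raise ValueError("digest length does not match the remaining bytes")
    return Cid(version=1, codec=codec, hash_code=hash_code, digest=digest)


hello = bytes.fromhex(
    "01551220" "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
)
print(decode_binary_cid(hello))
```

Checking the 34-byte length together with the two-byte prefix is what keeps a stray leading 0x12 from being mistaken for a CIDv0.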
To decode a binary CID bytes:

- If bytes is exactly 34 bytes long and begins with 0x12 0x20, it is a CIDv0, a bare sha2-256 multihash (cidv0 ::= <multihash-content-address>):
  - It consists of 0x12 (the sha2-256 code), 0x20 (the digest length, 32), and a 32-byte digest.
  - The content type is implicitly dag-pb (0x70) and is not encoded.
  - The multihash is the whole of bytes.
- Otherwise, if bytes begins with 0x01, it is a CIDv1 (<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>):
  - The <multicodec-cidv1> version field is the 0x01 just read.
  - Read the next varint, <multicodec-content-type>, which types the content.
  - Read the <multihash-content-address> that follows, structured as <hash-code><digest-length><digest>.
  - The <digest-length> MUST consume the remaining bytes exactly; a decoder MUST reject a truncated digest or any trailing bytes.
- Otherwise, reject bytes. No other leading value is a CID. In particular, reject a leading 0x12 that is not the 0x12 0x20 prefix of a 34-byte input, and reject 0x00, 0x02 (reserved for the never-deployed CIDv2), and 0x03 (reserved for the never-deployed CIDv3).

To decode a CID string (ASCII or UTF-8):

1. Convert the string to binary bytes:
   - If it is 46 characters long and starts with Qm, it is a CIDv0; decode it as base58btc.
   - Otherwise, decode it according to its multibase prefix.
2. Decode bytes as a binary CID (above) and return the result.

The following sections are not part of the specification; they are provided only for additional context.
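As a non-normative illustration of the string-decoding steps above, the sketch below turns a CID string into binary bytes, ready to be handed to a binary CID decoder. The function names are hypothetical, the base58btc decoder is hand-rolled for self-containment, and only three of the recommended bases (z, b, f) are handled; base36 is left out for brevity.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode base58btc: a big-endian base-58 number, each leading '1' being a zero byte."""
    num = 0
    for ch in text:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid base58btc character: {ch!r}")
        num = num * 58 + idx
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + body


def b32decode_lower(text: str) -> bytes:
    """Decode lowercase, unpadded base32 (the multibase 'b' flavour)."""
    if text != text.lower():
        raise ValueError("multibase 'b' expects lowercase base32")
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


def cid_string_to_bytes(text: str) -> bytes:
    # CIDv0 strings carry no multibase prefix: 46 characters starting with "Qm".
    if len(text) == 46 and text.startswith("Qm"):
        return b58decode(text)
    if not text:
        raise ValueError("empty string is not a CID")
    prefix, payload = text[0], text[1:]
    if prefix == "z":
        return b58decode(payload)
    if prefix == "b":
        return b32decode_lower(payload)
    if prefix == "f":
        return bytes.fromhex(payload)
    raise ValueError(f"multibase prefix {prefix!r} is not handled in this sketch")


data = cid_string_to_bytes("bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq")
print(data.hex())
```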
Q. I have questions on multicodec, multibase, multihash, or unsigned-varint.
Please check their repositories: multicodec, multibase, multihash, unsigned-varint.
Q. Why does CID exist?
IPFS originally used base58btc-encoded multihashes, but the need to support multiple data formats via IPLD revealed limitations of bare multihashes as identifiers. CIDs were created to provide a self-describing, versioned, typed content address. The history of this format is documented at: https://github.com/ipfs/specs/issues/130
Q. Is the use of multicodec similar to file extensions?
Yes. Like a file extension, the multicodec in a CID tells consumers how to interpret the bytes. And just like file extensions, most users will never change it, but you can swap the codec to change how the same bytes are parsed.
Q. What formats (multicodec codes) does CID support?
CID can reference content of any type registered in the multicodec table.
In practice, IPFS primarily uses dag-pb (0x70), raw (0x55), dag-cbor (0x71), dag-json (0x0129), and libp2p-key (0x72).
Q. What is the process for updating the CID specification (e.g., adding a new version)?
CIDs are a well-established standard. IPFS uses CIDs for content-addressing and IPNS. Changing such a core protocol requires careful review, including feedback from implementers and stakeholders across the ecosystem.
For this reason, changes to the CID specification MUST be submitted as an improvement proposal to the ipfs/specs repository (a PR with an IPIP document) and follow the IPIP process described there.
You can read an in-depth discussion on why this format was needed in IPFS and the original CIDv1 proposal.
This is design guidance for tools that present a human-readable CID inspector to a user, such as a debugger, a block explorer, or the diagnostic UI at https://cid.ipfs.tech. It is not a wire format: nothing produces or parses it, and it carries no information beyond what the CID already encodes.
When such a UI needs to show what a CID contains, it can expand the already self-describing CID into a labelled listing of its parts:
<hr-cid> ::= <hr-mbc> "-" <hr-cid-mc> "-" <hr-mc> "-" <hr-mh>
Where each sub-component is the name of its code in the relevant multiformats registry:
- <hr-mbc> is the multibase code name (e.g. b -> base32)
- <hr-cid-mc> is the CID version multicodec name (e.g. 0x01 -> cidv1)
- <hr-mc> is the content-type multicodec name (e.g. 0x55 -> raw)
- <hr-mh> is the multihash code name and digest length (e.g. sha2-256-256), then a final dash and the hex digest

For example, the CIDv1 for the raw bytes hello:
# example CID
bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
# corresponding human readable CID
base32 - cidv1 - raw - sha2-256-256-2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
These names come from the multiformats registries and are provisional labels for human eyes only; only the numeric codes are stable. A UI should show the codes next to the names, and no consumer should rely on a name staying the same. See: https://cid.ipfs.tech/#bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
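An inspector could derive this listing along the lines of the sketch below. It is deliberately narrow: the lookup tables are tiny hand-copied excerpts of the multiformats registries, only base32 strings are accepted, and every field is assumed to fit in a single varint byte. The function name `human_readable_cid` is made up for illustration.

```python
import base64

# Small excerpts of the registries; a real tool would load the full tables.
MULTIBASE_NAMES = {"b": "base32"}
MULTICODEC_NAMES = {0x01: "cidv1", 0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}
MULTIHASH_NAMES = {0x12: "sha2-256"}


def human_readable_cid(cid_str: str) -> str:
    """Expand a base32 CIDv1 string into a labelled listing of its parts."""
    prefix, payload = cid_str[:1], cid_str[1:]
    if prefix != "b":
        raise ValueError("this sketch only handles base32 (prefix 'b')")
    data = base64.b32decode(payload.upper() + "=" * (-len(payload) % 8))
    if len(data) < 4 or max(data[:4]) >= 0x80:
        # Assumption: version, codec, hash code and length are one byte each.
        raise ValueError("multi-byte varints are not handled in this sketch")
    version, codec, hash_code, length = data[0], data[1], data[2], data[3]
    digest = data[4:]
    if len(digest) != length:
        raise ValueError("digest length does not match the remaining bytes")
    return " - ".join([
        MULTIBASE_NAMES[prefix],
        MULTICODEC_NAMES[version],
        MULTICODEC_NAMES[codec],
        # Digest length is shown in bits, as in sha2-256-256.
        f"{MULTIHASH_NAMES[hash_code]}-{length * 8}-{digest.hex()}",
    ])


print(human_readable_cid("bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq"))
```

Since the names are provisional, a fuller version would print the numeric code beside each name.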
We gratefully acknowledge the following individuals for their valuable contributions, ranging from minor suggestions to major insights, which have shaped and improved this specification.