CID (Content IDentifier)

CID is a format for referencing content in distributed information systems, like IPFS. It leverages content addressing, cryptographic hashing, and self-describing formats. It is the core identifier used by IPFS and IPLD. It uses a multicodec to indicate its version, making it fully self-describing.

2. How does it work?

Current version: CIDv1.

CIDv1 is a binary format composed of unsigned varints prefixing a hash digest to form a self-describing "content address":

<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>

# example: a CIDv1 addressing the raw bytes "hello", in hex
01 55 12 20 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
# 01: cidv1 | 55: raw | 12 20: sha2-256, 32 bytes | 2cf2...: sha2-256 digest of "hello"

Where

<multicodec-cidv1> is a multicodec representing the version of CID, here for upgradability purposes.
<multicodec-content-type> is a multicodec code representing the content type or format of the data being addressed.
<multihash-content-address> is a multihash value, which uses a registry of hash function codes to prefix a cryptographic hash of the content being addressed (<hash-code><digest-length><digest>), thus making it self-describing.
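As a minimal sketch (Python, standard library only), the three fields can be concatenated to reproduce the hex example above. `encode_uvarint` and `make_cidv1` are illustrative helper names, not part of any library, and the sketch assumes sha2-256 (0x12) as the hash function.

```python
import hashlib


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as a minimal unsigned varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # final byte: high bit clear
            return bytes(out)


def make_cidv1(content: bytes, codec: int) -> bytes:
    """<multicodec-cidv1><multicodec-content-type><multihash-content-address>"""
    digest = hashlib.sha256(content).digest()
    # multihash: <hash-code 0x12 = sha2-256><digest-length><digest>
    multihash = encode_uvarint(0x12) + encode_uvarint(len(digest)) + digest
    return encode_uvarint(0x01) + encode_uvarint(codec) + multihash


print(make_cidv1(b"hello", 0x55).hex())  # 0x55 = raw; matches the hex example
```

Every code in this example is below 0x80, so each varint happens to be a single byte; larger codes would spill into more bytes.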

3. Stringified Form

Since CIDs have many applications outside binary-only contexts, a CID may need to be base-encoded for different consumers or transports. In such applications, CIDs are expressed as a Unicode string with a multibase prefix. The multibase prefix identifies the string encoding but is not part of the CID itself; the same binary CID can appear in different bases depending on context and needs such as string length and case-sensitivity. The full string form is:

<cidv1-str> ::= <multibase-prefix><multibase-encoding(<multicodec-cidv1><multicodec-content-type><multihash-content-address>)>

Where

<multibase-prefix> is a multibase prefix (1 Unicode code point) that makes the string self-describing for conversion back to binary.

IPFS implementations SHOULD support at minimum base58btc (z), base32 (b), base16 (f), and base36 (k, for ed25519 keys in IPNS Records).
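The multibase step can be sketched for two of the bases listed above. This is an illustration assuming base32 means the lowercase RFC 4648 alphabet without padding and base16 means lowercase hex; `to_multibase` is a hypothetical helper, and base58btc and base36 are left out for brevity.

```python
import base64
import hashlib


def to_multibase(cid: bytes, prefix: str) -> str:
    """Prefix the encoded binary CID with its one-character multibase code."""
    if prefix == "b":
        # base32: RFC 4648 alphabet, lowercased, padding stripped
        return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")
    if prefix == "f":
        # base16: lowercase hex
        return "f" + cid.hex()
    raise ValueError(f"base with prefix {prefix!r} not covered by this sketch")


# CIDv1, raw (0x55), sha2-256 (0x12) with a 32-byte (0x20) digest of b"hello"
cid = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(b"hello").digest()
print(to_multibase(cid, "b"))
print(to_multibase(cid, "f"))
```

Both strings decode to the same 36 binary bytes; only the prefix and the alphabet differ.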

5. Versions

5.1 CIDv0

CIDv0 is a backwards-compatible version, where:

the multibase of the string representation is always base58btc and implicit (prefix z not present)
the multicodec is always dag-pb (0x70) and implicit (not written)
the cid-version is always cidv0 (0) and implicit (not written)
the multihash is written as is but is always a full (length 32) sha2-256 (0x12) hash.

cidv0 ::= <multihash-content-address>
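Because a CIDv0 is just a base58btc-encoded sha2-256 multihash, it can be sketched with a small big-integer base58btc encoder. `b58encode` and `cidv0_string` are hypothetical helpers. The digest below is of the raw bytes hello and only illustrates the encoding, since a real CIDv0 hashes a dag-pb block.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """base58btc: big-endian base-58 digits, one '1' per leading zero byte."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(digits))


def cidv0_string(sha256_digest: bytes) -> str:
    """cidv0 ::= <multihash-content-address>, base58btc with no multibase prefix."""
    if len(sha256_digest) != 32:
        raise ValueError("CIDv0 requires a full 32-byte sha2-256 digest")
    return b58encode(bytes([0x12, 0x20]) + sha256_digest)


s = cidv0_string(hashlib.sha256(b"hello").digest())
print(s, len(s))  # 46 characters starting with "Qm" for any 32-byte digest
```

The fixed 0x12 0x20 prefix pins the leading base58 digits, which is why every CIDv0 string begins with Qm and is 46 characters long.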

5.2 CIDv1

See the section: How does it work?

<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>

6. Decoding Algorithm

The binary fields of a CID are unsigned varints. Two rules hold for every CID:

each varint MUST be minimally encoded, and a decoder MUST reject any overlong (non-minimal) varint;
a decoder MUST reject any bytes left over after the multihash.
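A decoder for a single unsigned varint that enforces the minimal-encoding rule could look like the sketch below. It assumes the unsigned-varint layout (7 payload bits per byte, high bit set on every byte except the last); `decode_uvarint` is a hypothetical name. A multi-byte varint whose final byte is 0x00 contributes no bits in that byte, so a shorter encoding exists and the input is rejected.

```python
def decode_uvarint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_pos), rejecting truncated or overlong varints."""
    value = 0
    shift = 0
    start = pos
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):  # high bit clear: this was the last byte
            break
    # A final 0x00 after at least one continuation byte adds no bits,
    # so a shorter encoding of the same value exists.
    if pos - start > 1 and byte == 0x00:
        raise ValueError("non-minimal (overlong) varint")
    return value, pos


print(decode_uvarint(b"\x01"))      # (1, 1)
print(decode_uvarint(b"\xa9\x02"))  # (297, 2), i.e. 0x0129
```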

A binary CID has one of two shapes, told apart by its leading bytes:

Leading bytes	Shape
`0x01`	a CIDv1, where this first varint is the version field
`0x12 0x20`	a bare 34-byte `sha2-256` multihash: a CIDv0
anything else	not a CID; a decoder MUST reject it

A CIDv0 is identified by the two-byte prefix 0x12 0x20, not by the leading byte 0x12 alone: 0x12 is the sha2-256 multihash code and 0x20 is its digest length, 32. A CIDv0 has no version field.

6.1 Decoding a binary CID

To decode a binary CID held in the byte array bytes:

If bytes is exactly 34 bytes long and begins with 0x12 0x20, it is a CIDv0, a bare sha2-256 multihash (cidv0 ::= <multihash-content-address>):
1. The 34 bytes are 0x12 (the sha2-256 code), 0x20 (the digest length, 32), and a 32-byte digest.
2. The content type is implicitly dag-pb (0x70) and is not encoded.
Otherwise, read the leading varint of bytes.
If the leading varint is 0x01, it is a CIDv1 (<cidv1> ::= <multicodec-cidv1><multicodec-content-type><multihash-content-address>):
1. The <multicodec-cidv1> version field is the 0x01 just read.
2. Read the next varint as the <multicodec-content-type>, which types the content.
3. Read the <multihash-content-address> that follows, structured as <hash-code><digest-length><digest>.
4. The <digest-length> MUST equal the number of bytes remaining after it; a decoder MUST reject a truncated digest or any trailing bytes.
Otherwise, a decoder MUST reject bytes. No other leading value is a CID. In particular, reject a leading 0x12 that is not the 0x12 0x20 prefix of a 34-byte input, and reject 0x00, 0x02 (reserved for the never-deployed CIDv2), and 0x03 (reserved for the never-deployed CIDv3).
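The steps of 6.1 can be sketched as follows. The helper names and the returned dictionary layout are assumptions for illustration only; `_read_uvarint` repeats the minimal-encoding check so the block stands alone.

```python
def _read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one minimally encoded unsigned varint; return (value, next_pos)."""
    value = shift = 0
    start = pos
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break
    if pos - start > 1 and byte == 0x00:
        raise ValueError("non-minimal varint")
    return value, pos


def decode_cid(data: bytes) -> dict:
    # CIDv0: exactly 34 bytes starting 0x12 (sha2-256) 0x20 (32-byte digest)
    if len(data) == 34 and data[:2] == b"\x12\x20":
        return {"version": 0, "codec": 0x70, "hash_code": 0x12, "digest": data[2:]}
    version, pos = _read_uvarint(data, 0)
    if version != 1:
        # covers 0x00, 0x02, 0x03, a stray 0x12, and anything else
        raise ValueError(f"not a CID: leading varint {version:#x}")
    codec, pos = _read_uvarint(data, pos)
    hash_code, pos = _read_uvarint(data, pos)
    digest_length, pos = _read_uvarint(data, pos)
    digest = data[pos:]
    if len(digest) != digest_length:
        raise ValueError("truncated digest or trailing bytes")
    return {"version": 1, "codec": codec, "hash_code": hash_code, "digest": digest}
```

Checking the 34-byte CIDv0 shape before reading any varint matters: read as a varint, 0x12 would be version 18 and be rejected.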

6.2 Decoding a CID string

To decode a CID string (ASCII or UTF-8):

Convert the string to binary bytes:
- If it is 46 characters long and begins with Qm, it is a CIDv0; decode it as base58btc.
- Otherwise, decode it by its multibase prefix.
Decode the resulting bytes as a binary CID (section 6.1) and return the result.
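The string-to-bytes step can be sketched for the bases named in section 3, minus base36. `cid_string_to_bytes` and `b58decode` are hypothetical helpers, and only the lowercase base32 (b) and base16 (f) variants are handled; the returned bytes would then go through the binary decoding of section 6.1.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(s: str) -> bytes:
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)  # ValueError on a non-base58 character
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    zeros = len(s) - len(s.lstrip("1"))  # each leading '1' is a 0x00 byte
    return b"\x00" * zeros + body


def cid_string_to_bytes(s: str) -> bytes:
    if len(s) == 46 and s.startswith("Qm"):
        return b58decode(s)  # CIDv0: bare base58btc, no multibase prefix
    prefix, rest = s[:1], s[1:]
    if prefix == "z":  # base58btc
        return b58decode(rest)
    if prefix == "b":  # base32, lowercase, unpadded
        return base64.b32decode(rest.upper() + "=" * (-len(rest) % 8))
    if prefix == "f":  # base16, lowercase
        return bytes.fromhex(rest)
    raise ValueError(f"unsupported multibase prefix {prefix!r}")
```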

7. Appendices

These sections provide additional context only; they are not part of the specification.

7.1 FAQ

Q. I have questions on multicodec, multibase, multihash, or unsigned-varint.

Please check their repositories: multicodec, multibase, multihash, unsigned-varint.

Q. Why does CID exist?

IPFS originally used base58btc-encoded multihashes, but the need to support multiple data formats via IPLD revealed limitations of bare multihashes as identifiers. CIDs were created to provide a self-describing, versioned, typed content address. The history of this format is documented at: https://github.com/ipfs/specs/issues/130

Q. Is the use of multicodec similar to file extensions?

Yes. Like a file extension, the multicodec in a CID tells consumers how to interpret the bytes. And just like file extensions, most users will never change it, but you can swap the codec to change how the same bytes are parsed.

Q. What formats (multicodec codes) does CID support?

CID can reference content of any type registered in the multicodec table. In practice, IPFS primarily uses dag-pb (0x70), raw (0x55), dag-cbor (0x71), dag-json (0x0129), and libp2p-key (0x72).
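Since the content type is written as an unsigned varint, codes of 0x80 and above take more than one byte. A small sketch (hypothetical `encode_uvarint` helper, codes copied from the list above) shows the prefix each codec contributes to a CIDv1:

```python
def encode_uvarint(n: int) -> bytes:
    """Minimal unsigned varint: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


CODECS = {"dag-pb": 0x70, "raw": 0x55, "dag-cbor": 0x71, "dag-json": 0x0129, "libp2p-key": 0x72}

for name, code in CODECS.items():
    # CIDv1 prefix: version varint (0x01) followed by the content-type varint
    prefix = encode_uvarint(0x01) + encode_uvarint(code)
    print(f"{name:<11} {code:#06x} -> {prefix.hex(' ')}")
```

dag-json (0x0129) is the only code in this list that needs two varint bytes (a9 02).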

Q. What is the process for updating the CID specification (e.g., adding a new version)?

CIDs are a well-established standard. IPFS uses CIDs for content-addressing and IPNS. Changing such a core protocol requires careful review, including feedback from implementers and stakeholders across the ecosystem.

For this reason, changes to the CID specification MUST be submitted as an improvement proposal to the ipfs/specs repository (a PR with an IPIP document) and follow the IPIP process described there.

7.2 Historical Design Decisions

You can read an in-depth discussion on why this format was needed in IPFS and the original CIDv1 proposal.

7.3 Human-Readable Form

This is design guidance for tools that present a human-readable CID inspector to a user, such as a debugger, a block explorer, or the diagnostic UI at https://cid.ipfs.tech. It is not a wire format: nothing produces or parses it, and it carries no information beyond what the CID already encodes.

When such a UI needs to show what a CID contains, it can expand the already self-describing CID into a labelled listing of its parts:

<hr-cid> ::= <hr-mbc> " - " <hr-cid-mc> " - " <hr-mc> " - " <hr-mh>

Where each sub-component is the name of its code in the relevant multiformats registry:

<hr-mbc> is the multibase code name (e.g. b -> base32)
<hr-cid-mc> is the CID version multicodec name (e.g. 0x01 -> cidv1)
<hr-mc> is the content-type multicodec name (e.g. 0x55 -> raw)
<hr-mh> is the multihash code name and the digest length in bits (e.g. sha2-256-256), then a final dash and the hex digest

For example, the CIDv1 for the raw bytes hello:

# example CID
bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
# corresponding human readable CID
base32 - cidv1 - raw - sha2-256-256-2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

These names come from the multiformats registries and are provisional labels for human eyes only; only the numeric codes are stable. A UI should show the codes next to the names, and no consumer should rely on a name staying the same. See: https://cid.ipfs.tech/#bafkreibm6jg3ux5qumhcn2b3flc3tyu6dmlb4xa7u5bf44yegnrjhc4yeq
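A UI could build the listing from an already decoded CID with a few name tables. The tables below are a hypothetical, hand-copied subset of the multiformats registries, and `human_readable` is an illustrative helper; per the note above, a real tool should read names from the registries and show the numeric codes alongside them.

```python
# Hypothetical, hand-copied subset of the multiformats registries.
MULTIBASE_NAMES = {"b": "base32", "f": "base16", "z": "base58btc"}
VERSION_NAMES = {1: "cidv1"}  # a CIDv0 has no version field to name
CODEC_NAMES = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor", 0x0129: "dag-json", 0x72: "libp2p-key"}
HASH_NAMES = {0x12: "sha2-256"}


def human_readable(multibase: str, version: int, codec: int, hash_code: int, digest: bytes) -> str:
    """Expand decoded CID fields into '<mbc> - <cid-mc> - <mc> - <mh>'."""
    # <hr-mh>: hash name, digest length in bits, then the hex digest
    mh = f"{HASH_NAMES[hash_code]}-{len(digest) * 8}-{digest.hex()}"
    return " - ".join([MULTIBASE_NAMES[multibase], VERSION_NAMES[version], CODEC_NAMES[codec], mh])


digest = bytes.fromhex("2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824")
print(human_readable("b", 1, 0x55, 0x12, digest))
```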