IMAP
Specifically (mostly), IMAP version 4 revision 1, which is the current version.
This document strives for a description midway between IMAP on Wikipedia, which is helpful but does not dive into specifics, and the RFC, which is tedious.
History in RFCs
POP came first.
- RFC 918: POST OFFICE PROTOCOL (1984)
- RFC 937: POST OFFICE PROTOCOL - VERSION 2 (1985)
- RFC 1081: Post Office Protocol - Version 3 (1988)
- RFC 1225: Post Office Protocol - Version 3 (1991)
- RFC 1460: Post Office Protocol - Version 3 (1993)
- RFC 1725: Post Office Protocol - Version 3 (1994)
- RFC 1939: Post Office Protocol - Version 3 (1996)
- Many more extensions, for SSL/TLS, authorization schemes, and other additions
- RFC 1957: Some Observations on Implementations of the Post Office Protocol (POP3) (1996). Not a spec, just a short memo saying there are some different POP3 implementations out there.
Then Crispin developed a replacement, IMAP, in 1986.
- The original IMAP (v1) was called Interim Mail Access Protocol, and exists only in Crispin’s head.
- RFC 1064: INTERACTIVE MAIL ACCESS PROTOCOL - VERSION 2 (1988)
- The original standardized version. Widely implemented.
- RFC 1176: INTERACTIVE MAIL ACCESS PROTOCOL - VERSION 2 (1990)
- obsoletes RFC 1064, but experimental (never implemented). I’d guess that it was meant to be a revision, but then Crispin decided it was bigger than just a revision.
- RFC 1203: INTERACTIVE MAIL ACCESS PROTOCOL - VERSION 3 (1991)
- Also obsoletes RFC 1064, but historic. It was used in a few rare places, apparently, but never caught on.
- I’m honestly not sure how they got this approved. Crispin had nothing (?) to do with it.
- IMAP2bis was a non-standard extension of IMAPv2, which IMAPv4 incorporated.
- RFC 1730: INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4 (1994)
- proposed standard, but not implemented. Built from IMAPv2 instead of the impostor IMAPv3.
- note the s/INTERACTIVE/INTERNET/ change in the title.
- RFC 2060: INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1 (1996)
- obsoletes RFC 1730;
- it’s another proposed standard, but I don’t understand why it’s 4rev1, the same as RFC 3501
- Crispin says it is the “RFC 1176 replacement protocol in final form”, but calls it an “abortion”
- RFC 3501: INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1 (2003)
- obsoletes RFC 2060
- This is the current standard, widely implemented.
- Various extensions:
- RFC 2086: IMAP4 ACL extension (1997)
- RFC 2244: ACAP – Application Configuration Access Protocol (1997)
- RFC 4314: IMAP4 Access Control List (ACL) Extension (2005)
IMAP depends on a few independent RFCs:
- RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
- RFC 2821: Simple Mail Transfer Protocol (SMTP)
- RFC 822: STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES
- RFC 2822: Internet Message Format
The benefits of IMAP
- IMAP is multi-threaded, as in, it allows multiple clients to access the same mailbox at the same time. Pretty revolutionary idea, no? Apparently POP3 doesn’t support this without extensions. IMAP is more of a cloud solution, where the user’s mail is stored on the server, whereas POP3 expects the client to take responsibility (i.e., storage) for all the email it pulls down.
- IMAP is random-access. Clients aren’t required to handle all the mail processing; the server handles a lot more of the data organization, performing searches, only returning the part of the mail message the client requests. Stuff like that.
General structure
- IMAP is built on TCP.
- Every command that the client sends is tagged with a unique prefix string.
- Nearly every command that the client sends is discrete. We’ll talk about the two exceptions later.
- Sometimes the server will send the client unsolicited messages, which will use the prefix * instead of a tag.
- Messages held on an IMAP server primarily consist of the message text, but have some metadata.
- I’m not sure if the message numbers count as metadata, but each message can be accessed by one of two numbers:
- The message’s unique identifier
- The message’s sequence number
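The tagging scheme described in this list can be sketched in a few lines of Python. This is only an illustration, not a client: `TagGenerator` and `classify_response` are hypothetical names, the `A0001`-style tag format is just a common convention, and the `+` continuation prefix comes from RFC 3501 rather than from the notes above.

```python
import itertools


class TagGenerator:
    """Hand out a fresh tag for every command (hypothetical helper)."""

    def __init__(self, prefix="A"):
        self._prefix = prefix
        self._counter = itertools.count(1)

    def next_tag(self):
        return f"{self._prefix}{next(self._counter):04d}"


def classify_response(line, pending_tags):
    """Split one server line into (kind, tag, rest).

    kind is 'untagged' for '*' lines, 'continuation' for '+' lines,
    and 'tagged' when the prefix matches a command we are waiting on.
    """
    prefix, _, rest = line.partition(" ")
    if prefix == "*":
        return ("untagged", None, rest)
    if prefix == "+":
        return ("continuation", None, rest)
    if prefix in pending_tags:
        return ("tagged", prefix, rest)
    raise ValueError(f"unexpected tag: {prefix!r}")


tags = TagGenerator()
tag = tags.next_tag()
command = f"{tag} SELECT inbox\r\n"  # what the client would write to the socket
print(command.strip())
print(classify_response("* 18 EXISTS", {tag}))
print(classify_response(f"{tag} OK SELECT completed", {tag}))
```

Because every command carries its own tag, a client can match each completion line to the command that caused it, while `*` lines are handled separately.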
Unique Identifiers (UID)
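Each message has a 32-bit unsigned integer UID. Combined with the mailbox's UIDVALIDITY value, it forms a 64-bit value that is meant to be universally and eternally unique, at least in a single mailbox (which in IMAP terms means a single folder such as INBOX, not a user's whole account). Actually, eternal uniqueness is only strongly recommended: a message's UID must not change during a session, but a server may fail to preserve UIDs across sessions, in which case it has to signal that by changing UIDVALIDITY.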
UIDVALIDITY is a 32-bit value that stays constant for as long as a mailbox's UIDs remain valid. If you can guarantee that UIDs are never reused, even when a mailbox is deleted and a new one is later created with the same name, 1 is an acceptable value for UIDVALIDITY. But if you might consider reinstating a deleted mailbox at some time in the future, you have to use a different UIDVALIDITY, in which case the mailbox's creation timestamp would be a decent choice.
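To make the relationship concrete, here is a small sketch of how a client might treat the two numbers. RFC 3501 defines both the UID and UIDVALIDITY as non-zero 32-bit values and the pair as a 64-bit value; the function names `message_key` and `cache_is_valid` are made up for this illustration.

```python
def message_key(uidvalidity, uid):
    """Pack UIDVALIDITY and UID into one 64-bit integer (hypothetical helper)."""
    for name, value in (("uidvalidity", uidvalidity), ("uid", uid)):
        if not 0 < value < 2**32:
            raise ValueError(f"{name} must be a non-zero 32-bit number")
    return (uidvalidity << 32) | uid


def cache_is_valid(cached_uidvalidity, server_uidvalidity):
    """A local cache of UIDs is only usable if UIDVALIDITY is unchanged."""
    return cached_uidvalidity == server_uidvalidity


key = message_key(uidvalidity=1, uid=42)
print(key)
print(cache_is_valid(1, 1))           # same mailbox incarnation, keep the cache
print(cache_is_valid(1, 1700000000))  # mailbox was recreated, discard the cache
```

A client that stores messages locally would keep the UIDVALIDITY it saw at SELECT time and throw away its cached UIDs whenever the server reports a different one.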
Message sequence number
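This number is much more flexible. It’s a 1-based index, and refers to the position of a message in a mailbox (i.e., a folder). It can change during a session: when a message is permanently removed, every message after it moves down by one.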
The message sequence number must be monotonic with the message’s UID. I.e., messages cannot be reordered, once received.
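A toy model may help here. The sketch below assumes a mailbox is nothing more than a list of UIDs (the values are invented) and shows that sequence numbers are just positions in ascending UID order, so removing a message renumbers everything after it while the UIDs stay put.

```python
def sequence_numbers(uids):
    """Map sequence number -> UID; positions follow ascending UID order."""
    return {seq: uid for seq, uid in enumerate(sorted(uids), start=1)}


mailbox = [101, 105, 230, 231]  # made-up UIDs
before = sequence_numbers(mailbox)

mailbox.remove(105)             # the message with UID 105 is permanently removed
after = sequence_numbers(mailbox)

print(before)
print(after)
```

This is why a client that wants a stable handle on a message across operations should hold on to the UID rather than the sequence number.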
Flags
There are two types of “flags” (each message is associated with a set of flags):
- System flags all start with \. They are mostly used for tracking message state.
- \Seen (special treatment described later)
- \Answered
- \Flagged
- \Deleted (special treatment described later)
- \Draft
- \Recent (set for messages until they have been requested by a client; cannot be manipulated by the client)
- User flags are called keywords in the RFC, and do not begin with \. They are different from IMAP folders.
Flags can be permanent or session-only. Most are permanent, including \Deleted on a typical server. \Recent is the one system flag that is always session-only.
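As a rough sketch of the distinctions above: the code below sorts flags into system flags and keywords, and refuses to build a STORE command that touches \Recent. The helper names and the `A003` tag are hypothetical; the case-insensitive comparison and the "extension" bucket for other backslash-prefixed flags follow RFC 3501 rather than these notes.

```python
SYSTEM_FLAGS = {"\\seen", "\\answered", "\\flagged", "\\deleted", "\\draft", "\\recent"}


def classify_flag(flag):
    """Return 'keyword', 'system', or 'extension' for a flag name."""
    if not flag.startswith("\\"):
        return "keyword"
    return "system" if flag.lower() in SYSTEM_FLAGS else "extension"


def client_may_store(flag):
    """\\Recent is managed by the server and cannot be set by a client."""
    return flag.lower() != "\\recent"


def store_command(tag, message_set, flags):
    """Build a '+FLAGS' STORE command line (without the trailing CRLF)."""
    rejected = [f for f in flags if not client_may_store(f)]
    if rejected:
        raise ValueError(f"client cannot change: {rejected}")
    return f"{tag} STORE {message_set} +FLAGS ({' '.join(flags)})"


print(classify_flag("\\Seen"))
print(classify_flag("todo"))
print(store_command("A003", "12", ["\\Deleted"]))
```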
Fetching
A message can be partially retrieved; its header, body, or MIME body part can be fetched independently.
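Partial retrieval is expressed through the section in square brackets after BODY. Here is a minimal sketch of building such commands, assuming the section names from RFC 3501 (empty for the whole message, HEADER, TEXT, or a MIME part number such as 1 or 1.2); `fetch_command` is a made-up helper, and the BODY.PEEK variant, which fetches without setting \Seen, is also taken from the RFC rather than from these notes.

```python
def fetch_command(tag, message_set, section="", peek=False):
    """Build a FETCH command line for one body section (without CRLF).

    section: "" for the whole message, "HEADER", "TEXT", or a MIME
    part number such as "1" or "1.2".
    peek: use BODY.PEEK so the fetch does not mark the message as seen.
    """
    item = "BODY.PEEK" if peek else "BODY"
    return f"{tag} FETCH {message_set} {item}[{section}]"


print(fetch_command("A004", "12", "HEADER"))       # just the headers
print(fetch_command("A005", "12", "1.2"))          # one MIME part
print(fetch_command("A006", "68:*", peek=True))    # whole messages, unseen state kept
```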
Data structures
Data can take the form of a:
- atom: 1+ non-special characters
- number: 1+ digit characters
- string:
- literal: {NumberOfOctets}\r\nTheOctets (NumberOfOctets is a number, TheOctets is a string of octets; e.g., the empty string would be {0}\r\n)
- quoted: "Characters" (Characters is just a sequence of 7-bit characters, excluding \r and \n; e.g., the empty string would be "")
- 8-bit and binary strings are facilitated by the literal format.
- parenthesized list: a sequence of 0+ data items, delimited by ( and ) characters. Can be nested.
- NIL: distinct from "" or (); conveys non-existence
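The formats above map fairly naturally onto ordinary Python values. The following sketch serializes `None`, non-negative integers, strings, bytes, and lists into the wire forms just described; `encode` and `can_quote` are hypothetical helpers, atoms are left out for brevity, and the backslash-escaping of `"` and `\` inside quoted strings comes from RFC 3501.

```python
def can_quote(data):
    """Quoted strings allow only 7-bit characters, and no NUL, CR, or LF."""
    return all(0 < b < 0x80 and b not in b"\r\n" for b in data)


def encode(value):
    """Serialize a Python value into IMAP wire format (returns bytes)."""
    if value is None:
        return b"NIL"
    if isinstance(value, bool):
        raise TypeError("booleans have no IMAP data format")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("IMAP numbers are unsigned")
        return str(value).encode("ascii")
    if isinstance(value, (list, tuple)):
        return b"(" + b" ".join(encode(item) for item in value) + b")"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        if can_quote(value):
            escaped = value.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
            return b'"' + escaped + b'"'
        # Anything 8-bit or containing line breaks goes out as a literal.
        return b"{%d}\r\n" % len(value) + value
    raise TypeError(f"cannot encode {type(value).__name__}")


print(encode(""))
print(encode("héllo"))
print(encode([1, None, "a b", []]))
```

Note that the empty string comes out as `""` here; the literal form `{0}\r\n` would be equally valid.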
… continue from https://tools.ietf.org/html/rfc3501#section-5
Examples
TODO …
LOGIN
SELECT inbox
FETCH 12 full
FETCH 12 body[header]
STORE 12 +flags \Deleted
FETCH * (UID)
FETCH 68:* (BODY[])
LOGOUT
Maybe some examples using the Node.js library, https://github.com/mscdex/node-imap
References: