---
coding: utf-8

title: "IANA Registration of Trustword Lists: Guide, Template and IANA Considerations"
abbrev: IANA Registration of Trustword Lists
docname: draft-birk-pep-trustwords-05
category: std

stand_alone: yes
pi: [toc, sortrefs, symrefs, comments]

author:
#{::include ../shared/author_tags/volker_birk.mkd}
{::include ../shared/author_tags/bernie_hoeneisen.mkd}
{::include ../shared/author_tags/hernani_marques.mkd}

normative:
  RFC4949:
#  RFC7435:
  RFC8126:

informative:
  I-D.birk-pep:
  RFC1751:
  RFC1760:
  RFC2289:
  RFC3647:
  RFC6117:
  RFC6120:
#  RFC4880:
#  RFC7258:
#  RFC7942:
#  I-D.marques-pep-email:
#  I-D.birk-pep-trustwords:
#  I-D.marques-pep-rating:
  I-D.marques-pep-handshake:
  I-D.hoeneisen-pep-keysync:
  PGP.wl:
    target:
    title: PGP word list
    date: 2017-11
  bitcoin.wl:
    target:
    title: Seed Phrase
    date: 2019-06
  ISO639:
    target:
    title: "Language codes - ISO 639"
{::include ../shared/references/isoc-btn.mkd}
# {::include ../shared/references/implementation-status.mkd}

--- abstract

This document specifies the IANA Registration Guidelines for Trustwords, describes corresponding registration procedures, and provides a guideline for creating Trustword list specifications.

Trustwords are common words in a natural language (e.g., English), which hexadecimal strings are mapped to. Such a mapping makes verification processes like fingerprint comparisons more practical, and less prone to misunderstandings.

--- middle

# Introduction

In public-key cryptography, comparing the respective public key fingerprints for each of the communication partners involved is vital to ensure that there is no Man-in-the-Middle (MITM) attack on the communication channel. These fingerprints normally consist of a chain of hexadecimal characters, which are often impractical, cumbersome, and prone to misunderstandings for end-users. To mitigate these challenges, several systems offer Trustword comparison as an alternative to these hexadecimal strings.
Trustwords are common words in a natural language (e.g., English), which these hexadecimal strings are mapped to. Using Trustwords makes verification processes like fingerprint comparisons more natural for users.

For example, in pEp's Privacy by Default proposition {{I-D.birk-pep}} Trustwords are used to facilitate easy contact verification for end-to-end encryption. Trustword comparison is offered after the peers have opportunistically exchanged public keys. Examples of Trustword lists used by current pEp implementations can be found here in CSV format:

In addition to contact verification, Trustwords are also used for other purposes, such as Human-Readable 128-bit Keys {{RFC1751}}, One Time Passwords (OTP) {{RFC1760}} {{RFC2289}}, SSH host-key verification, VPN server certificate verification, deriving private keys in blockchain applications for cryptocurrencies, and to import or synchronize secret keys across multiple devices owned by a single user {{I-D.hoeneisen-pep-keysync}}. Further ideas include the use of Trustwords for private key recovery in case of loss, contact verification in Extensible Messaging and Presence Protocol (XMPP) {{RFC6120}}, or for X.509 certificate verification in browsers {{RFC3647}}.

{::include ../shared/text-blocks/key-words-rfc2119.mkd}

{::include ../shared/text-blocks/terms-intro.mkd}

{::include ../shared/text-blocks/handshake.mkd}

{::include ../shared/text-blocks/mitm.mkd}

# The Concept of Trustword Mapping

## Example

As already discussed, fingerprints normally consist of a string of hexadecimal characters. A typical fingerprint looks like this:

> F482 E952 2F48 618B 01BC 31DC 5428 D7FA ACDC 3F13

Instead of the hexadecimal string, Trustwords allow users to compare ten common words of a language of their choosing.
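
The count of ten words follows from splitting the 160-bit fingerprint into 16-bit blocks, the block size that pEp's proposition calls for (see the section on the number of Trustwords). The following is a minimal, non-normative sketch of that lookup. The function names and the placeholder word list are hypothetical and are not taken from any pEp implementation or published Trustword list.

```python
def fingerprint_to_blocks(fingerprint: str, bits: int = 16) -> list[int]:
    """Split a hexadecimal fingerprint into integer blocks of `bits` bits."""
    hex_digits = "".join(fingerprint.split())  # drop the grouping whitespace
    if bits <= 0 or bits % 4 != 0:
        raise ValueError("block size must be a positive multiple of 4 bits")
    digits_per_block = bits // 4
    if len(hex_digits) % digits_per_block != 0:
        raise ValueError("fingerprint length does not match the block size")
    return [
        int(hex_digits[i:i + digits_per_block], 16)
        for i in range(0, len(hex_digits), digits_per_block)
    ]


def blocks_to_trustwords(blocks: list[int], wordlist: list[str]) -> list[str]:
    """Look up each block value in the word list (list index = key)."""
    return [wordlist[block] for block in blocks]


# Assumption: a placeholder list standing in for a real 16-bit Trustword list,
# which would hold natural-language words instead of "word<key>".
PLACEHOLDER_WORDLIST = [f"word{key}" for key in range(2 ** 16)]

FINGERPRINT = "F482 E952 2F48 618B 01BC 31DC 5428 D7FA ACDC 3F13"
blocks = fingerprint_to_blocks(FINGERPRINT)
trustwords = blocks_to_trustwords(blocks, PLACEHOLDER_WORDLIST)
print(len(trustwords))  # 10
print(trustwords[0])    # word62594, since 0xF482 == 62594
```

With a smaller block size, the same fingerprint yields more blocks and therefore more words to compare.
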
For example, the above fingerprint, mapped to English Trustwords, might appear as:

> dog house brother town fat bath school banana kite task

The same fingerprint might appear in German Trustwords as:

> klima gelb lappen weg trinken alles kaputt rasen rucksack durch

Note: These examples are for illustration purposes only, and are not derived from any published Trustword list.

## Previous work

The basic concept of Trustword mapping - also known as a biometric word list - for fingerprint comparison is well-documented. Examples of this concept are used with One-Time Passwords (OTP) {{RFC1751}} {{RFC1760}} {{RFC2289}}, as well as the PGP Word List ("Pretty Good Privacy word list") {{PGP.wl}}. Furthermore, cryptocurrencies use a similar concept for deriving private keys {{bitcoin.wl}}.

[[ TODO: Explain each previous usage a bit further and synchronize with section {{introduction}}. ]]

Regarding today's needs, previous proposals have the following shortcomings:

* Word lists are small, which generally results in more words to compare

* Existing word lists are usually only available in English, which limits their usefulness for non-English speakers

Furthermore, there are differences in the basic concept:

* The Trustword concept suggested herein intends to improve usability and security for all users, instead of only the technically-savvy.

* In many use cases, Trustwords are only read (aloud) during the comparison process, rather than being written or typed. For example, two users might compare their respective Trustwords during a phone call. Verbal comparison reduces the need to keep the actual Trustwords short. The use of longer Trustwords increases the entropy within the system, as it allows for a larger dictionary, and thus reduces the likelihood of phonetic collisions.

## Number of Trustwords for a language

If the number of Trustwords in a dictionary is low, only shorter parts of the original string (e.g., fingerprint) can be mapped to a single Trustword.
Thus, many Trustwords will need to be compared, which results in a potentially cumbersome process for users and leads to reduced usability.

To reduce the number of Trustwords that need to be compared, pEp's Privacy by Default proposition {{I-D.birk-pep}} calls for 16-bit scalars to be mapped to natural language words. Therefore, the key-value structure for such a mapping has 65536 entries. However, the number of unique values to be used in a language may be smaller than this number. This discrepancy can be addressed by using the same value, or Trustword, for more than one key. In such cases, the entropy of the representation is slightly reduced. For example, a Trustword list of 42000 words still allows for an entropy of log_2(42000), which is roughly 15.36 bits per word in a 16-bit mapping. As a consequence, such Trustword lists are not bijective.

On the other hand, small Trustword lists can consist entirely of short words (the number of short words in any natural language is normally limited), which are easier to use in implementations where Trustwords have to be typed or written, such as in OTP applications.

Note: This specification allows for registration of variable numbers of Trustwords per dictionary.

## Language

Although English is used around the world, the vast majority of the global population is not English-speaking. For an application to be useful to as wide a user base as possible, localization is essential. Therefore, this specification allows for registration of Trustword lists in different languages.

In applications where two humans are attempting to establish secure communications, it is likely that they share a common language. At this time, no real-world use cases for Trustword list translation capability have been identified.
Because the translation process inherently - and drastically - increases complexity from an IANA registration standpoint, the topic of Trustword translation is beyond the scope of this document.

## The nature of the words

Every Trustword list SHOULD be clear of offensive language (i.e., swear/curse words, slurs, derogatory language, etc.). This review SHOULD be performed by native speakers of each respective language.

# Security Considerations

There are no specific security considerations.

# Privacy Considerations

TODO

# IANA Considerations

Each natural language requires a different set of Trustwords. To allow implementers to use identical Trustword lists, an IANA registry is to be established. The IANA registration policy according to {{RFC8126}} is "Expert Review" and "Specification Required".

[[ Note: Further details of the IANA registry and requirements for the expert to assess the specification are for further study. An approach similar to that used in {{RFC6117}} will likely be followed. ]]

## Registration Template (XML chunk)

~~~~~~~~~~
first second [...] last
~~~~~~~~~~

Authors of a Wordlist are encouraged to use these XML chunks as a template to create the IANA Registration Template.

## IANA Registration

An IANA registration will contain the following elements:

### Language Code (<languagecode>)

The language code follows the ISO 639-3 specification {{ISO639}}, e.g., eng, deu.

[[ Note: Which of the ISO 639 specifications is most suitable for Trustwords is for further study. ]]

Example usage for German:

~~~~~~~~~~
e.g. deu
~~~~~~~~~~

### Bit Size (<bitsize>)

The bit size is the number of bits that can be mapped with the Wordlist. The number of registered words in a word list MUST be 2 ^ (<bitsize>).

Example usage for 16-bit Wordlist:

~~~~~~~~~~
e.g. 16
~~~~~~~~~~

### Number Of Unique Words (<numberofuniquewords>)

The number of unique words that are registered.

~~~~~~~~~~
e.g. 65536
~~~~~~~~~~

### Bijectivity (<bijective>)

Whether the registered Wordlist has a one-to-one mapping, meaning the number of unique words registered equals 2 ^ (<bitsize>).

Valid content: ( yes | no )

~~~~~~~~~~
e.g. yes
~~~~~~~~~~

### Version (<version>)

The version of the Wordlist MUST be unique within a language code.

[[ Note: Requirements for a "smart" composition of the version number are for further study ]]

~~~~~~~~~~
e.g. b.1.2
~~~~~~~~~~

### Registration Document(s) (<registrationdocs>)

Reference(s) to the Document(s) containing the Wordlist

~~~~~~~~~~
e.g.
e.g. (obsoleted by RFC 9999)
e.g. [International Telecommunications Union, "Wordlist for Foobar application", ITU-F Recommendation B.193, Release 73, Mar 2009.]
~~~~~~~~~~

### Requesters (<requesters>)

The persons requesting the registration of the Wordlist. Usually these are the authors of the Wordlist.

~~~~~~~~~~
e.g. John Doe Example Inc. 2018-06-20
~~~~~~~~~~

Note: If there is more than one requester, there must be one <xref> element per requester in the <requesters> element, and one <person> chunk per requester in the <people> element.

### Further Information (<additionalinfo>)

Any other information the authors deem interesting.

~~~~~~~~~~
e.g. more info goes here
~~~~~~~~~~

Note: If there is no such additional information, then the <additionalinfo> element is omitted.

### Wordlist (<wordlist>)

The full Wordlist to be registered. The number of words MUST be a power of 2 as specified above. The element names serve as keys for enumerating the Trustwords (starting at 0), and the element contents are the values, i.e., individual natural language words in the respective language.

~~~~~~~~~~
e.g. first second [...] last ] ]>
~~~~~~~~~~

[[ Note: The exact representation of the Wordlist is for further study.
]]

# Acknowledgments

The authors would like to thank the following people who have provided feedback or significant contributions to the development of this document: Andrew Sullivan, Claudio Luck, Daniel Kahn Gilmore, Kelly Bristol, Michael Richardson, Rich Salz, Volker Birk, and Yoav Nir.

This work was initially created by pEp Foundation, and then reviewed and extended with funding by the Internet Society's Beyond the Net Programme on standardizing pEp. {{ISOC.bnet}}

--- back

# IANA XML Template Example

This section contains a non-normative example of the IANA Registration Template XML chunk.

~~~~~~~~~~
lat 16 57337 no n.0.1 This Wordlist has been optimized for the Roman Standards Process. errare humanum [...] est Julius Caesar Curia Romana 1999-12-31
~~~~~~~~~~

# Document Changelog

RFC Editor: This section is to be removed before publication

* draft-birk-pep-trustwords-04:
  * Add Privacy Considerations section
  * Swapped Security and IANA Consideration Sections
  * Corrected typo in ISO references
  * Updated Introduction, Terms and concept Sections

* draft-birk-pep-trustwords-03:
  * Update references
  * Minor edits

* draft-birk-pep-trustwords-02:
  * Minor editorial changes and bug fixes
  * Added more items to Open Issues
  * Add usage example

* draft-birk-pep-trustwords-01:
  * Included feedback from mailing list and IETF-101 SECDISPATCH WG, e.g.
    * Added more explanatory text / less focused on the main use case
    * Bit size as parameter
    * Explicitly stated translations are out-of-scope for this document
  * Added draft IANA XML Registration template, considerations, explanation and examples
  * Added Changelog to Appendix
  * Added Open Issue section to Appendix

# Open Issues

[[ RFC Editor: This section should be empty and is to be removed before publication.
]]

* Better explain previous work on Trustwords

* More explanatory text for Trustword use cases, properties and requirements

* Further details of the IANA registry and requirements for the expert to assess the specification

* Decide which ISO language code to use: ISO-639-1 (e.g., ca, de, en, ...), as currently used in pEp implementations (running code), or ISO-639-3 (e.g., eng, deu, ita, ...)

* Adjust exact representation of wordlists
  * e.g. XML, CSV, ...

* Syntax for non-ASCII letters or language symbols (UTF-8) in Wordlists

* Need for optional entropy value assigned to words, to account for similar phonetics among words in the same wordlist?

* Need for an additional field, to define what a wordlist is optimized for, e.g., "entropy", "minimize word lengths", ...?

* Work out (requirements for) "smart" composition of the version number

* Decide whether in non-bijective Wordlists the redundant words need to be repeated in the IANA Registration

* Register only a hash over the wordlist with IANA?

* Does it make sense to open registrations for other patterns than just words, e.g., images?
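
As a non-normative aid for the open issue on non-bijective Wordlists, the following sketch shows how the values for <numberofuniquewords> and <bijective> could be derived from a word list and checked against <bitsize>. It assumes that redundant words are repeated in the registered list, which is one of the two options still under discussion. The function name and the `entropy_bits_per_word` key are hypothetical; the latter is not a registry field and only reproduces the log_2 figure used in the section on the number of Trustwords.

```python
import math


def derive_registration_fields(bitsize: int, words: list[str]) -> dict:
    """Derive the fields that depend on the word list and check its size."""
    expected = 2 ** bitsize
    if len(words) != expected:
        # The number of registered words MUST be 2 ^ (<bitsize>).
        raise ValueError(f"expected {expected} words, got {len(words)}")
    unique = len(set(words))
    return {
        "bitsize": bitsize,
        "numberofuniquewords": unique,
        "bijective": "yes" if unique == expected else "no",
        # Hypothetical helper value, not part of the registration template.
        "entropy_bits_per_word": math.log2(unique),
    }


# Toy 3-bit list in which two words are reused for more than one key.
toy_words = ["a", "b", "c", "d", "e", "f", "a", "b"]
fields = derive_registration_fields(3, toy_words)
print(fields["numberofuniquewords"], fields["bijective"])  # 6 no

# The 42000-word example from the section on the number of Trustwords:
print(round(math.log2(42000), 2))  # 15.36
```

If redundant words were not repeated in the registration, the length check would have to compare against <numberofuniquewords> instead, and the assignment of words to the remaining keys would need to be specified separately.
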