coding: utf-8
title: "IANA Registration of Trustword Lists:
Guide, Template and IANA Considerations"
abbrev: IANA Registration of Trustword Lists
docname: draft-birk-pep-trustwords-05
category: std
stand_alone: yes
pi: [toc, sortrefs, symrefs, comments]
#{::include ../shared/author_tags/volker_birk.mkd}
{::include ../shared/author_tags/bernie_hoeneisen.mkd}
{::include ../shared/author_tags/hernani_marques.mkd}
# RFC7435:
# RFC4880:
# RFC7258:
# RFC7942:
# I-D.marques-pep-email:
# I-D.birk-pep-trustwords:
# I-D.marques-pep-rating:
title: PGP word list
date: 2017-11
title: Seed Phrase
date: 2019-06
title: "Language codes - ISO 639"
{::include ../shared/references/isoc-btn.mkd}
# {::include ../shared/references/implementation-status.mkd}
--- abstract
This document specifies the IANA Registration Guidelines for
Trustwords, describes corresponding registration procedures, and
provides a guideline for creating Trustword list specifications.
Trustwords are common words in a natural language (e.g., English) to
which hexadecimal strings are mapped. Such a mapping makes
verification processes like fingerprint comparisons more practical
and less prone to misunderstandings.
--- middle
# Introduction
In public-key cryptography, comparing the respective public key
fingerprints for each of the communication partners involved is vital
to ensure that there is no Man-in-the-Middle (MITM) attack on the
communication channel. These fingerprints normally consist of a chain
of hexadecimal characters, which are often impractical, cumbersome,
and prone to misunderstandings for end-users.
To mitigate these challenges, several systems offer Trustword
comparison as an alternative to these hexadecimal strings. Trustwords
are common words in a natural language (e.g., English) to which these
hexadecimal strings are mapped. Using Trustwords makes verification
processes like fingerprint comparisons more natural for users.
For example, in pEp's Privacy by Default proposition {{I-D.birk-pep}}
Trustwords are used to facilitate easy contact verification for
end-to-end encryption. Trustword comparison is offered after the peers
have opportunistically exchanged public keys. Examples of Trustword
lists used by current pEp implementations can be found here in CSV
In addition to contact verification, Trustwords are also used for
other purposes, such as Human-Readable 128-bit Keys {{RFC1751}},
One-Time Passwords (OTP) {{RFC1760}} {{RFC2289}}, SSH host-key
verification, VPN server certificate verification, deriving private
keys in blockchain applications for cryptocurrencies, and to import or
synchronize secret keys across multiple devices owned by a single user
{{I-D.hoeneisen-pep-keysync}}. Further ideas include the use of
Trustwords for private key recovery in case of loss, contact
verification in Extensible Messaging and Presence Protocol (XMPP)
{{RFC6120}}, or for X.509 certificate verification in browsers
{::include ../shared/text-blocks/key-words-rfc2119.mkd}
{::include ../shared/text-blocks/terms-intro.mkd}
{::include ../shared/text-blocks/handshake.mkd}
<!-- {::include ../shared/text-blocks/trustwords.mkd} -->
<!-- {::include ../shared/text-blocks/tofu.mkd} -->
{::include ../shared/text-blocks/mitm.mkd}
# The Concept of Trustword Mapping
## Example
As already discussed, fingerprints normally consist of a string
of hexadecimal characters. A typical fingerprint looks like this:
> F482 E952 2F48 618B 01BC 31DC 5428 D7FA ACDC 3F13
Instead of the hexadecimal string, Trustwords allow users to
compare ten common words of a language of their choosing. For example,
the above fingerprint, mapped to English Trustwords, might appear as:
> dog house brother town fat bath school banana kite task
The same fingerprint might appear in German Trustwords as:
> klima gelb lappen weg trinken alles kaputt rasen rucksack durch
Note: These examples are for illustration purposes only, and are not
derived from any published Trustword list.
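As a rough sketch of the mapping described above, the following Python
snippet splits a fingerprint into 16-bit blocks and looks each block up
in a word list. The function names and the placeholder word list
(`word0`, `word1`, ...) are assumptions made for illustration only; they
are not taken from any published Trustword list or pEp implementation.

```python
from typing import List, Sequence


def fingerprint_to_indices(fingerprint: str, bitsize: int = 16) -> List[int]:
    """Split a hexadecimal fingerprint into bitsize-wide integer blocks."""
    hex_digits = "".join(fingerprint.split())  # drop grouping whitespace
    total_bits = len(hex_digits) * 4
    if total_bits % bitsize != 0:
        raise ValueError("fingerprint length is not a multiple of bitsize")
    value = int(hex_digits, 16)
    count = total_bits // bitsize
    mask = (1 << bitsize) - 1
    # Walk from the most significant block to the least significant one.
    return [(value >> (bitsize * (count - 1 - i))) & mask for i in range(count)]


def to_trustwords(fingerprint: str, wordlist: Sequence[str],
                  bitsize: int = 16) -> List[str]:
    """Map each block of the fingerprint to the word stored at that index."""
    if len(wordlist) != 2 ** bitsize:
        raise ValueError("word list must contain 2 ** bitsize entries")
    return [wordlist[i] for i in fingerprint_to_indices(fingerprint, bitsize)]


# Hypothetical placeholder list; a real list would hold natural-language words.
PLACEHOLDER_WORDLIST = [f"word{i}" for i in range(2 ** 16)]

FINGERPRINT = "F482 E952 2F48 618B 01BC 31DC 5428 D7FA ACDC 3F13"
trustwords = to_trustwords(FINGERPRINT, PLACEHOLDER_WORDLIST)
print(" ".join(trustwords))
```

With a 160-bit fingerprint and 16-bit blocks, the sketch yields ten
words, matching the ten-word examples above.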
## Previous work
The basic concept of Trustword mapping - also known as a biometric
word list - for fingerprint comparison is well-documented. Examples of
this concept are used with One-Time Passwords (OTP) {{RFC1751}}
{{RFC1760}} {{RFC2289}}, as well as the PGP Word List ("Pretty Good
Privacy word list") {{PGP.wl}}. Furthermore, cryptocurrencies use a
similar concept for deriving private keys {{bitcoin.wl}}.
\[\[ TODO: Explain each previous usage a bit further and synchronize
with section {{introduction}}. \]\]
Regarding today's needs, previous proposals have the following
* Small/limited word lists, which generally result in more words to
* Existing word lists are usually only available in English, which
limits their usefulness for non-English speakers
Furthermore, there are differences in the basic concept:
* The Trustword concept suggested herein intends to improve usability
and security for all users, instead of only the technically-savvy.
* In many use cases, Trustwords are only read (aloud) during the
comparison process, rather than being written or typed. For
example, two users might compare their respective Trustwords during
a phone call. Verbal comparison reduces the need to keep the actual
Trustwords short. The use of longer Trustwords increases the
entropy within the system, as it allows for a larger dictionary, and
thus reduces the likelihood of phonetic collisions.
## Number of Trustwords for a language
If the number of Trustwords in a dictionary is low, only short parts
of the original string (e.g., fingerprint) can be mapped to a single
Trustword. Thus, many Trustwords need to be compared, which results in
a potentially cumbersome process for users and leads to reduced
usability.
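The trade-off can be made concrete with a small calculation. The sketch
below assumes a 160-bit fingerprint, as in the example above, and a
hypothetical helper `words_needed`; neither the helper nor the chosen
bit sizes are prescribed by this document.

```python
import math


def words_needed(fingerprint_bits: int, bitsize: int) -> int:
    """Number of Trustwords needed when each word encodes `bitsize` bits."""
    return math.ceil(fingerprint_bits / bitsize)


# Compare a few illustrative dictionary sizes (2 ** bitsize entries each).
for bitsize in (8, 11, 16):
    print(f"{2 ** bitsize:>5} entries in list -> "
          f"{words_needed(160, bitsize)} Trustwords per 160-bit fingerprint")
```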
To reduce the number of Trustwords that need to be compared, pEp's
Privacy by Default proposition {{I-D.birk-pep}} calls for 16-bit
scalars to be mapped to natural language words. Therefore, the size
(by number of key-value pairs) of any key-value pair structure
is 65536. However, the number of unique values to be used in a
language may be smaller than this number. This discrepancy can be
addressed by using the same value, or Trustword, for more than one
key. In such cases, the entropy of the representation is slightly
reduced. For example, a Trustword list of 42000 words still allows
for an entropy of log_2(42000), which is roughly 15.36 bits in 16-bit
mappings. As a consequence, such Trustword lists are not bijective.
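The figure above can be checked with a few lines of Python. The sketch
also estimates the entropy when the 65536 keys are spread over the
available words as evenly as possible; that distribution rule and the
function names are assumptions for illustration, as this document does
not prescribe how keys are assigned to repeated words.

```python
import math


def max_entropy_bits(unique_words: int) -> float:
    """Upper bound: every unique word is equally likely."""
    return math.log2(unique_words)


def folded_entropy_bits(bitsize: int, unique_words: int) -> float:
    """Shannon entropy when 2 ** bitsize equally likely keys are spread
    over `unique_words` words as evenly as possible (assumed rule)."""
    keys = 2 ** bitsize
    base, extra = divmod(keys, unique_words)
    entropy = 0.0
    # `extra` words receive base + 1 keys; the remaining words receive base.
    for keys_per_word, words in ((base + 1, extra), (base, unique_words - extra)):
        if keys_per_word > 0 and words > 0:
            p = keys_per_word / keys
            entropy -= words * p * math.log2(p)
    return entropy


print(f"upper bound:   {max_entropy_bits(42000):.2f} bits per word")
print(f"evenly folded: {folded_entropy_bits(16, 42000):.2f} bits per word")
```

Under that assumption, log_2(42000) acts as an upper bound, and the
evenly folded mapping comes out slightly below it, because words that
are assigned two keys are twice as likely to appear as the others.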
On the other hand, small Trustword lists can be restricted to short
words (the number of short words in a natural language is normally
limited), which are easier to use in implementations where Trustwords
have to be typed or written, such as in OTP applications.
Note: This specification allows for registration of variable numbers
of Trustwords per dictionary.
## Language
Although English is used around the world, the vast majority of the
global population is not English-speaking. For an application to be
useful to as wide a user base as possible, localization is
essential. Therefore, this specification allows for registration of
Trustword lists in different languages.
In applications where two humans are attempting to establish
secure communications, it is likely that they share a common language.
At this time, no real-world use cases for Trustword list translation
capability have been identified. Because the translation process
inherently - and drastically - increases complexity from an IANA
registration standpoint, the topic of Trustword translation is beyond
the scope of this document.
## The nature of the words
Every Trustword list SHOULD be free of offensive language (i.e.,
swear/curse words, slurs, derogatory language, etc.). The review for
such language SHOULD be performed by native speakers of each
respective language.
# Security Considerations
There are no specific security considerations.
# Privacy Considerations
\[\[ TODO \]\]
# IANA Considerations
Each natural language requires a different set of Trustwords. To allow
implementers to use identical Trustword lists, an IANA registry is to
be established. The IANA registration policy according to {{RFC8126}}
is "Expert Review" and "Specification Required".
\[\[ Note: Further details of the IANA registry and requirements for
the expert to assess the specification are for further study. An
approach similar to the one used in {{RFC6117}} is likely to be
followed. \]\]
## Registration Template (XML chunk)
<!-- ISO 639-3 (e.g. eng, deu, ...) -->
<!-- How many bits can be mapped with this list
(e.g. 8, 16, ...) -->
<!-- number of unique words registered
(e.g. 256, 65536, ...) -->
<!-- whether or not the list allows for a two-way-mapping
(e.g. yes, no) -->
<!-- version number within language
(e.g. b.1.2, n.0.1, ...) -->
<!-- Change accordingly -->
<xref type="rfc" data="rfc2551"/>
<!-- Change accordingly -->
<xref type="person" data="John_Doe"/>
<xref type="person" data="Jane_Dale"/>
<!-- Text with additional information about
the Wordlist to be registered -->
<!-- There can be artwork sections, too -->
<!-- Change accordingly -->
<person id="John_Doe">
<name> <!-- Firstname Lastname --> </name>
<org> <!-- Organization Name --> </org>
<uri> <!-- mailto: or http: URI --> </uri>
<updated> <!-- date format YYYY-MM-DD --> </updated>
<!-- repeat person section for each person -->
Authors of a Wordlist are encouraged to use these
XML chunks as a template to create the IANA Registration Template.
## IANA Registration
An IANA registration will contain the following elements:
### Language Code (\<languagecode\>)
The language code follows the ISO 639-3 specification {{ISO639}},
e.g., eng, deu.
\[\[ Note: Which of the ISO 639 specifications is most suitable to
address the Trustwords' challenge is for further study. \]\]
Example usage for German:
e.g. <languagecode>deu</languagecode>
### Bit Size (\<bitsize\>)
The bit size is the number of bits that can be mapped with the
Wordlist. The number of registered words in a word list MUST be
2 ^ `(<bitsize>)`.
Example usage for 16-bit Wordlist:
e.g. <bitsize>16</bitsize>
### Number Of Unique Words (\<numberofuniquewords\>)
The number of unique words that are registered.
e.g. <numberofuniquewords>65536</numberofuniquewords>
### Bijectivity (\<bijective\>)
Whether the registered Wordlist has a one-to-one mapping, meaning the
number of unique words registered equals 2 ^ `(<bitsize>)`.
Valid content: ( yes \| no )
e.g. <bijective>yes</bijective>
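The relationship between the three fields above can be sketched as
follows. The helper `describe_wordlist` and its return format are
hypothetical; the sketch assumes the registered list is given as a plain
sequence that already contains one entry per key, including any repeated
words.

```python
from typing import Dict, Sequence, Union


def describe_wordlist(words: Sequence[str]) -> Dict[str, Union[int, str]]:
    """Derive <bitsize>, <numberofuniquewords> and <bijective> from a list."""
    count = len(words)
    # A positive integer is a power of two iff exactly one bit is set.
    if count < 1 or (count & (count - 1)) != 0:
        raise ValueError("number of registered words must be a power of 2")
    bitsize = count.bit_length() - 1
    unique = len(set(words))
    return {
        "bitsize": bitsize,
        "numberofuniquewords": unique,
        "bijective": "yes" if unique == count else "no",
    }


# Toy 2-bit lists, for illustration only.
print(describe_wordlist(["dog", "house", "town", "bath"]))
print(describe_wordlist(["dog", "house", "dog", "bath"]))
```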
### Version (\<version\>)
The version of the Wordlist MUST be unique within a language code.
\[\[ Note: Requirements for a "smart" composition of the version number
are for further study. \]\]
e.g. <version>b.1.2</version>
### Registration Document(s) (\<registrationdocs\>)
Reference(s) to the Document(s) containing the Wordlist.
e.g. <registrationdocs>
<xref type="rfc" data="rfc4979"/>
e.g. <registrationdocs>
<xref type="rfc" data="rfc8888"/> (obsoleted by RFC 9999)
<xref type="rfc" data="rfc9999"/>
e.g. <registrationdocs>
[International Telecommunications Union,
"Wordlist for Foobar application",
ITU-F Recommendation B.193, Release 73, Mar 2009.]
### Requesters (\<requesters\>)
The persons requesting the registration of the Wordlist. Usually
these are the authors of the Wordlist.
e.g. <requesters>
<xref type="person" data="John_Doe"/>
<person id="John_Doe">
<name>John Doe</name>
<org>Example Inc.</org>
Note: If there is more than one requester, there must be one \<xref\>
element per requester in the \<requesters\> element, and one
\<person\> chunk per requester in the \<people\> element.
### Further Information (\<additionalinfo\>)
Any other information the authors deem interesting.
e.g. <additionalinfo>
<paragraph>more info goes here</paragraph>
Note: If there is no such additional information, then the
\<additionalinfo\> element is omitted.
### Wordlist (\<wordlist\>)
The full Wordlist to be registered. The number of words MUST be a
power of 2 as specified above. The element names serve as keys used
for the enumeration of the Trustwords (starting at 0), and the
elements contain the values, which are individual natural language
words in the respective language.
e.g. <wordlist>
] ]>
\[\[ Note: The exact representation of the Wordlist is for further study. \]\]
# Acknowledgments
The authors would like to thank the following people who have provided
feedback or significant contributions to the development of this
document: Andrew Sullivan, Claudio Luck, Daniel Kahn Gilmore, Kelly
Bristol, Michael Richardson, Rich Salz, Volker Birk, and Yoav Nir.
This work was initially created by pEp Foundation, and then reviewed
and extended with funding by the Internet Society's Beyond the Net
Programme on standardizing pEp {{ISOC.bnet}}.
--- back
# IANA XML Template Example
This section contains a non-normative example of the IANA Registration
Template XML chunk.
<xref type="rfc" data="rfc2551"/>
<xref type="person" data="Julius_Caesar"/>
This Wordlist has been optimized for
the Roman Standards Process.
<person id="Julius_Caesar">
<name>Julius Caesar</name>
<org>Curia Romana</org>
# Document Changelog
\[\[ RFC Editor: This section is to be removed before publication \]\]
* draft-birk-pep-trustwords-04:
* Add Privacy Considerations section
* Swapped Security and IANA Consideration Sections
* Corrected typo in ISO references
* Updated Introduction, Terms and concept Sections
* draft-birk-pep-trustwords-03:
* Update references
* Minor edits
* draft-birk-pep-trustwords-02:
* Minor editorial changes and bug fixes
* Added more items to Open Issues
* Add usage example
* draft-birk-pep-trustwords-01:
* Included feedback from mailing list and IETF-101 SECDISPATCH WG,
* Added more explanatory text / less focused on the main use case
* Bit size as parameter
* Explicitly stated translations are out-of-scope for this document
* Added draft IANA XML Registration template,
considerations, explanation and examples
* Added Changelog to Appendix
* Added Open Issue section to Appendix
# Open Issues
\[\[ RFC Editor: This section should be empty and is to be removed
before publication. \]\]
* Better explain previous work on Trustwords
* More explanatory text for Trustword use cases, properties and
* Further details of the IANA registry and requirements for the expert
to assess the specification
* Decide which ISO 639 language code standard to use: ISO 639-1 (e.g.,
ca, de, en, ...), as currently used in pEp implementations (running
code), or ISO 639-3 (e.g., eng, deu, ita, ...)
* Adjust exact representation of wordlists
* e.g. XML, CSV, ...
* Syntax for non-ASCII letters or language symbols (UTF-8) in
* Need for optional entropy value assigned to words, to account for
similar phonetics among words in the same wordlist?
* Need for an additional field, to define what a wordlist is optimized
for, e.g., "entropy", "minimize word lengths", ...?
* Work out (requirements for) "smart" composition of the version
* Decide whether in non-bijective Wordlists the redundant words need
to be repeated in the IANA Registration
* Register only a hash over the wordlist with IANA?
* Does it make sense to open registrations for other patterns than
just words, e.g., images?
<!-- LocalWords: utf docname toc sortrefs symrefs hoeneisen wl ACDC -->
<!-- LocalWords: oldid blockchain cryptocurrencies klima gelb weg -->
<!-- LocalWords: lappen trinken alles kaputt rasen durch eng deu WG -->
<!-- LocalWords: languagecode bitsize numberofuniquewords wordlist -->
<!-- LocalWords: registrationdocs requesters additionalinfo uri ITU -->
<!-- LocalWords: Firstname Lastname mailto http YYYY Bijectivity de -->
<!-- LocalWords: Kahn Salz Yoav Nir ISOC bnet errare humanum Romana -->
<!-- LocalWords: Changelog SECDISPATCH ita wordlists -->