|Network Working Group||C. Jennings|
|INTERNET DRAFT||J. Rosenberg|
|Category: Standards Track||E. Rescorla|
|Expires: January 2008||Network Resonance|
Address Settlement by Peer to Peer
By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress”.
The list of current Internet-Drafts can be accessed at <http://www.ietf.org/ietf/1id-abstracts.txt>.
The list of Internet-Draft Shadow Directories can be accessed at <http://www.ietf.org/shadow.html>.
This Internet-Draft will expire in January 2008.
Copyright © The Internet Society (2007). All Rights Reserved.
This document defines Address Settlement by Peer-to-Peer (ASP), a peer-to-peer (P2P) binary signaling protocol for usage on the Internet. A P2P signaling protocol provides its clients with an abstract hash table service between a set of cooperating peers that form the P2P network. ASP is designed to support a P2P Session Initiation Protocol (SIP) network, but it can be utilized by other applications with similar requirements. ASP introduces the notion of usages, which are a collection of data types that are required for a particular application. For SIP, these types include location, STUN and TURN servers. ASP defines a security model based on a certificate enrollment service that provides peers with unique identities. ASP also provides protocol extensibility and defines a migration methodology, allowing for major upgrades of the P2P network without service disruption.
With thy sharp teeth this knot intrinsicate Of life at once untie: poor venomous fool Be angry, and dispatch. -Cleopatra, Act V, scene II, Antony and Cleopatra by William Shakespeare
This document defines Address Settlement by Peer-to-Peer (ASP), a peer-to-peer (P2P) signaling protocol for usage on the Internet. A P2P signaling protocol provides its clients with an abstract hash table service. Clients can both read and write entries into the hash table. The hash table is actually distributed: pieces of the table are stored by the various clients that access it. Such an abstract hash table service, in which the contents of the hash table are stored across many hosts, is called a Distributed Hash Table (DHT).
ASP is a lightweight, binary protocol. It provides several functions that are critical for a successful P2P protocol for the Internet. These are:
These properties were designed specifically to meet the requirements for a P2P protocol to support SIP. However, ASP is not limited to usage by SIP and could serve as a tool for supporting other P2P applications with similar needs. ASP is also based on the concepts introduced in [I-D.willis-p2psip-concepts].
Architecturally this specification splits into several layers, as shown in the following figure.
+-------+ +-----+ +-------+ Usage | SIP | |STUN | | Other | ... Layer | Usage | |Usage| | Usage | +-------+ +-----+ +-------+ --------------------------------------Distributed Storage API +---------------------+ Distributed Routing & | +-----+ +------+ | Storage Replication | |Chord| ...|Bamboo| | Topology Layer Logic | | | | | | Plugins | +-----+ +------+ | +---------------------+ ------------------------------------- Forwarding Forwarding & Encoding Logic Layer NAT & FW Connection Logic -------------------------------------Common Packet Encoding Transport +-------+ +------+ Layer |TLS | |DTLS | +-------+ +------+
The top layer, called the Usage Layer, has application usages, such as SIP Location Usage, that use an abstract distributed storage API to store and retrieve data from the DHT. The goal of this layer is to implement application-specific usages of the Distributed Storage Layer below it. The Usage defines how a specific application maps its data into something that can be stored in the DHT, where to store the data, how to secure the data, and finally how applications can retrieve and use the data.
The next layer is a Distributed Storage Layer. It can store and retrieve information, perform maintenance of the DHT as peers join and leave the DHT, and route messages. This layer is tightly bound to the specific DHT algorithm being used, as this algorithm determines how both routing and redundant storage are done in the DHT. The goal of this layer is to provide a fairly generic distributed and redundant storage service.
The next layer down is the Forwarding Layer. This layer is responsible for getting a packet to the next peer in the DHT. It uses the routing layers above it to determine what the next hop is; this layer deals with actually forwarding the packet to the next hop. Forwarding can include setting up connections to other peers through NATs and firewalls using ICE; it can take advantage of relays for NAT and firewall traversal. This layer passes packets in a common packet encoding, regardless of what DHT algorithm is being used in the Transport Layer below it. The goal of the Forwarding Layer is to forward packets to other peers.
Finally, in the bottom layer, packets are sent using a Transport Layer which uses TLS and DTLS.
Each logical address in the DHT where data can be stored is referred to as a locus. A given peer will be responsible for storing data from many loci. Typically literature on DHTs uses the term "key" to refer to a location in the DHT; however, in this specification the term key is used to refer to public or private keys used for cryptographic operations and the term locus is used to refer to a storage location in the DHT.
Each DHT will have a somewhat different structure, but many of the concepts are common. The DHT defines a large space of loci, which can be thought of as addresses. In many DHTs, the loci are simply 128- or 160-bit integers. Each DHT also has a distance metric such that we can say that locus A is closer to locus B than to locus C. When the loci are n-bit integers, they are often considered to be arranged in a ring so that (2^n)-1 and (0) are consecutive and distance is simply distance around the ring.
Each peer in the DHT is assigned a locus and is "responsible" for the nearby space of loci. So, for instance, if we have a peer P, then it would also be responsible for storing data associated with locus P+epsilon as long as no other peer P was closer. The DHT locus space is divided so that some peer is responsible for each locus.
The way routing works in a DHT is specified by the specific DHT algorithm but the basic concepts are common to most systems. Each peer maintains connections to some other set of peers N. There need not be anything special about the peers in N, except that the peer has a direct connection to them: it can reach them without going through any other peer. When it wishes to deliver a message to some peer P, it selects some member of N, N_i that is closer to P than itself (as a degenerate case, P may be in N). It then sends the message to N_i. N_i repeats this procedure until the message eventually gets to P.
In most DHTs, the peers in N are selected in a particular way. One common strategy is to have them arranged exponentially further away from yourself so that any message can be routed in a O(log(N)) steps. The details of the routing structure depend on the DHT algorithm, however, since it defines the distance metric and the structure of the direct connection table.
In ASP, messages may either be COMMANDS or RESPONSES to COMMANDS. Messages are routed as described above. In principle, responses could be routed the same way, but this makes diagnosis of errors difficult. Instead, as commands travel through the network they accumulate a history of the peers they passed through and responses are routed in the opposite direction so that they follow the same path in reverse.
The Data Storage Layer provides operations to STORE, FETCH, and REMOVE data from the DHT. Each location in the DHT is referenced by a single integer locus. However, each location may contain data elements of multiple types. Furthermore, there may be multiple values of each type, as shown below.
+--------------------------------+ | Locus | | | | +------------+ +------------+ | | | Type 1 | | Type 2 | | | | | | | | | | +--------+ | | +--------+ | | | | | Value | | | | Value | | | | | +--------+ | | +--------+ | | | | | | | | | | +--------+ | | +--------+ | | | | | Value | | | | Value | | | | | +--------+ | | +--------+ | | | | | +------------+ | | | +--------+ | | | | | Value | | | | | +--------+ | | | +------------+ | +--------------------------------+
Each type-id is a code point assigned to a specific application usage by IANA. As part of the Usage definition, protocol designers may define constraints, such as limits on size, on the values which may be stored. For many types, the set may be restricted to a single item; some sets may be allowed to contain multiple identical items while others may only have unique items. Some typical types of sets that a usage definition would use include:
When a new peer wishes to join the DHT, it must have a peer-id that it is allowed to use. It uses one of the peer-ids in the certificate it received from the enrollment server. The main steps in joining the DHT are:
First, the peer ("JP," for Joining Peer) uses the bootstrap procedures to find some (any) peer in the DHT. It then typically contacts the peer which would have formerly been responsible for the peer's locus (since that is where in the DHT the peer will be joining), the Responsible Peer (RP). It copies the other peer's state, including the data values it is now responsible for and the identities of the peers with which the other peer has direct connections.
The details of this operation depend mostly on the DHT involved, but a typical case would be:
After this process is completed, JP is a full member of the DHT and can process STORE/FETCH commands.
As described in Section 2.1.3, a peer maintains a set of direct connections to other peers in the DHT. Consider the case of a peer JP just joining the DHT. It communicates with the responsible peer RP and gets the list of the peers in RP's routing table. Naively, it could simply connect to the IP address listed for each peer, but this works poorly if some of those peers are behind a NAT or firewall. Instead, we use the CONNECT command to establish a connection.
Say that peer A wishes to form a direct connection to peer B. It gathers ICE candidates and packages them up in a CONNECT command which it sends to B through usual DHT routing procedures. B does its own candidate gathering and sends back an OK response with its candidates. A and B then do ICE connectivity checks on the candidate pairs. The result is a connection between A and B. At this point, A and B can add each other to their routing tables and send messages directly between themselves without going through other DHT peers.
TODO - More is needed here but the short version is that the replication approach is defined by the specific DHT algorithm not the Usage. The reason is that when a peer comes or goes, specific knowledge of the DHT topology is required to understand where the replication set is stored for the data. Also need to explain how data is merged after a network partition event.
The forwarding layer is responsible for looking at message and doing one of three things:
In a general messaging system, messages need a source and a destination. In an overlay network it is often useful to specify the source or destination as the path through the overlay. In addition, responses to commands need to retrace the command's path. To support this, each message has a source label stack and a destination label stack. Each label is 32 bits long, and the labels 0 to 254 are reserved for special use. 0 is an invalid label and 1 indicates that the next 4 labels are to be interpreted as a peer-id.
When a peer receives a message from the Transport Layer, it pushes a label on the source stack that indicates which TLS or DTLS flow the message arrived on. When a peer goes to transmit a message to the Transport Layer, it looks at the top label on the destination stack. If the top label is not one of the special use labels, it pops that label off the destination stack and sends the message over the TLS or DTLS flow that corresponds to that label. If the label is 1, then the next 4 labels are looked at and interpreted as a peer id. Note that these can be in the 0 to 254 range and still be interpreted as a peer-id. The routing logic in the Distributed Storage Layers is consulted to find out where to route this message. If this peer is responsible for the peer-id, then the 5 labels for the peer-id are popped off and the message is passed up to the Distributed Storage Layer for processing. Otherwise the labels are not popped off and the message is forwarded over the TLS or DTLS flow indicated in the routing logic.
When a peer goes to send a response to a command, it can simply copy the source label stack from the command into the destination label stack of the response and then start forwarding the response.
Peers that are willing to maintain state may do label compression. They do this by taking some number of labels off the top of the source label stack and replacing them with a single label that uniquely represents all the labels removed. Later, if the peer sees the compressed label in a destination label set, it removes it and replaces it with all the labels it originally popped off the s source label stack. Doing this requires a peer to save state but it allows certain peers to provide services in which they reduce the size of messages going across bandwidth-constrained links. It can also help protect the privacy of the per-compression peer topology. (TODO need more on length of validity of compressed labels)
The label stack approach provides several features. First it allows a response to follow the same path as the request. This is particularly important for peers that are sending commands while they are joining and before other peers can route to them. It also makes it easier to diagnose and manage the system. Storing a label stack that includes a peer that does label compression provides the type of Local Network Protection described inRFC 4864 [RFC4864] without requiring a NAT.
This layer sends and receives messages over TLS and DTLS. Each TLS or DTLS connection is referred to as a flow. For TLS it does the framing of messages into the stream. For DTLS it takes care of fragmentation issues. The reason for including TLS is the improved performance it can offer for bulk transport of data. The reason for including DTLS is that the percentage of the time that two devices behind NATs can form a direct connection without a relay is much higher for DTLS than for TLS. The way DTLS and TLS certificates are used does not require a global PKI, and therefore no option that uses only TCP or UDP without any security is included.
Before a new user can join the DHT for the first time, they must enroll in the P2P Network for the DHT they want to join. Enrollment will typically be done by contacting a centralized enrollment server. Other approaches are possible but are outside the scope of this specification. The user establishes his identity to the server's satisfaction and provides the server with its public key. The centralized server then returns a certificate binding the user's user name to their public key. The properties of the certificate are discussed more in Section 2.5. The amount of authentication performed here can vary radically depending on the DHT network being joined. Some networks may do no verification at all and some may require extensive identity verification. The only invariant that the enrollment server needs to ensure is that no two users may have the same identity.
During the enrollment process, the central server also provides the peer/user with the root certificate for the DHT, information about the DHT algorithm that is being used, a P2P-Network-Id that uniquely identifies this ring, the list of bootstrap peers, and any other parameters it may need to connect to the DHT. The DHT also informs the peers what Usages it is required to support to be a peer on this P2P Network. Once the peer has enrolled, it may join the DHT.
The underlying security model revolves around the enrollment process allocating a unique name to the user and issuing a certificate [REF: RFC3280] for a public/private key pair for the user. All peers in a particular DHT can verify these certificates. A given peer acts on behalf of a user, and that user is somewhat responsible for its operation.
The certificate serves two purposes:
When a user enrolls, or enrolls a new device, the user is given a certificate. This certificate contains information that identifies the user and the device they are using. If a user has more than one device, typically they would get one certificate for each device. This allows each device to act as a separate peer.
The contents of the certificate include:
Note that if peer-IDs are chosen randomly, they will be randomly distributed with respect to the user name. This has the result that any given peer is highly unlikely to be responsible for storing data corresponding to its own user, which promotes high availability.
When a peer uses a STORE command to place data at a particular location X, it must sign with the private key that corresponds to a certificate that is suitable for storing at location X. Each data type in a usage defines the exact rules for determining what certificate is appropriate. However, the most natural rule is that a certificate with user name U allows the user to store data at locus H(U) where H is a cryptographic hash function characteristic of the DHT. The idea here is that someone wishing to look up identity U goes to locus H(U), which is where the user is permitted to store their data.
The digital signature over the data serves two purposes. First, it allows the peer responsible for storing the data to verify that this STORE is authorized. Second, it provides integrity for the data. The signature is saved along with the data value (or values) so that any reader can verify the integrity of the data. Of course, the responsible peer can "lose" the value but it cannot undetectably modify it.
The second purpose of a certificate is to allow the device to act as a peer with the specified peer-ID. When a peer wishes to connect to peer X, it forms a TLS/DTLS connection to the peer and then performs TLS mutual authentication and verifies that the presented certificate contains peer-ID X.
Note that because the formation of a connection between two nodes generally requires traversing other nodes in the DHT, as specified in Section 2.1.6, those nodes can interfere with connection initiation. However, if they attempt to impersonate the target peer they will be unable to complete the TLS mutual authentication: therefore such attacks can be detected.
At some point before the certificate expires, the user will need to get a new certificate from the enrollment server.
At some point in time, a given P2P Network may want to migrate from one underlying DHT algorithm to another or update to a later extension of the protocol. This can also be used for crypto agility issues. The migration approach is done by basically having peers initializing algorithm A. When the clients go to periodically renew their credentials, they find out that the P2P Network now requires them to use algorithm A but also to store all the data with algorithm B. At this point there are effectively two DHT rings in use, rings A and B. All data is written to both but queries only go to A. At some point when the clients periodically renew their credentials, they learn that the P2P Network has moved to storing to both A and B but that FETCH commands are done with P2P Network B and that any SEND should first be attempted on P2P Network B and if that fails, retried on P2P Network A. In the final stage when clients renew credentials, they find out that P2P Network A is no longer required and only P2P Network B is in use. Some types of usages and environments may be able to migrate very quickly and do all of these steps in under a week, depending on how quickly software that supports both A and B is deployed and how often credentials are renewed. On the other hand, some very ad-hoc environments involving software from many different providers may take years to migrate.
By itself, the distributed storage layer just provides infrastructure on which applications are built. In order to do anything useful, a usage must be defined. Each Usage needs to specify several things:
TODO - Give advice on things that make bad usages - for example, things that involve unlimited storage such as storing voice mail.
From the perspective of P2PSIP, the most important usage is the SIP Usage. The basic function of the SIP usage is to allow Alice to start with a SIP URI (e.g., "email@example.com") and end up with a connection which Bob's SIP UA can use to pass SIP messages back and forth to Alice's SIP UA.
This operation can take a number of forms, but in the simplest case, Bob's SIP UA has peer-ID "B". When Bob joins the DHT (i.e., turns on his phone), he stores the following mapping in the DHT:
When Alice wants to call Bob, she starts with his URI and her UA uses the DHT to look up his peer-ID B. She then routes a message through the DHT to B requesting a direct connection. Once this connection is established she can send SIP messages over it, which allows her to set up the phone call.
This is done using three key operations that are provided by the SIP Usage. They are:
A peer acting as a SIP UA stores their registration information in the DHT by storing a label stack that routes to them at a locus in the DHT formed from the user's SIP AOR". When another peer wishes to find a peer that is registered for a SIP URI, the lookup of the user's name is done by taking the user's SIP Address or Record (AOR) and using it as the seed that is hashed to get a locus. A lookup for a data type of sip-location is done to this locus to find a set of values. Each value is a data structure contains a label stack that is used to reach a peer that represents a SIP UA registered for that AOR. The data structure also contains a string that would be a valid SIP header field value for the contact header in a 3xx response from a redirect server. This string can contain the caller-pref (TODO add reference) information for that SIP UA.
The seed for this usage is a user's SIP AOR, such as "sip:firstname.lastname@example.org", and the locus is formed by taking the top 128 bits of the SHA-1 hash of the seed. The set is a dictionary style set and is indexed by the peer-id of the certificate used to sign the STORE command. This allows the set to store many values but only one for each peer. The authorization policy is that STORE commands are only allowed if the user name in the signing certificate, when turned into a SIP URL and hashed, matches the locus. This policy ensures that only a user with the certificate with the user name "email@example.com" can write to the locus that will be used to look up calls to "sip:firstname.lastname@example.org".
Open Issue: Should the seed be "sip:email@example.com", "firstname.lastname@example.org", or a string that includes the code point defined for the type? The issue here is determining whether different usages that store data at a seed that is primarily formed from "email@example.com" should hash to the same locus as the SIP Usage. For example, if a buddy list had a seed that was roughly the same, would we want the buddy list information to end up on the same peers that stored the SIP location data or on different peers?
GRUUs that refer to peers in the P2P network are constructed by simply forming a GRUU, where the value of gr URI parameter contains a base64 encoded version of the label stack that will reach the peer. The base64 encoding is done with the alphabet specified in table 1 of RFC 4648 with the exception that ~ is used in place of =. An example GRUU is "sip:firstname.lastname@example.org;gr=MDEyMzQ1Njc4OTAxMjM0NTY3ODk~". When a peer needs to route a message to a GRUU in the same P2P network, it simply decodes the label stack and connects to that peer.
Anonymous GRUUs are done in roughly the same way but require either that the enrollment server issue a different peer-id for each anonymous GRUU required or that a label stack be used that includes a peer that compresses the label stack to stop the peer-id from being revealed.
This usage allows two clients to form a new TLS or DTLS connection between them and then use this connection for sending SIP messages to one another. This does not store any information in the DHT, but it allows the CONNECT command to be used to set up a TLS or DTLS connection between two peers and then use that connection to send SIP messages back and forth.
The CONNECT command will ensure that the connection is formed to a peer that has a certificate which includes the user that the connection is being formed to.
This usage allows each user to store their certificate in the DHT so that it can be retrieved to be checked by various peers and applications. Peers acting on behalf of a particular user store that user's certificate in the DHT, and any peer that needs the certificate can do a FETCH to retrieve the certificate. Typically it is retrieved to check a signature on a command or the signature on a chunk of data that the DHT has received.
This usage defines one new type, called "certificate." Each locus stores only a single value which is the X.509 certificate encoded using DER. The seed used to generate the locus is simply the serial number of the certificate. When a peer receives a command to STORE a particular certificate, it needs to be signed with the certificate with that serial number. This ensures that an attacker cannot overwrite the certificate of some other user.
Each user can store their current and previous certificate. This allows for transition from an old certificate to a new one. The certificate is stored as an X.509 certificate encoded with DER.
A peer should ensure that the user's certificates are stored in the DHT when joining and redo the check about every 24 hours after that. Certificate data should be stored with an expiry time of 60 days. When a client is checking the existence of data, if the expiry is less than 30 days, it should be refreshed to have an expiry of 60 days. The certificate information is frequently used for many operations, and peers should cache it for 8 hours.
This usage defines two new types, one for STUN servers and one for STUN-Relay servers.
Peers that provide the STUN server type need to support both UDP and TCP hole punching as defined in XXX, while peers that provide the STUN-Relay server type need to support the TURN extensions to STUN for media relay of both UDP and TCP traffic as defined in XXX.
The data is stored in a data structure with the IP address of the server and an indication whether the address is an IPv4 or IPv6 address. The seed used to form the storage locus is simply the peer-id. The access control rule is that the certificate used to sign the request must contain a peer-id that when hashed would match the locus where the data is being stored.
Peers can find other servers by selecting a random locus and then doing a FIND command for the appropriate server type with that locus. The FIND command gets routed to a random peer based on the locus. If that peer knows of any servers, they will be returned. The returned response may be empty if the peer does not know of any servers, in which case the process gets repeated with some other random locus. As long as the ratio of servers relative to peers is not too low, this approach will result in finding a server relatively quickly.
Any peer that is not running in one of the RFC 1597 private address spaces MUST provide a STUN server. Open issues - what about requiring STUN-Relay servers? Should there be low and high bandwidth version of STUN-Relay one can find? Low would be usable for signaling type things and high would be usable for audio and more.
This will likely be left out of scope of the initial system but just to give people a flavor of how these issues might be dealt with....
Buddy lists with reciprocal subscribes - when see indication buddy might be online, such as SUBSCRIBE from buddy, retry SUBSCRIBE to buddy. Subscriber ends up doing composition.
Single users with different devices can synchronize buddy lists when both are online
Can register a voicemail URI that fetches a greeting from a web server, plays this, and records a message, and then email the result to specified location. Could define a server usage for this similar to STUN/TURN server usage - may not have enough of them to effectively find with random probing and FIND command.
Store a mailto contact in the SIP Location and have it mean you can record a G.711 wav file for this user and email it to them.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This section provides the normative description of what peers need to do when sending and receiving the actual protocol commands. The basic message consists of a Forwarding Block that determines the destination of the message, followed by one or more Command Blocks or Response Blocks. The support for multiples of the Command or Response Blocks is just to pipeline several Commands or Responses together. Each Command Block specifies an operation and will receive a response.
The common packet format consists of a forwarding block with a TTL, P2P-Network-Id and version for that network, a stack of source and destination labels, and finally a variable number of command blocks. The top two bits in the first byte indicate the version of the ASP protocol and are set to 0 for this version. When a label is pushed on the stack, it becomes the first label; label #1 is the top of the stack and #N is the bottom.
Open issue: Do we want a magic number at front of block to indicate the protocol.
Forwarding Block +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Ver|Resv(all 0)| Num Src Labels|Num Dst Labels | TTL | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | P2P Network ID | Network Ver | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SRC Label #1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SRC Label ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SRC Label #N | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DST Label #1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DST Label ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DST Label #N | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Command Block +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |R E 0 0 0 0 0 0| Command | Command Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Transaction ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + Command Data - variable length - 32 bit padded + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... Command Block +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |R E 0 0 0 0 0 0| Command | Command Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Transaction ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + Command Data - variable length - 32 bit padded + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Each command block starts with a command header that includes extension bits, the command, and the length of the command data (not including the command header). The transaction-id is a random number and is preserved as the message is forwarded from one hop to the next. In the command block, the R bit, if set, indicates that the peer processing the request must be able to understand this command or else an error response MUST be returned (a peer that simply forwards is not required to look at or understand the command blocks). The E bit indicates that even if this command is not understood, it MUST be echoed in any response.
The last Command Header Block in the message is typically a SIGNATURE command that computes a signature over all the previous command blocks.
Each command typically has some fixed format data at the beginning of it that carries the information that must occur in every command of that type, followed by a series of optional parameters. The first byte of the optional parameters has the same semantics as the first byte of the Command block that indicates whether the receiver needs to understand the parameter or not. The second byte defines the actual parameter type (which are IANA registered). The data length follows this in the third and forth byte.
Parameter Block +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |R E 0 0 0 0 0 0| Parameter | Parameter Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + Parameter Data - variable length - 32 bit padded + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Stores a single copy of data in DHT. Includes a time to live for the data.
Parameters: locus, type, data, expiration time, data signature, signature, [etag]
Note the locus can be different than the destination when used for storing redundant data.
The expiration time is an absolute time to stop replay attacks, as described in the Security section.
Each time data is stored that is not bitwise identical to the previous data, the storing peer updates an entity-tag. If an etag is supplied in the command, then the operation will return an error if the current data does not have an entity-tag that matches the current etag.
Retrieves copy of data that is bitwise identical to the data in the store command
Parameters: locus, type, [etag]
If the entity tag of the data matches the optional etag in the FETCH, then a special response of SUCCESS-ETAG-MATCH is returned and no data is returned.
Response: data, data signature
Removes data - can only be done by the user that stored the data.
Parameter: locus, signature, [etag]
Returns the first instance of a stored data of a particular type that has a locus greater than or equal to the parameter
Need to also support returning the number of loci of the specified type that a peer is storing values for, as well as the range of locus space the peer is responsible for.
Parameter: locus, type
Responses: data locus, data, data signature, loci responsibility range, number of loci stored
Many DHTs will not need all of these, but some will need to use them.
Used to indicate the sender is a new peer joining the DHT
Parameters: joining peer id , user-id, signature
Response: list of existing peers that the sender might be interested in knowing about
Used to indicate that the sender is about to leave the DHT
Used to indicate that the sender wishes to flag that they exist and that the receiver may want to take some action, as a result of their existence, to deal with the stability of the DHT.
This one is highly dependent on the actually DHT algorithm. It may be possible to define some common identifiable peers such as 1st successor, nth successor, nth predecessor, other peer in finger table, and so on.
A node sends a CONNECT command when it wishes to establish a direct TCP or UDP connection to another node for the purposes of sending ASP messages or application layer protocol messages, such as SIP. Detailed procedures for the CONNECT and its response are described in Section 8.3.
The attributes included in the CONNECT command and its response are:
TODO: fill in binary encoding formats
Tests connectivity along a path. Can be addressed to a specific locus, in which case it is routed to the responsible peer to respond, or can be addressed to any locus, in which case the first peer to receive it will respond. Can be sent with anycast or multicast so it must have a small response that does not fragment and the receiver needs to be able to deal with multiple responses. Probably need the responder to insert a random response id.
Nothing signed on this one.
Responses: peer id of actual responding peer, label stack that the responding peer received
Time-stamp - not sure if this is needed to limit replay window or not
serial number of certificate used to sign
TODO - All transport flows need to have an associated label. SHOULD be unique to this peer or host and only use bottom 20 bits.
Number of retransmissions determines rate at which failure detection can occur - need to keep in lower than say SIP was - may have to be parameter of DHT instance
Need to make sure we can DEMUX this from other things - is a magic number needed at top of packet?
For TLS session, first the length of the message is sent as a 32 bit integer followed by the message. If the top two bits of the length are not set to zero, the receiver should consider this an error and close this stream. These bits are reserved for future extensibility.
TODO - deal with retransmissions, TCP rate friendly congestion control, and fragmentation of large packets above the DTLS layer.
Is a peer that routes a command transaction state-full on the command? Who runs a timer on a command to time it out? Who deals with retransmissions - has to be link by link. Suspect we can make all retransmission and timer at the original commanding peer and allow all forwarding peers to be stateless other than the issue of DTLS retransmissions - which will be a nightmare.
At numerous times during the operation of ASP, a node will need to establish a connection to another node. This may be for the purposes of building finger tables when the node joins the P2P network, or when the node learns of a new neighbor through an UPDATE and needs to establish a connection to that neighbor.
In addition, a node may need to connect to another node for the purposes of an application connection. In the case of SIP, when a node has looked up the target AOR in the DHT, it will obtain a Node-ID that identifies that peer. The next step will be to establish a "direct" connection for the purposes of performing SIP signaling.
In both of these cases, the node starts with a destination Node-ID, and its objective is to create a connection (ideally using TCP, but falling back to UDP when it is not available) to the node with that given Node-ID. The establishment of this connection is done using the CONNECT command in conjunction with ICE. It is assumed that the reader has familiarity with ICE.
ASP implementations MUST implement full ICE. Because ASP always tries to use TCP and then UDP as a fallback, there will be multiple candidates of the same IP version, which requires full ICE.
To utilize ICE, the CONNECT method provides a basic offer/answer operation that exchanges a set of candidates for a single "stream". In this case, the "stream" refers not to RTP or other types of media, but rather to a connection for ASP itself or for SIP signaling. The CONNECT request contains the candidates for this stream, and the CONNECT response contains the corresponding answer with candidates for that stream. Though CONNECT provides an offer/answer exchange, it does not actually carry or utilize Session Description Protocol (SDP) messages. Rather, it carries the raw ICE parameters required for ICE operation, and the ICE spec is utilized as if these parameters had actually been used in an SDP offer or answer. In essence, ICE is utilized by mapping the CONNECT parameters into an SDP for the purposes of following the details of ICE itself. That avoids the need for ASP to respecify ICE, yet allows it to operate without the baggage that SDP would bring.
ICE uses server reflexive and relayed candidates learned from STUN and TURN servers. With ASP, the nodes in the P2P network can provide TURN and STUN services for other nodes. Using a bootstrapping STUN server on the public Internet, a node learns with some probability that it is not behind a NAT or firewall. If it believes it is probably not behind one, it writes itself into the P2P network using a particular algorithm described below. When it comes time to gather a STUN or TURN server, an agent uses the algorithm described below to gather several servers of each type. Several servers are used for redundancy, to handle failures or cases where the server is not actually behind a NAT (which will result in the connectivity check through that server failing).
In addition, ASP only allows for a single offer/answer exchange. Unlike the usage of ICE within SIP, there is never a need to send a subsequent offer to update the default candidates to match the ones selected by ICE.
ASP and SIP always run over TLS for TCP connections and DTLS [RFC4347] for UDP "connections". Consequently, once ICE processing has completed, both agents will begin TLS and DTLS procedures to establish a secure link. Its important to note that, had a TURN server been utilized for the TCP or UDP stream, the TURN server will transparently relay the TLS messaging and the encrypted TLS content, and thus will not have access to the contents of the connection once it is established. Any attack by the TURN server to insert itself as a man-in-the-middle are thwarted by the usage of the fingerprint mechanism of RFC 4572 [RFC4572], which will reveal that the TLS and DTLS certificates are not a match for the ones used to sign the ASP messages.
An agent follows the ICE specification as described in [I-D.ietf-mmusic-ice] and [I-D.ietf-mmusic-ice-tcp] with the changes and additional procedures described in the subsections below.
Open Issue: We are still working on the algorithm in the this section and as it is currently described, there are some security issues. Expect improvement in the next release :-)
When a node starts up, it learns its bootstrap STUN server. It does this by taking the name of the DHT (for example, "example.com") and querying the DNS for the STUN server for that domain. The administrator of this domain MUST provide a STUN server. This bootstrap STUN server MUST be on the public Internet. The node then utilizes the diagnostics STUN usage [I-D.ietf-behave-nat-behavior-discovery]. If, based on this, the agent believes it is not behind a NAT or firewall, it MUST consider itself a candidate STUN server and SHOULD consider itself a candidate TURN server.
Next, the node gets an estimate N of the number of nodes in the P2P network. This computation is actually very straightforward. A given node has connections to other nodes in the DHT. For each such node i, the node directs a FIND command to it, and will get back the range of loci that this neighbor is responsible for. For that node i, an estimate Ei of the total number of nodes is the size of the hashspace divided by the number of loci in this range. Then, the node takes the average Ei across all connections. The result is an estimate of N.
Each node is configured with an estimate of the typical fraction, d, of the population that will serve as STUN or TURN servers. For STUN servers, this SHOULD be d_stun=.1, and for TURN, d_turn=.01.
If the node is a candidate STUN server, it picks a random number uniformly distributed between 0 and d_stun*N. This number is used as a seed, and the resulting value is a locus in the hashspace. The node performs a STORE operation at this locus, using the STUN server data type. This operation SHOULD be repeated four more times (for a total of five stores to different loci). If the node is a candidate TURN server, it performs the same process, but using d_turn.
When a node wishes to establish a connection for the purposes of ASP signaling or SIP signaling (or any other application protocol for that matter), it follows the process of gathering candidates as described in Section 4 of ICE [I-D.ietf-mmusic-ice]. ASP utilizes a single component, as does SIP. Consequently, gathering for these "streams" requires a single component.
An agent MUST implement ICE-tcp [I-D.ietf-mmusic-ice], and MUST gather at least one UDP and one TCP host candidate for ASP and for SIP.
The ICE specification assumes that an ICE agent is configured with, or somehow knows of, TURN and STUN servers. ASP provides a way for an agent to learn these by querying the ring. Using the procedures in Section 8.3.2, an agent estimates the number of nodes in the P2P network, N. If the node is utilizing TURN, it then computes a random number uniformly distributed between 0 and d_turn, and uses the resulting value as a seed. It then performs a FETCH targeted to the locus for that seed, asking for data of type TURN server. The result will, on average, return five TURN servers. The agent then uses each of these as its TURN servers for this CONNECT. If the agent is not utilizing TURN, it computes a random number uniformly distributed between 0 and d_stun, and uses the resulting value as a seed. It then performs a FETCH targeted to the locus for that seed, asking for data of type STUN server. The result will, on average, return five STUN servers. The agent then uses each of these as its STUN servers for this CONNECT.
The agent SHOULD prioritize its TCP-based candidates over its UDP-based candidates in the prioritization described in Section 4.1.2 of ICE [I-D.ietf-mmusic-ice].
The default candidate selection described in Section 4.1.3 of ICE is ignored; defaults are not signaled or utilized by ASP.
Section 4.3 of ICE describes procedures for encoding the SDP. Instead of actually encoding an SDP, the candidate information (IP address and port and transport protocol, priority, foundation, component ID, type and related address) is carried within the attributes of the CONNECT command or its response. Similarly, the username fragment and password are carried in the CONNECT message or its response. Section 6.4.1 describes the detailed attribute encoding for CONNECT. The CONNECT command and its response do not contain any default candidates or the ice-lite attribute, as these features of ICE are not used by ASP. The CONNECT command and its response also contain a Next-Protocol attribute, with a value of SIP or ASP, which indicates what protocol is to be run over the connection. The ASP CONNECT command MUST only be utilized to set up connections for application protocols that can be multiplexed with STUN and ASP itself.
Since the CONNECT command contains the candidate information and short term credentials, it is considered as an offer for a single media stream that happens to be encoded in a format different than SDP, but is otherwise considered a valid offer for the purposes of following the ICE specification. Similarly, the CONNECT response is considered a valid answer for the purposes of following the ICE specification.
Since all messages with ASP are secured between nodes, the node MUST implement the fingerprint attribute of RFC 4572 [RFC4572], and encode it into the CONNECT command and response as described in Section 6.4.1. This fingerprint will be matched with the certificates utilized to authenticate the ASP CONNECT command and its response.
Similarly, the node MUST implement the active, passive, and actpass attributes from RFC 4145 [RFC4145]. However, here they refer strictly to the role of active or passive for the purposes of TLS handshaking. The TCP connection directions are signaled as part of the ICE candidate attribute.
An agent MUST skip the verification procedures in Section 5.1 and 6.1 of ICE. Since ASP requires full ICE from all agents, this check is not required.
The roles of controlling and controlled as described in Section 5.2 of ICE are still utilized with ASP. However, the offerer (the entity sending the CONNECT request) will always be controlling, and the answerer (the entity sending the CONNECT response) will always be controlled. The connectivity checks MUST still contain the ICE-CONTROLLED and ICE-CONTROLLING attributes, however, even though the role reversal capability for which they are defined will never be needed with ASP. This is to allow for a common codebase between ICE for ASP and ICE for SDP.
The processes of forming check lists in Section 5.7 of ICE, scheduling checks in Section 5.8, and checking connectivity checks in Section 7 are used with ASP without change.
The controlling agent MUST utilize regular nomination. This is to ensure consistent state on the final selected pairs without the need for an updated offer, as ASP does not generate additional offer/answer exchanges.
The procedures in Section 8 of ICE are followed to conclude ICE, with the following exceptions:
An agent MUST NOT send a subsequent offer or answer. Thus, the procedures in Section 9 of ICE MUST be ignored.
STUN MUST be utilized for the keepalives described in Section 10 of ICE.
The procedures of Section 11 apply to ASP as well. However, in this case, the "media" takes the form of application layer protocols (ASP or SIP for example) over TLS or DTLS. Consequently, once ICE processing completes, the agent will begin TLS or DTLS procedures to establish a secure connection. The fingerprint from the CONNECT command and its response are used as described in RFC 4572 [RFC4572], to ensure that another node in the P2P network, acting as a TURN server, has not inserted itself as a man-in-the-middle. Once the TLS or DTLS signaling is complete, the application protocol is free to use the connection.
The concept of a previous selected pair for a component does not apply to ASP, since ICE restarts are not possible with ASP.
An agent MUST be prepared to receive packets for the application protocol (TLS or DTLS carrying ASP, SIP or anything else) at any time. The jitter and RTP considerations in Section 11 of ICE do not apply to ASP or SIP.
This section describes what needs to be specified when specifying a new DHT Algorithm.
Describe this from point of view of event driven system. Events include a user deciding to join, leave, etc. and protocol events such as receive update, join, etc. When an event is received, DHT defines a series of things to send and things to store - the DHT algorithm specifies what message gets sent on each event and what gets stored.
Note: This section need is just a very rough strawman to start thinking about the right issues.
In order to allow ASP to be used with existing and new DHT algorithms, it is important to define a clear model on how different DHTs are "plugged" into ASP. In order to make it easy to add new DHT algorithms, from the perspective of protocol changes, code changes and specification work, ASP defines an abstract API that exists between the Routing and Replication Logic and the DHT.
This API takes the form of an event driven system. Events arrive as a consequence of operations invoked by the usage and by arrival of messages over the wire. For certain events, the DHT layer is expected to provide a response. In other cases, the DHT layer is just notified of the event. In response, the DHT layer can inject messages, typically ones used for DHT maintenance.
The events passed to the DHT layer are:
The "commands" that the DHT layer can invoke include all of the commands supported by ASP. However, the DHT layer would not construct the message or perform authentication. Rather, it would instruct the routing and replication logic to send the message, and include attributes that the DHT layer wants to include in the message. When a response is received, this response is passed to the DHT layer.
This algorithm is assigned the name chord-128-2-32 to indicate it is based on Chord, and it uses a 128 bit hash function, stores 2 redundant copies of all data, and has finger tables with 32 entries.
The algorithm described here is a modified version of the Chord algorithm. Each peer keeps track of a finger table of 32 entries and a neighborhood table of 6 entries. The neighborhood table contains the 3 peers before this peer and the 3 peers after it in the DHT ring. The first entry in the finger table contains the peer half-way around the ring from this peer; the second entry contains the peer that is 1/4 of the way around; the third entry contains the peer that is 1/8th of the way around, and so on. Fundamentally, the chord data structure can be thought of a double-linked list formed by knowing the successors and predecessor peers in the neighborhood table, sorted by the peer-id. As long as the successor peers are correct, the DHT will return the correct result. The pointers to the prior peers are kept to enable inserting of new peers into the list structure. Keeping multiple predecessor and successor pointers makes it possible to maintain the integrity of the data structure even when consecutive peers simultaneously fail. The finger table forms a skip list too, so that entries in the linked list can rapidly be found - it needs to be there so that peers can be found in O(log(N)) time instead of the typical O(N) time that a linked list would provide.
A peer, n, is responsible for a particular locus k if k is less than or equal to n and k is greater than p, where p is the peer id of the previous peer in the neighborhood table. Care must be taken when computing to note that all math is modulo 2^128.
If a peer is not responsible for a locus k, then it routes a command to that location by routing it to the peer in either the neighborhood or finger table that has the largest peer-id that is still less than or equal to k.
When a peer receives a STORE command for locus k, and it is responsible for locus k, it stores the data and returns a SUCCESS response. [Note open issue, should it delay sending this SUCCESS until it has successfully stored the redundant copies?]. It then sends a STORE command to its successor in the neighborhood table and to that peers successor. Note that these STORE commands are addressed to those specific peers, even though the locus they are being asked to store is outside the range that they are responsible for. The peers receiving these check they came from an appropriate predecessor in their neighborhood table and that they are in a range that this predecessor is responsible for, and then they store the data.
[rewrite to be more event oriented]
When a peer (with peer-id n) joins the ring, it first does a PING to peer n to discover the peer, called p, that is currently responsible for the loci this peer will need to store. It then does a PING on p+1 to discover p0, a PING on p0+1 to discover p1, and finally a PING on p1+1 to discover p2. The values for p, p0,p1, and p2 form the initial values of the neighborhood table. (The values for the two peers before p will be found at a later stage when n receives an UPDATE.) The peer then fills the finger table by, for the i'th entry, doing a PING to peer (n+2^(numBitsInPeerId-i). The peer then uses the CONNECT command to form connections to all the peers in the neighborhood and finger tables. The finger table is initialized before starting to accept data so that certificates can be looked up to check signatures.
Next, peer n indicates it is ready to start receiving data by sending a JOIN command to peer p. At this point peer p transfers a copy of the data it will need to store on peer n by sending a series of STORE commands to transfer the data. Once peer p has finished sending all the STORE commands to transfer the data, it changes its neighborhood table to include n and then sends an UPDATE command to all the peers in the neighborhood table. Each one of the UPDATES contains the peer-id of all the entries in peer p's neighborhood table as well as the id for peer n.
When a peer, n, receives an UPDATE command, it looks at all the peer-ids in the UPDATE and at its neighborhood table and decides if this UPDATE would change its neighborhood table. If any peer, p, would be added or removed from the neighborhood table, the peer sends a PING to peer p; if this fails, peer p is removed from the neighborhood table, and if it succeeds, p is added to the table. After the PINGs are done, if the table has changed, peer n attempts to open a new connection to any new peers in the neighborhood table by sending them a CONNECT command. If the neighborhood table changes, the peer sends an UPDATE command to each of its neighbors.
Every time a connection to a peer in the neighborhood set is lost (as determined by connectivity pings), the peer should remove the entry from its neighborhood table and send an UPDATE to all the remaining neighbors. The update will contain all the peer-ids of the current entries of the table (after the failed one has been removed).
If connectivity is lost to all three of the peers that succeed this peer in the ring, then this peer should behave as if it is joining the network and use PINGs to find a peer and send it a JOIN. If connectivity is lost to all the peers in the finger table, this peer should assume that it has been disconnected from the rest of the network, and it should periodically try to join the DHT.
About every hour, a peer should send UPDATE commands to all of the peers in its neighborhood table.
About every hour a peer should select a random entry from the finger table and do a PING to peer (n+2^(numBitsInPeerId-i). If this returns a different peer than the one currently in this entry of the peer table, then a new connection should be formed to this peer and it should replace the old peer in the finger table.
Unfortunately most peers leave by just disconnecting. This is not good. A more orderly way to disconnect is the following. First the leaving peer stops responding to PINGS. It then sends CLOSE commands on any connections it has open. Next it sends an UPDATE to all of the peers in its neighbor set (both peers ahead and behind it in the ring) which includes its other neighbors but MUST NOT include its own peer id. It then does a STORE for each locus it has, to transfer that data to the new responsible peer. Finally it closes any connections that it has open.
Fixes the DHT and DHT parameters
Provides user name and CERT
May provide multiple DHTs for insertions multiple rings during migration from one to another
Specify some XML over HTTP based enrollment process to a central server
Discuss P2P-Network-Id creation. The top 24 bits are a hash of the P2P-Network-ID name (for example, "example.org"), while the bottom 8 bits are controlled by the site and are used for different versions of the ring.
This specification stores users' registrations and possibly other data in a Distributed Hash table (DHT). This requires a solution to securing this data as well as securing, as well as possible, the routing in the DHT. Both types of security are based on requiring that every entity in the system (whether user or peer) authenticate cryptographically using an asymmetric key pair tied to a certificate.
When a user enrolls in the DHT, they request or are assigned a unique name, such as "email@example.com". These names are unique and are meant to be chosen and used by humans much like a SIP Address of Record (AOR) or an email address. The user is also assigned a peer-ID by the central enrollment authority. Both the name and the peer ID are placed in the certificate, along with the user's public key.
Each certificate enables an entity to act in two sorts of roles:
Note that since only users of this DHT need to validate a certificate, this usage does not require a global PKI. It does, however, require a central enrollment authority which acts as the certificate authority for the DHT.
ASP provides a somewhat generic DHT storage service, albeit one designed to be useful for P2P SIP. In this section we discuss security issues that are likely to be relevant to any usage of ASP. In the subsequent section we describe issues that are specific to SIP.
In any DHT, any given user depends on a number of peers with which she has no well-defined relationship except that they are fellow members of the DHT. In practice, these other nodes may be friendly, lazy, curious, or outright malicious. No security system can provide complete protection in an environment where most nodes are malicious. The goal of security in ASP is to provide strong security guarantees of some properties even in the face of a large number of malicious nodes and to allow the DHT to function correctly in the face of a modest number of malicious nodes.
The two basic functions provided by DHT nodes are storage and routing: some node is responsible for storing your data and for allowing you to fetch data from others. Some other set of nodes are responsible for routing messages to and from the storing nodes. Each of these issues is covered in the following sections.
The foundation of storage security in ASP is that any given locus/type code pair (a slot) is deterministically bound to some small set of certificates. In order to write data in a slot, the writer must prove possession of the private key for one of those certificates. Moreover, all data is stored signed by the certificate which authorized its storage. This set of rules makes questions of authorization and data integrity - which have historically been thorny for DHTs - relatively simple.
When a client wants to store some value in a slot, it first digitally signs the value with its own private key. It then sends a STORE request that contains both the value and the signature towards the storing peer (which is defined by the seed construction algorithm for that particular type of value).
When the storing peer receives the request, it must determine whether the storing client is authorized to store in this slot. In order to do so, it executes the seed construction algorithm for the specified type based on the user's certificate information. It then computes the locus from the seed and verifies that it matches the slot which the user is requesting to write to. If it does, the user is authorized to write to this slot, pending quota checks as described in the next section.
For example, consider the certificate with the following properties:
User name: firstname.lastname@example.org Peer-Id: 013456789abcdef Serial: 1234
If Alice wishes to STORE a value of the "SIP Location" type, the seed will be the SIP AOR "sip:email@example.com". The locus will be determined by hashing the seed. When a peer receives a request to store a record at locus X, it takes the signing certificate and recomputes the seed, in this case "firstname.lastname@example.org". If H("email@example.com")=X then the STORE is authorized. Otherwise it is not. Note that the seed construction algorithm may be different for other types.
Being a peer in a DHT carries with it the responsibility to store data for a given region of the DHT. However, if clients were allowed to store unlimited amounts of data, this would create unacceptable burdens on peers, as well as enabling trivial denial of service attacks. ASP addresses this issue by requiring each usage to define maximum sizes for each type of stored data. Attempts to store values exceeding this size SHOULD be rejected. Because each slot is bound to a small set of certificates, these size restrictions also create a distributed quota mechanism, with the quotas administered by the central enrollment server.
Allowing different types of data to have different size restrictions allows new usages the flexibility to define limits that fit their needs without requiring all usages to have expansive limits. Because peers know at joining time what usages they must support (see Section XXX), peers can to some extent predict their storage requirements.
Because each stored value is signed, it is trivial for any retrieving peer to verify the integrity of the stored value. Some more care needs to be taken to prevent version rollback attacks. Rollback attacks on storage are prevented by the use of "expiration time" values in each store. An expiration time represents the latest time at which the data is valid and thus limits (though does not completely prevent) the ability of the storing node to perform a rollback attack on retrievers. In order to prevent a rollback attack at the time of the STORE request, we require that expiration times be monotonically increasing expiration time (see Section XXX ). Storing peers MUST reject STORE requests with expiration times smaller than those they are currently storing.
The mechanisms described here provide a high degree of security, but some attacks remain possible. Most simply, it is possible for storing nodes to refuse to store a value (reject any request). In addition, a storing node can deny knowledge of values which it previously accepted. To some extent these attacks can be ameliorated by attempting to store to/retrieve from replicas, but a retrieving client at least has no way of knowing what it should do so.
In addition, when a type is multivalued (e.g., a set), the storing node can return only some subset of the values, thus biasing its responses. This can be countered by using single values rather than sets, but that makes coordination between multiple storing agents much more difficult. This is a tradeoff that must be made when designing any usage.
Because the storage security system guarantees (within limits) the integrity of the stored data, routing security focuses on stopping the attacker from performing a DOS attack on the system by mis-routing requests in the DHT. There are a few obvious observations to make about this. First, it is easy to ensure that an attacker is at least a valid peer in the DHT. Second, this is a DOS attack only. Third, if a large percentage of the peers on the DHT are controlled by the attacker, it is probably impossible to perfectly secure against this.
In general, attacks on DHT routing are mounted by the attacker arranging to route traffic through or two nodes it controls. In the Eclipse attack [REF: Eclipse] the attacker tampers with messages to and from nodes for which it is on-path with respect to a given victim node. This allows it to pretend to be all the nodes that are reachable through it. In the Sybil attack [REF: Sybil], the attacker registers a large number of nodes and is therefore able to capture a large amount of the traffic through the DHT.
Both the Eclipse and Sybil attacks require the attacker to be able to exercise control over her peer IDs. The Sybil attack requires the creation of a large number of peers. The Eclipse attack requires that the attacker be able to impersonate specific peers. In both cases, these attacks are limited by the use of centralized, certificate-based admission control.
Admission to an ASP DHT is controlled by requiring that each peer have a certificate containing its peer ID. The requirement to have a certificate is enforced by using TLS mutual authentication on each connection. Thus, whenever a peer connects to another peer, each side automatically checks that the other has a suitable certificate. These peer IDs are randomly assigned by the central enrollment server. This has two benefits:
The first property allows protection against Sybil attacks (provided the enrollment server uses strict rate limiting policies). The second property deters but does not completely prevent Eclipse attacks. Because an Eclipse attacker must impersonate peers on the other side of the attacker, he must have a certificate for suitable peer IDs, which requires him to repeatedly query the enrollment server for new certificates which only will match by chance. From the attacker's perspective, the difficulty is that if he only has a small number of certificates the region of the DHT he is impersonating appears to be very sparsely populated by comparison to the victim's local region. [REF: Wallach]
In general, whenever a peer engages in DHT activity that might affect the routing table it must establish its identity. This happens in two ways. First, whenever a peer establishes a direct connection to another peer it authenticates via TLS mutual authentication. All messages between peers are sent over this protected channel and therefore the peers can verify the data origin of the last hop peer for requests and responses without further cryptography.
In some situations, however, it is desirable to be able to establish the identity of a peer with whom one is not directly connected. The most natural case is when a peer UPDATEs its state. At this point, other peers may need to update their view of the DHT structure, but they need to verify that the UPDATE message came from the actual peer rather than from an attacker. To prevent this, all DHT routing messages are signed by the peer that generated them.
[TODO: this allows for replay attacks on requests. There are two basic defenses here. The first is global clocks and loose anti-replay. The second is to refuse to take any action unless you verify the data with the relevant node. This issue is undecided.]
[TODO: I think we are probably going to end up with generic signatures or at least optional signatures on all DHT messages.]
The routing security mechanisms in ASP are designed to contain rather than eliminate attacks on routing. It is still possible for an attacker to mount a variety of attacks. In particular, if an attacker is able to take up a position on the DHT routing between A and B it can make it appear as if B does not exist or is disconnected. It can also advertise false network metrics in attempt to reroute traffic. However, these are primarily DoS attacks.
Should these be 128 bits? Should the messages signal the size of them and the implementations use variable size for them?
It would be possible for a peer that had an empty list for a service like STUN to keep pointers to the previous and next peers that did have one a peer that performed the service and manage this as a linked list. When a FIND command came, it could return a hint of likely next and previous peers that might have pointers to a peer that provided the service.
Should all data have a generation ID so that instead of fetching all the data you can just see if it has changed?
How do we do required support like tags to add new commands?
What about extension blocks inside commands?
An important point: if you assume NATs are doing ICE to set up connections, you want a lot fewer connections than you might have on a very open network - this might push towards something like Chord with fewer connections than, say, bamboo.
TODO - ref draft-irtf-p2prg-survey-search
|[I-D.ietf-mmusic-ice]||Rosenberg, J, “Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols”, Internet-Draft draft-ietf-mmusic-ice-16, June 2007.|
|[I-D.ietf-behave-rfc3489bis]||Rosenberg, J, “Session Traversal Utilities for (NAT) (STUN)”, Internet-Draft draft-ietf-behave-rfc3489bis-06, March 2007.|
|[I-D.ietf-behave-turn]||Rosenberg, J, “Obtaining Relay Addresses from Simple Traversal Underneath NAT (STUN)”, Internet-Draft draft-ietf-behave-turn-03, March 2007.|
|[RFC2119]||Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels”, BCP 14, RFC 2119, March 1997.|
|[RFC3261]||Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, “SIP: Session Initiation Protocol”, RFC 3261, June 2002.|
|[I-D.willis-p2psip-concepts]||Willis, D, “Concepts and Terminology for Peer to Peer SIP”, Internet-Draft draft-willis-p2psip-concepts-04, March 2007.|
|[RFC4864]||Van de Velde, G., Hain, T., Droms, R., Carpenter, B., and E. Klein, “Local Network Protection for IPv6”, RFC 4864, May 2007.|
|[I-D.ietf-behave-nat-behavior-discovery]||MacDonald, D and B Lowekamp, “NAT Behavior Discovery Using STUN”, Internet-Draft draft-ietf-behave-nat-behavior-discovery-00, February 2007.|
|[I-D.ietf-mmusic-ice-tcp]||Rosenberg, J, “TCP Candidates with Interactive Connectivity Establishment (ICE”, Internet-Draft draft-ietf-mmusic-ice-tcp-03, March 2007.|
|[RFC4347]||Rescorla, E. and N. Modadugu, “Datagram Transport Layer Security”, RFC 4347, April 2006.|
|[RFC4145]||Yon, D. and G. Camarillo, “TCP-Based Media Transport in the Session Description Protocol (SDP)”, RFC 4145, September 2005.|
|[RFC4572]||Lennox, J., “Connection-Oriented Media Transport over the Transport Layer Security (TLS) Protocol in the Session Description Protocol (SDP)”, RFC 4572, July 2006.|
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at <http://www.ietf.org/ipr>.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at firstname.lastname@example.org.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright © The Internet Society (2007). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
Funding for the RFC Editor function is currently provided by the Internet Society.