Network Working Group                                          E. Cooper
Internet-Draft                                               P. Matthews
Expires: August 28, 2006                                           Avaya
                                                       February 24, 2006

The Effect of NATs on P2P SIP Overlay Architecture

draft-matthews-p2psip-nats-and-overlays-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 28, 2006.

Copyright Notice

Copyright © The Internet Society (2006).

Abstract

This document discusses the constraints that NATs put on the possible overlay architectures of a P2P SIP system. Given what seems to be a reasonable set of assumptions on where nodes are deployed and the kinds of NATs they are located behind, the document concludes that a structured partial-mesh overlay network exhibiting a property known as "symmetric interest" is the most reasonable overlay architecture.



1. Introduction

In general terms, P2P overlays attempt to eliminate a central bottleneck in a system by taking the data traditionally stored on a server (or set of servers) and dispersing it amongst a number of peers. Also in general terms, NAT boxes multiplex many "private" IP addresses onto a single "public" address. As a result of this multiplexing function, a NAT which receives an unsolicited message on its "public" address cannot determine which "private" address should receive it. Such messages are generally discarded. In client-server network topologies this is not a problem, since servers are usually given "public" addresses and clients never receive unsolicited messages. In P2P networks however, peers that cannot receive unsolicited messages cannot participate in the overlay. It follows then, that the presence of NATs in the network topology has a major influence on the overlay architecture.

Comments on this draft are solicited and should be addressed either to the authors or to the P2P-SIP mailing list at p2p-sip@cs.columbia.edu (see https://lists.cs.columbia.edu/cucslists/listinfo/p2p-sip).



2. Scenario



Figure 1 shows a set of peers that want to create a P2P SIP overlay network. Though this set is rather small, it still illustrates some key points.


         ,-------.
       ,' P    P  `.                             ,-----.
      (          P  )                           (   P   )
       `. P   P   ,'                             `-----'
         `-------'                            NAT
                  NAT
                            _.------------.
                       ,--''               `---.
                    ,-'                         `-.
                   /                               \
                  /                                 \
                 (            Internet               )
                  \                                 /
                   \                               /
                    `-.                         ,-'
                       `---.               _.--'
          N A T             `------------''
      ,-.     ,-.                                ,-----.
     / P \   ( P )                             ,'       `.
    ( P   )   `-'                             (   P   P   )
     \  P/                                     `.       ,'
      `-'                                        `-----'

    Legend
    P   - Peer node
    NAT - NAT box

 Figure 1: Example Scenario 

In this figure we see six clouds. Five represent subnets containing peers and one represents the Internet. Some of the subnets contain just a single peer while others contain multiple peers. One of the subnets uses public IP addresses, while the other subnets have NATs between them and the Internet and thus use private addresses. Two of the subnets are sitting behind the same NAT. Not illustrated in this figure are more complex NAT scenarios -- for example, a cascading NAT scenario where there are two NATs between a subnet and the Internet.

This document talks about overlay architectures for hooking these peers together into a P2P SIP system.



3. Assumptions

This section presents and discusses our assumptions about the P2P SIP system, about the NATs the system must traverse, and about the interaction between the system and these NATs.

The first assumption deals with the size range of P2P SIP systems. We assume that there can be many different P2P SIP systems, ranging from very small systems to very large systems, and nodes can be scattered anywhere around the world. This assumption is not directly related to NATs, but influences the other assumptions.

Assumption 1: There may be many different P2P SIP systems, with sizes ranging from two nodes to millions of nodes, with the nodes scattered across one to millions of subnets.

The next assumption deals with the question of whether a P2P SIP system will always have a certain proportion of nodes with public IP addresses. This question is important because nodes with public IP addresses make things easier, and if there is a large proportion of them, then nodes behind NATs can be treated as leaf nodes (that hang off of nodes with public IP addresses). Most P2P systems (e.g., file sharing systems) assume a certain proportion of nodes with public IP addresses.

However, this assumption seems less tenable with P2P SIP systems, especially in systems where the system is used in an enterprise and/or is primarily composed of hard phones (rather than general-purpose computers). Thus we make the following assumption:

Assumption 2: There can be P2P SIP systems where every peer has a NAT box between it and the open Internet.

In corporate environments, we expect this situation to be common.

The next set of assumptions deals with the behavior of the various NATs. At this point, readers should be familiar with references [1] (Audet, F. and C. Jennings, “NAT Behavioral Requirements for Unicast UDP”), [2] (Guha, S. and P. Francis, “NAT Behavioral Requirements for Unicast TCP”), and [3] (Ford, B. and P. Srisuresh, “Peer-to-Peer Communication Across Network Address Translators”). In this document, we use the terminology of the IETF BEHAVE working group.

The two key behaviors of a NAT are its mapping behavior and its filtering behavior.

Consider the various possible mapping behaviors first (cf. section 4.1 of [1]). If a NAT has a behavior other than "Endpoint Independent mapping", then peers behind the NAT cannot use "UDP hole-punching" (see [3]). The only way to support these peers is by treating them as leaf nodes hanging off a "relay peer". This relay peer must either have a public IP address or be located behind a NAT with a filtering behavior of "Endpoint Independent filtering". Since (a) acting as a relay is very bandwidth- and processor-intensive (which some peers may not be able to handle) and (b) a given P2P network may not have a node that has the required address properties to act as a relay, many P2P SIP networks may not be able to support peers behind NATs which do not provide "Endpoint Independent mapping".

For these reasons, we limit our architectural investigation to NATs with "Endpoint Independent mapping". (A later version of this document may describe the necessary extensions to support NATs that do not satisfy this assumption).

Assumption 3: All NATs must have a mapping behavior of "Endpoint Independent mapping".

Note that various investigations (see, for example, sections 6.2 and 6.4 of [3]) have suggested that about 85% of all NATs have a mapping behavior of "Endpoint Independent mapping".

Now consider the various possible filtering behaviors (cf. section 4.1 of [1]). It is easier to create a P2P network with nodes behind NATs that have a filtering behavior of "Endpoint Independent filtering" than with nodes behind NATs with other filtering behaviors. However, other filtering behaviors are seen as more secure, and especially in corporate NATs, these other filtering behaviors are more common.

So can we assume that the NATs in the P2P system have a variety of filtering behaviors, and that at least a significant percentage of them have the more P2P-friendly "Endpoint Independent filtering" behavior? Unfortunately, this seems overly optimistic. This may be true in larger systems with a significant number of residential-based peers, but in smaller deployments and/or deployments with a large number of enterprise-based peers, this seems unlikely. Especially for P2P SIP systems deployed in enterprise environments, it seems likely that many systems will reside exclusively behind NATs with a filtering behavior of "Address Dependent filtering" (or worse).

So it seems best to be very conservative in this regard, and assume the worst possible filtering behavior.

Assumption 4: The P2P SIP system must function when all peers are located behind NATs with a filtering behavior of "Address and Port Dependent filtering".

An architecture that works in this situation will also work where some NATs have a less-restrictive filtering behavior.
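The relationship between the three filtering behaviors can be illustrated with a toy model. The following Python sketch is only an illustration: the class and method names are assumptions introduced here, the model tracks a single mapping, and it ignores timers and refresh. It shows that any inbound packet accepted under "Address and Port Dependent filtering" is also accepted under the two less restrictive behaviors.

```python
from enum import Enum


class Filtering(Enum):
    ENDPOINT_INDEPENDENT = 1
    ADDRESS_DEPENDENT = 2
    ADDRESS_AND_PORT_DEPENDENT = 3


class NatFilter:
    """Toy filtering state for ONE mapping on a NAT (hypothetical model)."""

    def __init__(self, behavior):
        self.behavior = behavior
        self.contacted = set()  # (ip, port) pairs the internal peer has sent to

    def outbound(self, dst_ip, dst_port):
        # An outbound packet creates or refreshes the filtering entry.
        self.contacted.add((dst_ip, dst_port))

    def accepts(self, src_ip, src_port):
        # Decide whether an inbound packet to the mapped address is let through.
        if not self.contacted:
            return False  # no mapping yet: unsolicited traffic is dropped
        if self.behavior is Filtering.ENDPOINT_INDEPENDENT:
            return True
        if self.behavior is Filtering.ADDRESS_DEPENDENT:
            return any(ip == src_ip for ip, _ in self.contacted)
        return (src_ip, src_port) in self.contacted
```

A peer that has sent a packet to 192.0.2.1:5060 can receive a reply from exactly that address and port under all three behaviors, whereas a packet from a different port or a different host passes only the less restrictive filters.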

The BEHAVE group has specified a number of other NAT UDP requirements [1]. The appendix discusses our assumptions relative to this document in detail. For now, there is no similar table for TCP since the work on TCP in the BEHAVE working group has just started. However, many of the requirements for UDP apply to TCP as well.



In addition to the BEHAVE approach, there are some other approaches to NAT traversal that warrant discussion: UPnP, ALGs, SBCs, and manual configuration.

Universal Plug and Play (UPnP) is an approach developed by Microsoft. In this approach, the P2P application talks directly with the NAT and asks the NAT to open up pinholes for it. Many consumer-grade NATs support the UPnP protocol, and this approach is a viable option for P2P applications targeted only at the consumer market. However, most corporate-grade NATs do not support UPnP. In addition, ISPs that NAT their entire network (a practice that is becoming more common in certain environments) typically do not allow their customers to configure that NAT using UPnP.

Many NATs contain one or more Application Level Gateways (ALGs). An ALG is special code within the NAT that recognizes packets of a particular application-level protocol and treats the packets specially. ALG support for the File Transfer Protocol (FTP) is almost universal in NATs, and ALG support for SIP is becoming more common. However, ALG support requires that the application protocol not be encrypted, and encryption of both SIP and P2P messages is likely to be desirable for security reasons. Also, ALG support for whatever P2P protocol we pick is very unlikely, at least in the short term.

Assumption 5: The traversal of a given NAT must not depend on that NAT supporting either UPnP or any ALG (except for FTP).

Session Border Controllers (SBCs) are boxes that are deployed in the network, sometimes by the customer but more commonly by the SIP service provider, to enable NAT traversal for standard client-server SIP. SBCs are becoming more common, but are typically restricted to working only with the SIP proxy servers of the SIP service provider that deploys the SBC. Furthermore, they are unlikely to support whatever P2P protocol we pick. Thus they are not a NAT traversal option for P2P SIP networks.

Assumption 6: The P2P NAT traversal strategy must not depend on the presence of SBCs in the network.

NAT traversal is often much easier if the user can manually configure the NAT. The user can open up pinholes in the NAT and/or modify the NAT's behavior. However, this requires that the user have the knowledge and interest to do the configuration (non-technical users often do not), have a NAT which is configurable (some low-end NATs are not configurable), and have permission to configure the NAT (problematic in corporate environments or when the ISP NATs the entire access network).

Furthermore, history has shown that systems which are "plug-and-play" tend to get much better acceptance by users. We would like users to be able to deploy P2P SIP peers without even knowing what a NAT is. Though we may not be "plug-and-play" in all cases, our NAT traversal strategy will be a failure if this is not true in the vast majority of cases.

Assumption 7: The NAT traversal strategy must be "plug-and-play" in the vast majority of cases.



Finally, there is the question of how many mapping and filtering entries ("pinholes") a NAT can support. Low-end NAT boxes found in homes and small enterprises may support only a very small number of mapping and filtering entries. NAT boxes deployed in larger enterprise environments usually support more entries since there are more devices (computers, IP phones, etc.) behind them. However, a general rule seems to be that NAT vendors expect a given node to use only a few entries at a time. The exact number is not known to the authors at this time, but it is clearly small. Thus a NAT traversal strategy that has one or more peers opening up a large number of pinholes to communicate with other peers is not acceptable, partly because it uses up what may be a very limited resource, and partly because of the refresh traffic required (especially if UDP is used).

Assumption 8: The NAT traversal strategy must limit the number of mapping and filtering entries opened up on a given NAT box to a fairly small number (exact value is TBD, but 10s of pinholes, not 100s of pinholes).



Here is a summary of the assumptions listed above:

Assumption 1: There may be many different P2P SIP systems, with sizes ranging from two nodes to millions of nodes, with the nodes scattered across one to millions of subnets.

Assumption 2: There can be P2P SIP systems where every peer has a NAT box between it and the open Internet.

Assumption 3: All NATs must have a mapping behavior of "Endpoint Independent mapping".

Assumption 4: The P2P SIP system must function when all peers are located behind NATs with a filtering behavior of "Address and Port Dependent filtering".

Assumption 5: The traversal of a given NAT must not depend on that NAT supporting either UPnP or any ALG (except for FTP).

Assumption 6: The P2P NAT traversal strategy must not depend on the presence of SBCs in the network.

Assumption 7: The NAT traversal strategy must be "plug-and-play" in the vast majority of cases.

Assumption 8: The NAT traversal strategy must limit the number of mapping and filtering entries opened up on a given NAT box to a fairly small number (i.e., 10s of pinholes, not 100s of pinholes).



4. Architectural Options

This section discusses various architectural options in light of the above assumptions. The goal of this section is to explore the design space fairly completely and to discuss the pros and cons of the various approaches.

First of all, it is important to note the distinction between NAT traversal for signaling messages and NAT traversal for media messages. The latter problem (media) is solved in a peer-to-peer fashion using the ICE mechanism [5] (Rosenberg, J., “Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols”). If two peers can exchange signaling messages in some way (perhaps indirectly through other peers), then ICE can be used to set up a direct peer-to-peer connection through intervening NATs for the exchange of media messages. Furthermore, the ICE mechanism is consistent with the assumptions listed above. Thus the problem we need to solve can be reduced to finding a way for peers to exchange signaling messages.



4.1. Types of Networks

So let's consider an overlay network of peers where all peers are behind NATs with the most restrictive filtering policy, and consider ways for the peers to exchange signaling messages. Several different approaches can be used to accomplish this:

Relay -- All peers exchange SIP messages via a centralized "Relay Server" (with a public IP address). This scheme minimizes the load on the peers and their associated NATs but requires a central server. SIP messages flow relatively quickly between the peers, provided the central server is always available and not constrained by processing power or network bandwidth.

Rendezvous -- Peers use a "Rendezvous Server" (with a public IP address) as an intermediary to initiate "NAT hole-punching" [3] every time they wish to begin communicating. Once NAT pinholes have been established, SIP messages are then exchanged directly. This scheme is still highly dependent on a central server, but reduces the load on it somewhat. Initial SIP messages are slightly delayed by the retrieval of SIP addresses from the "Rendezvous Server" and by the "NAT hole-punching" technique. The "Rendezvous Server" must maintain knowledge of and links to every active peer.

Mesh -- Once connected into the peer network, nodes exchange messages with selected other peers periodically to keep NAT pinholes open. SIP messages are either sent directly to the destination peer, or are sent indirectly via intermediate peers. No central server is required. The load on the peers and their local NATs is proportional to the number of NAT pinholes that must be maintained and the number of messages that must be sent within the mesh. (Methods for a peer to create or join such a peer-to-peer network are discussed in Section 5.2.)

Graphically, the communication flows in these networks would appear as shown in Figure 2. In the diagram, only signaling connections are shown; media (RTP) connections are not shown.




           P                  P                    P
       P   |   P          P   |.  P            P---|---P
        \  |  /           .\  | ./            /    | /  \
         \ | /            . \ | / .          |     /     |
     P-----S-----P      P-.---S-----P        P-----------P
         / | \            . / | \              \ / |    /
        /  |  \           ./  |  \              /\ |   |
       P   |   P          P   |   P            P  \|   P
           P                  P                    P

         Relay           Rendezvous              Mesh


      Legend:
      P     - Peers
      S     - Central Server
      / \ | - Permanent connections
      .     - Temporary connections

 Figure 2: Overlay Network Connectivity 

The networks in the figure above can be considered as discrete points in a spectrum that ranges from "fully centralized" on the left to "fully distributed" on the right. In general, the effort required to establish and maintain NAT pinholes increases as we move to the right, as does the amount of effort required to deliver a SIP message between two arbitrary nodes. However, as we move to the right, the reliance on centralized equipment decreases, overall scalability improves, and the network becomes more peer-to-peer. Further discussion of each topology is given below.

The Relay Network appears similar to a Client-Server configuration. It operates in a straightforward manner. A peer that wishes to call another creates a request and delivers it to the "Relay Server". The server forwards the request on to the target. The performance and scalability characteristics of this network are quite suitable for small- and medium-scale deployments. As the system grows into large-scale deployments, however, keeping the NAT pinholes open between the clients and the server places a heavy load on the server's resources. This load increases (at least) linearly with the size of the network. Even on a smaller scale, the "Relay Server" requires a sizable expenditure of resources (both initial and operational). For very small systems, this cost may be impractical. From a network availability standpoint, the "Relay Server" is also a liability. It represents a single point of failure upon which all nodes are totally dependent. Finally, the centralization of the administration of the network may be undesirable or impractical in some deployments.

The Rendezvous Network reduces the load on a central server by eliminating it from the messaging path once communication between the two endpoints has been established. One way this could work would be to have the originating node send the "Rendezvous Server" an 'INITIATE_NAT_HOLE' request that specifies the target peer (perhaps via node-id, or SIP URI), as well as its own IP address(es). In processing this request, the "Rendezvous Server" replies with the mapped IP address and port of the target peer and forwards the request to the target peer, perhaps also appending the mapped IP address and port of the originating peer. Upon reception of the 'INITIATE_NAT_HOLE' request, the target peer begins NAT hole-punching procedures to establish a link to the originator. This effort may include an ICE-like trial of various IP addresses, to avoid the problems associated with double-NAT topologies. Once the NAT pinholes are established, the two peers can begin regular SIP communications.
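The server side of this exchange can be sketched as follows. This is a minimal Python illustration, not a protocol definition: the class name, the `register` step, the message dictionaries, and the 'INITIATE_NAT_HOLE_REPLY' type are all assumptions introduced here, and transport, authentication, and the hole-punching itself are omitted.

```python
class RendezvousServer:
    """Hypothetical server that brokers hole-punching between registered peers."""

    def __init__(self):
        # peer id -> (ip, port) as observed by the server, i.e. the mapped address
        self.mapped = {}

    def register(self, peer_id, observed_addr):
        # Assumed step: each peer keeps a link to the server, which records
        # the mapped address it sees on that link.
        self.mapped[peer_id] = observed_addr

    def initiate_nat_hole(self, origin_id, target_id):
        """Return (reply sent to the originator, request forwarded to the target)."""
        if origin_id not in self.mapped or target_id not in self.mapped:
            raise KeyError("both peers must be known to the server")
        reply = {
            "type": "INITIATE_NAT_HOLE_REPLY",  # hypothetical message name
            "peer": target_id,
            "addr": self.mapped[target_id],
        }
        forwarded = {
            "type": "INITIATE_NAT_HOLE",
            "peer": origin_id,
            "addr": self.mapped[origin_id],  # appended mapped address of originator
        }
        return reply, forwarded
```

After this exchange each side knows the other's mapped address and port, so both can send packets toward each other to open the pinholes before regular SIP communication begins.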

Overall load on the "Rendezvous Server" is somewhat reduced, since it is only party to a portion of the session signaling. These savings may not be substantial, though, since the reduction in SIP message traffic will require an increase in traffic to keep NAT pinholes alive. The availability and administration characteristics are the same as with the Relay Network.

The Mesh Network eliminates the use of a centralized server (except perhaps for bootstrapping, see Section 5.2 (Joining the Network)). A node in this type of overlay establishes connections to some of the other peers. SIP messages are then routed via these connections.



4.2. More on Mesh Networks

Of the topologies described above, the Mesh Network is the most peer-to-peer, the most scalable, and the most plug-and-play. Thus it seems to line up the best with our assumptions. However, even within the general Mesh paradigm, several variations are still possible. The actual number of NAT pinhole connections is a key consideration. Consider Figure 3:


             P                   P                    P
          /     \             /  |  \              / /|\ \
        P         P         P----|----P          P----|----P
       /           \       /|    |    |\        /|/ \ | / \|\
      P             P     P-------------P      P-------------P
       \           /       \|    |    |/        \|/ / | \ \|/
        P         P         P----|----P          P----|----P
          \     /             \  |  /              \ \|/ /
             P                   P                    P

            Ring            Partial Mesh          Full Mesh

 Figure 3: Mesh Network Connectivity 

A Mesh Network in which every node is connected only to two neighbours can be termed a "Ring Network". This topology expends very little effort to maintain NAT pinholes but results in extremely high hop counts as the number of nodes increases. As a result, the overall scalability of this topology is very poor.

On the other hand, in small peer-to-peer overlay networks it is possible to maintain NAT pinhole connections between all pairs of peers (a "Full Mesh Network"). However, as the number of peers and distinct NATs increases, the number of pinholes (and the traffic required to maintain them) quickly becomes impractical. In this topology, overall scalability is also poor.

In between these two extremes, the "Partial Mesh Network" seeks to strike a balance between the minimum and maximum sustainable numbers of NAT pinholes. This seems to be the only viable approach. The "ideal" number of pinholes is the one that results in the lowest hop counts whilst also keeping pinhole maintenance traffic manageable.
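The trade-off between connections per peer and hop count can be made concrete with a small experiment. The Python sketch below is only an illustration under stated assumptions: peers are numbered 0..n-1 around a ring, every connection is treated as one pinhole, and the "partial mesh" is one hypothetical choice (links at power-of-two offsets in both directions) among many possible ones. The function names are introduced here.

```python
from collections import deque


def build(n, offsets):
    """Undirected graph on peers 0..n-1; peer i links to peer i + d (mod n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in offsets:
            j = (i + d) % n
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def max_hops(adj, start=0):
    """Largest hop count from `start` to any peer, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def summarize(n):
    """Map topology name -> (connections per peer, worst-case hop count)."""
    offsets = {
        "ring": [1],
        "partial mesh": [2 ** k for k in range((n - 1).bit_length())],
        "full mesh": range(1, n),
    }
    result = {}
    for name, offs in offsets.items():
        adj = build(n, offs)
        result[name] = (max(len(nbrs) for nbrs in adj.values()), max_hops(adj))
    return result
```

For 64 peers, the ring needs 2 connections per peer but up to 32 hops, and the full mesh needs 63 connections per peer for a single hop. The partial mesh sketched here sits between the two on both measures.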



4.3. Static vs. Dynamic Connections

Given the selection of a partial-mesh network, the next question is whether the connection topology should be relatively static, or should evolve dynamically as calls are made. Note that we are talking about signaling connections here -- as with classical client-server SIP, the volume of media messages means that it always makes sense to set up a dedicated connection between the call endpoints for the media whenever that is possible.

Say peer P wants to set up a connection to peer Q. In keeping with assumption 4, we assume peer Q is behind a NAT with a restrictive filtering behavior. Thus P cannot send a connection request directly to Q, but must send it via existing connections in the overlay. Only once the connection request is delivered to Q can P and Q use UDP (or TCP) hole-punching to initiate a connection, and then do any connection handshaking required (e.g., for TCP).

So setting up a connection requires a number of messages to be exchanged between P and Q. If P and Q just need to exchange a very small number of messages, then it is probably more efficient for P and Q to use the existing mesh of connections rather than establishing a new connection. Though it is not the goal of this document to discuss lookup and signaling mechanisms for P2P SIP, it seems likely that most transactions between two peers will be short and consist of only a small number of messages. Thus a static connection pattern (perhaps with some additional connections established dynamically) is likely to be appropriate.



4.4. Message Routing and Structured vs. Unstructured Meshes

Assuming a fairly static pattern of connections, the next logical question is: What should the pattern of connections be? There are many different patterns or schemes that can be used -- how can we classify and evaluate these choices?

We believe that an important property of an overlay is the ability to route messages from one peer to an arbitrary second peer in the overlay. We believe that this property is essential at times to allow a peer to place a call to another node, to publish the status of a peer or user (for example, to a peer acting as a distributed registrar), or when a peer wants to create a connection to another peer in the overlay (when creating the partial mesh).

With this in mind, we can classify connection patterns (or schemes) into two main groups:

Structured -- In a structured scheme, the connection pattern between peers is exploited when routing messages between peers.

Unstructured -- In an unstructured scheme, the connection pattern is more or less random, and properties of the connection scheme are NOT exploited when routing messages.

In the next few subsections, we consider the various properties of structured and unstructured partial meshes.



4.4.1. Unstructured Schemes

Some examples of unstructured schemes are:

There are a number of ways messages might be routed in an unstructured scheme. The simplest way is to flood the message through the overlay. Though not particularly efficient, this way may be practical in smaller overlays or when the volume of messages is low. Another way is to use a graph searching algorithm to locate the message target, for example depth-first search or breadth-first search. A graph search algorithm will generally take longer than flooding to get the message to the peer, but may use fewer messages. Remembering a route, once found, and then using source routing for subsequent messages can be used with either of these two methods to improve performance, but suffers from the problem that topology changes (caused, for example, by a peer leaving the overlay) can invalidate the route unexpectedly.
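Flooding can be sketched as follows. This Python fragment is a simplified illustration, not a proposed protocol: the function name and the hop limit `ttl` are assumptions introduced here, each peer forwards a message only the first time it sees it, and for simplicity a peer forwards to all of its neighbours, including the one it received the message from.

```python
def flood(adj, source, target, ttl):
    """Flood a message from `source` through the overlay graph `adj`.

    Returns (hops, sent): `hops` is the round in which `target` is first
    reached (None if not reached within `ttl` rounds), and `sent` is the
    number of message copies transmitted up to and including that round.
    """
    if source == target:
        return 0, 0
    seen = {source}      # peers that have already received the message
    frontier = [source]  # peers that forward in the next round
    sent = 0
    hops = 0
    while frontier and hops < ttl:
        hops += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                sent += 1  # every forwarded copy costs one message
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        if target in seen:
            return hops, sent
        frontier = next_frontier
    return None, sent
```

On a six-peer ring, reaching the peer opposite the source takes three rounds and ten message copies in this model, which illustrates why flooding is practical only in small overlays or at low message volumes.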

Another approach is to run a routing protocol, which is the approach used in the Internet. In this case, each peer acts as both a host and a router. Let's consider the impact of choosing one of the standard IETF routing protocols.

As can be seen, no one single IETF protocol works well in meshed networks of the scale we are interested in. The Internet solves this problem by dividing the network up into regions (Autonomous Systems or ASes), each AS containing up to a few hundred routers, then running both a link state protocol (either OSPF or IS-IS) and a version of BGP called iBGP inside each AS, and running another version of BGP called eBGP between ASes. However, all this requires considerable configuration and monitoring on the part of an army of operational personnel.

All this suggests that unstructured schemes may not represent a good choice for P2P-SIP.



4.4.2. Structured Schemes

The idea of a structured scheme is to create a connection pattern that can be exploited in routing.

Consider, for example, the following connection scheme based on a few of the ideas of Chord. As in Chord, some unique peer identifier is hashed and the result used to place peers on a ring. Each peer then maintains connections to peers located at various locations going clockwise around the ring. In this scheme, a message to peer Q can be addressed to Q's location in the ring, and an intermediate peer R can forward the message by forwarding it to the peer S in R's connection table that is closest to Q without overshooting Q.

If the NAT can support 160 different connections per peer, then the targets of the connections radiating out from each peer can be located at exponentially increasing distances from that peer. This allows a peer to reach any other peer in O(log N) hops using this scheme. However, if 160 different connections per peer proves excessive (see assumption 8), then hop counts may be larger.
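The forwarding rule of this Chord-like scheme can be sketched in Python. The sketch makes several assumptions that are not part of the scheme description: a small identifier space of 2**bits positions instead of a 160-bit hash, a globally known sorted list of peer identifiers used to build each connection table (a real peer would know only its own table), and the function names themselves.

```python
from bisect import bisect_left


def clockwise(a, b, size):
    """Clockwise distance from ring position a to ring position b."""
    return (b - a) % size


def connection_table(peer, peers, bits):
    """Peers that `peer` connects to: the first peer at or clockwise after
    peer + 2**k, for each k. `peers` is a sorted list of identifiers."""
    size = 1 << bits
    table = set()
    for k in range(bits):
        point = (peer + (1 << k)) % size
        idx = bisect_left(peers, point) % len(peers)  # wrap past the top
        if peers[idx] != peer:
            table.add(peers[idx])
    return table


def route(src, dst, peers, bits):
    """Forward greedily to the table entry closest to dst without overshooting."""
    size = 1 << bits
    path = [src]
    cur = src
    while cur != dst:
        remaining = clockwise(cur, dst, size)
        candidates = [s for s in connection_table(cur, peers, bits)
                      if clockwise(cur, s, size) <= remaining]
        cur = max(candidates, key=lambda s: clockwise(cur, s, size))
        path.append(cur)
    return path
```

With peers at positions 1, 8, 14, 21, 32, 38, 42, 48, 51 and 56 on a 64-position ring, `route(8, 51, peers, 6)` yields the path 8, 42, 51. The immediate clockwise neighbour is always in the table, so a candidate always exists and the remaining distance shrinks at every hop.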

Many other structured connection schemes exist. For example, structured connection schemes can be created using the ideas contained in any one of a number of DHT schemes. (See, however, the comments of Section 6 (Comments on Existing P2P Overlays)).



4.4.3. Symmetric Interest

When evaluating connection schemes, there is a property we have dubbed "symmetric interest". A connection scheme exhibits "symmetric interest" if, when peer P desires a connection to peer Q, then peer Q also desires a connection to peer P. "Symmetric interest" seems a desirable property of connection schemes since connections through NATs, by their nature, are bi-directional and because both peers incur the overhead of sending keep-alives to establish and maintain the connection.

A connection scheme based on peers randomly selecting peers to establish connections to does NOT exhibit symmetric interest because peer P can select peer Q without peer Q selecting peer P. The connection scheme based on the ideas of Chord that was mentioned in the previous section also does NOT exhibit symmetric interest because a given peer P in the ring desires connections to peers in the clockwise half-circle but not in the counter-clockwise half-circle.

One scheme that does exhibit symmetric interest has each peer maintain connections to peers located at exponentially increasing distances going both clockwise AND counter-clockwise around the ring.
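The property can be checked mechanically. The following Python sketch assumes, purely for illustration, a ring in which every position 0..n-1 is occupied by a peer and in which desired connections lie at power-of-two offsets; the function names are introduced here.

```python
def clockwise_only(p, n):
    """Peers that p desires when offsets go clockwise only (Chord-like)."""
    return {(p + 2 ** k) % n for k in range((n - 1).bit_length())} - {p}


def both_directions(p, n):
    """Peers that p desires when offsets go clockwise and counter-clockwise."""
    desired = set()
    for k in range((n - 1).bit_length()):
        desired.add((p + 2 ** k) % n)
        desired.add((p - 2 ** k) % n)
    return desired - {p}


def has_symmetric_interest(desired, n):
    """True if, whenever p desires q, q also desires p."""
    return all(p in desired(q, n) for p in range(n) for q in desired(p, n))
```

For a 16-peer ring, `has_symmetric_interest(clockwise_only, 16)` is False (peer 0 desires peer 1, but peer 1 does not desire peer 0), while `has_symmetric_interest(both_directions, 16)` is True.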

The authors have not yet had a chance to do a thorough analysis of various structured schemes. Nevertheless, the idea of a structured scheme (perhaps exhibiting "symmetric interest") seems a lot more promising than unstructured schemes.



5. A Few Additional Points

This section discusses a few additional points about P2P SIP architecture.



5.1. Superpeers

Orthogonal to these connectivity approaches is the idea of superpeers. A group of peers that are all behind the same NAT can elect one or more of their number to act on their behalf in the larger P2P overlay. These elected peers are called superpeers.

The overlay architecture can then create two types of connections: connections between superpeers that traverse NATs, and connections between a superpeer and its local peers that do not traverse NATs. In this way, the number of NAT pinholes can be reduced compared with an architecture that has each peer connect to peers behind other NATs.



5.2. Joining the Network

How can a node X, which is not currently a part of a particular P2P network, join that network?

The first thing to note is that if node X can contact just one peer P in the P2P overlay network, then it can learn about other peers through peer P and so join the network.

So the question can be reworded as: how can node X locate and contact at least one peer in the P2P overlay network that it wishes to join?

One approach is to use multicast. Node X could send out a "Hello, is anyone there?" multicast message, and any peer currently in the P2P network can reply. Alternatively, peers that are currently in a P2P network can periodically send out multicast messages advertising the existence of the network.

This approach works well when there are a number of peers on the same subnet. It also works well when there are a number of peers on subnets linked by multicast-enabled routers. However, many low-end routers do not support multicast, and multicast support on high-end routers needs to be configured, so using multicast between subnets likely works only in more sophisticated deployments.
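
The probe variant of the multicast approach might look roughly like the following Python sketch, which uses only the standard socket module. The group address, port, and payloads are assumptions chosen for illustration; the document does not define a wire format for discovery.

```python
import socket
import struct

GROUP = "239.255.42.99"  # hypothetical administratively scoped multicast group
PORT = 45678             # hypothetical UDP port
HELLO = b"P2P-HELLO?"    # hypothetical probe payload
HERE = b"P2P-HERE"       # hypothetical reply payload


def membership_request(group: str) -> bytes:
    """Build the 8-byte ip_mreq value: group address plus any local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def open_listener(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Peer side: join the multicast group so that probes can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def answer_one_probe(sock: socket.socket) -> None:
    """Peer side: wait for one datagram and reply by unicast if it is a probe."""
    data, sender = sock.recvfrom(1024)
    if data == HELLO:
        sock.sendto(HERE, sender)


def probe(timeout: float = 2.0, group: str = GROUP, port: int = PORT):
    """Node X side: return (ip, port) of the first peer that answers, else None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # TTL 1 keeps the probe on the local subnet.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.settimeout(timeout)
        sock.sendto(HELLO, (group, port))
        data, sender = sock.recvfrom(1024)
        return sender if data == HERE else None
    except OSError:  # covers timeouts and "network unreachable"
        return None
    finally:
        sock.close()
```

A peer would call open_listener() once and then answer_one_probe() in a loop, while node X calls probe(). A larger TTL would be needed to cross multicast-enabled routers, subject to the deployment caveats noted above.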

A second approach can be used if node X was previously part of the P2P network and then disconnected for a while. Node X can remember the IP addresses and ports of some peers when it disconnects, and then try to contact those peers when it wants to rejoin the network. If at least one of the other peers (a) can be contacted and (b) is still a member of the P2P overlay network, then node X can rejoin the network.

This approach will not work if all the other peers are behind NATs with a filtering policy of "Address Restricted filtering" (or worse) and node X disconnects for more than the lifetime of a filtering entry in a NAT (typically 2 - 5 minutes). However, it will work if some peers are behind NATs with "Endpoint-Independent filtering".
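
The reasoning above can be captured in a few lines. The sketch below assumes, purely for illustration, that node X recorded the filtering behavior of each remembered peer's NAT while it was connected; the type and function names are hypothetical. The default lifetime of 120 seconds is the lower end of the 2 to 5 minute range mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Filtering(Enum):
    NONE = "public address, no NAT"
    ENDPOINT_INDEPENDENT = "Endpoint-Independent filtering"
    ADDRESS_RESTRICTED = "Address Restricted filtering (or worse)"


@dataclass(frozen=True)
class CachedPeer:
    address: tuple[str, int]  # IP address and port remembered at disconnect time
    filtering: Filtering      # filtering behavior of the peer's NAT, if known


def rejoin_candidates(cache: list[CachedPeer], offline_seconds: float,
                      filter_lifetime: float = 120.0) -> list[CachedPeer]:
    """Peers that node X can still hope to reach after being offline.

    A peer behind a restrictive filter is only worth trying while the
    filtering entry that admits node X has not yet expired.
    """
    return [
        peer for peer in cache
        if peer.filtering is not Filtering.ADDRESS_RESTRICTED
        or offline_seconds < filter_lifetime
    ]


if __name__ == "__main__":
    cache = [
        CachedPeer(("192.0.2.10", 5060), Filtering.ADDRESS_RESTRICTED),
        CachedPeer(("198.51.100.7", 5060), Filtering.ENDPOINT_INDEPENDENT),
    ]
    for peer in rejoin_candidates(cache, offline_seconds=600):
        print("try", peer.address)
```

An empty result corresponds to the failure case described above, in which node X must fall back on another discovery method.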

A third approach is to configure node X with the "mapped IP address and port" of some peer P. Here the "mapped IP address and port" is the public IP address and port that the NAT (if any) assigns to the peer; this is typically learned through a protocol such as STUN (which requires a STUN server). If peer P is behind a NAT with a filtering behavior of "Address Restricted filtering" (or worse), then peer P must also be configured with the mapped address and port of node X.

Given the manual configuration required, this approach must be considered a last-ditch approach.

A fourth, and most general, approach is to use an Introduction Server. This is a node with a public IP address and a DNS entry which is not part of the P2P network but is used only for bootstrapping purposes. In the minimal usage scenario, the P2P network elects a single peer P to maintain a connection to the Introduction Server. When node X contacts the Introduction Server, node X is given the mapped IP address and port of peer P, and the Introduction Server forwards node X's mapped address and port to P.

The disadvantage of this approach is that it requires a stable helper node with a public IP address. But otherwise it is the most generally applicable of all the approaches.
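
The minimal usage scenario can be modelled in memory as follows. This is only a sketch of the exchange described above, not a protocol definition: the class and method names are hypothetical, and message transport, authentication, and re-election of peer P are all left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Addr = Tuple[str, int]  # mapped IP address and port


@dataclass
class IntroductionServer:
    contact_peer: Optional[Addr] = None              # mapped address of elected peer P
    pending: List[Addr] = field(default_factory=list)  # newcomers not yet forwarded to P

    def register_peer(self, mapped: Addr) -> None:
        """Called by the elected peer P over the connection it maintains."""
        self.contact_peer = mapped

    def introduce(self, newcomer: Addr) -> Optional[Addr]:
        """Called by node X: returns P's mapped address and queues X's for P."""
        if self.contact_peer is None:
            return None  # no overlay has registered yet
        self.pending.append(newcomer)
        return self.contact_peer

    def take_pending(self) -> List[Addr]:
        """Called on P's behalf: newcomer addresses that P should now contact."""
        forwarded, self.pending = self.pending, []
        return forwarded


if __name__ == "__main__":
    server = IntroductionServer()
    server.register_peer(("198.51.100.7", 5060))
    print("X learns P:", server.introduce(("203.0.113.5", 4000)))
    print("P learns X:", server.take_pending())
```

Because each side learns the other's mapped address, both can send to each other, which matters when either NAT applies "Address Restricted filtering" (or worse).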



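                                   | Multicast | Buddy List | Manual Config | Introduction Server
-----------------------------------+-----------+------------+---------------+--------------------
Plug and Play                      |     Y     |     Y      |       N       |          Y
Works when node X is anywhere      |     N     |     Y      |       Y       |          Y
Can be used for first connection   |     Y     |     N      |       Y       |          Y
Does not require an external node  |     Y     |     Y      |       Y       |          N

 Table 1: Comparison of Discovery Methods 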



6. Comments on Existing P2P Overlays

Many existing P2P overlays have ignored the presence of NATs in the network. Their assumption is that all participating nodes are fully reachable by all other nodes. In practice, this turns out not to be true. The "Endpoint-dependent filtering" NAT behaviour specified in [1] will impair the ability of many DHT algorithms to provide the guarantees they strive for. Some popular file-sharing networks require manual configuration of the user's local NAT in order to join; incorrect configuration makes it impossible to participate in the overlay. Other P2P systems deal with NATs by assigning "helpers" to nodes behind NATs. These "helpers" have publicly available addresses and act as relay points for the NAT-ed nodes. This is a relatively effective approach, but it requires the nodes with publicly available addresses to carry more than their share of the load. The load will quickly become overwhelming in a network with a small proportion of public nodes.



7. Conclusions

Given the analysis done so far, it seems that the best P2P overlay architecture will have the following properties:



Appendix A. Detailed NAT UDP Assumptions



Criterion | BEHAVE# | Brief Description | Our Requirement | Justification
----------+---------+-------------------+-----------------+--------------
Mapping | REQ-1 | MUST be "Endpoint-Independent" | Must comply | Peers behind a NAT which does not comply require a "surrogate" to act on their behalf in the P2P network and to relay traffic to them. This surrogate must have either a public IP address or be behind a NAT with a Filtering rule of "Endpoint-Independent" (REQ-8). It is likely that some systems will not have peers that can act as surrogates. Furthermore, acting as a surrogate is very bandwidth- and processor-intensive.
IP Address Pooling | REQ-2 | RECOMMENDED to be "Paired" | Don't care | Since we control both endpoints, it is easy for us to handle other behaviors.
Port Assignment | REQ-3 | MUST NOT be "Port Overloading" | Must comply | "Port Overloading" can often cause seemingly random and inexplicable failures, as well as making testing much harder.
Port Range | REQ-3a | RECOMMENDED that the range classification of the source port be preserved. | Don't care | Since we control both endpoints, it is easy for us to handle other behaviors.
Port Parity | REQ-4 | RECOMMENDED that the NAT exhibit "Port parity preservation" | Don't care | Since we control both endpoints, it is easy for us to handle other behaviors.
Mapping Refresh Interval | REQ-5 | MUST NOT be less than 2 minutes | (TBD) | (TBD)
 | REQ-5a | Value MAY be configurable | (TBD) | (TBD)
 | REQ-5b | Default RECOMMENDED to be 5 minutes | Don't care |
Mapping Refresh Direction | REQ-6 | MUST have "NAT Outbound refresh behavior" of "True". | Must comply | Are there any NATs that do not comply with this???
 | REQ-6a | MAY have "NAT Inbound refresh behavior" of "True" | Don't care | Many NATs refresh only on outbound traffic, so it is simplest to assume this is false.
Conflicting Address Spaces | REQ-7 | MUST either ensure no conflict or behave sensibly when a conflict occurs | Should comply | Conflicting addresses are not common, but do occur. NATs that do not comply will cause problems for the peers behind them.
Filtering | REQ-8 | RECOMMENDED to be either "Endpoint independent" or "Address dependent" | Should comply | (see discussion in section XXX)
 | REQ-8a | Filtering behavior MAY be configurable | Don't care | Best to assume it is NOT configurable
Hairpinning | REQ-9 | MUST support "hairpinning" | Should comply | This issue becomes crucial when the NAT in question is the NAT closest to the public internet in a multi-NAT environment. In this scenario, a failure to support "hairpinning" will hinder (possibly prevent) bootstrapping attempts.
 | REQ-9a | Hairpinning behavior MUST be "External source IP address and port" | Must comply (if NAT does hairpinning) | (TBD)
ALGs | REQ-10 | RECOMMENDED that ALGs be disabled by default | Should comply | (TBD)
 | REQ-10a | RECOMMENDED that each ALG can be enabled or disabled separately | Should comply | (TBD)
Determinism | REQ-11 | MUST have deterministic behavior | Must comply | (TBD)
ICMP support | REQ-12 | Receipt of ICMP message MUST NOT destroy NAT mapping | Must comply | (TBD)
 | REQ-12a | SHOULD NOT filter ICMP messages based on source IP address. | Don't care | (TBD)
 | REQ-12b | RECOMMENDED that the NAT support ICMP Destination Unreachable messages. | Don't care | (TBD)
Fragmentation when sending | REQ-13 | MUST support fragmentation of packets larger than link MTU | Should comply | (TBD)
Fragmentation when receiving | REQ-14 | MUST support "Receive Fragment Out of Order" behavior | Should comply | (TBD)

 Table 2: NAT UDP Assumptions 
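
One way to apply the "Our Requirement" column is to classify a NAT by the BEHAVE requirements it is known to satisfy. The sketch below is an illustration only: the requirement identifiers come from the table above, while the function name and the result labels "ok", "degraded", and "unsupported" are hypothetical. Rows marked (TBD) or "Don't care" are ignored.

```python
# Requirements marked "Must comply" and "Should comply" in Table 2.
MUST_COMPLY = {"REQ-1", "REQ-3", "REQ-6", "REQ-11", "REQ-12"}
SHOULD_COMPLY = {"REQ-7", "REQ-8", "REQ-9", "REQ-10", "REQ-10a",
                 "REQ-13", "REQ-14"}


def assess(satisfied: set[str]) -> tuple[str, set[str]]:
    """Classify a NAT from the set of BEHAVE requirements it satisfies.

    Returns a label and the set of requirements that caused it.
    """
    missing = MUST_COMPLY - satisfied
    # REQ-9a is "Must comply" only if the NAT does hairpinning at all (REQ-9).
    if "REQ-9" in satisfied and "REQ-9a" not in satisfied:
        missing = missing | {"REQ-9a"}
    if missing:
        return "unsupported", missing
    missing = SHOULD_COMPLY - satisfied
    if missing:
        return "degraded", missing
    return "ok", set()


if __name__ == "__main__":
    nat = (MUST_COMPLY | SHOULD_COMPLY | {"REQ-9a"}) - {"REQ-8"}
    print(assess(nat))
```

A peer behind a NAT that fails a "Must comply" row such as REQ-1 would need a surrogate, as described in the justification for that row.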



8. References

[1] Audet, F. and C. Jennings, “NAT Behavioral Requirements for Unicast UDP,” draft-ietf-behave-nat-udp-04 (work in progress).
[2] Guha, S. and P. Francis, “NAT Behavioral Requirements for Unicast TCP,” draft-hoffman-behave-tcp-03 (work in progress).
[3] Ford, B. and P. Srisuresh, “Peer-to-Peer Communication Across Network Address Translators,” article available at http://www.brynosaurus.com/pub/net/p2pnat/.
[4] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek, M., Dabek, F., and H. Balakrishnan, “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications,” article available at http://pdos.csail.mit.edu/chord/.
[5] Rosenberg, J., “Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols,” draft-ietf-mmusic-ice-06 (work in progress).
[6] Network World, “P2P Traffic Still Dominates the Net,” article available at http://www.toptechnews.com/story.xhtml?story_id=38121.


Authors' Addresses

  Eric Cooper
  Avaya
  100 Innovation Drive
  Ottawa, Ontario K2K 3G7
  Canada
Phone:  +1 613 592 4343 x228
Email:  ecooper@avaya.com
  
  Philip Matthews
  Avaya
  100 Innovation Drive
  Ottawa, Ontario K2K 3G7
  Canada
Phone:  +1 613 592 4343 x224
Email:  philip_matthews@magma.ca

