Will Long Term Dynamic Address Allocation Record Retention Help or Hurt?

“If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.”    — Cardinal Richelieu

Requirements to preserve records always need to achieve a complex balance between costs, accuracy, and public policy. Actual signatures on small credit card transactions are now waived in many establishments; the merchant takes the risk that the charge is fraudulent, but maintaining the paper trail on small charges costs more than the aggregate risk of fraud.

Network records pose an interesting challenge because of distributed infrastructure, questions about forgeries, and sheer volume. A mobile phone log can be tens of pages per month; browsing a single web page can produce tens or hundreds of connections to multiple different Internet servers in a matter of seconds. Since most machines are connected to the Internet through a firewall for security and management reasons, tracking the identity of a particular network connection requires logging every connection established, including its time, requestor, and requestee. Organizations that monitor Internet use and keep such logs accumulate data at a prodigious rate. However, for technical reasons that will be noted later in this article, even then there are questions about the accuracy of such monitoring.

Organizations that monitor their networks have made a conscious decision to undertake such monitoring. When it becomes a legal requirement, the issues become far more complex.

On February 13, 2009, Senator John Cornyn (R-Texas) introduced S.436, the “Internet Stopping Adults Facilitating the Exploitation of Today’s Youth Act of 2009” (referred to as the “SAFETY Act”). Among its several provisions amending Title 18 of the US Code is:

“Section 2703 of Title 18, United States Code, is amended by adding at the end the following:
‘(h) Retention of Certain Records and Information – A provider of an electronic communications service or remote computing service shall retain for a period of at least two years all records or other information pertaining to the identity of a user of a temporarily assigned network address the service assigns to that user.’”

Representative Lamar Smith (R-Texas) introduced the companion bill in the House of Representatives, H.R.1076.

With all due respect to Senator Cornyn and Representative Smith, and several other members of the United States Congress who are co-sponsoring this legislation, this measure will undoubtedly create more difficulties than it will solve.

This is not to downplay the seriousness of Internet-related criminal activities including child pornography. Rather, the overall costs of mandatory long-term retention of information when that information is of disputable accuracy, attributability and reliability must be balanced against the benefits. Simultaneously, this requirement is costly, burdensome, and intrusive. Thus, there are high costs without commensurate benefit.

This legislation may be viewed from three perspectives:

If I were called to testify on these records, I would be required to state that the data is scientifically accurate before the records could be used. Knowing how readily these records can be called into question, I would find it almost impossible to rely on them without a collateral forensic examination of the system that the records identify as the external system. Even in that case, the original information may be long gone.

Accuracy has proven a factual challenge in the past, where the records of address assignment have come into question. In the case of RIAA v Sarah Ward, a 65-year-old grandmother was accused of downloading music. The records from Comcast, the broadband provider, identified Mrs. Ward’s computer as the participant. However, it was demonstrated that Kazaa, the software allegedly used, was incompatible with the computer at the location (a Mac).[1] We are left with the question: If this is so, were the records accurate? Fortunately, this case came to the attention of the EFF; otherwise, the defendant might very well have been bankrupted defending herself.

As noted by “A User’s Guide to the Stored Communications Act – And a Legislator’s Guide to Amending It”, published in the George Washington Law Review,[2] the terms electronic communications service or remote computing service have a long legislative history, dating back more than 20 years, long before the current ubiquitous state of computing technology. Simply put, there is more computing and communications power in many notebook computers than in the data centers envisioned when these legislative characterizations were originally coined.

Unsurprisingly, the CNN report, “Bill Proposes ISPs, Wi-Fi keep logs for police”, on these resolutions notes that the definition referred to in this legislation includes all facilities providing electronic communications. The article goes on to note that each and every Wi-Fi or home firewall would be within the scope of the proposed legislation. The log retention requirement alone involves millions, if not tens of millions, of appliances distributed by various ISPs, or purchased independently by their subscribers, many of which would have to be replaced or at a minimum updated to comply with this legislation. Many, if not the overwhelming majority, of these devices are minimalist, and do not have the physical capability of retaining such log information across even normal power outages, much less two years of network activity data. Thus, compliance is not merely a question of a five-minute software update; it is a question of a complete replacement of the majority of the technical infrastructure now extant. Existing access points cost between US$50 and US$100 each. Access points with far larger log preservation capabilities can be expected to cost commensurately more.

Also unspoken is the fact that most of these appliances cannot be relied upon to have accurate clocks, so any time stamps on the log records are subject to question.
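The consequence of an untrustworthy clock can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lease table, the subscriber names, and the five-minute clock error are all hypothetical, not data from any real device.

```python
from datetime import datetime, timedelta

# Hypothetical lease log for a single address: (subscriber, start, end).
LEASES = [
    ("subscriber-A", datetime(2009, 2, 13, 9, 0), datetime(2009, 2, 13, 10, 0)),
    ("subscriber-B", datetime(2009, 2, 13, 10, 0), datetime(2009, 2, 13, 11, 0)),
]

def candidates(event_time, max_skew):
    """Return every subscriber whose lease overlaps the window
    [event_time - max_skew, event_time + max_skew]."""
    lo, hi = event_time - max_skew, event_time + max_skew
    return [who for who, start, end in LEASES if start <= hi and lo < end]

event = datetime(2009, 2, 13, 9, 58)

# With a clock assumed to be exact, one subscriber matches.
trusted = candidates(event, timedelta(0))

# With an assumed five minutes of possible clock error, two subscribers match.
doubtful = candidates(event, timedelta(minutes=5))

print(trusted)
print(doubtful)
```

An event logged two minutes before a lease changes hands is attributable to one subscriber only if the clock is trusted to within those two minutes.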

Left unsaid are a number of critical issues of responsibility, including retention and due care, among others. Once each access point owner is required to commit to long-term preservation of the information, the information will also become an attractive target for legal processes other than criminal investigations.

In scope, this preemptive information gathering is also far broader than the 1996 Electronic Communication Transactional Records Act, 18 USC Section 2703, which requires retention of records following a request from law enforcement authorities.

There is also the creation of a massive “collation hazard”, the potential to create a massive database whose contents can be correlated to produce very invasive conclusions (a point that I raised in the “Internet Security” chapter of the 1995 Computer Security Handbook, 3rd Edition). Compounding this hazard is the fact that the information is of dubious reliability. Thus, this legislation re-enacts the problem of widely published Social Security Numbers, one of the factors underlying the advent of mass identity theft.

The widespread information retention requirement has problems in several different spheres beyond the questions arising from feasibility and cost, including privacy and accuracy. Extending this requirement from ISPs down to all access points is also a dramatic increase in scope. It also creates a great deal of material, which can then be demanded in any number of civil proceedings. In a real sense, the potential for aggregation of this information recalls the disbanded Total Information Awareness project of years past.

Some portable devices (e.g., iPods) have the potential to automatically create records subject to these requirements. Thus, the combination of a Wi-Fi access point and a constitutionally protected political or religious gathering would effectively create a legal record of attendance. Similarly, it is impractical to confine Wi-Fi signals to a limited area; passers-by would be identified as having been present. The pervasiveness of Wi-Fi technology, together with the automatic operation of the devices, creates the potential for automatic surveillance of individual movements.

Similarly, the simple fact is that the technology and the protocols that give rise to this information were never designed for the degree of reliability required for any legal proceeding, much less the “beyond reasonable doubt” standard in US criminal matters.

Lastly, completing the chain from an act to an actual user is far more complex than merely logging the address allocated to a subscriber. Identifying which computer, or which user on a shared computer, is responsible for a given network operation requires far more information than merely the network address assigned.

The ubiquitous SOHO (Small Office/Home Office) firewall router does more than merely serve as a passive wiring connection. The outside world sees the internal network as a single address, and machines within the firewall may be assigned addresses on a permanent or temporary basis. Each IP network operation is identified by four elements: its destination address, its destination port, its source address and its source port. When a web browser loads the files comprising a web page, it typically connects to the address of the web server on port 80 (for http) or port 443 (for https).
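The four identifying elements can be written down as a simple record. The following Python sketch is illustrative only; the type name and the addresses (taken from ranges reserved for documentation) are assumptions, not values from any real network.

```python
from typing import NamedTuple

class Connection(NamedTuple):
    """The four elements that identify one network operation."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int

# Two connections from the same browser to the same web server (port 80, http).
a = Connection("192.0.2.10", 49152, "198.51.100.5", 80)
b = Connection("192.0.2.10", 49153, "198.51.100.5", 80)

# The address pair is identical, yet these are two distinct connections.
same_addresses = (a.src_addr, a.dst_addr) == (b.src_addr, b.dst_addr)
same_connection = a == b

print(same_addresses, same_connection)
```

A log that records only addresses captures two of the four elements; the source port is what separates one connection from the next.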

However, the requesting (source) side is not quite so simple. A web browser may open many different connections to one or more web servers, each to load a different part of the page. As an example, simply accessing my firm’s home page (http://www.rlgsc.com) using a recent version of Firefox requires six separate connections to retrieve the components of the page. While all of the connections have the same source address, they use different port numbers. These port numbers are assigned as needed by the firewall software as it creates the connections. This rewriting of addresses and ports is generally referred to as Network Address Translation (NAT). Thus, if there are two machines behind a firewall, a connection from port 8765 may be from one machine at one point, and from a different machine seconds later. This introduces moment-to-moment ambiguities that can be resolved only by detailed, accurate, reliable and, most importantly, intensive and intrusive logging.
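The port 8765 example can be made concrete with a toy model of a NAT table. This is a sketch under assumptions: the class, its method names, and the port reuse policy are hypothetical simplifications, and real firewalls differ in how they choose and recycle ports.

```python
class ToyNat:
    """Toy NAT table: external port -> (internal address, internal port)."""

    def __init__(self, first_port=8765):
        self.next_port = first_port
        self.active = {}   # mappings currently in use
        self.free = []     # released external ports, reused before new ones

    def open_conn(self, internal_addr, internal_port):
        """Assign an external port to a new outbound connection."""
        if self.free:
            port = self.free.pop()
        else:
            port = self.next_port
            self.next_port += 1
        self.active[port] = (internal_addr, internal_port)
        return port

    def close_conn(self, port):
        """Release the mapping; nothing about it is retained."""
        del self.active[port]
        self.free.append(port)

nat = ToyNat()
p1 = nat.open_conn("192.168.1.10", 50001)   # first machine
owner_then = nat.active[p1]
nat.close_conn(p1)
p2 = nat.open_conn("192.168.1.11", 50002)   # second machine, moments later
owner_now = nat.active[p2]

print(p1, owner_then)
print(p2, owner_now)
```

Both connections appear to the outside world as coming from external port 8765, and once a mapping is closed this table keeps no trace of which internal machine held it.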

The extent of this ambiguity depends on the amount of usage. However, with the exception of those company networks where all traffic is monitored, the actual moment-to-moment associations of ports to internal addresses are almost never preserved.

Thus, recording the addresses assigned to individual machines behind a firewall is in itself insufficient to attribute particular connections to particular machines, in the absence of some corroboration from an analysis of the machine involved.

The details of precisely what this information is, how it is to be used, and how it can be misused, should give everyone pause.

A review of the standard Internet protocols whose operation would be enveloped by these requirements is instructive. At first glance, these requirements would depend on information generated by the operation of DHCP (Dynamic Host Configuration Protocol, RFC 2131), ARP (Address Resolution Protocol, RFC 826), and RARP (Reverse Address Resolution Protocol, RFC 903).

The backdrop for the protocols affected by this requirement is the Local Area Network, specifically the local area networks that are descendants of the original Ethernet developed by Robert Metcalfe and his colleagues at Xerox PARC.[3] This original research network was first commercialized by an industry triumvirate of Digital Equipment Corporation, Xerox, and Intel in the early 1980s. Subsequently, the joint specification became the foundation for IEEE Standard 802.3. These specifications focus completely on the basics of creating a functioning network. Creating a working network was a major technical accomplishment; security and integrity were not even secondary considerations. In short, these protocols were designed for a benign environment, a fact clear from the comments in the original RFCs.[4]

ARP and its sibling, RARP, are protocols whose fundamental operating concepts date to this period at the dawn of local area technology. Both protocols address establishing the mapping between the 48-bit MAC (Media Access Control) addresses used on the actual LAN layer and the 32-bit addresses used by IP.[5]

ARP exists for a singular purpose: obtaining the MAC address corresponding to a given IP address. ARP defines the details of a LAN broadcast query for the node corresponding to a specific IP address. As a broadcast, the ARP request packet[6] is seen and decoded by each and every interface connected to the LAN, whether wired or wireless. The node whose IP address matches the query responds. From the response message, the requesting node obtains the MAC address.
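The layout of an ARP request for Ethernet and IPv4, as specified in RFC 826, can be sketched with Python's struct module. The function name and the sample addresses are assumptions for illustration; the code only builds the 28-byte payload in memory and sends nothing.

```python
import socket
import struct

ARES_OP_REQUEST = 1  # request opcode defined in RFC 826

def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build the ARP payload asking: who has target_ip?"""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length in bytes (48 bits)
        4,                            # protocol address length in bytes (32 bits)
        ARES_OP_REQUEST,
        sender_mac,                   # sender's MAC address, as claimed by the sender
        socket.inet_aton(sender_ip),  # sender's IP address, as claimed by the sender
        b"\x00" * 6,                  # target MAC: unknown, this is the question
        socket.inet_aton(target_ip),
    )

packet = build_arp_request(
    bytes.fromhex("020000000001"),    # hypothetical locally administered MAC
    "192.168.1.10",
    "192.168.1.1",
)
print(len(packet), packet.hex())
```

Every field in the packet is filled in by the sender; the format has no field for a signature, credential, or any other evidence that the claimed addresses are genuine.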

Even in perfectly normal cases, the responding node may not be the actual destination node. The response to an ARP Request may also be provided by a proxy device such as a bridge, router, gateway, or other device.

It should be obvious that this trusting behavior is vulnerable to nefarious actors, and indeed such attacks happen. ARP spoofing is a well-known hazard. Easy-to-use tools implementing ARP spoofing attacks are readily available on the Internet.[7]
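Why the trusting behavior matters can be shown with a toy model of an address cache. This is a simulation only, under assumed names and addresses; it touches no network and is not a description of any particular operating system's cache policy.

```python
class ToyArpCache:
    """Toy IP-to-MAC cache that accepts whatever reply it is handed."""

    def __init__(self):
        self.table = {}

    def on_reply(self, ip, mac):
        # ARP carries nothing to authenticate the sender, so this model
        # simply lets the most recent claim replace the earlier one.
        self.table[ip] = mac

cache = ToyArpCache()
cache.on_reply("192.168.1.1", "00:11:22:33:44:55")  # reply from the real router
cache.on_reply("192.168.1.1", "02:00:00:00:00:99")  # forged reply from another node

print(cache.table["192.168.1.1"])
```

After the second reply, traffic intended for the router's IP address would be directed to the forger's MAC address, and nothing in the cache records that a substitution took place.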

RARP is similar to ARP. As one would expect from its name, RARP serves a complementary function to ARP: obtaining an IP address from an Ethernet MAC address. RARP also shares the class of vulnerabilities that underlies ARP: the presumption that the LAN is a benign, non-hostile environment free from bad actors.

The Dynamic Host Configuration Protocol, more familiarly known by its acronym, DHCP, was originally designed to address the needs of devices without persistent storage. While it has been co-opted into the administration of large fixed networks with significant transient populations, it is merely a protocol for providing network addresses (permanent or temporary), and providing a variety of network and configuration parameters. In the words of the original author (emphasis mine):[4]

“7. Security Considerations DHCP is built directly on UDP and IP which are as yet inherently insecure. [sic[8]] Furthermore, DHCP is generally intended to make maintenance of remote and/or diskless hosts easier. While perhaps not impossible, configuring such hosts with passwords or keys may be difficult and inconvenient. Therefore, DHCP in its current form is quite insecure. [sic[8]]”

Using an address previously in use is straightforward. When that system ceases operation, its address lease is generally still in force. It is a simple technical matter to assume its IP and MAC addresses, and take its place. Should the owner of the system that requested the lease originally be held accountable for the actions of an impersonator who assumed the electronic persona of their computing device? Elvis may have left the building, but in this realm something can seamlessly assume his persona. Similarly, DHCP servers are only involved in allocating addresses; they have no ongoing traffic monitoring function.
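The accountability problem can be sketched as a lookup against a retained lease log. The record layout, names, and times below are hypothetical; the point is only what such a lookup can and cannot know.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lease:
    ip: str
    mac: str
    owner: str
    expires: int  # seconds since an arbitrary starting point

# Hypothetical retained log: one lease, valid for an hour.
LEASE_LOG = [Lease("192.168.1.10", "00:11:22:33:44:55", "original owner", 3600)]

def attribute(ip, mac, when):
    """Return whoever the log says held (ip, mac) at time `when`, if anyone."""
    for lease in LEASE_LOG:
        if lease.ip == ip and lease.mac == mac and when < lease.expires:
            return lease.owner
    return None

# Assume the original device shut down at t=600 and an impersonator
# adopted its IP and MAC addresses at t=1200. The log cannot tell.
blamed = attribute("192.168.1.10", "00:11:22:33:44:55", when=1200)
after_expiry = attribute("192.168.1.10", "00:11:22:33:44:55", when=4000)

print(blamed, after_expiry)
```

The lookup names the original lessee for any traffic bearing those addresses while the lease is in force, regardless of which device actually sent it.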

Notwithstanding the explosive adoption of Internet technologies in the intervening years, nothing has changed the basics underlying these protocols.

It is also worth noting that DHCP does not have a monopoly on “dynamic address assignment”. It is possible to completely bypass the need for a DHCP server, by making use of ARP to probe the address space. This technique, referred to as Automatic Private IP Addressing[9] by Microsoft, is a common part of later generation Windows operating systems.
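Self-assignment of this kind can be sketched as follows. The address range 169.254.1.0 through 169.254.254.255 is the one RFC 3927 designates for this purpose; everything else here is an assumption for illustration, including the function name and the use of a Python set to stand in for the ARP probe that a real host would broadcast.

```python
import ipaddress
import random

# Selectable link-local range per RFC 3927.
FIRST = int(ipaddress.IPv4Address("169.254.1.0"))
LAST = int(ipaddress.IPv4Address("169.254.254.255"))

def pick_link_local(in_use, rng):
    """Pick an address that no neighbor claims.

    `in_use` simulates the ARP probe: a real host would broadcast a probe
    for the candidate and wait to see whether any node answers.
    """
    while True:
        candidate = ipaddress.IPv4Address(rng.randint(FIRST, LAST))
        if candidate not in in_use:
            return candidate

rng = random.Random(0)  # fixed seed so the sketch is repeatable
taken = {ipaddress.IPv4Address("169.254.1.1")}
addr = pick_link_local(taken, rng)

print(addr)
```

No server takes part in this exchange, so no central record of the assignment is created that could later be retained.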

The intersection of “address allocation logs” (which are really logs of DHCP activity) and the legal system has several hazards. Using this information forensically to identify whether the system involved actually took part in an illegal act requires care to avoid ensnaring and damaging innocents. Using the same pay phone as a well-known drug dealer is flimsy grounds for investigation, much less for an indictment. Creating a mass hazard for civil subpoenas is a more complex and even more intrusive question.

Notes

[1] John Schwartz “She Says She’s No Music Pirate. No Snoop Fan, Either.”, The New York Times, September 25, 2003
[2] “A User’s Guide to the Stored Communications Act – And a Legislator’s Guide to Amending It”
[3] Palo Alto Research Center
[4] Dynamic Host Configuration Protocol, RFC 2131
[5] The IP protocol suite generally uses the hardware MAC addresses. However, the MAC address can be changed, and other protocol suites do make use of this feature. In fact, the original Ethernet specification reserves a range of MAC addresses (AA-00-04-00-xx-xx), allocated to Digital, for use by the DECnet protocol suite. Actual MAC address block assignments are managed by the IEEE.
[6] ares_op$REQUEST
[7] “ARP Spoofing Tools” in “ARP Spoofing” on Wikipedia
[8] A common, perhaps overwhelmingly common misusage; the correct usage, from military communications, would be “unsecure”. “Insecure” refers to a psychological condition. From RFC 1531, Section 7.
[9] APIPA is Microsoft’s implementation of RFC 3927, “Dynamic Configuration of IPv4 Link-Local Addresses”.

Robert Gezelter, CDP