Electronic Discovery and Digital Forensics: The Applications Front

The sheer volume of electronically stored documents often obscures the actual business data stored on information systems. Digital forensics and electronic discovery (e-discovery) procedures encompass the full spectrum of digital information; in the legal community, electronic data is known as “electronically stored information” (ESI). The volume of documents, presentations, spreadsheets, and similar electronic analogs of paper documents has spawned a huge need to collate and analyze data. In this sense, the “paperless” office has produced a blizzard of electronic documents for analysis. Amid this blizzard of standard-format documents, the actual contents of various information systems are often underappreciated. This should not be so. Information systems, whether custom or packaged, are an important source of original raw data about a business. Abstracted documents, whether memoranda or invoices, are derivative forms based upon that raw information.

Recently, I published “Digital Forensics and E-Discovery on OpenVMS,” about how OpenVMS system managers should prepare for requests for digital data, specifically data in formats not understood by procedures built for the mass market. This is not an OpenVMS-specific problem; the same problem arises on any computer system running software that is not among the “Top-200” applications whose formats are supported by the major digital forensics and e-discovery packages. It exists on mass-market systems, including Microsoft's Windows family, Apple's OS X, and all of the UNIX variants including Linux, as well as on enterprise-class systems such as HP's OpenVMS and IBM's z/OS. Many applications on all of these systems, and the file systems of the enterprise-class systems, are beyond the capabilities of standard forensic and e-discovery packages. This should be unsurprising.

There is an almost limitless variety of application software in use today. Some applications are mass-market, appealing to a broad swath of the market. Others are niche applications that may be extremely popular within a particular industry or sector, yet all but unknown outside of it (e.g., Mathworks' MATLAB).

Even more specific are application software and business systems implemented for an individual enterprise. While many such systems are variations on a theme, presumptions can be extremely misleading. Mass-market systems (e.g., Microsoft's Word and Excel) target a least common denominator across a wide market; their data formats and representations aim for universality and are correspondingly portable between firms. By contrast, software developed for in-house use is built for the needs of an individual organization as those needs were perceived when the software was implemented. This is a significant difference.

Viewed from a business records perspective, the conclusion is almost inescapable: records stored in a custom format are as relevant as the corresponding records stored in a mass-market electronic format (e.g., QuickBooks) or in hardcopy ledger books. Concluding otherwise would be absurd.

It follows that some of the most critical data within an organization will be stored in files whose format and organization are beyond the decoding capabilities of the standard software suites used in electronic discovery or digital forensics.

Consider the single record illustrated below:

Smith John 11401 33 17

Taken in isolation, the meanings of the individual fields within this record cannot be determined. The names could be a person's first and last names (or vice versa); the numeric values could be anything from ZIP (postal) codes to arbitrary indicators. Without the full context, including application programs, related data files, and other information, conclusions can be difficult to reach or misleading.
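To make the ambiguity concrete, the minimal Python sketch below splits the record above and labels its fields under two layouts. Both layouts, and every field name in them, are hypothetical and invented purely for illustration; nothing in the record itself favors one over the other.

```python
# Two hypothetical record layouts for the same line of text. Neither layout
# comes from any real system; they exist only to show that the bytes alone
# do not determine meaning.
RECORD = "Smith John 11401 33 17"

LAYOUT_A = ("last_name", "first_name", "zip_code", "age", "years_employed")
LAYOUT_B = ("first_name", "last_name", "account_no", "region", "shelf")


def interpret(record: str, layout: tuple[str, ...]) -> dict[str, str]:
    """Split a whitespace-delimited record and label each field per layout."""
    fields = record.split()
    if len(fields) != len(layout):
        raise ValueError(f"expected {len(layout)} fields, got {len(fields)}")
    return dict(zip(layout, fields))


if __name__ == "__main__":
    for name, layout in (("A", LAYOUT_A), ("B", LAYOUT_B)):
        print(name, interpret(RECORD, layout))
```

Both interpretations succeed and look plausible. Only outside context, such as the programs that write and read the file, their documentation, and related files, can say which layout (if either) is correct.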

Some of this contextual information is maintained by the underlying operating system; other information is contained within the file but is not normally visible. Both classes of information are often referred to as “metadata.” Producing information in its native form is intended to preserve all of the associated metadata. Native formats, including metadata, are preferred, a fact recently noted by US District Court Judge Shira A. Scheindlin in National Day Laborer Organizing Network v. U.S. Immigration and Customs Enforcement Agency.[1]
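Operating-system metadata lives outside a file's contents, so it can be lost even when every byte is copied faithfully. The sketch below, using only Python's standard library, sets an arbitrary example modification time on a file and compares two copy functions: shutil.copy copies the data and permission bits, while shutil.copy2 also attempts to copy timestamps. It is an illustration under those assumptions, not a collection procedure; Python's own documentation notes that even shutil.copy2 cannot copy all file metadata (on POSIX platforms, owner, group, and ACLs are lost).

```python
import os
import shutil
import tempfile
from pathlib import Path

# Illustrative only: an operating-system attribute (the modification time)
# is stored outside the file's contents and can be lost by a careless copy.
# The timestamp is an arbitrary example value (2001-09-09 01:46:40 UTC).
EXAMPLE_MTIME = 1_000_000_000  # seconds since the POSIX epoch


def copy_and_report(copier) -> int:
    """Copy a file whose mtime is EXAMPLE_MTIME; return the copy's mtime in ns."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Event.doc"
        dst = Path(tmp) / "Event-copy.doc"
        src.write_bytes(b"Janson killed it.")
        os.utime(src, (EXAMPLE_MTIME, EXAMPLE_MTIME))  # (atime, mtime)
        copier(src, dst)
        return os.stat(dst).st_mtime_ns


def main() -> None:
    original_ns = EXAMPLE_MTIME * 1_000_000_000
    # shutil.copy copies contents and permission bits; the copy's mtime is "now".
    print("copy  preserved mtime:", copy_and_report(shutil.copy) == original_ns)
    # shutil.copy2 additionally calls copystat(), which copies the timestamps.
    print("copy2 preserved mtime:", copy_and_report(shutil.copy2) == original_ns)


if __name__ == "__main__":
    main()
```

The file contents are identical in both copies; only the surrounding metadata differs, which is exactly the kind of information native production is meant to preserve.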

Understanding the meanings of data and metadata can be challenging, a dilemma long familiar to the intelligence community. One of the most famous examples is the message sent to Vice Admiral Chuichi Nagumo, IJN, commander of the First Air Fleet, and to other Imperial Japanese Navy commanders before the attack on Pearl Harbor:

“NIITAKA YAMA NOBORE 1208”;

rendered in English as

“Climb Mt. Niitaka 1208.”

In retrospect, this was clearly the execute order for the opening of hostilities between the Japanese Empire and the United States, a fact noted by a translator more than four years later, in 1945. Examined in isolation, without 20/20 hindsight, it gives no hint of its true meaning.[2,3] Metaphorically, climbing the highest mountain in Imperial Japan could be construed as significant, but that connection is far clearer in retrospect. As Sigmund Freud is reported to have said, “Sometimes a cigar is just a cigar.”[4,5]

Distinguishing between such custom data and mass-market formats is important, although mass-market applications face a similar problem on a different level. Consider Event.doc, a sample document in a standard mass-market format, for example one created using Microsoft Word:

Event.doc:

“Janson killed it.”

While the format is completely consistent with a specific version of the Microsoft Word document format, the meaning is far more obscure: this simple sentence could carry any number of quite different meanings, literal or figurative.

Without fully understanding the context of the material, the precise meaning is unclear.

None of this affects the reliability or accuracy of in-house developed software, which presumably does its job with an understanding of the precise recording conventions of its data. The problem arises when the stored data is examined without a thorough (or with a mistaken) understanding of how it is used and what it means. The effect is similar to encountering a new language: it may share elements with related languages, but there is no guarantee that such parallels hold consistently across its full breadth.
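Recording conventions can be as simple as the choice of epoch for a stored date. The Python sketch below assumes a hypothetical custom application that stores dates as an integer count of days; the stored value is an arbitrary example. Decoding the same integer against two plausible epochs, 17-Nov-1858 (the base date of OpenVMS system time) and 1-Jan-1970 (the POSIX epoch), yields dates more than a century apart, and nothing in the integer says which is right. (OpenVMS itself records time in 100-nanosecond units since its base date, not in days; the day count here is a simplification.)

```python
from datetime import date, timedelta

# Hypothetical convention: a custom application stores a date as an integer
# count of days since some epoch. The integer alone does not say which epoch.
CANDIDATE_EPOCHS = {
    "OpenVMS base date (17-Nov-1858)": date(1858, 11, 17),
    "POSIX epoch (1-Jan-1970)": date(1970, 1, 1),
}


def decode_day_count(days: int, epoch: date) -> date:
    """Interpret a stored day count relative to the given epoch."""
    return epoch + timedelta(days=days)


if __name__ == "__main__":
    stored_value = 55600  # an arbitrary example value read from a record
    for label, epoch in CANDIDATE_EPOCHS.items():
        print(f"{label:32} -> {decode_day_count(stored_value, epoch).isoformat()}")
```

Either decoding produces a valid calendar date, so a mistaken convention fails silently rather than with an error.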

Such questions frequently arise with regard to information systems. Business data stored in an organization's information systems can be vital in assessing any number of issues; accounting data (from revenues to expenditures) and the precise times and locations of individual transactions are examples. Ensuring that this information is safeguarded and that access to it is preserved should be an important part of IT planning.

As I noted in my OpenVMS Consultant installment, this is a subtle yet very important point. Requirements to produce electronically stored information also create a concomitant need to both understand and preserve the context surrounding the information. The most reliable form of this information is not printed reports or their electronic analog. The raw data in the various files and databases used on a day-to-day basis in the normal course of business is far more detailed and accurate. An example is the difference between a normal mobile phone invoice and the so-called “tower log,” which indicates precisely which towers a cellular telephone used to complete a call.
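The tower-log example illustrates a general property: a summary is a many-to-one function of the raw data, so details discarded in summarizing cannot be recovered from the summary. The Python sketch below uses invented call-detail records (the record layout, field names, phone numbers, and tower identifiers are all hypothetical) and an invoice-style summary that totals minutes per number called. Two different sets of raw records, handled by different towers at different times, produce identical invoices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One hypothetical raw call-detail record; all fields are illustrative."""
    number_called: str
    start: str  # ISO 8601 timestamp of call start
    minutes: int
    tower_id: str


def invoice_summary(records: list[CallRecord]) -> dict[str, int]:
    """Collapse raw records into invoice-style total minutes per number called."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.number_called] += record.minutes
    return dict(totals)


# Two hypothetical sets of raw records that differ only in time and tower.
RAW_A = [
    CallRecord("555-0100", "2011-02-07T09:15:00", 12, "T-17"),
    CallRecord("555-0100", "2011-02-07T17:40:00", 3, "T-17"),
]
RAW_B = [
    CallRecord("555-0100", "2011-02-07T11:05:00", 12, "T-42"),
    CallRecord("555-0100", "2011-02-07T22:30:00", 3, "T-08"),
]

if __name__ == "__main__":
    print("raw records identical:", RAW_A == RAW_B)
    print("invoices identical:   ", invoice_summary(RAW_A) == invoice_summary(RAW_B))
```

Only the raw records can answer a question such as which tower handled a given call; the invoice cannot.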

The distinction is significant. Precisely this type of problem arose in a litigation matter in which I was a consultant to the attorneys. The central question in assessing damages concerned the warranty claims recorded in a database. The warranty system was a series of custom programs written specifically to support the business's operations. The defendant claimed that the warranty database was “unreliable.” I performed a detailed review of the database records to determine the validity of the stored information. In the end, after much research, I was able to account for each of the phenomena that had raised questions, refuting the “unreliability” claims. I understand that my client was then able to negotiate a favorable settlement.

Attorneys and information technologists need to cooperate to identify relevant data, and then to ensure that both the raw data and the technological context needed to understand the data files are preserved with the necessary completeness and with the safeguards needed to protect all interests, those of the parties themselves and of otherwise uninvolved third parties.

Notes

[1] Shira A. Scheindlin, USDJ (2011, February 7). Opinion and Order, National Day Laborer Organizing Network v. U.S. Immigration and Customs Enforcement Agency, 10 Civ. 3488.
[2] Edwin Layton (1985). And I Was There... p. 242.
[3] Ibid., p. 528.
[4] Clifton Fadiman (1985). The Little, Brown Book of Anecdotes. Little, Brown and Company.
[5] Ashton Applewhite, Tripp Evans, Andrew Frothingham (2003). And I Quote: The Definitive Collection of Quotes, Sayings, and Jokes for the Contemporary Speechmakers. Macmillan, p. 224.

Robert Gezelter, CDP
Robert Gezelter Software Consultant
http://www.rlgsc.com
+1 (718) 463 1079