Digital Forensics and E-Discovery on OpenVMS

Litigation and legal processes are increasingly part of the general IT landscape. Digital forensics and electronic discovery (often referred to as “e-discovery”) have become inescapable facts of business life. OpenVMS system managers need to proactively develop the plans, processes, and procedures required to respond to these inevitable legal process requests. Handling these requests correctly minimizes the impact on production systems; handling them incorrectly can expose the organization to significant liability.

Many digital forensics and electronic discovery practitioners are not prepared to deal with server systems. The forensics and electronic discovery community is far more familiar with personal systems (e.g., Microsoft Windows® and various UNIX® variants) and their related server systems. Unfamiliarity with OpenVMS and other server systems is not an excuse when a legal process request is received; rather, it creates a potential for liability.

The near miss of Morgan Stanley & Co. Inc. v. Coleman (Parent) Holdings Inc. should give pause. Morgan Stanley misreported the existence of backup tapes to the court and then failed to extract relevant electronic messages from those tapes. The court in effect ordered a directed verdict against Morgan Stanley for US$1.58 billion. The verdict was later overturned on an unrelated issue. This was far too close for comfort.

In the OpenVMS context, the challenge is that the tools and procedures commonly used for digital forensics and electronic discovery match neither the hardware nor the software of the OpenVMS environment. The standard digital forensics tools (e.g., Guidance Software’s EnCase® and AccessData’s FTK®) do not directly support the OpenVMS on-disk file structures (ODS-2 and ODS-5), nor do they directly analyze the contents of typical native file formats. Their standalone versions also do not run on the processors and other hardware used with OpenVMS. This is not a surprise: the sheer volume of court cases involving personal computers and smaller systems shapes what vendors offer. However, the courts have accepted these tools and their use, whereas the use of non-recognized tools must be justified on a case-by-case basis. Custom programming is even more of a challenge.

It takes little to compromise the integrity of digital evidence. It is even easier to disrupt operations in the name of protecting and gathering digital evidence. However, neither corruption nor disruption is a foregone conclusion. With thought, care, and preparation, it is possible to preserve digital evidence without interrupting or otherwise impairing operations. This concern is all the more serious with server systems, particularly those with 24x7x366 uptime commitments. Desktop and personal systems, by comparison, have far wider windows for processing.

A process can be established that allows digital evidence to be acquired from an OpenVMS system and safeguarded. Done carefully, the steps that follow secure the data in a defensible manner, to a standard that should suffice even for the higher standards required in criminal cases.

Many, if not most, OpenVMS systems with high uptime requirements already use Volume Shadowing for OpenVMS (also referred to as host-based shadowing). Similarly, normal backup procedures often rely on temporary shadow set members to minimize downtime. Since all members of a synchronized shadow set are bit-for-bit identical, a copy of a single member is adequate for most forensic purposes. Even deleted file analysis and slack space analysis can be performed, to the extent permitted by the stricter security settings (such as erase-on-delete and high-water marking) often used on OpenVMS systems.
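
The claim that synchronized members are bit-for-bit identical can be checked cheaply once two members have been imaged. The following is a minimal sketch, assuming each member has already been copied to an ordinary file on an analysis host; the function names and paths are hypothetical and not part of any OpenVMS tool. It walks both images in 512-byte blocks (the OpenVMS disk block size) and reports the first block that differs.

```python
import io
from typing import BinaryIO, Optional

BLOCK_SIZE = 512  # OpenVMS disk blocks are 512 bytes


def first_differing_block(a: BinaryIO, b: BinaryIO,
                          block_size: int = BLOCK_SIZE) -> Optional[int]:
    """Return the number of the first block that differs between two raw
    images, or None if they are bit-for-bit identical (same length too)."""
    block = 0
    while True:
        chunk_a = a.read(block_size)
        chunk_b = b.read(block_size)
        if chunk_a != chunk_b:
            return block  # content or length differs in this block
        if not chunk_a:
            return None  # both images ended at the same point
        block += 1


def compare_member_images(path_a: str, path_b: str) -> Optional[int]:
    """Compare two image files (hypothetical paths) block by block."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return first_differing_block(a, b)


if __name__ == "__main__":
    member_1 = io.BytesIO(b"\x00" * 1024)
    member_2 = io.BytesIO(b"\x00" * 512 + b"\x01" + b"\x00" * 511)
    print(first_differing_block(member_1, member_2))  # prints 1
```

Reporting the block number, rather than a bare yes/no, gives the examiner something concrete to record if a member turns out not to have been fully synchronized.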

Once fully synchronized shadow set members have been sequestered, they can be imaged. A conventional BACKUP is not the best choice here, because conventional backups do not preserve unallocated space. The popular forensic tools used on personal platforms generally cannot run on the hardware that supports OpenVMS, but BACKUP itself offers a better alternative: BACKUP/PHYSICAL.

BACKUP/PHYSICAL produces an exact block-by-block copy of a disk volume, ignoring the file structure entirely. The precise contents of the disk are preserved, an approach that closely parallels the imaging conventions used with mass-market systems and accepted by the courts and the legal community. By default, the BACKUP saveset format also includes a CRC for each data block, which guards against hardware errors.
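
The value of a per-block CRC is that damage can be localized to a specific block rather than merely detected somewhere in the image. The sketch below illustrates that idea only: it uses zlib's CRC-32 over 512-byte blocks, which is an assumption for illustration and does not reproduce the CRC algorithm or record layout of the actual BACKUP saveset format. The function names are hypothetical.

```python
import zlib
from typing import List

BLOCK_SIZE = 512


def block_crcs(image: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Compute a CRC-32 for each block of a raw image (the last block may be short)."""
    return [zlib.crc32(image[i:i + block_size])
            for i in range(0, len(image), block_size)]


def corrupted_blocks(image: bytes, recorded: List[int],
                     block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the numbers of blocks whose CRC no longer matches the recorded value."""
    current = block_crcs(image, block_size)
    if len(current) != len(recorded):
        raise ValueError("image length differs from the recorded block count")
    return [n for n, (now, then) in enumerate(zip(current, recorded)) if now != then]


if __name__ == "__main__":
    original = bytes(range(256)) * 8          # 2048 bytes = 4 blocks
    crcs = block_crcs(original)
    damaged = bytearray(original)
    damaged[600] ^= 0xFF                      # flip one byte inside block 1
    print(corrupted_blocks(bytes(damaged), crcs))  # prints [1]
```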

Once the BACKUP/PHYSICAL saveset has been created, it can be reduced in size by wrapping it within a ZIP archive. The limits of the Info-ZIP format and code restrict the size of savesets that can be handled this way, but there are ways to restructure the problem, albeit sometimes laborious ones. A cryptographic checksum of each BACKUP saveset can also be computed and recorded at this time.
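
Hashing the saveset before it is wrapped means the recorded value describes the saveset itself, independent of how it is later packaged. The following is a minimal sketch, assuming the saveset is reachable as an ordinary file from a host running Python; it uses Python's `zipfile` module as a stand-in for Info-ZIP, and SHA-256 as the cryptographic checksum. File names and function names are hypothetical.

```python
import hashlib
import os
import tempfile
import zipfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly very large) saveset through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wrap_saveset(saveset_path: str, zip_path: str) -> str:
    """Hash the saveset, then store it deflated in a ZIP archive.

    Returns the hex SHA-256 of the saveset so it can be recorded.
    allowZip64=True lets Python's zipfile go past the classic 4 GiB ZIP
    limits; other ZIP implementations may impose their own limits.
    """
    saveset_digest = sha256_of_file(saveset_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        zf.write(saveset_path, arcname=os.path.basename(saveset_path))
    return saveset_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        saveset = os.path.join(work, "DKA100.BCK")  # hypothetical saveset name
        with open(saveset, "wb") as f:
            f.write(os.urandom(4096))
        recorded = wrap_saveset(saveset, os.path.join(work, "DKA100.ZIP"))
        print("SHA-256 of saveset:", recorded)
```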

This ZIP file, which contains its own internal checksums (a CRC-32 for each member), can then be transferred to a Microsoft Windows® system running EnCase® software and incorporated into an EnCase® Logical Evidence file, which adds its own cryptographic checksums. In effect, this procedure uses the EnCase® Logical Evidence file as a digital vault.
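
Before the archive is added to the evidence file, both layers of checks can be exercised on the receiving system: the ZIP member CRCs, and the saveset checksum recorded on the OpenVMS side. A minimal sketch, assuming Python is available on the receiving host and that SHA-256 was the recorded checksum; `verify_transferred_zip` and the member name are hypothetical.

```python
import hashlib
import io
import zipfile


def verify_transferred_zip(zip_file, member: str, recorded_sha256: str) -> bool:
    """Check a transferred archive (a path or binary file object).

    Raises ValueError if any member fails its ZIP CRC-32 check; otherwise
    returns whether the named member still matches the recorded SHA-256.
    """
    with zipfile.ZipFile(zip_file) as zf:
        bad = zf.testzip()  # re-reads every member and checks its stored CRC-32
        if bad is not None:
            raise ValueError(f"CRC mismatch in archive member {bad!r}")
        digest = hashlib.sha256()
        with zf.open(member) as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest() == recorded_sha256


if __name__ == "__main__":
    payload = b"saveset bytes" * 100
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("DKA100.BCK", payload)
    expected = hashlib.sha256(payload).hexdigest()
    print(verify_transferred_zip(archive, "DKA100.BCK", expected))  # prints True
```

Checking both layers separates two different failure modes: a CRC failure points to damage in transit or storage, while a digest mismatch with intact CRCs points to the wrong file having been archived.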

With the recorded cryptographic checksums and process records, counsel can prepare an affidavit documenting the provenance of the data stored in the EnCase® Logical Evidence file, including its cryptographic checksums and the process steps used to extract it from the running system. If need be, this affidavit can be filed with a neutral third party as a documentary record of the authenticity of the underlying data.
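
Process records are most useful when later tampering with them would be evident. One common technique, shown here as an assumption rather than as part of the procedure above, is a hash-chained log in which each record includes the SHA-256 of the record before it. The record fields and function names below are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List


def _entry_hash(entry: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def append_step(log: List[Dict[str, str]], step: str, artifact_sha256: str,
                operator: str) -> Dict[str, str]:
    """Append a process record that commits to the previous record's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "artifact_sha256": artifact_sha256,
        "operator": operator,
        "previous": _entry_hash(log[-1]) if log else "",
    }
    log.append(entry)
    return entry


def chain_is_intact(log: List[Dict[str, str]]) -> bool:
    """True if every record still matches the hash stored in its successor.

    Detects alteration of any record that has a successor; it cannot by
    itself detect truncation of the final records.
    """
    return all(log[i]["previous"] == _entry_hash(log[i - 1])
               for i in range(1, len(log)))
```

For example, entries might be appended for the BACKUP/PHYSICAL step, the ZIP wrapping, and the addition to the evidence file, each carrying the checksum of the artifact produced; the final record's hash can then be quoted in the affidavit.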

Data analysis can then proceed from the baseline provided by the “vaulted” copy of the storage volumes. Every analytical step can be traced back to the stored copy of the data. If a dispute arises, the analysis can be repeated, step by step, from the verified original or a duplicate.
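
Tracing each analytical step back to the vaulted copy can be made mechanical: refuse to run a step unless its input matches the recorded checksum, and record a checksum of its output so that a repeated run can be compared with the original. A minimal sketch under those assumptions; `run_traceable_step` and the record fields are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_traceable_step(name: str, analysis: Callable[[bytes], bytes],
                       source: bytes, vaulted_sha256: str) -> Dict[str, str]:
    """Run one analysis step against a copy of the vaulted data.

    Refuses to run if the copy no longer matches the checksum recorded when
    the data was vaulted, and records input and output checksums so the
    step can be repeated and its result compared.
    """
    if sha256_hex(source) != vaulted_sha256:
        raise ValueError("source does not match the vaulted checksum")
    result = analysis(source)
    return {"step": name,
            "input_sha256": vaulted_sha256,
            "output_sha256": sha256_hex(result)}


if __name__ == "__main__":
    image = bytes(range(256)) * 4
    vaulted = sha256_hex(image)
    extract_block_1 = lambda data: data[512:1024]  # a trivial, deterministic step
    first = run_traceable_step("extract block 1", extract_block_1, image, vaulted)
    again = run_traceable_step("extract block 1", extract_block_1, image, vaulted)
    print(first == again)  # prints True
```

The comparison only works for deterministic steps; a step whose output depends on time or ordering would need its parameters recorded as well.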

On mission-critical platforms such as OpenVMS, many files are created and used by large collections of purpose-written application code. Unlike the vast volumes of documents on personal platforms, which are created and accessed using mass-market applications, the meaning of data files on server platforms is likely to be far less obvious. Considered without the application programs that process them, the contents of data files may be opaque, and any interpretation of them speculative. As an example, consider the RAD50 (RADIX-50) format, popular on the 16-bit PDP-11 series for compressing filenames by packing three characters into each 16-bit word. Without knowledge of its use and construction, the stored names appear indecipherable. The programs used to create the files function as a “Rosetta Stone”, clarifying the meanings of the different fields.
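
RAD50 shows how a field that looks like noise becomes readable once the encoding is known. Each 16-bit word holds three characters from a 40-character set as c1×40² + c2×40 + c3, and the PDP-11 stores words little-endian. The sketch below assumes the common PDP-11 character assignment; the glyph for code 29 varies between systems, and rendering it as `%` is an assumption. Function names are hypothetical.

```python
import struct
from typing import List

# Code 0 is space, 1-26 are A-Z, 27 is '$', 28 is '.', 30-39 are 0-9.
# Code 29 is rendered differently (or treated as unused) on different
# systems; '%' here is an assumption.
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"


def rad50_decode_word(word: int) -> str:
    """Unpack one 16-bit word into three characters (word = c1*40^2 + c2*40 + c3)."""
    if not 0 <= word < 40 ** 3:
        raise ValueError(f"{word} is not a valid RAD50 word")
    return (RAD50_CHARS[word // 1600]
            + RAD50_CHARS[(word // 40) % 40]
            + RAD50_CHARS[word % 40])


def rad50_decode_bytes(raw: bytes) -> str:
    """Decode consecutive little-endian 16-bit words, as stored by a PDP-11."""
    count = len(raw) // 2
    words = struct.unpack(f"<{count}H", raw[:count * 2])
    return "".join(rad50_decode_word(w) for w in words)


def rad50_encode(text: str) -> List[int]:
    """Pack text into RAD50 words, padding with spaces to a multiple of three."""
    text = text.upper()
    text = text.ljust((len(text) + 2) // 3 * 3)
    words = []
    for i in range(0, len(text), 3):
        c1, c2, c3 = (RAD50_CHARS.index(ch) for ch in text[i:i + 3])
        words.append(c1 * 1600 + c2 * 40 + c3)
    return words


if __name__ == "__main__":
    raw = b"\xe7\x27"               # reads as meaningless bytes in a hex dump
    print(rad50_decode_bytes(raw))  # prints FOO
```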

This is not an imaginary hazard. I once consulted on a corporate dispute where a similar issue arose. The validity and accuracy of the plaintiff’s database was called into question: the defendant’s expert asserted that the database was unreliable. One of the reasons for this purported “unreliability” was a perceived conflict in the relationships between the values in certain database fields.

It took an extensive, in-depth review, but I was able to establish, in excruciating detail, not only that the contents of the fields were reliable, but also that the reported “inconsistency” was actually a data recording convention required by long-published industry standards. The precise relationships were needed to enable the clearing of warranty items between different sub-component manufacturers.

Thus, it is necessary to preserve not only the data and related programs, but also their respective source base(s). The meanings and semantics of data fields are not always self-evident. Preserving the sources can be a complex process, particularly if some parts of the programming include utilities or components maintained by or licensed from third parties.
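
When programs and source bases are preserved alongside the data, a manifest listing every file with its size and checksum lets the preserved set be verified later in the same way as the disk images. A minimal sketch, assuming the sources have been copied into a directory tree on an analysis host; the directory layout, file names, and functions are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import List, Tuple


def file_sha256(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> List[Tuple[str, int, str]]:
    """Return (relative path, size in bytes, SHA-256) for every file under root,
    sorted by path so that two manifests of the same tree compare equal."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, os.path.getsize(full), file_sha256(full)))
    return sorted(entries)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tree:
        # Hypothetical layout: application source next to a data file.
        os.makedirs(os.path.join(tree, "src"))
        with open(os.path.join(tree, "src", "WARRANTY.COB"), "w") as f:
            f.write("IDENTIFICATION DIVISION.\n")
        with open(os.path.join(tree, "CLAIMS.DAT"), "wb") as f:
            f.write(b"\x00" * 512)
        for rel, size, sha in build_manifest(tree):
            print(f"{sha}  {size:>10}  {rel}")
```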

The recent emphasis on “native format” for digital production makes this a particularly apropos topic. Native format is preferred for digital data, but the native formats used on OpenVMS and other server operating systems are frequently dramatically different from those used on more consumer-focused platforms. Ensuring that the data is interpreted correctly is vital to an accurate outcome when disputes arise.
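
Even an ordinary text file can differ in native format. An RMS sequential file with variable-length (VAR) records stores each record behind a 2-byte little-endian length field, with a pad byte after odd-length records so the next length starts on an even byte; there are no line terminators. Opened naively as text on a consumer platform, the length words appear as stray control characters. The sketch below assumes raw bytes of a VAR-format sequential file (not VFC or stream format) copied in image mode up to the logical end of file; `rms_var_records` is a hypothetical name.

```python
import struct
from typing import Iterator


def rms_var_records(raw: bytes) -> Iterator[bytes]:
    """Yield the records of an RMS sequential file with VAR record format.

    Each record is preceded by a 2-byte little-endian length; odd-length
    records are followed by one pad byte to keep the next count word-aligned.
    """
    pos = 0
    while pos + 2 <= len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        if pos + length > len(raw):
            raise ValueError(f"truncated record at offset {pos - 2}")
        yield raw[pos:pos + length]
        pos += length + (length & 1)  # skip the pad byte after odd lengths


if __name__ == "__main__":
    raw = b"\x05\x00HELLO\x00\x02\x00OK"
    print(list(rms_var_records(raw)))  # prints [b'HELLO', b'OK']
```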

Isolating data for use in litigation can be complex, both technically and legally. Achieving the most economical result in these situations is not merely a technical matter; it also requires professionals who understand the differing requirements and processes needed to ensure an accurate result.

Robert Gezelter, CDP
http://www.rlgsc.com
+1 (718) 463 1079