Computers and other digital devices have utterly transformed the manner in which people create, store and communicate information. This transformation is so complete that in many respects the technology is forgotten. Email is sent effortlessly. Text messages, blogs, Facebook pages and tweets are created, updated, and sent in unimaginable numbers. Any fleeting inquiry or desire lasts no longer than it takes to query Google. Photographs are organized, banking is completed, and calendars are updated with only a click.
The ubiquity and solitary nature of these activities foster a sense of privacy or anonymity. Surely the intemperate email or embarrassing picture is gone for good when you hit delete, and the web page you viewed is no more once your browser is closed.
The reality is that nothing could be further from the truth.
In parallel with the relentless increase in computer speed and power there have been enormous increases in storage capacity and increasingly sophisticated operating systems and computer software.
In 1999 you could purchase a 19 gigabyte hard drive for $512. In 2009 a 1,500 gigabyte model costs $150. In ten years the cost of digital storage has declined from $26.95 per gigabyte to $0.10 per gigabyte. A similar explosion of capacity has occurred for the sort of flash memory storage used in digital cameras, cell phones, and USB thumb drives.
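The per-gigabyte figures above follow directly from the quoted prices. A short check of the arithmetic, using only the numbers given in the text (the function and variable names are illustrative):

```python
def cost_per_gigabyte(price_dollars: float, capacity_gb: float) -> float:
    """Return the price of one gigabyte of storage in dollars."""
    return price_dollars / capacity_gb


# Prices and capacities as quoted in the text.
cost_1999 = cost_per_gigabyte(512, 19)
cost_2009 = cost_per_gigabyte(150, 1500)

# How many times cheaper a gigabyte became over the decade.
decline_factor = cost_1999 / cost_2009

print(f"1999: ${cost_1999:.2f} per gigabyte")
print(f"2009: ${cost_2009:.2f} per gigabyte")
print(f"Decline: roughly {decline_factor:.0f}-fold")
```

On these figures a gigabyte cost roughly 270 times less in 2009 than it did in 1999.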
When storage capacity was both expensive and scarce, information was not typically stored in a hard drive or other medium without a direct request by the user. When you directed a computer to save or copy something, it would do so. Without an express request to save data, it would be lost when you turned a computer off. This was consistent with our experience with mechanical devices and other technology; you would not expect a typewriter or photocopier to make an extra copy of a letter and store it in the machine without your knowledge.
Modern computers and other electronic devices have taken advantage of this storage capacity in a variety of ways that are not transparent to ordinary users. Modern photocopiers, for example, are no longer optical mechanical devices. They are, in fact, digital scanners connected to laser printer engines. They contain embedded computers and, indeed, hard drives. This is why modern photocopiers no longer require mechanical collators. They simply scan an entire multiple-page document, store it to their internal hard drive, and then print multiple collated copies.
If you pull a hard drive from a modern photocopier and conduct a forensic examination, you can expect to find hundreds of scanned images. Without any request, and without your knowledge, your letters have been duplicated and stored electronically.
Similarly, as you use a web browser, it will, by default, store copies of each page you visit. This process, called caching, is designed to speed up the loading of web pages. If you return to a site you have already visited it will load more quickly without the need to retrieve it from the Internet.
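The principle behind caching can be sketched in a few lines. This is a simplified illustration, not how any particular browser is implemented: the class, the stand-in fetch function and the example addresses are all hypothetical, and a real browser writes its cache to disk rather than holding it in memory.

```python
from typing import Callable, Dict, List


class PageCache:
    """Toy cache: keeps a copy of every page the first time it is loaded."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                  # function that retrieves a page
        self._stored: Dict[str, str] = {}    # address -> stored copy
        self.network_requests = 0            # how often the network was used

    def load(self, url: str) -> str:
        # Only go to the network if no stored copy exists.
        if url not in self._stored:
            self.network_requests += 1
            self._stored[url] = self._fetch(url)
        return self._stored[url]

    def stored_urls(self) -> List[str]:
        """List every address for which a copy is being kept."""
        return sorted(self._stored)


def fake_fetch(url: str) -> str:
    # Stand-in for a real network request (an assumption for illustration).
    return f"<html>contents of {url}</html>"


cache = PageCache(fake_fetch)
cache.load("http://example.com/a")
cache.load("http://example.com/a")   # second visit is served from the copy
cache.load("http://example.com/b")

print(cache.network_requests)
print(cache.stored_urls())
```

The second visit to the same address costs no network request, which is the intended benefit. The side effect is that the stored copies amount to a record of every page visited, kept without any request from the user.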
Even files that are saved as a result of an overt request will often include significant amounts of information not immediately apparent to the file’s creator. This additional information is generically referred to as “metadata.” It will include information such as the name of the user that created the file, the date of creation, and in some cases, previous versions of the file or, in the case of photographs, serial number information from the camera used or even GPS information if the device used to capture a picture was so equipped.
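Some of this information is kept by the operating system itself and can be read with a few lines of code. The sketch below uses Python's standard `os.stat` call on a temporary file created for the purpose; the file name and the `describe_file` helper are illustrative. It shows only file-system metadata. Metadata embedded inside a file, such as a document's author or a photograph's GPS coordinates, requires tools specific to the file format.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Collect metadata the file system records about a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        # Time of the last change to the file's contents, in UTC.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Numeric identifier of the owning account (meaningful on Unix-like systems).
        "owner_id": info.st_uid,
    }


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "letter.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Dear Sir")
    details = describe_file(path)

for name, value in details.items():
    print(f"{name}: {value}")
```

The author of the letter supplied none of this information; it was recorded automatically when the file was written.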
The amount of data susceptible to forensic examination is further increased because deleting a file on a computer or other device does not usually remove the underlying data. Typically the index entry for the “deleted” file is simply changed to indicate that the space previously occupied by the file is now available for use.
As the capacity of hard drives and other storage media has grown, the probability that the space previously occupied by a “deleted” file will be overwritten has declined. It is not uncommon for files that were “deleted” years ago to be readily recoverable by use of specialized forensic software. The same is generally true for hard drives that have been formatted: the formatting process does not ordinarily overwrite or remove the underlying data.
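A toy model makes the distinction between the index and the underlying data concrete. Everything in this sketch is hypothetical: the `ToyDisk` class, its one-block-per-file layout and its policy of reusing the lowest-numbered free block are simplifying assumptions, and real file systems are far more elaborate.

```python
class ToyDisk:
    """Toy storage device: an index of file names plus numbered data blocks."""

    def __init__(self, block_count: int) -> None:
        self.blocks = [b""] * block_count     # the underlying data
        self.index = {}                       # file name -> block number
        self.free = set(range(block_count))   # blocks available for use

    def save(self, name: str, data: bytes) -> None:
        block = min(self.free)                # assumed policy: lowest free block
        self.free.remove(block)
        self.blocks[block] = data             # overwrites whatever was there
        self.index[name] = block

    def delete(self, name: str) -> None:
        # Only the index changes; the block's contents are left untouched.
        block = self.index.pop(name)
        self.free.add(block)

    def carve(self) -> list:
        """Return leftover data found in blocks that are marked as free."""
        return [self.blocks[b] for b in sorted(self.free) if self.blocks[b]]


disk = ToyDisk(block_count=4)
disk.save("letter.txt", b"intemperate email")
disk.delete("letter.txt")

still_listed = "letter.txt" in disk.index
recovered_before = disk.carve()

# Saving a new file happens to reuse the same block and overwrites the old data.
disk.save("memo.txt", b"new memo")
recovered_after = disk.carve()

print(still_listed, recovered_before, recovered_after)
```

After deletion the file no longer appears in the index, yet its contents can still be read from the free block. They disappear only when a later save happens to reuse that block. On a larger disk with many free blocks, that reuse becomes correspondingly less likely.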
Beyond the obvious privacy concerns that arise as a result of the automated storage of vast quantities of information in a way that makes it extremely difficult for an ordinary user of technology to expunge, there is a significant risk of improper inferences being drawn from forensically recovered data.
One of the most common, but erroneous, inferences arises from the tendency to apply principles applicable to physical objects to digital information. In the case of a physical object located in a person’s home or office, it may be reasonable to draw some inference that they were aware that the object was present and that they had some means of controlling it. Depending on the location and form of data on a computer, similar inferences may be wholly inappropriate.
Erroneous inferences relating to the physical location of users and data are also common. These errors can be exacerbated by the common analogies used by forensic experts in an effort to explain the operation of computers and the organization of files. Analogizing computer file systems to index card systems, for example, calls to mind physical books and items stored in a location accessible only to people in the imaginary library. Electronic information is of an entirely different nature.
A user of a specific computer connected to the Internet may access and store files not only on that computer, but in innumerable other physical locations. The advent of the data center and the capacity to run applications remotely over the Internet disassociate the user and the location. Conversely, the location of the physical device on which data is stored does not determine who put the data on the device or where they were located when this occurred. A true user of a computer is not necessarily the person who is found sitting in front of it. Equally, the person sitting in front of a computer may well be utilizing some other computer system located anywhere else in the world.
What then does it mean when a file is “found” on a particular computer? It means very little without much more analysis.
The capacity for the legitimate remote use of computer resources is further complicated by the wide range of ways in which they are accessed illegitimately. Every week Microsoft and other software companies release patches designed to plug security vulnerabilities in their software. They can only patch what they become aware of, and these patches are only effective if installed. Similarly, virus checking software is constantly updated as new threats are located. Again, these updates are only effective once a threat is identified and a countermeasure can be developed.
Computer viruses, Trojans, and back door software have advanced far beyond their original experimentation and digital graffiti purposes. They have become commercial enterprises. One of the common commercial uses of such software is in the creation of “botnets.” These involve the surreptitious use of large numbers of infected personal computers for purposes such as the sending of spam email. Recent botnets are estimated to have taken control of millions of computers at a time.
Once a computer has been compromised, the person or group responsible will typically have the capacity to utilize all of the resources and storage of the infected machine without the owner being aware of the activity. The creation of these networks has now been contracted out in the sense that access to them can be rented or purchased once they are established. The software used to establish these networks has advanced to the point that it will disguise itself to avoid detection or disable virus checking software that would identify it. The interface for the software can be as easy to use as any commercial software you might purchase.
One of the most serious concerns for network administrators responsible for security in large organizations is the prospect of custom-written software designed to compromise computers or steal information. Such software is now available for purchase on-line. A custom-written Trojan is, by its nature, extremely difficult to detect. Virus checking software and security patches are premised on discovering a threat and then either looking for it again or plugging its means of access. If a virus or Trojan is purpose-built and used once, detection is extremely difficult. The careful monitoring of things such as the amount or type of network traffic might reveal a problem, but this is far from reliable.
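The limitation of detection based on known threats can be illustrated with a deliberately simplified scanner. This is a sketch under stated assumptions, not a description of any real product: it recognises threats only by an exact SHA-256 fingerprint, whereas commercial software also uses other techniques, and the payloads are placeholder byte strings.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest used here as a file's signature."""
    return hashlib.sha256(data).hexdigest()


def scan(data: bytes, signatures: set) -> bool:
    """Report True only if the data matches a previously catalogued threat."""
    return fingerprint(data) in signatures


# Placeholder contents standing in for real files (assumptions for illustration).
known_threat = b"previously catalogued malicious payload"
custom_trojan = b"purpose-built payload used only once"

# The scanner knows only about threats that have already been discovered.
signature_database = {fingerprint(known_threat)}

known_detected = scan(known_threat, signature_database)
custom_detected = scan(custom_trojan, signature_database)

print(known_detected, custom_detected)
```

The catalogued threat is flagged and the purpose-built one is not. A clean result therefore shows only that nothing on the list was found, not that the machine is free of malicious software.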
An unsophisticated forensic examination of a computer can result in an unsupportable conclusion that a machine has not been compromised, based on either the presence of previously installed virus checking software or a clean report from a virus scan conducted at the time of examination. Neither is a reasonable basis for such a conclusion. Malicious software may have disabled the defenses on the computer being examined, or it may not yet have been identified and so cannot be located by the software used by the examiner. Alternatively, malicious software may have removed itself from the computer, leaving behind only the files that it was responsible for creating. Modern Trojan software typically has the ability to remove the original file that delivered it and to remove itself once it has fulfilled its intended purpose.
In order to bring the required degree of scrutiny to bear when confronted with digital evidence, a thorough and nuanced understanding of the underlying technology is essential. Without this, there is a very real probability that the wrong inferences will be drawn. While retaining a non-legally trained computer forensics expert to review an initial investigation is often a necessary first step, it will often be insufficient on its own. Cross-examining a computer forensics expert without having subject matter expertise oneself is similar to cross-examining a witness through an interpreter: it is difficult and usually ineffective.