Computer Forensics Concepts
A computer is simply a programmable machine that receives input, stores and manipulates data, and provides output in a useful format. Input usually comes from a keyboard and mouse (although microphones and cameras are increasingly common); data is manipulated by the “Central Processing Unit” (the “CPU”), normally stored on a hard disk drive, and output through a monitor. This does not mean that all computers must have all these parts.
You may think of your computer as the white box sitting under your desk with the seemingly impossible to organize mess of wires coming out the back. However, it is important to realize that computers are everywhere, and, by extension, so is computer forensics. It is often the computers people do not think about that contain the most interesting and private information.
Your home is full of computers. It is possible to take meaningful forensic data from the following devices:
- Cell phones (BlackBerrys, iPhones, etc.),
- Routers (internet gateways),
- iPads and iPods,
- MP3 players,
- Watches (if equipped with calendars and contacts),
- Game consoles (Xbox, Playstation, Wii), and
- Digital phone systems.
The list will grow every year, with refrigerators, home theatre components and lighting systems all poised to store information about what users do, and (most attractively) what they buy.
The parts of a computer – what’s in that white box?
CPU (Central Processing Unit) – Sometimes called the “Brain” of a computer, this will be a microchip manufactured by companies like Intel and AMD. Calling it a “brain” is somewhat misleading, as the CPU can’t store anything, and, unlike a brain, has no memory.
RAM (Random Access Memory) – RAM consists of small computer chips specially designed to hold information in the form of electrical charges. To continue the human analogy, this would be the computer’s short term memory. If the computer is turned off, the contents of RAM are lost, just like the image on the monitor or an unsaved Word document.
Hard Disk Drive (“HDD” or “Hard Drive”) – Explained in more detail below, the hard drive is, by very rough analogy, an electromagnetic record player that stores information on spinning disks, contained in a box the size of a thick slice of toast. Most computers will have one mounted inside the white box. External hard disks are becoming more popular, and plug into the computer, providing additional storage.
Other storage – The types of storage are varied. CDs, DVDs, USB flash (thumb) drives and other items can be plugged into a computer to store information. All these devices may be important from a forensic point of view, and are often seized and examined along with the white box.
A hard disk drive (HDD) is a non-volatile (it does not lose data when you unplug it), random access (it stores data piecemeal, and not in a line like a record player) device used to store digital data. It features rotating rigid platters on a motor-driven spindle within a protective enclosure. Data is magnetically read and written on the platter by read/write heads that float on a film of air above the platters. Hard drives are extremely sensitive instruments. Physical impact, power surges, and even air pressure changes can result in their immediate failure. It is important to note, however, that although a hard drive may have physically failed, it may still be possible to pull data from the drive.
Hard disks use the magnetic read/write head to switch the millions of spots on a platter to either on or off. As will be explained in more detail later, computers, as digital devices, have only two options for a piece of data: on (represented by the number 1) and off (represented by 0). More will be discussed later about how these ones and zeroes become photos or video.
Why is the disk hard? Because it is not floppy. During the evolution of personal computers, the floppy disk was the primary source of storage. Hard disk platters are stiff, metallic objects, making them “hard”.
Introduced by IBM in 1956, hard disk drives have fallen in cost and physical size over the years while dramatically increasing capacity. Hard disk drives have been the dominant device for secondary storage of data in general purpose computers since the early 1960s. Their evolution has been dramatic.
A useful starting point for comparison is the old floppy disks that have now fallen out of use. They held roughly 1 ½ megabytes of information. A megabyte is about 600 pages of plain text worth of data, meaning the old floppies held about 1000 pages of data.
When the World Wide Web emerged in 1991, a hard drive might hold 20 megabytes of information, yet still run early versions of Microsoft Windows or Apple’s Mac software.
Today, hard disks are measured not in megabytes (“MB”) but in gigabytes (sometimes referred to as “GB” or “Gigs”), each of which is approximately 1000 megabytes. The term “approximately” is used because data is measured in powers of two, so a gigabyte is actually 1024 MB. Even the mighty gigabyte is being supplanted by the terabyte, which is approximately 1000 gigabytes.
It is entirely common to see multiple terabyte hard drives in computers. This means that when compared to the 20 megabytes of the computers of twenty years ago, modern hard drives are 100,000 times larger, and cost about the same. It would be apparent to most readers that humanity has not become 100,000 times smarter in the last twenty years, so the question arises: what is filling up these giant hard drives?
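The unit arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the 2 terabyte figure is an assumed example of a "multiple terabyte" drive, and the helper name growth_factor is invented for the illustration.

```python
# Storage units expressed as powers of two (the "1024" convention).
MB = 1024 ** 2   # bytes in one megabyte
GB = 1024 * MB   # one gigabyte is 1024 megabytes
TB = 1024 * GB   # one terabyte is 1024 gigabytes


def growth_factor(new_bytes, old_bytes):
    """How many times larger the new drive is than the old one."""
    return new_bytes / old_bytes


print(GB // MB)                                   # 1024
# Assumed example: a 2 TB drive against a 20 MB drive from 1991.
print(f"{growth_factor(2 * TB, 20 * MB):,.0f}")   # 104,858
```

With the rounded "1000" convention the same comparison comes to exactly 100,000.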
Much of the space has been claimed by huge image and video files, along with an insatiable appetite for digital music. When compared to things like emails, documents, and spreadsheets, movies, music and photos (“media files”) take up massive amounts of data. A single photo from the newest digital cameras can occupy the entire space of that 20 megabyte hard drive from 1991. High-definition movies run dozens of gigabytes each. These types of files are huge because of the complexity of images. A picture may be worth a thousand words, but it takes millions of bits of information to recreate it realistically.
However, media files are only part of the story. The reality is that computers are increasingly performing operations, collecting information and storing it in a myriad of ways that have bloated storage requirements. Part of this has resulted from the exponential increases in storage size; some software grows to fill space like the unwanted junk in a basement. The space is there, so software developers increasingly use it.
Microsoft Windows, for example, has grown in size from 5 megabytes to 10 gigabytes (10,000 megabytes) as of February 2011. While it is no doubt a superior product, the 2,000 fold increase in size is not reflected in a commensurate increase in functionality.
These numbers are not meaningless. The fact that computers become more complex with each passing year raises two significant concerns for the law:
- Computers are increasingly recording and storing the private activities of users without their knowledge; and
- The complexity and dynamic nature of computers and their software create real problems for experts.
Solid state drives (“SSD”)
These drives perform the same function, but are physically different from hard disk drives. As the name suggests, they are based on solid state storage and use microchips which retain data in non-volatile memory chips and contain no moving parts. Non-volatile memory chips are chips which do not lose their contents when the computer’s power is turned off.
Compared to traditional HDDs, SSDs are less susceptible to physical shock, quieter, and can read and write data more quickly. Some SSDs use the same interface as hard disk drives and can thus easily replace hard drives in many applications. As of 2011, SSDs are still considerably less common than hard drives; however, their use is growing, expanding as quickly as their price declines. The only barrier to the complete replacement of hard disk drives is cost. SSDs are still much more expensive than HDDs of any given size.
Solid state drives are particularly beneficial when used in laptops where their small size, low power consumption, and resistance to damage are particularly important.
The way in which data is stored on SSDs creates a number of special challenges from a computer forensics perspective. In order to maximize the lifespan and speed of SSDs, the controller chips in these drives will often rearrange data once it has been written to the drive and erase portions of the drive not currently in use so as to be ready for new information to be written quickly. As a result, some of the artifacts which would be located on an HDD may not be available when performing a forensic examination of an SSD.
Information stored on Computers
What are all these 1’s and 0’s?
We use the terms “digital music” or a “digital camera” but we rarely consider what makes something “digital”. Digital simply means any system that uses discrete (black and white) values as opposed to continuous or analog values (shades of grey). It is useful to consider the root of the word, which is derived from digitus, the Latin word for finger. Think of the computer as only understanding one finger, and it is either up or down, on or off, 1 or 0. Being an electrical device, the computer can, at its most fundamental level, only see “electrical charge” or “no electrical charge” when it processes something. These changes of state are commonly described as 1’s and 0’s.
Of course, the complex computations required to run modern computers require a much more complex language than just 1s and 0s. As early as the 17th century, mathematicians had examined the ability to represent complex information in the form of ones and zeros, so early computer engineers simply adopted these rules for computers to use. This is the code we now call “Binary”, named for the root “bi” due to the two options (on and off; 1 and 0) available.
Binary creates ever more complicated strings of information out of the 1’s and 0’s. Understanding binary is one of the more “math heavy” aspects to computer forensics, and will be intimidating to those lawyers who would have been doctors had they done better in algebra. While you will never be asked to translate 1’s and 0’s into language (that is what we have computers for in any event), it is important that you not be “dazzled” by an expert’s use of these terms. With that preface, here are the basics of binary:
It is not so very difficult, really. Binary numbers use the same rules as decimal (think of decimal as being rooted in “10”, an arbitrary number based on how many fingers we have). The value of any one digit always depends on its position in the entire number.
It all gets down to bases. Decimal uses base ten, so that every time a number moves one position to the left in a figure, it increases by a power of ten (e.g., 1, 10, 100, etc). Binary, on the other hand, uses base two, so each move to the left increases the value by a power of two (e.g., 1, 2, 4, etc).
Remember, each of those 1’s and 0’s represents an electrical “on” or “off”. A particularly handy size chunk of computer memory happens to be 8 “bits” long (a string of 8 1’s and 0’s). This size chunk of memory can be used to represent any number from zero (00000000) to 255 (11111111). Why does 11111111 (base 2) equal 255 (base 10)? Because it means:
1 x 128 + 1 x 64 + 1 x 32 + 1 x 16 +
1 x 8 + 1 x 4 + 1 x 2 + 1 x 1 = 255
Again, we add up the numbers each “bit” represents, and it produces 255. The importance of “8 bit” strings becomes clear when we start to translate “computer language” (binary) into “human language” (letters and symbols on your keyboard). Representing every upper and lower case letter of the English alphabet, together with the digits, punctuation and control codes, takes 7 bits (128 combinations); an 8 bit string holds all of these with room to spare.
The name for a chunk of memory that is 8 bits long is “byte”. This is the basic unit used to measure computer memory size. “Bits” and “Bytes” are the building blocks of understanding modern computers.
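The place-value rule described above can be written out in a few lines of Python. This is a minimal sketch for illustration; the function name binary_to_decimal is invented here, and Python's built-in int(text, 2) does the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the value of each position that holds a 1."""
    total = 0
    # The rightmost bit is worth 1, the next 2, then 4, 8, 16 and so on.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


print(binary_to_decimal("11111111"))  # 255
print(binary_to_decimal("00000000"))  # 0
print(2 ** 8)                         # 256 possible values in one byte
```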
Text characters are represented in computer memory as numbers. This is done through an arbitrary set of standards called the American Standard Code for Information Interchange (“ASCII”). The capital letter A is represented by the number 65 in the ASCII code (65 is 01000001 in binary). Why? Because the first 65 ASCII codes (0 through 64) are used for an assortment of control characters and symbols, meaning the alphabet started at 65. Capital B is 66 (01000010), and so on.
ASCII Representation of Characters
|Character|Base 10|Base 2|
|A|65|01000001|
|B|66|01000010|
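The ASCII numbers quoted above can be reproduced with Python's built-in ord function, which returns the number a character is stored as. The helper name ascii_row is invented for this sketch.

```python
def ascii_row(character: str):
    """Return the character, its decimal code and its 8 bit binary form."""
    code = ord(character)
    return character, code, format(code, "08b")


for letter in "AB":
    print(ascii_row(letter))
# ('A', 65, '01000001')
# ('B', 66, '01000010')
```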
Some alphabets contain many more characters than English (such as Japanese or Chinese), so a newer scheme that extends ASCII, called Unicode, is now used. In its original form it uses two bytes to hold each character; two bytes give 65,536 different values to represent characters, due to the doubling effect of binary language.
Pictures are also represented as numbers in the computer. Computers produce images through the use of Picture Elements, which is shortened to “Pixel”. You have likely heard this term in relation to digital cameras, which measure their resolution in how many pixels make up a photo. Since this number is in the millions, cameras are rated in “megapixels”. Each pixel in a photo or image is usually represented by three “bytes” in the computer; the numbers in the bytes tell the display how much red, green, and blue should be mixed together to make the colour of the pixel (three bytes can represent millions of possible colours for each pixel). Millions of these three byte pixels start to add up, which is why photos and videos are so large in comparison to text.
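A rough calculation shows how quickly three byte pixels add up. The sketch below assumes a hypothetical 4000 by 3000 pixel photo stored with no compression; real cameras usually compress their photos, so files on disk are often smaller than this raw figure.

```python
BYTES_PER_PIXEL = 3  # one byte each for red, green and blue


def raw_image_bytes(width: int, height: int) -> int:
    """Size of an uncompressed image at three bytes per pixel."""
    return width * height * BYTES_PER_PIXEL


size = raw_image_bytes(4000, 3000)   # assumed 12 megapixel photo
print(size)                          # 36000000 bytes
print(round(size / 1024 ** 2, 1))    # 34.3 megabytes
print(256 ** 3)                      # 16777216 possible colours per pixel
```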
We sometimes hear terms like “16 bit” processors and “64 bit” versions of Windows. These simply refer to the length of bit chains that can be processed by the computer. These processes take the form of “integers”, “memory addresses” or other data units that go beyond the scope of this page. However, an understanding of binary allows us to conceive of the differences.
Writing out numbers and letters as 8 unit chains of ones and zeroes is time consuming and difficult to assimilate quickly. Hexadecimal (“Hex”) is a method of using a system of shorthand symbols to represent each of the possible bytes of data as a pair of numbers and letters. It is, practically speaking, the closest a computer forensic analysis will come to the raw 1’s and 0’s. We have seen that the “human” (decimal) way of representing 122 is “122”. In binary, it would be 01111010. In Hexadecimal, it would be “7A”, a much shorter and more palatable way of writing a figure than the long binary code. You need not understand the conversions between Hexadecimal and binary, but it is useful to know what experts mean when they say they “examined the Hex” while looking at a computer. It means they are looking at a shorthand version of the actual 1’s and 0’s on the computer, and it is a very detailed, math-based examination.
These binary and hexadecimal systems are all methods of representing information in a form other than the letters and numbers to which we have become accustomed. For instance, the text “Nerd 101” is depicted as follows in the various codes:
|Text|Binary|Hexadecimal|
|Nerd 101|01001110 01100101 01110010 01100100 00100000 00110001 00110000 00110001|4e 65 72 64 20 31 30 31|
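The “Nerd 101” row can be regenerated with a short Python sketch, which may help show that binary and hex are simply two ways of writing the same bytes. The function names to_binary and to_hex are invented for this illustration.

```python
def to_binary(text: str) -> str:
    """Write each ASCII character as an 8 bit chain of ones and zeroes."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def to_hex(text: str) -> str:
    """Write each ASCII character as a two symbol hexadecimal pair."""
    return " ".join(format(byte, "02x") for byte in text.encode("ascii"))


print(to_binary("Nerd 101"))
print(to_hex("Nerd 101"))      # 4e 65 72 64 20 31 30 31
print(format(122, "02X"))      # 7A
```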
How does the computer handle all these “bits” and “bytes”?
Recall that at the core, the hard drive of a computer has tiny magnetic dots that are either “on” or “off”. Eight of the dots when lined up equal a “byte”. You will also recall there are over a trillion bytes on a 1TB (Terabyte) hard drive, that would cost a consumer about $50 in February 2011.
Due to the staggering amount of information on the hard drive, there is a system of dealing with, or “addressing”, the data. The platters on a hard drive are divided into “tracks” and “sectors”, which could be equated to the letters and numbers on a map that allow the drive to find a location.
Since most hard drives are made up of multiple platters, a piece of data also needs a third descriptor identifying the platter surface (the “head”) on which the data is stored. The matching tracks across all of the platters are together called a “cylinder”.
For a computer to access all these pieces of data, it requires a “file system”. This is distinct from an “operating system” (Windows, Apple etc.) that we are used to dealing with, and will be discussed below. The “file system” does not produce graphics, speak to other computers, or let us send email. Its job is simply to control the management of the bits and bytes on the hard drive in a way that the operating system can use. There are a number of “file systems”, and they, like computers, have evolved over time. The most common file system on current computers is the New Technology File System (“NTFS”), which was “new” in the early 1990s. For Apple computers, the file system is called the Hierarchical File System (“HFS”). The other file system still in use is the File Allocation Table (FAT) system, which one finds commonly on thumb drives and other smaller electronic storage devices like digital cameras, cell phones, etc.
There are significant differences between NTFS and FAT, and some of these will be discussed below. The crucial concept from a lawyer’s perspective is understanding that it is the “file system”, not the “operating system” that actually moves data around on a hard drive.
Since the management of the trillions of tiny dots is unwieldy for the computer, the various bits of data are clumped together into groups for quicker access. Hard drives group 512, 1024 or 2048 bytes (remember the byte is a collection of eight 1’s or 0’s) into each sector. These sectors will then be grouped together (usually in groups of 8 or 16) to form “clusters”. This clumping of data into larger pieces cuts down on the time it takes to find pieces of data. The decision of how large to make clusters is one of compromise. Since the cluster is the smallest piece of data a hard drive can access (for reading data or writing), if the cluster size is too large, a huge amount of space is wasted. If it is too small, there are too many pieces of information. Analogizing the hard drive to a library can assist.
The Library Analogy
Caution with Analogies
When presenting digital forensic opinions, experts will almost invariably use the library analogy. As will be discussed in the other material on legal issues in computer forensics, the library analogy has several limitations. It is useful as a teaching method, but quickly becomes misleading when considering a computer user’s knowledge of what is on the hard drive.
With that caution in mind, if we think of the hard drive as being a library of encyclopaedias, the encyclopaedias themselves can be (loosely) compared to “files” on a computer. Each of the “clusters” of data can be thought of as books in the encyclopaedias. The files are what we as humans can perceive as the chunks of data on a computer, be they word documents, photos, or spreadsheets.
It is too cumbersome and slow for the library to keep a catalogue of every word that is located on every book in the library, but it does keep track of the books themselves. The size of the books (clusters) will have an effect on the efficiency and usability of the library. For the most part, clusters in current computers contain 4 kilobytes, meaning 4096 bytes.
For instance, if the books (clusters) were set at being 4096 words in size, and a particular encyclopaedia (file) only took up 10 words (which is common in computers, as files can be very small), then 4086 words (bytes) of that file’s book (cluster) would go unused. One of the “rules” is that books (clusters) cannot be shared, so even though more than 99% of the space in that cluster is unused, no other encyclopaedia (file) can be placed there. This results in wasted space, as if the library needs to reserve sometimes huge amounts of book space for a tiny encyclopaedia. The prohibition on sharing allows the computer to keep track of information only at the cluster level, and not have to think about each individual byte or sector. This results in much faster accessing of information.
Sometimes a file on a computer is large, and requires several clusters to hold its data. In a perfect world (and any library known to humanity) the books of an encyclopaedia are kept in sequential order, but this is not always possible on a hard drive. Since data is constantly being written, deleted and replaced with new data on a computer, things can get split up and mixed up, much like a well-used shoe closet. There will be more on deletion below, but the end result of all this is that clusters of files (books of an encyclopaedia) are very commonly split up all over the library and not in any particular order.
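The wasted space in this example can be worked out directly. The sketch below assumes 512 byte sectors grouped 8 to a cluster, which gives the 4096 byte cluster used above, and it treats every file as occupying at least one whole cluster, which is a simplification.

```python
import math

SECTOR_SIZE = 512            # assumed bytes per sector
SECTORS_PER_CLUSTER = 8      # assumed sectors per cluster
CLUSTER_SIZE = SECTOR_SIZE * SECTORS_PER_CLUSTER   # 4096 bytes


def clusters_needed(file_size: int) -> int:
    """Clusters cannot be shared, so always round up to a whole cluster."""
    return max(1, math.ceil(file_size / CLUSTER_SIZE))


def unused_bytes(file_size: int) -> int:
    """Space reserved for the file but not filled by it."""
    return clusters_needed(file_size) * CLUSTER_SIZE - file_size


print(CLUSTER_SIZE)          # 4096
print(unused_bytes(10))      # 4086
print(unused_bytes(10_000))  # 2288 (three clusters reserved)
```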
This is called “file fragmentation”, and it has two consequences. First, as the files become more fragmented, it takes the “file system” longer to find them all and read or change them. Second, when one actually tries to read the physical sectors of a hard drive in order (using special forensic tools), the result is often a jumbled mix of information.
Just as humans have a difficult time using a library without some form of cataloguing system, computers cannot keep track of the trillions of bits of data without one. The “file systems” referred to above set out the structure of the catalogue. For instance, in the NTFS file system mentioned above, the file system keeps a card catalogue for the library called the “Master File Table” (“MFT”). This single file keeps track of all the clusters of all the files on a computer. The FAT file system is named for its card catalogue: the File Allocation Table.
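Fragmentation and the role of the catalogue can be pictured with a toy model. Everything in the sketch below is invented for illustration: a real file table is far more elaborate, and the names disk, file_table, read_file and read_raw are hypothetical.

```python
# Toy disk: cluster number -> contents. Two files are interleaved.
disk = {
    0: "letter part 1",
    1: "photo part 1",
    2: "letter part 3",
    3: "photo part 2",
    4: "letter part 2",
}

# Toy catalogue: file name -> the clusters holding it, in reading order.
file_table = {
    "letter": [0, 4, 2],
    "photo": [1, 3],
}


def read_file(name: str) -> list:
    """Follow the catalogue, as the file system does."""
    return [disk[cluster] for cluster in file_table[name]]


def read_raw() -> list:
    """Read the clusters in physical order, ignoring the catalogue."""
    return [disk[cluster] for cluster in sorted(disk)]


print(read_file("letter"))  # the three parts, in the right order
print(read_raw())           # a jumbled mix of both files
```

Reading through the catalogue returns the file in order; reading the disk in physical order produces the jumble an examiner sees when looking at raw sectors.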
One of the most important concepts of computer forensics is the deletion of data. Commonly, the most sought after material during a computer forensics examination is the deleted data. Understanding the process of deletion is imperative.
As a computer file system fills the respective hard drive with information, it writes the information for a given encyclopaedia (file) into the books in the library, and makes a corresponding reference in the file table, be it MFT or FAT, the card catalogue. It also stores a lot of information about the manner and timing of data creation, which will be discussed below under “meta data”.
When a file is deleted, one might think it is wiped from the hard drive. This is not the case. The reality of the magnetic data stored on hard drives is that it is persistent. It will not disappear unless something changes it. When a file is deleted, the only thing that is changed is that the card in the card catalogue (the reference in the file table) is changed from “taken” to “available”. All the thousands and millions of 1’s and 0’s do not change. They stay exactly where they are.
The reason for this is simple. While the computer could change all the 1’s and 0’s back into 0’s (which means no data), that process takes time. It would create wear on the hard drive, slow down computer usage, and generally detract from a user’s computer experience. The result is that a computer using a conventional hard drive does not normally erase the information in the books of the encyclopaedias when a file is deleted; the books and all their information remain.
When the space is marked as “available” the file system may write over the book with other information in the future. If there is other space available on the hard drive, it is possible the file will never be “overwritten” and deleted data can persist for the entire life of a computer.
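The difference between deleting the catalogue entry and erasing the data can be shown with a toy model. The ToyDisk class below is entirely hypothetical and stores one file per cluster for simplicity; it is meant only to illustrate the principle, not any real file system.

```python
class ToyDisk:
    """A tiny model of clusters plus a catalogue (file table)."""

    def __init__(self, cluster_count: int):
        self.clusters = [""] * cluster_count        # the "books"
        self.table = {}                             # file name -> cluster
        self.available = set(range(cluster_count))  # clusters free for use

    def write(self, name: str, data: str) -> int:
        index = min(self.available)     # take the first available cluster
        self.available.remove(index)
        self.clusters[index] = data
        self.table[name] = index
        return index

    def delete(self, name: str) -> int:
        index = self.table.pop(name)    # remove the catalogue entry...
        self.available.add(index)       # ...and mark the cluster available
        return index                    # the cluster contents are untouched


disk = ToyDisk(4)
where = disk.write("letter.txt", "confidential draft")
disk.delete("letter.txt")

print("letter.txt" in disk.table)  # False: the user no longer sees it
print(disk.clusters[where])        # confidential draft: still on the disk
```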
Typical users not utilizing special computer forensic tools cannot see these files. To a user, the file is not there. The file system does not allow the user to view the deleted file, and it appears gone. However, the encyclopaedias are there to be read by anyone with the special tools to see them.
If the card catalogue has marked the book (cluster) on which deleted data was held as “available”, and the file system happens to write a new, different piece of data on that book, it may result in the destruction of the data. How much is destroyed depends on the size of the new data. Consider the book analogy. If the “deleted” encyclopaedia (file) was 400 words long, and the file system used the book some time later for a new encyclopaedia, but that encyclopaedia was only 50 words long, what would be the result? The first 50 words of the deleted file would truly be deleted, having been overwritten with new data. However, the remaining 350 words would still exist in an area called “slack space”. “File Slack” refers to the unused portion of a book (cluster) that exists whenever a file (encyclopaedia) or piece of a file that is smaller than the cluster (typically 4096 bytes) is stored there.
The result is that even when a file is deleted, it sits unchanged on the computer until a new file is written in its place. Even then, much of the file can remain if the new file that is written on top was smaller.
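The 400 word and 50 word example can be modelled in a few lines. This is a simplified sketch that treats a cluster as a plain run of bytes; the function name overwrite is invented for the illustration.

```python
def overwrite(cluster: bytes, new_data: bytes) -> bytes:
    """Write new data at the start of a cluster, leaving the rest alone."""
    if len(new_data) > len(cluster):
        raise ValueError("new data does not fit in the cluster")
    return new_data + cluster[len(new_data):]


old_file = b"O" * 400   # the deleted file, 400 bytes long
new_file = b"N" * 50    # the smaller file written over it

cluster = overwrite(old_file, new_file)
remnant = cluster[len(new_file):]

print(len(cluster))   # 400
print(len(remnant))   # 350 bytes of the old file survive
```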
When computer forensic examiners examine the space on a hard drive that has never been used, used and then marked as available (deleted), or the “slack” space at the end of a cluster, they use the term “unallocated space”. “Free space” or “Empty Space” would both be misnomers since, as we can now see, there may be a huge amount of data in these areas. All that can be said is that the file system has not allocated the space to any particular file in the file table (card catalogue).
Now that we have a basic understanding of the 1’s and 0’s that make up data, the way in which a computer stores that data on a hard drive, and the limitations of deleting data, we can move on to consider the “higher” functions of the computer that are relevant to computer forensics. These functions are determined by commercial software that is constantly being revised, updated, and fixed. Keeping track of all aspects of this software has become an impossible task. While some fundamental aspects of some versions of Windows remain the same, others have changed radically since the days of Windows 95. The multitude of nuanced differences between various operating systems and software suites is well beyond the scope of this page, and, in fact, beyond the abilities of any single individual.
Meta Data – hidden collection of information
When a user at a computer performs a basic action such as saving a letter they were working on, they may think they have created that file somewhere in a computer and that is all. The reality is much more complicated. Computer forensics takes advantage of the massive amount of meta data (data about the data) in forming opinions about computer usage.
File Meta Data
Unknown to most users, computers keep detailed records of the activity surrounding a file. The “card catalogue” (File Table) keeps track of the date and time a file is created, the date and time it was last modified, and even the date and time it was clicked on (accessed). This modified, accessed, created data is sometimes referred to as MAC data.
This becomes important, as a forensic examiner can pull this data from a file table and testify about when a file was downloaded, copied, opened, saved, etc. This sometimes leads to suggesting inferences that can be drawn from the existence of material on the computer. An examiner may state that since a particular document file with damaging contents was created 10 minutes after an accused checked their email, a court could draw an inference that it was the accused that created the file.
Sometimes the file meta data will be used to suggest that a file was not merely downloaded, but that it was copied, opened or altered in some way, denoting further awareness of its existence.
One of the more significant aspects of meta data is that, since it is stored not with the file itself, but in the file table, meta data is often (but not always) lost when a file is deleted. Generally speaking, when examiners find files (or parts of files) in “unallocated space”, it is impossible to tell when the file was created, changed, or deleted.
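Some of this file meta data is visible to ordinary programs, not just forensic tools. The sketch below uses Python's standard os.stat call to read three timestamps for a throwaway file created for the purpose. The meaning of the third value depends on the operating system: on Windows it is the creation time, while on Unix-like systems it is the time the file's meta data last changed. The helper name mac_times is invented here.

```python
import os
import tempfile
from datetime import datetime


def mac_times(path: str) -> dict:
    """Read the timestamps the operating system keeps for a file."""
    info = os.stat(path)
    return {
        "modified": datetime.fromtimestamp(info.st_mtime),
        "accessed": datetime.fromtimestamp(info.st_atime),
        # Creation time on Windows; meta data change time on Unix-like systems.
        "created_or_changed": datetime.fromtimestamp(info.st_ctime),
    }


# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"draft letter")
    path = handle.name

times = mac_times(path)
os.remove(path)

for label, moment in times.items():
    print(label, moment)
```

This reads the values the operating system reports for a live file; a forensic examiner would normally extract them from the file table of a copy of the drive instead.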
Temporary internet files
Computers will generally download images and other information from web pages and store them in a temporary holding area on the computer. This is an attempt to speed up web surfing. If a user goes back to the same page, the computer can pull up these files it has already downloaded, considerably reducing the time to load the page the second time.
Once a user navigates away from a web page, it is not gone. Much of the page has been stored on the user’s computer without their knowledge. Although users can delete their “temporary internet files” through prompts in their web browser, few do, leaving a huge amount of data about web pages they visited for months or years on their computer.
Windows has a central set of control files, collectively called the “Registry”. Unseen by most users, the Registry routinely collects information about what programs are installed on the computer and when, the settings for times and dates (and whether they have been changed), and the web addresses that have been typed into a browser.
The Registry will also keep track of “Security Identifiers” (SID), which assign a particular user account to some of the traceable actions that are registered. On a computer with many user accounts, this is sometimes used to decipher which user may have been taking the action in question on a computer.
To properly communicate with printers, computers use a “spooler” to feed data out (as if it were coming off a film spool) to the printer. The origin of this process came from a desire to be able to print a document, and continue working in an application without having to wait for the document to completely print. With spooling, one can click “print” and immediately continue working. Because of spooling, the computer will retain pictures of the documents, web pages and photos that have been printed, and, sometimes, a description of the files from which the data was drawn. These images (generally called .spl and .shd files) are not viewable by a normal user.
Windows will retain a list of recent documents that have been opened by a particular user account. This process is evident to users, who can see recent documents in their start menu or the “File” menu in Microsoft Word. It is important to note that if a file is deleted, the corresponding reference in the recent documents list is not removed, and will persist until overwritten with other activity.
“.lnk” (Link) files
Commonly referred to as “shortcuts”, .lnk files are the method of allowing a user to have convenient access to files that may be buried in a mess of folders. The most common place to see shortcuts is on the Windows desktop. However, .lnk files can exist in a large number of places, and there can be thousands on a given computer. Even if a target file is deleted, the .lnk file will maintain the meta data about its creation or change.
.lnk files can be used to track every thumb drive or other storage device ever connected to the computer (users are generally unaware that these devices have unique serial numbers, and tracing their use is generally quite easy). If a forensic examiner can establish the creation of a .lnk file associated to a particular thumb drive serial number that was created under a particular SID, they may draw the conclusion that a particular thumb drive was inserted in a computer, and certain documents were copied to the thumb drive using a particular user account. These timing, identity and action conclusions are used to ask finders of fact to draw circumstantial inferences.
MFT (Master File Table)
Referenced previously in discussing the NTFS file system, the MFT is a massive file that tracks the creation and location of every piece of data on a computer. It can grow to be gigabytes in size. The MFT tracks not only the MAC (modified, accessed, created) data about files, but also where on the drive the various parts of a file are found and whether the file has been marked “hidden” or “protected” (read only), along with other data.
The MFT is a hidden file, totally invisible to all users not utilizing forensic tools. Even those familiar with the existence of an MFT on NTFS computers will probably not know that a mirror backup of the MFT is automatically maintained by the file system. Most knowledgeable users familiar with MFTs would have a difficult time deleting or altering both copies without advanced forensic tools.
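Although the MFT itself cannot be opened like an ordinary file, the operating system will report the MAC times it holds for any file. The sketch below, with an illustrative function name, shows what an ordinary program can see. It is not how a forensic tool works, since such tools read the MFT records directly rather than asking the operating system.

```python
import os
from datetime import datetime, timezone


def mac_times(path):
    """Return the timestamps the operating system reports for a file."""
    info = os.stat(path)

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # On Windows st_ctime has traditionally reported the creation time;
        # on Unix-like systems it is the time of the last metadata change.
        "created_or_changed": to_utc(info.st_ctime),
    }
```

Calling `mac_times` on any document returns three dates, none of which the average user ever sees or thinks to change.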
NTFS Log File
To protect a computer from total failure due to crashes, power outages and other interrupting events, NTFS creates a log of all activities taken in relation to all files on the computer. This log file is a large collection of all the metadata associated with the files on the computer. The justification for this detailed monitoring of a computer’s use is that if there were an interruption like the ones mentioned above, and the various tasks the computer was carrying out were not complete, the computer would have a “to do” list as soon as it came back online.
The consequence of this log is the existence of a complicated file on the computer whose sole purpose is to track the activities on that computer.
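The “to do” list idea can be illustrated with a toy model. The class below is purely hypothetical and bears no resemblance to the binary format of the real NTFS log; it only shows the principle of writing down an intended action before carrying it out, so that unfinished work can be identified after an interruption.

```python
class ToyJournal:
    """A toy journal: every operation is logged before it is carried out."""

    def __init__(self):
        self.entries = []  # each entry is {"action": str, "done": bool}

    def begin(self, action):
        """Record the intent to do something and return the entry's index."""
        self.entries.append({"action": action, "done": False})
        return len(self.entries) - 1

    def complete(self, index):
        """Mark a logged operation as finished."""
        self.entries[index]["done"] = True

    def pending(self):
        """List the operations that were started but never finished."""
        return [e["action"] for e in self.entries if not e["done"]]


journal = ToyJournal()
first = journal.begin("create photo.jpg")
journal.complete(first)
journal.begin("rename report.doc")  # imagine the power fails here
print(journal.pending())  # ['rename report.doc']
```

Note that the completed entry is still sitting in the list. A log kept for recovery purposes is, as a side effect, a record of what was done.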
In another bid to prevent catastrophic failure of the computer, Windows will, from time to time, take a snapshot of the registry and other configuration data on the computer. These snapshots, known as restore points, are taken at a user’s prompting, whenever a significant change (such as installing a program) is made, or at certain time intervals determined by the settings on the computer. Restore points can provide information about the settings, users, programs and configuration of a computer as of a specific date, and can provide what is akin to an archeological record to forensic examiners.
For instance, an examiner may be able to determine that a certain change to the computer’s date and time settings was made in between two restore points, or that a certain piece of software was installed. If that software was related to a police investigation, a judge could be asked to find that, for instance, after an accused was notified of a police investigation, he installed encryption and file wiping software on the computer, then deleted the software at another time and used “system restore” in an attempt to hide all this activity.
Recalling that RAM acts as a computer’s short term memory, one can liken a page file to an overflow for RAM. Since the computer may have a large number of processes and programs running, it will often take more space than is available in RAM to keep them all going at once. The operating system will create a small area on the hard drive called a page file, swap file or cache, where it will deposit small bits of data that it is using to complete a process. It is called a page file because the small pieces of data being moved around are called “memory pages”.
This could be analogized to a baker who is making multiple batches of cookies. He does not wish to have to return to the refrigerator for the milk every time he adds it to the recipe, so he leaves it on the counter. If we liken the refrigerator to the hard drive’s regular storage, and the counter to the page file, we see that the computer can keep everything at arm’s reach there.
But, just as counters sometimes hold traces, the page file holds pieces of data that can be examined. Its contents are often jumbled. However, since it is a place where the computer stores information on what it is currently doing, it is sometimes suggested that it holds information about the recent use of the computer.
As stated earlier, RAM is information that is held in a volatile (non-persistent) state. If a computer is turned off, the RAM is, essentially, wiped. Until recently, forensic practice guidelines demanded that a target computer be unplugged without any use prior to examination. This was an attempt to ensure that no changes were made to the computer as a result of the examination, thereby tainting the results.
Because this process guarantees the destruction of whatever was held in RAM, it is now being questioned as a best practice.
What Happens to a File?
As we can see, there are a multitude of processes being executed with no knowledge of the computer user. When a user saves a photo from a Facebook page the following occurs:
– the Facebook page address is stored in the registry;
– the file was already saved in temporary internet files as a “thumbnail” (small image), and full size image if it was clicked on;
– the data is saved to a number of clusters on the hard drive;
– the Log file records that the file has been created;
– an entry for the file is created in the MFT, along with the time and date of its creation;
– a copy of the file likely resides in RAM and the page file;
– in most cases, data about the date, time, camera and settings used to take the photo is embedded in the file and is now carried with the photo.
If the user opens the photograph file and prints it, the following also occurs:
– the MFT file “accessed” time is updated;
– the Log file records this action;
– the photo is stored in the “recent documents” area;
– once again, it is copied to RAM and the Page file;
– a copy of the photo is saved to the print spool files.
If the user “deletes” the photos, the following occurs:
– the first character of the file’s computer data is changed to indicate it is deleted;
– the MFT file entry is marked as available for re-use; and
– the data for all the above records, copies and logs remains available until it is overwritten, and practically speaking may exist for the life of the hard drive.
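The reason “deleted” does not mean “gone” can be shown with a toy model. The class below is a deliberately simplified, hypothetical stand-in for a file table and its clusters; it is not how NTFS is actually laid out. It shows only that deleting a file changes the table entry and leaves the stored data where it was.

```python
class ToyVolume:
    """A toy disk: a file table whose entries point at clusters of data."""

    def __init__(self):
        self.table = {}     # file name -> {"cluster": int, "in_use": bool}
        self.clusters = {}  # cluster number -> bytes stored there

    def save(self, name, data):
        cluster = len(self.clusters)
        self.clusters[cluster] = data
        self.table[name] = {"cluster": cluster, "in_use": True}

    def delete(self, name):
        # Only the table entry is marked; the cluster contents are untouched.
        self.table[name]["in_use"] = False

    def visible_files(self):
        """What an ordinary user would see in a folder listing."""
        return [name for name, entry in self.table.items() if entry["in_use"]]

    def recover(self, name):
        """Read the data a table entry still points at, deleted or not."""
        return self.clusters[self.table[name]["cluster"]]


volume = ToyVolume()
volume.save("photo.jpg", b"image data")
volume.delete("photo.jpg")
print(volume.visible_files())       # []
print(volume.recover("photo.jpg"))  # b'image data'
```

The data disappears only when something else is later written into the same cluster, which on a large drive may never happen.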
In the modern connected world, it is rare that a computer exists in solitude. Most computers are, at some point, connected to other computers. This may be as part of an office or home network or to millions of computers through the internet. While the internet is an infinitely more complex entity, it is essentially just a very large collection of computers that can share information. Some of these computers are gigantic, such as Google’s huge collections of machines that service its 34,000 searches per second. Others can be small private computers “hosting” a hobby blog about fly fishing.
There is little difference in practice between these machines. They each have data stored on them that can be shared to other computers over the internet pathways. Every time you navigate the web, go shopping or do online banking, you are merely asking other computers to send you material so you can view it on your computer.
All this communication needs to be done in an orderly fashion. As part of the development of computer networking, various protocols were invented to organize this communication. The dominant protocol now is, fittingly, “Internet Protocol” (IP) which, as of February 2011, uses a set of four numbers between 0 and 255, separated by periods, as an address system. An IP address will look something like “18.104.22.168”.
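The “four numbers between 0 and 255” rule can be checked with Python's standard `ipaddress` module. In the sketch below, 192.0.2.1 is an address reserved for use in documentation, and the helper name `is_valid_ipv4` is illustrative only.

```python
import ipaddress

address = ipaddress.IPv4Address("192.0.2.1")  # documentation-only address
octets = [int(part) for part in str(address).split(".")]
print(octets)        # [192, 0, 2, 1]
print(int(address))  # 3221225985, the same address as one 32-bit number


def is_valid_ipv4(text):
    """Return True if text is four numbers from 0 to 255 joined by periods."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


print(is_valid_ipv4("192.0.2.1"))  # True
print(is_valid_ipv4("256.1.1.1"))  # False, because 256 is out of range
```

Because each of the four numbers fits in eight bits, the whole address is really a single 32-bit number, which is why the supply of such addresses is limited.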
The IP address is sometimes likened to a phone number. There are challenges with this analogy, but it is a useful analogy for education purposes.
Although there is no one entity, government or otherwise, that controls all the internet traffic, there is the “Internet Corporation for Assigned Names and Numbers” (ICANN), a not for profit NGO located in California that sets out the standards for IP addresses and website names.
When a user enters a website address like www.google.com, it is called a “Uniform Resource Locator” or “URL”. The standards set by ICANN are used in a directory called the “Domain Name System” (DNS) to convert www.google.com into 22.214.171.124. DNS can be roughly equated to a phone directory. The numeric IP address is used to find the computer on the other side of the internet (in this case, Google’s giant servers) associated with that number.
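The directory lookup described above is something any program can request from the operating system. The following minimal sketch uses Python's standard `socket` module; the wrapper name `resolve` is illustrative, and the address returned for a real website will vary by time and location, so none is shown here.

```python
import socket


def resolve(hostname):
    """Ask the system's resolver for an IPv4 address for hostname."""
    return socket.gethostbyname(hostname)


if __name__ == "__main__":
    # Requires a working network connection; the result is not fixed.
    print(resolve("www.google.com"))
```

If the text passed in is already a numeric address, the same call simply hands it back unchanged, much as one would not need a phone directory when already holding the number.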
Once the computers have connected to each other, Google’s computer will send the home computer information like text, pictures, and programs that are running inside the home computer’s web browser.
We think of programs as being useful software like Microsoft Word or Adobe Acrobat. Unfortunately, malware exists. The term malware simply refers to any program, big or small, that secretly causes a negative impact on a computer without the owner’s consent. Malware’s initial form could be more accurately called “prankware”. Computer programmers since the 1970s made a sport of inventing secret programs to annoy others, like forcing a machine to display a message like “Merry Christmas” or jumble the keys on a keyboard.
From these practical joke beginnings, malware has evolved into an incredibly sophisticated, for-profit industry that creates new methods of controlling unprotected machines in order to collect private financial information, store and distribute illegal material, or recruit them to infect and take over other machines.
The term “virus” is particularly apt for these programs. They are hidden, small and built to replicate their own code over and over again unless the system’s defences recognize and destroy them.
These programs go by a variety of names such as Viruses, Trojan horses, Worms, Rootkits, Keyloggers and others. At their heart, they are custom designed software created to carry out their tasks in secret. They arrive on computers by web pages, CDs, thumb drives, email attachments and a myriad of other ways.
Antivirus software companies run a constant campaign to update their software with the ability to recognize and inoculate or cure systems. “Firewall software” is used in an attempt to prevent connections from computers that do not seek to benignly communicate with a user’s computer, but instead send an invading force of malware.
The reality is that an unprotected computer attached to the internet would be compromised by intrusion attempts or malware within minutes, if not seconds.
We have seen the somewhat surprising amount of record keeping our computers do without our knowledge. Since few users have the skill to develop their own operating system (although with systems such as “Linux”, this is possible), encryption is the only method of keeping a computer entirely private.
Encryption is a process (carried out by software) of scrambling all the 1’s and 0’s discussed earlier into an incredibly complicated code that requires a “key” to translate. If an examiner looks at a drive that has been encrypted and does not have the key, the data will be totally incomprehensible.
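The role of the key can be illustrated with a toy scrambler. The function below is emphatically not real encryption and would be trivial to break; it is a hypothetical teaching example that shows only the general shape of the process, in which the same key that scrambles the data is needed to unscramble it.

```python
from itertools import cycle


def xor_with_key(data, key):
    """Toy cipher: combine each byte with the repeating key. Not secure."""
    return bytes(byte ^ k for byte, k in zip(data, cycle(key)))


secret = b"meet at noon"
key = b"k3y"

scrambled = xor_with_key(secret, key)
print(scrambled)                        # unreadable without the key
print(xor_with_key(scrambled, key))     # b'meet at noon'
print(xor_with_key(scrambled, b"bad"))  # the wrong key gives nonsense
```

Real encryption software uses far more elaborate mathematics, but the examiner's problem is the same: without the key, the stored bytes carry no visible meaning.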
Recognizing a market need for privacy and security, Microsoft has, for years, included encryption as an option in Microsoft Windows. The web browser “Firefox” encrypts the temporary files it downloads.
As with all locks, the location of a key is a fundamental weakness, leading to some encryption software being defeated in a few key strokes. Others, such as the open source “TrueCrypt”, have staggering protection. Using encryption keys that are 256-bit (256 1’s and 0’s lined up), it produces a number of possible codes greater than all the atoms in all the stars in our galaxy.
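The size of that number is easy to compute. Each additional bit doubles the number of possible keys, so a 256-bit key allows 2 multiplied by itself 256 times. The short sketch below, with an illustrative function name, prints the result.

```python
def key_space(bits):
    """Number of distinct keys that can be written with the given bit count."""
    return 2 ** bits


total = key_space(256)
print(total)
print(len(str(total)))  # 78, the number of digits in that total
print(key_space(8))     # 256, for comparison with a one-byte key
```

A 78-digit number of possibilities is why guessing the key by trial and error is not a practical approach, and why attention turns instead to where the key or password is kept.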
Encryption is often referred to as a method of hiding illegal material or activities on a computer. This is true, in the same way front doors on houses hide illegal activity inside.
One odd term often used in computer forensics is “Hash Value”. Hashing is a method of taking the data contained in a file and representing it with a short string of numbers and letters. The term is derived from its use in cooking, where component parts are chopped and mixed to produce a different result. A somewhat awkward analogy would be looking at a hash value as the “unique serial number” for a file. This number is much smaller than the file itself, and is produced by complicated processes that are defined by the particular hash protocol that is used.
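Producing a hash value takes only a few lines with Python's standard `hashlib` module. The sketch below uses SHA-256, which is an assumption made for illustration, since the text does not name a particular hash protocol. The function names are likewise illustrative.

```python
import hashlib


def sha256_of_bytes(data):
    """Return the SHA-256 hash of some bytes as a string of hex characters."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path, chunk_size=65536):
    """Hash a file a piece at a time so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(sha256_of_bytes(b"abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sha256_of_bytes(b"abd") == sha256_of_bytes(b"abc"))  # False
```

The result is always 64 characters long, whether the input is three letters or a full-length movie, and changing a single character of the input produces a completely different value.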
A comprehensive understanding of how hashing works is beyond the scope of this page. The significance of a hash value for a file is that it is unique. It is generally accepted that no two files will have the same hash number. This allows computer forensic examiners to conduct an examination of a computer with two useful lists of file serial numbers:
- Files that should be ignored, as they are part of an operating system or widely used program. Since nothing interesting will be found in these files, they may be excluded (in some circumstances) from examination.
- Files that are illegal material. Authorities in the United States and Canada keep comprehensive lists of the hash values for all known child abuse imagery or other illegal material. Having the list of serial numbers makes it easier to search for such material on a given computer without having to actually look through the thousands of images and movies that will show up in such a search.
While it is theoretically possible to create a second file with the same hash value intentionally, the practical reality is no two files you ever run into will have the same hash value.
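The two lists described above lend themselves to a simple sorting exercise. The following is a minimal sketch of how files might be sorted by hash value, assuming SHA-256 and using made-up file names and contents; the function name `triage` and its categories are hypothetical and do not describe any particular forensic product.

```python
import hashlib


def triage(files, known_benign, known_illegal):
    """Sort files into 'ignore', 'flag' and 'review' using their hashes.

    files maps a file name to its contents as bytes; the other two
    arguments are sets of hash values supplied by the examiner.
    """
    result = {"ignore": [], "flag": [], "review": []}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in known_illegal:
            result["flag"].append(name)
        elif digest in known_benign:
            result["ignore"].append(name)
        else:
            result["review"].append(name)
    return result


# Made-up contents standing in for real files and real hash lists.
benign = {hashlib.sha256(b"operating system file").hexdigest()}
illegal = {hashlib.sha256(b"known contraband").hexdigest()}
found = {
    "system.dll": b"operating system file",
    "holiday.jpg": b"family photo",
    "renamed.tmp": b"known contraband",
}
print(triage(found, benign, illegal))
# {'ignore': ['system.dll'], 'flag': ['renamed.tmp'], 'review': ['holiday.jpg']}
```

Note that renaming a file does nothing to avoid detection, since the hash is calculated from the contents and not the name.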