Thursday, March 1, 2012

File Tags and Digital Forensics

                So I was bored one day and decided that, even with school and work, I just wasn’t busy enough and needed a new project.  I wanted to do some research, preferably in the realm of digital forensics.  I started talking to my buddy Rob Marmo (@robmarmo), and the idea of looking at how Windows utilizes the tag properties within certain files came up.  After some thought, we decided it would make for an extremely fun project and a great way to build our experience.  As it turned out, it was a great learning experience in doing research and developing a way to use the information.


                If you are not familiar, Windows introduced the ability to “tag” files in Windows Vista.  This feature allows the user to add keywords to certain file types so the files can be easily categorized and found through the Search function that shipped with Vista as well.  Here is an example of how you can tag your files:

Your other option for tagging a file is through its file properties.  If the file format supports tagging, you will see the option on the “Details” tab, like so:

These file tags are a great way to organize information in a user-customized way.  To me, that is why file tags are quite interesting from a forensic point of view: they are entered manually by the user.  So if a file is tagged with a certain keyword, that tag may give the file added interest to an investigation.

How It Works

                After a lot of looking and failed avenues, I was finally able to get a handle on how Windows implements file tagging.  First, you may have noticed that I said only certain file types can be tagged.  I deliberately don’t list the file types that can be tagged because the list varies from system to system.  When you tag a file, simply put, the tags are written into a block of XML that is then embedded in the file.  How this is handled varies by file type, and Windows determines it by reading the registry.  If you look at HKEY_CLASSES_ROOT\SystemFileAssociations in the registry, you will find a list of the file types registered with the operating system.  Each file type has two entries of interest for our purposes, “FullDetails” and “PreviewDetails”.  Each lists several properties, and the one for file tags is “System.Keywords”.  Its presence indicates that Windows has a way to store tags within the file.  The actual injection of the XML is done by a property handler specific to the file type, usually implemented in a DLL.
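
To make the registry lookup concrete, here is a minimal Python sketch of how you might list the file types that advertise “System.Keywords”.  It assumes that “FullDetails” and “PreviewDetails” are stored as semicolon-separated property lists with a “prop:” prefix, and that a property name may carry a leading display flag such as “*”; both are assumptions on my part, and the function names are made up for illustration.  The registry walk only runs on Windows.

```python
import re

KEYWORDS_PROPERTY = "System.Keywords"
DETAIL_ENTRIES = ("FullDetails", "PreviewDetails")


def lists_keywords(property_list):
    """Return True if a 'prop:...' style property list names System.Keywords."""
    if not property_list:
        return False
    body = property_list.strip()
    if body.lower().startswith("prop:"):
        body = body[len("prop:"):]
    for entry in body.split(";"):
        # Assumption: a name may be prefixed with display flags such as '*',
        # so drop any leading characters that are not letters.
        name = re.sub(r"^[^A-Za-z]+", "", entry.strip())
        if name.lower() == KEYWORDS_PROPERTY.lower():
            return True
    return False


def tag_capable_types():
    """Windows only: map each file type to the entries that list System.Keywords."""
    import winreg  # standard library, but only available on Windows

    found = {}
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "SystemFileAssociations") as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for index in range(subkey_count):
            file_type = winreg.EnumKey(root, index)
            try:
                type_key = winreg.OpenKey(root, file_type)
            except OSError:
                continue  # skip anything we are not allowed to open
            with type_key:
                hits = []
                for entry_name in DETAIL_ENTRIES:
                    try:
                        data, _kind = winreg.QueryValueEx(type_key, entry_name)
                    except FileNotFoundError:
                        continue  # this file type does not define the entry
                    if isinstance(data, str) and lists_keywords(data):
                        hits.append(entry_name)
                if hits:
                    found[file_type] = hits
    return found
```

On a Windows machine, calling tag_capable_types() should return a dictionary keyed by file type, which gives a rough inventory of the formats that can be tagged on that particular system.
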
            There is another part to the file tagging system beyond simply adding keywords to a file.  As you recall, I mentioned that tagging was intended to allow quick searching of files by keyword via the Windows Search function.  This is where the file tagging system gets interesting.  As you enter keywords, the system interacts with the Windows Search Index.  I created a quick animated GIF to show how the Search Index works as I understand it:

As you can see, the search index works with data stores.  Those data stores work with the Search function and filters, as well as a notification system.  Items that meet certain parameters are gathered and sent to the System Index, which is actually a database stored locally on the system at:

C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb *
*It should be noted that this location can be changed via the Advanced options in Indexing Options.

This is an interesting database.  The format seems to be proprietary to Microsoft and is used by a few of its applications.  If you’re interested in taking a look at the database, here is how I was able to do it:
  • Turn off the Windows Search service (this essentially unlocks the DB so you can use it)
  • Navigate to the index location
  • Make a copy of the index EDB file
  • Turn the Windows Search service back on
  • Download and install an EDB Viewer (I suggest EseDB Viewer from Woanware)
  • Open your copied EDB file in the viewer
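
The copy steps above can be sketched in Python as follows.  This is only a rough outline under a few assumptions: the script runs from an elevated prompt on Windows, the Windows Search service is registered under the name “WSearch”, and the index path is passed in rather than hard-coded because the location can be changed.  The function names are mine, and hashing the copy is an extra I added so the working copy can be verified later.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large index does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_index(index_path, dest_dir):
    """Copy the index file into dest_dir and return (copy path, SHA-256 of the copy)."""
    index_path = Path(index_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / index_path.name
    shutil.copy2(index_path, target)  # copy2 also tries to preserve timestamps
    return target, sha256_of(target)


def set_search_service(running):
    """Windows only, elevated prompt required: start or stop Windows Search."""
    action = "start" if running else "stop"
    # Assumption: the service name for Windows Search is "WSearch".
    subprocess.run(["net", action, "WSearch"], check=True)


def acquire_index(index_path, dest_dir):
    """Stop the service, copy the index, and always turn the service back on."""
    set_search_service(False)
    try:
        return copy_index(index_path, dest_dir)
    finally:
        set_search_service(True)
```

Calling acquire_index with the index location and a working folder should leave you with a copy you can open in the viewer, plus a hash to note alongside it.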

Viewing the Search Index provides an interesting look at how much information your Windows system can analyze in a short amount of time.  As best I can tell, the tagging system uses the Search Index database for preview generation.  While you’re entering a tag, you will often find that once you type a letter you are offered a list of tags.  These are tags that have previously been applied to files, and from what I can tell they are pulled from the Search Index database.  The “PreviewDetails” entry discussed earlier is how Windows knows these tags can be offered via preview, and the property handler then works with the index.

Interest in Recovery

                Again, each file type handles tags in a different way, depending on the property handler assigned to it.  These handlers differ in how they deal with the data and inject it into files, and while testing how various file types dealt with file tags I came across an interesting discovery.  Some file types store tag data not only within the file’s XML but also in various other parts of the file.  In testing I was able to see when tags were added or modified, and even after the file tags have been deleted (or are no longer viewable by the OS, as Windows will only read tags from the XML), they are still located within the file and viewable in hex.  Now, this only occurs in certain file types with, dare I say, inefficient property handlers.  The newer Office formats like .docx handle XML more proficiently: nearly all properties are written to XML, which is then archived and read by Word or whichever application opens the file.  To me that makes for a much more efficient way to deal with files.  However, when we look at something like a .jpeg, the tags are written to different locations in the file depending on the interaction.  Forensically, this gives an investigator an added way to look at a file.  My thought was that if a file’s MAC timestamps are all exactly the same, but the file itself contains indications that tags were added or modified, the timestamps may have been altered.
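
To illustrate looking for leftover tag data in the raw bytes, here is a small Python sketch that searches a file for a known keyword and reports every offset where it appears.  It assumes the keyword may be stored as UTF-8 (as you would expect inside an XML block) or as UTF-16LE (a common encoding for strings written by Windows); those two encodings are my assumption, not a complete list, and the function names are hypothetical.  This is not the proof-of-concept tool mentioned below, just a simple way to try the idea.

```python
from pathlib import Path

# Assumption: these are the encodings worth checking; others may exist.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16-le")


def find_tag_bytes(data, keyword):
    """Return {encoding: [offsets]} for every place the keyword occurs in data."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    hits = {}
    for encoding in CANDIDATE_ENCODINGS:
        needle = keyword.encode(encoding)
        offsets = []
        position = data.find(needle)
        while position != -1:
            offsets.append(position)
            position = data.find(needle, position + 1)
        if offsets:
            hits[encoding] = offsets
    return hits


def scan_file(path, keyword):
    """Read a file as raw bytes and look for the keyword in each encoding."""
    return find_tag_bytes(Path(path).read_bytes(), keyword)
```

If a keyword turns up at more than one offset, or still turns up after the tag has been removed through Windows, that is the kind of leftover worth examining in a hex editor.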


                I am still in the research phase, but this is everything I have gathered thus far.  I have developed a working proof-of-concept application that pulls files that have been tagged and are readable by the OS.  It’s still in development, and I plan on getting it out to the community once I have optimized it and added all the features I wish to include.  Also, you may be wondering how I came to acquire this information, so I am planning to post a research and implementation article that describes the research in a fair amount of detail, with pertinent examples.  It may also cover how I learned to program the tool I am developing.  I think explaining the thought process behind research and development may be beneficial to the community and will hopefully inspire others to do the same.

Hope you enjoyed this.  If you have any thoughts on the implications of file tags and how they can be used in an investigation, I would love to hear them.  Leave a comment or e-mail me at:

1 comment:

  1. Great stuff, guys...identifying an artifact that is tied to explicit user activity, showing how it's populated and how it can be retrieved and viewed. Thanks for sharing!