So I was bored one day and decided that, even with school and work,
I just wasn’t busy enough and needed a new project. I wanted to do some research, preferably in
the realm of digital forensics. I
started talking to my buddy Rob Marmo (@robmarmo), and the idea of looking at how
Windows utilizes the tag properties within certain files came up. After some thought and
consideration, we decided that it would make for an extremely fun project and
a great boost to our experience. As it turned out, it was a great learning
experience in doing research and developing a way to utilize the information.
Background
If you are not familiar, Windows
introduced the ability to “tag” files in Windows Vista. This feature allows the user to add a desired
keyword to certain file types so they can easily be categorized and searched
with the Search function that was also included in Vista. Here is an example of how you can tag your
files:
Your other option for tagging a file is via the file’s properties
dialog. If the file format
supports tagging, you will see the option available within the “Details” tab,
like so:
These file tags are a great way to organize information in a
user-customized way. To me, that is why
file tags are quite interesting from a forensic point of view: these tags
are entered manually by the user.
So if a file is tagged with a certain
keyword, that tag may give the file added value of interest to an investigation.
How It Works
After a lot of looking and several dead
ends, I was finally able to get a handle on how Windows implements
file tagging. First, you may have
noticed that I said only certain file types can be tagged. I deliberately don’t list the file types that
can be tagged because the set is variable. When
you begin the tagging process, simply put, the tags are inserted into a block of XML
that is then inserted into the file. How
this is handled varies by file type, and
Windows determines it by reading the registry. If you look at
HKEY_CLASSES_ROOT\SystemFileAssociations in the registry, you will find a list
of various file types registered within the operating system. Each file type has two entries of interest for
our purposes: “FullDetails” and “PreviewDetails”. Each of these lists several properties, and the
one for file tags is “System.Keywords”.
Its presence indicates that Windows has a way to store tags within the
file. The actual injection of the XML is
done via a property handler that is based on the file type, often implemented as a
DLL.
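To make the registry lookup a bit more concrete, here is a minimal Python sketch of how one might check which file types advertise “System.Keywords”. It is not the tool I am developing, just an illustration. It assumes that “FullDetails” and “PreviewDetails” hold a string of the form `prop:Name1;Name2;...`, and that some entries may carry flag characters such as `*` or `~` in front of the name. The function names are made up for this example, and the registry walk only runs on Windows.

```python
import sys

# Subkey of HKEY_CLASSES_ROOT named in the text above.
REGISTRY_ROOT = r"SystemFileAssociations"


def parse_property_list(raw):
    """Split a 'prop:A;B;C' style string into bare property names."""
    if raw.lower().startswith("prop:"):
        raw = raw[len("prop:"):]
    names = []
    for item in raw.split(";"):
        # Assumption: some lists prefix names with flag characters
        # such as '*' or '~'; strip them so names compare cleanly.
        name = item.strip().lstrip("*~")
        if name:
            names.append(name)
    return names


def supports_tagging(raw):
    """Return True if System.Keywords appears in the property list."""
    return "System.Keywords" in parse_property_list(raw)


def taggable_extensions():
    """Yield file types whose FullDetails or PreviewDetails list tags.

    Windows only: uses the standard-library winreg module.
    """
    import winreg

    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, REGISTRY_ROOT) as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for index in range(subkey_count):
            ext = winreg.EnumKey(root, index)
            with winreg.OpenKey(root, ext) as key:
                for value_name in ("FullDetails", "PreviewDetails"):
                    try:
                        raw, _kind = winreg.QueryValueEx(key, value_name)
                    except FileNotFoundError:
                        continue  # this file type lacks the entry
                    if isinstance(raw, str) and supports_tagging(raw):
                        yield ext
                        break


if __name__ == "__main__" and sys.platform == "win32":
    for ext in taggable_extensions():
        print(ext)
```

Keeping the string parsing separate from the registry access means the parsing can be checked on any machine, while the registry walk stays confined to Windows.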
There is
another part to the file tagging system, beyond simply adding keywords to a
file. As you may recall, I mentioned that this
implementation was intended to allow for quick searching of files by these
keywords via the Windows Search function.
This adds an interesting wrinkle to the file tagging system: as you enter these keywords, you create
an interaction with the Windows Search Index.
I created a quick animated GIF to show how the Search Index works as I
understand it:
As you can see, the search index works with data
stores. Those data stores work with the
Search function and filters, as well as a notification system. If certain parameters are met, the data is
gathered and sent to the System Index, which is actually a database stored
locally on the system at:
C:\ProgramData\Microsoft\Search\Data\Application\Windows\Windows.edb *
*It should be noted that this location can be altered via the Advanced options
in Indexing Options.
This is an interesting database that seems to be proprietary to
the Microsoft operating system and is utilized by a few of their
applications. If you’re interested in
taking a look at the database, I was able to do so with the following steps:
- Turn off Windows Search Service (This essentially unlocks the DB so you can use it)
- Navigate to the Index location
- Make a copy of the index EDB file
- Turn the Windows Search Service back on
- Download and install an EDB Viewer (I suggest EseDB Viewer from Woanware)
- Open your copied EDB file in the viewer
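If you would rather script the copy step, the following is a rough Python sketch of one way to do it. It is only an illustration of the steps above, not part of my tool. It assumes the Windows Search Service has already been stopped, that the index lives at the default location, and that you want a SHA-256 hash to confirm the copy matches the original. The function names `sha256_of` and `copy_index` are made up for this example.

```python
import hashlib
import shutil
from pathlib import Path

# Default index location from the text; it can be changed on a given system.
DEFAULT_INDEX = Path(
    r"C:\ProgramData\Microsoft\Search\Data\Application\Windows\Windows.edb"
)


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large databases do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_index(source, dest_dir):
    """Copy the index file into dest_dir and verify the copy by hash.

    Returns the path of the copy and its SHA-256 hex digest.
    """
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    shutil.copy2(source, target)  # copy2 also tries to keep timestamps
    source_hash = sha256_of(source)
    target_hash = sha256_of(target)
    if source_hash != target_hash:
        raise OSError(f"hash mismatch after copying {source}")
    return target, target_hash
```

On a live system you would call something like `copy_index(DEFAULT_INDEX, "C:/cases/index_copy")`, where the destination folder is just a placeholder, and then open the copy in the viewer rather than the original.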
Viewing the Search Index provides an interesting look at all
the information that your Windows system can analyze in a short amount of
time. As best I can tell, the Search
Index database is utilized by the tagging system for preview
generation. While you’re entering a
tag, you will often find that once you type a letter you
are offered a list of tags. These are
tags that have previously been applied to files, and from what I can tell they
are pulled from the Search Index database.
The “PreviewDetails” entry discussed earlier is how the registry indicates that
these tags can be retrieved via preview, and the property handler then works with the
index.
Interest in Recovery
Again, each file type handles tags in
a different way, depending on the property handler assigned to it. How these handlers deal with the data and
inject it into the files differs, and through my testing of various file
types and how they dealt with file tags I came across an interesting
discovery. Some file types store tag
data not only within the file’s XML but also in various other parts of the
file. In testing I was able to see when
tags were added or modified, and even after the file tags have been deleted (or
are no longer viewable by the OS, since Windows will only read tags from the XML), they are
still located within the file and viewable in hex. Now, this only occurs in certain file types
with, dare I say, inefficient property handlers. The newer Office formats like .docx
handle XML more proficiently: the format writes nearly all of its properties to
XML, which is then archived and read by Word or whatever application opens it. To me that makes for a much more efficient way
to deal with files. However, when we
look at something like a .jpeg, the handler writes the tags to different
locations in the file based upon the interaction.
Forensically, this provides an investigator with an added way to look at
a file. My thought was that if a
file’s MAC timestamps are all exactly the same, but
the file contains indications that tags were added or modified, it may
indicate that the file timestamps have been altered.
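As a small illustration of looking for leftover tags in the raw bytes, here is a minimal Python sketch. It is not my proof of concept, just a way to show the idea. It assumes you already know the keyword you are hunting for and that the handler stored it as either UTF-8 or UTF-16LE text, which may not hold for every file type. The match is exact and case-sensitive, and the function names are made up for this example.

```python
def find_tag_offsets(data, tag):
    """Return {encoding: [byte offsets]} for every place the tag appears.

    Assumption: tag text is stored as UTF-8 or UTF-16LE. Other
    encodings or compressed regions would be missed by this scan.
    """
    if not tag:
        raise ValueError("tag must be a non-empty string")
    hits = {}
    for encoding in ("utf-8", "utf-16-le"):
        needle = tag.encode(encoding)
        offsets = []
        start = data.find(needle)
        while start != -1:
            offsets.append(start)
            start = data.find(needle, start + 1)
        hits[encoding] = offsets
    return hits


def scan_file(path, tag):
    """Read a file as raw bytes and report where the tag text occurs."""
    with open(path, "rb") as handle:
        return find_tag_offsets(handle.read(), tag)
```

If a keyword turns up at offsets in a file whose tag list looks empty in Windows, that is the kind of residue described above, and the offsets give you a place to start in a hex editor.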
Future
I am still in the research phase with
this information, but this is everything I have gathered thus far. I have developed a working proof of concept
application that pulls files that have been tagged and are
readable by the OS. It’s still in
development, and I plan on getting it out to the community once I
have it optimized and have added all the features I wish to include. Also, you may be wondering how I came to
acquire this information, so I am planning on posting a research and
implementation article that will describe the research I did in a fair
amount of detail, with pertinent examples. It may also cover how I learned to program
the tool I am developing. I
think explaining the background and thought process behind research and development may be
beneficial to the community and will hopefully inspire others to do the same.
Hope you enjoyed this, and if you have any thoughts on the implications of file tags and how they can be utilized in an investigation I would love to hear them. Leave a comment or e-mail me at: michael.ahrendt@gmail.com