How to recover a file in Google Chrome cache (gzipped or not)

Exceptionally, this article is slightly off-topic and will not be related to game development.

I wish I had found an article like this one a few hours ago, so I’m writing it myself, hoping to help people fix the same problem I had.

The context

Earlier today, I was logging in to my FTP server to download the latest log of the Spring Up Harmony high-score history in order to gather up-to-date piracy statistics. As the log files can grow quite large, I regularly download them and remove the server copies. Then I use all my local copies to generate new stats, similar to the ones shown in my 96% Piracy blog article.

The mistake

I was probably not really focused: instead of transferring from my site to my local computer, I transferred the other way round and lost all my logs from the last 4 days! Argh! I’m sure that next time I will read the “Are you sure you want to overwrite this file?” popup carefully.

The failed strategies

Here is a list of things I tried in order to recover the data:

  • As I had looked at my stats log online a few hours earlier, I hoped I had kept a tab open in my browser, but I hadn’t.
  • I checked my hosting provider’s automatic backup, but the daily backup had been made 20 hours earlier. That already recovered a few days, but I hoped to recover more.
  • I looked into the Chrome cache folder, only to find 285 MB of strangely named binary files. That would not get me anywhere.
  • I looked for a free app to access the Chrome cache, but the only one I found (ChromeCacheView) only told me which binary file contained my log file.

However, this last app gave me hope because it displayed the name and date of the file I was looking for: it had been cached only 3 hours before the mistake. Losing 3 hours of logs would be acceptable.

The solution

Searching on Google, I found that Chrome has an integrated cache viewer that could have helped me from the beginning. By typing about:cache in the address bar, you get a list of cached files. If you know the file’s URL, you can also access it directly in the following form: chrome://view-http-cache/http://example.com/file.htm. I thought I had found the solution and quickly accessed my log file. However, Chrome does not simply give you the cached file; instead it displays raw data: the HTTP headers followed by the data received from the server. For my file, it gave me something like this:

[Screenshot: chrome_cache, the Chrome cache page for the log file]

As you can see in the red circle, the binary data displayed below is clearly gzip data and, sadly, not plain text. Moreover, Chrome’s presentation is not convenient for getting the real binary data, because each line contains the addresses on the left and the binary form on the right. The useful part is the hexadecimal version in the middle.
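
To make that layout concrete, here is a minimal Python sketch (not the script used in this article) that pulls the bytes out of a single dump line. It assumes each line is an 8-digit hexadecimal address, a colon, up to 16 two-digit hex bytes, and then the text preview; the sample line is made up for illustration, and bytes_from_dump_line is a hypothetical helper name.

```python
import re

# Assumed line layout: 8 hex digits, ':', up to 16 two-digit hex bytes,
# then a text preview of the same bytes. Only the middle column is kept.
LINE_RE = re.compile(
    r"^\s*[0-9a-f]{8}:((?:\s+[0-9a-f]{2}(?=\s|$)){1,16})",
    re.IGNORECASE,
)

def bytes_from_dump_line(line: str) -> bytes:
    """Return the bytes encoded in the hex column of one dump line."""
    m = LINE_RE.match(line)
    if m is None:
        return b""  # header line or anything that is not a dump line
    # Drop the whitespace between the hex pairs and decode them.
    return bytes.fromhex("".join(m.group(1).split()))

# Made-up sample line for illustration only.
sample = "00000000:  1f 8b 08 00 00 00 00 00 00 03 4b 4c 52 48 4e 01  ........KLRHN."
print(bytes_from_dump_line(sample).hex())
```

Anchoring on the address and stopping after 16 bytes keeps the preview column out of the result on full lines. A final line with fewer than 16 bytes could still pick up a preview token that happens to look like a hex pair, so the tail of the output is worth checking by hand.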

An article on Alex Korn’s site gives a PHP script he used to extract the file content section of the Chrome cache page into a text file. It is a good start, but it won’t be that simple in my case: his file was plain text and directly visible, while my content is binary. The first time I ran this PHP script, I got errors when extracting the gzip file (corrupted data). After a closer look, the cause was the regular expression used in preg_match_all: some data on the right side (binary) could match it and inject invalid bytes. I changed the regular expression to require whitespace (\s) around each byte instead of word boundaries (\b), and added code to decode the gzip data directly. This is the full script I used:

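<?php
// cache.log is a copy of the Chrome cache page with only the file content section
$cacheString = file_get_contents("cache.log");
$matches = array();
// Match two hex digits surrounded by whitespace. The lookahead leaves the
// trailing whitespace unconsumed, so single-spaced bytes are not skipped.
preg_match_all('/\s([0-9a-f]{2})(?=\s)/', $cacheString, $matches);
// Write each decoded byte to a binary file
$f = fopen("t.bin", "wb");
foreach ($matches[1] as $match)
{
  fwrite($f, chr(hexdec($match)));
}
fclose($f);

// Decompress the gzip file and capture the result in a string
ob_start();
readgzfile("t.bin");
$decoded_data = ob_get_clean();
echo $decoded_data;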

And that did it, I finally recovered my file!
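
Since a cached file may or may not be gzipped, the decoding step can be made conditional. The following Python sketch is an illustration under stated assumptions rather than part of the script above: decode_body is a hypothetical helper, the log line is invented, and the check relies on the fact that gzip data starts with the two bytes 1f 8b.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"

def decode_body(raw: bytes) -> bytes:
    """Decompress the recovered bytes if they look like gzip, else return them unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw

# Invented log line, used only to exercise both paths.
original = b"2011-05-14 12:00:00 score=1234\n"
print(decode_body(gzip.compress(original)) == original)  # gzipped input
print(decode_body(original) == original)                 # plain input
```

If the recovered bytes are corrupted, gzip.decompress raises an exception instead of returning partial data, which is a useful signal that stray bytes slipped into the extraction.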

Conclusion

I know this article is really specific, but as I said at the beginning, I really hope people looking for this information will find it here instead of spending as much time as I did figuring it out.

You can follow me on Twitter or Facebook, and feel free to comment on this article here.

37 Responses to “How to recover a file in Google Chrome cache (gzipped or not)”

  1. [...] Update 2011/05/14: I changed the b’s in the RegEx to s’s as recommended in this post on recovering gzipped files. [...]

  2. Giacomo says:

    THANKS! You save many hours of my work!

  3. Hugh says:

    You saved my life. Thanks!

  4. beccax says:

    My apology for newbie question, how do I run a php script? I’m on ubuntu and just installed php. I tried just doing ‘php ‘, but that just echoes it instead of executing it. I tried searching online. Do I need to install apache and all that? Or should I tried to figure out how to write your script in the linux shell (BASH)?

  5. Frozax says:

    Well, I am not using linux so I don’t know exactly how you should do it. On Windows, I use easyPHP and it contains apache and php to simulate a webserver on my local machine. I think XAMPP does this on Linux. Hope this helps.

  6. grah says:

    THANKYOU!
    you saved me hours of rework.

  7. Mike says:

    where do you put the php code?

  8. Frozax says:

    As said above, you need to use something like EasyPHP. Then, your local webserver must point to a directory with an index.php file containing this code and the log file.

  9. Mike says:

    Bookmarked forever. For those (hopefully) rare slip-ups, this works wonders. Thanks for the great post.

  10. den says:

    need help,,,

    1) installed XAMPP
    2) started Apache
    3) created index1.php file with the script inside
    4) made Chrome go to http://localhost/xampp/index1.php
    5) got “object not found” message

    how to get cache.log file, from the cache page of google Chrome (about:cache)?

    1000 thanks!!

  11. Frozax says:

    Den,
    To create the cache.log file, you first need to find your lost file from the list in about:cache.
    From here, you should have a file looking like the first screenshot of the post. You remove the headers and only keep the part starting at “File Content” and copy all that in a file cache.log that you copy next to your index1.php.
    Hope this helps.

  12. inteblio says:

    That is the way-coolest thing! thanks for the successful recovery of months of work ! ! this is REAL GOOD karma you’re getting here.

  13. example says:

    i was able to recover cache with “ChromeCacheView” select the item you want to recover, click file > Copy Selected Cached Files To (F4),

  14. inteblio says:

    Er, i assume that your site killed the PHP symbols, so i’ll repost…

    THIS IS A RE-WORKING OF THE SCRIPT
    IT READS AN INPUT FOLDER OF FILES
    AND UN-FLUFFS THEM INTO AN OUTPUT FOLDER.
    it’s a bit fluffy itself, but I thought i’d share, as your post REALLY saved my neck, and this is a bit easier.

    /*
    put the files you need to fix in a folder, tell this script where the folder is.
    * You will need to be able to run PHP - which you will only be able to do if you've set it up, or know what you're doing. google it.
    *
    * This script reads all the files in one folder, then re-creates them in a new one.
    */

    define ('BR', ''.PHP_EOL);
    define ('ROOT_PATH', '/opt/lampp/htdocs/nicholas/recovery/');
    define ('INPUT_FOLDER', 'input/');
    define ('OUPUT_FOLDER', 'recovered/');

    // finds which files are in the INPUT folder
    $files_ar = scandir (ROOT_PATH.INPUT_FOLDER);
    foreach ($files_ar as $fname){
        if (is_file(ROOT_PATH.INPUT_FOLDER.$fname)){
            if (strpos($fname,'.')!==0){ // ignore any folder / file beginning with a dot...
                $fs []= $fname; // add to list of files
            }
        }
    }
    $which = 0; // you can set this if you've got SO far through ... but one failed - so you can start 'higher up' than the first..
    $len = count ($fs);

    // then re-creates them in the OUTPUT folder.
    while ($which < $len){

        // cache.log is a copy of chrome cache page with only the file content section
        error_reporting (E_ALL);
        $cacheString = file_get_contents(ROOT_PATH.INPUT_FOLDER.$fs[$which]);
        $start = strpos ($cacheString, "");
        $start = strpos ($cacheString, "", $start+1);
        $cacheString = substr ($cacheString, $start);
        $matches = array();
        preg_match_all('/\s[0-9a-f]{2}\s/', $cacheString, $matches);
        $path = ROOT_PATH.OUPUT_FOLDER.$fs[$which];
        $f = fopen($path, "wb");
        foreach ($matches[0] as $match)
        {
            fwrite($f, chr(hexdec($match)));
        }
        fclose($f);

        echo "un gzipped this file: ".$fs[$which]. BR;
        $which++;
    }

  15. inteblio says:

    THe above code isn’t going to work because your website stripped out the HTML tags… maybe i’ll email you.

  16. Svend says:

    Hi,

    Thanks a lot for this post!

    I am not a PHP developer, I used the information from your post to come up with the following alternative manual approach:

    - “download” the file from the chrome cache page
    - extract the “file content” part of it (see your screenshot)
    - remove the first and last column (i.e. the address part ending with “:” and the last part with the binary overview). Notepad++ is great for that as it allows “square” selection.

    - remove all white spaces and end-of-line characters (here again, notepad++ for search/replace and macro)

    => at this point we should have only big line with numbers in the file

    convert this line to base 64 format, for example with this site:

    http://tomeko.net/online_tools/hex_to_base64.php

    convert the base64 into binary, for example with this site:

    http://www.motobit.com/util/base64-decoder-encoder.

    => that’s it! In my case the file were not compressed so I did not have to use gzip after that, maybe this is simply because Chrome does not always compress very small or recent files.

    Your solution is more elegant, but I just had one file to recover => it was faster to do the operation manually than to install PHP on my box or translate your script into a language I know.

    Best regards,

    Svend

  17. Frozax says:

    That’s a great idea Svend. Probably easier for people not into programming that much.

  18. Ricky Ferris says:

    Thank you so much, saved me big time!

  19. [...] script I found on Frozax’s blog solves the problem very well. <?php // cache.log is a copy of chrome cache page with [...]

  20. Ross says:

    Svend: you are my hero. That worked perfectly.
    To the blog writer: Thank you for this post – while I wasn’t able to use your method, thanks to you I found Svend’s method.

  21. Senseful says:

    Thanks for the info! This post inspired me to create a non-PHP solution, which might be easier for some people to use: http://www.sensefulsolutions.com/2012/01/viewing-chrome-cache-easy-way.html

  22. Fran says:

    You just saved some of my Wordpress witterings from extinction. The world won’t thank you but I will. As a junior PHP developer I now have the pleasure of working out how the script does its magic too. Cheers.

  23. [...] sea, I reached the output at the bottom and found the posts, perfectly intact. Many thanks to Frozax, who wrote [...]

  24. Mariano says:

    Great! Very useful! :-)

  25. Christina says:

    Thank you so much! This just saved my butt!

  26. THANK YOU!

    A version control system which shall remain nameless (Microsoft TFS) just overwrote a bunch of my JavaScript source with older versions.

    Fortunately the way I deliver it to the browser is as a single linked-together file, so I just had to get the most recent gzipped version out of Chrome.

    For fun, I used C# to convert the hex stuff into a binary file:

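    using System;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            // args[0]: saved hex dump, args[1]: output binary file
            using (var outFile = new FileStream(args[1], FileMode.Create))
            {
                foreach (var line in File.ReadAllLines(args[0]))
                {
                    // Skip the first 10 characters (the address) and keep the next 64 (the hex column)
                    var bytes = line.Substring(10).Substring(0, 64)
                        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(hex => (byte)Convert.ToUInt32(hex, 16))
                        .ToArray();

                    outFile.Write(bytes, 0, bytes.Length);
                }
            }
        }
    }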

  27. shaneonabike says:

    Freakin amazing you saved me hoooooooooooooooooooours of work. Thank you so much!

  28. Sancho says:

    Thank you very much for the script. I made a change to it, because when you try to recover non compressed files, in the right area there are sometimes values that match the \s[0-9a-f]{2}\s pattern like in “Last modified 23 jul 2011” where the 23 is matched.

    I made the following change to avoid that, by extracting only the 16 bytes from the middle section first and then matching again. The only failure may happen in the last line if you have a matching sequence on the right area of that line and less than 16 bytes.

    /* cache.log is a copy of chrome cache page with only the file content section */

    $cacheString = file_get_contents("cache.log");
    $matches = array();

    preg_match_all('/([0-9a-f]{8}\:)\s+(([0-9a-f]{2})\s+){1,16}/', $cacheString, $matches);
    $cacheString = print_r($matches[0], true); /* now with the right portion eliminated to avoid false matches */
    preg_match_all('/\s[0-9a-f]{2}\s/', $cacheString, $matches);

    $f = fopen("t.bin", "wb");
    foreach ($matches[0] as $match)
    {
        fwrite($f, chr(hexdec($match)));
    }
    fclose($f);

  29. Felix says:

    THANK YOU!
    You saved 2 hours of work. I accidently overwritten my css file and had an old Version in my cache.

  30. riz4y says:

    you saved my 3 days of work. my css file crashed due to the power failure.. finally i googled to recover my file there many.. those are not worked for me.. finaly this was very help

  31. Kyle Macey says:

    My life has officially been saved.

  32. Smartassism says:

    Thanks!! I just needed a quick recovery of a longish comment left on a website so I ultimately used a different solution (albiet one brought about from the lively discussion which took place here).

    Many thanks to Senseful for that simple web tool! Worked a charm! It is especially useful for those away from home, not on their computer, and thus lacking in tools.

  33. Dragomir Dragnev says:

    And here is the Bash/awk variant for those who needs more control over the result as piping and mass extract:

    --- start of cc2file ---

    #!/bin/bash
    if [ -z "$1" ]; then
    echo -e " Usage $0 cache_file [out_file]\n If out_file not given will generate full original path/name in current dir\n If '-' is given for out_file will print to stdout " >&2
    exit;
    fi
    orig=$(cat "$1" | head -n 5 | awk '{pp=index($0,"\x3ctable\x3e"); if (pp) { pp+=7; pe=index($0,"\x3chr\x3e"); if (pe>pp) print substr($0,pp,pe-pp); } }')
    if [ -z "$orig" ]; then echo "Bad File?"; exit; fi
    re=$(cat "$1" | head -n 5 | awk '{pp=index($0,"\x3cpre\x3e"); if (pp) { pp+=5; print substr($0,pp,200); } }')
    echo -e "Processing:\t$1" >&2
    echo -e " Found:\t$orig" >&2
    if [ "$2" == "-" ]; then ex="/dev/stdout"; else
    if [ -z "$2" ]; then ex="$(pwd)/$orig"; mkdir -p $(dirname $ex); else ex=$2; fi fi
    echo -e " Save to:\t$ex" >&2
    echo -e " Req. Result:\t$re" >&2

    cat "$1" | awk \
    'BEGIN { fou=0; }
    { if (fou<2) { fff=index($0,"\x3c""pre""\x3e""00000000:"); if (fff) { fou++; $0=substr($0,fff+5,4*16-2+11); } }
    if (fou<2) { next; }
    fff=index($0,"\x3c""\x2f""pre""\x3e");
    if (fff==1) exit;
    $0=substr($0,12,4*16-2);
    print $0;
    #for (ch=1; ch $ex

    --- end of cc2file ---

    Mass extract can be achieved with command like:
    find dir_with_saved_cached_files -type f -exec /path_to/cc2file '{}' \;

  34. Dragomir Dragnev says:

    Of course the code was scrambled so here it is in pastebin:
    http://pastebin.com/CjT0djvW

  35. Dean says:

    Saved my morning. Thanks for the code!!!

  36. Graham says:

    THANK YOU. Right when you think you’ve lost something, you suddenly realize you’re not even sure what you lost. Really saved my ass here, much appreciated.