How to recover a file in Google Chrome cache (gzipped or not)

For once, this article is slightly off-topic and is not related to game development.

I wish I had found an article like this one a few hours earlier, so I’m writing it myself, hoping to help people fix the same problem I had.

The context

Earlier today, I logged onto my FTP server to download the latest log of the Spring Up Harmony hiscores history in order to gather up-to-date piracy statistics. As the log files can grow quite large, I regularly download them locally and remove the server copies. Then, I use all my local copies to generate new stats, similar to the ones shown in my 96% Piracy blog article.

The mistake

I was probably not really focused, but instead of transferring from my site to my local computer, I transferred the other way round and lost all my logs of the last 4 days! Argh! Next time, I’m sure I will examine the “Are you sure you want to overwrite this file?” popup carefully.

The failed strategies

Here is a list of the things I tried in order to recover the data:

  • As I had looked at my stats log online a few hours earlier, I hoped that I had kept a tab open in my browser, but I hadn’t.
  • I checked the automatic backup of my hosting provider, but the latest daily backup had been made 20 hours earlier. That already recovered a few days of logs, but I hoped to recover more.
  • I looked into the Chrome cache folder, only to find 285 MB of strangely named binary files. That would not get me anywhere.
  • I looked for a free app to access the Chrome cache, but the only one I found (ChromeCacheView) only told me which binary file contained my log file.

However, this last app gave me hope because it displayed the name and date of the file I was looking for: the cached copy dated from only 3 hours before the mistake. Losing 3 hours of log would be acceptable.

The Solution

Searching on Google, I found that Chrome has an integrated cache viewer that could have helped me from the beginning. By typing about:cache in the address bar, you get a list of cached files. If you know the URL of the file, you can also open its cache entry directly with an address of the following form: chrome://view-http-cache/http://example.com/file.htm. I thought I had found the solution and quickly accessed my log file. However, Chrome does not simply hand you the cached file: it displays the raw entry, with the HTTP headers followed by the data received from the server. For my file, it gave me something like this:

[Screenshot: chrome_cache]

As you can see in the red circle, the data displayed below the headers is clearly gzip data and, sadly, not plain text. Chrome’s presentation also makes it awkward to get at the real binary data: each row has the address on the left and a text rendering of the bytes on the right. The useful part is the hexadecimal version in the middle.
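As a rough illustration of that check, here is a minimal Python sketch. It assumes a dump row starts with an address ending in ":" followed by hex byte pairs, and it tests for the two-byte gzip magic number 1f 8b. The function name looks_gzipped is made up for this example.

```python
GZIP_MAGIC = bytes([0x1F, 0x8B])  # every gzip stream starts with these two bytes


def looks_gzipped(first_dump_row: str) -> bool:
    """Return True if the first hex-dump row starts with the gzip magic number.

    Assumed row layout: '00000000:  1f  8b  08 ...' (address, ':', hex pairs).
    """
    # Keep only what follows the address, then take the first two tokens.
    _, _, rest = first_dump_row.partition(":")
    pairs = rest.split()[:2]
    try:
        return bytes(int(pair, 16) for pair in pairs) == GZIP_MAGIC
    except ValueError:
        # The tokens were not hex byte values, so this is not a dump row.
        return False
```

The Content-Encoding: gzip response header, when present in the header section, is the other hint that the body is compressed.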

An article on Alex Korn’s site gives a PHP script he used to extract the file content section of the Chrome cache into a text file. It is not that simple in my case because my content is binary, whereas Alex Korn’s text file was directly readable, but it’s a good start. The first time I ran this PHP script, extracting the gzip file failed with errors (corrupted data). A closer look showed that the cause was the regular expression used in preg_match_all: some data in the right-hand column could also match it and inject invalid bytes. I changed the regular expression to require whitespace (\s) around each byte instead of word boundaries (\b), and added code to decode the gzip data directly. This is the full script I used:

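<?php
// cache.log is a copy of the Chrome cache page with only the file content section
$cacheString = file_get_contents("cache.log");
$matches = array();
// Match every hex byte that has whitespace on both sides
preg_match_all('/\s[0-9a-f]{2}\s/', $cacheString, $matches);
$f = fopen("t.bin", "wb");
foreach ($matches[0] as $match)
{
  // trim() drops the surrounding whitespace captured by the pattern
  fwrite($f, chr(hexdec(trim($match))));
}
fclose($f);

// readgzfile() prints the decompressed data, so capture it in an output buffer
ob_start();
readgzfile("t.bin");
$decoded_data = ob_get_clean();
echo $decoded_data;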

And that did it, I finally recovered my file!
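For readers without a PHP setup, the same two steps can be sketched in Python. This is only a port under assumptions, not the script I actually ran: it expects the text of the file content section, with hex pairs delimited by whitespace on both sides as in the pattern above, and the names dump_to_bytes and recover are invented for the example.

```python
import gzip
import re

# Same idea as the PHP pattern: a hex pair with whitespace on both sides.
# Text in the right-hand column that happens to look like " 23 " would also
# match, so the result should be checked.
HEX_PAIR = re.compile(r"\s([0-9a-f]{2})\s")


def dump_to_bytes(dump_text: str) -> bytes:
    """Collect every whitespace-delimited hex pair, in order, as raw bytes."""
    return bytes(int(pair, 16) for pair in HEX_PAIR.findall(dump_text))


def recover(dump_text: str) -> bytes:
    """Rebuild the cached body and gunzip it if it starts with the gzip magic."""
    raw = dump_to_bytes(dump_text)
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw
```

To use it, read the saved dump (for example cache.log) into a string, pass it to recover(), and write the returned bytes to a file opened in binary mode.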

Conclusion

I know this article is really specific but, as I said in the beginning, I really hope people looking for this information will find it here instead of spending as much time as I did figuring it out.

You can follow me on Twitter or Facebook, and feel free to comment on this article here.

47 comments

  1. My apologies for the newbie question: how do I run a PHP script? I’m on Ubuntu and just installed PHP. I tried just doing ‘php ‘, but that just echoes it instead of executing it. I tried searching online. Do I need to install Apache and all that? Or should I try to figure out how to write your script in the Linux shell (Bash)?

  2. Well, I am not using linux so I don’t know exactly how you should do it. On Windows, I use easyPHP and it contains apache and php to simulate a webserver on my local machine. I think XAMPP does this on Linux. Hope this helps.

  3. As said above, you need to use something like EasyPHP. Then, your local webserver must point to a directory with an index.php file containing this code and the log file.

  4. Bookmarked forever. For those (hopefully) rare slip-ups, this works wonders. Thanks for the great post.

  5. need help,,,

    1) installed XAMPP
    2) started Apache
    3) created index1.php file with the script inside
    4) made Chrome go to http://localhost/xampp/index1.php
    5) got “object not found” message

    how to get cache.log file, from the cache page of google Chrome (about:cache)?

    1000 thanks!!

  6. Den,
    To create the cache.log file, you first need to find your lost file from the list in about:cache.
    From here, you should have a file looking like the first screenshot of the post. You remove the headers and only keep the part starting at “File Content” and copy all that in a file cache.log that you copy next to your index1.php.
    Hope this helps.

  7. That is the way-coolest thing! thanks for the successful recovery of months of work ! ! this is REAL GOOD karma you’re getting here.

  8. I was able to recover the cached file with “ChromeCacheView”: select the item you want to recover, then click File > Copy Selected Cached Files To (F4).

  9. Er, i assume that your site killed the PHP symbols, so i’ll repost…

    THIS IS A RE-WORKING OF THE SCRIPT
    IT READS AN INPUT FOLDER OF FILES
    AND UN-FLUFFS THEM INTO AN OUTPUT FOLDER.
    it’s a bit fluffy itself, but I thought i’d share, as your post REALLY saved my neck, and this is a bit easier.

    <?php
    /*
     * Put the files you need to fix in a folder, and tell this script where the folder is.
     * You will need to be able to run PHP – which you will only be able to do if you’ve set it up, or know what you’re doing. Google it.
     *
     * This script reads all the files in one folder, then re-creates them in a new one.
     */

    define('BR', '' . PHP_EOL);
    define('ROOT_PATH', '/opt/lampp/htdocs/nicholas/recovery/');
    define('INPUT_FOLDER', 'input/');
    define('OUPUT_FOLDER', 'recovered/');

    // finds which files are in the INPUT folder
    $fs = array();
    $files_ar = scandir(ROOT_PATH . INPUT_FOLDER);
    foreach ($files_ar as $fname) {
        if (is_file(ROOT_PATH . INPUT_FOLDER . $fname)) {
            if (strpos($fname, '.') !== 0) { // ignore any folder / file beginning with a dot…
                $fs[] = $fname; // add to list of files
            }
        }
    }
    $which = 0; // you can set this if you’ve got SO far through … but one failed – so you can start ‘higher up’ than the first..
    $len = count($fs);

    // then re-creates them in the OUTPUT folder.
    error_reporting(E_ALL);
    while ($which < $len) {
        // each input file is a saved copy of the Chrome cache page
        $cacheString = file_get_contents(ROOT_PATH . INPUT_FOLDER . $fs[$which]);
        // Skip to the second occurrence of the marker. The marker string was lost
        // when this comment was posted; '<pre>' is an assumption.
        $start = strpos($cacheString, '<pre>');
        $start = strpos($cacheString, '<pre>', $start + 1);
        $cacheString = substr($cacheString, $start);
        $matches = array();
        preg_match_all('/\s[0-9a-f]{2}\s/', $cacheString, $matches);
        $path = ROOT_PATH . OUPUT_FOLDER . $fs[$which];
        $f = fopen($path, "wb");
        foreach ($matches[0] as $match) {
            fwrite($f, chr(hexdec(trim($match))));
        }
        fclose($f);

        echo "un gzipped this file: " . $fs[$which] . BR;
        $which++;
    }

  10. Hi,

    Thanks a lot for this post!

    I am not a PHP developer, I used the information from your post to come up with the following alternative manual approach:

    – “download” the file from the chrome cache page
    – extract the “file content” part of it (see your screenshot)
    – remove the first and last column (i.e. the address part ending with “:” and the last part with the binary overview). Notepad++ is great for that as it allows “square” selection.

    – remove all white spaces and end-of-line characters (here again, notepad++ for search/replace and macro)

    => at this point the file should contain only one big line of hexadecimal digits

    convert this line to base 64 format, for example with this site:

    http://tomeko.net/online_tools/hex_to_base64.php

    convert the base64 into binary, for example with this site:

    http://www.motobit.com/util/base64-decoder-encoder.

    => that’s it! In my case the file was not compressed, so I did not have to use gzip after that. Maybe this is simply because Chrome does not always compress very small or recent files.

    Your solution is more elegant, but I just had one file to recover => it was faster to do the operation manually than to install PHP on my box or translate your script into a language I know.

    Best regards,

    Svend
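The manual steps in the comment above (strip the side columns and whitespace, convert the hex line, decode to binary) can be sketched in a few lines of Python. This is only an illustration with a made-up six-byte sample; hex_text_to_bytes is a name invented here. It also shows that the base64 detour is just a transport format: decoding it gives back exactly the bytes of the hex column.

```python
import base64


def hex_text_to_bytes(hex_text: str) -> bytes:
    """Turn the middle (hex) column of a dump into raw bytes."""
    digits = "".join(hex_text.split())  # drop spaces and line breaks
    return bytes.fromhex(digits)


# Made-up sample: two rows of the hex column, side columns already removed.
hex_column = "68  65  6c  6c\n6f  21"
raw = hex_text_to_bytes(hex_column)

# The hex-to-base64 and base64-to-binary steps of the manual approach.
as_base64 = base64.b64encode(raw).decode("ascii")
round_trip = base64.b64decode(as_base64)
```

If the content is gzipped, the resulting bytes still need to be decompressed afterwards.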

  11. Svend: you are my hero. That worked perfectly.
    To the blog writer: Thank you for this post – while I wasn’t able to use your method, thanks to you I found Svend’s method.

    1. I tried this and it just gave me the following error: “unrecognized format.” I also tried the console and it gave me this >> “Uncaught TypeError: Cannot read property ‘1’ of nullmessage: “Cannot read property ‘1’ of null”stack: (…)get stack: function () { [native code] }set stack: function () { [native code] }__proto__: Error(anonymous function) @ VM64:16(anonymous function) @ VM64:188InjectedScript._evaluateOn @ VM47:883InjectedScript._evaluateAndWrap @ VM47:816InjectedScript.evaluate @ VM47:682”

      I don’t understand what the problem is…

  12. You just saved some of my WordPress witterings from extinction. The world won’t thank you but I will. As a junior PHP developer I now have the pleasure of working out how the script does its magic too. Cheers.

  13. Pingback: Site down | frugi
  14. THANK YOU!

    A version control system which shall remain nameless (Microsoft TFS) just overwrote a bunch of my JavaScript source with older versions.

    Fortunately the way I deliver it to the browser is as a single linked-together file, so I just had to get the most recent gzipped version out of Chrome.

    For fun, I used C# to convert the hex stuff into a binary file:

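    using System;
    using System.IO;
    using System.Linq;

    static class Program
    {
        // args[0]: text file holding the hex dump, args[1]: binary file to create
        static void Main(string[] args)
        {
            using (var outFile = new FileStream(args[1], FileMode.Create))
            {
                foreach (var line in File.ReadAllLines(args[0]))
                {
                    // Skip the first 10 characters (the address) and keep the next 64 (the hex column)
                    var bytes = line.Substring(10).Substring(0, 64)
                        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(hex => (byte)Convert.ToUInt32(hex, 16))
                        .ToArray();

                    outFile.Write(bytes, 0, bytes.Length);
                }
            }
        }
    }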

  15. Freakin amazing you saved me hoooooooooooooooooooours of work. Thank you so much!

  16. Thank you very much for the script. I made a change to it, because when you try to recover non-compressed files, the right-hand area sometimes contains values that match the \s[0-9a-f]{2}\s pattern, as in “Last modified 23 jul 2011”, where the 23 is matched.

    To avoid that, I first extract only the (up to 16) bytes of the middle section and then match again. The only failure may happen on the last line, if it holds fewer than 16 bytes and has a matching sequence in its right-hand area.

    /* cache.log is a copy of chrome cache page with only the file content section */

    $cacheString = file_get_contents("cache.log");
    $matches = array();

    preg_match_all('/([0-9a-f]{8}:)\s+(([0-9a-f]{2})\s+){1,16}/', $cacheString, $matches);
    $cacheString = print_r($matches[0], true); /* now with the right portion eliminated to avoid false matches */
    preg_match_all('/\s[0-9a-f]{2}\s/', $cacheString, $matches);

    $f = fopen("t.bin", "wb");
    foreach ($matches[0] as $match)
    {
        fwrite($f, chr(hexdec(trim($match))));
    }
    fclose($f);
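The false match described in this comment is easy to reproduce. The Python sketch below is an illustration, not the commenter's code: it builds one 16-byte row for the text "modified 23 jul " in the assumed dump layout (address, double-spaced hex pairs, then the readable text), and compares the plain whitespace pattern with a pattern anchored on the address. The names naive_bytes and row_bytes are invented for the example.

```python
import re

NAIVE = re.compile(r"\s([0-9a-f]{2})\s")
# Anchored variant: start at the address and take at most 16 hex pairs per row.
ROW = re.compile(r"^[0-9a-f]{8}:((?:\s+[0-9a-f]{2}){1,16})", re.MULTILINE)


def naive_bytes(dump_text: str) -> bytes:
    """Every whitespace-delimited hex pair, wherever it appears in the row."""
    return bytes(int(pair, 16) for pair in NAIVE.findall(dump_text))


def row_bytes(dump_text: str) -> bytes:
    """Only the hex pairs that directly follow the row address."""
    out = bytearray()
    for match in ROW.finditer(dump_text):
        out.extend(int(pair, 16) for pair in match.group(1).split())
    return bytes(out)


text = b"modified 23 jul "  # exactly 16 bytes, so one full row
row = "00000000:  " + "".join("%02x  " % b for b in text) + text.decode("ascii") + "\n"

print(naive_bytes(row))  # one byte too many: the " 23 " in the text column matched
print(row_bytes(row))    # the original 16 bytes
```

As the comment notes, a final row with fewer than 16 bytes can still pick up a false match from the text column with the anchored pattern.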

  17. You saved 3 days of my work. My CSS file got corrupted by a power failure. I googled for ways to recover the file and found many, but none of them worked for me. Finally, this one was very helpful.

  18. Thanks!! I just needed a quick recovery of a longish comment left on a website, so I ultimately used a different solution (albeit one brought about by the lively discussion which took place here).

    Many thanks to Senseful for that simple web tool! Worked a charm! It is especially useful for those away from home, not on their computer, and thus lacking in tools.

  19. And here is the Bash/awk variant for those who need more control over the result, such as piping and mass extraction:

    —start of cc2file———-

    #!/bin/bash
    if [ -z "$1" ]; then
    echo -e " Usage $0 cache_file [out_file]\n If out_file not given will generate full original path/name in current dir\n If '-' is given for out_file will print to stdout " >&2
    exit;
    fi
    orig=$(cat "$1" | head -n 5 | awk '{pp=index($0,"\x3ctable\x3e"); if (pp) { pp+=7; pe=index($0,"\x3chr\x3e"); if (pe>pp) print substr($0,pp,pe-pp); } }')
    if [ -z $orig ]; then echo "Bad File?"; exit; fi
    re=$(cat "$1" | head -n 5 | awk '{pp=index($0,"\x3cpre\x3e"); if (pp) { pp+=5; print substr($0,pp,200); } }')
    echo -e "Processing:\t$1" >&2
    echo -e " Found:\t$orig" >&2
    if [ "$2" == "-" ]; then ex="/dev/stdout"; else
    if [ -z "$2" ]; then ex="$(pwd)/$orig"; mkdir -p $(dirname $ex); else ex=$2; fi fi
    echo -e " Save to:\t$ex" >&2
    echo -e " Req. Result:\t$re" >&2

    cat "$1" | awk \
    'BEGIN { fou=0; }
    { if (fou<2) { fff=index($0,"\x3c""pre""\x3e""00000000:"); if (fff) { fou++; $0=substr($0,fff+5,4*16-2+11); } }
    if (fou<2) { next; }
    fff=index($0,"\x3c""\x2f""pre""\x3e");
    if (fff==1) exit;
    $0=substr($0,12,4*16-2);
    print $0;
    #for (ch=1; ch $ex

    —-end of cc2file——

    Mass extract can be achieved with command like:
    find dir_with_saved_cached_files -type f -exec /path_to/cc2file '{}' \;

  20. THANK YOU. Right when you think you’ve lost something, you suddenly realize you’re not even sure what you lost. Really saved my ass here, much appreciated.

  21. Please help! I’ve tried so many ways to recover this file, but I’m having no luck at all.

    I tried your method, using this:

    But it only gives me this to display:
    Warning: fopen(t.bin): failed to open stream: Permission denied in /Applications/XAMPP/xamppfiles/htdocs/other/degreeaudit2.php on line 11

    Warning: fclose() expects parameter 1 to be resource, boolean given in /Applications/XAMPP/xamppfiles/htdocs/other/degreeaudit2.php on line 16

    Warning: readgzfile(t.bin): failed to open stream: No such file or directory in /Applications/XAMPP/xamppfiles/htdocs/other/degreeaudit2.php on line 19

    What am I doing wrong?!

    1. You could try one of the solutions in other languages, as seen in the other comments. Otherwise, make sure the t.bin file is in the same directory as the script file and that its permissions are correct (read access from the server; to be sure, use 0755, i.e. rwxr-xr-x).

  22. It’s not working anymore.
    You’ll just need to change the pattern to:

    '/[0-9a-f]{2} /'

    Have fun 🙂
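One possible reason for the breakage, assuming the newer dump separates bytes with a single space (an assumption made here for illustration, not something stated in the comment), is that a pattern which consumes the whitespace on both sides can only match every other byte. A pattern with lookarounds checks the neighbouring whitespace without consuming it, so it copes with both spacings. A minimal Python sketch with made-up rows:

```python
import re

# Consumes the whitespace on both sides of each pair.
CONSUMING = re.compile(r"\s([0-9a-f]{2})\s")
# Only checks that whitespace is there, without consuming it.
LOOKAROUND = re.compile(r"(?<=\s)[0-9a-f]{2}(?=\s)")

single_spaced = "00000000: 1f 8b 08 00 ....\n"   # assumed newer layout
double_spaced = "00000000:  1f  8b  08  00  ....\n"

print(CONSUMING.findall(single_spaced))   # every other byte is skipped
print(LOOKAROUND.findall(single_spaced))  # all four bytes
print(LOOKAROUND.findall(double_spaced))  # all four bytes
```

Like the other whitespace-based patterns, this one can still match a hex-looking word in the text column, so the output is worth checking.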

  23. What is t.bin? I placed a file t.bin in same directory of the cache.log and the script but it remains blank when I run the script.

    1. The file `t.bin` is created by the script during the process. You have probably something wrong in your implementation. Have a look at the other comments, I think someone made another version easier to use than php.
