Significance of cache buster in click tracker, fraudulent click tracker, %n in click tracker
Rails 2 Cache Buster for CloudFront
A little while ago I talked about setting up a Rails app to use Amazon's CloudFront. This is just a little snippet showing how to use a cache buster with it. It is an adaptation of what we already had and of this article I found on how to do it in Rails 3. Follow the article I linked to, and then use the following initializer instead of the config setup (production.rb) described in the article:
if AppConfig['asset_base_domain']
  ActionController::Base.asset_host = Proc.new { |source|
    # source looks like "/javascripts/app.js?1234567890"
    (url, qs) = source.split(/\?/)
    append_host = ''
    # Put the timestamp in the path too, so a changed asset gets a new URL
    if AppConfig['cloudfront_enabled'] && qs
      append_host = "/r-#{qs}"
    end
    # Subdomain by asset type; anything not JS or CSS counts as an image
    if url.end_with?('js')
      "http://js.#{AppConfig['asset_base_domain']}#{append_host}"
    elsif url.end_with?('css')
      "http://css.#{AppConfig['asset_base_domain']}#{append_host}"
    else
      "http://images.#{AppConfig['asset_base_domain']}#{append_host}"
    end
  }
end
We use a configuration file (AppConfig) to specify the base domain for CloudFront, and then prepend the asset type as a subdomain (with images being anything other than CSS and JS). You can skip that part if you like, but you get the gist.
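To see what the proc returns without booting Rails, here is a plain-Ruby sketch of the same logic as a method. The APP_CONFIG hash and the example.com domain are hypothetical stand-ins for our AppConfig file, and the sketch assumes the source string arrives with Rails 2's timestamp query string attached, as in "/javascripts/application.js?1234567890".

```ruby
# Hypothetical stand-in for the AppConfig file used in the post.
APP_CONFIG = {
  'asset_base_domain'  => 'example.com',
  'cloudfront_enabled' => true
}.freeze

# Returns the asset host for a Rails 2 asset source such as
# "/javascripts/application.js?1234567890" (path plus mtime timestamp).
def asset_host_for(source, config = APP_CONFIG)
  url, qs = source.split('?', 2)
  # Copy the timestamp into the path so the URL changes whenever the file does.
  suffix = config['cloudfront_enabled'] && qs ? "/r-#{qs}" : ''
  subdomain =
    if url.end_with?('js') then 'js'
    elsif url.end_with?('css') then 'css'
    else 'images' # anything that is not JS or CSS
    end
  "http://#{subdomain}.#{config['asset_base_domain']}#{suffix}"
end

puts asset_host_for('/javascripts/application.js?1234567890')
# http://js.example.com/r-1234567890
puts asset_host_for('/images/logo.png')
# http://images.example.com
```

The point of the "/r-<timestamp>" segment is that the cache buster ends up in the path rather than only in the query string, which matters for a CDN configured to ignore query strings when caching. The origin then has to ignore or strip that prefix, which we assume is handled by the setup in the linked article.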
WTF is Cachebusting?
The web is full of stuff we see every day, sometimes multiple times a day; the Google homepage and the Facebook logo and layout are two prime examples. Question is, do you think your browser loads everything fresh from the servers on every single page load? Answer: of course not.
If your browser realizes that it has already downloaded a webpage or image, it will normally refer back to its stored version (known as the cache). This speeds up your browsing, as only the dynamic content is retrieved from servers, whilst files already on your computer fill in the regular parts.
Take the example of a user's second visit to Facebook. The logo, Sign Up button and Login widget are the same each time, so they are loaded from the browser cache. The large image on the left is different each time, so it needs to be dynamic and load fresh from the web with every visit.
So why do marketeers care about cache so much, and why is it usually to blame for everything? In order to count properly (impressions and to some degree clicks), the 3rd party (publisher, adserver, affiliate network, exchange etc.) must be contacted in some way by the user. If the file is retrieved from the cache and not the 3rd party's servers, then they will have no idea that the event occurred. Cue naughty cache comments. Cached inventory causes differences in reporting volumes (which is what people care about), but more importantly it removes control of what is delivered (the thing people should care more about). A cached banner will never rotate, effectively spoiling any sequencing or retargeting. We can get over counting discrepancies, but poor media performance cannot be rectified with a handshake.
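To make the counting problem concrete, here is a toy Ruby model (not a faithful browser implementation) in which a cache sits between the page and a hypothetical third-party ad server. The server's hit counter only moves when a request misses the cache.

```ruby
require 'set'

# Toy model: a browser cache in front of a third-party ad server.
# The ad server can only count requests that miss the cache.
class ToyBrowser
  attr_reader :server_hits

  def initialize
    @cache = Set.new
    @server_hits = 0
  end

  # Returns :cache on a hit; otherwise "downloads" the URL, caches it
  # and counts one request at the ad server.
  def fetch(url)
    return :cache if @cache.include?(url)

    @cache << url
    @server_hits += 1
    :network
  end
end

ad_url = 'http://ads.example.com/banner.gif' # hypothetical ad URL

plain = ToyBrowser.new
5.times { plain.fetch(ad_url) } # same URL on every page view

busted = ToyBrowser.new
5.times { |i| busted.fetch("#{ad_url}?ord=#{i}") } # a different URL each time

puts "without cache-buster: #{plain.server_hits} of 5 counted"
# without cache-buster: 1 of 5 counted
puts "with cache-buster: #{busted.server_hits} of 5 counted"
# with cache-buster: 5 of 5 counted
```

With the same URL on every page view, the ad server counts one impression out of five; giving each request a distinct ord value (a simple counter here, purely for determinism) lets all five through.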
How can we defeat this pesky Cache Monster? Simply adding a random or changing part to the URL is enough to trick the browser into thinking the content is "new", so it requests it from the server instead of the cache.
Any tag you generate from an adserver or affiliate network should come with instructions, or at least an indication, of where to put a cache-buster. If it doesn't, call your rep straight away, and if they don't know, you have my permission to slap them round the face with a freshly caught salmon (I believe line caught is the best these days from an ecological standpoint). Cache-busting is the single most important thing for your publishers to take care of, so the same punishment applies to them if they neglect to implement it.
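A minimal sketch of that trick in Ruby, assuming a plain query-string URL: it appends a random number under a parameter name that is a hypothetical choice (cb), since each ad server defines its own name or macro.

```ruby
require 'securerandom'
require 'uri'

# Appends a random cache-busting parameter to a URL. The parameter name
# "cb" is a hypothetical default; use whatever your ad server expects.
def with_cache_buster(url, param: 'cb')
  uri = URI.parse(url)
  buster = "#{param}=#{SecureRandom.random_number(10**10)}"
  # Keep any existing query string and add the buster at the end.
  uri.query = [uri.query, buster].compact.join('&')
  uri.to_s
end

puts with_cache_buster('http://ads.example.com/banner.gif')
puts with_cache_buster('http://ads.example.com/banner.gif?size=728x90')
```

Each call produces a different URL, so the browser has no cached copy to fall back on and every page view goes back to the server.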
If we look at a DoubleClick Ad tag:
<SCRIPT language='JavaScript1.1' SRC="http://ad.uk.doubleclick.net/adj/N137./BXXXXXXX.2;ord=[timestamp]?"></SCRIPT>
<NOSCRIPT>
<A HREF="http://ad.uk.doubleclick.net/jump/N137./BXXXXXX.2;ord=[timestamp]?">
<IMG SRC="http://ad.uk.doubleclick.net/ad/N137./BXXXXXXX.2;ord=[timestamp]?" BORDER=0 WIDTH=728 HEIGHT=90 ALT="Click Here"></A>
</NOSCRIPT>
The three instances of ord=[timestamp] are the places that need a cache-buster. DoubleClick indicates where a cache-buster is required with [timestamp]. Other companies may use {TIME}, {RANDOM} or [CACHEBUSTER], but it's all the same thing. The reason time is brought into this is that a timestamp is an easy value to generate, and it keeps changing.
If you are a publisher using DFP (DART for Publishers), you will use %n as your cache-buster value. Publishers using the wonderful OpenX platform should use {random} and refer to OpenX's Magic Macros list.
That's right: virtually every publisher tool has a cache-busting macro built in, so no excuses!
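Ad servers expand these macros for you, but the substitution itself is simple. The Ruby sketch here, written against the macro spellings mentioned in this post, replaces every placeholder in a tag with one fresh value, so all the instances in the same tag get the same number. The helper name and the random default value are assumptions for illustration, not any ad server's actual implementation.

```ruby
# Cache-buster placeholders mentioned in this post.
CACHE_BUSTER_MACROS = ['[timestamp]', '{TIME}', '{RANDOM}', '[CACHEBUSTER]',
                       '{random}', '%n'].freeze

# Replaces every placeholder in a tag with the same value. The default
# value is a random number; a timestamp would also work.
def fill_cache_buster(tag, value = rand(10**12).to_s)
  CACHE_BUSTER_MACROS.reduce(tag) do |filled, macro|
    filled.gsub(macro) { value } # block form: value is inserted literally
  end
end

tag = '<IMG SRC="http://ad.uk.doubleclick.net/ad/N137./BXXXXXXX.2;ord=[timestamp]?">'
puts fill_cache_buster(tag, '8472910365')
# <IMG SRC="http://ad.uk.doubleclick.net/ad/N137./BXXXXXXX.2;ord=8472910365?">
```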
What Happens if a Cache-Buster Has Not Been Used?
In this scenario the 3rd party will undercount by anything from 1% to 99.9%. Usually the figure is around the 10% mark, but it depends on the number of unique users (the first ad a user sees can't come from cache, because their cache is still empty). There is nothing you can do about cache in the past, but you can fix the future, so insist that your publisher implements a cache-buster straight away. It's not just about bad counting: ad rotation and retargeting are ruined too, so it's a serious issue. Not knowing about cache is no excuse. If a company is technical enough to sell media, it is more than qualified to implement a cache-buster. Make sure cache-busting is in your IO (insertion order), and check with HTTPWatch for an ever-changing random number in your ad URLs. This way you can sleep tight, knowing your discrepancies will have nothing to do with cache!
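If you would rather script that check than eyeball it, here is a small Ruby sketch that takes ad request URLs captured over several page loads (copied out of HTTPWatch or any proxy log; the input format is an assumption) and reports requests with no ord value and ord values that repeat. The sample URLs and numbers are made up for illustration.

```ruby
# Reports ad request URLs with no cache-buster value and values that repeat.
# "ord" is DoubleClick's parameter; pass another name for other ad servers.
def cache_buster_report(urls, param: 'ord')
  pattern = /[;?&]#{Regexp.escape(param)}=([^;&?]+)/
  values = urls.map { |url| url[pattern, 1] } # nil when the parameter is absent
  {
    missing: values.count(&:nil?),
    repeated: values.compact.tally.select { |_, n| n > 1 }.keys
  }
end

captured = [
  'http://ad.uk.doubleclick.net/adj/N137./BXXXXXXX.2;ord=8472910365?',
  'http://ad.uk.doubleclick.net/adj/N137./BXXXXXXX.2;ord=1938475620?',
  'http://ad.uk.doubleclick.net/adj/N137./BXXXXXXX.2;ord=[timestamp]?',
  'http://ad.uk.doubleclick.net/adj/N137./BXXXXXXX.2;ord=[timestamp]?',
  'http://ad.uk.doubleclick.net/adj/N137./BXXXXXXX.2'
]
p cache_buster_report(captured)
```

For this sample, the report flags one request with no ord at all and the literal [timestamp] repeating, which suggests the tag was pasted in without its macro being expanded.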