Saturday, 2 September 2017

Data Persistence in Browser with HTTP Caching

HTTP caching allows developers to store server responses so that they are not re-downloaded each time the page is refreshed. HTTP caching is controlled from the server side: whenever the server returns a response, it can also send specific headers that tell the browser how the response should be cached. This blog post covers the basics of HTTP caching and how developers can leverage it to make their applications more efficient.


Response Headers:
With every response the server returns response headers that tell the browser whether to cache the response data or to re-download it each time the request is sent.

Cache-Control:

max-age:
This allows developers to set a validity period for the cached response. Setting 'max-age' to 100 tells the browser that the cached response is only valid for 100 seconds. Once the cached response expires, the browser will fetch a new response and cache it again.
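As a rough illustration of the freshness check described above, the following minimal sketch decides whether a cached response is still within its 'max-age'. The function names `parse_cache_control` and `is_fresh` are made up for this example, and real browsers apply more rules than this (for instance heuristic freshness when no 'max-age' is present).

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header value into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (e.g. "no-cache") are stored as None.
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def is_fresh(header: str, age_seconds: float) -> bool:
    """Return True if a response cached age_seconds ago is still within max-age."""
    max_age = parse_cache_control(header).get("max-age")
    if max_age is None:
        # Assumption for this sketch: without max-age, treat the response as stale.
        return False
    return age_seconds < int(max_age)


if __name__ == "__main__":
    print(is_fresh("max-age=100", 30))   # cached 30 seconds ago
    print(is_fresh("max-age=100", 150))  # cached 150 seconds ago
```

With 'max-age=100', a response cached 30 seconds ago can be reused, while one cached 150 seconds ago has to be fetched again.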

no-cache:
Setting 'Cache-Control: no-cache' tells the browser to check with the application server whether the data has changed (validated using the ETag token) before reusing the cached response, and to download a new response only if it has. The browser will send the 'ETag' for validation even if the cache has not expired.

no-store:
Setting 'Cache-Control: no-store' prevents the browser from storing the response in its cache at all, so the response is always re-downloaded from the server.

public:
Using 'Cache-Control: public' allows intermediary caches such as a CDN to also cache the response from the server.

private:
Using 'Cache-Control: private' allows only the browser to cache the response; intermediate caches such as a CDN may not cache it.
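To make the differences between these directives concrete, here is a small sketch that maps a 'Cache-Control' value to the behaviour described above. The function name `cache_policy` and the keys of the returned dict are invented for this example. It also assumes that a response without an explicit 'private' or 'no-store' may be stored by shared caches, which is a simplification of what real caches do.

```python
def cache_policy(header: str) -> dict:
    """Summarise who may store a response and whether it must be revalidated."""
    # Keep only the directive names, e.g. "max-age=100" -> "max-age".
    directives = {
        part.strip().lower().split("=")[0]
        for part in header.split(",")
        if part.strip()
    }
    no_store = "no-store" in directives
    return {
        # no-store forbids storing the response anywhere.
        "browser_may_store": not no_store,
        # private restricts storage to the browser; a CDN must not keep a copy.
        "shared_cache_may_store": not no_store and "private" not in directives,
        # no-cache allows storing, but requires validation before every reuse.
        "must_revalidate_before_reuse": "no-cache" in directives,
    }


if __name__ == "__main__":
    for value in ("public, max-age=100", "private, max-age=100", "no-cache", "no-store"):
        print(value, "->", cache_policy(value))
```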

ETag:

ETag is a token that the browser uses for validation. It is usually a hash of the file contents or a hash of the file's last modification timestamp. The browser can use the ETag to determine whether the response has changed since the last time it was received. Whenever the browser receives a response from the server with an ETag specified, it saves the token along with the response. On future requests the browser sends the ETag back for validation (in the 'If-None-Match' request header). If it matches the server's current value, a response code of 304 (Not Modified) is returned and the browser reuses the cached response. If there is a mismatch between browser and server, a new response is returned with a code of 200.


Fig 1.0 ETag Validation Flow
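The validation flow can be sketched from the server's point of view as follows. This is only an illustration under stated assumptions: the ETag is taken to be a truncated SHA-256 hash of the response body, and `make_etag` and `respond` are hypothetical helper names, not part of any framework.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a hash of the response body (an assumption)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) for a request that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match == etag:
        # The browser's copy is still current: no body is sent.
        return 304, headers, b""
    # First request, or the content changed: send the full response.
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b"body { color: red; }", None)
    print(status, headers["ETag"])
    # The browser repeats the request and sends the saved ETag back.
    status, _, _ = respond(b"body { color: red; }", headers["ETag"])
    print(status)
```

The first call returns 200 together with the ETag; repeating the request with that ETag returns 304 as long as the body is unchanged.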

Last-Modified:  

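This header provides the timestamp, in GMT, of when the file was last modified on the server. The browser can send this value back in the 'If-Modified-Since' request header so that the server can answer with 304 if the file has not changed since then.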
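A minimal sketch of how such a timestamp could be produced and checked is shown below, using Python's standard `email.utils` helpers for the HTTP date format. The function names `last_modified_header` and `is_not_modified` are illustrative, and the sketch assumes the modification time is given as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(mtime: datetime) -> str:
    """Format a timezone-aware modification time as an HTTP date in GMT."""
    return format_datetime(mtime.astimezone(timezone.utc), usegmt=True)


def is_not_modified(mtime: datetime, if_modified_since: str) -> bool:
    """Return True if the file has not changed since the date the browser sent."""
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return mtime.replace(microsecond=0) <= since


if __name__ == "__main__":
    mtime = datetime(2017, 9, 2, 12, 0, 0, tzinfo=timezone.utc)
    header = last_modified_header(mtime)
    print(header)                          # Sat, 02 Sep 2017 12:00:00 GMT
    print(is_not_modified(mtime, header))  # True, so the server can answer 304
```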

Optimal HTTP caching:



Fig 2.0 Elements of HTTP Caching

Load Balanced Servers (Nodes > 1):

For load-balanced servers the 'Last-Modified' header is often a better choice than the 'ETag' token. This is because two different nodes can generate completely different 'ETag' tokens for the same file (for example, when the token is derived from node-specific file metadata rather than from the file contents), which leads to inconsistent validation.


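Also, 'Cache-Control: max-age=<seconds>' is a better choice than the 'Expires' header. 'Expires' is an absolute timestamp (always expressed in GMT), so it depends on each node's clock: two server nodes whose clocks or time zone settings are not in sync can produce two different timestamps for the same response. 'max-age' is a relative duration, so it is independent of time zones and of differences between server clocks.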
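The difference can be illustrated with two simulated nodes. In this sketch the one-hour lifetime (`TTL_SECONDS`), the five-minute clock difference, and the helper names are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

TTL_SECONDS = 3600  # assumed cache lifetime of one hour


def expires_header(node_clock: datetime) -> str:
    """Expires is an absolute date, so it depends on the node's own clock."""
    expiry = node_clock.astimezone(timezone.utc) + timedelta(seconds=TTL_SECONDS)
    return format_datetime(expiry, usegmt=True)


def max_age_header() -> str:
    """max-age is a relative duration, so every node emits the same value."""
    return f"max-age={TTL_SECONDS}"


if __name__ == "__main__":
    node_a = datetime(2017, 9, 2, 12, 0, 0, tzinfo=timezone.utc)
    node_b = node_a + timedelta(minutes=5)  # node B's clock runs 5 minutes ahead
    print(expires_header(node_a))  # Sat, 02 Sep 2017 13:00:00 GMT
    print(expires_header(node_b))  # Sat, 02 Sep 2017 13:05:00 GMT
    print(max_age_header())        # max-age=3600
```

Both nodes agree on 'max-age=3600', while their 'Expires' values differ by exactly the amount their clocks disagree.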

Single Node Server:


For a single-node server, either 'Last-Modified' or 'ETag' can be used for validation, and either 'Expires' or 'Cache-Control: max-age=<seconds>' can be used to control when the cache is refreshed.

Fingerprinting:

Static files like CSS don't change much after the first page load. For this reason they can have 'max-age' set to 31536000 seconds, which is 1 year. The downside is that if the developer later updates the CSS file, browsers will keep using the cached copy until it expires, unless the user manually clears the cache. Adding fingerprints to file names solves this issue. When the file is initially sent from the server it has a fingerprint in its name, for example 'mystyle.2abec09.css', so the URL becomes 'http://www.mysite.com/mystyle.2abec09.css'. If the developer wants to force a cache update, they can simply change the fingerprint, and the browser will create a new cache entry for the updated URL ('http://www.mysite.com/mystyle.2afeg10.css').
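One common way to obtain such a fingerprint is to derive it from the file contents, so that it changes automatically whenever the file changes. The sketch below assumes a 7-character prefix of a SHA-256 hash; the function name `fingerprinted_name` is made up for this example, and build tools typically do this step for you.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 7) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "mystyle.css" becomes "mystyle.<digest>.css"; any directory part is kept.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    old = fingerprinted_name("mystyle.css", b"body { color: red; }")
    new = fingerprinted_name("mystyle.css", b"body { color: blue; }")
    print(old)
    print(new)
    print(old != new)  # a content change yields a new URL, so the old cache entry is bypassed
```

Because the fingerprint depends only on the contents, an unchanged file keeps its name and stays cached for the full year.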





