Simulating a Scheduler on Azure Web Sites

UPDATE (2014-01-20): The solution described below is no longer needed (although it still works). Windows Azure now offers two solutions for this: the generic Scheduler service and Web Jobs for Web Sites.

Windows Azure Web Sites (WAWS) doesn’t have built-in scheduled jobs like those available on Windows Azure Mobile Services (WAMS), even though it is a very handy feature. Yes, you can use the WAMS scheduler service and call any URL on WAWS, but in the free version it is limited to 1 scheduled job run per hour, and there is the additional burden of handling two systems instead of one.

Of course there are also external Web Cron services (paid ones, as the free limits are often too low), but once again you have to handle two systems.

But if you use node.js on WAWS and you happen to use the Standard version (dedicated VM), there is a way to get something that works more or less the same as the Mobile Services scheduler, but without any limit on the number of jobs and with a triggering resolution below 10 seconds 99.9% of the time.

Explaining the solution

Node.js runs all the time in the Standard version (it is not killed automatically), and even if it is killed, it can easily be brought back within 5 minutes by turning on monitoring. Monitoring requests one URL every 5 minutes, so even if node.js was shut down for some reason (unhandled exception, Web Sites upgrade, etc.), it will be restarted within 5 minutes at most. Of course, if the Web Site has constant traffic, this may not be needed. The only requirement is that node.js must be running all the time.

The second element is a database table (it doesn’t have to be a SQL database, so Azure Table Storage is fine too), but the database has to support blocking of concurrent updates. In SQL this is easy to do with a proper WHERE clause and a unique id. In Azure Table Storage it is supported by ETags. This is required because if more node.js processes are running (for example, the Web Site was scaled up to more cores or to more instances), we want each scheduled job to be started only once.

In the database we have a table with the following structure:

  • jobName – it will be used to create the final triggering URL (you may use a full URL here too, it does not matter)
  • enabled – 1 means it is enabled, so we check the nextRun
  • cron – the CRON-like description of when to run (or, in reality, how to compute the nextRun time)
  • nextRun – datetime of next run in UTC
  • lastRun – may be used for optimistic concurrency, not needed on Table Storage
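
To make the concurrency requirement concrete, here is a minimal sketch that uses an in-memory object as a stand-in for the table. The names createJobStore and tryClaim are hypothetical, and comparing on nextRun is just one possible choice (lastRun would work equally well); in a real setup the same check would be a SQL UPDATE with a WHERE clause or a conditional update on the ETag in Table Storage.

```javascript
// Hypothetical in-memory stand-in for the jobs table (not a real database client).
function createJobStore(rows) {
  const table = new Map(rows.map((row) => [row.jobName, { ...row }]));
  return {
    // Returns a copy, like a row read by one process.
    get: (jobName) => ({ ...table.get(jobName) }),
    // Equivalent in spirit to:
    //   UPDATE jobs SET nextRun = @newNextRun, lastRun = @now
    //   WHERE jobName = @jobName AND nextRun = @expectedNextRun
    // Returns true only if the row still had the value this process saw.
    tryClaim(jobName, expectedNextRun, newNextRun, now) {
      const row = table.get(jobName);
      if (!row || row.nextRun !== expectedNextRun) return false;
      row.nextRun = newNextRun;
      row.lastRun = now;
      return true;
    },
  };
}

// Times are plain numbers here to keep the sketch short.
const store = createJobStore([
  { jobName: 'cleanup', enabled: 1, cron: '*/5 * * * *', nextRun: 1000, lastRun: 0 },
]);

// Two processes read the same row, then both try to claim it.
const seenByA = store.get('cleanup');
const seenByB = store.get('cleanup');
const aWon = store.tryClaim('cleanup', seenByA.nextRun, 1300, 1005);
const bWon = store.tryClaim('cleanup', seenByB.nextRun, 1300, 1006);
console.log(aWon, bWon); // true false
```

Only the first caller sees the value it expected, so only one process goes on to start the job.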

Every node.js process runs a scheduler check function at a constant interval (for example every 10 seconds, or every minute if minute resolution is enough), which fetches from the database the enabled entries whose nextRun is in the past. For each entry the algorithm is simple:

  1. Calculate new nextRun using cron.
  2. Try to update entry in the database with concurrency in mind.
  3. If succeeded, hit the URL described by jobName.
  4. If failed, do nothing (some other process is already handling it).
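
The four steps above can be sketched as a single function. This is only an illustration under stated assumptions: store, computeNextRun and trigger are hypothetical injected dependencies, the store is an in-memory stand-in for the real table, and the placeholder computeNextRun ignores the cron expression (a real one could be backed by a CRON parsing library).

```javascript
// One pass of the scheduler check. All dependencies are hypothetical:
//   store.findDue(now)        -> enabled rows whose nextRun is not in the future
//   store.tryClaim(...)       -> conditional update, true if this process won
//   computeNextRun(cron, now) -> next run time for the given cron expression
//   trigger(jobName)          -> hits the job URL or calls a local function
function checkSchedule({ store, computeNextRun, trigger, now }) {
  const started = [];
  for (const job of store.findDue(now)) {
    const newNextRun = computeNextRun(job.cron, now); // step 1
    const won = store.tryClaim(job.jobName, job.nextRun, newNextRun, now); // step 2
    if (won) {
      trigger(job.jobName); // step 3
      started.push(job.jobName);
    }
    // step 4: the update failed, so another process is handling this job
  }
  return started;
}

// In-memory stand-in for the jobs table, for demonstration only.
function createMemoryStore(rows) {
  const data = rows.map((row) => ({ ...row }));
  return {
    findDue: (now) =>
      data.filter((row) => row.enabled === 1 && row.nextRun <= now).map((row) => ({ ...row })),
    tryClaim(jobName, expectedNextRun, newNextRun, now) {
      const row = data.find((candidate) => candidate.jobName === jobName);
      if (!row || row.nextRun !== expectedNextRun) return false;
      row.nextRun = newNextRun;
      row.lastRun = now;
      return true;
    },
  };
}

const everyMinute = (cron, now) => now + 60 * 1000; // placeholder, ignores cron
const triggered = [];
const deps = {
  store: createMemoryStore([
    { jobName: 'report', enabled: 1, cron: '* * * * *', nextRun: 5000, lastRun: 0 },
    { jobName: 'paused', enabled: 0, cron: '* * * * *', nextRun: 5000, lastRun: 0 },
    { jobName: 'later', enabled: 1, cron: '* * * * *', nextRun: 999999, lastRun: 0 },
  ]),
  computeNextRun: everyMinute,
  trigger: (jobName) => triggered.push(jobName),
};

const firstPass = checkSchedule({ ...deps, now: 10000 });
const secondPass = checkSchedule({ ...deps, now: 10000 });
console.log(firstPass, secondPass); // firstPass is ['report'], secondPass is []
```

In a running process this function would be called from a timer, for example with setInterval and a 10 second period, passing the current time as now.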

That is generally all you need. I prefer URL triggering, but if the application is written entirely in node.js, nothing stops you from simply calling a JavaScript function that contains the job.

Final thoughts

The described solution is not ideal, but in most cases it will work similarly to the Mobile Services scheduler (neither makes any guarantee that a job succeeded). It is possible to extend it to retry in case of an HTTP error.
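
One possible shape for such a retry extension is sketched below. Everything here is an assumption made for illustration: the limit of 3 attempts, the doubling delay, treating only 5xx responses and network failures (reported as status 0) as retryable, and the names retryDelayMs and triggerWithRetry. The request function is injected, so any HTTP client can be plugged in.

```javascript
const MAX_ATTEMPTS = 3; // assumption: give up after three tries

// Returns the delay before the next attempt, or null when no retry should happen.
function retryDelayMs(statusCode, attempt) {
  const failed = statusCode === 0 || statusCode >= 500; // 0 stands for a network error
  if (!failed || attempt >= MAX_ATTEMPTS) return null;
  return 1000 * 2 ** (attempt - 1); // 1 s, then 2 s
}

// request(jobName, callback) is a hypothetical function that hits the job URL
// and calls back with the HTTP status code. schedule defaults to setTimeout.
function triggerWithRetry(request, jobName, onDone, attempt = 1, schedule = setTimeout) {
  request(jobName, (statusCode) => {
    const delay = retryDelayMs(statusCode, attempt);
    if (delay === null) {
      onDone(statusCode, attempt);
      return;
    }
    schedule(() => triggerWithRetry(request, jobName, onDone, attempt + 1, schedule), delay);
  });
}

// Demonstration with a fake request that fails twice, then succeeds,
// and a scheduler that runs the retry immediately instead of waiting.
const responses = [503, 503, 200];
const fakeRequest = (jobName, callback) => callback(responses.shift());
const runNow = (fn) => fn();
let outcome = null;
triggerWithRetry(fakeRequest, 'report', (status, attempts) => {
  outcome = { status, attempts };
}, 1, runNow);
console.log(outcome); // { status: 200, attempts: 3 }
```

Retries should stay well within the job's own interval, otherwise a retry of the previous run may overlap with the next scheduled run.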

The maximum delay of URL triggering is guaranteed to be 5 minutes, which is OK in most cases (of course, if Azure Web Sites is down it will be longer, but at least the delayed jobs will run immediately on restart). In 99.9% of cases the delay will be no more than the selected check interval.

I have not explained here that you should somehow secure your CRON-triggered URLs, for example by using special headers with a secret key. For a CRON parser in node.js you can use cron-parser.
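
As a rough illustration of the secret header idea, the check below compares a request header against a shared secret. The header name x-cron-secret and the function name isAuthorizedCronRequest are made up for this sketch; the scheduler would have to send the same header when it hits the URL, and the secret itself should come from configuration rather than source code.

```javascript
const crypto = require('crypto');

const CRON_SECRET_HEADER = 'x-cron-secret'; // hypothetical header name

// headers is expected in the shape of node's req.headers (lower-cased names).
function isAuthorizedCronRequest(headers, expectedSecret) {
  const provided = headers[CRON_SECRET_HEADER];
  if (typeof provided !== 'string') return false;
  // Hash both values so the buffers have equal length, which
  // crypto.timingSafeEqual requires, then compare in constant time.
  const providedHash = crypto.createHash('sha256').update(provided).digest();
  const expectedHash = crypto.createHash('sha256').update(expectedSecret).digest();
  return crypto.timingSafeEqual(providedHash, expectedHash);
}

console.log(isAuthorizedCronRequest({ 'x-cron-secret': 's3cret' }, 's3cret')); // true
console.log(isAuthorizedCronRequest({ 'x-cron-secret': 'guess' }, 's3cret')); // false
console.log(isAuthorizedCronRequest({}, 's3cret')); // false
```

A job handler would call this first and answer with 403 when it returns false.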

A final note to Microsoft: CRON is something that should be built into Web Sites (not only Mobile Services), so that these tricks won’t be required. You already have the infrastructure in place, so doing something similar to Google App Engine CRON shouldn’t be hard…