
Server inside a server farm

Caching server for a unique problem.

The setup:

  • a Website written in PHP lets you find and book ping-pong tables at different venues;
  • you enter the usual search details and get the search results back very quickly;
  • the search uses a 3rd party hosted search engine (HSE);
  • the roughly 4000 venues, each with around 5 tables, come from another 3rd party provider that exposes a SOAP API (presumably others can book these venues as well);
  • the venues were previously fetched from there and pushed onto the hosted search engine;
  • to know which tables are available, you need availability information from the SOAP API, and that is where the problem lies.

The problem:

The SOAP API allows querying 1 venue for 1 date at a time, and each result takes 1s to resolve, sometimes more. Getting availability information for all 4000 venues for the next 60 days means 240,000 requests, so roughly 240k seconds, or a little under 3 days of constant (that is, one every second) requests to the 3rd party server, which they presumably wouldn't like very much.
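
The arithmetic behind that estimate can be checked with a few lines. This is only a back-of-the-envelope sketch using the figures from the setup (about 4000 venues, a 60-day window, roughly 1 s per call); the constant names are made up for illustration.

```python
# Figures taken from the text; the names are illustrative only.
VENUES = 4000
DAYS_AHEAD = 60
SECONDS_PER_CALL = 1.0  # best case; the text notes calls sometimes take longer

total_requests = VENUES * DAYS_AHEAD                 # one request per venue per date
sequential_seconds = total_requests * SECONDS_PER_CALL
sequential_days = sequential_seconds / 86_400        # 86,400 seconds in a day

print(total_requests)              # 240000
print(round(sequential_days, 2))   # 2.78
```

Any call slower than 1 s pushes the total up proportionally, so this is a lower bound for a strictly sequential run.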

The requirement for a solution:

  • fast and scalable;
  • availability information that is as up to date as possible;
  • as few connections as possible to the venue provider.

The solution:

New server:

  • a VPS at the same hosting company as the 3rd party venue provider (the one with the slow SOAP API), which shaves off some milliseconds of network latency;
  • fixed IP;
  • node.js;
  • Redis;
  • lots of fast memory.

When a user fills in where/venue and date/time on the main Website home page (HP), an ajax call to the new server checks the availability of all venues in a town, or of a particular venue, on the selected date/time.

New server workflow:

  • new server issues 240000 SOAP requests (4000 venues x 60 days);
  • that is a little less than 3 days of consecutive 1sec calls using Apache/Nginx and PHP;
  • calls should be non-blocking (async), so node.js may be a better fit, although there are async PHP solutions;
  • node.js is built around non-blocking I/O, which suits this workload better than a standard Apache/PHP setup, so let's just recommend node;
  • with 3 async calls/sec, all the work would finish within 1 day (240000 / 3 = 80000 seconds, about 22 hours);
  • results are processed, cached using Redis, and sent to HSE;
  • once a day Redis should write to disk as a backup; if memory gets wiped, load from disk;
  • running the full cache only once a week would leave 6 days without fresh data;
  • so on the 3rd day we could do a second run, and exclude completely booked venues;
  • a request for a venue for the whole day returns all tables available for any part of that day, so there shouldn't be many exclusions;
  • to exclude further, we should do a statistical analysis to get a Gaussian of how far ahead of 'now' on the date axis people usually book, and include/exclude on an 80/20 split of bookings; in terms of days there would be a lot of dates we don't need to check, so we could exclude them;
  • as the new server completes each part of the run (divided into 1-hour segments), the results are cached in the local Redis, and the ones that changed are sent to HSE;
  • a note on building the request list:
    • select a venue/date only if its cache entry is not from today (main run), or not from the past 3 days (secondary run). As we'll see, other cache entries might come from the main Website;
    • we should run requests in batches. For example, 3 async requests/sec x 3600 sec = 10800 requests in one 1-hour batch. The responses come back, and when all are in, we send the results to the other servers and build a new list. Why build a new batch only then? Because there might be some activity from the main Website. If some venues have already been checked earlier today, there is no need to put them on the list. Also, there is no need to wait the whole day to send the results to HSE.
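
The request-list and batching notes above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function names, the `last_checked` dictionary (standing in for a Redis lookup) and the `max_age_days` parameter are all assumptions made for the example.

```python
from datetime import date, timedelta


def build_request_list(venue_ids, today, days_ahead, last_checked, max_age_days=0):
    """Return (venue_id, target_date) pairs whose cache entry is too old.

    last_checked maps (venue_id, target_date) -> date of the last check;
    a plain dict stands in for Redis here (assumption).
    max_age_days=0 means "must have been checked today" (main run);
    a secondary run could pass a larger value.
    """
    cutoff = today - timedelta(days=max_age_days)
    pending = []
    for venue_id in venue_ids:
        for offset in range(days_ahead):
            key = (venue_id, today + timedelta(days=offset))
            checked = last_checked.get(key)
            if checked is None or checked < cutoff:
                pending.append(key)
    return pending


def take_batch(pending, calls_per_second=3, batch_seconds=3600):
    """Split off the next batch: 3 calls/sec x 3600 sec = 10800 requests."""
    size = calls_per_second * batch_seconds
    return pending[:size], pending[size:]


# Tiny usage example: 2 venues, 3 days, one entry already checked today.
today = date(2024, 1, 1)
last_checked = {(1, today): today}
pending = build_request_list([1, 2], today, 3, last_checked)
batch, rest = take_batch(pending, calls_per_second=1, batch_seconds=4)
print(len(pending), len(batch), len(rest))  # 5 4 1
```

Rebuilding the list before each batch, rather than once up front, is what lets checks triggered from the main Website drop out of the remaining work.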

Main Website workflow (on day 3 and until the end of the week):

  • user comes to the HP, and fills in the search where/venue and date/time;
  • as the time is selected, an ajax call to the new server is issued, to preload the availability data;
  • the new server acts like a bridge: its script checks in Redis whether the venues are booked and whether this information was cached earlier today, yesterday, or at most two days ago. If it was, the operation is done. If it wasn't, it checks availability asynchronously. The results are cached, and if they differ they are sent to HSE (because the calls are async this should be over in ~1-2sec even if there were 70 SOAP requests) and also to the main Website, where they are stored in Redis;
  • we also store them on the main Website because, with so many requests/responses, a temporary holdup could mean the HSE server is not updated in time and returns old results. If the responses do get through in the end, but the result page is already rendered, we could mark those entries 'booked just now' and hide them as the user is browsing;
  • when a user books, we call the same new server script that we call on the HP after selecting time.
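
The bridge behaviour described above might look something like the sketch below. It is written under several assumptions: a plain dict stands in for Redis, `fetch_availability` is a hypothetical placeholder for the slow SOAP call, and `send_to_hse` is whatever callback pushes changes to the hosted search engine.

```python
import asyncio
from datetime import date, timedelta

MAX_AGE_DAYS = 2  # "earlier today, yesterday, or at most two days ago"


async def fetch_availability(venue_id, target_date):
    """Hypothetical stand-in for the SOAP call (about 1 s in the text)."""
    await asyncio.sleep(0.01)
    return {"venue": venue_id, "date": target_date.isoformat(), "free_tables": 2}


async def check_venues(venue_ids, target_date, today, cache, send_to_hse):
    """Serve fresh cache entries as they are; refetch stale ones concurrently."""
    cutoff = today - timedelta(days=MAX_AGE_DAYS)
    stale = [
        v for v in venue_ids
        if (v, target_date) not in cache
        or cache[(v, target_date)]["checked"] < cutoff
    ]
    # All stale venues are queried at once, so the wait is roughly the
    # duration of the slowest call rather than the sum of all calls.
    results = await asyncio.gather(
        *(fetch_availability(v, target_date) for v in stale)
    )
    for venue_id, result in zip(stale, results):
        previous = cache.get((venue_id, target_date))
        if previous is None or previous["data"] != result:
            send_to_hse(result)  # only new or changed entries go to HSE
        cache[(venue_id, target_date)] = {"checked": today, "data": result}
    return [cache[(v, target_date)]["data"] for v in venue_ids]


# Usage: venue 1 is already cached today, venue 2 has to be fetched.
today = date(2024, 1, 3)
cache = {
    (1, today): {
        "checked": today,
        "data": {"venue": 1, "date": "2024-01-03", "free_tables": 0},
    }
}
sent = []
out = asyncio.run(check_venues([1, 2], today, today, cache, sent.append))
print(len(out), len(sent))  # 2 1
```

Because the stale venues are fetched together, 70 uncached venues would still take about as long as the slowest single response, which is the reasoning behind the ~1-2sec estimate.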

So basically we'd have:

  • 1st day check everything;
  • 2nd day using cached values;
  • 3rd day check venues that we know were not booked and only for major booking days
    + check venues in a set location for an exact date the users at the main Website are searching for while preventing duplicate checking;
  • 4th day using cached values;
  • 5th day using cached values;
  • 6th and 7th day check venues in a location for an exact date the users at the main Website are searching for (but I suspect there are two days in a week when the traffic is slower, so the 1st day doesn't have to be Monday).
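
The weekly cycle above can be written down as a small lookup table. This is only an illustrative sketch: the mode names and both function names are invented here, and the `first_day_weekday` parameter reflects the remark that the 1st day doesn't have to be Monday.

```python
# Mode names are illustrative labels, not part of any real system.
SCHEDULE = {
    1: "full_run",                    # check everything
    2: "cache_only",
    3: "partial_run_plus_on_demand",  # unbooked venues, major booking days
    4: "cache_only",
    5: "cache_only",
    6: "on_demand",                   # only what users are searching for
    7: "on_demand",
}


def day_of_cycle(weekday, first_day_weekday=0):
    """Convert a Python weekday (Monday=0) to a 1-based day of the cycle.

    first_day_weekday lets the full run start on any weekday.
    """
    return (weekday - first_day_weekday) % 7 + 1


def run_mode(weekday, first_day_weekday=0):
    """Return what the new server should do on the given weekday."""
    return SCHEDULE[day_of_cycle(weekday, first_day_weekday)]


# If the cycle starts on Saturday (weekday 5), Wednesday (weekday 2) is day 5.
print(day_of_cycle(2, first_day_weekday=5))  # 5
print(run_mode(2, first_day_weekday=5))      # cache_only
```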

Also:

  • benchmark gzip request compression, and use it if the tests show an improvement;
  • when calling the new server from the main Website, call the IP directly to skip the DNS overhead;
  • see if we can call the SOAP API by IP instead of by domain name.

 
