I'm working on a proof of concept to see whether I can sync the BH data with our own database via REST API calls. It mostly works, but where a customer has a LOT of contact data on BH (say over a million records) we run into rate limiting (understandable!) at points. I'm wondering if there's a better way to do this.
The scenario is that our app needs local access to candidate/contact data, and it's time-sensitive, so we cannot rely on doing a REST API call over the internet. Keeping the BH database and our local one in sync lets us do a real-time lookup on our local db and get an instant response for name/company fields. As you can imagine, with a large record count there's lots of activity, with contacts being added, changed and removed on the BH db all the time.
The initial population is a set of batched REST API calls, which sometimes takes 12 hours+ to complete, so by the time it finishes the data is likely already out of date, as there will have been CRUD operations on the master BH database in the meantime. So every hour after that we run another REST API session, but this time we restrict the query on "dateLastModified" to start from the time our initial population query started. This way we get all records modified since that time.
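To make the hourly query concrete, here's a minimal sketch of how the "dateLastModified" window gets built. The helper name and the example times are made up for illustration; the timestamp layout (yyyyMMddHHmmss) is simply the one that appears in my queries.

```python
from datetime import datetime

# Timestamp layout as it appears in my queries: yyyyMMddHHmmss
BH_TIMESTAMP = "%Y%m%d%H%M%S"

def modified_between(since, until):
    """Build the dateLastModified range clause for the query= parameter."""
    return (
        "dateLastModified:["
        + since.strftime(BH_TIMESTAMP)
        + " TO "
        + until.strftime(BH_TIMESTAMP)
        + "]"
    )

# Hypothetical hourly run: the window opens when the initial population started.
population_started = datetime(2020, 10, 18, 9, 12, 13)
now = datetime(2020, 10, 18, 22, 0, 0)
clause = modified_between(population_started, now)
print(clause)
```

The lower bound stays pinned to the start of the initial population rather than its end, so anything changed while the population was running is picked up.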
This all works relatively well, but there is a problem. Because the REST API is rate limited, where a customer has lots of records we have to run the query in batches, using the "start=" and "count=" query parameters to iterate through the full set. However... if there are deletes/adds during the iteration (remember, it can take 12 hours, so plenty of time!) then the batches get out of step with the data and we miss some records. Consider a full record count of, say, 10,000, which we batch into 50 queries of 200 records at a time. The first query is:
https://xxx.bullhornstaffing.com/rest-services/abc123//search/ClientContact?fields=id,namePrefix,firstName,lastName,companyName,phone,dateAdded,dateLastModified&useV2=true&start=0&count=200&sort=-id&query= dateLastModified:[19700101000001 TO 20201018091213]
The second query is:
https://xxx.bullhornstaffing.com/rest-services/abc123//search/ClientContact?fields=id,namePrefix,firstName,lastName,companyName,phone,dateAdded,dateLastModified&useV2=true&start=201&count=200&sort=-id&query= dateLastModified:[19700101000001 TO 20201018091213]
If a contact is deleted or added between those two queries, every record after it shifts by one position, so the second batch either skips a record or repeats one. Although this is relatively rare, it does happen, and it results in us missing some additions, updates and deletions, leaving our database out of sync with the Bullhorn one. It means we end up having to do a complete resync every couple of weeks to pick up the ones we may have missed, but during the resync we probably don't need to touch 99% of the returned records, which makes the query massively inefficient. I realise I may not have explained this brilliantly, so if anyone wants more clarification, please feel free to throw questions at me.
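In case it helps, here's a small simulation of the problem. It doesn't call the BH API at all: `fetch_page` is a hypothetical stand-in for one "start="/"count=" request over an in-memory set of ids, sorted by id descending like my queries, and the numbers are shrunk to 10 records in batches of 5.

```python
def fetch_page(ids, start, count):
    """Stand-in for one start=/count= request: sort by id descending, then slice."""
    ordered = sorted(ids, reverse=True)
    return ordered[start:start + count]

def offset_sync(ids, count, change_after_first_page):
    """Page through ids by offset, applying one change after the first batch."""
    fetched = []
    start = 0
    while True:
        page = fetch_page(ids, start, count)
        if not page:
            break
        fetched.extend(page)
        if start == 0:
            # Simulate a change on the master db between batch 1 and batch 2.
            change_after_first_page(ids)
        start += count
    return fetched

contacts = set(range(1, 11))  # ids 1..10
# Contact 9 (already fetched in batch 1) is deleted before batch 2 is requested.
fetched = offset_sync(contacts, 5, lambda ids: ids.discard(9))
missed = contacts - set(fetched)
print(missed)  # {5}
```

Contact 5 still exists on the master side but is never returned, because the delete moved it from position 5 up into the range the first batch had already covered.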
I can't help but think there should be a cleverer way of doing this, whether through event subscription or maybe by changing the order of the search results. I'm hoping that by posting this (admittedly generic) question, it might prompt an answer from someone who has done a similar thing.
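For what it's worth, the "changing the order" idea could look something like the sketch below: sort by id ascending and ask for "everything after the last id I saw" instead of using an offset. This is only an in-memory simulation. `fetch_after` is a hypothetical stand-in, and I'm assuming (not yet verified) that the search query can be restricted to an id range in this way.

```python
def fetch_after(ids, last_id, count):
    """Stand-in for a request filtered to id > last_id, sorted by id ascending."""
    return sorted(i for i in ids if i > last_id)[:count]

def cursor_sync(ids, count, change_after_first_page):
    """Page through ids using the last seen id as the cursor, not an offset."""
    fetched = []
    last_id = 0
    first_page = True
    while True:
        page = fetch_after(ids, last_id, count)
        if not page:
            break
        fetched.extend(page)
        last_id = page[-1]  # the next batch starts strictly after this id
        if first_page:
            # Simulate a change on the master db between batch 1 and batch 2.
            change_after_first_page(ids)
            first_page = False
    return fetched

contacts = set(range(1, 11))  # ids 1..10
# Contact 2 (already fetched in batch 1) is deleted before batch 2 is requested.
fetched = cursor_sync(contacts, 5, lambda ids: ids.discard(2))
still_missing = contacts - set(fetched)
print(still_missing)  # set()
```

In this toy version a delete no longer shifts the later batches, so nothing that still exists is skipped. It doesn't tell us about the delete itself though, so removals would still need to be detected some other way.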