Synchronizing candidates and clientcontacts with our own database

Forum for users and developers of Bullhorn's API service.

Moderators: StaffingSupport, s.emmons, BullhornSupport

Posts: 1
Joined: Sun Oct 18, 2020 5:59 am

Synchronizing candidates and clientcontacts with our own database

Post by frobisher »

I've used the search function with the keywords that seemed to make sense, but can't find anything about this.

I'm working on a proof of concept to see if I can sync the BH data with our own database via REST API calls. It mostly works, but for a customer with a LOT of contact data on BH (say over a million records) we run into rate-limiting (understandable!) issues at points. I'm wondering if there's a better way to do this.

The scenario is that our app needs local access to candidate/contact data, and it's time-sensitive, so we cannot rely on doing a REST API call over the internet. By keeping the BH database and our local one in sync, we can do a real-time lookup on our local db and get an instant response for name/company fields. As you can imagine with a large record count, there's lots of activity, with contacts being added, changed and removed on the BH db all the time.

After an initial set of batched REST API calls to populate our database (which sometimes takes 12+ hours to complete), the data is likely already out of date, as there will have been CRUD operations on the master BH database during the initial population. So every hour afterwards we do another REST API session, but this time we set the "dateLastModified" parameter on the query to the time our initial population query started. This way we get all records changed since that time.
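To make the delta pass concrete, here is a minimal sketch of how one of those hourly batched queries could be built (Python; the path, field list and parameters are taken from the queries further down this post, while the `delta_query` helper itself is just illustrative):

```python
from datetime import datetime, timezone

BASE = "/search/ClientContact"  # illustrative path; adjust per entity
FIELDS = "id,namePrefix,firstName,lastName,companyName,phone,dateAdded,dateLastModified"

def delta_query(since: datetime, until: datetime, start: int, count: int = 200) -> str:
    """Build one batched delta query: only records modified in [since, until]."""
    fmt = "%Y%m%d%H%M%S"  # timestamp format used in the queries below, e.g. 20201018091213
    return (
        f"{BASE}?fields={FIELDS}&useV2=true"
        f"&start={start}&count={count}&sort=-id"
        f"&query=dateLastModified:[{since.strftime(fmt)} TO {until.strftime(fmt)}]"
    )

# Each hourly pass anchors `since` at the time the *previous* pass started,
# so records modified while that pass was still running are not lost.
since = datetime(2020, 10, 18, 9, 12, 13, tzinfo=timezone.utc)
until = datetime(2020, 10, 18, 10, 12, 13, tzinfo=timezone.utc)
print(delta_query(since, until, start=0))
```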

All works relatively well, but there is a problem. Because there is a rate limit on the REST API, for a customer with lots of records we have to do the query in batches, using the "start=" and "count=" query parameters to iterate through the full number of records. However, if there are deletes/adds during the iteration (remember, it can take 12 hours, so plenty of time!) then the batches get out of sync and we miss some records. Consider a full record count of, say, 10,000, which we batch into 50 queries of 200 records at a time. The first query is:

Code:
,namePrefix,firstName,lastName,companyName,phone,dateAdded,dateLastModified&useV2=true&start=0&count=200&sort=-id&query= dateLastModified:[19700101000001 TO 20201018091213]
The next query skips the first batch of records, so:

Code:
,namePrefix,firstName,lastName,companyName,phone,dateAdded,dateLastModified&useV2=true&start=201&count=200&sort=-id&query= dateLastModified:[19700101000001 TO 20201018091213]
...and so on until the batch is complete. Records added or updated during the process are usually fine, I think, because the sort is by id, so new additions always land at the end of the result set. But if there are deletes that happen to fall nearer the beginning of the results (by id), then by the time the 10th or 11th query in the batch runs, some records that would have been picked up have "shifted" out of, say, the `start=1201&count=200` result set, because records were deleted from before that offset.
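This "shifting" failure mode can be reproduced with a toy simulation, no API involved: paginate a sorted list with start/count offsets while rows are deleted from the front mid-iteration, and some rows are silently skipped.

```python
def paginate_with_offsets(rows, count):
    """Read pages via start/count offsets, like start=0, 200, 400..."""
    seen, start = [], 0
    while True:
        page = rows[start:start + count]
        if not page:
            break
        seen.extend(page)
        start += count
        # Simulate a delete near the *front* of the result set between pages:
        # every following row shifts left by one, so the next offset skips a row.
        if rows:
            rows.pop(0)
    return seen

seen = paginate_with_offsets(list(range(1000)), count=200)
missed = sorted(set(range(1000)) - set(seen))
print(len(missed), missed)  # → 4 [200, 401, 602, 803]
```

One row per page boundary goes missing, which matches the symptom described above: rare, silent, and only detectable by a full resync.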

Although this is relatively rare, it does happen, and it means we miss some additions/updates and deletions, leaving our database out of sync with the Bullhorn one. We end up having to do a complete resync every couple of weeks to pick up the possible missing records, but during the resync we probably don't need to touch 99% of the returned records, so the query is massively inefficient. I realise I may not have explained this brilliantly, so if anyone does want more clarification, please feel free to throw questions at me.

I can't help but think there should be a cleverer way of doing this, whether through event subscription or by changing the order of the search results, maybe? I'm hoping that by posting this (admittedly generic) question, it might trigger an answer from someone who has done a similar thing.
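On the "changing the order" idea: one commonly used alternative to start/count offsets is keyset (cursor) pagination, where you sort ascending by id and filter on id greater than the last id already fetched, so deletes earlier in the result set cannot shift later pages. A minimal sketch, assuming Bullhorn's Lucene-style query accepts an integer range on id (the `id:[x TO *]` syntax here is an assumption to verify against the docs):

```python
def keyset_query(last_id: int, since: str, until: str, count: int = 200) -> str:
    """Build the next page: everything with id above the last id already synced.

    Hypothetical sketch: assumes the Lucene-style query accepts an id range,
    which should be checked against the Bullhorn REST API documentation.
    """
    return (
        "/search/ClientContact"
        "?fields=id,firstName,lastName,dateLastModified&useV2=true"
        f"&sort=id&start=0&count={count}"
        f"&query=id:[{last_id + 1} TO *]"
        f" AND dateLastModified:[{since} TO {until}]"
    )

# Loop sketch: start=0 on every request; after each page, advance the cursor
# to the highest id returned, and stop when a page comes back empty.
print(keyset_query(0, "19700101000001", "20201018091213"))
```

Because each page is anchored to an id value rather than a row offset, deletes only ever remove rows you would have fetched, never shift the window past rows you haven't.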
Posts: 15
Joined: Thu Sep 26, 2013 1:05 pm

Re: Synchronizing candidates and clientcontacts with our own database

Post by Vaso »


Just some thoughts on this.

I have got 2 clients.

Option 1

One client does heavy analytics work on their data, so there was no way to use the REST API for it (they use it just for updates). They use the DataMirror application from Bullhorn, which LIVE-syncs data to a local SQL Server database hosted by my client on their own server. This is a paid Bullhorn service, and on top of it my client pays for their cloud server (hardware + Windows Server + SQL Server licence). The database is around 30 GB, with around 150K records in their contacts.

Bullhorn's sync uses an independent REST API connection, so it shouldn't count against the client's own REST API limits.

This process starts with Bullhorn providing a SQL Server database seed, after which the sync process (a Java application) is set up.

This particular client doesn't have millions of records, but LIVE sync is a must.

Option 2

The other client went another way. I created C# Azure Functions for them which sync selected data from ClientContact, Candidate, CorporateUser and ClientCorporation (plus about ten other entities) to their Azure SQL Database. They have around 240K records in their contacts, but there is not too much data moving.

On those entities I use a query like this:

var query = $"ClientContact?fields={DefaultFields}&query=dateAdded:[{dateTime:yyyyMMdd} TO *] OR dateLastModified:[{dateTime:yyyyMMddHHmmss} TO *]";

I run this (or something similar) every 5 minutes for each entity, so this client has almost LIVE data.

First, of course, I had to bring all the data in, which I did with a query like the example below. I use count=500, which I believe is the maximum Bullhorn allows; if you join other entities in the query, the maximum drops to 200 or less.

/search/Candidate?fields=id,firstName,dateAdded,owner&query=dateAdded:[20170711000000 TO *]&sort=dateAdded&start=0&count=500&showTotalMatched=true&usev2=true
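For what it's worth, the full initial load can be sketched as a simple paging loop around that query (Python sketch; `get_json` is a hypothetical stand-in for the authenticated HTTP call, and the `totalMatched` response field name is an assumption to check against the API response):

```python
import urllib.parse

def full_load(get_json, count=500):
    """Page through /search/Candidate until the total matched count is exhausted.

    `get_json(url)` is a stand-in for the authenticated REST call, assumed
    to return the parsed JSON body as a dict.
    """
    start, total, rows = 0, None, []
    while total is None or start < total:
        url = (
            "/search/Candidate?fields=id,firstName,dateAdded,owner"
            "&query=dateAdded:[20170711000000 TO *]&sort=dateAdded"
            f"&start={start}&count={count}&showTotalMatched=true&usev2=true"
        )
        body = get_json(url)
        total = body["totalMatched"]  # response field name assumed; verify
        rows.extend(body["data"])
        start += count
    return rows

# Usage with a fake transport returning 1,234 records in pages of 500:
def fake_get_json(url):
    start = int(urllib.parse.parse_qs(urllib.parse.urlparse(url).query)["start"][0])
    return {"totalMatched": 1234,
            "data": [{"id": i} for i in range(start, min(start + 500, 1234))]}

rows = full_load(fake_get_json)
print(len(rows))  # → 1234
```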

This solution works because they update at most 1,000 records a day and would never reach their Enterprise limit of 2,000,000 a day.

This particular client doesn't have millions of records and would be fine with daily updates, but they get them every 5 minutes.

Option 3

Not sure if this is still in place, but a Bullhorn customer can request a snapshot of their SQL Server database. I think this might be provided free once a year if requested; at any other time they would need to pay for it.

Hope this will help you with your solution.

Regards, Vaso.
Posts: 6
Joined: Thu Oct 22, 2020 8:46 pm

Re: Synchronizing candidates and clientcontacts with our own database

Post by rabbey »

Sorry, a bit late to this, but have you considered setting up event subscriptions to monitor changes on your entities? You get a list of changed entity IDs, along with which fields changed in each event.

You still have to process your entire change set, but you will always know which entities actually changed.

You can lump all your entities and monitored events into one subscription, or create separate subscriptions (one for each entity).
I did read somewhere that you are limited to 10 event subscriptions per instance, but that could be an urban myth.
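A rough sketch of what consuming such a subscription could look like (Python; the endpoint shape follows Bullhorn's event subscription API, but the exact parameter and response field names here are assumptions to verify against the REST docs, and `get_json` is a hypothetical stand-in for the authenticated call):

```python
def subscribe_url(sub_id, entities, events=("INSERTED", "UPDATED", "DELETED")):
    """URL to PUT once, creating an entity event subscription.

    Parameter names assumed from Bullhorn's event subscription API; verify.
    """
    return (
        f"/event/subscription/{sub_id}?type=entity"
        f"&names={','.join(entities)}&eventTypes={','.join(events)}"
    )

def drain(get_json, sub_id, max_events=100):
    """GET events until the subscription queue is empty; yield changed records."""
    while True:
        body = get_json(f"/event/subscription/{sub_id}?maxEvents={max_events}")
        events = (body or {}).get("events") or []
        if not events:
            return
        for ev in events:
            # Assumed event fields: entityName, entityId, updatedProperties.
            yield ev["entityName"], ev["entityId"], ev.get("updatedProperties")

# Usage with a fake transport: one batch with a single event, then empty.
batches = [{"events": [{"entityName": "Candidate", "entityId": 7}]}, {"events": []}]
changed = list(drain(lambda url: batches.pop(0), "mySubscription"))
print(changed)  # → [('Candidate', 7, None)]
```

Each delta pass then only has to re-fetch the IDs that appeared in the event stream, which sidesteps the offset-shifting problem entirely.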