Hello,
We are using deterministic deployment as described in https://docs.gigaspaces.com/xap120adm/the-sla-deterministic.html. We use a primary-backup deployment with 1 partition. The Primary and Backup instances run on hosts in different locations. In the normal case, when all hosts are available, we want the Primary instance to run in one particular location so that it is co-located with the primary DB instance (this is the reason for using deterministic deployment).
The deterministic deployment works as expected in approximately 80% - 85% of all cases, but it fails in 15% - 20% of cases, even though all hosts are available. In the failing case (Primary and Backup end up swapped relative to the configured preference), the sequence looks like this:
* Both GSCs are instantiated
* The GSC that is the preferred location for the Primary instance starts downloading the PU jar file from the GSM
* 10 seconds later, the GSC that is the preferred location for the Backup instance starts downloading the PU jar file from the GSM
* In approx. 15% - 20% of all cases, the second GSC finishes downloading the jar file from the GSM 1-2 seconds before the first GSC. In that case the second GSC is chosen as the Primary instance, contrary to the configured deterministic deployment.
It looks like the deterministic deployment relies on timing: the GSC that is the preferred Backup starts downloading the PU jar file 10 seconds later, in the hope that it will finish downloading later and thus be selected as the Backup instance. If it finishes first instead, it is selected as the Primary instance.
I believe we could significantly improve deterministic deployment (well, make it more deterministic) if we could slightly increase the delay of the second GSC instance. In most cases where the deterministic deployment fails, the first GSC was just 1-2 seconds slower than the second GSC, so delaying the second GSC by 20 seconds instead of 10 seconds should help.
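To make my assumption concrete, here is a minimal sketch of the timing model I have in mind. This is only my guess based on what we observe, not the actual GigaSpaces election logic. The class name, the method names and the download durations are hypothetical and chosen only to reproduce a "second GSC finishes 1.5 seconds earlier" case.

```java
public class ElectionRaceSketch {

    // Time (in seconds after deployment start) at which a GSC finishes
    // downloading the PU jar, under the assumed model.
    static double finishTime(double startDelaySec, double downloadSec) {
        return startDelaySec + downloadSec;
    }

    // Assumed rule: whichever GSC finishes its download first becomes Primary.
    // Returns true if the preferred-Primary GSC (which starts at t = 0) wins.
    static boolean preferredBecomesPrimary(double backupDelaySec,
                                           double primaryDownloadSec,
                                           double backupDownloadSec) {
        return finishTime(0.0, primaryDownloadSec)
                < finishTime(backupDelaySec, backupDownloadSec);
    }

    public static void main(String[] args) {
        // Hypothetical durations: the first GSC needs 30 s, the second 18.5 s.
        double primaryDownload = 30.0;
        double backupDownload = 18.5;

        // Compare the current 10 s delay with the proposed 20 s delay.
        for (double delay : new double[] {10.0, 20.0}) {
            boolean ok = preferredBecomesPrimary(delay, primaryDownload, backupDownload);
            System.out.println("delay=" + delay + "s -> preferred GSC is Primary: " + ok);
        }
    }
}
```

With these numbers a 10 second delay lets the second GSC finish first, while a 20 second delay leaves a margin of several seconds, which is why I think a longer delay would cover most of our failing cases.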
Is there a configuration property which determines for how long the second GSC is delayed?