We are facing an issue while migrating our GigaSpaces setup from Solaris to Linux in our production environment. The issue occurs while bringing up the processing units (PUs). Our application has 8 PUs: 4 run on one server and 4 on another. A startup script, triggered from one of the servers, is responsible for bringing up the PUs on that server as well as those on the other server. The 4 PUs on the machine where the script is triggered come up fine, but the ones on the other server do not.
Our LOOKUPLOCATORS configuration looks like this:
export LOOKUPLOCATORS=newserver1.itginc.com:4166,newserver2.itginc.com:4166
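For anyone who wants to double-check basic reachability independently of GigaSpaces, here is a minimal sketch in Python that parses a LOOKUPLOCATORS value and attempts a plain TCP connection to each entry. The function names `parse_locators` and `is_reachable` are hypothetical helpers, not part of any GigaSpaces tooling, and a successful connect only shows that the port accepts connections from the machine where the script runs. It says nothing about multicast, NIC binding, or hostname resolution inside the grid. The stack trace further down shows the LRMI connection going to port 4167 on newserver2, so that port may be worth checking in the same way.

```python
import os
import socket


def parse_locators(value):
    """Split a 'host:port,host:port' string into a list of (host, port) tuples."""
    locators = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty values and trailing commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            # This sketch assumes every entry carries an explicit port.
            raise ValueError("expected host:port, got %r" % entry)
        locators.append((host, int(port)))
    return locators


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Reads the same environment variable that the startup script exports.
    for host, port in parse_locators(os.environ.get("LOOKUPLOCATORS", "")):
        state = "reachable" if is_reachable(host, port) else "NOT reachable"
        print("%s:%d %s" % (host, port, state))
```

Running this on each of the two servers, with LOOKUPLOCATORS exported as above, would show whether each box can open a TCP connection to both locators.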
Sometimes the PUs on the remote server come up only partially, or not at all.
Eventually the application throws a network interruption exception.
The relevant log output is copied below:
2019-02-07 10:17:25,822 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - Waiting indefinitely for [8] processing unit instances to be deployed...
2019-02-07 10:17:59,718 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.3] [1] deployed successfully on [10.8.26.239]
2019-02-07 10:17:59,938 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.1] [1] deployed successfully on [10.8.26.239]
2019-02-07 10:18:05,052 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.4] [2] deployed successfully on [10.8.26.239]
2019-02-07 10:18:05,561 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.2] [2] deployed successfully on [10.8.26.239]
2019-02-07 10:19:23,424 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.2] [1] deployed successfully on [10.40.26.239]
2019-02-07 10:37:15,010 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.1] [2] deployed successfully on [10.40.26.239]
2019-02-07 10:45:23,466 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.3] [2] failed to deploy, resubmitted [true]
2019-02-07 11:30:19,599 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - [Space.3] [2] failed to deploy, resubmitted [true]
2019-02-07 11:30:19,793 INFO [org.openspaces.pu.container.servicegrid.deploy.Deploy] - Finished deploying [8] processing unit instances
Caused by: java.lang.InterruptedException: Thread was interrupted while waiting on the network
at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:88)
... 16 more
Caused by: java.rmi.ConnectException: LRMI transport protocol over NIO broken connection with ServerEndPoint: [NIO://newserver2.itginc.com:4167/pid[26907]/500292502777314_3_4896795498233689600_details[class org.openspaces.pu.container.servicegrid.PUServiceBeanImpl]]; nested exception is:
java.nio.channels.ClosedByInterruptException
at com.gigaspaces.lrmi.nio.CPeer.invoke(CPeer.java:834)
at com.gigaspaces.lrmi.ConnPoolInvocationHandler.invoke(ConnPoolInvocationHandler.java:75)
at com.gigaspaces.lrmi.MethodCachedInvocationHandler.invoke(MethodCachedInvocationHandler.java:71)
... 16 more
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:417)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelBlocking(Reader.java:248)
at com.gigaspaces.lrmi.nio.Reader.readBytesBlocking(Reader.java:671)
at com.gigaspaces.lrmi.nio.Reader.bytesToPacket(Reader.java:590)
at com.gigaspaces.lrmi.nio.Reader.readReply(Reader.java:159)
at com.gigaspaces.lrmi.nio.CPeer.invoke(CPeer.java:769)
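To make the timing pattern in the deploy log easier to see, the lines can be summarized per instance and host. The following is a minimal sketch in Python, assuming the log format is exactly as in the excerpt above. The function name `summarize` and the dictionary keys are made up for illustration. Elapsed time is measured from the first timestamped line it sees, which in the excerpt is the "Waiting indefinitely" line.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamp, log level, bracketed logger name, then the message text.
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ \[[^\]]+\] - (?P<msg>.*)$"
)
# Messages such as "[Space.3] [1] deployed successfully on [10.8.26.239]"
# or "[Space.3] [2] failed to deploy, resubmitted [true]".
INSTANCE = re.compile(
    r"^\[(?P<pu>[^\]]+)\] \[(?P<idx>\d+)\] "
    r"(?:deployed successfully on \[(?P<host>[^\]]+)\]|failed to deploy)"
)


def summarize(lines):
    """Return one dict per deploy/failure line, with seconds elapsed since the first log line."""
    start = None
    events = []
    for line in lines:
        m = EVENT.match(line.strip())
        if not m:
            continue  # skip stack-trace lines and anything without a timestamp
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if start is None:
            start = ts
        inst = INSTANCE.match(m.group("msg"))
        if inst:
            host = inst.group("host")  # None for "failed to deploy" lines
            events.append({
                "elapsed_s": (ts - start).total_seconds(),
                "instance": "%s [%s]" % (inst.group("pu"), inst.group("idx")),
                "host": host,
                "ok": host is not None,
            })
    return events
```

Feeding the log file to it, for example `for e in summarize(open("deploy.log")): print(e)` (the file name is a placeholder), lists each instance with the host it landed on and how long after the start it was reported. In the excerpt, the four instances on 10.8.26.239 are reported within the first minute, while those on 10.40.26.239 take much longer or fail.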
Our system administration team has checked the network settings on both new Linux boxes and found no problems.