Problem Description
Hello!
For our application we wanted to implement an automatic reconnect feature for connection resiliency: "IGFClient" should reconnect on its own after the connection is lost, without human intervention.
But we have run into a problem: trying to reconnect "IGFClient" after it has been disconnected often leads to errors and exceptions, which in turn can crash the application.
Technical Details
The approach to reconnecting is quite straightforward. "IGFClient" (obtained by calling "GF.Api.Impl.GFApi.CreateClient()") provides a "Disconnected" event, which fires every time the connection to the server is lost.
We subscribe to that event and initiate the reconnect inside the event handler.
The original implementation looked like this:
--- Solution #1: Task.Delay() based solution -----------------
private async void OnDisconnected(IGFClient gfClient, DisconnectedEventArgs eventArgs)
{
    await Task.Delay(5000); // wait 5 seconds before reconnecting
    ConnectToGainServer();
}

public void ConnectToGainServer()
{
    // Build the connection context from the stored credentials and endpoint.
    var ctx = new ConnectionContextBuilder()
        .WithUserName(_apiUsername)
        .WithPassword(_apiPassword)
        .WithUUID(_apiUUID)
        .WithHost(_apiHost)
        .WithPort(int.Parse(_apiPort))
        .WithForceLogin(true)
        .Build();
    try
    {
        _gfClient.Connection.Aggregate.Connect(ctx);
    }
    catch (Exception e)
    {
        _logger.Error(e, "Connection Failed");
    }
}
---
But this led to unpredictable exceptions, and there was no way to catch them except by handling the "AppDomain.UnhandledException" event, which was too late: by then the exception was fatal and crashed the application.
Here are examples of such exceptions:
System.NullReferenceException
The service threw an unhandled exception System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
at GF.Api.Impl.Connection.CurrentUserStore.Clear(ServerType serverType)
at GF.Api.Impl.Connection.Machine.States.ClosedConnectionState.OnEntry()
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.Transition(Func`1 getNewState)
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.OnDisconnect(DisconnectionContext context)
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.ConnectionDisconnected(Exception exception)
at System.Action`1.Invoke(T obj)
at GF.Api.Impl.Connection.ApiConnection`2.d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at GF.Api.Impl.Connection.Machine.States.ConnectingConnectionState.d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.c.b__6_1(Object state)
at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Another one:
System.InvalidOperationException
System.InvalidOperationException: Already closed
at GF.Api.Impl.Connection.Machine.States.ClosedConnectionState.OnDisconnect(DisconnectionContext context)
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.c__DisplayClass34_0.b__0()
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.Transition(Func`1 getNewState)
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.OnDisconnect(DisconnectionContext context)
at GF.Api.Impl.Connection.Machine.AggregateConnectionStateMachine.c.b__41_2(IConnectionStateMachine m)
at GF.EnumerableTExtensions.ForEach[T](IEnumerable`1 items, Action`1 action)
at GF.Api.Impl.Connection.Machine.AggregateConnectionStateMachine.c__DisplayClass41_0.b__1(IEnumerable`1 open)
at GF.EnumerableTExtensions.IfAny[T](IEnumerable`1 items, Action`1 ifAny)
at GF.Api.Impl.Connection.Machine.AggregateConnectionStateMachine.DisconnectAggregation(DisconnectionContext disconnectionContext)
at GF.Api.Impl.Connection.Machine.AggregateConnectionStateMachine.StateMachine_OnLoginFailed(FailReason reason)
at System.Action`1.Invoke(T obj)
at GF.Api.Impl.Connection.Machine.States.ConnectingConnectionState.b__24_0()
at GF.Api.Impl.Connection.Machine.States.ConnectingConnectionState.OnExit()
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.Transition(Func`1 getNewState)
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.OnDisconnect(DisconnectionContext context)
at GF.Api.Impl.Connection.Machine.ConnectionStateMachine.ConnectionDisconnected(Exception exception)
at System.Action`1.Invoke(T obj)
at GF.Api.Impl.Connection.ApiConnection`2.d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at GF.Api.Impl.Connection.Machine.States.ConnectingConnectionState.d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.c.b__6_1(Object state)
at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Unable to get around them, I suspected they might be caused by threading issues, so the next attempt used "IGFClient.Threading", like this:
---- Solution #2: delayed Invoke based solution ----
private async void OnDisconnected(IGFClient gfClient, DisconnectedEventArgs eventArgs)
{
    _gfClient.Threading.BeginDelayedInvoke(TimeSpan.FromSeconds(5), Start, CancellationToken.None);
}
---
This solution did not produce exceptions, but it had another problem: after several consecutive reconnect attempts, the next attempt simply hangs. Have a look at the screenshot below:
https://www.dropbox.com/s/yv2jmo6y3lwps5e/GFAPI%20Gateway%202020-06-24%2012.41.09.png?dl=0
My guesses for sources of the problem
As I don't have access to the source code, I can only guess, and my guess would be:
For Solution #1
Somewhere down the "Connection.Connect(ctx)" invocation chain an asynchronous (async/await) method is called whose exception is neither caught nor awaited, so it propagates to the root of the application.
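The "AsyncMethodBuilderCore" and "QueueUserWorkItemCallback" frames at the bottom of both traces look consistent with this guess: that is the shape .NET Framework usually produces when an exception is rethrown on the thread pool because nothing awaited it. If so, a try/catch around "Connect(ctx)" in calling code cannot intercept it. What the calling side can still control is that only one reconnect attempt runs at a time, and that nothing thrown by its own handler escapes. The following is a minimal sketch of that idea, not a fix for the library behavior. "ReconnectGuard", "TryReconnectAsync" and the "connect" delegate are hypothetical names for illustration and are not part of the GF API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ReconnectGuard
{
    private readonly Func<Task> _connect;  // hypothetical stand-in for the real connect call
    private readonly TimeSpan _delay;      // pause before each attempt
    private int _inProgress;               // 0 = idle, 1 = an attempt is running
    private int _failures;

    public ReconnectGuard(Func<Task> connect, TimeSpan delay)
    {
        _connect = connect;
        _delay = delay;
    }

    public int Failures => Volatile.Read(ref _failures);

    // Returns false when another attempt is already running, true once
    // this attempt has finished (whether or not the connect succeeded).
    public async Task<bool> TryReconnectAsync()
    {
        if (Interlocked.CompareExchange(ref _inProgress, 1, 0) != 0)
            return false;
        try
        {
            await Task.Delay(_delay).ConfigureAwait(false);
            await _connect().ConfigureAwait(false);
        }
        catch (Exception)
        {
            // Only exceptions that surface through _connect() are caught here.
            Interlocked.Increment(ref _failures);
        }
        finally
        {
            Volatile.Write(ref _inProgress, 0);
        }
        return true;
    }

    // Shape of an event handler: async void, but no exception can leave it
    // because TryReconnectAsync catches everything it awaits.
    public async void OnDisconnected(object sender, EventArgs e)
    {
        await TryReconnectAsync().ConfigureAwait(false);
    }
}

public static class Program
{
    public static void Main()
    {
        var guard = new ReconnectGuard(
            () => throw new InvalidOperationException("Already closed"),
            TimeSpan.FromMilliseconds(100));

        Task<bool> first = guard.TryReconnectAsync();   // takes the guard
        Task<bool> second = guard.TryReconnectAsync();  // rejected while the first runs
        Task.WaitAll(first, second);

        Console.WriteLine($"first ran: {first.Result}, second ran: {second.Result}, failures: {guard.Failures}");
    }
}
```

If the "Disconnected" event can fire again while a reconnect is still in flight (the second trace, where a login failure triggers another disconnect, hints at this but does not prove it), a guard like this would at least stop the attempts from stacking up.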
For Solution #2
The problem with the second solution is probably that timeouts are not used on async calls, which leads to the hang.
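To illustrate what "using a timeout" would mean on the calling side, here is a minimal sketch that races an operation against "Task.Delay" with "Task.WhenAny". "TryRunWithTimeoutAsync" and the "operation" delegate are hypothetical names. Whether the GF API exposes anything awaitable for a connect attempt is an assumption I cannot verify without the source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Returns true if 'operation' finished within 'timeout', false otherwise.
    // 'operation' is a stand-in for a call that may hang; it is not a GF API member.
    public static async Task<bool> TryRunWithTimeoutAsync(Func<Task> operation, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task work = operation();
            Task timer = Task.Delay(timeout, cts.Token);

            Task winner = await Task.WhenAny(work, timer).ConfigureAwait(false);
            if (winner != work)
            {
                // Timed out. Observe any later fault so it does not go unobserved.
                _ = work.ContinueWith(
                    t => { var ignored = t.Exception; },
                    TaskContinuationOptions.OnlyOnFaulted);
                return false;
            }

            cts.Cancel();                      // stop the timer
            await work.ConfigureAwait(false);  // rethrows if the operation faulted
            return true;
        }
    }

    public static void Main()
    {
        bool fast = TryRunWithTimeoutAsync(
            () => Task.Delay(50), TimeSpan.FromSeconds(2)).GetAwaiter().GetResult();
        bool slow = TryRunWithTimeoutAsync(
            () => Task.Delay(Timeout.Infinite), TimeSpan.FromMilliseconds(100)).GetAwaiter().GetResult();

        Console.WriteLine($"fast completed: {fast}, slow completed: {slow}");
    }
}
```

Note that this only stops waiting. It does not cancel the underlying operation, so a connect call that is truly stuck inside the library would still be stuck; the caller would merely be free to log it and decide what to do next.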
Looking forward to any suggestions on how to achieve our goal (auto-reconnect).
Thanks