Before these fixes, it was possible to see errors on new RPCs after a
connection began draining, and before establishing a new connection. There is
an inherent race between choosing a SubConn and attempting to create a stream
on it. We should be able to avoid application-visible RPC errors due to this
with transparent retry. However, several bugs were preventing this from
working correctly:
1. Non-wait-for-ready RPCs were skipping transparent retry, though the retry
design calls for retrying them.
2. The transport closed itself (and would consequently error new RPCs) before
notifying the SubConn that it was draining.
3. The SubConn wasn't synchronously updating itself once it was notified about
the closing or draining state.
4. The SubConn would go into the TRANSIENT_FAILURE state instantaneously,
causing RPCs to fail instead of being queued.
- Separated and documented the options for client and server sides.
- Better support for multiple grpc.Servers. This will be used in other
improvements that I have in the works.
- Moved some common functionality from channelz_test.go to
end2end_test.go.
- Added an option to use the default health service implementation, instead
of each test creating a new health.Server and passing it in. The
individual tests have not been changed in this PR. I will do that in a
follow-up PR to keep the changes to a reasonable size.
- Updated one of the tests, as required by the separation of client and
server configs.
* end2end test cleanup #1
- Removed some old code which has a TODO asking for its removal once
Go1.6 and Go1.7 support is gone.
- Cleaned up a couple of error messages along with it.
At the end of the test, 10 RPCs are made to make sure data is sent to
the second server. The first RPC of these 10 is made right after the
second server's listener receives a connection. But at this time, the
connectivity state on the client side is not set to READY yet (though
ac's state should be either connecting or ready, the race between ac
and balancer could cause cc to still be in transient failure). So the
first RPC fails due to transient failure, but the following 9 will
succeed.
This test sometimes fails with error creating stream due to
DeadlineExceeded. It's very hard to reproduce (failed twice in 100000
runs). Extend the RPC timeout in case it's too short.
I was trying to run this test and I had copied the name of the function
from the comment, and it took a good while to figure out why
`go test -run` was returning `testing: warning: no tests to run`.
Before this fix, the stream was removed from activeStreams in finishStream,
which happens when the service handler returns status, without waiting
for the status to be sent by loopyWriter. If GracefulStop() was called in
between, it would close the connection (because activeStreams was empty),
which caused the RPC to fail with "transport is closing". This change
moves the activeStreams cleanup into loopyWriter, after sending status
on the wire.
* Expose a method from the internal package to get to the raw
StatusProto wrapped by the status error, and use it from
http2Server.WriteStatus().
* Add a helper method in internal/testutils to compare two status errors
and update test code to use that instead of reflect.DeepEqual().
This removes RequireHandshakeHybrid support and changes the default behavior
to RequireHandshakeOn. Dial calls will now block and wait for a successful
handshake before proceeding. Users relying on the old hybrid behavior (cmux
users) should consult https://github.com/soheilhy/cmux/issues/64.
Also, several tests have been updated to take this into consideration by
sending settings frames.
* Fixes established streams leak in the loopy writer.
RSTStreamFrames used to be ignored by the server transport, if a trailer had already been put into the transport's control buffer. If loopy writer couldn't write anything into a stream because of an error on the client side, then this trailer would never be sent. At that point, server would receive an RSTStreamFrame from client. But this RSTStreamFrame would be ignored because a trailer was already put into the control buffer. This would keep the stream open and in memory on the server side.
With this change, a cleanupStream item is put into the transport's control buffer, whenever an RSTStreamFrame is received by the server, even after a trailer has been put into the buffer.
* When client sends a header to initiate a stream just after sending an RST_STREAM, server gets these frames in the correct order.
When server receives the RST_STREAM, it marks the stream as done and defers the deletion of the stream to the loopy writer by putting a cleanupStream item into control buffer.
Then the server receives the header to initiate a stream. It acts on the header immediately and attempts to create the stream. But because the old stream is not deleted, it hits the number of streams limit and fails.
This commit solves this problem by letting server handle the deletion immediately after receiving the RST_STREAM.
* Refactors deleteStream method.
* Moves consts declarations into test function's body.
* Closes the client transport stream, if context is cancelled while recvBuffer is reading.
* Passes a function pointer to recvBufferReader, instead of a Stream and an http2Client.
* Adds more descriptive error messages.
* If waitOnHeader notices the context cancelation, shouldRetry no longer returns a ContextError. Instead, it returns the error from the last try.
* Makes sure that test gets both statuses at least 5 times.
* Makes cntPermDenied a lambda function.