Before these fixes, it was possible to see errors on new RPCs after a
connection began draining, and before establishing a new connection. There is
an inherent race between choosing a SubConn and attempting to creating a stream
on it. We should be able to avoid application-visible RPC errors due to this
with transparent retry. However, several bugs were preventing this from
working correctly:
1. Non-wait-for-ready RPCs were skipping transparent retry, though the retry
design calls for retrying them.
2. The transport closed itself (and would consequently error new RPCs) before
notifying the SubConn that it was draining.
3. The SubConn wasn't synchronously updating itself once it was notified about
the closing or draining state.
4. The SubConn would go into the TRANSIENT_FAILURE state instantaneously,
causing RPCs to fail instead of queue.
Before this fix, stream is removed from activeStreams in finishStream,
which happens when the service handler returns status, without waiting
for the status to be sent by loopyWriter. If GracefulStop() is called in
between, it will close the connection (because activeStreams is empty),
which causes the RPC to fail with "transport is closing". This change
moves the activeStreams cleanup into loopyWriter, after sending status
on wire.
* Expose a method from the internal package to get to the raw
StatusProto wrapped by the status error, and use it from
http2Server.WriteStatus().
* Add a helper method in internal/testutils to compare two status errors
and update test code to use that instead of reflect.DeepEqual()
* When a RST_STREAM is received by the server transport, a cleanupStream
item is placed into controlbuf no matter what.
* Updates comments.
* Replaces getCleanupStream with inline struct initialization.
The client-side traces were otherwise only showing `RPC: to <nil>`,
which is not helpful.
Also clean up construction of traceInfo and firstLine in a few places.
`Write` can be called concurrently, for which it calls the `do` function.
As `WriteStatus` can close the `ht.writes` in parallel as well the `Write`
will try to write into the `ht.writes` in the `do` function, this can
lead into a panic. As there is no real usability on closing this channel
we can simply leave it to the garbage collector so we can avoid panic
during an execution.
Signed-off-by: André Martins <aanm90@gmail.com>
This removes RequireHandshakeHybrid support and changes the default behavior
to RequireHandshakeOn. Dial calls will now block and wait for a successful
handshake before proceeding. Users relying on the old hybrid behavior (cmux
users) should consult https://github.com/soheilhy/cmux/issues/64.
Also, several tests have been updated to take this into consideration by
sending settings frames.
* Fixes established streams leak in the loopy writer.
RSTStreamFrames used to be ignored by the server transport, if a trailer had already been put into the transport's control buffer. If loopy writer couldn't write anything into a stream because of an error on the client side, then this trailer would never be sent. At that point, server would receive an RSTStreamFrame from client. But this RSTStreamFrame would be ignored because a trailer was already put into the control buffer. This would keep the stream open and in memory on the server side.
With this change, a cleanupStream item is put into the transport's control buffer, whenever an RSTStreamFrame is received by the server, even after a trailer has been put into the buffer.
* When client sends a header to initiate a stream just after sending an RST_STREAM, server gets these frames in the correct order.
When server receives the RST_STREAM, it marks the stream as done and defers the deletion of the stream to the loopy writer by putting a cleanupStream item into control buffer.
Then the server receives the header to initiate a stream. It acts on the header immediately and attempts to create the stream. But because the old stream is not deleted, it hits the number of streams limit and fails.
This commit solves this problem by letting server handle the deletion immediately after receiving the RST_STREAM.
* Refactors deleteStream method.
* Moves consts declarations into test function's body.
6cc789b34b made `envconfig.RequireHandshakeOn` the default when unspecified by environment variable, but missed a fallthrough leading to `GRPC_GO_REQUIRE_HANDSHAKE=on` specifying `envconfig.RequireHandshakeHybrid`. This change adds the missing fallthrough.
* Closes the client transport stream, if context is cancelled while recvBuffer is reading.
* Passes a function pointer to recvBufferReader, instead of a Stream and an http2Client.
* Adds more descriptive error messages.
* If waitOnHeader notices the context cancelation, shouldRetry no longer returns a ContextError. Instead, it returns the error from the last try.
* Makes sure that test gets both statuses at least 5 times.
* Makse cntPermDenied a lambda function.
Previously, the transport was able to reset via the retry loop,
or via the event closures calling resetTransport. This meant
a very large amount of synchronization was necessary: one
reset meant the other had to not reset; state had to be kept
at the addrconn; and very subtle interactions were hard to
reason about.
This change removes the ability for event closures to directly
reset the transport. Instead, they signal to to the retry
loop about the event, and the retry loop is always the single
place that retries occur.
This also allows us to refactor the address switching logic
into a much simpler for loop inside the retry loop instead of
using addrConn state to keep track of an index.