Local credentials can be used with either a UDS or a local TCP connection. The former is associated with the security level PrivacyAndIntegrity, while the latter is associated with NoSecurity. Local credentials should be used instead of WithInsecure for localhost connections.
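A minimal sketch of dialing with local credentials, assuming the `credentials/local` package and a server already listening on `localhost:50051`:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/local"
)

func main() {
	// Local credentials over plain TCP to localhost; the resulting
	// connection reports security level NoSecurity. Dialing a UDS
	// target (e.g. "unix:///tmp/grpc.sock") yields PrivacyAndIntegrity.
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(local.NewCredentials()))
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
	// ... create stubs and issue RPCs on conn ...
}
```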
- Stop sending an empty update to sub-balancers at init time
- At init, when only one sub-balancer reports TransientFailure, wait for the other sub-balancers
- When aggregating states, treat a sub-balancer that moves from TransientFailure to Connecting as still being in TransientFailure, so the aggregated state doesn't stay Connecting for a long time (see the sketch below)
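A minimal sketch of the aggregation rule from the last item; this is an illustration, not the actual balancer group code, and `childState` with its bookkeeping is an assumption:

```go
package aggsketch

import "google.golang.org/grpc/connectivity"

// childState is a hypothetical record of a sub-balancer's current and
// previous connectivity states.
type childState struct {
	current  connectivity.State
	previous connectivity.State
}

// aggregate treats a child that moved from TransientFailure to Connecting
// as still being in TransientFailure, so the parent does not report
// Connecting for a long time while that child keeps retrying.
func aggregate(children []childState) connectivity.State {
	var ready, connecting int
	for _, c := range children {
		effective := c.current
		if c.current == connectivity.Connecting && c.previous == connectivity.TransientFailure {
			effective = connectivity.TransientFailure
		}
		switch effective {
		case connectivity.Ready:
			ready++
		case connectivity.Connecting:
			connecting++
		}
	}
	switch {
	case ready > 0:
		return connectivity.Ready
	case connecting > 0:
		return connectivity.Connecting
	default:
		return connectivity.TransientFailure
	}
}
```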
Replace various //go:generate lines and regenerate.sh scripts with a
single, top-level regenerate.sh that regenerates all .pb.go files.
Placing generation in a single script ensures that all files are
generated with similar parameters. The new regenerate.sh uses the
protoc-gen-go version defined in test/tools/go.mod and automatically
handles new .proto files as they are added.
Also do some minor refactoring of the .proto files: every file now has a
go_package option (which will be required by a future version of the
code generator), and file imports are now relative to the repository
root.
The `Target` field in the response has been deprecated in favor of
`Targets`, which specifies a list of possible targets. Pulled in the
latest proto and updated the code referencing it.
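A hedged sketch of preferring the new field, assuming generated accessors named GetTargets and GetTarget; the real message and accessor names come from the regenerated proto:

```go
package targetsketch

// response stands in for the generated message; only the two accessors
// discussed above are modeled here.
type response interface {
	GetTargets() []string // new repeated field
	GetTarget() string    // deprecated singular field
}

// pickTarget prefers the new Targets list and falls back to the
// deprecated Target field for responses from older servers.
func pickTarget(resp response) string {
	if targets := resp.GetTargets(); len(targets) > 0 {
		return targets[0]
	}
	return resp.GetTarget()
}
```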
Currently (go1.13.4), the default stack size for newly spawned
goroutines is 2048 bytes. This is insufficient when processing gRPC
requests, as we often require more than 4 KiB of stack. This causes the
Go runtime to call runtime.morestack at least twice per RPC, which
needlessly hurts performance, as stack reallocations require all sorts
of internal work such as rewriting pointers to point to the new
addresses.
Since this stack growth is guaranteed to happen at least twice per RPC,
reusing goroutines gives us two wins:
1. The stack is already grown to 8 KiB after the first RPC, so
subsequent RPCs do not call runtime.morestack.
2. We eliminate the need to spawn a new goroutine for each request
(even though they're relatively inexpensive).
Performance improves across the board. The improvement is especially
visible in small, unary requests as the overhead of stack reallocation
is higher, percentage-wise. QPS is up anywhere between 3% and 5%
depending on the number of concurrent RPC requests in flight. Latency is
down ~3%. There is even a 1% decrease in memory footprint in some cases,
though that is an unintended but happy coincidence.
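A minimal sketch of the reuse idea described above: a fixed set of long-lived worker goroutines consumes handlers from a channel, so each worker's stack only grows once. This is an illustration, not the server's actual implementation:

```go
package workersketch

import "sync"

// workerPool reuses a fixed set of goroutines to run request handlers, so
// each goroutine's stack only has to grow once and later requests avoid
// runtime.morestack.
type workerPool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newWorkerPool(numWorkers, queueSize int) *workerPool {
	p := &workerPool{tasks: make(chan func(), queueSize)}
	p.wg.Add(numWorkers)
	for i := 0; i < numWorkers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// run hands a request handler to an existing worker instead of spawning a
// new goroutine per request.
func (p *workerPool) run(task func()) {
	p.tasks <- task
}

// close stops accepting work and waits for in-flight handlers to finish.
func (p *workerPool) close() {
	close(p.tasks)
	p.wg.Wait()
}
```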
unary-networkMode_none-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_8-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
Title        Before       After        Percentage
TotalOps     2613512      2701705      3.37%
SendOps      0            0            NaN%
RecvOps      0            0            NaN%
Bytes/op     8657.00      8654.17      -0.03%
Allocs/op    173.37       173.28       0.00%
ReqT/op      348468.27    360227.33    3.37%
RespT/op     348468.27    360227.33    3.37%
50th-Lat     174.601µs    167.378µs    -4.14%
90th-Lat     233.132µs    229.087µs    -1.74%
99th-Lat     438.98µs     441.857µs    0.66%
Avg-Lat      183.263µs    177.26µs     -3.28%
If a benchmark run did not specify req/resp payload curve files, and the
result from that run is fed to the `benchresult` binary, it will
segfault because there is no check for the presence of the payload curve
option before printing the set of features used. Added a sanity check
there.
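A hedged illustration of the kind of nil check added; the field and type names here are assumptions, not the benchmark package's actual API:

```go
package benchsketch

// payloadCurve stands in for the parsed payload curve option.
type payloadCurve struct{ name string }

// features is a stand-in for the feature set that benchresult prints;
// only the payload-curve fields matter for this example.
type features struct {
	ReqPayloadCurve  *payloadCurve
	RespPayloadCurve *payloadCurve
}

// curveString guards against runs that did not specify payload curve
// files, so printing the feature set no longer dereferences a nil pointer.
func curveString(c *payloadCurve) string {
	if c == nil {
		return "off"
	}
	return c.name
}
```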