Flaky CI failures in proxy-based replication tests #3515

@jihuayu

Several recent kvrocks.yaml CI runs failed in replication tests that simulate broken or slow replication links.

The most frequent failure is TestSlaveLostMaster:

--- FAIL: TestSlaveLostMaster
    client.go:147: forward tcp stream failed, err: read tcp 127.0.0.1:...: read: connection reset by peer
FAIL github.com/apache/kvrocks/tests/gocase/integration/replication

Examples:

This does not seem related to the PR contents. It has appeared across unrelated stream/cf/hash/dependency changes and across different CI matrix entries, especially macOS arm64, but not only there.

The failure appears to come from util.SimpleTCPProxy calling t.Fatalf from the proxy forwarding goroutine when the TCP connection is reset. In TestSlaveLostMaster, the test intentionally breaks the proxy connection to simulate a lost master, so a connection reset during that phase may be expected and should probably not fail the test directly.

Proxy-based tests worth auditing:

  • TestSlaveLostMaster
  • TestSlowConsumerBug
  • TestSlowConsumerBlocksIndefinitely

Steps to take:

  • Investigate the reasons for failure
  • Confirm the purpose of the test cases
  • Propose solutions (communication is very important)
  • Complete the code implementation
