Skip to content

Adding a yield in getTeamByServers to prevent CPU monopolization#12858

Open
neethuhaneesha wants to merge 10 commits intoapple:release-7.3from
neethuhaneesha:dd-7.3
Open

Adding a yield in getTeamByServers to prevent CPU monopolization#12858
neethuhaneesha wants to merge 10 commits intoapple:release-7.3from
neethuhaneesha:dd-7.3

Conversation

@neethuhaneesha
Copy link
Copy Markdown
Contributor

Adding a delay in getTeamByServers to prevent CPU monopolization

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

neethuhaneesha and others added 10 commits April 30, 2025 13:08
…pple#12146)

* Go: simplify network start check logic

This change attempts to address the SIGSEGV happening when network routine is started
multiple times concurrently.
It changes the network mutex to be a RW mutex, to optimize the case
of calls when network is already started.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f4600000011 pc=0x7f46a405678e]

runtime stack:
runtime.throw({0x1169773?, 0x7f46a479f96c?})
	/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7f465affb790 sp=0x7f465affb760 pc=0x44e45d
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:821 +0x3e9 fp=0x7f465affb7f0 sp=0x7f465affb790 pc=0x466e49

goroutine 60 [syscall]:
runtime.cgocall(0xf4f640, 0xc0001f7f80)
	/usr/local/go/src/runtime/cgocall.go:157 +0x6e fp=0xc0001f7f58 sp=0xc0001f7f20 pc=0x41840e
github.com/apple/foundationdb/bindings/go/src/fdb._Cfunc_fdb_run_network()
	_cgo_gotypes.go:378 +0x85 fp=0xc0001f7f80 sp=0xc0001f7f58 pc=0xd19d25
github.com/apple/foundationdb/bindings/go/src/fdb.startNetwork.func1()
	/home/user/go/pkg/mod/github.com/apple/foundationdb/bindings/go@v0.0.0-20221026173525-97cc643cef69/src/fdb/fdb.go:209 +0x2a fp=0xc0001f7fe0 sp=0xc0001f7f80 pc=0xd2704a
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001f7fe8 sp=0xc0001f7fe0 pc=0x487741
created by github.com/apple/foundationdb/bindings/go/src/fdb.startNetwork
	/home/user/go/pkg/mod/github.com/apple/foundationdb/bindings/go@v0.0.0-20221026173525-97cc643cef69/src/fdb/fdb.go:208 +0x72

goroutine 1 [running]:
	goroutine running on other thread; stack unavailable

* Document format for SetSpanParent

SetSpanParent does not take a 16-bytes trace id as documentation mentions,
but rather requires a specific serialized binary format (described in description field).

* Do not use pointer receiver for Close()

A mix of pointer/non-pointer receivers makes it impossible to use interfaces

* Typo fix in Close() GoDoc

* Do not commit read-only transactions

This a marginal improvement which follows the C API documentation:
> It is not necessary to commit a read-only transaction – you can simply call fdb_transaction_destroy().

Read-only transactions will be always destroyed before returning to caller,
without relying on Go garbage collection; the read-only transactions
can be destroyed so early because futures created within them must never be used
outside of them.

* Go: do not automatically close database objects (apple#11394)

Setting the finalizer prevents user from calling Close(),
as it would randomly result in SIGSEGV or some other silent memory corruption.
No finalizer is set for the database and user is expected to call Close() to
avoid memory leaks.

* Go: make sure that database creation happens only while network thread is running

All API calls which create a database are guaranteed to happen while the network thread is running;
there is no performance penalty for calls happening after network thread has already been started.

Notable change: if the call to run network thread fails, a panic will be generated instead of a logged error.

* Go: add StopNetwork method

Add a method which allows to a safe explicit stop of the network thread.
This method waits for the network thread to exit before returning to caller.

* Go: return an error when calling StopNetwork() multiple times

Specify which functions are safe to be called from multiple goroutines

* Documentation fix: datacenter_id does not need to be hexadecimal (apple#11519)

* Go binding: allow cancelling snapshots and R/O transactions

It is beneficial to cancel a transaction, even if not performing any changes,
so that futures and other associated resources can be released early.

* Go binding: fix bug with R/O transaction destroyed before futures, add GC references for db/tx (apple#11611)

* Go binding: fix bug with R/O transaction destroyed before futures, add GC references for db/tx

In a previous PR the R/O transaction was optimized and the finalizer was removed; however this prevents using
any future, since transaction is already destroyed by the time future is attempted evaluation.

Additionally, a new issue is uncovered and addressed: the Go Database, Transaction and Future objects are all
garbage-collected independently, which is incorrect because Database and Transaction must not be garbage-collected
if any future has not yet been garbage-collected.

* Go binding: add test for future evaluated outside of R/O transaction

* docs: mention -1 and 2 as values read/written for the data_distribution special key (apple#11789)

A value of -1 means never initialized, but in this case DD is considered enabled.
A value of 2 is used to enable a security mode which disables data moves but allows auditStorage part.
The rebalance_ignored subkey allows using integers since 7.1

* Go binding: do not override wrapped transaction error

When user wraps a FoundationDB error in their transaction function,
the retryable() logic will always reset it back to a FoundationDB error,
without any wrapping.
This change makes sure that original error is preserved, unless
fdb_transaction_on_error() returned a different error or error code.

NOTE: fdb_transaction_on_error() does not currently ever return a different error
than original, so the new logic is being defensive for future changes

* docs: update GoDoc for ReadTransact()

Mention that R/O transactions are garbage-collected once futures go out of scope.

* Go binding: add GetClientStatus method to Database

Allow fetching client status JSON information for any database with multi-version client enabled;
the raw JSON is returned, so that multiple versions of FoundationDB are supported without any Go
structure constraint.

* Build a working image when target architecture is arm64

Converted some ENV declaration to not trigger warnings

* fix: do not attempt fetching AWS account id twice
…2213)

* Fix a race that can causes recovery to be stuck

When purging old generations, these no longer needed generations are removed in
the in-memory LogSystem data structure. Then the change is made durable on
coordinators.

If there is a recovery happened before the change is durable, and ServerDBInfo
broadcast is sent to the old tlogs and they can be displaced/removed. As a
result, the recovery will become stuck waiting for locking these old tlogs.

This PR updates the purging so that it never directly changes the in-memory data
structure and only modifies states on coordinators. So next recovery will pick
up the change.

* Add in-memory purging of old generation after writing coordinators

This allows old tlogs to be removed after CC purges them.

20250628-180019-jzhou-69511e1cbc01ab08
@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 8c0f52d
  • Duration 0:04:01
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 8c0f52d
  • Duration 0:08:18
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 8c0f52d
  • Duration 0:08:15
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 8c0f52d
  • Duration 0:08:37
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 8c0f52d
  • Duration 0:36:59
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 8c0f52d
  • Duration 0:39:11
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants