Rolling upgrade test (BATS, Docker) #3706
Conversation
Idea: Soon, Solr will have Testcontainers as a dependency (#3670), and Testcontainers is ideal for orchestrating and managing containers like this. I suppose you could write a JUnit test that sets up a Docker network, adds 3 Solr containers in cloud mode, then loops, swapping them one after the other with a newer version and inspecting logs or whatever you need. It makes sense for such a test to be disabled by default or included only in nightly runs. Here's Claude's draft stab at a rolling test: https://claude.ai/public/artifacts/c4d09d7b-df54-419a-b660-169e22191a26
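To make the idea concrete, here is a rough, untested Java sketch of that loop under the assumptions of the draft (the image tags, the cloud-mode command, and the health-check step are all placeholders; a real multi-node cluster would also need a shared ZooKeeper container, omitted here). It requires a running Docker daemon.

```java
// Rough, UNTESTED sketch of the rolling-upgrade idea with Testcontainers.
// Image tags, commands, and checks are placeholders, not Solr's real test code.
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.utility.DockerImageName;

import java.util.ArrayList;
import java.util.List;

public class RollingUpgradeSketch {
    public static void main(String[] args) {
        try (Network net = Network.newNetwork()) {
            List<GenericContainer<?>> nodes = new ArrayList<>();
            // Start three old-version Solr nodes on one Docker network.
            for (int i = 0; i < 3; i++) {
                GenericContainer<?> solr =
                        new GenericContainer<>(DockerImageName.parse("solr:9.7"))
                                .withNetwork(net)
                                .withNetworkAliases("solr" + i)
                                .withExposedPorts(8983)
                                .withCommand("solr-foreground -c"); // placeholder
                solr.start();
                nodes.add(solr);
            }
            // Swap each node, one at a time, for the newer version.
            for (int i = 0; i < nodes.size(); i++) {
                nodes.get(i).stop();
                GenericContainer<?> upgraded =
                        new GenericContainer<>(DockerImageName.parse("solr:9.10"))
                                .withNetwork(net)
                                .withNetworkAliases("solr" + i)
                                .withExposedPorts(8983)
                                .withCommand("solr-foreground -c"); // placeholder
                upgraded.start();
                nodes.set(i, upgraded);
                // Here: inspect logs / query a collection to verify cluster health.
            }
            nodes.forEach(GenericContainer::stop);
        }
    }
}
```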
Thank you for #3670, but you say having a dependency (on Testcontainers) as if we've achieved something as a project. It's a few lines of configuration to depend on it. An achievement would be https://issues.apache.org/jira/browse/SOLR-17653, albeit that particular issue wouldn't be helpful for the upgrade test.
I am a huge fan of Testcontainers; I filed SOLR-17653 after all. I would have personally enjoyed writing/working with Java code to manipulate containers instead of Bash/BATS, and I could have used my desire for SOLR-17653 as an excuse here to do it. I was awfully tempted; I contemplated doing that very thing and had a 1:1 call with Eric Pugh about this rolling upgrade test to discuss how to go about it. I chose Bash/BATS in spite of disliking the framework. I am no good at this framework; GitHub Copilot did the critical 1st draft and most improvements afterwards.

The rolling upgrade test scenario doesn't speak to the strengths of Testcontainers specifically (a JUnit wrapper for Docker). It plays to the strengths of Docker, yes, but not specifically Testcontainers/JUnit. This is just orchestration here (not SolrJ integration, which definitely requires SOLR-17653), which is basically what our pile of Bash/BATS scripts do. We have a pile of similar Docker tests in Bash too, albeit not using BATS.

Put differently, if this specific test were to exist as a JUnit/Testcontainers based test, then it really calls into question why we have any Bash/BATS scripts at all -- why wouldn't they also be Testcontainers based? "When in Rome, do as the Romans do" -- I respected our status quo because it works, even though I'm not productive in it.
I thought we changed it back to having curl in it?
```shell
# which means you don't get an error message for passing a start arg, like --jvm-opts, to a stop command.
```
```shell
# Pre-check
timeout || skip "timeout utility is not available"
```
I tried to use the new wait_for method but didn't have any luck. So instead, let's just have better handling for when timeout isn't available.
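One thing worth noting: GNU `timeout` invoked with no arguments typically exits non-zero even when it is installed (usage error), so a `command -v` probe may be a safer availability check. A minimal sketch of that alternative pattern (the `have` helper is hypothetical, not part of the PR):

```shell
#!/bin/sh
# Sketch: probe tool availability with POSIX `command -v`.
# `have` succeeds only if the named command is found on PATH,
# without actually running the tool.
have() { command -v "$1" >/dev/null 2>&1; }

have sh && echo "sh: available"
have no-such-tool-xyz || echo "no-such-tool-xyz: missing"
```

In a BATS pre-check this would read something like `have timeout || skip "timeout utility is not available"`.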
```shell
local resp code body
sleep 5
resp=$(curl -s -S -w "\n%{http_code}" -X POST -H 'Content-type:application/json' -d "$json" "$url")
code="${resp##*$'\n'}"
```
I wonder if this should be using `run` and `assert_success` type BATS calls?
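For reference, the `-w "\n%{http_code}"` trick appends the HTTP status as a final line, and the `##*$'\n'` expansion peels it back off. A standalone bash sketch of that parsing, with a simulated response instead of a real curl call:

```shell
#!/usr/bin/env bash
# Simulate curl -w "\n%{http_code}" output: body, newline, status code.
resp=$(printf '{"status":"ok"}\n200')

code="${resp##*$'\n'}"   # drop everything through the last newline -> the status code
body="${resp%$'\n'*}"    # drop the last newline and what follows -> the body

echo "code=$code"   # code=200
echo "body=$body"   # body={"status":"ok"}
```

The same expansions work whether the body is one line or many, since `##` and `%` anchor on the *last* newline.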
I've been doing some testing of Solr 9.7 to 9.10 with changing … What would we want it to have to be considered worth merging? SSL setup? OAuth-based security? Other complex things?
Gosh; nothing complex is needed to get this useful test merged! I think the leading concern I have is that this test is slow and just different... I would like it to somehow get run separately from a normal integration test run. Like, this test should perhaps be opt-in, and we do the opt-in on Jenkins CI but not elsewhere.
@dsmiley I think the one change I want to make is to rename it. I'd love to get this in to use in testing some other scenarios.
This reverts commit 4bebd3c.
Tried to make it much closer to the test_docker_solrcloud style, but the combination of setup and setup_file plus the teardowns defeated me.
dsmiley left a comment
test_rolling_upgrade.bats is a much better name.
What keeps this heavy test from the normal rotation?
why change this test in this PR?
I was hoping to get both of them using better Docker patterns; I'll back it out...
Will rename. As far as keeping this heavy test from the normal rotation, it honestly isn't that much heavier than some other tests that run. I think in a separate PR we should come up with criteria for what is considered heavy, and deal with it there...
@dsmiley how do you feel about me merging this? I'd like to build on it for another scenario...
Go for it Eric! Very happy to have team-work where I start something and others take it further. |
This integration test has been failing every day since March 20th -- https://ci-builds.apache.org/job/Solr/job/Solr-TestIntegration-10.x/ I wish our CI failures would notify the hell out of us individually. Basically nobody monitors the build list unless it's like an act of charity (and I'm in the mood to be charitable at the moment).
Looks like a "wait_for" command error? Though I thought that was a bash function we defined???
Okay, looks like we time out after 30 seconds, but from the log that appears to be nowhere near enough time due to a slow machine downloading Docker images... That is how I read it. Before I bump it to 120 seconds or larger, I would love it if you thought the same:
Actually, that doesn't make sense... the wait_for on line 118 is way after we have pulled the images, so I don't think it's waiting for the Docker image to download in order to run the command...
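The project's actual `wait_for` helper isn't shown in this thread, but the general pattern under discussion, retry until success or a deadline passes, looks something like this (a hypothetical minimal sketch, where the first argument is the timeout in seconds):

```shell
#!/bin/sh
# Hypothetical wait_for: retry a command until it succeeds or the deadline passes.
wait_for() {
  deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 1
  done
}

wait_for 5 true  && echo "condition met"
wait_for 2 false || echo "timed out"
```

Under this shape, bumping the 30-second budget to 120 is just a change to the first argument at the call site.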
From the description of this PR:
I still think that. I doubt we'll ever write a more costly test. |
I spent a bunch of time with a GitHub Copilot Agent to develop a new BATS test to test a rolling upgrade of SolrCloud.
It ignores the current project/build since it runs Docker containers of Solr.
I think this test should not be run by default; it's a slow test. Not sure yet on the best way to segment this test and maybe other slow ones. At least another directory.
It could be improved to build the local project in Docker and use that image as the upgrade destination target.
Separately, I want to use this as a basis to check for Overseer disablement settings, but that's not present here. Maybe I'll add it in this PR.
Disclaimer: Bash is not my comfort zone, but AI churned this out. IntelliJ Junie was also super helpful.