Activity
Just to confirm: I'm running into the same issue and would be very much interested in a fix :). Identical system information, other than the URLs.
Edit: I'm trying to upgrade from 14.6.3-ce.0.
Edited by Michael B.

Managed to fix this on my end by running the following commands:
gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
gitlab-rake db:migrate
gitlab-ctl reconfigure
apt dist-upgrade
gitlab-ctl restart
That's based on the steps outlined/linked by @AHBrook in #360377 (comment 926678321) below – thank you! Also cf. omnibus-gitlab#6795 (closed), and omnibus-gitlab#6797 (closed).
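To see which batched background migrations are still unfinished before running the finalize step, something like this should work (a sketch; the query follows the 14.x schema, and `status <> 3` meaning "not yet finished" is an assumption worth checking against the docs for your version):

```shell
# List batched background migrations that have not finished yet
# (status 3 = finished in the 14.x schema -- verify for your release).
gitlab-psql -c "SELECT id, job_class_name, table_name, column_name, status
                FROM batched_background_migrations
                WHERE status <> 3;"
```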
Edited by Michael B.

Thanks, Michael! The steps you listed have solved the issue in both of our installations.
These are the steps:
gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
gitlab-rake db:migrate
gitlab-ctl reconfigure
apt dist-upgrade
gitlab-ctl restart
Hi Brad,
I did those after the migration failure. gitlab-ctl reconfigure could not finish, so the service was down. It can be started manually if needed, but no changes will be allowed in that state, for example renewing the SSL cert with Let's Encrypt. With those steps, I could resolve the issue and fix the db migration.
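If reconfigure aborts partway through, the bundled services can be brought up by hand while you sort out the migration (plain Omnibus commands; configuration changes still need a successful reconfigure afterwards):

```shell
# Start the bundled services manually and check their state.
gitlab-ctl start
gitlab-ctl status
```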
Ran into the same issue doing staged upgrades from 14.8.2 to 14.10.0. (Did each intermediate upgrade: 14.8.2 -> 14.8.3 -> 14.8.6 -> 14.9.0 -> 14.9.5 -> 14.10.0). Still hit the issue.
Fixed it thanks to the steps above (skipping the apt dist-upgrade one, which seems like overkill).

Note also that the first step:
gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
will be parsed incorrectly if you use zsh as your shell. Run it in bash or escape it correctly.
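For example (zsh expands the square brackets as a glob pattern, so either disable globbing for that one command or hand the unmodified command to bash; both variants below are a sketch):

```shell
# Option 1: tell zsh not to glob this command line
noglob gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']

# Option 2: run it under bash (note the re-quoted last argument)
bash -c 'gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,"[null\,\"up\"]"]'
```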
We just encountered the same thing, upgrading from 14.8.2 to 14.10.0. We checked and didn't see any warnings regarding this upgrade path.
Bit more info: we restored our VM wholesale back to 14.8.2, then did a staged update to 14.9.3 and then to 14.10.0. All went fine. I suspect there is some migration in the 14.9.x upgrades that 14.10 expects to already be done.
I spoke too soon! We ran into the exact same issue in production, despite going 14.8.2 -> 14.9.3 -> 14.10.0. The only difference between our test and prod boxes was that we had to restore Test from a VM backup.
We attempted to restart the server and run sudo gitlab-ctl reconfigure again, and we got the same error. So we are going to roll back production and start over. Interestingly, despite the errors, all the services are up and running properly and we are able to log in and see our systems. It still makes me uneasy, though.

I did find a Stack Overflow post that seemed similar, indicating RAM issues: https://stackoverflow.com/questions/46907157/cannot-install-gitlab-using-omnibus-error-executing-action-run-on-resource-b
The following errors are in our PostgreSQL "current" log:
2022-04-27_09:12:00.64116 LOG: starting PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2022-04-27_09:12:00.65239 LOG: listening on Unix socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"
2022-04-27_09:12:00.86046 LOG: database system was shut down at 2022-04-27 09:10:49 GMT
2022-04-27_09:12:00.90642 LOG: database system is ready to accept connections
2022-04-27_16:27:56.75786 ERROR: duplicate key value violates unique constraint "namespace_aggregation_schedules_pkey"
2022-04-27_16:27:56.75788 DETAIL: Key (namespace_id)=(106) already exists.
2022-04-27_16:27:56.75788 STATEMENT: /*application:sidekiq,correlation_id:71f72b733d02f72777dfa460f0b04b34,jid:aa777b2d954ff0ab295f8f09,endpoint_id:Namespaces::ScheduleAggregationWorker,db_config_name:main*/ INSERT INTO "namespace_aggregation_schedules" ("namespace_id") VALUES (106) RETURNING "namespace_id"
2022-04-27_18:16:35.62741 ERROR: duplicate key value violates unique constraint "namespace_aggregation_schedules_pkey"
2022-04-27_18:16:35.62743 DETAIL: Key (namespace_id)=(106) already exists.
2022-04-27_18:16:35.62744 STATEMENT: /*application:sidekiq,correlation_id:21f0bab666a93a76ef21d4099defc274,jid:531e0695389fa2420bf3a39c,endpoint_id:Namespaces::ScheduleAggregationWorker,db_config_name:main*/ INSERT INTO "namespace_aggregation_schedules" ("namespace_id") VALUES (106) RETURNING "namespace_id"
Edited by Tony Brook

After a bunch of hunting, a GitLab community post showed up in my search and pointed me in the right direction.
https://forum.gitlab.com/t/gitlab-ctl-reconfigure-doesnt-work-after-gitlab-omnibus-updated/68715
The command suggested for the finalize migrations didn't work for me, but running the one the output suggested did. Now everything looks to be working properly... but I'm still worried about the long-term health of the system. We still see errors in our SQL logs about column "on_hold_until" does not exist at character 316, but that doesn't appear to be hurting anything.
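If you want to check whether that error is just leftover noise, one option (a sketch; the assumption here is that on_hold_until belongs to the batched_background_migrations table in later 14.x schemas) is to confirm all schema migrations ran and that the column now exists:

```shell
# Any migration still listed as "down" means db:migrate has not fully completed.
gitlab-rake db:migrate:status | grep -v "^   up"

# Check whether the column the errors complain about exists yet.
gitlab-psql -c "SELECT column_name FROM information_schema.columns
                WHERE table_name = 'batched_background_migrations'
                  AND column_name = 'on_hold_until';"
```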
- Developer
@AHBrook Thanks, I've edited the issue description with possible fixes linking to your comment and the forum and Reddit posts.
- Developer
I believe the workaround for Docker/Docker Swarm deployments would be (a docker sketch follows the list):

- Disable auto-reconfigure by mounting the file /etc/gitlab/skip-auto-reconfigure
- Disable automatic migration at reconfigure, in case it ends up running despite the above: add the value gitlab_rails['auto_migrate'] = false to gitlab.rb
- Start the service container with this altered config mounted in it and it will not enter a crash loop this time 😌
- Open a shell 🐚 and run the commands you need:
  gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
  gitlab-rake db:migrate
  gitlab-ctl reconfigure
  gitlab-ctl restart
- Revert all the config changes and file creation performed above for future upgrades 🎉
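A minimal sketch of the first three bullets with plain docker commands, assuming the official gitlab/gitlab-ce image and the usual /srv/gitlab host directories (container name, paths, and image tag are placeholders for your own deployment):

```shell
# Create /etc/gitlab/skip-auto-reconfigure via the mounted config volume.
touch /srv/gitlab/config/skip-auto-reconfigure

# Disable automatic migrations at reconfigure time.
echo "gitlab_rails['auto_migrate'] = false" >> /srv/gitlab/config/gitlab.rb

# Start the container with the altered config mounted in.
docker run -d --name gitlab \
  -v /srv/gitlab/config:/etc/gitlab \
  -v /srv/gitlab/logs:/var/log/gitlab \
  -v /srv/gitlab/data:/var/opt/gitlab \
  gitlab/gitlab-ce:14.10.0-ce.0

# Open a shell and run the rake/reconfigure commands from the list above.
docker exec -it gitlab bash
```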
(thanks to @hchouraria for these directions 😄)

What would the workaround be for Kubernetes deployments?
Edited by Greg Myers

- Maintainer
What would the workaround be for Kubernetes deployments?
The gitlab-toolbox pods should already be running the new codebase once the migration jobs are failing with this. Exec'ing into the toolbox pod and running
gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
should get that in place. Then re-running the helm upgrade command for the chart should trigger a new rollout with a new migration job.
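Roughly (a sketch; the release name, namespace, toolbox deployment name, and values file are placeholders for your own install):

```shell
# Run the finalize task inside the toolbox pod (run this from bash locally -- see the zsh note above).
kubectl -n gitlab exec -it deploy/gitlab-toolbox -- \
  gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']

# Re-run the chart upgrade to roll out a fresh migrations job.
helm -n gitlab upgrade gitlab gitlab/gitlab -f values.yaml
```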
I see from the forum post that they tried this, and the finalise job is failing in their case. It looks like the finalise failure needs to be investigated.
I'm trying to work out how to get around this issue for my K8s install.
The above command fails, as described in my forum post. Is there a way to find out why it's failing? The error given by the command, even with trace on, is not helpful:
`rake aborted!
Gitlab::Database::BackgroundMigration::BatchedMigrationRunner::FailedToFinalize: Gitlab::Database::BackgroundMigration::BatchedMigrationRunner::FailedToFinalize`
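One way to dig a bit deeper (a sketch only; the class and association names follow the 14.x codebase, so treat them as assumptions to verify for your version) is to inspect the batched migration and its recent jobs from the Rails side in the toolbox pod:

```shell
# Print the migration's status and its ten most recent jobs (id, status, attempts).
kubectl -n gitlab exec -it <gitlab-toolbox-pod-name> -- gitlab-rails runner "
  m = Gitlab::Database::BackgroundMigration::BatchedMigration
        .find_by(job_class_name: 'ProjectNamespaces::BackfillProjectNamespaces')
  p m&.status
  m&.batched_jobs&.order(id: :desc)&.limit(10)&.each { |j| p [j.id, j.status, j.attempts] }
"
```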
@greg My customer attempted these steps, unsuccessfully. Any recommendations?
Thought I'd come back to share what I've done to resolve it and upgrade to 14.10.4 (a rough Helm sketch of the same cycle follows the list):

- First, downgraded to 14.7.7 (where I was originally)
- Tried the upgrade again (crazy, I know); this failed. Reverted my install back to 14.7.7
- Ran kubectl exec <gitlab-toolbox-pod-name> -it -- bash
- Then executed gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
- Which came back with Done.
- Re-ran the upgrade to 14.10.4, which worked this time.
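The same cycle expressed as Helm commands (a sketch; release name, namespace, and chart versions are placeholders, and the GitLab-version-to-chart-version mapping has to be looked up for your target releases):

```shell
helm -n gitlab history gitlab                 # find the revision that was running 14.7.7
helm -n gitlab rollback gitlab <revision>     # roll back to it
kubectl -n gitlab exec -it <gitlab-toolbox-pod-name> -- \
  gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
helm -n gitlab upgrade gitlab gitlab/gitlab --version <chart-version-for-14.10.4> -f values.yaml
```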
Edited by Adam