Some time ago, an application of mine that is deployed in the Pivotal Cloudfoundry began throwing errors and crashed:
2019-09-29T06:42:54.76+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:54.761 INFO 10 --- [ main] o.a.c.c.C.[.[localhost].[/ithubbs] : Initializing Spring embedded WebApplicationContext
2019-09-29T06:42:54.76+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:54.763 INFO 10 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 4397 ms
2019-09-29T06:42:55.00+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:55.001 INFO 10 --- [ main] o.c.metrics.CloudFoundryTagsMeterFilter : Adding CloudFoundry Micrometer tags if not otherwise specified: cf.account=https://donotuseapi.run.pivotal.io, cf.application=ithubbs_backend, cf.cluster=ithubbs_backend, cf.instance.index=0, cf.organization=ithubbs, cf.space=prod
2019-09-29T06:42:55.87+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:55.877 INFO 10 --- [ main] o.s.c.s.r.PooledDataSourceCreator : Found HikariCP on the classpath. Using it for DataSource connection pooling.
2019-09-29T06:42:56.63+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:56.631 INFO 10 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2019-09-29T06:42:56.89+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:56.898 INFO 10 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2019-09-29T06:42:58.51+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.518 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT COUNT(*) FROM public.databasechangeloglock
2019-09-29T06:42:58.53+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.538 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT COUNT(*) FROM public.databasechangeloglock
2019-09-29T06:42:58.55+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.544 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:42:58.55+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.558 INFO 10 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock....
2019-09-29T06:43:08.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:08.559 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:08.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:08.561 INFO 10 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock....
2019-09-29T06:43:18.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:18.563 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:18.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:18.568 INFO 10 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock....
2019-09-29T06:43:28.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:28.569 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:28.57+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:28.571 INFO 10 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock....
2019-09-29T06:43:38.57+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:38.572 INFO 10 --- [ main] liquibase.executor.jvm.JdbcExecutor : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:38.57+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:38.575 INFO 10 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock....
2019-09-29T06:43:45.77+0200 [HEALTH/0] ERR Failed to make TCP connection to port 8080: connection refused
2019-09-29T06:43:45.77+0200 [CELL/0] ERR Timed out after 1m0s: health check never passed.
2019-09-29T06:43:45.78+0200 [CELL/SSHD/0] OUT Exit status 0
2019-09-29T06:43:45.87+0200 [APP/PROC/WEB/0] OUT Exit status 143
2019-09-29T06:43:51.43+0200 [CELL/0] OUT Cell 76dba435-3477-41a6-bea1-9bd134040065 stopping instance f480f9d0-d3d7-4842-598c-8926
2019-09-29T06:43:51.44+0200 [API/4] OUT Process has crashed with type: "web"
2019-09-29T06:43:51.43+0200 [CELL/0] OUT Cell 76dba435-3477-41a6-bea1-9bd134040065 destroying container for instance f480f9d0-d3d7-4842-598c-8926
2019-09-29T06:43:51.47+0200 [API/4] OUT App instance exited with guid 740b4dfc-3200-4234-bc8c-d0efc7f38e67 payload: {"instance"=>"f480f9d0-d3d7-4842-598c-8926", "index"=>0, "cell_id"=>"76dba435-3477-41a6-bea1-9bd134040065", "reason"=>"CRASHED", "exit_description"=>"Instance never healthy after 1m0s: Failed to make TCP connection to port 8080: connection refused", "crash_count"=>1, "crash_timestamp"=>1569732231402656049, "version"=>"d2b95a45-ad59-433c-b780-08c2e5ba31d4"}
There have been no prior changes on the application or the setup.
Because the error occurred on a live application, I tried restarting it without looking too much into the cause of the problem. It was important to have this application live again as soon as possible. However, restarting failed with the same error.
In step two, I suspected Spring to not have enough resources, so I raised both “memory allocated” and “disk allowed”, without effect.
At that time, I began looking into the log in more detail. The failed connection / “connection refused” pointed to the database because it is the only service for that application. I thought that the health-check prevented startup of the application because the connection to the database could not be established fast enough. Raising the health-check timeout to 10 minutes also did not work.
At that point, it is worth mentioning that my application uses Liquibase. Googling for the multiple “Waiting for changelog lock” brought me to this article which suggested that Liquibase locked the database. Indeed, a
SELECT * FROM DATABASECHANGELOGLOCK;
returned
1 true2019-09-28 01:42:13 +0000 86b4b40a-826e-47f0-66f6-38f1 (10.243.232.98)
Unlocking it with
UPDATE DATABASECHANGELOGLOCK
SET locked=0, lockgranted=null, lockedby=null
WHERE id=1
brought the application back up.
Researching for the reason did not bring a specific and reproducible result. I found hints to the fact that when the application is stopped, Liquibase sometimes keeps the lock. However, that would not explain why the app was restarted in the first place.
I decided not to go into further research because this incident was the first problem with Liquibase and nothing really bad happened. Sadly, there are no build-in notifications for crashes like this in Cloudfoundry.
TL;DR
If you see “Waiting for changelog lock….” multiple times during startup of your Liquibase-using application, check if your database is unlocked.