fix: prevent cron from permanently dying after handler errors#16219
fix: prevent cron from permanently dying after handler errors#16219
Conversation
| queue: 'autorunSecond', | ||
| }) | ||
|
|
||
| // Wait for cron to pick up the new job on subsequent ticks |
There was a problem hiding this comment.
While the bug is active, the cron would not have picked up this valid queued job until process restart. Now, it will run
📦 esbuild Bundle Analysis for payloadThis analysis was generated by esbuild-bundle-analyzer. 🤖
Largest pathsThese visualization shows top 20 largest paths in the bundle.Meta file: packages/next/meta_index.json, Out file: esbuild/index.js
Meta file: packages/payload/meta_index.json, Out file: esbuild/index.js
Meta file: packages/payload/meta_shared.json, Out file: esbuild/exports/shared.js
Meta file: packages/richtext-lexical/meta_client.json, Out file: esbuild/exports/client_optimized/index.js
Meta file: packages/ui/meta_client.json, Out file: esbuild/exports/client_optimized/index.js
Meta file: packages/ui/meta_shared.json, Out file: esbuild/exports/shared_optimized/index.js
DetailsNext to the size is how much the size has increased or decreased compared with the base branch of this PR.
|
DanRibbens
left a comment
There was a problem hiding this comment.
One question about the updated croner major version.
| "ci-info": "^4.1.0", | ||
| "console-table-printer": "2.12.1", | ||
| "croner": "9.1.0", | ||
| "croner": "10.0.1", |
There was a problem hiding this comment.
https://github.com/Hexagon/croner/releases/tag/10.0.0
Did you verify that these breaking changes are not going to impact anyone?
There was a problem hiding this comment.
I updated the PR to set sloppyRanges: true which mitigates the most impactful breaking change.
There is one breaking change left that could affect users: Hexagon/croner#293
Example:
0 ? * * *
Before: ? got replaced with the current hour when the cron was created, so this became effectively 0 <current-hour> * * *.
Now: ? is a wildcard like *, so this is 0 * * * * - at minute 0 of every hour.
This old behavior was Croner-specific. Other cron libraries that support ? do not use it this way, so it seems unlikely many users would depend on it.
When any error occurs during a cron job, the cron stops firing permanently and never recovers until the process is restarted.
This happens because of a bug in croner v9 combined with missing error handling in the cron setup. Croner's
protect: trueoption sets an internalblockingflag before running the handler and only clears it after the handler completes. In v9, if the handler throws, the flag is never cleared because there's notry/finallyaround it. Every future cron seesblocking = trueand skips, so the cron is dead forever.Even without the croner bug, the error from the handler was propagating as an unhandled promise, which can crash the process.
One example where this affected us was if there is a db error while running the job. Even if that error is temporary and would be resolved during the next run, the cron job would no longer run until the process is restarted.
Two changes fix this:
Bumped croner from v9 to v10, which wraps the handler in
try/finallyso theblockingflag is always cleared regardless of whether the handler throws. This was a confirmed bug in v9Added the
catchoption to the Cron constructor, which tells croner to catch handler errors and pass them to a callback instead of letting them escape as unhandled rejections. The callback logs the error viapayload.logger.errorso it's visible in application logs.