
Conversation

pkit commented Dec 6, 2025

During prepare_save we must unconditionally trigger an interrupt to ensure the guest gets notified after restore. The guest may have suppressed notifications, but after snapshot/restore it needs to be woken up regardless.

Fixes #5554

Changes

Fixes a bug where the guest would hang indefinitely waiting for an interrupt after resume.

Reason

See above.
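
To make the change concrete, here is a minimal sketch of its shape, based on the one-line diff discussed later in this thread (advance_used_ring_idx() and prepare_kick() are the names visible there; the surrounding function, the force_signal/used_any plumbing, and the signal closure are illustrative, not Firecracker's exact code):

// Sketch only, not the actual Firecracker implementation. The trait stands in for
// the real virtio Queue type; only the two methods from the diff are modelled.
trait UsedRing {
    fn advance_used_ring_idx(&mut self);
    fn prepare_kick(&mut self) -> bool; // false when the guest suppressed notifications
}

// `force_signal` would be true on the snapshot path (prepare_save); `used_any` when
// at least one descriptor was placed on the used ring in this pass.
fn finish_queue_pass<Q: UsedRing>(
    queue: &mut Q,
    used_any: bool,
    force_signal: bool,
    signal_used_queue: &mut dyn FnMut(),
) {
    queue.advance_used_ring_idx();

    // Normally the device honours the guest's notification suppression via
    // prepare_kick(); on the snapshot path it kicks unconditionally if anything was
    // used, so a guest halted waiting for an interrupt is woken up after restore.
    if (force_signal && used_any) || queue.prepare_kick() {
        signal_used_queue();
    }
}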

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

pkit (Author) commented Dec 6, 2025

I will add some tests soon.

pkit (Author) commented Dec 8, 2025

@dobrac I have improved your tests to iterate until the pending-ops-queue condition is reproduced.
Now it repros quite reliably in under 10 iterations for me locally.

codecov bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.23%. Comparing base (d130c7d) to head (97d23a9).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...vmm/src/devices/virtio/block/virtio/io/async_io.rs 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5568      +/-   ##
==========================================
- Coverage   83.24%   83.23%   -0.01%     
==========================================
  Files         277      277              
  Lines       29263    29268       +5     
==========================================
+ Hits        24359    24362       +3     
- Misses       4904     4906       +2     
Flag Coverage Δ
5.10-m5n.metal 83.58% <77.77%> (+<0.01%) ⬆️
5.10-m6a.metal 82.91% <77.77%> (-0.01%) ⬇️
5.10-m6g.metal 80.19% <77.77%> (-0.01%) ⬇️
5.10-m6i.metal 83.57% <77.77%> (-0.01%) ⬇️
5.10-m7a.metal-48xl 82.90% <77.77%> (-0.01%) ⬇️
5.10-m7g.metal 80.19% <77.77%> (-0.01%) ⬇️
5.10-m7i.metal-24xl 83.55% <77.77%> (-0.01%) ⬇️
5.10-m7i.metal-48xl 83.54% <77.77%> (-0.02%) ⬇️
5.10-m8g.metal-24xl 80.18% <77.77%> (-0.01%) ⬇️
5.10-m8g.metal-48xl 80.19% <77.77%> (-0.01%) ⬇️
6.1-m5n.metal 83.60% <77.77%> (-0.01%) ⬇️
6.1-m6a.metal 82.94% <77.77%> (-0.02%) ⬇️
6.1-m6g.metal 80.19% <77.77%> (-0.01%) ⬇️
6.1-m6i.metal 83.60% <77.77%> (-0.01%) ⬇️
6.1-m7a.metal-48xl 82.93% <77.77%> (-0.01%) ⬇️
6.1-m7g.metal 80.18% <77.77%> (-0.01%) ⬇️
6.1-m7i.metal-24xl 83.61% <77.77%> (-0.01%) ⬇️
6.1-m7i.metal-48xl 83.61% <77.77%> (-0.02%) ⬇️
6.1-m8g.metal-24xl 80.18% <77.77%> (-0.01%) ⬇️
6.1-m8g.metal-48xl 80.19% <77.77%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


During prepare_save we must unconditionally trigger an interrupt
to ensure the guest gets notified after restore.
The guest may have suppressed notifications, but
after snapshot/restore it needs to be woken up regardless.

Fixes firecracker-microvm#5554

Signed-off-by: Constantine Peresypkin <pconstantine@gmail.com>
pkit (Author) commented Dec 8, 2025

Codecov's idea of "coverage" seems incorrect here. Flagging a debug print is not the best use of coverage checks, so I ignored it.

pkit (Author) commented Dec 9, 2025

@bchalios @kalyazin I'm not sure how to kick off the Codecov pass again; other than that, this one should be ready.

bchalios (Contributor) commented:
> @bchalios @kalyazin I'm not sure how to kick-off the codecov pass again, other than that this one should be ready.

Codecov updates automatically every time the PR tests run. Nothing more to do on your side; we just need to let the tests run.

queue.advance_used_ring_idx();

- if queue.prepare_kick() {
+ if (force_signal && used_any) || queue.prepare_kick() {
Contributor:

Ok, so the problem seems to be that prepare_kick() returns false because the guest has suppressed notifications, and then we somehow miss the chance to notify the guest.

If that is the case, then, snapshot or no snapshot, prepare_kick() should eventually return true and we should send the interrupt; if it never does, that means there's a problem in the way we save the state of the queue. If that's the case, let's fix that (instead of sending an unsolicited interrupt).
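
For background on the suppression mechanism being discussed: with the event-index feature negotiated, the guest driver publishes a used_event index and the device only sends a used-buffer notification when the used index crosses it. Below is a minimal sketch of that rule from the virtio 1.x spec (vring_need_event), which is the kind of check a prepare_kick()-style function typically implements; it is not Firecracker's actual code:

// Virtio 1.x event-index notification rule (vring_need_event in the spec), sketched
// in Rust. All arithmetic is modulo 2^16, matching the ring-index wrap-around.
// `used_event` is the index published by the guest driver; the device should notify
// only if the new used index has moved past it since the last notification.
fn need_notification(used_event: u16, new_used_idx: u16, old_used_idx: u16) -> bool {
    new_used_idx.wrapping_sub(used_event).wrapping_sub(1)
        < new_used_idx.wrapping_sub(old_used_idx)
}

The disagreement in the rest of this thread is about whether this rule can legitimately leave a halted guest un-notified across a snapshot, or whether the saved queue state itself is what is wrong.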

Author:

I don't think prepare_kick() will eventually return true.
No new I/O is coming and the guest kernel is in HLT state waiting for an interrupt (I did a thread dump when it froze).
So it looks like a deadlock to me, and just interrupting it solves the problem.
We could probably track num_added at the time of prepare_save, or not call prepare_kick during prepare_save at all,
but I'm not sure that's a better solution.
What do you think?

Contributor:

> I don't think prepare_kick() will eventually return true.

Well, it should, right? Although indirectly. The fact that we don't send the interrupt shouldn't matter, because the guest is not expecting it (unless there's a bug in prepare_kick()). So the guest should see the descriptor we just added (even without the interrupt) and should continue adding more descriptors, so eventually we would call prepare_kick() and it would return true.

The thing that perplexes me is that I can't understand why your solution fixes the issue. You are injecting an interrupt into the guest while we're taking a snapshot, but we save KVM state before we save device state. This means that the guest should never see this interrupt 😝

What I am afraid of is that sending the interrupt while taking the snapshot simply changes the ordering of things and hides the problem rather than actually fixing it.

Not saying that this is definitely the case, but I'm still looking into it. I want to make sure that we fix this properly.
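
To make the ordering argument concrete, here is a schematic of the sequence described above; the trait names and save plumbing are placeholders, not the actual Firecracker snapshot code:

// Illustrative ordering only; the traits stand in for the real types.
trait VcpuState {
    fn save(&self) -> Vec<u8>; // serialize KVM/vCPU state, incl. pending interrupts
}
trait DeviceState {
    fn prepare_save(&mut self); // device-side work done before serialization
    fn save(&self) -> Vec<u8>;
}

fn save_microvm_sketch(
    vcpus: &[&dyn VcpuState],
    devices: &mut [Box<dyn DeviceState>],
) -> Vec<Vec<u8>> {
    let mut blobs = Vec::new();
    // vCPU/KVM state is captured first...
    for v in vcpus {
        blobs.push(v.save());
    }
    // ...so an interrupt a device raises inside prepare_save() arrives after the vCPU
    // state was already serialized and should not be visible in the restored guest.
    for d in devices.iter_mut() {
        d.prepare_save();
        blobs.push(d.save());
    }
    blobs
}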

Author:

Yes, this bug is all about a race condition, as adding simple debug prints reduces the reproduction rate significantly. So I agree that it looks fishy.

pkit (Author) commented Dec 10, 2025

@bchalios
Here's the thread dump:

firecracker (pid=155):
[<0>] ep_poll+0x46a/0x4a0
[<0>] do_epoll_wait+0x58/0xd0
[<0>] do_compat_epoll_pwait.part.0+0x12/0x90
[<0>] __x64_sys_epoll_pwait+0x8c/0x150
[<0>] x64_sys_call+0x1814/0x2330
[<0>] do_syscall_64+0x81/0xc90
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

fc_api (pid=156):
[<0>] ep_poll+0x46a/0x4a0
[<0>] do_epoll_wait+0x58/0xd0
[<0>] do_compat_epoll_pwait.part.0+0x12/0x90
[<0>] __x64_sys_epoll_pwait+0x8c/0x150
[<0>] x64_sys_call+0x1814/0x2330
[<0>] do_syscall_64+0x81/0xc90
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

fc_vcpu 0 (pid=161):
[<0>] kvm_vcpu_block+0x4a/0xc0 [kvm]
[<0>] kvm_vcpu_halt+0xfc/0x4b0 [kvm]
[<0>] vcpu_run+0x202/0x280 [kvm]
[<0>] kvm_arch_vcpu_ioctl_run+0x351/0x510 [kvm]
[<0>] kvm_vcpu_ioctl+0x128/0x910 [kvm]
[<0>] __x64_sys_ioctl+0xa0/0x100
[<0>] x64_sys_call+0x1151/0x2330
[<0>] do_syscall_64+0x81/0xc90
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

fc_vcpu 1 (pid=162):
[<0>] kvm_vcpu_block+0x4a/0xc0 [kvm]
[<0>] kvm_vcpu_halt+0xfc/0x4b0 [kvm]
[<0>] vcpu_run+0x202/0x280 [kvm]
[<0>] kvm_arch_vcpu_ioctl_run+0x351/0x510 [kvm]
[<0>] kvm_vcpu_ioctl+0x128/0x910 [kvm]
[<0>] __x64_sys_ioctl+0xa0/0x100
[<0>] x64_sys_call+0x1151/0x2330
[<0>] do_syscall_64+0x81/0xc90
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

kvm-nx-lpage-re (pid=163):
[<0>] vhost_task_fn+0xea/0x110
[<0>] ret_from_fork+0x131/0x150
[<0>] ret_from_fork_asm+0x1a/0x30

As you can see, both vCPUs are halted waiting for an interrupt, so the interrupt is not really "unsolicited" (although technically it is).
I tried various other approaches, but all of them essentially boil down to "send an interrupt on save or on restore".
Not sure which one is better.
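
For comparison, a hypothetical sketch of the other option mentioned here, signaling on restore rather than on save; the pending-completions check and all names are assumptions, not Firecracker's API:

// Hypothetical "interrupt on restore" variant; illustrative only.
// `pending` would be true for a queue whose used ring received completions before
// the snapshot that the guest may not have consumed yet.
fn wake_guest_after_restore<F: FnMut(usize)>(
    queues_with_pending: &[(usize, bool)],
    mut signal_used_queue: F,
) {
    for &(queue_idx, pending) in queues_with_pending {
        if pending {
            // One kick per affected queue: harmless if the guest already saw the
            // completions, and it unblocks a guest halted waiting for the interrupt.
            signal_used_queue(queue_idx);
        }
    }
}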


Development

Successfully merging this pull request may close these issues.

[Bug] When using Async IO Engine pending ops cause resume to freeze
