Opened 8 years ago
Last modified 8 years ago
#16429 new defect
Hard lockup of Linux guests on Mac Host
Reported by: | drewmoseley | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 5.1.14 |
Keywords: | lockup, hang | Cc: | |
Guest type: | Linux | Host type: | other |
Description
I am running an Ubuntu 16.04 guest on a MacOS Sierra host. In my guest I am running a Yocto project build (https://yoctoproject.org/). This is a fairly substantial workload and eventually the target apparently locks up. I've tried to get kernel logs out of dmesg, /var/log/syslog and the VT1 console, but when the system is hung, nothing is printed to any of those locations to indicate trouble.
I have seen this happen also on Ubuntu 14 and CentOS 7 targets.
It is reproducible 100% of the time with the Yocto project build. When it is hung, the GUI is still displayed but nothing updates, not even the clock app.
I tested with both VirtualBox 5.1.14 and 5.0.32 and could reproduce the issue with both versions. I'm testing now with the 5.1.15 nightly build.
Attachments (6)
Change History (24)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
No luck with the 5.1.15 test build.
I do use shared folders as well as NFS.
I'll run a new build and trigger the failure and attach the VBox.log file.
After that I will try as a single-core system with IOAPIC disabled. This seems similar to https://www.215389.xyz/ticket/15529.
comment:3 by , 8 years ago
Attached is a VBox.log file when the system is in the hung state.
Next I'll try a single core system with IOAPIC disabled.
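For anyone repeating this experiment, the CPU count and I/O APIC setting can be changed from the host command line while the VM is powered off. The following is only a sketch: it builds the `VBoxManage modifyvm` argument list and, by default, prints it without running anything. The VM name `yocto-builder` and the helper names are placeholders, not taken from this ticket.

```python
import subprocess


def modifyvm_args(vm_name, cpus, ioapic):
    """Build the VBoxManage argument list for one test configuration."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cpus),
        "--ioapic", "on" if ioapic else "off",
    ]


def apply_config(vm_name, cpus, ioapic, dry_run=True):
    """Print the command; execute it only when dry_run is False.

    The VM must be powered off before modifyvm is run for real.
    """
    args = modifyvm_args(vm_name, cpus, ioapic)
    print(" ".join(args))
    if not dry_run:
        subprocess.run(args, check=True)
    return args


if __name__ == "__main__":
    # "yocto-builder" is a hypothetical VM name used for illustration.
    apply_config("yocto-builder", cpus=1, ioapic=False)
```

Keeping the argument construction separate from execution makes it easy to review the exact command before changing the VM.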
comment:4 by , 8 years ago
I forgot to point out that I reverted to the released 5.1.14 r112924 build for the current testing. The 5.1.15 test build seemed to have some issues with host-only networking stability.
by , 8 years ago
Attachment: | VBox-single-core-no-APIC-no-failure.log added |
---|
Log file of my single-core build that did _not_ lock up
comment:5 by , 8 years ago
Initial single core build with no IOAPIC succeeded. I'm going to bump up the parallelism of my build but leave it at unicore to stress the system a bit.
comment:6 by , 8 years ago
No failures with increased parallelism in my builds.
Using a single core build with IOAPIC explicitly enabled I am also unable to reproduce the system hang.
I'll put it back to a multi-core system and run with debugging enabled.
by , 8 years ago
Attachment: | VBox-debug-info-apic-ioapic.log added |
---|
VBox log from hung session in debug mode.
comment:7 by , 8 years ago
I've attached the log and debug window output from a hung system with debugging enabled. For some reason copy/paste doesn't work from the debug window in MacOS so I attached it as a png.
comment:9 by , 8 years ago
I'm not doing significant vboxsf traffic. I can try disabling all file shares and removing the module to see if that has any effect.
Also, I get no kernel stack traces so it's hard to say for certain whether this is the same or not.
comment:10 by , 8 years ago
I removed all shared folders, unloaded the vboxsf module, and the system still hung.
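A quick way to confirm inside the guest that the module is really gone is to look at `/proc/modules`. The sketch below is illustrative only; the function names are made up for this example, and it assumes the usual `/proc/modules` layout where the module name is the first field on each line.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def vboxsf_loaded(proc_modules_text):
    """True if the vboxsf shared-folder module appears in the listing."""
    return "vboxsf" in loaded_modules(proc_modules_text)


def check_guest(path="/proc/modules"):
    """Read the module list on a Linux guest and report on vboxsf."""
    with open(path) as handle:
        return vboxsf_loaded(handle.read())
```

Calling `check_guest()` on the guest after `rmmod vboxsf` should return `False` if the unload succeeded.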
comment:11 by , 8 years ago
Thanks for the additional testing. So it doesn't look like this is related to shared folders at all. Let me summarize: You observe the guest hang with multi-VCPU (SMP) guests as well as with a single-VCPU guest if the I/O-APIC is enabled. With the I/O-APIC disabled you don't see the hang.
comment:12 by , 8 years ago
Could you provide additional information? Try to reproduce the hang with 1 VCPU and the I/O-APIC enabled, then enter
info ioapic
detect
dmesg
info cpum
info cpumguest
into the debug console. I saw you already found out how to enable it. We would prefer that you copy and paste the output from the console into a separate file and attach it to this ticket as a text file, not a .png. Please also attach the VBox.log file from the VM session in which you ran the commands.
It would also help if you could repeat the test with 2 VCPUs and attach the same set of files (the info ... output from above plus the corresponding VBox.log file).
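If copying from the debug window is awkward, the same information can probably be captured as text from a host terminal with `VBoxManage debugvm`. The sketch below assumes that `debugvm info`, `osdetect` and `osdmesg` correspond to the console commands `info`, `detect` and `dmesg`; that mapping, the VM name and the output file name are assumptions made for this example and are not confirmed in this ticket.

```python
import subprocess

# Console command -> assumed `VBoxManage debugvm` subcommand arguments.
CONSOLE_TO_DEBUGVM = [
    ("info ioapic", ["info", "ioapic"]),
    ("detect", ["osdetect"]),
    ("dmesg", ["osdmesg"]),
    ("info cpum", ["info", "cpum"]),
    ("info cpumguest", ["info", "cpumguest"]),
]


def debugvm_commands(vm_name):
    """Return (console command, host argument list) pairs for one VM."""
    return [
        (console, ["VBoxManage", "debugvm", vm_name] + sub)
        for console, sub in CONSOLE_TO_DEBUGVM
    ]


def collect(vm_name, out_path):
    """Run each command against a running VM and save the output as text."""
    with open(out_path, "w") as out:
        for console, args in debugvm_commands(vm_name):
            result = subprocess.run(
                args,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            out.write("=== {} ===\n{}\n".format(console, result.stdout))
```

For example, `collect("yocto-builder", "debug-output.txt")` (both names hypothetical) would write one labelled section per command, ready to attach alongside VBox.log.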
follow-up: 14 comment:13 by , 8 years ago
Frank, your summary is close. With a single core system, I have no guest OS hang with ioapic enabled or disabled. With a multicore system it seems to hang every time with no kernel messages or dmesg output on the guest OS.
As an experiment, I tried disabling nested paging and the system did not lock up. It ran extremely slowly though so I gave up after about 36h of building and my build was still only about 30% complete.
I'll try to get the debug info you requested. When I tried to get the debug info previously, copy/paste on my MacOS Sierra host from the debug window did not work. It worked for other apps, thus the png file rather than a text file. I'll give it another shot. Are there any tricks to using copy/paste from the debug window on MacOS Sierra?
comment:14 by , 8 years ago
> When I tried to get the debug info previously, copy/paste on my MacOS Sierra host from the debug window did not work. It worked for other apps, thus the png file rather than a text file. I'll give it another shot. Are there any tricks to using copy/paste from the debug window on MacOS Sierra?
Doesn't the right-click menu work on 10.12? Or the ⌘-A, ⌘-C shortcuts? I'm on 10.9.5 and it works as advertised.
comment:15 by , 8 years ago
Regarding copy/paste in the debug window, it seems to be working now. Not sure why I had issues with it before. I'm in the process of pulling those logs for all three scenarios and will post them as soon as they are available.
by , 8 years ago
Attachment: | multicore-hang-debug-commands.txt added |
---|
Debug command output from multicore guest hang
comment:16 by , 8 years ago
I've attached both the VBox.log and Debug window output from my multicore guest when the system is in the hung state.
So far I've been unable to reproduce with a unicore guest. Is there value in providing the output of those systems?
comment:17 by , 8 years ago
Hmm, guest multicore hang but multicore-hang-debug-commands.txt shows only 1 VCPU?
comment:18 by , 8 years ago
Apologies for going silent on this. I got sidetracked by starting a new job. I am unable to reproduce this in one or two build tries using v5.1.22. I'm not sure if that fixed it or just made it more difficult to reproduce but for the time being I am unblocked.
The absolute minimum information required is the VBox.log file of such a VM session.
Also, is your guest working on shared folders and is your guest configured to use more than 1 VCPU?