Opened 9 years ago
Closed 9 years ago
#15412 closed defect (fixed)
Virtualbox 5.0.20 Breaks SSH after VM was Saved and Re-Started if NIC is NAT => Fixed in SVN
Reported by: | Hagen Kuehn | Owned by: | |
---|---|---|---|
Component: | network/NAT | Version: | VirtualBox 5.0.20 |
Keywords: | SSH savestate | Cc: | |
Guest type: | Linux | Host type: | Linux |
Description
Affected: Virtualbox 5.0.20
Problem: It's not possible to SSH to the guest VM once the VM was saved and re-started from saved state, if the NIC is NAT.
This problem was initially observed while using Vagrant and was reported with https://github.com/mitchellh/vagrant/issues/7306 .
How to reproduce:
Create a vanilla Ubuntu 16.04 Server VM.
- Download Ubuntu 16.04 Server from http://www.ubuntu.com/download/server
- Install Ubuntu 16.04 Server as new VM
- Configure VM to have one NIC as NAT
- Only install SSH Server
- Start the VM
- Successfully establish an SSH connection
$ ssh -p 2222 ubuntu@localhost
- Save the state of the VM by using Virtualbox
$ VBoxManage controlvm Ubuntu-Server-16.04 savestate
- Start the VM from the saved state by using Virtualbox
$ VBoxManage startvm Ubuntu-Server-16.04
- Observe the failure: SSH to the VM fails with the error: ssh: connect to host localhost port 2222: Connection refused
$ ssh -p 2222 ubuntu@localhost
Attachments (5)
Change History (22)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Please provide VBox.log files from the VM started anew and from the VM resumed from a saved state.
Does the guest use static IP address?
Do you use wildcard destination IP in your port-forwarding rule?
by , 9 years ago
Attachment: | VBox-Started-From-Saved-State.log added |
---|
Log when VM was started from saved state
comment:3 by , 9 years ago
@frank The workaround is to revert back to Virtualbox 5.0.18 where it works. Please see https://github.com/mitchellh/vagrant/issues/7306 for more details.
@vushakov
- Attached as VBox-Started-As-New.log and VBox-Started-From-Saved-State.log
- Guest VM has no static IP address
- Below is a copy of the only port-forwarding rule in place.
Name | Protocol | Host IP | Host Port | Guest IP | Guest Port |
---|---|---|---|---|---|
Rule 1 | TCP | 127.0.0.1 | 2222 | | 22 |
by , 9 years ago
Attachment: | port-forwarding-configuration.png added |
---|
Shows port forward configuration
by , 9 years ago
Attachment: | GuestVMConsole-Pressing-Enter-Makes-SSH-Work.png added |
---|
SSH works after enter is pressed once on the UI
comment:4 by , 9 years ago
- Since the forwarding rule outlined above is not easily readable, I have attached a screen shot.
- Also attached an OVA export of the sample VM. The username and password are "ubuntu".
Interesting finding: While collecting the logs for you I observed something interesting.
I started the VM with "Normal Start" (in foreground mode) and reproduced the error as usual. However, I then accidentally pressed "enter" in the VM guest console/UI and thought to try the SSH connection command again. Surprisingly, the SSH command worked!
Below is a sequence of suggested steps for reproducing this.
- Import the provided OVA
- Start the VM
- From the Virtualbox host run the command "ssh -p 2222 ubuntu@localhost" and enter the password "ubuntu". Observe that this works correctly!
- Save the state of the VM
- Start the VM from the saved state
- From the Virtualbox host run the command "ssh -p 2222 ubuntu@localhost". Observe that you encounter the error "ssh_exchange_identification: Connection closed by remote host".
- Press "enter" on the VM guest UI/ console once
- From the Virtualbox host run the command "ssh -p 2222 ubuntu@localhost" and enter the password "ubuntu". Observe that this works correctly!
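To time how long the forwarded port stays unusable after a resume, the manual retries in the steps above can be replaced by a polling loop. The following is only a sketch: the function names `probe_ssh` and `wait_for_ssh` and the variables `SSH_HOST`, `SSH_PORT` and `POLL_INTERVAL` are made up for this illustration. It uses `ssh-keyscan` rather than a plain TCP connect, because in the failing case the host port still accepts the connection and then closes it.

```shell
# Hypothetical helper, not part of VirtualBox or Vagrant.
SSH_HOST="${SSH_HOST:-localhost}"
SSH_PORT="${SSH_PORT:-2222}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"   # seconds between attempts

# Succeeds only if an SSH server actually answers with its host keys.
probe_ssh() {
    [ -n "$(ssh-keyscan -p "$SSH_PORT" -T 5 "$SSH_HOST" 2>/dev/null)" ]
}

# wait_for_ssh MAX_TRIES: poll until probe_ssh succeeds or MAX_TRIES is reached.
wait_for_ssh() {
    max_tries="$1"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries + 1))
        if probe_ssh; then
            echo "SSH answered after $tries attempt(s)"
            return 0
        fi
        sleep "$POLL_INTERVAL"
    done
    echo "SSH did not answer within $max_tries attempt(s)" >&2
    return 1
}
```

Running `wait_for_ssh 60` on the host right after starting the VM from the saved state gives a rough idea of how long the guest stays unreachable.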
comment:5 by , 9 years ago
The OVA upload failed, therefore please use the below Dropbox link to download the OVA. https://www.dropbox.com/s/mx4jgw4xxxd7h58/Ubuntu-Server-16.04.ova?dl=0
comment:6 by , 9 years ago
In 5.0.20 the handling of port-forwarding was changed to fix several long-standing bugs (e.g. #13570). That change affects port-forwarding rules with a wildcard guest address. NAT needs some sign of life from the guest, such as DHCP or gratuitous ARP, to guess its address. E.g. in the cold boot log you can see
NAT: Guest address guess set to 10.0.2.15 by DHCP ACK
Pressing <Enter> was probably just a coincidence. If you check the log of the resumed VM (not the one you attached, but after ssh works), you will see a similar log line. Since NAT flaps the ethernet link after resume, a DHCP guest will reacquire its DHCP lease and that will tell NAT the IP address. (Note that we can't use just any packet to infer the guest's IP, since the guest may be a router for other VMs, etc.)
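To check whether and how NAT learned the guest address in a given run, the log can be searched for the message quoted above. This is a minimal sketch: the function names are invented for illustration, the path to VBox.log has to be supplied by the caller, and the parsing assumes the message keeps the form "NAT: Guest address guess set to ADDRESS by SOURCE".

```shell
# nat_guess_lines LOGFILE: print every NAT guest-address guess found in the log.
nat_guess_lines() {
    grep 'NAT: Guest address guess set to' "$1"
}

# last_nat_guess LOGFILE: print "ADDRESS SOURCE" for the most recent guess,
# e.g. the address followed by "DHCP ACK".
last_nat_guess() {
    nat_guess_lines "$1" | tail -n 1 |
        sed 's/.*NAT: Guest address guess set to \([0-9.]*\) by \(.*\)$/\1 \2/'
}
```

If `nat_guess_lines` prints nothing for the log of a resumed VM, NAT has not yet seen a packet it can use, which matches the window in which the forwarded SSH port does not work.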
comment:7 by , 9 years ago
@vushakov Concerning the <Enter>, perhaps it was a coincidence, though I have tested it a couple of times with the attached VM. However, when testing this with my self-baked Vagrant boxes, I do not observe that <Enter> causes the VM to be discoverable.
If I understand you correctly, with Virtualbox 5.0.20 you are clearing the NAT cache and thus Virtualbox needs to receive some sign of life in the shape of a gratuitous ARP or similar request to infer the IP address.
I have now tested Virtualbox 5.0.20 with Ubuntu 14.04 VMs and it works, though it takes almost a minute before SSH becomes available. As far as I know, the Ubuntu 14.04 VM I tested with is vanilla too. This could be a reason why not many people have reported it yet.
Based on your feedback I could establish a "workaround" for the Ubuntu 16.04 VM, by configuring a script that fires an ARP request (i.e. arping -c 1 -A eth0 10.0.2.15) every 5 seconds. With this it's possible to establish an SSH connection to the VM after about 40 seconds. Since those VMs on Virtualbox are only for development and CI purposes I don't mind it too much, but I nonetheless find this approach far from desirable.
Is there a different approach so you can fix your long standing issues but don't break it for Ubuntu 16.04?
Furthermore, as far as I know, correct me if I am wrong, the "Saved State" and the subsequent start of the VM is completely transparent to the VM itself? If this is not the case, I wonder if it would be possible to capture that and run the ARP request only once after the VM was started from the "Saved State"?
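The periodic ARP announcement described in this comment could look roughly like the sketch below, run as root inside the guest. It is an illustration only: the function and variable names are hypothetical, the interface name and address are the ones quoted in the comment and may differ on a given guest, and the command line assumes the iputils flavour of `arping`, which takes the interface via `-I`.

```shell
# Hypothetical guest-side helper; names are illustrative only.
ARPING="${ARPING:-arping}"
ARP_IFACE="${ARP_IFACE:-eth0}"             # adjust to the guest's NIC name
ARP_GUEST_IP="${ARP_GUEST_IP:-10.0.2.15}"  # address the guest currently holds
ANNOUNCE_INTERVAL="${ANNOUNCE_INTERVAL:-5}"

# Send one unsolicited ARP reply announcing the guest's own address.
announce_once() {
    "$ARPING" -c 1 -A -I "$ARP_IFACE" "$ARP_GUEST_IP"
}

# announce_loop [COUNT]: announce every ANNOUNCE_INTERVAL seconds.
# COUNT of 0 (the default) means run until interrupted.
announce_loop() {
    remaining="${1:-0}"
    while :; do
        announce_once
        if [ "$remaining" -gt 0 ]; then
            remaining=$((remaining - 1))
            [ "$remaining" -eq 0 ] && break
        fi
        sleep "$ANNOUNCE_INTERVAL"
    done
}
```

Calling `announce_loop` without arguments reproduces the "every 5 seconds" behaviour; passing a count limits it to a few announcements after a resume, if the resume can be detected at all.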
comment:8 by , 9 years ago
The easiest workaround is to not use a wildcard (empty or 0.0.0.0) guest address in the forwarding rules.
As I said earlier, after restoring a VM, we disconnect its ethernet cable for 5 seconds. Normally, DHCP clients reacquire their DHCP lease after that.
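As an illustration of a rule with an explicit guest address, the sketch below rebuilds the single rule from this ticket with `VBoxManage modifyvm --natpf1`. The helper names are hypothetical, the rule is assumed to sit on the first NIC under the name "Rule 1", and `modifyvm` is assumed to be run while the VM is powered off.

```shell
# Hypothetical helpers around the VBoxManage CLI.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# natpf_rule NAME PROTO HOST_IP HOST_PORT GUEST_IP GUEST_PORT
# Builds the comma-separated rule string; an empty GUEST_IP is the wildcard.
natpf_rule() {
    printf '%s,%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# pin_ssh_rule VM GUEST_IP: replace "Rule 1" with one that names the guest address.
pin_ssh_rule() {
    vm="$1"
    guest_ip="$2"
    "$VBOXMANAGE" modifyvm "$vm" --natpf1 delete "Rule 1"
    "$VBOXMANAGE" modifyvm "$vm" --natpf1 \
        "$(natpf_rule 'Rule 1' tcp 127.0.0.1 2222 "$guest_ip" 22)"
}
```

For example, `pin_ssh_rule Ubuntu-Server-16.04 10.0.2.15` would pin the rule to the address that the NAT DHCP server hands out by default; this only helps if the guest really ends up with that address.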
comment:9 by , 9 years ago
Yes it works when the guest IP address is specified in the Forwarding Rule. However when using DHCP I cannot reliably know the IP address prior to starting the VM. Therefore I don't think it's an option.
You said that you disconnect the Ethernet cable for 5 seconds after restoring the VM, which should trigger DHClient to request a new DHCP lease. In light of this, I am starting to wonder whether Ubuntu's DHClient has changed too, and whether that is why this problem is observed.
Judging by this Ubuntu DHClient bug, I have the feeling some changes were made there too: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1551351
comment:10 by , 9 years ago
I have now filed an Ubuntu bug concerning the isc-dhclient-4.3.3 package.
Bug: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1582163
comment:11 by , 9 years ago
I guess the link flap might not be long enough as seen in the guest's time.
I think the path of least resistance here is to initialize the guess to the default (.15) guest address, since that's what DHCP will hand out by default anyway. If that guess is right (which it is most of the time), port-forwarding will work right away. If it's wrong, we are no worse off, since it wouldn't have worked anyway and will work as soon as there's a good packet to fix the guess.
comment:12 by , 9 years ago
Yes, your suggestion concerning the default guess sounds good to me. It would decrease the chance of encountering a connection issue even for many other potential VM guest operating systems too.
I have the feeling a longer cable disconnect will not make it work either, at least in conjunction with Ubuntu 16.04. I ran a couple of manual tests with an Ubuntu 16.04 guest OS and had the virtual cable disconnected for about a minute or two, but the subsequent reconnect of the cable did not trigger Ubuntu 16.04 to request a new DHCP lease. In light of this, I now believe that it is an Ubuntu 16.04 DHClient bug.
From what I have established so far, this problem encountered here is a result of two independent changes made in Virtualbox and Ubuntu (i.e. Virtualbox 5.0.20 and Ubuntu 16.04 DHClient).
- Virtualbox 5.0.20 should function in conjunction with an Ubuntu 16.04 guest OS if Ubuntu 16.04 fired a DHCP lease request after the cable was disconnected and reconnected. This will hopefully be addressed with https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1582163
- Virtualbox 5.0.20 could be enhanced by performing an initial guest IP address guess (.15) as suggested by @vushakov
comment:15 by , 9 years ago
I have installed the Ubuntu 16.04 test build and tested in conjunction with Vagrant.
It worked like a charm!
Log says: "NAT: Guest address guess set to 10.0.2.15 by initialization"
I have also attached the "VBox.log" as "5.0.21-VBox.log" to this ticket.
Excellent work!
by , 9 years ago
Attachment: | 5.0.21-VBox.log added |
---|
VBox.log when tested with dev build Virtualbox 5.0.21
comment:16 by , 9 years ago
Summary: | Virtualbox 5.0.20 Breaks SSH after VM was Saved and Re-Started if NIC is NAT → Virtualbox 5.0.20 Breaks SSH after VM was Saved and Re-Started if NIC is NAT => Fixed in SVN |
---|
Reading the summary it sounds like this was broken with VBox 5.0.20. Did this ever work? If so, which was the last version when it worked for you?