Another networking issue with an instance. Occasionally, an Ubuntu Precise instance would come up without working networking: it wouldn’t get an IP address. Interestingly, this only seemed to happen when the instance was running on the head node.
(Yes, we sometimes also use our head node as a compute node. It’s a small cluster).
I could see the DHCPDISCOVER and DHCPOFFER packets by running tcpdump on vnet0, but the instance never sent out a DHCPREQUEST packet.
To debug this, I had to log into the instance. The problem was, I didn’t know the password for the “ubuntu” user. This was an image that I had downloaded from Canonical.
I needed to alter the password inside of the instance so that I could log into it using VNC. I got the libvirt domain name with a “nova show <instance>” command.
$ nova show myinstance | grep instance_name
| OS-EXT-SRV-ATTR:instance_name | instance-000000e1 |
Then I shut it down:
$ sudo virsh shutdown instance-000000e1
I needed to edit the /etc/shadow file to specify a password for the root account. The problem was that I didn’t know how to generate a password hash in the right format.
It turns out the OpenStack Compute code does this. I whipped up a quick Python script that would output the appropriate hash, and then did this:
$ ./mkpasswd.py mypassword
$5$hla.HR1DOHbjcsPK$FvCd7KYZ0SD.9lpA1Iz5u22DamGbh9YFoCH2u8byr/5
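The script itself isn't reproduced here, so what follows is only a sketch of what such a helper might look like, not the original. The `$5$` prefix on the hash above indicates the SHA-256 crypt scheme. Python's standard `crypt` module used to be the usual tool for this, but it was removed in Python 3.13, so this sketch implements the scheme directly on top of `hashlib`. The names `make_salt` and `sha256_crypt` are made up for illustration, and the round count is fixed at the scheme's default of 5000.

```python
#!/usr/bin/env python3
"""Sketch of a mkpasswd.py-style helper: print a SHA-256 crypt ($5$) hash."""
import hashlib
import secrets
import sys

# Alphabet crypt(3) uses for salts and for its custom base64 encoding.
B64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Order in which the 32 digest bytes are encoded, three bytes at a time.
ORDER = [(0, 10, 20), (21, 1, 11), (12, 22, 2), (3, 13, 23), (24, 4, 14),
         (15, 25, 5), (6, 16, 26), (27, 7, 17), (18, 28, 8), (9, 19, 29)]

ROUNDS = 5000  # default round count; the hash string then omits "rounds="


def make_salt(length=16):
    """Return a random salt drawn from the crypt alphabet (hypothetical helper)."""
    return "".join(secrets.choice(B64) for _ in range(length))


def _stretch(digest, length):
    """Repeat a digest until it is exactly `length` bytes long."""
    return (digest * (length // len(digest) + 1))[:length]


def _b64(value, count):
    """Encode `value` as `count` characters, lowest six bits first."""
    chars = []
    for _ in range(count):
        chars.append(B64[value & 0x3F])
        value >>= 6
    return "".join(chars)


def sha256_crypt(password, salt):
    """Return a "$5$<salt>$<hash>" string suitable for /etc/shadow."""
    pw = password.encode("utf-8")
    sl = salt.encode("ascii")[:16]  # the scheme uses at most 16 salt characters
    n = len(pw)

    digest_b = hashlib.sha256(pw + sl + pw).digest()

    # Digest A mixes password, salt and digest B, then walks the bits of n.
    ctx = hashlib.sha256(pw + sl + _stretch(digest_b, n))
    bits = n
    while bits:
        ctx.update(digest_b if bits & 1 else pw)
        bits >>= 1
    digest_a = ctx.digest()

    # Byte sequences P and S, derived from the password and the salt.
    p_seq = _stretch(hashlib.sha256(pw * n).digest(), n)
    s_seq = _stretch(hashlib.sha256(sl * (16 + digest_a[0])).digest(), len(sl))

    # Key-stretching loop.
    c = digest_a
    for i in range(ROUNDS):
        ctx = hashlib.sha256(p_seq if i & 1 else c)
        if i % 3:
            ctx.update(s_seq)
        if i % 7:
            ctx.update(p_seq)
        ctx.update(c if i & 1 else p_seq)
        c = ctx.digest()

    encoded = "".join(
        _b64((c[x] << 16) | (c[y] << 8) | c[z], 4) for x, y, z in ORDER
    )
    encoded += _b64((c[31] << 8) | c[30], 3)
    return "$5${}${}".format(sl.decode("ascii"), encoded)


# Only act when a password is actually given: ./mkpasswd.py mypassword
if __name__ == "__main__" and len(sys.argv) == 2:
    print(sha256_crypt(sys.argv[1], make_salt()))
```

Because the salt is random, every run prints a different hash for the same password. Any of them should work in /etc/shadow.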
To edit /etc/shadow inside of the guest, I used the virt-edit program from libguestfs:
$ sudo virt-edit -d instance-000000e1 /etc/shadow
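In the editor, the hash belongs in the second colon-separated field of the account's line. As a rough illustration of that edit, here is a small sketch. The function name `set_shadow_hash` and the sample line in the usage note are hypothetical and not taken from the actual image.

```python
def set_shadow_hash(line, user, new_hash):
    """Return a shadow(5) line with the password field replaced for `user`.

    Fields are colon-separated: name, password hash, then the aging fields.
    Lines belonging to other accounts are returned unchanged.
    """
    line = line.rstrip("\n")
    fields = line.split(":")
    if fields[0] != user:
        return line
    fields[1] = new_hash  # field 2 holds the hash (or "*" / "!" when locked)
    return ":".join(fields)
```

For example, given a made-up line such as `root:*:15630:0:99999:7:::`, only the `*` is replaced by the new hash and the aging fields are left alone.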
I added the hash to the root account. Then I started it up paused, so I could run tcpdump and connect to the VNC console before it started to boot:
$ sudo virsh start instance-000000e1 --paused
$ sudo tcpdump -i vnet0 port 67 or port 68
Then I resumed it:
$ sudo virsh resume instance-000000e1
Once it booted, I logged in to the root account using VNC. I ran the DHCP client manually in the foreground, to watch what happened:
# dhclient -d eth0
Ah, the infamous UDP checksum problem!
This fixed it:
iptables -t mangle -A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
I looked at the nova code, and it only enables this rule when running in multi-host mode. So perhaps those checksums get filled in by the networking drivers when the UDP packets leave the network host, but if the DHCP server is running on the same machine as the guest, this doesn’t happen (in multi-host mode, this is always the case).
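To make "bad UDP checksum" concrete, here is a rough sketch of the standard IPv4 UDP checksum from RFC 768: a one's-complement sum over a pseudo-header, the UDP header, and the payload. It is not the code dhclient or the kernel actually runs. The function names, addresses, and payload are invented for illustration.

```python
import socket
import struct

UDP_PROTO = 17  # IP protocol number for UDP


def ones_complement_sum(data):
    """16-bit one's-complement sum of `data`, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_length):
    """IPv4 pseudo-header that is summed along with the UDP segment."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTO, udp_length))


def build_udp_segment(src_ip, dst_ip, src_port, dst_port, payload):
    """Return UDP header plus payload with the checksum field filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(
        pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = (~total & 0xFFFF) or 0xFFFF  # 0 is reserved for "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment):
    """A receiver's check: summing everything, checksum included, gives 0xFFFF."""
    total = ones_complement_sum(
        pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF
```

A segment built by `build_udp_segment` passes `udp_checksum_ok`. If the checksum field holds anything other than the final value, the check fails, which is consistent with the client discarding the offers here.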