Networking Heisenbugs

While debugging an issue with OpenStack and floating IPs, I ran into a strange problem: running tcpdump on the bridge interface on the network controller would cause packets to be forwarded successfully to a compute node, but if I stopped running tcpdump, the packets wouldn't get forwarded.

Somebody on Server Fault provided the solution: tcpdump puts the interface into promiscuous mode. And, indeed, if I set the interface into promiscuous mode myself, the packets got forwarded. This is a classic Heisenbug.

Posted in Uncategorized

XPath and Chrome dev tools

Here's a simple way to get the XPath of an element on an HTML page in Chrome.

  1. Right-click on the element on the web page, choose “Inspect Element” from the context-menu
  2. Right-click on the highlighted HTML line that appears in the Chrome Developer Tools view at the bottom of the browser window, choose “Copy XPath” from the context menu.

Very handy for use with something like Splinter’s find_by_xpath method.
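To sanity-check a copied XPath outside the browser, you can evaluate it against the page source. Here's a rough sketch using Python's stdlib ElementTree, which supports only a limited XPath subset, so this is just an approximation of what Chrome or Splinter would match:

```python
# Check that a Chrome-copied XPath matches the element you expect.
# ElementTree supports only a small XPath subset, so this is an
# approximation of a browser's XPath engine.
import xml.etree.ElementTree as ET

page = ET.fromstring(
    '<html><body><div id="main"><a href="/docs">Docs</a></div></body></html>'
)

# Chrome's "Copy XPath" gives something like //*[@id="main"]/a;
# ElementTree wants a leading "." for a relative search.
link = page.find(".//*[@id='main']/a")
print(link.text)  # -> Docs
```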

Posted in Uncategorized

Fun with Windows TCP/IP debugging

Today I learned… In Windows, if a process listens on a port, spawns a child, and then dies, no other process can listen on that port until all of the children have terminated. So if you were running, say, PowerShellServer, and a process inside an SSH session hangs, you can't restart the server until you hunt down the orphaned process.

Thank you Server Fault for the answer, TCPView for telling me that a zombie was listening on the port, and Process Explorer for identifying the orphaned processes.

Posted in Uncategorized

Test, test

Testing out the new Byword functionality of posting to WordPress.

Posted in Uncategorized

The infamous checksum bug

Another networking issue with an instance. Occasionally, an Ubuntu Precise instance would come up, but the networking wasn't working: it wouldn't get an IP address. Interestingly, this only seemed to happen when it was running on the head node.

(Yes, we sometimes also use our head node as a compute node. It’s a small cluster).

I could see the DHCPDISCOVER and DHCPOFFER packets by tcpdump’ing vnet0, but it never sent out a DHCPREQUEST packet.

To debug this, I had to log into the instance. The problem was, I didn’t know the password for the “ubuntu” user. This was an image that I had downloaded from Canonical.

I needed to alter the password inside of the instance so that I could log into it using VNC. I got the libvirt domain name with a “nova show <instance>” command.

$ nova show myinstance | grep instance_name
| OS-EXT-SRV-ATTR:instance_name | instance-000000e1 |

Then I shut it down:

$ sudo virsh shutdown instance-000000e1

I needed to edit the /etc/shadow file to specify a password for the root account. The problem was that I didn’t know how to generate a password hash in the right format.

It turns out the OpenStack Compute code does this. I whipped up a quick Python script that would output the appropriate hash, and then did this:

$ ./mkpasswd.py mypassword
$5$hla.HR1DOHbjcsPK$FvCd7KYZ0SD.9lpA1Iz5u22DamGbh9YFoCH2u8byr/5
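For reference, the script can be sketched like this. This is my reconstruction, not the exact script from the post; it assumes Nova's approach of generating a SHA-256 crypt ("$5$") hash via the stdlib crypt module (which is Unix-only, and was removed from the stdlib in Python 3.13):

```python
#!/usr/bin/env python
# mkpasswd.py -- sketch of a shadow-compatible hash generator.
# Assumes the SHA-256 crypt ("$5$") scheme shown in the output above.
import crypt
import random
import string
import sys

def sha256_crypt(password):
    # 16-character salt drawn from the crypt alphabet; the "$5$"
    # prefix selects the SHA-256 crypt scheme
    alphabet = string.ascii_letters + string.digits + './'
    salt = ''.join(random.choice(alphabet) for _ in range(16))
    return crypt.crypt(password, '$5$' + salt)

if __name__ == '__main__' and len(sys.argv) > 1:
    print(sha256_crypt(sys.argv[1]))
```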

To edit /etc/shadow inside of the guest, I used the virt-edit program from libguestfs:

$ sudo virt-edit -d instance-000000e1 /etc/shadow

I added the hash to the root account. Then I started the instance up paused, so I could run tcpdump and connect to the VNC console before it started to boot:

$ sudo virsh start instance-000000e1 --paused
$ sudo tcpdump -i vnet0 port 67 or port 68

Then I resumed it:

$ sudo virsh resume instance-000000e1

Once it booted, I logged in to the root account using VNC. I ran the DHCP client manually in the foreground, to watch what happened:

# dhclient -d eth0

[screenshot: dhclient -d output]

Ah, the infamous UDP checksum problem!

This fixed it:

iptables -t mangle -A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

I looked at the nova code, and they only enable this rule when running in multi-host mode. So perhaps those checksums get filled in by the networking drivers when the UDP packets leave the network host, but if the DHCP server is running on the same machine as the guest, this doesn't happen (in multi-host mode, this is always the case).

Posted in openstack

Writing a cinder_manage Ansible module

I was working on a cinder_manage module for Ansible to invoke the equivalent of “cinder-manage db sync” inside of Python code. This is used to initialize the OpenStack Block Storage database. I had already written one for glance-manage and keystone-manage, but each of the OpenStack projects uses a slightly different internal API for database initialization. This blog post documents my thought process trying to go through the Python code to figure out how it works.

To implement this, I need to invoke the equivalent of "cinder-manage db sync" inside of Python code. I could do this with the shell, but I prefer to dig into the guts of the cinder internals and do this in pure Python.

(Technically, I do call “cinder-manage db sync” from the shell, but I implement check mode using Python code).

cinder/db/migration.py has db_sync and db_version methods, but they don’t provide a way to pass in the path to the cinder.conf file that has the database connection info, so I can’t call them directly.

They both defer to methods with the same name in cinder/db/sqlalchemy/migration.py.

Looking at cinder/db/sqlalchemy/migration.py:db_version, we see this:

repository = _find_migrate_repo()
try:
    return versioning_api.db_version(get_engine(), repository)

The _find_migrate_repo function looks for the sqlalchemy migration scripts relative to the directory of the Python script itself, so there's no need to mess with that. The connection string is going to be needed by that get_engine() method:

from cinder.db.sqlalchemy.session import get_engine

OK, let’s look at cinder/db/sqlalchemy/session.py:get_engine

def get_engine():
    """Return a SQLAlchemy engine."""
    global _ENGINE
    if _ENGINE is None:
        connection_dict = sqlalchemy.engine.url.make_url(FLAGS.sql_connection)

A-ha, it’s FLAGS.sql_connection. What’s flags?

FLAGS = flags.FLAGS

OK…

import cinder.flags as flags

There we go, it’s cinder.flags.FLAGS

cinder/flags.py:
from cinder.openstack.common import cfg
FLAGS = cfg.CONF

All right, so flags come from cinder/openstack/common/cfg.py

CONF = CommonConfigOpts()

Hmm, CommonConfigOpts doesn’t take any arguments. Let’s look back at cinder/flags.py

def parse_args(argv, default_config_files=None):
    FLAGS.disable_interspersed_args()
    return argv[:1] + FLAGS(argv[1:],
                            project='cinder',
                            default_config_files=default_config_files)

That's interesting: it's actually calling FLAGS and adding to it. That's what we want. Except we don't really want to call parse_args, because we don't have an argv. I think we just want to call FLAGS with our own arguments.

But is default_config_files going to be set for us already? And what's that first argument? Recall that FLAGS is of type CommonConfigOpts. Is that callable? Let's take a look.

Its parent, ConfigOpts, is callable:

def __call__(self, args=None, project=None, prog=None,
             version=None, usage=None, default_config_files=None):
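The pattern here, a global config object that is callable and updates itself when called, can be illustrated with a small self-contained sketch. This is a toy version of my own, not the oslo/cinder implementation:

```python
# Toy illustration of the callable-config pattern used by ConfigOpts:
# calling the object "parses" args/config files and updates the values
# that later attribute access reads. Not the real implementation.
class ToyConfigOpts(object):
    def __init__(self):
        self._values = {'verbose': False}

    def __call__(self, args=None, project=None, default_config_files=None):
        # A real implementation would parse argv and the config files;
        # here we just pretend every config file sets verbose = True.
        if default_config_files:
            self._values['verbose'] = True
        return list(args or [])  # leftover positional args

    def __getattr__(self, name):
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

CONF = ToyConfigOpts()
```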

Let's see if we can test things out. We want to do something like:

CONF(args=[], project='cinder', default_config_files=['/etc/cinder/cinder.conf'])

One way to test this is to check if the value changes from a default.

>>> from cinder.flags import FLAGS
>>> FLAGS.verbose
False
>>> FLAGS(args=[], project='cinder', default_config_files=['/etc/cinder/cinder.conf'])
[]
>>> FLAGS.verbose
True

Here’s another test

>>> from cinder.flags import FLAGS
>>> FLAGS.sql_connection
'sqlite:////usr/lib/python2.7/dist-packages/cinder.sqlite'
>>> FLAGS(args=[], project='cinder', default_config_files=['/etc/cinder/cinder.conf'])
[]
>>> FLAGS.sql_connection
'sqlite:////var/lib/cinder/cinder.sqlite'

Yup, working.

OK, so we should be able to write a method in cinder_manage to load the config file

def load_config_file(conf):
    flags.FLAGS(args=[], project='cinder',
                default_config_files=[conf])

Now we need to figure out the current version and the repo version. Current version is easy:

from cinder.db import migration
current_version = migration.db_version()

How about the repo version? Let’s look back at how it was done in cinder code.

The db_sync method in cinder/db/sqlalchemy/migration.py isn’t too helpful here:

def db_sync(version=None):
    if version is not None:
        try:
            version = int(version)
        except ValueError:
            raise exception.Error(_("version should be an integer"))
    current_version = db_version()
    repository = _find_migrate_repo()
    if version is None or version > current_version:
        return versioning_api.upgrade(get_engine(), repository, version)
    else:
        return versioning_api.downgrade(get_engine(), repository,
                                        version)

It tells us how to find the sqlalchemy repository:

repository = _find_migrate_repo()

But it doesn’t actually retrieve the repo version.

In keystone_manage, we did this:

repo_path = migration._find_migrate_repo()
repo_version = versioning_api.repository.Repository(repo_path).latest

Will that still work? Let’s check on the command-line. We’ll need to do this:

import cinder.db.sqlalchemy.migration
repo_path = cinder.db.sqlalchemy.migration._find_migrate_repo()

It turns out that this returns a Repository object, not a path:

In [10]: cinder.db.sqlalchemy.migration._find_migrate_repo()
Out[10]: <migrate.versioning.repository.Repository at 0x37fd250>

We need to change the code a little; we can just do this:

from cinder.db import migration
repository = migration._find_migrate_repo()
repo_version = repository.latest

Done!
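Putting it together, the check-mode decision in the hypothetical cinder_manage module just compares the two versions. In this sketch the cinder calls derived above are stubbed out as plain integers so it stays self-contained:

```python
# Sketch of the check-mode logic for the hypothetical cinder_manage
# module: report "changed" iff the database version is behind the
# latest migration in the repository.
def needs_sync(current_version, repo_version):
    return current_version < repo_version

def run_module(current_version, repo_version, check_mode):
    changed = needs_sync(current_version, repo_version)
    if changed and not check_mode:
        # the real module shells out to "cinder-manage db sync" here
        pass
    return {'changed': changed}
```

In the real module, current_version comes from migration.db_version() and repo_version from the repository's latest attribute, as derived above.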

Of course, this uses an internal API, which means it's likely to change in the next release, but we can just update the Ansible module when that happens.

Posted in openstack | 2 Comments

Partitioning is hard to do

Configuring Ubuntu preseed files for automatic partitioning is… non-trivial. Especially when you want to boot off of a large disk. Here's a gist for those interested. I imagine some lines here are superfluous, but this works.

Posted in sysadmin