Start stopped instances that all have the same name

It’s worth learning the AWS CLI and jq in order to do one-off batch operations on EC2 instances. I needed to start a group of stopped instances that all had the same name. Here’s the one-liner:

aws ec2 describe-instances --filters \
Name=tag:Name,Values="name goes here" | \
jq ".Reservations[].Instances[].InstanceId" -r | \
xargs aws ec2 start-instances --instance-ids
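The jq filter just flattens Reservations[].Instances[] down to the instance IDs. If you’d rather do that step in Python (say, inside a bigger script), a minimal equivalent might look like this; the sample payload below is a made-up, trimmed-down version of the describe-instances output:

```python
def instance_ids(describe_output):
    """Flatten Reservations[].Instances[] down to a list of InstanceIds,
    mirroring jq's ".Reservations[].Instances[].InstanceId"."""
    return [instance["InstanceId"]
            for reservation in describe_output["Reservations"]
            for instance in reservation["Instances"]]

# Made-up sample of `aws ec2 describe-instances` JSON output
sample = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0aaa"}, {"InstanceId": "i-0bbb"}]},
        {"Instances": [{"InstanceId": "i-0ccc"}]},
    ]
}

print("\n".join(instance_ids(sample)))
```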
Posted in sysadmin | Tagged | Leave a comment

mitmproxy and ec2-api-tools

Here’s how you can use mitmproxy on OS X to see which URLs the ec2-api-tools are querying.
  1. Install mitmproxy:

    sudo pip install mitmproxy

  2. Start it up:

    mitmproxy -p 8080

  3. Configure the Java keystore to trust the mitmproxy CA certificate:

    sudo keytool -importcert -alias mitmproxy -storepass "changeit" \
    -keystore /System/Library/Java/Support/CoreDeploy.bundle/Contents/Home/lib/security/cacerts \
    -trustcacerts -file ~/.mitmproxy/mitmproxy-ca-cert.pem

(Type yes when asked to trust the certificate.)

  4. Configure the EC2 tools to use the mitm proxy:

    export EC2_JVM_ARGS="-DproxySet=true -DproxyHost=127.0.0.1 -DproxyPort=8080 -Dhttps.proxySet=true -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=8080"

  5. Run an ec2-api command, e.g.:

    ec2-describe-instances
    

Don’t forget to delete the mitmproxy CA cert when you’re done:

    sudo keytool -delete -alias mitmproxy -storepass "changeit" \
    -keystore /System/Library/Java/Support/CoreDeploy.bundle/Contents/Home/lib/security/cacerts
Posted in Uncategorized | Tagged | Leave a comment

mod_auth_openid on OSX

I wanted to play with mod_auth_openid on my Macbook Pro. OS X ships with Apache installed, so all I needed to do was build the module and edit the Apache configuration.

I wasn’t able to build mod_auth_openid from the git repository because of issues with autotools on OSX, but I was able to build from the latest release tarball (in this case, mod_auth_openid-0.7.tar.gz).

mod_auth_openid needs libopkele (a C++ OpenID library), which can be installed via Homebrew:

brew install libopkele

My initial attempt to build mod_auth_openid failed with:

/usr/share/apr-1/build-1/libtool: line 4575: /Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain/usr/bin/cc: No such file or directory

For some reason, there’s a reference in an Apache config file to a non-existent path for the C compiler. I tried to edit the file in question, but that failed to resolve the issue. In the end, I just added a symlink:

cd /Applications/Xcode.app/Contents/Developer/Toolchains
sudo ln -s XcodeDefault.xctoolchain OSX10.8.xctoolchain

Then it was just configure, make, sudo make install.

I then created a /Library/WebServer/Documents/protected directory with an index.html file inside and configured Apache to only allow access via OpenID by adding the following to /private/etc/apache2/httpd.conf:

<Directory "/Library/WebServer/Documents/protected">
    AuthType OpenID
    require valid-user
</Directory>

And I restarted Apache via launchctl:

sudo launchctl unload -w  /System/Library/LaunchDaemons/org.apache.httpd.plist
sudo launchctl load -w  /System/Library/LaunchDaemons/org.apache.httpd.plist

Voila! An OpenID consumer on my laptop.

Posted in Uncategorized | Tagged | Leave a comment

Networking Heisenbugs

While debugging an issue with OpenStack and floating IPs, I ran into a strange problem: running tcpdump on the bridge interface of the network controller caused packets to be forwarded successfully to a compute node, but when I stopped running tcpdump, the packets wouldn’t get forwarded.

Somebody on Server Fault provided the solution: tcpdump puts the interface into promiscuous mode. And, indeed, if I set the interface into promiscuous mode myself, the packets got forwarded. This is a classic Heisenbug.

Posted in Uncategorized | Tagged | Leave a comment

XPath and Chrome dev tools

Here’s a simple way to get the XPath of an element on an HTML page in Chrome.

  1. Right-click on the element on the web page and choose “Inspect Element” from the context menu.
  2. Right-click on the highlighted HTML line that appears in the Chrome Developer Tools view at the bottom of the browser window, choose “Copy XPath” from the context menu.

Very handy for use with something like Splinter’s find_by_xpath method.
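Splinter’s find_by_xpath takes exactly the kind of path Chrome copies. As a stdlib-only illustration of evaluating such a path (ElementTree supports only a subset of XPath, so a Chrome-copied //*[@id=...] form has to be adapted slightly), here it is run against a made-up page fragment:

```python
import xml.etree.ElementTree as ET

# Made-up page fragment; Chrome's "Copy XPath" for the <p> element
# would give something like //*[@id="content"]/p.
page = "<html><body><div id='content'><p>Hello</p></div></body></html>"
root = ET.fromstring(page)

# The closest form that ElementTree's limited XPath support accepts:
para = root.find(".//div[@id='content']/p")
print(para.text)  # Hello
```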

Posted in Uncategorized | Tagged , | Leave a comment

Fun with Windows TCP/IP debugging

Today I learned… In Windows, if a process listens on a port, spawns a child, then dies, no other process can listen on that port until all of its children have terminated. So if you were running, say, PowerShellServer, and a process inside an SSH session hangs, you can’t restart the server until you hunt down that process.

Thank you Server Fault for the answer, TCPView for telling me that a zombie was listening on the port, and Process Explorer for identifying the orphaned processes.

Posted in Uncategorized | Tagged | Leave a comment

Test, test


Testing out the new Byword functionality of posting to WordPress.

Posted in Uncategorized | Leave a comment

The infamous checksum bug

Another networking issue with an instance. Occasionally, an Ubuntu Precise instance would come up but the networking wasn’t working: it wouldn’t get an IP address. Interestingly, this only seemed to happen when it was running on the head node.

(Yes, we sometimes also use our head node as a compute node; it’s a small cluster.)

I could see the DHCPDISCOVER and DHCPOFFER packets by tcpdump’ing vnet0, but it never sent out a DHCPREQUEST packet.

To debug this, I had to log into the instance. The problem was, I didn’t know the password for the “ubuntu” user. This was an image that I had downloaded from Canonical.

I needed to alter the password inside of the instance so that I could log into it using VNC. I got the libvirt domain name with a “nova show <instance>” command.

$ nova show myinstance | grep instance_name
| OS-EXT-SRV-ATTR:instance_name | instance-000000e1 |

Then I shut it down:

$ sudo virsh shutdown instance-000000e1

I needed to edit the /etc/shadow file to specify a password for the root account. The problem was that I didn’t know how to generate a password hash in the right format.

It turns out the OpenStack Compute code does this. I whipped up a quick Python script that would output the appropriate hash, and then did this:

$ ./mkpasswd.py mypassword
$5$hla.HR1DOHbjcsPK$FvCd7KYZ0SD.9lpA1Iz5u22DamGbh9YFoCH2u8byr/5
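The script itself isn’t reproduced in my notes, but here’s a sketch of what mkpasswd.py could have looked like, using the stdlib crypt module (Unix-only, and deprecated in recent Python versions) to produce the same $5$ (SHA-256 crypt) format that /etc/shadow expects; the 16-character salt length is an assumption:

```python
import crypt
import random
import string

def make_sha256_hash(password):
    """Return a $5$ (SHA-256 crypt) hash suitable for /etc/shadow."""
    # Salt alphabet used by crypt(3): [a-zA-Z0-9./]; use SystemRandom
    # so the salt comes from the OS entropy pool.
    salt_chars = string.ascii_letters + string.digits + "./"
    rng = random.SystemRandom()
    salt = "".join(rng.choice(salt_chars) for _ in range(16))
    return crypt.crypt(password, "$5$" + salt)

print(make_sha256_hash("mypassword"))
```

Re-running crypt.crypt with the full hash as the salt reproduces the hash, which is how login verifies passwords.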

To edit /etc/shadow inside of the guest, I used the virt-edit program from libguestfs:

$ sudo virt-edit -d instance-000000e1 /etc/shadow

I added the hash to the root account. Then I started the instance up paused, so that I could run tcpdump and connect to the VNC console before it began to boot:

$ sudo virsh start  instance-000000e1 --paused
$ sudo tcpdump -i vnet0 port 67 or port 68

Then I resumed it:

$ sudo virsh resume instance-000000e1

Once it booted, I logged in to the root account using VNC. I ran the DHCP client manually in the foreground, to watch what happened:

# dhclient -d eth0


Ah, the infamous UDP checksum problem!

This fixed it:

iptables -t mangle -A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

I looked at the nova code, and it only enables this rule when running in multi-host mode. So perhaps those checksums get filled in by the networking drivers when the UDP packets leave the network host, but when the DHCP server is running on the same machine as the guest, this doesn’t happen (in multi-host mode, that is always the case).
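For the curious, the checksum that the CHECKSUM target fills in is the standard ones’-complement Internet checksum (RFC 1071), presumably the same one the offloading driver would otherwise compute on the wire. The algorithm itself is tiny; a quick sketch:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement checksum (RFC 1071), as used by IP and UDP."""
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return ~total & 0xFFFF

# Worked example from RFC 1071: words 0001 f203 f4f5 f6f7 -> checksum 220d
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

A guest's DHCP client sees the unfilled checksum as a corrupt packet and drops the DHCPOFFER, which is why dhclient never progressed to DHCPREQUEST.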

Posted in openstack | Leave a comment

Writing a cinder_manage Ansible module

I was working on a cinder_manage module for Ansible to invoke the equivalent of “cinder-manage db sync” inside of Python code. This is used to initialize the OpenStack Block Storage database. I had already written modules for glance-manage and keystone-manage, but each of the OpenStack projects uses a slightly different internal API for database initialization. This blog post documents my thought process as I dug through the Python code to figure out how it works.

To implement this, I could just shell out, but I prefer to dig into the guts of the cinder internals and do it in pure Python.

(Technically, I do call “cinder-manage db sync” from the shell, but I implement check mode using Python code).

cinder/db/migration.py has db_sync and db_version methods, but they don’t provide a way to pass in the path to the cinder.conf file that has the database connection info, so I can’t call them directly.

They both defer to methods with the same name in cinder/db/sqlalchemy/migration.py

Looking at cinder/db/sqlalchemy/migration.py:db_version, we see this:

repository = _find_migrate_repo()
try:
    return versioning_api.db_version(get_engine(), repository)

The _find_migrate_repo function looks for the sqlalchemy migration scripts relative to the directory of the Python script itself, so there’s no need to mess with that. The connection string is going to be needed by that get_engine() method:

from cinder.db.sqlalchemy.session import get_engine

OK, let’s look at cinder/db/sqlalchemy/session.py:get_engine

def get_engine():
    """Return a SQLAlchemy engine."""
    global _ENGINE
    if _ENGINE is None:
        connection_dict = sqlalchemy.engine.url.make_url(FLAGS.sql_connection)

A-ha, it’s FLAGS.sql_connection. What’s flags?

FLAGS = flags.FLAGS

OK…

import cinder.flags as flags

There we go, it’s cinder.flags.FLAGS

cinder/flags.py:
from cinder.openstack.common import cfg
FLAGS = cfg.CONF

All right, so flags come from cinder/openstack/common/cfg.py

CONF = CommonConfigOpts()

Hmm, CommonConfigOpts doesn’t take any arguments. Let’s look back at cinder/flags.py

def parse_args(argv, default_config_files=None):
    FLAGS.disable_interspersed_args()
    return argv[:1] + FLAGS(argv[1:],
                            project='cinder',
                            default_config_files=default_config_files)

That’s interesting, it’s actually calling FLAGS and adding to it. That’s what we want. Except we don’t really want to call parse_args, because we don’t have an argv. I think we just want to call FLAGS with our arguments.

But, is default_config_files going to be set for us already? And what’s that first argument? Recall that Flags are of type CommonConfigOpts. Is that callable? Let’s take a look.

Its parent, ConfigOpts, is callable:

def __call__(self, args=None, project=None, prog=None, 
             version=None, usage=None, default_config_files=None)

Let’s see if we can test things out. We want to do something like:

CONF(args=[], project='cinder', default_config_files=['/etc/cinder/cinder.conf'])

One way to test this is to check if the value changes from a default.

>>> from cinder.flags import FLAGS
>>> FLAGS.verbose
False
>>> FLAGS(args=[], project='cinder', default_config_files=['/etc/cinder/cinder.conf'])
[]
>>> FLAGS.verbose
True

Here’s another test:

>>> from cinder.flags import FLAGS
>>> FLAGS.sql_connection
'sqlite:////usr/lib/python2.7/dist-packages/cinder.sqlite'
>>> FLAGS(args=[], project='cinder', default_config_files=['/etc/cinder/cinder.conf'])
[]
>>> FLAGS.sql_connection
'sqlite:////var/lib/cinder/cinder.sqlite'

Yup, working.

OK, so we should be able to write a function in cinder_manage to load the config file:

def load_config_file(conf):
    flags.FLAGS(args=[], project='cinder',
                default_config_files=[conf])

Now we need to figure out the current version and the repo version. Current version is easy:

from cinder.db import migration
current_version = migration.db_version()

How about the repo version? Let’s look back at how it was done in cinder code.

The db_sync method in cinder/db/sqlalchemy/migration.py isn’t too helpful here:

def db_sync(version=None):
    if version is not None:
        try:
            version = int(version)
        except ValueError:
            raise exception.Error(_("version should be an integer"))
    current_version = db_version()
    repository = _find_migrate_repo()
    if version is None or version > current_version:
        return versioning_api.upgrade(get_engine(), repository, version)
    else:
        return versioning_api.downgrade(get_engine(), repository,
                                        version)

It tells us how to find the sqlalchemy repository:

repository = _find_migrate_repo()

But it doesn’t actually retrieve the repo version.

In keystone_manage, we did this:

repo_path = migration._find_migrate_repo()
repo_version = versioning_api.repository.Repository(repo_path).latest

Will that still work? Let’s check on the command-line. We’ll need to do this:

import cinder.db.sqlalchemy.migration
repo_path = cinder.db.sqlalchemy.migration._find_migrate_repo()

It turns out that this returns a repository object, not a path:

In [10]: cinder.db.sqlalchemy.migration._find_migrate_repo()
Out[10]: <migrate.versioning.repository.Repository at 0x37fd250>

We need to change the code a little; we can just do this:

from cinder.db import migration
repository = migration._find_migrate_repo()
repo_version = repository.latest

Done!
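With those pieces, the module’s check-mode logic boils down to comparing the two versions. Here’s a simplified, cinder-free sketch of that decision; in the real module the version numbers come from migration.db_version() and repository.latest, and the actual sync happens where noted:

```python
def db_sync_result(current_version, repo_version, check_mode=False):
    """Return the Ansible result dict for a db-sync style module.

    current_version: what migration.db_version() reported
    repo_version:    repository.latest (the newest migration script)
    """
    changed = current_version < repo_version
    if changed and not check_mode:
        # In the real module this is where we'd run "cinder-manage db sync"
        # (or call migration.db_sync() directly).
        pass
    return {"changed": changed}

print(db_sync_result(0, 8, check_mode=True))   # {'changed': True}
```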

Of course, this uses an internal API, which means it’s likely to change in the next release, but we can just update the Ansible module when that happens.

Posted in openstack | 2 Comments

Partitioning is hard to do

Configuring Ubuntu preseed files for automatic partitioning is… non-trivial, especially when you want to boot off of a large disk. Here’s a gist for those interested. I imagine some lines here are superfluous, but this works.


# Use LVM for partitioning
d-i partman-auto/method string lvm
# If one of the disks that are going to be automatically partitioned
# contains an old LVM configuration, the user will normally receive a
# warning. Preseed this away
d-i partman-lvm/device_remove_lvm boolean true
# And the same goes for the confirmation to write the lvm partitions.
d-i partman-lvm/confirm boolean true
# Really, please don't prompt me!
d-i partman-lvm/confirm_nooverwrite boolean true
# partitioning
# Physical partitions:
# 1. BIOS boot partition: 1 MB. See https://wiki.archlinux.org/index.php/GRUB2#GUID_Partition_Table_.28GPT.29_specific_instructions
# 2. Boot partition: 250 MB
# 3. LVM, with the following logical volumes:
# – Root partition: 250 GB (256000 MB), ext4.
# – Swap: 100% of RAM
# – Data partition: remaining space, XFS
d-i partman-auto/expert_recipe string \
boot-root :: \
1 1 1 free method{ biosgrub } . \
250 250 250 ext2 \
$primary{ } $bootable{ } \
method{ format } format{ } \
use_filesystem{ } filesystem{ ext2 } \
mountpoint{ /boot } \
. \
100% 2048 100% linux-swap \
lv_name{ swap } \
method{ swap } format{ } \
$lvmok{ } \
. \
256000 256000 256000 ext4 \
lv_name{ root } \
method{ lvm } format{ } \
use_filesystem{ } filesystem{ ext4 } \
mountpoint{ / } \
$lvmok{ } \
. \
1024 1024 -1 xfs \
lv_name{ data } \
method{ lvm } format{ } \
use_filesystem{ } filesystem{ xfs } \
mountpoint{ /data } \
$lvmok{ } \
.
# This makes partman automatically partition without confirmation, provided
# that you told it what to do using one of the methods above.
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true

Posted in sysadmin | Leave a comment