This article describes the internals of launching an instance in OpenStack Nova.
Overview
Launching a new instance involves multiple components inside OpenStack Nova:
- API server: handles requests from the user and relays them to the cloud controller.
- Cloud controller: handles the communication between the compute nodes, the networking controllers, the API server and the scheduler.
- Scheduler: selects a host to run a command.
- Compute worker: manages computing instances: launch/terminate instance, attach/detach volumes…
- Network controller: manages networking resources: allocate fixed IP addresses, configure VLANs…
Note: There are more components in Nova like the authentication manager, the object store and the volume controller but we are not going to study them as we are focusing on instance launching in this article.
The flow of launching an instance goes like this:
- The API server receives a run_instances command from the user.
- The API server relays the message to the cloud controller (1).
- Authentication is performed to make sure this user has the required permissions.
- The cloud controller sends the message to the scheduler (2).
- The scheduler casts the message to a random host and asks it to start a new instance (3).
- The compute worker on the host grabs the message (4).
- The compute worker needs a fixed IP to launch a new instance, so it sends a message to the network controller (5, 6, 7, 8).
- The compute worker continues with spawning a new instance.
We are going to look at each of these steps in detail next.
API
You can use the OpenStack API or EC2 API to launch a new instance. We are going to use the EC2 API. We add a new key pair and we use it to launch an instance of type m1.tiny.
    euca-add-keypair test > test.pem
    euca-run-instances -k test -t m1.tiny ami-tiny
run_instances() in api/ec2/cloud.py is called, which in turn calls the compute API create() in compute/api.py.
    def run_instances(self, context, **kwargs):
        ...
        instances = self.compute_api.create(context,
            instance_type=instance_types.get_by_type(
                kwargs.get('instance_type', None)),
            image_id=kwargs['image_id'],
            ...
Compute API create() does the following:
- Check if the maximum number of instances of this type has been reached.
- Create a security group if it doesn’t exist.
- Generate MAC addresses and hostnames for the new instances.
- Send a message to the scheduler to run the instances.
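Two of the steps above can be sketched in a few lines of Python. This is only an illustration, not Nova's code: check_quota() and generate_mac() are hypothetical helpers, and the 02:16:3e prefix is simply taken from the MAC address that appears in the libvirt XML later in this article.

```python
import random


def check_quota(running, requested, max_instances):
    """Hypothetical quota check: refuse the request if it exceeds the limit."""
    if running + requested > max_instances:
        raise ValueError("instance quota exceeded: %d running, %d requested, "
                         "limit %d" % (running, requested, max_instances))


def generate_mac(rng=random):
    """Return a random MAC address in the 02:16:3e:xx:xx:xx range.

    The prefix is an assumption based on the address shown in this article.
    """
    octets = [0x02, 0x16, 0x3e,
              rng.randint(0x00, 0x7f),
              rng.randint(0x00, 0xff),
              rng.randint(0x00, 0xff)]
    return ":".join("%02x" % octet for octet in octets)


check_quota(running=2, requested=1, max_instances=10)  # passes silently
mac = generate_mac()
```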
Cast
Let’s pause for a minute and look at how the message is sent to the scheduler. This type of message delivery in OpenStack is defined as RPC casting. RabbitMQ is used here for delivery. The publisher (API) sends the message to a topic exchange (scheduler topic). A consumer (Scheduler worker) retrieves the message from the queue. No response is expected as it is a cast and not a call. We will see call later.
Here is the code casting that message:
    LOG.debug(_("Casting to scheduler for %(pid)s/%(uid)s's"
                " instance %(instance_id)s") % locals())
    ...
             {"method": "run_instance",
              "args": {"topic": FLAGS.compute_topic,
                       "instance_id": instance_id,
                       "availability_zone": availability_zone}})
You can see that the scheduler topic is used, and the message arguments indicate what we want the scheduler to use for its own delivery. In this case, we want the scheduler to forward the message using the compute topic.
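To make the cast semantics concrete, here is a minimal in-memory sketch. It does not use RabbitMQ or Nova's rpc module: the Broker class and cast() function are hypothetical stand-ins, with one FIFO queue per topic playing the role of the topic exchange.

```python
import queue


class Broker:
    """Toy stand-in for a topic exchange: one FIFO queue per topic."""

    def __init__(self):
        self.queues = {}

    def queue_for(self, topic):
        return self.queues.setdefault(topic, queue.Queue())


def cast(broker, topic, msg):
    """Fire-and-forget: publish the message and return without a reply."""
    broker.queue_for(topic).put(msg)


broker = Broker()

# Publisher side (API): cast run_instance to the scheduler topic.
result = cast(broker, "scheduler",
              {"method": "run_instance",
               "args": {"topic": "compute",
                        "instance_id": 1,
                        "availability_zone": None}})

# Consumer side (scheduler worker): retrieve the message from the queue.
received = broker.queue_for("scheduler").get_nowait()
```

The publisher gets nothing back (result is None), which is the difference from a call.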
Scheduler
The scheduler receives the message and sends the run_instance message to a random host. The chance scheduler is used here. There are more scheduler types like the zone scheduler (pick a random host which is up in a specific availability zone) or the simple scheduler (pick the least loaded host). Now that a host has been selected, the following code is executed to send the message to a compute worker on the host.
    ...
             db.queue_get_for(context, topic, host),
    ...
    LOG.debug(_("Casting to %(topic)s %(host)s for %(method)s") % locals())
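The scheduling step can be sketched as follows. This is a simplified illustration, not the actual scheduler code: chance_schedule() and schedule_run_instance() are hypothetical names, and queue_get_for() here assumes the per-host queue is named "topic.host", which matches the topic.host naming mentioned in this article.

```python
import random


def queue_get_for(topic, host):
    """Assumed naming scheme for a host-specific queue: '<topic>.<host>'."""
    return "%s.%s" % (topic, host)


def chance_schedule(hosts_up, rng=random):
    """Chance scheduling: pick any host that is up, at random."""
    if not hosts_up:
        raise RuntimeError("no hosts available")
    return rng.choice(hosts_up)


def schedule_run_instance(hosts_up, topic, kwargs, rng=random):
    """Return the destination queue and the message to cast to it."""
    host = chance_schedule(hosts_up, rng)
    message = {"method": "run_instance", "args": kwargs}
    return queue_get_for(topic, host), message


destination, message = schedule_run_instance(
    ["node1", "node2", "node3"], "compute", {"instance_id": 1})
```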
Compute
The Compute worker receives the message and the following method in compute/manager.py is called:
    def run_instance(self, context, instance_id, **_kwargs):
run_instance() does the following:
- Check if the instance is already running.
- Allocate a fixed IP address.
- Set up a VLAN and a bridge if they are not already set up.
- Spawn the instance using the virtualization driver.
Call to network controller
An RPC call is used to allocate a fixed IP. An RPC call differs from an RPC cast in that it uses a topic.host exchange, meaning that a specific host is targeted. A response is also expected.
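The difference from a cast can be illustrated with a small in-memory sketch in which the caller blocks until the worker replies. Nothing here is Nova's rpc code: call(), network_worker() and the allocate_fixed_ip method name are assumptions made for the example, and a plain queue stands in for the host-specific "network.<host>" queue.

```python
import queue
import threading


def call(request_queue, msg, timeout=5):
    """Send a message to one specific worker and wait for its response."""
    reply_queue = queue.Queue()
    request_queue.put((msg, reply_queue))
    return reply_queue.get(timeout=timeout)


def network_worker(request_queue):
    """Toy network controller: serve a single request, then exit."""
    msg, reply_queue = request_queue.get()
    if msg["method"] == "allocate_fixed_ip":
        reply_queue.put("10.0.0.3")  # address used in this article
    else:
        reply_queue.put(None)


# Stands in for the queue of one specific network host.
network_host_queue = queue.Queue()
worker = threading.Thread(target=network_worker, args=(network_host_queue,))
worker.start()

fixed_ip = call(network_host_queue,
                {"method": "allocate_fixed_ip", "args": {"instance_id": 1}})
worker.join()
```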
Spawn instance
Next is the instance spawning process performed by the virtualization driver. libvirt is used in our case. The code we are going to look at is located in virt/libvirt_conn.py.
The first thing that needs to be done is the creation of the libvirt XML used to launch the instance. The to_xml() method is used to retrieve the XML content. Following is the XML for our instance.
    ...
        <name>instance-00000001</name>
        <memory>524288</memory>
    ...
        <kernel>/opt/novascript/trunk/nova/..//instances/instance-00000001/kernel</kernel>
        <cmdline>root=/dev/vda console=ttyS0</cmdline>
        <initrd>/opt/novascript/trunk/nova/..//instances/instance-00000001/ramdisk</initrd>
    ...
        <driver type='qcow2'/>
        <source file='/opt/novascript/trunk/nova/..//instances/instance-00000001/disk'/>
        <target dev='vda' bus='virtio'/>
    ...
        <interface type='bridge'>
        <source bridge='br100'/>
        <mac address='02:16:3e:17:35:39'/>
    ...
        <filterref filter="nova-instance-instance-00000001">
        <parameter name="IP" value="10.0.0.3"/>
        <parameter name="DHCPSERVER" value="10.0.0.1"/>
        <parameter name="RASERVER" value="fe80::1031:39ff:fe04:58f5/64"/>
        <parameter name="PROJNET" value="10.0.0.0"/>
        <parameter name="PROJMASK" value="255.255.255.224"/>
        <parameter name="PROJNETV6" value="fd00::"/>
        <parameter name="PROJMASKV6" value="64"/>
    ...
        <source path='/opt/novascript/trunk/nova/..//instances/instance-00000001/console.log'/>
    ...
        <console type='pty' tty='/dev/pts/2'>
        <source path='/dev/pts/2'/>
    ...
        <source path='/dev/pts/2'/>
    ...
The hypervisor used is qemu. The memory allocated for the guest is 524288 KiB (libvirt's default unit for <memory>), which is 512 MiB. The guest OS will boot from a kernel and initrd stored on the host OS.
The number of virtual CPUs allocated for the guest OS is 1. ACPI is enabled for power management.
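As a quick check of the memory figure: libvirt interprets the <memory> value in KiB by default, so 524288 corresponds to 512 MiB. The snippet below parses a reduced, hand-written fragment of the domain XML (only the name and memory elements from above) with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Reduced fragment of the domain XML, keeping only two elements.
snippet = """
<domain>
  <name>instance-00000001</name>
  <memory>524288</memory>
</domain>
"""

domain = ET.fromstring(snippet)
memory_kib = int(domain.findtext("memory"))  # libvirt default unit: KiB
memory_mib = memory_kib // 1024

print("%s: %d KiB = %d MiB" % (domain.findtext("name"), memory_kib, memory_mib))
```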
Multiple devices are defined:
- The disk image is a file on the host OS using the driver qcow2. qcow2 is a qemu disk image copy-on-write format.
- The network interface is a bridge visible to the guest. We define network filtering parameters like IP which means this interface will always use 10.0.0.3 as the source IP address.
- Device logfile. All data sent to the character device is written to console.log.
- Pseudo TTY: virsh console can be used to connect to the serial port locally.
Next is the preparation of the network filtering. The firewall driver used by default is iptables. The rules are defined in apply_ruleset() in the class IptablesFirewallDriver. Let’s take a look at the firewall chains and rules for this instance.
    ...
    :nova-ipv4-fallback - [0:0]
    ...
    -A nova-ipv4-fallback -j DROP
    -A FORWARD -j nova-local
    -A nova-local -d 10.0.0.3 -j nova-inst-1
    -A nova-inst-1 -m state --state INVALID -j DROP
    -A nova-inst-1 -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A nova-inst-1 -j nova-sg-1
    -A nova-inst-1 -s 10.1.3.254 -p udp --sport 67 --dport 68
    -A nova-inst-1 -j nova-ipv4-fallback
    -A nova-sg-1 -p tcp -s 10.0.0.0/27 -m multiport --dports 1:65535 -j ACCEPT
    -A nova-sg-1 -p udp -s 10.0.0.0/27 -m multiport --dports 1:65535 -j ACCEPT
    -A nova-sg-1 -p icmp -s 10.0.0.0/27 -m icmp --icmp-type 1/65535 -j ACCEPT
First you have the chains: nova-local, nova-inst-1, nova-sg-1, nova-ipv4-fallback and then the rules.
Let’s look at the different chains and rules:
Packets routed through the virtual network are handled by the chain nova-local.
    -A FORWARD -j nova-local
If the destination is 10.0.0.3, the packet is for our instance, so we jump to the chain nova-inst-1.
    -A nova-local -d 10.0.0.3 -j nova-inst-1
If the packet could not be identified, drop it.
    -A nova-inst-1 -m state --state INVALID -j DROP
If the packet belongs to an established connection, or starts a new connection that is related to an existing one, accept it.
    -A nova-inst-1 -m state --state ESTABLISHED,RELATED -j ACCEPT
Allow DHCP responses.
    -A nova-inst-1 -s 10.0.0.254 -p udp --sport 67 --dport 68
Jump to the security group chain to check the packet against its rules.
    -A nova-inst-1 -j nova-sg-1
Security group chain. Accept all TCP packets from 10.0.0.0/27 to destination ports 1 to 65535.
    -A nova-sg-1 -p tcp -s 10.0.0.0/27 -m multiport --dports 1:65535 -j ACCEPT
Accept all UDP packets from 10.0.0.0/27 to destination ports 1 to 65535.
    -A nova-sg-1 -p udp -s 10.0.0.0/27 -m multiport --dports 1:65535 -j ACCEPT
Accept ICMP packets from 10.0.0.0/27. ICMP has no ports, so the rule matches on the ICMP type/code given instead.
    -A nova-sg-1 -p icmp -s 10.0.0.0/27 -m icmp --icmp-type 1/65535 -j ACCEPT
Jump to the fallback chain.
    -A nova-inst-1 -j nova-ipv4-fallback
This is the fallback chain's rule, where we drop the packet.
    -A nova-ipv4-fallback -j DROP
Here is an example of a packet for a new TCP connection to 10.0.0.3:
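The path such a packet takes through the chains can be traced with a small simulation. This is a simplified model written for this article, not iptables itself: it covers only the TCP and UDP security group rules and leaves out the DHCP and ICMP rules, and the source addresses 10.0.0.5 and 192.168.1.10 are assumptions chosen for the example.

```python
import ipaddress

# Each chain is a list of (conditions, target) pairs, in rule order.
RULES = {
    "FORWARD": [({}, "nova-local")],
    "nova-local": [({"dst": "10.0.0.3"}, "nova-inst-1")],
    "nova-inst-1": [
        ({"state": {"INVALID"}}, "DROP"),
        ({"state": {"ESTABLISHED", "RELATED"}}, "ACCEPT"),
        ({}, "nova-sg-1"),
        ({}, "nova-ipv4-fallback"),
    ],
    "nova-sg-1": [
        ({"proto": "tcp", "src_net": "10.0.0.0/27"}, "ACCEPT"),
        ({"proto": "udp", "src_net": "10.0.0.0/27"}, "ACCEPT"),
    ],
    "nova-ipv4-fallback": [({}, "DROP")],
}


def matches(packet, cond):
    """Return True if the packet satisfies every condition of a rule."""
    if "dst" in cond and packet["dst"] != cond["dst"]:
        return False
    if "state" in cond and packet["state"] not in cond["state"]:
        return False
    if "proto" in cond and packet["proto"] != cond["proto"]:
        return False
    if "src_net" in cond:
        network = ipaddress.ip_network(cond["src_net"])
        if ipaddress.ip_address(packet["src"]) not in network:
            return False
    return True


def traverse(packet, chain="FORWARD", path=None):
    """Walk the chains; return (verdict, chains visited).

    A verdict of None means the end of the chain was reached without a
    decision, so control returns to the calling chain.
    """
    path = [] if path is None else path
    path.append(chain)
    for cond, target in RULES[chain]:
        if not matches(packet, cond):
            continue
        if target in ("ACCEPT", "DROP"):
            return target, path
        verdict, _ = traverse(packet, target, path)
        if verdict is not None:
            return verdict, path
    return None, path


new_tcp = {"src": "10.0.0.5", "dst": "10.0.0.3", "proto": "tcp", "state": "NEW"}
inside_verdict, inside_path = traverse(new_tcp)

outside_verdict, outside_path = traverse(dict(new_tcp, src="192.168.1.10"))

print(inside_verdict, " -> ".join(inside_path))
print(outside_verdict, " -> ".join(outside_path))
```

With these rules, the packet from inside the project network is accepted in nova-sg-1, while the one from outside falls through to nova-ipv4-fallback and is dropped.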
Following the firewall rules preparation is the image creation. This happens in _create_image().
    def _create_image(self, inst, libvirt_xml, suffix='', disk_images=None):
In this method, libvirt.xml is created based on the XML we generated above.
Copies of the kernel, ramdisk (initrd) and disk images are made for the hypervisor to use.
If the flat network manager is used then a network configuration is injected into the guest OS image. We are using the VLAN manager in this example.
The instance’s SSH key is injected into the image. Let’s look at this part in more detail. The disk inject_data() method is called.
    disk.inject_data(basepath('disk'), key, net,
                     partition=target_partition,
                     nbd=FLAGS.use_cow_images)
basepath('disk') is where the instance’s disk image is located on the host OS. key is the SSH key string. net is not set in our case because we don’t inject a networking configuration. partition is None because we are using a kernel image; otherwise, we could use a partitioned disk image. Let’s look inside inject_data().
The first thing that happens here is linking the image to a device. This happens in _link_device().
    device = _allocate_device()
    utils.execute('sudo qemu-nbd -c %s %s' % (device, image))
    ...
        if os.path.exists("/sys/block/%s/pid" % os.path.basename(device)):
    ...
    raise exception.Error(_('nbd device %s did not show up') % device)
_allocate_device() returns the next available nbd device: /dev/nbdx where x is between 0 and 15. qemu-nbd is a QEMU disk network block device server. Once this is done, we get the device, let’s say /dev/nbd0.
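One way to find a free device is sketched below. It is an assumption about how this could be done, not Nova's _allocate_device(): the function treats a device as busy when /sys/block/nbdX/pid exists, the same path the code above polls after running qemu-nbd. The is_busy parameter is a hypothetical hook so the logic can be tried on a machine without nbd devices.

```python
import os


def allocate_device(max_devices=16, is_busy=None):
    """Return the first /dev/nbdX (X in 0..max_devices-1) that looks free."""
    if is_busy is None:
        # A connected nbd device exposes a pid file under /sys/block.
        def is_busy(name):
            return os.path.exists("/sys/block/%s/pid" % name)

    for index in range(max_devices):
        name = "nbd%d" % index
        if not is_busy(name):
            return "/dev/%s" % name
    raise RuntimeError("no free nbd devices")


# Pretend nbd0 and nbd1 are already attached to other images.
device = allocate_device(is_busy=lambda name: name in {"nbd0", "nbd1"})
```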
We disable the filesystem check for this device. mapped_device here is "/dev/nbd0".
    out, err = utils.execute('sudo tune2fs -c 0 -i 0 %s' % mapped_device)
We mount the file system to a temporary directory and we add the SSH key to the ssh authorized_keys file.
    sshdir = os.path.join(fs, 'root', '.ssh')
    utils.execute('sudo mkdir -p %s' % sshdir)
    utils.execute('sudo chown root %s' % sshdir)
    utils.execute('sudo chmod 700 %s' % sshdir)
    keyfile = os.path.join(sshdir, 'authorized_keys')
    utils.execute('sudo tee -a %s' % keyfile, '\n' + key.strip() + '\n')
In the code above, fs is the temporary directory.
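The same steps can be tried without root privileges against a scratch directory. The sketch below uses plain Python file operations instead of sudo commands, skips the chown step (which requires root), and uses a placeholder key string; inject_key() is a hypothetical helper, not part of Nova.

```python
import os
import tempfile


def inject_key(fs, key):
    """Append an SSH public key to root's authorized_keys under fs."""
    sshdir = os.path.join(fs, "root", ".ssh")
    os.makedirs(sshdir, exist_ok=True)   # like: mkdir -p
    os.chmod(sshdir, 0o700)              # like: chmod 700
    keyfile = os.path.join(sshdir, "authorized_keys")
    with open(keyfile, "a") as handle:   # like: tee -a
        handle.write("\n" + key.strip() + "\n")
    return keyfile


# A temporary directory stands in for the mounted guest filesystem.
with tempfile.TemporaryDirectory() as fs:
    keyfile = inject_key(fs, "ssh-rsa PLACEHOLDER-KEY test\n")
    relative_path = os.path.relpath(keyfile, fs)
    with open(keyfile) as handle:
        content = handle.read()
```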
Finally, we unmount the filesystem and unlink the device. This concludes the image creation and setup.
The next step in the virtualization driver spawn() method is the instance launch itself, using the driver createXML() binding. After that, the firewall rules are applied.
Source: http://www.laurentluce.com/posts/openstack-nova-internals-of-instance-launching/