For a few months I've been working with Eucalyptus. I found it to be a very promising, yet still somewhat incomplete, platform, so I decided to share my experiences. This is an introductory post; later ones will be howtos for specific tasks.
Eucalyptus is an open-source cloud computing platform designed to closely resemble (from the user's point of view) Amazon EC2. Standard Amazon tools can work with Eucalyptus and vice versa. This allows for very interesting setups: hybrid systems using both local and public VMs.
Eucalyptus started as a university project in 2008. At the beginning of 2009, it got commercial support. As you can see, the project is still in an early phase. And yet there are already some large-scale deployments, the most famous being Ubuntu Public Cloud. Another noteworthy user of Eucalyptus is NASA. The project is progressing at a rapid rate: in a few weeks we can expect a new version with more features.
Pros:
• Mostly (not 100%) compatible with EC2.
• Works with Xen and KVM; it can use hardware virtualization support but doesn't require it. Note: I will only cover the Xen setup in the howtos.
• Multi-user setup. Users can create their own VMs without an administrator's intervention.
• Security groups: VMs within one group can communicate with each other, but groups are separated (not only by IP subnetwork, but also by VLAN!). The default is one security group per user, but you can create more or share a group if you like.
• Fully scriptable.
• Authentication by X.509 certificates.
• Open source, easy to customize and modify.
• Relatively easy (for such a complex system) installation.
Cons:
• Sometimes incomplete documentation.
• Some planned functionality is still missing, namely multi-cluster grids and the ability to restart the controller.
• Requires custom scripting for many typical administration tasks. This should also change in the new release.
• Some small, but annoying issues. For example, you should use the same Linux distribution on all controllers (note: not VMs, they can run anything your hypervisor supports); controller timezone should be set to ETC.
Eucalyptus from the user's perspective
A first-time user needs to log in to the web interface and request an account. After the administrator's approval (this is the only part that requires human interaction), she downloads a personal certificate. The user also has to install client software: either Eucalyptus euca2ools, Amazon's EC2 API/AMI tools or a GUI client, e.g. EC2Dream. Whatever the choice, the software has to be configured by providing the certificate plus all the URLs, usernames etc.
From that point, the user can begin working with the system. There are commands for:
- running a new instance of a virtual machine,
- stopping a running instance,
- configuring instances (e.g. assigning network resources, security groups),
- storing, retrieving and moving around files inside Eucalyptus storage service,
- creating and modifying VM images.
They work exactly the same for EC2 and Eucalyptus.
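As a rough illustration of how closely the two command-line toolsets mirror each other, here is a minimal Python sketch that builds the command line for a few of the tasks above. It assumes the usual naming pattern (euca2ools commands start with `euca-`, Amazon's API tools with `ec2-`); the task keys, the `build_command` helper and the image ID are made up for this example, and the commands are only printed, not executed.

```python
# Command-name prefixes for the two command-line toolsets mentioned above.
PREFIXES = {"euca2ools": "euca", "ec2-api-tools": "ec2"}

# Task name (hypothetical, chosen for this sketch) -> command suffix
# that both toolsets share.
TASKS = {
    "run": "run-instances",
    "terminate": "terminate-instances",
    "list": "describe-instances",
    "open-port": "authorize",
    "register-image": "register",
}


def build_command(toolset, task, args=()):
    """Return the argv list for a task in the chosen toolset."""
    return [f"{PREFIXES[toolset]}-{TASKS[task]}", *args]


if __name__ == "__main__":
    # "emi-00000000" is a placeholder image ID, not a real one.
    for toolset in PREFIXES:
        print(" ".join(build_command(toolset, "run", ["emi-00000000"])))
```

The resulting list can be handed to `subprocess.run` once the client tools are installed and configured; options and output formats should be checked against the documentation of the toolset you actually use.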
Architecture of Eucalyptus
A Eucalyptus system consists of three parts:
• Cloud controller - to manage the whole installation.
• Cluster controller - in the future, it will be possible to divide the installation into several clusters. A cluster is a group of machines connected to the same LAN, while the cloud as a whole can be distributed. However, the current version (1.5.2) doesn't support this yet. For now, the cloud controller and the cluster controller have to be the same host.
• Node controller - on each physical machine that will run VMs.
The client only interacts with the cloud controller, which then sends commands to the nodes and reads their replies. Note that for a publicly available cloud, you only need a public IP address on the cloud controller.
Eucalyptus uses a storage service called Walrus, open-source software that exposes the same public API as Amazon S3. All VM images are initially stored in Walrus in compressed and encrypted form (Walrus can temporarily cache an unencrypted version for performance reasons). Images can be either private (readable only by the user who stored them) or public. Whenever the client requests a new instance, the controller decrypts the VM image and sends it to the node. The node controller then stores it on a local disk and attempts to run it.
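The flow described above can be pictured with a toy model. This is not Eucalyptus code: the class names are invented, encryption is left out (only compression is modelled, with `zlib`), and picking the least-loaded node is an assumption made for the sketch, not a statement about the real scheduler.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Image:
    owner: str
    public: bool
    blob: bytes  # compressed payload; the real Walrus also encrypts it


@dataclass
class Node:
    name: str
    local_disk: dict = field(default_factory=dict)
    running: list = field(default_factory=list)

    def run(self, image_name, data):
        self.local_disk[image_name] = data  # the node keeps its own copy
        self.running.append(image_name)


class CloudController:
    """Toy stand-in for the cloud controller plus its Walrus storage."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.walrus = {}

    def store_image(self, name, owner, data, public=False):
        self.walrus[name] = Image(owner, public, zlib.compress(data))

    def run_instance(self, user, image_name):
        image = self.walrus[image_name]
        if not image.public and image.owner != user:
            raise PermissionError(f"{user} may not read {image_name}")
        # Assumption for this sketch: choose the node with the fewest VMs.
        node = min(self.nodes, key=lambda n: len(n.running))
        # The controller unpacks the image and ships it to the node.
        node.run(image_name, zlib.decompress(image.blob))
        return node.name


if __name__ == "__main__":
    cloud = CloudController([Node("node1"), Node("node2")])
    cloud.store_image("web", "alice", b"disk-image-bytes")
    print(cloud.run_instance("alice", "web"))
```

Note that the client only ever talks to `CloudController`; the nodes are reached through it, which mirrors why only the cloud controller needs a public address.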
Is the EC2 API the right choice for you?
The system closely follows EC2, which can be a disadvantage in some scenarios:
• VM instances are treated as disposable resources. If they fail, it's up to the user to respawn them. Whenever they are shut down, all the data is lost (you have to use S3/Walrus for persistent storage). There's no support for failover (although the user can build one inside the VM), migration etc.
• Node controllers store VMs on local disks. Obviously, you can use a network drive instead, but Eucalyptus will be unaware of it. Even if you mount the same volume on both the node and the cloud controller, Eucalyptus will still copy the files from one machine to the other. This can severely affect your network performance if you spawn a lot of VMs.
• Some systems let you configure VMs while deploying them, but Eucalyptus doesn't care at all about what's happening inside the VM. It's an intentional design choice: it's a multi-user cloud, so VM configuration is the user's job. If you need a centrally configured system of interoperating VMs, there might be better choices. Eucalyptus allows for that, but doesn't provide any support.
There is, however, one major advantage of using an EC2-compatible platform: the ability to create a hybrid cloud system. Sure, you need to write your custom scripts to deploy VMs. But once you have them, switching from local to EC2 resources and back is a matter of a few shell variables.
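A minimal sketch of that switch, assuming the client tools read their endpoints from the `EC2_URL` and `S3_URL` environment variables. The host name `cloud.example.internal` is hypothetical, and the local URLs only follow the pattern typically found in a Eucalyptus credentials file, so check your own. Credentials (keys and certificates) differ between the two clouds as well and would need the same treatment; they are left out here.

```python
import os

# Endpoint settings per target. The "local" host name is a placeholder.
TARGETS = {
    "local": {
        "EC2_URL": "http://cloud.example.internal:8773/services/Eucalyptus",
        "S3_URL": "http://cloud.example.internal:8773/services/Walrus",
    },
    "amazon": {
        "EC2_URL": "https://ec2.amazonaws.com",
        "S3_URL": "https://s3.amazonaws.com",
    },
}


def environment_for(target, base=None):
    """Return a copy of the environment pointing at the chosen cloud."""
    env = dict(os.environ if base is None else base)
    env.update(TARGETS[target])
    return env


if __name__ == "__main__":
    for target in TARGETS:
        env = environment_for(target)
        print(target, env["EC2_URL"], env["S3_URL"])
```

A deployment script could then pass the result as the `env` argument of `subprocess.run` when calling the client tools, so the same script works against either cloud.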
Tuesday, September 15, 2009
Nice summarization, really liked your description, Thanks.