// August 19th, 2011 // Comments Off // Uncategorized
I've seen Ensemble evolve from a series of design-level conversations (Brussels May 2010), through a year of fast-paced Canonical-style development, and participated in Ensemble sprints (Cape Town March 2011, and Dublin June 2011). I've observed Ensemble at first as an outsider, then provided feedback as a stake-holder, and have now contributed code as a developer to Ensemble and authored Formulas.
Think about bzr or git circa 2004/2005, or apt circa 1998/1999, or even dpkg circa 1993/1994... That's where we are today with Ensemble circa 2011.
Ensemble is a radical, outside-of-the-box approach to a problem that the Cloud ecosystem is just starting to grok: Service Orchestration. I'm quite confident that in a few years, we're going to look back at 2011 and the work we're doing with Ensemble and Ubuntu and see a clear inflection point in the efficiency of workload management in The Cloud.
From my perspective as the leader of Canonical's Systems Integration Team, Ensemble is now the most important tool in our software tool belt when building complex cloud solutions.
Period.
Juan, Marc, Brian, and I are using Ensemble to generate modern solutions around new service deployments to the cloud. We have contributed many formulas already to Ensemble's collection, and continue to do so every day.
There's a number of novel ideas and unique approaches in Ensemble. You can deep dive into the technical details here. For me, there's one broad concept in Ensemble that just rocks my world... Ensemble deals in individual service units, with the ability to replicate, associate, and scale those units quite dynamically. Service units in practice are cloud instances (or if you're using Orchestra + Ensemble, bare metal systems!). Service units are federated together to deliver a (perhaps large and complicated) user facing service.
Okay, that's a lot of words, and at a very high level. Let me try to break that down into something a bit more digestible...
I've been around Red Hat and Debian packaging for many years now. Debian packaging is particularly amazing at defining prerequisite packages and pre- and post-installation procedures, and is just phenomenal at rolling upgrades. I've worked with hundreds (thousands?) of packages at this point, including some mind-bogglingly complex ones!
It's truly impressive how much can be accomplished within traditional Debian packaging. But it has its limits. These limits really start to bare their teeth when you need to install packages on multiple separate systems, and then federate those services together. It's one thing if you need to install a web application on a single, local system: depend on Apache, depend on MySQL, install, configure, restart the services...
sudo apt-get install your-web-app
...
Profit!
That's great. But what if you need to install MySQL on two different nodes, set them up in a replicating configuration, install your web app and Apache on a third node, and put a caching reverse proxy on a fourth? Oh, and maybe you want to do that a few times over. And then scale them out. Ummmm.....
sudo apt-get errrrrrr....yeah, not gonna work :-(
But these are exactly the type(s) of problems that Ensemble solves! And quite elegantly in fact.
Stay tuned here and I'll actually show some real Ensemble examples in a series of upcoming posts. I'll also write a bit about how Ensemble and Orchestra work together.
After that, grab the nearest terminal and come help out!
We are quite literally at the edge of something amazing here, and we welcome your contributions! All of Ensemble and our Formula Repository are entirely free software, building on years of best practice open source development on Ubuntu at Canonical. Drop into the #ubuntu-ensemble channel in irc.freenode.net, introduce yourself, and catch one of the earliest waves of something big. Really, really big.
Ubuntu is a fast, secure and easy-to-use operating system used by millions of people around the world.
Secure, fast and powerful, Ubuntu Server is transforming IT environments worldwide. Realise the full potential of your infrastructure with a reliable, easy-to-integrate technology platform.
Juju is a next generation service orchestration framework. It has been likened to APT for the cloud. With juju, different authors are able to create service charms independently, and make those services coordinate their communication through a simple protocol. Users can then take the product of different authors and very comfortably deploy those services in an environment. The result is multiple machines and components transparently collaborating towards providing the requested service.
HPCC (High Performance Computing Cluster) is a massive parallel-processing computing platform that solves Big Data problems. The platform is now Open Source!
Now that we are all caught up, let's delve right into it. I will be discussing the details of my newly created hpcc juju charm.
The hpcc charm has been one of the trickiest ones to date to get working properly, so I want to take some time to explain some of the challenges that I encountered.
hpcc seems to use ssh keys for authentication and a single xml file to hold its configuration. All nodes that are part of the cluster should have identical keys and an identical xml configuration file.
The ssh keys are pretty easy to do ( there is even a script that will do it all for you located at /opt/HPCCSystems/sbin/keygen.sh ). You can just run: ssh-keygen -f path_where_to_save_keys/id_rsa -N "" -q
The configuration file environment.xml is a lot trickier to configure so, I will use cheetah templates to help make a template out of this enormous file.
According to their website:
Cheetah is an open source template engine and code generation tool, written in Python. It can be used standalone or combined with other tools and frameworks. Web development is its principle use, but Cheetah is very flexible and is also being used to generate C++ game code, Java, sql, form emails and even Python code.
With cheetah, I can create self contained templates that can be generated into their intended file by just calling cheetah. This is because we can embed python code inside the template itself, making the template ( environment.tmpl in our case ) more or less a python program that generates a fully functional environment.xml file ready for hpcc to use.
Another, very important, reason to use a template engine is the ability to create identical configuration files on each node without having to pass them around. In other words, each node can create its own configuration file and, since all nodes are using the same methods and data to create the file, they will all be exactly the same.
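Here is a minimal Python sketch of why that works. It assumes every node ends up with the same set of member records; the (install time, address, name) record layout and the render helper are hypothetical, and plain string formatting stands in for cheetah. As long as each node sorts the data the same way before rendering, the generated text is byte-for-byte identical:

```python
import hashlib


def render(members):
    """Render one <Computer/> line per cluster member.

    members: iterable of (install_time, address, name) tuples.
    This record layout is an assumption made for the sketch.
    """
    lines = []
    # Sorting first makes the output independent of the order in which
    # a node happened to learn about its peers.
    for _, address, name in sorted(members):
        lines.append('<Computer name="%s" netAddress="%s"/>' % (name, address))
    return "\n".join(lines) + "\n"


def digest(text):
    """Fingerprint the rendered file so two renderings can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two nodes receive the same records in a different order.
seen_by_node_a = [(100, "10.0.0.1", "node0"), (200, "10.0.0.2", "node1")]
seen_by_node_b = list(reversed(seen_by_node_a))

same = digest(render(seen_by_node_a)) == digest(render(seen_by_node_b))
print(same)
```

The sort is the important part: without it, two nodes holding the same data could still write different files.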
The hpcc configuration file is huge so, I'll just talk about some of the interesting bits of it here:
The above piece of code is what I am currently using to populate a list with the FQDN and address of each cluster member sorted by install time. This puts the "master" of the cluster at the top of the list which will become useful when populating certain parts of the configuration file.
As we can see by the code above, the main piece of information that we use in this template is the node list. Here is a sample of how we use it in the environment.tmpl template file:
#for $netAddress, $name in $nodes:
<Computer computerType="linuxmachine"
domain="localdomain"
name="$name"
netAddress="$netAddress"/>
#end for
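For readers who have not used cheetah, the loop above behaves roughly like the following Python sketch. This is not how cheetah is implemented; it only shows what the #for block expands to. The computer_elements helper and the sample addresses are made up for illustration, and quoteattr from the standard library is used to keep the attribute values well-formed:

```python
from xml.sax.saxutils import quoteattr


def computer_elements(nodes):
    """Expand a list of (netAddress, name) pairs into <Computer/> elements.

    The list is expected to have the cluster "master" first, as in the
    template above.
    """
    out = []
    for net_address, name in nodes:
        out.append(
            '<Computer computerType="linuxmachine"\n'
            '          domain="localdomain"\n'
            '          name=%s\n'
            '          netAddress=%s/>' % (quoteattr(name), quoteattr(net_address))
        )
    return "\n".join(out)


# Hypothetical two-node cluster.
nodes = [("10.0.0.1", "node0"), ("10.0.0.2", "node1")]
xml = computer_elements(nodes)
print(xml)
```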
I encourage you to download the charm here and examine the environment.tmpl file in the templates directory.
Here is the complete environment.tmpl file... I know it's pretty small and you can just download the charm and read the file at your leisure, but I wanted to give you an idea of the size and complexity of hpcc's configuration file.
Even just scrolling past the file takes a while!! This behemoth of a file was tamed thanks to cheetah; I highly encourage you to read up on it.
This charm may require some changes to your environment.yaml file in ~/.juju, as hpcc will only run on 64-bit instances. Make sure that your juju environment has been properly shut down before you edit this file ( juju destroy-environment ). Here is my environment.yaml file where I show you the important part to check:
HPCC (High Performance Computing Cluster) is a massive
parallel-processing computing platform that solves Big Data problems.
provides:
  hpcc:
    interface: hpcc
requires:
  hpcc-thor:
    interface: hpcc-thor
  hpcc-roxie:
    interface: hpcc-roxie
peers:
  hpcc-cluster:
    interface: hpcc-cluster
There are various provides and requires interfaces in this metadata.yaml file but, for now, only the peers interface is being used. I'll work on the other ones as the charm matures.
Let's look at the hpcc-cluster interface. More specifically the hpcc-cluster-relation-changed hook where the new configuration is created:
#!/bin/bash
# Regenerate environment.xml from the cheetah template, then restart hpcc.
CWD=$(dirname "$0")
cheetah fill --oext=xml --odir=/etc/HPCCSystems/ "${CWD}/../templates/environment.tmpl"
service hpcc-init restart
It's pretty simple, isn't it? Since the "heavy lifting" is being done by the self-contained cheetah template, we don't have much to do here but generate the configuration file and restart hpcc.
The other files in this charm are pretty self explanatory and simple so, I am leaving the details of them as an exercise to the reader.
All of the complexities in hpcc have been distilled to the following commands:
juju bootstrap
bzr branch lp:~negronjl/+junk/hpcc
juju deploy --repository . hpcc
wait a minute or two
juju status
you should see something similar to this:
negronjl@negronjl-laptop:~/src/juju/charms$ juju status
2011-08-18 16:00:54,413 INFO Connecting to environment.
2011-08-18 16:00:58,374 INFO 'status' command finished successfully
negronjl@negronjl-laptop:~/src/juju/charms$
The above commands will give you a single node.
You can access the web interface of your node by pointing your browser to http://<FQDN>:8010, where FQDN is the Fully Qualified Domain Name or public IP address of your hpcc instance. On the left side there should be a menu; explore the items in the Topology section. The Target Clusters section should look similar to this:
To experience the true power of hpcc, you should probably throw some more nodes at it. Let's do just that with:
juju add-unit hpcc
do this as many times as you feel comfortable
Each command will give you a new node in the cluster
wait a minute or two and you should see something similar to this:
negronjl@negronjl-laptop:~$ juju status
2011-08-18 16:25:55,739 INFO Connecting to environment.
2011-08-18 16:26:01,837 INFO 'status' command finished successfully
Notice how we now have more hpcc nodes :) Here is what the web interface could look like:
Again....we have more nodes :)
Now that we have a working cluster, let's try it. We'll first do the mandatory Hello World in ECL. It looks something like this (hello.ecl):
Output('Hello world');
We have to compile our hello.ecl so we can use it. We do that by logging into one of the nodes ( I used juju ssh 1 to log on to the first/master node ) and typing the following:
eclcc hello.ecl -o
We run the file just like we would any other binary:
./hello
... and the output is:
ubuntu@ip-10-111-19-210:~$ ./hello
Hello world
ubuntu@ip-10-111-19-210:~$
There are far more interesting examples in the Learning ECL Documentation here. I highly encourage you to go and read about it.
That's it for now. Feedback is always welcome of course so, let me know how I'm doing.
A lot of work is going into making Ensemble more secure and enterprise ready. As part of that, all deployed services are now firewalled by default. For a formula-deployed service to be publicly accessible, the formula author has to specify which ports are open and when, and the operator has to signal that the service should be exposed. All formulas that expose ports should use open-port (and optionally close-port) diligently. Here’s what you need to know.
Updating formulas for the new expose functionality
This is the only change necessary for the WordPress/MySQL example, in example/wordpress/hooks/db-relation-changed:
# Make it publicly visible, once the wordpress service is exposed
open-port 80/tcp
It is important that formulas open ports only when ready. So in the WordPress example, you wouldn’t want to do this port opening until Apache has been successfully configured and restarted. Otherwise, there’s a chance that users might see “It works!” before the desired page is available.
Firewall changes also are a two-step process. The hooks for a service unit need to open ports (and they can also close ports), but the Ensemble administrator must also expose the service. For the WordPress example, you can expose it any time after the service has been deployed with the following:
ensemble expose wordpress
Just expose the services you’re interested in exposing; you can do so as soon as they have been deployed. Again, it’s the formula author’s responsibility to ensure that port opening is done at the right time.
The service can be subsequently unexposed with
ensemble unexpose wordpress
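The two-step rule can be summed up in a few lines of Python. This is only a toy model of the behaviour described above, not Ensemble's implementation; the Service class and its method names are invented for the sketch. A port is reachable from outside only when the formula's hooks have opened it and the administrator has exposed the service:

```python
class Service:
    """Toy model of the two-step firewall rule (hypothetical, for illustration)."""

    def __init__(self):
        self.opened = set()   # ports opened by the formula's hooks
        self.exposed = False  # flipped by the administrator

    def open_port(self, port):
        self.opened.add(port)

    def close_port(self, port):
        self.opened.discard(port)

    def reachable(self):
        # Ports are public only when BOTH conditions hold.
        return set(self.opened) if self.exposed else set()


wordpress = Service()
wordpress.open_port("80/tcp")      # done by the db-relation-changed hook
before = wordpress.reachable()     # port opened, service not yet exposed
wordpress.exposed = True           # ensemble expose wordpress
after = wordpress.reachable()
wordpress.exposed = False          # ensemble unexpose wordpress
unexposed = wordpress.reachable()
print(before, after, unexposed)
```

Either side can keep the port shut: a hook that never calls open-port, or an administrator who never exposes the service.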
You can see if a service is exposed with ensemble status. This would result in output similar to the following:
This work is only part of the effort to ensure Ensemble uses secure mechanisms in its operations. Recent work also made sure all state information shared between cloud nodes is properly access controlled to avoid leaking any confidential data. Ensemble is rapidly progressing, and now is a great time to start playing with the technology, and to start writing your own formulas!
Interested? Join the friendly Ensemble community at #ubuntu-ensemble on IRC freenode, drop in, say hi, and grab me (kim0) for any questions
MongoDB is such a great piece of open-source technology. It supports some very interesting features such as sharding and replica sets. I have seen demos of MongoDB where the speaker happily calls creating the replica-set cluster a “one hour thing”! I decided to sprinkle some Ensemble magic on this problem. Using Juan’s formulas, the problem basically becomes a “10 second thing”! Spinning up a Mongo replica-set cluster could not be easier! Check this video out
Yep that’s how simple it is! If you want to create more read-slaves, you only need to ask Ensemble to do it for you:
$ ensemble add-unit mongodb
If you’re interested in learning more about exactly how this “magic” works, check out this in-depth guide by Juan Negron, the formula author, dissecting how the Mongo Ensemble formulas work.
So was this useful? Will you be deploying your next mongodb servers with Ensemble?
Leave me a comment, let me know your thoughts! Also let me know what you’d like to see deployed next with Ensemble. Be sure to drop in to #ubuntu-ensemble on freenode irc and say hi
// August 12th, 2011 // Comments Off // Uncategorized
** This is an updated post reflecting the new name of the project formerly known as Ensemble, now known as Juju **
A very popular database used by many companies and projects these days seems to be Cassandra.
From their website:
The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
I am by no means an expert on Cassandra, but I have done some medium-size deployments on Amazon's cloud, so I wanted to translate my knowledge of Cassandra "rings" and develop a Juju charm that could use the peers interface to expand and contract the ring as needed.
For the impatient, the cassandra juju charm is here
For the rest of us, let's move on to some details about the charm:
It should be simple ( juju deploy cassandra ... nothing more than that )
It should work stand-alone
It should be expandable via peers interfaces
grow the cluster/ring via juju add-unit cassandra
Make use of the Cassandra default configuration as much as possible.
Extract common variables from the configuration file(s) into the charm so they can be changed in the future.
The steps to install Cassandra can be distilled down to:
add repositories
install dependency packages
install cassandra
modify the configuration
/etc/cassandra/cassandra-env.sh
/etc/cassandra/cassandra.yaml
Now that we know the design goals and we have an idea on what's needed to get Cassandra up and running, let's delve into the charm.
metadata.yaml
name: cassandra
revision: 1
summary: distributed storage system for structured data
description: |
  Cassandra is a distributed (peer-to-peer) system for the management
  and storage of structured data.
provides:
  database:
    interface: cassandra
  jmx:
    interface: cassandra
peers:
  cluster:
    interface: cassandra-cluster
service cassandra status && service cassandra restart || service cassandra start
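The && / || chain in that one-liner is worth a second look. Below is a small Python model of how the shell evaluates it; the run_chain helper is made up for the sketch, and each callable stands in for one service command, returning True on success. If the service is not running, status fails and start runs. If status succeeds, restart runs, and start runs as well should the restart fail:

```python
def run_chain(status, restart, start):
    """Mimic `status && restart || start`; return the commands that ran."""
    calls = []

    def call(name, fn):
        calls.append(name)
        return fn()

    # `and` short-circuits like &&: restart only runs if status succeeded.
    ok = call("status", status) and call("restart", restart)
    # `|| start` runs whenever the left-hand side as a whole failed.
    if not ok:
        call("start", start)
    return calls


running = run_chain(lambda: True, lambda: True, lambda: True)
stopped = run_chain(lambda: False, lambda: True, lambda: True)
restart_failed = run_chain(lambda: True, lambda: False, lambda: True)
print(running, stopped, restart_failed)
```

The last case is the subtle one: the chain is not a strict if/else, so a failed restart falls through to a start attempt.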
Now we should have enough of a charm to deploy a single Cassandra node. The other hooks in the charm are:
jmx-relation-joined ( mainly to advertise our jmx interface )
database-relation-joined ( mainly to advertise our database interface )
cluster-relation-joined ( persists some values that need to be available to all nodes in the ring )
cluster-relation-changed ( we use the data persisted by cluster-relation-joined to reconfigure Cassandra so it shares data with the other nodes and forms a ring )
The most interesting hook of the ones above is the cluster-relation-changed one so, I'll show that one here:
bzr branch the Cassandra charm ( bzr branch lp:~negronjl/+junk/cassandra )
juju bootstrap ( wait a few minutes while the environment is set up )
negronjl@negronjl-laptop:~/src/juju/charms$ juju bootstrap
2011-08-11 20:45:26,976 INFO Bootstrapping environment 'sample' (type: ec2)...
2011-08-11 20:45:37,804 INFO 'bootstrap' command finished successfully
juju status ( to ensure that the environment is up )
negronjl@negronjl-laptop:~/src/juju/charms$ juju status
2011-08-11 20:47:57,196 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-16-150-73.compute-1.amazonaws.com, instance-id: i-57642336}
services: {}
2011-08-11 20:48:02,029 INFO 'status' command finished successfully
juju deploy --repository . cassandra ( to deploy the Cassandra charm )
negronjl@negronjl-laptop:~/src/juju/charms$ juju deploy --repository . cassandra
2011-08-11 20:48:41,251 INFO Connecting to environment.
2011-08-11 20:48:48,659 INFO Charm deployed as service: 'cassandra'
2011-08-11 20:48:48,662 INFO 'deploy' command finished successfully
juju status ( to ensure Cassandra deployed properly )
negronjl@negronjl-laptop:~/src/juju/charms$ juju status
2011-08-11 21:02:36,264 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-16-150-73.compute-1.amazonaws.com, instance-id: i-57642336}
  1: {dns-name: ec2-50-19-73-31.compute-1.amazonaws.com, instance-id: i-5f62253e}
services:
  cassandra:
    charm: local:cassandra-1
    relations: {cluster: cassandra}
    units:
      cassandra/0:
        machine: 1
        relations:
          cluster: {state: up}
        state: started <---- NOW IT IS READY
2011-08-11 21:02:42,506 INFO 'status' command finished successfully
juju ssh 1 ( this will ssh into the Cassandra machine )
Once in the Cassandra machine, verify the status of it by typing:
nodetool -h `hostname -f` ring
ubuntu@ip-10-245-211-95:~$ nodetool -h `hostname -f` ring
Address         DC          Rack   Status State   Load      Owns     Token
10.245.211.95   datacenter1 rack1  Up     Normal  6.55 KB   100.00%  124681228764612737621872162332718392045
Back on your machine ( not the Cassandra one ), type the following to add more Cassandra nodes:
juju add-unit cassandra ( repeat as many times as you want )
negronjl@negronjl-laptop:~/src/juju/charms$ juju status
2011-08-11 21:11:40,367 INFO Connecting to environment.
2011-08-11 21:11:54,132 INFO 'status' command finished successfully
After the new nodes have been properly deployed ( you can see the status of the deployment by running juju status ), log back on to the Cassandra node ( juju ssh 1 ) and type:
nodetool -h `hostname -f` ring ( to see that the new nodes are being added to the ring )
ubuntu@ip-10-245-211-95:~$ nodetool -h `hostname -f` ring
Address         DC          Rack   Status State   Load      Owns     Token
                                                                     124681228764612737621872162332718392045
10.38.33.97     datacenter1 rack1  Up     Normal  11.06 KB  69.21%   72298506053176682474361069083301352072
10.99.45.243    datacenter1 rack1  Up     Normal  15.34 KB  9.26%    88046943828017032654712668424156081726
10.245.211.95   datacenter1 rack1  Up     Normal  11.06 KB  21.53%   124681228764612737621872162332718392045
ubuntu@ip-10-245-211-95:~$
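If you would rather check the ring from a script than by eye, the nodetool output above is easy to parse. The following Python sketch assumes the column layout shown in this post (nine whitespace-separated fields per node row); parse_ring is a hypothetical helper, and other Cassandra versions may print the ring differently:

```python
RING_OUTPUT = """\
Address         DC          Rack   Status State   Load      Owns     Token
                                                                     124681228764612737621872162332718392045
10.38.33.97     datacenter1 rack1  Up     Normal  11.06 KB  69.21%   72298506053176682474361069083301352072
10.99.45.243    datacenter1 rack1  Up     Normal  15.34 KB  9.26%    88046943828017032654712668424156081726
10.245.211.95   datacenter1 rack1  Up     Normal  11.06 KB  21.53%   124681228764612737621872162332718392045
"""


def parse_ring(text):
    """Return one dict per node row found in `nodetool ring` output."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows have 9 fields (Load is a value plus a unit); the header
        # has 8 and the wrap-around token line has only 1.
        if len(fields) != 9 or not fields[7].endswith("%"):
            continue
        nodes.append({
            "address": fields[0],
            "status": fields[3],
            "state": fields[4],
            "owns": float(fields[7].rstrip("%")),
        })
    return nodes


ring = parse_ring(RING_OUTPUT)
total_owned = sum(node["owns"] for node in ring)
all_up = all(node["status"] == "Up" for node in ring)
print(len(ring), round(total_owned, 2), all_up)
```

A quick sanity check for a healthy ring is that every node is Up and the Owns column adds up to 100%.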
As you can see, once you create a charm on Juju, it's pretty easy to share and use.
If you have feedback about this ( or any other charm ), I would love to hear from you.
// August 12th, 2011 // 2 Comments » // Uncategorized
A very popular database used by many companies and projects these days seems to be Cassandra.
From their website:
The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
I am by no means an expert on Cassandra but, I have done some medium size deployments on Amazon's cloud so, I wanted to translate my knowledge of Cassandra "rings" and develop an Ensemble formula that could use their peers interfaces to expand and contract the ring as needed.
For the impatient, the cassandra ensemble formula is here
For the rest of us, let's move on to some details about the formula:
It should be simple ( ensemble deploy cassandra ... nothing more than that )
It should work stand-alone
It should be expandable via peers interfaces
grow the cluster/ring via ensemble add-unit cassandra
Make use of the Cassandra default configuration as much as possible.
Extract common variables from the configuration file(s) into the formula so they can be changed in the future.
The steps to install Cassandra can be distilled down to:
add repositories
install dependency packages
install cassandra
modify the configuration
/etc/cassandra/cassandra-env.sh
/etc/cassandra/cassandra.yaml
Now that we know the design goals and we have an idea on what's needed to get Cassandra up and running, let's delve into the formula.
metadata.yaml
ensemble: formula
name: cassandra
revision: 1
summary: distributed storage system for structured data
description: |
  Cassandra is a distributed (peer-to-peer) system for the management
  and storage of structured data.
provides:
  database:
    interface: cassandra
  jmx:
    interface: cassandra
peers:
  cluster:
    interface: cassandra-cluster
service cassandra status && service cassandra restart || service cassandra start
Now we should have enough of a formula to deploy a single Cassandra node. The other hooks in the formula are:
jmx-relation-joined ( mainly to advertise our jmx interface )
database-relation-joined ( mainly to advertise our database interface )
cluster-relation-joined ( persists some values that need to be available to all nodes in the ring )
cluster-relation-changed ( we use the data persisted by cluster-relation-joined to reconfigure Cassandra so it shares data with the other nodes and forms a ring )
The most interesting hook of the ones above is the cluster-relation-changed one so, I'll show that one here:
bzr branch the Cassandra formula ( bzr branch lp:~negronjl/+junk/cassandra )
ensemble bootstrap ( wait a few minutes while the environment is set up )
negronjl@negronjl-laptop:~/src/ensemble/formulas$ ensemble bootstrap
2011-08-11 20:45:26,976 INFO Bootstrapping environment 'sample' (type: ec2)...
2011-08-11 20:45:37,804 INFO 'bootstrap' command finished successfully
ensemble status ( to ensure that the environment is up )
negronjl@negronjl-laptop:~/src/ensemble/formulas$ ensemble status
2011-08-11 20:47:57,196 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-16-150-73.compute-1.amazonaws.com, instance-id: i-57642336}
services: {}
2011-08-11 20:48:02,029 INFO 'status' command finished successfully
ensemble deploy --repository . cassandra ( to deploy the Cassandra formula )
negronjl@negronjl-laptop:~/src/ensemble/formulas$ ensemble deploy --repository . cassandra
2011-08-11 20:48:41,251 INFO Connecting to environment.
2011-08-11 20:48:48,659 INFO Formula deployed as service: 'cassandra'
2011-08-11 20:48:48,662 INFO 'deploy' command finished successfully
ensemble status ( to ensure Cassandra deployed properly )
negronjl@negronjl-laptop:~/src/ensemble/formulas$ ensemble status
2011-08-11 21:02:36,264 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-16-150-73.compute-1.amazonaws.com, instance-id: i-57642336}
  1: {dns-name: ec2-50-19-73-31.compute-1.amazonaws.com, instance-id: i-5f62253e}
services:
  cassandra:
    formula: local:cassandra-1
    relations: {cluster: cassandra}
    units:
      cassandra/0:
        machine: 1
        relations:
          cluster: {state: up}
        state: started <---- NOW IT IS READY
2011-08-11 21:02:42,506 INFO 'status' command finished successfully
ensemble ssh 1 ( this will ssh into the Cassandra machine )
Once in the Cassandra machine, verify the status of it by typing:
nodetool -h `hostname -f` ring
ubuntu@ip-10-245-211-95:~$ nodetool -h `hostname -f` ring
Address         DC          Rack   Status State   Load      Owns     Token
10.245.211.95   datacenter1 rack1  Up     Normal  6.55 KB   100.00%  124681228764612737621872162332718392045
Back on your machine ( not the Cassandra one ), type the following to add more Cassandra nodes:
ensemble add-unit cassandra ( repeat as many times as you want )
negronjl@negronjl-laptop:~/src/ensemble/formulas$ ensemble status
2011-08-11 21:11:40,367 INFO Connecting to environment.
2011-08-11 21:11:54,132 INFO 'status' command finished successfully
After the new nodes have been properly deployed ( you can see the status of the deployment by running ensemble status ), log back on to the Cassandra node ( ensemble ssh 1 ) and type:
nodetool -h `hostname -f` ring ( to see that the new nodes are being added to the ring )
ubuntu@ip-10-245-211-95:~$ nodetool -h `hostname -f` ring
Address         DC          Rack   Status State   Load      Owns     Token
                                                                     124681228764612737621872162332718392045
10.38.33.97     datacenter1 rack1  Up     Normal  11.06 KB  69.21%   72298506053176682474361069083301352072
10.99.45.243    datacenter1 rack1  Up     Normal  15.34 KB  9.26%    88046943828017032654712668424156081726
10.245.211.95   datacenter1 rack1  Up     Normal  11.06 KB  21.53%   124681228764612737621872162332718392045
ubuntu@ip-10-245-211-95:~$
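The "NOW IT IS READY" check on the status output earlier in this walkthrough can also be automated. Here is a minimal Python sketch, assuming the status document has already been loaded into a dict with the same shape as the output shown (how you load it is up to you; units_not_started is a hypothetical helper, not part of Ensemble):

```python
def units_not_started(status, service):
    """Return the names of the service's units whose state is not 'started'."""
    units = status["services"][service]["units"]
    return sorted(name for name, unit in units.items()
                  if unit.get("state") != "started")


# The status output from the walkthrough, written out as a Python dict.
status = {
    "machines": {
        0: {"dns-name": "ec2-50-16-150-73.compute-1.amazonaws.com",
            "instance-id": "i-57642336"},
        1: {"dns-name": "ec2-50-19-73-31.compute-1.amazonaws.com",
            "instance-id": "i-5f62253e"},
    },
    "services": {
        "cassandra": {
            "formula": "local:cassandra-1",
            "relations": {"cluster": "cassandra"},
            "units": {
                "cassandra/0": {
                    "machine": 1,
                    "relations": {"cluster": {"state": "up"}},
                    "state": "started",
                },
            },
        },
    },
}

pending = units_not_started(status, "cassandra")
print(pending)
```

An empty list means every unit has reached the started state, so it is safe to ssh in and run nodetool.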
As you can see, once you create a formula on Ensemble, it's pretty easy to share and use.
If you have feedback about this ( or any other formula ), I would love to hear from you.
// August 11th, 2011 // Comments Off // Uncategorized
** This is an updated post reflecting the new name of the project formerly known as Ensemble, now known as Juju **
I have always liked MongoDB and, more recently, Juju, so it was only a matter of time until I came up with a MongoDB charm for Juju.
Here are some of the goals I set out to accomplish when I started working on this charm:
stand alone deployment.
replica sets. More information about replica sets can be found here.
master and server relationships
Don't try to solve all deployment scenarios; just concentrate on the above ones for now.
Let's start with the stand-alone deployment first and, we'll add the other functionality a bit later.
Before we go into creating the directories and files, I should probably mention Charm Tools. Charm Tools is ( as the name implies ) a set of tools that facilitates the creation of charms for juju.
You can get charm-tools for most supported releases of Ubuntu from the Juju PPA:
After installing charm-tools, go to the directory where you will be creating your charms and type the following to get started:
charm create mongodb
The above command will look in your cache for a package called mongodb and create a "skeleton" structure, with the metadata.yaml, hooks and descriptions already done for you, in a directory called ( you guessed it ) mongodb.
metadata.yaml
At this point in the development, the metadata.yaml file should look very similar to this:
name: mongodb
revision: 1
summary: An object/document-oriented database (metapackage)
description: |
  MongoDB is a high-performance, open source, schema-free document-
  oriented data store that's easy to deploy, manage and use. It's
  network accessible, written in C++ and offers the following features :
    * Collection oriented storage - easy storage of object-
      style data
    * Full index support, including on inner objects
    * Query profiling
    * Replication and fail-over support
    * Efficient storage of binary data including large
      objects (e.g. videos)
    * Auto-sharding for cloud-level scalability (Q209)
  High performance, scalability, and reasonable depth of
  functionality are the goals for the project.
  This is a metapackage that depends on all the mongodb parts.
provides:
  relation-name:
    interface: interface-name
requires:
  relation-name:
    interface: interface-name
peers:
  relation-name:
    interface: interface-name
For our purposes, let's change the emphasized lines to the following:
provides:
  database:
    interface: mongodb
peers:
  replica-set:
    interface: mongodb-replica-set
The peers section will be used when we start working with replica sets so, let's just ignore that one for now.
provides: is the way we "announce" what our particular charm ...well ... provides. In this case we provide a database interface by the name of mongodb.
There is not much else to do with the metadata.yaml file, as charm create did the brunt of the work for us.
hooks/install
charm create also took care of providing us with a basic install script based on the mongodb package already available in Ubuntu. It should look very similar to this:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo or bzr branch http://myserver/mycode /srv/webroot
apt-get install -y mongodb
After some trial and error and some debugging, here is what I came up with:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo or bzr branch http://myserver/mycode /srv/webroot
set -ux
##################################################################################
# Install some utility packages needed for installation
##################################################################################
rm -f /etc/apt/sources.list.d/facter-plugins-ppa-oneiric.list
echo deb http://ppa.launchpad.net/facter-plugins/ppa/ubuntu oneiric main >> /etc/apt/sources.list.d/facter-plugins-ppa-oneiric.list
echo deb-src http://ppa.launchpad.net/facter-plugins/ppa/ubuntu oneiric main >> /etc/apt/sources.list.d/facter-plugins-ppa-oneiric.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys B696B50DD8914A9290A4923D6383E098F7D4BE4B
##################################################################################
# Set some variables that we'll need for later
##################################################################################
##################################################################################
# Change the default mongodb configuration to bind to reflect that we are a master
##################################################################################
##################################################################################
# Reconfigure the upstart script to include the replica-set option.
# We'll need this so, when we add nodes, they can all talk to each other.
# Replica sets can only talk to each other if they all belong to the same
# set. In our case, we have defaulted to "myset".
##################################################################################
sed -i -e "s/ -- / -- --replSet ${DEFAULT_REPLSET_NAME} /" /etc/init/mongodb.conf
##################################################################################
# stop then start ( *** not restart **** ) mongodb so we can finish the configuration
##################################################################################
service mongodb stop
# There is a bug in the upstart script that leaves a lock file orphaned.... Let's wipe that file out
rm -f /var/lib/mongodb/mongod.lock
service mongodb start
##################################################################################
# Register the port
##################################################################################
[ -x /usr/bin/open-port ] && open-port 27017/TCP
I have tried to comment the install script so you have an idea of what's going on ...
hooks/start
This is the script that Juju will call to start mongodb. Here is what mine looks like:
#!/bin/bash
# Here put anything that is needed to start the service.
# Note that currently this is run directly after install
# i.e. 'service apache2 start'
service mongodb status && service mongodb restart || service mongodb start
It's simple enough.
hooks/stop
#!/bin/bash
# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30
service mongodb stop
rm -f /var/lib/mongodb/mongod.lock
This is the script that Juju calls when it needs to stop a service.
These files are templates for the relationships ( provides, requires, peers, etc. ) declared in the metadata.yaml file. Here is a look at the ones that I have for mongodb:
Per the metadata.yaml, we need to define the following relationships:
database
replica-set
Based on that information, here are the files that I created for this charm:
database-relation-joined
#!/bin/bash
# This must be renamed to the name of the relation. The goal here is to
# affect any change needed by relationships being formed
The above commands satisfy one of our design goals, standalone deployment. Let's check out the replica sets. Type this:
juju add-unit mongodb
And that's all that is needed to add a new mongodb node that will automatically create a replica set with the existing node. You can continue to "add-unit" to add more nodes to the replica set. Notice that all of the configuration is taken care of by the replica-set-relation-joined and replica-set-relation-changed hook scripts that we wrote above.
The beauty of this charm is that the user doesn't really have to know exactly what is needed to get a replica set cluster up and running. Juju charms are self-contained and idempotent. This means portability.
// August 11th, 2011 // 2 Comments » // Uncategorized
I have always liked MongoDB and, more recently, Ensemble, so it was only a matter of time until I came up with a MongoDB formula for Ensemble.
Here are some of the goals I set out to accomplish when I started working on this formula:
stand alone deployment.
replica sets. More information about replica sets here.
master and server relationships
Don't try to solve all deployment scenarios; just concentrate on the above ones for now.
Let's start with the stand-alone deployment first, and we'll add the other functionality a bit later.
Before we go into creating the directories and files, I should probably mention Principia Tools. Principia Tools is ( as the name implies ) a set of tools that facilitates the creation of formulas for ensemble.
You can get principia-tools for most supported releases of Ubuntu from the Ensemble PPA:
After installing principia-tools, go to the directory where you will be creating your formulas and type the following to get started:
principia formulate mongodb
The above command will look in your cache for a package called mongodb and create a "skeleton" structure, with the metadata.yaml, hooks and descriptions already done for you, in a directory called ( you guessed it ) mongodb.
metadata.yaml
At this point in the development, the metadata.yaml file should look very similar to this:
ensemble: formula
name: mongodb
revision: 1
summary: An object/document-oriented database (metapackage)
description: |
  MongoDB is a high-performance, open source, schema-free document-
  oriented data store that's easy to deploy, manage and use. It's
  network accessible, written in C++ and offers the following features :
    * Collection oriented storage - easy storage of object-
      style data
    * Full index support, including on inner objects
    * Query profiling
    * Replication and fail-over support
    * Efficient storage of binary data including large objects (e.g.
      videos)
    * Auto-sharding for cloud-level scalability (Q209)
  High performance, scalability, and reasonable depth of
  functionality are the goals for the project.
  This is a metapackage that depends on all the mongodb parts.
provides:
  relation-name:
    interface: interface-name
requires:
  relation-name:
    interface: interface-name
peers:
  relation-name:
    interface: interface-name
For our purposes, let's replace the placeholder provides, requires and peers sections at the end of the file with the following:
provides:
  database:
    interface: mongodb
peers:
  replica-set:
    interface: mongodb-replica-set
The peers section will be used when we start working with replica sets, so let's just ignore that one for now.
provides: is the way we "announce" what our particular formula ...well ... provides. In this case we provide a relation named database that uses the mongodb interface.
There is not much else to do with the metadata.yaml file, as principia formulate did the bulk of the work here for us.
hooks/install
formulate also took care of providing us with a basic install script based on the mongodb package already available in Ubuntu. It should look very similar to this:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo or bzr branch http://myserver/mycode /srv/webroot
apt-get install -y mongodb
After some trial and error and some debugging, here is what I came up with:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo or bzr branch http://myserver/mycode /srv/webroot
set -ux

################################################################################
# Install some utility packages needed for installation
################################################################################
rm -f /etc/apt/sources.list.d/facter-plugins-ppa-oneiric.list
echo deb http://ppa.launchpad.net/facter-plugins/ppa/ubuntu oneiric main >> /etc/apt/sources.list.d/facter-plugins-ppa-oneiric.list
echo deb-src http://ppa.launchpad.net/facter-plugins/ppa/ubuntu oneiric main >> /etc/apt/sources.list.d/facter-plugins-ppa-oneiric.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys B696B50DD8914A9290A4923D6383E098F7D4BE4B

################################################################################
# Set some variables that we'll need for later
################################################################################

################################################################################
# Change the default mongodb configuration to bind to reflect that we are a master
################################################################################

################################################################################
# Reconfigure the upstart script to include the replica-set option.
# We'll need this so, when we add nodes, they can all talk to each other.
# Replica sets can only talk to each other if they all belong to the same
# set. In our case, we have defaulted to "myset".
################################################################################
sed -i -e "s/ -- / -- --replSet ${DEFAULT_REPLSET_NAME} /" /etc/init/mongodb.conf

################################################################################
# stop then start ( *** not restart **** ) mongodb so we can finish the configuration
################################################################################
service mongodb stop
# There is a bug in the upstart script that leaves a lock file orphaned.... Let's wipe that file out
rm -f /var/lib/mongodb/mongod.lock
service mongodb start

################################################################################
# Register the port
################################################################################
[ -x /usr/bin/open-port ] && open-port 27017/TCP
I have tried to comment the install script so you have an idea of what's going on ...
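One thing worth watching in the sed line above: if the hook is ever run a second time, it appends another --replSet option to the upstart job. Here is a minimal sketch of a guarded version. The helper name add_replset_option and its file-path parameter are my own illustration, not part of the formula, and the demo runs against a scratch file (with a made-up exec line) so it never touches /etc/init/mongodb.conf.

```shell
#!/bin/bash
# Sketch only: add_replset_option is a hypothetical helper, not part of the
# original formula. It inserts "--replSet <name>" after the " -- " separator
# in an upstart job file, but only if the option is not already present.
add_replset_option() {
    local conf_file="$1" replset_name="$2"
    if grep -q -- "--replSet" "$conf_file"; then
        return 0    # already configured, nothing to do
    fi
    sed -i -e "s/ -- / -- --replSet ${replset_name} /" "$conf_file"
}

# Demo against a scratch file that stands in for /etc/init/mongodb.conf.
# The exec line below is made up for the demo.
demo_conf="$(mktemp)"
echo 'exec start-stop-daemon --start --exec /usr/bin/mongod -- --config /etc/mongodb.conf' > "$demo_conf"
add_replset_option "$demo_conf" myset
add_replset_option "$demo_conf" myset    # second run leaves the file unchanged
cat "$demo_conf"
```

The guard is a plain grep, so it only detects that some --replSet option exists; it does not check that the set name matches.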
hooks/start
This is the script that Ensemble will call to start mongodb. Here is what mine looks like:
#!/bin/bash
# Here put anything that is needed to start the service.
# Note that currently this is run directly after install
# i.e. 'service apache2 start'
service mongodb status && service mongodb restart || service mongodb start
It's simple enough.
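The one-liner reads as: if mongodb is already running, restart it, otherwise start it. One subtlety of the "a && b || c" form is that c also runs when b fails, so a failed restart falls through to start. The sketch below spells the same decision out with if/else. Note that svc is a stand-in function I made up so the example runs without a real mongodb service, and the RUNNING variable exists only for this demo.

```shell
#!/bin/bash
# Sketch only: "svc" is a stand-in for the real "service" command so that this
# example can run anywhere. RUNNING simulates whether the daemon is up.
RUNNING=no
svc() {
    local name="$1" action="$2"
    case "$action" in
        status)        [ "$RUNNING" = yes ] ;;              # exit 0 only when running
        start|restart) RUNNING=yes; echo "$name: $action" ;;
    esac
}

# Same decision as "status && restart || start", written out explicitly.
start_or_restart() {
    local name="$1"
    if svc "$name" status; then
        svc "$name" restart
    else
        svc "$name" start
    fi
}

start_or_restart mongodb    # first call: the service is down, so it is started
start_or_restart mongodb    # second call: the service is up, so it is restarted
```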
hooks/stop
#!/bin/bash
# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30
service mongodb stop
rm -f /var/lib/mongodb/mongod.lock
This is the script that Ensemble calls when it needs to stop a service.
These files are templates for the relationships ( provides, requires, peers, etc. ) declared in the metadata.yaml file. Here is a look at the ones that I have for mongodb:
Per the metadata.yaml, we need to define the following relationships:
database
replica-set
Based on that information, here are the files that I created for this formula:
database-relation-joined
#!/bin/bash
# This must be renamed to the name of the relation. The goal here is to
# affect any change needed by relationships being formed
The above commands satisfy one of our design goals, standalone deployment. Let's check out the replica sets. Type this:
ensemble add-unit mongodb
And that's all that is needed to add a new mongodb node that will automatically create a replica set with the existing node. You can continue to "add-unit" to add more nodes to the replica set. Notice that all of the configuration is taken care of by the replica-set-relation-joined and replica-set-relation-changed hook scripts that we wrote above.
The beauty of this formula is that the user doesn't really have to know exactly what is needed to get a replica set cluster up and running. Ensemble formulas are self-contained and idempotent. This means portability.
// August 8th, 2011 // 7 Comments » // Uncategorized
A while back I started experimenting with Ensemble and was intrigued by the notion of services instead of machines.
A bit of background on Ensemble from their website:
Ensemble is a next generation service orchestration framework. It has been likened to APT for the cloud. With Ensemble, different authors are able to create service formulas independently, and make those services coordinate their communication through a simple protocol. Users can then take the product of different authors and very comfortably deploy those services in an environment. The result is multiple machines and components transparently collaborating towards providing the requested service.
I come from a DevOps background and know firsthand the trials and tribulations of deploying production services, webapps, etc. One that's particularly "thorny" is hadoop.
To deploy a hadoop cluster, we would need to download the dependencies ( java, etc. ), download hadoop, configure it and deploy it. This process is somewhat different depending on the type of node that you're deploying ( ie: namenode, job-tracker, etc. ). This is a multi-step process that requires too much human intervention. It is also a process that is difficult to automate and reproduce. Imagine deploying a 10, 20 or 50 node cluster using this method. It gets frustrating quickly and it is prone to mistakes.
With this experience in mind ( and a lot of reading ), I set out to deploy a hadoop cluster using an Ensemble formula.
First things first, let's install Ensemble. Follow the Getting Started documentation on the Ensemble site here.
According to the Ensemble documentation, we just need to follow some file naming conventions for what they call "hooks" ( executable scripts in your language of choice that perform certain actions ). These "hooks" control the installation, relationships, start, stop, etc. of your formula. We also need to summarize the description of the formula in a file called metadata.yaml. The metadata.yaml file describes the formula, its interfaces, and what it requires and provides, among other things. More on this file later when I show you the one for hadoop-master and hadoop-slave.
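As a rough illustration of those conventions, the sketch below checks that a formula directory contains a metadata.yaml file plus executable install, start and stop hooks. The check_formula function and the scratch demo-formula directory are made up for this example; this is not an Ensemble command.

```shell
#!/bin/bash
# Sketch only: check_formula is a hypothetical helper, not an Ensemble command.
# It reports whether a formula directory follows the naming conventions
# described above: a metadata.yaml file and executable hooks in hooks/.
check_formula() {
    local dir="$1" hook status=0
    if [ ! -f "$dir/metadata.yaml" ]; then
        echo "missing: $dir/metadata.yaml"
        status=1
    fi
    for hook in install start stop; do
        if [ ! -x "$dir/hooks/$hook" ]; then
            echo "missing or not executable: $dir/hooks/$hook"
            status=1
        fi
    done
    return "$status"
}

# Demo: build a throwaway formula skeleton and check it.
demo_dir="$(mktemp -d)/demo-formula"
mkdir -p "$demo_dir/hooks"
touch "$demo_dir/metadata.yaml"
for hook in install start stop; do
    printf '#!/bin/bash\n' > "$demo_dir/hooks/$hook"
    chmod +x "$demo_dir/hooks/$hook"
done
check_formula "$demo_dir" && echo "layout looks fine"
```

Relation hooks ( for example hadoop-master-relation-joined ) live in the same hooks directory, so the loop could be extended with those names once the relations are known.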
Armed with a bit of knowledge and a desire for simplicity, I decided to split the hadoop cluster in two:
hadoop-master (namenode and jobtracker )
hadoop-slave ( datanode and tasktracker )
I know this is not an all-encompassing list, but it will take care of a good portion of deployments, and the ensemble formulas are easy enough to modify that you can work your changes into them.
One of my colleagues, Brian Thomason, did a lot of packaging for these formulas, so my job is now easier. The configuration for the packages has been distilled down to three questions:
namenode ( leave blank if you are the namenode )
jobtracker ( leave blank if you are the jobtracker )
hdfs data directory ( leave blank to use the default: /var/lib/hadoop-0.20/dfs/data )
Due to the magic of Ubuntu packaging, we can even "preseed" the answers to those questions to avoid being asked about them ( and stopping the otherwise automatic process ). We'll use the utility debconf-set-selections for this. Here is a piece of the code that I use to preseed the values in my formula:
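A minimal sketch of what such preseeding can look like follows. debconf-set-selections reads lines of the form "owner question type value" on standard input. The owner ( hadoop ) and the question names ( hadoop/namenode, hadoop/jobtracker, hadoop/hdfsdatadir ) are placeholders that I invented for this example; the real names come from the packages themselves and can be listed with debconf-get-selections ( from the debconf-utils package ) once a package is installed.

```shell
#!/bin/bash
# Sketch only: the owner "hadoop" and the question names below are placeholders,
# not the real template names from the hadoop packages.
# Prints preseed lines in the "owner question type value" format that
# debconf-set-selections expects on standard input.
hadoop_preseed() {
    local namenode="$1" jobtracker="$2" datadir="$3"
    cat <<EOF
hadoop hadoop/namenode string ${namenode}
hadoop hadoop/jobtracker string ${jobtracker}
hadoop hadoop/hdfsdatadir string ${datadir}
EOF
}

# Print the answers for a slave pointing at a master (hostname is made up).
hadoop_preseed master.example.com master.example.com /var/lib/hadoop-0.20/dfs/data

# On a real machine the output would be piped into debconf before installing:
#   hadoop_preseed "$NAMENODE" "$JOBTRACKER" "$DATADIR" | debconf-set-selections
```

Generating the lines in a function keeps the values in one place, which is handy when the same hook has to preseed different answers for the master and the slaves.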
Thanks to Brian's work, I now just have to install the packages ( hadoop-0.20-namenode and hadoop-0.20-jobtracker). Let's put all of this together into an ensemble formula.
Create a directory for the hadoop-master formula ( mkdir hadoop-master )
Make a directory for the hooks of this formula ( mkdir hadoop-master/hooks )
Let's start with the always needed metadata.yaml file ( hadoop-master/metadata.yaml ):
ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem. The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster. The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master
Every Ensemble formula has an install script ( in our case: hadoop-master/hooks/install ). This is an executable file in your language of choice that ensemble will run when it's time to install your formula. Anything and everything that needs to happen for your formula to install, needs to be inside of that file. Let's take a look at the install script of hadoop-master:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo or bzr branch http://myserver/mycode /srv/webroot
There are a few other files that we need to create ( start and stop ) to get the hadoop-master formula installed. Let's see those files:
start
#!/bin/bash
# Here put anything that is needed to start the service.
# Note that currently this is run directly after install
# i.e. 'service apache2 start'
set -x
service hadoop-0.20-namenode status && service hadoop-0.20-namenode restart || service hadoop-0.20-namenode start
service hadoop-0.20-jobtracker status && service hadoop-0.20-jobtracker restart || service hadoop-0.20-jobtracker start
stop
#!/bin/bash
# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30
set -x
ensemble-log "stop script"
service hadoop-0.20-namenode stop
service hadoop-0.20-jobtracker stop
Let's go back to the metadata.yaml file and examine it in more detail:
ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem. The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster. The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master
The emphasized section ( provides ) tells ensemble that this formula provides an interface named hadoop-master that can be used in relationships with other formulas ( in our case we'll be using it to connect the hadoop-master with the hadoop-slave formula that we'll be writing a bit later ). For this relationship to work, we need to let Ensemble know what to do ( More detailed information about relationships in formulas can be found here ).
Per the Ensemble documentation, we need to name our relationship hook hadoop-master-relation-joined, and it must also be an executable script in your language of choice. Let's see what that file looks like:
#!/bin/sh
# This must be renamed to the name of the relation. The goal here is to
# affect any change needed by relationships being formed
# This script should be idempotent.
set -x
ensemble-log "joined script started"
# Calculate our IP Address
IP_ADDRESS=`hostname -f`
# Preseed our Namenode, Jobtracker and HDFS Data directory
ensemble add-relation hadoop-slave hadoop-master # ( connects the hadoop-slave to the hadoop-master )
As you can see, once you have the formula written and tested, deploying the cluster is really a matter of a few commands. The above example gives you one hadoop-master ( namenode, jobtracker ) and one hadoop-slave ( datanode, tasktracker ).
To add another node to this existing hadoop cluster, we run:
ensemble add-unit hadoop-slave # ( this adds one more slave )
Run the above command multiple times to continue to add hadoop-slave nodes to your cluster.
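If you want several slaves at once, the repetition can be wrapped in a small loop. This is only a sketch: add_slaves and the DRY_RUN switch are names I made up, and in dry-run mode ( the default here ) the function just prints the commands instead of calling ensemble.

```shell
#!/bin/bash
# Sketch only: add_slaves is a hypothetical wrapper around the
# "ensemble add-unit hadoop-slave" command shown above. With DRY_RUN=yes
# (the default here) it only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-yes}"

add_slaves() {
    local count="$1" i
    for ((i = 0; i < count; i++)); do
        if [ "$DRY_RUN" = yes ]; then
            echo "ensemble add-unit hadoop-slave"
        else
            ensemble add-unit hadoop-slave
        fi
    done
}

add_slaves 3    # in dry-run mode this prints the add-unit command three times
```

Set DRY_RUN=no on a machine with a bootstrapped Ensemble environment to actually add the units.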
Ensemble allows you to catalog the steps needed to get your service/application installed, configured and running properly. Once your knowledge has been captured in an ensemble formula, it can be re-used by you or others without much knowledge of what's needed to get the application/service running.
In the DevOps world, this code re-usability can save time, effort and money by providing self-contained formulas that deliver a service or application.
So you wanted to play with hadoop to crunch on some big-data problems, except that, well, getting a hadoop cluster up and running is not exactly a one minute thing! Let me show you how to make it “a one minute thing” using Ensemble! Since Ensemble now has formulas for creating hadoop master and slave nodes, thanks to the great work of Juan Negron, spinning up a hadoop cluster could not be easier! Check this video out
Yep that’s how simple it is! If you want to scale-out the cluster, you only need to ask Ensemble to do it for you:
$ ensemble add-unit hadoop-slave
If you’re interested to learn more about exactly how this “magic” works, check out this in-depth guide dissecting exactly how the hadoop Ensemble formulas work, by none other than Juan Negron, the formula author.
So is this easier than configuring a hadoop cluster manually? Leave me a comment, let me know your thoughts! Also let me know what you’d like to see deployed next with Ensemble. Be sure to drop in to #ubuntu-ensemble on freenode irc and say hi