Installation

Procedural Overview 1, how to get my complete cluster up and running?

There are four relatively simple stages. Going from beginning to end takes around an hour when using the tools provided and resources from Amazon. Without these tools the configuration can become complicated, especially at first.

1 Allocate CentOS 6 resources (using aws/up_aws_cluster.py, vagrant up.sh, or any other means). There must be one admin node in the cluster with passwordless ssh access to the rest (using ssh keys; this is set up automatically by aws/up_aws_cluster.py) and with ssh access from the outside world. If you want to use automatic deployment with blueprints, the admin node's root user must also have passwordless ssh access to itself (needed for some steps in Ambari).
2 Install Ambari on the admin node (done automatically by up_aws_cluster.py)
3 Decide the distribution of services across nodes (create a blueprint and cluster template if you want)
4 Configure cluster services (deployed from the blueprint using deploy_from_blueprint.py, or manually through a combination of the web interface and command-line tools)

Example of creating a cluster on Amazon and deploying with a blueprint

# Performs steps 1 and 2 together, a security config file tells which subnet/security-group/keys to use
aws/up_aws_cluster.py management clusters/aws_management_config.json clusters/example_security_config.json
# Performs step 3 and 4: deploying a blueprint to this cluster needs two config files and takes ~2 minutes
deploy_from_blueprint.py blueprints/management.blueprint.json blueprints/management.cluster.json 54.154.25.186  clusters/example_security_config.json

Procedural Overview 2, how to test/develop my new service installation?

You first need to know how to do a standard install of this component on a blank CentOS 6 machine. Once you know how that goes, you're ready to wrap it in Ambari, following the Architecture wiki.

You can test your wrapper easily, in much the same way as above, but using the default blueprint and a single-node cluster to begin with.

1 create new test amazon machine

aws/up_aws_cluster.py test-some-service clusters/aws_singlenode_config.json clusters/example_security_config.json

2 deploy default blueprint to fast-track the cluster creation

# deploying a blueprint to this cluster needs two config files and takes ~2 minutes
deploy_from_blueprint.py blueprints/default.blueprint.json blueprints/default.cluster.json 54.154.25.186  clusters/example_security_config.json

3 visit the machine and check out your branch of Ambari

./aws/connect_to.py #choose machine
#fetch AmbariKave from git if you haven't already got it (it should already be there)
#if it was put there for you by up_aws_cluster, you probably need to set up ssh keys, which is easy ...
# export GIT_SSH=$HOME/gitwrap.sh
cd AmbariKave
git pull #if this fails you probably need to export GIT_SSH=$HOME/gitwrap.sh, or add your key
git checkout MyBranch
bin/patch.sh
ambari-server restart

4 keep trying this until your service installation works
4.1 pull your latest changes

git pull

4.2 apply as patch

bin/patch.sh
ambari-server restart

4.3 start installation

bin/service.sh install MYNEWSERVICE -h ambari.localdomain

4.4 monitor through ambari web interface (you can also sometimes re-start installations from the web interface...)
4.5 edit/commit/push/pull as you think is necessary
4.6 repeat stage 4 until it's working
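
The loop in stage 4 can be scripted. Below is a minimal Python 3 sketch, assuming it is run from inside the AmbariKave checkout on the Ambari node. The helper names dev_cycle_commands and run_dev_cycle are illustrative and not part of AmbariKave; only the commands themselves come from the steps above.

```python
import subprocess


def dev_cycle_commands(service, host):
    """Return the commands for one pass through stage 4, in order."""
    return [
        ["git", "pull"],                                      # 4.1 pull your latest changes
        ["bin/patch.sh"],                                     # 4.2 apply as patch
        ["ambari-server", "restart"],
        ["bin/service.sh", "install", service, "-h", host],   # 4.3 start installation
    ]


def run_dev_cycle(service, host, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in dev_cycle_commands(service, host):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)  # stops at the first failing command
```

Calling run_dev_cycle("MYNEWSERVICE", "ambari.localdomain") only prints the commands; pass dry_run=False to actually run them. Monitoring (4.4) still happens in the Ambari web interface.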

Once it works one time, you are nearly there, but it is best to check on a fresh machine by repeating from step 1. Once that works, take a look at the blueprint of the (default) cluster you created and use it to make a full blueprint including your new service.

If the deployment also works from that complete blueprint, you can be confident that the service installs correctly.

AWS: Development env on amazon with default deployment

See AWS-CLI and deployment readme

We can create a set of machines with Ambari installed very simply on Amazon with a few clicks. Any node marked as "admin" will automatically have Ambari installed, but not configured.

You only need to remember the IP address of the Ambari node and copy its private key for the other machines to your local machine, so that you can later upload it through the web interface.

scp -i my-amazon-key root@ambariamazonnodeip:.ssh/id_rsa ambari_priv.pem

Vagrant: Development environment on VirtualBox

If you want to install on your laptop instead, follow this part, which is based on the vagrant installation manual; that manual is really good and pretty easy to follow. We diverge from it slightly after the machine is booted.

Vagrant: Ambari vagrant setup

We copied this from the base manual, but diverge from it at a certain point.

After you have installed VirtualBox and Vagrant on your computer, check out the “ambari-vagrant” repo on github:

git clone https://github.com/u39kun/ambari-vagrant.git

Edit your /etc/hosts on your computer so that you will be able to resolve hostnames for the VMs:

cat ambari-vagrant/append-to-etc-hosts.txt | sudo tee -a /etc/hosts

Note: don't forget to uncomment the required /etc/hosts lines!

Copy the private key to your home directory (or some place convenient for you) so that it’s easily accessible for uploading via Ambari Web:

vagrant

The above command shows the command usage and also creates a private key as ~/.vagrant.d/insecure_private_key. This key will be used in the following steps.

Starting VMs

First, change directory to ambari-vagrant:

cd ambari-vagrant

You will see subdirectories for different OS’s. “cd” into the OS that you want to test. centos6.4 is recommended as this is quicker to launch than other OS's. Now you can start VMs with the following command:

cd centos6.4
cp ~/.vagrant.d/insecure_private_key .
./up.sh <# of VMs to launch>

For example, up.sh 3 starts 3 VMs. 3 seems to be a good number with 16GB of RAM without taxing the system too much. With the default Vagrantfile, you can specify up to 10 (if your computer can handle it; you can even add more).

VMs will have the FQDN <os-code>[01-10].ambari.apache.org, where <os-code> is c59 (CentOS 5.9), c64 (CentOS 6.4), etc. E.g., c5901.ambari.apache.org, c6401.ambari.apache.org, etc.

VMs will have the IP address 192.168.<os-code>.1[01-10], where <os-code> is 59 for CentOS 5.9, 64 for CentOS 6.4, etc. E.g., 192.168.59.101, 192.168.64.101, etc.

Note that the up.sh 3 command is equivalent to doing something like: vagrant up /c640[1-3]/
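
The naming scheme can be made concrete with a small Python 3 sketch that derives the FQDN and IP address of a VM from its OS code and number. The function name vm_address is illustrative and not part of ambari-vagrant; the patterns are taken from the examples above (c6401.ambari.apache.org, 192.168.64.101).

```python
def vm_address(os_code, index):
    """Return (fqdn, ip) for VM number `index` (1-10) of an OS code such as "c64"."""
    if not 1 <= index <= 10:
        raise ValueError("the default Vagrantfile defines VMs 01 to 10")
    digits = os_code.lstrip("c")                          # "c64" -> "64"
    fqdn = "%s%02d.ambari.apache.org" % (os_code, index)
    ip = "192.168.%s.1%02d" % (digits, index)
    return fqdn, ip


for i in range(1, 4):                                     # the three VMs started by `up.sh 3`
    print("%s  %s" % vm_address("c64", i))
```

The printed pairs have the same form as the lines you need to have active in /etc/hosts.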

Vagrant: Testing Vagrant

If it is your first time running a vagrant command, run:

vagrant init

Log into the VM:

vagrant ssh c6401

Installation: Turn off firewall

Turn off the firewall and edit the config so that the firewall won't start at boot in the future.

sudo service iptables stop
sudo chkconfig iptables off

Installation: Tests

The following commands should let you know if the cluster is ready:

 sudo su #you must be able to become the root user
 ssh root@localhost #once you are the root user, ssh as yourself should not prompt for password!, see here: https://www.cs.utah.edu/~bigler/code/sshkeys.html
 ssh <whatever-local-node> #as the root user should not prompt for a password!
 service iptables status #should tell you this service is not running!
 sestatus #should tell you selinux is in permissive mode!
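
If you want to automate the last check, the sketch below (Python 3) parses the text printed by sestatus and reports whether the current mode is permissive. It assumes the usual "Current mode:" line in the sestatus output; the function names are illustrative and not part of AmbariKave.

```python
import subprocess


def selinux_mode(sestatus_output):
    """Return the value of the 'Current mode:' line, or None if there is no such line."""
    for line in sestatus_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Current mode":
            return value.strip()
    return None


def selinux_is_permissive(sestatus_output):
    """True only when sestatus reports permissive mode."""
    return selinux_mode(sestatus_output) == "permissive"


def check_local_selinux():
    """Run sestatus on this machine and apply the check (call this on the node itself)."""
    output = subprocess.check_output(["sestatus"]).decode()
    return selinux_is_permissive(output)
```

Call check_local_selinux() as root on each node; the parsing functions can also be fed output collected over ssh.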

Installation: AmbariKave

So we now have a blank centos machine. This is not really that impressive yet. We will now install Ambari and patch it with our custom code.

First, consider which user the checkout should be under. You can install and configure everything as a sudoer, but then for blueprints your sudoer will need passwordless ssh access to the root user, which might not be as secure as you would like. In that case, run sudo su first so that the whole installation is made as the root user. Then decide where in git you are going to check out the project from; in this example, GitHub is used.

    sudo yum install git
    #SETUP SSH KEYS!
    git clone git@github.com:KaveIO/AmbariKave.git
    cd AmbariKave
    sudo bin/install.sh
    sudo bin/patch.sh
    sudo ambari-server start

Updating AmbariKave correctly

To update the version of our ambari installer correctly, do:

Connect to your ambari/admin node
Checkout the new code from git and cd into that directory
sudo ambari-server stop
sudo bin/patch.sh
sudo ambari-server start

Web: Configuring AmbariKave through the web interface, step 1: hadoop components

This can be done either through the web interface or with blueprints using the command-line tools (below).

You can now install Hadoop components through the web interface.

Now you should start the installation procedure from your browser. This can be done by visiting:

http://[ip-or-vagrant-name]:8080/
e.g.:
http://c6401:8080/ (this corresponds to CentOS 6.4 machine 1; look up the correct address for your machine in the /etc/hosts file)
or
http://[amazon-public-ip]:8080/

In the next steps don't forget to replace c#### with your real hostname or, on Amazon, your IP address.

The login is:

Username: admin
Password: admin

Name the cluster any name you prefer, select HDP 2.2.KAVE and then hit Next. In the following step you can provide the hostnames of all the machines in the cluster, e.g. c6402.ambari.apache.org, c6403.ambari.apache.org and so on. Also take the private key from the previous steps (the insecure key) and upload it in the web console. This allows Ambari to log in to the machines and do its magic. To begin with, you must install only one node, and this node has to run some key services. These services must be installed for Ambari to work properly at the moment, but you can turn them off later.

ONLY INSTALL ONE NODE.
This node must have Ganglia and Nagios. DO NOT SELECT OPENLDAP OR GITLABS
N.B.: You must use the same host name as the result of "hostname" on the target machine. Hostname, when there is no specified domain, looks like "amachine".localdomain. If you have aliased this machine in your /etc/hosts file, then make sure you have aliased it also including a domain, even if it is localdomain
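
A quick way to check the hostname rule before registering a host is sketched below in Python 3. The helper names has_domain and name_for_ambari are illustrative and not part of Ambari or AmbariKave; socket.gethostname() returns the same name as the hostname command.

```python
import socket


def has_domain(hostname):
    """True if the name has a non-empty domain part, e.g. 'amachine.localdomain'."""
    host, dot, domain = hostname.partition(".")
    return bool(host) and bool(dot) and bool(domain)


def name_for_ambari():
    """Return this machine's hostname, or raise if it lacks a domain."""
    name = socket.gethostname()
    if not has_domain(name):
        raise ValueError(
            "hostname %r has no domain; alias it with one in /etc/hosts" % name
        )
    return name
```

Run name_for_ambari() on the target machine and use the value it returns when adding the host in the web interface.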

Now let's get ready to install GitLab.

Add a new host through the web interface and install only the client software on this host. (restart ganglia & nagios afterwards if you like.)

Command Line: Installing OPENLDAP

This should be as easy as:

bin/service.sh install OPENLDAP -h c6402.ambari.apache.org

This will prompt you for some parameters. This is NOT a friendly dialog. Typ0's will come back to haunt you! You should fill these in with something reasonable. An example would be:

Value for base?
dc=kave,dc=org

Value for root_user?
cn=root,dc=kave,dc=org

Value for root_password?
bla123

Value for bind_user?
cn=bind,dc=kave,dc=org

Value for bind_password?
bla123

Value for database_directory?
/var/db/openldap/kave

Value for phpldapadmin_user?
bind

Value for phpldapadmin_password?
bind

To check whether it worked, try visiting: http://[ip-or-hostname]/phpldapadmin

Command Line: Installing GITLAB

This should be as easy as:

bin/service.sh install GITLAB -h c6403.ambari.apache.org

This will prompt you for some parameters. This is NOT a friendly dialog. Typ0's will come back to haunt you! You should fill these in with something reasonable. An example would be:

Value for gitlab_port?
80

Value for ldap_enabled?
true

Value for ldap_host?
c6402.ambari.apache.org

Value for ldap_port?
389

Value for ldap_uid?
uid

Value for ldap_method?
plain

Value for ldap_bind_dn?
cn=bind,dc=kave,dc=org

Value for ldap_password?
bla123

Value for ldap_allow_username_or_email_login?
true

Value for ldap_base?
dc=kave,dc=org

Value for restrict_public_projects?
true

Value for gitlab_signin_enabled?
false

After the installation is complete you can visit phpldapadmin and GitLab at:

http://c6402.ambari.apache.org/
http://c6403.ambari.apache.org/

Log in with username:password root:5iveL!fe

Note: If you get the default Apache page, you might need to restart some of the services on the host. Check your dashboard to see which services require a restart.

Blueprints: Configuring Ambari through Blueprints

https://cwiki.apache.org/confluence/display/AMBARI/Blueprints

http://www.pythian.com/blog/ambari-blueprints-and-one-touch-hadoop-clusters/

https://cwiki.apache.org/confluence/display/AMBARI/Adding+a+New+Service+to+an+Existing+Cluster

A blueprint can store all the required configuration parameters and then deploy an entire cluster completely.

Blueprints: Details on using blueprints

A blueprint: Defines the services running on certain host_groups and the common configurations of all those services
A cluster configuration: allocates hosts to those host groups, and the components will then be installed on them.

Blueprints: test access to ambari configuration server

curl --user admin:admin http://{your.ambari.server}:8080/api/v1/clusters

Blueprints: Save a blueprint from an existing cluster

curl -H "X-Requested-By: ambari" -X GET -u admin:admin "http://{your.ambari.server}:8080/api/v1/clusters/[clustername]?format=blueprint"

e.g. for a cluster named test on the host ambarinodename:

curl -H "X-Requested-By: ambari" -X GET -u admin:admin "http://ambarinodename:8080/api/v1/clusters/test?format=blueprint"
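
The same export can be done from Python 3 with only the standard library, which is handy if you want to save the blueprint straight to a file. This is a sketch that mirrors the curl call above; the function names and the default admin:admin credentials are assumptions for illustration.

```python
import base64
import urllib.request


def blueprint_request(server, cluster, user="admin", password="admin"):
    """Build (but do not send) the GET request that exports a cluster as a blueprint."""
    url = "http://%s:8080/api/v1/clusters/%s?format=blueprint" % (server, cluster)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": "Basic " + token,   # same as curl -u admin:admin
        "X-Requested-By": "ambari",
    })


def save_blueprint(server, cluster, path):
    """Fetch the blueprint and write the JSON response to `path`."""
    with urllib.request.urlopen(blueprint_request(server, cluster)) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
```

For example, save_blueprint("ambarinodename", "test", "test.blueprint.json") writes the blueprint of the cluster named test to a local file.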

Blueprints: Examining a cluster configuration

curl --user admin:admin http://{your.ambari.server}:8080/api/v1/clusters/[clustername]

Blueprints: Writing a custom blueprint

The best way to do this is to start from an existing blueprint (https://github.com/DataAnalyticsOrganization/AmbariKave/tree/master/deployment/blueprints), or to save a blueprint from a running cluster. For example, you can make a blueprint from a small Amazon cluster and then copy out some of its configuration. When the blueprint is being registered, Ambari will tell you if there are any problems.

For the moment you must have the basic hadoop components in your blueprint somewhere, but the host group they are in does not need to be instantiated.

Certain passwords and email addresses must be kept in the blueprint to pass topology checks, however, make sure you overwrite them with your own custom versions in the cluster template!

A blueprint determines common configurations for groups of machines. The configurations can be specified at the top-most level, or overwritten in any host group. This makes it very configurable. Later they can also be overwritten by your cluster template.
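
The override order can be pictured as a layered dictionary merge. The Python 3 sketch below only illustrates the precedence described here (top level, then host group, then cluster template); it is not Ambari's actual merge code, and the property values are made up for the example.

```python
def effective_config(blueprint_top, host_group, cluster_template):
    """Merge three layers of {config_type: {property: value}}; later layers win."""
    merged = {}
    for layer in (blueprint_top, host_group, cluster_template):
        for config_type, props in layer.items():
            merged.setdefault(config_type, {}).update(props)
    return merged


top = {"gitlab": {"gitlab_port": "80", "ldap_password": "default"}}
group = {"gitlab": {"gitlab_port": "8080"}}       # host-group override
template = {"gitlab": {"ldap_password": "bla123"}}  # cluster-template override
print(effective_config(top, group, template))
```

Here the host group changes the port and the cluster template replaces the password, while the top-level dictionary itself is left untouched.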

Blueprints: Writing a custom cluster template

You should always manually rewrite your cluster template to match your cluster and to change the default passwords. The cluster template should ideally contain only those specifications which differ from cluster to cluster, such as the host names and the passwords you want to use.

Cluster templates are much easier to write than blueprints. Take a look at the example below:

{
    "blueprint" : "management",
    "default_password" : "admin",
    "configurations" : [
      { "nagios-env" : { "nagios_web_password" : "TEST", "nagios_contact" : "[email protected]"} }
    ],
    "host_groups" : [
	{
	    "name" : "admin",
	    "configurations" : [  { "openldap" : {"bind_password" : "TEST", "root_password" : "TEST"} } ],
	    "hosts" : [ { "fqdn" : "ambari.localdomain" } ]
	},
	{
	    "name" : "gitlabs",
	    "configurations" : [  { "gitlab" : {"ldap_host" : "ambari.localdomain", "ldap_password" : "bla123", "gitlab_admin_password" : "TEST"} } ],
	    "hosts" : [ { "fqdn" : "gitlabs-nl.localdomain" } ]
	}
    ]
}
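
Before deploying, it can help to check a cluster template programmatically. The Python 3 sketch below lists the hosts per host group and flags password properties that still hold a placeholder value. The function names are illustrative, and treating "admin" and "TEST" as placeholders is an assumption based on the example above.

```python
import json


def hosts_by_group(template):
    """Map each host group name to the list of FQDNs assigned to it."""
    return {
        group["name"]: [host["fqdn"] for host in group.get("hosts", [])]
        for group in template.get("host_groups", [])
    }


def placeholder_passwords(template, placeholders=("admin", "TEST")):
    """List password properties still set to a value in `placeholders`."""
    found = []
    if template.get("default_password") in placeholders:
        found.append("default_password")
    blocks = list(template.get("configurations", []))
    for group in template.get("host_groups", []):
        blocks.extend(group.get("configurations", []))
    for block in blocks:
        for config_type, props in block.items():
            for key, value in props.items():
                if "password" in key and value in placeholders:
                    found.append("%s/%s" % (config_type, key))
    return found


sample = json.loads("""
{ "blueprint" : "management", "default_password" : "admin",
  "host_groups" : [ { "name" : "admin",
                      "hosts" : [ { "fqdn" : "ambari.localdomain" } ] } ] }
""")
print(hosts_by_group(sample))
print(placeholder_passwords(sample))
```

To check a real template, load it with json.load from your own cluster.json file instead of the inline sample.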

Blueprints: Installing with a blueprint manually

Obtain/write blueprint
Obtain/write cluster template, making sure you overwrite any default passwords
Install ambari-agent on all nodes in the cluster, registering the correct Ambari node
Register the blueprint through curl
Register the template through curl; this automatically starts the cluster configuration
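
The two registration steps can be sketched in Python 3 as follows. The endpoints (blueprints/<name> and clusters/<name>) follow the Ambari Blueprints wiki page linked above; the function names and the default admin:admin credentials are assumptions for illustration. The requests are only built here, not sent.

```python
import base64
import json
import urllib.request


def ambari_post(server, path, body, user="admin", password="admin"):
    """Build (but do not send) an authenticated POST to the Ambari REST API."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(
        "http://%s:8080/api/v1/%s" % (server, path),
        data=json.dumps(body).encode(),
        headers={"Authorization": "Basic " + token, "X-Requested-By": "ambari"},
        method="POST",
    )


def registration_requests(server, blueprint_name, blueprint, cluster_name, template):
    """Return the two requests in the order they must be sent."""
    return [
        ambari_post(server, "blueprints/" + blueprint_name, blueprint),
        ambari_post(server, "clusters/" + cluster_name, template),  # starts the deployment
    ]
```

Send each request with urllib.request.urlopen(req), in order. The blueprint name used in the first URL should match the "blueprint" field of the cluster template.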

If you really want to do all these steps manually, take a look at: http://www.pythian.com/blog/ambari-blueprints-and-one-touch-hadoop-clusters/

Otherwise, use the command-line tool.

Blueprints: Installing with a blueprint using the command-line-tool supplied

Obtain/write blueprint
Obtain/write cluster template, making sure you overwrite any default passwords
Ensure Ambari is installed on one cluster node, and that this node has passwordless access to all other nodes using ssh keys
Ensure that whichever machine you are on has passwordless access as root to the destination Ambari machine, or make sure you are already installing as the root user of that machine ('sudo su')
You can then install the blueprint very simply, either remotely or on the Ambari node itself:

deploy_from_blueprint.py blueprint.json cluster.json [hostname=localhost] [access_key_if_remote]

This takes only a few minutes.

How to reset a cluster completely?

If you can possibly avoid this, don't do it! Just start the whole cluster from an image again!

Option 1: use the script provided: bin/clean.sh

Option 2: Manually

Stop ambari-agent on all nodes : ambari-agent stop
Remove/uninstall ambari : yum -y erase ambari-agent ambari-server
Stop services which were managed by ambari and remove them if they were installed through ambari (e.g. yum erase openldap-servers)
Erase the PostgreSQL databases: su - postgres ; psql -c "drop database ambari"; psql -c "drop database ambarirca"; exit (or Ctrl-D)
Re-install ambari from scratch
