Installation
There are four relatively simple stages. Going from beginning to end takes around an hour if you use the tools provided and resources from Amazon. Without these tools the configuration can become complicated, especially at first.
- Allocate CentOS 6 resources (using aws/up_aws_cluster.py, vagrant up.sh, or any other means). There must be one admin node in the cluster with passwordless access to the rest (using ssh keys; automatic with aws/up_aws_cluster.py), and ssh access from the outside world. If you want to use automatic deployment with blueprints, the admin node's root user must also have passwordless ssh access to itself (needed for some steps in Ambari).
- Install Ambari on the admin node (done automatically by up_aws_cluster.py)
- Decide the distribution of services across nodes (create blueprint and cluster template if you want)
- Configure cluster services (deployed based on the blueprint using deploy_from_blueprint.py or manually through a combination of the web interface and command-line tools)
Example of creating a cluster on Amazon and deploying with a blueprint
# Performs steps 1 and 2 together, a security config file tells which subnet/security-group/keys to use
aws/up_aws_cluster.py management clusters/aws_management_config.json clusters/example_security_config.json
# Performs step 3 and 4: deploying a blueprint to this cluster needs two config files and takes ~2 minutes
deploy_from_blueprint.py blueprints/management.blueprint.json blueprints/management.cluster.json 54.154.25.186 clusters/example_security_config.json
You first need to know how to do a standard install of this component on a blank CentOS 6 machine. Once you know how that goes, you're ready to wrap it in Ambari, following the Architecture wiki.
You can test your wrapper very easily, in much the same way as above, but here you can use the default blueprint and a single-node cluster to begin with.
- 1 Create a new test Amazon machine
aws/up_aws_cluster.py test-some-service clusters/aws_singlenode_config.json clusters/example_security_config.json
#deploying a blueprint to this cluster needs two config files and takes ~2 minutes
- 2 Deploy the default blueprint to fast-track the cluster creation
deploy_from_blueprint.py blueprints/default.blueprint.json blueprints/default.cluster.json 54.154.25.186 clusters/example_security_config.json
- 3 Visit the machine and check out your branch of AmbariKave
./aws/connect_to.py #choose machine
#git fetch the AmbariKave code if you haven't already got it (it should already be there actually)
#if it's already put there for you by up_aws_cluster, you probably need to set up ssh keys, which is easy ...
# export GIT_SSH=$HOME/gitwrap.sh
cd AmbariKave
git pull #if this fails you probably need to export GIT_SSH=$HOME/gitwrap.sh, or add your key
git checkout MyBranch
bin/patch.sh
ambari-server restart
- 4 keep trying this until your service installation works
- 4.1 pull your latest changes
git pull
- 4.2 apply as patch
bin/patch.sh
ambari-server restart
- 4.3 start installation
bin/service.sh install MYNEWSERVICE -h ambari.localdomain
- 4.4 monitor through ambari web interface (you can also sometimes re-start installations from the web interface...)
- 4.5 edit/commit/push/pull as you think is necessary
- 4.6 repeat stage 4 until it's working
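One pass of the step-4 loop above can be sketched as a small helper. This is only an illustration: the service name MYNEWSERVICE and host ambari.localdomain are the placeholders from this page, and you would run it from inside the AmbariKave checkout.

```python
# Hedged sketch of one pass of the step-4 loop (4.1-4.3).
# MYNEWSERVICE and ambari.localdomain are placeholders, not real values.
import subprocess

def iteration_commands(service="MYNEWSERVICE", host="ambari.localdomain"):
    return [
        ["git", "pull"],                                     # 4.1 pull latest changes
        ["bin/patch.sh"],                                    # 4.2 apply as patch
        ["ambari-server", "restart"],
        ["bin/service.sh", "install", service, "-h", host],  # 4.3 start installation
    ]

def run_iteration(dry_run=True):
    # With dry_run=True just print what would run; set False on a test machine.
    for cmd in iteration_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)

run_iteration()  # dry run: prints the four commands in order
```

Repeat the loop (4.6) until the installation succeeds, monitoring through the Ambari web interface in between.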
Once it works once, you are nearly there, but it's best to check by going to a fresh machine and repeating from step 1. Once this is working, take a look at the blueprint of the (default) cluster you created and use it to make a full blueprint including your new service.
If you then try the deployment again from a complete blueprint and it works, you can be confident that everything is fine.
See AWS-CLI and deployment readme
We can create a set of machines with Ambari installed very simply on Amazon with a few clicks. Any node marked as "admin" will automatically have Ambari installed, but not configured.
You only need to remember the IP address of the Ambari node, and also copy its private key for the other machines locally, to later upload through the web interface.
- scp -i my-amazon-key root@ambariamazonnodeip:.ssh/id_rsa ambari_priv.pem
If you want to install on your laptop instead, we've based this part on the Vagrant installation manual, which is really good and pretty easy. We diverge from it slightly after the machine is booted.
After you have installed VirtualBox and Vagrant on your computer, check out the “ambari-vagrant” repo on github:
git clone https://github.com/u39kun/ambari-vagrant.git
Edit your /etc/hosts on your computer so that you will be able to resolve hostnames for the VMs:
sudo cat ambari-vagrant/append-to-etc-hosts.txt >> /etc/hosts
Note: don't forget to uncomment the required /etc/hosts lines!
Copy the private key to your home directory (or some place convenient for you) so that it's easily accessible for uploading via Ambari Web:
vagrant
The above command shows the command usage and also creates a private key as ~/.vagrant.d/insecure_private_key. This key will be used in the following steps.
Starting VMs
First, change directory to ambari-vagrant:
cd ambari-vagrant
You will see subdirectories for different OS's. cd into the OS that you want to test; centos6.4 is recommended as it is quicker to launch than the other OS's. Now you can start VMs with the following command:
cd centos6.4
cp ~/.vagrant.d/insecure_private_key .
./up.sh <# of VMs to launch>
For example, up.sh 3 starts 3 VMs. 3 seems to be a good number with 16GB of RAM without taxing the system too much. With the default Vagrantfile, you can specify up to 10 (if your computer can handle it, you can even add more). VMs will have the FQDN <os-code>[01-10].ambari.apache.org, where <os-code> is c59 (CentOS 5.9), c64 (CentOS 6.4), etc. E.g., c5901.ambari.apache.org, c6401.ambari.apache.org, etc. VMs will have the IP address 192.168.<xx>.1[01-10], where <xx> is 59 for CentOS 5.9, 64 for CentOS 6.4, etc. E.g., 192.168.59.101, 192.168.64.101, etc. Note that the up.sh 3 command is equivalent to doing something like: vagrant up /c6401-3/
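The naming convention above can be expressed as a small helper. This is a sketch of the convention as described on this page (not taken from the Vagrantfile itself); the OS code and IP prefix defaults assume CentOS 6.4.

```python
# Sketch of the Vagrant VM naming convention: OS code "c64" and IP prefix 64
# correspond to CentOS 6.4; both defaults are assumptions based on this page.
def vm_addresses(os_code="c64", ip_prefix=64, count=3):
    vms = []
    for i in range(1, count + 1):
        fqdn = "%s%02d.ambari.apache.org" % (os_code, i)  # e.g. c6401.ambari.apache.org
        ip = "192.168.%d.%d" % (ip_prefix, 100 + i)       # e.g. 192.168.64.101
        vms.append((fqdn, ip))
    return vms

print(vm_addresses()[0])  # -> ('c6401.ambari.apache.org', '192.168.64.101')
```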
If it is your first time running a vagrant command, run:
vagrant init
Log into the VM:
vagrant ssh c6401
Turn off the firewall and edit the config so that the firewall won't start on boot in the future.
sudo service iptables stop
sudo chkconfig iptables off
The following commands should let you know if the cluster is ready:
sudo su #you must be able to become the root user
ssh root@localhost #once you are the root user, ssh as yourself should not prompt for password!, see here: https://www.cs.utah.edu/~bigler/code/sshkeys.html
ssh <whatever-local-node> #as the root user should not prompt for a password!
service iptables status #should tell you this service is not running!
sestatus #should tell you selinux is in permissive mode!
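As a sketch of what those last two checks are looking for, the snippet below parses sample command output. The sample strings are typical CentOS 6 output and are assumptions, not captured from a real machine.

```python
# Hedged sketch: decide readiness from the output of
# "service iptables status" and "sestatus" (sample strings assumed).
def cluster_ready(iptables_status, sestatus_output):
    # Firewall must be stopped, and SELinux permissive (or disabled).
    firewall_off = ("not running" in iptables_status
                    or "stopped" in iptables_status)
    selinux_ok = ("permissive" in sestatus_output.lower()
                  or "disabled" in sestatus_output.lower())
    return firewall_off and selinux_ok
```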
So we now have a blank centos machine. This is not really that impressive yet. We will now install Ambari and patch it with our custom code.
First, consider which user the checkout should be under. You can install and configure everything as a sudoer, but then for blueprints your sudoer will need passwordless ssh access to the root user, which might not be as secure as you would like. In that case, sudo su first so that the whole installation is done as the root user. Then decide where in git you're going to check out the project from; in this example, github is used.
sudo yum install git
#SETUP SSH KEYS!
git clone [email protected]:KaveIO/AmbariKave.git
cd AmbariKave
sudo bin/install.sh
sudo bin/patch.sh
sudo ambari-server start
To update the version of our ambari installer correctly, do:
- Connect to your ambari/admin node
- Checkout the new code from git and cd into that directory
- sudo ambari-server stop
- sudo bin/patch.sh
- sudo ambari-server start
Either through the web interface or with blueprints using the command-line tools (below)
You can now install Hadoop components through the web interface.
Now you should start the installation procedure from your browser. This can be done by visiting:
http://[ip-or-vagrant-name]:8080/
e.g.:
http://c6401:8080/ (this corresponds to CentOS 6.4 machine 1; look up the correct address for your machine in the /etc/hosts file)
or
http://[amazon-public-ip]:8080/
In the next steps, don't forget to replace c#### with your real hostname, such as your Amazon IP address.
The login is:
- Username: admin
- Password: admin
Name the cluster any name you prefer and install HDP 2.2.KAVE and then hit Next.
In the following step you can provide all the ip-addresses of the machines in the cluster,
e.g. c6402.ambari.apache.org, c6403.ambari.apache.org and so on. Also take the private key from the previous steps (the insecure key) and upload it in the web console. This allows Ambari to log in to the machines and do its magic. To begin with, you must only install one node, and this node has to run some key services. These services must be installed for Ambari to work properly at the moment, but you can turn them off later.
- ONLY INSTALL ONE NODE.
- This node must have Ganglia and Nagios. DO NOT SELECT OPENLDAP OR GITLABS
- N.B.: You must use the same host name as the result of "hostname" on the target machine. When there is no specified domain, the hostname looks like "amachine".localdomain. If you have aliased this machine in your /etc/hosts file, make sure the alias also includes a domain, even if it is localdomain.
Now let's get ready to install OpenLDAP and Gitlab.
Add a new host through the web interface and install only the client software on this host. (restart ganglia & nagios afterwards if you like.)
This should be as easy as:
bin/service.sh install OPENLDAP -h c6402.ambari.apache.org
This will prompt you for some parameters. This is NOT a friendly dialog: typos will come back to haunt you! You should fill these in with something reasonable. An example would be:
Value for base?
dc=kave,dc=org
Value for root_user?
cn=root,dc=kave,dc=org
Value for root_password?
bla123
Value for bind_user?
cn=bind,dc=kave,dc=org
Value for bind_password?
bla123
Value for database_directory?
/var/db/openldap/kave
Value for phpldapadmin_user?
bind
Value for phpldapadmin_password?
bind
Did it work? Check by visiting: http://[ip-or-hostname]/phpldapadmin
This should be as easy as:
bin/service.sh install GITLAB -h c6403.ambari.apache.org
This will prompt you for some parameters. This is NOT a friendly dialog: typos will come back to haunt you! You should fill these in with something reasonable. An example would be:
Value for gitlab_port?
80
Value for ldap_enabled?
true
Value for ldap_host?
c6401.ambari.apache.org
Value for ldap_port?
389
Value for ldap_uid?
uid
Value for ldap_method?
plain
Value for ldap_bind_dn?
cn=bind,dc=kave,dc=org
Value for ldap_password?
bla123
Value for ldap_allow_username_or_email_login?
true
Value for ldap_base?
dc=kave,dc=org
Value for restrict_public_projects?
true
Value for gitlab_signin_enabled?
false
After the installation is completed you can visit the Gitlab and phpldapadmin by visiting:
http://c6402.ambari.apache.org/
http://c6403.ambari.apache.org/
And log in with username root and password 5iveL!fe.
Note: If you get the default Apache page, you might want to restart some of the services on the host; check your dashboard in order to see which services require a restart.
https://cwiki.apache.org/confluence/display/AMBARI/Blueprints
http://www.pythian.com/blog/ambari-blueprints-and-one-touch-hadoop-clusters/
https://cwiki.apache.org/confluence/display/AMBARI/Adding+a+New+Service+to+an+Existing+Cluster
A blueprint can store all the required configuration parameters, so that an entire cluster can be deployed from it in one go.
- A blueprint: Defines the services running on certain host_groups and the common configurations of all those services
- A cluster configuration: allocates hosts to those host groups, and the components will then be installed on them.
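A minimal sketch of how the two documents relate: the host_group names in the cluster template must match those declared in the blueprint. All names, components, and hosts below are illustrative, not taken from a real KAVE blueprint.

```python
# Illustrative pair: a blueprint names host_groups and their components; the
# cluster template maps real hosts onto those host_groups. Names are made up.
blueprint = {
    "Blueprints": {"blueprint_name": "example", "stack_name": "HDP", "stack_version": "2.2"},
    "host_groups": [
        {"name": "admin", "components": [{"name": "NAMENODE"}], "cardinality": "1"},
        {"name": "workers", "components": [{"name": "DATANODE"}], "cardinality": "1+"},
    ],
}
cluster_template = {
    "blueprint": "example",       # must reference the registered blueprint name
    "default_password": "admin",
    "host_groups": [
        {"name": "admin", "hosts": [{"fqdn": "ambari.localdomain"}]},
        {"name": "workers", "hosts": [{"fqdn": "node1.localdomain"}]},
    ],
}

# Every host_group allocated in the template must exist in the blueprint.
bp_groups = {g["name"] for g in blueprint["host_groups"]}
assert all(g["name"] in bp_groups for g in cluster_template["host_groups"])
```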
curl --user admin:admin http://{your.ambari.server}:8080/api/v1/clusters
curl -H "X-Requested-By: ambari" -X GET -u admin:admin http://ambarinodename:8080/api/v1/clusters/test?format=blueprint
e.g.:
curl -H "X-Requested-By: ambari" -X GET -u admin:admin http://{your.ambari.server}:8080/api/v1/clusters/test?format=blueprint
curl --user admin:admin http://{your.ambari.server}:8080/api/v1/clusters/[clustername]
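The blueprint-export call above can also be built in Python; this sketch only constructs the request URL and headers (host, cluster name, and admin:admin credentials are placeholders), leaving the actual HTTP call to whatever client you prefer.

```python
# Sketch: build the Ambari REST request that exports a running cluster as a
# blueprint. Hostname, cluster name, and credentials are placeholders.
import base64

def blueprint_export_request(ambari_host, cluster, user="admin", password="admin"):
    url = "http://%s:8080/api/v1/clusters/%s?format=blueprint" % (ambari_host, cluster)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    headers = {"X-Requested-By": "ambari", "Authorization": "Basic " + token}
    return url, headers
```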
The best way to do this is to use an existing blueprint (https://github.com/DataAnalyticsOrganization/AmbariKave/tree/master/deployment/blueprints), or save a blueprint from a running cluster. For example, you can make a blueprint from a small Amazon cluster and then copy out some of its configuration. When the blueprint is registered, Ambari will tell you if there are any problems.
For the moment you must have the basic hadoop components in your blueprint somewhere, but the host group they are in does not need to be instantiated.
Certain passwords and email addresses must be kept in the blueprint to pass topology checks; however, make sure you overwrite them with your own custom versions in the cluster template!
A blueprint determines common configurations for groups of machines. The configurations can be specified at the top-most level, or overwritten in any host group, which makes it very configurable. Later they can also be overwritten by your cluster template.
You should always manually re-write your cluster template to match your cluster and to change the default passwords. The cluster template should ideally only contain those specifications which differ from cluster to cluster, such as the host names and the passwords you want to use.
Cluster templates are much easier to write than blueprints, you can take a look at the example below:
{
"blueprint" : "management",
"default_password" : "admin",
"configurations" : [
{ "nagios-env" : { "nagios_web_password" : "TEST", "nagios_contact" : "[email protected]"} }
],
"host_groups" : [
{
"name" : "admin",
"configurations" : [ { "openldap" : {"bind_password" : "TEST", "root_password" : "TEST"} } ],
"hosts" : [ { "fqdn" : "ambari.localdomain" } ]
},
{
"name" : "gitlabs",
"configurations" : [ { "gitlab" : {"ldap_host" : "ambari.localdomain", "ldap_password" : "bla123", "gitlab_admin_password" : "TEST"} } ],
"hosts" : [ { "fqdn" : "gitlabs-nl.localdomain" } ]
}
]
}
- Obtain/write blueprint
- Obtain/write cluster template, make sure you overwrite any default passwords
- Install ambari-agent on all nodes in cluster, registering the correct ambari node
- Register blueprint through curl
- Register template through curl; this will automatically start the cluster configuration
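The two curl registration steps can be sketched in Python as follows. Only the requests are constructed here (endpoint paths follow the Ambari Blueprints wiki linked above); the host and the blueprint/cluster names are placeholders.

```python
# Sketch of the registration calls: first POST the blueprint, then POST the
# cluster template, after which Ambari starts deploying. Names are placeholders.
import json

def register_requests(ambari_host, blueprint_name, cluster_name, blueprint, template):
    base = "http://%s:8080/api/v1" % ambari_host
    return [
        ("POST", "%s/blueprints/%s" % (base, blueprint_name), json.dumps(blueprint)),
        ("POST", "%s/clusters/%s" % (base, cluster_name), json.dumps(template)),
    ]
```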
If you really want to do all these steps manually, take a look at: http://www.pythian.com/blog/ambari-blueprints-and-one-touch-hadoop-clusters/
Otherwise, use the command-line tool.
- Obtain/write blueprint
- Obtain/write cluster template, make sure you overwrite any default passwords
- Ensure ambari is installed on one cluster node, and that this cluster node has passwordless access to all other nodes, using ssh keys
- Ensure that whichever machine you are on has passwordless access as root on the destination ambari machine, or make sure you are already installing from the root user of that machine, 'sudo su'
- You can then install the blueprints very simply, either if you're remote or on the ambari node itself.
deploy_from_blueprint.py blueprint.json cluster.json [hostname=localhost] [access_key_if_remote]
This takes only a few minutes.
If you can possibly avoid this, don't do it! Just start the whole cluster from an image again!
Option 1: use the script provided: bin/clean.sh
Option 2: Manually
- Stop ambari-clients on all nodes : ambari-agent stop
- Remove/uninstall ambari : yum -y erase ambari-agent ambari-server
- Stop services which were managed by ambari and remove them if they were installed through ambari (e.g. yum erase openldap-servers)
- Erase the PostgreSQL database: su - postgres ; psql -c "drop database ambari"; psql -c "drop database ambarirca"; Ctrl-D
- Re-install ambari from scratch