In this tutorial I will show you how to install your own small single-node Hadoop cluster with the Hortonworks Data Platform inside VirtualBox. After the easy setup you can play around with the cluster and gain some experience with it without having to set up a new machine. It can also serve as a local development environment where you debug your Map/Reduce jobs. The Hortonworks Data Platform is a 100% open source Apache Hadoop distribution and comes with the following components:
- Hadoop Distributed File System (HDFS)
- MapReduce
- Apache Pig
- Apache Hive
- Apache HCatalog
- Templeton
- Apache HBase
- Apache ZooKeeper
- Apache Oozie
- Apache Sqoop
- Ganglia
- Nagios
This tutorial is based on this quick start guide. A fast internet connection is recommended during the HMC setup; otherwise you may run into problems with Puppet timeouts. In that case you can try to pre-install some of the RPMs. Have a look at this thread in the Hortonworks forum.
Install VirtualBox
- The first step is to install the VirtualBox software, which can be downloaded here. Please choose the installation binaries for your operating system.
- Install VirtualBox with the default options.
- Download the ISO for CentOS 6.3 from your favourite mirror (or use this one directly).
- Create a new virtual machine in VirtualBox and attach the ISO file to it. You will find detailed setup instructions here.
- Before you start the virtual machine, make sure that you configure the following settings:
  - Main memory: 4096 MB
  - Disk space: 16 GB
  - Enable the bridged network adapter
  - Enable IOAPIC
- Start the virtual machine.
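The memory, network and IOAPIC settings listed above can also be applied from the host's command line with VBoxManage instead of the GUI. The following is only a sketch: the VM name `hdp-single-node` and the adapter name `eth0` are assumptions, and the disk size is left out because it is chosen when the virtual disk is created. By default the script only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch: apply the VM settings from the list above with VBoxManage.
# VM_NAME and BRIDGE_IF are hypothetical values; replace them with your own.
VM_NAME="${VM_NAME:-hdp-single-node}"
BRIDGE_IF="${BRIDGE_IF:-eth0}"

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_vm() {
  # 4096 MB of main memory and IOAPIC enabled
  run VBoxManage modifyvm "$VM_NAME" --memory 4096 --ioapic on
  # First network adapter bridged to a physical interface of the host
  run VBoxManage modifyvm "$VM_NAME" --nic1 bridged --bridgeadapter1 "$BRIDGE_IF"
}

configure_vm
```

Run it with `DRY_RUN=0` once the printed commands match your setup. The virtual machine has to be powered off while `modifyvm` is applied.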
See also the screenshots below:
Install CentOS
- If everything is working correctly, CentOS will start the installation process.
- Please choose “Install or upgrade an existing system” from the list.
- For the hostname leave the default “localhost.localdomain”.
- Skip the media test.
- Choose the installation type “Minimal Desktop”.
- Create a user for the cluster (e.g. hadoop).
- After the setup has finished successfully, reboot your virtual system and log in as root.
Prepare the HMC Single Node Cluster Setup
- Change the keyboard layout to the correct language through “System->Administration->Keyboard”.
- Disable the firewall.
    chkconfig iptables off
    chkconfig ip6tables off
- Disable SELinux by opening its configuration file and changing SELINUX=enforcing to SELINUX=disabled.
    vi /etc/selinux/config
- Configure ntpd to start at bootup.
    chkconfig ntpd on
- Edit the file “/etc/hosts” so that it looks like the one in the following screenshot. It is important that the first entry is “localhost.localdomain”; otherwise the HMC setup will not work because the hostname cannot be resolved.
- Type “hostname -f” in the terminal. It should print “localhost.localdomain”.
- Type “hostname -s” in the terminal. It should print “localhost”.
- Start the SSH service with
    /sbin/service sshd start
- Make sure that sshd is started automatically on startup.
    chkconfig sshd on
- Prepare password-less SSH login for the root user to localhost.
    ssh-keygen
    ssh-copy-id localhost
    chmod 700 ~/.ssh
    chmod 640 ~/.ssh/authorized_keys
- Check that password-less login works with
    ssh localhost
- Create a text file “hostdetail.txt” with the host names that will be part of your cluster. In our example with only one node it should contain only this entry:
    localhost.localdomain
- If you want to use a GUI editor to edit the file, you will get this error. Just install your favourite editor, e.g. gedit, and follow the instructions.
- After this preparation it’s recommended to take a snapshot of your current system so that you can come back to this point if something goes wrong with the installation.
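If you prefer not to edit the files by hand, the SELinux and host list steps can be scripted. The sketch below assumes GNU sed, as shipped with CentOS, and the function names are hypothetical helpers that are not part of HMC. Nothing runs until you call the functions yourself as root.

```shell
#!/usr/bin/env bash
# Sketch of the SELinux and host list steps without an interactive editor.
# All function names are hypothetical helpers made up for this example.

# Switch SELINUX=enforcing to SELINUX=disabled in the given config file.
# Uses GNU sed's in-place option.
disable_selinux() {
  local config="${1:-/etc/selinux/config}"
  sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$config"
}

# Write the single-node host list that is needed during the HMC setup.
write_hostdetail() {
  local target="${1:-hostdetail.txt}"
  echo "localhost.localdomain" > "$target"
}

# Succeeds only if both host names match the values this tutorial expects.
check_hostname() {
  [ "$(hostname -f)" = "localhost.localdomain" ] &&
    [ "$(hostname -s)" = "localhost" ]
}
```

A possible call sequence would be `disable_selinux && write_hostdetail /root/hostdetail.txt && check_hostname`. Note that a change in /etc/selinux/config only takes effect after the next reboot.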
Install Hortonworks Data Platform with HMC
- Download the RPM (please check whether there is a newer version on this page).
    rpm -Uvh http://public-repo-1.hortonworks.com/HDP-1.1.1.16/repos/centos6/hdp-release-1.1.1.16-1.el6.noarch.rpm
- Install “Extra Packages for Enterprise Linux (EPEL)”.
    yum install epel-release
- Install HMC.
    yum install hmc
- Check the installation status with
    rpm -qa | grep hmc
- Start the HMC service. You will be prompted to agree to the Oracle Java license and to download the binaries.
    service hmc start
- Stop the firewall.
    /etc/init.d/iptables stop
- Proceed to the final installation step.
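As an alternative to scanning the complete package list, the installation status can be read from the exit code of `rpm -q`, which is non-zero when a package is not installed. This is a small sketch; the helper names `is_installed` and `report_packages` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report whether the packages from the steps above are installed.
# The helper names are hypothetical and not part of HMC.

# Succeeds if the named package is installed, based on rpm's exit status.
is_installed() {
  rpm -q "$1" > /dev/null 2>&1
}

# Print one status line per package name given as argument.
report_packages() {
  local pkg
  for pkg in "$@"; do
    if is_installed "$pkg"; then
      echo "$pkg: installed"
    else
      echo "$pkg: missing"
    fi
  done
}
```

For the packages used in this tutorial you could call `report_packages epel-release hmc`.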
Provisioning Your Cluster
- Go to the main page of the Hortonworks Management Center (HMC). If you access it from outside the virtual machine, replace “localhost” with the IP address of your virtual machine.
    http://localhost/hmc/html
- Follow the wizard instructions.
- When you are prompted to specify the disk mount point, choose a different one than the wizard proposes, for example “/data”.
- If the installation was successful you should see this screen 🙂
- If there is an error, the following log files may be helpful for troubleshooting:
    /var/log/hmc/hmc.log
    /var/log/puppet_apply.log
- You can now go to the dashboard and check the status of your cluster.
- To shut down your cluster safely, first stop all services in HMC and then stop your virtual machine.
- After you restart your system you can start HMC again by issuing the following commands:
    service hmc start
    service hmc-agent start
- To run the HMC service on startup follow the steps described here (optional).
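When the setup fails, the two log files mentioned above can get long. The following sketch prints only the most recent lines that mention an error. The search pattern and the default limit of 20 lines are assumptions and may need adjusting, and `recent_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show the latest error-like lines from the given log files.
# The pattern 'error|fail' and the LIMIT default are assumptions.
recent_errors() {
  local limit="${LIMIT:-20}" log
  for log in "$@"; do
    if [ -r "$log" ]; then
      echo "== $log"
      # Case-insensitive match, keep only the last $limit hits
      grep -i -E 'error|fail' "$log" | tail -n "$limit"
    else
      echo "== $log (not readable)"
    fi
  done
}
```

A typical call would be `recent_errors /var/log/hmc/hmc.log /var/log/puppet_apply.log`.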
You can now start playing around with your own Hadoop cluster. If you have problems with the setup you can refer to the documentation or just leave a comment here. Merry X-Mas 🙂
Blog author
Dennis Schulte