Thursday, 15 December 2016

How to use a custom Python with Spark on BigInsights 4.2 Enterprise clusters


Install Anaconda

This script installs Anaconda Python on a BigInsights on Cloud 4.2 Enterprise cluster.
Note that these instructions do NOT work for Basic clusters because ssh is broken on Basic clusters.

SSH into the mastermanager node, then run the following, changing the values for your environment:
 
export BI_USER=snowch
export BI_PASS=changeme
export BI_HOST=bi-hadoop-prod-4118.bi.services.us-south.bluemix.net

Next, run the following. The script attempts to be as idempotent as possible, so it shouldn't matter if you run it multiple times:
 
# abort if the script encounters an error, an undeclared variable or a failed command in a pipeline
set -euo pipefail

CLUSTER_NAME=$(curl -s -k -u $BI_USER:$BI_PASS  -X GET https://${BI_HOST}:9443/api/v1/clusters | python -c 'import sys, json; print(json.load(sys.stdin)["items"][0]["Clusters"]["cluster_name"]);')
echo Cluster Name: $CLUSTER_NAME

CLUSTER_HOSTS=$(curl -s -k -u $BI_USER:$BI_PASS  -X GET https://${BI_HOST}:9443/api/v1/clusters/${CLUSTER_NAME}/hosts | python -c 'import sys, json; items = json.load(sys.stdin)["items"]; hosts = [ item["Hosts"]["host_name"] for item in items ]; print(" ".join(hosts));')
echo Cluster Hosts: $CLUSTER_HOSTS

wget -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh

# Install anaconda if it isn't already installed
[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b

# Install anaconda on all of the cluster nodes
for CLUSTER_HOST in ${CLUSTER_HOSTS}; 
do 
   if [[ "$CLUSTER_HOST" != "$BI_HOST" ]];
   then
      echo "*** Processing $CLUSTER_HOST ***"
      ssh $BI_USER@$CLUSTER_HOST "wget -q -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh"
      ssh $BI_USER@$CLUSTER_HOST "[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b"

      # You can install your pip modules on each node using something like this:
      # ssh $BI_USER@$CLUSTER_HOST "${HOME}/anaconda2/bin/python -c 'import yourlibrary' || ${HOME}/anaconda2/bin/pip install yourlibrary"
   fi
done

echo 'Finished installing'
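To check what the two Python one-liners in the script extract from the Ambari REST API, you can run them offline against a hand-written sample response. The JSON documents below are assumptions: they contain only the fields the one-liners read (items[0].Clusters.cluster_name and items[].Hosts.host_name), real responses contain many more fields, and the cluster and host names are made up. The sketch needs python3 or python on the PATH.

```shell
#!/usr/bin/env bash
# Offline check of the JSON parsing used in the install script.
# The sample responses are hypothetical and trimmed to the fields the script reads.
set -euo pipefail

# Prefer python3 if present, otherwise fall back to python (as the cluster script uses)
PY=$(command -v python3 || command -v python)

SAMPLE_CLUSTERS='{"items": [{"Clusters": {"cluster_name": "mycluster"}}]}'
SAMPLE_HOSTS='{"items": [{"Hosts": {"host_name": "node1.example.com"}},
                         {"Hosts": {"host_name": "node2.example.com"}}]}'

# Same Python expressions as the install script, applied to the sample documents
CLUSTER_NAME=$(echo "$SAMPLE_CLUSTERS" | "$PY" -c 'import sys, json; print(json.load(sys.stdin)["items"][0]["Clusters"]["cluster_name"]);')
CLUSTER_HOSTS=$(echo "$SAMPLE_HOSTS" | "$PY" -c 'import sys, json; items = json.load(sys.stdin)["items"]; hosts = [ item["Hosts"]["host_name"] for item in items ]; print(" ".join(hosts));')

echo "Cluster Name: $CLUSTER_NAME"
echo "Cluster Hosts: $CLUSTER_HOSTS"
```

The space-separated host list is what the for loop in the install script iterates over.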

Running a pyspark job

If you are using pyspark, you can use Anaconda Python by setting the following variables before running the pyspark command:

export SPARK_HOME=/usr/iop/current/spark-client
export HADOOP_CONF_DIR=/usr/iop/current/hadoop-client/conf

# set these to the folders where you installed anaconda
export PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/home/biadmin/anaconda2/bin/python2.7
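A mistyped path in these variables tends to show up only once pyspark tries to start Python. A small pre-flight check can catch that earlier. The function below, check_pyspark_python, is a hypothetical helper (it is not part of Spark or BigInsights); it only verifies that each variable is set and points at an executable file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that the interpreters named in
# PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are set and are executable files.
check_pyspark_python() {
    local var path status=0
    for var in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON; do
        path=${!var:-}   # indirect expansion: the value of the variable named by $var
        if [[ -z "$path" ]]; then
            echo "$var is not set" >&2
            status=1
        elif [[ ! -f "$path" || ! -x "$path" ]]; then
            echo "$var=$path is not an executable file" >&2
            status=1
        else
            echo "$var=$path"
        fi
    done
    return "$status"
}

# Usage (after the exports above):
#   check_pyspark_python && pyspark
```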
 

Zeppelin

You can also run Zeppelin with this custom Python version; see here for more information:
 
https://github.com/IBM-Bluemix/BigInsights-on-Apache-Hadoop/blob/master/examples/Zeppelin/README.md 

GitHub

The scripts in this blog are kept on GitHub; please go there for the latest version:

https://github.com/IBM-Bluemix/BigInsights-on-Apache-Hadoop/blob/master/examples/Zeppelin/anaconda_setup.md

Saturday, 14 June 2014

Easy Windows (and Linux/OS X) HBase development environment setup

I was trying to set up a demo HBase development environment on my Windows laptop, but I ran into a number of issues trying to get HBase running as a server and connecting to it using a Java client.

Rather than mucking around with Windows, I decided to create an HBase environment inside a virtual machine by installing the following inside the VM:

  • Ubuntu 14.04
  • XFCE desktop (a light-weight unix desktop)
  • Eclipse Java development environment

I'm hoping this environment will also be useful for users who are new to Linux, Eclipse or Java, because all of the setup has been done for them.

Rather than provide a set of manual instructions, I decided to use Vagrant to automate the setup steps on behalf of the user. Vagrant is an awesome tool for setting up development environments; if you haven't tried it yet, I strongly recommend taking a look at it.

For more information, see the project page, here: https://github.com/snowch/vagrant-hbase

Friday, 25 April 2014

Apache Stratos PaaS simple setup

Introduction

This blog is for developers and administrators who are familiar with the upcoming Apache Stratos 4.0 (incubating) PaaS framework and who would like to try Stratos but don't have the time to go through the manual setup process.

This blog uses Vagrant to automate the setup of an environment for you to try out Stratos. By following just a few steps, you get all of the following done automatically:

  • downloading a basic Ubuntu image for Virtualbox
  • starting up the image
  • setting up a Puppet Master
  • checking out, compiling and installing Stratos
  • setting up OpenStack
  • creating a Stratos Cartridge
  • setting up Eclipse with the Stratos source code

When following this blog, you will use shell scripts that I have created for Vagrant to set up the environment. Later in the blog, when you use the scripts, feel free to look at them to see what they are doing; hopefully they are easy to follow. It is also worth mentioning that although Vagrant is used in this blog, you don't need to know anything about Vagrant; you will be shown all you need to know.

Note:
  • The machine running Virtualbox is the 'vagrant host', or just 'host'
  • The image running inside Virtualbox is the 'vagrant guest', or just 'guest'
  • OpenStack uses Qemu emulation to run instances, which is very slow. If you get a chance, head over to https://www.virtualbox.org/ticket/4032 and add a comment showing your support for nested virtualisation. This would allow OpenStack to use KVM instead of Qemu, which would be much quicker.

Prerequisites

You need at least 6 GB of free memory and around 30 GB of free disk space on the 'host'. The setup should work on all of the platforms supported by Vagrant and Virtualbox.
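If your 'host' runs Linux, a quick way to compare it against these numbers is sketched below. check_host_resources is a hypothetical helper, not part of the stratos-vagrant-box project; it reads MemAvailable from /proc/meminfo (falling back to MemFree on older kernels) and checks free space on the filesystem holding your home directory, where VirtualBox keeps its VMs by default. It will not work on Windows or OS X hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a Linux host: compare available memory and free disk
# space against minimums given in GB. Optional third argument: directory to check.
check_host_resources() {
    local min_mem_gb=$1 min_disk_gb=$2 dir=${3:-${HOME:-.}}
    local mem_kb disk_kb

    if [[ ! -r /proc/meminfo ]]; then
        echo "/proc/meminfo not found; this check only works on Linux" >&2
        return 2
    fi

    # /proc/meminfo values are in kB; MemAvailable needs Linux 3.14 or later
    mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    [[ -n "$mem_kb" ]] || mem_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)

    # POSIX df output in 1K blocks: column 4 of the second line is the available space
    disk_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')

    echo "Available memory: $(( mem_kb / 1024 / 1024 )) GB, free disk in $dir: $(( disk_kb / 1024 / 1024 )) GB"

    (( mem_kb >= min_mem_gb * 1024 * 1024 )) || { echo "Need at least ${min_mem_gb} GB of free memory" >&2; return 1; }
    (( disk_kb >= min_disk_gb * 1024 * 1024 )) || { echo "Need at least ${min_disk_gb} GB of free disk space" >&2; return 1; }
}

# Usage, with the figures from this section:
#   check_host_resources 6 30
```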

This blog assumes you are connecting directly to the internet and not using a proxy server (unless that proxy server is transparent). Proxy servers make it very difficult to set up an IaaS because each operating system instance needs to be configured to use the proxy server to have Internet access.

Install Vagrant and Virtualbox for your operating system. Do NOT use the packages provided by your operating system (e.g. with apt-get); instead, install the packages from the official Vagrant and Virtualbox download pages.

Even if you already have Vagrant or Virtualbox installed, it may be worth installing the latest versions if yours are quite old. 
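To confirm both tools are installed before starting, you can check that their command-line programs are on your PATH. The sketch below assumes a bash shell on the host; require_cmds is a hypothetical helper, and VBoxManage is the command-line tool that ships with Virtualbox (on Windows it may not be on the PATH by default).

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm each named command is on the PATH and print
# where it resolves to. Returns non-zero if any command is missing.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found $cmd: $(command -v "$cmd")"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check the tools, then print their versions to compare with the latest releases
#   require_cmds vagrant VBoxManage && vagrant --version && VBoxManage --version
```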

Note: if you hit any bugs with the environment described here, please raise an issue on the GitHub project at: https://github.com/snowch/stratos-vagrant-box/issues

Setup steps

The Vagrant project is stored on GitHub. If you have git installed, you can check it out and install the vagrant-cachier plugin using:
git clone https://github.com/snowch/stratos-vagrant-box.git
cd stratos-vagrant-box
git checkout v1.0
vagrant plugin install vagrant-cachier
Now all you need to do is run either:
new_dev_env_setup.bat (Windows users)
or
./new_dev_env_setup.sh (*nix users)
Then wait. A lot of software is being downloaded and set up, and this can take several hours to complete. When the script has finished successfully, you should see:
...
Finished configuring the cartridge.
Note the cartridge id:  6bfe03f0-f725-456d-9ede-6af763f80528 
(your actual cartridge id will be different)
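Because the cartridge id is only printed at the end of a long run, it can be handy to keep a copy of the output. On a *nix host, one option is to pipe the setup script through tee into a log file (setup.log below is an arbitrary name) and pull the id out afterwards. cartridge_id_from_log is a hypothetical helper that assumes the 'Note the cartridge id:' line format shown above.

```shell
#!/usr/bin/env bash
# Keep a copy of the setup output, e.g.:
#   ./new_dev_env_setup.sh 2>&1 | tee setup.log
#
# Hypothetical helper: print the id from the "Note the cartridge id:" line of a
# saved log. If there are several such lines, the last one is printed; if there
# are none, it prints nothing and returns 1.
cartridge_id_from_log() {
    awk '/^Note the cartridge id:/ { id = $NF } END { if (id == "") exit 1; print id }' "$1"
}

# Usage:
#   cartridge_id_from_log setup.log
```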

Warning: if you run the script again, it will delete your previously created guest, create a new guest and go through the whole installation process again. If the script does not finish successfully the first time around, you can re-run it to retry the installation from scratch.

After successful completion of the 'stratos_developer_env_setup' script, you should be able to access the Stratos and OpenStack web consoles using:


You can get ssh access to the 'vagrant guest' by running the following command from the stratos-vagrant-box directory:
vagrant ssh
After ssh'ing into the 'guest', you will see the scripts used to set up your environment: stratos.sh and openstack-qemu.sh. You can run these scripts with no arguments to see a list of options for each script.

For a graphical client, you can use 'rdesktop' if your host is a *nix variant, or 'Remote Desktop Connection' on Windows to connect to the 'guest'.  The connection details are:

  • Hostname: 192.168.56.5
  • Port: 3389 (not required if using Windows Remote Desktop Client)
  • Username: vagrant
  • Password: vagrant
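If rdesktop just hangs, it is usually worth checking first that the guest's RDP port is reachable. The sketch below assumes a bash built with /dev/tcp support and the coreutils timeout command; port_open is a hypothetical helper, while the address, port and credentials come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within 3 seconds, using bash's /dev/tcp redirection.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage from a *nix host: connect only if the guest is listening on 3389
#   port_open 192.168.56.5 3389 && rdesktop -u vagrant -p vagrant 192.168.56.5:3389
```

If the check fails, the guest is probably not running yet (for example after a 'vagrant halt').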

After connecting with rdesktop or Remote Desktop Connection, go to the bottom left of the window and click on the Ubuntu desktop menu to open the Eclipse IDE. When Eclipse prompts you for the workspace location, accept the default. Then click the icon to go to the Workbench. You should now see the Stratos source code imported and ready for you to hack on. Note that hacking is optional; you can use the 'guest' just for trying out the Stratos runtime.

If you are still inside the ssh session, type CTRL-D to exit so that you are back in the stratos-vagrant-box directory. Then run the command 'vagrant halt'. This will shut down the Virtualbox guest, ready for the next blog.

The next blog will describe how to use your vagrant guest, and will introduce commands to start, stop and remove your 'guest', and commands to start, stop and reconfigure Stratos and OpenStack inside the 'guest'.