Tuesday, May 13, 2014

How to start the beeline hive command line client in local mode

beeline -u jdbc:hive2://
See http://blog.cloudera.com/blog/2014/02/migrating-from-hive-cli-to-beeline-a-primer/ for an introduction to beeline.
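Beeline can also run statements non-interactively, which is handy for scripting. A small sketch, assuming beeline is on your PATH (the script filename is just a placeholder):

```shell
# Run a single statement in embedded (local) mode and exit
beeline -u jdbc:hive2:// -e "SHOW DATABASES;"

# Or run a whole script file
beeline -u jdbc:hive2:// -f my-queries.sql
```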

Tuesday, April 8, 2014

Monday, December 16, 2013

How to increase Xmx for the hadoop client applications

Sometimes you need more memory for Hadoop client tools like hive, beeline, or pig. The HADOOP_CLIENT_OPTS environment variable passes JVM options to these client processes:
export HADOOP_CLIENT_OPTS=-Xmx2G
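Quoting the value lets you combine several JVM options, for example a larger heap plus a garbage collector choice (the GC flag here is just an illustration):

```shell
# Quoting keeps several JVM options together as one value
export HADOOP_CLIENT_OPTS="-Xmx2G -XX:+UseG1GC"
echo "$HADOOP_CLIENT_OPTS"
```

Any hive, beeline, or pig process started from this shell afterwards picks up the larger heap; the variable only affects client-side JVMs, not the cluster daemons.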

Thursday, December 12, 2013

Find out the total size of directories with the 'du' command

To get a sorted list of folder sizes run:
du -sm * | sort -n
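If your sort supports -h (GNU coreutils does), a human-readable variant avoids fixing the unit to megabytes:

```shell
# -sh prints per-directory totals like 1.2G; sort -h orders those suffixes correctly
du -sh * | sort -h
```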

How to access the web interface of a remote hadoop cluster over a SOCKS proxy

You want to use the web interface of a Hadoop cluster but you only have SSH access to it? SOCKS is the solution:

Use ssh to open a SOCKS proxy:

ssh -f -N -D 7070 user@remote-host
Here -f sends ssh to the background, -N skips running a remote command, and -D 7070 opens a local SOCKS proxy on port 7070.

After that you can configure firefox to use this proxy:

  1. Go to the manual proxy settings and add localhost:7070 as SOCKS v5 host
  2. Go to about:config and set network.proxy.socks_remote_dns to true to use DNS resolution over the proxy (thanks to Aczire for this!).
That's all!
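If you open this tunnel often, an entry in ~/.ssh/config saves retyping the flags. A sketch, where the alias, host, and user are placeholders for your own values:

```
# ~/.ssh/config -- "hadoop-proxy" is an assumed alias, adjust HostName/User to yours
Host hadoop-proxy
    HostName remote-host
    User user
    DynamicForward 7070
```

After that, `ssh -f -N hadoop-proxy` opens the same SOCKS proxy.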

Wednesday, December 11, 2013

How to rsync a folder over ssh

Another short shell snippet:

rsync -avze ssh source user@host:target-folder
Here -a preserves permissions and timestamps (archive mode), -v is verbose, -z compresses data in transit, and -e ssh runs the transfer over ssh.

Monday, October 28, 2013

How to write unit tests for your Hadoop MapReduce jobs

Simple answer: use MRUnit.

You will need the hadoop2 classifier to include it in your Maven project:

  <dependency>
   <groupId>org.apache.mrunit</groupId>
   <artifactId>mrunit</artifactId>
   <version>1.0.0</version>
   <classifier>hadoop2</classifier>
   <scope>test</scope>
  </dependency>
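With the dependency in place, a test drives a mapper through MRUnit's MapDriver. A minimal sketch, where WordCountMapper and the sample data are made up for illustration (assume it emits a (word, 1) pair per word):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // WordCountMapper is a hypothetical mapper emitting (word, 1) pairs
        mapDriver = MapDriver.newMapDriver(new WordCountMapper());
    }

    @Test
    public void emitsOnePairPerWord() throws Exception {
        // withInput feeds one record; withOutput declares the expected pairs;
        // runTest fails the test if actual and expected output differ
        mapDriver.withInput(new LongWritable(0), new Text("hadoop hadoop"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .runTest();
    }
}
```

The analogous ReduceDriver and MapReduceDriver classes cover reducers and full map-reduce pipelines.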