FLUME-1020: Support Kerberos security in HDFS Sink

Review Request #4360 - Created March 15, 2012 and submitted

Mike Percy
This is an initial pass at an implementation of HDFS security. I think it will probably work. I'm still trying to get Kerberos to play nicely with the cluster on my VM, though, so I haven't successfully tested it yet. It still works when used on HDFS with security disabled. :)

The only thing I don't like is that configure() throws a FlumeException when authentication fails. I'll trace up the call stack and see how bad that would be, but it seems likely to break something. Just logging the error is kind of a bummer as well, though: we need to ensure process() doesn't fill up the disk by spewing copious error messages into the logs. Maybe this is a use case for some kind of FatalException type.
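To make the FatalException idea concrete, here is a minimal sketch of one way it could look. It is not taken from the patch: FatalException, GuardedSink, and the boolean passed to configure() are hypothetical stand-ins, and nothing here uses Flume's or Hadoop's real interfaces. The assumption is that configure() only records the authentication failure, and process() then fails fast while logging the problem a single time.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every type below is hypothetical and not part of Flume.
final class FatalConfigSketch {

    /** Hypothetical exception type for failures that retrying cannot fix. */
    static class FatalException extends RuntimeException {
        FatalException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a sink; not Flume's actual Sink interface. */
    static class GuardedSink {
        private boolean authFailed = false;
        private boolean failureLogged = false;
        // In-memory "log" so the sketch has no logging dependency.
        final List<String> log = new ArrayList<>();

        /** Records the login outcome instead of throwing from configure(). */
        void configure(boolean loginSucceeded) {
            authFailed = !loginSucceeded;
        }

        /** Writes one event, or throws FatalException if the sink is unusable. */
        boolean process(String event) {
            if (authFailed) {
                // Log the failure once so repeated calls cannot flood the logs.
                if (!failureLogged) {
                    log.add("ERROR: authentication failed; sink disabled");
                    failureLogged = true;
                }
                throw new FatalException("sink is not authenticated");
            }
            log.add("wrote " + event);
            return true;
        }
    }

    public static void main(String[] args) {
        GuardedSink sink = new GuardedSink();
        sink.configure(false); // simulate a failed Kerberos login
        for (int i = 0; i < 3; i++) {
            try {
                sink.process("event-" + i);
            } catch (FatalException e) {
                // A caller could use this to stop scheduling the sink.
            }
        }
        System.out.println("log lines written: " + sink.log.size());
    }
}
```

Whether the caller should stop the sink entirely or back off and retry the login is left open here; the sketch only shows how a distinct exception type lets process() signal a non-recoverable state without repeating the error message on every call.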

Review request changed
Updated (March 22, 2012, 8:06 a.m.)
Brock, thanks for all the feedback!

I am now looking for the Bigtop JAVA_HOME detection script and calling it if it's there.

I've also incorporated more suggestions from Roman, including using slf4j 1.6.1, which Hadoop and ZooKeeper are using. I'm also excluding slf4j from the Hadoop classpath when it's injected into Flume's classpath, to avoid warnings in the log when running against an older version of Hadoop.

I also incorporated the suggestion about not checking twice and added some debug messages to indicate overall success or failure.

I tested this all on a Kerberos cluster and it seems to work well.
Ship it!
Posted (March 22, 2012, 8:25 a.m.)

Thanks for the patch, Mike. Please attach it to the JIRA. Also, it would be great if you could file a follow-up JIRA to move the configuration constants out of the sink and into their own separate class.
  1. Thanks for committing this, Arvind! I've filed a follow-up JIRA to track removal of the config constants.