FLUME-1020: Support Kerberos security in HDFS Sink
Review Request #4360 - Created March 15, 2012 and submitted
This is an initial pass at an implementation of HDFS security. I think it will probably work; I'm currently trying to get Kerberos to play nicely with the cluster on my VM, so I haven't successfully tested it yet. It still works when used on HDFS with security disabled. :) The only thing I don't like is that configure() throws a FlumeException when authentication fails. I'll trace up the call stack and see how bad that would be, but it seems likely to break something. Just logging the error is kind of a bummer as well, though; we need to ensure process() doesn't fill up the disk while spewing copious error messages into the logs. Maybe this is a use case for some kind of FatalException type thing.
Review request changed
Updated (March 17, 2012, 2:30 a.m.)
Got this working. Please take a look. For now, I am not throwing if authentication fails. Also, based on speaking with folks familiar with HDFS, it turns out that in order to communicate with a secure cluster, one must have the Hadoop config directory on the classpath. This is due to some static variables being used to keep track of settings and state related to the UserGroupInformation class. So, I am looking for Hadoop environment variables in bin/flume-ng. Tested this against a Kerberized Hadoop cluster running in a VM (CentOS 6), using MIT Kerberos on my laptop.
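To illustrate the idea, here is a minimal sketch of putting the Hadoop config directory on Flume's classpath so UserGroupInformation can see the security settings. The helper name and the $HADOOP_HOME/conf fallback are assumptions for illustration, not the actual bin/flume-ng code:

```shell
# Hypothetical sketch: prepend the Hadoop config directory (where
# core-site.xml / hdfs-site.xml live) to Flume's classpath, so that
# UserGroupInformation's static state picks up the security settings.
add_hadoop_conf_to_classpath() {
  local classpath="$1"
  # Prefer HADOOP_CONF_DIR; fall back to $HADOOP_HOME/conf if HADOOP_HOME is set.
  local conf_dir="${HADOOP_CONF_DIR:-${HADOOP_HOME:+$HADOOP_HOME/conf}}"
  if [ -n "$conf_dir" ] && [ -d "$conf_dir" ]; then
    classpath="$conf_dir:$classpath"
  fi
  echo "$classpath"
}
```

If neither environment variable points at an existing directory, the classpath is returned unchanged, so the script keeps working against an insecure cluster.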
Review request changed
Updated (March 20, 2012, 10:33 p.m.)
No Java code has changed. I updated the build environment and the runtime environment as follows:

1. At runtime, if the hadoop binary can be found on the system, Flume interrogates it to extract the CLASSPATH and JAVA_LIBRARY_PATH variables, using tricks kindly shared by Roman in Bigtop. This allows us to find the Hadoop configuration files and the JARs appropriate for the system being accessed. It's basically the state of the art for compatibility right now, if you want to call it that.

2. To allow the tricks above to work at runtime, the Hadoop artifacts have been marked as optional in the POM, which means they will not be included in the binary distribution. That's fine, because they are only needed if the HDFS Sink is used, and we jump through hoops to find those artifacts if they're on the system.

As a result, I am able to compile Flume against the default Hadoop version (0.20.205.0) and run against versions of Hadoop that I didn't explicitly build against, like 0.23.x, without a problem. This is a huge improvement over last week, when I was adding and fixing profiles just to get anything to work at all, since it's well known that different Hadoop versions generally refuse to talk to each other. I also refactored the flume-ng script to duplicate less code and be a bit friendlier, since I was doing surgery in there anyway. This is ready for review now.
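The runtime detection in step 1 can be sketched roughly as below. This is a hedged illustration, not the actual patch: the function name is hypothetical, and the real script may parse the hadoop launcher differently rather than relying solely on the `hadoop classpath` subcommand:

```shell
# Hypothetical sketch of the runtime detection: if a `hadoop` binary is on
# the PATH, ask it for the classpath it would use (config dir plus the jars
# for the installed version) instead of bundling Hadoop jars with Flume.
detect_hadoop_classpath() {
  local hadoop_bin
  hadoop_bin=$(command -v hadoop 2>/dev/null)
  if [ -z "$hadoop_bin" ]; then
    return 1   # no Hadoop on this system; the HDFS Sink can't be used anyway
  fi
  # `hadoop classpath` prints the classpath the hadoop scripts themselves use.
  "$hadoop_bin" classpath
}
```

Because the classpath comes from the installed Hadoop, the version Flume was compiled against no longer has to match the version it runs against, which is what makes the 0.20.205.0-build / 0.23.x-run combination work.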
Posted (March 21, 2012, 6:43 a.m.)
Should we also try the Bigtop autodetect script? http://svn.apache.org/repos/asf/incubator/bigtop/trunk/bigtop-packages/src/common/bigtop-utils/bigtop-detect-javahome

This script: http://svn.apache.org/repos/asf/incubator/bigtop/trunk/bigtop-packages/src/common/hadoop/install_hadoop.sh calls it as follows:

    # Autodetect JAVA_HOME if not defined
    if [ -e /usr/libexec/bigtop-detect-javahome ]; then
      . /usr/libexec/bigtop-detect-javahome
    elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
      . /usr/lib/bigtop-utils/bigtop-detect-javahome
    fi
The only place this method is called we have previously checked to see if security is enabled. Why do this check in both places?
This is good debug info. But if it fails here, we have already logged in. Should we be returning false?
Review request changed
Updated (March 22, 2012, 8:06 a.m.)
Brock, thanks for all the feedback! I am now looking for the Bigtop JAVA_HOME detection script and calling it if it's there. I've also incorporated more suggestions from Roman, including using slf4j 1.6.1, which Hadoop and ZooKeeper are using. I'm also excluding slf4j from the Hadoop classpath when it's injected into Flume's classpath, to avoid warnings in the log when the system has an older version of Hadoop. Also incorporated the suggestion about not checking twice, and added some debug messages to indicate overall success or failure. I tested this all on a Kerberos cluster and it seems to work well.
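The slf4j exclusion mentioned above could look something like the following. The helper name is hypothetical and this is only a sketch of the idea: filter slf4j jars out of the detected Hadoop classpath before appending it, so Flume's own slf4j 1.6.1 binding wins and older Hadoop slf4j jars don't trigger version-mismatch warnings:

```shell
# Hypothetical sketch: remove slf4j jars from a colon-separated classpath.
strip_slf4j_jars() {
  local classpath="$1" filtered="" entry
  set -f                       # don't glob-expand wildcard classpath entries
  local IFS=':'
  for entry in $classpath; do
    case "$entry" in
      *slf4j*) ;;              # skip Hadoop's slf4j jars; Flume ships its own
      *) filtered="${filtered:+$filtered:}$entry" ;;
    esac
  done
  set +f
  echo "$filtered"
}
```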
Posted (March 22, 2012, 8:25 a.m.)
+1 Thanks for the patch, Mike. Please attach it to the JIRA. Also, it would be great if you could file a follow-up JIRA to move the configuration constants into their own separate class.