HBASE-5209: Add active and backup masters to ClusterStatus

Review Request #3892 - Created Feb. 14, 2012 and updated

Submitter: David Wang
Branch: 0.94.0, 0.92.1, 0.90.7
Bugs: HBASE-5209
Reviewers: hbase
Repository: hbase-git
Problem:
There is no method in the HBase client-facing APIs to determine which of the masters is currently active.  This can be especially useful in setups with multiple backup masters.

Solution:
Augment ClusterStatus to return the currently active master and the list of backup masters.
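
For illustration only, here is a minimal sketch of the kind of accessors the solution describes. ClusterStatusSketch and its method names are hypothetical stand-ins, not the actual ClusterStatus code in this patch, and the server names are made-up values in host,port,startcode form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ClusterStatus; NOT the real HBase class.
final class ClusterStatusSketch {
    private final String activeMaster;        // may be null if unknown
    private final List<String> backupMasters; // never null, possibly empty

    ClusterStatusSketch(String activeMaster, List<String> backupMasters) {
        this.activeMaster = activeMaster;
        // Treat a missing list as "no backup masters" so callers never see null.
        this.backupMasters = backupMasters == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(new ArrayList<String>(backupMasters));
    }

    String getActiveMaster() { return activeMaster; }

    List<String> getBackupMasters() { return backupMasters; }

    int getBackupMastersSize() { return backupMasters.size(); }

    public static void main(String[] args) {
        // Made-up server names, for illustration only.
        ClusterStatusSketch status = new ClusterStatusSketch(
            "hostA,60000,1", Arrays.asList("hostB,60000,2", "hostC,60000,3"));
        System.out.println("active master: " + status.getActiveMaster());
        System.out.println("backup masters: " + status.getBackupMasters());
    }
}
```

Returning an empty list instead of null keeps the no-backup-masters case from needing special handling in callers.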

Notes:
* I uncovered a race condition in ActiveMasterManager, between the point where a candidate master determines that it did not win the original race to become the active master and the point where it reads the ServerName of the active master.  If the active master goes down in that window, the read of the active master's ServerName fails ungracefully and the candidate master aborts.  This patch fixes that by checking that the read of the ServerName succeeded before using the result.
* I fixed some minor formatting issues while going through the code.  I can take these changes out if it is considered improper to commit such unrelated changes with the main changes.
* Ran mvn -P localTests test multiple times - no new test failures
* Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple times - no failures
* Ran mvn -P localTests -Dtest=TestMasterFailover test multiple times - no failures
* Started an active master and multiple backup masters, then killed the active master, then brought it back up (it comes back as a backup master)
  * Did the following before and after killing the active master
    * hbase hbck -details - checked output to see that active and backup masters are reported properly
    * zk_dump - checked that active and backup masters are reported properly
* Started a cluster with no backup masters to make sure the change works correctly in that configuration
* Tested a build with this diff against a build without it, in all combinations of client and server
  * Verified that a new client can run against old servers without incident and with the defaults applied.
  * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind.  This cannot be solved, at least within the scope of this change.  The error looks like this:

12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
A record version mismatch occured. Expecting v2, found v3
        at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
        at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
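
To make the compatibility behavior above concrete, here is a self-contained sketch of a version-prefixed record. It is not the Hadoop VersionedWritable or HBase ClusterStatus code; the class, method names, and field layout are assumptions made for illustration. A strict reader that expects exactly one version rejects a newer record, while a tolerant reader accepts the older layout and applies a default for the missing field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration; NOT the real VersionedWritable/ClusterStatus code.
final class VersionedRecordSketch {

    // Writes: version byte, a field common to all versions, and (v3+) the master name.
    static byte[] write(int version, String hbaseVersion, String master) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(version);
            out.writeUTF(hbaseVersion);
            if (version >= 3) {
                out.writeUTF(master); // field assumed to be added in v3
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Strict reader (old-client style): any version other than the expected one is an error.
    static String readStrict(byte[] data, int expected) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int found = in.readByte();
            if (found != expected) {
                throw new IllegalStateException(
                    "version mismatch: expecting v" + expected + ", found v" + found);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tolerant reader (new-client style): accepts v2 and v3, defaulting the master to null for v2.
    static String readMaster(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int found = in.readByte();
            if (found < 2 || found > 3) {
                throw new IllegalStateException("unsupported version v" + found);
            }
            in.readUTF(); // skip the common field
            return found >= 3 ? in.readUTF() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, readStrict(write(3, ...), 2) throws, which mirrors the old-client failure in the log, while readMaster(write(2, ...)) returns the default instead of failing.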

* Ran dev-support/test-patch.sh - no new issues:

-1 overall.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -136 warning messages.  

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version ) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.
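
The race described in the first note can be sketched as follows. This is a simplified, self-contained model, not the ActiveMasterManager code from the patch: MasterNode, InMemoryNode, and resolveActiveMaster are hypothetical names, and the in-memory node stands in for the ZooKeeper node that names the active master. The point it shows is that a failed read (null) is treated as "the active master went away, try again" rather than as a fatal error.

```java
// Hypothetical model of the race; NOT the real ActiveMasterManager code.
final class ActiveMasterRaceSketch {

    /** Stand-in for the ZooKeeper node that names the active master. */
    interface MasterNode {
        /** Tries to claim the node; returns true if the caller is now the active master. */
        boolean tryClaim(String serverName);

        /** Returns the active master's name, or null if the node no longer exists. */
        String read();
    }

    /**
     * Returns the candidate if it wins a claim, otherwise the current holder.
     * Returns null only if every attempt hits the race window.
     */
    static String resolveActiveMaster(MasterNode node, String candidate, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (node.tryClaim(candidate)) {
                return candidate;
            }
            String active = node.read();
            if (active != null) {
                return active; // read succeeded, safe to use
            }
            // Read failed: the holder vanished between the claim and the read.
            // Loop and try to claim again instead of aborting.
        }
        return null;
    }

    /** In-memory node whose holder can be made to vanish just before the next read. */
    static final class InMemoryNode implements MasterNode {
        private String holder;
        private boolean holderDiesBeforeNextRead;

        InMemoryNode(String holder, boolean holderDiesBeforeNextRead) {
            this.holder = holder;
            this.holderDiesBeforeNextRead = holderDiesBeforeNextRead;
        }

        public boolean tryClaim(String serverName) {
            if (holder == null) {
                holder = serverName;
                return true;
            }
            return false;
        }

        public String read() {
            if (holderDiesBeforeNextRead) {
                holder = null; // simulate the active master going down in the race window
                holderDiesBeforeNextRead = false;
            }
            return holder;
        }
    }
}
```

With a holder that dies before the first read, the candidate loses the first claim, sees a null read, and wins the second claim; with a stable holder, the candidate simply learns the holder's name.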
Review request changed
Updated (March 15, 2012, 1:16 p.m.)
Addressed Jon's and Stack's comments with an addendum patch.  It passed unit tests, and I also ran it on a test setup to make sure everything looked OK.
Ship it!
Posted (March 16, 2012, 10:50 p.m.)
Dave,

Additions look good to me.  Since it has been a little while since the others were committed, please file another JIRA, and then I'll commit.

Thanks!