Review Board 1.7.22


HBASE-5209: Add active and backup masters to ClusterStatus

Review Request #3892 - Created Feb. 14, 2012 and updated

David Wang
0.94.0, 0.92.1, 0.90.7
HBASE-5209
Reviewers
hbase
hbase-git
Problem:
There is no method in the HBase client-facing APIs to determine which of the masters is currently active.  Such a method would be especially useful in setups with multiple backup masters.

Solution:
Augment ClusterStatus to return the currently active master and the list of backup masters.

Notes:
* I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master.  If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort.  The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.
* I fixed some minor formatting issues while going through the code.  I can take these changes out if it is considered improper to commit such unrelated changes with the main changes.
* Ran mvn -P localTests test multiple times - no new test failures
* Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple times - no failures
* Ran mvn -P localTests -Dtest=TestMasterFailover test multiple times - no failures
* Started an active master and multiple backup masters, killed the active master, then brought it back up (it rejoins as a backup master)
  * Did the following before and after killing
    * hbase hbck -details - checked output to see that active and backup masters are reported properly
    * zk_dump - checked that active and backup masters are reported properly
* Started a cluster with no backup masters to make sure the change works correctly in that configuration
* Tested build with this diff vs. build without this diff, in all combinations of client and server
  * Verified that new client can run against old servers without incident and with the defaults applied.
  * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind.  This is not solvable, at least in the scope of this change.

12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
A record version mismatch occured. Expecting v2, found v3
        at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
        at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

* Ran dev-support/test-patch.sh - no new issues:

-1 overall.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -136 warning messages.  

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version ) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.
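
The race described in the first note can be sketched as follows. This is a minimal, self-contained illustration and not the code in the patch: `MasterElectionSketch`, `checkActiveMaster`, and the `Supplier<String>` standing in for the ZooKeeper read are all hypothetical names. The only point is that a null read (the active master's node vanished between losing the election and reading it) is treated as a cue to retry rather than as a fatal error.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class MasterElectionSketch {
    /** Outcome of one attempt to learn who the active master is. */
    enum Outcome { FOUND_ACTIVE, RETRY_ELECTION }

    /**
     * Hypothetical stand-in for a candidate master's logic after it has lost
     * the initial race. The supplier models the read of the active master's
     * ServerName; it returns null if the node disappeared in the meantime.
     */
    static Outcome checkActiveMaster(Supplier<String> readActiveMaster) {
        String active = readActiveMaster.get();
        if (active == null) {
            // The active master went away between the lost race and this read.
            // Do not dereference the result; go around the election loop again.
            return Outcome.RETRY_ELECTION;
        }
        // The read succeeded, so the name is safe to use.
        System.out.println("Active master is " + active);
        return Outcome.FOUND_ACTIVE;
    }

    public static void main(String[] args) {
        // First read fails (the master died), second read sees the new master.
        Iterator<String> reads =
            Arrays.<String>asList(null, "host2,60000,1329260000000").iterator();
        Supplier<String> reader = reads::next;
        System.out.println(checkActiveMaster(reader));
        System.out.println(checkActiveMaster(reader));
    }
}
```

In this sketch the caller decides what "retry" means; the patch itself only needs the read result to be checked before it is used.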
src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Revision b849429 New Change
@@ -40,52 +40,61 @@
  * <p>
  * <tt>ClusterStatus</tt> provides clients with information such as:
  * <ul>
  * <li>The count and names of region servers in the cluster.</li>
  * <li>The count and names of dead region servers in the cluster.</li>
+ * <li>The name of the active master for the cluster.</li>
+ * <li>The name(s) of the backup master(s) for the cluster, if they exist.</li>
  * <li>The average cluster load.</li>
  * <li>The number of regions deployed on the cluster.</li>
  * <li>The number of requests since last report.</li>
  * <li>Detailed region server loading and resource usage information,
  *  per server and per region.</li>
- *  <li>Regions in transition at master</li>
- *  <li>The unique cluster ID</li>
+ * <li>Regions in transition at master</li>
+ * <li>The unique cluster ID</li>
  * </ul>
  */
 public class ClusterStatus extends VersionedWritable {
   /**
    * Version for object serialization.  Incremented for changes in serialized
    * representation.
    * <dl>
-   *   <dt>0</dt> <dd>initial version</dd>
-   *   <dt>1</dt> <dd>added cluster ID</dd>
+   *   <dt>0</dt> <dd>Initial version</dd>
+   *   <dt>1</dt> <dd>Added cluster ID</dd>
    *   <dt>2</dt> <dd>Added Map of ServerName to ServerLoad</dd>
    * </dl>
    */
   private static final byte VERSION = 2;

   private String hbaseVersion;
   private Map<ServerName, HServerLoad> liveServers;
   private Collection<ServerName> deadServers;
+  private ServerName master;
+  private Collection<ServerName> backupMasters;
   private Map<String, RegionState> intransition;
   private String clusterId;
   private String[] masterCoprocessors;

   /**
    * Constructor, for Writable
    */
   public ClusterStatus() {
     super();
   }

   public ClusterStatus(final String hbaseVersion, final String clusterid,
       final Map<ServerName, HServerLoad> servers,
-      final Collection<ServerName> deadServers, final Map<String, RegionState> rit,
+      final Collection<ServerName> deadServers,
+      final ServerName master,
+      final Collection<ServerName> backupMasters,
+      final Map<String, RegionState> rit,
       final String[] masterCoprocessors) {
     this.hbaseVersion = hbaseVersion;
     this.liveServers = servers;
     this.deadServers = deadServers;
+    this.master = master;
+    this.backupMasters = backupMasters;
     this.intransition = rit;
     this.clusterId = clusterid;
     this.masterCoprocessors = masterCoprocessors;
   }

@@ -158,20 +167,23 @@ public boolean equals(Object o) {
       return false;
     }
     return (getVersion() == ((ClusterStatus)o).getVersion()) &&
       getHBaseVersion().equals(((ClusterStatus)o).getHBaseVersion()) &&
       this.liveServers.equals(((ClusterStatus)o).liveServers) &&
-      deadServers.equals(((ClusterStatus)o).deadServers) &&
-      Arrays.equals(this.masterCoprocessors, ((ClusterStatus)o).masterCoprocessors);
+      this.deadServers.equals(((ClusterStatus)o).deadServers) &&
+      Arrays.equals(this.masterCoprocessors, ((ClusterStatus)o).masterCoprocessors) &&
+      this.master.equals(((ClusterStatus)o).master) &&
+      this.backupMasters.equals(((ClusterStatus)o).backupMasters);
   }

   /**
    * @see java.lang.Object#hashCode()
    */
   public int hashCode() {
     return VERSION + hbaseVersion.hashCode() + this.liveServers.hashCode() +
-      deadServers.hashCode();
+      this.deadServers.hashCode() + this.master.hashCode() +
+      this.backupMasters.hashCode();
   }

   /** @return the object version number */
   public byte getVersion() {
     return VERSION;
@@ -194,10 +206,32 @@ public byte getVersion() {
   public Collection<ServerName> getServers() {
     return Collections.unmodifiableCollection(this.liveServers.keySet());
   }

   /**
+   * Returns detailed information about the current master {@link ServerName}.
+   * @return current master information if it exists
+   */
+  public ServerName getMaster() {
+    return this.master;
+  }
+
+  /**
+   * @return the number of backup masters in the cluster
+   */
+  public int getBackupMastersSize() {
+    return this.backupMasters.size();
+  }
+
+  /**
+   * @return the names of backup masters
+   */
+  public Collection<ServerName> getBackupMasters() {
+    return Collections.unmodifiableCollection(this.backupMasters);
+  }
+
+  /**
    * @param sn
    * @return Server's load or null if not found.
    */
   public HServerLoad getLoad(final ServerName sn) {
     return this.liveServers.get(sn);
@@ -239,10 +273,15 @@ public void write(DataOutput out) throws IOException {
     out.writeUTF(clusterId);
     out.writeInt(masterCoprocessors.length);
     for(String masterCoprocessor: masterCoprocessors) {
       out.writeUTF(masterCoprocessor);
     }
+    Bytes.writeByteArray(out, this.master.getVersionedBytes());
+    out.writeInt(this.backupMasters.size());
+    for (ServerName backupMaster: this.backupMasters) {
+      Bytes.writeByteArray(out, backupMaster.getVersionedBytes());
+    }
   }

   public void readFields(DataInput in) throws IOException {
     super.readFields(in);
     hbaseVersion = in.readUTF();
@@ -271,7 +310,14 @@ public void readFields(DataInput in) throws IOException {
     int masterCoprocessorsLength = in.readInt();
     masterCoprocessors = new String[masterCoprocessorsLength];
     for(int i = 0; i < masterCoprocessorsLength; i++) {
       masterCoprocessors[i] = in.readUTF();
     }
+    this.master = ServerName.parseVersionedServerName(Bytes.readByteArray(in));
+    count = in.readInt();
+    this.backupMasters = new ArrayList<ServerName>(count);
+    for (int i = 0; i < count; i++) {
+      this.backupMasters.add(ServerName.parseVersionedServerName(
+                               Bytes.readByteArray(in)));
+    }
   }
 }
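
To make the wire-format change in write() and readFields() above concrete, here is a rough, self-contained sketch of the same field order: the active master first, then a count-prefixed list of backup masters. It is not HBase code. `MasterFieldsSketch` and its helpers are hypothetical names, server names are plain strings, and a fixed 4-byte length prefix is an assumption standing in for whatever encoding Bytes.writeByteArray and ServerName.getVersionedBytes() actually produce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MasterFieldsSketch {
    String master;
    List<String> backupMasters = new ArrayList<>();

    // Length-prefixed byte array; a simplified stand-in for Bytes.writeByteArray.
    static void writeBytes(DataOutput out, byte[] b) throws IOException {
        out.writeInt(b.length);
        out.write(b);
    }

    static byte[] readBytes(DataInput in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return b;
    }

    // Same order as in the diff: master first, then the count and each backup.
    void write(DataOutput out) throws IOException {
        writeBytes(out, master.getBytes(StandardCharsets.UTF_8));
        out.writeInt(backupMasters.size());
        for (String backup : backupMasters) {
            writeBytes(out, backup.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the fields back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        master = new String(readBytes(in), StandardCharsets.UTF_8);
        int count = in.readInt();
        backupMasters = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            backupMasters.add(new String(readBytes(in), StandardCharsets.UTF_8));
        }
    }

    // Serializes to an in-memory buffer and deserializes into a fresh object.
    static MasterFieldsSketch roundTrip(MasterFieldsSketch original) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));
        MasterFieldsSketch copy = new MasterFieldsSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        MasterFieldsSketch status = new MasterFieldsSketch();
        status.master = "host1,60000,1329260000000";
        status.backupMasters =
            Arrays.asList("host2,60000,1329260000001", "host3,60000,1329260000002");
        MasterFieldsSketch copy = roundTrip(status);
        System.out.println(copy.master + " " + copy.backupMasters);
    }
}
```

In this sketch an empty backup list is written as a count of zero, which corresponds to the no-backup-masters cluster mentioned in the testing notes.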
src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
Revision 2f60b23 New Change
 
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Revision 9d21903 New Change
 
src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
Revision f6f3f71 New Change
 
src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
Revision 111f76e New Change
 
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
Revision 3e3d131 New Change
 
src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
Revision 16e4744 New Change
 
src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
Revision bc98fb0 New Change
 