

HBASE-5209: Add active and backup masters to ClusterStatus

Review Request #3892 - Created Feb. 14, 2012 (updated)

Submitter: David Wang
Branch: 0.94.0, 0.92.1, 0.90.7
Bugs: HBASE-5209
Reviewers: hbase
Repository: hbase-git
Problem:
There is no method in the HBase client-facing APIs to determine which master is currently active.  Such a method would be especially useful in setups with multiple backup masters.

Solution:
Augment ClusterStatus to return the currently active master and the list of backup masters.
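
For illustration, here is a minimal client-side sketch of the augmented API.  The getMaster(), getBackupMastersSize(), and getBackupMasters() accessors come from the ClusterStatus diff below; HBaseAdmin.getClusterStatus() is the existing client entry point, and the ShowMasters class itself is hypothetical:

import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ShowMasters {
  public static void main(String[] args) throws Exception {
    // Connects using the hbase-site.xml found on the classpath.
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    ClusterStatus status = admin.getClusterStatus();
    System.out.println("Active master: " + status.getMaster());
    System.out.println("Backup masters (" + status.getBackupMastersSize() + "):");
    for (ServerName backup : status.getBackupMasters()) {
      System.out.println("  " + backup);
    }
  }
}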

Notes:
* I uncovered a race condition in ActiveMasterManager between the point where a candidate master determines that it did not win the original race to become the active master and the point where it reads the active master's ServerName.  If the active master goes down in that window, the read of its ServerName fails ungracefully and the candidate master aborts.  The fix in this patch is to check whether the read of the ServerName succeeded before trying to use it (see the sketch after the log excerpt below).
* I fixed some minor formatting issues while going through the code.  I can take these changes out if it is considered improper to commit such unrelated changes along with the main change.
* Ran mvn -P localTests test multiple times - no new tests fail
* Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
* Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
* Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
  * Did the following before and after killing
    * hbase hbck -details - checked output to see that active and backup masters are reported properly
    * zk_dump - checked that active and backup masters are reported properly
* Started cluster with no backup masters to make sure change operates correctly that way
* Tested build with this diff vs. build without this diff, in all combinations of client and server
  * Verified that new client can run against old servers without incident and with the defaults applied.
* Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind (see the log excerpt below).  This is not solvable, at least within the scope of this change.

12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
A record version mismatch occured. Expecting v2, found v3
        at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
        at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
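
Regarding the ActiveMasterManager race noted above, the following is only a rough sketch of the kind of guard described; the ActiveMasterManager diff itself is not reproduced in this excerpt, so the MasterAddressGuard class and readActiveMaster helper are hypothetical.  It assumes a ZKUtil.getDataAndWatch-style read that returns null when the master znode has disappeared:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.zookeeper.ZKUtil;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;
import org.apache.zookeeper.KeeperException;

class MasterAddressGuard {
  private static final Log LOG = LogFactory.getLog(MasterAddressGuard.class);

  /**
   * Read the active master's ServerName from ZooKeeper, tolerating the case
   * where the active master dies between losing the election and this read.
   * @return the active master's ServerName, or null if it has already gone away
   */
  static ServerName readActiveMaster(ZooKeeperWatcher watcher) throws KeeperException {
    byte[] bytes = ZKUtil.getDataAndWatch(watcher, watcher.masterAddressZNode);
    if (bytes == null) {
      // The previously elected master went down before its address could be
      // read; the caller should retry the election instead of aborting.
      LOG.info("Active master went away before its address could be read");
      return null;
    }
    return ServerName.parseVersionedServerName(bytes);
  }
}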

* Ran dev-support/test-patch.sh - no new issues found:

-1 overall.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -136 warning messages.  

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version ) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.
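
For reference, a quick round-trip sketch of the new write()/readFields() fields.  This is not one of the tests in the patch; the ClusterStatusRoundTrip class and the host names are made up, while the constructor signature and accessors match the ClusterStatus.java diff below:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.master.AssignmentManager.RegionState;
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

public class ClusterStatusRoundTrip {
  public static void main(String[] args) throws Exception {
    ServerName master = new ServerName("master.example.com", 60000, 1234L);
    ClusterStatus before = new ClusterStatus("0.92.1", "test-cluster-id",
        new HashMap<ServerName, HServerLoad>(),      // no live servers
        new ArrayList<ServerName>(),                 // no dead servers
        master,
        Arrays.asList(new ServerName("backup1.example.com", 60000, 5678L)),
        new HashMap<String, RegionState>(),          // no regions in transition
        new String[0]);                              // no master coprocessors

    // Serialize with the Writable protocol and read it back.
    DataOutputBuffer out = new DataOutputBuffer();
    before.write(out);
    DataInputBuffer in = new DataInputBuffer();
    in.reset(out.getData(), out.getLength());
    ClusterStatus after = new ClusterStatus();
    after.readFields(in);

    System.out.println("master preserved: " + master.equals(after.getMaster()));
    System.out.println("backup masters: " + after.getBackupMasters());
  }
}
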
src/main/java/org/apache/hadoop/hbase/ClusterStatus.java (Revision b849429)

@@ -31,61 +31,74 @@
 import java.util.Map;
 import java.util.TreeMap;
 
 import org.apache.hadoop.hbase.master.AssignmentManager.RegionState;
 import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.io.VersionMismatchException;
 import org.apache.hadoop.io.VersionedWritable;
 
 /**
  * Status information on the HBase cluster.
  * <p>
  * <tt>ClusterStatus</tt> provides clients with information such as:
  * <ul>
  * <li>The count and names of region servers in the cluster.</li>
  * <li>The count and names of dead region servers in the cluster.</li>
+ * <li>The name of the active master for the cluster.</li>
+ * <li>The name(s) of the backup master(s) for the cluster, if they exist.</li>
  * <li>The average cluster load.</li>
  * <li>The number of regions deployed on the cluster.</li>
  * <li>The number of requests since last report.</li>
  * <li>Detailed region server loading and resource usage information,
  *  per server and per region.</li>
- *  <li>Regions in transition at master</li>
- *  <li>The unique cluster ID</li>
+ * <li>Regions in transition at master</li>
+ * <li>The unique cluster ID</li>
  * </ul>
  */
 public class ClusterStatus extends VersionedWritable {
   /**
    * Version for object serialization.  Incremented for changes in serialized
    * representation.
    * <dl>
-   *   <dt>0</dt> <dd>initial version</dd>
-   *   <dt>1</dt> <dd>added cluster ID</dd>
+   *   <dt>0</dt> <dd>Initial version</dd>
+   *   <dt>1</dt> <dd>Added cluster ID</dd>
    *   <dt>2</dt> <dd>Added Map of ServerName to ServerLoad</dd>
+   *   <dt>3</dt> <dd>Added master and backupMasters</dd>
    * </dl>
    */
-  private static final byte VERSION = 2;
+  private static final byte VERSION_MASTER_BACKUPMASTERS = 3;
+  private static final byte VERSION = 3;
+  private static final String UNKNOWN_SERVERNAME = "unknown";
 
   private String hbaseVersion;
   private Map<ServerName, HServerLoad> liveServers;
   private Collection<ServerName> deadServers;
+  private ServerName master;
+  private Collection<ServerName> backupMasters;
   private Map<String, RegionState> intransition;
   private String clusterId;
   private String[] masterCoprocessors;
 
   /**
    * Constructor, for Writable
    */
   public ClusterStatus() {
     super();
   }
 
   public ClusterStatus(final String hbaseVersion, final String clusterid,
       final Map<ServerName, HServerLoad> servers,
-      final Collection<ServerName> deadServers, final Map<String, RegionState> rit,
+      final Collection<ServerName> deadServers,
+      final ServerName master,
+      final Collection<ServerName> backupMasters,
+      final Map<String, RegionState> rit,
       final String[] masterCoprocessors) {
     this.hbaseVersion = hbaseVersion;
     this.liveServers = servers;
     this.deadServers = deadServers;
+    this.master = master;
+    this.backupMasters = backupMasters;
     this.intransition = rit;
     this.clusterId = clusterid;
     this.masterCoprocessors = masterCoprocessors;
   }
 
@@ -158,20 +171,24 @@
       return false;
     }
     return (getVersion() == ((ClusterStatus)o).getVersion()) &&
       getHBaseVersion().equals(((ClusterStatus)o).getHBaseVersion()) &&
       this.liveServers.equals(((ClusterStatus)o).liveServers) &&
-      deadServers.equals(((ClusterStatus)o).deadServers) &&
-      Arrays.equals(this.masterCoprocessors, ((ClusterStatus)o).masterCoprocessors);
+      this.deadServers.equals(((ClusterStatus)o).deadServers) &&
+      Arrays.equals(this.masterCoprocessors,
+                    ((ClusterStatus)o).masterCoprocessors) &&
+      this.master.equals(((ClusterStatus)o).master) &&
+      this.backupMasters.equals(((ClusterStatus)o).backupMasters);
   }
 
   /**
    * @see java.lang.Object#hashCode()
   */
   public int hashCode() {
     return VERSION + hbaseVersion.hashCode() + this.liveServers.hashCode() +
-      deadServers.hashCode();
+      this.deadServers.hashCode() + this.master.hashCode() +
+      this.backupMasters.hashCode();
   }
 
   /** @return the object version number */
   public byte getVersion() {
     return VERSION;
@@ -194,10 +211,32 @@
   public Collection<ServerName> getServers() {
     return Collections.unmodifiableCollection(this.liveServers.keySet());
   }
 
   /**
+   * Returns detailed information about the current master {@link ServerName}.
+   * @return current master information if it exists
+   */
+  public ServerName getMaster() {
+    return this.master;
+  }
+
+  /**
+   * @return the number of backup masters in the cluster
+   */
+  public int getBackupMastersSize() {
+    return this.backupMasters.size();
+  }
+
+  /**
+   * @return the names of backup masters
+   */
+  public Collection<ServerName> getBackupMasters() {
+    return Collections.unmodifiableCollection(this.backupMasters);
+  }
+
+  /**
    * @param sn
    * @return Server's load or null if not found.
    */
   public HServerLoad getLoad(final ServerName sn) {
     return this.liveServers.get(sn);
@@ -239,14 +278,30 @@
     out.writeUTF(clusterId);
     out.writeInt(masterCoprocessors.length);
     for(String masterCoprocessor: masterCoprocessors) {
       out.writeUTF(masterCoprocessor);
     }
+    Bytes.writeByteArray(out, this.master.getVersionedBytes());
+    out.writeInt(this.backupMasters.size());
+    for (ServerName backupMaster: this.backupMasters) {
+      Bytes.writeByteArray(out, backupMaster.getVersionedBytes());
+    }
   }
 
   public void readFields(DataInput in) throws IOException {
-    super.readFields(in);
+    int version = getVersion();
+    try {
+      super.readFields(in);
+    } catch (VersionMismatchException e) {
+      /*
+       * No API in VersionMismatchException to get the expected and found
+       * versions.  We use the only tool available to us: toString(), whose
+       * output has a dependency on hadoop-common.  Boo.
+       */
+      int startIndex = e.toString().lastIndexOf('v') + 1;
+      version = Integer.parseInt(e.toString().substring(startIndex));
+    }
     hbaseVersion = in.readUTF();
     int count = in.readInt();
     this.liveServers = new HashMap<ServerName, HServerLoad>(count);
     for (int i = 0; i < count; i++) {
       byte [] versionedBytes = Bytes.readByteArray(in);
@@ -271,7 +326,23 @@
     int masterCoprocessorsLength = in.readInt();
     masterCoprocessors = new String[masterCoprocessorsLength];
     for(int i = 0; i < masterCoprocessorsLength; i++) {
       masterCoprocessors[i] = in.readUTF();
     }
+    // Only read extra fields for master and backup masters if
+    // version indicates that we should do so, else use defaults
+    if (version >= VERSION_MASTER_BACKUPMASTERS) {
+      this.master = ServerName.parseVersionedServerName(
+                      Bytes.readByteArray(in));
+      count = in.readInt();
+      this.backupMasters = new ArrayList<ServerName>(count);
+      for (int i = 0; i < count; i++) {
+        this.backupMasters.add(ServerName.parseVersionedServerName(
+                                 Bytes.readByteArray(in)));
+      }
+    } else {
+      this.master = new ServerName(UNKNOWN_SERVERNAME, -1,
+                                   ServerName.NON_STARTCODE);
+      this.backupMasters = new ArrayList<ServerName>(0);
+    }
   }
 }
Other files in this diff:

src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java (Revision 2f60b23)
src/main/java/org/apache/hadoop/hbase/master/HMaster.java (Revision 9d21903)
src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java (Revision f6f3f71)
src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java (Revision 111f76e)
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java (Revision 3e3d131)
src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java (Revision 16e4744)
src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java (Revision bc98fb0)
 