Review Board 1.7.22


hbase-6060 Regions in OPENING state from failed regionservers take a long time to recover

Review Request #5796 - Created July 6, 2012 and updated

Submitter: Michael Stack
Bugs: hbase-6060
Reviewers: hbase, ram_krish
Repository: hbase-git
This patch works on the issue where SSH and the single region assign
can clash; in particular, the single region assign retrying while SSH
is running could result in a double-assign.  The basic idea is that a
region in the OFFLINE state is one that can be retried by
single-assign and is not to be assigned by SSH as part of its bulk
assign set.  To that end, on open region, in the RS, we set the znode to
OPENING before returning to the master.  On the master side, PENDING_OPEN
is now just that narrow window after the return from open region while we
are waiting on the znode callback to set the RegionState to OPENING.

We synchronize all of RegionState and add at least one conditional
state setting: i.e. don't set a RegionState to PENDING_OPEN if its state is
currently OPENING or OPENED.  Would like to add more so that RegionState
is where we decide which states can follow from the current one.
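
A minimal, self-contained sketch of the conditional-setter idea, in the spirit of the RegionState#setPendingOpen described below (the class and method here are illustrative placeholders, not the patch's actual RegionState code): only move to PENDING_OPEN when the current state is OFFLINE, and do it under synchronization.

{code}
// Illustrative sketch only -- not the actual o.a.h.hbase RegionState class.
public class RegionStateSketch {
  public enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

  private State state = State.OFFLINE;

  /**
   * Move to PENDING_OPEN only if the region is currently OFFLINE.
   * @return true if the transition was applied; false if the current state
   *         (e.g. OPENING or OPENED) makes the transition illegal.
   */
  public synchronized boolean setPendingOpen() {
    if (state != State.OFFLINE) {
      return false; // a single-assign retry lost the race; leave state alone
    }
    state = State.PENDING_OPEN;
    return true;
  }

  public synchronized State getState() {
    return state;
  }
}
{code}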

Also removed and renamed some methods on RegionState.  Did some
cleanup of its use throughout AssignmentManager.

TODO: More tests and corralling of our setting regions to OFFLINE;
it happens too often and also from too many different angles... makes
it hard to follow what's going on.  I am also afraid that a region
could be OFFLINE and not be in the process of being assigned (I suppose
the timeout monitor will find it but I need to spend more time looking
at OFFLINE setters to see if any OFFLINEs are being left aside). Also, need
to work on bulk assign.  It should do the same setting of the znode on open.
Need to study socket timeout more, especially around bulk assign where
it's more likely.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Renamed getRegionState, a RegionState-creating method, to
  createRegionState.  This is probably the bulk of the change in
  this class.
  Moved the setting of PENDING_OPEN to after we call open on the regionserver.
  (processServerShutdown): Clears out all state and returns a
  list of RegionStates that pertain to the dead server, whether
  they were being carried by the server at the time of expiration or
  were regions in RIT that had been assigned to it and were in the
  process of being opened (a rough sketch follows this list).
  Synchronized all setting in RegionState in prep for checking
  state for legal transitions all in here rather than all over the place.
  Added RegionState#setPendingOpen which will set PENDING_OPEN IFF it is
  in the OFFLINE state.
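
A hedged sketch of the processServerShutdown shape described above (the names, fields, and signatures here are assumptions for illustration, not the actual AssignmentManager code): clear the in-transition entries for the dead server and hand back everything SSH should re-assign, both the regions the server was carrying and the regions in transition that had been assigned to it.

{code}
// Illustrative sketch only -- not the actual AssignmentManager code.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ShutdownSketch {
  static class RegionState {
    final String encodedName;
    final String serverName; // server the region was assigned to, if any
    RegionState(String encodedName, String serverName) {
      this.encodedName = encodedName;
      this.serverName = serverName;
    }
  }

  /** Regions currently in transition, keyed by encoded region name. */
  private final Map<String, RegionState> regionsInTransition;

  public ShutdownSketch(Map<String, RegionState> regionsInTransition) {
    this.regionsInTransition = regionsInTransition;
  }

  public synchronized List<RegionState> processServerShutdown(String deadServer,
      List<RegionState> regionsCarriedByDeadServer) {
    List<RegionState> toReassign = new ArrayList<RegionState>(regionsCarriedByDeadServer);
    // Also pick up regions that were in RIT, assigned to the dead server,
    // and in the middle of being opened there.
    Iterator<RegionState> it = regionsInTransition.values().iterator();
    while (it.hasNext()) {
      RegionState rs = it.next();
      if (deadServer.equals(rs.serverName)) {
        it.remove();        // clear out the stale in-transition entry
        toReassign.add(rs); // and let SSH re-assign it
      }
    }
    return toReassign;
  }
}
{code}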

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
  Fixed up some formatting.
  Moved the bulk of the process method into private methods that each
  do one thing rather than having process do it all.

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Before queuing an OpenRegionHandler in the executor, set the region's state
  up in its znode as OPENING.  Will fail faster if we can't set the znode
  (a rough sketch follows below).
  (transitionZookeeperOfflineToOpening) Moved here from OpenRegionHandler.
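
A hedged sketch of the regionserver-side ordering described above (the helper interfaces and names are placeholders, not HRegionServer's actual collaborators or signatures): move the znode from OFFLINE to OPENING before handing the region to the executor, and report a znode failure back to the master immediately.

{code}
// Illustrative sketch only -- not the actual HRegionServer code.
public class OpenRegionSketch {
  interface ZkTransitions {
    /** Moves the region's znode from OFFLINE to OPENING; false on failure. */
    boolean transitionOfflineToOpening(String regionEncodedName);
  }

  interface Executor {
    void submit(Runnable handler);
  }

  private final ZkTransitions zk;
  private final Executor executor;

  public OpenRegionSketch(ZkTransitions zk, Executor executor) {
    this.zk = zk;
    this.executor = executor;
  }

  /** Returns false (fails fast) if the OFFLINE -> OPENING znode update fails. */
  public boolean openRegion(final String regionEncodedName) {
    if (!zk.transitionOfflineToOpening(regionEncodedName)) {
      // The master learns of the failure on this RPC rather than waiting
      // on a timeout while the region sits half-assigned.
      return false;
    }
    executor.submit(new Runnable() {
      @Override
      public void run() {
        // ... OpenRegionHandler work: open the region, then move the
        // znode from OPENING to OPENED ...
      }
    });
    return true;
  }
}
{code}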

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
  (transitionZookeeperOfflineToOpening) Moved out of here.

M hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
  Changed what we wait on now that PENDING_OPEN no longer means the same thing.

M hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
  (TestSSHWhenSourceRSandDestRSInRegionPlanGoneDown) Added Rajesh's test.

 
Review request changed
Updated (July 7, 2012, 8:22 a.m.)
Manage regions in transition in the regionserver... if there is an error, remove the region from RIT on the RS (issue found by Ram).
Posted (July 7, 2012, 10:42 a.m.)

Nice.
This comment no longer matches code below.
  1. Your comments are on Rajesh's or Ram's code, code I did not write.  I'll address them, but I was hoping for feedback with more substance than this, feedback that asks fundamental questions about our assign.
Insert spaces between if and (, ) and {
Remove white space.
Can you explain this?
Posted (July 7, 2012, 1:32 p.m.)

This was from Rajesh's patch:
{code}
+    // adding region in pending open.
+    am.regionsInTransition.put(REGIONINFO.getEncodedName(), new RegionState(REGIONINFO,
+        State.PENDING_OPEN));
{code}
I don't see that in the current patch.
  1. I did not copy from thin air.  Please cite which of the many patches listed on hbase-6060 you are referring to.  Thanks.  And again, can you provide feedback of more substance?  Do you think this is a good way to go?  Thanks Ted.
  2. The cited code appeared in 6060-trunk_3.patch
    
    The title of HBASE-6060 was about regions that are stuck in OPENING state for a very long time. It would be nice to cover OPENING state in the unit test.
    
    Thanks
  3. Read the issue.  Read the TODO at the top of this posted review.
Posted (July 9, 2012, 11:05 a.m.)

A similar check is needed in RS_ZK_REGION_OPENING. As per the current patch we do the transition from OFFLINE to OPENING and then call the OpenRegionHandler, so the master on callback will see the RegionState in OFFLINE.  So it is needed.
  1. Agreed.
One clarification:
If regions are moving from A to B and B goes down,
will this RIT have the regions that belonged to B and also those that were in RIT to B?  Correct, Stack?
If this is correct, please see the comment below.
Stack, are we sure that this will have the regions that are part of the destination server?
It will not, right?  The metaHRIs will only have those regions that are already on the dest server and not the ones in RIT.  That is why, in our previous patches (Rajesh's patches), we collected the RITs and then assigned them outside this loop.
So regions that are in OPENING state to the dead RS (that is, the destination) will not be assigned.  Please do correct me if I am wrong.
  1. You are right.
HBASE-6070 needs this change, Stack.  Correct me if you feel it is wrong.
  1. I think so.  I was going to try and mock up some tests to prove it.
Stack, this step still has the problem that I mentioned last time, particularly in master restart.
Consider that the OFFLINE node was created and we asked the RS to open the region.  Before the RS could transition to OPENING, if the master was killed and the backup came up, it will see the node in OFFLINE state and trigger an assignment to the same RS.  But we may get RegionAlreadyInTransitionException and will not retry.  At the same time the RS will also fail and then remove the region from onlineRegions on the RS side.  So this may leave the region in the OFFLINE state itself.
  1. So, the new master comes up BEFORE the RS has moved the region from OFFLINE to OPENING?  That seems unlikely, do you not agree?
    
    Let's allow it for the sake of argument.
    
    When you say, "At the same time the RS will also fail and then remove the region from onlineRegions on the RS side," you are saying that we fail to open the region for some reason in the regionserver?  If the regionserver crashes, it will be picked up by SSH because it was OPENING?  If we can't open the region because of HDFS issues, we'll abort the RS.  Is there another case where we'd fail to open a region and we'd not abort the regionserver?  If there is, we should address that?
    
    Worst case, the timeout monitor will eventually put this region back on line?
    
    What do you reckon, Ram?
    
Posted (July 9, 2012, 11:15 a.m.)

In the master restart case, for an OFFLINE node we tend to make the RIT also OFFLINE.  But here the oldData will have the master's address.  So if the state is OFFLINE we have to overwrite the servername with null (call rs.offline()).
  1. Ok.
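
A rough sketch of the fix described above (the types and the rebuild method are illustrative assumptions, not the actual master-restart code): when the znode is still OFFLINE, reset the RegionState so it carries no server name rather than the old master's address.

{code}
// Illustrative sketch only -- not the actual AssignmentManager restart code.
public class MasterRestartSketch {
  enum State { OFFLINE, OPENING, OPENED }

  static class RegionState {
    State state;
    String serverName;

    /** Clears the server association and marks the region OFFLINE. */
    void offline() {
      this.state = State.OFFLINE;
      this.serverName = null;
    }
  }

  /** Called while rebuilding regions-in-transition on master restart. */
  void rebuildRegionInTransition(RegionState rs, State znodeState) {
    if (znodeState == State.OFFLINE) {
      rs.offline(); // drop the stale server name (the old master's address)
    }
  }
}
{code}
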
Posted (July 9, 2012, 11:16 a.m.)
I have one way to solve this master restart case with the node being in OFFLINE state on an alive RS.
{code}
if (t instanceof RegionAlreadyInTransitionException) {
  String errorMsg = "Failed assignment in: " + plan.getDestination()
      + " due to " + t.getMessage();
  LOG.error(errorMsg, t);
  return;
}
{code}
Here we will check whether master initialization is done or not.  If not done, we can go for a retry here.  What do you feel, Stack?
  1. Sounds good.  We have to commit Maryann's patch first? (If it's finished?)
    
    Thanks for review.
    
    Do you think this is a good direction to go in, Ram?  If so, I'll fix the above issues and start in on some unit tests that exercise cases such as you raise above.
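
A hedged sketch of the retry idea above (the helper and its parameters are assumptions for illustration, not the actual AssignmentManager code): on RegionAlreadyInTransitionException, bail out only once the master has finished initializing; while the master is still starting up, retry the assignment instead of leaving the region OFFLINE.

{code}
// Illustrative sketch only -- the exception placeholder stands in for
// o.a.h.hbase.regionserver.RegionAlreadyInTransitionException.
public class RetrySketch {
  /** @return true if the caller should retry the assignment. */
  static boolean shouldRetry(Throwable t, boolean masterInitialized) {
    if (t instanceof RegionAlreadyInTransitionException) {
      // During master restart the OFFLINE znode may predate this master,
      // so retry instead of giving up and leaving the region OFFLINE.
      return !masterInitialized;
    }
    return false; // other failures follow the existing handling
  }

  static class RegionAlreadyInTransitionException extends Exception {}
}
{code}
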
Posted (July 9, 2012, 4:50 p.m.)
I am OK with this, Stack. It definitely makes heads spin (smile).
Thanks Stack.  Nice of you.