Review Board 1.7.22

hbase-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions

Review Request #2065 - Created Sept. 27, 2011 and updated

Michael Stack
Make the Meta* operations against meta retry.  We do it by using HTable instances.
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
In 0.89, we had special RetryableMetaOperation class that was a
subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
with its retry loop.  Now we just use HTable instead (Costs some on setup but
otherwise, we avoid duplicating code).  Upped the retries on serverside too.

Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
heavily on CT methods getting proxy connections to meta and root servers.
CT needs to be cut back.  This patch closes down access on (unused) public
methods and removes being able to get an HRegionInterface on meta and root
-- this stuff is used internally to CT only now; use MetaEditor or
MetaReader if you want to update or read catalog tables.  Opening new issue
to cutback CT use over the code base.

A little off topic but couldn't help it since was in MetaReader and MetaEditor
trying to clean them up, I ended up moving meta migration code out to its
own class rather than have it in all inside in MetaEditor.

Here is some detail to help reviews.

M src/main/java/org/apache/hadoop/hbase/catalog/
  Clean up.  Shutdown access on some of these unused methods.  Don't
  let out HRegionInterface instances in particular since we are going
  away from raw HRI use to instead use a connection with retries:
  i.e. HTable.

  Comments on state of this class. Javadoc edits.
  getZooKeeperWatcher on HConnection is deprecated so don't use it
  in constructor.  Override MetaNodeTracker and on node delete
  reset meta location (We used to do this over in MetaNodeTracker
  but to do that we had to have a CatalogTracker over in zk package
  which is silly -- bad package encapsulation).

  (waitForRootServer) Renamed getRootServerConnection and change it
  from public to package private.
  (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
  (getMetaServerConnection) Change from public to package private.
  Use MetaReader to read the meta location in root rather than a
  raw HRegionInterface so we get retrying.
  (remaining, timedout) Added utility methods.
  (waitForMetaServer) Changed from public to private.
  (resetMetaLocation) Made it synchronized on metaAvailable.
  Not all accesses were synchronized.

M src/main/java/org/apache/hadoop/hbase/catalog/
  Refactor to use HTable instead of raw HRegionInterface so we get
  retrying.  For each operation we get an HTable, use it, then close it.
  (putToMetaTable, putsToMetaTable, etc) Utility methods.
  (updateRootWithMetaMigrationStatus, etc.) Moved out to own
  class since these classes are for a one-time migration only.
A src/main/java/org/apache/hadoop/hbase/catalog/
  New class that holds all Meta* methods updating meta table used
  doing the one-time migration done to meta on startup.  This class
  is marked deprecated because its going to be dropped in 0.94.

M src/main/java/org/apache/hadoop/hbase/catalog/
  Retrofit methods in here to use fullScan methods with Visitor.
  (getCatalogRegionInterface, getCatalogRegionNameForTable,
    getCatalogRegionNameForRegion) Removed.
  (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
  (fullScanOfResults) Renamed as fullScan override.
  (fullScanOfRoot) Added as deprecated. We should be doing
  this against zk.
  (metaRowToRegionPair, getServerNameFromResult) Moved to Result
  (CollectAllVisitor) Added
M src/main/java/org/apache/hadoop/hbase/client/
  Handle few cases where methods throw InterruptedException
  (Don't let it out on the HBaseAdmin public API)

M src/main/java/org/apache/hadoop/hbase/client/
  Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
  on failure. Call ServerCallable connect AFTER beforeCall rather than
  ServerCallable.instantiateServer BEFORE beforeCall.

M src/main/java/org/apache/hadoop/hbase/client/
  Add to DEBUG message the connection name we were using.

M src/main/java/org/apache/hadoop/hbase/client/
  (getServerNameFromCatalogResult, parseCatalogResult,
    parseHRegionInfoFromCatalogResult) Added

M src/main/java/org/apache/hadoop/hbase/client/
  Added new ThrowableWithExtraContext that takes extra context info.

M src/main/java/org/apache/hadoop/hbase/client/
  instantiateServer renamed as connect

M src/main/java/org/apache/hadoop/hbase/client/
  Javadoc.  Renamed instantiateServer as connect.

M src/main/java/org/apache/hadoop/hbase/master/
  Javadoc. Use MetaReader method instead of handcoding.

M src/main/java/org/apache/hadoop/hbase/master/handler/
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/master/handler/
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/thrift/
  Allow hris can come back null when we ask for table regions.

M src/main/java/org/apache/hadoop/hbase/zookeeper/
  Remove import of CatalogTracker.

M src/test/java/org/apache/hadoop/hbase/
  Use utility in MetaReader instead of handcode it.

M src/test/java/org/apache/hadoop/hbase/catalog/
  Use new HConnectionTestingUtility mocking tests (need to use it
  because its a bit harder mocking tests now that we use HTable instead
  of the more direct HRegionInterface).
  Add some tests of broken out utility methods.

M src/test/java/org/apache/hadoop/hbase/catalog/
  Add tests

M src/test/java/org/apache/hadoop/hbase/catalog/
  Add test of 3669 retrying.

M src/test/java/org/apache/hadoop/hbase/client/
  New test utility that helps with mock of HConnection making it so can mock
  an HConnection and then have an HTable use the mocked connection.  Can do
  a mock or a spied on HConnection

M src/test/java/org/apache/hadoop/hbase/client/
  The migration code moved.  Reference new location.

M src/test/java/org/apache/hadoop/hbase/master/
M src/test/java/org/apache/hadoop/hbase/master/
M src/test/java/org/apache/hadoop/hbase/master/
  Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
All tests passed recently.  Rerunning again.
Review request changed
Updated (Oct. 2, 2011, 12:01 a.m.)
Patch addresses Ted and Jon reviews.

I also redid the catalog package changes -- MetaReader/Editor and CatalogTracker -- so change is minimized; methods are cleanly added and none are removed, they are just deprecated, not unless the were private and are no longer used OR method has been moved out to the new MetaMigration class.  Hopefully this makes this patch easier to digest.
Posted (Oct. 2, 2011, 2:28 a.m.)


This visitor can be merged with the visitor in updateMetaWithNewRegionInfo() after refactoring.
The only difference between them is the boolean parameter to updateHRI().
The checking here seems fragile.
I know it is in current code base. So maybe extract the String in another JIRA.
Posted (Oct. 3, 2011, 1:13 a.m.)


This is no longer true - see the fourth parameter below.
  1. I can fix on commit
Out of date javadoc.
  1. I can fix on commit.
    I ran all tests and TestAdmin and TestMergeTool failed but running them individually, they passed.