Review Board 1.7.22


Ensure that we replay logs correctly.

Review Request #2524 - Created Oct. 21, 2011 and updated

Submitter: Amitanand Aiyer
Branch: 0.89, 0.92
Bug: hbase-4645
Reviewers: amitanand, jgray, kannanm, karthik.ranga, lhofhansl, nspiegelberg, stack, tedyu
Repository: hbase-git
Data loss occurs (for some of the column families) when we replay the logs.

The bug stems from the fact that during log replay we only replay edits past the
maximum sequence ID across ALL the stores. This is wrong. If one column family is
ahead of the others (because the crash happened before all the column families were
flushed), then we lose data for the column families that have not yet caught up.

The correct logic is to begin replay from the minimum, across all stores, of each
store's maximum flushed sequence ID.
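The replay-start computation described above can be sketched as a small standalone helper (a hypothetical illustration with made-up names, not the actual HRegion code):

```java
import java.util.Map;

// Hypothetical sketch of the corrected replay-start logic: each store
// reports the maximum sequence ID it has durably flushed; replay must
// begin at the minimum of those maxima so that no store loses edits.
public class ReplayStart {
    // Returns the sequence ID from which WAL replay should begin,
    // or -1 if there are no stores (mirroring the -1 sentinel used
    // in the patch).
    public static long replayStartSeqId(Map<String, Long> maxFlushedSeqIdPerStore) {
        long minSeqId = -1;
        for (long storeSeqId : maxFlushedSeqIdPerStore.values()) {
            if (minSeqId == -1 || storeSeqId < minSeqId) {
                minSeqId = storeSeqId;
            }
        }
        return minSeqId;
    }
}
```

Starting from this minimum may re-apply edits that the ahead-of-the-pack stores already contain, which is why the patch relies on de-duplication during scans rather than skipping those edits at replay time.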
Initial patch (v1).

Testing done: mvn test (running).

Diff revision 4

This is not the most recent revision of the diff; the latest is revision 9.

  1. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  2. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
Revision 8c32839 New Change
public long initialize(final CancelableProgressable reporter)

     // Remove temporary data left over from old regions
     status.setStatus("Cleaning up temporary data from old regions");
     cleanupTmpDir();

-    // Load in all the HStores.  Get maximum seqid.
+    // Load in all the HStores.
+    // Get minimum of the maxSeqId across all the store.
+    //
+    // Context: During replay we want to ensure that we do not lose any data. So, we
+    // have to be conservative in how we replay logs. For each store, we calculate
+    // the maxSeqId up to which the store was flushed. But, since different stores
+    // could have a different maxSeqId, we choose the
+    // minimum across all the stores.
+    // This could potentially result in duplication of data for stores that are ahead
+    // of others. ColumnTrackers in the ScanQueryMatchers do the de-duplication, so we
+    // do not have to worry.
+    // TODO: If there is a store that was never flushed in a long time, we could replay
+    // a lot of data. Currently, this is not a problem because we flush all the stores at
+    // the same time. If we move to per-cf flushing, we might want to revisit this and send
+    // in a vector of maxSeqIds instead of sending in a single number, which has to be the
+    // min across all the max.
+    long minSeqId = -1;
     long maxSeqId = -1;
     for (HColumnDescriptor c : this.htableDescriptor.getFamilies()) {
       status.setStatus("Instantiating store for column family " + c);
       Store store = instantiateHStore(this.tableDir, c);
       this.stores.put(c.getName(), store);
       long storeSeqId = store.getMaxSequenceId();
-      if (storeSeqId > maxSeqId) {
+      if (minSeqId == -1 || storeSeqId < minSeqId) {
+        minSeqId = storeSeqId;
+      }
+      if (maxSeqId == -1 || storeSeqId > maxSeqId) {
         maxSeqId = storeSeqId;
       }
     }
     // Recover any edits if available.
-    maxSeqId = replayRecoveredEditsIfAny(
-        this.regiondir, maxSeqId, reporter, status);
+    maxSeqId = Math.max(maxSeqId, replayRecoveredEditsIfAny(
+        this.regiondir, minSeqId, reporter, status));

     status.setStatus("Cleaning up detritus from prior splits")
     // Get rid of any splits or merges that were lost in-progress.  Clean out
     // these directories here on open.  We may be opening a region that was
     // being split but we crashed in the middle of it all.
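The effect of the change can be demonstrated with a small standalone simulation (a hypothetical illustration, not part of the patch or of TestWALReplay): two column families flushed at different sequence IDs, with replay starting either at the old point (max across the per-store maxima) or at the corrected one (min across the per-store maxima).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of WAL replay for a store that had been
// flushed up to flushedSeqId before a crash. Edits survive either
// because they were flushed, or because they are re-applied from the
// WAL when their sequence ID is past the chosen replay-start point.
public class WalReplaySim {
    // Replays edits [1..totalEdits] starting after replayStart and
    // returns the sequence IDs the store ends up holding.
    public static List<Long> recover(long flushedSeqId, long replayStart,
                                     long totalEdits) {
        List<Long> held = new ArrayList<>();
        for (long seq = 1; seq <= totalEdits; seq++) {
            // Kept if flushed before the crash, or replayed from the WAL.
            if (seq <= flushedSeqId || seq > replayStart) {
                held.add(seq);
            }
        }
        return held;
    }
}
```

With cf1 flushed to seqid 10 and cf2 flushed only to seqid 4 out of 10 logged edits, replaying from the maximum (10) leaves cf2 holding just edits 1-4, i.e. data loss; replaying from the minimum (4) recovers all 10 edits for both families, with the duplicates for cf1 handled by de-duplication.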
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
Revision 966262b New Change
 