There is data loss happening (for some of the column families) when we replay the logs.
The bug seems to come from the fact that during replay-logs we only choose to replay
the logs starting from the maximum sequence id across ALL the stores. This is wrong. If a
column family is ahead of the others (because the crash happened before all the column
families were flushed), then we lose data for the column families that have not yet been flushed.
The correct logic is to begin the replay from the minimum, across all stores, of the
maximum sequence id in each store.
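A minimal sketch of the corrected replay-point computation, under the assumption that we know each store's highest flushed sequence id (class, method, and variable names here are illustrative, not the actual HBase code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the replay point must be the MINIMUM of the
// per-store maxima, so that edits newer than the least-flushed store
// are replayed rather than skipped.
public class ReplaySeqIdSketch {
    // maxSeqIdPerStore maps each column family to the highest sequence id
    // it has durably flushed.
    static long replayStartSeqId(Map<String, Long> maxSeqIdPerStore) {
        long minOfMaxes = Long.MAX_VALUE;
        for (long storeMax : maxSeqIdPerStore.values()) {
            minOfMaxes = Math.min(minOfMaxes, storeMax);
        }
        return minOfMaxes;
    }

    public static void main(String[] args) {
        Map<String, Long> flushed = new HashMap<>();
        flushed.put("cf1", 100L); // flushed through seq id 100
        flushed.put("cf2", 40L);  // crash happened before cf2 flushed further
        // Replaying from 100 (the max) would lose cf2's edits 41..100;
        // replaying from 40 (the min of the maxima) recovers them.
        System.out.println(replayStartSeqId(flushed)); // prints 40
    }
}
```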
Amit-- thanks for getting a fix out soon. Comments inline. As you mentioned, I am assuming you are planning on adding a test as part of the commit itself.
Get minimum seqid across all store
Get minimum of the max seq id for each store.
Trailing whitespace.
yes, makes sense as a future optimization.
As Madhu pointed out in a later email, if no edits were there for this region, then this function will simply return minSeqId as its return value.
Like before, we should also track the maxSeqId across all stores in the above loop.
And then change this line to something like:
maxSeqId = Math.max(maxSeqId, replayRecoveredEditsIfAny(...));
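A self-contained sketch of the suggested bookkeeping (the real HBase Store and replayRecoveredEditsIfAny have different signatures; the names and the stand-in replay helper here are assumptions made for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: track BOTH the minimum and the maximum of the
// per-store max sequence ids in one loop. The minimum is where replay
// must start; the maximum feeds the region's next sequence id.
public class SeqIdTracking {
    // Stand-in for replaying recovered edits: returns the highest sequence
    // id seen while replaying, or minSeqId unchanged when there is nothing
    // to replay (matching the behavior noted in the review above).
    static long replayRecoveredEditsIfAny(long minSeqId, long highestRecoveredSeqId) {
        return Math.max(minSeqId, highestRecoveredSeqId);
    }

    static long[] minAndMax(List<Long> storeMaxSeqIds) {
        long minSeqId = Long.MAX_VALUE;
        long maxSeqId = Long.MIN_VALUE;
        for (long storeSeqId : storeMaxSeqIds) {
            minSeqId = Math.min(minSeqId, storeSeqId);
            maxSeqId = Math.max(maxSeqId, storeSeqId);
        }
        return new long[] { minSeqId, maxSeqId };
    }

    public static void main(String[] args) {
        long[] mm = minAndMax(Arrays.asList(100L, 40L, 90L));
        // Replay starts at 40; recovered edits reach seq id 120 in this
        // example, so the region's high-water mark becomes 120.
        long maxSeqId = Math.max(mm[1], replayRecoveredEditsIfAny(mm[0], 120L));
        System.out.println(mm[0] + " " + maxSeqId); // prints "40 120"
    }
}
```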
Amit-- code changes look solid.
It is unfortunate that there was a nice test, TestWALReplay.java, meant to cover this exact case, but it was broken in a way that kept it from catching this issue.
Furthermore, the test writes to multiple CFs in separate puts (transactions). It would be good to enhance it to do cross-CF puts (in a single txn).
Looks like the original test is busted. It expects to flush only one region (in line 271); but here region.close() will cause it to flush all stores :)
Wow... Losing data?! And that went unnoticed for so long.
I guess I don't understand when the stores' max sequence ids go out of sync, and why this does not happen all the time.
Currently, we seem to flush all the column families together.
For this failure scenario to kick in, there has to be a failure after *some* stores have flushed, but not all of them.
Unclean shutdowns are rare, and one that lands in the middle of flushing is rarer still.
But it is important to fix, given that we now know about it.
Good comment. Makes sense.
Does that mean that up to now we could have failed to replay some of the logs and actually lost data?!
(I guess the answer is yes; that's how you describe the problem.)
Is this because before your change it would not need to replay any logs, but now it does?
Yes. Earlier we would not have replayed any logs, and the test was asserting that we don't.
Now that we base the decision on the minimum across the different stores, we do end
up replaying some edits.