1073: Move all journal stream management code into one place
Review Request #1688 - Created Aug. 31, 2011 and updated
Posting patch on review-board for easy review
In doing this review, I remembered some reasoning about why I wanted to introduce the EditLogReference interface:

1) The lifecycle of the open edit streams is now unbalanced -- the streams are all opened in one place and then closed far away, later in loadEdits(). If there's an error in one of the middle streams, it will leak the open files of the other streams that come after it. Introducing EditLogReference (a reference to a stream that's not opened yet) means that loadEdits(EditLogReference) owns the whole lifecycle of an open stream.

2) The old implementation had true unit tests for the various gap scenarios, which I found easy to read. The new tests are much harder to read, with the "AbortSpec" stuff, plus they rely on actually creating real files on disk rather than mock objects. Having the FileJournalManager deal only with references in the "planning" state, and moving the validation to the reference, means we can test all of this code with only mocks.

Specific comments on the implementation below.
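A minimal sketch of what I have in mind. The names EditLogReference and loadEdits come from this discussion; the Stream stand-in (for EditLogInputStream) and everything else here is hypothetical, just to show the lifecycle shape:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class EditLogReferenceSketch {
  // Hypothetical stand-in for EditLogInputStream.
  interface Stream extends Closeable {
    long readTxns() throws IOException;
  }

  // A reference in the "planning" state: nothing is open yet, so there is
  // nothing to leak while the set of streams is being assembled.
  interface EditLogReference {
    Stream open() throws IOException;
  }

  // loadEdits owns the entire lifecycle: each stream is opened and closed
  // inside the loop, so an error in a middle stream leaks nothing.
  static long loadEdits(List<EditLogReference> refs) throws IOException {
    long total = 0;
    for (EditLogReference ref : refs) {
      Stream s = ref.open();
      try {
        total += s.readTxns();
      } finally {
        s.close(); // closed even if readTxns throws
      }
    }
    return total;
  }
}
```

Because the references are inert until open() is called, the "assemble the plan" phase can be unit-tested with mocks and no files on disk.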
If you pass "target - 1" above, then why do we have to explicitly remove it again? It already should be excluded here.
Shouldn't there be exactly one stream here? What happens if you don't find any? There should be a check for stream == null.
add a comment that this JM is never used for input, hence it can never provide transactions?
Adjust the javadoc to explain what "best" means. In this case, it's the one that has the most transactions readable after that txid.
should log a warning
This error path leaks the streams that have already been opened.
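For reference, the usual shape of the fix, sketched with hypothetical Ref/Stream types since I'm abbreviating the real ones: close whatever was already opened before letting the exception propagate.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class OpenAllSketch {
  interface Stream extends Closeable {}
  interface Ref { Stream open() throws IOException; }

  // Either returns with all streams open, or with none: on failure, every
  // stream opened so far is closed before the exception propagates.
  static List<Stream> openAll(List<Ref> refs) throws IOException {
    List<Stream> opened = new ArrayList<>();
    try {
      for (Ref r : refs) {
        opened.add(r.open());
      }
      return opened;
    } catch (IOException e) {
      for (Stream s : opened) {
        try { s.close(); } catch (IOException ignored) {} // best-effort cleanup
      }
      throw e;
    }
  }
}
```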
While we're at it, we can add a good check here that lastTxId == INVALID_TXID || op.txid == lastTxId + 1 -- log an ERROR and break the loop if that's violated, I think?
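Roughly what I mean -- INVALID_TXID and the txid comparison are from the real code, but this standalone checker is just an illustration:

```java
public class TxIdGapCheck {
  static final long INVALID_TXID = -1;

  // Applies ops in order, stopping at the first non-contiguous txid.
  // Returns how many ops were applied before the gap (all of them if none).
  static int applyContiguous(long[] opTxIds) {
    long lastTxId = INVALID_TXID;
    int applied = 0;
    for (long txid : opTxIds) {
      if (lastTxId != INVALID_TXID && txid != lastTxId + 1) {
        // the real code would LOG.error the gap and break out of the loop
        break;
      }
      lastTxId = txid;
      applied++;
    }
    return applied;
  }
}
```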
I don't think this logic makes sense: in the case of a journal manager going away and coming back inside the BN or SBN, you may want to finalize old segments while writing to a new one. That is to say, this should be independent of currentInProgress.
warning message should include the storage dir
Don't you also need to break here? Otherwise you will improperly log all later logs as gaps.
what's the purpose of this helper function?
is this ever called?
Add javadoc noting that this mutates the structure -- it's not clear from the function's name.
please use American spelling to match the rest of Hadoop
What would happen if the txid file hadn't been updated? Can we add a test for the case where we have an out-of-date seen_txid file, but we do have newer logs? i.e.:

fsimage_0
edits_1-10
edits_inprogress_11 (with 5 txns)
seen_txid = 0

Do we properly read all 15 txns? Do we have enough safeguards that we won't overwrite edits or do anything overlapping? In the current (pre-patch) implementation, it would see the newer logs and know that it should attempt to read them. In this implementation, I think it might overwrite one of the segments. Please add a test case for this.
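To make the concern concrete, here's a toy model of the decision (hypothetical names; the real test would of course lay out the files and run the loading code): recovery must derive the highest txid from the logs themselves, never from the possibly-stale seen_txid file.

```java
public class StaleSeenTxIdSketch {
  // Each segment is {firstTxId, lastTxId}; an in-progress segment is counted
  // up to its last readable txid.
  static long highestRecoverableTxId(long seenTxId, long[][] segments) {
    long highest = seenTxId;
    for (long[] seg : segments) {
      highest = Math.max(highest, seg[1]);
    }
    return highest;
  }
}
```

With the scenario above (edits_1-10 plus an in-progress segment 11-15, seen_txid = 0), the logs say 15; code that trusted seen_txid = 0 instead would believe no edits exist and could start a new segment on top of the old ones.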