FLUME-1762. File Channel should recover automatically if the checkpoint is incomplete or bad by deleting the contents of the checkpoint directory
Review Request #8396 - Created Dec. 7, 2012 and submitted
Added code to throw a BadCheckpointException, if we can recover from the situation by deleting all files in the checkpoint directory. In the log class, during startup if BadCheckpointException is caught, all files are deleted and replay is retried.
Added unit tests. Modified some existing unit tests to test for this change.
|Nit: Javadoc comment without any text||Brock Noland||Dec. 7, 2012, 2:59 p.m.||Open|
|When we log this exception we print a very similar message. Additionally, this constructor is used when this statement is ...||Brock Noland||Dec. 7, 2012, 2:59 p.m.||Open|
|How about logging what files we are deleting just in case a file goes missing at some point?||Brock Noland||Dec. 7, 2012, 2:59 p.m.||Open|
|I have recently found that parseDelimitedForm can return null. In this case we throw a NPE. Would it be possible ...||Brock Noland||Dec. 7, 2012, 5:11 p.m.||Open|
|We ignore the return value of deleteAllFiles which if it fails could cause confusing results in the subsequent operations.||Brock Noland||Dec. 10, 2012, 4:10 p.m.||Open|
|If one delete fails but others succeeded we log nothing. Perhaps we should log when returning false?||Brock Noland||Dec. 10, 2012, 4:10 p.m.||Open|
Posted (Dec. 7, 2012, 2:59 p.m.)
Hari, I think this is a great patch!! Nice work! Patch looks great, I have a few typical small comments below. Additionally, it looks like we are missing a couple test cases where we'd do full replay. It's certainly possible I am just missing the test, so excuse me if I am: Bad Version of Checkpoint itself Bad Version of Checkpoint meta Differing write order ids between meta and checkpoint Checkpoint marker incomplete
Nit: Javadoc comment without any text
When we log this exception we print a very similar message. Additionally, this constructor is used when this statement is probably not true (ie the version is bad). What do you think about removing this constructor and then adding an appropriate error message each time?
I think we can delete the v1 stuff but we should probably do that in a different JIRA.
How about logging what files we are deleting just in case a file goes missing at some point?
Posted (Dec. 7, 2012, 5:11 p.m.)
Posted (Dec. 10, 2012, 4:10 p.m.)
Hari! Thank you for addressing my feedback! I have two items below and then I think we can commit this awesome patch! The tests pass fine on my machine.
Review request changed
Updated (Dec. 10, 2012, 9:10 p.m.)
- changed from pending to submitted