Overall I see this patch as trading off service resiliency in favor of performance.
With the current ordering of operations (WAL append and sync prior to memstore insert), we ensure that an error during sync is seen by the client and memstore consistency is maintained. Importantly (at least for my goals), this also allows us to do some reasoning about when it's necessary to abort the region server or when we can take additional actions to try to ride over a transient error. As long as there were no deferred flush edits, we could reason that any error on sync was propagated back to the client as a failure and we did not need to abort yet. This is the direction I've been trying to move with HBASE-4222/4282 and a partial form of it was already in place prior to that.
I understand why we want to reorder these operations and move the sync outside of the acquired row locks. From this standpoint, since an error on sync leaves the memstore polluted, aborting immediately is the right thing to do. But I don't think it's a desirable behavior. I think it will lead to more complaints from users about observed instability of the system.
The use-case that motivated HBASE-4222 was performing a rolling restart of all DataNodes in a cluster, with a running, but completely quiescent HBase cluster. In this case, with no data durability at stake, we really should be able to recover. But instead what will happen is a catastrophic failure of RegionServers as each server tries to roll its HLog. The patch in it's current state would regress to this behavior, triggering RS aborts even more quickly than prior to HBASE-4222 (no HLog close would be attempted).
I would really like to find a way to keep the performance optimization of moving the HLog sync outside of the row locks, while still being able to guarantee memstore consistency in the case of failure, so that we can still reason about whether or not a RS abort is really necessary.
Speaking naively, is it at all feasible that the RWCC.WriteEntry could track the KeyValues instances it's used to apply to the memstore? And these references could then be used to attempt a memstore rollback on failure? Any other ways that we can maintain memstore consistency here without giving up and aborting?