Review Board 1.7.22


ZOOKEEPER-786 Exception in ZooKeeper.toString

Review Request #1714 - Created Sept. 5, 2011 and submitted

Thomas Koch
ZOOKEEPER-786
Reviewers
zookeeper
zookeeper-git
.

 
Review request changed
Updated (Sept. 14, 2011, 10:18 p.m.)
  • changed from Exception in ZooKeeper.toString to ZOOKEEPER-786 Exception in ZooKeeper.toString
Posted (Sept. 27, 2011, 1:59 p.m.)

   

  
Why was this removed? It is not germane to the patch at hand.
  1. You're right, the removal had nothing to do with the issue, but it was the right thing to do, once I've been there. It may happen in rare cases, that connect returns true already there, but primeConnection() is called anyways later from doTransport(). Having to possible call origins only adds to uncertainty.
    Did you observe any problems with this?
  2. Are you sure that the lines in doTransport() will be called and executed if the socket connects immediately? I don't personally know whether that is the case or not. You didn't write any tests to show that this would happen, and you shouldn't have made the change for purposes of removing uncertainty without explicitly calling out what you were doing. If you are quite sure that even in the case of an immediate connection, OP_CONNECT will be set and we will hit that in doTransport, I'm fine with this change. But I don't like having changes like this snuck in under the radar, because if this is not the case and it causes a bug, it will be a massive pain to find.
  3. It seems to me, by the by, that if a socket is immediately connected it will NOT set the selection key for OP_CONNECT, because state will be set to ST_CONNECTED (see the source for SocketChannelImpl, specifically connect:
    
    http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/SocketChannelImpl.java#SocketChannelImpl.connect%28java.net.SocketAddress%29
    
    and translateReadyOps:
    
    http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/SocketChannelImpl.java#SocketChannelImpl.translateReadyOps%28int%2Cint%2Csun.nio.ch.SelectionKeyImpl%29
    
    So, I'm not sure that this is a safe change to make, and you should always, always, always, be careful when changing anything to do with our NIO code unless you are really sure of the consequences.
  4. Camille, you're right, I missed this change. Thanks.
    
    There is definitely a reason for this! The connect call can return immediately with a connected indication, in which case we need to run the primeConnection code, if you don't OP_CONNECT will never be indicated (because the connection already took place) and primeConnection will therefore never be called in doTransport.
    
    I believe we saw this with Solaris (there was a bug on this but I can't find it either on apache nor sourceforge).
    
    
  5. Ted deserves the credit, he pointed it out as part of 1174. If you don't mind, I will pull this bit out of the change in trunk when I check in 1174 there.
  6. It would be better to keep the changes distinct. Would you mind committing this snippet first, as a second patch for 786, then apply 1174? Or I/you can entirely roll back this change from trunk, you can commit 1174, and then Thomas can update this patch and reapply. Which would you prefer?
  7. I'll roll back this change as a second patch for 786, then apply 1174. This just went into trunk right?
  8. In trunk I see this:
    
            sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
            sock.connect(addr);
    
    however in branch-3.4 I see this:
    
                boolean immediateConnect = sock.connect(addr);
                sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
                if (immediateConnect) {
                    sendThread.primeConnection();
                }
    
    so yes, this patch was applied only to trunk, however, isn't this a bug? sock.register is being called after sock.connect.
    
    Also, why was this patch applied to branch 3.4, but it's not available on trunk?
    
  9. I see. 1174 was committed to branch-3.4 but this change caused a conflict for trunk.
    
    Isn't this a timing issue then? registering after connecting?