Review Board 1.7.22


Improve concurrency of putMsg / putMsgList

Review Request #4852 - Created April 24, 2012 and updated

Bo Wang
GIRAPH-185
Reviewers
giraph
giraph
Use ConcurrentHashMap and ConcurrentLinkedQueue to allow concurrent assess to message map. The concurrencyLevel of ConcurrentHashMap uses the default value. There may be some performance gain by tuning this value.

 
Posted (April 24, 2012, 10:28 a.m.)
Looks good to me, wound't hurt to see some stress test to check performance, although I wouldn't expect this to be slower than the synchronized block. Also, I'd agree that moving the messages directly from the inMessages to the Vertex could be a memory win.
Posted (April 24, 2012, 8:53 p.m.)

   

  
Bo, I'm a little leery about converting the List and ArrayList to LinkedList and ConcurrentLinkedList.  I believe that linked list's will use more memory than the array list due to the double links (forward and backward).  Also, is ConcurrentLinkedList supposted to outperform a synchronized ArrayList?  I haven't seen much on that.

The concurrenthashmap changes look good.
  1. Avery, thanks for the comments. I just measured the sizes of these classes and below are an estimation. 
    
    java.util.ArrayList: 149 bytes
    java.util.LinkedList: 101 bytes
    java.util.concurrent.ConcurrentLinkedQueue: 118 bytes
    
    The tool I was using is a program from the link below.
    http://www.javapractices.com/topic/TopicAction.do?Id=83
    
    In terms of performance, here is a benchmark.
    http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html
    
    In its test #1 (adding element), ConcurrentLinkedQueue performed slightly better than LinkedList. In test #3 (iterator), LinkedList outperformed ConcurrentLinkedQueue. I think the most time consuming part is add, while iteration is also heavily used but no concurrent accesses. 
    
    
  2. Thanks for the response Bo.
    
    Those numbers are for the empty data structures I'm assuming.  I was referring to the incremental cost of adding elements (messages) to the data structures.  The performance isn't a a concern to me (unless we call size() somewhere).
  3. By the incremental cost, I mean the memory cost, sorry.