Went about half way through the patch which looks good.
nit: insert space between if and (
Add javadoc for this interface and its methods
This interface is only used by void sendParam().
Please merge it with Call interface below.
CallRequest is extracted from Call and provides a meaningful grammatical restriction to identify the fact that the method Connection.sendParam(CallRequest request) is only used to send data without receiving the result. In general the same interface (and instance) is not required to send and receive data, and it might be completely separated with encapsulating details of data to send/receive after thorough refactoring, but at present I stopped at the halfway point that is enough for my purpose, making HBaseClient use SharedMap.
About javadoc, I can't do nothing but duplicate the names of the methods.
nit: insert space between catch and (
'is pooled' -> 'was pooled'
Would you tell me why the past tense is appropriate?
This method is expected to be synchronized and the condition is kept just until the method returns, and I think the present tense is more close to what I want to say.
Add @param for key
Do the same for other methods
I think this interface is not yet well stable to fill javadoc tags, which will be written by dupcliating the contents of the body of the javadoc.
Remove one 'the' from 'the the '
'The rest' -> 'If'
There are two kind states of the registered objects, idle objects that are held in the pool, and busy objects that are borrowed and in use. This method clear() invalidates all of the registered objects, but returns only idle objects, and you can only tear down the idle objects just after calling this method. So I feel I have to write some description about the busy objects.
But some comment changed.
'but still you can tear down them later' -> 'which you can tear down later'
'process' -> 'processing'
Would invalidateAll() be a better name for this method ?
This method is intended to call from outside register/borrow-return/invalidate cycles, and returnObject() or invalidateObject() is required to call for borrowed objects even after this method is called. The naming invalidateAll() might cause confusion as if you could call this method in place of invalidateObject().
Would invalidateAllForKey() be a better name for this method ?
I overlooked the possibility to throw an IOException while releasing tables, which causes halfway stopping releasing tables which are already removed from the pool. Also closing a single table should succeed regardless of whether releasing the table throws an exception or not, otherwise there is no way to remove the such broken table from the pool.
I think I'll add methods to release a table/tables with logging exceptions instead of throwing them, and make methods of closing a table/tables call the new methods.
Fixed along the reviews.
Added comment to the javadoc of the class SharedMap about its expected usage.
Removed getStartTime() from CallRequest to Call, because I noticed it is not necessary.
Added a convenient method done() to the private inner class AbstractCallDecorator to reduce duplication of code.
releaseHTableInterface() throws IOException
Please narrow the exception caught
If catching only IOException, closing tables will be stopped halfway if a runtime exception is thrown.
As HTableFactory implementing HTableInterfaceFactory actually throws RuntimeException in the method createHTableInterface() against the interface's contract, I think it is better to prepare for runtime exceptions.
HTableFactory is marked Stable. Looks like we cannot change method signature.
Thanks for the reminder.
'failing to return them' usually is bad
If user tears down rest of the objects, that should be acceptable.
Please rewrite this last sentence
Sorry I couldn't understand what was suggested after thinking repeatedly, but I rewrote the comment to represent what I expects in more detail.
Explain what the type parameters represent.
'borrows a pooled object' -> 'lends a pooled object'
Since map contains multiple pools, this name gives me the impression of count of the number of pools.
Would maxObjectCountPerKey be a better name ?
Should idleQueue be checked to see that value isn't in idleQueue ?
Thanks to your suggestion, I noticed that I should change the specification of the method SharedMap.registerObject() in order to keep its consistency. It complicates implementations and is not necessary to treat unregistered objects and registered objects in the same manner.
I make the method registerObject() reject re-registrations for already registered objects and return false, though in practice the method will be used for a newly created object and the returned boolean value of registerObject() will be ignored.
'>=' -> '>'
No. The idleQueue contains maxPoolCount objects at most.
'borrows' -> 'lends'
'give an pooled instance regardlessly' -> 'gives a pooled instance unconditionally'
'may borrow' -> 'may lend'
Do you want to enrich this comment ?
'pooling objects' -> 'pooled objects'
'to borrow' -> 'to lend'
This line and line 50 are too long
Please limit to 100 characters
'registering objects are exceeded'
Did you mean number of registered objects exceed this limit ?
Please rename maxPoolCount, considering my suggestion above.
> Did you mean number of registered objects exceed this limit ?
Yes, in order to simplify the interface, registerObject() always succeeds regardless of the limit.
The last part of the sentence should read 'even if number of registering objects exceeds the max count'
Also I changed the constructor; instead of throwing IllegalArgumentException I made RoundRobinSharedMap use 1 for the max count, otherwise checking the value would be required between calling this constructor and getting the value from the configuration of HBase.
Looking at comment on line 87, should we assign 1 to maxObjectCountPerKey ?
Comment on line 87 says, the condition maxObjectCounterPerKey > 0 ensures that busyQueue is not empty at that point and we have at least one object to borrow.
I rewrote the comment.
busyCount should be used in place of busyCountWrapper to avoid unboxing.
Add a message to the exception.
I think it is verbose, because we don't have additional effective information to append to the exception. The constructor has only one argument, and we can trace and find the cause with the exception's stack trace and the contract described in the javadoc.
Generally, looking really good. A couple of nits and a couple questions (the latter more because I'm a little late to the party and also haven't been in this code for a bit).
Also, do you want to add @Deprecated to PoolMap?
PoolMap.PoolType is still used in HTablePool, which is marked as stable.
nit: this is a weird construct. why not go with the positive case instead?
this seems like fundamentally different behavior than previously. Now we are just invalidating idle instances, rather than all instances of HTable. Am I missing something?
Yes, the previous code may close tables which are still in use by other threads, but the method HTable.close() is not thread-safe and we cannot tell what happens. In the previous code it is required for users to return all the tables before calling HTablePool.close().
why the switch here to creating a new daemon thread? and the thread -> runnable change above? Sorry, catching up...
With chainging the code to use SharedMap instead of PoolMap, because they have different functions, I changed how to interrupt the threads in instances of Connection.
(I think the interruption is not appropriate, but I just left the old logic for the time being.)
I decided to use ThreadGroup.interrupt() for the purpose. But the thread group must be specified as a parameter of a constructor of Thread, and there is no constructor to fit nicely, so I felt this was the time to break strong dependency between Connection and Thread.
I think we can completely separate threads from instances of Connection, and we can reuse the threads by using ExecutorService etc. to avoid overhead of re-creating threads, setting aside how effective to do so. But I stopped this refactoring at the point I can use SharedMap.
doesn't this change the behavior a little to always use a roundrobbing, rather than a configurable type of pool?
Yes. In the previous code there was the other type, thread-local, which is wrong and doesn't work well, so I purged. I also improved the round-robin logic to prefer to lend idle objects, and there is no (or little) reason to use the old thread-local logic.
no (or little) reason -> no reason; I think it is difficult to pick up advantage of the thread-local logic any more.
these two methods are really similar - maybe switch the interface to just be a varargs?
I worry about overhead of reusing the methods for an array/collection of requests, which seem to be not called from anywhere in the java code. If the methods are going to be removed, the changing code to reuse the methods is wasteful.
And the two methods don't seem so similar to extract common logic.
Does this comment really apply? The same amount of blocking happens in either case since the connection.setupIOstreams is synchronized. You are saving an extra synchronized block (yay!), but the thread ordering isn't affected, right?
I almost copied and pasted the comment, and didn't elaborate. It also becomes difficult to understand since the intended synchronization blocks are far away.
By the way, these two synchronizations have different targets to block, and in any way the separation is required.
I'll rewrite the comment.
can you be more specific here with equality and sameness (== or .equals?)
It would be redundant to add concrete descriptions for the top interface SharedMap and all of its implementations, and I'll add the description for the top interface and leave the implementation classes to be mentioned briefly.
nit: can you be a little more explicit as to what it is actually doing (i.e. map between key and the pool of objects)?
Its type Map<K, Pool<V>> well described "map between key and the pool of objects". I agree that the name "map" is a common noun as it is, so I'll rename it to "poolMap" or something.
maybe worth commenting that subclasses don't need to be thread-safe as this is handled here?
Ideally, without explicit description we should create classes without paying cost or risk of synchronization.
how do I get something that is borrowed and not pooled?
Well, the word "pooled" is vague, so I'll remove or replace.
this only works correctly because all the access is synchronized above, right?
this only works correctly because all the access is synchronized above, right?
Yes, you can normally implement the interface Pool.
maybe describe the round-robin logic a little more here? How do we limit the number of elements in the pool before we cycle through them?
Do you mean we should write here the content which is written in the javadoc of the constructor?
I rewrote the javadoc a little.
nit: these javadocs aren't formatted correctly - the methods need to have their parameters as well. Also the methods sho uld have correct javadocs (@return, @param tags, etc)
You should not fill tags without any additional indispensable information, at least until the code is well stable. Otherwise not only you pay cost to duplicate comment but also increase maintenance cost and inconsistency risk in future.
this seems overly specific for the class-level javadoc. Great explanation, but maybe just leave it to the method itself?
This explains a scenario, how to use the interface over across the methods, that is the meaning of the existence of SharedMap and more important than the javadoc of each method. There is no appropriate place except the javadoc of the interface.
this line in particular would be really good to have on returnObject().
Same as above.
Do you mean I need to create a new object and register it?
No, but it is just almost meaningless if you don't.
can you be a bit more explicit here with what it means to invalidate an object (i.e. that object won't be lent in the future to other requests, but doesn't do anything to the resource itself and maybe still be in use after invalidation (and hence needs to be torn down independently).
This method is only meaningful in combination with other methods, which is referred in the javadoc of the class. Also I don't want to write here other methods' specifications redundantly.
nit: giving -> that gives
This is a part of test classes which are not for users and not supposed to be included in the document.
The naming of classes, methods, fields etc. of test classes must be self-evident without comments, and the implementation logic of each test method must be apparent and represent a part of the specification of the targeted class.
A couple more comments - some on correctness, some nits, some thoughts on alternatives. Getting really close thought - good stuff!
nit: odd method name, maybe something like newTableMap? And why the static - its only called in the constructor?
I'll rename the instance variable "tables" -> "tableMap" and this method "newTables" -> "newTableMap".
This method is static and independent of an instance, and is required to construct the final instance variable "tables" ("tableMap").
what do you think about having the PoolType just have its own method (e.g. PoolType.getTableMap()) for the table here? Would save on switching, be a bit more OO and drop the need for this method. It would be similar to the current PoolType::createPool method.
Or maybe breaking the dependency entirely and moving to a different enum? Or deprecating the constructor and creating a new one that takes in the new enum type + a helper method (also deprecated) to convert from the PoolType to the new object?
HTablePool is annotated as stable, and we should keep the constructors which use PoolType in the end.
Ideally, this is not a case to use enum. enum classes are not extendable without rewriting themselves, in exchange for ability to enumerate all of their members. That is an opposite vector against OO, and for many cases it is overpayment. We should have simply created a factory method for each case.
If there is contention here, couldn't this method create two tables for the same tablename, only one of which is actually registered to the pool? We could consider this a 'leak' since the pool then doesn't actually know about all the tables it created, right? Its handled correctly in releaseTable, but couldn't it lead to more tables/threads than expected?
So maybe we should: (1) add a comment to this method (or createTable) that its not thread safe, or (2) synchronize this method to avoid double creating the object, or (3) at least check the result of .registerObject to ensure that we don't have a double lend of the same key (but different table)?
No, it is not required to do anything, unless you have to exactly block threads and wait until some table is returned rather than create an excess instance.
The method registerObject() always succeed unless the instance to be pooled is already registered.
Creating excess tables causes some tables will be overflowed and rejected to return into the pool, and they will be torn down?right there.
This does have implications for the number of the threads being used on a system (which is important for matching threads to cores and avoiding excessive context switching). There can be the case where we do get more threads created than expected, even though we are using the pool. Therefore, we need to at least check the return of registerObject() to ensure we don't lend excessive threads (if we don't synchronize on access/creation) and only pay a temporary thread creation hit until we realize that the object has already been registered, at which point we can just recurse.
So at the very least, this need a comment that it is possible (if not very likely) that you can have two instances from the pool that are completely separate, only one of which is registered; to me this is incorrect behavior, especially given the javadocs on the method and really would prefer changing as described above
First of all, from the beginning HTablePool is designed to hold several instances for each table, and the class has been created as a convenient class to help users to create client applications.
For the current version of HTablePool, the pool size given from constructors of HTablePool is used as the max count of idle instances (which are not borrowed but held in HTablePool) for each table. HTablePool doesn't prevent you from getting instances to progress your operations.
Well, think about how to prevent users from getting an instance.
HTablePool doesn't make HTable to be thread safe, so (I)you must block the thread at the HTablePool.getTable() method until some other thread returns a borrowed table via HTableInterface.close() method, or (II)immediately notify the failure by returning null or throwing an exception or etc.
(Note that adding "synchronized" or something to HTablePool.getTable() is not sufficient. Even if a thread creates and registers a new instance of HTable in getTable(), it is not allowed for another thread to use until the instance returns to the pool. In the current implementation, the "excessive" creation may be happened with contention between getting and returning an instance, and if the returned instance loses it is pooled to prepare for the next get request.)
For the former case (I), the block crosses over the methods, and that easily causes dead lock. For example, some thread get a table "A" and get a table "B" while other thread get table "B" and "A", under the condition that the max count of instances allowed to create for each table is 1.
For the latter case (II), the situation is not so better. To solve the starvation problem, if a thread is notified of the failure, it must roll back its half-finished operations, return its all borrowed tables, and start over. (In this case the operations must be able to roll back.)
In either case you burdens users to create client applications, and it generally doesn't pay. In many cases users can explicitly control how many user threads execute operations in their application and how many tables are required for each operation, and the users can estimate and control the maximum count of instances of HTable. That is much easier than dealing with the above cases.
By the way, the SharedMap.registerObject() always succeeds unless the instance to be pooled is already registered,
and a newly created instance is always can be registered (unless equals/hashCode are maliciously overridden).
Why the interface here? I don't see any other implementing types....
Well, in future, we should distinguish between request and response of Call that are not necessarily identical, and we should extract responsibility to encode/decode the contents to send/receive via the byte streams from Connection and should embed the responsibility into the request/response of Call.
Actually, the method Connection.sendParam() only sends encoded data and only requires "id"(int) and "param"(RpcRequestBody). That is the reason we don't decorate an instance of Call to make traps to know completion of the responses when we call Connection.sendParam().
But I'll change this to avoid confusion. As the name of the method sendParam(), I'll give "id" and "param" to the method instead of the whole of the object of Call or CallRequest, and remove the interface CallRequest.
> That is the reason we don't decorate an instance of Call to make traps to know completion of the responses when we call Connection.sendParam().
Sorry, I actually decorated it.
This call is only used in the one call() method, which in turn is only used by the one deprecated call method, which isn't used by anything that I could find. Further, that method has been deprecated since 0.94 (at least). What do you think about dropping it here or in a follow-on jira?
I also suspect that this method is not used, but some scripts might call this method. I'm not sure that.
Also I think this parallel call implementation has better performance than repeatedly invoking the single call, and I feel it’s too good to be thrown away.
nit: can you add javadocs here to describe what its should be doing, the side effects, and how its different than the other sendParams method?
This is the section that I was talking about before that could be pulled into its own method. In my head, the top would be the setup part (where you setup the call for the connection), and then you can pull out the connection.setupIOstreams and looping into new method (e.g. makeCalls(Connection, Call...calls)) that can be called from above.
Generally, when there is a large comment section, its non-obvious code that I find is better isolated, to avoid messing things up in the copy (i.e. here and in the above sendParams method).
Its a minor thing, and if you think it just adds complication, then I'll drop it.
Sorry I didn't understand all you said, but I agree with messy duplication. So I'll change the order of the logic and restructure some methods.
Disagree with you comment's on the last review about not needing javadocs for test classes. Maybe no necessarily on the class level, but on the method level, what the goals of each test are, etc. are still good things to have.
Even if comments are required for the test methods, there is no advantage to use javadoc for the comment. Javadoc burdens us with following the special format combining HTML and tags, even if that is not converted to documents nor referred by developers via pop-up window in IDE. Even the test runner shows only the names of the methods which are failed to execute. Comments of tests are only for the developers who just see the test codes to investigate something.
For the developers who just see the test codes, what comments are required if the names of the methods and their implementation codes are apparent? Redundant comments are not only noisy but also misleading.
Of course, comment is useful to complement something which the code which fails to represent.
Also javadoc might be worth for helper classes or helper methods via pop-up window in IDE.
- Renamed an instance variable "tables" in HTablePool to "tableMap", and a method "newTables" to "newTableMap".
- Similarly, renamed an instance variable "connections" in HBaseClient to "connectionMap", and a method "newConnections" to "newConnectionMap".
- Removed the interface CallRequest.
- Removed a property "id" from Call because I found it is not needed, and fixed methods about it.
- Simplified and enabled to interrupt while calling SingleCall.getValue() and ParallelResults.getValues(); I overlooked that the public methods call() throws InterruptedException.
- Fixed code of handling InterruptedException thrown by Thread.sleep() in calling UserGroupInformation.doAs() method, in order to avoid leak. (But I'm not sure how to deal with RuntimeException.)
- Changed order of creating a instance of Connection, establishing its connection and starting an internal thread, preparing to handle a response, and sending a parameter to remote server, in order to reduce duplication and complication. HBaseClient.getConnection() now returns a instance of Connection which already has established its connection and has started its internal thread. The newly extracted method Connection.invokeCall() executes both of preparing to handle a response and sending a parameter.
- Renamed a method "sendCall" to "invokeCall" because the method not only sends a parameter to a remote server but also adds the instance of Call into an internal pool to handle its response.
- Added interruption precondition check for all public methods Connection.call() in order to stop earlier when interrupted.
- Added @deprecated tag to a protected static method HBaseClient.getPoolType(Configuration) which is already unused.
I rebased my local branch on the current trunk because I had failed to apply my patch on the trunk, which might cause this patch different from the previous patch in many hunks.
Also I added @deprecated tag to the constructors of PoolMap, whose instance is not thread safe against its intention. I didn't add the tag to the class PoolMap itself because the internal class PoolMap.PoolType is used in public constructors in HTablePool, which is annotated as stable.
Fixed the test class TestFromClientSide. The previous version of TestFromClientSide.testClientPoolThreadLocal() was wrong from the beginning because HTable is not thread safe, and also logic of calling Object.notifyAll() and Object.wait() is wrong and it might do nothing (though fortunately it seems to avoid to be blocked forever). The intention of the previous version is vague and I already purged thread-local logic, so I created another similar test.