Review Board 1.7.22


HIVE-4629. HS2 should support an API to retrieve query logs

Review Request #14326 - Created Sept. 25, 2013 and updated

Shreepadma Venugopalan
trunk
HIVE-4629
Reviewers
hive
brocknoland
hive-git
Adds a new API to HS2, String getLog(OperationHandle opHandle) that returns the query log for a given operation handle. The log is maintained in memory as a circular buffer. The default size is 128 KB, but can be configured by the user. Logging is initialized if hive.server2.in.mem.logging is set to true.

Log object is created in executeStatement,getColumns,getTables,getSchemas,getCatalogs,getTypeInfo,getFunctions and destroyed in closeOperation, cancelOperation.
Add a new unit test to test log retrieval.
Total:
40
Open:
40
Resolved:
0
Dropped:
0
Status:
From:
Description From Last Updated Status
Let's add a test where we call getLog without any query. Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Why even catch this if we rethrow immediately. Brock Noland Oct. 4, 2013, 6:47 p.m. Open
I think we should append these two error messages with log. Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Let's trim this trailing ws. (understand it's not yours.) Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Should this be called in a finally block? Same question for the items below. Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Is this useful at the INFO level? It's completely your call but I was just curious. Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Looks like we only require the List interface on the LHS? Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Internally we storing this as an int. Should the return type be a int or the internal type be a ... Brock Noland Oct. 4, 2013, 6:47 p.m. Open
When we drop an event, that means it's not stored in memory. Is it still logged to the HS2 log ... Brock Noland Oct. 4, 2013, 6:47 p.m. Open
The order of members for this class should be: static final final non-final Brock Noland Oct. 4, 2013, 6:47 p.m. Open
These two should start with a lower case cahr Brock Noland Oct. 4, 2013, 6:47 p.m. Open
LOG should be final Brock Noland Oct. 4, 2013, 6:47 p.m. Open
The variable should start with a lowercase char. Same same below and above. Brock Noland Oct. 4, 2013, 6:47 p.m. Open
spelling Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Slightly confusing. I would expect: if(operationLog == nul) { if(createIfAbsent) { doCreate(); } else { throw } } Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Operation is spelled wrong Brock Noland Oct. 4, 2013, 6:47 p.m. Open
trailing ws Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Similar to above, should this be part of the finally? Brock Noland Oct. 4, 2013, 6:47 p.m. Open
I know we are printing stack traces to syserr elsewhere in this file but lets' use the logger? Brock Noland Oct. 4, 2013, 6:47 p.m. Open
What encoding should this be, UTF-8? Brock Noland Oct. 4, 2013, 6:47 p.m. Open
Is this in memory logging for the server process, sessions, operations, or ...? Please make the property name more specific, ... Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Please use kilobytes instead of bytes. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
It's an operation log, not a query log. Also, what are the units of this value? Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
This feature needs to be tested at the Thrift level. Please include a test where getOperationLog is called more than ... Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
This looks like a substring from the previous check. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Please change the name to GetOperationLog() Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Does this return one line from the log per call, or do I get the whole 128kb buffer? What happens ... Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Formatting Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Why isn't OperationLog a member of the o.a.h.service.cli.operation package? Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
The name of this class should describe the functionality as opposed to the implementation. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
StringBuilder provides a delete(start, end) method which can be used to easily implement a ring buffer. I think that would ... Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
I think most of this code belongs in OperationManager as opposed to its own class. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Thread names are mutable and don't have to be unique. I think you want to use threadId instead. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
spelling Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Please move this to o.a.h.service.cli.operation.* Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Does this comment imply that when operation logging is enabled log messages generated by operations won't appear in the system ... Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Seems like most of the operation logging code should go in this class. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
I don't understand why the OperationLogger hangs off the SessionManager instead of the OperationManager. Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
If the established protocol for registering a thread is to always unregister it first, why not have registerCurrentThread call unregisterCurrentThread? Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Does operation logging work for executeStatementAsync()? Carl Steinbach Oct. 18, 2013, 11:50 p.m. Open
Posted (Oct. 4, 2013, 6:47 p.m.)
Shreepadma,

This looks like a very useful feature for clients of HS2! Indeed I wish all databases had this feature. The patch looks very good. I have noted a few items below, nothing major, basically a bunch of nits.

Brock
Let's add a test where we call getLog without any query.
Why even catch this if we rethrow immediately.
I think we should append these two error messages with log.
service/if/TCLIService.thrift (Diff revision 1)
 
 
Let's trim this trailing ws. (understand it's not yours.)
Should this be called in a finally block? Same question for the items below.
Is this useful at the INFO level? It's completely your call but I was just curious.
Looks like we only require the List interface on the LHS?
Internally we storing this as an int. Should the return type be a int or the internal type be a long?
When we drop an event, that means it's not stored in memory. Is it still logged to the HS2 log file?

Since we have a limit on the amount of data we store per log capture, I guess the question is, is all this data also sent to the HS2 log file?
The order of members for this class should be:

static final
final
non-final
These two should start with a lower case cahr
LOG should be final
The variable should start with a lowercase char. Same same below and above.
spelling
Slightly confusing. I would expect:

if(operationLog == nul) {
  if(createIfAbsent) {
    doCreate();
  } else {
    throw
  }
}
Operation is spelled wrong
trailing ws
Similar to above, should this be part of the finally?
I know we are printing stack traces to syserr elsewhere in this file but lets' use the logger?
What encoding should this be, UTF-8?
Posted (Oct. 18, 2013, 11:50 p.m.)

   

  
Is this in memory logging for the server process, sessions, operations, or ...? Please make the property name more specific, and please change the ordering of the subnames so that they run from general to specific, e.g. hive.server2.logging.operation.*
Please use kilobytes instead of bytes.
conf/hive-default.xml.template (Diff revision 1)
 
 
It's an operation log, not a query log. Also, what are the units of this value?
This feature needs to be tested at the Thrift level. Please include a test where getOperationLog is called more than once for the same operation, and a test where getOperationLog is demonstrated with asynchronous queries.
This looks like a substring from the previous check.
service/if/TCLIService.thrift (Diff revision 1)
 
 
Please change the name to GetOperationLog()
service/if/TCLIService.thrift (Diff revision 1)
 
 
Does this return one line from the log per call, or do I get the whole 128kb buffer? What happens if I call this RPC multiple times on the same operation handle? What happens if the circular buffer rolls over between two calls?
service/if/TCLIService.thrift (Diff revision 1)
 
 
Formatting
Why isn't OperationLog a member of the o.a.h.service.cli.operation package?
The name of this class should describe the functionality as opposed to the implementation.
StringBuilder provides a delete(start, end) method which can be used to easily implement a ring buffer. I think that would be a better option here since it would eliminate the need to iterate over every log entry in read().
I think most of this code belongs in OperationManager as opposed to its own class.
Thread names are mutable and don't have to be unique. I think you want to use threadId instead.
spelling
Please move this to o.a.h.service.cli.operation.*
Does this comment imply that when operation logging is enabled log messages generated by  operations won't appear in the system logs?

Also, can you please include a paragraph somewhere that provides a high-level overview of the operation logging mechanism? What expectations does it make about the relationship between Sessions/Operations and threads, and does it support asynchronous execution?
Seems like most of the operation logging code should go in this class.
I don't understand why the OperationLogger hangs off the SessionManager instead of the OperationManager.
If the established protocol for registering a thread is to always unregister it first, why not have registerCurrentThread call unregisterCurrentThread?
Does operation logging work for executeStatementAsync()?