HIVE-1694: Accelerate GROUP BY execution using indexes
Review Request #392 - Created Feb. 3, 2011
I suggest naming this property hive.optimize.index.groupby.
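Here is a minimal sketch of how the suggested flag might gate the rewrite. So that it compiles on its own, it uses java.util.Properties as a stand-in for HiveConf. The class name `GroupByIndexOptimizerGate` and the off-by-default choice are assumptions, not part of the patch.

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for HiveConf.
// Only the property name comes from the review; everything else is illustrative.
final class GroupByIndexOptimizerGate {
  static final String INDEX_GROUPBY_KEY = "hive.optimize.index.groupby";

  // Assumed default of "false": the rewrite runs only when a user opts in.
  static boolean isEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(INDEX_GROUPBY_KEY, "false"));
  }
}
```

An off-by-default setting also fits the later point that the rewrite can't be enabled until the summary representation exists.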
I don't understand why these state variables are maintained as conf variables rather than just data members of a class. Could you explain why that is necessary?
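To illustrate the alternative being asked about, here is a sketch of per-query rewrite state kept as ordinary fields of a context object, with nothing written into or read back from the conf. The class `GroupByRewriteContext` and its fields are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical context object: state lives in plain data members for the
// duration of one rewrite, instead of being stored as conf variables.
final class GroupByRewriteContext {
  private final List<String> indexTables = new ArrayList<>();
  private boolean rewritePossible = true;

  void addIndexTable(String name) {
    indexTables.add(name);
  }

  // Called when any check fails; the rewrite is then skipped.
  void markNotRewritable() {
    rewritePossible = false;
  }

  boolean isRewritePossible() {
    return rewritePossible;
  }

  // Read-only view so callers cannot mutate the context's state directly.
  List<String> getIndexTables() {
    return Collections.unmodifiableList(indexTables);
  }
}
```

With this approach, the state is scoped to the object that owns it. It does not sit in a configuration object that may outlive the query.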
Why is the exception being ignored here?
All new files need the Apache license header.
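For reference, this is the standard ASF source header. It goes at the very top of each new Java file, before the package declaration. Verify the exact wording against the ASF source-header policy or an existing Hive source file before copying it.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
```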
Use "expr" instead of "engfd".
What about nested function invocations?
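One way to handle nested invocations is to walk the whole expression tree recursively rather than inspecting only the top-level call's direct arguments. The sketch below uses a simplified, hypothetical expression model (`Expr`, `ColumnRef`, `FuncCall`) in place of Hive's ExprNodeDesc hierarchy. It collects every column referenced at any depth, for example `key` in `count(lower(key))`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified expression tree (not Hive's real classes).
abstract class Expr {}

final class ColumnRef extends Expr {
  final String name;
  ColumnRef(String name) { this.name = name; }
}

final class FuncCall extends Expr {
  final String func;
  final List<Expr> args;
  FuncCall(String func, Expr... args) {
    this.func = func;
    this.args = Arrays.asList(args);
  }
}

final class ExprWalker {
  // Returns every column referenced anywhere in the tree, in visit order.
  static List<String> referencedColumns(Expr expr) {
    List<String> out = new ArrayList<>();
    collect(expr, out);
    return out;
  }

  // Recurses into function arguments, so nested calls are not missed.
  private static void collect(Expr expr, List<String> out) {
    if (expr instanceof ColumnRef) {
      out.add(((ColumnRef) expr).name);
    } else if (expr instanceof FuncCall) {
      for (Expr arg : ((FuncCall) expr).args) {
        collect(arg, out);
      }
    }
  }
}
```

A rewrite check could then compare the collected columns against the index's key columns, instead of assuming that the aggregate's argument is a bare column.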
Eliminate commented-out code.
This dependency on the _c naming convention needs to be abstracted out.
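This sketch assumes the convention in question is Hive's auto-generated `_c0`, `_c1`, ... aliases for unnamed select expressions. It shows one way to confine that knowledge to a single place, so that the rewrite logic never string-matches `_c` itself. `ColumnAliasResolver` and `DefaultAliasResolver` are hypothetical names.

```java
import java.util.Optional;

// Hypothetical interface: the only component that knows how generated
// aliases are spelled. Callers ask for a position instead of parsing names.
interface ColumnAliasResolver {
  Optional<Integer> generatedPosition(String alias);
}

final class DefaultAliasResolver implements ColumnAliasResolver {
  private static final String PREFIX = "_c";

  @Override
  public Optional<Integer> generatedPosition(String alias) {
    if (alias == null || !alias.startsWith(PREFIX)) {
      return Optional.empty();
    }
    String digits = alias.substring(PREFIX.length());
    if (digits.isEmpty()) {
      return Optional.empty();
    }
    // Accept ASCII digits only, e.g. "_c12" -> 12.
    for (int i = 0; i < digits.length(); i++) {
      char c = digits.charAt(i);
      if (c < '0' || c > '9') {
        return Optional.empty();
      }
    }
    try {
      return Optional.of(Integer.parseInt(digits));
    } catch (NumberFormatException overflow) {
      return Optional.empty();
    }
  }
}
```

If the naming scheme ever changes, only the resolver implementation needs to change.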
For HIVE-1644, the HMC team is creating a new package org.apache.hadoop.hive.ql.optimizer.index. I think any optimizer code related to indexing should go in there.
This code is currently tied to the compact index representation. We mentioned earlier that we'll need a new index representation (summary) in order to implement the counts correctly; we should leave the compact representation as is. So:
* until the summary representation is added, we can't enable this;
* in general, it would be good to make this pluggable. For example, the bitmap index representation could also be used by counting the bits, but the rewrite expression would be slightly different.
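A rough sketch of the pluggability idea follows. A small strategy interface yields a row count per index entry, and each representation supplies its own implementation. The interface, the `SummaryEntry` layout (a key plus a precomputed count), and the use of `BitSet` for a bitmap entry are all assumptions made for illustration. They are not existing Hive APIs or on-disk formats.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical strategy: how to get a row count out of one index entry.
interface IndexCountStrategy<E> {
  long rowCount(E indexEntry);
}

// Assumed summary representation: each entry stores a precomputed count.
final class SummaryEntry {
  final String key;
  final long count;
  SummaryEntry(String key, long count) {
    this.key = key;
    this.count = count;
  }
}

final class SummaryCountStrategy implements IndexCountStrategy<SummaryEntry> {
  @Override
  public long rowCount(SummaryEntry e) {
    return e.count;
  }
}

// Assumed bitmap representation: one bit per row, so count the set bits.
final class BitmapCountStrategy implements IndexCountStrategy<BitSet> {
  @Override
  public long rowCount(BitSet bits) {
    return bits.cardinality();
  }
}

final class IndexCounts {
  // Sums per-entry counts using whichever representation's strategy is given.
  static <E> long total(List<E> entries, IndexCountStrategy<E> strategy) {
    long sum = 0;
    for (E e : entries) {
      sum += strategy.rowCount(e);
    }
    return sum;
  }
}
```

The aggregation code depends only on `IndexCountStrategy`. Adding a representation means adding a strategy (and, as noted above, a matching rewrite expression) rather than editing the rewrite itself.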
You might want to use a finer log level than LOG.info here, such as LOG.debug.
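The sketch below shows the finer-level pattern. It uses java.util.logging as a self-contained stand-in for the patch's logger, where `Level.FINE` plays the role of DEBUG. With Apache Commons Logging, the equivalent would be `LOG.isDebugEnabled()` and `LOG.debug(...)`. The class and message are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Per-candidate diagnostics at FINE stay out of logs at the default INFO
// level; the isLoggable guard also skips building the message string.
final class RewriteLogging {
  private static final Logger LOG = Logger.getLogger(RewriteLogging.class.getName());

  static void logCandidate(String table, String index) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Considering index " + index + " for GROUP BY on " + table);
    }
  }
}
```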
Just a sanity check to avoid huge payloads coming back from Thrift.
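To make the intent of such a check concrete, here is a hypothetical sketch of passing an upper bound to a remote listing call and verifying that the server respected it. The `IntFunction` fetcher stands in for a Thrift client call. `MAX_INDEXES` and its value are illustrative, and the real metastore API and its limit parameter may differ.

```java
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical bounded fetch: request at most MAX_INDEXES entries so a
// single call cannot return an arbitrarily large payload.
final class BoundedFetch {
  static final int MAX_INDEXES = 100; // illustrative cap, not from the patch

  static <T> List<T> fetchIndexes(IntFunction<List<T>> fetcher) {
    List<T> result = fetcher.apply(MAX_INDEXES);
    if (result.size() > MAX_INDEXES) {
      // Defensive: the server ignored the requested limit.
      throw new IllegalStateException(
          "Server returned " + result.size() + " entries; limit is " + MAX_INDEXES);
    }
    return result;
  }
}
```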
Hmm, no, I think we should fail hard here. If the underlying problem is fatal (e.g. the metastore went down), we should not try to hide it.
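Failing hard can be as simple as wrapping and rethrowing the exception with its cause attached, instead of catching and ignoring it. In this sketch, `IndexLookupException` stands in for whatever checked exception the optimizer already declares. `MetastoreLookup` and `IndexResolver` are likewise hypothetical names.

```java
import java.util.List;

// Stand-in checked exception; the original cause is preserved.
final class IndexLookupException extends Exception {
  IndexLookupException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical metastore access point.
interface MetastoreLookup {
  List<String> listIndexes(String table) throws Exception;
}

final class IndexResolver {
  static List<String> indexesFor(MetastoreLookup ms, String table)
      throws IndexLookupException {
    try {
      return ms.listIndexes(table);
    } catch (Exception e) {
      // Fail hard: a metastore outage is fatal for this query, so surface
      // it with the cause attached rather than silently skipping the rewrite.
      throw new IndexLookupException("Failed to list indexes for " + table, e);
    }
  }
}
```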