extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)

Review Request #785 - Created May 26, 2011 and updated

Tomasz Nykiel
HIVE-2185
Reviewers
hive
hive
Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table.
Other statistics (e.g., total table/partition size) are derived from the file system.

We introduce a new feature for collecting the size of uncompressed data, which makes it possible to determine the efficiency of compression.
In addition to the new statistic, this patch extends the stats collection mechanism so that further statistics can be added easily.
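
To illustrate how the two sizes could be combined, here is a minimal sketch. The class and method names are hypothetical and not part of the patch; it assumes the on-disk size comes from the file system and the uncompressed size from the new statistic.

```java
// Hypothetical helper, not part of the patch: relates the on-disk size
// (from the file system) to the collected uncompressed size.
class CompressionEfficiency {

    // Returns sizeOnDisk / uncompressedSize; smaller means better compression.
    // An uncompressed size of 0 means the statistic was not collected
    // (for example, an unsupported SerDe), so no ratio can be computed.
    static double ratio(long sizeOnDisk, long uncompressedSize) {
        if (uncompressedSize <= 0) {
            return Double.NaN;
        }
        return (double) sizeOnDisk / uncompressedSize;
    }
}
```

For example, a partition that occupies 250 bytes on disk and holds 1000 bytes of uncompressed data has a ratio of 0.25.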

1. The serializer/deserializer classes are amended to collect the size of uncompressed data when serializing/deserializing objects.
We support:

ColumnarSerDe
LazySimpleSerDe
LazyBinarySerDe

For other SerDe classes the uncompressed size will be 0.
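
The idea in (1) can be sketched roughly as follows. This is not the SerDe interface from the patch: `RawSizeTrackingSerializer` and its methods are hypothetical names, and the row format (fields joined by the Ctrl-A delimiter, encoded as UTF-8) is a simplifying assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, not Hive's SerDe API: a serializer that keeps a
// running total of the bytes it produces, before any compression is applied.
class RawSizeTrackingSerializer {
    private long rawDataSize = 0;

    // Serializes one row and adds its uncompressed length to the total.
    byte[] serialize(List<String> fields) {
        byte[] bytes = String.join("\u0001", fields).getBytes(StandardCharsets.UTF_8);
        rawDataSize += bytes.length;
        return bytes;
    }

    // Total uncompressed bytes serialized so far.
    long getRawDataSize() {
        return rawDataSize;
    }
}
```

A SerDe that does no such bookkeeping would simply report 0, which matches the behaviour described for the unsupported classes.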

2. StatsPublisher / StatsAggregator interfaces are extended to support multi-stats collection for both JDBC and HBase.

3. For both INSERT OVERWRITE and ANALYZE statements, FileSinkOperator and TableScanOperator respectively are extended to support multi-stats collection.

(2) and (3) enable easy extension for other types of statistics.
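
A rough sketch of what multi-stats collection means: each task publishes a map of named statistics under its own key, and the aggregator sums one named statistic over all keys that share a prefix. The in-memory class below is purely illustrative; its name, the key layout, and the statistic names are assumptions, and it does not reproduce the StatsPublisher / StatsAggregator signatures from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a stats store (the patch targets
// JDBC and HBase); all names here are illustrative only.
class InMemoryStatsStore {
    private final Map<String, Map<String, Long>> rows = new HashMap<>();

    // Publishes several statistics at once for a single task key.
    void publish(String taskKey, Map<String, Long> stats) {
        Map<String, Long> row = rows.computeIfAbsent(taskKey, k -> new HashMap<>());
        for (Map.Entry<String, Long> e : stats.entrySet()) {
            row.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    // Sums one named statistic over every task key starting with keyPrefix.
    long aggregate(String keyPrefix, String statName) {
        long total = 0;
        for (Map.Entry<String, Map<String, Long>> e : rows.entrySet()) {
            if (e.getKey().startsWith(keyPrefix)) {
                total += e.getValue().getOrDefault(statName, 0L);
            }
        }
        return total;
    }
}
```

With this shape, adding a new statistic only means publishing one more map entry; neither the publish nor the aggregate signature has to change.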

4. Collecting uncompressed size can be disabled by setting:

hive.stats.collect.uncompressedsize = false
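
As a sketch of how the flag might be consulted, the snippet below reads the property from a plain `java.util.Properties` object rather than from HiveConf. The class name is hypothetical; the default of `true` is taken from the HiveConf change in this diff.

```java
import java.util.Properties;

// Hypothetical helper: uses java.util.Properties as a stand-in for HiveConf.
class UncompressedSizeFlag {
    static final String KEY = "hive.stats.collect.uncompressedsize";

    // Collection is enabled unless the property is explicitly set to false.
    static boolean isEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "true"));
    }
}
```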

Testing done:
- additional JUnit tests for the amended Serializer/Deserializer classes
- additional queries for TestCliDriver over multi-partition tables
- all other JUnit tests
- standalone setup

Diff revision 1

This is not the most recent revision of the diff. The latest diff is revision 3.

  1. trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  2. trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java
  3. trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/TypedBytesSerDe.java
  4. trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java
  5. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
  6. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
  7. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
  8. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java
  9. trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsUtils.java
  10. trunk/hbase-handler/src/test/queries/hbase_stats.q
  11. trunk/hbase-handler/src/test/results/hbase_stats.q.out
  12. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  13. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
  14. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Stat.java
  15. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
  16. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
  17. trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java
  18. trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
  19. trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  20. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java
This diff has been split across 3 pages.
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Revision 1127756 New Change
@@ -393,10 +393,12 @@ public class HiveConf extends Configuration {
         false), // whether to update metastore stats only if all stats are available
     HIVE_STATS_RETRIES_MAX("hive.stats.retries.max",
         0),     // maximum # of retries to insert/select/delete the stats DB
     HIVE_STATS_RETRIES_WAIT("hive.stats.retries.wait",
         3000),  // # milliseconds to wait before the next retry
+    HIVE_STATS_COLLECT_UNCOMPRESSEDSIZE("hive.stats.collect.uncompressedsize", true),
+    // should the uncompressed size be collected when analayzing tables
 
 
     // Concurrency
     HIVE_SUPPORT_CONCURRENCY("hive.support.concurrency", false),
     HIVE_LOCK_MANAGER("hive.lock.manager", "org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager"),
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java
Revision 1127756 New Change
 
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/TypedBytesSerDe.java
Revision 1127756 New Change
 
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java
Revision 1127756 New Change
 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
Revision 1127756 New Change
 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
Revision 1127756 New Change
 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
Revision 1127756 New Change
 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java
Revision 1127756 New Change
 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsUtils.java
New File
 
trunk/hbase-handler/src/test/queries/hbase_stats.q
Revision 1127756 New Change
 
trunk/hbase-handler/src/test/results/hbase_stats.q.out
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Stat.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Revision 1127756 New Change
 
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java
Revision 1127756 New Change
 