

Add timestamp column with index to the partition stats table.

Review Request #2079 - Created Sept. 27, 2011 and updated

Kevin Wilfong
HIVE-2471
Reviewers
hive
heyongqiang, nzhang
hive
I added a timestamp column, ts, to the partition statistics table; it defaults to current_timestamp. I also added code to create an index on that column and to verify that the index exists when we check whether the table exists.
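As a rough illustration of the schema change described above, the DDL might look like the sketch below. Only the table name PARTITION_STATS_V2 and the ts column with its current_timestamp default come from this description; the ID and ROW_COUNT columns and the index name are placeholders assumed for the example, and the actual patch may build its statements differently.

```java
// Sketch only: builds DDL strings for a stats table with an indexed timestamp column.
class StatsTableDdl {
    // Table and column names taken from the review description.
    static final String TABLE = "PARTITION_STATS_V2";
    static final String TS_COLUMN = "TS";
    // Hypothetical index name, chosen for this example.
    static final String TS_INDEX = TABLE + "_TS_IDX";

    // ID and ROW_COUNT are placeholder columns, not the real schema.
    static String createTableSql() {
        return "CREATE TABLE " + TABLE + " ("
            + "ID VARCHAR(255) NOT NULL PRIMARY KEY, "
            + "ROW_COUNT BIGINT, "
            + TS_COLUMN + " TIMESTAMP DEFAULT CURRENT_TIMESTAMP)";
    }

    // The index lets scripts filter rows by timestamp without a full table scan.
    static String createIndexSql() {
        return "CREATE INDEX " + TS_INDEX + " ON " + TABLE + " (" + TS_COLUMN + ")";
    }

    public static void main(String[] args) {
        System.out.println(createTableSql());
        System.out.println(createIndexSql());
    }
}
```

Both statements use syntax that Derby and MySQL are expected to accept, but they have not been run against this patch.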

I also took the opportunity to fix another problem. Every time we change the schema of the partition statistics table, we give it a slightly different name, such as PARTITION_STATS, PARITION_STATISTICS, or PARTITION_STAT_TBL. Instead, I want to put a version number at the end of the table name; here I use PARTITION_STATS_V2. Rather than coming up with a new variation of the name each time, we can just increment the final number. This also makes it easy to identify old tables that can be dropped.
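The versioned naming scheme can be sketched as follows. The class, constant, and method names here are hypothetical and not taken from the patch; the sketch only shows how a numeric suffix makes older tables mechanically identifiable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: derive the current table name and spot older versions by suffix.
class StatsTableNames {
    static final String PREFIX = "PARTITION_STATS_V";
    // Bump this number whenever the schema changes.
    static final int CURRENT_VERSION = 2;
    // Up to 9 digits so Integer.parseInt cannot overflow.
    private static final Pattern VERSIONED =
        Pattern.compile(Pattern.quote(PREFIX) + "(\\d{1,9})");

    static String currentTableName() {
        return PREFIX + CURRENT_VERSION;
    }

    // Returns the names that follow the scheme and carry an older version number.
    static List<String> obsoleteTables(List<String> existingTables) {
        List<String> obsolete = new ArrayList<>();
        for (String name : existingTables) {
            Matcher m = VERSIONED.matcher(name.toUpperCase(Locale.ROOT));
            if (m.matches() && Integer.parseInt(m.group(1)) < CURRENT_VERSION) {
                obsolete.add(name);
            }
        }
        return obsolete;
    }
}
```

Tables created before the scheme was introduced, such as PARTITION_STAT_TBL, do not match the pattern and would still have to be cleaned up by hand.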

Checking whether the index exists may not be worth the time it takes. We have to run the check every time we init JDBCStatsPublisher, unless the table doesn't exist. If the index is missing, it's not the end of the world: any scripts that try to use the index will just be slower, and the index can always be added later. Also, the chance that the program creates the table but is interrupted before it can create the index is low. I added the check because having to track down why Hive slowed down, only to find that a cleanup script is running slowly and holding locks for a long time, sounded painful enough to make the check worth it. I am open to debate.
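One way such an index check could be written with plain JDBC is sketched below, using the standard DatabaseMetaData.getIndexInfo call. This is an assumption about the approach, not the code in the patch; the class and method names are hypothetical, and the caller supplies the table and index names.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: ask the JDBC driver whether a named index exists on a table.
class StatsIndexCheck {
    static boolean indexExists(Connection conn, String table, String indexName)
            throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        // unique=false lists all indexes; approximate=true allows cheaper, possibly stale data.
        try (ResultSet rs = meta.getIndexInfo(null, null, table, false, true)) {
            while (rs.next()) {
                // INDEX_NAME is null for table-statistics rows, so guard against it.
                String name = rs.getString("INDEX_NAME");
                if (name != null && name.equalsIgnoreCase(indexName)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

This costs one metadata round trip per call, which is the overhead weighed above. Some databases store identifiers in upper case (Derby does by default), so the table name passed in may need to match the stored case.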
I ran TestStatsPublisherEnhanced against both Derby and MySQL and verified that all the tests succeeded.

I also ran a few queries and verified that the table and index were created and that the rows, including timestamp, appeared in the table.