
Changes to HCatInputFormat to make it use SerDes instead of StorageDrivers

Review Request #3784 - Created Feb. 8, 2012 and updated

Alan Gates
francisliu, sushanth
See HCATALOG-237 for details and design notes.

Posted (Feb. 9, 2012, 1:13 a.m.)


Not related to this patch: I didn't know the outputSchema is not in inputJobInfo. We should file a JIRA to put it in for consistency.
Since jobConf is a deep copy of jobContext.getConfiguration(), you should be consistent here and use only jobConf. That way, if one of these methods actually modifies the configuration, you know all of the changes are in one place. Then update the jobContext with all the changes before this method ends.
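To illustrate the pattern suggested above, here is a minimal sketch using plain java.util maps as stand-ins for the Hadoop configuration objects. The class name, method name, and property key are hypothetical and are not taken from the patch; the real code would work with JobConf and jobContext.getConfiguration().

```java
import java.util.HashMap;
import java.util.Map;

public class SingleConfigCopySketch {
    // sharedConf stands in for jobContext.getConfiguration() (assumption:
    // a plain map is used here only so the sketch runs without Hadoop).
    static void configure(Map<String, String> sharedConf) {
        // Take one private copy and make every change on it.
        Map<String, String> jobConf = new HashMap<>(sharedConf);

        // Hypothetical property, standing in for whatever the helper
        // methods would set on the configuration.
        jobConf.put("example.input.serde", "ExampleSerDe");

        // Before the method ends, push all changes back in one step.
        sharedConf.putAll(jobConf);
    }
}
```

Note that putAll only adds or overwrites entries; a key removed from the copy would still be present in the shared configuration.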
We should sync up on how we do this. Take a look at my patch and HCatUtil.getStorageHandler().
This is available in my patch; use HCatUtil.getStorageHandler() instead.
Will jobProperties really have nothing in it? Maybe you should append to the contents instead?
IOException would be better. Please add a message as well.
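As a rough sketch of the request above: the code under review is not shown here, so the failing operation below (loading a class by name) and the helper name are assumptions chosen only for illustration. The point is that the IOException carries both a message and the original cause.

```java
import java.io.IOException;

public class IoExceptionSketch {
    // Hypothetical helper: load a class by name and report failure as an
    // IOException with a descriptive message and the original cause.
    static Class<?> loadClass(String className) throws IOException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IOException("Could not load class " + className, e);
        }
    }
}
```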
This should be moved into the default/foster StorageHandler logic.
What happens if storageHandler == null? You should encapsulate the InputFormat, OutputFormat, and SerDe (if, of, serde) into a "FosterStorageHandler" so you only have to take care of one code path.
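A minimal sketch of that idea follows. The StorageHandler interface here is a simplified, hypothetical stand-in (it is not HiveStorageHandler or HCatStorageHandler), and the resolve() helper is invented for illustration. It only shows how wrapping the three class names in a foster handler removes the null branch from the calling code.

```java
public class FosterHandlerSketch {
    // Hypothetical, simplified stand-in for a storage handler interface.
    interface StorageHandler {
        String inputFormatClass();
        String outputFormatClass();
        String serDeClass();
    }

    // Wraps the InputFormat, OutputFormat, and SerDe of a table that has
    // no storage handler of its own.
    static final class FosterStorageHandler implements StorageHandler {
        private final String ifClass;
        private final String ofClass;
        private final String serDeClass;

        FosterStorageHandler(String ifClass, String ofClass, String serDeClass) {
            this.ifClass = ifClass;
            this.ofClass = ofClass;
            this.serDeClass = serDeClass;
        }

        public String inputFormatClass() { return ifClass; }
        public String outputFormatClass() { return ofClass; }
        public String serDeClass() { return serDeClass; }
    }

    // Callers always get a non-null handler, so there is one code path.
    static StorageHandler resolve(StorageHandler declared,
                                  String ifClass, String ofClass, String serDeClass) {
        return declared != null
                ? declared
                : new FosterStorageHandler(ifClass, ofClass, serDeClass);
    }
}
```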
You don't need an explicit cast.
For now, use HCatStorageHandler, since it contains some methods that are missing from HiveStorageHandler and will be added there.
This is similar code to the previous block, so the same comments apply. In HCatOutputFormat, I moved code such as this into HCatUtil.configureOutputJobProperties(); maybe you should do something similar?
I would just do:
 JobConf jobConf = new JobConf(jobContext.getConfiguration());

One less variable to worry about.
Since this is file-based code, this logic should go into the Foster/Default StorageHandler.configureInputJobProperties().
You don't need the second condition.
You can probably remove this already?