Review Board 1.7.22


PIG-3059 Global configurable minimum 'bad record' thresholds

Review Request #8765 - Created Dec. 26, 2012 and updated

Submitter: Cheolsoo Park
Bugs: PIG-3059
Reviewers
  Groups: pig
  People: jadler, jcoveney, sms
Repository: pig-git
This patch implements configurable bad-record thresholds, based on work done by Jonathan in PIG-2614.

The changes include:
- Adds two new Pig properties: pig.load.bad.record.threshold and pig.load.bad.record.min.
- Removes the 'ignore_bad_files' option from AvroStorage, since it is no longer needed.
- Incorporates the InputErrorTracker class written by Jonathan in PIG-2614.
- Adds a try-catch block to the nextKeyValue() method in PigRecordReader.
- Adds new test cases to TestAvroStorage for the new properties.
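The threshold logic described above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the patch's InputErrorTracker or PigRecordReader code: ErrorTracker, RecordSource, readAll and fromList are illustrative names, and the rule (abort only when the error count exceeds the minimum AND the error rate exceeds the threshold) is an assumption drawn from the property descriptions in conf/pig.properties.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of threshold-based bad-record handling; names are illustrative, not Pig classes. */
public class BadRecordSketch {

    /** Counts good and bad records and decides when to stop tolerating errors. */
    static final class ErrorTracker {
        private final double threshold; // tolerated fraction of bad records, 0.0 to 1.0
        private final long minErrors;   // errors tolerated regardless of threshold
        private long records;           // successfully read records
        private long errors;            // failed records

        ErrorTracker(double threshold, long minErrors) {
            if (threshold < 0.0 || threshold > 1.0) {
                throw new IllegalArgumentException("threshold must be in [0, 1]: " + threshold);
            }
            if (minErrors < 0) {
                throw new IllegalArgumentException("minErrors must be >= 0: " + minErrors);
            }
            this.threshold = threshold;
            this.minErrors = minErrors;
        }

        void incRecords() {
            records++;
        }

        /** Records one failure; throws once both the minimum and the threshold are exceeded. */
        void incErrors(Exception cause) {
            errors++;
            long seen = records + errors; // bad records count toward the total read
            double rate = (double) errors / seen;
            if (errors > minErrors && rate > threshold) {
                throw new IllegalStateException(String.format(
                        "Exceeded bad record threshold: %d of %d records failed", errors, seen), cause);
            }
        }
    }

    /** Hypothetical stand-in for a record reader whose reads can fail. */
    interface RecordSource {
        boolean hasNext();
        String next() throws IOException; // throws on a corrupt record, but still advances
    }

    /** Wraps each read in try-catch, skipping bad records until the tracker gives up. */
    static List<String> readAll(RecordSource source, ErrorTracker tracker) {
        List<String> out = new ArrayList<>();
        while (source.hasNext()) {
            try {
                out.add(source.next());
                tracker.incRecords();
            } catch (IOException e) {
                tracker.incErrors(e); // lets IllegalStateException propagate past the threshold
            }
        }
        return out;
    }

    /** Test helper: entries starting with "BAD" simulate corrupt records. */
    static RecordSource fromList(List<String> raw) {
        Iterator<String> it = raw.iterator();
        return new RecordSource() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public String next() throws IOException {
                String s = it.next();
                if (s.startsWith("BAD")) {
                    throw new IOException("corrupt record: " + s);
                }
                return s;
            }
        };
    }

    public static void main(String[] args) {
        List<String> raw = List.of("a", "BAD1", "b", "c", "BAD2");
        // 2 of 5 records are bad (40%), which a 0.5 threshold tolerates; prints [a, b, c]
        System.out.println(readAll(fromList(raw), new ErrorTracker(0.5, 0)));
        // Threshold 0.0 with minErrors = 1: the second failure aborts the read.
        try {
            readAll(fromList(raw), new ErrorTracker(0.0, 1));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Exceeded bad record threshold: 2 of 5 records failed
        }
    }
}
```

In this sketch the minimum guards against an early spike: if the very first record is bad, the error rate is 100%, so with minErrors = 0 any threshold below 1.0 would abort immediately.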
Testing done:
  ant clean commit-test
  ant clean compile-test jar-withouthadoop
  cd contrib/piggybank/java
  ant clean test -Dtestcase=TestAvroStorage

Diff revision 2 (Latest)


  1. conf/pig.properties
  2. contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
  3. contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
  4. contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
  5. contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
  6. contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile2.avro
  7. contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile3.avro
  8. contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile4.avro
  9. contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/bad.avro
  10. contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/good.avro
  11. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/InputErrorTracker.java
  12. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java
  13. src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java
  14. src/org/apache/pig/tools/pigstats/JobStats.java
  15. src/org/apache/pig/tools/pigstats/PigStats.java
  16. src/org/apache/pig/tools/pigstats/PigStatsUtil.java
  17. src/org/apache/pig/tools/pigstats/SimplePigStats.java
conf/pig.properties
Revision 001a75e New Change
(lines 1-155 unchanged, not shown)
@@ -156,5 +156,11 @@
 # first one whose supports() method returns true will be used.
 #
 #####################################################################
 
 #pig.load.default.statements=
+
+#pig.load.bad.split.threshold = <somevalue>: The threshold of tolerance for
+#bad input splits. A value of 1 skips all the bad splits, whereas a value of 0
+#allows no bad splits.
+#pig.load.bad.split.min = <somevalue>: The minimum number of errors that will
+#be tolerated regardless of threshold.
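Assuming conf/pig.properties is read with java.util.Properties, where lines starting with '#' are comments, the two keys in this hunk only take effect once uncommented. The hypothetical BadSplitConfig class below sketches how a loader might parse and validate them; the class name, the default values (0.0 and 0, i.e. no bad input tolerated), and the range checks are assumptions, not Pig code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper reading the two bad-split settings; defaults are assumptions. */
public class BadSplitConfig {
    static final String THRESHOLD_KEY = "pig.load.bad.split.threshold";
    static final String MIN_KEY = "pig.load.bad.split.min";

    final double threshold; // fraction of bad input tolerated, 0.0 to 1.0
    final long min;         // errors tolerated regardless of threshold

    private BadSplitConfig(double threshold, long min) {
        this.threshold = threshold;
        this.min = min;
    }

    /** Parses and validates both keys, falling back to 0.0 and 0 when absent. */
    static BadSplitConfig from(Properties props) {
        double threshold = Double.parseDouble(props.getProperty(THRESHOLD_KEY, "0.0").trim());
        long min = Long.parseLong(props.getProperty(MIN_KEY, "0").trim());
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException(THRESHOLD_KEY + " must be in [0, 1], got " + threshold);
        }
        if (min < 0) {
            throw new IllegalArgumentException(MIN_KEY + " must be >= 0, got " + min);
        }
        return new BadSplitConfig(threshold, min);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Same "key = value" syntax as the diff, with the leading '#' removed.
        props.load(new StringReader(
                "pig.load.bad.split.threshold = 0.1\n"
                + "pig.load.bad.split.min = 3\n"));
        BadSplitConfig cfg = from(props);
        System.out.println(cfg.threshold + " " + cfg.min); // prints 0.1 3
    }
}
```

Validating the range up front means a typo such as 10 instead of 0.1 fails at startup rather than silently tolerating every bad split.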
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
Revision 771c313 New Change
 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
Revision 0a84915 New Change
 
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
Revision 9c37fec New Change
 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
Revision 28a448f New Change
 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile2.avro
Revision e69de29 New Change
 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile3.avro
Revision e69de29 New Change
 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile4.avro
Revision e69de29 New Change
 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/bad.avro
Revision e69de29 New Change
 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/good.avro
Revision e69de29 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/InputErrorTracker.java
Revision e69de29 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java
Revision 6c77bad New Change
 
src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java
Revision 45135b6 New Change
 
src/org/apache/pig/tools/pigstats/JobStats.java
Revision bdc08a5 New Change
 
src/org/apache/pig/tools/pigstats/PigStats.java
Revision 0228997 New Change
 
src/org/apache/pig/tools/pigstats/PigStatsUtil.java
Revision 521a482 New Change
 
src/org/apache/pig/tools/pigstats/SimplePigStats.java
Revision e4cd1c0 New Change
 