Review Board 1.7.22


float and double calculation is inaccurate in Hive

Review Request #8653 - Created Dec. 18, 2012 and updated

Johnny Zhang
trunk
HIVE-3715
Reviewers
hive
hive
I found this during debug the e2e test failures. I found Hive miss calculate the float and double value. Take float calculation as an example:
hive> select f from all100k limit 1;
48308.98
hive> select f/10 from all100k limit 1;
4830.898046875 <--added 04875 in the end
hive> select f*1.01 from all100k limit 1;
48792.0702734375 <--should be 48792.0698
It might be essentially the same problem as http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm But since e2e test compare the results with mysql and seems mysql does it right, so it is worthy fixing it in Hive.
I did test to compare the result with mysql default float precision setting, the result is identical.

query:          select f, f*1.01, f/10 from all100k limit 1;
mysql result:   48309       48792.0702734375    4830.898046875
hive result:    48308.98    48792.0702734375	4830.898046875


I apply this patch and run the hive e2e test, and the tests all pass (without this patch, 5 related failures)
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
Revision 1423224 New Change
[20] 19 lines
[+20]
20

    
   
20

   
21
import org.apache.hadoop.hive.ql.exec.Description;
21
import org.apache.hadoop.hive.ql.exec.Description;
22
import org.apache.hadoop.hive.ql.exec.UDF;
22
import org.apache.hadoop.hive.ql.exec.UDF;
23
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
23
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
24

    
   
24

   

    
   
25
import java.math.BigDecimal;

    
   
26
import java.math.RoundingMode;
25
/**
27
/**
26
 * UDFOPDivide.
28
 * UDFOPDivide.
27
 *
29
 *
28
 */
30
 */
29
@Description(name = "/", value = "a _FUNC_ b - Divide a by b", extended = "Example:\n"
31
@Description(name = "/", value = "a _FUNC_ b - Divide a by b", extended = "Example:\n"
[+20] [20] 11 lines
[+20] [+] public DoubleWritable evaluate(DoubleWritable a, DoubleWritable b) {
41
    // + b);
43
    // + b);
42
    if ((a == null) || (b == null)) {
44
    if ((a == null) || (b == null)) {
43
      return null;
45
      return null;
44
    }
46
    }
45

    
   
47

   
46
    doubleWritable.set(a.get() / b.get());
48
    BigDecimal aBig = new BigDecimal(String.valueOf(a.get()));

    
   
49
    BigDecimal bBig = new BigDecimal(String.valueOf(b.get()));

    
   
50
    aBig = aBig.divide(bBig, 10, RoundingMode.HALF_DOWN);

    
   
51

   

    
   
52
    doubleWritable.set(aBig.doubleValue());
47
    return doubleWritable;
53
    return doubleWritable;
48
  }
54
  }
49
}
55
}
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java
Revision 1423224 New Change
 
  1. http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java: Loading...
  2. http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java: Loading...