Posts Tagged ‘hadoop’

select a/sum(a) from foo; -- WRONG!

will not work because sum(a) would work on each row and will return the number present in that row resulting in all 1. since sum(n) = n where n is a single number.

here is where we can make good use of  hive windowing and analytics functions.

the solution is to use the following:

select a/sum(a) over () from foo; -- RIGHT !!!

here we are instructing that the sum be performed over the entire column ‘a’.

you could of course make 2 queries and calculate the sum of a first and then hardcode the sum in another query but that means you are performing 2 passes over the table for sure. now i can’t be sure how many passes over the table are being made in the right query above. if you know the answer please post in the comments below.

Read Full Post »

If one launches Hive using the “-S” or “–silent” option then Hive does not print progress information. However if you are already inside the Hive command line shell then you can control this behaviour by setting the value of silent.

set silent=on; -- make hive silent

set silent=false; -- make hive print progress information


Read Full Post »