select a/sum(a) from foo; -- WRONG!
will not work because sum(a) would work on each row and will return the number present in that row resulting in all 1. since sum(n) = n where n is a single number.
here is where we can make good use of hive windowing and analytics functions.
the solution is to use the following:
select a/sum(a) over () from foo; -- RIGHT !!!
here we are instructing that the sum be performed over the entire column ‘a’.
you could of course make 2 queries and calculate the sum of a first and then hardcode the sum in another query but that means you are performing 2 passes over the table for sure. now i can’t be sure how many passes over the table are being made in the right query above. if you know the answer please post in the comments below.