+1 vote
in DevOps by
If you run select * query in Hive, why it’s not run Mpareduce?

1 Answer

0 votes
by

It’s an optimization technique. hive.fetch.task.conversion property can (FETCH task) minimize latency of mapreduce overhead. When queried SELECT, FILTER, LIMIT queries, this property skip mapreduce and using FETCH task. As a result Hive can execute query without run mapreduce task.

By default it’s value “minimal”. Which optimize: SELECT STAR, FILTER on partition columns, LIMIT queries only, where as another value is “more” which optimize : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns).

...