Alexey Zinoviev - Hadoop Jungle: The world of wild algorithms and poisonous (Ru)

If you think that working with a Hadoop cluster comes down to writing MapReduce jobs and Hive queries, you are mistaken. Incomplete utilization of JVM resources leads to degraded performance. As a result, you burn a wad of money every time you read or shuffle your data. Let’s put on pith helmets, get vaccinated against arrogance, and go chop through the vines of suboptimal solutions. We’ll head into the jungle and tame the wild elephant, Hadoop. Then we’ll give an overview of different JOIN implementations and learn to control how data is routed to different partitions. This talk is the best choice for anyone who already knows the hassles of Big Data and has some experience with Hadoop, but wants to learn more about its features.
