How can you debug your Hadoop code?

Question

How can you debug your Hadoop code?

1 Answer

john ganales · Answer 1 · 2022-12-20T15:44:26+0000

Hadoop provides a web interface for inspecting jobs, and you can also use counters to debug Hadoop code. There are some other simple ways to debug Hadoop code:

The simplest way is to use the System.out.println() or System.err.println() calls available in Java. To view the stdout logs, the easiest way is to go to the JobTracker page, click on the completed job, and then on the map or reduce task. The task number comes in handy here: click on the task ID, then on the task logs, and finally on the stdout logs.
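The counter and System.err.println() ideas above can be tried out locally before touching a cluster. The sketch below is plain Java with no Hadoop dependency: the class DebugCounters, the enum RecordQuality, and the tab-separated "key, integer" record format are all made up for illustration. In a real mapper or reducer, the equivalent of increment() would be context.getCounter(RecordQuality.MALFORMED).increment(1), and the stderr line would end up in that task's stderr log.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the body of a map task; not part of Hadoop.
class DebugCounters {
    // Illustrative counter names. In a Hadoop task an enum like this
    // can be passed to context.getCounter(...).
    enum RecordQuality { GOOD, MALFORMED }

    private final Map<RecordQuality, Long> counts = new EnumMap<>(RecordQuality.class);

    // Tally one event, mirroring what Counter.increment(1) does in a task.
    private void increment(RecordQuality key) {
        counts.merge(key, 1L, Long::sum);
    }

    long count(RecordQuality key) {
        return counts.getOrDefault(key, 0L);
    }

    // Assumes records look like "key<TAB>integer". Returns true if the
    // record was accepted, false if it was counted as malformed.
    boolean process(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            // Debug output goes to stderr so it stays out of the job's real output.
            System.err.println("Skipping record with wrong field count: " + line);
            increment(RecordQuality.MALFORMED);
            return false;
        }
        try {
            Integer.parseInt(fields[1].trim());
        } catch (NumberFormatException e) {
            System.err.println("Skipping record with non-numeric value: " + line);
            increment(RecordQuality.MALFORMED);
            return false;
        }
        increment(RecordQuality.GOOD);
        return true;
    }

    public static void main(String[] args) {
        DebugCounters dc = new DebugCounters();
        for (String line : new String[] {"a\t1", "broken", "b\tx"}) {
            dc.process(line);
        }
        System.out.println("GOOD=" + dc.count(RecordQuality.GOOD)
                + " MALFORMED=" + dc.count(RecordQuality.MALFORMED));
    }
}
```

Counters scale better than print statements here: the totals are aggregated per job, so you can see how many bad records there were without reading every task log.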

But if your code produces a huge number of logs, there are other ways to debug the code:

i) To check the details of failed tasks, you can set the property keep.failed.task.files to true in the job configuration. Once you do that, you can go to the failed task's directory and rerun that particular task in isolation, in a single JVM.

ii) The other option is to run the same job on a small cluster with the same input. This keeps all the logs in one place, but you need to make sure that the logging level is set to INFO.
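For option i), the property can be set from the driver code instead of a config file. This is only a minimal fragment, and it assumes the Hadoop client libraries are on the classpath. The class and method names are made up for illustration; keep.failed.task.files is the property name used in the answer above, and newer Hadoop releases may expect a different name for the same setting.

```java
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper; only the property name comes from the answer above.
class KeepFailedTaskFilesExample {
    // Builds a configuration that asks Hadoop to keep the files of failed
    // tasks so they can be inspected or rerun afterwards.
    static Configuration debugConfiguration() {
        Configuration conf = new Configuration();
        conf.setBoolean("keep.failed.task.files", true);
        return conf;
    }
}
```

The returned Configuration would then be passed to the job as usual when the job is created.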
