MapReduce Char Count Example

In the MapReduce char count example, we find the frequency of each character in a text file. The Mapper emits a key-value pair for every character it reads (the character as the key and 1 as the value), and the Reducer aggregates the values that share a common key. Everything is therefore represented as key-value pairs.
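The division of work between the two phases can be tried out in plain Java before any Hadoop code is written. The following is only a minimal in-memory sketch: the class name CharCountSketch and its two methods are hypothetical, and nothing here uses the Hadoop API or runs in a distributed way.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone class; it only imitates the map and reduce phases
// in memory and does not use any Hadoop API.
class CharCountSketch {

    // "Map" phase: emit one (character, 1) pair per character of the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (char c : line.toCharArray()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(String.valueOf(c), 1));
        }
        return pairs;
    }

    // "Reduce" phase: group the pairs by key and sum the values of each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(map("hadoop"))); // prints {a=1, d=1, h=1, o=2, p=1}
    }
}
```

In a real job, Hadoop performs the grouping by key between the two phases; the sketch folds that step into reduce for brevity.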

Prerequisites

Java Installation - Check whether Java is installed using the following command.

java -version

Hadoop Installation - Check whether Hadoop is installed using the following command.

hadoop version

If either of them is not installed on your system, follow the link below to install it.

www.javatpoint.com/hadoop-installation

Steps to execute MapReduce char count example

Create a text file on your local machine and write some text into it.

$ nano info.txt

Check the text written in the info.txt file.

$ cat info.txt

In this example, we find the frequency of each character that occurs in this text file.

Create a directory in HDFS in which to keep the text file.

$ hdfs dfs -mkdir /count

Upload the info.txt file to that directory in HDFS.

$ hdfs dfs -put /home/codegyani/info.txt /count

Write the MapReduce program using Eclipse.

File: WC_Mapper.java

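package com.javatpoint;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits a (character, 1) pair for every character of each input line.
public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    // The count emitted for each character is always 1, so one instance is reused.
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        String line = value.toString();
        // Splitting on the empty string yields one string per character (spaces included).
        String[] characters = line.split("");
        for (String singleChar : characters) {
            // Skip the empty string that split("") returns for an empty line
            // (and, before Java 8, at the start of every line).
            if (singleChar.isEmpty()) {
                continue;
            }
            output.collect(new Text(singleChar), ONE);
        }
    }
}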
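The mapper's tokenization step can be checked on its own, which helps when deciding whether spaces and empty lines should be counted. The sketch below assumes Java 8 or later; the class name SplitSketch and the method chars are hypothetical and exist only for this illustration.

```java
// Hypothetical stand-alone class for experimenting with the tokenization;
// no Hadoop API is used.
class SplitSketch {

    // Same call the mapper makes: split the line on the empty string.
    static String[] chars(String line) {
        return line.split("");
    }

    public static void main(String[] args) {
        // On Java 8 and later this yields three elements: "a", " " and "b",
        // so the space is counted as a character like any other.
        System.out.println(chars("a b").length);

        // An empty line does not yield an empty array; it yields an array
        // holding one empty string, which would become an empty key.
        System.out.println(chars("").length);
    }
}
```

Whether whitespace should be counted depends on the task; filtering it out would be a small extra check inside the mapper's loop.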

File: WC_Reducer.java

package com.javatpoint;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the counts collected for each character.
public class WC_Reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        int sum = 0;
        // Add up every count emitted for this character.
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

File: WC_Runner.java

package com.javatpoint;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Configures and submits the character-count job.
public class WC_Runner {

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("CharCount");
        // Types of the keys and values written by the job.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WC_Mapper.class);
        // The reducer also serves as the combiner, pre-summing counts on the map side.
        conf.setCombinerClass(WC_Reducer.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // args[0] is the HDFS input path; args[1] is the output directory, which must not exist yet.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
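The runner registers WC_Reducer as both the combiner and the reducer. That is safe here because addition gives the same total whether the counts are summed all at once or pre-summed per map task and then summed again. The sketch below illustrates this with made-up counts; the class name CombinerSketch and the sample values are hypothetical, and no Hadoop API is involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone class; it imitates what a combiner does
// and uses no Hadoop API.
class CombinerSketch {

    // Same logic as the reducer: add up all counts seen for one key.
    static int sum(List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up counts for one key, as emitted by two separate map tasks.
        List<Integer> fromMapTask1 = Arrays.asList(1, 1, 1);
        List<Integer> fromMapTask2 = Arrays.asList(1, 1);

        // Without a combiner, the reducer receives all five values.
        int direct = sum(Arrays.asList(1, 1, 1, 1, 1));

        // With a combiner, each map task pre-sums locally and the reducer
        // sums the two partial results.
        int combined = sum(Arrays.asList(sum(fromMapTask1), sum(fromMapTask2)));

        System.out.println(direct + " " + combined); // prints "5 5"
    }
}
```

An operation that is not associative and commutative, such as computing an average directly, could not reuse its reducer as a combiner in this way.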

Create a JAR file from this program and name it charcountdemo.jar.

Run the JAR file.

hadoop jar /home/codegyani/charcountdemo.jar com.javatpoint.WC_Runner /count/info.txt /char_output

The output is stored in /char_output/part-00000

Now execute the command to see the output.

hdfs dfs -cat /char_output/part-00000
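Because the job uses TextOutputFormat, each line of the part-00000 file holds a key and its count separated by a tab. If the result needs further processing, it can be read back along the following lines. This is only a sketch: the class name OutputParseSketch is hypothetical, and the sample text in main is made up for illustration rather than taken from the actual job output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading "key<TAB>count" lines; no Hadoop API is used.
class OutputParseSketch {

    // Parses lines of the form character, tab, count into a map.
    static Map<String, Integer> parse(String contents) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : contents.split("\n")) {
            // Use the last tab so that a key that is itself a tab or a space
            // still parses correctly.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // not a key-value line
            }
            String key = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(key, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample in the same layout as the job output.
        String sample = " \t3\nM\t1\na\t2\n";
        System.out.println(parse(sample).get("a")); // prints 2
    }
}
```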

...