
Spring with Hadoop

11/3/2013

Let me explain how to integrate Spring with Hadoop.

This POC shows how to run a word count against a flat file in the Hadoop file system using Spring for Apache Hadoop. (Spring for Apache Hadoop provides extensions to Spring, Spring Batch, and Spring Integration to build manageable and robust pipeline solutions around Hadoop.)

Spring for Apache Hadoop integrates with the Spring Framework to create and run Hadoop MapReduce, Hive, and Pig jobs, as well as to work with HDFS and HBase.

1.    Software Requirements

JDK 6.0 or above
Spring Framework 3.0 or above
Apache Hadoop 1.2.1 or above
Cloudera CDH3 (cdh3u5) and CDH4 (cdh4.1.3 MRv1) distributions
Hortonworks Data Platform 1.3
Greenplum HD (1.2)
Any distro compatible with Apache Hadoop 1.x should be supported.

Note:
Hadoop YARN support is only available in Spring for Apache Hadoop version 2.0 and later.
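As an alternative to downloading the jars by hand (steps 1-4 in the next section), the Spring for Apache Hadoop artifact can also be pulled in with Maven. A sketch of the dependency; the version shown is only an example, so pick the release that matches your Hadoop distribution:

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <!-- example version; align with your Hadoop distribution -->
    <version>1.0.0.RELEASE</version>
</dependency>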

2.    Spring Hadoop

This section explains how to run a Hadoop MapReduce job using Spring IoC with CDH3, from Eclipse on Windows.

The Spring Hadoop namespace is
http://www.springframework.org/schema/hadoop

and its schema is
http://www.springframework.org/schema/hadoop/spring-hadoop.xsd

Steps:
1.    Download the spring-hadoop jar.
2.    Create a Java project in Eclipse.
3.    Create a lib folder and place the required jars (the spring-hadoop jar plus the Spring Framework and Hadoop client jars) inside it.
4.    Add all these jars to the Build Path.
5.    Create a hadoop.properties file in the src folder:

hd.fs=value of fs.default.name
hd.jt=value of mapred.job.tracker
wordcount.input.path=inputpath
wordcount.output.path=outputpath

for example:
hd.fs=hdfs://javapandit1:9030
hd.jt=javapandit1:9010
wordcount.input.path=/user/hadoop/test/input/
wordcount.output.path=/user/hadoop/test/output
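Before wiring the job, it can help to confirm that the hd.fs value is reachable from your machine. Below is a minimal sketch using the plain Hadoop 1.x client API; the host, port, and paths are the example values above:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same value as hd.fs in hadoop.properties.
        conf.set("fs.default.name", "hdfs://javapandit1:9030");
        FileSystem fs = FileSystem.get(URI.create("hdfs://javapandit1:9030"), conf);
        // The word count job fails if the input path does not exist.
        System.out.println("Input path exists: "
                + fs.exists(new Path("/user/hadoop/test/input/")));
    }
}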

6.    Place the applicationContext.xml inside the src package:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
       http://www.springframework.org/schema/context
       http://www.springframework.org/schema/context/spring-context-3.0.xsd
       http://www.springframework.org/schema/hadoop
       http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <context:property-placeholder location="hadoop.properties"/>

    <hdp:configuration>
        fs.default.name=${hd.fs}
        mapred.job.tracker=${hd.jt}
    </hdp:configuration>

    <hdp:job id="wordcountJob"
        input-path="${wordcount.input.path}"
        output-path="${wordcount.output.path}"
        libs="file:/home/hadoop/hadoop-0.20.2-cdh3u0/hadoop-examples-*.jar"
        jar-by-class="org.apache.hadoop.examples.WordCount"
        mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
        reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>

    <hdp:job-runner id="runner" run-at-startup="true"
        job-ref="wordcountJob"/>
</beans>

Note:

1.    Make sure there is no trailing space at the end of the fs.default.name and mapred.job.tracker values; otherwise you will get a URI exception:

Caused by: java.net.URISyntaxException: Illegal character in authority at index 7: hdfs://javapandit1:9030

2.    When running from Eclipse on a Windows system, we need to set setJarByClass.

In applicationContext.xml this means setting the jar-by-class attribute value; otherwise we will get the following error:

WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.examples.WordCount$TokenizerMapper
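The jar-by-class attribute corresponds to what a hand-written Hadoop driver does with setJarByClass. A minimal sketch, with an illustrative class and job name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JarByClassExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        // Hadoop locates the jar containing this class and ships it to the
        // cluster; jar-by-class configures the same thing declaratively.
        job.setJarByClass(JarByClassExample.class);
    }
}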

3.    Be careful while setting the value of jar-by-class: it expects the class name with its package; otherwise you will get the following error.

Example:
jar-by-class="WordCount"

Caused by: org.springframework.beans.TypeMismatchException: Failed to convert property value of type 'java.lang.String' to required type 'java.lang.Class' for property 'jarByClass'; nested exception is java.lang.IllegalArgumentException: Cannot find class [WordCount]

Caused by: java.lang.ClassNotFoundException: WordCount
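For reference, the TokenizerMapper and IntSumReducer wired into the job definition above are the stock Hadoop word count example classes shipped in the examples jar. A simplified sketch of their logic (not the exact Hadoop source) looks like this:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (word, 1) for every token in each input line.
class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Sums the per-word counts produced by the mapper.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}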

7.    Create a class to run the wordcount job:

package com.sample;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class WordCount {

    private static final Log log = LogFactory.getLog(WordCount.class);

    public static void main(String[] args) throws Exception {
        // Loading the context triggers the job, because the job-runner
        // bean is declared with run-at-startup="true".
        AbstractApplicationContext context = new ClassPathXmlApplicationContext(
                "applicationContext.xml", WordCount.class);

        log.info("Wordcount with HDFS copy Application Running");

        // Close the context (and release Hadoop resources) cleanly on exit.
        context.registerShutdownHook();
    }
}


8.    Run the WordCount class as a Java application.

9.    If it runs successfully, you will get the output file in the specified output path.
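To inspect the result, you can cat the output from HDFS. Assuming the job used the standard examples jar, the reducer output lands in a part-r-00000 file under the output path:

hadoop fs -cat /user/hadoop/test/output/part-r-00000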

Project structure is as follows:
[Screenshot: Eclipse project structure]