NetApp-Hadoop-NFS-Connector

This site is obsolete. Please use NetApp FAS NFSConnector product from now.netapp.com

Overview

The Hadoop NFS Connector allows Apache Hadoop (2.2+) and Apache Spark (1.2+) to use a NFSv3 storage server as a storage endpoint. The NFS Connector supports two modes: (1) secondary filesystem - where Hadoop/Spark runs using HDFS as its primary storage and can use NFS as a second storage endpoint, and (2) primary filesystem - where Hadoop/Spark runs entirely on a NFSv3 storage server.

The code is written in a way such that existing applications do not have to change. All one has to do is to copy the connector jar into the lib/ directory of Hadoop/Spark. Then, modify core-site.xml to provide the necessary details.

NOTE: The code is in beta. We would love for you to try it out and give us feedback.

This is the first release and it does the following:

Connects to a NFSv3 storage server supporting AUTH_NONE or AUTH_SYS authentication method.
Works with Apache Hadoop (vanilla) 2.2 or newer, Hortonworks HDP 2.2 or newer
Supports all operations defined by the Hadoop FileSystem interface.
Pipelines the READ/WRITE requests to utilize the underlying network (works fine with 1GbE and 10GbE networks)

We are planning to add these in the near future:

Ability to connect to multiple NFS endpoints (multiple IP addresses). This allows for even more bandwidth.
Integrate with Hadoop user authentication

How to use

Once the NFS connector is configured, you can easily invoke it from the command-line using the Hadoop shell.

  console> bin/hadoop fs -ls nfs://<nfs-server-hostname>:2049/ (if using as secondary filesystem)
  console> bin/hadoop fs -ls / (if using as default/primary filesystem)

When new jobs are submitted, you can simply provide it as an input or output path or both:

  (assuming NFS is used as a secondary filesystem)
  console> bin/hadoop jar <path-to-examples> jar terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
  console> bin/hadoop jar <path-to-examples> jar terasort /tera/in nfs://<nfs-server-hostname>:2049/tera/out
  console> bin/hadoop jar <path-to-examples> jar terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out

Configuration

Compile the project ``` console> mvn clean package ```
Copy the jar file to the shared common library directory based on your Hadoop installation. For example, for hadoop-2.4.1: ``` console> cp target/hadoop-connector-nfsv3-1.0.jar $HADOOP_HOME/share/hadoop/common/lib/ ```
Add parameters of NFSv3 connector into core-site.xml located in Hadoop configuration directory (e.g., for hadoop-2.4.1: $HADOOP_HOME/conf) ``` fs.defaultFS nfs://:2049 fs.nfs.configuration /nfs-mapping.json fs.nfs.impl org.apache.hadoop.fs.nfs.NFSv3FileSystem fs.AbstractFileSystem.nfs.impl org.apache.hadoop.fs.nfs.NFSv3AbstractFilesystem ```
Start Hadoop. NFS can now be used inside Hadoop.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
src		src
CONFIG.md		CONFIG.md
DEPENDS		DEPENDS
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NetApp-Hadoop-NFS-Connector

Overview

How to use

Configuration

About

Releases 2

Packages

Contributors 5

Languages

NetApp/NetApp-Hadoop-NFS-Connector

Folders and files

Latest commit

History

Repository files navigation

NetApp-Hadoop-NFS-Connector

Overview

How to use

Configuration

About

Topics

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 5

Languages

Packages