What is the foundational Java API used to write to HDFS?

HDFS design PDF at Apache.org

HDFS Java API Reference

Search for: HDFS Java API Reference

FileSystem Java API

Hadoop docs at Apache.org

HDFS can be used like any other file system: files are read and written as byte streams.
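A minimal sketch of reading and writing a file as a byte stream through the Hadoop `FileSystem` API. It assumes `fs.defaultFS` in the loaded configuration points at a reachable cluster, and the path `/tmp/example.txt` is a hypothetical example:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsByteStreamExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Resolves to HDFS when fs.defaultFS is set (e.g. hdfs://namenode:8020)
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/tmp/example.txt"); // hypothetical path

        // Write: an HDFS file is an append-only stream of bytes
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same bytes back
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```

The same code runs against the local file system when no cluster is configured, which is handy for trying it out.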

However, HDFS also supports a number of file storage formats designed for MapReduce, where files can be split and processed in parallel. Examples include map files, sequence files, and Avro files.

How do client tools such as hadoop fs, Sqoop, and Hive import use the underlying Java API to write files, divided correctly into blocks, so they can be processed in parallel?

How do you write sequence files to HDFS using Java API?

Search for: How do you write sequence files to HDFS using Java API?

Here is an article on how to write sequence files to HDFS using the Java API:


// Core classes for writing sequence files directly from Java
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

// Classes for producing sequence files as the output of a MapReduce job
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
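Building on the first group of imports, here is a hedged sketch of writing a sequence file directly with `SequenceFile.Writer`. The output path `/tmp/example.seq` and the key/value types (`IntWritable`/`Text`) are example choices, not requirements:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq"); // hypothetical output path

        // createWriter takes Writer.Option arguments for the file, key, and value classes
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            for (int i = 0; i < 100; i++) {
                key.set(i);
                value.set("record-" + i);
                // Each append writes one key/value record; sync markers in the
                // file are what make it splittable for parallel processing
                writer.append(key, value);
            }
        }
    }
}
```

To produce the same format from a MapReduce job instead, you would use the second group of imports and set `job.setOutputFormatClass(SequenceFileOutputFormat.class)` when configuring the job.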