What is the foundational Java API used to write to HDFS?

satya - 11/24/2018, 5:21:18 PM

HDFS design PDF at Apache.org

satya - 11/24/2018, 5:21:51 PM

HDFS Java API Reference

satya - 11/24/2018, 5:23:45 PM

FileSystem Java API

satya - 11/24/2018, 5:27:45 PM

Hadoop docs at Apache.org

satya - 11/24/2018, 5:32:14 PM

Why am I asking this question?

HDFS can be seen as any other file system where files can be read or written as byte streams.

However, HDFS supports a number of file storage formats designed to facilitate MapReduce, where files can be split and processed in parallel. Some examples are map files, sequence files, and Avro files.

How do client tools like fs, Sqoop, and Hive import use the underlying Java API to write files correctly divided into blocks that can be used for parallel processing?
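Before getting into the split-friendly formats, the foundational write path is the org.apache.hadoop.fs.FileSystem API itself. A minimal sketch, assuming a Configuration that picks up the cluster settings from core-site.xml (the output path here is hypothetical):

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath;
        // with no cluster configured this falls back to the local file system.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/tmp/hello.txt"); // hypothetical output path
        // create(path, true) overwrites if the file already exists
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The NameNode decides block placement when the stream is written; the client just sees a byte stream, which is why the format-aware writers below matter for splittable files.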

satya - 11/24/2018, 5:32:38 PM

How do you write sequence files to HDFS using Java API?

satya - 11/24/2018, 5:43:34 PM

Here is an article on how to write sequence files to HDFS using the Java API

satya - 11/24/2018, 5:44:48 PM

Summary of the article


// Core classes for writing a sequence file directly from Java
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

// Classes for producing sequence files as the output of a MapReduce job
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
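Putting the first group of classes to work, a minimal sketch of writing a sequence file directly with SequenceFile.Writer (the output path, key/value types, and record contents here are illustrative assumptions, not taken from the article):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("numbers.seq"); // hypothetical output path

        // createWriter takes Writer.Option varargs naming the file
        // and the key/value classes stored in the file header.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            for (int i = 0; i < 100; i++) {
                key.set(i);
                value.set("record-" + i);
                writer.append(key, value); // one key/value record per append
            }
        }
    }
}
```

Because the writer emits sync markers between record batches, a MapReduce job can later split the resulting file at those markers and process the parts in parallel, which is the property the question above is after.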