What is the foundational Java API used to write to HDFS?
satya - 11/24/2018, 5:21:51 PM
HDFS Java API Reference
satya - 11/24/2018, 5:32:14 PM
Why am I asking this question?
HDFS can be seen as any other file system where files can be read or written as byte streams.
However, HDFS supports a number of file storage types to facilitate MapReduce, where files can be split and processed in parallel. Some examples are map files, sequence files, and Avro files.
How do client tools like fs, Sqoop, and Hive import use the underlying Java API to write files that are correctly divided into blocks usable for parallel processing?
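To answer the title question: the foundational write API is org.apache.hadoop.fs.FileSystem. It hands the client a plain byte stream, and HDFS takes care of splitting that stream into blocks. Here is a minimal sketch; the namenode URI and the output path are hypothetical placeholders.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode URI; in practice this usually comes from core-site.xml
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // create() returns an FSDataOutputStream: a plain byte stream.
        // HDFS splits the stream into blocks behind the scenes.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"))) {
            out.writeUTF("hello hdfs");
        }
    }
}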
satya - 11/24/2018, 5:32:38 PM
How do you write sequence files to HDFS using Java API?
satya - 11/24/2018, 5:43:34 PM
Here is an article on how to write sequence files to HDFS using the Java API
satya - 11/24/2018, 5:44:48 PM
Summary of the article
// Using the plain Java classes: write records directly with SequenceFile.Writer
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

// Using MapReduce: have a job emit its output as a SequenceFile
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
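Putting the plain Java classes to work: a minimal sketch, assuming a Hadoop 2.x or later classpath, that writes a few key/value records to a hypothetical path. SequenceFile.Writer inserts sync markers periodically, which is what lets MapReduce split the file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/data.seq"); // hypothetical output path
        // createWriter takes Writer.Option varargs for the file, key, and value types
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            for (int i = 0; i < 5; i++) {
                // append() writes one key/value record; periodic sync markers
                // make the resulting file splittable for parallel processing
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        }
    }
}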
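And the MapReduce route suggested by the second group of imports: a sketch of a map-only driver that reads text input and writes its output as a SequenceFile. The class name and the input/output paths are hypothetical.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TextToSequenceFileJob {
    // Pass-through mapper: emits each input line unchanged
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "text-to-seq");
        job.setJarByClass(TextToSequenceFileJob.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output goes straight to the SequenceFile
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/tmp/in"));    // hypothetical input
        FileOutputFormat.setOutputPath(job, new Path("/tmp/out")); // hypothetical output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}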