Bigdata

0) Apache Flume

14-Jul-19

1) Apache Pig, Grunt

15-Jul-19

2) Avro - Schemas

16-Aug-17

3) Cassandra

25-Jul-17

4) Cassandra

7-Jan-19

5) Cloud vendor space

26-Oct-19

Cloud, Salesforce, Amazon

7) CouchDB

6-Jan-19

8) Firebase

6-Jan-19

First MapReduce program

10) Google Colab

27-Dec-21

12) Hive

15-Jul-19

How do you store a big CSV file in HDFS?

How are HDFS and HBase different?

How is HDFS fault tolerant?

How to upgrade an Azure VM? I have a VM in Azure. While looking at the bill, I realized it is not a reserved instance (RI). With a reserved instance, prices can drop to between a half and a third. So I wanted to convert this VM to a 3-year reserved instance. By Azure's cost calculator, that comes to about 300 dollars for 3 years.

However, when I tried to upgrade the VM (Azure calls it a resize), I realized this is an older VM that I cannot reserve for 3 years. So this article goes into how to find a suitable VM that can be reserved for 3 years and how to migrate my OS disk and data to the new VM.

In addition, the VHD (virtual hard disk) I had appeared to be an unmanaged disk, which I needed to convert to a managed disk. This article covers that as well, with some useful reads from the Microsoft Azure site.

17) Key Bigdata references

23-Nov-18

18) MapReduce

21-Mar-19

19) Nature of ETL and Spark

25-Oct-23

20) Parquet format

6-May-19

21) PuTTY

25-Mar-19

22) Sqoop

14-Jul-19

What is a Hadoop Cluster?

What is a Hadoop edge server or node?

What is clustered resource management and YARN?

What is S3? How is it different from HDFS?

What is the foundational Java API used to write to HDFS?