Bigdata

A Shortcut to Big Data - Lesson Plan 1, Azure and Spark: Smart Programmer Series

A Shortcut to Big Data - Lesson Plan 2, Installing Spark stand alone on windows 10: Smart Programmer Series

2) Apache Flume

14-Jul-19

Apache Flume

3) Apache Pig, Grunt

15-Jul-19

Apache Pig, Grunt

Arima Models and Python

5) AVRO - Schemas

16-Aug-17

AVRO - Schemas

6) AWS EC2

25-Mar-19

AWS EC2

7) AWS General

6-May-19

AWS General

8) Azure

19-Feb-19

Azure

9) Azure basics

6-Aug-19

Azure basics

10) Azure functions

13-Nov-19

Azure functions

11) Azure storage

25-Oct-19

Azure storage

12) Cassandra

25-Jul-17

Cassandra

13) Cassandra

7-Jan-19

Cassandra

14) Cloud vendor space

26-Oct-19

Cloud vendor space

Cloud, Salesforce, Amazon

16) Couchdb

6-Jan-19

Couchdb

17) Creating an EMR on aws

29-Apr-19

Creating an EMR on aws

18) Data factory basics

7-Nov-19

Data factory basics

Data factory outstanding questions

20) Essential Python

18-Oct-19

Essential Python

21) Firebase

6-Jan-19

Firebase

22) First mapreduce program

21-Mar-19

First mapreduce program

23) Generators in Python

1-Oct-19

Generators in Python

25) Hive

15-Jul-19

Hive

How do you store a big CSV file in HDFS?

How are HDFS and HBase different?

How is HDFS fault tolerant?

How to upgrade an Azure VM? I have a VM in Azure. While looking at the bill, I realized it is not a reserved instance (RI). With a reserved instance, prices can drop to between a half and a third. So I wanted to convert this VM to a 3-year reserved instance. According to Azure's cost calculator, that comes to about 300 dollars for 3 years.

However, when I tried to upgrade the VM (Azure calls it a resize), I realized that this is an older VM and that I cannot reserve it for 3 years. So this article goes into how to find a suitable VM that can be reserved for 3 years and how to migrate my OS disk and data to the new VM.

In addition, the VHD (hard disk) I had appeared to be an unmanaged disk. I needed to convert it to a managed disk as well. This article covers that too, with some useful reads from the Microsoft Azure site.

30) Key Bigdata references

23-Nov-18

Key Bigdata references

31) Manage azure budgets

7-Nov-19

Manage azure budgets

32) Mapreduce

21-Mar-19

Mapreduce

On Azure subscriptions

34) Parquet format

6-May-19

Parquet format

35) putty

25-Mar-19

putty

36) PySpark APIs

11-Sep-19

PySpark APIs

PySpark: PySpark on Windows 10: Installation Journal

Python Pending Research/Questions

Quick introduction to Spark

Resolving Imports and modules in Python

41) Spark on Azure

17-Aug-19

Spark on Azure

42) Sqoop

14-Jul-19

Sqoop

43) VSCode and python

16-Sep-19

VSCode and python

What is a Hadoop Cluster?

What is a Hadoop edge server or node?

46) What is AWS Athena?

13-Jul-19

What is AWS Athena?

47) What is AWS CLI?

29-Apr-19

What is AWS CLI?

What is clustered resource management and YARN?

What is equivalent to S3 in Azure?

What is HDInsight architecture in Azure?

What is S3? How is it different from HDFS?

What is the foundational Java API used to write to HDFS?

Where is the schema catalogue in AWS?

Working with Azure Cloud Shell

Working with type stubs in Python