Tuesday, July 7, 2020

Hadoop and Cloud Analytics

Cloud Analytics (Hadoop + Cloud) Course 5th Session

1.Cloud Analytics -future of Hadoop and How cloud is affecting the hadoop
space

2. Hadoop 1 Architecture

3. Hadoop components

Components  -MasterNodes  -WorkerNode
HDFS        -NameNode,    SNN-DN
Processing   Job Tracker  Task Tracker

4.Hadoop 1 Limitation
   1.SPOF of Name Node
   2.High Intensive Job Tracker

Hadoop 2
   SPOF overcome by High Availability/HDFS Federation
   High Intensive Job Tracker overcome by YARN-Yet Naother Resource Negotiator


Job Tracker-->Resource Manager--->Application Manager
                                  Schedular

Task Tracker---> Node Manager---->Container
                                  Application Master

Hadoop1 suppose if you are processing -then it is called as Mapreduce1
Hadoop2(YARN) if you are processing-then it is called as Mapreduce2
Hadoop2-HA+Yarn
Hadoop1-SPOF Name Node
Hadoop2-HA+Checkpoint server(SNN)+Journal Node Control by Zookeeper

Hadoop1- You can run at a time only job
Hadoop2-You can run at a time multiple jobs.
----------------------------------------------------------------------
The FIFO Scheduler -First in First Out-->Default in Hadoop1-Job1--->Job2
CapacityScheduler, -Configures capacity in XML file-Job1-1 hr Job2-30 minutes-75%-job1 25% job2, both are running at the same time,
and FairScheduler-
-------------------------------------
OpenSource vs Distribution
Hadoop opensource-Enterprise  they are not using as it is.
Hadoop Distribution-Cloudera+Hortonworks ,MapR

local filesystem Commands:
ls
ls -lrt
mkdir test
ls -ltr test
cd test
cat>>a.txt
ss
s
s
s
pwd
cat a.txt
cp
mv
rm

HDFS command:
hadoop fs -put inputfile.txt test.txt
hadoop fs -ls /user/cloudera
hadoop fs -cat /user/cloudera/test.txt
hadoop fs -mkdir /user/cloudera/todayclass
hadoop fs -ls /user/cloudera
hadoop fs -copyFromLocal sample.txt  /user/cloudera/todayclass/simple.txt
hadoop fs -get /user/cloudera/todayclass/simple.txt bada.txt
hadoop fs -Ddfs.replication=4 -cp /user/cloudera/todayclass/simple.txt  /user/cloudera/sample.txt

sudo -u hdfs hdfs fsck /user/cloudera/todayclass/ -files -blocks -locations



Data sec ops
Data Engineer
Job market
singleskil -DBA-Oracle
Broader skill
oracke DBA-No sql, c
cassandra,hbase,mongo
db,AWS RDS
----------------------
supply and Demand
supply less-Demand
supply more-
----------------------------
Hadoop 1
Hadoop components
Hadoopm1 limitations
Hadoop 2 HA
Hadoop 2 high JT-YARN
Hadoop comments

Fault tolerance

Map reduce:
----------------------------
processing logic in hadoop

each block has key-value

Map reduce:
Data set->splitting as Data block

Each block is 64 MB

6MB-block more-meta data -network banwidth-
600MB-
900 MB data set-600 MB
600MB-1 block
300 MB-1 block

900 MB-Default-64Mb-15 block
       600   -2 block
       6     -150 block


Mapper(key id value-number) -User Define

hfs-sitexml
dfs.block.size
13421418

Mapper(key-id value-number)- User Define
Sort and Shuffling(grouping)
Reduce and compute

















































































No comments:

Post a Comment

Python Challenges Program

Challenges program: program 1: #Input :ABAABBCA #Output: A4B3C1 str1="ABAABBCA" str2="" d={} for x in str1: d[x]=d...