Sunday, March 22, 2020

Hadoop Arch

Hadoop 1.0
1. The client first submits the job to the Job Tracker.
2. The Job Tracker contacts the Name Node to find out where the data is.
3. The Job Tracker then passes that information back to the client.
4. Meanwhile, the Job Tracker informs the Task Trackers about the job that is about to arrive.
5. The client reaches the Task Tracker and gives it the job details (JAR path, data path).
6. The Task Tracker starts the job (MapReduce).


Hadoop 2.0
1. The client submits the job to the Resource Manager (RM).
2. The RM contacts the Name Node.
3. The Name Node gives the data information to the RM (input split transformation).
4. The RM passes the information back to the client.
5. Meanwhile, the RM tells the respective Node Managers (NM) to create containers where the data is present, and it also tells one Node Manager to create the Application Master.
6. The data reaches the container; the Application Master submits the job inside the container and monitors it there.
7. The NM monitors the life of the container.
8. The Application Master keeps track of the MapReduce job.
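
The Hadoop 2.0 steps above can be sketched as a toy model. This is only an illustration, not a Hadoop or YARN API: the NodeManager class, the plan_job function, the host names and the split names are all made up for the example, and choosing the first node for the Application Master and the least-loaded data-local node for each split are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a Node Manager (hypothetical, not a Hadoop API)."""
    host: str
    containers: list = field(default_factory=list)

    def launch_container(self, purpose):
        # Record the container and return a readable identifier for it.
        self.containers.append(purpose)
        return f"{self.host}:{purpose}"


def plan_job(block_locations, node_managers):
    """Mimic steps 2-5: place one container per input split near its data.

    block_locations maps a split name to the hosts that hold its data,
    i.e. the kind of information the Name Node gives to the RM.
    """
    by_host = {nm.host: nm for nm in node_managers}
    # Assumption: the first Node Manager hosts the Application Master.
    app_master = node_managers[0].launch_container("app-master")
    tasks = {}
    for split, hosts in block_locations.items():
        # Prefer nodes that already store the split; fall back to any node.
        local = [by_host[h] for h in hosts if h in by_host]
        target = min(local or node_managers, key=lambda nm: len(nm.containers))
        tasks[split] = target.launch_container(f"map-{split}")
    return app_master, tasks


nodes = [NodeManager("node1"), NodeManager("node2"), NodeManager("node3")]
locations = {"split0": ["node1", "node2"], "split1": ["node2", "node3"]}
am_container, task_containers = plan_job(locations, nodes)
print(am_container)
print(task_containers)
```

The point of the sketch is data locality: containers are created on the nodes where the data is present, so the job goes to the data instead of the data going to the job.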

Monday, March 16, 2020

Find the excess amount of the blood

select tab.bg as Blood_group, tab.supp_quan - tab.don_quan as Excess_blood
from
(select donotor.bg as bg, sum(donotor.quan) as don_quan, sum(supplier.quan) as supp_quan from
donotor join supplier
on donotor.bg = supplier.bg
group by donotor.bg) tab
where tab.don_quan
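
A minimal way to try the idea of the query above, using Python's built-in sqlite3 module. The table layout (bg, quan) is taken from the query, but the sample rows are invented for the example. Two deliberate differences: the final filter is cut off in these notes, so it is not reproduced here, and each table is summed on its own before the join. If a blood group has several rows in both tables, joining first and summing afterwards counts rows more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donotor  (bg TEXT, quan INTEGER);
    CREATE TABLE supplier (bg TEXT, quan INTEGER);
    -- Invented sample rows, only for illustration.
    INSERT INTO donotor  VALUES ('A+', 10), ('A+', 5), ('O-', 7);
    INSERT INTO supplier VALUES ('A+', 20), ('O-', 3), ('O-', 6);
""")

# Aggregate each table per blood group first, then join the two totals.
query = """
    SELECT d.bg AS Blood_group, s.supp_quan - d.don_quan AS Excess_blood
    FROM (SELECT bg, SUM(quan) AS don_quan FROM donotor GROUP BY bg) d
    JOIN (SELECT bg, SUM(quan) AS supp_quan FROM supplier GROUP BY bg) s
      ON d.bg = s.bg
    ORDER BY d.bg
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With these sample rows, A+ has 15 donated and 20 supplied, and O- has 7 donated and 9 supplied.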

Thursday, March 5, 2020

Hadoop History and Sources

2003----Google---GFS---Google File System---a DFS, but very powerful
2004----Google---MapReduce paper----the Lucene search engine developers (NDFS) took up the MR paper
2005----Nutch---released processing and storage built on NDFS and MapReduce
2006----Doug Cutting---Hadoop---HDFS/MapReduce
2008----Facebook---Hive---SQL---HQL
2009-2010----Cloudera
2011-2012----Hortonworks---(Spark) 2015

Sources:
---------
Business process-1x
OLTP - online transaction processing

Human-10x
Email
Documents
Social


Machine-100x
CCTV
Microphone
Sensors
Satellite(Maps,DTH)

Cluster setup:
--------------
The edge node should not be part of a storage node/master/slave node;
that is not recommended.
Hadoop NameNode
Hadoop DataNode
Hadoop Edge Node

Interface medium:
PuTTY is used to connect to the Edge Node.

In PuTTY, enter the IP address [hostname] of the Edge Node, then:
Username
Password


Namenode:
cluster administration
 heartbeat
 Replication
 Balancing
Metadata management

Over replication
Under replication
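
Over replication and under replication can be illustrated with a small check. This is only a sketch of the idea, not how the NameNode is implemented: the classify_blocks function, the block names and the replica counts are invented, and a replication factor of 3 (the HDFS default) is assumed.

```python
def classify_blocks(replica_counts, replication_factor=3):
    """Group blocks by comparing live replicas with the target factor."""
    report = {"under": [], "over": [], "ok": []}
    for block, live_replicas in replica_counts.items():
        if live_replicas < replication_factor:
            # Too few copies: more replicas need to be created.
            report["under"].append(block)
        elif live_replicas > replication_factor:
            # Too many copies: the extra replicas can be deleted.
            report["over"].append(block)
        else:
            report["ok"].append(block)
    return report


# Invented sample: block name -> number of live replicas.
counts = {"blk_1": 3, "blk_2": 1, "blk_3": 4}
report = classify_blocks(counts)
print(report)
```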


Python Challenges Program

Challenges program:
program 1:
#Input :ABAABBCA
#Output: A4B3C1
str1="ABAABBCA"
str2=""
d={}
for x in str1:
    d[x]=d...
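
The program above is cut off, so here is one possible complete version of the same idea, not necessarily the original solution. It assumes the counting is done with dict.get, and the function name count_chars is made up for the example. It relies on Python dictionaries keeping insertion order, so characters appear in the order they were first seen.

```python
def count_chars(text):
    """Return each distinct character followed by how often it occurs."""
    counts = {}
    for ch in text:
        # Start from 0 the first time a character is seen.
        counts[ch] = counts.get(ch, 0) + 1
    # Join "character + count" pairs in first-seen order.
    return "".join(f"{ch}{n}" for ch, n in counts.items())


result = count_chars("ABAABBCA")
print(result)  # A4B3C1
```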