Number of executer is equal to no of partition
Number of executer is equal to no of tasks
Stages means that series of transformation
New stage gets created on shuffling
Nodemanager allocates the resources at the corresponding node
Driver and executer are logical
Block and partition are physical
Resource manager submitted the job and allocating resources to the overall cluster.
RDD,DATAFRAME,DATASET are storage object, that is distributed or split object, Although
SCALA objects are not distributed itself.
Spark Streaming:
Dstream=>RDD==>BATCH INTERVAL
==>Partition=>Block Interval-200ms
No of transformation=>No of RDD
Dstream is internally convert into RDD API
RDD API
DF API
DS API
DStream API ===>Transformation==> it has 2 types
(1)Stateful transformation
(2)Stateless transformation
Narrow Transformation:It is not involved shuffling
Wide Transformation>;It involves shuffling
Is it possible SQL operation for DStream?
Dataframe and Dataset are supporting sql operation
RDD-non sql operation
Horizontal merging->Join
Vertical merging=>Union
Number of cores =No of receiver+1
Recent Batch creation stateless processing stateful
7.00-7.01 B1(R1)
7.01-7.02 B2(R2) B1(R1) B1
7.02-7.03 B3(R3) B2(R2) B2,B1
7.03-7.04 B4(R4) B3(R3) B3,R2
7.04-7.05 B5(R5) B4(R4) B4,B3
Stateful:It is called as Windowing
A transformation involves multiple batch for processing
Stateless:
As transformation involves one batch for processing at a time
W.S=2, window size is equalvalent to the number of batches
Slide Interval:
SPARK STAGS and TASK:
master slave masterrpcurl
YARN RM NM 8082
Standalone Sparkmaster Sparkworker 7077
Mesos Mesos master mesosworker 5050
Spark app-->;2 programs
1)Driver->Master
2)Executor-->Slave
task-->partition under task
partition->data
excueter->logical entity, space and cores
Number of executer is equal to no of tasks
Stages means that series of transformation
New stage gets created on shuffling
Nodemanager allocates the resources at the corresponding node
Driver and executer are logical
Block and partition are physical
Resource manager submitted the job and allocating resources to the overall cluster.
RDD,DATAFRAME,DATASET are storage object, that is distributed or split object, Although
SCALA objects are not distributed itself.
Spark Streaming:
Dstream=>RDD==>BATCH INTERVAL
==>Partition=>Block Interval-200ms
No of transformation=>No of RDD
Dstream is internally convert into RDD API
RDD API
DF API
DS API
DStream API ===>Transformation==> it has 2 types
(1)Stateful transformation
(2)Stateless transformation
Narrow Transformation:It is not involved shuffling
Wide Transformation>;It involves shuffling
Is it possible SQL operation for DStream?
Dataframe and Dataset are supporting sql operation
RDD-non sql operation
Horizontal merging->Join
Vertical merging=>Union
Number of cores =No of receiver+1
Recent Batch creation stateless processing stateful
7.00-7.01 B1(R1)
7.01-7.02 B2(R2) B1(R1) B1
7.02-7.03 B3(R3) B2(R2) B2,B1
7.03-7.04 B4(R4) B3(R3) B3,R2
7.04-7.05 B5(R5) B4(R4) B4,B3
Stateful:It is called as Windowing
A transformation involves multiple batch for processing
Stateless:
As transformation involves one batch for processing at a time
W.S=2, window size is equalvalent to the number of batches
Slide Interval:
SPARK STAGS and TASK:
master slave masterrpcurl
YARN RM NM 8082
Standalone Sparkmaster Sparkworker 7077
Mesos Mesos master mesosworker 5050
Spark app-->;2 programs
1)Driver->Master
2)Executor-->Slave
task-->partition under task
partition->data
excueter->logical entity, space and cores
No comments:
Post a Comment