Bigdata_Spark_Consultant
Saturday, September 12, 2020
Python Challenges Program
Program 1:
# Input: ABAABBCA
# Output: A4B3C1
str1 = "ABAABBCA"
str2 = ""
d = {}
for x in str1:
    d[x] = d.get(x, 0) + 1
print(d)
for k, v in d.items():
    str2 += k + str(v)
print(str2)
Output:
{'A': 4, 'B': 3, 'C': 1}
A4B3C1
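The same counting can be done more concisely with collections.Counter, which counts in a single pass and (on Python 3.7+) preserves first-seen order. This is just an alternative sketch, not part of the original program:

```python
from collections import Counter

str1 = "ABAABBCA"
# Counter builds the same {char: count} dict the manual loop produced
counts = Counter(str1)
str2 = "".join(k + str(v) for k, v in counts.items())
print(str2)  # A4B3C1
```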
Program 2:
# Input: a4b3c2
# Output: aaaabbbcc
str1 = "a4b3c2"
str2 = ""
for ch in str1:
    if ch.isalpha():
        x = ch
    else:
        d = int(ch)
        str2 = str2 + x * d
print(str2)
Output:
aaaabbbcc
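Note that the loop above only works for single-digit counts. If counts could exceed 9 (a hypothetical input like "a4b12c2", not from the original post), a regex variant handles multi-digit numbers by matching each letter together with the digits that follow it:

```python
import re

str1 = "a4b12c2"  # hypothetical input with a two-digit count
# each match pairs one letter with its (possibly multi-digit) count
str2 = "".join(ch * int(n) for ch, n in re.findall(r"([a-zA-Z])(\d+)", str1))
print(str2)  # aaaa + 12 b's + cc
```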
Program 3:
# Reverse the middle words
# Input: Hi how are you python
# Output: Hi woh era uoy python
str1 = "Hi how are you python"
list1 = str1.split(" ")  # split already returns a list; no comprehension needed
print(list1[0], end=" ")
for i in range(1, len(list1) - 1):
    print(list1[i][::-1], end=" ")
print(list1[len(list1) - 1])
Output: Hi woh era uoy python
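Program 3 can also be written without the explicit loop, building the result as a single string with slicing and join. This is an alternative sketch, not from the original post:

```python
str1 = "Hi how are you python"
words = str1.split()
# keep the first and last words, reverse everything in between
result = " ".join([words[0]] + [w[::-1] for w in words[1:-1]] + [words[-1]])
print(result)  # Hi woh era uoy python
```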
Program 4:
# Input: B4A1D3
# Output: ABD134
str1 = "B4A1D3"
alphabet = []
digit = []
for x in str1:
    if x.isalpha():
        alphabet.append(x)
    else:
        digit.append(x)
result = sorted(alphabet) + sorted(digit)
print("".join(result))
Output: ABD134
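The two separate lists can be avoided with a single sorted() call using a key that ranks letters before digits (False sorts before True). An alternative sketch:

```python
str1 = "B4A1D3"
# key (x.isdigit(), x): letters get False (first), digits get True (last);
# within each group, characters sort in ascending order
result = "".join(sorted(str1, key=lambda x: (x.isdigit(), x)))
print(result)  # ABD134
```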
Program 5:
# Input: aaaabbbccz
# Output: 4a3b2c1z
str1 = "aaaabbbccz"
str2 = ""
n = len(str1)
previous = str1[0]
count = 1
i = 1
while i < n:
    if str1[i] == previous:
        count += 1
    else:
        str2 += str(count) + previous
        previous = str1[i]
        count = 1
    i += 1
str2 += str(count) + previous  # append the final run
print(str2)
Output: 4a3b2c1z
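Run-length encoding like program 5 can also be done with itertools.groupby, which groups consecutive equal characters for us. An alternative sketch:

```python
from itertools import groupby

str1 = "aaaabbbccz"
# groupby yields (char, run) for each consecutive run; emit count then char
str2 = "".join(str(len(list(g))) + k for k, g in groupby(str1))
print(str2)  # 4a3b2c1z
```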
Monday, September 7, 2020
Machine Learning Basics
Machine learning:
Supervised Learning:
1) Classification
2) Regression
Unsupervised Learning:
1) Clustering
2) Association
Active Learning
Passive Learning
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Ridge Regression / Tikhonov regularization
Gradient Descent
Batch Gradient Descent
Stochastic Gradient Descent
Logistic Regression
Binary classification
Multi-class classification
Multi-label classification
Decision Trees (non-linear classification)
Random Forest (eager learners)
Clustering with K-means
K-Nearest Neighbours (lazy learners)
Feature Selection/Extraction
One-hot encoding, bag of words
Stemming and lemmatization (from nltk)
OCR - Optical Character Recognition
POS - Part of Speech
SIFT - Scale Invariant Feature Transform
SURF - Speeded Up Robust Features
Dimensionality Reduction with PCA (Principal Component Analysis)
SVM - Support Vector Machine
Artificial Neural Network
CNN/ConvNet - Convolutional Neural Network
Statistical Learning
Probability distribution
Predictor
PAC method
Overfitting
RFE (Recursive Feature Elimination)
Outlier removal
Exploratory analysis
Gradient descent
Inductive bias
Finite hypothesis class
K-means
Elbow method
Distortion
Support Vector Machine
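As a tiny illustration of one item from the list above, one-hot encoding turns categorical labels into binary indicator vectors. A minimal pure-Python sketch with hypothetical labels (not from the original post; libraries like sklearn provide this out of the box):

```python
# hypothetical category labels
labels = ["red", "green", "blue", "green"]
# sorted vocabulary gives a stable column order
vocab = sorted(set(labels))
# one row per label, a 1 in the column matching its category
one_hot = [[1 if lab == v else 0 for v in vocab] for lab in labels]
print(vocab)    # ['blue', 'green', 'red']
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```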
Friday, August 7, 2020
PARQUET-TOOLS command
1. To read metadata - I'll use my home directory (/home/akshay/) to hold the parquet file we want to read.
$ cd ~
Get the parquet file onto the local filesystem using the get command:
$ hdfs dfs -get /user/akshay/ad/000001_0
Navigate to the directory where the parquet-tools utility script is installed (mine is at /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4/bin):
$ cd /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4/bin
$ ./parquet-tools meta /home/akshay/000001_0
2. To read data - since we are already in the parquet-tools directory, use the familiar cat subcommand to print the records of the parquet file.
$ ./parquet-tools cat /home/akshay/000001_0
3. To read schema -
$ ./parquet-tools schema /home/akshay/000001_0
4. To read the top n rows (e.g. 10 rows) -
$ ./parquet-tools head -n 10 /home/akshay/000001_0