Bigdata_Spark_Consultant

Saturday, September 12, 2020

Python Challenges Program

Challenges program:

program 1:
#Input :ABAABBCA
#Output: A4B3C1
str1="ABAABBCA"
str2=""
d={}
for x in str1:
    d[x]=d.get(x,0)+1
print(d)
for k,v in d.items():
    str2+=k+str(v)
print(str2)
Output:
{'A': 4, 'B': 3, 'C': 1}
A4B3C1

program 2:
#Input :a4b3c2
#Output: aaaabbbcc
str1="a4b3c2"
str2=""
for ch in str1:
    if ch.isalpha():
        x=ch
    else:
        d=int(ch)
        str2=str2+x*d
print(str2)
Output:
aaaabbbcc

program 3:
#Reverse the middle words
#Input :Hi how are you python
#Output: Hi woh era uoy python
str1="Hi how are you python"
list1=[x for x in str1.split(" ")]
print(list1[0],end=" ")
for i in range(1,len(list1)-1):
    print(list1[i][::-1],end=" ")
print(list1[len(list1)-1])
Output:
Hi woh era uoy python

program 4:
#Input :B4A1D3
#Output: ABD134
str1="B4A1D3"
alphabet=[]
digit=[]
for x in str1:
    if x.isalpha():
        alphabet.append(x)
    else:
        digit.append(x)
result=(sorted(alphabet)+sorted(digit))
print("".join(result))
Output:
ABD134

program 5:
#Input :aaaabbbccz
#Output: 4a3b2c1z
str1="aaaabbbccz"
str2=""
n=len(str1)
previous=str1[0]
count=1
i=1
while i
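The listing for program 5 is cut off, so the rest of its loop is not shown here. As a minimal sketch of one way to get from the stated input to the stated output (not necessarily what the original while loop did), itertools.groupby from the standard library can group consecutive equal characters. The function name run_length_encode is just a name picked for illustration.

```python
from itertools import groupby


def run_length_encode(text):
    """Encode each run of equal characters as <count><char>."""
    parts = []
    for char, run in groupby(text):
        # groupby yields one group per run of consecutive equal characters
        parts.append(str(len(list(run))) + char)
    return "".join(parts)


print(run_length_encode("aaaabbbccz"))  # 4a3b2c1z
```

Unlike the dictionary count in program 1, this counts runs, so a character that appears again later starts a new count.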

Monday, September 7, 2020

Machine Learning Basics

Machine learning:

Supervised learning:
1) Classification
2) Regression

Unsupervised learning:
1) Clustering
2) Association

Active learning
Passive learning

Simple linear regression
Multiple linear regression
Polynomial regression
Ridge regression / Tikhonov regularisation
Gradient descent
Batch gradient descent
Stochastic gradient descent
Logistic regression
Binary classification
Multi-class classification
Multi-label classification
Decision trees (non-linear classification)
Random forest (eager learners)
Clustering with K-means
K-nearest neighbours (lazy learners)
Feature selection/extraction
One-hot encoding, bag of words
Stemming and lemmatization (from nltk)
Picture OCR - Optical Character Recognition
POI - Points of Interest
SIFT - Scale-Invariant Feature Transform
SURF - Speeded-Up Robust Features
Dimensionality reduction with PCA (Principal Component Analysis)
SVM - Support Vector Machine
Artificial neural network
CNN/ConvNet - convolutional neural network
Statistical learning
Probability distribution
Predictor
PAC method
Overfitting
RFE (Recursive Feature Elimination)
Outlier removal
Exploratory analysis
Inductive bias
Finite hypothesis class
K-means elbow method
Distortion
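To make two of the items above concrete (simple linear regression and batch gradient descent), here is a minimal sketch in plain Python. The toy data, the learning rate, the epoch count and the function name fit_line are all assumptions chosen for illustration; a real project would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b, over the full batch
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # should be close to 2 and 1
```

"Batch" here means every update uses all the points; the stochastic variant would update w and b from one point at a time.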

Friday, August 7, 2020

PARQUET-TOOLS command

1. To read metadata - I'll use my home directory (/home/akshay/) to hold the parquet files we want to read.

$ cd ~
Copy the parquet file to the local filesystem using the HDFS get command:
$ hdfs dfs -get /user/akshay/ad/000001_0

Navigate to the directory where the parquet-tools utility script is installed (I have it at /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4/bin):

$ cd /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4/bin
$ ./parquet-tools meta /home/akshay/000001_0

2. To read data - Since we are already in the parquet-tools directory, use the cat command to print the contents of the parquet file.
$ ./parquet-tools cat /home/akshay/000001_0

3. To read schema -
$ ./parquet-tools schema /home/akshay/000001_0

4. To read the top n rows (e.g. 10 rows) -
$ ./parquet-tools head -n 10 /home/akshay/000001_0
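If you run these four commands often, they can be wrapped in a small Python helper. This is only a sketch: it assumes parquet-tools lives in the directory used above, and the names PARQUET_TOOLS, build_command and run are made up for illustration. It uses only the meta, cat, schema and head -n forms shown in the steps.

```python
import subprocess

# Assumption: install directory taken from the steps above; adjust for your cluster
PARQUET_TOOLS = "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4/bin/parquet-tools"


def build_command(action, parquet_file, rows=None):
    """Return the argument list for one of the four parquet-tools actions."""
    if action not in ("meta", "cat", "schema", "head"):
        raise ValueError("unsupported action: " + action)
    command = [PARQUET_TOOLS, action]
    if action == "head" and rows is not None:
        # Same as: ./parquet-tools head -n <rows> <file>
        command += ["-n", str(rows)]
    command.append(parquet_file)
    return command


def run(action, parquet_file, rows=None):
    """Run parquet-tools and return its standard output as text."""
    result = subprocess.run(
        build_command(action, parquet_file, rows),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Only prints the command; call run(...) on a machine that has parquet-tools
print(build_command("head", "/home/akshay/000001_0", rows=10))
```

Building the command as a list rather than one string avoids shell quoting problems if a file path contains spaces.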
