Saturday, March 30, 2019

Program for Reading a JSON file and Saving it to a CSV file in Python using Pandas/Spark

Program for Reading a JSON file and Saving it to a CSV file in Python using Pandas

In [1]: import pandas as pd                                                    
In [2]: import json                                                            
In [3]: import csv                                                             
In [4]: with open('/home/hadoop/git.json', 'r') as f:
   ...:     dicts = json.load(f)
   ...:                                                      
In [5]: from pandas.io.json import json_normalize            
In [6]: df = pd.DataFrame(dicts)                             
In [7]: df.dtypes                                            
Out[7]:
assignee              object
assignees             object
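
The json_normalize helper imported above is not actually used in the rest of this transcript. As a rough sketch (assuming, based on the DataFrame shown below, that each record in git.json carries a nested 'labels' list plus 'title' and 'created_at' fields), it can flatten that nested column into its own table:

# Sketch only: one row per label, with the parent issue's title and
# created_at carried along as metadata columns.
# Assumes every record in dicts has a 'labels' list.
labels_df = json_normalize(dicts, record_path='labels',
                           meta=['title', 'created_at'])
labels_df.head()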


                                                                                                                               
Out[9]:
    title                                             labels            created_at            updated_at closed_at
0    #607  [{'id': 167756, 'node_id': 'MDU6TGFiZWwxNjc3NT...  2019-03-29T02:20:18Z  2019-03-29T02:20:40Z      None
1   Share  [{'id': 57548, 'node_id': 'MDU6TGFiZWw1NzU0OA...   2019-03-29T02:17:52Z  2019-03-29T02:18:17Z      None
2   Setup  [{'id': 57548, 'node_id': 'MDU6TGFiZWw1NzU0OA...   2019-03-29T02:16:59Z  2019-03-29T02:17:36Z      None
In [89]: df1 = df[['title','labels','created_at','updated_at','closed_at']]
                                                                                                                             
In [90]: df1.to_csv("/home/hadoop/file.csv", encoding="utf-8")        
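
Put together as a standalone script (the input path, output path, and column list are simply the ones used in the transcript above), the Pandas version looks roughly like this:

import json
import pandas as pd

# Read the JSON file (a list of issue-like records) into a DataFrame.
with open('/home/hadoop/git.json', 'r') as f:
    dicts = json.load(f)

df = pd.DataFrame(dicts)

# Keep only the columns we want in the CSV.
df1 = df[['title', 'labels', 'created_at', 'updated_at', 'closed_at']]

# Write out as a UTF-8 encoded CSV file.
df1.to_csv('/home/hadoop/file.csv', encoding='utf-8')

Passing index=False to to_csv drops the leading row-number column if it is not wanted; also note that the nested dicts in the labels column are written to the CSV as their Python string representation.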

Program for Reading a JSON file and Saving it to a CSV file in Spark

scala> val df = spark.read.option("multiline","true").option("inferschema","true").format("json").load("/home/hadoop/Desktop/vow/ramesh.json")
scala> df.select($"title" as "title",$"labels.name" as "name",$"created_at" as "created_at", $"updated_at" as "updated_at", $"closed_at" as "closed_at").show



scala> df.select($"title" as "title",$"labels.name" as "name",$"created_at" as "created_at", $"updated_at" as "updated_at", $"closed_at" as "closed_at").show
+--------------------+--------------------+--------------------+--------------------+---------+
|               title|                name|          created_at|          updated_at|closed_at|
+--------------------+--------------------+--------------------+--------------------+---------+
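
The Scala shell above reads the JSON and selects the columns, but the CSV write promised in the heading is missing. A minimal sketch of that last step in PySpark (matching the post's "in Python" framing; the output directory name and the header option are assumptions, and concat_ws is used because the CSV writer cannot hold the array-typed labels.name column directly):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws

spark = SparkSession.builder.appName("json_to_csv").getOrCreate()

# Same multiline JSON read as the Scala shell above.
df = spark.read.option("multiline", "true").json("/home/hadoop/Desktop/vow/ramesh.json")

# labels.name is an array of strings; join it into one comma-separated
# string so the CSV writer accepts the column.
out = df.select(
    col("title"),
    concat_ws(",", col("labels.name")).alias("name"),
    col("created_at"),
    col("updated_at"),
    col("closed_at"),
)

# Spark writes a directory of part files, not a single CSV file.
out.write.mode("overwrite").option("header", "true").csv("/home/hadoop/file_spark_csv")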
