In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
As a lightweight data exchange format, JSON has gradually become one of the mainstream data types because it is easy to read, write and interact. Most common programming languages provide interfaces for reading and parsing JSON, but how to filter, calculate and expand multi-tier JSON data into two-dimensional data requires developers to have a headache. This article will share with you how to use the aggregator SPL (structured processing language) to easily solve the problem of parsing and storing JSON data.
Import and Analysis of JSON data File
Depending on the complexity of the JSON data file and the different requirements, we will discuss it in three situations:
1. Single layer JSON data file
Let's start with a simple example to see how to read the JSON file of a normal key-value mapping. The following is the JSON data file for the order information of a product:
SPL imports JSON data files with a simple script:
= json (file ("product.json") .read ()
Without writing a loop function or parsing a JSON object, you can see that the JSON data file has been converted to a two-dimensional data sequence table:
two。 Multi-tier JSON data file with the same structure of detail data
Next, let's take a look at how to deal with multi-tier JSON files. Here is the JSON data file orders.json that we will use:
As you can see, JSON data is divided into two layers, the first layer is "shipper's country" and "shipper's area", and the second layer is detailed data. Now we want to import 2013 orders from North and South China, let's see if it is implemented in SPL.
This time let's first define the parameters: Country, Area and Year, which correspond to the country of the shipper to be imported, the region of the shipper, and the year of the order date, respectively. By defining parameters, when you import different countries, regions, and years later, you no longer need to modify the SPL, just provide the corresponding parameter values. It should be noted here that the value of Area is a sequence, and the default value is [North China, South China], so that data from multiple regions can be read at the same time. As shown below:
Let's take a look at the SPL script:
A
B
one
= json (file ("orders.json") .read ()
= A1.select (shipper's country = = Country & & Area.contain (shipper's area))
two
= B1.news (regional order; B1. Shipper's country: shipper's country, B1. Shipper's area: shipper's area, # 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13)
= A2.select (year (date of ordering) = = Year)
Let's explain it in detail:
The first step: A1 = json (file ("orders.json"). Read ()), import the JSON file to generate the sequence table. After execution, you can see that the JSON data is displayed hierarchically (in the aggregator designer, we can view the next layer of detail data by double-clicking the "area order" value):
Step 2: you can see from the figure that the fields of "shipper's country" and "shipper's area" are on the first level, so you can directly call A1.select (shipper's country = = Country & & Area.contain (shipper's area)) in B1 to filter out the data of North and South China.
Step 3: "Regional order" is the detailed data we want, but it does not include the fields "shipper's country" and "shipper's region", so we need to combine these two fields with the detail fields of regional orders. Such complex requirements can be solved in one step through the news function. From the expression of A2 lattice, we can see that the parameters are not complex. Shipper's country, B1. The shipper's area and all the fields of the "regional order" can be put together. Take a look at the execution result:
Step 4: in B2, filter out the data of 2013 for the year of "order date" by = A2.select (year) = = Year).
Finally, let's execute it, and we can see that the resulting two-dimensional table fully meets the requirements:
3. Multi-tier JSON data files with different structures of detail data
Because of the complexity of data sources, the details of JSON data files may have different structures. Let's take a look at how to deal with this kind of JSON files. Here is the JSON file we are going to use, sales.json:
The first layer takes the year and month as the dimension, the second layer takes the country as the dimension, and the third layer is the detailed data. However, in the detailed data, due to different sales channels, the structure is not completely consistent, such as "PRODUCTLINE", "ADDRESSLINE1" and "ADDRESSLINE2" are not necessary in the detailed data. Now, we are going to read from the data the sales data of the United States and Canada, two big countries in North America in 2017 and 2018.
For ease of use, let's define two parameters: Year and Country:
Let's take a look at SPL:
A
B
one
= json (file ("sales.json") .read ()
= A1.select (Year.contain (YEAR))
two
= B1.news (MONTHLY_SALES;B1.YEAR:YEAR,B1.MONTH:MONTH,#1,#2)
= A2.select (Country.contain (COUNTRY))
three
= B2.news (NATIONAL_MONTHLY_SALES;B2.YEAR:YEAR,B2.COUNTRY:COUNTRY,ORDERNUMBER,QUANTITYORDERED,PRICEEACH,ORDERLINENUMBER,SALES,ORDERDATE,STATUS,QTR_ID,PRODUCTLINE,MSRP,PRODUCTCODE,CUSTOMERNAME,PHONE,ADDRESSLINE1,ADDRESSLINE2,CITY,STATE,POSTALCODE,TERRITORY,CONTACTLASTNAME,CONTACTFIRSTNAME,DEALSIZE)
Let's explain it in detail.
Grid A1 still imports the JSON file as a multi-level table.
Since the year field is on the first level, calling A1.select (Year.contain (YEAR)) directly in the B1 cell can filter out the data for 2017 and 2018 years from A1:
Next, we use the news function again in the A2 box, which is used to combine the year and month field with the monthly sales details of the next layer:
In the B2 grid, we filter out the data of the United States and Canada through A2.select (Country.contain (COUNTRY)).
Then in the A3 grid, we use the news function again, this time we need to put the YEAR,MONTH,COUNTRY and the country monthly sales details at the next level into a sequence table. Since the structure of the detail data in this file may be different, we use the full number of field names as parameters to create the order table. The value of the field is set according to the name, and the value without this field is null by default (for example, the "ADDRESSLINE1" and "ADDRESSLINE2" fields in the following figure):
After execution, you can see the final result:
At this point, a multi-tier JSON file with inconsistent detail data structure is successfully expanded into a two-dimensional table.
Order table into the database
The previous introduction of the common JSON data file import and parsing, followed by the problem of data storage. The reason why it is said to be stored at the end is not because of complexity, on the contrary, because the last generated in the previous example is the order table, so it is very simple and convenient to update the database. Take the order table in example 2 of the previous import of JSON as an example:
A
one
= file ("orders.json"). Read (). Import@j (). Select (shipper country = = Country & & shipper area = = Area). Regional order .select (year (order date) = = Year) .derive (Country: shipper country, Area: shipper region)
two
= connect ("demo") .update (A1, order; order ID)
As you can see, it only takes a script to update the database!
Here is more common through the primary key update, with the order table A1 through the primary key order ID to update the order table in the database. The update function in SPL has many options to meet more updated data requirements, so I won't repeat them here.
Therefore, as long as you use the right tools, import and parse from JSON files to data storage, you can easily deal with tedious tasks. In addition to JSON data files, the aggregator SPL also supports a variety of data sources, which will be shared with you through more articles in the future.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.