Implementation of external data parallel computing by aggregator 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Text parallelism

SPL can roughly divide a text file into N segments by size and read only one of them. For example, cardInfo.txt stores 10 million records of population information. To divide it into ten parts and read the second part, you can write:

    A1  =file("d:\\temp\\cardInfo.txt")
    A2  =A1.import@t(;2:10)              / read directly into memory
    A3  =A1.cursor@t(;2:10).fetch@x()    / read with a cursor

The file is segmented roughly by size rather than exactly by row count, because this keeps segmentation fast. For example, if you inspect the first few fields of A2 or A3 in the IDE, you can see that the segment does not contain exactly 1 million rows (the exact count depends on the data):

    Index     cardNo            name            gender   province     mobile
    1         308200310180525   Alison Clinton  female   Idaho        1024627490
    2         709198311300191   Abby Wood       female   Kansas       1966846631
    3         005199807060610   George Bush     male     California   1019879226
    ...
    1000005   405199907050256   Mark Rowswell   male     Idaho        1168620176
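For readers outside SPL, the following Python sketch shows one common way to read a single segment of a text file that is split by volume rather than by row count. It is an assumption about the general technique, not esProc's actual implementation: the names `segment_bounds` and `read_segment` are hypothetical, the first line is treated as a header (as @t does), and each row is assigned to the segment in which its first byte falls, so every row is read exactly once even though segments do not hold equal row counts.

```python
import os
from typing import List, Tuple


def segment_bounds(data_start: int, size: int, k: int, n: int) -> Tuple[int, int]:
    """Byte range [start, end) of segment k (1-based) out of n, split by volume."""
    span = size - data_start
    start = data_start + span * (k - 1) // n
    end = data_start + span * k // n
    return start, end


def read_segment(path: str, k: int, n: int, header: bool = True) -> List[bytes]:
    """Return the rows whose first byte lies in segment k of n (hypothetical helper)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode so tell()/seek() are byte offsets
        data_start = len(f.readline()) if header else 0
        start, end = segment_bounds(data_start, size, k, n)
        if start > data_start:
            # Skip the line containing byte start-1: it began in an earlier segment.
            f.seek(start - 1)
            f.readline()
        else:
            f.seek(start)
        rows = []
        # Keep reading while the next line starts before the segment end.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            rows.append(line.rstrip(b"\r\n"))
        return rows
```

Here `read_segment("cardInfo.txt", 2, 10)` would correspond roughly to `(;2:10)`. Rows come back as raw bytes without field parsing. Only a partial line at each boundary is scanned, which is why volume-based splitting is cheaper than counting rows.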

Segmented reading can be combined with multithreading to improve read performance. For example, suppose two threads read cardInfo.txt, each thread counts the rows in its own segment, and the partial counts are then merged into the total row count. You can use the following code:

    A5  fork to(2)    B5  =A1.cursor@t(;A5:2).total(count(1))   / 2-thread segmentation
    A6  =A5.sum()                                                / merge the results
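A rough Python analog of the fork pattern, offered as a sketch under stated assumptions rather than a description of how SPL executes fork: each task counts the rows whose first byte falls in its own byte segment, and the partial counts are summed, mirroring `=A5.sum()`. The helper names `count_segment` and `parallel_count` are hypothetical, and the first line is assumed to be a header.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_segment(path: str, k: int, n: int) -> int:
    """Count data rows whose first byte falls in segment k (1-based) of n.

    The first line is treated as a header and is not counted."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        data_start = len(f.readline())
        span = size - data_start
        start = data_start + span * (k - 1) // n
        end = data_start + span * k // n
        if start > data_start:
            f.seek(start - 1)
            f.readline()  # this partial line belongs to the previous segment
        else:
            f.seek(start)
        count = 0
        while f.tell() < end and f.readline():
            count += 1
        return count


def parallel_count(path: str, n: int) -> int:
    # "fork": one task per segment, each returning its own partial count.
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(lambda k: count_segment(path, k, n), range(1, n + 1)))
    # "merge": combine the partial results.
    return sum(partials)
```

In CPython, threads mainly overlap blocking file reads; pure-Python line scanning is held back by the GIL, so for CPU-heavy per-segment work, switching to `ProcessPoolExecutor` (with a module-level worker function instead of the lambda) would be the closer substitute.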

The fork statement suits cases where the algorithm is more complex. When the algorithm is relatively simple, you can read by segments directly with cursor@m. For example, the previous code can be rewritten as follows:

    A7  =A1.cursor@tm(;2).total(count(1))   / 2-thread segmentation

The code above specifies the number of threads explicitly. If the number of threads is omitted, the "parallel limit" setting in the configuration file is used as the default. Assuming the parallel limit is 2, the code can be rewritten as follows:

    A8  =A1.cursor@tm().total(count(1))     / default number of threads
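The cursor@m form hands segmentation and thread management to the system, and the thread count falls back to a configured default when omitted. A minimal in-memory Python sketch of that calling convention follows. `PARALLEL_LIMIT` is a hypothetical constant standing in for the configuration-file setting, `segmented_total` is a hypothetical name, and a Python list stands in for the file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Hypothetical stand-in for the "parallel limit" in the configuration file.
PARALLEL_LIMIT = 2


def segmented_total(
    data: Sequence[T],
    agg: Callable[[Sequence[T]], R],
    merge: Callable[[List[R]], R],
    threads: Optional[int] = None,
) -> R:
    """Aggregate each segment in its own thread, then merge the partial results."""
    n = threads if threads is not None else PARALLEL_LIMIT  # default thread count
    size = len(data)
    bounds = [(size * k // n, size * (k + 1) // n) for k in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(lambda b: agg(data[b[0]:b[1]]), bounds))
    return merge(partials)
```

Because each segment is aggregated independently, the merge step must match the aggregate: partial counts and sums are added, while partial maxima would be merged with `max`.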

To verify the performance difference between unsegmented and segmented reading, the following code counts the total rows of cardInfo.txt with a single thread and with 2 threads. The 2-thread version is significantly faster:

    A11  =now()
    A12  =A1.cursor@t().total(count(1))
    A13  =interval@ms(A11,now())              / unsegmented: 20882 ms

    A15  =now()
    A16  =A1.cursor@tm(;2).total(count(1))
    A17  =interval@ms(A15,now())              / 2-thread segmentation: 12217 ms

JDBC parallelism

When fetching data through JDBC, you may find that fetch performance is poor even though the database is not heavily loaded. In this case, parallel fetching can improve performance.

For example, an Oracle database has a call record table, callrecord, with 1 million records. The indexed field is callTime, and the data is distributed roughly evenly over this field. A non-parallel fetch performs unsatisfactorily, as the following code shows:

    A1  =now()                                    / record the start time for the performance test
    A2  =connect("orcl")
    A3  =A2.query@x("select * from callrecord")
    A4  =interval@ms(A1,now())                    / non-parallel fetch: 17654 ms

After switching to a 2-thread parallel fetch, performance improves significantly. The code is as follows:

    A6   =now()
    A7   =connect("orcl").query@x("select min(callTime),max(callTime) from callrecordA")
    A8   =2.(range (A7.room1 callTime) elapseurs (A7.room2), ~:2)    / list of time-interval parameters
    A9   fork A8   B9   =connect("orcl")
                   B10  =B9.query@x("select * from callrecordA where callTime>=? and callTime
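The parallel fetch above follows a common pattern: query the minimum and maximum of an evenly distributed indexed column, split that range into one half-open interval per thread, and let each thread run the same parameterized query on its own connection. The Python sketch below illustrates that pattern with the standard-library sqlite3 module in place of Oracle over JDBC. The `callrecord` schema (an integer `callTime`) and the names `split_range`, `fetch_interval` and `parallel_fetch` are assumptions for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_range(lo: int, hi: int, n: int) -> List[Tuple[int, int]]:
    """Split the closed key range [lo, hi] into n half-open intervals [a, b).

    The last upper bound is hi + 1, so the maximum key is still included."""
    span = hi + 1 - lo
    return [(lo + span * k // n, lo + span * (k + 1) // n) for k in range(n)]


def fetch_interval(db_path: str, a: int, b: int) -> list:
    # Each worker opens its own connection; connections are not shared across threads.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT * FROM callrecord WHERE callTime >= ? AND callTime < ? ORDER BY callTime",
            (a, b),
        ).fetchall()
    finally:
        conn.close()


def parallel_fetch(db_path: str, n: int) -> list:
    conn = sqlite3.connect(db_path)
    try:
        lo, hi = conn.execute(
            "SELECT min(callTime), max(callTime) FROM callrecord").fetchone()
    finally:
        conn.close()
    if lo is None:  # empty table: nothing to fetch
        return []
    intervals = split_range(lo, hi, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(lambda ab: fetch_interval(db_path, ab[0], ab[1]), intervals))
    # Intervals are in ascending key order, so concatenation keeps rows sorted by callTime.
    return [row for part in parts for row in part]
```

Using half-open intervals with a final bound of `max + 1` means no row is fetched twice or dropped. Whether the extra threads actually speed things up depends on the database and driver; the sketch only shows the partitioning logic.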
