How to transpose an RDD or an MLlib matrix

Updated 2025-02-21. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/01

Many newcomers are unsure how to transpose an RDD or an MLlib matrix. This article walks through the approach in detail for anyone who needs it.

The question is how to transpose a Spark MLlib matrix or an RDD. Spark MLlib matrices come in several forms, distributed and non-distributed. The non-distributed (local) forms are not covered further here: they are backed by arrays, so transposing them is simple. Distributed matrices, however, are stored as RDDs, so the problem becomes how to transpose an RDD.
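For the non-distributed case, a minimal sketch, assuming Spark MLlib is on the classpath: local MLlib matrices expose a `transpose` method, so no RDD work is needed. The object name and sample values are illustrative and not from the original article.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

object LocalTranspose {
  def main(args: Array[String]): Unit = {
    // Local MLlib matrices are stored column-major, so this array
    // describes the 2 x 3 matrix [[1, 10, 100], [2, 20, 200]].
    val m: Matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 10.0, 20.0, 100.0, 200.0))

    // transpose returns a new 3 x 2 matrix; m itself is not modified.
    val t: Matrix = m.transpose

    println(s"${t.numRows} x ${t.numCols}")
    println(t)
  }
}
```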

First, let's introduce what a transpose operation is:

According to the encyclopedia definition, the matrix obtained by exchanging the rows and columns of a matrix is the transpose of that matrix.
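As a small illustration of that definition, here is a sketch on plain Scala collections (no Spark involved; the object name and sample values are illustrative). It checks that entry (i, j) of the original equals entry (j, i) of the transpose.

```scala
object TransposeDefinition {
  def main(args: Array[String]): Unit = {
    // A 2 x 3 matrix held as nested standard-library Vectors.
    val a = Vector(
      Vector(1.0, 10.0, 100.0),
      Vector(2.0, 20.0, 200.0)
    )

    // Standard-library transpose on nested collections gives a 3 x 2 result.
    val t = a.transpose

    // Entry (i, j) of the original equals entry (j, i) of the transpose.
    for (i <- a.indices; j <- a(i).indices) assert(t(j)(i) == a(i)(j))

    println(t) // Vector(Vector(1.0, 2.0), Vector(10.0, 20.0), Vector(100.0, 200.0))
  }
}
```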

To swap the rows and columns of an RDD, the main ideas are as follows:

1. Transform the RDD so that each row gets a unique row number: (row, rowIndex).

2. For each row, pair every value with its column index as (value, colIndex), then reorganize each pair into (colIndex.toLong, (rowIndex, value)).

3. Apply flatMap so that the per-row arrays become one flat collection of such triplets.

4. Group the result of step 3 by key and sort by key. The key is the old column index, so this yields the rows of the transposed matrix in the correct order.

5. Build each new row from its (rowIndex, value) pairs, using the index as the position of the value so that the elements of each new row end up in the correct order.

At this point the conversion is complete.
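The five steps above can be sketched on an in-memory `Seq` before involving Spark. This is only a local analogue under stated assumptions: `transposeLocal` is a hypothetical helper name, `groupBy` plus `sortBy` stand in for the RDD's `groupByKey` and `sortByKey`, and every row is assumed to have the same length.

```scala
object TransposeStepsLocal {
  // Hypothetical helper: the same five steps on a local Seq instead of an RDD.
  def transposeLocal(rows: Seq[Array[Double]]): Seq[Array[Double]] =
    rows.zipWithIndex                      // step 1: (row, rowIndex)
      .flatMap { case (row, rowIndex) =>   // steps 2-3: flatten to (colIndex, (rowIndex, value))
        row.zipWithIndex.map { case (value, colIndex) => (colIndex, (rowIndex, value)) }
      }
      .groupBy(_._1)                       // step 4: group by the old column index...
      .toSeq
      .sortBy(_._1)                        // ...and sort by it to fix the new row order
      .map { case (_, cells) =>            // step 5: place each value at its old row index
        val out = new Array[Double](cells.size)
        cells.foreach { case (_, (rowIndex, value)) => out(rowIndex) = value }
        out
      }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Array(1.0, 10.0, 100.0),
      Array(2.0, 20.0, 200.0),
      Array(3.0, 30.0, 300.0)
    )
    // Prints one transposed row per line:
    // 1.0, 2.0, 3.0
    // 10.0, 20.0, 30.0
    // 100.0, 200.0, 300.0
    transposeLocal(input).foreach(r => println(r.mkString(", ")))
  }
}
```

Keeping the row index next to each value matters because grouping does not guarantee the order of values within a group; step 5 restores it from the index.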

The concrete implementation is as follows:

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    def transposeRowMatrix(m: RowMatrix): RowMatrix = {
      val transposedRowsRDD = m.rows.zipWithIndex
        .map { case (row, rowIndex) => rowToTransposedTriplet(row, rowIndex) }
        .flatMap(x => x)        // triplets: (newRowIndex, (newColIndex, value))
        .groupByKey
        .sortByKey().map(_._2)  // sort the rows by new row index, then drop that index
        .map(buildRow)          // rebuild each row from its indexes and values, dropping the indexes
      new RowMatrix(transposedRowsRDD)
    }

    // Convert one row into (colIndex, (rowIndex, value)) triplets
    def rowToTransposedTriplet(row: Vector, rowIndex: Long): Array[(Long, (Long, Double))] = {
      val indexedRow = row.toArray.zipWithIndex
      indexedRow.map { case (value, colIndex) => (colIndex.toLong, (rowIndex, value)) }
    }

    // Build a new row; each index must lie in 0 until rowWithIndexes.size
    def buildRow(rowWithIndexes: Iterable[(Long, Double)]): Vector = {
      val resArr = new Array[Double](rowWithIndexes.size)
      rowWithIndexes.foreach { case (index, value) =>
        resArr(index.toInt) = value
      }
      Vectors.dense(resArr)
    }

Test

Prepare the data:

    // sc is an existing SparkContext (for example, the one provided by spark-shell)
    val observations = sc.parallelize(
      Seq(
        Vectors.dense(1.0, 10.0, 100.0),
        Vectors.dense(2.0, 20.0, 200.0),
        Vectors.dense(3.0, 30.0, 300.0)
      )
    )

Generate the matrix:

    val mat: RowMatrix = new RowMatrix(observations)

After calling transposeRowMatrix(mat) and printing the rows of the result, you will find that the rows and columns have been swapped.
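To check this end to end, here is a self-contained sketch under stated assumptions: Spark MLlib is on the classpath, the job runs with a local master, and the object name `TransposeCheck`, the app name, and the `local[2]` setting are illustrative choices that do not come from the original article. The `transpose` method is a compact restatement of the same approach, not a different algorithm.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object TransposeCheck {
  // Compact restatement of the approach: index rows, emit
  // (colIndex, (rowIndex, value)), group and sort by colIndex, rebuild rows.
  def transpose(m: RowMatrix): RowMatrix = {
    val rows = m.rows.zipWithIndex
      .flatMap { case (row, rowIndex) =>
        row.toArray.zipWithIndex.map { case (value, colIndex) =>
          (colIndex.toLong, (rowIndex, value))
        }
      }
      .groupByKey()
      .sortByKey()
      .map { case (_, cells) =>
        // Place each value at the position given by its old row index.
        val arr = new Array[Double](cells.size)
        cells.foreach { case (i, v) => arr(i.toInt) = v }
        Vectors.dense(arr)
      }
    new RowMatrix(rows)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads is enough for this check.
    val conf = new SparkConf().setAppName("transpose-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val observations: Seq[Vector] = Seq(
        Vectors.dense(1.0, 10.0, 100.0),
        Vectors.dense(2.0, 20.0, 200.0),
        Vectors.dense(3.0, 30.0, 300.0)
      )
      val mat = new RowMatrix(sc.parallelize(observations))

      // Each printed row should be a column of the input matrix.
      transpose(mat).rows.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Collecting to the driver is only reasonable for a matrix this small; for real data, inspecting a few rows with `take` is a safer habit.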
