Spark Python Operation Commands (Part 3)


Updated: 2025-10-25. Source: SLTechnology News&Howtos


12 data format

1) Raw data after splitting or slicing, for example [[u'3', u'5'], [u'4', u'6'], [u'4', u'5'], [u'4', u'2']]. Inside map, the columns of each record are available as x[0], x[1].

The data can be converted to key-value format with map, for example: df3 = df2.map(lambda x: (x[0], x[1]))

2) Key-value data format

Each () holds one record: the first element is the key and the second is the value.

3) The RDD returned by map has type PipelinedRDD; here it holds the key-value data.
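To make the change of shape concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The sample rows are assumptions modeled on the listing above, and the names `rows`, `pairs` and `grouped` are illustrative only. In PySpark the same lambda would be passed to `df2.map(...)`.

```python
# Plain-Python stand-in for an RDD of split records (no Spark needed).
rows = [['3', '5'], ['4', '6'], ['4', '5'], ['4', '2']]

# Same lambda as in df2.map(lambda x: (x[0], x[1])): list -> (key, value) tuple.
to_pair = lambda x: (x[0], x[1])
pairs = list(map(to_pair, rows))

# Collect the values that share a key, which is what the pair form makes easy.
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)

print(pairs)
print(grouped)
```

Once every record is a (key, value) tuple, per-key operations only need to look at the first element, which is why the conversion is usually done early.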

13 RDD type conversion

userRdd = sc.textFile(r"D:\data\people.json")
# split each line on a space (assumed delimiter for this file)
userRdd = userRdd.map(lambda x: x.split(" "))
userRows = userRdd.map(lambda p: Row(userName=p[0], userAge=int(p[1]), userAdd=p[2], userSalary=int(p[3])))
print(userRows.take(4))

Results: [Row(userAdd='shanghai', userAge=20, userName='zhangsan', userSalary=13), Row(userAdd='beijin', userAge=30, userName='lisi', userSalary=15)]
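The parsing step can be tried without Spark. The sketch below is an illustration under assumptions: `UserRow` is a hypothetical `namedtuple` standing in for `pyspark.sql.Row`, the two sample lines are rebuilt from the printed result, and the space delimiter is assumed.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, so the sketch runs without Spark.
UserRow = namedtuple("UserRow", ["userName", "userAge", "userAdd", "userSalary"])

# Sample lines rebuilt from the printed result; the space delimiter is assumed.
lines = ["zhangsan 20 shanghai 13", "lisi 30 beijin 15"]


def parse_user(line):
    """Split one text line and cast the numeric fields to int."""
    p = line.split(" ")
    return UserRow(userName=p[0], userAge=int(p[1]), userAdd=p[2], userSalary=int(p[3]))


user_rows = [parse_user(line) for line in lines]
print(user_rows[:4])
```

The printed result in the text lists the fields alphabetically because, in Spark versions before 3.0, Row sorts keyword fields by name. The namedtuple used here keeps the declaration order instead.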

2) Create a DataFrame

userDF = sqlContext.createDataFrame(userRows)

14 Query fields through SQL statements

from pyspark.conf import SparkConf
from pyspark.sql.session import SparkSession
from pyspark.sql.types import Row

if __name__ == '__main__':
    spark = SparkSession.builder.config(conf=SparkConf()).getOrCreate()
    sc = spark.sparkContext

    # Load the text file and turn each "name,age" line into a Row
    rd = sc.textFile(r"D:\data\people.txt")
    rd2 = rd.map(lambda x: x.split(","))
    people = rd2.map(lambda p: Row(name=p[0], age=int(p[1])))

    # Build a DataFrame and register it as a temporary view for SQL
    peopleDF = spark.createDataFrame(people)
    peopleDF.createOrReplaceTempView("people")

    teenagers = spark.sql("SELECT name, age FROM people WHERE name='Andy'")
    teenagers.show(5)
    print(teenagers.rdd.collect())

    # Convert the result back to an RDD and add 100 to each age
    teenNames = teenagers.rdd.map(lambda p: 100 + p.age).collect()
    for name in teenNames:
        print(name)
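The query itself can be checked without a Spark installation. The sketch below swaps in an in-memory SQLite table for the Spark temporary view, which is not how Spark executes it but lets the same SELECT statement run anywhere. The three sample records are assumed contents for people.txt, and `conn`, `records` and `shifted` are illustrative names.

```python
import sqlite3

# Assumed contents of people.txt: one "name,age" record per line.
lines = ["Michael,29", "Andy,30", "Justin,19"]

# Equivalent of rd.map(lambda x: x.split(",")) plus the int() cast on the age.
records = [(p[0], int(p[1])) for p in (line.split(",") for line in lines)]

# In-memory SQLite table standing in for the "people" temporary view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", records)

# Same statement as in the Spark example.
rows = conn.execute("SELECT name, age FROM people WHERE name='Andy'").fetchall()

# Equivalent of teenagers.rdd.map(lambda p: 100 + p.age).collect()
shifted = [100 + age for _, age in rows]

print(rows)
print(shifted)
conn.close()
```

Note the comma between `name` and `age` in the SELECT list: without it, SQL would read `age` as an alias for the `name` column and return a single column.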

15 Detailed examples of DataFrame, SQL, and JSON usage

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

A simple example demonstrating basic Spark SQL features.

Run with:

./bin/spark-submit examples/src/main/python/sql/basic.py

from __future__ import print_function

# $example on:init_session$
from pyspark.sql import SparkSession
# $example off:init_session$

# $example on:schema_inferring$
from pyspark.sql import Row
# $example off:schema_inferring$

# $example on:programmatic_schema$
# Import data types
from pyspark.sql.types import *
# $example off:programmatic_schema$

def basic_df_example(spark):
    # $example on:create_df$
    # spark is an existing SparkSession
    df = spark.read.json("/data/people.json")
    # Displays the content of the DataFrame to stdout
    df.show()
    # +----+-------+
    # | age|   name|
    # +----+-------+
    # |null|Michael|
    # |  30|   Andy|
    # |  19| Justin|
    # +----+-------+
    # $example off:create_df$

    # $example on:untyped_ops$
    # spark, df are from the previous example
    # Print the schema in a tree format
    df.printSchema()
    # root
    # |-- age: long (nullable = true)
    # |-- name: string (nullable = true)

    # Select only the "name" column
    df.select("name").show()
    # +-------+
    # |   name|
    # +-------+
    # |Michael|
    # |   Andy|
    # | Justin|
    # +-------+

    # Select everybody, but increment the age by 1
    df.select(df['name'], df['age'] + 1).show()
    # +-------+---------+
    # |   name|(age + 1)|
    # +-------+---------+
    # |Michael|     null|
    # |   Andy|       31|
    # | Justin|       20|
    # +-------+---------+

    # Select people older than 21
    df.filter(df['age'] > 21).show()
    # +---+----+
    # |age|name|
    # +---+----+
    # | 30|Andy|
    # +---+----+

    # Count people by age
    df.groupBy("age").count().show()
    # +----+-----+
    # | age|count|
    # +----+-----+
    # |  19|    1|
    # |null|    1|
    # |  30|    1|
    # +----+-----+
    # $example off:untyped_ops$

    # $example on:run_sql$
    # Register the DataFrame as a SQL temporary view
    df.createOrReplaceTempView("people")

    sqlDF = spark.sql("SELECT * FROM people")
    sqlDF.show()
    # +----+-------+
    # | age|   name|
    # +----+-------+
    # |null|Michael|
    # |  30|   Andy|
    # |  19| Justin|
    # +----+-------+
    # $example off:run_sql$

    # $example on:global_temp_view$
    # Register the DataFrame as a global temporary view
    df.createGlobalTempView("people")

    # Global temporary view is tied to a system preserved database `global_temp`
    spark.sql("SELECT * FROM global_temp.people").show()
    # +----+-------+
    # | age|   name|
    # +----+-------+
    # |null|Michael|
    # |  30|   Andy|
    # |  19| Justin|
    # +----+-------+

    # Global temporary view is cross-session
    spark.newSession().sql("SELECT * FROM global_temp.people").show()
    # +----+-------+
    # | age|   name|
    # +----+-------+
    # |null|Michael|
    # |  30|   Andy|
    # |  19| Justin|
    # +----+-------+
    # $example off:global_temp_view$
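The null values in the tables follow from the input file: `spark.read.json` by default expects one JSON object per line, and a record without an `age` field yields null in that column. The plain-Python sketch below mimics the untyped operations on such input without Spark. The file contents in `raw` are an assumption chosen to be consistent with the printed tables, and all variable names are illustrative.

```python
import json
from collections import Counter

# Assumed contents of people.json, consistent with the printed tables:
# one JSON object per line, and Michael has no "age" field.
raw = '{"name":"Michael"}\n{"name":"Andy", "age":30}\n{"name":"Justin", "age":19}'

records = [json.loads(line) for line in raw.splitlines()]

# A missing field becomes None here, which Spark displays as null.
ages = [r.get("age") for r in records]

# Like df.select(df['name'], df['age'] + 1): a null age stays null.
plus_one = [(r["name"], None if r.get("age") is None else r["age"] + 1) for r in records]

# Like df.filter(df['age'] > 21): rows with a null age do not pass the filter.
older = [r["name"] for r in records if r.get("age") is not None and r["age"] > 21]

# Like df.groupBy("age").count(): null forms its own group.
counts = Counter(ages)

print(plus_one)
print(older)
print(dict(counts))
```

The explicit `is not None` checks are needed in Python because comparing or adding to `None` raises a TypeError, whereas Spark SQL propagates null through arithmetic and treats a null comparison as not satisfied.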

def schema_inference_example(spark):
    # $example on:schema_inferring$
    sc = spark.sparkContext

    # Load a text file and convert each line to a Row.
    lines = sc.textFile("examples/src/main/resources/people.txt")
    parts = lines.map(lambda l: l.split(","))
    people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

    # Infer the schema, and register the DataFrame as a table.
    schemaPeople = spark.createDataFrame(people)
    schemaPeople.createOrReplaceTempView("people")

    # SQL can be run over DataFrames that have been registered as a table.
    teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age ...")
