This article gives a detailed walkthrough of how to implement a neural network purely in SQL query statements. Interested readers are welcome to follow along; I hope you find it helpful.
As we know, SQL is a database query language that makes it easy for developers to run efficient operations over large datasets. This article takes a different angle: it builds a simple three-layer fully connected network by nesting SQL query statements. Although the nesting is far too deep to compute efficiently, it is still a very interesting experiment.
In this article, we will implement a neural network with one hidden layer (with ReLU and softmax activation functions) purely in SQL. The training steps, forward propagation and back propagation, will be implemented in a single SQL query statement on BigQuery. When it runs on BigQuery, we are effectively training a distributed neural network across hundreds of servers. Sounds great, doesn't it?
In other words, this playful project tests the limits of SQL and BigQuery while looking at neural network training from a declarative-data perspective. The project has no practical application in mind, but I will touch on some of its research significance.
Let's start with a simple neural-network-based classifier. Its input size is 2 and its output has two classes. We will use a single hidden layer of dimension 2 with a ReLU activation function, and the binary classification in the output layer will use the softmax function. The implementation follows, in SQL, the Python example from Karpathy's CS231n guide (https://cs231n.github.io/neural-networks-case-study/).
Model
The model contains the following parameters:
Input to hidden layer:
W: 2 × 2 weight matrix (elements: w_00, w_01, w_10, w_11)
B: 2 × 1 bias vector (elements: b_0, b_1)
Hidden to output layer:
W2: 2 × 2 weight matrix (elements: w2_00, w2_01, w2_10, w2_11)
B2: 2 × 1 bias vector (elements: b2_0, b2_1)
The training data is stored in a BigQuery table, with input columns x1 and x2 and output column y (table name: example_project.example_dataset.example_table).
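If you want to follow along, a minimal sketch of such a table could be created as below. The sample values here are made up purely for illustration; the real table and its contents are whatever your own project provides.

CREATE TABLE `example_project.example_dataset.example_table` AS
SELECT 0.5 AS x1, -1.2 AS x2, 0 AS y UNION ALL
SELECT -0.3, 0.8, 1 UNION ALL
SELECT 1.1, 0.4, 0;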
As mentioned earlier, we implement the entire training as a single SQL query statement. After training completes, the parameter values are returned by that query. As you might guess, this will be a deeply nested query, and we will build it up step by step: we start with the innermost subquery and then add the nested outer layers one by one.
Forward propagation
First, we set the weight parameters W and W2 to random values drawn from a normal distribution, and set the bias vectors B and B2 to zero. The random values for W and W2 could be generated by SQL itself, but for simplicity we generate them externally and hard-code them into the SQL query. The innermost subquery, which initializes the parameters, is as follows:
SELECT *,
    -0.00569693 AS w_00, 0.00186517 AS w_01, 0.00414431 AS w_10, 0.0105101 AS w_11,
    0.0 AS b_0, 0.0 AS b_1,
    -0.01312284 AS w2_00, -0.01269512 AS w2_01, 0.00379152 AS w2_10, -0.01218354 AS w2_11,
    0.0 AS b2_0, 0.0 AS b2_1
FROM `example_project.example_dataset.example_table`
Notice that the table example_project.example_dataset.example_table already contains the columns x1, x2, and y. The model parameters are simply added as extra columns to the result of the query above.
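As noted, the initial random values could instead be drawn inside SQL. Here is a sketch of one way to do it, approximating normal samples with a Box-Muller transform over BigQuery's uniform RAND(); the 0.01 scale factor is my assumption, and this is not how the article generated its values:

SELECT t.*,
    r.w_00, r.w_01, r.w_10, r.w_11,
    0.0 AS b_0, 0.0 AS b_1,
    r.w2_00, r.w2_01, r.w2_10, r.w2_11,
    0.0 AS b2_0, 0.0 AS b2_1
FROM `example_project.example_dataset.example_table` t
CROSS JOIN (
    -- one row of Box-Muller normal draws; 1-RAND() avoids LN(0)
    SELECT
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w_00,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w_01,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w_10,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w_11,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w2_00,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w2_01,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w2_10,
        0.01*SQRT(-2*LN(1-RAND()))*COS(2*ACOS(-1)*RAND()) AS w2_11
) r

The single-row subquery is evaluated once and cross-joined onto every training row, so all rows share the same draw; computing RAND() directly per row would give every example its own "weights" and break training.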
Next, we will calculate the activations of the hidden layer, represented by a vector D with elements d0 and d1. We need to perform the matrix operation D = np.maximum(0, np.dot(X, W) + B), where X is the input vector (elements x1 and x2): multiply the input X by the weights W, add the bias vector B, and pass the result through the nonlinear ReLU activation, which clamps negative values to 0. The equivalent query in SQL is:
SELECT *,
    (CASE WHEN ((x1*w_00 + x2*w_10) + b_0) > 0.0 THEN ((x1*w_00 + x2*w_10) + b_0) ELSE 0.0 END) AS d0,
    (CASE WHEN ((x1*w_01 + x2*w_11) + b_1) > 0.0 THEN ((x1*w_01 + x2*w_11) + b_1) ELSE 0.0 END) AS d1
FROM {inner subquery}
The above query adds two new columns, d0 and d1, to the result of the previous subquery.
This completes the transformation from the input layer to the hidden layer. Now we can perform the transition from the hidden layer to the output layer.
First, we will calculate the values of the output layer, using the formula scores = np.dot(D, W2) + B2. Then we will apply the softmax function to the scores to obtain the predicted probability of each class. The equivalent subquery in SQL is as follows:
SELECT *,
    EXP(scores_0) / (EXP(scores_0) + EXP(scores_1)) AS probs_0,
    EXP(scores_1) / (EXP(scores_0) + EXP(scores_1)) AS probs_1
FROM (
    SELECT *,
        ((d0*w2_00 + d1*w2_10) + b2_0) AS scores_0,
        ((d0*w2_01 + d1*w2_11) + b2_1) AS scores_1
    FROM {inner subquery})
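One practical caveat, which is an aside rather than part of the original walkthrough: EXP can overflow for large score values. The standard remedy is to subtract the per-row maximum score before exponentiating, which leaves the softmax probabilities unchanged. A sketch:

SELECT *,
    EXP(scores_0 - m) / (EXP(scores_0 - m) + EXP(scores_1 - m)) AS probs_0,
    EXP(scores_1 - m) / (EXP(scores_0 - m) + EXP(scores_1 - m)) AS probs_1
FROM (
    -- m is the row-wise maximum score, used only for numerical stability
    SELECT *, GREATEST(scores_0, scores_1) AS m
    FROM {inner subquery})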
Next, we will use the cross-entropy loss function to compute the total loss of the current predictions. For each sample, we take the negative logarithm of the predicted probability of its correct class; the cross-entropy loss is simply the average of these values over all (X, y) examples. Since the natural logarithm is an increasing function, defining the loss as the negative log-probability of the correct class is intuitive: if the correct class is predicted with high probability, the loss is low; conversely, if it is predicted with low probability, the loss is high.
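To make this concrete: a confident correct prediction with probability 0.9 contributes -log(0.9) ≈ 0.105 to the loss, while a poor one with probability 0.1 contributes -log(0.1) ≈ 2.303. You can verify this directly (in BigQuery, LOG with a single argument is the natural logarithm):

SELECT -1*LOG(0.9) AS loss_good_prediction,  -- ≈ 0.105
       -1*LOG(0.1) AS loss_poor_prediction   -- ≈ 2.303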
To reduce the risk of overfitting, we will also add L2 regularization: the overall loss includes the term 0.5*reg*np.sum(W*W) + 0.5*reg*np.sum(W2*W2), where reg is a hyperparameter. Including this term in the loss penalizes large values in the weight matrices.
In the query, we will also compute the number of training samples (num_examples), which we will need later for averaging. The SQL statement that computes the overall loss is as follows:
SELECT *,
    (sum_correct_logprobs/num_examples) + 1e-3*(0.5*(w_00*w_00 + w_01*w_01 + w_10*w_10 + w_11*w_11) + 0.5*(w2_00*w2_00 + w2_01*w2_01 + w2_10*w2_10 + w2_11*w2_11)) AS loss
FROM (
    SELECT *,
        SUM(correct_logprobs) OVER () sum_correct_logprobs,
        COUNT(1) OVER () num_examples
    FROM (
        SELECT *,
            (CASE WHEN y = 0 THEN -1*LOG(probs_0) ELSE -1*LOG(probs_1) END) AS correct_logprobs
        FROM {inner subquery}))
Back propagation
Next, for back propagation, we will calculate the partial derivative of the loss with respect to each parameter, applying the chain rule layer by layer starting from the last layer. First, we compute the gradient of the scores by differentiating the cross-entropy loss through the softmax function: it works out to probs - 1 for the correct class and probs for the other class, divided by num_examples. The corresponding query is:
SELECT *,
    (CASE WHEN y = 0 THEN (probs_0 - 1)/num_examples ELSE probs_0/num_examples END) AS dscores_0,
    (CASE WHEN y = 1 THEN (probs_1 - 1)/num_examples ELSE probs_1/num_examples END) AS dscores_1
FROM {inner subquery}
In the above, we calculated the scores as scores = np.dot(D, W2) + B2. Based on the score gradients (dscores), we can therefore compute the gradients of the hidden layer D and of the parameters W2 and B2. For the hidden layer, the gradient is masked by the ReLU: it is passed back only where the hidden activation was positive. The corresponding query statement is:
SELECT *,
    SUM(d0*dscores_0) OVER () AS dw2_00,
    SUM(d0*dscores_1) OVER () AS dw2_01,
    SUM(d1*dscores_0) OVER () AS dw2_10,
    SUM(d1*dscores_1) OVER () AS dw2_11,
    SUM(dscores_0) OVER () AS db2_0,
    SUM(dscores_1) OVER () AS db2_1,
    CASE WHEN (d0) <= 0.0 THEN 0.0 ELSE (dscores_0*w2_00 + dscores_1*w2_01) END AS dhidden_0,
    CASE WHEN (d1) <= 0.0 THEN 0.0 ELSE (dscores_0*w2_10 + dscores_1*w2_11) END AS dhidden_1
FROM {inner subquery}
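The walkthrough above stops at the hidden-layer gradients. For completeness, here is a sketch of how the remaining steps could be written in the same style. The step size of 1.0 follows the CS231n example and the reg value 1e-3 matches the loss query above, but the exact form of this query is my own reconstruction, not part of the original article:

SELECT *,
    -- one gradient-descent step: param = param - step_size * gradient
    -- (the regularization gradient reg*w is added to dw, as in CS231n)
    w_00 - 1.0*(dw_00 + 1e-3*w_00) AS w_00_new,
    w_01 - 1.0*(dw_01 + 1e-3*w_01) AS w_01_new,
    b_0 - 1.0*db_0 AS b_0_new,
    w2_00 - 1.0*(dw2_00 + 1e-3*w2_00) AS w2_00_new,
    b2_0 - 1.0*db2_0 AS b2_0_new
    -- ...and likewise for the remaining parameters
FROM (
    SELECT *,
        -- dW = X^T . dhidden and dB = column sums of dhidden,
        -- accumulated over all training rows with window functions
        SUM(x1*dhidden_0) OVER () AS dw_00,
        SUM(x1*dhidden_1) OVER () AS dw_01,
        SUM(x2*dhidden_0) OVER () AS dw_10,
        SUM(x2*dhidden_1) OVER () AS dw_11,
        SUM(dhidden_0) OVER () AS db_0,
        SUM(dhidden_1) OVER () AS db_1
    FROM {inner subquery})

Each query of this shape performs one gradient-descent iteration; training for multiple iterations means nesting the whole pipeline repeatedly, which is exactly why the final statement becomes too deep to compute efficiently, as noted at the start.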