2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces "What is Python machine learning". Many people run into difficulties when working through real cases, so let the editor walk you through how to deal with them. I hope you read it carefully and get something out of it!
What is machine learning?
Arthur Samuel coined the term "machine learning" in 1959. He was a pioneer in artificial intelligence and computer gaming, and he defined machine learning as "a field of study that gives computers the ability to learn without being explicitly programmed".
In short, machine learning is an application of artificial intelligence (AI) that enables programs (software) to learn from experience and improve themselves in accomplishing a task without explicit programming. For example, how would you write a program to identify a fruit based on its various attributes, such as color, shape, size, or any other attribute?
One way is to hard-code everything: write some rules and use them to identify the fruit. This may seem to be the only viable approach, but you can never write perfect rules that cover every situation. Machine learning solves this problem without hand-written rules, which makes it more robust and practical. You will see how machine learning accomplishes this task in the next section.
Therefore, we can say that machine learning makes machine behavior and decision-making more human-like by giving machines the ability to learn with minimal human intervention (that is, without explicit programming). Now the question arises: how can a program gain experience and learn from it? The answer is data. Data is the driving force of machine learning, and we can say for sure that there is no machine learning without data.
You may wonder why machine learning only became prominent in recent years when the term was coined back in 1959. The reason is that machine learning requires a lot of computing power, a lot of data, and devices that can store that much data. Only recently have all these requirements been met, which is why we can now practice machine learning.
How is it different from traditional programming?
Do you want to know the difference between machine learning and traditional programming? In traditional programming, we feed input data and a well-written, tested program into the machine to generate output. In machine learning, during the learning phase, we feed the machine the input data together with the output associated with it, and the machine works out the program for itself.
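The contrast can be sketched in a few lines. This is a minimal illustration rather than a real learning algorithm: the fruit weights, the labels, and the function names `rule_based_classify`, `learn_threshold` and `learned_classify` are all made up for the example. The first function is traditional programming, where a person writes the rule. The second derives the rule from input data paired with the expected output.

```python
# Hypothetical labeled examples: (weight in grams, fruit name).
examples = [(120, "lemon"), (130, "lemon"), (300, "grapefruit"), (320, "grapefruit")]


def rule_based_classify(weight):
    """Traditional programming: a person picked the cut-off of 200 g by hand."""
    return "lemon" if weight < 200 else "grapefruit"


def learn_threshold(data):
    """Learning phase: derive the cut-off from inputs and their known outputs.

    Assumes exactly two labels; the cut-off is the midpoint of the two class means.
    """
    by_label = {}
    for weight, label in data:
        by_label.setdefault(label, []).append(weight)
    means = {label: sum(ws) / len(ws) for label, ws in by_label.items()}
    light, heavy = sorted(means, key=means.get)
    return (means[light] + means[heavy]) / 2, light, heavy


threshold, light, heavy = learn_threshold(examples)


def learned_classify(weight):
    """Apply the rule that was learned from the data."""
    return light if weight < threshold else heavy


print(threshold)              # 217.5
print(learned_classify(250))  # grapefruit
```

If the examples change, the learned cut-off changes with them, while the hand-written rule has to be edited by a person.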
If you don't fully understand this, don't worry, you'll get a better understanding in the next section.
Why do we need machine learning?
Machine learning is getting a great deal of attention today. It can automate many tasks, especially those that until now only human beings could perform with their inherent intelligence. Machine learning is what allows this intelligence to be replicated in machines.
With the help of machine learning, enterprises can automate daily tasks. It also helps to automate and quickly create data analysis models. Industries rely on large amounts of data to optimize their operations and make informed decisions. Machine learning helps to create models that can process and analyze large amounts of complex data to provide accurate results. These models are accurate, scalable, and have a shorter turnaround time. By building such accurate machine learning models, enterprises can take advantage of profitable opportunities and avoid unknown risks.
Image recognition, text generation and many other use cases are finding applications in the real world. This widens the opportunities for machine learning experts, who have become sought-after professionals.
Machine learning today
In 2012, Alex Krizhevsky, Geoffrey Hinton and Ilya Sutskever published an influential research paper describing a model that significantly reduced the error rate of image recognition systems. Around the same time, Google's X Lab developed a machine learning algorithm that could browse YouTube videos on its own and identify those containing cats. In 2016, AlphaGo (created by Google DeepMind researchers to play the ancient Chinese game of Go) won four of its five games against Lee Sedol, one of the world's top Go players for more than a decade.
In 2020, OpenAI released GPT-3, at the time of its release one of the most powerful language models available. It can write creative fiction, generate functional code, write thoughtful business memos, and so on. Its possible use cases are limited mainly by our imagination.
The characteristics of machine learning
Automation: today, you have a spam folder in your Gmail account that contains all spam. You might wonder how Gmail knows that all these emails are spam? This is the job of machine learning.
It can recognize spam, so it is easy to automate this process. The ability to perform repetitive tasks automatically is one of the most important features of machine learning. A large number of organizations are already using machine learning-based paperwork and email automation.
For example, in the financial sector, a large number of repetitive, data-intensive and predictable tasks need to be performed. As a result, the industry uses different types of machine learning solutions to a large extent.
Improve customer experience: for any enterprise, providing a customized experience and better service is one of the key ways to increase engagement, enhance brand loyalty and build long-term customer relationships.
Machine learning can help us achieve both. Have you ever noticed that whenever you open a shopping site or see ads on the Internet, most of them relate to things you have searched for recently? This is because machine learning lets us build accurate recommendation systems, which help us customize the user experience.
As for service, most companies now offer a chatbot as the first point of contact, available around the clock (24 × 7). One example is Eva from AirAsia. These bots give intelligent answers, and sometimes you may not even notice that you are talking to a bot.
Automated data visualization: companies and individuals generate large amounts of data. Take companies like Google, Twitter and Facebook as an example. How much data do they produce every day? We can use this data to visualize significant relationships so that companies can make better decisions, which benefits both companies and customers.
With user-friendly automated data visualization platforms such as AutoViz, enterprises can gain a large number of new insights, thereby increasing the productivity of the process.
Business intelligence: when machine learning features are used in conjunction with big data analysis, it can help companies find solutions to problems that can help them grow and generate more profits.
From retail to financial services to health care, machine learning has become one of the most effective technologies to promote business operations.
What is the best language for machine learning?
Although many languages can be used for machine learning, in my opinion Python is the best programming language for machine learning applications, because of the benefits described below. Other programming languages that can be used for machine learning applications include R, C++, JavaScript, Java, C#, Julia, Shell, TypeScript, and Scala. R is also a very good introductory language for machine learning.
Compared with other programming languages, Python is known for its readability and relatively low complexity. Machine learning applications involve complex concepts such as calculus and linear algebra, which take a lot of effort and time to implement. Python lightens the burden by helping machine learning engineers validate ideas through rapid implementation. You can check out the Python tutorial to get a basic understanding of the language. Another benefit of using Python for machine learning is its pre-built libraries. As listed below, there are different packages for different types of applications:
NumPy, OpenCV and Scikit for working with images
NLTK, together with NumPy and Scikit, for processing text
Librosa for audio applications
Matplotlib, Seaborn and Scikit for data representation
TensorFlow and PyTorch for deep learning applications
SciPy for scientific computing
Django for integrating Web applications
pandas for high-level data structures and analysis
Python provides the flexibility to choose between object-oriented programming or scripting. There is no need to recompile the code. Developers can make any changes and view the results immediately. You can use Python with other languages to achieve the desired functionality and results.
Python is a general-purpose programming language that runs on any major platform, including Windows, macOS, Linux and Unix. When migrating from one platform to another, the code needs only minor changes before it can be used on the new platform.
The following is a summary of the benefits of using Python to solve machine learning problems:
Types of machine learning
Machine learning is roughly divided into three categories.
Supervised learning
Unsupervised learning
Reinforcement learning
What is supervised learning?
Let's start with a simple example: you are teaching a child the difference between a dog and a cat. What would you do?
You can show the child a dog and say, "this is a dog." When you meet a cat, you point out that it is a cat. Once you have shown the child enough cats and dogs, he or she may learn to tell them apart. A well-trained child may even be able to recognize breeds of dog never seen before.
Similarly, in supervised learning, we have two kinds of variables. One is the target variable, or label (the variable we want to predict), and the other is the features (the variables that help us predict the target variable).
We show the program (model) the features and the labels associated with those features, and the program then finds the underlying patterns in the data. Take the following dataset as an example, where we predict the price of a house based on its size, measured here as the number of rooms. The price, as the target variable, depends on the size, as the feature.
Number of rooms    Price
1                  $100
3                  $300
5                  $500
In a real dataset, we will have many more rows and more than one feature, such as size, location, number of floors, and so on.
Therefore, we can say that a supervised learning model has a set of input variables (x) and an output variable (y). An algorithm identifies the mapping function between the input and output variables. The relation is y = f(x).
The learning is called supervised because we already know the correct output, and the algorithm is corrected each time to improve its results. The algorithm is trained on the dataset and adjusted until it reaches an acceptable level of performance.
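To make y = f(x) concrete, here is a minimal sketch that learns the mapping from the three-row house dataset. It assumes a straight-line relationship and fits it with ordinary least squares written out by hand; the function names `fit_line` and `f` are illustrative only, and the article does not say which algorithm it has in mind at this point.

```python
# Data from the dataset above: number of rooms (feature x) and price in dollars (target y).
rooms = [1, 3, 5]
prices = [100, 300, 500]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rooms, prices)


def f(x):
    """The learned mapping y = f(x)."""
    return slope * x + intercept


print(slope, intercept)  # 100.0 0.0
print(f(4))              # 400.0, the predicted price of a 4-room house
```

The learned line lets the model price a 4-room house even though no such house appears in the training data.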
We can classify supervised learning problems as:
Regression problems: the model is trained on historical data to predict a numeric value. For example, predicting the future price of a house.
Classification problems: the algorithm is trained on labeled examples to assign items to a particular category. For example, dog or cat (as in the example above), apple or orange, beer or wine or water.
What is unsupervised learning?
In this method there is no target variable, only input variables (features). The algorithm learns by itself and discovers interesting structure in the data.
The aim is to uncover the underlying distribution of the data in order to learn more about it.
We can group unsupervised learning problems into:
Clustering: this means grouping inputs that have similar characteristics. For example, grouping users according to their search history.
Association: here, we find rules that describe meaningful associations within a dataset. For example, people who watch "X" will also watch "Y".
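Clustering, the first of the two problem types above, can be sketched with a simplified one-dimensional version of k-means. The data and the function name `k_means_1d` are hypothetical, and the way the starting centers are chosen is an assumption made to keep the example deterministic; the article does not prescribe a particular clustering algorithm here.

```python
def k_means_1d(values, k, iterations=10):
    """Group numbers into k clusters; a simplified k-means for a single feature.

    Assumption: centers start evenly spread between the smallest and largest
    value (requires k >= 2), which keeps the example deterministic.
    """
    low, high = min(values), max(values)
    centers = [low + i * (high - low) / (k - 1) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, centers


# Hypothetical data: number of searches per day for six users.
searches_per_day = [1, 2, 3, 10, 11, 12]
clusters, centers = k_means_1d(searches_per_day, k=2)
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
print(centers)   # [2.0, 11.0]
```

No labels were supplied; the two groups emerge from the input values alone.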
What is reinforcement learning?
In this method, machine learning models are trained to make a series of decisions based on the rewards and feedback they receive for their actions. The machine learns how to achieve a goal in complex and uncertain situations, and it is rewarded each time it achieves the goal during the learning period.
Reinforcement learning differs from supervised learning in that no correct answers are available, so the reinforcement agent itself decides which steps to take to perform the task. With no training dataset, the machine learns from its own experience.
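A very small sketch of this reward-driven loop is an epsilon-greedy agent choosing among a few actions. Everything here is an assumption made for illustration: the `REWARDS` table, the function name `run_agent`, the fixed (noise-free) rewards, and the choice of epsilon-greedy itself, which is only one of many reinforcement learning strategies.

```python
import random

# Hypothetical environment: three actions with fixed rewards unknown to the agent.
REWARDS = [1.0, 5.0, 2.0]


def run_agent(steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(REWARDS)
    estimates = [0.0] * n_actions  # the agent's current estimate of each action's reward
    counts = [0] * n_actions
    for step in range(steps):
        if step < n_actions:
            action = step  # try every action once before trusting the estimates
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = REWARDS[action]  # feedback from the environment
        counts[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_agent()
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action)  # 1
```

The agent is never told which action is best; it discovers that from the rewards it collects.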
Machine learning algorithms
This may be the most time-consuming and difficult part of your machine learning journey. There are many algorithms in machine learning, and you don't need to fully understand all of them to get started. But I suggest that once you start practicing machine learning, you learn the most popular algorithms, such as:
Linear regression
Logistic regression
Decision tree
Support vector machine
Naive Bayes
K-nearest neighbors
K-means
Random forest
Gradient boosting algorithms:
GBM
XGBoost
LightGBM
CatBoost
Here, I will briefly outline one of the simplest algorithms in machine learning, the K-nearest neighbor algorithm (a supervised learning algorithm), and show how to use it for regression and classification. I strongly recommend reading up on linear regression and logistic regression, because we are going to implement them and compare the results with the KNN (K-nearest neighbor) algorithm in the implementation section.
Note that there are usually separate algorithms for regression and classification problems. But with a small modification, the same algorithm can be used for both classification and regression, as shown below.
K nearest neighbor algorithm
KNN belongs to a group of lazy learners. In contrast to eager learners (such as logistic regression, SVMs, and neural networks), lazy learners simply store the training data in memory. In the training phase, KNN organizes the data (a process called indexing) so that it can find the nearest neighbors efficiently in the inference phase. Otherwise, it would have to compare each new case with the entire dataset at inference time, which is inefficient.
If you are wondering what the training phase, eager learners, and lazy learners are: the training phase is the time during which the algorithm learns from the data provided to it. For example, during the training phase the linear regression algorithm tries to find the best-fit line, which involves a lot of computation and therefore takes time; algorithms of this type are called eager learners. Lazy learners such as KNN, on the other hand, do little computation during training, so they train faster.
K-NN for classification problems
Now let's look at how to classify using K-NN. Here is a hypothetical dataset for predicting whether a person is male or female (label) based on height and weight (features).
Height (cm), feature    Weight (kg), feature    Gender (label)
187                     80                      male
165                     50                      female
199                     99                      male
145                     70                      female
180                     87                      male
178                     65                      female
187                     60                      male
Now let's draw these points:
Now, we are going to classify a new point with a height of 190 cm and a weight of 100 kg. This is how K-NN classifies it:
1. Select the value of K. The user picks what seems to be the best value of K after analyzing the data.
2. Measure the distance from the new point to the points in the dataset and find the K nearest ones. There are several ways to calculate this distance; the most commonly used are the Euclidean and Manhattan distances (for continuous features) and the Hamming distance (for categorical features).
3. Determine the categories of the K nearest points and label the new point accordingly. If most of the points closest to the new point belong to some class "a", the new point is predicted to belong to class "a" as well.
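The three distance measures named in the steps above can be written as short functions. This is a plain-Python sketch with illustrative function names; libraries such as SciPy provide their own versions.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute differences along each coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


print(euclidean((0, 0), (3, 4)))      # 5.0
print(manhattan((0, 0), (3, 4)))      # 7
print(hamming("karolin", "kathrin"))  # 3
```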
Now let's apply this algorithm to our own dataset. Let's first draw a new data point.
Now let's take k = 3, that is, we look at the three points closest to the new point:
All three of them are male, so the new point is classified as male:
Now let's take the value k = 5 and see what happens:
As we can see, four of the five points closest to the new data point are male and only one is female, so we again classify it as "male" based on the majority. When there are two classes, choose an odd value of K so that the vote cannot end in a tie.
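The classification just walked through can be reproduced with a few lines of Python. This is a minimal sketch, assuming Euclidean distance on the raw (unscaled) height and weight values; the function name `knn_classify` is illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# The dataset from the example above: (height in cm, weight in kg, gender).
people = [
    (187, 80, "male"), (165, 50, "female"), (199, 99, "male"), (145, 70, "female"),
    (180, 87, "male"), (178, 65, "female"), (187, 60, "male"),
]


def knn_classify(data, new_point, k):
    """Return the majority label among the k points closest to new_point."""
    by_distance = sorted(data, key=lambda row: math.dist(row[:2], new_point))
    nearest_labels = [row[2] for row in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_classify(people, (190, 100), k=3))  # male
print(knn_classify(people, (190, 100), k=5))  # male
```

In practice the features are often rescaled first so that one of them does not dominate the distance; that step is left out here to match the worked example.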
K-NN for regression problems
We have seen how to classify using K-NN. Now, let's take a look at what changes are needed to use it for regression. The algorithm is almost the same, with one difference. In classification, we took the majority vote among the nearest points. Here, we take the average of the nearest points and use it as the predicted value. Let's take the same example again, but this time we predict a person's weight (label) based on height (feature).
Height (cm), feature    Weight (kg), label
187                     80
165                     50
199                     99
145                     70
180                     87
178                     65
187                     60
Now suppose we have a new data point with a height of 160 cm, and we set K to 1, 2 and 4 in turn to predict its weight.
When K = 1: the closest point to 160 cm in our data is 165 cm, whose weight is 50, so the predicted weight is 50.
When K = 2: the two closest points are 165 and 145, with weights 50 and 70, respectively. Taking the average, the predicted weight is (50 + 70) / 2 = 60.
When K = 4: repeating the same process with the four closest points (165, 145, 178 and 180 cm, with weights 50, 70, 65 and 87), we get (50 + 70 + 65 + 87) / 4 = 68 as the predicted weight.
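The same procedure can be sketched in Python so that the averages can be recomputed from the height and weight data given above. The function name `knn_regress` is illustrative, and the distance is simply the absolute difference in height because there is only one feature.

```python
# The dataset from the example above: (height in cm, weight in kg).
people = [(187, 80), (165, 50), (199, 99), (145, 70), (180, 87), (178, 65), (187, 60)]


def knn_regress(data, height, k):
    """Predict weight as the mean weight of the k people closest in height."""
    by_distance = sorted(data, key=lambda row: abs(row[0] - height))
    nearest_weights = [weight for _, weight in by_distance[:k]]
    return sum(nearest_weights) / k


for k in (1, 2, 4):
    print(k, knn_regress(people, 160, k))
```

Running the loop prints the prediction for each value of K, which makes it easy to check a hand calculation.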
You might think this is really simple, and that there is nothing special about machine learning because it is just basic math. But keep in mind that this is one of the simplest algorithms, and as you move forward you will meet more complex ones.
Machine learning steps
I wish machine learning were just a matter of applying algorithms to data and getting predictions, but it's not that simple. There are several steps in machine learning that are necessary for each project.
1. Collecting data: this is probably the most important and time-consuming step. Here we need to collect data that can help us solve the problem. For example, if we want to predict the price of a house, we need an appropriate dataset that contains the information about past home sales, organized in a tabular structure. We will solve a similar problem in the implementation section.
2. Preparing the data: once we have the data, we need to bring it into the correct format. Preprocessing involves various steps, such as data cleaning. For example, if your dataset contains null or invalid values (such as a string where a number is expected), what do you do with them? There are many ways to handle this, but a simple one is to delete the rows that contain such values.
Similarly, a dataset may contain columns that have no effect on the result, such as id, and we delete these columns as well. We usually use data visualization to inspect the data through graphs and charts, and after analyzing them we decide which features are important. Data preprocessing is a huge topic.
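The two cleaning steps just described, dropping unusable rows and dropping uninformative columns, can be sketched with plain Python dictionaries. The records, the column names and the function name `clean` are hypothetical; in practice a library such as pandas is commonly used for this.

```python
# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"id": 1, "rooms": 1, "price": 100},
    {"id": 2, "rooms": 3, "price": None},
    {"id": 3, "rooms": "five", "price": 500},
    {"id": 4, "rooms": 5, "price": 500},
]


def clean(rows, drop_columns=("id",), numeric_columns=("rooms", "price")):
    """Drop uninformative columns and any row with a missing or non-numeric value."""
    cleaned = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key not in drop_columns}
        if all(isinstance(kept[col], (int, float)) for col in numeric_columns):
            cleaned.append(kept)
    return cleaned


print(clean(raw_rows))
# [{'rooms': 1, 'price': 100}, {'rooms': 5, 'price': 500}]
```

Deleting rows is the simplest option but it throws data away; filling in missing values is a common alternative when the dataset is small.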
3. Selecting the model: now our data is ready to be fed into a machine learning algorithm. You may be wondering what a model is. The terms "machine learning algorithm" and "machine learning model" are often used interchangeably, but strictly speaking the model is the output of a machine learning algorithm that has been run on the data.
To put it simply, when we run the algorithm on the data, the output we get contains all the rules, numbers, and any other algorithm-specific data structures needed to make predictions. For example, after performing a linear regression on the data, we get the equation of the best-fit line, and this equation is the model. If we do not want to tune the hyperparameters and are happy with the defaults, the next step is simply to train the model.
4. Hyperparameter tuning: hyperparameters are critical because they control the overall behavior of a machine learning model. The ultimate goal is to find the combination of hyperparameters that gives us the best results. But what are these hyperparameters? Remember the variable K in our K-NN algorithm.
When we set different K values, we get different results. The optimal value of K is not predefined and differs from one dataset to another. There is no way to know the best value of K in advance, but you can try different values and check which one gives the best results. K here is a hyperparameter. Each algorithm has its own hyperparameters, and we need to adjust their values to get the best results.
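Trying several values of K and keeping the best one can be sketched as follows. The scoring scheme, leave-one-out accuracy, is an assumption chosen because the example dataset is tiny; the function names are illustrative, and the data is the height, weight and gender dataset from the K-NN classification example.

```python
import math
from collections import Counter

# The height/weight/gender dataset from the K-NN classification example.
people = [
    (187, 80, "male"), (165, 50, "female"), (199, 99, "male"), (145, 70, "female"),
    (180, 87, "male"), (178, 65, "female"), (187, 60, "male"),
]


def knn_classify(data, new_point, k):
    """Return the majority label among the k points closest to new_point."""
    by_distance = sorted(data, key=lambda row: math.dist(row[:2], new_point))
    return Counter(row[2] for row in by_distance[:k]).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and count the hits."""
    hits = 0
    for i, (height, weight, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_classify(rest, (height, weight), k) == label
    return hits / len(data)


# Try each candidate K and keep the one with the highest accuracy.
scores = {k: leave_one_out_accuracy(people, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With only seven rows the scores are very coarse, so this shows the search procedure rather than a reliable choice of K.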
5. Evaluation: you may wonder how to measure the performance of the model, and what better way is there than to test the model on some data? This data is called test data, and it must not be a subset of the data (training data) on which we trained the algorithm.
The purpose of training a model is not to memorize all the values in the training dataset, but to identify the underlying patterns in the data and, based on these patterns, make predictions on data it has never seen before. There are many evaluation methods, such as K-fold cross-validation. We will discuss this step in more detail in the next section.
6. Prediction: now that our model performs well on the test set too, we can use it in the real world and hope that it performs well on real-world data.
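The idea behind K-fold cross-validation can be sketched by generating the index splits by hand. The function name `k_fold_indices` is illustrative, and this version does not shuffle the data first, which a real evaluation usually would; libraries such as scikit-learn ship their own, more complete implementation.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds take one extra sample so sizes differ by at most 1.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


for train, test in k_fold_indices(n_samples=7, n_folds=3):
    print("train:", train, "test:", test)
```

In every split the test indices are disjoint from the training indices, which is exactly the requirement stated in the evaluation step.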
The advantages of machine learning
1. Easily identify trends and patterns
Machine learning can sift through large amounts of data and discover trends and patterns that humans would miss. For example, an e-commerce site such as Amazon or Flipkart can learn from its users' browsing behavior and purchase history to show them the right products, deals and reminders. It uses the results to show them relevant ads.
2. Continuous improvement
We keep generating new data, and as this data is fed to a machine learning model, the model improves its performance and accuracy over time. We can say that this is like gaining experience: the models keep getting more accurate and efficient, which lets them make better decisions.
3. Dealing with multi-dimensional and multivariate data
Machine learning algorithms are good at dealing with multi-dimensional and multi-type data, and they can do this in dynamic or uncertain environments.
4. Extensive application
Whether you are an e-retailer or a health care provider, you can use machine learning. Where it applies, it can help provide customers with a more personal experience while also targeting the right customers.
This is the end of "What is Python machine learning". Thank you for reading. If you want to know more about the industry, you can follow the website, where the editor will publish more practical articles for you!
© 2024 shulou.com SLNews company. All rights reserved.