What are the common interview questions in machine learning? 04/19 Update SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article covers common machine learning interview questions. Many people are unsure what to expect from such questions, so the editor has consulted a range of materials and compiled short, easy-to-follow answers. Hopefully they clear up your doubts about common machine learning interview questions. Let's go through them together.

1. What are bias and variance?

The generalization error can be decomposed into the squared bias plus the variance plus the noise. Bias measures the gap between the learning algorithm's expected prediction and the true result, and characterizes the fitting ability of the algorithm itself. Variance measures how much the learned model's performance changes when the training set is replaced by another of the same size, and characterizes the effect of perturbations in the data. Noise is the lower bound on the expected generalization error that any learning algorithm can reach on the current task, and characterizes the difficulty of the problem itself. Generally, the more thoroughly a model is trained, the smaller the bias and the larger the variance, so the generalization error usually reaches its minimum somewhere in between. Large bias with small variance is generally called underfitting; small bias with large variance is called overfitting.
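For squared-error loss, the decomposition described above is usually written as follows. The notation is an assumption of this sketch rather than something the article defines: $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}_D$ is the model trained on data set $D$.

```latex
\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```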

2. Which models are solved with the EM algorithm, and why not Newton's method or gradient descent?

Models typically fitted with the EM algorithm include Gaussian mixture models (GMM) and collaborative filtering; k-means can also be viewed as a special case of EM. The EM algorithm is guaranteed to converge, but it may converge to a local optimum. Newton's method and gradient descent are not used because the likelihood sums over the hidden variables, and the number of terms in that sum grows exponentially with the number of hidden variables, which makes the gradient hard to compute.
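As an illustration of the alternating E-step and M-step, here is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture. The function names, the initialisation from the data extremes, and the synthetic data are all assumptions made for this example; it is not a production implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.

    Returns the weights, means, variances and the log-likelihood
    measured at the start of every iteration.
    """
    # Crude initialisation (an assumption of this sketch): start at the extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        log_likelihoods.append(log_lik)
        # M-step: closed-form weighted updates, no gradient required.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(variances[k], 1e-6)  # guard against a collapsing component
    return weights, means, variances, log_likelihoods


# Synthetic data: two clusters centred near 0 and 5 (illustrative values).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, trace = em_gmm_1d(data)
print("estimated means:", sorted(means))
```

The log-likelihood trace should never decrease from one iteration to the next, which is the convergence property mentioned above; a different initialisation can still end in a different local optimum.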

3. How do SVM, LR and decision trees compare?

Model complexity: SVM supports kernel functions and can handle both linear and nonlinear problems; LR is a simple model that trains quickly and suits linear problems; decision trees overfit easily and need pruning.

Loss function: SVM uses hinge loss; LR uses logistic (log) loss, commonly with L2 regularization; AdaBoost uses exponential loss.

Data sensitivity: a soft-margin SVM is insensitive to outliers because it depends only on the support vectors, but the features need to be normalized first; LR is sensitive to distant points.

Data volume: LR is used for large data volumes; an SVM with a nonlinear kernel is used when the data volume is small and there are few features.
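The three loss functions named in this comparison can be written as functions of the margin m = y * f(x), with labels y in {-1, +1}. This is a small sketch for comparing their shapes; the function names are chosen here for illustration.

```python
import math


def hinge_loss(margin):
    """SVM hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """Logistic (log) loss used by logistic regression."""
    return math.log(1.0 + math.exp(-margin))


def exponential_loss(margin):
    """Exponential loss used by AdaBoost."""
    return math.exp(-margin)


# Compare the penalties for a badly misclassified, a borderline,
# and two correctly classified points.
for m in (-2.0, 0.0, 1.0, 3.0):
    print(f"margin={m:+.1f}  hinge={hinge_loss(m):.3f}  "
          f"logistic={logistic_loss(m):.3f}  exponential={exponential_loss(m):.3f}")
```

For negative margins the exponential loss grows much faster than the hinge loss, which is one way to see why a method built on it reacts more strongly to outliers.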

4. Difference between GBDT and Random Forest

Random forest adopts the idea of bagging, also called bootstrap aggregation. It draws multiple sample sets from the training set by sampling with replacement, trains a base learner on each sample set, and then combines the base learners. On top of bagging, random forest introduces random attribute selection into decision tree training. A traditional decision tree selects the best attribute from the full attribute set of the current node, whereas random forest first randomly selects a subset of k attributes and then picks the best attribute from that subset. The parameter k controls the degree of randomness. GBDT, in contrast, is based on the idea of boosting: each iteration fits a new tree to the errors (the negative gradient of the loss) left by the trees built so far, so the trees are generated serially. Random forest follows the bagging idea, so its trees can be trained in parallel.
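The serial nature of boosting can be sketched with a toy gradient-boosting loop for squared error, where each one-split regression stump is fitted to the residuals left by the previous stumps. Everything here (the helper names `fit_stump` and `boost`, the learning rate, the toy data) is an assumption made for illustration, not the GBDT implementation of any particular library.

```python
def fit_stump(xs, targets):
    """Return the one-threshold regression stump with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each on the current residuals."""
    stumps = []
    predictions = [0.0] * len(xs)
    mse_trace = []
    for _ in range(n_rounds):
        # For squared error, the residual is the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
        mse_trace.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(xs))
    return stumps, mse_trace


# Toy regression problem: y = x^2 on a small grid.
xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]
stumps, mse_trace = boost(xs, ys)
print("training MSE after first and last round:", mse_trace[0], mse_trace[-1])
```

Each round needs the predictions of all earlier rounds, so the loop cannot be parallelised across trees. In bagging, each tree depends only on its own bootstrap sample, which is why the trees can be built independently.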

5. How does xgboost score features?

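During training, XGBoost picks the feature and split point at each node by the gain in its objective function rather than by the Gini index. Under the "weight" importance type, the more times a feature is chosen for a split, the higher its score; XGBoost also provides "gain" and "cover" importance types.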

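# Assumes `model` is an already-fitted XGBoost model (e.g. xgboost.XGBClassifier)
from matplotlib import pyplot
from xgboost import plot_importance

# Option 1: print the importance scores and plot them as a bar chart
print(model.feature_importances_)
pyplot.bar(range(len(model.feature_importances_)), model.feature_importances_)
pyplot.show()

# Option 2: use XGBoost's built-in importance plot
plot_importance(model)
pyplot.show()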

6. What is OOB? How is OOB calculated in Random Forest and what are its advantages and disadvantages?

In bagging, each bootstrap sample leaves out about 1/3 of the training samples (roughly 36.8%), so those samples take no part in building the corresponding decision tree. This left-out data is called out-of-bag (OOB) data, and it can stand in for a test set when estimating error. The OOB error is calculated as follows. For a random forest that has already been built, classify each training sample using only the trees for which it was out of bag. Suppose there are O out-of-bag samples in total; the forest then produces O classifications. Since the true classes of these O samples are known, compare them with the forest's predictions and count the misclassifications. If that count is X, the OOB error is X/O. This estimate has been shown to be unbiased, so the random forest algorithm does not need cross-validation or a separate test set to obtain an unbiased estimate of the test error.
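The "about 1/3" figure comes from the probability that a given sample is never picked in n draws with replacement, (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The short simulation below checks this; the helper name `oob_fraction` and the sample size are assumptions of the sketch.

```python
import random


def oob_fraction(n_samples, rng):
    """Draw one bootstrap sample and return the share of points never drawn."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


rng = random.Random(0)
n = 10_000
simulated = oob_fraction(n, rng)
theoretical = (1 - 1 / n) ** n
print("simulated OOB fraction:", simulated)
print("theoretical OOB fraction:", theoretical)
```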

7. What is machine learning?

Machine learning is a branch of computer science concerned with programming systems so that they learn and improve automatically from experience. For example, a program-controlled robot can carry out a series of tasks based on data collected from its sensors, learning its behavior automatically from that data.

8. The difference between machine learning and data mining

Machine learning refers to giving a computer the ability to learn without being explicitly programmed, and covers the study, design, and development of the algorithms that make this possible. Data mining is the process of extracting knowledge or previously unknown, interesting patterns from unstructured data. Machine learning algorithms are applied in this process.

9. What is overfitting in machine learning?

In machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. It is likely when a model is too complex, for example when it has too many parameters relative to the number of training observations. An overfit model performs poorly on new data.

10. Causes of overfitting

Overfitting becomes possible because the criterion used to train the model is not the same as the criterion used to judge its effectiveness: training optimizes performance on the training data, while the model is judged on data it has not seen.

11. How to avoid overfitting

Overfitting is likely when you do machine learning on small data sets, so using a larger data set helps avoid it. When you have to model with a small data set, you can use a technique called cross-validation. In this approach, the data set is divided into two parts: a training data set, whose data points are used to fit the model, and a test data set, which is used only to test the model.

In this technique, a model is trained on data whose labels are known (the training data set) and tested on data it has not seen. The idea behind cross-validation is to set aside a data set during the training phase to test the model.
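A common form of this idea is k-fold cross-validation, in which the data is cut into k parts and each part serves once as the test set while the rest is used for training. The sketch below only generates the index splits; the function name `k_fold_indices` is an assumption, and real code would usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so averaging the k test scores uses all of a small data set for evaluation without ever testing on data the model was trained on.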

12. What is inductive machine learning?

Inductive machine learning is learning by example: the system derives general rules from a set of observed examples.

13. What are the five popular algorithms for machine learning?

a. Decision trees

b. Neural networks (backpropagation)

c. Probabilistic networks

d. Nearest neighbor

e. Support vector machines

14. What are the different algorithmic techniques for machine learning?

The different types of algorithmic techniques in machine learning are:

Supervised learning

Unsupervised learning

Semi-supervised learning

Transduction (transductive inference)

Learning to learn

15. In machine learning, what are the three stages of building hypotheses or models?

a. Model building

b. Model testing

c. Applying the model

16. What are training and testing data sets?

In machine learning and related fields of information science, the set of data used to discover potentially predictive relationships is called the "training data set." The training data set consists of the examples presented to the learner, while the test data set is used to test the accuracy of the hypotheses the learner produces.

17. Please list the various methods of machine learning.

The various methods of machine learning are as follows:

Concept vs. classification learning

Symbolic vs. statistical learning

Inductive vs. analytical learning

18. What are the functions of unsupervised learning?

Find clusters in the data

Find low-dimensional representations of the data

Find interesting directions in the data

Find interesting coordinates and correlations

Find notable observations and clean up the data set

19. What are the functions of supervised learning?

Classification

Speech recognition

Regression

Time series prediction

String annotation

20. What is algorithm-independent machine learning?

Machine learning whose mathematical foundations are independent of any particular classifier or learning algorithm is called algorithm-independent machine learning.

More machine learning tutorials will follow. If these topics are relevant to you, keep an eye out for updates; hopefully these summaries are helpful, and readers with different opinions are welcome to leave a message.

That concludes this look at common machine learning interview questions. Theory is best absorbed together with practice, so go and try the examples yourself.
