Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What problems should be paid attention to when using Logistic regression analysis

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

What are the problems that should be paid attention to when using Logistic regression analysis? I believe many inexperienced people are at a loss about this. Therefore, this paper summarizes the causes and solutions of the problems. Through this article, I hope you can solve this problem.

Logistic regression is often used to analyze the relationship between binary dependent variables (such as survival and death, disease and non-disease, etc.) and multiple independent variables. A more common situation is to analyze the association between risk factors and the occurrence of a disease. For example, if we explore the risk factors of gastric cancer, we can choose two groups of people, one is gastric cancer group, the other is non-gastric cancer group, the two groups have different clinical manifestations and lifestyles, etc., the dependent variable is with or without gastric cancer, that is, "yes" or "no", it is a binary variable, and the independent variables include age, sex, eating habits, Helicobacter pylori infection and so on. Independent variables can be either continuous variables or classified variables. Through Logistic regression analysis, we can roughly understand the risk factors of gastric cancer.

Logistic regression and multiple linear regression have a lot in common, but the biggest difference lies in their dependent variables. The dependent variable of multiple linear regression is continuous variable; the dependent variable of Logistic regression is binary variable or multi-classified variable, but binary variable is more commonly used and easier to explain [1]. Although Logistic regression is widely used in the field of medical research, there are many problems in its application. Combined with the author's own experience, this paper will discuss the common problems of using Logistic regression.

The usage of Logistic regression

Generally speaking, Logistic regression has two major uses, the first is to find risk factors, such as the above example, to find out the risk factors related to gastric cancer, and the second is to predict, according to the established Logistic regression model, we can predict the probability of a disease or condition (including the establishment of risk score) under different independent variables.

Estimation of risk degree by Logistic regression

The so-called relative risk (risk ratio,RR) is the ratio used to describe the risk of disease (or other outcomes) in different states of a factor. The value of OR (odds ratio) given by Logistic regression is similar to the relative risk, which is often used to indicate the extent to which the risk of terminal events in another population exceeds or decreases relative to one population. If the risk of gastric cancer varies from sex to sex, the specific value of risk can be calculated by Logistic regression, for example, 1.7, which means that the risk of gastric cancer in men is 1.7 times higher than that in women.

Here, we should pay attention to the direction of the estimation. Taking women as a reference, the OR for men with gastric cancer is 1.7. Apart from the control group, the relative risk is meaningless.

One of the reasons why Logistic regression is widely used in medical research is that the model directly gives the OR value of clinical significance, which facilitates the interpretation and promotion of the results to a great extent.

Sample size problem

Usually, regression models need to be based on large samples. Before carrying out Logistic regression, should consider whether the current sample size is sufficient? For example, if you look at risk factors for gastric cancer, such as sex, age and eating habits, you need at least 90 cases of gastric cancer.

It is suggested that before Logistic regression, combined with the above two principles, the sample size of the model should be considered from the point of view of total samples and the number of events.

The form of independent variable in Logistic regression

The independent variable of Logistic regression can be either continuous variable or classified variable. The general principle is to try to consider which form is better from a practical or professional point of view. For example, age can be taken as a continuous variable, or 5 or 10 years old as a group, or even divided into two groups: the elderly and the young.

Different division methods determine the differences in the interpretation of the results. for example, in making the relationship between gastric cancer and age, if age is taken as a continuous variable, the risk is 1.008, which is explained as each one year of age. the risk of gastric cancer will be 0.008 times higher, this data will not seem to be of much clinical significance. However, if you take the 10-year-old group, the possible risk is 1.6, that is, for every 10-year-old increase in age, the risk of gastric cancer increases by 60%. The relative risk of this range is of more clinical significance. There is no fixed standard on how to divide the continuous variables. It is a common method to divide the continuous variables according to the quantile of statistics or the boundary value of clinical significance. It is suggested that in the analysis, we should first describe the trend, observe the relationship between specific independent variables and dependent variables, and then consider the clinical professional point of view and statistics, in order to obtain the most reasonable division.

Univariate analysis of Logistic regression

Theoretically, if the sample is large enough and there is no correlation between all the factors, it is best to put all the factors into the equation and analyze all possible confounding factors at the same time through the full model method. on this basis, significant variables can be screened by stepwise regression, in which case single factor analysis can not be done.

If the number of samples is limited, for example, there are only 80 patients, but there are 20 factors, in this case, it is best to use single factor analysis to eliminate variables that are neither statistically significant nor clinically significant, and only analyze meaningful variables. In univariate analysis, it is best to relax the P value, such as 0.1 or 0.15, to avoid missing some important factors (the interaction between variables may cause the results of multi-factors to be different from those of univariate analysis). Of course, we should also pay attention to carefully examine the degree of correlation between various factors, for highly relevant independent variables are generally not brought into the model at the same time, such as systolic blood pressure and diastolic blood pressure. Once it is found that there is a strong correlation between factors, it is suggested that we should first screen and select the most representative variables into the model.

After reading the above, have you mastered the methods of paying attention to the problems that should be paid attention to when using Logistic regression analysis? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report