In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly introduces "how to modify variable names in Python". In daily operation, I believe many people have doubts about how to modify variable names in Python. The editor consulted all kinds of data and sorted out simple and easy-to-use methods of operation. I hope it will be helpful for you to answer the doubts of "how to modify variable names in Python". Next, please follow the editor to study!
Please quickly state the functions of the following code:
For i in range (n): for j in range (m): for k in range (l): temp_value = X [I] [j] [k] * 12.5 new_ array [I] [j] [k] = temp_value+ 15
It's hard, right? It will be difficult to modify or debug this code unless you know what the author is thinking. Even the author himself will forget its purpose after writing this code for a few days, because variable names and "magic numbers" do not help remember the function of the code.
Examples like the above (or worse) are common when using data science code: the code contains variable names such as x, y, xs, x1, x2, tp, tn, clf, reg, xi, yi, ii, and many unnamed constant values. Frankly, data scientists (including myself) are not good at naming variables.
Many people have experienced the process from writing research-oriented data science code for one-time analysis to writing production-level code, so they have to abandon the practices gained from data science books, courses, and laboratories to improve their programming methods. There are many differences between practical machine learning code and the programming methods of data scientists, but this article will start with two more influential common questions:
Useless / confused / ambiguous variable names
Unnamed "magic" constant
Both of these problems lead to a disconnect between data science research (or Kaggle projects) and production machine learning systems. Yes, you can get away with these problems in Jupyter Notebook that runs code only once, but when the critical machine learning pipeline in a task needs to run accurately hundreds of times a day, it is necessary to write readable and understandable code. Fortunately, data scientists can adopt excellent practices in software engineering, which are also introduced in this article.
Note: this article focuses on Python because it is by far the most widely used language in industrial data science. In Python:
Variable name / function name is lowercase and separated by an underscore
The names of naming constants are all capitalized
The name of the class adopts the hump case naming rule.
Named variable
There are three basic principles to keep in mind when naming variables:
The variable name must describe the information represented by the variable. The variable name should be clearly defined to reflect what the variable represents.
The code will be read more times than written. So priority is given to the readability of the code over the speed of writing.
Only by using standard naming conventions can we make a global decision instead of making multiple local decisions.
What about in practice? Here are some improvements to the variable name:
X and y. If you read it many times, you will know that they are features and goals, but for other developers who read the code, this may not be clear. Instead, use names that describe the meaning of these variables, such as house_features and house_prices.
Value . What does value stand for? It can be velocity_mph, customers_served, efficiency, revenue_total. Names like value do not reflect the purpose of variables and are easily confused.
Temp . Even if you only use the variable as a temporary value store, give it a meaningful name. This may be the value used to convert units, so in this case, please specify:
# Don't do this temp = get_house_price_in_usd (house_sqft, house_room_count) final_value = temp * usd_to_aud_conversion_rate # Do this instead house_price_in_usd = get_house_price_in_usd (house_sqft, house_room_count) house_price_in_aud = house_price_in_usd * usd_to_aud_conversion_rate
If you use abbreviations such as usd, aud, mph, kwh, sqft, be sure to agree on commonly used abbreviations with the rest of the team in advance and keep a written record. Then, in the code review, ensure that these written standards are implemented.
Tp,tn,fp,fn: avoid specific machine learning abbreviations. These values represent true_positives, true_negative, false_positives, and false_negative, so their meaning is clearly stated. In addition to being difficult to understand, shorter variable names can also cause typing errors. When you want to enter tn, it is easier to write as tp, so please describe it in its entirety.
The above example shows that the readability of the code should be given priority over the speed of writing the code. It takes longer to read, understand, test, modify, and debug low-quality code than good code. In general, writing code faster by using shorter variable names actually increases the development time of the program! If you don't believe it, please take out the code you wrote 6 months ago and try to modify it. If you find it difficult to understand your code, it means you should follow better naming rules.
Xs and ys. These values are typically used for drawing, in which case the values represent x_cordinates and y_cordinates. However, these names can also be used for many other tasks, so confusion can be avoided by using specific names that describe the purpose of the variable, such as times and distances or temperatures and energy_in_kwh.
The cause of the bad variable name
Most of the problems with naming variables come from:
Attempt to shorten variable name
Directly convert the formula into code
On the first point, while languages like Fortran do limit the length of variable names (less than 6 characters), modern programming languages have no restrictions, so don't force yourself to use abbreviations. Don't use overly long variable names either, but if you have to make a choice, try to be readable.
On the second point, when writing equations or using models-- a point that schools forget to emphasize-- memorize letters or enter to represent actual values!
The following is an example of two mistakes made at the same time and how to correct them. Suppose there is a polynomial from the model that can calculate the price of a house. Developers may want to write mathematical formulas directly in code:
Temp = M1 * x1 + m2 * (x2 * 2) final = temp + b
This code looks like it was written by the machine for the machine. Although computers will eventually run this code, humans read it more times, so write code that can be understood by humans!
To do this, you don't need to think about the formula itself-- how to do it-- but what the real object of modeling is. Here is the complete equation (which is a good test of whether the reader understands the model):
House_price = price_per_room * rooms +\ price_per_floor_squared * (floors * * 2) house_pricehouse_price = house_price + expected_mean_house_price
If you have difficulty naming variables, it means that you don't know the model or code. Code is written to solve practical problems, so you need to understand the goals of model collection. Descriptive variable names help work at a higher level of abstraction than formulas and help developers focus on the problem itself.
Other considerations
The important thing when naming variables is the consistency count. Using consistent variable names can reduce naming time and increase problem solving time, especially when adding compound variable names.
1. Aggregation in variable name
Readers have learned the basic idea of using descriptive names, changing xs to distances, e to efficiency, and v to velocity. So what kind of variable name should be used to calculate the average speed? Is it average_velocity, velocity_mean or velocity_average? The following steps can solve this problem:
First, determine the common abbreviations: avg for average, max for maximum, std for standard deviation, and so on. Make sure that all members of the team agree and write this down.
Put the abbreviation at the end of the variable name. Put the most relevant information, that is, the entity described by the variable, at the beginning.
According to these rules, aggregate variables may be named velocity_avg, distance_avg, velocity_min, and distance_max. Article 2 may be chosen according to individual circumstances.
When a variable represents the number of projects, a thorny problem arises. If you want to use building_num, does it refer to the total number of buildings or an index value for a particular building? To avoid ambiguity, use building_count to represent the total number of buildings and building_index to represent specific buildings. This is also suitable for other problems, such as item_count and item_index. Item_count can also be replaced by item_total. This approach resolves ambiguity and maintains the consistency of adding the compound name to the end of the name.
two。 Circular index
Unfortunately, the typical loop variables have become I, j, and k. This may be the cause of the most errors and troubles in data science. Combining non-declarative variable names with nested loops (I've seen nested loops using ii, jj, or even iii) results in unreadable, error-prone code. This may be controversial, but the author never uses I or any other single letter as a loop variable, but chooses to describe the iteration, such as
For building_index in range (building_count):....
Or
For row_index inrange (row_count): for column_index inrange (column_count):....
This is particularly true for nested loops, where there is no need to remember whether I represents a row or column, or to be confused with j and k. More brainpower should be spent thinking about how to create the best model, rather than the specific order of array indexes.
(in Python, if you do not use loop variables, you should use the underscore "_" as the placeholder. In this way, you won't be confused about whether an index is used.)
3. Other naming methods that need to be avoided
Avoid using numbers in variable names
Avoid misspelled words
Avoid using ambiguous characters
Avoid using variable names with similar meanings
Avoid using abbreviations
Avoid variable names with similar pronunciation
The principle of giving priority to readability over convenience should be adhered to. The main purpose of programming is to communicate with other programmers, so give due consideration to team members.
Do not use magic numbers
Magic numbers are unnamed constants. It is often used for unit conversion, changing the time interval or increasing the subscript:
Final_value = unconverted_value * 1.61 final_quantity = quantity / 60 valuevalue_with_offset = value + 150 (these variable names are terrible! )
Magic numbers can lead to a lot of errors and confusion because:
Only the author knows the meaning of magic number.
To change the value of the magic number, you need to find all the locations where it appears, and then enter the new value manually.
You can define a function for conversion to replace the magic number. This function takes an unconverted value and a conversion rate as arguments.
Defconvert_usd_to_aud (price_in_usd, aud_to_usd_conversion_rate): price_in_aus = price_in_usd * usd_to_aud_conversion_rate
If you want to use the same conversion rate in many functions in a project, you can define a named constant somewhere.
USD_TO_AUD_CONVERSION_RATE = 1.61price_in_aud = price_in_usd * USD_TO_AUD_CONVERSION_RATE
Before you start writing this project, you need to agree with other team members that usd stands for US dollars and aud for Australian dollars. Remember the standard!
Here is another example:
# Conversion function approach def get_revolution_count (minutes_elapsed, revolutions_per_minute): revolution_count = minutes_elapsed * revolutions_per_minute # Named constant approach REVOLUTIONS_PER_MINUTE = 60 revolution_count = minutes_elapsed * REVOLUTIONS_PER_MINUT
Using naming constants defined somewhere makes it easier and more consistent to overwrite values. If the conversion rate changes, there is no need to search the entire code base to change its value each time it appears, because it is defined in only one place. This also tells the reader of the code the meaning of the constant. If the parameter name can reflect the content of the parameter, the function parameter is also a feasible solution.
An example of a magic number defect comes from a research project that the author engaged in when he was in college. This project needs to get energy data updated every 15 minutes. No one thought this number might change, so the team wrote a lot of functions that used the magic number 15 (or 96, the number of observations per day). These functions work well until you start getting data at intervals of 5 minutes and 1 minute. It took the entire team several weeks to modify the functions so that they could accept a time interval as a parameter. Even so, I encountered a lot of errors caused by the use of magic numbers for several months.
The real data often change, such as the exchange rate changing every minute. Forcing programming with specific values means that you may have to spend a lot of time rewriting code and fixing errors. There is no place for "magic" in programming, even in data science.
The importance of standards and agreements
The advantage of using standards is that they help developers simply make global decisions rather than many local decisions. Instead of choosing the location of the declaration each time you name the variables, make a decision at the beginning of the project, and then use the variables consistently throughout the project. The goal is to spend less time on non-core issues of data science such as naming, format, and style, and more time to solve important problems (such as using machine learning to study environmental changes).
Developers who are used to working alone may find it difficult to realize the advantages of adopting standards. However, even if you work alone, you can practice defining your own rules and use them consistently. Developers will be able to make fewer trivial decisions, and this will prepare for team development work in the future. Standards are necessary in any project that requires more than one person.
Readers may question some of the naming choices in this article, which is irrelevant. It is more important to adopt a consistent set of standards, rather than the maximum length of the space or variable name used in naming. The key is not to spend a lot of time on accidental problems, but to focus on solving inevitable problems.
At this point, the study on "how to change the variable name in Python" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.