Updated 2025-04-11. From: SLTechnology News&Howtos (shulou), Database section.
Shulou(Shulou.com)05/31 Report--
How to build a MySQL data processing system with Python: this article walks through the problem and its solutions in detail, in the hope of helping anyone who faces it find a simpler, more practical approach.
I am a natural language processing / machine learning scientist, and I don't think it is possible to work successfully in this field without a good computer science background. Even the outstanding researchers I know may, for one reason or another, lack the skills needed to work on a real development team with a shared repository. So I still want to share these lessons; maybe someone will find them useful.
I hope you enjoy the story.
(1) Handling exceptions is very important.
When I first tried to connect to this particular MySQL database hosted in Google's cloud, I ran into many different errors, and setting up the proxy produced even more. The lesson is that in the first phase of development it is best to handle every error, especially connection-related ones, and to raise it when necessary rather than let it pass silently.
This sounds simple, but in my case a lot could go wrong: environment variables (including the UNIX socket name and the node environment name) could hold incorrect values, the database credentials could be wrong, and so on. I spent a few hours handling these cases, but it saved a lot of time later, and I'm glad I spent that time at this stage of the project.
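The checks described above might look roughly like the following sketch. The variable names (DB_USER, DB_UNIX_SOCKET and so on) and both exception classes are assumptions made for illustration; the article does not show the real ones. The driver's connect function is passed in as an argument so the sketch runs without a database; in a real project it could be something like mysql.connector.connect, and the broad except should be narrowed to the driver's own error class.

```python
import os


class DatabaseConfigError(Exception):
    """Raised when a required setting is missing or empty."""


class DatabaseConnectionError(Exception):
    """Raised when the driver fails to open a connection."""


# Hypothetical variable names; the article does not list the real ones.
REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_NAME", "DB_UNIX_SOCKET")


def load_db_config(env=os.environ):
    """Collect connection settings, failing early with a clear message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise DatabaseConfigError(
            "missing or empty variables: " + ", ".join(missing)
        )
    return {
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "database": env["DB_NAME"],
        "unix_socket": env["DB_UNIX_SOCKET"],
    }


def open_connection(connect, config):
    """Call the injected connect function and wrap any failure.

    The original exception is kept as the cause, so nothing is hidden.
    """
    try:
        return connect(**config)
    except Exception as exc:  # assumption: narrow to the driver's error class
        raise DatabaseConnectionError(
            f"could not connect to {config['database']!r} "
            f"via {config['unix_socket']!r}"
        ) from exc
```

Failing on configuration before touching the network separates "my environment is wrong" from "the server is unreachable", which are the two situations the section describes.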
(2) Appropriate abstract classes are priceless.
The most important thing to keep in mind about an abstract class is that defining it may take a lot of time and attention, and that you really do need it. The structure of my repository follows from the fact that I have to create many .csv files with very similar structure (unique keys). In practice I have many similar extractors, algorithms, data post-processors, and so on, all of which reduce to a few basic abstract classes, and that makes each new module easier to create.
By the time you write the nth module, you realize that most of your class is already done: the constructor and the methods you do not define in the new module are already implemented in the base class, so you don't need to worry about them.
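As a minimal sketch of this idea, assuming the modules emit CSV rows whose first column is the unique key: CsvExtractor and WordCountExtractor are hypothetical names, not classes from the author's repository.

```python
import csv
import io
from abc import ABC, abstractmethod


class CsvExtractor(ABC):
    """Hypothetical base class: shared CSV writing, per-module row logic."""

    # Column names; the first one is treated as the unique key.
    columns = ()

    def __init__(self, source):
        self.source = source

    @abstractmethod
    def extract(self):
        """Yield one dict per output row, keyed by `columns`."""

    def write(self, stream):
        """Write all rows to `stream`, rejecting duplicate keys."""
        key = self.columns[0]
        seen = set()
        writer = csv.DictWriter(
            stream, fieldnames=self.columns, lineterminator="\n"
        )
        writer.writeheader()
        for row in self.extract():
            if row[key] in seen:
                raise ValueError(f"duplicate key: {row[key]!r}")
            seen.add(row[key])
            writer.writerow(row)


class WordCountExtractor(CsvExtractor):
    """A new module only declares its columns and its row logic."""

    columns = ("doc_id", "word_count")

    def extract(self):
        for doc_id, text in self.source:
            yield {"doc_id": doc_id, "word_count": len(text.split())}


# Usage: the constructor and write() come from the base class.
buffer = io.StringIO()
WordCountExtractor([(1, "hello world"), (2, "one two three")]).write(buffer)
```

Because extract is abstract, instantiating CsvExtractor directly raises TypeError, so a module that forgets to implement it fails immediately rather than at write time.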
(3) A flexible repository structure is always best.
Sometimes it may look a little ugly (for example, a folder that holds a single file), but it pays off when a key module (such as the text preprocessor) needs to change and you only have to touch 1-2 files.
I'm not a software architect, so it's hard for me to weigh the pros and cons here, but I think it is always good for components to be small and independent. The repo I developed myself has a large number of small folders, and navigating them is easier (and perhaps prettier) than working with one monolithic structure.
(4) It is worth testing data science models.
I didn't have enough time to write a perfect test suite covering every case. I still mention it because, if the behavior of your ML/NLP model is not obvious, it is best to test it at least for yourself.
I don't have many NLP/ML algorithms (most of them are simple), but without even the simplest tests they would be hard to maintain. Testing also often helps you understand the model better: writing the assertion statements can make parts of the algorithm clearer when you want to refresh it in your mind.
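Even a very small test can pin down behavior that is otherwise easy to forget. The sketch below uses a hypothetical normalize function as a stand-in for a text preprocessor; it is not the author's code, and the properties tested are only examples.

```python
import re


def normalize(text):
    """Hypothetical preprocessor: lowercase, drop punctuation, squeeze spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def test_normalize():
    # Each assertion documents one property the rest of the pipeline relies on.
    assert normalize("Hello, World!") == "hello world"
    assert normalize("  many   spaces ") == "many spaces"
    assert normalize("") == ""
    # Idempotence: normalizing twice changes nothing.
    once = normalize("MySQL & Python: a story")
    assert normalize(once) == once


test_normalize()
```

Writing the idempotence check forces you to decide what the function guarantees, which is the kind of clarification the paragraph above refers to.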
(5) Make the database conform to third normal form.
This may sometimes be up for debate, but I find it impossible to write an effective data processing system unless all three normal forms fully hold for the database. Without them, non-obvious query problems keep appearing, and they can be almost impossible to track down.
Here is a short and simple guide to the SQL normal forms, which is worth reading a few times. (https://www.geeksforgeeks.org/database-normalization-normal-forms/)
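To make the third normal form concrete, here is a small sketch with an invented schema (articles and publishers are not the author's tables). It uses Python's built-in sqlite3 module only so that it runs without a server; the same design applies to MySQL, although the DDL may need small adjustments there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: publisher_city depends on publisher, not on the key.
    --   articles(article_id, title, publisher, publisher_city)
    -- In 3NF: the publisher facts move to their own table.
    CREATE TABLE publishers (
        publisher_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL UNIQUE,
        city         TEXT NOT NULL
    );
    CREATE TABLE articles (
        article_id   INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        publisher_id INTEGER NOT NULL REFERENCES publishers(publisher_id)
    );
""")

conn.execute("INSERT INTO publishers VALUES (1, 'Acme Press', 'Berlin')")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Intro to NLP", 1), (2, "SQL tips", 1)],
)

# The city is stored once, so a single UPDATE fixes it for every article.
conn.execute("UPDATE publishers SET city = 'Munich' WHERE publisher_id = 1")

rows = conn.execute("""
    SELECT a.title, p.city
    FROM articles AS a
    JOIN publishers AS p ON p.publisher_id = a.publisher_id
    ORDER BY a.article_id
""").fetchall()
```

In the unnormalized layout the same update would have to touch every article row, and missing one of them is exactly the kind of non-obvious inconsistency that later shows up as a confusing query result.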
(6) Log errors.
Once logging is in place, you will usually never read most of the warnings and errors collected over, say, three years. But some errors cannot be reproduced, and the logs help you understand what happened. I implemented logging on my local machine, and when something on the server wasn't working, I could save a few hours by looking at similar cases.
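A minimal logging setup with Python's standard logging module could look like this. build_logger and safe_ratio are hypothetical helpers written for illustration; the in-memory stream is used only to keep the sketch self-contained, and a real system would more likely attach a logging.FileHandler.

```python
import io
import logging


def build_logger(name, stream=None, level=logging.INFO):
    """Return a logger that writes timestamped records to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def safe_ratio(numerator, denominator, logger):
    """Hypothetical processing step that logs failures instead of hiding them."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # logger.exception records the message plus the full traceback.
        logger.exception("ratio failed for %r / %r", numerator, denominator)
        return None


# Usage: capture the log in memory and trigger one failure.
buffer = io.StringIO()
log = build_logger("pipeline_sketch", stream=buffer)
result = safe_ratio(1, 0, log)
```

The traceback in the record is what makes an unrepeatable failure diagnosable afterwards, since the inputs and the failing line are both preserved.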
(7) You don't need object-relational mapping (ORM) unless the database is very simple.
For a long time while working on this project, I worried that I would need to rewrite everything with an ORM. I was wrong.
Tools like SQLAlchemy and Peewee are a good fit for small, simple databases, but not for complex ones (a single query sometimes needs four GROUP BYs and five JOINs). They are elegant, and sometimes very simple and beautiful, but they cannot give you as much control as working directly with the connector API. I decided to use MySQL Connector, because writing everything through an ORM can make tricky things even more complicated.
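For illustration, this is roughly what a hand-written query issued through a DB-API cursor looks like. The schema and the prolific_authors helper are invented for this sketch, and sqlite3 stands in for the MySQL driver so the example runs anywhere. One known difference: MySQL Connector uses %s as the parameter placeholder where sqlite3 uses ?.

```python
import sqlite3

# Raw SQL keeps the joins, grouping, and filters fully visible.
QUERY = """
    SELECT a.name, COUNT(d.doc_id) AS n_docs, SUM(d.n_tokens) AS n_tokens
    FROM authors AS a
    JOIN documents AS d ON d.author_id = a.author_id
    WHERE d.lang = ?
    GROUP BY a.author_id, a.name
    HAVING COUNT(d.doc_id) >= ?
    ORDER BY n_tokens DESC
"""


def prolific_authors(conn, lang, min_docs):
    """Run the parameterized query and return all rows as tuples."""
    cursor = conn.cursor()
    cursor.execute(QUERY, (lang, min_docs))
    return cursor.fetchall()


# Usage with a tiny in-memory dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE documents (
        doc_id    INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(author_id),
        lang      TEXT NOT NULL,
        n_tokens  INTEGER NOT NULL
    );
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [(1, 1, "en", 100), (2, 1, "en", 50), (3, 2, "en", 500), (4, 2, "de", 70)],
)
english_regulars = prolific_authors(conn, "en", 2)
```

Passing values as parameters rather than formatting them into the string keeps the query safe from injection while leaving the SQL itself fully under your control.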
Conclusion
These notes have nothing to do with explaining ML/NLP algorithms or discussing their performance, but I still think they are useful. I wish I had known everything described above before I started this project, although I'm sure some of it only becomes clear after you spend time fixing bugs and tracking down real problems.
That is the answer to the question of how to build a MySQL data processing system with Python. I hope the content above is of some help to you. If you still have questions, you can follow the industry information channel to learn more.
© 2024 shulou.com SLNews company. All rights reserved.