2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces what knowledge you need to master to do data analysis. Many people are unsure what they need to learn before they can do this work, so the editor consulted a range of materials and compiled a simple, practical overview. I hope it helps answer the question. Let's go through it together.
1) Knowledge of statistics.
This is where many big data analysts fall short. We are not talking only about simple statistics here. You need to understand the mean, median, standard deviation, variance, probability, hypothesis testing, and so on, and be able to relate them to time, space, and the data itself. The level required is roughly that of advanced mathematics in a science or engineering degree, or slightly higher. You must be able to build models; otherwise your analysis will be far from the actual results, and you may find yourself out of a job within days. An ordinary big data analyst will not need very deep higher mathematics, but to become a good one you have to keep studying.
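As a rough illustration of the basic quantities listed above, here is a minimal Python sketch using only the standard library. The sample data, the variable names, and the hypothesized mean of 12 are all made up for the example; the last step computes a one-sample t statistic by hand, which is only one of many possible hypothesis tests.

```python
import math
import statistics

# Hypothetical sample: daily order counts over ten days (made-up numbers).
daily_orders = [12, 15, 11, 14, 13, 18, 12, 16, 15, 14]

mean = statistics.mean(daily_orders)
median = statistics.median(daily_orders)
variance = statistics.variance(daily_orders)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(daily_orders)      # sample standard deviation

# One-sample t statistic: how far is the sample mean from an assumed
# population mean, measured in units of the standard error?
hypothesized_mean = 12  # assumption chosen only for illustration
standard_error = std_dev / math.sqrt(len(daily_orders))
t_stat = (mean - hypothesized_mean) / standard_error

print("mean:", mean)
print("median:", median)
print("variance:", round(variance, 3))
print("std dev:", round(std_dev, 3))
print("t statistic:", round(t_stat, 3))
```

To turn the t statistic into a decision you would compare it with a critical value from the t distribution with n - 1 degrees of freedom, which a statistics library or a printed table can supply.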
2) Be familiar with Excel.
You do not need to master everything, but you should know the commonly used functions, including but not limited to SUM, COUNT, SUMIF, COUNTIF, FIND, IF, LEFT/RIGHT, and time conversion, as well as how to build the various kinds of charts. If the amount of data is not very large, Excel can solve many problems: for example, filtering out unwanted records, sorting, and selecting the data that meets given criteria.
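For readers who think more easily in code, the conditional functions mentioned above can be sketched in plain Python. This is not how Excel implements them; the helper names `sum_if` and `count_if` and the sales rows are hypothetical and exist only to show the logic of SUMIF, COUNTIF, filtering, and sorting.

```python
# Hypothetical sales rows as (region, amount) pairs; all values are made up.
sales = [
    ("north", 120),
    ("south", 80),
    ("north", 200),
    ("east", 50),
    ("south", 130),
]


def sum_if(rows, region):
    """Mimic SUMIF: add up the amounts of rows whose region matches."""
    return sum(amount for row_region, amount in rows if row_region == region)


def count_if(rows, region):
    """Mimic COUNTIF: count the rows whose region matches."""
    return sum(1 for row_region, _ in rows if row_region == region)


north_total = sum_if(sales, "north")
north_count = count_if(sales, "north")

# Filter and sort: keep rows with amount >= 100, largest amount first.
large_sales = sorted(
    (row for row in sales if row[1] >= 100),
    key=lambda row: row[1],
    reverse=True,
)

print(north_total, north_count)
print(large_sales)
```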
3) Practice in analytical thinking.
Examples include structured thinking, mind mapping (for instance with Baidu Naotu), and McKinsey-style analysis, along with frameworks such as SMART, 5W2H, and SWOT. You do not have to know all of them in depth, but you should be familiar with some.
4) Knowledge of databases.
Big data means a large amount of data. Excel cannot handle data at that scale, so we have to use a database. For relational databases such as Oracle, MySQL, and SQL Server, you need to learn to write SQL statements for filtering, sorting, summarizing, and so on. You should also learn about non-relational databases such as Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase, or at least one or two of the commonly used ones, such as HBase, MongoDB, and Redis.
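The SQL operations named above (filtering, sorting, summarizing) can be tried without installing a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the rows are hypothetical, and the same SELECT statement would need only minor changes, if any, on Oracle, MySQL, or SQL Server.

```python
import sqlite3

# In-memory SQLite database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [
        ("alice", 30.0),
        ("bob", 45.0),
        ("alice", 70.0),
        ("carol", 10.0),
        ("bob", 5.0),
    ],
)

# WHERE filters rows, GROUP BY summarizes per customer, ORDER BY sorts.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```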
5) Business knowledge.
For big data analysts, understanding the business is in fact more important than understanding the data. Knowing how the industry operates plays a very important role in data analysis. If you do not understand the business, the results of your analysis may not be what others need.
6) Development tools and environment.
For example: Linux, Hadoop (HDFS for storage, YARN for scheduling computation), Spark, or other middleware. Many programming languages are in common use for development today, such as Java and Python.
What data analysis software is available?
I. Excel
Excel is an important part of Microsoft Office. It can process all kinds of data, perform statistical analysis, and support decision-making, and it is widely used in management, statistics, finance, and many other fields. Its main features are:
1. Data functions
2. Statistical analysis
3. Charts
4. Advanced filtering
5. Automatic summarization
6. Advanced mathematical calculation
II. SAS
SAS is statistical analysis software whose development began at North Carolina State University in the United States in 1966. It is now produced by SAS Institute, one of the largest software companies in the world. SAS integrates data access, management, analysis, and presentation. Its main advantages are: powerful functionality, complete and up-to-date statistical methods, ease of use, flexible operation, and built-in online help.
III. R
R is a complete software system for data processing, calculation, and plotting.
Its main advantages are: a data storage and processing system, array operation tools (especially powerful for vector and matrix operations), complete and coherent statistical analysis tools, and excellent statistical plotting functions.
It also offers a simple yet powerful programming language that can handle data input and output and supports branches, loops, and user-defined functions.
R is less a statistics package than a mathematical computing environment, because it does not merely provide a set of statistical procedures that the user runs by specifying a data set and a few parameters.
R is free software, available for UNIX, Linux, macOS, and Windows, and every version can be downloaded and used at no cost. The R installer, various add-ons, and the documentation can be downloaded from the R home page. The installer includes only 8 basic modules; other external modules are available through CRAN.
IV. SPSS
SPSS is one of the earliest statistical analysis software packages in the world.
Its main advantages are: simple operation, convenient programming, powerful functionality, good data interfaces, modular composition, and well-targeted features.
V. Python
Python is an object-oriented, interpreted programming language. Its syntax is concise and clear, and it comes with rich and powerful libraries. It is often nicknamed a glue language because it can easily connect modules written in other languages (especially C/C++).
A common scenario is to use Python to quickly build a prototype of a program (sometimes even its final interface), and then rewrite the parts with special requirements in a more suitable language, such as the graphics rendering module in a 3D game. Where performance requirements are particularly high, that part can be rewritten in C/C++ and then wrapped as an extension library that Python can call. Note that you may need to consider platform issues when using extension libraries, since some do not provide cross-platform implementations.
Its main advantages are: simple, easy to learn, fast, high-level, portable, and interpreted.
This concludes the overview of what knowledge you need to master to do data analysis. I hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it out.
© 2024 shulou.com SLNews company. All rights reserved.