Regarding a big data security architecture, we need to answer the following questions:
1. Which big data tools, or which big data ecosystem, should support the whole architecture?
2. What kinds of data do we need to collect in order to analyze the related security issues?
3. What AI algorithms should be used to support the analysis of signature-less threats?
4. What classification and presentation should be adopted to display the related security issues?
5. How should we respond to security problems so as to form a closed loop and improve our ability to resolve them?
Here is my understanding of the above questions (personal views only):
1. I recommend the big data ecosystem built around Spark. (The ELK stack is fine too, of course, but Elasticsearch itself has no MapReduce-like computation capability, and the single-node performance of Logstash is genuinely worrying.) Concretely: use Flume or a self-developed LogParser as the log normalization tool to break logs down into metadata, Kafka as the log transport, Spark RDDs for correlation analysis, SparkR or MLlib for machine learning, Mesos as the resource scheduler, ZooKeeper for high availability (if necessary), and HDFS for final processing and storage. A minimal sketch of the streaming part of this pipeline follows.
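As an illustration of this pipeline, here is a minimal PySpark Structured Streaming sketch: it reads already-normalized JSON events from a Kafka topic and runs a toy correlation (session counts per source IP per time window). The topic name, broker address, event fields, and the availability of the spark-sql-kafka connector are all assumptions for this sketch, not part of the original design.

```python
# Minimal sketch: Kafka -> Spark Structured Streaming -> windowed aggregation.
# Assumes logs were already normalized to JSON and published to the
# (illustrative) topic "security-logs"; requires the spark-sql-kafka package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("security-correlation").getOrCreate()

# Schema of one normalized log event (field names are assumptions).
schema = (StructType()
          .add("ts", TimestampType())
          .add("src_ip", StringType())
          .add("dst_ip", StringType())
          .add("proto", StringType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "security-logs")
       .load())

events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Toy correlation: sessions per source IP in 5-minute windows. A real
# deployment would join against threat intel or asset inventories here.
counts = (events
          .withWatermark("ts", "10 minutes")
          .groupBy(window(col("ts"), "5 minutes"), col("src_ip"))
          .count())

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```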
2. We need to collect PCAPs and parsed network metadata (mainly network session information, including application-protocol details such as DNS, HTTP/HTTPS, SMTP/IMAP/POP3, SSH, SSL/TLS, Telnet, etc., paying special attention to situations that may indicate bounce/reverse connections), plus logs from IDS/IPS, firewalls (actually not strictly necessary), the operating systems themselves (including port and process information), server databases / NoSQL stores, middleware, antivirus gateways and software (including sandbox execution results), and other applications (mainly user-facing or self-developed systems). It is also necessary to establish the mapping between users and their terminals. As for whether vulnerability scan results for the various systems need to be collected: probably not, because most attacks now use 0-days, so even if known vulnerabilities are swept out, collecting them may not be effective. A toy normalizer with a user-terminal mapping is sketched below.
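To make the collection step concrete, here is a toy log normalizer in the spirit of the self-developed LogParser mentioned in item 1: it parses one invented DNS log format into a flat metadata record and attaches the user behind the terminal. The log format, regex, field names, and the static mapping table are all illustrative assumptions; in practice the mapping would be fed from AD, DHCP, or NAC logs.

```python
# Toy normalizer for one kind of network metadata record (DNS), plus the
# user-to-terminal mapping discussed above. Everything here is illustrative.
import json
import re
from typing import Optional

# Example mapping from terminal IP to user; in practice built from
# authentication / DHCP logs rather than hard-coded.
TERMINAL_TO_USER = {"10.0.0.15": "alice", "10.0.0.23": "bob"}

# An invented single-line DNS log format: "<date> <time> <src_ip> query <name>"
DNS_LINE = re.compile(
    r"(?P<ts>\S+ \S+) (?P<src_ip>\d+\.\d+\.\d+\.\d+) query (?P<qname>\S+)"
)

def normalize_dns(line: str) -> Optional[dict]:
    """Parse a raw DNS log line into a flat metadata record, or None."""
    m = DNS_LINE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["proto"] = "dns"
    # Attach the user behind the terminal so later analysis can pivot on it.
    rec["user"] = TERMINAL_TO_USER.get(rec["src_ip"], "unknown")
    return rec

if __name__ == "__main__":
    sample = "2024-06-01 12:00:01 10.0.0.15 query evil.example.com"
    print(json.dumps(normalize_dns(sample)))
```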
3. In Spark, the MLlib algorithms fall into four main categories: regression, classification, clustering (unsupervised), and collaborative filtering. I think security problems will mainly involve classification and clustering, especially clustering (users cannot be expected to know too much up front, so unsupervised learning is best; we only need to separate the good from the bad). For example, you can profile a terminal's behavior: its network behavior (inbound/outbound traffic), its system behavior (port and service information), and its application behavior (which application systems it logged into and what actions it took, which requires collecting the relevant application logs, plus email sending and receiving, etc.). A minimal clustering sketch follows.
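Here is a minimal sketch of that clustering idea using Spark MLlib's KMeans on per-terminal behavior features. The feature names and values are invented for illustration; in practice they would be aggregated from the collected network, system, and application logs, and terminals far from any cluster center would be candidates for closer inspection.

```python
# Minimal MLlib clustering sketch: group terminals by behavioral features.
# Feature columns and sample values are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("terminal-clustering").getOrCreate()

# One row per terminal: bytes sent out, distinct ports used, failed logins.
df = spark.createDataFrame(
    [("10.0.0.15", 1.2e6, 4, 0),
     ("10.0.0.23", 0.9e6, 3, 1),
     ("10.0.0.99", 8.7e7, 45, 12)],  # behaves very differently
    ["terminal", "bytes_out", "distinct_ports", "failed_logins"])

features = VectorAssembler(
    inputCols=["bytes_out", "distinct_ports", "failed_logins"],
    outputCol="features").transform(df)

# Unsupervised: no labels needed, we only want to separate behavior groups.
model = KMeans(k=2, seed=42).fit(features)
model.transform(features).select("terminal", "prediction").show()
```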
4. As for what form to use to present security problems, there are too many options to enumerate here; the relevant Splunk examples are a good reference. Be careful not to make users do too much work, but for each reported problem it should still be possible to trace and export the original underlying data to facilitate forensics.
5. Ordinary security issues can of course be notified, escalated, and handled through tickets / SMS / email, but in an emergency they can be blocked directly through the application gateway, provided you have confirmed it is indeed a genuine security problem. (Why not block it with a firewall instead? Readers can think about that.) A sketch of such a dispatch routine follows.
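As a sketch of that closed loop: routine findings go out as email notifications, while confirmed critical ones trigger a block at the application gateway. The gateway URL and payload, the mail host, and the addresses are all hypothetical stand-ins, not a real API.

```python
# Sketch of the dispatch logic described above. The gateway endpoint,
# request payload, mail host, and addresses are hypothetical.
import smtplib
from email.message import EmailMessage

import requests  # assumes the third-party 'requests' package is installed

GATEWAY_BLOCK_URL = "https://gateway.example.internal/api/block"  # hypothetical

def respond(finding: dict) -> None:
    if finding.get("confirmed") and finding.get("severity") == "critical":
        # Only block when the finding is confirmed: a false positive at the
        # gateway is far more disruptive than a delayed ticket.
        requests.post(GATEWAY_BLOCK_URL,
                      json={"src_ip": finding["src_ip"]},
                      timeout=5)
    else:
        # Routine path: notify the on-call channel by email / ticket.
        msg = EmailMessage()
        msg["Subject"] = f"[SOC] {finding.get('title', 'security finding')}"
        msg["From"] = "soc@example.internal"
        msg["To"] = "oncall@example.internal"
        msg.set_content(str(finding))
        with smtplib.SMTP("mail.example.internal") as s:
            s.send_message(msg)
```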