How does SARIF meet deeper requirements in practical use?

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail how SARIF meets deeper requirements in practical use. We hope it gives you a working understanding of the topic.

Abstract: To reduce the cost and complexity of aggregating the results of different analysis tools into a common workflow, the industry has begun to adopt the Static Analysis Results Interchange Format (SARIF).

1. Introduction

DevSecOps has become an important model for building security into enterprise-level R&D. Static scanning tools are integrated into the DevSecOps development process and play an important role in raising the overall security level of the product. To maximize the coverage of their security checks, development teams usually introduce several security scanning tools, which in turn creates more work for developers and platforms. To reduce the cost and complexity of aggregating the results of different analysis tools into a common workflow, the industry has begun to adopt the Static Analysis Results Interchange Format (SARIF). This article describes how SARIF meets deeper requirements in practical use.

2. Advanced SARIF

The previous article covered basic uses of SARIF. This one looks at SARIF in more complex scenarios, with the aim of providing a complete reporting solution for static scanning tools.

The 2021.03 release of Coverity, a well-known static analysis tool, added support for displaying Coverity scan results in SARIF format in GitHub code repositories. Coverity, too, has therefore adapted to the SARIF format.

2.1. Use of metadata

To keep scan reports from growing too large, information that is reused should be extracted as metadata: for example rules, rule messages, and the content that was scanned.

In the following example, the rule and its message strings are defined in tool.driver.rules, and each scan result (results) refers to the rule by its ruleId to obtain the rule information. Likewise, the result's message uses message.id to look up the message text. In this way, results that raise the same alarm do not repeat the same information, which effectively reduces the size of the report.
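The lookup described above can be sketched in a few lines of Python. This is only an illustration of how a report consumer might follow ruleIndex (or ruleId) and message.id back into the rule metadata; the helper name resolve_message is hypothetical, and the report is a cut-down dictionary rather than a file loaded from disk.

```python
# Cut-down report with the same structure as described above (values are illustrative).
report = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "CodeScanner", "rules": [{
            "id": "CS0001",
            "messageStrings": {"default": {"text": "This is the message text."}},
        }]}},
        "results": [
            {"ruleId": "CS0001", "ruleIndex": 0, "message": {"id": "default"}},
        ],
    }],
}


def resolve_message(run, result):
    """Return the message text of a result (hypothetical helper).

    Inline text is returned directly; otherwise message.id is looked up
    in the messageStrings of the rule the result refers to.
    """
    message = result["message"]
    if "text" in message:
        return message["text"]
    rules = run["tool"]["driver"].get("rules", [])
    index = result.get("ruleIndex", -1)
    if 0 <= index < len(rules):
        rule = rules[index]
    else:
        # No usable index: fall back to a search by rule id.
        rule = next(r for r in rules if r["id"] == result["ruleId"])
    return rule["messageStrings"][message["id"]]["text"]


run = report["runs"][0]
texts = [resolve_message(run, r) for r in run["results"]]
print(texts)
```

Because the text is stored once per rule, a report with thousands of results for CS0001 carries the message string only once.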

The example, as displayed in VS Code, is as follows:

{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "CodeScanner",
          "rules": [
            {
              "id": "CS0001",
              "messageStrings": {
                "default": {
                  "text": "This is the message text. It might be very long."
                }
              }
            }
          ]
        }
      },
      "results": [
        {
          "ruleId": "CS0001",
          "ruleIndex": 0,
          "message": { "id": "default" }
        }
      ]
    }
  ]
}

2.2. Use of message parameters

The alarm for a scan result often needs to name the specific variable or function involved, so that the user can understand the problem. In that case, the variable part of the defect message can be supplied through message parameters.
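As a rough sketch of the idea, the snippet below fills positional placeholders such as "{0}" in a message template with the values of an arguments list. The function name format_message is hypothetical, and the sketch deliberately ignores literal braces in the template.

```python
import re


def format_message(template, arguments):
    """Replace positional placeholders such as {0} with entries of arguments."""
    def substitute(match):
        # The captured digits are the index into the arguments list.
        return str(arguments[int(match.group(1))])

    return re.sub(r"\{(\d+)\}", substitute, template)


template = "Variable '{0}' was used without being initialized."
message = format_message(template, ["x"])
print(message)
```

The rule keeps one template, and each result only carries the short argument list that differs between alarms.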

In the following example, the rule's message provides a template with a placeholder ("{0}"), and the scan result (results) provides the corresponding values through the arguments array. It appears in VS Code as follows:

{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "CodeScanner",
          "rules": [
            {
              "id": "CS0001",
              "messageStrings": {
                "default": {
                  "text": "Variable '{0}' was used without being initialized."
                }
              }
            }
          ]
        }
      },
      "results": [
        {
          "ruleId": "CS0001",
          "ruleIndex": 0,
          "message": {
            "id": "default",
            "arguments": [ "x" ]
          }
        }
      ]
    }
  ]
}

2.3. Use of associated information in messages

In some cases, explaining the cause of an alarm requires giving users additional reference information: for example, where a variable is defined, where tainted data is introduced, or other supporting details.

In the following example, the location where the tainted data was introduced is given by defining a related location (relatedLocations) for the location where the problem occurs (locations). VS Code displays the message with a link: when the user clicks "here", the tool jumps to the location where the variable expr was introduced.
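How a viewer might resolve such a link can be sketched as follows. The snippet assumes the link syntax used in this example, "[text](id)" with a numeric id that matches relatedLocations[].id; the helper name link_targets and the regular expression are illustrative assumptions, not part of any SARIF library.

```python
import re

# A single result shaped like the one in this section.
result = {
    "message": {
        "text": "Use of tainted variable 'expr' (which entered the system "
                "[here](1)) in the insecure function 'eval'."
    },
    "relatedLocations": [{
        "id": 1,
        "message": {"text": "The tainted data entered the system here."},
        "physicalLocation": {
            "artifactLocation": {"uri": "3-Beyond-basics/bad-eval.py"},
            "region": {"startLine": 3},
        },
    }],
}

# Matches "[link text](123)"; only numeric targets are treated as location ids.
LINK = re.compile(r"\[([^\]]+)\]\((\d+)\)")


def link_targets(result):
    """Map each embedded link in the message to (link text, uri, start line)."""
    by_id = {loc["id"]: loc for loc in result.get("relatedLocations", [])}
    targets = []
    for text, loc_id in LINK.findall(result["message"]["text"]):
        physical = by_id[int(loc_id)]["physicalLocation"]
        targets.append((
            text,
            physical["artifactLocation"]["uri"],
            physical["region"]["startLine"],
        ))
    return targets


targets = link_targets(result)
print(targets)
```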

{
  "ruleId": "PY2335",
  "message": {
    "text": "Use of tainted variable 'expr' (which entered the system [here](1)) in the insecure function 'eval'."
  },
  "locations": [
    {
      "physicalLocation": {
        "artifactLocation": { "uri": "3-Beyond-basics/bad-eval.py" },
        "region": { "startLine": 4 }
      }
    }
  ],
  "relatedLocations": [
    {
      "id": 1,
      "message": { "text": "The tainted data entered the system here." },
      "physicalLocation": {
        "artifactLocation": { "uri": "3-Beyond-basics/bad-eval.py" },
        "region": { "startLine": 3 }
      }
    }
  ]
}

2.4. Use of defect classification information

Defect classification matters both to the tools and to the analysis of scan results. Tools can organize their rules by defect category, which makes it easier for users to select the rules they need; and when users review an analysis report, they can filter the results quickly by category. Tools can follow an industry standard, such as the widely used Common Weakness Enumeration (CWE), or define their own categories. SARIF supports both.

Example of defect classification

{
  "version": "2.1.0",
  "runs": [
    {
      "taxonomies": [
        {
          "name": "CWE",
          "version": "3.2",
          "releaseDateUtc": "2019-01-03",
          "guid": "A9282C88-F1FE-4A01-8137-E8D2A037AB82",
          "informationUri": "https://cwe.mitre.org/data/published/cwe_v3.2.pdf/",
          "downloadUri": "https://cwe.mitre.org/data/xml/cwec_v3.2.xml.zip",
          "organization": "MITRE",
          "shortDescription": {
            "text": "The MITRE Common Weakness Enumeration"
          },
          "taxa": [
            {
              "id": "401",
              "guid": "10F28368-3A92-4396-A318-75B9743282F6",
              "name": "Memory Leak",
              "shortDescription": {
                "text": "Missing Release of Memory After Effective Lifetime"
              },
              "defaultConfiguration": { "level": "warning" }
            }
          ],
          "isComprehensive": false
        }
      ],
      "tool": {
        "driver": {
          "name": "CodeScanner",
          "supportedTaxonomies": [
            {
              "name": "CWE",
              "guid": "A9282C88-F1FE-4A01-8137-E8D2A037AB82"
            }
          ],
          "rules": [
            {
              "id": "CA2101",
              "shortDescription": {
                "text": "Failed to release dynamic memory."
              },
              "relationships": [
                {
                  "target": {
                    "id": "401",
                    "guid": "10F28368-3A92-4396-A318-75B9743282F6",
                    "toolComponent": {
                      "name": "CWE",
                      "guid": "A9282C88-F1FE-4A01-8137-E8D2A037AB82"
                    }
                  },
                  "kinds": [ "superset" ]
                }
              ]
            }
          ]
        }
      },
      "results": [
        {
          "ruleId": "CA2101",
          "message": {
            "text": "Memory allocated in variable 'p' was not released."
          },
          "taxa": [
            {
              "id": "401",
              "guid": "10F28368-3A92-4396-A318-75B9743282F6",
              "toolComponent": {
                "name": "CWE",
                "guid": "A9282C88-F1FE-4A01-8137-E8D2A037AB82"
              }
            }
          ]
        }
      ]
    }
  ]
}

2.4.1. Introduction of industry classification standards (runs.taxonomies)

Definition of taxonomies

"taxonomies": {
  "description": "An array of toolComponent objects relevant to a taxonomy in which results are categorized.",
  "type": "array",
  "minItems": 0,
  "uniqueItems": true,
  "default": [],
  "items": { "$ref": "#/definitions/toolComponent" }
}

The taxonomies node is an array, so multiple classification standards can be defined. Its items refer to the toolComponent definition under definitions, the same definition used for the tool's scanning engine (tool.driver) and its extensions (tool.extensions). The reason for this design is the strong correlation between the engine and the results; sharing one definition keeps their attributes consistent.

Definition of an industry-standard classification (standard taxonomy)

In the example, the industry classification standard CWE is declared through the runs.taxonomies node, and the standard is described through the properties of the taxonomy object. The list below covers only the properties used in the example; see the SARIF specification for the full description:

name: name of the standard

version: version

releaseDateUtc: release date

guid: unique identifier, so that the standard can be referenced elsewhere

informationUri: URI of the standard's documentation

downloadUri: download address

organization: publishing organization

shortDescription: a short description of the standard

2.4.2. Introduction of custom classification (runs.taxonomies.taxa)

taxa is an array node. To reduce the size of the report, it is not necessary to list every category of the taxonomy under taxa; listing the categories relevant to this scan is enough. This is why isComprehensive, which follows taxa in the example, is false (its default value).

In the example, the taxa node introduces one category the tool needs, CWE-401 (memory leak), and identifies it uniquely with guid and id, so that the tool can later refer to it from rules or defects.

2.4.3. Tool association with industry classification standards (tool.driver.supportedTaxonomies)

The tool object is associated with the declared classification standards through the tool.driver.supportedTaxonomies node. Each element of supportedTaxonomies is a toolComponentReference object, because a taxonomy is itself a toolComponent object. The toolComponentReference.guid property matches the guid property of the taxonomy object defined in run.taxonomies[].

In the example, supportedTaxonomies[0].name is CWE, which indicates that the tool supports the CWE taxonomy; its guid, A9282C88-F1FE-4A01-8137-E8D2A037AB82, refers to taxonomies[0] and thereby associates the tool with the industry classification standard CWE.
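The guid matching described here can be sketched as below. The helper name supported_taxonomies is hypothetical, the run dictionary is cut down to the properties involved, and comparing guids case-insensitively is a choice made for this sketch.

```python
CWE_GUID = "A9282C88-F1FE-4A01-8137-E8D2A037AB82"

# Cut-down run: one declared taxonomy and one reference to it from the driver.
run = {
    "taxonomies": [
        {"name": "CWE", "version": "3.2", "guid": CWE_GUID, "taxa": [{"id": "401"}]},
    ],
    "tool": {"driver": {
        "name": "CodeScanner",
        "supportedTaxonomies": [{"name": "CWE", "guid": CWE_GUID}],
    }},
}


def supported_taxonomies(run):
    """Return the taxonomy objects referenced by tool.driver.supportedTaxonomies."""
    by_guid = {
        taxonomy["guid"].lower(): taxonomy
        for taxonomy in run.get("taxonomies", [])
        if "guid" in taxonomy
    }
    resolved = []
    for reference in run["tool"]["driver"].get("supportedTaxonomies", []):
        taxonomy = by_guid.get(reference.get("guid", "").lower())
        if taxonomy is not None:
            resolved.append(taxonomy)
    return resolved


names = [taxonomy["name"] for taxonomy in supported_taxonomies(run)]
print(names)
```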

2.5. Rule and defect classification association (rule.relationships)

Rules are defined under the tool.driver.rules node. rules is an array, and each element is a reportingDescriptor object that defines one rule.

The relationships property of each reportingDescriptor is an array, and each element is a reportingDescriptorRelationship object that establishes a relationship from that rule to another reportingDescriptor object. The target of a relationship can be a taxon in a taxonomy (as in this example) or a rule in another tool component.

The target property of the reportingDescriptorRelationship identifies the target of the relationship. Its value is a reportingDescriptorReference object, which refers to a reportingDescriptor in a toolComponent object.

The toolComponent property of the reportingDescriptorReference object is a toolComponentReference object that points to the taxonomy declared in the tool's supportedTaxonomies.

The following figure shows the relationship between the rules in the example and defect classification:

2.5.1. Classification in scan results (result.taxa)

In the scan results (run.results), each result has a taxa property. taxa is an array, and each element is a reportingDescriptorReference object that specifies a category of the defect, in the same way as for rules. This also shows that the taxa under result can be omitted, with defects classified through their rules instead.
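A minimal sketch of that fallback: use result.taxa when it is present, otherwise follow the relationships of the rule named by ruleId. The helper name classify and the "CWE-401" label format are illustrative choices, and the references are reduced to id and toolComponent.name.

```python
# Reference to the CWE-401 taxon, reduced to the properties this sketch needs.
taxon_ref = {"id": "401", "toolComponent": {"name": "CWE"}}

run = {
    "tool": {"driver": {"name": "CodeScanner", "rules": [
        {"id": "CA2101",
         "relationships": [{"target": taxon_ref, "kinds": ["superset"]}]},
    ]}},
    "results": [
        # Classified directly through result.taxa.
        {"ruleId": "CA2101",
         "message": {"text": "Memory allocated in variable 'p' was not released."},
         "taxa": [taxon_ref]},
        # No taxa: the classification comes from the rule's relationships.
        {"ruleId": "CA2101",
         "message": {"text": "Memory allocated in variable 'q' was not released."}},
    ],
}


def classify(run, result):
    """Return labels such as 'CWE-401' for a result."""
    references = result.get("taxa")
    if not references:
        rules = {rule["id"]: rule for rule in run["tool"]["driver"].get("rules", [])}
        rule = rules.get(result.get("ruleId"), {})
        references = [rel["target"] for rel in rule.get("relationships", [])]
    return [f'{ref["toolComponent"]["name"]}-{ref["id"]}' for ref in references]


labels = [classify(run, result) for result in run["results"]]
print(labels)
```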

2.6. Code flow (codeFlow)

Some tools detect problems by simulating the execution of a program, sometimes across multiple threads of execution. SARIF represents the simulated execution as a sequence of locations, called a code flow. A SARIF code flow contains one or more thread flows, each of which describes, in chronological order, the code locations visited on a single thread of execution.

2.6.1. Code flows of a defect (result.codeFlows)

Because a defect may have more than one code flow, the optional result.codeFlows property is an array of codeFlow objects.

"result": {
  "description": "A result produced by an analysis tool.",
  "additionalProperties": false,
  "type": "object",
  "properties": {
    ...
    "codeFlows": {
      "description": "An array of 'codeFlow' objects relevant to the result.",
      "type": "array",
      "minItems": 0,
      "uniqueItems": false,
      "default": [],
      "items": { "$ref": "#/definitions/codeFlow" }
    }
  }
}

2.6.2. Thread flows of a code flow (codeFlow.threadFlows)

As the definition of codeFlow shows, each code flow consists of an array of thread flows (threadFlows), which is required and must contain at least one threadFlow.

"codeFlow": {
  "description": "A set of threadFlows which together describe a pattern of code execution relevant to detecting a result.",
  "additionalProperties": false,
  "type": "object",
  "properties": {
    "message": {
      "description": "A message relevant to the code flow.",
      "$ref": "#/definitions/message"
    },
    "threadFlows": {
      "description": "An array of one or more unique threadFlow objects, each of which describes the progress of a program through a thread of execution.",
      "type": "array",
      "minItems": 1,
      "uniqueItems": false,
      "items": { "$ref": "#/definitions/threadFlow" }
    }
  },
  "required": [ "threadFlows" ]
},

2.6.3. Thread flow (threadFlow) and thread flow location (threadFlowLocation)

In each thread flow (threadFlow), an array of locations (locations) describes the path the tool's analysis took through the code.

Thread flow (threadFlow) definition:

"threadFlow": {
  "description": "Describes a sequence of code locations that specify a path through a single thread of execution such as an operating system or fiber.",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "id": { ... },
    "message": { ... },
    "initialState": { ... },
    "immutableState": { ... },
    "locations": {
      "description": "A temporally ordered array of 'threadFlowLocation' objects, each of which describes a location visited by the tool while producing the result.",
      "type": "array",
      "minItems": 1,
      "uniqueItems": false,
      "items": { "$ref": "#/definitions/threadFlowLocation" }
    },
    "properties": { ... }
  },
  "required": [ "locations" ]
}

Thread flow location (threadFlowLocation) definition:

Each element of the locations array is a threadFlowLocation, which represents one code location visited by the tool. The location itself is given by the location property, of type location. A location can carry both physical and logical location information, so codeFlow can also represent the flows found by binary analysis.

threadFlowLocation also has a state property, which can store the values of variables and expressions or symbol table information, or be used to represent a state machine.

"threadFlowLocation": {
  "description": "A location visited by an analysis tool while simulating or monitoring the execution of a program.",
  "additionalProperties": false,
  "type": "object",
  "properties": {
    "index": {
      "description": "The index within the run threadFlowLocations array.",
      ...
    },
    "location": {
      "description": "The code location.",
      "$ref": "#/definitions/location"
    },
    "state": {
      "description": "A dictionary, each of whose keys specifies a variable or expression, the associated value of which represents the variable or expression value. For an annotation of kind 'continuation', for example, this dictionary might hold the current assumed values of a set of global variables.",
      "type": "object",
      "additionalProperties": { "$ref": "#/definitions/multiformatMessageString" }
    },
    ...
  }
},

2.6.4. Code flow sample

Reference code

1. # 3-Beyond-basics/bad-eval-with-code-flow.py
2.
3. print("Hello, world!")
4. expr = input("Expression> ")
5. use_input(expr)
6.
7. def use_input(raw_input):
8.     print(eval(raw_input))

The above is an example of code injection in Python.

On line 4, the user input is assigned to the variable expr.

On line 5, expr is passed to the function use_input as its first argument.

On line 8, the result is printed with print, but the input is first evaluated by eval(). Because the argument reaches eval() without any validation after it was entered, a code injection vulnerability may be introduced here.

The following scan result captures this analysis, making it easy for the user to follow how the problem arises.

Scan result

{
  "version": "2.1.0",
  "runs": [
    {
      "tool": { "driver": { "name": "PythonScanner" } },
      "results": [
        {
          "ruleId": "PY2335",
          "message": {
            "text": "Use of tainted variable 'raw_input' in the insecure function 'eval'."
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": { "uri": "3-Beyond-basics/bad-eval-with-code-flow.py" },
                "region": { "startLine": 8 }
              }
            }
          ],
          "codeFlows": [
            {
              "message": { "text": "Tracing the path from user input to insecure usage." },
              "threadFlows": [
                {
                  "locations": [
                    {
                      "message": { "text": "The tainted data enters the system here." },
                      "location": {
                        "physicalLocation": {
                          "artifactLocation": { "uri": "3-Beyond-basics/bad-eval-with-code-flow.py" },
                          "region": { "startLine": 4 }
                        }
                      },
                      "state": { "expr": { "text": "42" } },
                      "nestingLevel": 0
                    },
                    {
                      "message": { "text": "The tainted data is used insecurely here." },
                      "location": {
                        "physicalLocation": {
                          "artifactLocation": { "uri": "3-Beyond-basics/bad-eval-with-code-flow.py" },
                          "region": { "startLine": 8 }
                        }
                      },
                      "state": { "raw_input": { "text": "42" } },
                      "nestingLevel": 1
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

This is only a simple example. With SARIF's codeFlow, a report can describe far more complex analysis paths, so that users can understand the problem better and then judge and fix it quickly.
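To show how a report consumer might present such a path, the sketch below walks codeFlows, threadFlows and locations in order and indents each step by its nestingLevel. The function name describe_code_flows and the "uri:line" output format are illustrative choices; the result dictionary is reduced to the properties the walk needs.

```python
URI = "3-Beyond-basics/bad-eval-with-code-flow.py"


def step(line, nesting_level):
    """Build a reduced threadFlowLocation for the sample file (illustrative)."""
    return {
        "location": {"physicalLocation": {
            "artifactLocation": {"uri": URI},
            "region": {"startLine": line},
        }},
        "nestingLevel": nesting_level,
    }


result = {
    "codeFlows": [{
        "message": {"text": "Tracing the path from user input to insecure usage."},
        "threadFlows": [{"locations": [step(4, 0), step(8, 1)]}],
    }],
}


def describe_code_flows(result):
    """Return one 'uri:line' string per visited location, indented by nestingLevel."""
    lines = []
    for code_flow in result.get("codeFlows", []):
        for thread_flow in code_flow["threadFlows"]:
            for visited in thread_flow["locations"]:
                physical = visited["location"]["physicalLocation"]
                indent = "  " * visited.get("nestingLevel", 0)
                uri = physical["artifactLocation"]["uri"]
                line = physical["region"]["startLine"]
                lines.append(f"{indent}{uri}:{line}")
    return lines


path = describe_code_flows(result)
print("\n".join(path))
```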

2.7. Defect fingerprint (fingerprint)

In large software projects, analysis tools can produce thousands of results in one run. To cope with that volume, defect management needs to record the existing defects as a scanning baseline and then work through them. In later scans, the new results are compared with the baseline to determine whether new problems have appeared. To decide whether a result from a later run is logically the same as one in the baseline, an algorithm must construct a stable identity from the distinguishing information in the result. This identity is called a fingerprint, or the defect fingerprint of the defect, and it is what distinguishes the defect from all others.

A defect fingerprint should be built from relatively stable information about the defect:

The name of the tool that produced the result

The rule number

The file system path of the analysis target. This should be the path relative to the project; the location of the project on disk should not be included, because each machine may store the project in a different place.

The defect's characteristic values (partialFingerprints).

Each SARIF scan result (result) provides a set of properties for storing defect fingerprints, so that a defect management system can use them to identify defects uniquely.

"result": {
  "description": "A result produced by an analysis tool.",
  "additionalProperties": false,
  "type": "object",
  "properties": {
    ...
    "guid": {
      "description": "A stable, unique identifier for the result in the form of a GUID.",
      "type": "string",
      "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
    },
    "correlationGuid": {
      "description": "A stable, unique identifier for the equivalence class of logically identical results to which this result belongs, in the form of a GUID.",
      "type": "string",
      "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
    },
    "occurrenceCount": {
      "description": "A positive integer specifying the number of times this logically unique result was observed in this run.",
      "type": "integer",
      "minimum": 1
    },
    "partialFingerprints": {
      "description": "A set of strings that contribute to the stable, unique identity of the result.",
      "type": "object",
      "additionalProperties": { "type": "string" }
    },
    "fingerprints": {
      "description": "A set of strings each of which individually defines a stable, unique identity for the result.",
      "type": "object",
      "additionalProperties": { "type": "string" }
    },
    ...
  }
}

The inherent information of a defect is sometimes not enough to identify the result uniquely. In such cases, attribute values strongly related to the defect are added as extra input to the fingerprint calculation, so that the resulting fingerprint is unique. This is somewhat like a salt in cryptographic hashing, except that the value must be reproducible: the next scan has to derive the same input for the same defect and so produce the same fingerprint as before. For example, when a tool checks documents for sensitive words, the alarm message is "xxx should not be used in the document." Here the word itself can serve as a characteristic value of the defect.

SARIF provides the partialFingerprints property to hold such characteristic values, so that analysis tools and other components in the SARIF ecosystem can use them. A defect management system can fold them into the fingerprint it constructs for each result. In the previous example, the tool can set a property in the partialFingerprints object to the forbidden word. The defect management system should include the information in partialFingerprints in its fingerprint calculation.

Only attributes that are strongly related to the defect and whose values are relatively stable should be added to partialFingerprints. The line number of the defect, for example, is a poor input to the fingerprint calculation, because line numbers change often: if a developer adds or deletes lines above the problem line, the next scan reports the same problem on a different line, the fingerprint changes, and the comparison with the baseline fails.
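One possible way to combine these stable inputs is sketched below. SHA-256 over a JSON encoding is an assumption made for this sketch, not something SARIF prescribes, and the names defect_fingerprint, DocScanner, DOC0001 and forbiddenWord are hypothetical.

```python
import hashlib
import json


def defect_fingerprint(tool_name, rule_id, relative_path, partial_fingerprints):
    """Hash the stable parts of a defect into a repeatable fingerprint.

    The line number is deliberately left out, and the partial fingerprints
    are sorted by key so that their order in the report does not matter.
    """
    payload = json.dumps(
        [tool_name, rule_id, relative_path, sorted(partial_fingerprints.items())],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two scans of the same defect yield the same fingerprint even if the line moved.
first_scan = defect_fingerprint(
    "DocScanner", "DOC0001", "docs/guide.md", {"forbiddenWord": "xxx"})
second_scan = defect_fingerprint(
    "DocScanner", "DOC0001", "docs/guide.md", {"forbiddenWord": "xxx"})
# A different forbidden word in the same file is a different defect.
other_word = defect_fingerprint(
    "DocScanner", "DOC0001", "docs/guide.md", {"forbiddenWord": "yyy"})

print(first_scan == second_scan, first_scan == other_word)
```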

Even with a distinguishing feature for each defect and additional feature attributes, it remains difficult to design an algorithm that produces a truly stable fingerprint. For example, if the same sensitive word occurs several times in one file, the alarms cannot each be given a unique identity. The function name can be added as a further input, because function names are relatively stable within a program and help to narrow the scope of the same problem within a file. However, the same problem can still occur several times inside one function. So however carefully the alarms are distinguished, cases where different defects share one fingerprint will still occur in real scans.

Fortunately, fingerprints do not have to be absolutely stable to be useful in practice. They only need to be stable enough to keep the number of results wrongly reported as "new" low enough for the development team to manage them without much effort.
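A baseline comparison built on such fingerprints might look like the following sketch. The function name compare_with_baseline and the fingerprint strings are made up for illustration; the category names were chosen to resemble the values SARIF uses for result.baselineState.

```python
def compare_with_baseline(baseline, current):
    """Split fingerprints into new, unchanged and absent groups."""
    baseline, current = set(baseline), set(current)
    return {
        "new": sorted(current - baseline),        # only in the latest scan
        "unchanged": sorted(current & baseline),  # present in both scans
        "absent": sorted(baseline - current),     # no longer reported
    }


status = compare_with_baseline(["fp-a", "fp-b"], ["fp-b", "fp-c"])
print(status)
```

Only the "new" group needs the team's immediate attention, which is why a fingerprint that is merely stable enough already pays off.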

SARIF defines a general, standard output format for static scanning tools that can meet the various reporting requirements of such tools.

For integrating static scanning tools into a DevSecOps platform, SARIF reduces the cost and complexity of aggregating scan results into a common workflow.

SARIF also makes it possible for IDEs to integrate the results of different scanners and provide a unified defect handling module. With defect display and repair handled in the IDE, tool developers can focus on finding problems and spend less effort adapting to each IDE.

SARIF has become an OASIS standard and is supported by Microsoft, GrammaTech and other major static analysis tool vendors. In addition, some evaluations and competitions for static analysis tools (for example those of DHS and NIST) require scan reports in SARIF.

At present SARIF is designed mainly for the results of static scanning tools, but because its design is general, some vendors of dynamic analysis tools have also applied it successfully.

That concludes this look at how SARIF meets deeper requirements in practical use. We hope it is of some help.