Points-to Analysis Across Unreachable Functions from Control Flow Graph #19406

jackfromeast · 2025-04-29T03:08:08Z


Given the following simple Python code:

class Apple:
  def __init__(self):
    self._B = Banana()

  def eat(self, msg):
    self._B.eat(msg)

class Banana:
  def __init__(self):
    pass

  def eat(self, msg):
    sink(msg)

I defined the following CodeQL queries:

Taint flow analysis:

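import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking

module DebuggingTaintTracking implements DataFlow::ConfigSig {
  // Sources: the parameters of Apple.eat
  predicate isSource(DataFlow::Node source) {
    exists (Function func |
      func.getAnArg() = source.asExpr() and
      func.getName() = "eat" and
      func.getScope().getName() = "Apple"
    )
  }

  // Sinks: the arguments of calls to a function named sink
  predicate isSink(DataFlow::Node sink) {
    exists (Call call, Name name |
      call.getAnArg() = sink.asExpr() and
      call.getFunc() = name and
      name.getId() = "sink"
    )
  }
}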

module Flow = TaintTracking::Global<DebuggingTaintTracking>;

predicate flowFromSourceToSink(DataFlow::Node fromNode, DataFlow::Node toNode) {
  Flow::flow(fromNode, toNode)
}

Call edge analysis:

predicate resolveCallToFunctionDef(string name, CallCfgNode callNode, Object func) {
  name = "eat" and
  (
    // Method calls such as x.eat(...) resolved to a function object
    (
      callNode.(MethodCallNode).getMethodName() = name and
      func.(FunctionObject).getAMethodCall() = callNode.asCfgNode()
    ) or
    // Plain function calls such as eat(...)
    (
      func.(FunctionObject).getAFunctionCall() = callNode.asCfgNode() and
      exists (Call call, Name funcName |
        call.getFunc() = funcName and
        funcName.getId() = name and
        callNode.asExpr() = call
      )
    ) or
    // Calls to a class named eat (constructor calls)
    (
      func.(ClassObject).getACall() = callNode.asCfgNode() and
      func.(ClassObject).getName() = name
    )
  )
}

Points-to analysis:

predicate pointsToTest(Attribute attr, Value value) {
  attr.pointsTo(value) and
  attr.getName() = "_B"
}

If I analyze this code alone, all three analyses (taint flow, call edges, points-to) return no results.

However, if I add a simple instantiation and invocation:

if __name__ == "__main__":
  a = Apple()
  a.eat("Hello, World!")

Then all the analyses work correctly: the taint flow is detected, call edges are resolved, and points-to results appear.

Questions

Is this because functions that are unreachable in the control flow graph are simply skipped by the points-to, call-edge, and taint analyses (the latter two may both depend on points-to analysis)?

In other words, can CodeQL not analyze functions during points-to analysis if they are "dead code" (unreachable)?

If so, is there a way to fix or improve this?

Much real-world library code (e.g., APIs, frameworks) never calls its own entry points; the invocation happens only from external application code. Would this design prevent CodeQL from correctly analyzing libraries?

Could we force CodeQL to conservatively analyze all function bodies even if they are not called?

hvitved · 2025-04-29T07:39:24Z

Maintainer

Hi

This is a known limitation for, I think, all languages, and it is already on our radar. You are right that this happens because data flow essentially considers the Apple.eat method dead, as there are no calls to it; adding a call as in your example makes it live.

I'll add this example to our internal issue tracking this.

Thanks
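The maintainer's explanation can be pictured with a toy model. The Python sketch below is a simplification under stated assumptions, not CodeQL's actual algorithm: it models an analysis that only visits function bodies reachable through call edges from the module body. The call-graph dictionaries are hand-written for the Apple/Banana example, and the hypothetical `conservative` flag mimics the question of whether every function body could be analyzed even when nothing calls it.

```python
from collections import deque


def reachable(call_graph: dict[str, list[str]], roots: list[str],
              conservative: bool = False) -> set[str]:
    """Return the functions a reachability-driven analysis would visit.

    call_graph maps a function name to the names it calls. With
    conservative=True (hypothetical), every function is treated as an
    entry point, so no body is skipped.
    """
    worklist = deque(call_graph if conservative else roots)
    seen: set[str] = set()
    while worklist:
        fn = worklist.popleft()
        if fn in seen:
            continue
        seen.add(fn)
        worklist.extend(call_graph.get(fn, []))
    return seen


# Hand-written toy call graph: the module body calls nothing.
library_only = {
    "<module>": [],
    "Apple.__init__": ["Banana.__init__"],
    "Apple.eat": ["Banana.eat"],
    "Banana.__init__": [],
    "Banana.eat": ["sink"],
}
# Same graph after adding the `if __name__ == "__main__":` block.
with_main = {**library_only, "<module>": ["Apple.__init__", "Apple.eat"]}

print(sorted(reachable(library_only, ["<module>"])))  # ['<module>']
print(sorted(reachable(with_main, ["<module>"])))
print(sorted(reachable(library_only, ["<module>"], conservative=True)))
```

Without a call site only the module body is visited, so Apple.eat and everything it would reach are treated as dead. In the real analysis the edge from Apple.eat to Banana.eat itself depends on resolving self._B, so this toy graph is more optimistic than CodeQL would be for unreachable code.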

2 replies

jackfromeast Apr 30, 2025
Author

Hi @hvitved, thank you very much for your comments.

I took a closer look and worked on mitigating the issue. I believe I've now fixed the points-to and call-edge analyses for this example. However, I'm still encountering issues with the data flow analysis.

For the points-to analysis, I commented out the following lines in the InstanceObject.selfAttribute predicate:

codeql/python/ql/lib/semmle/python/objects/Instances.qll, lines 41 to 43 (commit 359aa02):

  this.initializer(init, callee) and
  self_variable_reaching_init_exit(self) and
  self.getScope() = init.getScope() and

The selfAttribute predicate of the abstract InstanceObject class is responsible for resolving attribute accesses on runtime object values. In my example, the self in self._B corresponds to a value of type SelfInstanceInternal (i.e., the self instance of Apple). I modified the predicate to consider all possible self assignments, rather than only those in the __init__ method.
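To make the "only __init__ versus all methods" distinction concrete outside CodeQL, here is a rough analogy using Python's standard `ast` module. It is not how Instances.qll works; the helper `self_attribute_assignments` and the extra `reset` method in the sample source are hypothetical, added only so the two modes produce different results.

```python
import ast

# The Apple class from the question, plus a hypothetical reset() method.
SOURCE = '''
class Apple:
  def __init__(self):
    self._B = Banana()

  def eat(self, msg):
    self._B.eat(msg)

  def reset(self):
    self._C = None
'''


def self_attribute_assignments(source: str, class_name: str,
                               init_only: bool) -> dict[str, str]:
    """Map attribute name -> method that assigns self.<attr> in class_name."""
    found: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        for method in node.body:
            if not isinstance(method, ast.FunctionDef) or not method.args.args:
                continue
            if init_only and method.name != "__init__":
                continue  # restricted mode: look only inside __init__
            self_name = method.args.args[0].arg  # usually "self"
            for stmt in ast.walk(method):
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if (isinstance(target, ast.Attribute)
                                and isinstance(target.value, ast.Name)
                                and target.value.id == self_name):
                            found[target.attr] = method.name
    return found


print(self_attribute_assignments(SOURCE, "Apple", init_only=True))
# {'_B': '__init__'}
print(self_attribute_assignments(SOURCE, "Apple", init_only=False))
# {'_B': '__init__', '_C': 'reset'}
```

The relaxed mode trades precision for coverage: it may report attributes that are never set on a given object, which is the usual cost of analyzing code without knowing how it is called.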

The real blocker seems to have been the following line inside this.initializer(init, callee), which previously caused attribute resolution to fail, though I'm not sure why:

codeql/python/ql/lib/semmle/python/objects/Instances.qll, line 261 (commit 359aa02):

  this.getClass().attribute("__init__", init, _)

Now, both points-to and call edge analyses work correctly, even without explicit call sites.

However, I'm still unsure where to patch the taint flow analysis. From what I can tell, taint tracking doesn't rely on points-to analysis in this case, but I'd appreciate any clarification on that. Any suggestions on where to look, or how to enable taint propagation in this scenario, would be greatly appreciated.

hvitved May 1, 2025
Maintainer

I'm not familiar with the internals of the Python analysis, but for dataflow/taint tracking, I believe the approach that we want to take is to simulate that calls like

if __name__ == "__main__":
  a = Apple()
  a.eat("Hello, World!")

exist in the code base. We do in fact have a prototype of this for JavaScript, but that PR is currently blocked because of its performance impact.
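A minimal source-level sketch of "simulating" such calls, assuming a naive approach (this is not the JavaScript prototype mentioned above, whose design isn't described in this thread): parse a module with Python's `ast` module and emit a synthetic `__main__` block that instantiates each top-level class whose constructor takes only `self`, then calls each public method with a placeholder string per parameter. The helper `synthesize_entry_calls` and the `"<param>"` placeholders are hypothetical.

```python
import ast

# The example module from the question, used as input.
SOURCE = '''
class Apple:
  def __init__(self):
    self._B = Banana()

  def eat(self, msg):
    self._B.eat(msg)

class Banana:
  def __init__(self):
    pass

  def eat(self, msg):
    sink(msg)
'''


def synthesize_entry_calls(source: str) -> str:
    """Hypothetical helper: build a __main__ block that instantiates each
    top-level class whose constructor takes only self, and calls each of its
    public methods with one placeholder string per positional parameter."""
    lines = ['if __name__ == "__main__":']
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        methods = [m for m in node.body if isinstance(m, ast.FunctionDef)]
        init = next((m for m in methods if m.name == "__init__"), None)
        if init is not None and len(init.args.args) > 1:
            continue  # constructor needs arguments; skipped in this sketch
        var = node.name.lower()
        lines.append(f"  {var} = {node.name}()")
        for method in methods:
            if method.name.startswith("_"):
                continue  # skip private and dunder methods
            params = [a.arg for a in method.args.args[1:]]  # drop self
            args = ", ".join(f'"<{p}>"' for p in params)
            lines.append(f"  {var}.{method.name}({args})")
    if len(lines) == 1:
        lines.append("  pass")  # keep the generated block syntactically valid
    return "\n".join(lines)


print(synthesize_entry_calls(SOURCE))
```

For the Apple/Banana module this prints a block equivalent to the hand-written one, plus a direct call to Banana.eat. The sketch ignores decorators such as @staticmethod, keyword-only parameters, and constructors with arguments; generating one call per public method is also one plausible source of the performance cost mentioned above, since every synthesized call site becomes live for the analysis.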
