This is a known limitation for, I think, all languages, and it is already on our radar. You are right that this happens because data flow essentially considers the Apple.eat method dead, as there are no calls to it; adding a call, as in your example, makes it live.
I'll add this example to the internal issue that tracks this.
Hi @hvitved, thank you very much for your comments.
I took a closer look and worked on mitigating the issue. I believe I've now fixed the points-to and call edge analyses in the examples. However, I'm still encountering issues with the data flow analysis.
For the points-to analysis, I commented out the following lines in the InstanceObject.selfAttribute predicate:
The selfAttribute predicate of the abstract class InstanceObject is responsible for resolving attribute accesses on runtime object values. In my example, the self in self._B corresponds to a value of type SelfInstanceInternal (i.e., the self instance of Apple). I modified the predicate to consider all possible self assignments, rather than limiting it to those in the __init__ method.
It seems that the real blocker was the following line inside this.initializer(init, callee), which previously caused attribute resolution to fail, though I'm not sure why:
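The difference between "only assignments in __init__" and "all self assignments" can be illustrated outside CodeQL. The following is a minimal Python sketch using the standard ast module; it is an analogy, not the InstanceObject.selfAttribute implementation. The source string, the refill method, and the names Banana, Cherry, and _C are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical class under analysis. Only Apple and self._B appear in the
# discussion; refill, _C, Banana, and Cherry are made up for this sketch.
SOURCE = '''
class Apple:
    def __init__(self):
        self._B = Banana()

    def refill(self):
        self._C = Cherry()
'''


def self_attribute_assignments(source, only_init):
    """Return the sorted attribute names assigned as `self.<name> = ...`.

    With only_init=True, only assignments inside __init__ are collected.
    With only_init=False, assignments in every method are collected.
    """
    names = set()
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for func in cls.body:
            if not isinstance(func, ast.FunctionDef):
                continue
            if only_init and func.name != "__init__":
                continue
            for node in ast.walk(func):
                if not isinstance(node, ast.Assign):
                    continue
                for target in node.targets:
                    if (isinstance(target, ast.Attribute)
                            and isinstance(target.value, ast.Name)
                            and target.value.id == "self"):
                        names.add(target.attr)
    return sorted(names)


init_only = self_attribute_assignments(SOURCE, only_init=True)
all_methods = self_attribute_assignments(SOURCE, only_init=False)
print(init_only)
print(all_methods)
```

The restricted variant misses attributes that are first assigned outside __init__, which is the kind of gap the broader rule is meant to close, at the cost of being less precise about when an attribute is actually set.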
Now, both points-to and call edge analyses work correctly, even without explicit call sites.
However, I'm still unsure where to patch the taint flow analysis. From what I can tell, the taint tracking doesn't seem to rely on points-to analysis in this case, but I'd appreciate clarification on that. Any suggestions on where to look, or on how to enable taint propagation in this scenario, would be greatly appreciated.
I'm not familiar with the internals of the Python analysis, but for dataflow/taint tracking, I believe the approach that we want to take is to simulate that calls like
Given the following simple Python code:
I defined the following CodeQL queries:
Taint Flow analysis:
Call Edge Analysis:
PointsTo Analysis:
If I analyze this code alone, all the analyses (taint flow, call edges, points-to) return no results.
However, if I add a simple instantiation and invocation:
Then all the analyses work correctly: the taint flow is detected, call edges are resolved, and points-to results appear.
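The snippets from the original post are not reproduced here. As a purely hypothetical sketch of the shape being described, assume a class Apple whose eat method reads an attribute self._B that is set in __init__. Only Apple, eat, and _B are named in this thread; Banana, peel, and the string values are assumptions.

```python
# Hypothetical stand-in for the library code under analysis.
# Only Apple, Apple.eat, and self._B come from the discussion; Banana,
# peel, and the string values are assumptions made for illustration.
class Banana:
    def peel(self, data):
        return "peeled:" + data


class Apple:
    def __init__(self):
        # Attribute assigned in __init__, later read through self in eat().
        self._B = Banana()

    def eat(self, data):
        # Without any caller of eat(), the analyses reportedly return nothing here.
        return self._B.peel(data)


# The kind of explicit instantiation and invocation that makes Apple.eat
# "live" for the analysis. Remove these lines to get the library-only variant.
apple = Apple()
result = apple.eat("user-input")
```

With the last two lines removed, the file corresponds to the library-only case in which nothing calls Apple.eat.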
Questions
Is it because functions that are not reachable from the control flow graph are simply skipped by the points-to, call edge, and taint analyses (the latter two might both depend on the points-to analysis)?
That is, CodeQL cannot analyze them during the points-to analysis if they are "dead code" (unreachable)?
If so, is there a way to fix or improve this?
Much real-world library code (e.g., APIs, frameworks) doesn't call its own entry points internally; they are invoked only from external application code. Would this design prevent CodeQL from correctly analyzing libraries?
Could we force CodeQL to conservatively analyze all function bodies even if they are not called?
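To make the last question concrete, here is a toy Python model of the difference between analyzing only functions reachable from top-level code and conservatively treating every function as an entry point. It is a sketch of the general idea, not CodeQL's algorithm; the call graph, the "<module>" root, and the Banana.peel node are hypothetical.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
# "<module>" stands for top-level code; in the library-only case it calls nothing.
CALLS = {
    "<module>": [],
    "Apple.__init__": [],
    "Apple.eat": ["Banana.peel"],
    "Banana.peel": [],
}


def live_functions(calls, roots):
    """Return every function reachable from the given roots (breadth-first)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for callee in calls.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Strict mode: only top-level code is a root, so uncalled methods stay dead.
strict = live_functions(CALLS, ["<module>"])

# Conservative mode: every known function is treated as a possible entry point.
conservative = live_functions(CALLS, list(CALLS))

print(sorted(strict))
print(sorted(conservative))
```

The conservative mode covers library entry points but analyzes each function without knowledge of its real callers, so argument values would have to be approximated.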