Dependencies between functions

First, we need to understand how to resolve dependencies between functions. At first glance, this is pretty straightforward: if a function g is called inside a function f, then f depends on g.
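As a minimal sketch of this rule (not unidep's actual implementation), we can walk a Python syntax tree with the standard ast module and record every call found inside each function. The helper name direct_calls and the sample source are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast

# Hypothetical sample program: f calls g, g calls nothing.
SOURCE = '''
def g(x):
    return x + 1

def f(x):
    return g(x) * 2
'''

def direct_calls(source):
    """Map each top-level function name to the set of names it calls."""
    tree = ast.parse(source)
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            called = set()
            # Every Call node inside the function body is a dependency.
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    called.add(ast.unparse(sub.func))
            deps[node.name] = called
    return deps

deps = direct_calls(SOURCE)
print(deps)  # {'g': set(), 'f': {'g'}}
```

Note that this only records the names as they are written at the call site, which is exactly where the trouble starts.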

Unfortunately, this naive approach may lead to inconsistencies in practice.

A quick Python example

Let's consider the following Python program:

import numpy as np
from matplotlib.pyplot import plot, show

def f(x):
    return x + 42

def plot_graph(x_min, x_max, N = 1000):
    # linspace samples N evenly spaced points between x_min and x_max
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    # create a plot representing f on the segment [x_min, x_max]
    plot(abs, ord)
    # display that plot
    show()

Intuitively, we'd like to generate the following dependency graph:

flowchart TD
    A["plot_graph"]:::resolved --> B["numpy.linspace"]:::unresolved
    A --> C["f"]:::resolved
    A --> D["matplotlib.pyplot.plot"]:::unresolved
    A --> E["matplotlib.pyplot.show"]:::unresolved

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

The functions in green are resolved: their dependencies have been computed. The functions in orange are unresolved: we know they are used, but we have not yet accessed their source code to compute their own dependencies.

Worse, let's generalize a bit, and consider a case where the function f is only known at runtime:

import numpy as np
from matplotlib.pyplot import plot, show

def plot_graph(f, x_min, x_max, N = 1000):
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    plot(abs, ord)
    show()

We will still generate the same graph, except that the function f will now be unresolved. This means we will search for it in other files; we will see why on the next page.
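To make the resolved/unresolved split concrete, here is a rough sketch, assuming that "resolved" simply means "defined at the top level of the file being analyzed". The helper classify_calls is hypothetical and is not unidep's API. The program is parsed from a string, so neither numpy nor matplotlib needs to be installed.

```python
import ast

# The second example from above, kept as a string so it is only parsed.
SOURCE = '''
import numpy as np
from matplotlib.pyplot import plot, show

def plot_graph(f, x_min, x_max, N=1000):
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    plot(abs, ord)
    show()
'''

def classify_calls(source, func_name):
    """Split the calls made by func_name into (resolved, unresolved) sets."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {n.name for n in functions}
    func = next(n for n in functions if n.name == func_name)
    called = {
        ast.unparse(node.func)
        for node in ast.walk(func)
        if isinstance(node, ast.Call)
    }
    # Assumption: a call is resolved only if this file defines its target.
    return called & defined, called - defined

resolved, unresolved = classify_calls(SOURCE, "plot_graph")
print(sorted(resolved))    # []
print(sorted(unresolved))  # ['f', 'np.linspace', 'plot', 'show']
```

Since f is now a parameter, nothing in the file defines it, and it lands in the unresolved set alongside the library functions.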

Namespaces and classes

Note that it is important to keep the full namespace in memory (i.e. numpy.linalg instead of linalg), because two different functions may have the same name but live in different namespaces. It is also important to keep track of import aliases (e.g. np for numpy), so that functions are correctly resolved to their respective source files.
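One possible way to handle aliases, sketched here under the assumption that only absolute imports matter, is to build a table from local names to fully qualified names and expand each call site with it. The helpers import_aliases and qualify are hypothetical names, not part of unidep.

```python
import ast

def import_aliases(source):
    """Map each local name bound by an import to its fully qualified name."""
    aliases = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import numpy as np  ->  np: numpy
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # from matplotlib.pyplot import plot  ->  plot: matplotlib.pyplot.plot
            for alias in node.names:
                if alias.name != "*":  # star imports are ignored in this sketch
                    aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return aliases

def qualify(name, aliases):
    """Expand the first component of a dotted name using the alias table."""
    head, _, rest = name.partition(".")
    full = aliases.get(head, head)
    return f"{full}.{rest}" if rest else full

aliases = import_aliases(
    "import numpy as np\n"
    "from matplotlib.pyplot import plot, show\n"
)
print(qualify("np.linspace", aliases))  # numpy.linspace
print(qualify("plot", aliases))         # matplotlib.pyplot.plot
```

Names that are not in the table, such as a local function f, are returned unchanged.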

Here is another Python example, which shows how difficult it is to resolve methods:

class A:
    def f(self): ...

class B(A):
    def f(self): ...

def g(x):
    x.f()

So, does g depend on A.f or B.f? Well, if you can answer this question, you should definitely let me know. In some languages, static typing makes it possible to apply heuristics, but the problem remains unsolvable in general. For example, in Java, a call to an overridden method cannot always be resolved at compile time: the target is chosen at runtime through dynamic dispatch, typically implemented with virtual method tables.

When encountering the kind of situation described above, unidep will pick one of the candidate functions (here A.f and B.f) according to some heuristics.
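The heuristics themselves are not described here, so the sketch below only covers the first half of the job: collecting the candidates for a method name. The helper method_candidates is hypothetical and makes no claim about how unidep chooses among them.

```python
import ast

# The class example from above, kept as a string so it is only parsed.
SOURCE = '''
class A:
    def f(self): ...

class B(A):
    def f(self): ...

def g(x):
    x.f()
'''

def method_candidates(source, method_name):
    """List every Class.method in the file whose method name matches."""
    tree = ast.parse(source)
    return sorted(
        f"{cls.name}.{item.name}"
        for cls in tree.body
        if isinstance(cls, ast.ClassDef)
        for item in cls.body
        if isinstance(item, ast.FunctionDef) and item.name == method_name
    )

candidates = method_candidates(SOURCE, "f")
print(candidates)  # ['A.f', 'B.f']
```

Nothing in g tells us the type of x, so both entries are equally plausible from a purely static point of view.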

So, are we doomed?

Kinda.

But let us not give in to despair. Sure, we won't be able to statically generate a perfect call graph of functions. But we are allowed to hope that, in practice, the results will be good enough to be useful. That is especially true since unidep was initially built to visualize dependencies between files (in one huge project which doesn't use that kind of trick ~~wdym unidep is an over engineered hobby project?~~).