Dependencies between files

Now that we can build dependencies between functions, we will combine them with import/include/... directives to build dependencies between files.

Back to the Python example

Let's reuse the previous Python example and move the computation of f into another file:

# compute.py
def f(x):
    return x + 42

# main.py
import numpy as np
from matplotlib.pyplot import plot, show

from compute import f

def plot_graph(x_min, x_max, N=1000):
    # linspace samples N evenly spaced points between x_min and x_max
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    # create a plot representing f on the segment [x_min, x_max]
    plot(abs, ord)
    # display that plot
    show()

We'd like to generate a dependency graph like this:

flowchart TD
    A["main.py"]:::resolved --> B["numpy"]:::unresolved
    A --> C["matplotlib.pyplot"]:::unresolved
    A --> D["compute.py"]:::resolved

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

Here, all dependencies between files are explicit: there are import statements that we can leverage to determine them. But it's not completely trivial: to resolve the statement from compute import f, we need to locate a file on disk. This Python file is probably named compute.py.

In Rust, for example, use utils::{f, g}; means we should try to locate a file named utils.rs or utils/mod.rs. This process of locating source code files from module names is language-specific, hence the need for language-specific explorers, which are responsible for solving this problem.
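As a rough illustration of what a Python explorer might do, the sketch below maps a dotted module name to a file under a project root. The function name locate_module and the two candidate layouts it checks (name.py and name/__init__.py) are assumptions made for this example, not unidep's actual API; real Python import resolution also involves sys.path, namespace packages, and more.

```python
from pathlib import Path
from typing import Optional


def locate_module(module: str, root: Path) -> Optional[Path]:
    """Return the file that probably defines `module`, or None if unresolved.

    Hypothetical helper: it only checks `<root>/a/b.py` and
    `<root>/a/b/__init__.py` for a module named `a.b`.
    """
    base = root.joinpath(*module.split("."))
    candidates = [
        base.parent / f"{base.name}.py",  # plain module: compute.py
        base / "__init__.py",             # package: compute/__init__.py
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Nothing found under the project root (e.g. numpy installed elsewhere):
    # the dependency stays unresolved.
    return None
```

With the example above, locate_module("compute", root) would return the path to compute.py, while numpy and matplotlib.pyplot would come back as None and be drawn as unresolved nodes.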

Another problem, easier but still language-specific, is to identify the relevant use/import/... statements that express explicit dependencies. This is solved by language-specific resolvers, which tell us which instructions express this idea of explicit dependency.
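For Python, a resolver could be sketched with the standard ast module, which already distinguishes import statements from the rest of the code. The function name imported_modules is hypothetical and only illustrates the idea; it is not taken from unidep.

```python
import ast
from typing import List


def imported_modules(source: str) -> List[str]:
    """Collect module names from `import x` and `from x import y` statements."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            # Relative imports such as `from . import x` have no module name
            # and are skipped in this sketch.
            modules.append(node.module)
    return modules


main_py = """
import numpy as np
from matplotlib.pyplot import plot, show

from compute import f
"""
print(imported_modules(main_py))  # ['numpy', 'matplotlib.pyplot', 'compute']
```

Each returned name can then be handed to the explorer, which decides whether it corresponds to a file in the codebase.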

Dependencies are not only explicit

Let's now take a look at a C program composed of 4 files:

my-codebase
├── a.c
├── a.h
├── b.c
└── main.c

// a.h
int f();
int g();

// a.c
#include "a.h"
int f() { ... }

// b.c
#include "a.h"
int g() { ... }

// main.c
#include "a.h"

int main() { ... }

We'd like to obtain something like this:

flowchart TD
    A["main.c"]:::resolved --> B["a.h"]:::resolved
    B --> C["a.c"]:::resolved
    C --> B
    B --> D["b.c"]:::resolved
    D --> B

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

Three arrows are explicitly generated by the #include "a.h" directives. The other dependencies, going from a.h to a.c and b.c, are implicit. Functions f and g are declared in a.h, but they are defined (implemented) in a.c and b.c, so we definitely want to say that a.h depends on both of these files.

If this doesn't seem necessary to you, consider the following main.c (this is almost real code from real projects):

int main() {
    extern int f();
    extern int g();

    ...
}

In that case, we don't have a header, but we still want to generate something like this:

flowchart TD
    A["main.c"]:::resolved --> B["a.c"]:::resolved
    A --> C["b.c"]:::resolved

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

Basically, unidep plays the role of the linker for languages that distinguish between declaration and definition. We need to track the functions that we are still trying to resolve (f and g in the example above), and add a file dependency when we find the definition of one of these functions.
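This bookkeeping can be sketched as follows. The per-file sets of declared and defined symbols are written by hand here; in practice they would come from a parser. The names FileSymbols and link_files are hypothetical and not unidep's API, and the sketch assumes each symbol is defined in at most one file.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FileSymbols:
    declared: Set[str] = field(default_factory=set)  # declared in this file
    defined: Set[str] = field(default_factory=set)   # implemented in this file


def link_files(files: Dict[str, FileSymbols]) -> List[Tuple[str, str]]:
    """Return (declaring file, defining file) edges, much like a linker would."""
    definer: Dict[str, str] = {}                  # symbol -> file defining it
    pending: Dict[str, Set[str]] = defaultdict(set)  # symbol -> files waiting
    edges: Set[Tuple[str, str]] = set()
    for name, symbols in files.items():
        for symbol in symbols.defined:
            definer[symbol] = name
            # A definition showed up: resolve every file that was waiting.
            for waiting in pending.pop(symbol, set()):
                edges.add((waiting, name))
        for symbol in symbols.declared - symbols.defined:
            if symbol in definer:
                edges.add((name, definer[symbol]))
            else:
                # Definition not seen yet: keep tracking this symbol.
                pending[symbol].add(name)
    return sorted(edges)


# The header-less main.c from above, with its extern declarations.
files = {
    "main.c": FileSymbols(declared={"f", "g"}, defined={"main"}),
    "a.c": FileSymbols(defined={"f"}),
    "b.c": FileSymbols(defined={"g"}),
}
print(link_files(files))  # [('main.c', 'a.c'), ('main.c', 'b.c')]
```

Feeding it the earlier layout instead, where a.h declares f and g, yields the implicit a.h to a.c and a.h to b.c edges; the explicit #include edges are added separately.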