Introduction

Getting started

Installation

Unidep aims to support a large number of programming languages. The default installation installs all of them, but you can also select only the languages you are interested in to keep dependencies manageable.

Default installation

The default installation will install all currently supported languages. You can use unidep:

  • as a CLI binary:
    cargo install unidep
    
  • as a Rust library, in your Cargo.toml:
    [dependencies]
    unidep = "0.1.0"
    

Custom installation

To install only a selected subset of languages (see the list of currently supported languages):

cargo install unidep --no-default-features --features "python,rust"

Or, if you plan to use unidep as a library:

[dependencies]
unidep = { version = "0.1.0", default-features = false, features = ["python", "rust"]}

Currently supported languages

Warning

For now, there is obviously no language supported at all.

  • python
  • rust

If your favorite language is missing, feel free to contribute.

Quick start

Info

The quick start provides information for the CLI app only. Library users need to refer to the library chapter.

To get a png/svg graph of the dependencies within your project:

unidep src # will be saved in a default png file
unidep src -o deps.png # you can specify the output file with -o
unidep src -o deps.svg # unidep will infer the image format

If you prefer having a DOT file, or a JSON file:

unidep src --output-format dot # will be printed to stdout
unidep src --output-format json -o deps.json

You can build the dependency graph starting from one or more specific files:

unidep src/main.c src/another.c --include "src"
# Build dependencies of main.c and another.c and visit relevant files within the src directory

You can also look at dependencies at a function level:

unidep src/main.c --function --include "src"

To review all the possibilities of unidep as a binary program, see the section on using unidep as a CLI app.

Using unidep as a CLI app

This section provides explanations on how to use the unidep binary.

Warning

Make sure you have installed the features (i.e. languages) you want to use. For installation, see getting started.

unidep lets you refine which dependencies are computed, and also how they are rendered.

Configuring input

Specifying a directory or file(s)

By default, unidep will browse the given directory and build the dependency graph from the files stored there. For example, to build the dependency graph between files in your src directory (and subdirectories):

unidep src

If you want to know the dependencies of only certain specific files, you can target them:

# Warning, this is probably not what you want
unidep src/main.c src/utils.c

However, unidep will only browse these two files, so many dependencies might remain unresolved. What you probably want are the dependencies of main.c and utils.c across your whole project. To get them, use --include to list the paths that unidep is allowed to browse.

unidep src/main.c src/utils.c --include "src"

If you use an external library, you can include its path to allow unidep to build a more complete dependency graph:

unidep src/main.c src/utils.c --include "src,/path/to/external-lib"

You can also --exclude files within an included directory. The --include and --exclude flags support glob patterns. For example, if you only want the headers of the external library to appear, you can do:

unidep src/main.c --include "src,/path/to/external-lib" --exclude "/path/to/external-lib/**/*.c"

Note that in the specific case above, including just the headers would be better:

unidep src/main.c --include "src,/path/to/external-lib/**/*.h"
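To make the interplay between --include and --exclude more concrete, here is a minimal sketch of such a filter. It is only an illustration under stated assumptions: the helper is_browsable is hypothetical, an include entry is treated either as a directory or as a glob pattern, and patterns are matched with Python's fnmatch module, whose * also crosses directory separators. unidep's real glob semantics may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_browsable(path, include, exclude):
    """Return True if `path` may be visited under the include/exclude rules.

    Hypothetical helper: an include entry is either a directory (everything
    below it is allowed) or a shell-style pattern.
    """
    p = PurePosixPath(path)

    def matches(entry):
        directory = PurePosixPath(entry)
        # Directory entry: the path is the directory itself or lives under it.
        if p == directory or directory in p.parents:
            return True
        # Otherwise, fall back to shell-style pattern matching.
        return fnmatchcase(path, entry)

    # Exclusions win over inclusions.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(matches(entry) for entry in include)


include = ["src", "/path/to/external-lib"]
exclude = ["/path/to/external-lib/**/*.c"]
print(is_browsable("/path/to/external-lib/include/api.h", include, exclude))
print(is_browsable("/path/to/external-lib/src/impl.c", include, exclude))
```

With these rules, the header is kept and the .c file of the external library is filtered out.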

Targeting specific languages

If your project uses multiple languages, you can target specific ones using the --language flag. The following command creates two graphs (one for Rust files, one for Python files), and ignores all source files of other languages:

unidep src --language "rust,python"
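The "one graph per language" behaviour can be pictured as a simple grouping step. The sketch below is an assumption-laden illustration: the extension table EXTENSION_TO_LANGUAGE and the function group_by_language are hypothetical, and how unidep really detects a file's language is not described here.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to language name.
EXTENSION_TO_LANGUAGE = {".rs": "rust", ".py": "python", ".c": "c", ".h": "c"}


def group_by_language(paths, languages):
    """Group paths per requested language; files of other languages are ignored."""
    groups = defaultdict(list)
    for path in paths:
        language = EXTENSION_TO_LANGUAGE.get(PurePosixPath(path).suffix)
        if language in languages:
            groups[language].append(path)
    return dict(groups)


files = ["src/main.rs", "src/tool.py", "src/legacy.c", "README.md"]
print(group_by_language(files, ["rust", "python"]))
```

Each resulting group would then feed its own dependency graph.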

Tracing function calls

By default, unidep identifies dependencies between files in your project. You can also output the dependencies expressed at a function level. To do that, use the --function flag.

unidep src --function

You can also specify one or more functions whose dependency graph you want to draw. Resolving namespaces can be complicated, so if another function with the same name is picked instead of the one you want, you may also provide the relevant file(s). Use the --target-function flag to provide the name of the function(s):

unidep src --target-function "do_fancy_stuff,do_more_fancy_stuff"

Configuring output

Choosing an output file

You can set an output file using the -o flag. If this argument is not provided, the results are printed to standard output when possible, or saved to a default file (for images).

Example:

unidep src -o my-output-file

Output formats

Unidep supports three types of output format: images, DOT files, and JSON files.

You can select one of these formats using the --output-format flag:

  • image (default)
  • dot (Graphviz DOT format)
  • json

Users who want ready-to-use results will prefer image (the default format); those who want to render the graph in a custom way might prefer dot; and users who want to reuse the output in an external program may prefer json.

Images and DOT files

For image, the format of the image is inferred from the output file extension. If no output file is provided, a default png file will be created, but it is recommended to provide an output file. Supported extensions:

  • png
  • svg

For dot however, if no output file is provided, the output will be printed to standard output.

Examples:

unidep src # will save output to unidep-output.png
unidep src -o output.svg
unidep src --output-format dot # will print the DOT file to stdout
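The extension-based inference for images can be sketched as follows. This is an illustration, not unidep's code: the function infer_image_format is hypothetical, and rejecting any extension other than png or svg is an assumption made for the sketch, since the behaviour for other extensions is not specified here.

```python
from pathlib import PurePosixPath

SUPPORTED_IMAGE_FORMATS = {"png", "svg"}


def infer_image_format(output_file=None):
    """Infer the image format from the output file extension."""
    if output_file is None:
        # No output file: a default png file is used.
        return "png"
    extension = PurePosixPath(output_file).suffix.lstrip(".")
    if extension not in SUPPORTED_IMAGE_FORMATS:
        # Assumption of this sketch: other extensions are rejected.
        raise ValueError(f"unsupported image extension: {extension!r}")
    return extension


print(infer_image_format("output.svg"))
print(infer_image_format())
```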

Custom display configuration

If you are aiming to produce nice graphs, you can specify a configuration file describing which display styles should be used to render different nodes. TODO support --display-config with a path to a config file

JSON

JSON output is not targeted at display, but rather for use by an external program. Thus, it doesn't support custom display configuration.

If no output file is provided, the resulting JSON file will be printed to stdout.

Example:

unidep src --output-format json # will print the JSON file to stdout

Example output:

TODO

How does unidep work?

Our notion of dependency

The notion of dependency might seem trivial, but it truly is not that obvious. Dependencies between functions and between files are discussed in this chapter.

unidep overview

Here is a diagram of the internal flow between components of unidep used to extract dependencies:

flowchart TB

    subgraph Loader
        in_loader(["Unresolved file paths"]) --> ts_loader["Tree sitter"]
        ts_loader --> ts_ast(["Tree sitter AST"])
    end

    subgraph Resolvers
        direction LR
        resolver_ast(["Tree sitter AST"]) --> resolver["Language-specific resolver"]
        resolver_unmod_in(["Unresolved functions"]) --> resolver
        resolver --> resolver_deps(["Resolved dependencies"])
        resolver --> resolver_undeps(["Unresolved functions"])
        resolver --> resolver_unmod(["Unresolved module names"])
    end

    subgraph Explorers
        direction LR
        in_explorer(["Unresolved module names"]) --> explorer["Language-specific explorer"]
        explorer --> out_explorer(["Unresolved file paths"])
    end

    start(["Path of codebase"]) --> Loader
    Loader --> Resolvers
    Resolvers --> Explorers
    Explorers --> Loader

    Resolvers --> deps(["Dependencies"])

Only two components are language-specific: the resolvers and the explorers. This modularity makes it easier to add support for new languages to unidep.
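The cycle in the diagram can be read as a worklist loop. Here is a minimal sketch of that reading; the function extract_dependencies and its three callables (load, resolve, explore) are hypothetical names chosen for this example, and the sketch leaves out the "unresolved functions" that the real resolvers also pass around.

```python
from collections import deque


def extract_dependencies(start_paths, load, resolve, explore):
    """Worklist loop mirroring the Loader -> Resolvers -> Explorers cycle.

    Hypothetical callables:
      load(path)          -> parsed tree (any object)
      resolve(path, tree) -> (dependencies, unresolved_module_names)
      explore(module)     -> iterable of file paths to load next
    """
    pending = deque(start_paths)
    visited = set()
    dependencies = []
    while pending:
        path = pending.popleft()
        if path in visited:
            continue  # never load the same file twice
        visited.add(path)
        tree = load(path)
        resolved, unresolved_modules = resolve(path, tree)
        dependencies.extend(resolved)
        for module in unresolved_modules:
            pending.extend(explore(module))
    return dependencies


# Toy in-memory "codebase": file path -> names of the modules it imports.
FILES = {"main.py": ["compute"], "compute.py": []}

deps = extract_dependencies(
    ["main.py"],
    load=lambda path: FILES[path],
    resolve=lambda path, imports: ([(path, f"{m}.py") for m in imports], imports),
    explore=lambda module: [f"{module}.py"] if f"{module}.py" in FILES else [],
)
print(deps)
```

Only resolve and explore would need a per-language implementation in this picture; the loop itself stays the same.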

Dependencies between functions

First, we need to understand how to resolve dependencies between functions. At first glance, this is pretty straightforward: if a function g is called inside a function f, then f depends on g.
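As a rough illustration of this rule, here is a minimal sketch that uses Python's built-in ast module as a stand-in for the Tree-sitter AST shown in the overview. The function naive_function_dependencies and the SOURCE sample are made up for this example; this is not unidep's actual resolver.

```python
import ast

SOURCE = '''
def f(x):
    return x + 42

def plot_graph(x_min, x_max):
    return [f(x) for x in range(x_min, x_max)]
'''


def naive_function_dependencies(source):
    """Map each top-level function to the plain names it calls."""
    dependencies = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            calls = {
                call.func.id
                for call in ast.walk(node)
                # Only direct calls such as g(...); attribute calls are skipped.
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
            dependencies[node.name] = sorted(calls)
    return dependencies


print(naive_function_dependencies(SOURCE))
```

On this sample, plot_graph is reported as depending on both f and the built-in range, even though only f is defined in the file.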

Unfortunately, this naive approach may lead to inconsistencies in practice.

A quick Python example

Let's consider the following Python program:

import numpy as np
from matplotlib.pyplot import plot, show

def f(x):
    return x + 42

def plot_graph(x_min, x_max, N = 1000):
    # linspace samples N evenly spaced points between x_min and x_max
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    # create a plot representing f on the segment [x_min, x_max]
    plot(abs, ord)
    # display that plot
    show()

Intuitively, we'd like to generate the following dependency graph:

flowchart TD
    A["plot_graph"]:::resolved --> B["numpy.linspace"]:::unresolved
    A --> C["f"]:::resolved
    A --> D["matplotlib.pyplot.plot"]:::unresolved
    A --> E["matplotlib.pyplot.show"]:::unresolved

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

The functions in green are resolved, that is, we computed their dependencies. On the other hand, the functions in orange have not been resolved yet: we know they are used, but we didn't access their source code to compute their own dependencies. They are unresolved.

Worse, let's generalize a bit, and consider a case where the function f is only known at runtime:

import numpy as np
from matplotlib.pyplot import plot, show

def plot_graph(f, x_min, x_max, N = 1000):
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    plot(abs, ord)
    show()

We will still generate the same graph, except that the function f will now be unresolved. This means that we will search for it in other files; we will see why on the next page.

Namespaces and classes

Note that it is important to keep the full namespace in memory (e.g. numpy.linalg instead of linalg), because two different functions may have the same name but live in different namespaces. It is also important to be aware of aliases in order to correctly resolve the functions in their respective source files.
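Here is a minimal sketch of alias-aware name resolution, again using Python's ast module as a stand-in. The helpers import_aliases and qualify are hypothetical and only handle absolute imports; they are meant to illustrate the idea, not to describe unidep's internals.

```python
import ast


def import_aliases(source):
    """Map each local name introduced by an import to its full dotted name."""
    aliases = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c` to `a.b`.
                local = alias.asname or alias.name.split(".")[0]
                aliases[local] = alias.name if alias.asname else local
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return aliases


def qualify(dotted_name, aliases):
    """Expand the first component of `dotted_name` using the alias table."""
    head, _, rest = dotted_name.partition(".")
    full_head = aliases.get(head, head)
    return f"{full_head}.{rest}" if rest else full_head


table = import_aliases("import numpy as np\nfrom matplotlib.pyplot import plot, show\n")
print(qualify("np.linspace", table))
print(qualify("plot", table))
```

Names that are not in the alias table, such as a locally defined f, are returned unchanged.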

Let's give another Python example which presents the difficulty of resolving methods:

class A:
    def f(self): ...

class B(A):
    def f(self): ...

def g(x):
    x.f()

So, does g depend on A.f or B.f? Well, if you can answer this question, you should definitely let me know. For some languages, static typing might make some heuristics possible, but the problem remains unsolvable in general. For example, in Java, a call to an overridden method cannot be resolved at compile time: it is dispatched dynamically at runtime, using virtual method tables.

When encountering the kind of situation described above, unidep will pick one of the candidate functions (here A.f and B.f) according to some heuristics.
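To give an idea of what "candidates" means here, the sketch below collects every class that defines a given method name. Both helpers are hypothetical, and the "keep the first definition" rule in pick_candidate is only a placeholder: the heuristics unidep actually applies are not described in this chapter.

```python
import ast

SOURCE = '''
class A:
    def f(self): ...

class B(A):
    def f(self): ...

def g(x):
    x.f()
'''


def method_candidates(source, method_name):
    """List every `Class.method` in `source` that defines `method_name`."""
    candidates = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == method_name:
                    candidates.append(f"{node.name}.{item.name}")
    return candidates


def pick_candidate(candidates):
    """Placeholder heuristic (an assumption): keep the first definition found."""
    return candidates[0] if candidates else None


print(method_candidates(SOURCE, "f"))
print(pick_candidate(method_candidates(SOURCE, "f")))
```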

So, are we doomed?

Kinda.

But let us not give in to despair. Sure, we won't be able to statically generate a perfect call graph of functions. But we are allowed to hope that, in practice, the results will be good enough to be useful. Especially because unidep was initially built to visualize dependencies between files (in one huge project which doesn't use that kind of trick; wdym unidep is an over-engineered hobby project?).

Dependencies between files

Now that we can build dependencies between functions, we will use that and import/include/... directives to build dependencies between files.

Back to the Python example

Let's reuse the previous Python example, and move the computation of f into another file:

# compute.py
def f(x):
    return x + 42

# main.py
import numpy as np
from matplotlib.pyplot import plot, show

from compute import f

def plot_graph(x_min, x_max, N = 1000):
    # linspace samples N evenly spaced points between x_min and x_max
    abs = np.linspace(x_min, x_max, N)
    ord = [f(x) for x in abs]
    # create a plot representing f on the segment [x_min, x_max]
    plot(abs, ord)
    # display that plot
    show()

We'd like to generate a dependency graph like this:

flowchart TD
    A["main.py"]:::resolved --> B["numpy"]:::unresolved
    A --> C["matplotlib.pyplot"]:::unresolved
    A --> D["compute.py"]:::resolved

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

Here, all dependencies between files are explicit: there are import statements that we can leverage to determine the dependencies. But it's not completely trivial: given from compute import f, we need to locate a file on disk. This Python file is probably named compute.py.

In Rust for example, use utils::{f, g}; means we should try to locate a file named utils.rs or utils/mod.rs. This process of locating source code files from module names is language-specific. Hence the need for language-specific explorers, which are responsible for solving this problem.
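A minimal sketch of what an explorer has to do, based on the naming conventions mentioned above plus Python's package convention (a directory containing __init__.py). The functions candidate_files and locate are hypothetical names, and real module resolution in both languages involves more rules than shown here.

```python
from pathlib import Path


def candidate_files(module, language):
    """Relative paths where `module` might live, by naming convention."""
    separator = "::" if language == "rust" else "."
    base = Path(*module.split(separator))
    if language == "python":
        return [base.with_suffix(".py"), base / "__init__.py"]
    if language == "rust":
        return [base.with_suffix(".rs"), base / "mod.rs"]
    raise ValueError(f"no explorer for language: {language}")


def locate(module, language, roots):
    """Return the first existing candidate under `roots`, or None."""
    for root in roots:
        for candidate in candidate_files(module, language):
            path = Path(root) / candidate
            if path.is_file():
                return path
    return None


print(candidate_files("compute", "python"))
print(candidate_files("utils", "rust"))
```

A module that cannot be located under any root (numpy in the example above, unless its path is included) simply stays unresolved.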

Another problem, easier but still language-specific, is to identify the relevant use/import/... statements which express explicit dependencies. This is solved by language-specific resolvers, which know which specific instructions express this idea of explicit dependency.
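For Python, spotting these statements could look like the sketch below, which again relies on the standard ast module rather than Tree-sitter. The function imported_modules is a hypothetical name, and relative imports are ignored to keep the example short.

```python
import ast

MAIN_PY = """\
import numpy as np
from matplotlib.pyplot import plot, show

from compute import f
"""


def imported_modules(source):
    """Return the module names that `source` explicitly imports."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules


print(imported_modules(MAIN_PY))
```

The resulting module names are exactly what an explorer would then try to map to files on disk.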

Dependencies are not only explicit

Let's now take a look at a C program composed of 4 files:

my-codebase
├── a.c
├── a.h
├── b.c
└── main.c

// a.h
int f();
int g();

// a.c
#include "a.h"
int f() { ... }

// b.c
#include "a.h"
int g() { ... }

// main.c
#include "a.h"

int main() { ... }

We'd like to obtain something like this:

flowchart TD
    A["main.c"]:::resolved --> B["a.h"]:::resolved
    B --> C["a.c"]:::resolved
    C --> B
    B --> D["b.c"]:::resolved
    D --> B

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

Three arrows are explicitly generated by the #include "a.h" directives. The other dependencies, going from a.h to a.c and b.c, are implicit. Functions f and g are declared in a.h, but they are defined (implemented) in a.c and b.c, so we definitely want to say that a.h depends on both of these files.

If it doesn't seem necessary to you, let's consider the following main.c (this is almost real code from real projects):

int main() {
    extern int f();
    extern int g();

    ...
}

In that case, we don't have a header, but we still want to generate something like this:

flowchart TD
    A["main.c"]:::resolved --> B["a.c"]:::resolved
    A --> C["b.c"]:::resolved

    classDef resolved fill:#d6e5bd, stroke:none;
    classDef unresolved fill:#f9e1a8, stroke:none;

Basically, unidep plays the role of the linker for languages that distinguish between declaration and definition. We need to track the functions that we are still trying to resolve (f and g above), and add a file dependency whenever we find the definition of one of these functions.
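This linker-like step can be sketched with two small symbol tables. Everything here is hypothetical (the function link_file_dependencies and the shape of its inputs); a real implementation would fill these tables from the parsed sources instead of taking them as literals.

```python
def link_file_dependencies(declared, defined):
    """Add a file dependency wherever a pending function finds its definition.

    declared: file -> function names it declares or uses without defining
    defined:  file -> function names it defines
    """
    # Index definitions by function name, like a linker's symbol table.
    definition_site = {}
    for path, names in defined.items():
        for name in names:
            definition_site.setdefault(name, path)

    edges = set()
    unresolved = set()
    for path, names in declared.items():
        for name in names:
            target = definition_site.get(name)
            if target is None:
                unresolved.add(name)  # definition not found (yet)
            elif target != path:
                edges.add((path, target))
    return sorted(edges), sorted(unresolved)


declared = {"main.c": ["f", "g"]}
defined = {"a.c": ["f"], "b.c": ["g"], "main.c": ["main"]}
print(link_file_dependencies(declared, defined))
```

With the extern declarations of the last example, this yields the two edges main.c -> a.c and main.c -> b.c, and nothing is left unresolved.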