License : Creative Commons Attribution 4.0 International (CC BY-NC-SA 4.0)
Copyright : Hervé Frezza-Buet, CentraleSupelec
Last modified : March 29, 2024 01:04
Link to the source : index.md

Introduction and objectives

This lab work addresses the issue of conducting an experiment and reporting on it. The point is to use smart tools, so that the report, the figures, the experiment outputs, the parameters used in the experiment and the parameters printed in the report always stay consistent, with minimal effort from you.

Here, let us consider a fake project. You will see afterwards that it is in fact quite general, and that what you learn with this fake project can easily be reused in your own real-life experiments.

After the lab work, you will see how command line tools, LaTeX and make can be combined for safe, easy, and professional reporting of your experiments (scientific articles, industrial technical reports, etc.).

Below is an overview of the different processes involved in this lab work. You can have a look at it to get a global picture, and feel free to come back to it during the lab. It clarifies how the pieces fit together.

Overview of the processes involved in this lab work

The fake experiment setup

Usual experiments involve some physical or computational process that outputs some kind of raw data. The experimenter then analyzes the data in order to present the experiment and highlight what is valuable in the data. Last, the experimenter writes a report that includes the results of the analysis phase and a conclusion. Our fake experiment follows the same rationale.

First create a directory for the project, and “go” into it from a command line (with cd).

Now, put the experiment itself in this directory. In this fake project, this consists of adding the Python script simulation.py.

Read this file, and run it (quit the less command by hitting the key q):

mylogin@mymachine:~$ python3 simulation.py
mylogin@mymachine:~$ less points.data

Now we want to analyse these points. In our fake experiment, this consists in finding k representatives for them, thanks to an online k-means algorithm. Download this kmeans.py.

Read this file, in order to understand what is computed and which results are produced. You can run it as well and see the generated plot:

mylogin@mymachine:~$ python3 kmeans.py points.data
mylogin@mymachine:~$ evince kmeans.pdf &
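For intuition before reading kmeans.py, the general idea of an online k-means can be sketched as follows. This is only a minimal illustration and not the content of kmeans.py: the function name online_kmeans, the initialisation from random samples and the test data are assumptions made for the example. Each iteration picks one sample, finds the closest representative and moves it a small step alpha toward the sample.

```python
import random


def online_kmeans(points, k, nb_iterations, alpha, seed=0):
    """Return k representatives (prototypes) fitted to 2D points.

    Hypothetical sketch: the real kmeans.py may initialise and
    iterate differently.
    """
    rng = random.Random(seed)
    # Start from k distinct samples taken from the data.
    prototypes = [list(p) for p in rng.sample(points, k)]
    for _ in range(nb_iterations):
        x, y = rng.choice(points)
        # Find the prototype closest to the current sample.
        best = min(prototypes, key=lambda w: (w[0] - x) ** 2 + (w[1] - y) ** 2)
        # Move it a small step (alpha) toward the sample.
        best[0] += alpha * (x - best[0])
        best[1] += alpha * (y - best[1])
    return prototypes


if __name__ == "__main__":
    rng = random.Random(1)
    # Two made-up clusters, around (0, 0) and around (5, 5).
    data = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(200)]
    data += [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(200)]
    for w in online_kmeans(data, k=2, nb_iterations=5000, alpha=0.05):
        print("({:.3f}, {:.3f})".format(w[0], w[1]))
```

With a small alpha, each representative behaves like a moving average of the samples it wins, which is why the number of iterations and alpha are worth reporting as parameters.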

Now, let us consider a report in LaTeX describing our experiment. Download report.tex as well as biblio.bib, the former being your report, the latter containing the description of the bibliographic references used in your report. You will also need to download hand-made vector drawings, made with inkscape: fig-learn.svg, fig-simu.svg

Read the biblio.bib file to have an idea of the way bibliography entries are listed in this file.

A BibTeX file can be filled in manually, but this is quite cumbersome. Several journals directly provide the BibTeX entry of the references they host. There also exist several tools to handle a bibliographic database, such as Zotero, JabRef and KBibTeX.

Read the report.tex file to have an overview of what LaTeX files look like. There are comments to help you. The table contains silly values for now.

Use inkscape to open both .svg drawings, in order to see what they look like.

We are ready to compile our report, but as you may have noticed, the report.tex file refers to pdf figures. We have generated kmeans.pdf, but the two pdf versions of our svg files are missing. This will raise an error. Let us try it. Type ‘x’ when LaTeX complains about an error and prompts you.

mylogin@mymachine:~$ latexmk -pdf report.tex

We can fix the error by opening the svg files with inkscape and using its menus to export the drawings to pdf, but this is not the most convenient way. As usual, the command line is better. Try the following (with Inkscape 1.0 or later, the --export-pdf option has been replaced by --export-filename):

mylogin@mymachine:~$ inkscape fig-learn.svg --export-area-drawing --export-pdf=fig-learn.pdf
mylogin@mymachine:~$ inkscape fig-simu.svg --export-area-drawing --export-pdf=fig-simu.pdf

Now, your LaTeX compilation should work.

mylogin@mymachine:~$ latexmk -pdf report.tex
mylogin@mymachine:~$ evince report.pdf &

Cleaning up the file content

The report.tex LaTeX source code needs improvements. LaTeX allows you to define macros. Here, the document contains numbers that are in fact parameters. It would be nice to define the parameter values first, and then use the macros in the document, so that when a value changes you only have to change the macro definition at the beginning and not every occurrence in the document.

In report.tex, before \begin{document} (i.e. in the so-called “preamble” section of the LaTeX code), write the following macro definitions:

% Parameters for simulation
\def\SimulationBigRadius{0.99}
\def\SimulationSmallRadius{0.75}
\def\SimulationNbSamples{10000}

% Parameters for k-means
\def\KMeansK{5}
\def\KMeansNbIterations{20000}
\def\KMeansAlpha{0.01}

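So now, in report.tex as well, replace each hard-coded occurrence of these values with the corresponding macro (for instance, 0.99 becomes \SimulationBigRadius).

Of course, the best way is to define such macros first, and then use them rather than magic numbers in your document.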

Check that everything compiles well.

Let us make it even more modular. The macro definitions can be written in external files. Create a file named simulation-params.tex and write in it the parameter definitions related to our simulation, i.e. the following LaTeX code:

% Parameters for simulation
\def\SimulationBigRadius{0.99}
\def\SimulationSmallRadius{0.75}
\def\SimulationNbSamples{10000}

Proceed the same way with another kmeans-params.tex file that contains

% Parameters for k-means
\def\KMeansK{5}
\def\KMeansNbIterations{20000}
\def\KMeansAlpha{0.01}

Now, in report.tex, remove the macro definitions and use the \input LaTeX command to include those two files as if they were copy-pasted.

% Parameter definitions
\input{kmeans-params.tex}
\input{simulation-params.tex}

Recompile your LaTeX document and check that everything still works.

As you may guess, the trick is that such *-params.tex files can be generated by running your simulation, your analysing process… This is what we will do later on.

Last, as we did for the parameter definitions, we can define the content of the result table externally. Create a file kmeans-table.tex containing:

% data line #1, it is fake.
(x,y) & 3.1415 & 18 \\
\hline
      
% data line #2, it is fake.
(a,b) & \sqrt 2 & 20 \\
\hline

and modify report.tex by replacing the previous data lines with

\input{kmeans-table.tex}

Generation of tex files

Our document, i.e. report.tex, is now customizable by changing the content of the related tex files: simulation-params.tex, kmeans-params.tex and kmeans-table.tex.

The idea now is to generate those files while the experiment and the analysis are running.

At the end of the script simulation.py, add the following lines in order to dump the simulation parameters to the simulation-params.tex file.

# Saving parameters into a LaTeX file
# (backslashes are doubled so that Python does not read \d or \S as escape sequences)
with open("simulation-params.tex", "w") as params_file:
    params_file.write('\\def\\SimulationBigRadius{{{}}}\n'.format(R))
    params_file.write('\\def\\SimulationSmallRadius{{{}}}\n'.format(r))
    params_file.write('\\def\\SimulationNbSamples{{{}}}\n'.format(N))

In the same way, at the end of kmeans.py, add the following lines, in order to generate kmeans-params.tex.

# Saving parameters into a LaTeX file
# (backslashes are doubled so that Python does not read \d or \K as escape sequences)
with open("kmeans-params.tex", "w") as params_file:
    params_file.write('\\def\\KMeansK{{{}}}\n'.format(K))
    params_file.write('\\def\\KMeansNbIterations{{{}}}\n'.format(N))
    params_file.write('\\def\\KMeansAlpha{{{}}}\n'.format(alpha))
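Since both scripts dump their parameters in the same way, the pattern can be factored into a small helper. The sketch below is one possible way to do it; the function name write_latex_params and the file name demo-params.tex are hypothetical and do not appear in the provided scripts. It also checks that each macro name contains only letters, since a LaTeX control word such as \KMeansK cannot contain digits or underscores.

```python
def write_latex_params(path, params):
    """Write one \\def line per (macro name, value) pair of the dict `params`."""
    for name in params:
        # A LaTeX control word is made of letters only.
        if not (name.isascii() and name.isalpha()):
            raise ValueError("invalid LaTeX macro name: " + name)
    with open(path, "w") as params_file:
        for name, value in params.items():
            # Doubled braces produce literal braces around the value.
            params_file.write("\\def\\{}{{{}}}\n".format(name, value))


if __name__ == "__main__":
    # Hypothetical usage with the k-means parameters of this lab.
    write_latex_params(
        "demo-params.tex",
        {"KMeansK": 5, "KMeansNbIterations": 20000, "KMeansAlpha": 0.01},
    )
```

Keeping the parameters in a dictionary means that adding a new parameter to the report only requires one more entry, not one more write call.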

The last file that we have to generate is the kmeans-table.tex. In the kmeans.py script, replace the printing of the results by the following code.

# Let us write the table lines corresponding to the result.
# (backslashes are doubled so that Python does not read \h as an escape sequence)
with open("kmeans-table.tex", "w") as table_file:
    for w, m, n in zip(W, averages, nbs):
        table_file.write('({:.3f}, {:.3f}) & {:.3f} & {} \\\\\n\\hline\n'.format(w[0], w[1], m, n))

In order to restart from scratch and regenerate all the files, let us clean all the files which can be rebuilt.

mylogin@mymachine:~$ rm -f *.aux *.log *.toc *.bbl *.blg *.out *.fls *.fdb_latexmk
mylogin@mymachine:~$ rm -f *-params.tex
mylogin@mymachine:~$ rm -f *-table.tex
mylogin@mymachine:~$ rm -f *.pdf *.data *~

Now the building recipe of your experiment report consists of the following stages:

mylogin@mymachine:~$ inkscape fig-simu.svg --export-area-drawing --export-pdf=fig-simu.pdf
mylogin@mymachine:~$ inkscape fig-learn.svg --export-area-drawing --export-pdf=fig-learn.pdf
mylogin@mymachine:~$ python3 simulation.py
mylogin@mymachine:~$ python3 kmeans.py points.data
mylogin@mymachine:~$ latexmk -pdf report.tex > /dev/null 2>&1
mylogin@mymachine:~$ evince report.pdf &

All LaTeX notifications, warnings and errors have been redirected to /dev/null, so nothing will be displayed in case of trouble with LaTeX. Hiding LaTeX verbosity will be illustrative in what follows, but if your execution freezes, keep in mind that LaTeX may be waiting for you to hit the ‘x’ key because an error has occurred.
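As a side note, the recipe above can also be driven from a short Python script that hides the output in the same way but tells you which stage failed. This is only a sketch under assumptions: the function name run_stages is hypothetical, and the stage list simply repeats the commands given above.

```python
import subprocess


def run_stages(stages):
    """Run each command in order; return the first failing one, or None."""
    for command in stages:
        try:
            result = subprocess.run(
                command,
                stdin=subprocess.DEVNULL,   # no stage can wait for keyboard input
                stdout=subprocess.DEVNULL,  # hide the verbosity, as with > /dev/null
                stderr=subprocess.DEVNULL,
            )
        except FileNotFoundError:
            # The program itself is not installed or not in the PATH.
            return command
        if result.returncode != 0:
            return command
    return None


if __name__ == "__main__":
    stages = [
        ["inkscape", "fig-simu.svg", "--export-area-drawing", "--export-pdf=fig-simu.pdf"],
        ["inkscape", "fig-learn.svg", "--export-area-drawing", "--export-pdf=fig-learn.pdf"],
        ["python3", "simulation.py"],
        ["python3", "kmeans.py", "points.data"],
        ["latexmk", "-pdf", "report.tex"],
    ]
    failed = run_stages(stages)
    print("all stages succeeded" if failed is None else "failed: " + " ".join(failed))
```

Such a script still reruns every stage each time, which is exactly the limitation that make removes.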

If I change the value of K in the kmeans.py script, do I need to rebuild everything? Take the time to think about which files are no longer consistent.

If I modify the position of one of the blue points in fig-simu.svg, using inkscape, I only need to rebuild the pdf file from it and then the report which includes that pdf. But having to remember what depends on what, and having to track what has changed, is cumbersome. Fortunately, there is a command line tool called make which rebuilds exactly what needs to be rebuilt, given a file describing the dependencies and the building recipes. Let us see this in the next section.
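To make the notion of dependency concrete, the relations between the files of this project can be written down as a table and explored with a few lines of Python. The table below is an assumption deduced from the building recipe above (it is not taken from the makefile provided later), and the function name outdated_after_change is hypothetical.

```python
# Assumed dependency table: each generated file maps to the files it is built from.
DEPENDENCIES = {
    "fig-simu.pdf": ["fig-simu.svg"],
    "fig-learn.pdf": ["fig-learn.svg"],
    "points.data": ["simulation.py"],
    "simulation-params.tex": ["simulation.py"],
    "kmeans.pdf": ["kmeans.py", "points.data"],
    "kmeans-params.tex": ["kmeans.py", "points.data"],
    "kmeans-table.tex": ["kmeans.py", "points.data"],
    "report.pdf": [
        "report.tex", "biblio.bib", "fig-simu.pdf", "fig-learn.pdf", "kmeans.pdf",
        "simulation-params.tex", "kmeans-params.tex", "kmeans-table.tex",
    ],
}


def outdated_after_change(changed, dependencies=DEPENDENCIES):
    """Return the generated files that depend, directly or not, on `changed`."""
    outdated = set()
    frontier = [changed]
    while frontier:
        source = frontier.pop()
        for target, sources in dependencies.items():
            if source in sources and target not in outdated:
                outdated.add(target)
                frontier.append(target)  # what depends on it is outdated too
    return outdated


if __name__ == "__main__":
    # The example of the text: moving a point in fig-simu.svg.
    print(sorted(outdated_after_change("fig-simu.svg")))
```

For fig-simu.svg, the function returns only fig-simu.pdf and report.pdf, as stated above. Calling it with "kmeans.py" is one way to check your answer to the previous question.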

Automatic (re)building with the command make

The makefile and the target dependencies

The command make executes a series of recipes that are described in a file named makefile, located in the directory where the command is launched. This file is a text file, describing what needs to be done.

Let us start with a simple makefile. With a text editor, open a file named makefile in the project directory, and write what follows in it. The indentation is essential: each recipe line has to start with a single tab character (not spaces), otherwise make stops with a “missing separator” error.

a :
    echo "doing A"
    echo "Cool !"     
b :
    echo "doing B"

This file says that we have two targets, a and b. Making target a consists in running echo "doing A" and then echo "Cool !" (echo is the bash command that prints a message). Making b consists in running echo "doing B".

Try the following in your terminal:

mylogin@mymachine:~$ make a
mylogin@mymachine:~$ make b
mylogin@mymachine:~$ make

As you can see, if you do not specify which target has to be made, the first one in the makefile is invoked. You may also have noticed that the command line is printed. For example, when you type

mylogin@mymachine:~$ make b

you get two lines, one that is the command echo "doing B" and one that is the result of echo "doing B", i.e. doing B. In order to prevent make from printing the commands that it executes, you can prefix them with @. Modify your makefile like this

a :
    @echo "doing A"
    @echo "Cool !"

b :
    @echo "doing B"

and test the two commands again:

mylogin@mymachine:~$ make a
mylogin@mymachine:~$ make b

Makefile lines can also contain variable definitions. Modify your makefile like this and test target a.

COOL_MESSAGE = "Cool !"

a :
    @echo "doing A"
    @echo $(COOL_MESSAGE)

b :
    @echo "doing B"

The target names can be existing files. In this case, the recipe associated with the target is executed only if the file needs to be updated. What does this mean? How does make know that a file needs to be updated? This relies on the target dependencies that we discuss hereafter.

After the symbol :, you can add target dependencies. The dependencies can be file names, but also other targets. Modify target b as

b : a
    @echo "doing B"

And type

mylogin@mymachine:~$ make b

Since you have now declared that b requires target a to be completed, the commands associated with a are executed before the ones associated with b.

Clear your makefile and write the following rules in it:

b : a
    @echo "making b from a"
    @touch b  # We set b to the current date, as if it had just been updated from a.

c : b
    @echo "making c from b"
    @touch c  # We set c to the current date, as if it had just been updated from b.

And try

mylogin@mymachine:~$ make b

It fails, since a is neither an existing file nor a target defined in the makefile. Let us create empty files a, b and c.

mylogin@mymachine:~$ touch a
mylogin@mymachine:~$ touch b
mylogin@mymachine:~$ touch c

Try (note that a is not a target in the makefile, but it is an existing file)

mylogin@mymachine:~$ make a
mylogin@mymachine:~$ make z
mylogin@mymachine:~$ make b
mylogin@mymachine:~$ make c

Let us set the date of file b as the current date.

mylogin@mymachine:~$ touch b
mylogin@mymachine:~$ make a
mylogin@mymachine:~$ make b
mylogin@mymachine:~$ make c

Only c is rebuilt, since its dependency b is now more recent than the target file c. Now try once again

mylogin@mymachine:~$ make c

Nothing is rebuilt. The best thing with make is that it checks the dates of the targets and their dependencies recursively. Try

mylogin@mymachine:~$ touch a
mylogin@mymachine:~$ make c
mylogin@mymachine:~$ make a
mylogin@mymachine:~$ make b
mylogin@mymachine:~$ make c
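The behaviour observed with the files a, b and c can be mimicked by a toy version of make written in Python. This is only a sketch of the principle (compare modification dates, dependencies first) and by no means how make is actually implemented. The names build and touch, and the representation of rules as a dictionary, are assumptions made for the example.

```python
import os
import tempfile


def touch(path):
    """Create the file if needed and set its date to now, like the touch command."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def build(target, rules, log):
    """Rebuild `target` if it is missing or older than one of its dependencies.

    `rules` maps a target to (list of dependencies, recipe function).
    The rebuilt targets are appended to `log`, in order.
    """
    deps, recipe = rules.get(target, ([], None))
    for dep in deps:
        build(dep, rules, log)  # dependencies are brought up to date first
    if recipe is None:
        # Not a target: it must at least be an existing file.
        if not os.path.exists(target):
            raise FileNotFoundError("no rule to make target " + target)
        return
    outdated = not os.path.exists(target) or any(
        os.path.getmtime(dep) > os.path.getmtime(target) for dep in deps
    )
    if outdated:
        recipe(target)
        log.append(target)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()  # scratch directory for the demo
    a, b, c = (os.path.join(folder, name) for name in "abc")
    for path, date in ((a, 100), (b, 200), (c, 300)):
        touch(path)
        os.utime(path, (date, date))  # a is the oldest file, c the most recent
    rules = {b: ([a], touch), c: ([b], touch)}
    log = []
    build(c, rules, log)  # everything is up to date, nothing is rebuilt
    os.utime(a, None)     # same effect as "touch a"
    build(c, rules, log)  # b is rebuilt first, then c
    print([os.path.basename(path) for path in log])
```

The recursion is what lets a change in a propagate to c even though c does not depend on a directly.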

Let us clean everything up

mylogin@mymachine:~$ rm a b c makefile

Automation of our project

Download this makefile. Read it carefully, and ask about anything you do not understand, since supplementary makefile tricks have been added.

Type

mylogin@mymachine:~$ make
mylogin@mymachine:~$ make clean-all
mylogin@mymachine:~$ make view

Then, keep the pdf viewer open, and try to change some parameters in the python files, some elements in the svg files with inkscape, and some text in the report.tex and biblio.bib files. After each single change, type

mylogin@mymachine:~$ make document

and observe what is actually generated. Observe that the report.pdf file is kept coherent with your changes.

Conclusion

Do not hesitate to investigate makefiles further. Search the web and ask those who know (mates, teachers, etc.).

When you are setting up your own experiment, think from the very beginning about how to organize your files and how some results can be \input-ed in your LaTeX documents.

The Unix philosophy is that all tools are open, and that each tool does a very specific job but does it well. Makefiles are a very powerful way to invoke a collection of such elementary tools in a coherent, customized, and maintainable way.

Hervé Frezza-Buet,