LFortran Minimum Viable Product (MVP)

A little over two years after our initial announcement, we are releasing the LFortran MVP today.

We set ourselves the goal of releasing the MVP in September 2021, and we were initially hoping to compile at least some production codes with it. We did not quite get there. However, after wrapping up, documenting where we are, and writing a demo Fortran project that works with the MVP, it turns out that LFortran is already quite usable: it can compile quite a lot of computational Fortran code if you are willing to work around some of the current limitations. In this blog post we describe the current status of LFortran in detail.

We are looking for new contributors to help us get over the finish line sooner. If you are interested, please get in touch. We will be happy to teach you anything you need to know. It is not difficult to contribute.

Current Status

We have created a status page to document the exact status of each Fortran feature:

This page is automatically generated by running test programs. Feel free to submit more test cases to us to make the Fortran test suite more complete.

For each feature we list whether it can be parsed to AST, whether the semantics step (AST to ASR) works, whether code can be generated (LLVM), and whether the generated binary runs.

To give a quick overview:

  • LFortran has a complete Fortran 2018 parser that can parse to AST any production code that we have tried. If you discover something that cannot be parsed, please report a bug and we will fix it.

  • It can format the AST back as Fortran code (lfortran fmt), including comments, although there are still a few issues to polish in the source formatter.

  • A large subset of Fortran can be transformed from AST to ASR, and almost every feature has at least a working prototype (including features like classes, do concurrent, etc.). But to make a feature actually useful, one must implement enough use cases, so we have opted to focus first on a subset of Fortran features roughly corresponding to Fortran 95.

  • An even smaller subset can be compiled from ASR to LLVM.

  • Interfacing with C using iso_c_binding works well.

  • LFortran has a runtime library written in Fortran itself. It has three sections:

    • builtin: functions that must be implemented by the LLVM backend such as size, len, lbound etc.
    • pure: functions that are implemented in pure Fortran and only depend on pure or builtin.
    • impure: functions still implemented in Fortran, but that use iso_c_binding to call implementations in C.

    The LFortran runtime library is a standalone Fortran project that you can compile with any Fortran (and C) compiler. We would love it if you contributed to it. All you need to know is Fortran (and optionally C).

  • So far we have mainly concentrated on compiling valid Fortran code. If you try invalid code, errors are not always reported yet and the error messages are not always the best (basic error messages, such as for a missing variable declaration, do work). Once we can compile production codes, we will work on error messages. Our goal is Rust-style error messages, where it is a compiler bug if the error message is not clear.
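To make the pure/impure distinction and the iso_c_binding interfacing more concrete, here is a minimal sketch in standard Fortran. The names `demo_runtime`, `c_sin`, `my_abs` and `my_sin` are hypothetical and are not taken from the actual LFortran runtime library. The sketch only assumes that the C library's `sin` is available at link time, as it is with common toolchains.

```fortran
module demo_runtime
use iso_c_binding, only: c_double
implicit none
private
public :: my_abs, my_sin

interface
    ! Binding to the C library's sin(); the Fortran-side name c_sin is
    ! hypothetical, only the C symbol name "sin" is real.
    pure function c_sin(x) bind(c, name="sin") result(r)
        import :: c_double
        real(c_double), value, intent(in) :: x
        real(c_double) :: r
    end function
end interface

contains

    ! "pure" style: implemented entirely in Fortran
    elemental function my_abs(x) result(r)
        real(c_double), intent(in) :: x
        real(c_double) :: r
        if (x >= 0) then
            r = x
        else
            r = -x
        end if
    end function

    ! "impure" style: a Fortran wrapper that calls the C implementation
    function my_sin(x) result(r)
        real(c_double), intent(in) :: x
        real(c_double) :: r
        r = c_sin(x)
    end function

end module

program demo
use iso_c_binding, only: c_double
use demo_runtime, only: my_abs, my_sin
implicit none
print *, my_abs(-1.5_c_double)
print *, my_sin(0.5_c_double)
end program
```

The wrapper keeps the C dependency behind a Fortran interface, so calling code only ever sees Fortran procedures.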

MVP Demo Fortran Project

To show what LFortran can already compile, we created this demo:

You can clone the repository and compile it using fpm (Fortran Package Manager):

git clone https://gitlab.com/lfortran/examples/mvp_demo.git
cd mvp_demo
conda create -n mvp_demo -c conda-forge fpm=0.4.0 lfortran=0.14.0
conda activate mvp_demo
fpm run --all --compiler=lfortran
fpm test --compiler=lfortran

This compiles and runs all example programs and tests. The demo uses the following Fortran features:

  • Program, modules, subroutines, functions
  • Arrays, passing to functions as arguments, allocatable arrays
  • Interfacing with C using iso_c_binding, file IO implemented in C and wrapped
  • Saving a mesh and values into a file
  • Types: integer, real (double precision)
  • If, select case, do loop
  • Intrinsic functions: sin, abs, kind, trim, size

LFortran supports a lot more, see the previous section.
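As an illustration of the kind of code this feature list describes, here is a short sketch. It is not part of the demo repository, and the names `mesh_utils`, `uniform_mesh` and `max_abs` are made up for this example. It uses a module, a subroutine, a function, allocatable arrays passed as arguments, if, select case, do loops, and the intrinsics sin, abs, kind and size.

```fortran
module mesh_utils
implicit none
integer, parameter :: dp = kind(0.d0)

contains

    ! Fill x with uniformly spaced points on [a, b]; requires size(x) >= 2
    subroutine uniform_mesh(a, b, x)
        real(dp), intent(in) :: a, b
        real(dp), intent(out) :: x(:)
        integer :: i, n
        n = size(x)
        do i = 1, n
            x(i) = a + (b - a) * (i - 1) / (n - 1)
        end do
    end subroutine

    ! Largest absolute value in an array
    function max_abs(x) result(r)
        real(dp), intent(in) :: x(:)
        real(dp) :: r
        integer :: i
        r = 0
        do i = 1, size(x)
            if (abs(x(i)) > r) r = abs(x(i))
        end do
    end function

end module

program demo_features
use mesh_utils, only: dp, uniform_mesh, max_abs
implicit none
real(dp), allocatable :: x(:), y(:)
integer :: i, n
n = 5
allocate(x(n), y(n))
call uniform_mesh(0._dp, 1._dp, x)
do i = 1, n
    y(i) = sin(x(i))
end do
select case (n)
    case (1)
        print *, "single point"
    case default
        print *, "number of points:", size(x)
end select
print *, "max |sin(x)| =", max_abs(y)
end program
```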

Notebooks

LFortran is a compiler, but from the start it has been designed to also work interactively, on the command line or in a Jupyter notebook. To make that possible, the parser (AST) and the semantic part (ASR) can represent a small extension of Fortran that also allows declarations, statements and expressions at the global scope (besides the standard Fortran programs, modules, functions and subroutines).
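As a sketch of what such global-scope input could look like, one could enter the following lines at the interactive prompt or in a notebook cell. The variable name is arbitrary. This is not a valid standalone program for a standard Fortran compiler, since it relies on the extension described above: the first line is a declaration, the second a statement, and the third a bare expression.

```fortran
integer :: n
n = 5
n*2
```

In a regular program the same lines would have to be wrapped in a `program` block, and the expression would need to be printed or assigned.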

We took the tutorial from https://fortran-lang.org/learn/quickstart and ported it to Jupyter notebooks powered by LFortran:

We noticed that the tutorial was already often written in an “executable” form, just without a means to actually execute it.

The features available in interactive form are sometimes a little more limited than what is possible to compile into a binary.

Speed of compilation

LFortran is very fast. We have used our experience with optimizing SymEngine to ensure that the internal data structures in LFortran are performant. To judge the speed of compilation, one can use the x86 backend. This backend is only in a prototype stage, but it shows what is possible. We can take the following benchmark bench3.f90:

$ time gfortran bench3.f90
0.74s user 0.06s system 94% cpu 0.842 total
$ time lfortran bench3.f90
0.72s user 0.03s system 97% cpu 0.776 total
$ time lfortran --backend=x86 bench3.f90
0.04s user 0.01s system 87% cpu 0.058 total
$ time lfortran --backend=cpp bench3.f90
0.80s user 0.08s system 93% cpu 0.948 total

In all cases the binary produces:

$ ./a.out
10045

As you can see, on this particular file we are about 14x faster than GFortran with the x86 backend (0.058 s versus 0.842 s total), and comparable with the default LLVM backend. Even the C++ translation backend, which first produces C++ code and then calls clang++, is roughly comparable to GFortran.

We expect the speedup to be lower for more realistic files and projects, but this is promising. Our goal is very fast compilation in Debug mode. For that we will need to finish the x86 backend and also add an ARM backend. We will work on this after we are done with our LLVM backend, which is currently our main focus.

On a more realistic code sin_benchmark.f90 (see the next section for details):

$ time gfortran sin_benchmark.f90
0.08s user 0.04s system 74% cpu 0.153 total
$ time lfortran sin_benchmark.f90
0.06s user 0.03s system 82% cpu 0.110 total

Only the LLVM backend can currently compile this code, and its overall compilation speed is slightly faster than GFortran's.

The LLVM backend is unfortunately quite slow. But the great advantage of LLVM is that it can optimize code very well, as you can see in the next section. For this reason the LLVM backend will be a great backend for Release mode as well as for interactive use and for supporting many platforms.

Performance of the generated code

People often ask us about the performance of the generated code. We can now answer that. On a 2019 MacBook Pro (Intel based):

$ lfortran --version
LFortran version: 0.14.0
Platform: macOS
Default target: x86_64-apple-darwin20.3.0
$ lfortran --fast sin_benchmark.f90
$ time ./a.out
./a.out  1.02s user 0.00s system 99% cpu 1.032 total

$ gfortran --version
GNU Fortran (GCC) 9.3.0
$ gfortran -Ofast sin_benchmark.f90
$ time ./a.out
./a.out  1.03s user 0.00s system 99% cpu 1.035 total

You can try it yourself on your own computer.

This is just a very simple code that implements a sin(x) function and then benchmarks it. We have not implemented any optimizations in LFortran itself yet; all the optimizations come from LLVM. We recommend not reading too much into these numbers until we put serious effort into LFortran’s optimizations, but the fact that we are already competitive with GFortran is a sign that we are on the right track.

Note: for some reason GFortran does not use the fused multiply-add (fma) instructions. One can force it as follows (-march=native does not work, but -march=skylake does):

$ gfortran -O3 -march=skylake -ffast-math -funroll-loops sin_benchmark.f90
$ time ./a.out
./a.out  0.73s user 0.00s system 99% cpu 0.739 total

Neither does LFortran yet. If somebody knows how to make LLVM combine add/mul into fma using the C++ API, please let us know. One way to force LLVM to create fma is by optimizing the LLVM code that LFortran emits using Clang as follows:

$ lfortran --show-llvm sin_benchmark.f90 > sin_benchmark.ll
$ clang -O3 -march=native -ffast-math -funroll-loops -c sin_benchmark.ll -o sin_benchmark.o
$ lfortran -o a.out sin_benchmark.o
$ time ./a.out
./a.out  0.47s user 0.00s system 99% cpu 0.472 total

Again, we urge you not to draw too many conclusions from these numbers. But they are encouraging.

Our plan is to implement optimizations at the ASR level as ASR->ASR transformations. That will allow users to see the optimized version of their code as Fortran code (by transforming ASR->AST and AST->Fortran source) and to verify that the compiler has optimized everything that can be done at the Fortran level. If it has not, they can report a bug, rework their code, or help the compiler with some command line options or possibly pragmas. Only then should the code be lowered to LLVM, and LLVM should apply its optimizations.

Performance of intrinsic functions like sin(x)

Our goal for intrinsic functions (such as sin(x)) is to ship with (at least) two versions:

  • Accuracy first, performance second
  • Performance first, accuracy second

The accuracy first versions are currently called from libm. Those versions are typically very accurate (usually either correctly rounded to all digits, or one floating point number next to the correctly rounded one). They will be used by default in Debug mode.

In Release mode, on the other hand, you will be able to switch to a very fast implementation of these functions, implemented in the pure section of the runtime library. Those functions are still accurate (to about 1e-15 relative accuracy), but not necessarily to the last bit. Also their argument range is sometimes reduced, for example the sin(x) function might only work for |x| < 1e10 or so. We are still working out the exact details, but our preliminary work shows that it is possible to implement these in Fortran and get close to optimal (vectorized) performance.
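To illustrate the general idea of a performance first sin(x) written in pure Fortran, here is a minimal sketch. It is not the implementation from the LFortran runtime library or from sin_benchmark.f90. The names `fast_sin_mod`, `sin_kernel` and `fast_sin` are hypothetical, and the method (naive reduction to |y| <= pi/2 followed by a Taylor polynomial in Horner form) is chosen only for simplicity. Because it uses `nint`, its argument range is limited to values of x/pi that fit in a default integer, and the naive reduction loses accuracy as |x| grows.

```fortran
module fast_sin_mod
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp), parameter :: pi = 3.1415926535897932384626433832795_dp

! Taylor coefficients of sin(y)/y in powers of y**2: 1, -1/3!, 1/5!, ..., -1/19!
real(dp), parameter :: c(10) = [ &
    1._dp, &
    -1/6._dp, &
    1/120._dp, &
    -1/5040._dp, &
    1/362880._dp, &
    -1/39916800._dp, &
    1/6227020800._dp, &
    -1/1307674368000._dp, &
    1/355687428096000._dp, &
    -1/121645100408832000._dp ]

contains

    ! Polynomial approximation of sin(y), intended for |y| <= pi/2
    elemental function sin_kernel(y) result(r)
        real(dp), intent(in) :: y
        real(dp) :: r, z
        integer :: i
        z = y*y
        r = c(10)
        do i = 9, 1, -1
            r = c(i) + z*r      ! Horner evaluation in z = y**2
        end do
        r = y*r
    end function

    ! sin(x) = (-1)**n * sin(x - n*pi), with n the integer nearest to x/pi
    elemental function fast_sin(x) result(r)
        real(dp), intent(in) :: x
        real(dp) :: r
        integer :: n
        n = nint(x/pi)
        r = sin_kernel(x - n*pi)
        if (modulo(n, 2) /= 0) r = -r
    end function

end module

program test_fast_sin
use fast_sin_mod, only: dp, fast_sin
implicit none
integer :: i
real(dp) :: x, err
err = 0
do i = -1000, 1000
    x = i*0.01_dp
    err = max(err, abs(fast_sin(x) - sin(x)))
end do
print *, "max abs difference from intrinsic sin on [-10, 10]:", err
end program
```

The test program compares the sketch against the compiler's own sin(x) on a modest range, which is one simple way to check such an implementation before trusting it on production data.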

The basic idea is that in Debug mode LFortran could (in the future) check the argument to sin(x) and emit a warning if it is out of the range of the performance first version. You then compile your production code in Debug mode and run it on production data. If you don’t see any warnings or errors, you know that if you switch to the performance first versions of intrinsics, they will work. The accuracy itself should be sufficient in most cases, but it will be slightly less than that of the accuracy first versions. We would like to figure out a way to also have the compiler assist in checking the accuracy in Debug mode. If you have ideas on how to do that, please let us know.
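The Debug mode check described above can be sketched by hand as follows. This is only an illustration of the idea, not something LFortran does today. The names `checked_intrinsics`, `debug_sin` and `sin_max_arg` are hypothetical, and the limit of 1e10 is simply the rough figure mentioned earlier.

```fortran
module checked_intrinsics
implicit none
integer, parameter :: dp = kind(0.d0)
! Assumed argument limit of a performance first sin(x); hypothetical value
real(dp), parameter :: sin_max_arg = 1e10_dp

contains

    ! Accuracy first sin(x) that warns when the argument would be
    ! outside the range of the performance first version
    function debug_sin(x) result(r)
        real(dp), intent(in) :: x
        real(dp) :: r
        if (abs(x) >= sin_max_arg) then
            print *, "Warning: sin(x) argument outside fast range, x =", x
        end if
        r = sin(x)
    end function

end module

program demo_debug_sin
use checked_intrinsics, only: dp, debug_sin
implicit none
real(dp) :: y
! Results are assigned first, because debug_sin itself may print
y = debug_sin(1._dp)
print *, y
y = debug_sin(1e12_dp)
print *, y
end program
```

If a run on production data prints no warnings, every call stayed inside the assumed range, which is the condition under which switching to the performance first version would be safe.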

Google Summer of Code (GSoC)

We had three GSoC students this year. Here are their updates and final reports:

They have all made excellent progress and finished their projects. Their contributions were very beneficial to LFortran.

Conclusions

Please test the MVP and let us know what you think. If you are interested in contributing, we will get you up to speed.

Thank you to everybody who has supported us over the years and helped us get to this point.