We are thrilled to announce that LFortran can now successfully compile and run predsci/POT3D, marking a significant milestone in our journey to beta. POT3D is the ninth production-grade, third-party code that LFortran can compile, bringing us closer to our goal of compiling 10 such codes—a critical step toward a beta-quality compiler.

About POT3D

POT3D (High Performance Potential Field Solver) is a Fortran package developed by Predictive Science Inc. that computes potential field solutions to approximate the solar coronal magnetic field, using observed photospheric magnetic fields as a boundary condition. It can be used to generate potential field source surface (PFSS), potential field current sheet (PFCS), and open field (OF) models. It has been (and continues to be) used for numerous studies of coronal structure and dynamics. The code is highly parallelized using MPI and is GPU-accelerated using Fortran standard parallelism (do concurrent) and OpenMP Target for data movement and device selection, along with an option to use the NVIDIA cuSPARSE library. The HDF5 file format is used for input/output.
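For readers unfamiliar with Fortran standard parallelism, here is a minimal, illustrative sketch of a do concurrent loop of the kind that compilers can map to CPU threads or GPU kernels. It is not taken from POT3D; all names are made up for this example:

program do_concurrent_demo
    implicit none
    integer, parameter :: n = 1000
    real :: a(n), b(n), c(n)
    integer :: i

    a = 1.0
    b = 2.0

    ! Each iteration is independent, so a compiler is free to run
    ! the iterations in parallel on CPU threads or on a GPU.
    do concurrent (i = 1:n)
        c(i) = a(i) + 2.0*b(i)
    end do

    print *, c(1), c(n)
end program do_concurrent_demo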

How to compile POT3D with LFortran

Follow the steps below to build and run POT3D with a validation example.

conda create -n lf-mpi lfortran=0.49.0 make cmake openmpi=5.0.6
conda activate lf-mpi
git clone https://github.com/gxyd/pot3d.git
cd pot3d
git checkout -t origin/lf_hdf5_mpi_namelist_global_workarounds
git checkout eb05c69d30b7c88454d97d66ab6be46db10e7552

To build and run POT3D, execute the following script, which compiles the binaries and runs the program with 1 and 2 MPI ranks:

FC=lfortran ./build_and_run.sh

It takes about 1.5 minutes to run on an Apple M4.

To build and run with optimizations:

FC="lfortran --fast --skip-pass=dead_code_removal" ./build_and_run.sh

The script builds the binaries and then uses mpiexec to test the application, first with a single process and then with two parallel processes.

Compilation benchmarks

We benchmarked compilation speed on a MacBook Air M2 (8 GB RAM) with LFortran 0.49.0 and GFortran 13.2.0:

| File | LFortran (s) | GFortran (s) | LFortran / GFortran |
|---|---|---|---|
| mpi_c_bindings.f90 | 0.016 | 0.033 | 0.484 |
| mpi.f90 | 0.015 | 0.058 | 0.258 |
| psi_io.f90 | 0.018 | 0.100 | 0.18 |
| pot3d.F90 | 0.862 | 0.697 | 1.23 |
| Link: mpi_wrapper.o mpi_c_bindings.o mpi.o psi_io.o pot3d.o -o pot3d -L$CONDA_PREFIX/lib -lmpi -Wl,-rpath,$CONDA_PREFIX/lib | 0.052 | 0.025 | 2.08 |
| Total | 0.963 | 0.913 | 1.054 |

The compilation to object code via LLVM happens in the pot3d.F90 step, and the majority of the time is spent in LLVM lowering the IR to a binary. After we reach beta, we plan to get our direct ASR->WASM backend working well for very fast Debug compilation; our preliminary benchmarks suggest an order-of-magnitude speedup is possible.

We also evaluated the runtime performance of the compiled binaries with various optimization flags. The following compiler options are used for benchmarking:

  • LFortran: --fast --skip-pass=dead_code_removal
  • GFortran: -O3 -march=native
| MPI ranks | LFortran (--fast) (s) | GFortran (-O3 -march=native) (s) | LFortran / GFortran |
|---|---|---|---|
| 1 | 19.565044 | 12.087742 | 1.61 |
| 2 | 11.213420 | 8.373604 | 1.34 |

We are within a factor of two of GFortran, which is good enough for now. After we reach beta, we will focus more on performance: our goal is to be at least as fast as other compilers. Credit here goes to LLVM, which is very slow to compile in Debug mode but optimizes code really well in Release mode, even the LLVM IR that we currently generate. We do not generate any obviously slow code, but in order to match the performance of production compilers, after we reach beta we will take individual benchmarks and ensure our generated LLVM IR matches, e.g., what Clang would generate for equivalent code.

Development Overview

LFortran cannot yet compile HDF5 and OpenMPI directly, so we created our own Fortran wrappers for MPI and replaced the HDF5 I/O with a custom binary format. This allows us to compile POT3D with LFortran today; we will tackle compiling HDF5 and OpenMPI directly later.

The following three workarounds are currently needed in order to compile POT3D:

  • HDF5 read support replaced with binary file read: issue#6561
  • Namelist reading isn’t supported yet with LFortran: issue#1999
  • Wrap a global subroutine into a module: issue#4175

We will implement namelists and address the global subroutine issue later. The module workaround, however, could be argued to be an improvement to the original code: a compiler cannot in general check argument types when calling a global (external) subroutine, while it checks everything if the subroutine is inside a module, as the sketch below illustrates.
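As a simplified, hypothetical illustration (the names scale_field and field_utils are made up and not taken from POT3D), compare a global subroutine with the same subroutine wrapped in a module:

! Before: a global (external) subroutine. The caller sees no
! explicit interface, so passing a wrong argument type or count
! can compile and only fail at run time.
subroutine scale_field(n, x, alpha)
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real, intent(in) :: alpha
    x = alpha*x
end subroutine scale_field

! After (the workaround): the same subroutine inside a module.
! Any caller that does "use field_utils" gets an explicit
! interface and full compile-time argument checking.
module field_utils
    implicit none
contains
    subroutine scale_field(n, x, alpha)
        integer, intent(in) :: n
        real, intent(inout) :: x(n)
        real, intent(in) :: alpha
        x = alpha*x
    end subroutine scale_field
end module field_utils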

In addition, one has to use our own mpi.f90 implementation (described below in more detail), but no changes are needed to the POT3D Fortran code itself.

MPI wrappers

1. C Wrapper Implementation

The C wrapper (mpi_wrapper.c) serves as the primary interface to the native MPI implementation. Its key responsibilities include:

  • Handle Management: Converting between Fortran and C representations of MPI communicators, data types, and operation handles
  • Buffer Management: Managing the differences in memory layout and alignment requirements
  • Error Code Translation: Mapping C-style MPI error codes to their Fortran equivalents

The wrapper is compiled separately using the platform-specific C compiler:

${CC} -I$CONDA_PREFIX/include -c mpi_wrapper.c

We will later try to move all of this implementation into pure Fortran, so that no C wrapper is needed.

2. Fortran-C Interoperability Layer

The binding layer (mpi_c_bindings.f90) utilizes Fortran 2003’s ISO_C_BINDING module to establish rigorous type correspondence between languages. This includes:

  • Procedure Interface Declarations: Defining explicit interfaces with the BIND(C) attribute (see the sketch after this list)
  • Memory Management Interoperability: Handling differences in array descriptors and pointer semantics
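For illustration, here is a minimal sketch of what such a bind(c) interface can look like. The C function name c_mpi_barrier and its integer-handle signature are assumptions made for this example, not necessarily what mpi_c_bindings.f90 actually declares:

module mpi_c_bindings_demo
    use iso_c_binding, only: c_int
    implicit none
    interface
        ! Binds to a hypothetical C wrapper
        !     int c_mpi_barrier(int comm);
        ! which converts the integer handle to an MPI_Comm and
        ! calls the native MPI_Barrier.
        function c_mpi_barrier(comm) bind(c, name="c_mpi_barrier") result(ierr)
            import :: c_int
            integer(c_int), value, intent(in) :: comm
            integer(c_int) :: ierr
        end function c_mpi_barrier
    end interface
end module mpi_c_bindings_demo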

This component is compiled with:

${FC} -c mpi_c_bindings.f90

3. Fortran API Layer

The top-level Fortran MPI module (mpi.f90) presents a standard Fortran MPI interface to the user code, including:

  • MPI Constants: Defined using PARAMETER attributes
  • Procedure Interfaces: Following standard MPI procedure signatures
  • Optional Argument Handling: Managing Fortran’s optional argument semantics (sketched below)
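A hedged sketch of the shape of this layer, building on the mpi_c_bindings_demo sketch above (the constant values and internal names are illustrative, not necessarily those used in mpi.f90):

module mpi_demo
    use iso_c_binding, only: c_int
    use mpi_c_bindings_demo, only: c_mpi_barrier
    implicit none
    ! MPI constants exposed as parameters (the values here are made up).
    integer, parameter :: MPI_SUCCESS = 0
    integer, parameter :: MPI_COMM_WORLD = 0
contains
    ! A standard-looking MPI_Barrier with Fortran's optional ierror argument.
    subroutine MPI_Barrier(comm, ierror)
        integer, intent(in) :: comm
        integer, optional, intent(out) :: ierror
        integer(c_int) :: ierr
        ierr = c_mpi_barrier(int(comm, c_int))   ! forward to the bind(c) layer
        if (present(ierror)) ierror = ierr
    end subroutine MPI_Barrier
end module mpi_demo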

The compilation command is:

${FC} -c mpi.f90

MPI Subroutines Implemented in LFortran

Currently, the Fortran API layer contains only those procedures that are actually used in the POT3D repository; a short usage example follows the table:

| Category | Subroutines |
|---|---|
| Environment Management | MPI_Init, MPI_Init_thread, MPI_Finalize, MPI_Wtime, MPI_Barrier |
| Communicator Management | MPI_Comm_size, MPI_Comm_rank, MPI_Comm_split_type |
| Point-to-Point Communication | MPI_Isend, MPI_IRecv, MPI_Recv, MPI_Ssend, MPI_Waitall |
| Collective Communication | MPI_Bcast, MPI_Allgather, MPI_Allreduce |
| Cartesian Topology | MPI_Cart_create, MPI_Cart_sub, MPI_Cart_shift, MPI_Dims_create, MPI_Cart_coords |
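Because the wrappers follow the standard Fortran MPI interface, code that calls these routines stays unchanged. A minimal, self-contained example of the kind of usage they cover (assuming the module is brought in with use mpi, as in the standard interface):

program hello_ranks
    use mpi
    implicit none
    integer :: ierr, rank, nprocs

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    print *, "Hello from rank", rank, "of", nprocs
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
    call MPI_Finalize(ierr)
end program hello_ranks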

Bug in cshift intrinsic

While running POT3D, we encountered a sporadic segmentation fault—occurring roughly once in 20 runs, accompanied by an obscure error message. The issue was reproducible only on macOS and appeared to be linked to our MPI wrappers, making debugging particularly challenging.

We eventually managed to construct a minimal reproducible example (MRE), though it still relied on our MPI wrappers. To further diagnose the issue, we dumped the code generated by LFortran's Fortran backend (produced from the ASR) and compiled it with GFortran. Running gfortran -fcheck=all -g -c pot3d.f90 helped pinpoint the root cause: a bug in the cshift intrinsic. Ultimately, the fix required just a one-line correction in our cshift implementation. More details on the fix can be found in PR#6372.
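For context, cshift circularly shifts an array; here is a minimal example of its semantics (illustrative only, not the actual MRE):

program cshift_demo
    implicit none
    integer :: a(5), b(5)
    a = [1, 2, 3, 4, 5]
    ! A shift of 1 rotates elements toward lower indices,
    ! wrapping the first element around to the end.
    b = cshift(a, shift=1)
    print *, b   ! prints: 2 3 4 5 1
end program cshift_demo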

Array bounds corrupted by --fast

After successfully compiling and running POT3D without optimizations, we decided to evaluate its performance using LFortran’s optimization flag --fast. However, upon testing, the resulting binary encountered a segmentation fault during execution. Identifying the root cause required considerable engineering effort to develop an MRE. Eventually, we managed to create a concise example, which was documented in issue#6611. This issue was subsequently resolved through PR#6618.

What’s Next?

As of this writing, LFortran compiles nine third-party codes.

Here is our issue to track priorities to reach beta quality.

The primary goal is to advance LFortran from alpha to beta by successfully compiling ten third-party codes, such as POT3D, Fortran Package Manager (fpm), LAPACK, and components of Fortran stdlib and SciPy, with progress measured by the ability to compile and run these codes without modifications. Feature development is being prioritized based on the language constructs needed for these codes, and milestones will be announced as full compatibility is achieved. Once this objective is met, community input will help determine the next steps toward beta status, defined as a compiler that is expected to work on user code (but it will still have minor bugs).

Join Us

We welcome new contributors to join our journey. If you’re interested, please reach out on Zulip. Working on LFortran is both challenging and rewarding, offering ample opportunities for learning and growth.

Acknowledgements

We want to thank:

Discussions