We are thrilled to announce that LFortran supports all intrinsic functions listed in F2018 Interpretation Document (Table 16.1 Pg: 332) (also shown by this nice list online at GFortran’s documentation). This comprehensive list of almost 200 intrinsics encompasses a broad array of functions and subroutines essential to Fortran’s functionality, including elemental, inquiry, transformational, array operations, and more. For each function we support both compile-time and runtime evaluation.
The functions are implemented in an ASR (Abstract Semantic Representation) pass, which means they get “instantiated” and specialized for the argument types and then the compiler can optimize it, so using any intrinsic function is at least as fast as implementing the same operation yourself, there is no runtime overhead.
In this blog post we show the details how things are implemented under the hood, and if you find it interesting, go ahead and try it yourself in LFortran and let us know any feedback.
LFortran is still alpha software, meaning that users must continue expecting that LFortran will fail compiling or running their codes. Please report all bugs that you find.
LFortran’s implementation of all intrinsic functions has been instrumental in reaching its current milestone, where it can successfully compile 7 out of 10 essential third-party projects required for its beta phase. These projects, including dftatom, SNAP, SciPy, stdlib, and fastGPT, heavily rely on Fortran’s built-in operations to perform a range of computational tasks, from mathematical computations to array handling and logical operations.
Design of intrinsic functions
In earlier versions, LFortran implemented intrinsic functions via intrinsic modules written in Fortran in the runtime library. The implementation was thus available to the compiler at the ASR level, guaranteeing no runtime overhead, and the implementation was in Fortran, thus any Fortran programmer could participate on the implementation. However it had three major issues:
- We had to implement a specific version of each function for all argument
types and ranks, so for example the function
sum
must work forinteger
,real
,complex
types each with at least two kinds (at least 6 total), and ranks 1-15, so around 100 functions total, with special handling ofdim
for each. - Impossible to specialize the implementation based on the arguments (e.g.,
sum
with thedim
argument requires a special loop structure for eachdim
which should be specialized for highest performance). - The compiler just saw an ASR function implementation (such as
sin
), and we marked the module as “intrinsic”, but it was hard to create ASR passes to add special optimizations for particular intrinsics, since we didn’t have dedicated nodes for them.
To fix these problems, we now represent intrinsics directly as nodes in ASR. This allows all of the compiler to have knowledge about them and can optimize them, create better error messages, backends can have special handling for them, etc. Then we have an ASR pass that instantiates these nodes for the particular arguments in ASR itself, providing the highest performance. This design allows us to later for example not instantiate some intrinsics, but let the backend generate a special hardware instruction (say for inverse square root).
For instance, consider the max function and the following program.
program main
integer :: a, b
read *, a, b
print *, max(a, b)
end program
The AST (Abstract Syntax Tree) just represents max
as a FuncCallOrArray
AST
node (it does not even know if it is an array or a function call due to
Fortran’s syntax ambiguity):
(TranslationUnit
[(Program
main
(TriviaNode
[]
[(EndOfLine)
(EndOfLine)]
)
[]
[]
[(Declaration
(AttrType
TypeInteger
[]
()
()
None
)
[]
[(a
[]
[]
()
()
None
())
(b
[]
[]
()
()
None
())]
()
)]
[(Read
0
()
[]
[]
[a
b]
()
)
(Print
0
()
[(FuncCallOrArray
max
[]
[(()
a
()
0)
(()
b
()
0)]
[]
[]
[]
)]
()
)]
[]
)]
)
In ASR (Abstract Semantic Representation) on the other hand max
is
represented as IntrinsicElementalFunction
and the compiler knows everything
about it:
(TranslationUnit
(SymbolTable
1
{
main:
(Program
(SymbolTable
2
{
a:
(Variable
2
a
[]
Local
()
()
Default
(Integer 4)
()
Source
Public
Required
.false.
),
b:
(Variable
2
b
[]
Local
()
()
Default
(Integer 4)
()
Source
Public
Required
.false.
)
})
main
[]
[(FileRead
0
()
()
()
()
()
()
[(Var 2 a)
(Var 2 b)]
()
)
(Print
(StringFormat
()
[(IntrinsicElementalFunction
Max
[(Var 2 a)
(Var 2 b)]
0
(Integer 4)
()
)]
FormatFortran
(Character -1 0 () PointerString)
()
)
)]
)
})
[]
)
Initially represented as a FuncCallOrArray
node in the frontend, it is then identified as an elemental intrinsic function and converted to a dedicated IntrinsicElementalFunction
node. As the code progresses through the intrinsic functions ASR pass (implemented in intrinsic_functions.cpp
), this node is replaced by an ASR function (_lcompilers_max0_i32
) that is constructed and cached on the fly, based on the function’s arguments as shown in ASR below:
(Function
(SymbolTable
3
{
_lcompilers_max0_i32:
(Variable
3
_lcompilers_max0_i32
[]
ReturnVar
()
()
Default
(Integer 4)
()
Source
Public
Required
.false.
),
x0:
(Variable
3
x0
[]
In
()
()
Default
(Integer 4)
()
Source
Public
Required
.false.
),
x1:
(Variable
3
x1
[]
In
()
()
Default
(Integer 4)
()
Source
Public
Required
.false.
)
})
_lcompilers_max0_i32
(FunctionType
[(Integer 4)
(Integer 4)]
(Integer 4)
Source
Implementation
()
.false.
.false.
.false.
.false.
.false.
[]
.false.
)
[]
[(Var 3 x0)
(Var 3 x1)]
[(Assignment
(Var 3 _lcompilers_max0_i32)
(Var 3 x0)
()
)
(If
(IntegerCompare
(Var 3 x1)
Gt
(Var 3 _lcompilers_max0_i32)
(Logical 4)
()
)
[(Assignment
(Var 3 _lcompilers_max0_i32)
(Var 3 x1)
()
)]
[]
)]
(Var 3 _lcompilers_max0_i32)
Public
.false.
.false.
()
),
And the IntrinsicElementalFunction
node is replaced by corresponding function call to the newly created ASR function:
(Print
(StringFormat
()
[(FunctionCall
1 _lcompilers_max0_i32
1 _lcompilers_max0_i32
[((Var 2 a))
((Var 2 b))]
(Integer 4)
()
()
)]
FormatFortran
(Character -1 0 () PointerString)
()
)
)
To understand it better, we can print the ASR as Fortran code using lfortran a.f90 --dump-all-passes-fortran
and we get:
! Fortran code after applying the pass: intrinsic_function
program main
implicit none
integer(4) :: a
integer(4) :: b
read(*, *) a, b
print *, _lcompilers_max0_i32(a, b)
contains
integer(4) function _lcompilers_max0_i32(x0, x1)
integer(4), intent(in) :: x0
integer(4), intent(in) :: x1
_lcompilers_max0_i32 = x0
if (x1 > _lcompilers_max0_i32) then
_lcompilers_max0_i32 = x1
end if
end function _lcompilers_max0_i32
end program main
The ASR->LLVM backend then takes this code and generates LLVM. By default (Debug mode) it is not optimized:
; ModuleID = 'LFortran'
source_filename = "LFortran"
@0 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
@1 = private unnamed_addr constant [5 x i8] c"%s%s\00", align 1
define i32 @_lcompilers_max0_i32(i32* %x0, i32* %x1) {
.entry:
%_lcompilers_max0_i32 = alloca i32, align 4
%0 = load i32, i32* %x0, align 4
store i32 %0, i32* %_lcompilers_max0_i32, align 4
%1 = load i32, i32* %x1, align 4
%2 = load i32, i32* %_lcompilers_max0_i32, align 4
%3 = icmp sgt i32 %1, %2
br i1 %3, label %then, label %else
then: ; preds = %.entry
%4 = load i32, i32* %x1, align 4
store i32 %4, i32* %_lcompilers_max0_i32, align 4
br label %ifcont
else: ; preds = %.entry
br label %ifcont
ifcont: ; preds = %else, %then
br label %return
return: ; preds = %ifcont
%5 = load i32, i32* %_lcompilers_max0_i32, align 4
ret i32 %5
}
define i32 @main(i32 %0, i8** %1) {
.entry:
%a = alloca i32, align 4
%b = alloca i32, align 4
call void @_lpython_call_initial_functions(i32 %0, i8** %1)
%a1 = alloca i32, align 4
%b2 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i32, align 4
call void @_lfortran_read_int32(i32* %a1, i32 -1)
call void @_lfortran_read_int32(i32* %b2, i32 -1)
call void @_lfortran_empty_read(i32 -1, i32* %2)
%4 = call i32 @_lcompilers_max0_i32(i32* %a1, i32* %b2)
%5 = sext i32 %4 to i64
%6 = call i8* (i32, i8*, ...) @_lcompilers_string_format_fortran(i32 2, i8* null, i32 2, i64 %5)
call void (i8*, ...) @_lfortran_printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @1, i32 0, i32 0), i8* %6, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @0, i32 0, i32 0))
br label %return
return: ; preds = %.entry
ret i32 0
}
declare void @_lpython_call_initial_functions(i32, i8**)
declare void @_lfortran_read_int32(i32*, i32)
declare void @_lfortran_empty_read(i32, i32*)
declare i8* @_lcompilers_string_format_fortran(i32, i8*, ...)
It will call the “instantiated” _lcompilers_max0_i32
function. It’s not
slower than calling a pre-compiled max
function from a runtime library (as is
often done in other compilers), but it is not the fastest possible
implementation. To obtain maximum
performance, we have to use Release mode by using the --fast
flag in
LFortran. Currently we do not do many ASR->ASR optimizations (we will add many
such optimizations later after reaching beta quality), but it enables LLVM
optimizations and LLVM can optimize the above code to just:
; ModuleID = 'LFortran'
source_filename = "LFortran"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-darwin23.3.0"
@0 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
@1 = private unnamed_addr constant [5 x i8] c"%s%s\00", align 1
; Function Attrs: norecurse nounwind readonly
define i32 @_lcompilers_max0_i32(i32* nocapture readonly %x0, i32* nocapture readonly %x1) local_unnamed_addr #0 {
.entry:
%0 = load i32, i32* %x0, align 4
%1 = load i32, i32* %x1, align 4
%2 = icmp sgt i32 %1, %0
%spec.select = select i1 %2, i32 %1, i32 %0
ret i32 %spec.select
}
define i32 @main(i32 %0, i8** %1) local_unnamed_addr {
.entry:
tail call void @_lpython_call_initial_functions(i32 %0, i8** %1)
%a1 = alloca i32, align 4
%b2 = alloca i32, align 4
%2 = alloca i32, align 4
call void @_lfortran_read_int32(i32* nonnull %a1, i32 -1)
call void @_lfortran_read_int32(i32* nonnull %b2, i32 -1)
call void @_lfortran_empty_read(i32 -1, i32* nonnull %2)
%3 = load i32, i32* %a1, align 4
%4 = load i32, i32* %b2, align 4
%5 = icmp sgt i32 %4, %3
%spec.select.i = select i1 %5, i32 %4, i32 %3
%6 = sext i32 %spec.select.i to i64
%7 = call i8* (i32, i8*, ...) @_lcompilers_string_format_fortran(i32 2, i8* null, i32 2, i64 %6)
call void (i8*, ...) @_lfortran_printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @1, i64 0, i64 0), i8* %7, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @0, i64 0, i64 0))
ret i32 0
}
declare void @_lpython_call_initial_functions(i32, i8**) local_unnamed_addr
declare void @_lfortran_read_int32(i32*, i32) local_unnamed_addr
declare void @_lfortran_empty_read(i32, i32*) local_unnamed_addr
declare i8* @_lcompilers_string_format_fortran(i32, i8*, ...) local_unnamed_addr
declare void @_lfortran_printf(i8*, ...) local_unnamed_addr
As you can see, the function got inlined and simplified. The resulting code is as fast as you would get by manually implementing the logic. Conclusion: there is no runtime overhead by using intrinsic functions.
Let’s take the example of another function, sum
. Sum is an intrinsic array function with more number of arguments, out of which some are optional. Most of the intrinsic array functions and subroutines include optional arguments. They are handled with the help of overload_id
in ASR. Let’s look at the dim
parameter in the sum
intrinsic. In the sum
function, the FuncCallOrArray
node represented in the frontend is then recognised as an array intrinsic and is replaced with the IntrinsicArrayFunction
node.
program test_sum_dim
integer :: x(3, 3) = reshape([1, 2, 3, 4, 5, 6, 7, 8, 9], [3, 3])
integer :: dim_ = 1
print *, sum(x, dim=dim_)
end program
AST:
(TranslationUnit
[(Program
test_sum_dim
()
[]
[]
[(Declaration
(AttrType
TypeInteger
[]
()
()
None
)
[]
[(x
[(1
3
DimensionExpr)
(1
3
DimensionExpr)]
[]
()
(FuncCallOrArray
RESHAPE
[]
[(()
(ArrayInitializer
()
()
[1
2
3
4
5
6
7
8
9]
)
()
0)
(()
(ArrayInitializer
()
()
[3
3]
)
()
0)]
[]
[]
[]
)
Equal
())]
()
)
(Declaration
(AttrType
TypeInteger
[]
()
()
None
)
[]
[(dim_
[]
[]
()
1
Equal
())]
()
)]
[(Print
0
()
[(FuncCallOrArray
SUM
[]
[(()
x
()
0)]
[(DIM
dim_)]
[]
[]
)]
()
)]
[]
)]
)
In ASR, the sum
function is represented as an IntrinsicArrayFunction
node.
This node captures the properties of a sum. The input array x
. The dimension
(dim=1
) along which the summation should be performed. The ASR node includes
necessary information, like array dimensions and type, and the data is prepared
to ensure that the results fit within an output array of rank n-1
(where n
is
the rank of x).
(TranslationUnit
(SymbolTable
1
{
test_sum_dim:
(Program
(SymbolTable
2
{
dim_:
(Variable
2
dim_
[]
Local
(IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 1 (Integer 4) Decimal)
Save
(Integer 4)
()
Source
Public
Required
.false.
),
x:
(Variable
2
x
[]
Local
(ArrayReshape
(ArrayConstant
36
[1, 2, 3, ...., 7, 8, 9]
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 9 (Integer 4) Decimal))]
FixedSizeArray
)
ColMajor
)
(ArrayPhysicalCast
(ArrayConstant
8
[3, 3]
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 2 (Integer 4) Decimal))]
FixedSizeArray
)
ColMajor
)
FixedSizeArray
DescriptorArray
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 2 (Integer 4) Decimal))]
DescriptorArray
)
()
)
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal))
((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal))]
FixedSizeArray
)
()
)
()
Save
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal))
((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal))]
FixedSizeArray
)
()
Source
Public
Required
.false.
)
})
test_sum_dim
[]
[(Print
(StringFormat
()
[(IntrinsicArrayFunction
Sum
[(ArrayPhysicalCast
(Var 2 x)
FixedSizeArray
DescriptorArray
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal))
((IntegerConstant 1 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal))]
DescriptorArray
)
()
)
(Var 2 dim_)]
1
(Array
(Integer 4)
[((IntegerConstant 1 (Integer 4) Decimal)
(IntrinsicElementalFunction
Merge
[(IntegerConstant 3 (Integer 4) Decimal)
(IntegerConstant 3 (Integer 4) Decimal)
(IntegerCompare
(IntegerConstant 1 (Integer 4) Decimal)
Lt
(Var 2 dim_)
(Logical 4)
()
)]
0
(Integer 4)
()
))]
DescriptorArray
)
()
)]
FormatFortran
(Character -1 0 () PointerString)
()
)
)]
)
})
[]
)
Using lfortran a.f90 --dump-all-passes-fortran
, the generated code reveals
the creation of a subroutine (Sum_4_2_1
) that calculates the row or column (or
any other dimension) sum based on the dimension specified. After applying the
intrinsic function pass, the generated code looks like this:
! Fortran code after applying the pass: intrinsic_function
program test_sum_dim
implicit none
integer(4), dimension(:), allocatable :: __libasr__created__var__0_Sum_4_2_1_res
integer(4), save :: dim_ = 1
integer(4), dimension(3, 3), save :: x = reshape([1, 2, 3, 4, 5, 6, 7, 8, 9], [3, 3])
allocate(__libasr__created__var__0_Sum_4_2_1_res(_lcompilers_merge_i32(3, 3, 1 < dim_)))
call Sum_4_2_1(x, dim_, __libasr__created__var__0_Sum_4_2_1_res)
print *, __libasr__created__var__0_Sum_4_2_1_res
contains
subroutine Sum_4_2_1(array, dim, result)
integer(4) :: __1_j
integer(4) :: __2_j
integer(4), dimension(:, :), intent(in) :: array
integer(4), intent(in) :: dim
integer(4), dimension(:), intent(out) :: result
result = 0
if (dim == 2) then
do __1_j = lbound(array, 1), ubound(array, 1)
do __2_j = lbound(array, 2), ubound(array, 2)
result(__1_j) = result(__1_j) + array(__1_j, __2_j)
end do
end do
else
if (dim == 1) then
do __2_j = lbound(array, 2), ubound(array, 2)
do __1_j = lbound(array, 1), ubound(array, 1)
result(__2_j) = result(__2_j) + array(__1_j, __2_j)
end do
end do
end if
end if
end subroutine Sum_4_2_1
integer(4) function _lcompilers_merge_i32(tsource, fsource, mask) result(merge)
integer(4), intent(in) :: fsource
logical(4), intent(in) :: mask
integer(4), intent(in) :: tsource
if (mask) then
merge = tsource
else
merge = fsource
end if
end function _lcompilers_merge_i32
end program test_sum_dim
This implementation of sum
demonstrates how LFortran handles multi-dimensional
intrinsic functions efficiently. The sum
function is transformed into an ASR
node, encapsulating its arguments, dimension, and the return type. A dedicated
subroutine (Sum_4_2_1
) is generated to handle the summation along a specific
dimension, enhancing performance by eliminating runtime overhead. The final
output array is created by reducing the rank of the input array by 1, storing
only the necessary results.
Conclusion
This design makes sum
and other intrinsic functions
highly efficient, handling multi-dimensional arrays without requiring complex
runtime modules, while also making the generated code lightweight and
efficient as it
eliminates any
overhead compared to manually implementing constructs like if
conditions
instead of calling max
. Both the ASR and LLVM representations access the
actual, inlined code, enabling the compiler to optimize and specialize as
needed; an advantage not possible when calling a runtime library function. This
approach ensures that intrinsic functions are as efficient as manually written
code, while also being optimized seamlessly within the compilation process.
Join us
Join us in advancing LFortran! New contributors are always welcome, connect with us on Zulip to get started.
Acknowledgements
We want to thank:
- Sovereign Tech Fund (STF)
- NumFOCUS
- QuantStack
- Google Summer of Code
- John D. Cook
- GSI Technology
- LANL
- Our GitHub, OpenCollective and NumFOCUS sponsors
- All our contributors (92 so far!)
Discussions
- Fortran Discourse: https://fortran-lang.discourse.group/t/lfortran-now-supports-all-intrinsic-functions/8844
- Twitter: https://x.com/lfortranorg/status/1857087857178902894
- Bluesky: https://bsky.app/profile/ondrejcertik.bsky.social/post/3lawc4fdo4s2y
- LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7263138560982081536/