We are thrilled to announce that LFortran supports all intrinsic functions listed in F2018 Interpretation Document (Table 16.1 Pg: 332) (also shown by this nice list online at GFortran’s documentation). This comprehensive list of almost 200 intrinsics encompasses a broad array of functions and subroutines essential to Fortran’s functionality, including elemental, inquiry, transformational, array operations, and more. For each function we support both compile-time and runtime evaluation.

The functions are implemented in an ASR (Abstract Semantic Representation) pass, which means they get “instantiated” and specialized for the argument types and then the compiler can optimize it, so using any intrinsic function is at least as fast as implementing the same operation yourself, there is no runtime overhead.

In this blog post we show the details how things are implemented under the hood, and if you find it interesting, go ahead and try it yourself in LFortran and let us know any feedback.

LFortran is still alpha software, meaning that users must continue expecting that LFortran will fail compiling or running their codes. Please report all bugs that you find.

LFortran’s implementation of all intrinsic functions has been instrumental in reaching its current milestone, where it can successfully compile 7 out of 10 essential third-party projects required for its beta phase. These projects, including dftatom, SNAP, SciPy, stdlib, and fastGPT, heavily rely on Fortran’s built-in operations to perform a range of computational tasks, from mathematical computations to array handling and logical operations.

Design of intrinsic functions

In earlier versions, LFortran implemented intrinsic functions via intrinsic modules written in Fortran in the runtime library. The implementation was thus available to the compiler at the ASR level, guaranteeing no runtime overhead, and the implementation was in Fortran, thus any Fortran programmer could participate on the implementation. However it had three major issues:

  • We had to implement a specific version of each function for all argument types and ranks, so for example the function sum must work for integer, real, complex types each with at least two kinds (at least 6 total), and ranks 1-15, so around 100 functions total, with special handling of dim for each.
  • Impossible to specialize the implementation based on the arguments (e.g., sum with the dim argument requires a special loop structure for each dim which should be specialized for highest performance).
  • The compiler just saw an ASR function implementation (such as sin), and we marked the module as “intrinsic”, but it was hard to create ASR passes to add special optimizations for particular intrinsics, since we didn’t have dedicated nodes for them.

To fix these problems, we now represent intrinsics directly as nodes in ASR. This allows all of the compiler to have knowledge about them and can optimize them, create better error messages, backends can have special handling for them, etc. Then we have an ASR pass that instantiates these nodes for the particular arguments in ASR itself, providing the highest performance. This design allows us to later for example not instantiate some intrinsics, but let the backend generate a special hardware instruction (say for inverse square root).

For instance, consider the max function and the following program.

program main
    integer :: a, b
    read *, a, b
    print *, max(a, b)
 end program

The AST (Abstract Syntax Tree) just represents max as a FuncCallOrArray AST node (it does not even know if it is an array or a function call due to Fortran’s syntax ambiguity):

(TranslationUnit
   [(Program
       main
       (TriviaNode
           []
           [(EndOfLine)
           (EndOfLine)]
       )
       []
       []
       [(Declaration
           (AttrType
               TypeInteger
               []
               ()
               ()
               None
           )
           []
           [(a
           []
           []
           ()
           ()
           None
           ())
           (b
           []
           []
           ()
           ()
           None
           ())]
           ()
       )]
       [(Read
           0
           ()
           []
           []
           [a
           b]
           ()
       )
       (Print
           0
           ()
           [(FuncCallOrArray
               max
               []
               [(()
               a
               ()
               0)
               (()
               b
               ()
               0)]
               []
               []
               []
           )]
           ()
       )]
       []
   )]
)

In ASR (Abstract Semantic Representation) on the other hand max is represented as IntrinsicElementalFunction and the compiler knows everything about it:

(TranslationUnit
    (SymbolTable
        1
        {
            main:
                (Program
                    (SymbolTable
                        2
                        {
                            a:
                                (Variable
                                    2
                                    a
                                    []
                                    Local
                                    ()
                                    ()
                                    Default
                                    (Integer 4)
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                ),
                            b:
                                (Variable
                                    2
                                    b
                                    []
                                    Local
                                    ()
                                    ()
                                    Default
                                    (Integer 4)
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                )
                        })
                    main
                    []
                    [(FileRead
                        0
                        ()
                        ()
                        ()
                        ()
                        ()
                        ()
                        [(Var 2 a)
                        (Var 2 b)]
                        ()
                    )
                    (Print
                        (StringFormat
                            ()
                            [(IntrinsicElementalFunction
                                Max
                                [(Var 2 a)
                                (Var 2 b)]
                                0
                                (Integer 4)
                                ()
                            )]
                            FormatFortran
                            (Character -1 0 () PointerString)
                            ()
                        )
                    )]
                )
        })
    []
)

Initially represented as a FuncCallOrArray node in the frontend, it is then identified as an elemental intrinsic function and converted to a dedicated IntrinsicElementalFunction node. As the code progresses through the intrinsic functions ASR pass (implemented in intrinsic_functions.cpp), this node is replaced by an ASR function (_lcompilers_max0_i32) that is constructed and cached on the fly, based on the function’s arguments as shown in ASR below:

(Function
    (SymbolTable
        3
        {
            _lcompilers_max0_i32:
                (Variable
                    3
                    _lcompilers_max0_i32
                    []
                    ReturnVar
                    ()
                    ()
                    Default
                    (Integer 4)
                    ()
                    Source
                    Public
                    Required
                    .false.
                ),
            x0:
                (Variable
                    3
                    x0
                    []
                    In
                    ()
                    ()
                    Default
                    (Integer 4)
                    ()
                    Source
                    Public
                    Required
                    .false.
                ),
            x1:
                (Variable
                    3
                    x1
                    []
                    In
                    ()
                    ()
                    Default
                    (Integer 4)
                    ()
                    Source
                    Public
                    Required
                    .false.
                )
        })
    _lcompilers_max0_i32
    (FunctionType
        [(Integer 4)
        (Integer 4)]
        (Integer 4)
        Source
        Implementation
        ()
        .false.
        .false.
        .false.
        .false.
        .false.
        []
        .false.
    )
    []
    [(Var 3 x0)
    (Var 3 x1)]
    [(Assignment
        (Var 3 _lcompilers_max0_i32)
        (Var 3 x0)
        ()
    )
    (If
        (IntegerCompare
            (Var 3 x1)
            Gt
            (Var 3 _lcompilers_max0_i32)
            (Logical 4)
            ()
        )
        [(Assignment
            (Var 3 _lcompilers_max0_i32)
            (Var 3 x1)
            ()
        )]
        []
    )]
    (Var 3 _lcompilers_max0_i32)
    Public
    .false.
    .false.
    ()
),

And the IntrinsicElementalFunction node is replaced by corresponding function call to the newly created ASR function:

(Print
    (StringFormat
        ()
        [(FunctionCall
            1 _lcompilers_max0_i32
            1 _lcompilers_max0_i32
            [((Var 2 a))
            ((Var 2 b))]
            (Integer 4)
            ()
            ()
        )]
        FormatFortran
        (Character -1 0 () PointerString)
        ()
    )
)

To understand it better, we can print the ASR as Fortran code using lfortran a.f90 --dump-all-passes-fortran and we get:

! Fortran code after applying the pass: intrinsic_function
program main
implicit none
integer(4) :: a
integer(4) :: b
read(*, *) a, b
print *, _lcompilers_max0_i32(a, b)

contains

integer(4) function _lcompilers_max0_i32(x0, x1)
    integer(4), intent(in) :: x0
    integer(4), intent(in) :: x1
    _lcompilers_max0_i32 = x0
    if (x1 > _lcompilers_max0_i32) then
        _lcompilers_max0_i32 = x1
    end if
end function _lcompilers_max0_i32

end program main

The ASR->LLVM backend then takes this code and generates LLVM. By default (Debug mode) it is not optimized:

; ModuleID = 'LFortran'
source_filename = "LFortran"

@0 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
@1 = private unnamed_addr constant [5 x i8] c"%s%s\00", align 1

define i32 @_lcompilers_max0_i32(i32* %x0, i32* %x1) {
.entry:
  %_lcompilers_max0_i32 = alloca i32, align 4
  %0 = load i32, i32* %x0, align 4
  store i32 %0, i32* %_lcompilers_max0_i32, align 4
  %1 = load i32, i32* %x1, align 4
  %2 = load i32, i32* %_lcompilers_max0_i32, align 4
  %3 = icmp sgt i32 %1, %2
  br i1 %3, label %then, label %else

then:                                             ; preds = %.entry
  %4 = load i32, i32* %x1, align 4
  store i32 %4, i32* %_lcompilers_max0_i32, align 4
  br label %ifcont

else:                                             ; preds = %.entry
  br label %ifcont

ifcont:                                           ; preds = %else, %then
  br label %return

return:                                           ; preds = %ifcont
  %5 = load i32, i32* %_lcompilers_max0_i32, align 4
  ret i32 %5
}

define i32 @main(i32 %0, i8** %1) {
.entry:
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  call void @_lpython_call_initial_functions(i32 %0, i8** %1)
  %a1 = alloca i32, align 4
  %b2 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  call void @_lfortran_read_int32(i32* %a1, i32 -1)
  call void @_lfortran_read_int32(i32* %b2, i32 -1)
  call void @_lfortran_empty_read(i32 -1, i32* %2)
  %4 = call i32 @_lcompilers_max0_i32(i32* %a1, i32* %b2)
  %5 = sext i32 %4 to i64
  %6 = call i8* (i32, i8*, ...) @_lcompilers_string_format_fortran(i32 2, i8* null, i32 2, i64 %5)
  call void (i8*, ...) @_lfortran_printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @1, i32 0, i32 0), i8* %6, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @0, i32 0, i32 0))
  br label %return

return:                                           ; preds = %.entry
  ret i32 0
}

declare void @_lpython_call_initial_functions(i32, i8**)

declare void @_lfortran_read_int32(i32*, i32)

declare void @_lfortran_empty_read(i32, i32*)

declare i8* @_lcompilers_string_format_fortran(i32, i8*, ...)

It will call the “instantiated” _lcompilers_max0_i32 function. It’s not slower than calling a pre-compiled max function from a runtime library (as is often done in other compilers), but it is not the fastest possible implementation. To obtain maximum performance, we have to use Release mode by using the --fast flag in LFortran. Currently we do not do many ASR->ASR optimizations (we will add many such optimizations later after reaching beta quality), but it enables LLVM optimizations and LLVM can optimize the above code to just:

; ModuleID = 'LFortran'
source_filename = "LFortran"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-darwin23.3.0"

@0 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
@1 = private unnamed_addr constant [5 x i8] c"%s%s\00", align 1

; Function Attrs: norecurse nounwind readonly
define i32 @_lcompilers_max0_i32(i32* nocapture readonly %x0, i32* nocapture readonly %x1) local_unnamed_addr #0 {
.entry:
  %0 = load i32, i32* %x0, align 4
  %1 = load i32, i32* %x1, align 4
  %2 = icmp sgt i32 %1, %0
  %spec.select = select i1 %2, i32 %1, i32 %0
  ret i32 %spec.select
}

define i32 @main(i32 %0, i8** %1) local_unnamed_addr {
.entry:
  tail call void @_lpython_call_initial_functions(i32 %0, i8** %1)
  %a1 = alloca i32, align 4
  %b2 = alloca i32, align 4
  %2 = alloca i32, align 4
  call void @_lfortran_read_int32(i32* nonnull %a1, i32 -1)
  call void @_lfortran_read_int32(i32* nonnull %b2, i32 -1)
  call void @_lfortran_empty_read(i32 -1, i32* nonnull %2)
  %3 = load i32, i32* %a1, align 4
  %4 = load i32, i32* %b2, align 4
  %5 = icmp sgt i32 %4, %3
  %spec.select.i = select i1 %5, i32 %4, i32 %3
  %6 = sext i32 %spec.select.i to i64
  %7 = call i8* (i32, i8*, ...) @_lcompilers_string_format_fortran(i32 2, i8* null, i32 2, i64 %6)
  call void (i8*, ...) @_lfortran_printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @1, i64 0, i64 0), i8* %7, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @0, i64 0, i64 0))
  ret i32 0
}

declare void @_lpython_call_initial_functions(i32, i8**) local_unnamed_addr

declare void @_lfortran_read_int32(i32*, i32) local_unnamed_addr

declare void @_lfortran_empty_read(i32, i32*) local_unnamed_addr

declare i8* @_lcompilers_string_format_fortran(i32, i8*, ...) local_unnamed_addr

declare void @_lfortran_printf(i8*, ...) local_unnamed_addr

As you can see, the function got inlined and simplified. The resulting code is as fast as you would get by manually implementing the logic. Conclusion: there is no runtime overhead by using intrinsic functions.

Let’s take the example of another function, sum. Sum is an intrinsic array function with more number of arguments, out of which some are optional. Most of the intrinsic array functions and subroutines include optional arguments. They are handled with the help of overload_id in ASR. Let’s look at the dim parameter in the sum intrinsic. In the sum function, the FuncCallOrArray node represented in the frontend is then recognised as an array intrinsic and is replaced with the IntrinsicArrayFunction node.

program test_sum_dim
    integer :: x(3, 3) = reshape([1, 2, 3, 4, 5, 6, 7, 8, 9], [3, 3])
    integer :: dim_ = 1
    print *, sum(x, dim=dim_)
end program

AST:

(TranslationUnit
    [(Program
        test_sum_dim
        ()
        []
        []
        [(Declaration
            (AttrType
                TypeInteger
                []
                ()
                ()
                None
            )
            []
            [(x
            [(1
            3
            DimensionExpr)
            (1
            3
            DimensionExpr)]
            []
            ()
            (FuncCallOrArray
                RESHAPE
                []
                [(()
                (ArrayInitializer
                    ()
                    ()
                    [1
                    2
                    3
                    4
                    5
                    6
                    7
                    8
                    9]
                )
                ()
                0)
                (()
                (ArrayInitializer
                    ()
                    ()
                    [3
                    3]
                )
                ()
                0)]
                []
                []
                []
            )
            Equal
            ())]
            ()
        )
        (Declaration
            (AttrType
                TypeInteger
                []
                ()
                ()
                None
            )
            []
            [(dim_
            []
            []
            ()
            1
            Equal
            ())]
            ()
        )]
        [(Print
            0
            ()
            [(FuncCallOrArray
                SUM
                []
                [(()
                x
                ()
                0)]
                [(DIM
                dim_)]
                []
                []
            )]
            ()
        )]
        []
    )]
)

In ASR, the sum function is represented as an IntrinsicArrayFunction node. This node captures the properties of a sum. The input array x. The dimension (dim=1) along which the summation should be performed. The ASR node includes necessary information, like array dimensions and type, and the data is prepared to ensure that the results fit within an output array of rank n-1 (where n is the rank of x).

(TranslationUnit
    (SymbolTable
        1
        {
            test_sum_dim:
                (Program
                    (SymbolTable
                        2
                        {
                            dim_:
                                (Variable
                                    2
                                    dim_
                                    []
                                    Local
                                    (IntegerConstant 1 (Integer 4) Decimal)
                                    (IntegerConstant 1 (Integer 4) Decimal)
                                    Save
                                    (Integer 4)
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                ),
                            x:
                                (Variable
                                    2
                                    x
                                    []
                                    Local
                                    (ArrayReshape
                                        (ArrayConstant
                                            36
                                            [1, 2, 3, ...., 7, 8, 9]
                                            (Array
                                                (Integer 4)
                                                [((IntegerConstant 1 (Integer 4) Decimal)
                                                (IntegerConstant 9 (Integer 4) Decimal))]
                                                FixedSizeArray
                                            )
                                            ColMajor
                                        )
                                        (ArrayPhysicalCast
                                            (ArrayConstant
                                                8
                                                [3, 3]
                                                (Array
                                                    (Integer 4)
                                                    [((IntegerConstant 1 (Integer 4) Decimal)
                                                    (IntegerConstant 2 (Integer 4) Decimal))]
                                                    FixedSizeArray
                                                )
                                                ColMajor
                                            )
                                            FixedSizeArray
                                            DescriptorArray
                                            (Array
                                                (Integer 4)
                                                [((IntegerConstant 1 (Integer 4) Decimal)
                                                (IntegerConstant 2 (Integer 4) Decimal))]
                                                DescriptorArray
                                            )
                                            ()
                                        )
                                        (Array
                                            (Integer 4)
                                            [((IntegerConstant 1 (Integer 4) Decimal)
                                            (IntegerConstant 3 (Integer 4) Decimal))
                                            ((IntegerConstant 1 (Integer 4) Decimal)
                                            (IntegerConstant 3 (Integer 4) Decimal))]
                                            FixedSizeArray
                                        )
                                        ()
                                    )
                                    ()
                                    Save
                                    (Array
                                        (Integer 4)
                                        [((IntegerConstant 1 (Integer 4) Decimal)
                                        (IntegerConstant 3 (Integer 4) Decimal))
                                        ((IntegerConstant 1 (Integer 4) Decimal)
                                        (IntegerConstant 3 (Integer 4) Decimal))]
                                        FixedSizeArray
                                    )
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                )
                        })
                    test_sum_dim
                    []
                    [(Print
                        (StringFormat
                            ()
                            [(IntrinsicArrayFunction
                                Sum
                                [(ArrayPhysicalCast
                                    (Var 2 x)
                                    FixedSizeArray
                                    DescriptorArray
                                    (Array
                                        (Integer 4)
                                        [((IntegerConstant 1 (Integer 4) Decimal)
                                        (IntegerConstant 3 (Integer 4) Decimal))
                                        ((IntegerConstant 1 (Integer 4) Decimal)
                                        (IntegerConstant 3 (Integer 4) Decimal))]
                                        DescriptorArray
                                    )
                                    ()
                                )
                                (Var 2 dim_)]
                                1
                                (Array
                                    (Integer 4)
                                    [((IntegerConstant 1 (Integer 4) Decimal)
                                    (IntrinsicElementalFunction
                                        Merge
                                        [(IntegerConstant 3 (Integer 4) Decimal)
                                        (IntegerConstant 3 (Integer 4) Decimal)
                                        (IntegerCompare
                                            (IntegerConstant 1 (Integer 4) Decimal)
                                            Lt
                                            (Var 2 dim_)
                                            (Logical 4)
                                            ()
                                        )]
                                        0
                                        (Integer 4)
                                        ()
                                    ))]
                                    DescriptorArray
                                )
                                ()
                            )]
                            FormatFortran
                            (Character -1 0 () PointerString)
                            ()
                        )
                    )]
                )
        })
    []
)

Using lfortran a.f90 --dump-all-passes-fortran, the generated code reveals the creation of a subroutine (Sum_4_2_1) that calculates the row or column (or any other dimension) sum based on the dimension specified. After applying the intrinsic function pass, the generated code looks like this:

! Fortran code after applying the pass: intrinsic_function
program test_sum_dim
implicit none
integer(4), dimension(:), allocatable :: __libasr__created__var__0_Sum_4_2_1_res
integer(4), save :: dim_ = 1
integer(4), dimension(3, 3), save :: x = reshape([1, 2, 3, 4, 5, 6, 7, 8, 9], [3, 3])
allocate(__libasr__created__var__0_Sum_4_2_1_res(_lcompilers_merge_i32(3, 3, 1 < dim_)))
call Sum_4_2_1(x, dim_, __libasr__created__var__0_Sum_4_2_1_res)
print *, __libasr__created__var__0_Sum_4_2_1_res

contains

subroutine Sum_4_2_1(array, dim, result)
    integer(4) :: __1_j
    integer(4) :: __2_j
    integer(4), dimension(:, :), intent(in) :: array
    integer(4), intent(in) :: dim
    integer(4), dimension(:), intent(out) :: result
    result = 0
    if (dim == 2) then
        do __1_j = lbound(array, 1), ubound(array, 1)
            do __2_j = lbound(array, 2), ubound(array, 2)
                result(__1_j) = result(__1_j) + array(__1_j, __2_j)
            end do
        end do
    else
        if (dim == 1) then
            do __2_j = lbound(array, 2), ubound(array, 2)
                do __1_j = lbound(array, 1), ubound(array, 1)
                    result(__2_j) = result(__2_j) + array(__1_j, __2_j)
                end do
            end do
        end if
    end if
end subroutine Sum_4_2_1

integer(4) function _lcompilers_merge_i32(tsource, fsource, mask) result(merge)
    integer(4), intent(in) :: fsource
    logical(4), intent(in) :: mask
    integer(4), intent(in) :: tsource
    if (mask) then
        merge = tsource
    else
        merge = fsource
    end if
end function _lcompilers_merge_i32

end program test_sum_dim

This implementation of sum demonstrates how LFortran handles multi-dimensional intrinsic functions efficiently. The sum function is transformed into an ASR node, encapsulating its arguments, dimension, and the return type. A dedicated subroutine (Sum_4_2_1) is generated to handle the summation along a specific dimension, enhancing performance by eliminating runtime overhead. The final output array is created by reducing the rank of the input array by 1, storing only the necessary results.

Conclusion

This design makes sum and other intrinsic functions highly efficient, handling multi-dimensional arrays without requiring complex runtime modules, while also making the generated code lightweight and efficient as it eliminates any overhead compared to manually implementing constructs like if conditions instead of calling max. Both the ASR and LLVM representations access the actual, inlined code, enabling the compiler to optimize and specialize as needed; an advantage not possible when calling a runtime library function. This approach ensures that intrinsic functions are as efficient as manually written code, while also being optimized seamlessly within the compilation process.

Join us

Join us in advancing LFortran! New contributors are always welcome, connect with us on Zulip to get started.

Acknowledgements

We want to thank:

Discussions