Continuing the previous solver and assembly FEM benchmark comparison this followup ensures that identical simulation problem setup is compared, in this case a 2D Poisson problem solved on a unit square. The problem is discretized with Q1 bilinear Lagrange finite elements. Finite element functions such as assembly written and run in Matlab, Octave, Julia, and Fortran are all compared for an identical Poisson problem on a unit square grid discretized with Q1 fem basis functions. For Matlab and Octave corresponding production FEATool Multiphysics functions were used. The Fortran code is using the FEAT2D finite element library which is the basis of the FeatFlow FEM CFD solver. Lastly, Julia was a straight port of the Fortran code.

What one can see in the following data is that the Matlab, Julia, and Fortran codes all have almost identical assembly times. Octave is the slow outlier which can partly be attributed to the lack of JIT compilation. Matlab has a hard to shake reputation to often be downright slow and one might indeed be surprised and skeptical to see that it performed just as fast as Julia and Fortran. In this case this is due to a heavily vectorized and optimized Matlab assembly routine. Thus it is indeed possible to write both complex and high performance Matlab code, in this case speed at the cost of higher memory requirements.

The following Plotly curve show the results and timings computed for several grids (excluding the solver in that there is not much point comparing a direct solver with multigrid). Although the FEATool Matlab and Octave runs were slower than the Fortran and Julia ones, due to significant vectorization and optimizations they are in fact not magnitudes slower as might otherwise be expected of Matlab code. All in all FEATool compares quite well compared to Fortran, at least up to a million unknowns or so.

Solver time is not compared since Octave, Matlab, and Julia as default all use a direct solver (currently Umfpack) while the FeatFem Fortran code uses a significantly more efficient geometric multigrid iterative solver. The complete data for the benchmark test runs can be seen in the tables below

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|  Octave |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 2.37e-003 | 1.50e-003 | 1.77e-002 | 1.07e-002 | 1.06e-003 | 4.16e-004 | 3.37e-002 | 4.62e-005 | 5.78e-003 |
|      64 | 2.76e-003 | 1.95e-003 | 1.99e-002 | 1.15e-002 | 2.76e-003 | 1.84e-003 | 4.07e-002 | 1.55e-004 | 2.28e-002 |
|     128 | 6.28e-003 | 5.33e-003 | 4.39e-002 | 2.27e-002 | 1.13e-002 | 9.03e-003 | 9.86e-002 | 6.29e-004 | 1.04e-001 |
|     256 | 1.74e-002 | 2.43e-002 | 1.28e-001 | 7.38e-002 | 4.25e-002 | 3.30e-002 | 3.19e-001 | 2.38e-003 | 5.10e-001 |
|     512 | 7.38e-002 | 8.25e-002 | 4.92e-001 | 2.73e-001 | 1.57e-001 | 1.35e-001 | 1.21e+000 | 1.07e-002 | 2.66e+000 |
|    1024 | 2.87e-001 | 3.28e-001 | 1.88e+000 | 1.07e+000 | 6.69e-001 | 5.33e-001 | 4.77e+000 | 4.33e-002 | 1.75e+001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|  Matlab |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 1.74e-003 | 1.26e-003 | 2.50e-003 | 1.88e-003 | 2.22e-003 | 2.05e-003 | 1.16e-002 | 1.18e-005 | 4.05e-003 |
|      64 | 7.09e-004 | 9.97e-004 | 6.51e-003 | 3.00e-003 | 6.13e-003 | 9.71e-003 | 2.71e-002 | 4.62e-005 | 2.17e-002 |
|     128 | 1.58e-003 | 6.25e-003 | 1.58e-002 | 8.54e-003 | 2.28e-002 | 4.07e-002 | 9.57e-002 | 1.84e-004 | 9.43e-002 |
|     256 | 5.06e-003 | 2.44e-002 | 5.49e-002 | 4.11e-002 | 9.02e-002 | 1.47e-001 | 3.63e-001 | 8.68e-004 | 4.48e-001 |
|     512 | 3.06e-002 | 9.75e-002 | 2.05e-001 | 1.57e-001 | 3.74e-001 | 6.11e-001 | 1.47e+000 | 4.90e-003 | 2.54e+000 |
|    1024 | 1.23e-001 | 3.86e-001 | 8.13e-001 | 6.32e-001 | 1.51e+000 | 2.51e+000 | 5.97e+000 | 1.99e-002 | 1.48e+001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|   Julia |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 1.11e-004 | 3.77e-004 | 8.97e-004 | 2.61e-004 | 6.76e-005 | 2.30e-004 | 1.94e-003 | 2.15e-005 | 6.00e-003 |
|      64 | 2.22e-004 | 1.67e-003 | 3.58e-003 | 1.00e-003 | 9.88e-005 | 1.20e-003 | 7.78e-003 | 8.86e-005 | 2.80e-002 |
|     128 | 5.66e-004 | 6.76e-003 | 1.45e-002 | 3.92e-003 | 1.41e-004 | 5.96e-003 | 3.19e-002 | 3.65e-004 | 1.20e-001 |
|     256 | 3.20e-003 | 2.71e-002 | 5.79e-002 | 1.58e-002 | 2.48e-004 | 2.39e-002 | 1.28e-001 | 1.55e-003 | 5.12e-001 |
|     512 | 1.30e-002 | 1.10e-001 | 2.35e-001 | 6.66e-002 | 4.11e-004 | 1.77e-001 | 6.02e-001 | 7.77e-003 | 2.58e+000 |
|    1024 | 5.35e-002 | 4.60e-001 | 9.57e-001 | 2.76e-001 | 2.22e-003 | 6.20e-001 | 2.37e+000 | 3.12e-002 | 1.34e+001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|

|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
| Fortran |           |           |           |           |           |           |           |           |           |
|     1/h |    t_grid |     t_ptr |   t_asm_A |   t_asm_f |     t_bdr |  t_sparse |     t_tot |    t_spmv |   t_solve |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|
|      32 | 0.00e+000 | 8.68e-004 | 8.68e-004 | 0.00e+000 | 0.00e+000 | 0.00e+000 | 1.74e-003 | 2.60e-005 | 4.34e-003 |
|      64 | 0.00e+000 | 8.68e-004 | 7.81e-003 | 2.60e-003 | 8.68e-004 | 0.00e+000 | 1.22e-002 | 9.55e-005 | 8.68e-004 |
|     128 | 4.34e-003 | 4.34e-003 | 1.82e-002 | 1.74e-003 | 0.00e+000 | 0.00e+000 | 2.86e-002 | 2.95e-004 | 1.13e-002 |
|     256 | 3.47e-003 | 1.91e-002 | 6.94e-002 | 1.13e-002 | 1.74e-003 | 0.00e+000 | 1.05e-001 | 1.12e-003 | 4.08e-002 |
|     512 | 3.30e-002 | 9.20e-002 | 2.26e-001 | 4.17e-002 | 1.74e-003 | 0.00e+000 | 3.94e-001 | 4.93e-003 | 1.77e-001 |
|    1024 | 1.35e-001 | 3.86e-001 | 9.05e-001 | 1.67e-001 | 8.68e-004 | 0.00e+000 | 1.59e+000 | 2.35e-002 | 7.65e-001 |
|---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------|