# HPC (High Performance Computing) bookmarks
+ [Learning and practice of high performance computing](https://github.com/cjmcv/hpc)
+ CFD
+ [Algebraic Flux Correction I. Scalar Conservation Laws](http://www.mathematik.tu-dortmund.de/lsiii/cms/papers/Kuzmin2011b.pdf)
+ [Algebraic Flux Correction II. Compressible Euler Equations](https://www.researchgate.net/publication/226067892_Algebraic_Flux_Correction_II_Compressible_Euler_Equations)
+ [CFD Notes by Hiroaki Nishikawa](http://www.cfdnotes.com/)
+ [CFD codes in f90](http://www.cfdbooks.com/cfdcodes.html)
+ [Tim Warburton's github repositories](https://github.com/tcew?tab=repositories)
+ [Nodal Discontinuous Galerkin](https://github.com/tcew/nodal-dg)
+ [Hybrid and Easy Discontinuous Galerkin Environment](https://mathema.tician.de/software/hedge/)
+ [**Element-based-Galerkin-Methods**](https://github.com/fxgiraldo/Element-based-Galerkin-Methods)
+ [Extreme-scale Discontinuous Galerkin Environment (EDGE)](https://github.com/3343/edge)
+ [The Development, Verification, and Validation of a Discontinuous Galerkin Method for the Navier-Stokes Equations](https://dataspace.princeton.edu/handle/88435/dsp01pk02cd90z)
+ [What are possible methods to solve compressible Euler equations](http://scicomp.stackexchange.com/questions/283/what-are-possible-methods-to-solve-compressible-euler-equations/305)
+ [I do like CFD, VOL.1, Second Edition](http://www.cfdbooks.com/cfdbooks.html)
+ [Free CFD Codes](http://www.cfdbooks.com/cfdcodes.html)
+ [CFD Julia: A Learning Module Structuring an Introductory Course on Computational Fluid Dynamics](https://www.mdpi.com/2311-5521/4/3/159/htm)
+ [CFD_Julia](https://github.com/surajp92/CFD_Julia)
+ Lattice Boltzmann codes
+ [A lattice Boltzmann code for complex fluids](https://github.com/ludwig-cf/ludwig)
+ [Hydrodynamics in OpenCL](http://christopheremoore.net/hydrodynamics-cl/)
+ [Roe, HLL, HLLC, Burgers Scheme; 1D, 2D, 3D; Euler Equation, (~Maxwell), (~MHD), ADM Solver in OpenCL](https://github.com/thenumbernine/HydrodynamicsGPU)
+ [Differential Geometry Tensor Library](https://github.com/thenumbernine/Tensor)
+ [Implementing the discontinuous Galerkin method in CUDA](https://github.com/martyfuhry/DGCUDA)
+ [Marty Fuhry's Homepage](http://www.martyfuhry.blogspot.co.uk/p/another-page.html)
+ [Master's Thesis: An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units, defended May, 2013](https://uwspace.uwaterloo.ca/bitstream/handle/10012/7523/Fuhry_Martin.pdf?sequence=1)
+ [A GPU-accelerated adaptive discontinuous Galerkin method for level set equation](https://www.researchgate.net/publication/299471213_A_GPU-accelerated_adaptive_discontinuous_Galerkin_method_for_level_set_equation)
+ [A GPU Accelerated Discontinuous Galerkin Incompressible Flow Solver](https://www.researchgate.net/publication/322221361_A_GPU_Accelerated_Discontinuous_Galerkin_Incompressible_Flow_Solver)
+ [SU2](http://su2.stanford.edu/)
+ [github repo for SU2](https://github.com/su2code/SU2)
+ [HiFiLES: High Fidelity Large Eddy Simulation](https://hifiles.stanford.edu/)
+ [github repo for HiFiLES](https://github.com/HiFiLES/HiFiLES-solver)
+ [PyWENO + PyPFASST](https://github.com/memmett)
+ [Clawpack Repositories](https://github.com/clawpack/)
+ [Riemann Problems and Jupyter Solutions](https://github.com/clawpack/riemann_book#installation)
+ [Shenfun is a high performance computing platform for solving partial differential equations (PDEs) by the spectral Galerkin method](https://github.com/spectralDNS/shenfun)
+ [Multilayered Abstractions for Partial Differential Equations by Graham Robert Markall: good review on Nektar++ etc.](http://www.big-grey.co.uk/g_markall_phd_thesis.pdf)
+ [Making Faster FEM Solvers, Faster: MPhil Transfer Report by Graham Markall](http://www.doc.ic.ac.uk/~grm08/g_markall_mphil_transfer.pdf)
+ [Graham Markall](http://www.doc.ic.ac.uk/~grm08/)
+ Nektar++
+ [Nektar++: An efficient h to p finite element framework](http://www.nektar.info/)
+ [Nektar++: a high-order finite element framework](https://xyloid.org/assets/talks/2014-06-ices.pdf)
+ [Simple (and not-so-simple) CFD solvers written in Fortran with Python plotting routines](https://github.com/JOThurgood/SimpleCFD)
+ [MAESTROeX solves the equations of low Mach number hydrodynamics for stratified atmospheres/full spherical stars with a general equation of state, and nuclear reaction networks in an adaptive-grid finite-volume framework. It includes reactions and thermal diffusion and can be used on anything from a single core to 100,000s of processor cores with MPI + OpenMP or 1,000s of GPUs](https://github.com/AMReX-Astro/MAESTROeX)
+ [Model stars and atmospheres with MAESTROeX](https://amrex-astro.github.io/MAESTROeX/)
+ [Is there a good tutorial or textbook-like source on implementing ENO/WENO with limiters in one (and more than one) dimension?](https://scicomp.stackexchange.com/questions/8706/is-there-a-good-tutorial-or-textbook-like-source-on-implementing-eno-weno-with-l/8709#8709)
+ [PyWENO](https://pyweno.readthedocs.io/en/latest/)
+ [blitzdg is an open-source library offering discontinuous Galerkin (dg) solvers for common partial differential equations systems using blitz++ for array and tensor manipulations in a C++ environment or NumPy as a Python 3 library](https://github.com/WQCG/blitzdg)
+ [Derek Steinmoeller's blog](https://dsteinmo.github.io/)
+ [DGSWEM V2](https://github.com/UT-CHG/dgswemv2)
+ [adaptive multiresolution DG](https://github.com/JuntaoHuang/adaptive-multiresolution-DG)
+ [Exasim - Generating Discontinuous Galerkin Codes For Extreme Scalable Simulations](https://github.com/exapde/Exasim)
+ Chebyshev Pseudo-Spectral Method (PSM)
+ [Chebyshev Polynomials J.C. Mason D.C. Handscomb](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-FM.pdf)
+ [Chapter 1. Definitions](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch01.pdf)
+ [Chapter 2. Basic Properties and Formulae](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch02.pdf)
+ [Chapter 3. The Minimax Property and Its Applications](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch03.pdf)
+ [Chapter 4. Orthogonality and Least-Squares Approximation](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch04.pdf)
+ [Chapter 5. Chebyshev Series](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch05.pdf)
+ [Chapter 6. Chebyshev Interpolation](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch06.pdf)
+ [Chapter 7. Near-Best L∞, L1 and Lp Approximations](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch07.pdf)
+ [Chapter 8. Integration Using Chebyshev Polynomials](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch08.pdf)
+ [Chapter 9. Solution of Integral Equations](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch09.pdf)
+ [Chapter 10. Solution of Ordinary Differential Equations](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch10.pdf)
+ [**Chapter 11. Chebyshev and Spectral Methods for Partial Differential Equations**](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch11.pdf)
+ [Chapter 12. Conclusion](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-Ch12.pdf)
+ [Appendices:](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-App.pdf)
+ [Summary of Notations, Definitions and Important Properties](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-App.pdf#[1,{%22name%22:%22FitH%22},690])
+ [Tables of Coefficients](http://inis.jinr.ru/sl/M_Mathematics/MRef_References/Mason,%20Hanscomb.%20Chebyshev%20polynomials%20(2003)/C0355-App.pdf#[7,{%22name%22:%22FitH%22},677])
+ [FFTW Discrete Cosine Transform Derivative](http://www.variousconsequences.com/2009/05/fftw-discrete-cosine-transform.html)
+ [FAST ALGORITHMS FOR DISCRETE POLYNOMIAL TRANSFORMS](https://www.ams.org/journals/mcom/1998-67-224/S0025-5718-98-00975-2/S0025-5718-98-00975-2.pdf)
+ [A New Method for Chebyshev Polynomial Interpolation Based on Cosine Transforms](https://link.springer.com/article/10.1007/s00034-015-0087-4)
+ [A brief introduction to pseudo-spectral methods: application to diffusion problems](https://arxiv.org/pdf/1606.05432.pdf)
+ [Spectral methods in python](http://cpraveen.github.io/teaching/chebpy.html)
+ [An Introduction to Domain Decomposition Methods: algorithms, theory and parallel implementation](https://hal.archives-ouvertes.fr/cel-01100932/file/bookddm.pdf)
+ [Chebyshev-Legendre Spectral Domain Decomposition Method for Two-Dimensional Vorticity Equations](https://www.cambridge.org/core/journals/communications-in-computational-physics/article/abs/chebyshevlegendre-spectral-domain-decomposition-method-for-twodimensional-vorticity-equations/18FEEF1F11DA2E8A8F134C8C2FE18052)
+ [Domain Decomposition Methods for Mortar Finite Elements](https://cs.nyu.edu/media/publications/TR2000-804.pdf)
+ [An efficient domain-decomposition pseudo-spectral method for solving elliptic differential equations](https://eprints.usq.edu.au/4568/)
+ [A Pseudospectral Multi-Domain Method for the Incompressible Navier-Stokes Equations](https://www.researchgate.net/publication/220395568_A_Pseudospectral_Multi-Domain_Method_for_the_Incompressible_Navier-Stokes_Equations)
+ [Deep Domain Decomposition Method: Elliptic Problems](https://arxiv.org/pdf/2004.04884.pdf)
+ [How to Design an Efficient Pseudospectral Code](https://www.math.ualberta.ca/~bowman/talks/caims19.pdf)
+ [code for **How to Design an Efficient Pseudospectral Code**](https://github.com/dealias/dns)
+ [Dedalus is a framework for solving a broad range of partial differential equations using spectral methods, including initial-value, boundary-value, and generalized eigenvalue problems](https://dedalus-project.org/about/)
+ [Dedalus is a flexible framework for solving partial differential equations using spectral methods](https://github.com/DedalusProject/dedalus)
+ [multiple-interval pseudospectral methods to solve optimal control problems](https://github.com/danielrherber/basic-multiple-interval-pseudospectral)
+ [pizza is a high-performance numerical code for quasi-geostrophic and non-rotating convection in a 2-D annulus geometry](https://github.com/magic-sph/pizza)
+ [FDBB (Fluid Dynamics Building Blocks) is a C++ expression template library for fluid dynamics](https://mmoelle1.gitlab.io/FDBB/)
+ [FDBB - Fluid Dynamics Building Blocks](https://gitlab.com/mmoelle1/FDBB)
+ Finite Element Methods (FEM) and Spectral Element Methods (SEM)
+ [deal.II — an open source finite element library](https://www.dealii.org/)
+ [Amandus: Simulations based on multilevel Schwarz methods Documentation](http://www.mathsim.eu/~gkanscha/amandus/)
+ [Feel++ finite element embedded library in C++](http://www.feelpp.org/)
+ [Feel++: Finite Element Embedded Library in C++](https://github.com/feelpp/feelpp)
+ [Veamy: an extensible object-oriented C++ library for the virtual element method](https://camlab.cl/software/veamy/)
+ [Veamy: an extensible object-oriented C++ library for the virtual element method](https://www.researchgate.net/publication/319057392_Veamy_an_extensible_object-oriented_C_library_for_the_virtual_element_method)
+ [Two dimensional high-order spectral element method fluid dynamics solver](https://github.com/horsescfd/HORSES2D)
+ [Two dimensional high-order spectral element method fluid dynamics solver](https://github.com/juanmanzanero/HORSES2D)
+ [github: ITHACA-SEM - In real Time Highly Advanced Computational Applications for Spectral Element Methods](https://github.com/mathLab/ITHACA-SEM)
+ [ITHACA-SEM - In real Time Highly Advanced Computational Applications for Spectral Element Methods](https://mathlab.sissa.it/ITHACA-SEM)
+ [AxiSEM is a parallel spectral-element method to solve 3D wave propagation in a sphere with axisymmetric or spherically symmetric visco-elastic, acoustic, anisotropic structures](https://github.com/geodynamics/axisem)
+ [HDGlab: An open-source implementation of the hybridisable discontinuous Galerkin method in MATLAB](https://ww2.lacan.upc.edu/scientificPublications/files/pdfs/ACME-GSH-20.pdf)
+ [HDGlab - A Matlab implementation of the hybridisable discontinuous Galerkin (HDG) method](https://git.lacan.upc.edu/hybridLab/HDGlab)
+ [Euler Equations for Ideal Gases](https://github.com/IANW-Projects/ConservationLaws/issues/11)
+ [Split form nodal discontinuous Galerkin schemes with summation-by-parts property for the compressible Euler equations](https://www.sciencedirect.com/science/article/pii/S0021999116304259)
+ Siemens
+ [Embedded Multicore Building Blocks (EMB²)](https://github.com/siemens/embb)
+ Maxeler
+ [Maxeler Technologies - Maximum Performance Computing](https://github.com/maxeler)
+ [AirfoilDFE - An unstructured mesh finite volume solver on DFE.](https://github.com/maxeler/Airfoil)
+ [Lattice QCD is a discretization of Quantum Chromodynamics](https://github.com/maxeler/LatticeQCD)
+ [LatticeBoltzmann](https://github.com/maxeler/LatticeBoltzmann)
+ [facilities to experiment with Discontinuous Petrov Galerkin (DPG) methods](https://github.com/jayggg/DPG)
+ [Research papers of Jay Gopalakrishnan](http://web.pdx.edu/~gjay/research/papers.html)
+ [Free CFD codes](https://www.cfd-online.com/Wiki/Codes)
+ [Code_Saturne](https://www.code-saturne.org/cms/download/Source-code-access)
+ [Large-Scale CFD Parallel Computing Dealing with Massive Mesh](https://www.hindawi.com/journals/je/2013/850148/)
+ [Open source tools in technical photorealistic large-scale visualisation](http://www.vtt.fi/inf/julkaisut/muut/2015/VTT-R-04911-15.pdf)
+ [An Open Source CFD-DEM Perspective](http://web.student.chalmers.se/groups/ofw5/Presentations/ChristophGonivaSlidesOFW5.pdf)
+ [3D, block structured, explicit/implicit, Navier-Stokes solver](https://github.com/mnucci32/aither)
+ [An evaluation of the Eigen linear algebra library for use in the aither CFD solver](https://github.com/mnucci32/eigenVsAither)
+ [A look at the performance of expression templates in C++: Eigen vs Blaze vs Fastor vs Armadillo vs XTensor](https://romanpoya.medium.com/a-look-at-the-performance-of-expression-templates-in-c-eigen-vs-blaze-vs-fastor-vs-armadillo-vs-2474ed38d982)
+ CFD + GPU
+ [Recent progress and challenges in exploiting graphics processors in computational fluid dynamics: slightly outdated but interesting](http://arxiv.org/pdf/1309.3018.pdf)
+ [Laplace solver running on GPU using CUDA, with CPU version for comparison, slightly outdated](https://github.com/kyleniemeyer/laplace_gpu)
+ PyFR
+ [PyFR is an open-source Python based framework for solving advection-diffusion type problems on streaming architectures using the Flux Reconstruction approach of Huynh](http://www.pyfr.org/)
+ [vincentlab/PyFR](https://github.com/vincentlab/PyFR)
+ [New PyFR Paper “Heterogeneous Computing on Mixed Unstructured Grids with PyFR”](http://www.techenablement.com/new-pyfr-paper-heterogeneous-computing-on-mixed-unstructured-grids-with-pyfr/)
+ [PyFR: An open source framework for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach](http://www.sciencedirect.com/science/article/pii/S0010465514002549)
+ [High Performance Parallelism Pearls Volume Two: Multicore and Many-core Programming Approaches](https://books.google.ru/books?id=MUZ0CAAAQBAJ&pg=PA261&lpg=PA261&dq=mako+python+examples+c%2B%2B&source=bl&ots=nBUXLR84mk&sig=FVLDhAaYRjzoEjDCQleT43deZv4&hl=en&sa=X&ved=0CC8Q6AEwCDgKahUKEwjCjMnGvprHAhXJjywKHQuiBnE#v=onepage&q=mako%20python%20examples%20c%2B%2B&f=false)
+ Camellia
+ [Camellia Discontinuous Petrov-Galerkin github repository](https://github.com/CamelliaDPG/Camellia/tree/master/docs/HPC_report)
+ Co-design at Lawrence Livermore National Lab
+ [Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH)](https://codesign.llnl.gov/lulesh.php)
+ [DoE Exascale Co-Design Center for Materials in Extreme Environments : Extreme Materials at Extreme Scale](http://www.exmatex.org/)
+ [Programming Models - Languages and tools for developing multi-scale applications.](http://www.exmatex.org/prog-models.html)
+ [Terra is a new low-level system programming language that is designed to interoperate seamlessly with the Lua programming language](http://terralang.org/)
+ [List of quantum chemistry and solid-state physics software](https://en.wikipedia.org/wiki/List_of_quantum_chemistry_and_solid-state_physics_software)
+ CP2K
+ [Mirror of official svn repository at sourceforge. Synced every 5 minutes.](https://github.com/cp2k/cp2k)
+ [Accelerated Sparse Matrix Multiplication for Quantum Chemistry with CP2K on Hybrid Supercomputers](https://www.youtube.com/watch?v=5wppMHxF_Js)
+ [Evaluation of C, Go, and Rust in the HPC environment](https://news.ycombinator.com/item?id=9477014)
+ Modern Fortran
+ [NNSA, national labs team with Nvidia to develop open-source Fortran compiler technology](https://www.llnl.gov/news/nnsa-national-labs-team-nvidia-develop-open-source-fortran-compiler-technology)
+ [Flang is a ground-up implementation of a Fortran front end written in modern C++. It started off as the f18 project](https://github.com/llvm/llvm-project/tree/master/flang/)
+ [F18 is a front-end for Fortran intended to replace the existing front-end in the Flang compiler](https://github.com/flang-compiler/f18)
tl;dr: 301 moved. The code from this repository can now be found at [flang](https://github.com/llvm/llvm-project/tree/master/flang/)
+ [Flang and F18](https://github.com/flang-compiler/flang/wiki)
+ [Installing LLVM Flang Fortran compiler](https://www.scivision.dev/flang-compiler-build-tips/)
tl;dr
```sh
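# Fetch LLVM and configure an out-of-tree build with the Flang project enabled
git clone https://github.com/llvm/llvm-project
mkdir -p llvm-project/build
cd llvm-project/build
cmake ../llvm -DLLVM_ENABLE_PROJECTS=flang
# Compile with the generator CMake selected (step not in the original note)
cmake --build .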
```
+ [Unknown CMake command “tablegen”](https://stackoverflow.com/questions/59691069/unknown-cmake-command-tablegen)
+ [libCEED: the CEED Library: Code for Efficient Extensible Discretization](https://github.com/CEED/libCEED)
+ [CEED Library: Code for Efficient Extensible Discretization](https://ceed.exascaleproject.org/software/)
+ [MFEM is a free, lightweight, scalable C++ library for finite element methods](https://mfem.org/)
+ [MFEM is a free, lightweight, scalable C++ library for finite element methods: examples](https://mfem.org/examples/)
+ [**GPU support in MFEM**](https://mfem.org/gpu-tips-n-tricks/)
+ [MFEM: Finite Element Discretization Library](https://github.com/mfem/mfem)
+ [High-order Lagrangian Hydrodynamics Miniapp](https://github.com/CEED/Laghos)
+ [Modern trends in programming of GPUs DAQFEET 2021](https://indico.cern.ch/event/974424/contributions/4158315/attachments/2186808/3695101/modern-gpu.pdf)
+ [Toward Performance-Portable PETSc for GPU-based Exascale Systems](https://arxiv.org/pdf/2011.00715.pdf)
+ AMG
+ AMG intro
+ [Iteration methods](https://encyclopediaofmath.org/wiki/Iteration_methods)
+ [Algebraic multigrid method by smoothed agglomeration for a Stokes problem](http://perso.unifr.ch/ales.janka/papers/emg_slides.pdf)
+ [Convergence of Algebraic Multigrid Based on Smoothed Aggregation II: Extension to a Petrov-Galerkin Method](https://hal.inria.fr/inria-00072986)
+ [An Algebraic Multigrid Tutorial, Robert D. Falgout, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory](http://user.it.uu.se/~maya/Courses/NLA_Parallel/Slides_2013/AMG_parallel_Falgout.pdf)
+ [An Introduction to Algebraic Multigrid](https://www2.karlin.mff.cuni.cz/~hron/NMNV532/An_Introduction_to_Algebraic_Multigrid_Computing-Falgout-2006.pdf)
+ [An Algebraic Multigrid Tutorial, IMA Tutorial – Fast Solution Techniques, November 28-29, 2010](http://user.it.uu.se/~maya/Courses/NLA_Parallel/Slides_2013/AMG_parallel_Falgout.pdf)
+ [Multigrid Methods: From Geometrical to Algebraic Versions Gundolf HAASE](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.453.4097&rep=rep1&type=pdf)
+ [A root-node based algebraic multigrid method](https://arxiv.org/pdf/1610.03154.pdf)
+ [Iterative methods for linear, non-linear and eigenvalue problems](http://www.mcc.uiuc.edu/summerschool/2001/Eric%20de%20Sturler/desturler.htm)
+ [A Multigrid Tutorial by William L. Briggs](https://www.math.ust.hk/~mawang/teaching/math532/mgtut.pdf)
+ [Algebraic Multigrid Code](https://scicomp.stackexchange.com/questions/1300/algebraic-multigrid-code)
+ [Performance of Preconditioners for Large-Scale Simulations Using Nek5000](https://link.springer.com/chapter/10.1007/978-3-030-39647-3_20)
+ [Reducing Complexity in Parallel Algebraic Multigrid Preconditioners, Hans de Sterck, Ulrike Meier Yang and Jeffrey J. Heys](http://www.math.uwaterloo.ca/~hdesterc/websiteW/Data/publications/journal/pmisPreprint.pdf)
+ [3.2.5. Block Compressed Sparse Row Format (BSR)](https://docs.nvidia.com/cuda/cusparse/index.html#bsr-format)
+ [I don't find the LU decomposition on the device with cuSolver](https://stackoverflow.com/questions/32242677/i-dont-find-the-lu-decomposition-on-the-device-with-cusolver)
+ [AMGX](https://github.com/NVIDIA/AMGX)
+ [AMGX in Julia](https://github.com/JuliaGPU/AMGX.jl)
+ [pyamgx: Python interface to NVIDIA's AMGX library](https://github.com/shwina/pyamgx)
+ [pyamgx - GPU accelerated multigrid library for Python](https://pyamgx.readthedocs.io/en/latest/)
+ [AmgXWrapper](https://github.com/barbagroup/AmgXWrapper)
+ [An example and benchmark of AmgX and PETSc with Poisson system](https://github.com/barbagroup/AmgXWrapper/blob/master/example/poisson/src/main.cpp)
+ [PetIBM - toolbox and applications of the immersed-boundary method on distributed-memory architectures](https://github.com/barbagroup/PetIBM)
+ [geoclaw-landspill](https://github.com/barbagroup/geoclaw-landspill)
+ [High-productivity, high-performance workflow for virus-scale electrostatic simulations with Bempp-Exafmm](https://github.com/barbagroup/bempp_exafmm_paper)
+ [Alexa: Simulating Shock Hydrodynamics on the GPU using Kokkos](https://www.osti.gov/servlets/purl/1510909)
+ [GPGPU acceleration - a case study of algebraic multigrid preconditioned GMRES](https://pure.tue.nl/ws/portalfiles/portal/142433633/Master_Thesis_Report_Lucas_Bekker_final_.pdf)
+ [AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods](https://www.researchgate.net/publication/283330199_AmgX_A_Library_for_GPU_Accelerated_Algebraic_Multigrid_and_Preconditioned_Iterative_Methods)
+ [Comparison of AMGX and Hypre](https://github.com/NVIDIA/AMGX/issues/112)
+ [rocALUTION is a sparse linear algebra library with focus on exploring fine-grained parallelism](https://rocalution.readthedocs.io/en/master/usermanual.html)
+ [amgcl](https://github.com/ddemidov/amgcl)
+ [amgcl](https://amgcl.readthedocs.io/en/latest/)
+ [C++ library for solving large sparse linear systems with algebraic multigrid method](https://bestofcpp.com/repo/ddemidov-amgcl-cpp-scientific-computing)
+ [Triggering C++11 support in NVCC with CMake](https://stackoverflow.com/questions/36551469/triggering-c11-support-in-nvcc-with-cmake)
tl;dr
```diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 6ca3264..b63e326 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -161,9 +161,9 @@ if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR CMAKE_CXX_COMPILER_ID MATCHES "MSVC")
if (CMAKE_CXX_COMPILER_ID MATCHES "GNU")
list(APPEND CUDA_NVCC_FLAGS
- ${CUDA_ARCH_FLAGS} -std=c++11 -Wno-deprecated-gpu-targets)
+ ${CUDA_ARCH_FLAGS} -std=c++17 -Wno-deprecated-gpu-targets)
- list(APPEND CUDA_NVCC_FLAGS -Xcompiler -std=c++11 -Xcompiler -fPIC -Xcompiler -Wno-vla)
+ list(APPEND CUDA_NVCC_FLAGS -Xcompiler -std=c++17 -Xcompiler -fPIC -Xcompiler -Wno-vla)
endif()
add_library(cuda_target INTERFACE)
```
+ [Stokes problem gives NaN by AMG but GMRES works fine](https://github.com/ddemidov/amgcl/issues/144)
+ [Pressure projection solver for Incompressible Navier-Stokes FEM](https://github.com/ddemidov/amgcl/issues/151)
+ [**how to perform matrix construction in GPU devices without the data transfer**](https://github.com/ddemidov/amgcl/issues/164)
+ [Block preconditioners](https://github.com/ddemidov/amgcl/issues/37)
+ [amg_corrector_solver](https://github.com/Andlon/crest/blob/master/include/crest/basis/amg_corrector_solver.hpp)
+ [schur pressure correction](https://github.com/ddemidov/cppstokes_benchmarks/blob/master/amgcl_spc_pre.cpp)
+ [code accompanying "Accelerating linear solvers for Stokes problems with C++ metaprogramming"](https://github.com/ddemidov/cppstokes_benchmarks/)
+ [Accelerating linear solvers for Stokes problems with C++ metaprogramming](https://arxiv.org/pdf/2006.06052.pdf)
+ [SPARSH-AMG](https://github.com/cmgcds/SParSH-AMG)
+ [SPARSH-AMG: A LIBRARY FOR HYBRID CPU-GPU ALGEBRAIC MULTIGRID AND PRECONDITIONED ITERATIVE METHODS](https://arxiv.org/pdf/2007.00056.pdf)
+ [Ginkgo is a high-performance linear algebra library for manycore systems, with a focus on sparse solution of linear systems. It is implemented using modern C++ (you will need at least a C++14-compliant compiler to build it), with GPU kernels implemented in CUDA and HIP. Has support for AMG](https://github.com/ginkgo-project/ginkgo)
+ [BootCMatchG](https://github.com/bootcmatch/BootCMatchG)
+ [multigrid solver for solving elliptic PDEs using finite differences on a rectangular grid](https://github.com/jesserobertson/multigrid)
+ [Multigrid HowTo (Part I): A simple Multigrid solver in C++ in less than 200 lines of code](https://www10.cs.fau.de/publications/reports/TechRep_2008-03.pdf)
+ [Multigrid HowTo (Part II): An Open Source Algebraic Multigrid Solver in C++](https://www10.cs.fau.de/publications/reports/TechRep_2009-02.pdf)
+ [Multigrid solver prototype (GMG) and simple Lid Cavity solver](https://discourse.julialang.org/t/multigrid-solver-prototype-gmg-and-simple-lid-cavity-solver/41969)
+ [ExaStencils: Advanced Multigrid Solver Generation](https://link.springer.com/chapter/10.1007/978-3-030-47956-5_14)
+ [EvoStencils - Constructing efficient multigrid solvers through evolutionary computation](https://github.com/jonas-schmitt/evostencils)
+ Sparse Linear System Solvers on GPUs
+ [SPARSE LINEAR SYSTEM SOLVERS ON GPUS: PARALLEL PRECONDITIONING, WORKLOAD BALANCING, AND COMMUNICATION REDUCTION](https://www.tdx.cat/bitstream/handle/10803/667096/2019_Tesis_Flegar_Goran.pdf)
+ [High performance sparse multifrontal solvers on modern GPUs](https://www.sciencedirect.com/science/article/abs/pii/S0167819122000059)
+ [STRUMPACK -- STRUctured Matrix PACKage, Copyright (c) 2014-2021](https://github.com/pghysels/strumpack)
+ [How SpaceX uses GPUs to simulate rocket engines (in Russian)](http://habrahabr.ru/post/256081/)
+ [Rockets Shake And Rattle, So SpaceX Rolls Homegrown CFD](http://www.nextplatform.com/2015/03/27/rockets-shake-and-rattle-so-spacex-rolls-homegrown-cfd/)
+ [Modern C++ Parallel Task Programming](https://github.com/cpp-taskflow/cpp-taskflow)
+ [docs for Modern C++ Parallel Task Programming](https://cpp-taskflow.github.io/cpp-taskflow/index.html)
+ [Freud, a tool to create Performance Annotations for C/C++ programs](https://github.com/usi-systems/freud)
+ [Eyal Rozenberg, Ph.D.](https://eyalroz.github.io/)
+ [Eyal Rozenberg](https://github.com/eyalroz)
+ [Thin C++-flavored wrappers for the CUDA APIs: Runtime, Driver, NVRTC and NVTX](https://github.com/eyalroz/cuda-api-wrappers)
+ [GPU Kernel Runner](https://github.com/eyalroz/gpu-kernel-runner)
+ [RAPIDS - Open GPU Data Science](https://github.com/rapidsai)
+ [RAFT: Reusable Accelerated Functions and Tools](https://github.com/rapidsai/raft)
+ [cuDF - GPU DataFrames](https://github.com/rapidsai/cudf)
tl;dr
```sh
cd cpp && mkdir -p build && cd build && cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DOPENSSL_INCLUDE_DIR=/usr/include/openssl \
  -DOPENSSL_CRYPTO_LIBRARY=/usr/lib/libcrypto.so \
  -DOPENSSL_SSL_LIBRARY=/usr/lib/libssl.so
```
+ [cuSpatial - GPU-Accelerated Spatial and Trajectory Data Management and Analytics Library](https://github.com/rapidsai/cuspatial)
+ CUDA rehab & NVIDIA docs
+ [Documentation of NVIDIA chip/hardware interfaces](https://github.com/NVIDIA/open-gpu-doc)
+ [CS344 : CUDA Programming in C](https://classroom.udacity.com/courses/cs344)
+ [UD281 : High Performance Computing](https://classroom.udacity.com/courses/ud281)
+ [Parallel Computer Architecture and Programming (CMU 15-418/618)](http://15418.courses.cs.cmu.edu/spring2016/)
+ [Parallel Computer Architecture and Programming (CMU 15-418/618)](https://github.com/cmu15418)
+ [CMU 15418 Assignment 1: Analyzing Program Performance on a Multi-Core CPU](https://github.com/cmu15418/assignment1)
+ [Assignment 1: Analyzing Program Performance on a Multi-Core CPU](http://15418.courses.cs.cmu.edu/spring2016/article/3)
+ [Assignment 2: A Simple CUDA Renderer](http://15418.courses.cs.cmu.edu/spring2017/article/4)
+ [Course on CUDA Programming on NVIDIA GPUs, July 22-26, 2019](https://people.maths.ox.ac.uk/gilesm/cuda/)
+ [Lecture 3: control flow and synchronisation: Warp divergence](https://people.maths.ox.ac.uk/gilesm/cuda/lecs/lec3-2x2.pdf)
+ [Is branch divergence really so bad?](https://stackoverflow.com/questions/17223640/is-branch-divergence-really-so-bad)
+ [Lecture 5: libraries and tools](https://people.maths.ox.ac.uk/gilesm/cuda/lecs/lec5.pdf)
+ [Maximizing Unified Memory Performance in CUDA](https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/)
+ [CUDA OPTIMIZATION TIPS, TRICKS AND TECHNIQUES Stephen Jones, GTC 2017](http://on-demand.gputechconf.com/gtc/2017/presentation/s7122-stephen-jones-cuda-optimization-tips-tricks-and-techniques.pdf)
+ [HIGH THROUGHPUT WITH GPUS](https://indico.cern.ch/event/764011/contributions/3214768/attachments/1755004/2845106/RAPID_workshop_20181119.pdf)
+ [Small tips of optimizing CUDA programs](https://nanxiao.me/en/small-tips-of-optimizing-cuda-programs/)
+ [Error using __ldg in cuda kernel at compile time](https://stackoverflow.com/questions/24069524/error-using-ldg-in-cuda-kernel-at-compile-time)
tl;dr
```sh
# __ldg() is only available on compute capability 3.5 or higher
nvcc -arch=sm_35 ...
```
+ [Open-Arch-Group](https://github.com/Open-Arch-Group)
+ [Matrix Multiplication (MMul) Benchmarks](https://github.com/Open-Arch-Group/mmul)
+ [Performance engineer that's always happy to answer questions!](https://github.com/CoffeeBeforeArch)
+ [GPGPU Programming with CUDA](https://github.com/CoffeeBeforeArch/cuda_programming)
+ [From Scratch: Histograms in CUDA using Atomics](https://www.youtube.com/watch?v=DaEmuL0PYxc)
+ [Parallel Programming in Modern C++](https://github.com/CoffeeBeforeArch/parallel_programming)
+ [This program shows off the basics of stop tokens in C++20](https://github.com/CoffeeBeforeArch/parallel_programming/blob/master/basics/jthread/stop_token.cpp)
+ [Matrix multiplication in cuSparse (cusparseDcsrgemm) outputs wrong results](https://stackoverflow.com/questions/57385060/matrix-multiplication-in-cusparse-cusparsedcsrgemm-outputs-wrong-results)
+ [C++ (Cpp) cusparseDcsrgemm usage examples](https://cpp.hotexamples.com/ru/examples/-/-/cusparseDcsrgemm/cpp-cusparsedcsrgemm-function-examples.html)
+ [Problem of two large sparse matrices multiplication in cuParse](https://forums.developer.nvidia.com/t/problem-of-two-large-sparse-matrices-multiplication-in-cuparse/33316/4)
+ [spgemm_example.c](https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuSPARSE/spgemm/spgemm_example.c)
+ [CusparseManager.cu](https://github.com/sintefmath/equelle/blob/master/backends/cuda/src/CusparseManager.cu)
+ [how to cast thrust::device_vector<int> to raw pointer](https://stackoverflow.com/questions/11113485/how-to-cast-thrustdevice-vectorint-to-raw-pointer)
+ [Parallel computing using the MPI, OpenMP, and OpenACC standards](https://www.youtube.com/playlist?list=PL-_cKNuVAYAWPC1WfK7_6v-gFOm4i7RKy)
+ Memory Model
+ [C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?](https://stackoverflow.com/questions/6319146/c11-introduced-a-standardized-memory-model-what-does-it-mean-and-how-is-it-g?rq=1)
+ [A Primer on Memory Consistency and Cache Coherence](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.225.9278&rep=rep1&type=pdf)
+ [LPC2018 - Open Source GPU compute stack - Not dancing the CUDA dance](https://www.youtube.com/watch?v=d94N2Lu4x9s)
+ OpenCL
+ [OpenCL 3.0 Specification Released With New Khronos Open-Source OpenCL SDK](https://www.phoronix.com/scan.php?page=news_item&px=OpenCL-3.0-Released-SDK)
+ [The State of OpenCL for Scientific Computing in 2018](https://mathema.tician.de/the-state-of-opencl-for-scientific-computing-in-2018/)
+ [OpenCL: History & Future](http://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/opencl/opencl-10-history-future.pdf?__blob=publicationFile)
+ [Tuned OpenCL BLAS](https://github.com/CNugteren/CLBlast)
+ [CLBlast: A Tuned BLAS Library for Faster Deep Learning](https://cnugteren.github.io/downloads/CLBlast_GTC.pdf)
+ [OpenCL vloadn](https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/vloadn.html)
+ [Could not find a package configuration file provided by "OpenCLHeaders"](https://github.com/KhronosGroup/OpenCL-CLHPP/issues/173)
+ [Using OpenCL on Adreno & Mali GPUs is slower than CPU](https://github.com/ggerganov/llama.cpp/issues/5965)
+ [Zero copy buffer allocation on arm mali midgard gpus?](https://stackoverflow.com/questions/58481560/zero-copy-buffer-allocation-on-arm-mali-midgard-gpus)
+ SYCL - C++ Single-source Heterogeneous Programming for OpenCL
+ [Khronos SYCL](https://www.khronos.org/sycl/)
+ [An open-source implementation of OpenCL SYCL from Khronos Group](https://github.com/triSYCL/triSYCL)
+ [codeplaysoftware](https://github.com/codeplaysoftware)
+ [SYCL BLAS](https://github.com/codeplaysoftware/sycl-blas)
+ [SYCL DNN](https://github.com/codeplaysoftware/SYCL-DNN)
+ [SYCL VisionCpp](https://github.com/codeplaysoftware/visioncpp)
+ [Implementation of the SYCL specification.](https://github.com/ProGTX/sycl-gtx)
+ [Building a brain with SYCL and modern C++](https://www.semanticscholar.org/paper/Building-a-brain-with-SYCL-and-modern-C%2B%2B-Smithe-Potter/01cd48cda17008640076323b8ea10ac59a8b6509)
+ OneAPI
+ [Run simple DPC++ application](https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md#run-simple-dpc-application)
+ [oneAPI Direct Programming](https://github.com/zjin-lcf/oneAPI-DirectProgramming)
+ [Port a CUDA App to oneAPI and DPC++ in 5 Minutes](https://www.codeproject.com/Articles/5284841/Port-a-CUDA-App-to-oneAPI-and-DPCplusplus-in-5-Min)
+ [How to run dpc++ code on Intel HD Graphic atop Nvidia GPU](https://community.intel.com/t5/Intel-oneAPI-Data-Parallel-C/How-to-run-dpc-code-on-Intel-HD-Graphic-atop-Nvidia-GPU/m-p/1182497#M374)
+ Kompute
+ [The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)](https://kompute.cc/)
+ [Kompute github repo](https://github.com/KomputeProject/kompute)
+ HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
+ [Why did AMD open source ROCm’s OpenCL driver-stack?](https://streamhpc.com/blog/2017-05-21/amd-open-sourced-rocms-opencl-driver-stack/)
+ [wiki for HCC](https://github.com/RadeonOpenCompute/hcc/wiki)
+ [github HCC repository](https://github.com/RadeonOpenCompute/hcc)
+ [Portable Computing Language](http://portablecl.org/)
+ [A collection of Arch Linux PKGBUILDS for the ROCm platform](https://github.com/rocm-arch/rocm-arch)
tl;dr
```sh
yay -S rocm-opencl-runtime
```
+ [aur package rocm-opencl-runtime](https://aur.archlinux.org/packages/rocm-opencl-runtime/)
+ [Arch GPGPU](https://wiki.archlinux.org/index.php/GPGPU)
+ [Arch ROCm](https://wiki.archlinux.org/index.php/GPGPU#ROCm)
+ [ROCm for Arch Linux](https://github.com/rocm-arch/rocm-arch)
+ [rocm OpenCL Programming Guide](https://rocmdocs.amd.com/en/latest/Programming_Guides/Opencl-programming-guide.html#amd-rocm-implementation)
+ [clinfo ERROR: clBuildProgram(-11)](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/issues/110)
+ [rock-dkms kernel vs mainline clarification](https://github.com/RadeonOpenCompute/ROCm/issues/816)
+ [Error during installation of rock-dkms 4.0 on 5.4 kernel](https://github.com/RadeonOpenCompute/ROCm/issues/1367)
+ [dkms build on unsported kernel and supported which makes errors](https://github.com/RadeonOpenCompute/ROCm/issues/1311)
+ [ROCm support in upstream Linux kernels](https://github.com/RadeonOpenCompute/ROCm#rocm-support-in-upstream-linux-kernels)
+ [Information for rock-dkms](https://repology.org/project/rock-dkms/information)
+ [Radeon ROCm 4.1 Released - Still Without RDNA GPU Support](https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1246716-radeon-rocm-4-1-released-still-without-rdna-gpu-support/page5)
+ [ROCm 4.1 - Vega 20 (Radeon VII) with upstream amdgpu](https://githubmemory.com/@FilipVaverka)
+ [AMD dkms fails](https://bbs.archlinux.org/viewtopic.php?id=258940)
```sh
dkms install --no-depmod -m amdgpu-4.0 -v 23 -k 5.11.16-arch1-1
Error! Bad return status for module build on kernel: 5.11.16-arch1-1 (x86_64)
Consult /var/lib/dkms/amdgpu-4.0/23/build/make.log for more information.
==> Warning, `dkms install --no-depmod -m amdgpu-4.0 -v 23 -k 5.11.16-arch1-1' returned 10
pacman -Qo /usr/src/amdgpu-4.0-23
/usr/src/amdgpu-4.0-23/ is owned by rock-dkms-bin 4.0-3
/usr/src/amdgpu-4.0-23/ is owned by rock-dkms-firmware-bin 4.0-3
```
+ [Radeon Instinct-like card: Radeon VII](https://www.ixbt.com/3dv/amd-radeon-vii-review.html)
+ [RTX 2080 vs. Radeon VII vs. 5700 XT: Rendering and Compute Performance](https://www.extremetech.com/computing/297167-rtx-2080-vs-radeon-vii-vs-5700-xt-rendering-and-compute-performance)
+ [AMD Radeon VII Review: This Isn’t the 7nm GPU You’re Looking For](https://www.extremetech.com/computing/285286-amd-radeon-vii-review-this-isnt-the-7nm-gpu-youre-looking-for)
+ [Is a used Radeon VII worth it in 2020?](https://www.quora.com/Is-a-used-Radeon-VII-worth-it-in-2020)
+ [AMD Radeon Instinct MI50 1725MHz PCI-E 4.0 16384MB 1000MHz 4096 bit](https://market.yandex.ru/product--videokarta-amd-radeon-instinct-mi50-1725mhz-pci-e-4-0-16384mb-1000mhz-4096-bit/674247125?text=AMD%20Radeon%20VII)
+ [hipSYCL - a SYCL implementation for CPUs and GPUs](https://github.com/illuhad/hipSYCL)
+ [hipSYCL performance](https://githubmemory.com/repo/FilipVaverka/hipSYCL#performance)
+ OpenCL => Vulkan
+ [a prototype implementation of OpenCL 1.2 on top of Vulkan using clspv as the compiler](https://github.com/kpet/clvk)
+ [**clspv** is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders](https://github.com/google/clspv)
+ [How To Set The CPU Affinity Of A Running Process In Linux](https://www.youtube.com/watch?v=9VJRsBmmY-4&feature=youtu.be)
+ OpenMP
+ [We waited and waited, and it's finally here! OpenMP 4.0](http://habrahabr.ru/company/intel/blog/204668/)
+ [Parallelization of a prefix sum (Openmp)](https://stackoverflow.com/questions/35821844/parallelization-of-a-prefix-sum-openmp)
+ [Parallel Prefix Sum (Scan) with CUDA](http://www.eecs.umich.edu/courses/eecs570/hw/parprefix.pdf)
+ [Parallel prefixsum algorithm in fastflow](https://github.com/pinkgopher/prefixsum)
+ [GPU prefix scan](https://github.com/mark-poscablo/gpu-prefix-sum/blob/master/scan_standalone/scan.cu)
+ OpenACC
+ [IPMACC is a framework for translating/executing OpenACC for C API to/over CUDA or OpenCL runtime](https://github.com/lashgar/ipmacc)
+ [IPMACC – An Open Source OpenACC to CUDA/OpenCL Translator](http://www.techenablement.com/ipmacc-open-source-openacc-cudaopencl-translator/)
+ MATOG - GPU Access Auto Tuning
+ [MATOG Auto-Tuning on GPUs is a tool to automatically optimize performance of NVIDIA CUDA code](https://www.gcc.tu-darmstadt.de/home/proj/matog/)
+ [MATOG preprint](https://tuprints.ulb.tu-darmstadt.de/6507/)
+ [MATOG: CUDA Array Access Auto-Tuner](https://github.com/mergian/matog)
+ [OCCA (Open Concurrent Compute Abstraction)](http://libocca.org/)
+ [github repository for OCCA](https://github.com/libocca/occa)
+ [LCSE - Linked Cluster Series Expansions - a framework for high-temperature series expansions](http://comp-phys.org/lcse/)
+ [VLI is a library for high but fixed precision (128 to 512-bit) arithmetic and symbolic polynomial computations](http://comp-phys.org/vli/)
+ [Series Expansion Methods for Quantum Lattice Models](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/123831/eth-50186-02.pdf)
+ Apache Arrow
+ [Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.](https://github.com/apache/arrow)
+ Sandia
+ [Trilinos is a collection of open-source software libraries, called packages, intended to be used as building blocks for the development of scientific applications.](https://en.wikipedia.org/wiki/Trilinos)
+ [github repo for Trilinos](https://github.com/trilinos/Trilinos)
tl;dr
```
$ yay -s trilinos
3 aur/trilinos 12.14.1-2 (+0 0.00%)
algorithms for the solution of large-scale scientific problems
2 aur/mingw-w64-trilinos 12.12.1-1 (+0 0.00%)
Framework for the solution of large-scale, complex multi-physics engineering and scientific problems (mingw-w64)
1 aur/trilinos-git 12.12.0.gd3b096f4f1-1 (+1 0.00%) (Out-of-date 2019-06-21)
An effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems.
```
+ [Add option to turn off the install of gtest header and lib even if Gtest package is enabled](https://github.com/trilinos/Trilinos/issues/5341)
+ ARM
+ [The ARM Computer Vision and Machine Learning library](https://github.com/ARM-software/ComputeLibrary)
+ [HPCG for Arm](https://github.com/ARM-software/HPCG_for_Arm)
+ [Parallelizing HPCG's main kernels](https://community.arm.com/developer/tools-software/hpc/b/hpc-blog/posts/parallelizing-hpcg)
+ ARM Neon
+ [Coding for ARM NEON: How to start?](https://stackoverflow.com/questions/28547697/coding-for-arm-neon-how-to-start)
+ [SIMD Assembly Tutorial:ARM NEON](https://people.xiph.org/~tterribe/daala/neon_tutorial.pdf)
+ [ARM NEON skinning](https://habr.com/en/post/153015/)
+ CPU, GPU & DRAM Architecture Simulators
+ [GPGPU-Sim](http://www.gpgpu-sim.org/)
+ [Integrated gem5 + GPGPU-Sim Simulator](http://cpu-gpu-sim.ece.wisc.edu/)
+ [Getting gem5](http://www.m5sim.org/Download)
+ [SimpleScalar LLC](http://www.simplescalar.com/)
+ [SimpleScalar LLC Intro](http://www.ecs.umass.edu/ece/koren/architecture/Simplescalar/SimpleScalar_introduction.htm)
+ [Todd Austin: the author](http://web.eecs.umich.edu/~taustin/)
+ [DRAMSim2](http://www.eng.umd.edu/~blj/dramsim/)
+ [github repos for DRAMSim2 etc. from University of Maryland](https://github.com/dramninjasUMD)
+ [Write-back vs Write-Through](https://stackoverflow.com/questions/27087912/write-back-vs-write-through)
+ [Study of Different Cache Line Replacement Algorithms in Embedded Systems](https://people.kth.se/~ingo/MasterThesis/ThesisDamienGille2007.pdf)
+ [Chisel: Constructing Hardware in a Scala Embedded Language](https://chisel.eecs.berkeley.edu/)
+ [UC Berkeley Architecture Research](https://github.com/ucb-bar)
+ [The RISC-V Instruction Set Architecture](http://riscv.org)
+ [Rocket Chip Generator](http://riscv.org/download.html#tab_rocket)
+ [Rocket Microarchitectural Implementation of RISC-V ISA](https://github.com/ucb-bar/rocket)
+ [Rocket uncore: L2 cache, etc.](https://github.com/ucb-bar/uncore)
# CUDA and friends related surveys, papers
+ [A Survey of CPU-GPU Heterogeneous Computing Techniques](https://www.academia.edu/12355899/A_Survey_of_CPU-GPU_Heterogeneous_Computing_Techniques)
+ [A hybrid implementation of the MST algorithm using CPU and GPU](http://habrahabr.ru/post/253031/)
+ [Understanding shared memory bank conflicts in NVIDIA CUDA](http://habrahabr.ru/post/100363/)
+ [Vulkan: The next Khronos graphics API… that is not OpenGL](http://anki3d.org/vulkan-the-next-khronos-graphics-api-that-is-not-opengl/)
+ [AMD supported project: HIP : Convert CUDA to Portable C++ Code](https://github.com/ROCm-Developer-Tools/HIP)
+ [Examples for HIP](https://github.com/ROCm-Developer-Tools/HIP-Examples)
# DSLs targeting GPU
+ [CARP: Correct and Efficient Accelerator Programming](http://carp.doc.ic.ac.uk/external/news.php)
+ [CARP dissemination](http://carp.doc.ic.ac.uk/external/dissemination.php)
+ [A taste of CARP: benchmark analysis, language design and kernel verification](http://www.cs.bris.ac.uk/Research/Micro/UKMAC2012/UKMAC12_Kravets_ARM.pdf)
+ PENCIL: a C99-based intermediate language for compute & optimization
+ [PENCIL summary in one slide: poster](http://carp.doc.ic.ac.uk/external/publications/posters/HiPEAC2013.pdf)
+ [PENCIL: A Platform-Neutral Language for Accelerator Programming](http://www.many-core.group.cam.ac.uk/ukmac2014/UKMAC2014_04_Grevendonk.pdf)
+ [PENCIL support in pet and PPCG](http://www.researchgate.net/profile/Sven_Verdoolaege/publication/273911354_PENCIL_support_in_pet_and_PPCG/links/551031d20cf27d62b913cc0b.pdf)
+ see also PPCG (below)
+ [Framework for performance-portable parallel computations on unstructured meshes](https://github.com/OP2/PyOP2)
+ [OP2: Developing an open-source framework for the execution of unstructured grid applications](http://www.oerc.ox.ac.uk/projects/op2)
+ [Optimising Unstructured Mesh Computational Fluid Dynamics Applications on Multicores via Machine Learning and Code Transformation](http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/r.rusitoru.pdf)
+ [Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs](https://www.oerc.ox.ac.uk/sites/default/files/uploads/profile-pages/Gihan/op2-lcpc.pdf)
+ [Copperhead Data Parallel Python](https://copperhead.github.io/)
+ [github CU copperhead](https://github.com/copperhead)
+ [Delite](https://github.com/stanford-ppl/Delite)
+ [Scalan](https://github.com/scalan)
+ [Scalan Community Edition](https://github.com/scalan/scalan-ce)
+ [Generating Performance Portable Code using Rewrite Rules: From High-level Functional Expressions to High-Performance OpenCL Code](http://homepages.inf.ed.ac.uk/slindley/papers/array-gpu-draft-february2015.pdf)
+ [Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems, Fykse, Egil](http://brage.bibsys.no/xmlui/handle/11250/256108)
+ ROSE compiler + Mint for C-to-CUDA code generation
+ [ROSE compiler github](https://github.com/rose-compiler)
+ MINT
+ [ROSE project MINT](https://github.com/rose-compiler/rose/tree/master/projects/mint)
+ [MINT google project](https://sites.google.com/site/mintmodel/)
+ [Mint: Realizing CUDA performance in 3D Stencil Methods with Annotated C: claims 78% of handwritten CUDA performance](http://cseweb.ucsd.edu/groups/hpcl/scg/papers/2011/mint-unat-ics11.pdf)
+ [MINT PhD thesis](http://cseweb.ucsd.edu/groups/hpcl/scg/papers/2012/DidemUnat_thesis.pdf)
+ Nested Data Parallelism, Haskell, and friends
+ [Nested Data Parallelism on GPU](http://people.cs.uchicago.edu/~jhr/papers/2012/icfp-gpu.pdf)
+ [Compiling a high-level language for GPUs: (via language support for architectures and compilers)](http://hgpu.org/?p=7809)
+ [NOVA: A Functional Language for Data Parallelism](https://research.nvidia.com/sites/default/files/publications/nvr-2013-002_0.pdf)
+ [CuNesl: Compiling Nested Data-Parallel Languages for ... ](http://moss.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/icpp12.pdf)
+ [A Haskell EDSL for Nested Data-parallel Design-space ... ](http://www.cse.chalmers.se/edu/course/pfp/exploration-draft-Obsidian.pdf)
+ [Functional programming for nested data parallelism on GPUs](https://wiki.aalto.fi/download/attachments/70779066/T-106.5840_2012_Halme.pdf?version=1&modificationDate=1357205607000)
+ [Platform-Specific Optimization and Mapping of Stencil Codes through Refinement](https://graphics.cg.uni-saarland.de/2014/platform-specific-optimization-and-mapping-of-stencil-codes-through-refinement/)
+ [High-Performance Domain-Specific Languages for GPU Computing](https://anydsl.github.io/images/anydsl.pdf)
+ [Monoids and their efficiency in practice](http://myhaskelljournal.com/monoids-and-their-efficiency-in-practice/)
+ CUDA kernels generation using C++ expression templates technique
+ CU++ -- an interesting approach
+ [CU++, An Object Oriented Tool for CFD Applications: GTC 2012](http://on-demand.gputechconf.com/gtc/2012/presentations/S0264-CU++-An-Object-Oriented-Framework-for-CFD-CFD-Apps.pdf)
+ [CU++(ET) / UGC- CUDA With C++ Expression Templates with the Unified GPU-CPU Compiler](http://w3.uwyo.edu/~dchandar/CU++.html)
+ [A Hybrid Multi-GPU/CPU Computational Framework](http://scientific-sims.com/cfdlab/Dimitri_Mavriplis/HOME/NEW_PAPERS/Chandar.2013-2855.pdf)
+ VexCL is a C++ vector expression template library for OpenCL/CUDA
+ [VexCL is a C++ vector expression template library for OpenCL/CUDA](https://github.com/ddemidov/vexcl)
+ [Generating OpenCL/CUDA source code from C++ expressions in VexCL](https://isocpp.org/blog/2015/01/generating-opencl-cuda-source-code-from-c-expressions-in-vexcl)
+ AnyDSL - A Framework for Rapid Development of Domain-Specific Libraries; thorin (The Higher-ORder INtermediate representation) / impala (An imperative and functional programming language)
+ [A Framework for Rapid Development of Domain-Specific Libraries](http://anydsl.github.io/)
+ [AnyDSL Build Instructions](https://github.com/AnyDSL/anydsl/wiki/Build-Instructions)
+ [Shallow Embedding of DSLs via Online Partial Evaluation (Best Paper Award)](http://compilers.cs.uni-saarland.de/papers/gpce15.pdf)
+ [thorin - The Higher-ORder INtermediate representation](https://github.com/AnyDSL/thorin)
+ [impala - An imperative and functional programming language](https://github.com/AnyDSL/impala)
+ [A DSL for Stencil Codes](https://github.com/AnyDSL/stincilla)
+ [AnyDSL ports from http://benchmarksgame.alioth.debian.org](https://github.com/AnyDSL/benchmarks-impala)
# parallelforall
+ [An Efficient Matrix Transpose in CUDA C/C++](http://devblogs.nvidia.com/parallelforall/efficient-matrix-transpose-cuda-cc/)
+ [BIDMach: Machine Learning at the Limit with GPUs](http://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/)
+ [High-Performance Geometric Multi-Grid with GPU Acceleration](https://devblogs.nvidia.com/parallelforall/high-performance-geometric-multi-grid-gpu-acceleration/)
+ [Inside Pascal: NVIDIA’s Newest Computing Platform](https://devblogs.nvidia.com/parallelforall/inside-pascal/)
+ [GPU Programming in Functional Languages](http://www.cse.chalmers.se/~joels/writing/GPUFL.pdf)
+ [HIP : Convert CUDA to Portable C++ Code](https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP)
# Pencil computations
+ [Speeding up stencil computations: building and running YASK on Intel processors](https://habrahabr.ru/company/intel/blog/305128/)
+ [flexible package manager that supports multiple versions, configurations, platforms, and compilers. https://spack.io](https://github.com/LLNL/spack)
+ [Tutorial: Spack 101](https://spack.readthedocs.io/en/latest/tutorial_sc16.html)
+ [NASA: High Performance Fast Computing Challenge](https://hn.svelte.technology/item/14265751)
+ [Why Rust fails hard at scientific computing](https://www.reddit.com/r/rust/comments/76olo3/why_rust_fails_hard_at_scientific_computing/)
+ [Why Rust fails hard at scientific computing](https://internals.rust-lang.org/t/why-rust-fails-hard-at-scientific-computing/6065)
+ [technicalities: interactive scientific computing #2 of 2, goldilocks languages](https://graydon2.dreamwidth.org/189377.html)
# Nim links
+ [Laser - Primitives for high performance computing](https://github.com/numforge/laser)
+ [NimTorch](https://github.com/fragcolor-xyz/nimtorch)
+ [A matrix library https://unicredit.github.io/neo/](https://github.com/unicredit/neo)
+ [A fast, ergonomic and portable tensor library with a deep learning focus](https://github.com/mratsim/Arraymancer)
+ [high performance tensor library in Nim](https://andre-ratsimbazafy.com/high-performance-tensor-library-in-nim/#how-controlling-overhead)
+ [Arraymancer - A n-dimensional tensor (ndarray) library](https://mratsim.github.io/Arraymancer/)
+ [A curated list of awesome Nim frameworks, libraries and software](https://github.com/VPashkov/awesome-nim)
+ [Find the nim package](http://nimism.co/)
+ [Meta Nim Are we scientists yet?](https://github.com/nim-lang/needed-libraries/issues/77)
+ [Quantum EXpressions lattice field theory framework](https://github.com/jcosborn/qex)
+ [QEX: a framework for lattice field theories](https://arxiv.org/abs/1612.02750)
+ tl;dr
```sh
nimble refresh
nimble install neo
nimble install Arraymancer
```
+ [Why is nim and nimble in official repo so outdated?](https://amp.reddit.com/r/archlinux/comments/cdv3xu/why_is_nim_and_nimble_in_official_repo_so_outdated/)
+ [parallel-computing resources list](https://github.zhrichard.me/topics/parallel-computing)
+ [Portable Hardware Locality (hwloc)](https://www.open-mpi.org/projects/hwloc/)
+ [Overview of the Efficient Programming Languages (v.3) 2018.4](https://sdevprog.blogspot.com/2018/04/overview-of-efficient-programming.html?m=1)
+ Intel Level Zero
+ [oneAPI Level Zero](https://github.com/oneapi-src/level-zero)
+ [Code Generation for High Performance PDE Solvers on Modern Architectures](https://archiv.ub.uni-heidelberg.de/volltextserver/27360/)
+ [PhD Thesis Software Stack](https://github.com/dokempf/dkempf-phd-software)
+ [Loopy: Transformation-Based Generation of High-Performance CPU/GPU Code](https://github.com/inducer/loopy)
+ [HyperHDG - a C++ based library implementing hybrid discontinuous Galerkin methods on extremely general domains ](https://github.com/HyperHDG/HyperHDG)
+ GPU roofline model
+ [Elias Konstantinidis publications](http://users.uoa.gr/~ekondis/publications/)
+ [A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling](https://www.sciencedirect.com/science/article/pii/S0743731517301247)
+ [mixbench - The purpose of this benchmark tool is to evaluate performance bounds of GPUs on mixed operational intensity kernels](https://github.com/ekondis/mixbench)
+ [Analysis-Driven Optimization: Preparing for Analysis with NVIDIA Nsight Compute, Part 1](https://developer.nvidia.com/blog/analysis-driven-optimization-preparing-for-analysis-with-nvidia-nsight-compute-part-1/)
+ [GPU Performance Analysis](https://vimeo.com/454873041)
+ [Roofline and NVIDIA Ampere GPU Architecture Analysis](https://www.youtube.com/watch?v=VtkxhygfNsY)
+ [Nsight Compute Feature Spotlight: Roofline Analysis, Asynchronous Copy, Sparse Data Compression](https://www.youtube.com/watch?v=DnwZ6ZTLw50)
+ [Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems](https://www.youtube.com/watch?v=kTKk05yzuzo&list=UUBHcMCGaiJhv-ESTcWGJPcw)
+ [Roofline Hackathon 2020 part 1 and 2](https://www.youtube.com/watch?v=Hy48J0Ivz18)
+ YouTube videos on GPU embedded profiling/optimization
+ [Presentation: Mali Graphics Debugger (GDC 2014)](https://www.youtube.com/watch?v=yv-V9Bl9pO4)
+ [GPU Compute Optimisation with Hardware Counters](https://www.youtube.com/watch?v=93cWfkyid7k)
+ [ARM Mali GPU Architecture Overview](https://www.youtube.com/watch?v=mo5zVbCg12I)
+ [AMD Radeon and NVIDIA GeForce FP32/FP64 GFLOPS Table](https://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/)
+ [RICOS Co. Ltd. Research Institute for Computational Science Co.Ltd.](https://github.com/ricosjp)
+ [Load-link/store-conditional](https://en.wikipedia.org/wiki/Load-link/store-conditional)