Running StarNet2/StarNet++ with GPU on Linux

StarNet (https://www.starnetastro.com/) has become the de facto star removal tool for astrophotography. It removes the stars from an image so that you can process the background on its own, since it needs different settings than the stars. StarNet++ works well on Linux as a CLI application, but the stock executable does not use CUDA GPU acceleration, which makes it rather slow.

I found this tutorial that explains how to GPU-enable StarNet on Windows: https://www.williamliphotos.com/starnet-cuda. Let’s do the same on Linux. I am using Ubuntu 22.04.1 with an NVIDIA RTX 2080 SUPER.

First, download the CLI version of StarNet and extract it, if you have not done so already. Then create a tf directory that will hold links to the files we need:

~/astro/StarNetv2CLI_linux$ mkdir tf
~/astro/StarNetv2CLI_linux$ cd tf
~/astro/StarNetv2CLI_linux/tf$ ln -s ../libtiff.so.3 ../starnet++ ../starnet2_weights.pb .

Next, install the GPU-enabled build of TensorFlow and the CUDA libraries.

Download TensorFlow from https://www.tensorflow.org/install/lang_c?hl=en and choose “Linux GPU support”. In my case, I used the 2.9.1 archive:

~/astro/StarNetv2CLI_linux/tf$ wget https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-2.9.1.tar.gz
--2022-09-02 18:30:49-- https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-2.9.1.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)… 216.58.214.16, 142.250.179.208, 216.58.208.112, …
Connecting to storage.googleapis.com (storage.googleapis.com)|216.58.214.16|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 411473058 (392M) [application/x-tar]
Saving to: ‘libtensorflow-gpu-linux-x86_64-2.9.1.tar.gz’
libtensorflow-gpu-linux-x86_64- 100%[====================================================>] 392,41M 39,8MB/s in 9,7s
2022-09-02 18:31:00 (40,2 MB/s) - ‘libtensorflow-gpu-linux-x86_64-2.9.1.tar.gz’ saved [411473058/411473058]

~/astro/StarNetv2CLI_linux/tf$ tar xzfv libtensorflow-gpu-linux-x86_64-2.9.1.tar.gz
./
./include/
./include/tensorflow/
./include/tensorflow/c/
./include/tensorflow/c/c_api.h
./include/tensorflow/c/c_api_experimental.h
./include/tensorflow/c/c_api_macros.h
./include/tensorflow/c/tensor_interface.h
./include/tensorflow/c/tf_attrtype.h
./include/tensorflow/c/tf_datatype.h
./include/tensorflow/c/tf_file_statistics.h
./include/tensorflow/c/tf_status.h
./include/tensorflow/c/tf_tensor.h
./include/tensorflow/c/tf_tstring.h
./include/tensorflow/core/
./include/tensorflow/core/platform/
./include/tensorflow/core/platform/ctstring.h
./include/tensorflow/core/platform/ctstring_internal.h
./include/tensorflow/compiler/
./include/tensorflow/compiler/tf2tensorrt/
./include/tensorflow/compiler/tf2tensorrt/trt_convert_api.h
./lib/
./lib/libtensorflow.so.2.9.1
./lib/libtensorflow_framework.so.2.9.1
./lib/libtensorflow_framework.so
./lib/libtensorflow_framework.so.2
./lib/libtensorflow.so
./lib/libtensorflow.so.2
./THIRD_PARTY_TF_C_LICENSES
./LICENSE
./include/tensorflow/c/eager/
./include/tensorflow/c/eager/c_api.h
./include/tensorflow/c/eager/c_api_experimental.h
./include/tensorflow/c/eager/dlpack.h

Now install CUDA and the NVIDIA drivers. At the time of writing, only version 510 of the NVIDIA driver is compatible with the CUDA packages available on Ubuntu 22.04. If you are running the latest driver (nvidia-driver-515), installing the nvidia-cudnn package will attempt to remove it.

~/astro/StarNetv2CLI_linux/tf$ sudo apt-get install nvidia-cudnn nvidia-driver-510

If you were running driver 515 or later and the installed driver version changed, you need to reboot before TensorFlow can use your GPU, because until then the running driver and the installed one are at different versions.
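
To see whether a reboot is still pending, the running and installed driver versions can be compared. The sketch below is only an illustration: `driver_state` is a hypothetical helper, and it assumes that the loaded kernel module exposes its version in /sys/module/nvidia/version and that `modinfo -F version nvidia` reports the version installed on disk. Both values can also be passed as arguments, which is how the function can be tried without a GPU.

```shell
# driver_state [LOADED [INSTALLED]]
# Hypothetical helper: compares the running NVIDIA driver version with the
# installed one. Without arguments it reads both from the system (assumed
# locations, see above); with arguments it compares the given strings.
driver_state() {
    local loaded=${1-$(cat /sys/module/nvidia/version 2>/dev/null)}
    local installed=${2-$(modinfo -F version nvidia 2>/dev/null)}

    if [ -z "$loaded" ] || [ -z "$installed" ]; then
        # One of the two versions could not be determined.
        echo "unknown"
    elif [ "$loaded" = "$installed" ]; then
        echo "match $loaded"
    else
        echo "reboot needed: loaded $loaded, installed $installed"
    fi
}
```

Calling `driver_state` with no arguments after the apt-get step should print either a `match` line or a `reboot needed` line; `unknown` means that one of the two assumed sources was not available on your system.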

We now need to link the libtensorflow libraries so they’re found by StarNet++:

~/astro/StarNetv2CLI_linux/tf$ ln -sf lib/libtensorflow.so.2 lib/libtensorflow_framework.so.2 .
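
A dangling link is easy to miss at this stage, so here is a minimal sketch of a sanity check. `check_tf_links` is a hypothetical helper, not part of StarNet or TensorFlow; it only verifies that the two library names used in the command above exist in a directory and resolve to real files.

```shell
# check_tf_links [DIR]
# Hypothetical helper: reports whether the two libtensorflow links in DIR
# (default: current directory) resolve to existing files.
# Returns 0 when both are present, 1 otherwise.
check_tf_links() {
    local dir=${1:-.}
    local name
    local status=0

    for name in libtensorflow.so.2 libtensorflow_framework.so.2; do
        # [ -e ] follows symlinks, so a dangling link counts as missing.
        if [ -e "$dir/$name" ]; then
            printf 'ok       %s -> %s\n' "$name" "$(readlink -f "$dir/$name")"
        else
            printf 'missing  %s\n' "$name"
            status=1
        fi
    done
    return $status
}
```

Run it as `check_tf_links .` from the tf directory. For a further check, `ldd ./starnet++ | grep tensorflow` shows which copies of the libraries the dynamic loader would actually pick up.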

It’s now time to test StarNet in GPU mode and compare its performance:

~/astro/StarNetv2CLI_linux/tf$ time ./starnet++ ../2022-08-20-M31-v1_LRGB.tif
Reading input image… Done!
Bits per sample: 16
Samples per pixel: 3
Height: 3596
Width: 5328
Restoring neural network checkpoint… Done!
2022-09-02 18:38:31.270863: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-02 18:38:31.479773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:31.480021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:31.509921: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:31.510177: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:31.510378: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:31.510575: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.092283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.092551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.092757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.092963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.093165: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.093390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5954 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:0a:00.0, compute capability: 7.5
2022-09-02 18:38:32.093804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-02 18:38:32.094004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 6646 MB memory: -> device: 1, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:0b:00.0, compute capability: 7.5
Total number of tiles: 315
2022-09-02 18:38:32.184429: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-09-02 18:38:34.100134: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8204
100% finishedDone!
real 0m21,444s
user 0m19,678s
sys 0m1,710s
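
The lines that matter in this output are the "Created device … device:GPU:N" ones: they show that TensorFlow found the GPU. As a small sketch, a saved log can be checked for them like this. `count_gpus` is a hypothetical helper, and the pattern is based only on the log format shown above, which may differ in other TensorFlow versions.

```shell
# count_gpus [FILE...]
# Hypothetical helper: counts the "Created device ... device:GPU:N" lines
# in a saved StarNet/TensorFlow log (reads stdin when no file is given).
count_gpus() {
    grep -c 'Created device /job:[^ ]*/device:GPU:[0-9]' "$@"
}
```

For example, save the output with `./starnet++ image.tif > starnet.log 2>&1` (image.tif is a placeholder) and then run `count_gpus starnet.log`. On the log above it would report 2, one per "Created device" line; a result of 0 means StarNet is still running on the CPU.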

And here is the same image processed with the original CPU-only setup, from the parent directory:

~/astro/StarNetv2CLI_linux$ time ./starnet++ ./2022-08-20-M31-v1_LRGB.tif
Reading input image… Done!
Bits per sample: 16
Samples per pixel: 3
Height: 3596
Width: 5328
Restoring neural network checkpoint… Done!
2022-09-02 18:39:42.337410: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Total number of tiles: 315
100% finishedDone!
real 0m58,745s
user 13m20,388s
sys 2m13,527s

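StarNet++ now runs about 2.7 times faster (21 seconds instead of 59 seconds of wall-clock time), and it no longer hogs the CPU: the GPU run uses about 21 seconds of CPU time, against more than 15 minutes spread across the CPU cores for the CPU-only run.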
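
The comparison can be reproduced from the two `time` reports. The sketch below converts values such as 0m58,745s to seconds and divides them. `to_seconds` and `speedup` are hypothetical helper names, and the functions accept either a comma or a period as the decimal separator, since that depends on the locale.

```shell
# to_seconds VALUE
# Converts a `time` value such as "0m21,444s" or "13m20.388s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        gsub(",", ".")            # normalise the decimal separator
        sub(/s$/, "")             # drop the trailing "s"
        split($0, part, "m")      # part[1] = minutes, part[2] = seconds
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# speedup SLOW FAST
# Prints how many times longer SLOW is than FAST, rounded to two decimals.
speedup() {
    LC_ALL=C awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", slow / fast }'
}

# Wall-clock ("real") times of the CPU-only run and the GPU run above.
speedup 0m58,745s 0m21,444s   # prints 2.74
```

The same helper works for the CPU time: `to_seconds 13m20,388s` gives the "user" time of the CPU-only run in seconds.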