Building Jetson Nano libraries on host PC

In this article, we built the PyTorch library on the Jetson Nano, and in this article, we built the OpenCV library on the Jetson Nano. Both of those builds ran on the Jetson Nano itself; this article explains how to build the same PyTorch and OpenCV libraries on the host PC as aarch64 binaries, the architecture of the Jetson Nano.

Docker

Docker is used for the build. With Docker, you can set up the shared-library environment required by aarch64 binaries. The following steps add the login user to the docker group so that docker commands can be run without sudo.

sudo groupadd docker
sudo gpasswd -a $USER docker

Restart the system so that the addition to the docker group takes effect.
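Alternatively, the group change can be applied to the current shell without rebooting. Also, if the host PC is x86_64, running aarch64 images requires QEMU/binfmt emulation to be registered once; the multiarch/qemu-user-static image is one common way to do this (recent Docker installations may already ship binfmt support), so the following is a sketch rather than a required step:

# apply the new docker group membership in the current shell
newgrp docker

# one-time setup on an x86_64 host: register QEMU binfmt handlers
# so that aarch64 (Jetson Nano) containers can run under emulation
docker run --privileged --rm multiarch/qemu-user-static --reset -p yes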

PyTorch

Below is the Dockerfile, shown in sections.

FROM nvcr.io/nvidia/l4t-base:r32.7.1
ENV DEBIAN_FRONTEND=noninteractive
ARG MAX_JOBS=2

ARG MAX_JOBS allows the value to be passed in from outside during docker build, e.g. as --build-arg MAX_JOBS=$(nproc).
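For example, you can keep the default of 2 jobs or override it at build time (these commands assume the pytorch-build image tag used later in this article):

# use the default MAX_JOBS=2 baked into the Dockerfile
docker build -t pytorch-build .

# or override it, e.g. with the number of host cores
docker build -t pytorch-build . --build-arg MAX_JOBS=$(nproc)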

# https://qengineering.eu/install-pytorch-on-jetson-nano.html
RUN apt-get update && apt-get install -y \
      python3.8 python3.8-dev \
      ninja-build git cmake clang \
      libopenmpi-dev libomp-dev ccache \
      libopenblas-dev libblas-dev libeigen3-dev \
      python3-pip libjpeg-dev \
      gnupg2 curl

RUN apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc
RUN echo 'deb https://repo.download.nvidia.com/jetson/common r32.7 main\n\
deb https://repo.download.nvidia.com/jetson/t210 r32.7 main' > /etc/apt/sources.list.d/nvidia-l4t-apt-source.list

RUN apt-get update && apt-get install -y nvidia-cuda nvidia-cudnn8
RUN python3.8 -m pip install -U pip
RUN python3.8 -m pip install -U setuptools
RUN python3.8 -m pip install -U wheel mock pillow
RUN python3.8 -m pip install scikit-build
RUN python3.8 -m pip install cython Pillow

repo.download.nvidia.com is added to the apt repository, and then nvidia-cuda and nvidia-cudnn8 are installed.
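To confirm that the CUDA toolkit and cuDNN actually landed in the image, a quick spot-check can be run once the image is built (the path and package names here are assumptions based on typical L4T images):

# spot-check CUDA and cuDNN inside the built image
docker run --rm pytorch-build bash -c 'ls /usr/local/cuda && dpkg -l | grep -i cudnn'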

The following is almost the same as described in this article.

## download PyTorch v1.11.0 with all its libraries
RUN git clone -b v1.11.0 --depth 1 --recursive --recurse-submodules --shallow-submodules https://github.com/pytorch/pytorch.git
WORKDIR pytorch
RUN python3.8 -m pip install -r requirements.txt
COPY pytorch-1.11-jetson.patch .
RUN patch -p1 < pytorch-1.11-jetson.patch

RUN apt-get install -y software-properties-common lsb-release
RUN wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
RUN apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
RUN apt-get update && apt-get install -y cmake

ENV BUILD_CAFFE2_OPS=OFF
ENV USE_FBGEMM=OFF
ENV USE_FAKELOWP=OFF
ENV BUILD_TEST=OFF
ENV USE_MKLDNN=OFF
ENV USE_NNPACK=OFF
ENV USE_XNNPACK=OFF
ENV USE_QNNPACK=OFF
ENV USE_PYTORCH_QNNPACK=OFF
ENV USE_CUDA=ON
ENV USE_CUDNN=ON
ENV TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
ENV USE_NCCL=OFF
ENV USE_SYSTEM_NCCL=OFF
ENV USE_OPENCV=OFF
ENV MAX_JOBS=$MAX_JOBS
# set path to ccache
ENV PATH=/usr/lib/ccache:$PATH
# set clang compiler
ENV CC=clang
ENV CXX=clang++
# create symlink to cublas
# ln -s /usr/lib/aarch64-linux-gnu/libcublas.so /usr/local/cuda/lib64/libcublas.so
# start the build
RUN python3.8 setup.py bdist_wheel
RUN find /pytorch/dist/ -type f|xargs python3.8 -m pip install

# torch vision
RUN git clone --depth=1 https://github.com/pytorch/vision torchvision -b v0.12.0
RUN cd torchvision && \
  TORCH_CUDA_ARCH_LIST='5.3;6.2;7.2' \
  FORCE_CUDA=1 \
  python3.8 setup.py bdist_wheel

Build the Dockerfile above.

docker build -t pytorch-build . --build-arg MAX_JOBS=$(nproc)

When building on the Jetson Nano, MAX_JOBS=2 was specified due to limited memory, but on the host PC this restriction is relaxed.
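If you want to derive MAX_JOBS from the host automatically, a rough heuristic (an assumption, not taken from the measured builds below) is to allow about 2 GB of RAM per compile job, capped at the core count:

# rough heuristic: ~2 GB of RAM per compile job, capped at the core count
jobs=$(( $(free -g | awk '/^Mem:/{print $2}') / 2 ))
[ "${jobs}" -gt "$(nproc)" ] && jobs=$(nproc)
docker build -t pytorch-build . --build-arg MAX_JOBS=${jobs}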

After the build is complete, retrieve the built binaries.

$ id=$(docker run -it --rm -d pytorch-build bash)
$ pytorch=$(docker exec -it ${id} find /pytorch/dist -type f | sed -e "s/[\r\n]//g")
$ vision=$(docker exec -it ${id} find /pytorch/torchvision/dist -type f | sed -e "s/[\r\n]//g")
$ docker cp ${id}:${pytorch} .
$ docker cp ${id}:${vision} .
$ docker stop ${id}
$ ls Dockerfile *whl
Dockerfile  torch-1.11.0a0+gitbc2c6ed-cp38-cp38-linux_aarch64.whl  torchvision-0.12.0a0+9b5a3fe-cp38-cp38-linux_aarch64.whl

We now have two wheels in hand: torch-1.11.0a0+gitbc2c6ed-cp38-cp38-linux_aarch64.whl and torchvision-0.12.0a0+9b5a3fe-cp38-cp38-linux_aarch64.whl.

Copy these two wheels to the Jetson Nano and install them. The following commands are executed on the Jetson Nano.

sudo apt update
sudo apt install -y \
    python3.8 python3.8-dev python3-pip \
    libopenmpi-dev libomp-dev libopenblas-dev libblas-dev libeigen3-dev \
    nvidia-cuda nvidia-cudnn8
python3.8 -m pip install -U pip
python3.8 -m pip install torch-*.whl torchvision-*.whl
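A quick sanity check confirms that the wheels import correctly and that CUDA is visible:

# verify the installed wheels on the Jetson Nano
python3.8 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__); print('CUDA:', torch.cuda.is_available())"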

OpenCV

Below is the Dockerfile, again shown in sections.

FROM nvcr.io/nvidia/l4t-base:r32.6.1
ENV DEBIAN_FRONTEND=noninteractive

ARG VER="4.6.0"
ARG PREFIX=/usr/local
ARG MAX_JOBS

This Dockerfile also allows some variables to be passed with --build-arg at docker build, as in the example below. The rest of the Dockerfile is almost the same as explained in this article.
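For example (VER and PREFIX are the ARGs defined above; the values shown match their defaults):

docker build -t opencv-build . \
    --build-arg VER=4.6.0 \
    --build-arg PREFIX=/usr/local \
    --build-arg MAX_JOBS=$(nproc)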

#    setup
RUN cd tmp && mkdir build_opencv
WORKDIR /tmp/build_opencv

#    install_dependencies
RUN apt-get update && \
    apt-get install -y \
        build-essential \
        cmake \
        git \
        gfortran \
        libatlas-base-dev \
        libavcodec-dev \
        libavformat-dev \
        libavresample-dev \
        libcanberra-gtk3-module \
        libdc1394-22-dev \
        libeigen3-dev \
        libglew-dev \
        libgstreamer-plugins-base1.0-dev \
        libgstreamer-plugins-good1.0-dev \
        libgstreamer1.0-dev \
        libgtk-3-dev \
        libjpeg-dev \
        libjpeg8-dev \
        libjpeg-turbo8-dev \
        liblapack-dev \
        liblapacke-dev \
        libopenblas-dev \
        libpng-dev \
        libpostproc-dev \
        libswscale-dev \
        libtbb-dev \
        libtbb2 \
        libtesseract-dev \
        libtiff-dev \
        libv4l-dev \
        libxine2-dev \
        libxvidcore-dev \
        libx264-dev \
        pkg-config \
        python3.8-dev \
        python3-numpy \
        python3-matplotlib \
        python3-pip \
        qv4l2 \
        v4l-utils \
        v4l2ucp \
        zlib1g-dev

#    git_source ${VER}
RUN git clone --depth 1 --branch ${VER} https://github.com/opencv/opencv.git
RUN git clone --depth 1 --branch ${VER} https://github.com/opencv/opencv_contrib.git

RUN python3 -m pip install -U pip
RUN python3 -m pip uninstall -y numpy
RUN python3.8 -m pip install -U pip
RUN python3.8 -m pip install setuptools
RUN python3.8 -m pip install numpy

RUN apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc
RUN echo 'deb https://repo.download.nvidia.com/jetson/common r32.6 main\n\
deb https://repo.download.nvidia.com/jetson/t210 r32.6 main' > /etc/apt/sources.list.d/nvidia-l4t-apt-source.list

RUN apt-get update && apt-get install -y nvidia-cuda nvidia-cudnn8

RUN cd opencv && \
      mkdir build && \
      cd build && \
      cmake \
        -D BUILD_EXAMPLES=OFF \
        -D BUILD_opencv_python2=OFF \
        -D BUILD_opencv_python3=ON \
        -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=${PREFIX} \
        -D CUDA_ARCH_BIN=5.3,6.2,7.2 \
        -D CUDA_ARCH_PTX= \
        -D CUDA_FAST_MATH=ON \
        -D CUDNN_VERSION='8.0' \
        -D EIGEN_INCLUDE_PATH=/usr/include/eigen3  \
        -D ENABLE_NEON=ON \
        -D OPENCV_DNN_CUDA=ON \
        -D OPENCV_ENABLE_NONFREE=ON \
        -D OPENCV_EXTRA_MODULES_PATH=/tmp/build_opencv/opencv_contrib/modules \
        -D OPENCV_GENERATE_PKGCONFIG=ON \
        -D WITH_CUBLAS=ON \
        -D WITH_CUDA=ON \
        -D WITH_CUDNN=ON \
        -D WITH_GSTREAMER=ON \
        -D WITH_LIBV4L=ON \
        -D WITH_OPENGL=ON \
        -D BUILD_PERF_TESTS=OFF \
        -D BUILD_TESTS=OFF \
        -D PYTHON3_EXECUTABLE=python3.8 \
        -D PYTHON3_INCLUDE_PATH=$(python3.8 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
        -D PYTHON3_PACKAGES_PATH=$(python3.8 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
        -D PYTHON3_LIBRARY=/usr/lib/aarch64-linux-gnu/libpython3.8.so \
        -D CPACK_BINARY_DEB=ON \
        -D CPACK_PACKAGING_INSTALL_PREFIX=${PREFIX} \
        ..
WORKDIR /tmp/build_opencv/opencv/build
RUN make -j${MAX_JOBS}
RUN make install
RUN cpack -G DEB

To create the deb packages, -D CPACK_BINARY_DEB=ON and -D CPACK_PACKAGING_INSTALL_PREFIX=${PREFIX} are added to the cmake options. Finally, cpack -G DEB creates the deb packages.

Build the Dockerfile above.

docker build -t opencv-build . --build-arg MAX_JOBS=$(nproc)

After the build is complete, retrieve the built binaries.

$ id=$(docker run -it --rm -d opencv-build bash)
$ debs=$(docker exec -it ${id} find /tmp/build_opencv/opencv/build/ -maxdepth 1 -name "*.deb" | sed -e "s/[\r\n]//g")
$ for deb in $debs; do
  docker cp ${id}:$deb .
done
$ docker stop ${id}
$ ls Dockerfile *deb
Dockerfile  OpenCV-4.6.0-aarch64-dev.deb  OpenCV-4.6.0-aarch64-libs.deb  OpenCV-4.6.0-aarch64-licenses.deb  OpenCV-4.6.0-aarch64-main.deb  OpenCV-4.6.0-aarch64-python.deb  OpenCV-4.6.0-aarch64-scripts.deb
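The retrieved packages can be inspected on the host without installing them, for example:

# list metadata and the first few files of a generated package
dpkg-deb --info OpenCV-4.6.0-aarch64-libs.deb
dpkg-deb --contents OpenCV-4.6.0-aarch64-libs.deb | head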

Copy these deb packages to the Jetson Nano and install them. The following is executed on the Jetson Nano.

sudo apt update
sudo apt list --installed | grep -i opencv | cut -d/ -f1 | xargs sudo apt remove -y
sudo apt install -y ./OpenCV*.deb

Remove the existing OpenCV packages first, then install the newly built ones.
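To verify the installation, check the version and that CUDA support was compiled in (cv2.getBuildInformation() prints the build configuration):

# confirm the new build is active and was compiled with CUDA support
python3.8 -c "import cv2; print(cv2.__version__)"
python3.8 -c "import cv2; print(cv2.getBuildInformation())" | grep -i cuda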

Performance

Below are the times for the Docker builds of PyTorch and OpenCV, run on the Jetson Nano and on the host PC respectively.

          Jetson Nano          Host PC (i9-12900K, 32GB)  Speedup
PyTorch   MAX_JOBS=2           MAX_JOBS=24                x4.7
          real 832m39.333s     real 176m48.354s
          user 0m7.352s        user 0m0.852s
          sys  0m4.268s        sys  0m0.904s
OpenCV    MAX_JOBS=4           MAX_JOBS=24                x1.3
          real 161m34.754s     real 117m28.657s
          user 0m2.240s        user 0m0.648s
          sys  0m1.660s        sys  0m0.517s

Builds on the host PC are 4.7 times faster for PyTorch and 1.3 times faster for OpenCV. Host PC builds also do not depend on the Jetson hardware, so they can run on build servers such as CI/CD systems.

The above codes are available at https://github.com/otamajakusi/build_jetson_nano_libraries.

That’s all.
