RDMA 101 - Building a virtual setup

Playing with RDMA in a virtual setup
Published on April 29, 2017 under the tags storage, rdma, qemu

We are going to explore RDMA and its applications in a series of tutorials, starting with this one.

The first step is to build a virtual environment where we can run our applications. RDMA communication normally requires a special RDMA-capable NIC in your server. Many vendors, such as Mellanox, sell this hardware, but for our development effort we are going to build a virtual environment based on qemu and the SoftiWARP software stack.

Virtual environment

Base image

We start with an Ubuntu 16.04.2 qemu image. It has Linux kernel 4.8 and basic development tools installed:

$ sudo apt-get install linux-generic-hwe-16.04 build-essential \
                       automake autoconf libtool

Let’s call this image ubuntu-16.04.2-dev. We are going to create another image based on it, called ubuntu-rdma.qcow2, with the full RDMA stack installed:

$ qemu-img create -f qcow2 -b ubuntu-16.04.2-dev ubuntu-rdma.qcow2
Formatting 'ubuntu-rdma.qcow2' ...
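
You can confirm that the new image uses ubuntu-16.04.2-dev as its backing file:

$ qemu-img info ubuntu-rdma.qcow2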

Let’s start a guest and install RDMA stack inside of it.

kvm -m 1G \
-netdev user,hostfwd=tcp::5555-:22,id=net0 \
-device virtio-net-pci,netdev=net0 \
-drive if=virtio,file=ubuntu-rdma.qcow2,cache=unsafe

We can ssh to the guest through local port 5555, which is redirected to port 22 on the guest.
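
For example, from the host:

$ ssh -p 5555 localhost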

Install RDMA libraries and tools

$ sudo apt-get install libibverbs-dev librdmacm-dev \
                       rdmacm-utils perftest ibverbs-utils

Install SoftiWARP

$ git clone https://github.com/zrlio/softiwarp.git
...
$ pushd softiwarp/kernel
$ make
$ sudo mkdir -p /lib/modules/`uname -r`/kernel/extra
$ sudo cp siw.ko /lib/modules/`uname -r`/kernel/extra
$ sudo depmod -a
$ popd
$ pushd softiwarp/userlib
$ ./autogen.sh && ./configure --prefix= && make && sudo make install
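
Before shutting down, it is worth doing a quick sanity check (assuming the module was built against the running kernel): load the driver and list the verbs devices, mirroring the commands we will use later on the VMs.

$ sudo modprobe -a siw rdma_ucm
$ ibv_devices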

Shutdown

sudo shutdown -h now

Create two guests

Based on the ubuntu-rdma.qcow2 image we just created, let’s build two images, vm1.qcow2 and vm2.qcow2:

$ for i in {1..2}; do qemu-img create -f qcow2 -b ubuntu-rdma.qcow2 vm${i}.qcow2; done
$ ls vm*
vm1.qcow2  vm2.qcow2

Now we are going to run both guests. Each of them will have two network interfaces. The first interface gives the guest access to the internet and lets the host reach the guest’s ssh service by connecting to local port 5551 (or 5552 for the second VM). The second network interface will be used for RDMA traffic; for it we use qemu’s ability to build a virtual network over a UDP multicast socket.

for i in {1..2}
do
  kvm -name vm${i} -m 1G \
      -netdev user,hostfwd=tcp::555${i}-:22,id=net0 \
      -device virtio-net-pci,netdev=net0 \
      -netdev socket,mcast=230.0.0.1:1234,id=net1 \
      -device virtio-net-pci,mac=52:54:00:12:34:0${i},netdev=net1 \
      -drive if=virtio,file=vm${i}.qcow2,cache=unsafe &
done
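
Booting takes a few seconds; one simple way to wait until both guests accept ssh on their forwarded ports (this assumes a netcat with the -z scan option is available on the host):

for i in {1..2}
do
  # poll the forwarded ssh port of vm${i} until it accepts connections
  until nc -z localhost 555${i}; do sleep 1; done
  echo "vm${i} is up"
done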

Now log in to each of the machines and set the hostname and the IP address of the RDMA NIC:

ssh -p 5551 localhost
vm1 $ sudo bash -c 'echo 127.0.0.1 vm1 >> /etc/hosts'
vm1 $ sudo hostnamectl set-hostname vm1
vm1 $ sudo su
vm1 $ cat << EOF >> /etc/network/interfaces
auto ens4
iface ens4 inet static
  address 10.0.0.1
  netmask 255.255.255.0
EOF
vm1 $ ifup ens4
vm1 $ exit
vm1 $ exit
ssh -p 5552 localhost
vm2 $ sudo bash -c 'echo 127.0.0.1 vm2 >> /etc/hosts'
vm2 $ sudo hostnamectl set-hostname vm2
vm2 $ sudo su
vm2 $ cat << EOF >> /etc/network/interfaces
auto ens4
iface ens4 inet static
  address 10.0.0.2
  netmask 255.255.255.0
EOF
vm2 $ ifup ens4
vm2 $ exit
vm2 $ exit

Make sure that your host firewall is not blocking the UDP multicast traffic on port 1234:

$ sudo iptables -A INPUT -p udp --dport 1234 -j ACCEPT
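
You can verify that the rule is in place:

$ sudo iptables -L INPUT -n | grep 1234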

Check IP connectivity

vm1 $ ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.671 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.776 ms

Check RDMA connectivity

vm1 $ sudo modprobe -a siw rdma_ucm
vm1 $ rping -s -a 10.0.0.1 -v
server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10
vm2 $ sudo modprobe -a siw rdma_ucm
vm2 $ rping -c -a 10.0.0.1 -C 2 -v
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs

Check RDMA bandwidth and latency

Bandwidth

vm1 $ ibv_devices
    device          	   node GUID
    ------          	----------------
    siw_ens3        	5254001234560000
    siw_lo          	7369775f6c6f0000
    siw_ens4        	5254001234010000
vm1 $ sudo su
vm1 $ ulimit -l unlimited
vm1 $ ib_write_bw -R -d siw_ens4 -i 1  -D 10 -F
vm2 $ sudo su
vm2 $ ulimit -l unlimited
vm2 $ ib_write_bw -R -d siw_ens4 -i 1 -D 10 -F 10.0.0.1
...
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      3200             0.00               33.33  		   0.000533

The results are quite modest, but remember that we are running in a virtual environment with a software RDMA implementation, so performance is expected to be poor here.

Latency

vm1 $ ib_write_lat -R -d siw_ens4 -i 1  -D 10 -F
vm2 $ ib_write_lat -R -d siw_ens4 -i 1 -D 10 -F 10.0.0.1
...
 #bytes        #iterations       t_avg[usec]
 2             4316            695.11

Again, the results are poor due to the nature of our virtual setup and the use of SoftiWARP instead of real RDMA-capable hardware.

Summary

Now we have a working setup of two VMs which can communicate using RDMA. In the next posts we are going to explore more interesting RDMA topics.