Juniper vMX router installation and troubleshooting in KVM
Below you’ll find a compilation of notes and errors encountered during the installation of the Juniper vMX router in KVM. The vMX image provides a nice way to have a local lab, and it is fairly manageable with the vmx script, virsh and maybe a script of your own ;-) It is also a solution aimed at service providers who want network function virtualization (NFV), so this router can also sustain high throughput.
First of all, you must be aware that the vMX router is actually made of two parts:
- vRE / vCP : virtual routing engine, the control plane
- vPFE / vFP : virtual packet forwarding engine, the data plane
As this virtual router is meant to be deployed in production environments, there are different installation modes that improve performance, for example with direct access to the network card (SR-IOV). In our case we will stay with the standard installation and lite mode. The lite mode can be configured inside the VM directly.
Ok, so you basically need KVM and libvirt installed.
Be sure to load the drivers for nested virtualisation (https://wiki.ubuntu.com/kvm). You can also go through this documentation before a production installation: https://www.juniper.net/documentation/en_US/vmx/topics/topic-map/vmx-installing-on-kvm.html#id-preparing-the-ubuntu-host-to-install-vmx
$ sudo modprobe kvm-intel
If you run into this error, it means that your Intel-VT / AMD-V virtualization options are disabled in the BIOS or not supported at all by your CPU.
FATAL: Error inserting kvm_intel (/lib/modules/2.6.20-15-generic/kernel/drivers/kvm/kvm-intel.ko): Operation not supported
Typing dmesg, you may find more details at the end of its output.
Then you can retry. To make these settings permanent, you can adjust the following configuration files:
screw@kvmhost:~/vmx$ sudo vim /etc/sysctl.conf
screw@kvmhost:~/vmx$ sudo vim /etc/default/qemu-kvm
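Before loading kvm-intel, it can help to check whether the CPU advertises hardware virtualization at all. The following is a minimal Python sketch, not part of the vMX tooling: the function name is hypothetical, and it only looks for the vmx (Intel VT-x) and svm (AMD-V) flags in the text of /proc/cpuinfo. A flag being present does not prove that the feature is enabled in the BIOS.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text.

    'vmx' is Intel VT-x and 'svm' is AMD-V. An empty set suggests that the
    CPU (or the hypervisor above this host) does not expose either of them.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has one "flags" line listing its features.
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found
```

On the KVM host it could be called as virtualization_flags(open("/proc/cpuinfo").read()).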
The configuration file is a YAML file which can be broken down into a few parts:
The configuration of the host and the links to the qcow2 images:
---
#Configuration
HOST:
    identifier : R2
    host-management-interface : ens33
    routing-engine-image : "/home/screw/vmx/images/junos-vmx-x86-64-17.2R1.13.qcow2"
    routing-engine-hdd : "/home/screw/vmx/images/vmxhdd.img"
    forwarding-engine-image : "/home/screw/vmx/images/vFPC-20170523.img"
---
The external bridge configuration useful to get access to the hosts externally :
#External bridge configuration
BRIDGES:
    - type : external
      name : br-ext
---
Then the vRE VM parameters, such as CPUs, RAM, console port and interfaces. The interface defined here will be used to communicate with the vPFE.
#vRE VM parameters
CONTROL_PLANE:
    vcpus : 1
    memory-mb : 1024
    console_port: 8601
    interfaces :
      - type : static
        ipaddr : 10.102.144.201
        macaddr : "0A:00:DD:d8:4f:b8"
---
Then the vPFE VM parameters, with the interface, in the same IP subnet to allow communication between the two. A specific bridge will be created.
#vPFE VM parameters
FORWARDING_PLANE:
    memory-mb : 2048
    vcpus : 3
    console_port: 8602
    device-type : virtio
    interfaces :
      - type : static
        ipaddr : 10.102.144.41
        macaddr : "0A:00:DE:4f:84:23"
---
Finally, the interfaces configuration. These are the production interfaces you will use to run routing protocols and other services.
---
#Interfaces
JUNOS_DEVICES:
    - interface : ge-0/0/0
      mac-address : "02:06:0A:7b:84:50"
      description : "ge-0/0/0 interface"
    - interface : ge-0/0/1
      mac-address : "02:06:0A:4d:ec:ce"
      description : "ge-0/0/1 interface"
The links configuration file is documented here: https://www.juniper.net/documentation/en_US/vmx14.1/topics/task/configuration/vmx-virtio-devices-binding.html
The file is another YAML file with the following format (an example sits in config/vmx-junosdev.conf) :
interfaces :
    - link_name : vmx_link1
      mtu : 1500
      endpoint_1 :
        - type : junos_dev
          vm_name : vmx1
          dev_name : ge-0/0/0
      endpoint_2 :
        - type : bridge_dev
          dev_name : bridge1
    - link_name : vmx_link2
      mtu : 1500
      endpoint_1 :
        - type : junos_dev
          vm_name : vmx2
          dev_name : ge-0/0/0
      endpoint_2 :
        - type : bridge_dev
          dev_name : bridge1
    - link_name : vmx_link3
      endpoint_1 :
        - type : junos_dev
          vm_name : vmx1
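When several routers share one bridge, writing these entries by hand gets repetitive. As a rough illustration only, the Python sketch below renders the text of a links file in the format shown above, attaching the same interface of each VM to a single bridge. The function name and its parameters are hypothetical, and the output should be compared with config/vmx-junosdev.conf before use.

```python
def render_bridge_links(vm_names, bridge, dev_name="ge-0/0/0", mtu=1500):
    """Build the text of a links file that attaches dev_name of each VM
    to the same bridge, one link entry per VM."""
    lines = ["interfaces :"]
    for index, vm_name in enumerate(vm_names, start=1):
        lines += [
            f"    - link_name : vmx_link{index}",
            f"      mtu : {mtu}",
            "      endpoint_1 :",
            "        - type : junos_dev",
            f"          vm_name : {vm_name}",
            f"          dev_name : {dev_name}",
            "      endpoint_2 :",
            "        - type : bridge_dev",
            f"          dev_name : {bridge}",
        ]
    return "\n".join(lines) + "\n"
```

For example, render_bridge_links(["vmx1", "vmx2"], "bridge1") produces two entries equivalent to vmx_link1 and vmx_link2 above.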
This will launch the first instance. Repeat this step for each router:
sudo bash vmx.sh --install --cfg config/R1.conf
Then you can launch the connection script:
sudo bash vmx.sh --bind-dev --cfg config/links.conf
Some lines of the script may be printed instead of being interpreted. This is because the shebang specifies sh instead of bash. You can either use the bash command to launch the script or replace #!/bin/sh with #!/bin/bash in the vmx.sh script.
When the installation fails at the "setup huge pages" step, the error generally comes from the privileges of the user who executed the script. Run the script as root so that it can make the necessary checks regarding libvirt and hugepages. If the error persists, run the script again.
Then, if it still fails, you can check that everything is properly configured on your host:
pub@kvmhost:~/vmx$ cat /etc/default/qemu-kvm | grep HUGE
KVM_HUGEPAGES=1
pub@kvmhost:~/vmx$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
HugePages_Total: 44
HugePages_Free: 44
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
pub@kvmhost:~/vmx$ cat /etc/sysctl.conf | grep -i huge
# Allocate 256 HugePageTables (start with a low number but increas it before using it
vm.nr_hugepages = 8192
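The counters above are easier to compare once they are turned into numbers. Here is a small Python sketch that parses the text of /proc/meminfo and computes how much memory is reserved for hugepages. The function names are hypothetical; the only facts assumed are the field names shown above and that Hugepagesize is expressed in kB.

```python
def parse_hugepages(meminfo_text):
    """Extract the hugepage-related counters from /proc/meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and "huge" in key.lower():
            # Keep only the number; a trailing unit such as "kB" is dropped.
            info[key.strip()] = int(value.split()[0])
    return info


def hugepage_memory_gib(info):
    """Memory reserved for hugepages in GiB (Hugepagesize is in kB)."""
    return info["HugePages_Total"] * info["Hugepagesize"] / (1024 * 1024)
```

With the values shown above (44 pages of 1048576 kB), hugepage_memory_gib returns 44.0, which can then be compared with the memory-mb values of the VMs you plan to start.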
Useful links: https://forums.juniper.net/t5/vMX/VMX-install-fails-on-quot-setup-huge-pages-quot/td-p/309471 https://help.ubuntu.com/community/KVM%20-%20Using%20Hugepages
If this happens, it means that a br-ext bridge is already registered but not active. It may be the result of an unsuccessful attempt to run the script or of a previous setup. To clean this up, run the following virsh commands:
pub@kvmhost:~/vmx$ virsh
virsh # net-list
Name State Autostart Persistent
----------------------------------------------------------
default active yes yes
virsh # net-list --all
Name State Autostart Persistent
----------------------------------------------------------
br-ext inactive no yes
default active yes yes
virsh # net-undefine br-ext
Network br-ext has been undefined
virsh # net-list --all
Name State Autostart Persistent
----------------------------------------------------------
default active yes yes
If the network is active and you want to achieve the same results, you need to destroy it first with:
net-destroy
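The same cleanup can be scripted. The Python sketch below is only an illustration with hypothetical function names: it parses the output of virsh net-list --all to find inactive networks and builds the virsh commands to run, using only the net-destroy and net-undefine subcommands shown above. It does not execute anything itself.

```python
def inactive_networks(net_list_output):
    """Return the names of networks listed as inactive by `virsh net-list --all`."""
    names = []
    for line in net_list_output.splitlines():
        fields = line.split()
        # Data rows look like: <name> <state> <autostart> <persistent>
        if len(fields) >= 2 and fields[1] == "inactive":
            names.append(fields[0])
    return names


def cleanup_commands(name, active=False):
    """Build the virsh commands that remove a network definition.

    An active network has to be destroyed (stopped) before it is undefined.
    """
    commands = []
    if active:
        commands.append(["virsh", "net-destroy", name])
    commands.append(["virsh", "net-undefine", name])
    return commands
```

Each returned command is a list that can be passed to subprocess.run once you have reviewed it.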
If you encounter the following error, install the corresponding python module:
File "/home/nugraha/Documents/vmx/scripts/common/vmx_configure.py", line 9, in <module>
import netifaces as ni
ImportError: No module named netifaces
Source is here : https://forums.juniper.net/t5/vMX/vMX-Failed-Generate-libvirt-files/td-p/286736
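To check for missing modules before running the installation, a short Python sketch such as the following can be used. The function name is hypothetical, and netifaces is the only module name taken from the error above. Run it with the same Python interpreter that the vmx scripts use, otherwise the result may not match.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If missing_modules(["netifaces"]) returns a non-empty list, install the module with the package manager or pip matching that interpreter.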
If the vRE does not boot and stays at the db> prompt, configure the CPU mode as host-passthrough in the vRE XML.
vim build/vmx1/xml/vRE-generated.xml
<cpu mode="host-passthrough">
Sources: https://forums.juniper.net/t5/vMX/vMX-cannot-boot-up-staying-at-db-gt/td-p/305658 https://kb.juniper.net/InfoCenter/index?page=content&id=KB20635&actp=METADATA
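If you rebuild the lab often, the same change can be applied programmatically instead of with vim. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, the input is the text of a libvirt domain XML file such as the generated vRE one, and any existing child elements of cpu are left untouched, so they may still need a manual review.

```python
import xml.etree.ElementTree as ET


def set_cpu_passthrough(domain_xml):
    """Return the domain XML text with <cpu mode="host-passthrough"> set.

    The <cpu> element is created if the domain does not define one.
    """
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        cpu = ET.SubElement(root, "cpu")
    cpu.set("mode", "host-passthrough")
    return ET.tostring(root, encoding="unicode")
```

The result can be written back to the XML file before the VM is defined again. Note that ElementTree does not preserve comments or the original formatting of the file.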
If you run into this error, check that your IPs, MACs and console ports are unique across your configuration files.
error : internal error: early end of file from monitor, possible problem: device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
2018-05-21T17:24:21.734512Z qemu-system-x86_64: -chardev socket,id=charserial0,host=127.0.0.1,port=8600,telnet,server,nowait: Failed to bind socket: Address already in use
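Checking uniqueness by eye becomes tedious with more than a couple of routers. The Python sketch below is one possible way to do it, not something shipped with vMX: it scans the text of the configuration files with a regular expression (so no YAML library is needed) and reports every console port, IP address or MAC address that appears more than once. The function name and the pattern are assumptions based on the key names used in the configuration shown earlier.

```python
import re
from collections import defaultdict

# Keys whose values should be unique across all routers on one host.
PATTERN = re.compile(
    r'^\s*(?:-\s*)?(console_port|ipaddr|macaddr|mac-address)\s*:\s*"?([^"#\s]+)',
    re.MULTILINE,
)


def find_duplicates(configs):
    """Map each repeated (kind, value) pair to the files where it appears.

    configs maps a file name to the text of that configuration file.
    """
    seen = defaultdict(list)
    for name, text in configs.items():
        for key, value in PATTERN.findall(text):
            # macaddr and mac-address share one namespace, and MAC
            # addresses are compared case-insensitively.
            kind = "mac" if key.startswith("mac") else key
            seen[(kind, value.lower())].append(name)
    return {pair: files for pair, files in seen.items() if len(files) > 1}
```

A value repeated inside a single file is reported too, with that file name listed twice.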
If the default network (virbr0) is missing or you have deleted it accidentally, recreate it from this configuration found on GitHub:
<network>
<name>default</name>
<bridge name="virbr0" />
<forward/>
<ip address="192.168.122.1" netmask="255.255.255.0">
<dhcp>
<range start="192.168.122.2" end="192.168.122.254" />
</dhcp>
</ip>
</network>
You can use the following commands to manage your lab:
virsh
list
destroy
net-list --all
net-undefine
Auto image upgrade is an automatic, DHCP-based process for upgrading the operating system on Juniper switches. You can disable it by entering the following commands in JunOS configuration mode:
delete chassis auto-image-upgrade
commit
As always, you will be asked to change the root password on the JunOS box, here is the snippet:
edit system
set root-authentication plain-text-password XX....XX
Message from syslogd@ at Apr 9 15:28:36 … fpc0 Frame 8: sp = 0xffeeb978, pc = 0x807c415
Message from syslogd@ at Nov 11 09:07:28 … fpc0 Scheduler Oinker
This log message kept bugging me. The only way I found to silence it is to add a regex that prevents it from reaching the console.
edit system syslog user *
set match "!fpc"
This Juniper troubleshooting procedure is interesting: https://www.juniper.net/documentation/en_US/vmx/topics/task/verification/vmx-vm-connection-troubleshooting-esxi.html
Two things seen there: troubleshoot communication between vFP and vCP using ping, and check linecard discovery by vCP.
To ping, you need to find the IPs on both the vCP and the vFP. This can be done with show interfaces terse on the vCP and ifconfig on the vFP.
Then the following ping command is used:
root> ping 128.0.0.16 routing-instance __juniper_private1__
To check line card discovery, run show chassis fpc and look for a line card starting in slot 0, then run show interfaces terse and look for ge-0/x/x interfaces.
If they are missing, two commands are recommended to restart the corresponding processes:
request chassis fpc slot 0 restart
and, if it fails showing FPC is in transition:
restart chassis-control
If you have high CPU usage, you might need to change the performance mode to lite.
root# edit chassis fpc 0
root# set lite-mode
Source: https://forums.juniper.net/t5/vMX/Juniper-VMX-bad-cpu-usage-using-lite-mode-in-kvm-compared-to/td-p/318927 https://www.juniper.net/documentation/en_US/vmx/topics/task/configuration/vmx-chassis-flow-caching-enabling.html
SQUASHFS is a compressed read-only filesystem that is generally used for CD images, live CDs and similar media. Getting these error messages may mean that the drive or the media has an issue. Check your img file (hash fingerprint) and, if there is no issue on that side, reboot your VM.
SQUASHFS error: squashfs_read_data failed to read block 0x7ca248
SQUASHFS error: Unable to read fragment cache entry [7ca248]
SQUASHFS error: Unable to read page, block 7ca248, size 980f
SQUASHFS error: Unable to read fragment cache entry [7ca248]
SQUASHFS error: Unable to read page, block 7ca248, size 980f
SQUASHFS error: Unable to read fragment cache entry [7ca248]
SQUASHFS error: Unable to read page, block 7ca248, size 980f
SQUASHFS error: Unable to read fragment cache entry [7ca248]
SQUASHFS error: Unable to read page, block 7ca248, size 980f
SQUASHFS error: Unable to read fragment cache entry [7ca248]
SQUASHFS error: Unable to read page, block 7ca248, size 980f
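To check the fingerprint of an image file, a few lines of Python are enough. This is a minimal sketch with a hypothetical function name. SHA-256 is only the default here; pass whichever algorithm matches the checksum you were given for the image.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks so that large
    disk images do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the published checksum of the image. A mismatch suggests that the file was corrupted during download or copy.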
For reference, here are the default credentials:
- vCP: root / no password
- vFP: root / root
If you are crazy enough to decide to emulate multiple linecards, here’s an interesting link: https://jncie.eu/how-to-deploy-vmx-with-multiple-res-and-multiple-fpcs-in-eve-ng-kvm/
In that particular case, you have multiple vPFE for one single vCP.
If you get the following error, change the virtual hard disk drive type to IDE for this machine:
mount: mounting /dev/sda2 on /mnt failed: No such file or directory
In case you want to change the host to increase performance or just to move the VM to another lab environment, I noted that it is generally preferable to have a fresh install instead. It will spare you some useless troubleshooting time :-)