Using perfSONAR, pScheduler, and iperf3 to troubleshoot network throughput problems

The perfSONAR project enables network engineers to perform a variety of troubleshooting tests. While investigating a file transfer issue over Internet2, I used pscheduler on the perfSONAR servers to test both upload and download throughput.

When a perfSONAR system is configured and online, it registers in the Global perfSONAR Directory. Most of these servers are publicly available for real-time troubleshooting.

I used the pScheduler utility to run an iperf3 throughput test against an es.net server. The first run, with --reverse, tested download speed and showed no issue:

[jemurray@wuit-s-00050 ~]$ pscheduler task --tool iperf3 throughput -d bost-pt1.es.net -t 10 --reverse
Submitting task...
Task URL:
https://localhost/pscheduler/tasks/f350732b-7318-4c35-9360-79ce423c5655
Running with tool 'iperf3'
Fetching first run...

Next scheduled run:
https://localhost/pscheduler/tasks/f350732b-7318-4c35-9360-79ce423c5655/runs/1775f5f4-2a98-4ad1-81b2-cf1a71e08ffc
Starts 2020-08-17T17:31:45Z (~9 seconds)
Ends   2020-08-17T17:32:04Z (~18 seconds)
Waiting for result...

* Stream ID 5
Interval       Throughput     Retransmits    Current Window
0.0 - 1.0      6.33 Gbps      9              134.22 MBytes
1.0 - 2.0      8.35 Gbps      0              134.22 MBytes
2.0 - 3.0      8.47 Gbps      0              134.22 MBytes
3.0 - 4.0      8.39 Gbps      0              134.22 MBytes
4.0 - 5.0      8.26 Gbps      0              134.22 MBytes
5.0 - 6.0      8.38 Gbps      0              134.22 MBytes
6.0 - 7.0      8.35 Gbps      0              134.22 MBytes
7.0 - 8.0      8.34 Gbps      0              134.22 MBytes
8.0 - 9.0      8.39 Gbps      0              134.22 MBytes
9.0 - 10.0     8.44 Gbps      0              134.22 MBytes
10.0 - 10.0    8.59 Gbps      0              134.22 MBytes

Summary
Interval       Throughput     Retransmits
0.0 - 10.0     8.17 Gbps      9

No further runs scheduled.

The second run tested upload speed and exposed the problem: an average of 486 Kbps, far below the 10 Gbps the link should normally deliver:

[jemurray@wuit-s-00050 ~]$ pscheduler task --tool iperf3 throughput -d bost-pt1.es.net -t 10
Submitting task...
Task URL:
https://localhost/pscheduler/tasks/71329e14-c5cd-489b-9600-0826be4457d1
Running with tool 'iperf3'
Fetching first run...

Next scheduled run:
https://localhost/pscheduler/tasks/71329e14-c5cd-489b-9600-0826be4457d1/runs/07b7cee2-4128-4970-ad04-a875b6a85d16
Starts 2020-08-17T17:30:35Z (~6 seconds)
Ends   2020-08-17T17:30:54Z (~18 seconds)
Waiting for result...

* Stream ID 5
Interval       Throughput     Retransmits    Current Window
0.0 - 1.0      3.87 Mbps      8              26.84 KBytes
1.0 - 2.0      0.00bps        2              8.95 KBytes
2.0 - 3.0      0.00bps        1              8.95 KBytes
3.0 - 4.0      0.00bps        0              8.95 KBytes
4.0 - 5.0      0.00bps        2              8.95 KBytes
5.0 - 6.0      0.00bps        8              8.95 KBytes
6.0 - 7.0      1.00 Mbps      5              8.95 KBytes
7.0 - 8.0      0.00bps        3              8.95 KBytes
8.0 - 9.0      0.00bps        2              26.84 KBytes
9.0 - 10.0     0.00bps        3              8.95 KBytes

Summary
Interval       Throughput     Retransmits
0.0 - 10.0     486.72 Kbps    34

No further runs scheduled.
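
To put the two summary lines side by side, here is a minimal Python sketch that converts the throughput strings to bits per second and prints the ratio. The names to_bps and UNIT_BPS are hypothetical helpers, not part of pScheduler or iperf3, and the sketch assumes the unit prefixes are decimal (K = 1000):

```python
import re

# Assumed decimal multipliers for the unit suffixes seen in the summaries.
UNIT_BPS = {"bps": 1.0, "Kbps": 1e3, "Mbps": 1e6, "Gbps": 1e9}

def to_bps(text):
    """Convert a string such as '8.17 Gbps' or '0.00bps' to bits per second."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?bps)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized throughput: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_BPS[unit]

download = to_bps("8.17 Gbps")    # summary of the --reverse run
upload = to_bps("486.72 Kbps")    # summary of the forward run
ratio = download / upload
print(f"download is about {ratio:,.0f} times faster than upload")
```

With the two summaries above, the ratio comes out to more than four orders of magnitude, which points at a problem in one direction of the path rather than general congestion.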

Finally, I ran mtr to map out the exact path:

[jemurray@wuit-s-00050 ~]$ mtr --report --report-wide -c 120 bost-pt1.es.net
Start: Mon Aug 17 12:49:52 2020
HOST: wuit-s-00050.accounts.ad.wustl.edu         Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- xe-0-0-8-900w-mmr-wu-rt-0.net.wustl.edu     0.0%   120    0.4   0.5   0.2   6.3   0.8
  2.|-- xe-0-0-1.944.rtr2.chic.indiana.gigapop.net  1.7%   120    5.6   5.6   5.5   6.5   0.0
  3.|-- 149.165.254.122                             0.8%   120    5.7   5.7   5.6   6.2   0.0
  4.|-- eqxchicr5-ip-b-chiccr5.es.net               2.5%   120    6.1   6.1   6.0   6.6   0.0
  5.|-- eqxashcr5-ip-a-eqxchicr5.es.net             4.2%   120   20.6  20.6  20.5  21.3   0.0
  6.|-- washcr5-ip-c-eqxashcr5.es.net               7.5%   120   21.1  21.0  20.9  21.7   0.0
  7.|-- aofacr5-ip-b-washcr5.es.net                84.2%   120  7973. 7954. 7871. 8013.  41.8
  8.|-- newycr5-ip-a-aofacr5.es.net                 9.2%   120   26.2  26.2  26.0  29.0   0.3
  9.|-- bostcr5-ip-a-newycr5.es.net                 2.5%   120   30.7  30.7  30.6  31.2   0.0
 10.|-- bost-pt1.es.net                             3.3%   120   30.4  30.4  30.3  30.5   0.0
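
For longer reports, picking out the worst hop by eye gets tedious. The Python sketch below parses hop lines in the format shown above and returns the hop with the highest reported loss. HOP_LINE, parse_hops, and worst_hop are hypothetical names, and the regular expression assumes the mtr --report layout from this run; other mtr versions or options may format lines differently:

```python
import re

# Matches one hop line of `mtr --report` output: hop number, host, loss percent.
HOP_LINE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%")

def parse_hops(report):
    """Return a list of (hop, host, loss_percent) tuples from mtr report text."""
    hops = []
    for line in report.splitlines():
        match = HOP_LINE.match(line)
        if match:  # header and "Start:" lines do not match and are skipped
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops

def worst_hop(hops):
    """Return the hop tuple with the highest reported loss."""
    return max(hops, key=lambda hop: hop[2])

# Three hop lines copied from the report above.
sample = """\
  6.|-- washcr5-ip-c-eqxashcr5.es.net               7.5%   120   21.1  21.0  20.9  21.7   0.0
  7.|-- aofacr5-ip-b-washcr5.es.net                84.2%   120  7973. 7954. 7871. 8013.  41.8
  8.|-- newycr5-ip-a-aofacr5.es.net                 9.2%   120   26.2  26.2  26.0  29.0   0.3
"""
hop, host, loss = worst_hop(parse_hops(sample))
print(f"hop {hop} ({host}) reports {loss}% loss")
```

Loss reported at a single intermediate hop can also come from a router rate-limiting its ICMP replies, so a result like this is a pointer for further testing rather than proof of where the fault lies.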

Using these publicly available tools, and with a support person on the phone, we quickly narrowed the problem down to a receive optic on the far side of our service provider's edge router.