Intel Optane DC P4800X (NVMe) ACT results


#1

This part of the forum doesn’t seem to get a lot of traffic, but it’s that time of the cycle where we start looking at options for new drives. There are a few I’m testing out, and I’ll post results for some of them. Today’s is one I’ve been looking forward to testing for a while.

First, the drive and system info:

Model No             : INTEL SSDPED1K375GA
FW-Rev               : E2010324           
Total Size           : 375.00GB
Drive Status         : Drive is in good health
PCI Path (B:D.F)     : 06:00.0
Vendor               : Micron
OS Device            : /dev/nvme1n1
PCIe Link Speed      : 8.0 GT/s
System: Dell R720 with PERC H710P Mini (FW: 21.2.0-0007), 12-core Intel® Xeon® CPU E5-2630 0 @ 2.30GHz

No special handling with regard to spare space; I just used 100% of the available space. Since this is a PCIe card, it isn’t going via the RAID controller, so no special handling is required there either.

Now some results:

$ ./latency_calc/act_latency.py -t 3600 -l output.100x_1d.txt
data is act version 4.0
        trans                                              device
        %>(ms)                                             %>(ms)

slice  1      2      4      8     16     32     64        1      2      4      8     16     32     64
----- ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
 1     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 2     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 3     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 4     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 5     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 6     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 7     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 8     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
 9     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
10     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
11     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
12     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
13     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
14     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
15     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
16     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
17     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
18     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
19     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
20     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
21     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
22     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
23     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
24     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
avg     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
max     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00

Note that these are for a 24h ACT run at 100x…

150x fails immediately, and 125x fails after about 40 seconds. This is quite interesting: the drive delivers essentially perfect IO latency right up until the point where it fails. Even the 125x run was mostly 0s for the short time it ran.

These drives are amazing! I will be getting some other NVMe-based drives in, so I should have something non-SATA to compare against in the near future. At this point, if money is no object, these are the drives to get (which I guess is no surprise to anyone).


#2

If you are using ACT 4.0, try partitioning the drive into 4 partitions and running the test as if it were 4 drives. A 100x single-drive test corresponds to 25x per device in the 4-drive test.
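For illustration, a 4-partition config equivalent to the 100x single-drive run might look like the fragment below. This is a sketch, not a verified config: the partition names are placeholders, and it assumes ACT’s usual 1x definition (2,000 reads/sec + 1,000 writes/sec) with the configured rates being totals spread across all listed devices (so 200,000 reads/sec = 100x total = 25x per partition).

```
device-names: /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3 /dev/nvme1n1p4
num-devices: 4
read-reqs-per-sec: 200000
write-reqs-per-sec: 100000
test-duration-sec: 86400
```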


#3

We did get ours to run at 125x.

That might have been on pre-production hardware, and we’re about to re-test with the larger 750G drives. There’s always a question of firmware, host CPU, and so on; are you able to share some of those details?

(And please let us know whether partitioning into 4 ACT devices ends up helping; we’re trying to work out whether this is widespread, and why it’s happening.)

Thanks!


#4

Oops, I thought I had included that, sorry: Dell R720 with PERC H710P Mini (FW: 21.2.0-0007), 12-core Intel® Xeon® CPU E5-2630 0 @ 2.30GHz.

I did wonder if I was running out of CPU horsepower, it’s a pretty old, slow CPU.


#5

But the drives aren’t connected to the PERC, right? The PERC is just for OS drives?

Our benchmark machines are similar; maybe 2.5 GHz, but similar.

And what kernel / NVMe driver version? We generally do a lot with 4.x+ kernels (like Debian 9’s 4.9, or Ubuntu 18.04 LTS)…

Give Tibor’s idea a try and see if 4 copies against 4 different partitions (no file system) improves things.


#6

Correct; I just copied and pasted that from the previous ACT result set I posted for the SM863s.

For the NVMe driver, modinfo returns:

$ modinfo nvme
filename:       /lib/modules/3.10.0-514.el7.x86_64/kernel/drivers/nvme/host/nvme.ko
version:        1.0
license:        GPL
author:         Matthew Wilcox <willy@linux.intel.com>
rhelversion:    7.3
srcversion:     5B1DDA93CDE30B05D22BA64
alias:          pci:v*d*sv*sd*bc01sc08i02*
alias:          pci:v00001C58d00000003sv*sd*bc*sc*i*
alias:          pci:v00008086d00005845sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A54sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A53sv*sd*bc*sc*i*
alias:          pci:v00008086d00000953sv*sd*bc*sc*i*
depends:        
intree:         Y
vermagic:       3.10.0-514.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        D4:88:63:A7:C1:6F:CC:27:41:23:E6:29:8F:74:F0:57:AF:19:FC:54
sig_hashalgo:   sha256
parm:           admin_timeout:timeout in seconds for admin commands (byte)
parm:           io_timeout:timeout in seconds for I/O (byte)
parm:           shutdown_timeout:timeout in seconds for controller shutdown (byte)
parm:           use_threaded_interrupts:int
parm:           use_cmb_sqes:use controller's memory buffer for I/O SQes (bool)
parm:           nvme_major:int
parm:           nvme_char_major:int

I will try the partitioning idea, although at this point the results are crazy fast anyway :)


#7

125x still doesn’t work:

Aerospike act version 4.0 - device IO test
Copyright 2011 by Aerospike. All rights reserved.

ACT CONFIGURATION
device-names: /dev/nvme0n1p1 /dev/nvme0n1p2 /dev/nvme0n1p3 /dev/nvme0n1p4
num-devices: 4
num-queues: 8
threads-per-queue: 8
test-duration-sec: 86400
report-interval-sec: 1
microsecond-histograms: no
read-reqs-per-sec: 1000000
write-reqs-per-sec: 500000
record-bytes: 1536
record-bytes-range-max: 0
large-block-op-kbytes: 128
replication-factor: 1
update-pct: 0
defrag-lwm-pct: 50
commit-to-device: no
commit-min-bytes: 0
tomb-raider: no
tomb-raider-sleep-usec: 0
scheduler-mode: noop

internal read requests per sec: 1000000
internal write requests per sec: 0
bytes per stored record: 1536 ... 1536
large block reads per sec: 11764.71
large block writes per sec: 11764.71

ERROR: couldn't open /sys/block/nvme0n1p1/queue/scheduler errno 2 'No such file or directory'
ERROR: couldn't open /sys/block/nvme0n1p2/queue/scheduler errno 2 'No such file or directory'
ERROR: couldn't open /sys/block/nvme0n1p3/queue/scheduler errno 2 'No such file or directory'
ERROR: couldn't open /sys/block/nvme0n1p4/queue/scheduler errno 2 'No such file or directory'
Using xorshift+ random generator.

/dev/nvme0n1p1 size = 93770901504 bytes, 715415 large blocks, minimum IO size = 512 bytes
/dev/nvme0n1p2 size = 93770901504 bytes, 715415 large blocks, minimum IO size = 512 bytes
/dev/nvme0n1p3 size = 93770901504 bytes, 715415 large blocks, minimum IO size = 512 bytes
/dev/nvme0n1p4 size = 93769539584 bytes, 715404 large blocks, minimum IO size = 512 bytes

ERROR: too many requests queued
drive(s) can't keep up - test stopped
After 1 sec:
requests queued: 99937
LARGE BLOCK READS  (782 total)
 (00: 0000000713) (01: 0000000069)
LARGE BLOCK WRITES (806 total)
 (00: 0000000806)
RAW READS          (10238 total)
 (00: 0000008183) (01: 0000001933) (02: 0000000122)
/dev/nvme0n1p1     (2511 total)
 (00: 0000002014) (01: 0000000462) (02: 0000000035)
/dev/nvme0n1p2     (2552 total)
 (00: 0000002041) (01: 0000000479) (02: 0000000032)
/dev/nvme0n1p3     (2576 total)
 (00: 0000002028) (01: 0000000518) (02: 0000000030)
/dev/nvme0n1p4     (2599 total)
 (00: 0000002100) (01: 0000000474) (02: 0000000025)
READS              (10238 total)
 (00: 0000000136) (01: 0000000125) (02: 0000000225) (03: 0000000404)
 (04: 0000000734) (05: 0000001744) (06: 0000003013) (07: 0000003857)

I’m not too worried, though; I’m satisfied with ‘100x’ being completely awesome.


#8

Here are results from recent tests in the labs here.

750G Optane at 144x, running ACT 4.0 with 4 partitions.

data is act version 4.0
        trans                                              device
        %>(ms)                                             %>(ms)
slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
    1     0.02   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    2     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    3     0.10   0.07   0.05   0.03   0.02   0.01   0.00     0.05   0.02   0.01   0.00   0.00   0.00   0.00
    4     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    5     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    6     0.15   0.12   0.10   0.07   0.06   0.04   0.02     0.06   0.03   0.01   0.00   0.00   0.00   0.00
    7     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    8     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    9     0.14   0.11   0.08   0.06   0.04   0.02   0.01     0.06   0.03   0.01   0.00   0.00   0.00   0.00
   10     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   11     0.02   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   12     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   13     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   14     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   15     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   16     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   17     0.09   0.06   0.04   0.03   0.02   0.01   0.00     0.05   0.02   0.01   0.00   0.00   0.00   0.00
   18     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   19     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   20     0.32   0.28   0.25   0.22   0.19   0.16   0.12     0.10   0.05   0.03   0.00   0.00   0.00   0.00
   21     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   22     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
   23     0.38   0.33   0.30   0.27   0.23   0.18   0.13     0.11   0.05   0.03   0.01   0.00   0.00   0.00
   24     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
  avg     0.06   0.04   0.03   0.03   0.02   0.02   0.01     0.03   0.01   0.00   0.00   0.00   0.00   0.00
  max     0.38   0.33   0.30   0.27   0.23   0.18   0.13     0.11   0.05   0.03   0.01   0.00   0.00   0.00

375G Optane at 140x, running ACT 4.0 with no partitions.

data is act version 4.0
        trans                                              device
        %>(ms)                                             %>(ms)
slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
    1     0.03   0.03   0.03   0.02   0.02   0.01   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00
    2     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    3     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    4     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    5     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    6     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    7     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    8     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    9     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   10     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   11     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   12     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   13     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   14     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   15     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   16     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   17     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   18     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
   19     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
  avg     0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
  max     0.03   0.03   0.03   0.02   0.02   0.01   0.00     0.01   0.00   0.00   0.00   0.00   0.00   0.00

#9

From the test configuration listed for the partitioned test, I see it is doing 1M reads/sec and 500K writes/sec. That is a 500x test. To run a 125x test across the 4 partitions, each partition should run at 31x or 32x, which works out to 124x or 128x for the drive.
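The rate math above can be sketched as a quick calculation. Two assumptions on my part: ACT’s usual definition of 1x as 2,000 reads/sec + 1,000 writes/sec, and that the configured `read-reqs-per-sec` is a total spread across all listed devices.

```python
# Sketch of the ACT load arithmetic in this thread.
# Assumptions: 1x = 2,000 reads/sec (+ 1,000 writes/sec), and configured
# rates are totals spread across all devices listed in the config.

READS_PER_X = 2000  # reads/sec constituting a 1x load (assumed)

def act_load(read_reqs_per_sec: int, num_devices: int) -> tuple[float, float]:
    """Return (x per device, x for one drive split into num_devices partitions)."""
    total_x = read_reqs_per_sec / READS_PER_X
    return total_x / num_devices, total_x

# The config shown earlier: 1,000,000 reads/sec over 4 partitions of one drive.
print(act_load(1_000_000, 4))   # (125.0, 500.0) -> actually a 500x load on the drive

# Target ~125x on the whole drive: ~31x per partition.
print(act_load(4 * 31 * READS_PER_X, 4))  # (31.0, 124.0)
```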


#10

Duh! Thanks tibors.

Looks much better:

data is act version 4.0
        trans                                              device
        %>(ms)                                             %>(ms)
slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
    1     0.35   0.31   0.30   0.28   0.25   0.20   0.06     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    2     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    3     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    4     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
    5     0.01   0.00   0.00   0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00   0.00   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
  avg     0.08   0.06   0.06   0.06   0.05   0.04   0.01     0.00   0.00   0.00   0.00   0.00   0.00   0.00
  max     0.35   0.31   0.30   0.28   0.25   0.20   0.06     0.00   0.00   0.00   0.00   0.00   0.00   0.00

I don’t quite understand why the very first time slice shows much worse latency. If I use finer-grained buckets (say, 10s), the first bucket is really bad. So this is for 32x4 = 128x.

Super impressive.


#11

I’ve seen that too, on pretty much all drives, but I’m not very worried about it. I assume it’s some kind of ramp-up, either on the disk/system side or in ACT itself. I’m excited to see the Optane results though, this is good. Thanks! Does the Optane have options to modify the sector size?


#12

I haven’t looked into it in much depth, which means I have done zero optimisation on them, and there may well be headroom available with some tweaking. At this point the performance is far higher than what we need, but it will make a very useful benchmark against which to test other drives!