IRC log of #cubox of Mon 22 Jul 2013. All times are in CEST

07:29 rabeeh dv_ / MMlosh: CuBox has SATA gen 2 (3Gbps)
07:30 rabeeh typically dd and other applications use the CPU to first copy into the cache buffer and then commit to the drive
07:30 rabeeh if you really want to test the SATA port, use 'sgdd' or similar that reads and writes bypassing the cache buffer
07:31 rabeeh or you can use a non-busybox dd with iflag=bio or oflag=bio (depending on whether you're reading or writing) - read the man pages first
09:57 MMlosh rabeeh, thanks for the interesting info.. is that cubox-specific, or does it work like that on all platforms?
09:58 rabeeh all platforms are the same
09:58 MMlosh my dd doesn't do "bio"
09:58 MMlosh $ dd --version
09:58 MMlosh dd (coreutils) 8.20
09:58 rabeeh for interesting reading, you can search for zero-copy concepts, splice and sendfile, which are important for NAS functions
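
A minimal sketch of the sendfile() zero-copy idea mentioned above; the function and its file/socket handling are illustrative, not taken from any NAS code:

    /* Stream an already-open file to a connected socket without copying the
     * data through a userspace buffer (Linux sendfile(2)). */
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int serve_file(int sock_fd, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        off_t off = 0;
        while (off < st.st_size) {
            ssize_t sent = sendfile(sock_fd, fd, &off, st.st_size - off);
            if (sent <= 0)
                break;  /* error or nothing left to send */
        }

        close(fd);
        return (off == st.st_size) ? 0 : -1;
    }
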
09:59 MMlosh sgdd says it will ignore partitioning while doing SG_IO.. I am usually using dd on partitions :/
09:59 rabeeh MMlosh: my mistake
09:59 rabeeh s/bio/direct
10:00 rabeeh for instance -
10:00 rabeeh dd if=/dev/sda of=/dev/null bs=1M count=1000 iflag=direct
10:00 rabeeh this will transfer 1GB from drive to /dev/null; each transfer command is 1MByte
10:01 MMlosh I don't think so
10:01 MMlosh you've mixed up marketing-GB with a complete-GB
10:01 MMlosh I guess that would copy 1000MiB
10:01 rabeeh :)
10:02 rabeeh 1000MiB
10:02 rabeeh actually it's neither
10:03 MMlosh ? you wrote exactly what I did
10:04 rabeeh with that dd command, we are transferring one thousand times 1MiB
10:04 rabeeh so it's 1024x1024x1000
10:04 rabeeh where 1GB is 1000x1000x1000
10:04 rabeeh 1GiB is 1024x1024x1024
10:04 MMlosh yes, yes, but that doesn't make it _not_ 1000MiB
10:05 rabeeh right
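
A rough C equivalent of that iflag=direct read, for anyone curious what the flag does under the hood; this is only a sketch (Linux-specific, minimal error handling, example device path):

    /* O_DIRECT bypasses the page cache but needs an aligned buffer, hence
     * posix_memalign(); the device path and sizes are only examples. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t chunk = 1 << 20;                 /* 1 MiB per read, like bs=1M */
        int fd = open("/dev/sda", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, chunk) != 0) { close(fd); return 1; }

        size_t total = 0;
        for (int i = 0; i < 1000; i++) {              /* count=1000 */
            ssize_t n = read(fd, buf, chunk);
            if (n <= 0)
                break;
            total += (size_t)n;
        }
        printf("read %zu bytes\n", total);

        free(buf);
        close(fd);
        return 0;
    }
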
10:06 MMlosh btw: how can cubox installer bugs be reported?
10:06 rabeeh do you have SSD you can try it out?
10:06 MMlosh I don't
10:08 MMlosh oh, nevermind with bugs.. they are already reported here: http://solid-run.com/phpbb/viewtopic.php?f=2&t=817&sid=a047b5406781017e9d238765813b3c37&start=50#p7771
10:09 rabeeh https://github.com/rabeeh/cubox-installer/issues
10:09 rabeeh ?
10:10 rabeeh ok
10:10 MMlosh oh.. an issue tracker with no issues looks suspicious and unused
10:10 rabeeh hehe...
10:10 rabeeh patches accepted
10:11 rabeeh or in your case issues are accepted
10:12 rabeeh i have a few SSDs around; will hook one up and see the performance
10:12 rabeeh SATA 1 theoretical is 150MB/sec; in practice it maxes out around 120MB/sec
10:12 rabeeh SATA 2 is 300MB/sec in theory; in practice it is around 230MB/sec
10:13 rabeeh i think CuBox with the direct i/o flag should be able to reach 230MB/sec on large transfers (for instance with 1MB per transfer, each one can be committed as-is to the SATA drive)
10:14 MMlosh that is nice.. won't such speed be attainable only when copying drives, though? (And not really relevant from the user-experience point of view)
10:16 MMlosh btw: mm.. https://github.com/rabeeh/linux has no issue tracker.. I guess I have to ask where the "why is ipv6 missing" issue should go. There are quite a lot ipv6-only services on my network
16:40 RandomPixels hello rabeeh
16:42 TrevorH1 dd results without & then with iflag=direct : 1073741824 bytes (1.1 GB) copied, 10.7827 s, 99.6 MB/s vs 1073741824 bytes (1.1 GB) copied, 7.77442 s, 138 MB/s
16:43 TrevorH1 that's a sata 6g ssd on an esata 3gbps port, reading from /dev/sda and output /dev/null
17:00 dv__ hm
17:00 dv_ try also to test the iops
17:01 dv_ but the transfer rate is oddly low
17:06 TrevorH1 this is not the greatest ssd on the market but it's probably capable of more than that. Not very concerned, that's fast enough for the purposes my cubox serves
17:15 RandomPixels hello guys, i need some help with a driver compile
17:20 RandomPixels back :)
18:44 rabeeh RandomPixels: hi
18:46 rabeeh TrevorH1: can you do it with 'time' before the 'dd'? i wonder what the CPU utilization is in user and kernel space
18:52 TrevorH1 real 0m7.797s user 0m0.010s sys 0m3.980s on the iflag=direct version
19:26 rabeeh so, ~50% goes to system time (sys 3.98s out of 7.80s real), which is probably the buffer allocations and i/o handling
19:26 rabeeh i think the cubox can do more than that, so it's probably the ssd that's limiting
22:24 dv_ rabeeh: speaking about buffers: is the cubox's memory bandwidth low?
22:24 dv_ (hi again btw... it's too warm to sleep :/ )
22:25 rabeeh dv_: memory bandwidth can get up to 2GB/sec
22:26 rabeeh that's actual performance
22:26 rabeeh theoretical is 3.2GB/sec
22:26 dv_ rabeeh: in the decoder I was copying the frames from dma buffers to virtual mem blocks, and with 1920x1080 frames, CPU usage was at about 70%
22:26 rabeeh ouch
22:26 rabeeh how are those dma buffers being allocated?
22:27 rabeeh cacheable / non-cacheable / bufferable?
22:27 dv_ I havent seen "bufferable" anywhere
22:27 rabeeh in your case if you are reading from those dma buffers then cacheable / non-cacheable is what you care about
22:27 dv_ but they arent allocated cacheable
22:28 rabeeh bufferable means that those pages are tagged in the page table so that the processor can combine contiguous writes into bigger ones
22:28 dv_ vdec_os_api_dma_alloc(vmeta_dec->dec_info.seq_info.dis_buf_size, VMETA_DIS_BUF_ALIGN, &(picture->nPhyAddr));
22:28 dv_ I use that one
22:28 dv_ ah
22:28 dv_ I guess bufferable is done by using vdec_os_api_dma_alloc_writecombine() then
22:28 rabeeh if they are not allocated cacheable then every read from that buffer goes to the memory and stalls the whole processor pipeline until the read comes back from the memory controller
22:28 dv_ interesting
22:28 rabeeh exactly
22:29 dv_ well, I use _writecombine for input buffers (that is, buffers I write input blocks to)
22:29 rabeeh now with using cacheable memory comes the responsibility to take care of the D-cache so that it reads fresh data
22:30 rabeeh for instance, you can allocate a buffer, invalidate it (take care about 32-byte boundaries so as not to trash surrounding data), then submit that buffer to vmeta
22:30 dv_ I guess this is what vdec_os_api_flush_cache() is about
22:31 rabeeh when reading you can use 'ldr', 'ldrd', 'ldm' or other ARM instructions that read 32-bit, 64-bit or a whole array into registers
22:31 rabeeh the flush_cache() can be used; i recall we had an issue with that on the xbmc port
22:31 rabeeh i can't recall what
22:31 dv_ so, writecombine for the input, cacheable + flush_cache for the output
22:31 dv_ alright
22:32 dv_ anyway, I am only going to allocate the cacheable frames when the decoder requires an output frame
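
A small illustration of the 32-byte boundary caveat above: an invalidate has to be widened out to whole cache lines, so anything else sharing those lines is at risk (the 32-byte line size is taken from the remark above, not verified here):

    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_LINE 32u  /* line size quoted in the discussion above */

    /* Widen [addr, addr+len) outwards to cache-line boundaries, as a cache
     * invalidate would; anything else stored inside the widened lines can
     * get trashed. */
    static void line_align(uintptr_t addr, size_t len,
                           uintptr_t *start, uintptr_t *end)
    {
        *start = addr & ~(uintptr_t)(CACHE_LINE - 1);
        *end   = (addr + len + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    }

    int main(void)
    {
        uintptr_t s, e;
        line_align(0x10000005u, 100, &s, &e);
        printf("invalidate 0x%lx..0x%lx (%lu bytes)\n",
               (unsigned long)s, (unsigned long)e, (unsigned long)(e - s));
        return 0;
    }
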
23:19 dv_ rabeeh: I looked over the patches. I think the issue was that it didn't yield much benefit
23:21 dv_ at least for the output frames there is zero benefit unless you read from them more than once
23:22 dv_ but that's exactly what happens. input buffer goes into vmeta, output buffer comes from vmeta, pixels are memcpy()'d from the output buffer to a malloc'ed buffer. that happens for every decoded frame.
23:24 dv_ of course, perhaps memcpy does not use ldr,ldrd,ldm ...
23:27 _rmk_ dv: it would be ridiculous if memcpy wasn't already optimised
23:27 dv_ true
23:29 dv_ I'll then stick to my original plan for now: keep the dma buffers with the decoded frames, and pass them around to a modified xvimagesink
23:30 dv_ btw, _rmk_, have you noticed something inside the x11 driver that allows the xvideo/xshm stuff to accept DMA buffers for video frame data?
23:30 _rmk_ my cubox is currently offline as I've had to nick my esata box for investigating my PVR's disk over the weekend after a massive screwup
23:31 dv_ marvell made a modified xvimagesink gstreamer element that does not memcpy the frame data if the incoming frame is stored in a dma buffer
23:31 dv_ oh, ouch
23:31 _rmk_ dv: there's only the BMM hack, but... if I can ever get my DRM stuff sorted, and I release the X driver, you'll find that you can pass DRM objects via the Xv thing with my stuff
23:32 dv_ ah yes, initially the xvimagesink used bmm as well
23:32 dv_ the newer version uses libphycontmem to get a physical address for a virtual one
23:32 _rmk_ ah, but with this you construct the DRM object id in xvimagesink and pass that ID over
23:32 _rmk_ the DRM model is you don't deal with physical addresses
23:33 dv_ hmm I am unfamiliar with the DRM model
23:33 dv_ it can accept dma buffers, avoiding memcpy?
23:33 dv_ (I have the virtual address of the buffer as well as the physical one)
23:33 _rmk_ (the massive screwup happened on friday - the broadcast guide data contained illegal stuff which crashed the freeview receiver solid, and had everyone with a particular Sony, Pioneer or Vestel PVR thinking that it had died)
23:34 _rmk_ that's the theory but I have serious concerns with dmabuf
23:35 _rmk_ but...
23:35 _rmk_ for the implementation I have, it doesn't involve any calls to dma_map_sg()/dma_unmap_sg() in that path, so no cache flushes and no memcpys
23:35 dv_ nice
23:35 dv_ that would mean zero unnecessary copies of frames from start to finish
23:36 dv_ and 1080p video with very few CPU% :)
23:36 _rmk_ yep, that's the theory, that's what the old bmmxvimagesink used to do
23:36 dv_ the newer vmetaxvsink does too, just via a different library/interface, but it's still a hack
23:36 dv_ that's what worries me - I have no clear way of detecting whether or not the driver supports the hack
23:37 dv_ so, using this sink with another driver .... well, crash&burn comes to mind
23:37 _rmk_ that's why this stuff should've been sorted out yonks ago, so that it was possible to standardize on a driver which did things in ways which people found acceptable
23:38 dv_ agreed ... I have no idea why this became such a mess
23:38 _rmk_ with the way I did my X backend, it uses a separate fourcc for this method, rather than the magic numbers in the shm buffer
23:38 dv_ also this bmm vs. libphycontmem thing - why introduce a new library? whats wrong with bmm?
23:38 dv_ fourcc is fine
23:38 _rmk_ so it is detectable. "XVBO" fourcc :)
23:38 dv_ or UUID if you are paranoid or like microsoft :)
23:39 _rmk_ ewwewweyedee ?
23:39 _rmk_ eww definitely :)
23:39 dv_ okay, they call it GUID...
23:40 dv_ fourccs remind me of the old amiga IFF days though, so they are cooler. :P
23:40 _rmk_ I also made XVBO pass not only the DRM object id but also the fourcc for the type of image it referred to
23:41 dv_ UYVY and friends you mean?
23:41 _rmk_ yes, and so you could even pass an ARGB8888 buffer if you really wanted
23:41 dv_ nice
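
For reference, the usual little-endian packing of a fourcc such as 'XVBO' or 'UYVY' into a 32-bit code; this is generic illustration, not taken from the actual X driver headers:

    #include <stdint.h>
    #include <stdio.h>

    /* Little-endian fourcc packing: the first character ends up in the lowest byte. */
    #define FOURCC(a, b, c, d) \
        ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

    int main(void)
    {
        printf("XVBO = 0x%08x\n", (unsigned)FOURCC('X', 'V', 'B', 'O'));
        printf("UYVY = 0x%08x\n", (unsigned)FOURCC('U', 'Y', 'V', 'Y'));
        return 0;
    }
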
23:42 dv_ hmm just a second, if I understand this correctly, bmm and phycontmem are wrappers around pmem?
23:42 _rmk_ from the sounds of it, they're both about getting hold of a chunk of contiguous memory, and its physical address
23:44 dv_ yeah, looks like it
23:44 dv_ and for getting the physical address associated with a virtual one, and vice versa
23:45 dv_ except that I get that with the vmeta api already. I guess internally it talks to pmem directly for that
23:45 _rmk_ well... really, for output frames, you don't need to know the virtual address, but BMM contains a few flaws...
23:45 _rmk_ (1) you can only free a BMM buffer by _virtual_ address
23:45 _rmk_ eek, rain.
23:46 _rmk_ and (2) it needlessly pads the beginning of the buffer, wasting memory to store data necessary to align it...
23:47 _rmk_ but vmeta only ever wants page-size alignment, and the memory is already page-aligned, so it ends up wasting a page on every single allocation!
23:47 dv_ eek
23:47 dv_ especially with the fixed size vmem limit
23:47 _rmk_ a lot of the bmm crappiness can be sorted by changing the kernel API... such as freeing by physical address rather than virtual.
23:48 dv_ hmm then I guess phycontmem is an attempt to make things better