07:29 | rabeeh | dv_ / MMlosh: CuBox has SATA gen 2 (3Gbps) |
07:30 | rabeeh | typically dd and other applications use the CPU, which first copies into the cache buffer and then commits to the drive |
07:30 | rabeeh | if you really want to test the SATA port, use 'sgdd' or similar tools that read and write bypassing the cache buffer |
07:31 | rabeeh | or you can use a non-busybox dd with iflag=bio or oflag=bio (depending on whether you are reading or writing) - read the man pages first |
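As a rough illustration of what bypassing the cache buffer means at the syscall level, here is a minimal C sketch using O_DIRECT (assumptions: Linux, a device that accepts O_DIRECT, a 4096-byte alignment requirement, and /dev/sda purely as an example path):

    /* Sketch: read a block device with O_DIRECT so the page cache is bypassed.
     * The 4096-byte alignment and /dev/sda are assumptions for illustration;
     * the exact alignment requirement is device/filesystem dependent. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t chunk = 1 << 20;                    /* 1 MiB per read */
        void *buf;

        /* O_DIRECT needs an aligned buffer (and aligned sizes/offsets) */
        if (posix_memalign(&buf, 4096, chunk) != 0)
            return 1;

        int fd = open("/dev/sda", O_RDONLY | O_DIRECT);  /* example device */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        for (int i = 0; i < 1000; i++) {                 /* 1000 x 1 MiB reads */
            ssize_t n = read(fd, buf, chunk);
            if (n <= 0)
                break;
        }

        close(fd);
        free(buf);
        return 0;
    }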
09:57 | MMlosh | rabeeh, thanks for the interesting info.. is that cubox-specific, or does it work like that on all platforms? |
09:58 | rabeeh | all platforms are the same |
09:58 | MMlosh | my dd doesn't do "bio" |
09:58 | MMlosh | $ dd --version |
09:58 | MMlosh | dd (coreutils) 8.20 |
09:58 | rabeeh | for interesting reading, you can search for zero-copy concepts, splice and sendfile, which are important for NAS functions |
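A hedged sketch of the sendfile() idea mentioned above: the kernel moves the bytes from the file descriptor to the socket without a round trip through a userspace buffer (the function name send_whole_file and the already-connected socket are assumptions for illustration):

    /* Sketch: send a whole file over a connected socket with sendfile(),
     * so the data never passes through a userspace buffer (zero copy). */
    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int send_whole_file(int sock, const char *path)   /* sock: connected TCP socket */
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        off_t offset = 0;
        while (offset < st.st_size) {
            ssize_t sent = sendfile(sock, fd, &offset, st.st_size - offset);
            if (sent <= 0)
                break;                                /* real code would inspect errno */
        }

        close(fd);
        return offset == st.st_size ? 0 : -1;
    }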
09:59 | MMlosh | sgdd says it will ignore partitioning while doing SG_IO.. I am usually using dd on partitions :/ |
09:59 | rabeeh | MMlosh: my mistake |
09:59 | rabeeh | s/bio/direct |
10:00 | rabeeh | for instance - |
10:00 | rabeeh | dd if=/dev/sda of=/dev/null bs=1M count=1000 iflag=direct |
10:00 | rabeeh | this will transfer 1GB from drive to /dev/null; each transfer command is 1MByte |
10:01 | MMlosh | I don't think so |
10:01 | MMlosh | you've mixed up marketing-GB with a complete-GB |
10:01 | MMlosh | I guess that would copy 1000MiB |
10:01 | rabeeh | :) |
10:02 | rabeeh | 1000MiB |
10:02 | rabeeh | actually it's neither |
10:03 | MMlosh | ? you wrote exactly what I did |
10:04 | rabeeh | with that dd command, we are transferring one thousand times 1MiB |
10:04 | rabeeh | so it's 1024x1024x1000 |
10:04 | rabeeh | where 1GB is 1000x1000x1000 |
10:04 | rabeeh | 1GiB is 1024x1024x1024 |
10:04 | MMlosh | yes, yes, but that doesn't make it _not_ 1000MiB |
10:05 | rabeeh | right |
10:06 | MMlosh | btw: how can cubox installer bugs be reported? |
10:06 | rabeeh | do you have an SSD you can try it with? |
10:06 | MMlosh | I don't |
10:08 | MMlosh | oh, nevermind with bugs.. they are already reported here: http://solid-run.com/phpbb/viewtopic.php?f=2&t=817&sid=a047b5406781017e9d238765813b3c37&start=50#p7771 |
10:09 | rabeeh | https://github.com/rabeeh/cubox-installer/issues |
10:09 | rabeeh | ? |
10:10 | rabeeh | ok |
10:10 | MMlosh | oh.. an issue tracker with no issues looks suspicious and unused |
10:10 | rabeeh | hehe... |
10:10 | rabeeh | patches accepted |
10:11 | rabeeh | or in your case issues are accepted |
10:12 | rabeeh | i have a few SSDs around; will hook one up and see the performance |
10:12 | rabeeh | SATA 1 theoretical is 150MB/sec; in practice it maxes out around 120MB/sec |
10:12 | rabeeh | SATA 2 is 300MB/sec in theory; in practice it is around 230MB/sec |
10:13 | rabeeh | i think CuBox with the direct i/o flag should be able to reach 230MB/sec on large transfers (for instance, a 1MB transfer can be committed as-is to the SATA drive) |
10:14 | MMlosh | that is nice.. won't such speed be attainable only when copying drives, though? (And not really relevant from the user-experience point of view) |
10:16 | MMlosh | btw: mm.. https://github.com/rabeeh/linux has no issue tracker.. I guess I have to ask where the "why is ipv6 missing" issue should go. There are quite a lot of ipv6-only services on my network |
16:40 | RandomPixels | hello rabeeh |
16:42 | TrevorH1 | dd results without & then with iflag=direct : 1073741824 bytes (1.1 GB) copied, 10.7827 s, 99.6 MB/s vs 1073741824 bytes (1.1 GB) copied, 7.77442 s, 138 MB/s |
16:43 | TrevorH1 | that's a sata 6g ssd on an esata 3gbps port, reading from /dev/sda and output /dev/null |
17:00 | dv__ | hm |
17:00 | dv_ | try also to test the iops |
17:01 | dv_ | but the transfer rate is oddly low |
17:06 | TrevorH1 | this is not the greatest ssd on the market but it's probably capable of more than that. Not very concerned, that's fast enough for the purposes my cubox serves |
17:15 | RandomPixels | hello guys, i need some help with a driver compile |
17:20 | RandomPixels | back :) |
18:44 | rabeeh | RandomPixels: hi |
18:46 | rabeeh | TrevorH1: can you do it with 'time' before the 'dd'? i wonder what the CPU utilization is in user and kernel space |
18:52 | TrevorH1 | real 0m7.797s user 0m0.010s sys 0m3.980s on the iflag=direct version |
19:26 | rabeeh | so, ~50% goes to system which is probably the buffer allocations and i/o handling |
19:26 | rabeeh | i think you can get more than that; so it's the ssd that's limiting |
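For reference, the user/system split that 'time' prints can also be read from inside a program via getrusage(); a minimal sketch (nothing here is specific to dd or the CuBox):

    /* Sketch: print the calling process's user vs. system CPU time,
     * i.e. the same numbers 'time' reports as 'user' and 'sys'. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/time.h>

    static double tv_secs(struct timeval tv)
    {
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    void report_cpu_split(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("user %.3fs  sys %.3fs\n",
                   tv_secs(ru.ru_utime), tv_secs(ru.ru_stime));
    }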
22:24 | dv_ | rabeeh: speaking about buffers: is the cubox's memory bandwidth low? |
22:24 | dv_ | (hi again btw... its too warm to sleep :/ ) |
22:25 | rabeeh | dv_: memory bandwidth can get up to 2GB/sec |
22:26 | rabeeh | that's actual performance |
22:26 | rabeeh | theoretical is 3.2GB/sec |
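A crude way to sanity-check a figure like that is a large timed memcpy; a sketch with arbitrary sizes (note that a copy drives both a read and a write stream, so the bus sees roughly twice the reported rate):

    /* Sketch: rough memory-bandwidth check via repeated large memcpy calls.
     * Buffer size and iteration count are arbitrary; the printed figure
     * counts copied bytes once, while the bus carries read + write traffic. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        const size_t size = 64u << 20;           /* 64 MiB source and destination */
        const int iterations = 16;
        char *src = malloc(size), *dst = malloc(size);
        if (!src || !dst)
            return 1;
        memset(src, 1, size);                    /* touch the pages up front */
        memset(dst, 0, size);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iterations; i++)
            memcpy(dst, src, size);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f MiB/s copied\n", (double)size * iterations / (1 << 20) / secs);

        free(src);
        free(dst);
        return 0;
    }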
22:26 | dv_ | rabeeh: in the decoder I was copying the frames from dma buffers to virtual mem blocks, and with 1920x1080 frames, CPU usage was at about 70% |
22:26 | rabeeh | ouch |
22:26 | rabeeh | how are those dma buffers being allocated? |
22:27 | rabeeh | cacheable / non-cacheable / bufferable? |
22:27 | dv_ | I havent seen "bufferable" anywhere |
22:27 | rabeeh | in your case if you are reading from those dma buffers then cacheable / non-cacheable is what you care about |
22:27 | dv_ | but they arent allocated cacheable |
22:28 | rabeeh | bufferable means that those pages are tagged in the page table so that the processor can combine contiguous writes into bigger ones |
22:28 | dv_ | vdec_os_api_dma_alloc(vmeta_dec->dec_info.seq_info.dis_buf_size, VMETA_DIS_BUF_ALIGN, &(picture->nPhyAddr)); |
22:28 | dv_ | I use that one |
22:28 | dv_ | ah |
22:28 | dv_ | I guess bufferable is done by using vdec_os_api_dma_alloc_writecombine() then |
22:28 | rabeeh | if they are not allocated cacheable, then every read from that buffer goes to the memory and stalls the whole processor pipeline until the read comes back from the memory controller |
22:28 | dv_ | interesting |
22:28 | rabeeh | exactly |
22:29 | dv_ | well, I use _writecombine for input buffers (that is, buffers I write input blocks to) |
22:29 | rabeeh | now with using cacheable memory comes the responsibility of taking care of the D-cache so that it reads fresh data |
22:30 | rabeeh | for instance, you can allocate a buffer, invalidate it (taking care of 32-byte boundaries so as not to trash surrounding data), then submit that buffer to vmeta |
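The 32-byte boundary point, sketched: whatever cache-maintenance call the SDK actually provides (the invalidate_range() stub below is purely hypothetical), the range handed to it has to be rounded out to whole cache lines, and nothing else should live in those edge lines:

    /* Sketch: round a buffer out to whole 32-byte D-cache lines before
     * invalidating it.  invalidate_range() is a hypothetical stand-in for
     * whatever cache-maintenance call the SDK really exposes. */
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 32u                       /* line size mentioned above */

    static void invalidate_range(void *start, size_t len)
    {
        (void)start; (void)len;                  /* stub; replace with the real call */
    }

    static void invalidate_buffer(void *buf, size_t len)
    {
        uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end   = ((uintptr_t)buf + len + CACHE_LINE - 1)
                          & ~(uintptr_t)(CACHE_LINE - 1);

        /* The first and last lines may also cover bytes outside [buf, buf+len);
         * if live data sits there, invalidation throws it away - that is the
         * "trash surrounding data" problem.  Allocating the buffer itself
         * cache-line aligned and padded avoids it. */
        invalidate_range((void *)start, end - start);
    }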
22:30 | dv_ | I guess this is what vdec_os_api_flush_cache() is about |
22:31 | rabeeh | when reading you can use 'ldr', 'ldrd', 'ldm' or other ARM instructions that read 32-bit, 64-bit or an array into registers |
22:31 | rabeeh | the flush_cache() can be used; i recall we had an issue with that on the xbmc port |
22:31 | rabeeh | i can't recall what |
22:31 | dv_ | so, writecombine for the input, cacheable + flush_cache for the output |
22:31 | dv_ | alright |
22:32 | dv_ | anyway, I am only going to allocate the cacheable frames when the decoder requires an output frame |
23:19 | dv_ | rabeeh: I looked over the patches. I think the issue was that it didn't yield much benefit |
23:21 | dv_ | at least for the output frames there is zero benefit unless you read from it more than once |
23:22 | dv_ | but that's exactly what happens. input buffer goes into vmeta, output buffer comes from vmeta, pixels are memcpy()'d from the output buffer to a malloc'ed buffer. that happens for every decoded frame. |
23:24 | dv_ | of course, perhaps memcpy does not use ldr,ldrd,ldm ... |
23:27 | _rmk_ | dv: it would be ridiculous if memcpy wasn't already optimised |
23:27 | dv_ | true |
23:29 | dv_ | I'll then stick to my original plan for now: keep the dma buffers with the decoded frames, and pass them around to a modified xvimagesink |
23:30 | dv_ | btw, _rmk_, have you noticed something inside the x11 driver that allows the xvideo/xshm stuff to accept DMA buffers for video frame data? |
23:30 | _rmk_ | my cubox is currently offline as I've had to nick my esata box for investigating my PVR's disk over the weekend after a massive screwup |
23:31 | dv_ | marvell made a modified xvimagesink gstreamer element that does not memcpy to it if the incoming frame is stored in a dma buffer |
23:31 | dv_ | oh, ouch |
23:31 | _rmk_ | dv: there's only the BMM hack, but... if I can ever get my DRM stuff sorted, and I release the X driver, you'll find that you can pass DRM objects via the Xv thing with my stuff |
23:32 | dv_ | ah yes, initially the xvimagesink used bmm as well |
23:32 | dv_ | the newer version uses libphycontmem to get a physical address for a virtual one |
23:32 | _rmk_ | ah, but with this you construct the DRM object id in xvimagesink and pass that ID over |
23:32 | _rmk_ | the DRM model is you don't deal with physical addresses |
23:33 | dv_ | hmm I am unfamiliar with the DRM model |
23:33 | dv_ | it can accept dma buffers, avoiding memcpy? |
23:33 | dv_ | (I have the virtual address of the buffer as well as the physical one) |
23:33 | _rmk_ | (the massive screwup happened on friday - the broadcast guide data contained illegal stuff which crashed the freeview receiver solid, and had everyone with a particular Sony, Pioneer or Vestel PVR thinking that it had died) |
23:34 | _rmk_ | that's the theory but I have serious concerns with dmabuf |
23:35 | _rmk_ | but... |
23:35 | _rmk_ | for the implementation I have, it doesn't involve any calls to dma_map_sg()/dma_unmap_sg() in that path, so no cache flushes and no memcpys |
23:35 | dv_ | nice |
23:35 | dv_ | that would mean zero unnecessary copies of frames from start to finish |
23:36 | dv_ | and 1080p video with very few CPU% :) |
23:36 | _rmk_ | yep, that's the theory, that's what the old bmmxvimagesink used to do |
23:36 | dv_ | the newer vmetaxvsink too, just a different library/interface, but it's still a hack |
23:36 | dv_ | that's what worries me - I have no clear way of detecting whether or not the driver supports the hack |
23:37 | dv_ | so, using this sink with another driver .... well, crash&burn comes to mind |
23:37 | _rmk_ | that's why this stuff should've been sorted out yonks ago, so that it was possible to standardize on a driver which did things in ways which people found acceptable |
23:38 | dv_ | agreed ... I have no idea why this became such a mess |
23:38 | _rmk_ | with the way I did my X backend, it uses a separate fourcc for this method, rather than the magic numbers in the shm buffer |
23:38 | dv_ | also this bmm vs. libphycontmem thing - why introduce a new library? what's wrong with bmm? |
23:38 | dv_ | fourcc is fine |
23:38 | _rmk_ | so it is detectable. "XVBO" fourcc :) |
23:38 | dv_ | or UUID if you are paranoid or like microsoft :) |
23:39 | _rmk_ | ewwewweyedee ? |
23:39 | _rmk_ | eww definitely :) |
23:39 | dv_ | okay, they call it GUID... |
23:40 | dv_ | fourccs remind me of the old amiga IFF days though, so they are cooler. :P |
23:40 | _rmk_ | I also made XVBO pass not only the DRM object id but also the fourcc for the type of image it referred to |
23:41 | dv_ | UYVY and friends you mean? |
23:41 | _rmk_ | yes, and so you could even pass an ARGB8888 buffer if you really wanted |
23:41 | dv_ | nice |
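For anyone not used to fourccs: they are just four ASCII bytes packed into one 32-bit code, which is why inventing a new one like 'XVBO' is cheap and easy to detect; a small sketch (the little-endian packing order shown is the usual convention and only illustrative):

    /* Sketch: packing a fourcc such as 'XVBO' or 'UYVY' into a 32-bit code. */
    #include <stdint.h>
    #include <stdio.h>

    #define FOURCC(a, b, c, d) \
        ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
         ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

    #define FOURCC_XVBO FOURCC('X', 'V', 'B', 'O')   /* buffer-object method */
    #define FOURCC_UYVY FOURCC('U', 'Y', 'V', 'Y')   /* a pixel format it can describe */

    int main(void)
    {
        printf("XVBO = 0x%08x\n", (unsigned)FOURCC_XVBO);
        printf("UYVY = 0x%08x\n", (unsigned)FOURCC_UYVY);
        return 0;
    }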
23:42 | dv_ | hmm just a second, if I understand this correctly, bmm and phycontmem are wrappers around pmem? |
23:42 | _rmk_ | from the sounds of it, they're both about getting hold of a chunk of contiguous memory, and its physical address |
23:44 | dv_ | yeah, looks like it |
23:44 | dv_ | and for getting the physical address associated with a virtual one, and vice versa |
23:45 | dv_ | except that I get that with the vmeta api already. I guess internally it talks to pmem directly for that |
23:45 | _rmk_ | well... really, for output frames, you don't need to know the virtual address, but BMM contains a few flaws... |
23:45 | _rmk_ | (1) you can only free a BMM buffer by _virtual_ address |
23:45 | _rmk_ | eek, rain. |
23:46 | _rmk_ | and (2) it needlessly pads the beginning of the buffer, wasting memory to store data necessary to align it... |
23:47 | _rmk_ | but vmeta only ever wants page-size alignment, and the memory is already page aligned, so it ends up wasting a page on every single allocation! |
23:47 | dv_ | eek |
23:47 | dv_ | especially with the fixed size vmem limit |
23:47 | _rmk_ | a lot of the bmm crappiness can be sorted by changing the kernel API... such as freeing by physical address rather than virtual. |
23:48 | dv_ | hmm then I guess phycontmem is an attempt to make things better |