IRC log of #cubox of Mon 22 Jul 2013. All times are in CEST

07:29 rabeeh dv_ / MMlosh: CuBox has SATA gen 2 (3Gbps)
07:30 rabeeh typically dd and other applications use the CPU to first copy into the cache buffer and then commit to the drive
07:30 rabeeh if you really want to test the SATA port, use 'sgdd' or similar that reads and writes bypassing the cache buffer
07:31 rabeeh or you can use a non-busybox dd with iflag=bio or oflag=bio (depending on whether you're reading or writing) - read the man pages first
09:57 MMlosh rabeeh, thanks for the interesting info.. is that cubox-specific, or does it work like that on all platforms?
09:58 rabeeh all platforms are the same
09:58 MMlosh my dd doesn't do "bio"
09:58 MMlosh $ dd --version
09:58 MMlosh dd (coreutils) 8.20
09:58 rabeeh for interesting reading, you can search for zero-copy concepts, splice and sendfile, which are important for NAS functions
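
A minimal sketch of the sendfile() zero-copy idea mentioned above; the function and its file/socket handling are illustrative, not taken from any NAS code:

    /* Stream an already-open file to a connected socket without copying the
     * data through a userspace buffer (Linux sendfile(2)). */
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int serve_file(int sock_fd, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        off_t off = 0;
        while (off < st.st_size) {
            ssize_t sent = sendfile(sock_fd, fd, &off, st.st_size - off);
            if (sent <= 0)
                break;  /* error or nothing left to send */
        }

        close(fd);
        return (off == st.st_size) ? 0 : -1;
    }
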
09:59 MMlosh sgdd says it will ignore partitioning while doing SG_IO.. I am usually using dd on partitions :/
09:59 rabeeh MMlosh: my mistake
09:59 rabeeh s/bio/direct
10:00 rabeeh for instance -
10:00 rabeeh dd if=/dev/sda of=/dev/null bs=1M count=1000 iflag=direct
10:00 rabeeh this will transfer 1GB from drive to /dev/null; each transfer command is 1MByte
10:01 MMlosh I don't think so
10:01 MMlosh you've mixed up marketing-GB with a complete-GB
10:01 MMlosh I guess that would copy 1000MiB
10:01 rabeeh :)
10:02 rabeeh 1000MiB
10:02 rabeeh actually it's neither
10:03 MMlosh ? you wrote exactly what I did
10:04 rabeeh with that dd command, we are transferring one thousand times 1MiB
10:04 rabeeh so it's 1024x1024x1000
10:04 rabeeh where 1GB is 1000x1000x1000
10:04 rabeeh 1GiB is 1024x1024x1024
10:04 MMlosh yes, yes, but that doesn't make it _not_ 1000MiB
10:05 rabeeh right
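
A rough C equivalent of that iflag=direct read, for anyone curious what the flag does under the hood; this is only a sketch (Linux-specific, minimal error handling, example device path):

    /* O_DIRECT bypasses the page cache but needs an aligned buffer, hence
     * posix_memalign(); the device path and sizes are only examples. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t chunk = 1 << 20;                 /* 1 MiB per read, like bs=1M */
        int fd = open("/dev/sda", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, chunk) != 0) { close(fd); return 1; }

        size_t total = 0;
        for (int i = 0; i < 1000; i++) {              /* count=1000 */
            ssize_t n = read(fd, buf, chunk);
            if (n <= 0)
                break;
            total += (size_t)n;
        }
        printf("read %zu bytes\n", total);

        free(buf);
        close(fd);
        return 0;
    }
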
10:06 MMlosh btw: how can cubox installer bugs be reported?
10:06 rabeeh do you have SSD you can try it out?
10:06 MMlosh I don't
10:08 MMlosh oh, nevermind with bugs.. they are already reported here: http://solid-run.com/phpbb/viewtopic.php?f=2&t=817&sid=a047b5406781017e9d238765813b3c37&start=50#p7771
10:09 rabeeh https://github.com/rabeeh/cubox-installer/issues
10:09 rabeeh ?
10:10 rabeeh ok
10:10 MMlosh oh.. an issue tracker with no issues looks suspicious and unused
10:10 rabeeh hehe...
10:10 rabeeh patches accepted
10:11 rabeeh or in your case issues are accepted
10:12 rabeeh i have a few SSDs around; will hook one up and see the performance
10:12 rabeeh SATA 1 theoretical is 150MB/sec; in practice it maxes out around 120MB/sec
10:12 rabeeh SATA 2 is 300MB/sec in theory; in practice it is around 230MB/sec
10:13 rabeeh i think CuBox with the direct i/o flag should be able to reach 230MB/sec on large transfers (for instance with 1MB per transfer, each one can be committed as-is to the SATA drive)
10:14 MMlosh that is nice.. won't such speed be attainable only when copying drives, though? (And not really relevant from the user-experience point of view)
10:16 MMlosh btw: mm.. https://github.com/rabeeh/linux has no issue tracker.. I guess I have to ask where the "why is ipv6 missing" issue should go. There are quite a lot ipv6-only services on my network
16:40 RandomPixels hello rabeeh
16:42 TrevorH1 dd results without & then with iflag=direct : 1073741824 bytes (1.1 GB) copied, 10.7827 s, 99.6 MB/s vs 1073741824 bytes (1.1 GB) copied, 7.77442 s, 138 MB/s
16:43 TrevorH1 that's a sata 6g ssd on an esata 3gbps port, reading from /dev/sda and output /dev/null
17:00 dv__ hm
17:00 dv_ try also to test the iops
17:01 dv_ but the transfer rate is oddly low
17:06 TrevorH1 this is not the greatest ssd on the market but it's probably capable of more than that. Not very concerned, that's fast enough for the purposes my cubox serves
17:15 RandomPixels hello guys, i need some help with a driver compile
17:20 RandomPixels back :)
18:44 rabeeh RandomPixels: hi
18:46 rabeeh TrevorH1: can you do it with 'time' before the 'dd'? i wonder what the CPU utilization is in user and kernel space
18:52 TrevorH1 real 0m7.797s user 0m0.010s sys 0m3.980s on the iflag=direct version
19:26 rabeeh so, ~50% goes to system time (sys 3.98s out of 7.80s real), which is probably the buffer allocations and i/o handling
19:26 rabeeh i think the cubox can do more than that, so it's probably the ssd that's limiting
22:24 dv_ rabeeh: speaking about buffers: is the cubox's memory bandwidth low?
22:24 dv_ (hi again btw... it's too warm to sleep :/ )
22:25 rabeeh dv_: memory bandwidth can get up to 2GB/sec
22:26 rabeeh that's actual performance
22:26 rabeeh theoretical is 3.2GB/sec
22:26 dv_ rabeeh: in the decoder I was copying the frames from dma buffers to virtual mem blocks, and with 1920x1080 frames, CPU usage was at about 70%
22:26 rabeeh ouch
22:26 rabeeh how are those dma buffers being allocated?
22:27 rabeeh cacheable / non-cacheable / bufferable?
22:27 dv_ I havent seen "bufferable" anywhere
22:27 rabeeh in your case if you are reading from those dma buffers then cacheable / non-cacheable is what you care about
22:27 dv_ but they arent allocated cacheable
22:28 rabeeh bufferable means that those pages are tagged in the page table so that the processor can combine contiguous writes into bigger ones
22:28 dv_ vdec_os_api_dma_alloc(vmeta_dec->dec_info.seq_info.dis_buf_size, VMETA_DIS_BUF_ALIGN, &(picture->nPhyAddr));
22:28 dv_ I use that one
22:28 dv_ ah
22:28 dv_ I guess bufferable is done by using vdec_os_api_dma_alloc_writecombine() then
22:28 rabeeh if they are not allocated cacheable then every read from that buffer goes to the memory and stalls the whole processor pipeline until the read comes back from the memory controller
22:28 dv_ interesting
22:28 rabeeh exactly
22:29 dv_ well, I use _writecombine for input buffers (that is, buffers I write input blocks to)
22:29 rabeeh now with using cacheable memory comes the responsibility to take care of the D-cache so that it reads fresh data
22:30 rabeeh for instance, you can allocate a buffer, invalidate it (take care about 32-byte boundaries so as not to trash surrounding data), then submit that buffer to vmeta
22:30 dv_ I guess this is what vdec_os_api_flush_cache() is about
22:31 rabeeh when reading you can use 'ldr', 'ldrd', 'ldm' or other ARM instructions that read 32-bit, 64-bit or a whole array into registers
22:31 rabeeh the flush_cache() can be used; i recall we had an issue with that on the xbmc port
22:31 rabeeh i can't recall what
22:31 dv_ so, writecombine for the input, cacheable + flush_cache for the output
22:31 dv_ alright
22:32 dv_ anyway, I am only going to allocate the cacheable frames when the decoder requires an output frame
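
A small illustration of the 32-byte boundary caveat above: an invalidate has to be widened out to whole cache lines, so anything else sharing those lines is at risk (the 32-byte line size is taken from the remark above, not verified here):

    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_LINE 32u  /* line size quoted in the discussion above */

    /* Widen [addr, addr+len) outwards to cache-line boundaries, as a cache
     * invalidate would; anything else stored inside the widened lines can
     * get trashed. */
    static void line_align(uintptr_t addr, size_t len,
                           uintptr_t *start, uintptr_t *end)
    {
        *start = addr & ~(uintptr_t)(CACHE_LINE - 1);
        *end   = (addr + len + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    }

    int main(void)
    {
        uintptr_t s, e;
        line_align(0x10000005u, 100, &s, &e);
        printf("invalidate 0x%lx..0x%lx (%lu bytes)\n",
               (unsigned long)s, (unsigned long)e, (unsigned long)(e - s));
        return 0;
    }
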
23:19 dv_ rabeeh: I looked over the patches. I think the issue was that it didn't yield much benefit
23:21 dv_ at least for the output frames there is zero benefit unless you read from them more than once
23:22 dv_ but that's exactly what happens. input buffer goes into vmeta, output buffer comes from vmeta, pixels are memcpy()'d from the output buffer to a malloc'ed buffer. that happens for every decoded frame.
23:24 dv_ of course, perhaps memcpy does not use ldr,ldrd,ldm ...
23:27 _rmk_ dv: it would be ridiculous if memcpy wasn't already optimised
23:27 dv_ true
23:29 dv_ I'll then stick to my original plan for now: keep the dma buffers with the decoded frames, and pass them around to a modified xvimagesink
23:30 dv_ btw, _rmk_, have you noticed something inside the x11 driver that allows the xvideo/xshm stuff to accept DMA buffers for video frame data?
23:30 _rmk_ my cubox is currently offline as I've had to nick my esata box for investigating my PVR's disk over the weekend after a massive screwup
23:31 dv_ marvell made a modified xvimagesink gstreamer element that does not memcpy the frame data if the incoming frame is stored in a dma buffer
23:31 dv_ oh, ouch
23:31 _rmk_ dv: there's only the BMM hack, but... if I can ever get my DRM stuff sorted, and I release the X driver, you'll find that you can pass DRM objects via the Xv thing with my stuff
23:32 dv_ ah yes, initially the xvimagesink used bmm as well
23:32 dv_ the newer version uses libphycontmem to get a physical address for a virtual one
23:32 _rmk_ ah, but with this you construct the DRM object id in xvimagesink and pass that ID over
23:32 _rmk_ the DRM model is you don't deal with physical addresses
23:33 dv_ hmm I am unfamiliar with the DRM model
23:33 dv_ it can accept dma buffers, avoiding memcpy?
23:33 dv_ (I have the virtual address of the buffer as well as the physical one)
23:33 _rmk_ (the massive screwup happened on friday - the broadcast guide data contained illegal stuff which crashed the freeview receiver solid, and had everyone with a particular Sony, Pioneer or Vestel PVR thinking that it had died)
23:34 _rmk_ that's the theory but I have serious concerns with dmabuf
23:35 _rmk_ but...
23:35 _rmk_ for the implementation I have, it doesn't involve any calls to dma_map_sg()/dma_unmap_sg() in that path, so no cache flushes and no memcpys
23:35 dv_ nice
23:35 dv_ that would mean zero unnecessary copies of frames from start to finish
23:36 dv_ and 1080p video with very few CPU% :)
23:36 _rmk_ yep, that's the theory, that's what the old bmmxvimagesink used to do
23:36 dv_ the newer vmetaxvsink does too, just via a different library/interface, but it's still a hack
23:36 dv_ that's what worries me - I have no clear way of detecting whether or not the driver supports the hack
23:37 dv_ so, using this sink with another driver .... well, crash&burn comes to mind
23:37 _rmk_ that's why this stuff should've been sorted out yonks ago, so that it was possible to standardize on a driver which did things in ways which people found acceptable
23:38 dv_ agreed ... I have no idea why this became such a mess
23:38 _rmk_ with the way I did my X backend, it uses a separate fourcc for this method, rather than the magic numbers in the shm buffer
23:38 dv_ also this bmm vs. libphycontmem thing - why introduce a new library? whats wrong with bmm?
23:38 dv_ fourcc is fine
23:38 _rmk_ so it is detectable. "XVBO" fourcc :)
23:38 dv_ or UUID if you are paranoid or like microsoft :)
23:39 _rmk_ ewwewweyedee ?
23:39 _rmk_ eww definitely :)
23:39 dv_ okay, they call it GUID...
23:40 dv_ fourccs remind me of the old amiga IFF days though, so they are cooler. :P
23:40 _rmk_ I also made XVBO pass not only the DRM object id but also the fourcc for the type of image it referred to
23:41 dv_ UYVY and friends you mean?
23:41 _rmk_ yes, and so you could even pass an ARGB8888 buffer if you really wanted
23:41 dv_ nice
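
For reference, the usual little-endian packing of a fourcc such as 'XVBO' or 'UYVY' into a 32-bit code; this is generic illustration, not taken from the actual X driver headers:

    #include <stdint.h>
    #include <stdio.h>

    /* Little-endian fourcc packing: the first character ends up in the lowest byte. */
    #define FOURCC(a, b, c, d) \
        ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

    int main(void)
    {
        printf("XVBO = 0x%08x\n", (unsigned)FOURCC('X', 'V', 'B', 'O'));
        printf("UYVY = 0x%08x\n", (unsigned)FOURCC('U', 'Y', 'V', 'Y'));
        return 0;
    }
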
23:42 dv_ hmm just a second, if I understand this correctly, bmm and phycontmem are wrappers around pmem?
23:42 _rmk_ from the sounds of it, they're both about getting hold of a chunk of contiguous memory, and its physical address
23:44 dv_ yeah, looks like it
23:44 dv_ and for getting the physical address associated with a virtual one, and vice versa
23:45 dv_ except that I get that with the vmeta api already. I guess internally it talks to pmem directly for that
23:45 _rmk_ well... really, for output frames, you don't need to know the virtual address, but BMM contains a few flaws...
23:45 _rmk_ (1) you can only free a BMM buffer by _virtual_ address
23:45 _rmk_ eek, rain.
23:46 _rmk_ and (2) it needlessly pads the beginning of the buffer, wasting memory to store data necessary to align it...
23:47 _rmk_ but vmeta only ever wants page-size alignment, and the memory is already page-aligned, so it ends up wasting a page on every single allocation!
23:47 dv_ eek
23:47 dv_ especially with the fixed size vmem limit
23:47 _rmk_ a lot of the bmm crappiness can be sorted by changing the kernel API... such as freeing by physical address rather than virtual.
23:48 dv_ hmm then I guess phycontmem is an attempt to make things better