#cubox irc log of Tue 30 Jul 2013

IRC log of #cubox of Tue 30 Jul 2013. All times are in CEST.

07:08 jnettlet dv_, One of the GoogleTV boxes runs a Marvell Chip, Vizio made an android tablet that had the same chip as the XO 1.75, Panasonic has a new tablet based on the mmp3

07:12 jnettlet Both the Asus Cube and Vizio Co-Star Google TV boxes use the Armada 1500 that sports a vMeta co-processor

08:54 shesselba jnettlet, dv_: IIRC all 2nd gen googletvs comprise the Armada 1500. But be careful, it's using some "secure" bootloader.

08:55 shesselba I have been playing with sony's nsz-gs7. Except that I have serial output and can enforce spi boot, I haven't got far. Maybe that usb boot hack from chromecast also works

08:57 shesselba look for the bootloader sources gtvhacker mentioned, it gives some impression of the emmc layout. but there is some encryption scheme on most of the data stored

10:41 dv_ ah okay

10:42 dv_ jnettlet, shesselba_away: I ask because I need somebody to test my GStreamer 1.0 vmeta plugin on non-Dove hardware

10:43 dv_ there are some distinctions inside the plugins made by marvell. for example, on non-Dove hardware, vmeta is suspended when paused, and resumed when playback continues

10:44 dv_ and there is no explanation why this isnt done on Dove. also, on Dove, after each completed (= decoded) frame, Dove is suspended, vdec_os_api_suspend_ready() is called, and then vmeta is immediately resumed. again, no explanation why, and why only on Dove.

10:45 _rmk_ maybe they measured the power usage and found one scheme better for one SoC than the other

10:52 dv_ some comments would have been helpful in the code..

10:52 dv_ but I wonder how power usage could be improved by suspending and immediately resuming after frame completion

10:54 _rmk_ depends how much state has to be reloaded, and what the effect of suspending it is (iow, is it just stopping clocks or removing power too)

11:01 rabeeh the state reload should be immediate; should be a single pointer to some sort of firmware

11:02 rabeeh but then; if vmeta had some caches inside it then the pointer is just to get it up and running; but caches needs to be filled in back again

11:03 dv_ hm

11:04 _rmk_ I disagree - looking at how many register writes the closed source libs do via the open source libvmeta

11:04 dv_ the part of marvell's code is here: https://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta/gstvmetadec.c#n2770

11:05 _rmk_ its not just one or two writes that happen to the hardware, there's sometimes hundreds of writes

11:05 dv_ my point is: do I really need to do this suspend-and-immediate-resume for dove?

11:06 dv_ I do not like to make dove/non-dove distinctions in my code, and wish to avoid it if possible

11:06 _rmk_ for a first shot, I'd just make it always do that :)

11:07 rabeeh vmeta has both clock gating and power down features

11:08 rabeeh dv_: do you know if that suspend stops the clocks? or powers down the unit?

11:08 dv_ nope

11:08 dv_ well okay. I will make this weird suspend-resume stuff and the *actual* suspend-resume (which happens when changing from playing to paused state and vice versa) configurable by properties

11:09 dv_ rabeeh: good point though. will check
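dv_'s plan is to expose both behaviours as properties: the Dove-only "suspend and immediately resume after each decoded frame", and the non-Dove "suspend on pause, resume on play". A minimal sketch of that policy follows. The property names (suspend_after_frame, suspend_on_pause) and the stub functions hw_suspend()/hw_resume() are hypothetical stand-ins; the real vMeta calls and their signatures are not given in this log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware state; hw_suspend()/hw_resume() are stubs, not libvmeta calls. */
typedef struct {
    bool suspended;
    int  suspend_count;
    int  resume_count;
} hw_state;

static void hw_suspend(hw_state *hw) { hw->suspended = true;  hw->suspend_count++; }
static void hw_resume(hw_state *hw)  { hw->suspended = false; hw->resume_count++; }

/* Hypothetical element properties mirroring the two behaviours discussed. */
typedef struct {
    bool suspend_after_frame;  /* Dove-style: suspend + immediate resume per decoded frame */
    bool suspend_on_pause;     /* non-Dove-style: suspend on PLAYING -> PAUSED */
} vmeta_policy;

static void on_frame_decoded(const vmeta_policy *p, hw_state *hw)
{
    if (p->suspend_after_frame) {
        hw_suspend(hw);
        /* Marvell's plugin calls vdec_os_api_suspend_ready() at this point. */
        hw_resume(hw);
    }
}

static void on_pause(const vmeta_policy *p, hw_state *hw)
{
    if (p->suspend_on_pause && !hw->suspended)
        hw_suspend(hw);
}

static void on_play(const vmeta_policy *p, hw_state *hw)
{
    if (p->suspend_on_pause && hw->suspended)
        hw_resume(hw);
}
```

Keeping the two switches independent avoids a hard-coded Dove/non-Dove distinction, which is what dv_ wants to avoid.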

11:10 rabeeh dv_: namely; does it call - vdec_os_api_power_on() or vdec_os_api_clock_on()?

11:10 dv_ no

11:10 dv_ neither does marvell's code

11:11 rabeeh oh; i just found a small surprise in my kernel -

11:11 rabeeh drivers/uio/uio_vmeta.c

11:11 rabeeh int vmeta_power_on(struct vmeta_instance *vi)

11:11 rabeeh {

11:11 rabeeh return 0; // Rabeeh - hack

11:11 rabeeh down_read(&vi->sem);

11:11 rabeeh ...

11:11 dv_ haha

11:19 bencoh :?

11:42 _rmk_ not sure why that's a surprise - I asked you about that last year

11:43 _rmk_ dv: which of the closed source libs are you using?

11:43 dv_ miscgen, vmeta, vmetahal, codecvmetadec

11:44 _rmk_ so you're using DecodeFrame_Vmeta() then...

11:44 dv_ yes

11:44 dv_ initially I wanted to use vdec_api.h , but this one does not even have constants for codec types etc.

11:44 _rmk_ that internally issues the vdec_os_api_power_on()/vdec_os_api_clock_on() calls

11:46 _rmk_ you can find that by observing the behaviour & call trace from the open source libvmeta back up

11:46 dv_ hm. true

11:47 dv_ also, trying out suspend/resume when switching between pause and playback , it seems that suspend sometimes freezes the application, sometimes it doesnt

14:01 jnettlet dv_, sorry I missed you been running around today. On our hardware we are stopping the clocks and powering down the hardware on both CLK_OFF and POWER_OFF commands. There are #ifdefs in the uio_driver to have separate control over these.

14:29 dv_ jnettlet: okay

14:30 dv_ jnettlet: so, do you have any idea why suspend/resume is necessary after each completed frame?

14:31 dv_ my guess, after what _rmk_ said, is that DecodeFrame_Vmeta() sometimes turns off power & clk

14:31 dv_ (sometimes after a frame was completed)

14:32 dv_ and therefore this suspend/resume thing makes sure it does not stay that way

14:32 dv_ unfortunately I didnt have the time to trace libvmeta .... will do that when I can

14:33 dv_ oh, and does the armada 510 also support *en*coding? I have seen marvell slides where only "decoding" was listed for the 500 series and en/decoding for 600 and later

14:44 jnettlet dv_, I don't know if it really has to. I was wondering if it was some sort of additional power saving attempt.

14:47 dv_ I'll leave it in for now then

14:48 dv_ do you have access to an OLPC?

14:48 dv_ and why are there multiple gst-plugins-marvell versions for the different OLPC ones? what are the relevant differences?

14:58 _rmk_ dv: it's probably all caused by no one maintaining a central repository for it, so everyone just grabs a copy of one of them, modifies it a bit and there's no focus to recombine them back together

15:00 dv_ yeah, makes sense

15:02 _rmk_ I'm also debating about augmenting the vdec_os API with functions which provide buffers with physical addresses only - this need to have a process virtual address associated with each buffer is craaaaaazy

15:04 _rmk_ that only comes from someone's broken idea in the BMM stuff that on allocation, you're given a physical address by the kernel, but on free you need to give the kernel your process virtual address for that buffer - which of course you successfully mmap'd

15:24 dv_ well, as long as there is a function for getting a virtual address for the physical one

15:24 dv_ otherwise it will be difficult to get to the data

15:24 _rmk_ for things like the reference buffers, it doesn't need to be mapped

15:24 dv_ yeah, but for input, it needs to

15:25 _rmk_ for input and maybe output depending on what you do with the decoded frames

15:25 dv_ yes

15:25 dv_ but I'd be happy with something like vdec_map / vdec_unmap

15:26 dv_ or vdec_get_virt_addr . both are okay
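The BMM quirk _rmk_ describes (allocate by physical address, free by process virtual address) can be papered over in userspace by remembering the mapping made at allocation time. A minimal sketch, assuming a small fixed-size table; mapping_table, mapping_add(), mapping_virt_for_phys() and mapping_remove() are hypothetical helpers, not part of libbmm, libphycontmem or libvmeta. mapping_virt_for_phys() plays the role of the vdec_get_virt_addr dv_ suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 32   /* hypothetical limit */

typedef struct {
    uintptr_t phys;   /* physical address returned by the allocator */
    void     *virt;   /* process virtual address obtained via mmap() */
    size_t    size;
} mapping;

typedef struct {
    mapping entries[MAX_MAPPINGS];
    size_t  count;
} mapping_table;

/* Record an allocation; returns 0 on success, -1 if the table is full. */
static int mapping_add(mapping_table *t, uintptr_t phys, void *virt, size_t size)
{
    if (t->count == MAX_MAPPINGS)
        return -1;
    t->entries[t->count++] = (mapping){ phys, virt, size };
    return 0;
}

/* Translate a physical address (also one inside a buffer) to a virtual one; NULL if unknown. */
static void *mapping_virt_for_phys(const mapping_table *t, uintptr_t phys)
{
    for (size_t i = 0; i < t->count; i++) {
        const mapping *m = &t->entries[i];
        if (phys >= m->phys && phys - m->phys < m->size)
            return (char *)m->virt + (phys - m->phys);
    }
    return NULL;
}

/* Forget a buffer by its physical start address and return its virtual address,
   which a BMM-style free call would then need. Entry order is not preserved. */
static void *mapping_remove(mapping_table *t, uintptr_t phys)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].phys == phys) {
            void *virt = t->entries[i].virt;
            t->entries[i] = t->entries[--t->count];
            return virt;
        }
    }
    return NULL;
}
```

Reference buffers that never need CPU access, as _rmk_ notes, could simply be stored with a NULL virtual address.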

15:31 * _rmk_ needs to reboot the cubox with kernel profiling enabled... but...

15:31 _rmk_ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

15:31 _rmk_ 9734 cubox 20 0 167m 57m 36m S 29.0 8.0 0:58.51 vlc

15:31 _rmk_ Cpu(s): 27.2%us, 9.2%sy, 0.0%ni, 55.4%id, 0.0%wa, 2.4%hi, 5.8%si, 0.0%st
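The top summary above breaks total CPU time into fields that add up to 100%. Treating idle and iowait as "not busy" (one common convention, assumed here), the box is about 44.6% busy while vlc plays video through vMeta with AC3 going out over S/PDIF. A small sketch of that arithmetic; cpu_line and cpu_busy() are illustrative names only.

```c
#include <assert.h>

/* Fields of the top "Cpu(s)" line, in percent. */
typedef struct { double us, sy, ni, id, wa, hi, si, st; } cpu_line;

/* Busy time: everything except idle (id) and iowait (wa). */
static double cpu_busy(const cpu_line *c)
{
    return c->us + c->sy + c->ni + c->hi + c->si + c->st;
}
```

With the quoted values, 27.2 + 9.2 + 0.0 + 2.4 + 5.8 + 0.0 = 44.6, and adding the 55.4% idle gives 100.0.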

15:32 _rmk_ that's with AC3 output via spdif

15:32 _rmk_ 448kb/s A52 to be precise

15:33 dv_ nice. is this 5.1 ?

15:33 dv_ (I am not familiar with AC3)

15:33 _rmk_ yep

15:33 dv_ and, vlc built with iwmmxt enabled in libavcodec?

15:34 dv_ I was thinking about how ac3 is the one case where hw acceleration might pay off on cortex A8 class machines

15:34 dv_ *hw acceleration for audio decoding

15:34 _rmk_ it's my own vlc and libavcodec, libva and vmeta

15:35 _rmk_ I have a "HD audio rush" box connected to the spdif which can do non-audio decode

15:37 _rmk_ my el-cheapo tv can't cope with the AC3 stream, but then its got soo crappy speakers on it anyway that the volume is permanently set to -∞dB

15:41 dv_ I found it funny the other day that video was using less CPU% than audio (thanks to vmeta and AC3)

15:43 jnettlet dv_, sorry was out walking the dogs while weather permitted.

15:43 jnettlet I do have XO-1.75's and XO-4's for testing

15:44 jnettlet The different versions are due to linking to different binaries, and also kernel interface versions. the XO-1.75 is using iWMMXt accelerated binaries and the XO-4 is using NEON

15:45 jnettlet although the XO-4 can also use the iWMMXt binaries. I never got a chance to profile the two instruction sets against one another. We figured NEON is better supported so we will just go with that.

15:46 jnettlet dv_, we have found audio using more cpu than video in a lot of our testing.

15:48 dv_ uh what?

15:48 dv_ I thought they all use the armada chips

15:48 dv_ so some armada chips have NEON?

16:00 jnettlet yep the new mmp3/pxa2128 or whatever you want to call it does. Supports both iWMMXt and NEON

16:00 dv_ whew. nice.

16:00 dv_ also, about the encoding I mentioned earlier. any idea

16:00 dv_ ?

16:01 dv_ it would otherwise be pointless for me to try to write an encoder as well, if the cubox cant encode

16:02 jnettlet yes the vMeta chip does support encoding. The newest marvell gstreamer plugins have a semi-working version of it implemented.

16:03 dv_ and the newest plugins are in dev.laptop.org/dsd/

16:03 dv_ uh no, wrong urk

16:03 dv_ *url

16:03 dv_ this http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/

16:04 jnettlet looks like he has stripped out the encoder plugin.

16:04 jnettlet 1 sec

16:05 dv_ no wait

16:05 dv_ switch the branch to mmp3

16:05 dv_ http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta?h=mmp3

16:07 jnettlet yep they are under mmp3

16:07 dv_ hmm strange how there are different elements for each bitstream

16:07 jnettlet the source in that branch is not mmp3 specific just the binary code

16:08 jnettlet well you need to tell the encoder what to output.

16:08 jnettlet and each format takes different parameters

16:08 dv_ hmm actually thats a good point :)

16:08 dv_ you could probably also do it using explicit caps

16:08 dv_ but thats rather awkward to do

16:09 dv_ alright, and the last bit: xv output. I wonder, why does vmetaxv - the bmmxvimagesink successor - now use libphycontmem instead of libbmm?

16:09 dv_ is phycontmem an attempt to fix libbmm design flaws?

16:11 jnettlet libphycontmem is an abstraction that Marvell introduced when switching from bmm to pmem. Which was good because it made it relatively painless to then switch to using ION memory manager.

16:12 jnettlet Really we could add a bmm backend to libphycontmem as well. Danial wrote it so it tries to auto-detect the api supported by the kernel

16:12 dv_ ah

16:12 dv_ and pmem no longer requires the vmem= kernel parameter?

16:13 jnettlet pmem does, but ION doesn't if you have it patched with Linaro's CMA backend.

16:13 dv_ either way, I'll use phycontmem instead of libbmm then.

16:13 dv_ (I also want to add a vmetaxv element for gstreamer 1.0)

16:14 dv_ what I dont like though is that this bmm hack for passing DMA buffers to the output without requiring a memcpy cannot be detected

16:14 dv_ that is, there is no way to tell if the driver supports this hack

16:15 jnettlet that needs to be cleaned up. I thought _rmk_ had written something to fix this.

16:17 jnettlet dv_, which driver? The xorg driver?

16:17 dv_ yes

16:17 jnettlet It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over.

16:18 jnettlet The hackiness is that the "id" needs to match between the gstreamer plugin and the xorg driver

16:18 jnettlet really this is where v4l dmabuf is supposed to kick in.

16:19 dv_ I didnt see any "id" in the modified xvimagesink

16:19 dv_ perhaps I overlooked it

16:19 dv_ which is my point. I am concerned that this would cause segfaults then

16:20 jnettlet not that I have seen. I think all the code just falls back to the old slow memcpy way if it doesn't match the id.

16:21 * jnettlet has to go tend to his bbq. Be back in a half an hour or so

16:50 dv_ that needs to be cleaned up. I thought _rmk_ had written something to fix this. <- do you refer to his dmabuf/DRM patches?

16:51 _rmk_ that and my X server driver too

16:51 _rmk_ and converted bmmxvimagesink stuff

16:52 dv_ since i will port these changes to 1.0 , I should eventually sync up with you about these patches you've made

16:53 dv_ currently trying to understand the vmetaxv changes. this system with two magic IDs is ... weird.

16:54 rabeeh dv_: the first magic word is for queuing; the second one is for retrieving presented frames.

16:54 rabeeh dv_: crappy mechanism.

16:54 dv_ yeah, this I found out

16:54 dv_ i also found the note in the code about how broken this is :)

17:10 _rmk_ rabeeh isn't quite correct there.

17:11 _rmk_ first word is a magic word which indicates that it's the BMM buffer passing mechanism

17:12 _rmk_ next word is the count of buffers

17:12 _rmk_ and _then_ is a list of the physical addresses of the buffers

17:12 _rmk_ final word is a checksum on the entire thing

17:12 dv_ well, you could call this queuing I guess

17:14 dv_ but yes, what you describe is what I found in the code. one physical address is put in the list

17:14 _rmk_ yes, because how Marvell had this all working before is you could throw a load of buffers into the video driver and it switches between each one on each vsync

17:14 dv_ and the magic ID there is 0x13572468

17:15 _rmk_ and the video driver returns a list of physical addresses, which the X server then writes back into the SHM region in the same format, but a different magic ID to pass them back to the application as "completed"

17:15 _rmk_ again, checksummed
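The layout _rmk_ describes is { magic, count, phys[0..count-1], checksum }, with 0x13572468 as the magic for handing buffers to the X server. A minimal encode/validate sketch, assuming 32-bit words. The checksum algorithm is not given in the log, so a plain 32-bit sum of the preceding words is used purely as a placeholder, and the magic for the "completed" return path is left to the caller because its value is not quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BMM_MAGIC_QUEUE 0x13572468u   /* magic ID quoted in the log */

/* ASSUMPTION: placeholder checksum (sum of words); the real algorithm may differ. */
static uint32_t bmm_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return sum;
}

/* Write { magic, count, phys..., checksum } into shm.
   Returns the number of words written, or 0 if shm is too small. */
static size_t bmm_encode(uint32_t *shm, size_t shm_words, uint32_t magic,
                         const uint32_t *phys, uint32_t count)
{
    size_t total = (size_t)count + 3;
    if (shm_words < total)
        return 0;
    shm[0] = magic;
    shm[1] = count;
    for (uint32_t i = 0; i < count; i++)
        shm[2 + i] = phys[i];
    shm[total - 1] = bmm_checksum(shm, total - 1);
    return total;
}

/* Validate a header and copy out the physical addresses.
   Returns the buffer count, or -1 on a bad magic, length or checksum. */
static long bmm_decode(const uint32_t *shm, size_t shm_words, uint32_t magic,
                       uint32_t *phys_out, size_t max_out)
{
    if (shm_words < 3 || shm[0] != magic)
        return -1;
    uint32_t count = shm[1];
    if (count > max_out || (size_t)count + 3 > shm_words)
        return -1;
    if (shm[count + 2] != bmm_checksum(shm, (size_t)count + 2))
        return -1;
    for (uint32_t i = 0; i < count; i++)
        phys_out[i] = shm[2 + i];
    return (long)count;
}
```

The same two functions would serve both directions; only the magic value differs between "queue" and "completed".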

17:16 dv_ yes. and the reason why it is marked as "broken" in the code is that there is no way to be 100% sure that the frames were fully consumed by that point

17:18 _rmk_ actually, using that BMM passing method there is. you pass the physical address via XShmPutImage - at that point, the X server owns the buffer until it returns its physical address to you. Once it has, you own the buffer again.

17:20 dv_ according to the code, not in all cases. see http://dev.laptop.org/git/users/dsd/gst-plugins-vmetaxv/tree/vmetaxvimagesink.c#n1040

17:21 _rmk_ frankly, that comment's crap - look at it, they've commented out the code which receives the buffers back and _then_ complain that they don't know when the X server/driver has done with it.

17:22 _rmk_ and those changes break overlay display.

17:23 dv_ yeah, they just delete the buffer immediately now

17:23 _rmk_ with overlay, the frame is displayed until you replace it with a different frame, and if you're doing "zero copy" you can't free the buffer until you've replaced it and _know_ that it's been replaced.

17:23 _rmk_ yes, which is total trash

17:24 _rmk_ sure, if you use the GPU to copy it in the X server driver, then the buffer is done with once XvShmPutImage returns

17:24 _rmk_ but with overlay that's definitely not the case

17:24 dv_ agreed

17:24 dv_ also, I still do not see how you can recognize if the driver supports this hack. the IDs are only used for passing buffers . I am unfamiliar with xv/xshm - does the API complain that it does not recognize the ID at some point?

17:24 _rmk_ you can't

17:25 dv_ It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over. <- then I misunderstood this

17:25 _rmk_ if it doesn't and you try using this method, you'll get the contents of the entire SHM buffer displayed

17:25 dv_ oh

17:25 dv_ which is just a bunch of bytes

17:26 dv_ in other words, the magic ID + the physical addresses + the checksum will be interpreted as pixel data, right?

17:26 _rmk_ jnettlet is wrong there. If you pass the X server this magic SHM buffer with the magic word and physical addresses, and the X server doesn't support it, it will dutifully send the entire contents of the SHM buffer to the display

17:26 _rmk_ yes

17:26 dv_ oh. ugly, but at least it doesnt crash

17:27 _rmk_ note that even using the BMM method, the SHM buffer must be big enough to store the full sized image

17:27 _rmk_ even though it only passes a few (maybe 4) words

17:27 dv_ as a fallback? or does the driver do internal copies to it?

17:27 _rmk_ { magic, 1, phys, checksum }

17:28 _rmk_ the X server checks the size of the SHM buffer to validate the XvShmPutImage call way before the DDX driver gets a look-in
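Because the X server validates the SHM segment size against the image dimensions before the DDX driver sees the request, the segment has to be sized for a full frame even though only a handful of words are used. A sketch of that sizing rule, assuming a planar 4:2:0 format (I420/YV12) with even dimensions and no row padding; real code should use the image size the server reports, which may include padding. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: planar 4:2:0, even width/height, no row padding. */
static size_t i420_frame_bytes(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + 2 * (width / 2) * (height / 2);
}

/* Size the SHM segment for the larger of a full frame and the
   { magic, count, phys..., checksum } header (count + 3 32-bit words). */
static size_t bmm_shm_bytes(size_t width, size_t height, uint32_t count)
{
    size_t frame  = i420_frame_bytes(width, height);
    size_t header = ((size_t)count + 3) * sizeof(uint32_t);
    return frame > header ? frame : header;
}
```

For 1920x1080 this is 2073600 + 2 x 960 x 540 = 3110400 bytes, all of it reserved to pass four words.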

17:28 dv_ oh, I see

17:29 dv_ and what do you think, would v4l be ultimately better here?

17:30 _rmk_ no, just a saner API - but... there's still this big question over "when has the display done with my buffer"

17:31 _rmk_ if you have sane and fast buffer allocation, then what you can do is forget about having the app free the buffer...

17:31 _rmk_ or you do and you have refcounting on the buffer

17:32 _rmk_ basically, with dmabuf/drm...

17:32 dv_ the refcounting is done by gstreamer already

17:32 _rmk_ if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system

17:32 dv_ this is in fact how I do things currently

17:32 _rmk_ gstreamer refcounting is useless here

17:32 dv_ oh you mean x11-side refcounts

17:32 dv_ x11/v4l/whateverapi side

17:33 _rmk_ no, I'm talking kernel here... dmabuf is all kernel

17:33 _rmk_ so...

17:33 _rmk_ if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system. once its imported into DRM, you can drop your reference to it in the exporting subsystem.

17:33 dv_ yes, the application can consider it "done"

17:34 dv_ hmm

17:34 dv_ that would be nice indeed

17:34 _rmk_ when DRM has finished with it, DRM is done with its object, and frees its own object - that causes the import to be dropped, which drops the refcount on the exporting subsystem, and finally frees the buffer
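The dmabuf/DRM sequence _rmk_ walks through is a reference-counting chain: the import takes a reference on the exporter's buffer, the application may drop its own reference immediately, and the memory is released only when DRM drops the import. A toy userspace model of that ordering follows; it is not the kernel's dma-buf code, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a buffer owned by the exporting subsystem. */
typedef struct {
    int  refcount;
    bool freed;
} exported_buffer;

static void buf_get(exported_buffer *b) { b->refcount++; }

static void buf_put(exported_buffer *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        b->freed = true;   /* exporter releases the memory on the last reference */
}

/* Toy DRM object holding an imported reference. */
typedef struct { exported_buffer *import; } drm_object;

static void drm_import(drm_object *obj, exported_buffer *b)
{
    buf_get(b);            /* import pins the exporter's buffer */
    obj->import = b;
}

static void drm_release(drm_object *obj)
{
    buf_put(obj->import);  /* dropping the import drops the exporter's refcount */
    obj->import = NULL;
}
```

The point of the model is that the application never has to know when scanout has finished; the last reference decides.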

17:36 _rmk_ that's what's possible to do... personally though, in my vlc/libva driver code, I keep 16 picture buffers in a fifo structure, and cycle them round without freeing - that's more than enough to avoid any problems

17:37 _rmk_ but then it doesn't matter in vlc because vlc itself copies the data out and doesn't do the "zero copy" thing with X

17:37 dv_ yes, I do something similar with a custom gstbufferpool
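_rmk_'s vlc/libva approach keeps 16 picture buffers and cycles through them without freeing, relying on the pool being deep enough that a buffer is never reused while still in use. A minimal ring sketch; picture_ring, ring_init() and ring_next() are hypothetical names, and the depth of 16 is the one figure taken from the log. Note this is a heuristic: it assumes, rather than checks, that the display has finished with the oldest picture.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 16   /* depth quoted in the log */

typedef struct {
    void  *pictures[POOL_SIZE];  /* allocated once, never freed while decoding */
    size_t next;                 /* index of the oldest picture */
} picture_ring;

static void ring_init(picture_ring *r, void *storage[POOL_SIZE])
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        r->pictures[i] = storage[i];
    r->next = 0;
}

/* Hand out the oldest picture; each one comes back only after the other 15. */
static void *ring_next(picture_ring *r)
{
    void *p = r->pictures[r->next];
    r->next = (r->next + 1) % POOL_SIZE;
    return p;
}
```

A GStreamer buffer pool with a fixed buffer count, as dv_ mentions, gives the same reuse pattern with refcounting layered on top.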

07:08	jnettlet	dv_, One of the GoogleTV boxes runs a Marvell Chip, Vizio made an android tablet that had the same chip as the XO 1.75, Panasonic has a new tablet based on the mmp3
07:12	jnettlet	Both the Asus Cube and Vizio Co-Star Google TV boxes use the Armada 1500 that sports a vMeta co-processor
08:54	shesselba	jnettlet, dv_: IIRC all 2nd gen googletvs comprise the Armada 1500. But be careful, it's using some "secure" bootloader.
08:55	shesselba	I have been playing with sony's nsz-gs7. Except that I have serial output and can enforce spi boot, I haven't got far. Maybe that usb boot hack from chromecast also works
08:57	shesselba	look for the bootloader sources gtvhacker mentioned, it gives some impression of the emmc layout. but there is some encryption scheme on most of the data stored
10:41	dv_	ah okay
10:42	dv_	jnettlet, shesselba_away: I ask because I need somebody to test my GStreamer 1.0 vmeta plugin on non-Dove hardware
10:43	dv_	there are some distinctions inside the plugins made by marvell. for example, on non-Dove hardware, vmeta is suspended when paused, and resumed when playback continues
10:44	dv_	and there is no explanation why this isnt done on Dove. also, on Dove, after each completed (= decoded) frame, Dove is suspended, vdec_os_api_suspend_ready() is called, and then vmeta is immediately resumed. again, no explanation why, and why only on Dove.
10:45	_rmk_	maybe they measured the power usage and found one scheme better for one SoC than the other
10:52	dv_	some comments would have been helpful in the code..
10:52	dv_	but I wonder how power usage could be improved by suspending and immediately resuming after frame completion
10:54	_rmk_	depends how much state has to be reloaded, and what the effect of suspending it is (iow, is it just stopping clocks or removing power too)
11:01	rabeeh	the state reload should be immediate; should be a single pointer to some sort of firmware
11:02	rabeeh	but then; if vmeta had some caches inside it then the pointer is just to get it up and running; but caches needs to be filled in back again
11:03	dv_	hm
11:04	_rmk_	I disagree - looking at how many register writes the closed source libs do via the open source libvmeta
11:04	dv_	the part marvell's code is here: https://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta/gstvmetadec.c#n2770
11:05	_rmk_	its not just one or two writes that happen to the hardware, there's sometimes hundreds of writes
11:05	dv_	my point is: do I really need to do this suspend-and-immediate-resume for dove?
11:06	dv_	I do not like to make dove/non-dove distinctions in my code, and wish to avoid it if possible
11:06	_rmk_	for a first shot, I'd just make it always do that :)
11:07	rabeeh	vmeta has both clocking gating and power down features
11:08	rabeeh	dv_: do you know if that suspend stops the clocks? or powers down the unit?
11:08	dv_	nope
11:08	dv_	well okay. I will make this weird suspend-resume stuff and the actual suspend-resume (which happens when changing vom playing to paused state and vice versa) configurable by properties
11:09	dv_	rabeeh: good point though. will check
11:10	rabeeh	dv_: namely; does it call - vdec_os_api_power_on() or vdec_os_api_clock_on()?
11:10	dv_	no
11:10	dv_	neither does marvell's code
11:11	rabeeh	oh; i just found a small surprize in my kernel -
11:11	rabeeh	drivers/uio/uio_vmeta.c
11:11	rabeeh	int vmeta_power_on(struct vmeta_instance *vi)
11:11	rabeeh	{
11:11	rabeeh	return 0; // Rabeeh - hack
11:11	rabeeh	down_read(&vi->sem);
11:11	rabeeh	...
11:11	dv_	haha
11:19	bencoh	:?
11:42	_rmk_	not sure why that's a surprise - I asked you about that last year
11:43	_rmk_	dv: which of the closed source libs are you using?
11:43	dv_	miscgen, vmeta, vmetahal, codecvmetadec
11:44	_rmk_	so you're using DecodeFrame_Vmeta() then...
11:44	dv_	yes
11:44	dv_	initially I wanted to use vdec_api.h , but this one does not even have constants for codec types etc.
11:44	_rmk_	that internally issues the vdec_os_api_power_on()/vdec_os_api_clock_on() calls
11:46	_rmk_	you can find that by observing the behaviour & call trace from the open source libvmeta back up
11:46	dv_	hm. true
11:47	dv_	also, trying out suspend/resume when switching between pause and playback , it seems that suspend sometimes freezes the application, sometimes it doesnt
14:01	jnettlet	dv_, sorry I missed you been running around today. On our hardware we are stopping the clocks and powering down the hardware on both CLK_OFF and POWER_OFF commands. There are #ifdefs in the uio_driver to have separate control over these.
14:29	dv_	jnettlet: okay
14:30	dv_	jnettlet: so, do you have any idea why suspend/resume is necessary after each completed frame?
14:31	dv_	my guess, after what _rmk_ said, is that DecodeFrame_Vmeta() sometimes turns off power & clk
14:31	dv_	(sometimes after a frame was completed)
14:32	dv_	and therefore this suspend/resume thing makes sure it does not stay that way
14:32	dv_	unfortunately I didnt have the time to trace libvmeta .... will do that when I can
14:33	dv_	oh, and does the armada 510 also support encoding? I have seen marvell slides where only "decoding" was listed for the 500 series and en/decoding for 600 and later
14:44	jnettlet	dv_, I don't know if it really has to. I was wondering if it was some sort of additional power saving attempt.
14:47	dv_	I'll leave it in for now then
14:48	dv_	do you have access to an OLPC?
14:48	dv_	and why are there multiple gst-plugins-marvell versions for the different OLPC ones? what are the relevant differences?
14:58	_rmk_	dv: it's probably all caused by no one maintaining a central repository for it, so everyone just grabs a copy of one of them, modifies it a bit and there's no focus to recombine them back together
15:00	dv_	yeah, makes sense
15:02	_rmk_	I'm also debating about augmenting the vdec_os API with functions which provide buffers with physical addresses only - this need to have a process virtual address associated with each buffer is craaaaaazy
15:04	_rmk_	that only comes from someone's broken idea in the BMM stuff that on allocation, you're given a physical address by the kernel, but on free you need to give the kernel your process virtual address for that buffer - which of course you successfully mmap'd
15:24	dv_	well, as long as there is a function for getting a virtual address for the physical one
15:24	dv_	otherwise it will be difficult to get to the data
15:24	_rmk_	for things like the reference buffers, it doesn't need to be mapped
15:24	dv_	yeah, but for input, it needs to
15:25	_rmk_	for input and maybe output depending on what you do with the decoded frames
15:25	dv_	yes
15:25	dv_	but I'd be happy with something like vdec_map / vdec_unmap
15:26	dv_	or vdec_get_virt_addr . both are okay
15:31	_rmk	15:31 * _rmk_ needs to reboot the cubox with kernel profiling enabled... but...
15:31	_rmk_	PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15:31	_rmk_	9734 cubox 20 0 167m 57m 36m S 29.0 8.0 0:58.51 vlc
15:31	_rmk_	Cpu(s): 27.2%us, 9.2%sy, 0.0%ni, 55.4%id, 0.0%wa, 2.4%hi, 5.8%si, 0.0%st
15:32	_rmk_	that's with AC3 output via spdif
15:32	_rmk_	448kb/s A52 to be precise
15:33	dv_	nice. is this 5.1 ?
15:33	dv_	(I am not familiar with AC3)
15:33	_rmk_	yep
15:33	dv_	and, vlc built with iwmmxt enabled in libavcodec?
15:34	dv_	I was thinking about how ac3 is the one case where hw acceleration might pay off on cortex A8 class machines
15:34	dv_	*hw acceleration for audio decoding
15:34	_rmk_	it's my own vlc and libavcodec, libva and vmeta
15:35	_rmk_	I have a "HD audio rush" box connected to the spdif which can do non-audio decode
15:37	_rmk_	my el-cheapo tv can't cope with the AC3 stream, but then its got soo crappy speakers on it anyway that the volume is permanently set to -?dB
15:41	dv_	I found it funny the other day that video was using less CPU% than audio (thanks to vmeta and AC3)
15:43	jnettlet	dv_, sorry was out walking the dogs while weather permitted.
15:43	jnettlet	I do have XO-1.75's and XO-4's for testing
15:44	jnettlet	The different versions are due to linking to different binaries, and also kernel interface versions. the XO-1.75 is using iWMMXt accelerated binaries and the XO-4 is using NEON
15:45	jnettlet	although the XO-4 can also use the iWMMXt binaries. I never got a chance to profile the two instruction sets against one another. We figured NEON is better supported so we will just go with that.
15:46	jnettlet	dv_, we have found audio using more cpu than video in a lot of our testing.
15:48	dv_	uh what?
15:48	dv_	I thought they all use the armada chips
15:48	dv_	so some armada chips have NEON?
16:00	jnettlet	yep the new mmp3/pxa2128 or whatever you want to call it does. Supports both iWMMXt and NEON
16:00	dv_	whew. nice.
16:00	dv_	also, about the encoding I mentioned earlier. any idea
16:00	dv_	?
16:01	dv_	it would otherwise be pointless for me to try to write an encoder as well, if the cubox cant encode
16:02	jnettlet	yes the vMeta chip does support encoding. The newest marvell gstreamer plugins have a semi-working version of it implemented.
16:03	dv_	and the newest plugins are in dev.laptop.org/dsd/
16:03	dv_	uh no, wrong urk
16:03	dv_	*url
16:03	dv_	this http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/
16:04	jnettlet	looks like he has stripped out the encoder plugin.
16:04	jnettlet	1 sec
16:05	dv_	no wait
16:05	dv_	switch the branch to mmp3
16:05	dv_	http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta?h=mmp3
16:07	jnettlet	yep they are under mmp3
16:07	dv_	hmm strange how there are different elements for each bitstream
16:07	jnettlet	the source in that branch is not mmp3 specific just the binary code
16:08	jnettlet	well you need to tell the encoder what to output.
16:08	jnettlet	and each format takes different parameters
16:08	dv_	hmm actually thats a good point :)
16:08	dv_	you could probably also do it using explicit caps
16:08	dv_	but thats rather awkward to do
16:09	dv_	alright, and the last bit: xv output. I wonder, why does vmetaxv - the bmmxvimagesink successor - now use libphycontmem instead of libbmm?
16:09	dv_	is phycontmem an attempt to fix libbmm design flaws?
16:11	jnettlet	libphycontmem is an abstraction that Marvell introduced when switching from bmm to pmem. Which was good because it made it relatively painless to then switch to using ION memory manager.
16:12	jnettlet	Really we could add a bmm backend to libphycontmem as well. Danial wrote it so it tries to auto-detect the api supported by the kernel
16:12	dv_	ah
16:12	dv_	and pmem no longer requires the vmem= kernel parameter?
16:13	jnettlet	pmem does, but ION doesn't if you have it patched with Linaro's CMA backend.
16:13	dv_	either way, I'll use phycontmem instead of libbmm then.
16:13	dv_	(I also want to add a vmetaxv element for gstreamer 1.0)
16:14	dv_	what I dont like though is that this bmm hack for passing DMA buffers to the output without requiring a memcpy cannot be detected
16:14	dv_	that is, there is no way to tell if the driver supports this hack
16:15	jnettlet	that needs to be cleaned up. I thought _rmk_ had written something to fix this.
16:17	jnettlet	dv_, which driver? The xorg driver?
16:17	dv_	yes
16:17	jnettlet	It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over.
16:18	jnettlet	The hackiness is that the "id" needs to match between the gstreamer plugin and the xorg driver
16:18	jnettlet	really this is where v4l dmabuf is supposed to kick in.
16:19	dv_	I didnt see any "id" in the modified xvimagesink
16:19	dv_	perhaps I overlooked it
16:19	dv_	which is my point. I am concerned that this would cause segfaults then
16:20	jnettlet	not that I have seen. I think all the code just falls back to the old slow memcpy way if it doesn't match the id.
16:21	jnettle	16:21 * jnettlet has to go tend to his bbq. Be back in a half an hour or so
16:50	dv_	that needs to be cleaned up. I thought _rmk_ had written something to fix this. <- do you refer to his dmabuf/DRM patches?
16:51	_rmk_	that and my X server driver too
16:51	_rmk_	and converted bmmxvimagesink stuff
16:52	dv_	since i will port these changes to 1.0 , I should eventually sync up with you about these patches you've made
16:53	dv_	currently trying to understand the vmetaxv changes. this system with two magic IDs is ... weird.
16:54	rabeeh	dv_: the first magic word is for queuing; the second one is for retrieving presented frames.
16:54	rabeeh	dv_: crappy mechanism.
16:54	dv_	yeah, this I found out
16:54	dv_	i also found the note in the code about how broken this is :)
17:10	_rmk_	rabeeh isn't quite correct there.
17:11	_rmk_	first word is a magic word which indicates that it's the BMM buffer passing mechanism
17:12	_rmk_	next word is the count of buffers
17:12	_rmk_	and _then_ is a list of the physical addresses of the buffers
17:12	_rmk_	final word is a checksum on the entire thing
17:12	dv_	well, you could call this queuing I guess
17:14	dv_	but yes, what you describe is what I found in the code. one physical address is put in the list
17:14	_rmk_	yes, because how Marvell had this all working before is you could throw a load of buffers into the video driver and it switches between each one on each vsync
17:14	dv_	and the magic ID there is 0x13572468
17:15	_rmk_	and the video driver returns a list of physical addresses, which the X server then writes back into the SHM region in the same format, but a different magic ID to pass them back to the application as "completed"
17:15	_rmk_	again, checksummed
17:16	dv_	yes. and the reason why it is marked as "broken" in the code is that there is no way to be 100% sure that the frames were fully consumed by that point
17:18	_rmk_	actually, using that BMM passing method there is. you pass the physical address via XShmPutImage - at that point, the X server owns the buffer until it returns its physical address to you. Once it has, you own the buffer again.
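_rmk_'s ownership rule states that a buffer passed via XShmPutImage belongs to the X server until its physical address comes back in the "completed" list. That rule can be modelled as a small tracker. This is a hypothetical sketch: `BufferOwnership` and its methods are illustrative names, and it assumes the returned SHM block has already been parsed into a list of addresses.

```python
class BufferOwnership:
    """Hypothetical tracker for the BMM hand-off rule: a queued buffer belongs to the
    X server until its physical address is returned in the 'completed' list."""

    def __init__(self, phys_addrs):
        self.app_owned = set(phys_addrs)  # buffers the application may write to or free
        self.server_owned = set()         # buffers currently handed to the X server/driver

    def queue(self, phys):
        """Call when passing `phys` to the X server (e.g. in the SHM block)."""
        if phys not in self.app_owned:
            raise ValueError("buffer 0x%x is not owned by the application" % phys)
        self.app_owned.remove(phys)
        self.server_owned.add(phys)

    def completed(self, returned_addrs):
        """Call with the addresses the server wrote back; ownership reverts to the app."""
        for phys in returned_addrs:
            if phys in self.server_owned:
                self.server_owned.remove(phys)
                self.app_owned.add(phys)

    def can_free(self, phys):
        return phys in self.app_owned
```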
17:20	dv_	according to the code, not in all cases. see http://dev.laptop.org/git/users/dsd/gst-plugins-vmetaxv/tree/vmetaxvimagesink.c#n1040
17:21	_rmk_	frankly, that comment's crap - look at it, they've commented out the code which receives the buffers back and _then_ complain that they don't know when the X server/driver has done with it.
17:22	_rmk_	and those changes break overlay display.
17:23	dv_	yeah, they just delete the buffer immediately now
17:23	_rmk_	with overlay, the frame is displayed until you replace it with a different frame, and if you're doing "zero copy" you can't free the buffer until you've replaced it and _know_ that it's been replaced.
17:23	_rmk_	yes, which is total trash
17:24	_rmk_	sure, if you use the GPU to copy it in the X server driver, then the buffer is done with once XvShmPutImage returns
17:24	_rmk_	but with overlay that's definitely not the case
17:24	dv_	agreed
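The overlay constraint _rmk_ describes is that, with zero copy, the displayed frame stays in use until a replacement has been flipped in and the flip is known to have happened. The toy model below illustrates this. `OverlayPlane`, `put` and `flip_done` are hypothetical names, and it assumes the driver can report flip completion. With a GPU copy in the X driver, by contrast, the source buffer would be free as soon as XvShmPutImage returns.

```python
class OverlayPlane:
    """Hypothetical model of zero-copy overlay scanout: the buffer on screen may only
    be released once its replacement has actually been displayed."""

    def __init__(self):
        self.on_screen = None  # buffer currently being scanned out
        self.pending = None    # buffer queued to replace it at the next vsync

    def put(self, buf):
        """Queue `buf`; returns a previously pending buffer that was never shown (or None)."""
        displaced, self.pending = self.pending, buf
        return displaced

    def flip_done(self):
        """Call when the hardware reports the pending buffer is displayed.
        Returns the buffer that is now safe to reuse or free (or None)."""
        released, self.on_screen, self.pending = self.on_screen, self.pending, None
        return released
```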
17:24	dv_	also, I still do not see how you can recognize if the driver supports this hack. the IDs are only used for passing buffers. I am unfamiliar with xv/xshm - does the API complain that it does not recognize the ID at some point?
17:24	_rmk_	you can't
17:25	dv_	It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over. <- then I misunderstood this
17:25	_rmk_	if it doesn't and you try using this method, you'll get the contents of the entire SHM buffer displayed
17:25	dv_	oh
17:25	dv_	which is just a bunch of bytes
17:26	dv_	in other words, the magic ID + the physical addresses + the checksum will be interpreted as pixel data, right?
17:26	_rmk_	jnettlet is wrong there. If you pass the X server this magic SHM buffer with the magic word and physical addresses, and the X server doesn't support it, it will dutifully send the entire contents of the SHM buffer to the display
17:26	_rmk_	yes
17:26	dv_	oh. ugly, but at least it doesn't crash
17:27	_rmk_	note that even using the BMM method, the SHM buffer must be big enough to store the full sized image
17:27	_rmk_	even though it only passes a few (maybe 4) words
17:27	dv_	as a fallback? or does the driver do internal copies to it?
17:27	_rmk_	{ magic, 1, phys, checksum }
17:28	_rmk_	the X server checks the size of the SHM buffer to validate the XvShmPutImage call way before the DDX driver gets a look-in
17:28	dv_	oh, I see
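The X server validates the SHM segment against the full image size before the DDX driver sees it. For that reason the segment must be sized for a whole frame, even though only the few header words are used. The sketch below assumes a planar YUV 4:2:0 (I420-style) image with no row padding. `i420_frame_size` and `make_shm_payload` are illustrative helpers. In real code the size would come from the X server, for example the `data_size` field of the image returned by `XvShmCreateImage`.

```python
def i420_frame_size(width, height):
    """Bytes for a planar 4:2:0 frame: full-size Y plane plus two quarter-size chroma
    planes (ASSUMPTION: no row padding)."""
    cw, ch = (width + 1) // 2, (height + 1) // 2
    return width * height + 2 * cw * ch


def make_shm_payload(width, height, header):
    """Place the BMM header at the start of a buffer sized for the whole image; the rest
    is unused and only exists so the server's size check passes."""
    size = i420_frame_size(width, height)
    if len(header) > size:
        raise ValueError("header larger than image")
    buf = bytearray(size)
    buf[:len(header)] = header
    return buf
```

A 1920x1080 frame needs 3110400 bytes of SHM even though only a 16-byte `{ magic, 1, phys, checksum }` block is meaningful. If the server does not understand the block, those bytes are what it sends to the display.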
17:29	dv_	and what do you think, would v4l be ultimately better here?
17:30	_rmk_	no, just a saner API - but... there's still this big question over "when has the display done with my buffer"
17:31	_rmk_	if you have sane and fast buffer allocation, then what you can do is forget about having the app free the buffer...
17:31	_rmk_	or you do and you have refcounting on the buffer
17:32	_rmk_	basically, with dmabuf/drm...
17:32	dv_	the refcounting is done by gstreamer already
17:32	_rmk_	if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system
17:32	dv_	this is in fact how I do things currently
17:32	_rmk_	gstreamer refcounting is useless here
17:32	dv_	oh you mean x11-side refcounts
17:32	dv_	x11/v4l/whateverapi side
17:33	_rmk_	no, I'm talking kernel here... dmabuf is all kernel
17:33	_rmk_	so...
17:33	_rmk_	if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system. once it's imported into DRM, you can drop your reference to it in the exporting subsystem.
17:33	dv_	yes, the application can consider it "done"
17:34	dv_	hmm
17:34	dv_	that would be nice indeed
17:34	_rmk_	when DRM has finished with it, DRM is done with its object, and frees its own object - that causes the import to be dropped, which drops the refcount on the exporting subsystem, and finally frees the buffer
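The kernel-side lifetime _rmk_ describes can be illustrated with a toy refcount model. This is not kernel code. `Buffer` and `DrmObject` are hypothetical names. In the real stack the steps correspond roughly to exporting a dma-buf, importing it into DRM (for example with libdrm's `drmPrimeFDToHandle`), and DRM releasing its object.

```python
class Buffer:
    """Toy refcounted buffer owned by the exporting subsystem."""

    def __init__(self, name):
        self.name = name
        self.refs = 1       # the creator (application side) holds the first reference
        self.freed = False

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # memory is only released when the last reference goes


class DrmObject:
    """Toy DRM-side object created by importing the dmabuf; it pins the buffer."""

    def __init__(self, buf):
        buf.get()           # importing takes a reference in the exporting subsystem
        self.buf = buf

    def release(self):
        """DRM is done with its object (e.g. the frame has been replaced on screen)."""
        self.buf.put()
        self.buf = None
```

Usage follows the sequence in the log. The application creates the buffer and imports it (`obj = DrmObject(buf)`), then immediately calls `buf.put()` and considers the buffer "done". The memory is only freed when `obj.release()` drops the last reference.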
17:36	_rmk_	that's what's possible to do... personally though, in my vlc/libva driver code, I keep 16 picture buffers in a fifo structure, and cycle them round without freeing - that's more than enough to avoid any problems
17:37	_rmk_	but then it doesn't matter in vlc because vlc itself copies the data out and doesn't do the "zero copy" thing with X
17:37	dv_	yes, I do something similar with a custom gstbufferpool
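The recycling scheme _rmk_ uses in his vlc/libva code allocates a fixed set of picture buffers once and hands them out in FIFO order without freeing them. A minimal sketch follows. The count of 16 comes from the log, while `PicturePool` is a hypothetical name. The scheme assumes the display has finished with a buffer by the time it comes round again, 15 frames later; the code does not enforce this.

```python
from collections import deque


class PicturePool:
    """Fixed set of picture buffers cycled round in FIFO order and never freed."""

    def __init__(self, count=16, size=0):
        self.fifo = deque(bytearray(size) for _ in range(count))

    def next_buffer(self):
        """Take the least recently used buffer and move it to the back of the queue."""
        buf = self.fifo.popleft()
        self.fifo.append(buf)
        return buf
```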