IRC log of #cubox of Tue 30 Jul 2013. All times are in CEST

07:08 jnettlet dv_, One of the GoogleTV boxes runs a Marvell Chip, Vizio made an android tablet that had the same chip as the XO 1.75, Panasonic has a new tablet based on the mmp3
07:12 jnettlet Both the Asus Cube and Vizio Co-Star Google TV boxes use the Armada 1500 that sports a vMeta co-processor
08:54 shesselba jnettlet, dv_: IIRC all 2nd gen googletvs comprise the Armada 1500. But be careful, it's using some "secure" bootloader.
08:55 shesselba I have been playing with sony's nsz-gs7. Except that I have serial output and can enforce spi boot, I haven't got far. Maybe that usb boot hack from chromecast also works
08:57 shesselba look for the bootloader sources gtvhacker mentioned, it gives some impression of the emmc layout. but there is some encryption scheme on most of the data stored
10:41 dv_ ah okay
10:42 dv_ jnettlet, shesselba_away: I ask because I need somebody to test my GStreamer 1.0 vmeta plugin on non-Dove hardware
10:43 dv_ there are some distinctions inside the plugins made by marvell. for example, on non-Dove hardware, vmeta is suspended when paused, and resumed when playback continues
10:44 dv_ and there is no explanation why this isn't done on Dove. also, on Dove, after each completed (= decoded) frame, vmeta is suspended, vdec_os_api_suspend_ready() is called, and then vmeta is immediately resumed. again, no explanation why, and why only on Dove.
10:45 _rmk_ maybe they measured the power usage and found one scheme better for one SoC than the other
10:52 dv_ some comments would have been helpful in the code..
10:52 dv_ but I wonder how power usage could be improved by suspending and immediately resuming after frame completion
10:54 _rmk_ depends how much state has to be reloaded, and what the effect of suspending it is (iow, is it just stopping clocks or removing power too)
11:01 rabeeh the state reload should be immediate; should be a single pointer to some sort of firmware
11:02 rabeeh but then; if vmeta had some caches inside it then the pointer is just to get it up and running; but caches needs to be filled in back again
11:03 dv_ hm
11:04 _rmk_ I disagree - looking at how many register writes the closed source libs do via the open source libvmeta
11:04 dv_ the relevant part of marvell's code is here: https://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta/gstvmetadec.c#n2770
11:05 _rmk_ its not just one or two writes that happen to the hardware, there's sometimes hundreds of writes
11:05 dv_ my point is: do I really need to do this suspend-and-immediate-resume for dove?
11:06 dv_ I do not like to make dove/non-dove distinctions in my code, and wish to avoid it if possible
11:06 _rmk_ for a first shot, I'd just make it always do that :)
11:07 rabeeh vmeta has both clocking gating and power down features
11:08 rabeeh dv_: do you know if that suspend stops the clocks? or powers down the unit?
11:08 dv_ nope
11:08 dv_ well okay. I will make this weird suspend-resume stuff and the *actual* suspend-resume (which happens when changing from playing to paused state and vice versa) configurable by properties
11:09 dv_ rabeeh: good point though. will check
11:10 rabeeh dv_: namely; does it call - vdec_os_api_power_on() or vdec_os_api_clock_on()?
11:10 dv_ no
11:10 dv_ neither does marvell's code
11:11 rabeeh oh; i just found a small surprize in my kernel -
11:11 rabeeh drivers/uio/uio_vmeta.c
11:11 rabeeh int vmeta_power_on(struct vmeta_instance *vi)
11:11 rabeeh {
11:11 rabeeh return 0; // Rabeeh - hack
11:11 rabeeh down_read(&vi->sem);
11:11 rabeeh ...
11:11 dv_ haha
11:19 bencoh :?
11:42 _rmk_ not sure why that's a surprise - I asked you about that last year
11:43 _rmk_ dv: which of the closed source libs are you using?
11:43 dv_ miscgen, vmeta, vmetahal, codecvmetadec
11:44 _rmk_ so you're using DecodeFrame_Vmeta() then...
11:44 dv_ yes
11:44 dv_ initially I wanted to use vdec_api.h , but this one does not even have constants for codec types etc.
11:44 _rmk_ that internally issues the vdec_os_api_power_on()/vdec_os_api_clock_on() calls
11:46 _rmk_ you can find that by observing the behaviour & call trace from the open source libvmeta back up
11:46 dv_ hm. true
11:47 dv_ also, trying out suspend/resume when switching between pause and playback , it seems that suspend sometimes freezes the application, sometimes it doesnt
14:01 jnettlet dv_, sorry I missed you been running around today. On our hardware we are stopping the clocks and powering down the hardware on both CLK_OFF and POWER_OFF commands. There are #ifdefs in the uio_driver to have separate control over these.
14:29 dv_ jnettlet: okay
14:30 dv_ jnettlet: so, do you have any idea why suspend/resume is necessary after each completed frame?
14:31 dv_ my guess, after what _rmk_ said, is that DecodeFrame_Vmeta() sometimes turns off power & clk
14:31 dv_ (sometimes after a frame was completed)
14:32 dv_ and therefore this suspend/resume thing makes sure it does not stay that way
14:32 dv_ unfortunately I didnt have the time to trace libvmeta .... will do that when I can
14:33 dv_ oh, and does the armada 510 also support *en*coding? I have seen marvell slides where only "decoding" was listed for the 500 series and en/decoding for 600 and later
14:44 jnettlet dv_, I don't know if it really has to. I was wondering if it was some sort of additional power saving attempt.
14:47 dv_ I'll leave it in for now then
14:48 dv_ do you have access to an OLPC?
14:48 dv_ and why are there multiple gst-plugins-marvell versions for the different OLPC ones? what are the relevant differences?
14:58 _rmk_ dv: it's probably all caused by no one maintaining a central repository for it, so everyone just grabs a copy of one of them, modifies it a bit and there's no focus to recombine them back together
15:00 dv_ yeah, makes sense
15:02 _rmk_ I'm also debating about augmenting the vdec_os API with functions which provide buffers with physical addresses only - this need to have a process virtual address associated with each buffer is craaaaaazy
15:04 _rmk_ that only comes from someone's broken idea in the BMM stuff that on allocation, you're given a physical address by the kernel, but on free you need to give the kernel your process virtual address for that buffer - which of course you successfully mmap'd
15:24 dv_ well, as long as there is a function for getting a virtual address for the physical one
15:24 dv_ otherwise it will be difficult to get to the data
15:24 _rmk_ for things like the reference buffers, it doesn't need to be mapped
15:24 dv_ yeah, but for input, it needs to
15:25 _rmk_ for input and maybe output depending on what you do with the decoded frames
15:25 dv_ yes
15:25 dv_ but I'd be happy with something like vdec_map / vdec_unmap
15:26 dv_ or vdec_get_virt_addr . both are okay
15:31 * _rmk_ needs to reboot the cubox with kernel profiling enabled... but...
15:31 _rmk_ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15:31 _rmk_ 9734 cubox 20 0 167m 57m 36m S 29.0 8.0 0:58.51 vlc
15:31 _rmk_ Cpu(s): 27.2%us, 9.2%sy, 0.0%ni, 55.4%id, 0.0%wa, 2.4%hi, 5.8%si, 0.0%st
15:32 _rmk_ that's with AC3 output via spdif
15:32 _rmk_ 448kb/s A52 to be precise
15:33 dv_ nice. is this 5.1 ?
15:33 dv_ (I am not familiar with AC3)
15:33 _rmk_ yep
15:33 dv_ and, vlc built with iwmmxt enabled in libavcodec?
15:34 dv_ I was thinking about how ac3 is the one case where hw acceleration might pay off on cortex A8 class machines
15:34 dv_ *hw acceleration for audio decoding
15:34 _rmk_ it's my own vlc and libavcodec, libva and vmeta
15:35 _rmk_ I have a "HD audio rush" box connected to the spdif which can do non-audio decode
15:37 _rmk_ my el-cheapo tv can't cope with the AC3 stream, but then its got soo crappy speakers on it anyway that the volume is permanently set to -?dB
15:41 dv_ I found it funny the other day that video was using less CPU% than audio (thanks to vmeta and AC3)
15:43 jnettlet dv_, sorry was out walking the dogs while weather permitted.
15:43 jnettlet I do have XO-1.75's and XO-4's for testing
15:44 jnettlet The different versions are due to linking to different binaries, and also kernel interface versions. the XO-1.75 is using iWMMXt accelerated binaries and the XO-4 is using NEON
15:45 jnettlet although the XO-4 can also use the iWMMXt binaries. I never got a chance to profile the two instruction sets against one another. We figured NEON is better supported so we will just go with that.
15:46 jnettlet dv_, we have found audio using more cpu than video in a lot of our testing.
15:48 dv_ uh what?
15:48 dv_ I thought they all use the armada chips
15:48 dv_ so some armada chips have NEON?
16:00 jnettlet yep the new mmp3/pxa2128 or whatever you want to call it does. Supports both iWMMXt and NEON
16:00 dv_ whew. nice.
16:00 dv_ also, about the encoding I mentioned earlier. any idea
16:00 dv_ ?
16:01 dv_ it would otherwise be pointless for me to try to write an encoder as well, if the cubox cant encode
16:02 jnettlet yes the vMeta chip does support encoding. The newest marvell gstreamer plugins have a semi-working version of it implemented.
16:03 dv_ and the newest plugins are in dev.laptop.org/dsd/
16:03 dv_ uh no, wrong urk
16:03 dv_ *url
16:03 dv_ this http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/
16:04 jnettlet looks like he has stripped out the encoder plugin.
16:04 jnettlet 1 sec
16:05 dv_ no wait
16:05 dv_ switch the branch to mmp3
16:05 dv_ http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta?h=mmp3
16:07 jnettlet yep they are under mmp3
16:07 dv_ hmm strange how there are different elements for each bitstream
16:07 jnettlet the source in that branch is not mmp3 specific just the binary code
16:08 jnettlet well you need to tell the encoder what to output.
16:08 jnettlet and each format takes different parameters
16:08 dv_ hmm actually thats a good point :)
16:08 dv_ you could probably also do it using explicit caps
16:08 dv_ but thats rather awkward to do
16:09 dv_ alright, and the last bit: xv output. I wonder, why does vmetaxv - the bmmxvimagesink successor - now use libphycontmem instead of libbmm?
16:09 dv_ is phycontmem an attempt to fix libbmm design flaws?
16:11 jnettlet libphycontmem is an abstraction that Marvell introduced when switching from bmm to pmem. Which was good because it made it relatively painless to then switch to using ION memory manager.
16:12 jnettlet Really we could add a bmm backend to libphycontmem as well. Danial wrote it so it tries to auto-detect the api supported by the kernel
16:12 dv_ ah
16:12 dv_ and pmem no longer requires the vmem= kernel parameter?
16:13 jnettlet pmem does, but ION doesn't if you have it patched with Linaro's CMA backend.
16:13 dv_ either way, I'll use phycontmem instead of libbmm then.
16:13 dv_ (I also want to add a vmetaxv element for gstreamer 1.0)
16:14 dv_ what I dont like though is that this bmm hack for passing DMA buffers to the output without requiring a memcpy cannot be detected
16:14 dv_ that is, there is no way to tell if the driver supports this hack
16:15 jnettlet that needs to be cleaned up. I thought _rmk_ had written something to fix this.
16:17 jnettlet dv_, which driver? The xorg driver?
16:17 dv_ yes
16:17 jnettlet It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over.
16:18 jnettlet The hackiness is that the "id" needs to match between the gstreamer plugin and the xorg driver
16:18 jnettlet really this is where v4l dmabuf is supposed to kick in.
16:19 dv_ I didnt see any "id" in the modified xvimagesink
16:19 dv_ perhaps I overlooked it
16:19 dv_ which is my point. I am concerned that this would cause segfaults then
16:20 jnettlet not that I have seen. I think all the code just falls back to the old slow memcpy way if it doesn't match the id.
16:21 * jnettlet has to go tend to his bbq. Be back in a half an hour or so
16:50 dv_ "that needs to be cleaned up. I thought _rmk_ had written something to fix this." <- do you refer to his dmabuf/DRM patches?
16:51 _rmk_ that and my X server driver too
16:51 _rmk_ and converted bmmxvimagesink stuff
16:52 dv_ since i will port these changes to 1.0 , I should eventually sync up with you about these patches you've made
16:53 dv_ currently trying to understand the vmetaxv changes. this system with two magic IDs is ... weird.
16:54 rabeeh dv_: the first magic word is for queuing; the second one is for retrieving presented frames.
16:54 rabeeh dv_: crappy mechanism.
16:54 dv_ yeah, this I found out
16:54 dv_ i also found the note in the code about how broken this is :)
17:10 _rmk_ rabeeh isn't quite correct there.
17:11 _rmk_ first word is a magic word which indicates that it's the BMM buffer passing mechanism
17:12 _rmk_ next word is the count of buffers
17:12 _rmk_ and _then_ is a list of the physical addresses of the buffers
17:12 _rmk_ final word is a checksum on the entire thing
17:12 dv_ well, you could call this queuing I guess
17:14 dv_ but yes, what you describe is what I found in the code. one physical address is put in the list
17:14 _rmk_ yes, because how Marvell had this all working before is you could throw a load of buffers into the video driver and it switches between each one on each vsync
17:14 dv_ and the magic ID there is 0x13572468
17:15 _rmk_ and the video driver returns a list of physical addresses, which the X server then writes back into the SHM region in the same format, but a different magic ID to pass them back to the application as "completed"
17:15 _rmk_ again, checksummed
17:16 dv_ yes. and the reason why it is marked as "broken" in the code is that there is no way to be 100% sure that the frames were fully consumed by that point
17:18 _rmk_ actually, using that BMM passing method there is. you pass the physical address via XvShmPutImage - at that point, the X server owns the buffer until it returns its physical address to you. Once it has, you own the buffer again.
17:20 dv_ according to the code, not in all cases. see http://dev.laptop.org/git/users/dsd/gst-plugins-vmetaxv/tree/vmetaxvimagesink.c#n1040
17:21 _rmk_ frankly, that comment's crap - look at it, they've commented out the code which receives the buffers back and _then_ complain that they don't know when the X server/driver has done with it.
17:22 _rmk_ and those changes break overlay display.
17:23 dv_ yeah, they just delete the buffer immediately now
17:23 _rmk_ with overlay, the frame is displayed until you replace it with a different frame, and if you're doing "zero copy" you can't free the buffer until you've replaced it and _know_ that it's been replaced.
17:23 _rmk_ yes, which is total trash
17:24 _rmk_ sure, if you use the GPU to copy it in the X server driver, then the buffer is done with once XvShmPutImage returns
17:24 _rmk_ but with overlay that's definitely not the case
17:24 dv_ agreed
17:24 dv_ also, I still do not see how you can recognize if the driver supports this hack. the IDs are only used for passing buffers . I am unfamiliar with xv/xshm - does the API complain that it does not recognize the ID at some point?
17:24 _rmk_ you can't
17:25 dv_ "It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over." <- then I misunderstood this
17:25 _rmk_ if it doesn't and you try using this method, you'll get the contents of the entire SHM buffer displayed
17:25 dv_ oh
17:25 dv_ which is just a bunch of bytes
17:26 dv_ in other words, the magic ID + the physical addresses + the checksum will be interpreted as pixel data, right?
17:26 _rmk_ jnettlet is wrong there. If you pass the X server this magic SHM buffer with the magic word and physical addresses, and the X server doesn't support it, it will dutifully send the entire contents of the SHM buffer to the display
17:26 _rmk_ yes
17:26 dv_ oh. ugly, but at least it doesnt crash
17:27 _rmk_ note that even using the BMM method, the SHM buffer must be big enough to store the full sized image
17:27 _rmk_ even though it only passes a few (maybe 4) words
17:27 dv_ as a fallback? or does the driver do internal copies to it?
17:27 _rmk_ { magic, 1, phys, checksum }
17:28 _rmk_ the X server checks the size of the SHM buffer to validate the XvShmPutImage call way before the DDX driver gets a look-in
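The word layout _rmk_ describes above ({ magic, count, physical addresses, checksum }) could be sketched roughly as follows. This is only an illustration, not the code from vmetaxvimagesink or the X driver: the function names `bmm_pack` and `bmm_is_valid` are hypothetical, and the log does not say how the checksum is computed, so the XOR used here is a placeholder assumption. Only the magic value 0x13572468 comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic word quoted in the log for the application -> X server direction. */
#define BMM_SHM_MAGIC 0x13572468u

/* ASSUMPTION: the real checksum algorithm is not given in the log.
 * XOR over all preceding words is used purely as a stand-in. */
static uint32_t bmm_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum ^= words[i];
    return sum;
}

/* Hypothetical helper: write { magic, count, phys[0..count-1], checksum }
 * at the start of the SHM region. Returns the number of words written,
 * or 0 if the arguments are invalid or the region is too small. */
size_t bmm_pack(uint32_t *shm, size_t shm_words,
                const uint32_t *phys, uint32_t count)
{
    if (shm == NULL || phys == NULL || count == 0 ||
        shm_words < 3 || count > shm_words - 3)
        return 0;

    shm[0] = BMM_SHM_MAGIC;
    shm[1] = count;
    for (uint32_t i = 0; i < count; i++)
        shm[2 + i] = phys[i];
    shm[2 + count] = bmm_checksum(shm, 2 + (size_t)count);
    return (size_t)count + 3;
}

/* Hypothetical receiver-side check: returns 1 if the region starts with
 * a well-formed header under the assumptions above, 0 otherwise. */
int bmm_is_valid(const uint32_t *shm, size_t shm_words)
{
    if (shm == NULL || shm_words < 4 || shm[0] != BMM_SHM_MAGIC)
        return 0;

    uint32_t count = shm[1];
    if (count == 0 || count > shm_words - 3)
        return 0;

    return shm[2 + count] == bmm_checksum(shm, 2 + (size_t)count);
}
```

For the single-buffer case mentioned above this writes four words. As noted in the discussion, the SHM segment itself would still have to be sized for a full image, since the X server validates the size before the driver sees the request.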
17:28 dv_ oh, I see
17:29 dv_ and what do you think, would v4l be ultimately better here?
17:30 _rmk_ no, just a saner API - but... there's still this big question over "when has the display done with my buffer"
17:31 _rmk_ if you have sane and fast buffer allocation, then what you can do is forget about having the app free the buffer...
17:31 _rmk_ or you do and you have refcounting on the buffer
17:32 _rmk_ basically, with dmabuf/drm...
17:32 dv_ the refcounting is done by gstreamer already
17:32 _rmk_ if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system
17:32 dv_ this is in fact how I do things currently
17:32 _rmk_ gstreamer refcounting is useless here
17:32 dv_ oh you mean x11-side refcounts
17:32 dv_ x11/v4l/whateverapi side
17:33 _rmk_ no, I'm talking kernel here... dmabuf is all kernel
17:33 _rmk_ so...
17:33 _rmk_ if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system. once its imported into DRM, you can drop your reference to it in the exporting subsystem.
17:33 dv_ yes, the application can consider it "done"
17:34 dv_ hmm
17:34 dv_ that would be nice indeed
17:34 _rmk_ when DRM has finished with it, DRM is done with its object, and frees its own object - that causes the import to be dropped, which drops the refcount on the exporting subsystem, and finally frees the buffer
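The reference handoff described above can be modelled in plain userspace C. This is a toy simulation of the ownership idea only and does not use the kernel dma-buf or DRM APIs; `struct shared_buf`, `buf_export`, `buf_get` and `buf_put` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a buffer shared between an exporter and an importer. */
struct shared_buf {
    int refs;         /* number of holders still using the buffer */
    int *freed_flag;  /* set to 1 when the buffer is released (for demo) */
    void *mem;        /* the backing memory */
};

/* Exporter allocates the buffer and holds the first reference. */
struct shared_buf *buf_export(size_t size, int *freed_flag)
{
    struct shared_buf *b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->mem = malloc(size);
    if (b->mem == NULL) {
        free(b);
        return NULL;
    }
    b->refs = 1;
    b->freed_flag = freed_flag;
    return b;
}

/* Importer (DRM in the discussion) takes its own reference. */
void buf_get(struct shared_buf *b)
{
    b->refs++;
}

/* Any holder drops its reference; the last drop frees the memory. */
void buf_put(struct shared_buf *b)
{
    if (--b->refs == 0) {
        if (b->freed_flag != NULL)
            *b->freed_flag = 1;
        free(b->mem);
        free(b);
    }
}
```

In this model the exporter can call `buf_put` as soon as the importer has called `buf_get`; the memory is only released when the importer later drops its own reference, which mirrors the sequence described in the log.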
17:36 _rmk_ that's what's possible to do... personally though, in my vlc/libva driver code, I keep 16 picture buffers in a fifo structure, and cycle them round without freeing - that's more than enough to avoid any problems
17:37 _rmk_ but then it doesn't matter in vlc because vlc itself copies the data out and doesn't do the "zero copy" thing with X
17:37 dv_ yes, I do something similar with a custom gstbufferpool
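A minimal sketch of the "cycle 16 picture buffers without freeing" approach mentioned above might look like this. It is not _rmk_'s vlc/libva code nor dv_'s buffer pool: `struct pic_fifo` and its functions are hypothetical, and the buffers are represented only by made-up physical addresses laid out at a fixed stride.

```c
#include <assert.h>
#include <stdint.h>

#define PIC_FIFO_SIZE 16  /* buffer count mentioned in the log */

/* Hypothetical fixed pool of picture buffers, handed out round-robin. */
struct pic_fifo {
    uint32_t phys[PIC_FIFO_SIZE];  /* physical address of each buffer */
    unsigned next;                 /* index of the next buffer to hand out */
};

/* Fill the pool with addresses base, base + stride, base + 2*stride, ...
 * (ASSUMPTION: a contiguous layout, chosen only to keep the demo simple). */
void pic_fifo_init(struct pic_fifo *f, uint32_t base, uint32_t stride)
{
    for (unsigned i = 0; i < PIC_FIFO_SIZE; i++)
        f->phys[i] = base + i * stride;
    f->next = 0;
}

/* Hand out the oldest buffer and advance. Nothing is ever freed; a buffer
 * is reused only after the other 15 have been handed out in between. */
uint32_t pic_fifo_acquire(struct pic_fifo *f)
{
    uint32_t p = f->phys[f->next];
    f->next = (f->next + 1) % PIC_FIFO_SIZE;
    return p;
}
```

The design relies on the pool being deep enough that a buffer is no longer on screen by the time it comes round again, rather than on any explicit "done" notification from the display side.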