07:08 | jnettlet | dv_, One of the GoogleTV boxes runs a Marvell Chip, Vizio made an android tablet that had the same chip as the XO 1.75, Panasonic has a new tablet based on the mmp3 |
07:12 | jnettlet | Both the Asus Cube and Vizio Co-Star Google TV boxes use the Armada 1500 that sports a vMeta co-processor |
08:54 | shesselba | jnettlet, dv_: IIRC all 2nd gen googletvs use the Armada 1500. But be careful, it's using some "secure" bootloader. |
08:55 | shesselba | I have been playing with sony's nsz-gs7. Except that I have serial output and can enforce spi boot, I haven't got far. Maybe that usb boot hack from chromecast also works |
08:57 | shesselba | look for the bootloader sources gtvhacker mentioned, it gives some impression of the emmc layout. but there is some encryption scheme on most of the data stored |
10:41 | dv_ | ah okay |
10:42 | dv_ | jnettlet, shesselba_away: I ask because I need somebody to test my GStreamer 1.0 vmeta plugin on non-Dove hardware |
10:43 | dv_ | there are some distinctions inside the plugins made by marvell. for example, on non-Dove hardware, vmeta is suspended when paused, and resumed when playback continues |
10:44 | dv_ | and there is no explanation why this isnt done on Dove. also, on Dove, after each completed (= decoded) frame, Dove is suspended, vdec_os_api_suspend_ready() is called, and then vmeta is immediately resumed. again, no explanation why, and why only on Dove. |
10:45 | _rmk_ | maybe they measured the power usage and found one scheme better for one SoC than the other |
10:52 | dv_ | some comments would have been helpful in the code.. |
10:52 | dv_ | but I wonder how power usage could be improved by suspending and immediately resuming after frame completion |
10:54 | _rmk_ | depends how much state has to be reloaded, and what the effect of suspending it is (iow, is it just stopping clocks or removing power too) |
11:01 | rabeeh | the state reload should be immediate; should be a single pointer to some sort of firmware |
11:02 | rabeeh | but then; if vmeta had some caches inside it then the pointer is just to get it up and running; but caches needs to be filled in back again |
11:03 | dv_ | hm |
11:04 | _rmk_ | I disagree - looking at how many register writes the closed source libs do via the open source libvmeta |
11:04 | dv_ | the relevant part of marvell's code is here: https://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta/gstvmetadec.c#n2770 |
11:05 | _rmk_ | its not just one or two writes that happen to the hardware, there's sometimes hundreds of writes |
11:05 | dv_ | my point is: do I really need to do this suspend-and-immediate-resume for dove? |
11:06 | dv_ | I do not like to make dove/non-dove distinctions in my code, and wish to avoid it if possible |
11:06 | _rmk_ | for a first shot, I'd just make it always do that :) |
11:07 | rabeeh | vmeta has both clocking gating and power down features |
11:08 | rabeeh | dv_: do you know if that suspend stops the clocks? or powers down the unit? |
11:08 | dv_ | nope |
11:08 | dv_ | well okay. I will make this weird suspend-resume stuff and the *actual* suspend-resume (which happens when changing from playing to paused state and vice versa) configurable by properties |
11:09 | dv_ | rabeeh: good point though. will check |
11:10 | rabeeh | dv_: namely; does it call - vdec_os_api_power_on() or vdec_os_api_clock_on()? |
11:10 | dv_ | no |
11:10 | dv_ | neither does marvell's code |
11:11 | rabeeh | oh; i just found a small surprise in my kernel - |
11:11 | rabeeh | drivers/uio/uio_vmeta.c |
11:11 | rabeeh | int vmeta_power_on(struct vmeta_instance *vi) |
11:11 | rabeeh | { |
11:11 | rabeeh | return 0; // Rabeeh - hack |
11:11 | rabeeh | down_read(&vi->sem); |
11:11 | rabeeh | ... |
11:11 | dv_ | haha |
11:19 | bencoh | :? |
11:42 | _rmk_ | not sure why that's a surprise - I asked you about that last year |
11:43 | _rmk_ | dv: which of the closed source libs are you using? |
11:43 | dv_ | miscgen, vmeta, vmetahal, codecvmetadec |
11:44 | _rmk_ | so you're using DecodeFrame_Vmeta() then... |
11:44 | dv_ | yes |
11:44 | dv_ | initially I wanted to use vdec_api.h , but this one does not even have constants for codec types etc. |
11:44 | _rmk_ | that internally issues the vdec_os_api_power_on()/vdec_os_api_clock_on() calls |
11:46 | _rmk_ | you can find that by observing the behaviour & call trace from the open source libvmeta back up |
11:46 | dv_ | hm. true |
11:47 | dv_ | also, trying out suspend/resume when switching between pause and playback , it seems that suspend sometimes freezes the application, sometimes it doesnt |
14:01 | jnettlet | dv_, sorry I missed you, been running around today. On our hardware we are stopping the clocks and powering down the hardware on both CLK_OFF and POWER_OFF commands. There are #ifdefs in the uio_driver to have separate control over these. |
14:29 | dv_ | jnettlet: okay |
14:30 | dv_ | jnettlet: so, do you have any idea why suspend/resume is necessary after each completed frame? |
14:31 | dv_ | my guess, after what _rmk_ said, is that DecodeFrame_Vmeta() sometimes turns off power & clk |
14:31 | dv_ | (sometimes after a frame was completed) |
14:32 | dv_ | and therefore this suspend/resume thing makes sure it does not stay that way |
14:32 | dv_ | unfortunately I didnt have the time to trace libvmeta .... will do that when I can |
14:33 | dv_ | oh, and does the armada 510 also support *en*coding? I have seen marvell slides where only "decoding" was listed for the 500 series and en/decoding for 600 and later |
14:44 | jnettlet | dv_, I don't know if it really has to. I was wondering if it was some sort of additional power saving attempt. |
14:47 | dv_ | I'll leave it in for now then |
14:48 | dv_ | do you have access to an OLPC? |
14:48 | dv_ | and why are there multiple gst-plugins-marvell versions for the different OLPC ones? what are the relevant differences? |
14:58 | _rmk_ | dv: it's probably all caused by no one maintaining a central repository for it, so everyone just grabs a copy of one of them, modifies it a bit and there's no focus to recombine them back together |
15:00 | dv_ | yeah, makes sense |
15:02 | _rmk_ | I'm also debating about augmenting the vdec_os API with functions which provide buffers with physical addresses only - this need to have a process virtual address associated with each buffer is craaaaaazy |
15:04 | _rmk_ | that only comes from someone's broken idea in the BMM stuff that on allocation, you're given a physical address by the kernel, but on free you need to give the kernel your process virtual address for that buffer - which of course you successfully mmap'd |
15:24 | dv_ | well, as long as there is a function for getting a virtual address for the physical one |
15:24 | dv_ | otherwise it will be difficult to get to the data |
15:24 | _rmk_ | for things like the reference buffers, it doesn't need to be mapped |
15:24 | dv_ | yeah, but for input, it needs to |
15:25 | _rmk_ | for input and maybe output depending on what you do with the decoded frames |
15:25 | dv_ | yes |
15:25 | dv_ | but I'd be happy with something like vdec_map / vdec_unmap |
15:26 | dv_ | or vdec_get_virt_addr . both are okay |
15:31 | _rmk_ | * _rmk_ needs to reboot the cubox with kernel profiling enabled... but... |
15:31 | _rmk_ | PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND |
15:31 | _rmk_ | 9734 cubox 20 0 167m 57m 36m S 29.0 8.0 0:58.51 vlc |
15:31 | _rmk_ | Cpu(s): 27.2%us, 9.2%sy, 0.0%ni, 55.4%id, 0.0%wa, 2.4%hi, 5.8%si, 0.0%st |
15:32 | _rmk_ | that's with AC3 output via spdif |
15:32 | _rmk_ | 448kb/s A52 to be precise |
15:33 | dv_ | nice. is this 5.1 ? |
15:33 | dv_ | (I am not familiar with AC3) |
15:33 | _rmk_ | yep |
15:33 | dv_ | and, vlc built with iwmmxt enabled in libavcodec? |
15:34 | dv_ | I was thinking about how ac3 is the one case where hw acceleration might pay off on cortex A8 class machines |
15:34 | dv_ | *hw acceleration for audio decoding |
15:34 | _rmk_ | it's my own vlc and libavcodec, libva and vmeta |
15:35 | _rmk_ | I have a "HD audio rush" box connected to the spdif which can do non-audio decode |
15:37 | _rmk_ | my el-cheapo tv can't cope with the AC3 stream, but then its got soo crappy speakers on it anyway that the volume is permanently set to -?dB |
15:41 | dv_ | I found it funny the other day that video was using less CPU% than audio (thanks to vmeta and AC3) |
15:43 | jnettlet | dv_, sorry was out walking the dogs while weather permitted. |
15:43 | jnettlet | I do have XO-1.75's and XO-4's for testing |
15:44 | jnettlet | The different versions are due to linking to different binaries, and also kernel interface versions. the XO-1.75 is using iWMMXt accelerated binaries and the XO-4 is using NEON |
15:45 | jnettlet | although the XO-4 can also use the iWMMXt binaries. I never got a chance to profile the two instruction sets against one another. We figured NEON is better supported so we will just go with that. |
15:46 | jnettlet | dv_, we have found audio using more cpu than video in a lot of our testing. |
15:48 | dv_ | uh what? |
15:48 | dv_ | I thought they all use the armada chips |
15:48 | dv_ | so some armada chips have NEON? |
16:00 | jnettlet | yep the new mmp3/pxa2128 or whatever you want to call it does. Supports both iWMMXt and NEON |
16:00 | dv_ | whew. nice. |
16:00 | dv_ | also, about the encoding I mentioned earlier. any idea |
16:00 | dv_ | ? |
16:01 | dv_ | it would otherwise be pointless for me to try to write an encoder as well, if the cubox cant encode |
16:02 | jnettlet | yes the vMeta chip does support encoding. The newest marvell gstreamer plugins have a semi-working version of it implemented. |
16:03 | dv_ | and the newest plugins are in dev.laptop.org/dsd/ |
16:03 | dv_ | uh no, wrong urk |
16:03 | dv_ | *url |
16:03 | dv_ | this http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/ |
16:04 | jnettlet | looks like he has stripped out the encoder plugin. |
16:04 | jnettlet | 1 sec |
16:05 | dv_ | no wait |
16:05 | dv_ | switch the branch to mmp3 |
16:05 | dv_ | http://dev.laptop.org/git/users/dsd/gst-plugins-marvell/tree/src/vmeta?h=mmp3 |
16:07 | jnettlet | yep they are under mmp3 |
16:07 | dv_ | hmm strange how there are different elements for each bitstream |
16:07 | jnettlet | the source in that branch is not mmp3 specific just the binary code |
16:08 | jnettlet | well you need to tell the encoder what to output. |
16:08 | jnettlet | and each format takes different parameters |
16:08 | dv_ | hmm actually thats a good point :) |
16:08 | dv_ | you could probably also do it using explicit caps |
16:08 | dv_ | but thats rather awkward to do |
16:09 | dv_ | alright, and the last bit: xv output. I wonder, why does vmetaxv - the bmmxvimagesink successor - now use libphycontmem instead of libbmm? |
16:09 | dv_ | is phycontmem an attempt to fix libbmm design flaws? |
16:11 | jnettlet | libphycontmem is an abstraction that Marvell introduced when switching from bmm to pmem. Which was good because it made it relatively painless to then switch to using ION memory manager. |
16:12 | jnettlet | Really we could add a bmm backend to libphycontmem as well. Danial wrote it so it tries to auto-detect the api supported by the kernel |
16:12 | dv_ | ah |
16:12 | dv_ | and pmem no longer requires the vmem= kernel parameter? |
16:13 | jnettlet | pmem does, but ION doesn't if you have it patched with Linaro's CMA backend. |
16:13 | dv_ | either way, I'll use phycontmem instead of libbmm then. |
16:13 | dv_ | (I also want to add a vmetaxv element for gstreamer 1.0) |
16:14 | dv_ | what I dont like though is that this bmm hack for passing DMA buffers to the output without requiring a memcpy cannot be detected |
16:14 | dv_ | that is, there is no way to tell if the driver supports this hack |
16:15 | jnettlet | that needs to be cleaned up. I thought _rmk_ had written something to fix this. |
16:17 | jnettlet | dv_, which driver? The xorg driver? |
16:17 | dv_ | yes |
16:17 | jnettlet | It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over. |
16:18 | jnettlet | The hackiness is that the "id" needs to match between the gstreamer plugin and the xorg driver |
16:18 | jnettlet | really this is where v4l dmabuf is supposed to kick in. |
16:19 | dv_ | I didnt see any "id" in the modified xvimagesink |
16:19 | dv_ | perhaps I overlooked it |
16:19 | dv_ | which is my point. I am concerned that this would cause segfaults then |
16:20 | jnettlet | not that I have seen. I think all the code just falls back to the old slow memcpy way if it doesn't match the id. |
16:21 | jnettlet | * jnettlet has to go tend to his bbq. Be back in a half an hour or so |
16:50 | dv_ | that needs to be cleaned up. I thought _rmk_ had written something to fix this. <- do you refer to his dmabuf/DRM patches? |
16:51 | _rmk_ | that and my X server driver too |
16:51 | _rmk_ | and converted bmmxvimagesink stuff |
16:52 | dv_ | since i will port these changes to 1.0 , I should eventually sync up with you about these patches you've made |
16:53 | dv_ | currently trying to understand the vmetaxv changes. this system with two magic IDs is ... weird. |
16:54 | rabeeh | dv_: the first magic word is for queuing; the second one is for retrieving presented frames. |
16:54 | rabeeh | dv_: crappy mechanism. |
16:54 | dv_ | yeah, this I found out |
16:54 | dv_ | i also found the note in the code about how broken this is :) |
17:10 | _rmk_ | rabeeh isn't quite correct there. |
17:11 | _rmk_ | first word is a magic word which indicates that it's the BMM buffer passing mechanism |
17:12 | _rmk_ | next word is the count of buffers |
17:12 | _rmk_ | and _then_ is a list of the physical addresses of the buffers |
17:12 | _rmk_ | final word is a checksum on the entire thing |
17:12 | dv_ | well, you could call this queuing I guess |
17:14 | dv_ | but yes, what you describe is what I found in the code. one physical address is put in the list |
17:14 | _rmk_ | yes, because how Marvell had this all working before is you could throw a load of buffers into the video driver and it switches between each one on each vsync |
17:14 | dv_ | and the magic ID there is 0x13572468 |
17:15 | _rmk_ | and the video driver returns a list of physical addresses, which the X server then writes back into the SHM region in the same format, but a different magic ID to pass them back to the application as "completed" |
17:15 | _rmk_ | again, checksummed |
17:16 | dv_ | yes. and the reason why it is marked as "broken" in the code is that there is no way to be 100% sure that the frames were fully consumed by that point |
17:18 | _rmk_ | actually, using that BMM passing method there is. you pass the physical address via XvShmPutImage - at that point, the X server owns the buffer until it returns its physical address to you. Once it has, you own the buffer again. |
17:20 | dv_ | according to the code, not in all cases. see http://dev.laptop.org/git/users/dsd/gst-plugins-vmetaxv/tree/vmetaxvimagesink.c#n1040 |
17:21 | _rmk_ | frankly, that comment's crap - look at it, they've commented out the code which receives the buffers back and _then_ complain that they don't know when the X server/driver has done with it. |
17:22 | _rmk_ | and those changes break overlay display. |
17:23 | dv_ | yeah, they just delete the buffer immediately now |
17:23 | _rmk_ | with overlay, the frame is displayed until you replace it with a different frame, and if you're doing "zero copy" you can't free the buffer until you've replaced it and _know_ that it's been replaced. |
17:23 | _rmk_ | yes, which is total trash |
17:24 | _rmk_ | sure, if you use the GPU to copy it in the X server driver, then the buffer is done with once XvShmPutImage returns |
17:24 | _rmk_ | but with overlay that's definitely not the case |
17:24 | dv_ | agreed |
17:24 | dv_ | also, I still do not see how you can recognize if the driver supports this hack. the IDs are only used for passing buffers . I am unfamiliar with xv/xshm - does the API complain that it does not recognize the ID at some point? |
17:24 | _rmk_ | you can't |
17:25 | dv_ | It doesn't matter. If the xorg driver doesn't support it then it will always just copy the frame over. <- then I misunderstood this |
17:25 | _rmk_ | if it doesn't and you try using this method, you'll get the contents of the entire SHM buffer displayed |
17:25 | dv_ | oh |
17:25 | dv_ | which is just a bunch of bytes |
17:26 | dv_ | in other words, the magic ID + the physical addresses + the checksum will be interpreted as pixel data, right? |
17:26 | _rmk_ | jnettlet is wrong there. If you pass the X server this magic SHM buffer with the magic word and physical addresses, and the X server doesn't support it, it will dutifully send the entire contents of the SHM buffer to the display |
17:26 | _rmk_ | yes |
17:26 | dv_ | oh. ugly, but at least it doesnt crash |
17:27 | _rmk_ | note that even using the BMM method, the SHM buffer must be big enough to store the full sized image |
17:27 | _rmk_ | even though it only passes a few (maybe 4) words |
17:27 | dv_ | as a fallback? or does the driver do internal copies to it? |
17:27 | _rmk_ | { magic, 1, phys, checksum } |
17:28 | _rmk_ | the X server checks the size of the SHM buffer to validate the XvShmPutImage call way before the DDX driver gets a look-in |
17:28 | dv_ | oh, I see |
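The message layout _rmk_ describes (magic word, buffer count, list of physical addresses, checksum) can be sketched in C roughly as below. Only the magic value 0x13572468 and the field order come from the discussion. The 32-bit word size, the function names, and above all the XOR checksum are assumptions made for illustration, since the log does not say how the real checksum is computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic word quoted in the log for the application -> X server direction. */
#define BMM_SHM_MAGIC 0x13572468u

/* ASSUMPTION: the real checksum algorithm is not given in the log.
 * XOR over all preceding words is only a stand-in. */
uint32_t bmm_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum ^= words[i];
    return sum;
}

/* Hypothetical helper: writes { magic, count, phys[0..count-1], checksum }
 * into out. Returns the number of words written, or 0 if out is too small. */
size_t bmm_pack(uint32_t *out, size_t out_words,
                const uint32_t *phys, uint32_t count)
{
    size_t needed = (size_t)count + 3;
    if (out_words < needed)
        return 0;
    out[0] = BMM_SHM_MAGIC;
    out[1] = count;
    for (uint32_t i = 0; i < count; i++)
        out[2 + i] = phys[i];
    out[2 + count] = bmm_checksum(out, 2 + (size_t)count);
    return needed;
}

/* Hypothetical receiver-side check: returns 1 if the message carries the
 * magic word, fits in msg_words, and its checksum matches; 0 otherwise. */
int bmm_validate(const uint32_t *msg, size_t msg_words)
{
    if (msg_words < 3 || msg[0] != BMM_SHM_MAGIC)
        return 0;
    uint32_t count = msg[1];
    if ((size_t)count + 3 > msg_words)
        return 0;
    return msg[2 + count] == bmm_checksum(msg, 2 + (size_t)count);
}
```

For a single buffer this produces the four-word { magic, 1, phys, checksum } form mentioned above. As noted in the discussion, the SHM segment itself would still have to be sized for a full image, because the X server validates the size before the DDX driver sees the request.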
17:29 | dv_ | and what do you think, would v4l be ultimately better here? |
17:30 | _rmk_ | no, just a saner API - but... there's still this big question over "when has the display done with my buffer" |
17:31 | _rmk_ | if you have sane and fast buffer allocation, then what you can do is forget about having the app free the buffer... |
17:31 | _rmk_ | or you do and you have refcounting on the buffer |
17:32 | _rmk_ | basically, with dmabuf/drm... |
17:32 | dv_ | the refcounting is done by gstreamer already |
17:32 | _rmk_ | if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system |
17:32 | dv_ | this is in fact how I do things currently |
17:32 | _rmk_ | gstreamer refcounting is useless here |
17:32 | dv_ | oh you mean x11-side refcounts |
17:32 | dv_ | x11/v4l/whateverapi side |
17:33 | _rmk_ | no, I'm talking kernel here... dmabuf is all kernel |
17:33 | _rmk_ | so... |
17:33 | _rmk_ | if you can export a buffer with dmabuf, and import it into DRM, that takes a reference on the buffer in the exporting system. once its imported into DRM, you can drop your reference to it in the exporting subsystem. |
17:33 | dv_ | yes, the application can consider it "done" |
17:34 | dv_ | hmm |
17:34 | dv_ | that would be nice indeed |
17:34 | _rmk_ | when DRM has finished with it, DRM is done with its object, and frees its own object - that causes the import to be dropped, which drops the refcount on the exporting subsystem, and finally frees the buffer |
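The buffer lifetime _rmk_ outlines for dmabuf/DRM can be illustrated with a toy userspace model. This is not the kernel dma-buf API: every name below is hypothetical, and the sketch only mirrors the reference-counting order described (export, import takes a reference, exporter drops its own, importer's final drop frees the buffer).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an exported buffer; not a real kernel structure. */
struct toy_buffer {
    int refcount;
    int *freed_flag;   /* set to 1 when the buffer is actually released */
};

/* "Export": the exporting subsystem creates the buffer and holds one ref. */
struct toy_buffer *toy_export(int *freed_flag)
{
    struct toy_buffer *b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->refcount = 1;
    b->freed_flag = freed_flag;
    *freed_flag = 0;
    return b;
}

/* "Import": the importing side (DRM in the discussion) takes a reference. */
void toy_get(struct toy_buffer *b)
{
    b->refcount++;
}

/* Drop one reference; the last drop frees the buffer. */
void toy_put(struct toy_buffer *b)
{
    if (--b->refcount == 0) {
        *b->freed_flag = 1;
        free(b);
    }
}
```

In this model the application can call toy_put() right after the import and treat the frame as done. The memory stays alive until the importer drops its own reference, which is the property that removes the "when has the display done with my buffer" question.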
17:36 | _rmk_ | that's what's possible to do... personally though, in my vlc/libva driver code, I keep 16 picture buffers in a fifo structure, and cycle them round without freeing - that's more than enough to avoid any problems |
17:37 | _rmk_ | but then it doesn't matter in vlc because vlc itself copies the data out and doesn't do the "zero copy" thing with X |
17:37 | dv_ | yes, I do something similar with a custom gstbufferpool |
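The "16 picture buffers in a fifo, cycled without freeing" approach might look something like the following. This is a minimal sketch with hypothetical names, not code from _rmk_'s libva driver or from dv_'s GstBufferPool. It assumes that by the time a slot comes round again, the display has long finished with it.

```c
#include <assert.h>
#include <stddef.h>

#define PIC_FIFO_SIZE 16   /* the count mentioned in the discussion */

/* Fixed ring of preallocated picture buffers; nothing is ever freed. */
struct pic_fifo {
    void *slots[PIC_FIFO_SIZE];
    size_t next;           /* index of the buffer handed out next */
};

/* Fill the ring with caller-provided buffers (PIC_FIFO_SIZE entries). */
void pic_fifo_init(struct pic_fifo *f, void *bufs[PIC_FIFO_SIZE])
{
    for (size_t i = 0; i < PIC_FIFO_SIZE; i++)
        f->slots[i] = bufs[i];
    f->next = 0;
}

/* Hand out the oldest buffer and advance, wrapping around at the end. */
void *pic_fifo_next(struct pic_fifo *f)
{
    void *buf = f->slots[f->next];
    f->next = (f->next + 1) % PIC_FIFO_SIZE;
    return buf;
}
```

A buffer is only reused after fifteen other frames have been handed out, which sidesteps the ownership question rather than answering it. With true zero-copy overlay output that margin is an assumption, not a guarantee.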