On 11/03/2015 05:28 AM, Mauro Carvalho Chehab wrote:
This is the final version of the Etherpad notes we took during the Media Workshop at the Kernel Summit.
Please review.
Minor spelling corrections below. Looks good.
-- Shuah
I should be preparing the final report of the meeting later this week.
Regards, Mauro
Media Summit Seoul 2015
Attendees list:
- Shuah Khan - Samsung OSG shuahkh@osg.samsung.com
- Laurent Pinchart - Ideas on Board
- Hans Verkuil - Cisco Systems Norway - hverkuil@xs4all.nl
- Javier Martinez Canillas - Samsung OSG
- Tomasz Figa - Google - tfiga@chromium.org
- Mauro Carvalho Chehab - Samsung OSG - mchehab@osg.samsung.com
- David Howells - Red Hat - dhowells@redhat.com
- Seung-Woo Kim - Samsung Software R&D Center - sw0312.kim@samsung.com
- Inki Dae - Samsung Software R&D Center - inki.dae@samsung.com
- Junghak Sung - Samsung Visual Display Business - jh1009.sung@samsung.com
- Geunyoung Kim - Samsung Visual Display Business - nenggun.kim@samsung.com
- Rany Kwon - Samsung Visual Display Business - rany.kwon@samsung.com
- Minsong Kim - Samsung Visual Display Business - ms17.kim@samsung.com
- Ikjoon Kim - Samsung Visual Display Business - ikjoon.kim@samsung.com
- Thiago - Samsung OSG
- Reynaldo - Samsung OSG
- Luis de Bittencourt - Samsung OSG
- Pawel Osciak - Google
- Vinod Koul - Intel
- Mark Brown - Linaro
- Arnd Bergmann - Linaro
1: Codec API (Pawel)
- Stream API
The original V4L2 codec API was developed along with the Exynos codec driver. As the device implements high-level operations in hardware, the resulting API was high-level as well, with drivers accepting unprocessed raw streams. This matches older ARM SoCs, where CPU power wasn't deemed sufficient to implement stream parsing.
Drivers implement two V4L2 buffer queues, one on the uncompressed side and one on the compressed side. The two queues operate independently, without a 1:1 correspondence between consumed and produced buffers (for instance reference frames need to be accumulated when starting video encoding before producing any output, or the bitstream header needs to be parsed before CAPTURE buffers can be allocated). The mem2mem V4L2 kernel framework thus can't be used to implement such codec drivers, as it hardcodes a 1:1 correspondence. This is a kernel framework issue, not a V4L2 userspace API issue.
(For stream API fundamentals see Pawel's slides)
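A toy model may make the queue decoupling concrete. This is illustrative C only, not kernel or driver code; all names are made up. It models a stream-API decoder that has to accumulate some number of compressed buffers (headers, reference frames) before the first decoded frame comes out, so consumed and produced buffer counts are not 1:1:

```c
/* Toy model (not kernel code) of a stream-API decoder: it must
 * consume `needed` compressed (OUTPUT) buffers before the first
 * decoded (CAPTURE) frame appears, so the consumed/produced buffer
 * counts are not in a 1:1 relationship. */
struct toy_decoder {
    int queued;   /* compressed buffers consumed so far */
    int needed;   /* buffers required before the first frame */
};

/* Feed one compressed buffer; returns the number of decoded
 * frames produced as a result (0 while still accumulating). */
static int toy_decode(struct toy_decoder *d)
{
    d->queued++;
    return d->queued >= d->needed ? 1 : 0;
}
```

A mem2mem-style framework that schedules exactly one CAPTURE buffer per OUTPUT buffer cannot express the initial `0` returns above, which is the framework limitation described in the notes.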
- Frame API (Slice API)
CPUs are getting faster in the ARM world. The trend is to implement lower-level hardware codecs that require stream parsing on the CPU. CPU code needs to slice the stream, extract information, process it and pass the results to a shader-like device. This is the model used on Intel platforms and implemented in the VA API library.
Drivers still implement two V4L2 buffer queues, but the encoded stream is split into frames of slices, and a large number of codec-specific controls need to be set from parsed stream information.
Stream parsing and parameter calculation are better done in userspace. Userspace is responsible for managing reference frames and their lifetime, and for passing data to the codec in such a way that an input buffer will always produce an output buffer. The two queues operate together with a 1:1 correspondence between buffers. The mem2mem framework is thus usable.
Source buffers contain only slice data (macroblocks + coefficient data). Controls carry information extracted from stream parsing, the list of reference frames and the DPB (Decoded Picture Buffer). The request API can be used to associate controls with source buffers.
Keeping references to reference frames is one of the remaining problems (especially with DMABUF, and possibly with MMAP in the future when we'll have the ability to destroy MMAP buffers while streaming). This problem also exists for the stream API. More discussion is needed to design a solution.
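The lifetime problem can be sketched with another toy model. This is illustrative C, not the real vb2/DMABUF code; the names are made up. The point is simply that a decoded buffer must not be recycled (or its DMABUF mapping torn down) while a pending decode still uses it as a reference:

```c
/* Toy reference-frame lifetime tracking (not the real vb2 code):
 * a decoded buffer may only be recycled once no pending decode
 * operation still uses it as a reference frame. */
struct toy_frame {
    int refcount;   /* pending uses as a reference frame */
};

static void toy_frame_ref(struct toy_frame *f)   { f->refcount++; }
static void toy_frame_unref(struct toy_frame *f) { f->refcount--; }

/* Nonzero if the buffer may be returned to the free pool. */
static int toy_frame_recyclable(const struct toy_frame *f)
{
    return f->refcount == 0;
}
```

The open question in the notes is where this accounting should live and how it interacts with userspace destroying buffers (DMABUF today, possibly MMAP later).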
Pawel promised to upstream the ChromeOS codec driver code this year (and you should also be nagging Tomasz to do it...).
For encoders, header generation should probably be done in userspace as well, as the code is complex and doesn't require a kernel implementation.
Userspace code should be implemented as libv4l2 plugins to interface between the frame API exposed by the kernel and a stream API exposed to applications.
- Status
See Pawel's slides.
- Discussion points
To be discussed on Tuesday at 11:30.
References:
Intel libVA SDK: https://bugs.freedesktop.org/show_bug.cgi?id=92533
Request API: https://lwn.net/Articles/641204/
Chromium Code (user): https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gp...
2: kABI
kABI needs more review by devs.
New kernel APIs need to be documented, adding e.g. new fields to undocumented structures doesn't require the submitter to document the full struct (although it obviously would be appreciated).
Suggestion: use kernel-doc for 'top-level' documentation, see drivers/gpu/drm/drm_atomic_helper.c (look for "DOC: overview") and Documentation/DocBook/drm.tmpl (look for "!Pdrivers/gpu/drm/drm_atomic_helper.c overview").
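For reference, the kernel-doc style referred to above looks roughly like the following. The helper itself is a made-up example, only the comment format matters; the "DOC:" block is what the DocBook "!P" directive pulls in:

```c
/**
 * DOC: overview
 *
 * Overview text for the whole helper library goes in a "DOC:"
 * block like this one, which the DocBook template can include
 * with a "!P<file> overview" directive.
 */

/**
 * frob_align - round @x up to the next multiple of @a
 * @x: value to align
 * @a: alignment, must be a power of two
 *
 * Return: the aligned value. (Hypothetical helper, for
 * illustrating the per-function kernel-doc format only.)
 */
static unsigned int frob_align(unsigned int x, unsigned int a)
{
    return (x + a - 1) & ~(a - 1);
}
```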
3: uAPI for DVB
- videobuf2
No DMABUF support at the moment: it is recommended that this be added, at the very least planned for in the API and likely implemented too.
Support for DVB output ('mem2mem demux') is also recommended (not necessary for the first version)
Streamon/off is automatically called in the DVB API: should this be available as an ioctl? No need seen for this.
DVB framework needs to be extended to see whether userspace uses read() or stream I/O: V4L2 does this as well, so the same method should be used as V4L2.
No need is foreseen to add (or remove) buffers at runtime (à la CREATE_BUF).
- SoC pipelines
Samsung SoCs have full hardware pipelines from the tuner output to the screen without requiring any intervention at runtime. This requires connecting a DVB device to a V4L2 or DRM device.
Hardware pipelines include hardware-configured scalers for seamless resolution change, in order to achieve real-time operation when the input resolution changes.
- v4l2_buffer & Y2038
Should we create a new structure and fix several other issues in one go? Yes.
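The Y2038 problem with the current structure comes from its use of struct timeval, whose tv_sec is a signed 32-bit second count on 32-bit systems. A small standalone C check (plain userspace code, nothing V4L2-specific) shows where that count runs out:

```c
#include <stdint.h>
#include <time.h>

/* A signed 32-bit tv_sec overflows at INT32_MAX seconds after the
 * epoch; this computes the calendar year in which that happens. */
static int tv_sec32_overflow_year(void)
{
    time_t last = (time_t)INT32_MAX;     /* 2147483647 */
    struct tm *tm = gmtime(&last);
    return tm->tm_year + 1900;
}
```

Hence the need for a replacement structure with a 64-bit-safe timestamp while other v4l2_buffer issues are being fixed anyway.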
There might be a cache-related problem due to the way DMABUF is handled. DMABUF buffers are only unmapped when applications queue a different DMABUF fd on the same V4L2 buffer index. By that time the driver doesn't own access to the buffer but still tries to clean the cache. The root cause needs to be investigated.
A related issue is how V4L2 keeps DMABUF mappings around for performance improvement. This prevents buffer eviction by the GPU. This should be discussed with DRM developers and dma-buf maintainers.
Should we add support for sub-streams in a buffer? Possibly for 3D capture, for other cases there's not so much interest. Use cases would be needed.
Formats are specified per buffer, which makes it difficult to transport meta-data in a plane along with the image planes. A redesign of v4l2_buffer and v4l2_plane should take this use case into account.
- poll() for output streams
The current poll() behaviour is not usable for codecs. As mem2mem is considered the main use case for output devices we need to optimize for that, and implement write-emulation code in poll() as a special case. We'll revert to the old implementation and update the documentation accordingly.
- poll for capture streams
q->waiting_for_buffers is V4L2-specific, but after the vb2 split it is now part of the vb2 core: should this be moved to the V4L2 part or kept in the vb2 core? After discussion, the opinion is that we should make this V4L2-specific and allow DVB (and other subsystems) to use the standard behavior. Proposals are welcome on ways to let userspace select the 'waiting_for_buffers' behavior for V4L2, since it is awkward there as well (but needed for backwards compatibility).
CSC
CEC
REQUEST_BUFS: v4l2_format validation: expect that userspace calls TRY_FMT or similar to fill the format field, only use sizeimage field in drivers.
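The validation rule proposed above can be sketched as follows. These are simplified, hypothetical stand-ins for the V4L2 structures, not driver code; the point is that userspace fills the format via TRY_FMT (or similar), and the driver trusts only sizeimage when sizing buffers:

```c
/* Hypothetical, simplified stand-in for the V4L2 pixel format. */
struct fake_pix_format {
    unsigned int width, height, pixelformat;
    unsigned int sizeimage;   /* filled in by TRY_FMT in real code */
};

/* Driver-side buffer sizing per the proposal: every field except
 * sizeimage is deliberately ignored; the driver only clamps it to
 * its own minimum. */
static unsigned int reqbufs_buffer_size(const struct fake_pix_format *fmt,
                                        unsigned int driver_minimum)
{
    return fmt->sizeimage > driver_minimum ? fmt->sizeimage
                                           : driver_minimum;
}
```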
MC
Look at being able to set a topology using 'cat file >/dev/mediaX' (idea from alsa): depends on the details of the proposed format.
Atomic configuration updates across subsystems (MC for links, V4L2/ALSA for other parameters, ...) need to be taken into account.
For audio devices, the ordering of routing changes matters unless the hardware can apply them atomically.
- VIDIOC_SUBDEV_QUERYCAP
Suggestion: MEDIA_IOC_INTERFACE_INFO: check that it is easy to implement in various subsystems (dvb, alsa, drm, iio)
- DELETE_BUFFERS
Yes, we'd like to have it. Ditto for more than 32 buffers (don't limit this).
- Workshop format
If we have two days available, then post a CFP to various mailing lists (including apps like gstreamer).
Co-locate with alsa workshop.
media-workshop mailing list media-workshop@linuxtv.org http://www.linuxtv.org/cgi-bin/mailman/listinfo/media-workshop