Video Indexing Services

The purpose of video indexing in the IMCOP system is to prepare various video materials for fast and efficient retrieval, and for selecting video sequences that match the different requirements of IMCOP’s prospective recipients.
The vast variety of application-dependent and thematically differentiated video content that may be processed and indexed on the IMCOP platform leads to the problem of proper annotation, which should reflect application- and user-level semantic requirements. According to the microservices architecture concept, each IMCOP service should be internally coherent, independent of other services, and as small as possible, and it should run in its own process within a specific end-to-end domain while remaining able to communicate with other processes [1]. Accordingly, the metadata enhancement tasks in the IMCOP platform have been designed as scenarios in which different, precisely specified MES services are involved in a scheduled manner. For example, following [2], video indexing tasks can be divided among several IMCOP MES microservices consecutively responsible for Shot Boundary Detection (SBD), abstraction of video shots, and feature extraction. The IMCOP platform can, of course, carry out many alternative scenarios in which different services (based on various approaches) are involved.
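The scheduled scenario above (SBD, then shot abstraction, then feature extraction) can be sketched in a few lines. This is a minimal, self-contained illustration of the stage ordering only: the stage functions are crude stand-ins (frames are modeled as mean-intensity values, a shot boundary is a large inter-frame difference, the key frame is the median frame), not the real MES microservices.

```python
# Illustrative stand-ins for the three consecutive indexing stages;
# none of these functions is the actual IMCOP implementation.

def shot_boundary_detection(frames, threshold=0.5):
    """Split a frame sequence into shots wherever the inter-frame
    difference exceeds the threshold (a crude SBD stand-in)."""
    shots, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if abs(cur - prev) > threshold:  # abrupt change -> shot boundary
            shots.append(current)
            current = []
        current.append(cur)
    shots.append(current)
    return shots

def abstract_shot(shot):
    """Pick one representative key frame per shot (here: the median)."""
    return sorted(shot)[len(shot) // 2]

def extract_features(keyframe):
    """Compute a descriptor for the key frame (here: a trivial scalar)."""
    return {"intensity": keyframe}

def index_video(frames):
    """Chain the three stages in the scheduled order: SBD -> abstraction -> features."""
    return [extract_features(abstract_shot(shot))
            for shot in shot_boundary_detection(frames)]

# Two abrupt intensity changes produce three shots, hence three descriptors.
frames = [0.1, 0.12, 0.11, 0.9, 0.88, 0.2, 0.22]
print(index_video(frames))  # -> [{'intensity': 0.11}, {'intensity': 0.9}, {'intensity': 0.22}]
```

In the platform itself each stage would be a separate MES service communicating over its own process boundary; the function calls here stand in for those service invocations.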
Three different video indexing microservices have been developed so far to analyze the content of video shots and index them in the IMCOP platform. They can be applied to video shots as well as to other basic and more complex video sequences for which the SBD step can be omitted or is not needed because of application requirements.

In particular, they have been dedicated to:

  • identify faces and assign them to different classes (distinct characters/persons) according to the similarity scores between them (the facerv service),
  • detect and recognize text objects (written words or sentences) within the natural content of video frames (the textdrv service),
  • recognize selected spoken words or phrases (sentences) using an applied speech recognition toolkit (the speechrv service).
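The grouping step performed by a face-identity service such as facerv can be illustrated with a simple greedy scheme: assign each detected face to the first existing class whose representative is similar enough, otherwise open a new class. This is a sketch of the idea only; the similarity function (here a toy 1-D embedding distance) and the threshold are assumptions, not the facerv algorithm.

```python
# Greedy grouping of faces into character classes by pairwise similarity.
# Faces and the similarity measure are illustrative stand-ins.

def cluster_faces(faces, similarity, threshold=0.8):
    """Assign each face to the first class whose representative
    (its first member) scores at least `threshold`; otherwise
    start a new class for it."""
    classes = []  # each class is a list of faces; classes[i][0] is its representative
    for face in faces:
        for cls in classes:
            if similarity(face, cls[0]) >= threshold:
                cls.append(face)
                break
        else:  # no sufficiently similar class found
            classes.append([face])
    return classes

# Faces modeled as 1-D embeddings; similarity = 1 - |a - b|.
faces = [0.10, 0.12, 0.80, 0.11, 0.82]
sim = lambda a, b: 1.0 - abs(a - b)
print(cluster_faces(faces, sim))  # -> [[0.1, 0.12, 0.11], [0.8, 0.82]]
```

With the threshold at 0.8, faces within 0.2 of a class representative join that class, so the five detections collapse into two distinct characters.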

[1] https://docs.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/service-oriented-architecture
[2] Zhang, H.J., Wu, J., Zhong, D., Smoliar, S.W.: An integrated system for content-based video retrieval and browsing. Pattern Recognition 30(4), 643–658 (1997).

To see these services in action, you can follow the links given above (they will be added soon) and upload some short video shots in MP4 format.
