Patrick Düggelin, Voice isolation, speech transcription and speaker re-identification in video, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
Speech is a salient information channel in recorded media, usually containing relevant semantic information complementing the visual signal. In a video retrieval setting, the speech signal can be transcribed automatically to enable spoken document retrieval by text query. Though not the only factor, automatic transcription performance is the most important determinant of the quality of such a retrieval system. In this work, we first assess the transcription quality of current state-of-the-art ASR systems and quantify the errors such systems make on a realistic dataset. We then examine whether audio-visual speech enhancement methods can be used to improve the transcription quality. Based on the findings of these two preliminary studies, we build three spoken document retrieval pipelines to index videos by what was said. We evaluate these systems on a set of manually captioned YouTube videos and find that speech enhancement slightly increases retrieval performance. |
|
Michèle Fundneider, Person Re-Identification in and Across Videos, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
The goal of person re-identification (re-id) is to recognize all instances of a particular person from an image in a gallery of images or videos. So far, research has mostly focused on the re-id of pedestrians in surveillance cameras. Person re-id is not only useful in surveillance scenarios, but also for video analysis and multimedia retrieval applications, wherein all types of videos are relevant. In order to recognize people in videos, a person detection step must be carried out before the re-id step. However, these two tasks pursue opposing goals, which is why one-step methods that combine them are particularly suitable for person search. We analyze two such one-step methods of person search, Online Instance Matching (OIM) and Norm-Aware Embedding (NAE), and test how well they perform on a movie-based dataset. Multi-Object Tracking (MOT) is another task suitable for identifying and tracking several people within a video. Here, FairMOT and JDE are very effective and fast; we test both methods to find out which one gives better re-identification results. |
|
Martin Sterchi, Cristina Sarasua, Rolf Grütter, Abraham Bernstein, Outbreak detection for temporal contact data, Applied Network Science, Vol. 6 (1), 2021. (Journal Article)
Epidemic spreading is a widely studied process due to its importance and possibly grave consequences for society. While the classical context of epidemic spreading refers to pathogens transmitted among humans or animals, it is straightforward to apply similar ideas to the spread of information (e.g., a rumor) or the spread of computer viruses. This paper addresses the question of how to optimally select nodes for monitoring in a network of timestamped contact events between individuals. We consider three optimization objectives: the detection likelihood, the time until detection, and the population that is affected by an outbreak. The optimization approach we use is based on a simple greedy approach and has been proposed in a seminal paper focusing on information spreading and water contamination. We extend this work to the setting of disease spreading and present its application with two example networks: a timestamped network of sexual contacts and a network of animal transports between farms. We apply the optimization procedure to a large set of outbreak scenarios that we generate with a susceptible-infectious-recovered model. We find that simple heuristic methods that select nodes with high degree or many contacts compare well in terms of outbreak detection performance with the (greedily) optimal set of nodes. Furthermore, we observe that nodes optimized on past periods may not be optimal for outbreak detection in future periods. However, seasonal effects may help in determining which past period generalizes well to some future period. Finally, we demonstrate that the detection performance depends on the simulation settings. In general, if we force the simulator to generate larger outbreaks, the detection performance will improve, as larger outbreaks tend to occur in the more connected part of the network where the top monitoring nodes are typically located. 
A natural progression of this work is to analyze how a representative set of outbreak scenarios can be generated, possibly taking into account more realistic propagation models. |
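The greedy selection at the core of this approach can be sketched as follows. The data structures (a map from each outbreak scenario to the set of nodes it reaches) and the function name are illustrative assumptions, not taken from the paper, and only the detection-likelihood objective is shown:

```python
def greedy_monitor_selection(nodes, outbreaks, k):
    """Greedy monitor placement (sketch, in the spirit of the seminal
    CELF-style approach referenced above): repeatedly add the node that
    detects the largest number of additional outbreak scenarios.

    `outbreaks` maps an outbreak id to the set of nodes that outbreak
    reaches; an outbreak is "detected" if any selected node is reached.
    """
    selected = set()
    detected = set()
    for _ in range(k):
        best, best_gain = None, -1
        for n in nodes:
            if n in selected:
                continue
            # marginal gain: outbreaks this node would newly detect
            gain = sum(1 for o, reached in outbreaks.items()
                       if o not in detected and n in reached)
            if gain > best_gain:
                best, best_gain = n, gain
        selected.add(best)
        detected |= {o for o, reached in outbreaks.items() if best in reached}
    return selected
```

The marginal-gain objective is submodular, which is what gives the greedy procedure its approximation guarantee in the original work.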
|
Luca Rossetto, Werner Bailer, Abraham Bernstein, Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations, In: MultiMedia Modeling, Springer, Cham, p. 605 - 616, 2021. (Book Chapter)
Experimental evaluations dealing with visual known-item search tasks, where real users look for previously observed and memorized scenes in a given video collection, represent a challenging methodological problem. Playing a searched “known” scene to users prior to the task start may not be sufficient in terms of scene memorization for re-identification (i.e., the search need may not necessarily be successfully “implanted”). On the other hand, enabling users to observe a known scene played in a loop may lead to unrealistic situations where users can exploit very specific details that would not remain in their memory in a common case. To address these issues, we present a proof-of-concept implementation of a new visual known-item search task presentation methodology that relies on a recently introduced deep saliency estimation method to limit the amount of revealed visual video content. A filtering process predicts and subsequently removes information which in an unconstrained setting would likely not leave a lasting impression in the memory of a human observer. The proposed presentation setting complies with the realistic assumption that users perceive and memorize only a limited amount of information, and at the same time allows playing the known scene in a loop for verification purposes. The new setting also serves as a search clue equalizer, limiting the rich set of exploitable content features present in the video and thus unifying the information perceived by different users. The performed evaluation demonstrates the feasibility of such a task presentation by showing that retrieval is still possible based on query videos processed by the proposed method. We postulate that such information-incomplete tasks constitute the necessary next step to challenge and assess interactive multimedia retrieval systems participating in visual known-item search evaluation campaigns. |
|
Luca Rossetto, Ralph Gasser, Jakub Lokoč, Werner Bailer, Klaus Schoeffmann, Bernd Muenzer, Tomas Soucek, Phuong Anh Nguyen, Paolo Bolettieri, Andreas Leibetseder, Stefanos Vrochidis, Interactive video retrieval in the age of deep learning - detailed evaluation of VBS 2019, IEEE transactions on multimedia, Vol. 23, 2021. (Journal Article)
Despite the fact that automatic content analysis has made remarkable progress over the last decade - mainly due to significant advances in machine learning - interactive video retrieval is still a very challenging problem, with an increasing relevance in practical applications. The Video Browser Showdown (VBS) is an annual evaluation competition that pushes the limits of interactive video retrieval with state-of-the-art tools, tasks, data, and evaluation metrics. In this paper, we analyse the results and outcome of the 8th iteration of the VBS in detail. We first give an overview of the novel and considerably larger V3C1 dataset and the tasks that were performed during VBS 2019. We then go on to describe the search systems of the six international teams in terms of features and performance. Finally, we perform an in-depth analysis of the per-team success ratio and relate this to the search strategies that were applied, the most popular features, and problems that were experienced. A large part of this analysis was conducted based on logs that were collected during the competition itself. This analysis gives further insights into the typical search behavior and differences between expert and novice users. Our evaluation shows that textual search and content browsing are the most important aspects in terms of logged user interactions. Furthermore, we observe a trend towards deep learning based features, especially in the form of labels generated by artificial neural networks. Nevertheless, for some tasks, very specific content-based search features are still used. We expect these findings to contribute to future improvements of interactive video search systems. |
|
Muhammad Saad, Abraham Bernstein, Michael Hanspeter Böhlen, Daniele Dell'Aglio, Single Point Incremental Fourier Transform on 2D Data Streams, In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), IEEE Xplore, New York, p. 852 - 863, 2021. (Book Chapter)
In radio astronomy, antennas monitor portions of the sky to collect radio signals. The antennas produce data streams that are of high volume and velocity (2.5 GB/s), and the inverse Fourier transform is used to convert the collected signals into sky images that astrophysicists use to conduct their research. Applying the inverse Fourier transform in a streaming setting, however, is not ideal, since its computational complexity is quadratic in the size of the image.
In this article, we propose the Single Point Incremental Fourier Transform (SPIFT), a novel incremental algorithm to produce sequences of sky images. SPIFT computes the Fourier transform for a new signal in a linear number of complex multiplications by exploiting twiddle factors, i.e., multiplicative constant coefficients. We prove that twiddle factors are periodic and show how circular shifts can be exploited to reuse multiplication results. The cost of the additive operations can be curbed by exploiting the embarrassingly parallel nature of the additions, which modern big data streaming frameworks can leverage to compute slices of the image in parallel. Our experiments suggest that SPIFT can efficiently generate sequences of sky images: it computes the complex multiplications 4 to 12x faster than the Discrete Fourier Transform, and its parallelisation of the additive operations shows linear speedup. |
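The key observation, the periodicity of the twiddle factors, can be illustrated with a minimal sketch. This is a simplification under assumed naming; the actual SPIFT algorithm, including the circular-shift reuse and the parallelisation of the additions, is more involved:

```python
import numpy as np

def spift_single_point_update(image, u, v, s):
    """Add the contribution of one new (u, v) sample with value s to an
    N x N sky image (illustrative sketch of the single-point idea).

    Because W^N = 1, only N distinct twiddle factors
    W^k = exp(2j*pi*k/N) exist, so the N products s * W^k are computed
    once (linear in N); the N^2 image cells then merely select which
    precomputed product to add, i.e. additions and index arithmetic only.
    """
    N = image.shape[0]
    # N complex multiplications: one per distinct twiddle factor
    products = s * np.exp(2j * np.pi * np.arange(N) / N)
    # (u*x + v*y) mod N picks the reusable product for each pixel
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    image += products[(u * x + v * y) % N] / (N * N)
    return image
```

Accumulating such updates over a stream of samples reproduces the inverse 2D DFT of the sample grid, which is why the incremental result agrees with recomputing the transform from scratch.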
|
Joachim Baumann, Driving Argument Mining with the Help of the Crowd: Crowdsourcing Argumentative Annotations in Scientific Papers, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
Huge volumes of human knowledge are available in many different data sources, many of which contain thoughtful and well-reasoned arguments in the form of natural language text. Along with recent advances in machine learning (ML) techniques, researchers have increasingly started to investigate possibilities to automatically extract argumentative components and the relations between them – a process which is called Argument Mining (AM). As an emerging research area, one of the major challenges in AM research is the lack of annotated datasets. These datasets are needed as training data as well as for benchmark experiments. With a focus on scientific publications, we implemented a system that could be used to create gold standard datasets for AM with the help of hundreds of thousands of ordinary workers (i.e. the crowd). Our system includes two types of tasks, one for the annotation of argument components and one for the annotation of argumentative relations that hold between those components. To evaluate and improve our system, we conducted experiments for both of these task types with 70 participants on the crowdsourcing platform Amazon Mechanical Turk. Detecting argumentative components and relations is a very complex task, especially for untrained, non-expert crowdworkers. We found that by introducing a quality assurance filter mechanism, it is possible to detect high-performing workers and to prevent workers who are expected to perform poorly from participating. In this way, it is possible, to some extent, to steer the quality of the crowd-annotated dataset, in exchange for money and time – money, because workers need to complete the task that will determine whether they will be filtered out or not, and time, because filtering out workers results in a smaller workforce, meaning it could take longer for all annotation tasks to be completed by the crowd. 
Our work marks another step towards effective interaction between researchers and the crowd in the field of AM and thereby contributes to an emerging research area. |
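A minimal sketch of such a quality-assurance filter follows. The function name, data shapes, and the accuracy-threshold criterion are illustrative assumptions, not the thesis' exact mechanism:

```python
def qualify_workers(worker_answers, gold, min_accuracy=0.8):
    """Keep only workers whose answers on gold-standard questions meet an
    accuracy threshold (hypothetical sketch of a qualification filter).

    `worker_answers` maps worker id -> {question id: answer};
    `gold` maps question id -> correct answer.
    Returns a dict of qualified worker ids with their accuracy.
    """
    qualified = {}
    for worker, answers in worker_answers.items():
        correct = sum(1 for q, a in answers.items() if gold.get(q) == a)
        accuracy = correct / len(answers) if answers else 0.0
        if accuracy >= min_accuracy:
            qualified[worker] = accuracy
    return qualified
```

Running such a filter before the main annotation tasks is what trades money (workers are paid for the qualification round) and time (a smaller qualified workforce) for dataset quality, as described above.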
|
Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, Markus Christen, Abraham Bernstein, Implementations in Machine Ethics: A Survey, ACM Computing Surveys, Vol. 53 (6), 2020. (Journal Article)
Increasingly complex and autonomous systems require machine ethics to maximize the benefits and minimize the risks to society arising from the new technology. It is challenging to decide which type of ethical theory to employ and how to implement it effectively. This survey provides a threefold contribution. First, it introduces a trimorphic taxonomy to analyze machine ethics implementations with respect to their object (ethical theories), as well as their nontechnical and technical aspects. Second, an exhaustive selection and description of relevant works is presented. Third, applying the new taxonomy to the selected works, dominant research patterns, and lessons for the field are identified, and future directions for research are suggested. |
|
Matthias Baumgartner, Luca Rossetto, Abraham Bernstein, Towards Using Semantic-Web Technologies for Multi-Modal Knowledge Graph Construction, In: MM '20: The 28th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2020-11-12. (Conference or Workshop Paper published in Proceedings)
While a multitude of approaches for extracting semantic information from multimedia documents has emerged in recent years, isolating any form of holistic semantic representation from a larger type of document, such as a movie, is not yet feasible. In this paper, we present the approaches we used in the first instance of the Deep Video Understanding Challenge, combining several multi-modal detectors with an integration scheme informed by methods from the semantic web context, in order to determine the capabilities and limitations of currently available methods for the extraction of semantic relations between the characters and locations relevant to the narrative of a movie. |
|
Ralph Gasser, Luca Rossetto, Silvan Heller, Heiko Schuldt, Cottontail DB: An Open Source Database System for Multimedia Retrieval and Analysis, In: MM '20: The 28th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2020-11-12. (Conference or Workshop Paper published in Proceedings)
Multimedia retrieval and analysis are two important areas in "Big data" research. They have in common that they work with feature vectors as proxies for the media objects themselves. Together with metadata such as textual descriptions or numbers, these vectors describe a media object in its entirety, and must therefore be considered jointly for both storage and retrieval.
In this paper we introduce Cottontail DB, an open source database management system that integrates support for scalar and vector attributes in a unified data and query model that allows for both Boolean retrieval and nearest neighbour search. We demonstrate that Cottontail DB scales well to large collection sizes and vector dimensions and provide insights into how it proved to be a valuable tool in various use cases ranging from the analysis of MRI data to realizing retrieval solutions in the cultural heritage domain. |
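The combination of Boolean retrieval and nearest-neighbour search over the same records can be sketched as follows. This is an illustrative simplification with assumed names, not Cottontail DB's actual data or query model:

```python
import numpy as np

def boolean_then_knn(records, predicate, query_vec, k):
    """Filter records by a scalar (Boolean) predicate, then rank the
    survivors by Euclidean distance of their feature vector to the query
    (sketch of jointly querying scalar and vector attributes).
    """
    survivors = [r for r in records if predicate(r)]
    survivors.sort(key=lambda r: np.linalg.norm(np.asarray(r["vec"]) - query_vec))
    return survivors[:k]
```

Evaluating the predicate first shrinks the candidate set before the comparatively expensive vector distance computations, which is one common motivation for handling both attribute types in one system.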
|
Romana Pernisch, Mirko Serbak, Daniele Dell'Aglio, Abraham Bernstein, ChImp: Visualizing Ontology Changes and their Impact in Protégé, In: Visualization and Interaction for Ontologies and Linked Data, co-located with ISWC2020, CEUR-WS.org, 2020-11-02. (Conference or Workshop Paper published in Proceedings)
Today, ontologies are an established part of many applications and research.
However, ontologies evolve over time, and ontology editors---engineers and domain experts---need to be aware of the consequences of changes while editing.
Ontology editors might not be fully aware of how they are influencing consistency, quality, or the structure of the ontology, possibly causing applications to fail.
To support editors and increase their sensitivity towards the consequences of their actions, we conducted a user survey to elicit preferences for representing changes, e.g., with ontology metrics such as number of classes and properties.
Based on the survey, we developed ChImp---a Protégé plug-in to display information about the impact of changes in real-time.
During editing of the ontology, ChImp lists the applied changes, checks and displays the consistency status, and reports measures describing the effect on the structure of the ontology.
Akin to software IDEs and integrated testing approaches, we hope that displaying such metrics will help to improve ontology evolution processes in the long run. |
|
Mirko Serbak, Protégé Plugin for Change and Impact Visualization, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
With the emergence of the Semantic Web, the application of ontologies has increased in many different fields. Along with that, the development of ontologies has become an active and diverse research field. One as yet unexplored aspect is that many ontology developers are unaware of the consequences of their modifications (Pernischová et al., 2020). To address this problem, this thesis presents ChImp, a Protégé plugin that displays change impact information about an ontology. Furthermore, the thesis also covers an evaluation comprising a technical analysis and a user experiment. The technical evaluation resulted in the conclusion that the plugin is generally stable and expected to scale to large ontologies. The user experiment showed that developers generally like the visualization of the plugin. The thesis was not able to determine whether the plugin conveys a perceived information benefit. |
|
Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Abraham Bernstein, A Knowledge Graph-based System for Retrieval of Lifelog Data, In: International Semantic Web Conference, CEUR-WS, 2020-11-01. (Conference or Workshop Paper published in Proceedings)
|
|
Mahnaz Amiri Parian, Luca Rossetto, Heiko Schuldt, Stéphane Dupont, Are You Watching Closely? Content-Based Retrieval of Hand Gestures, In: Proceedings of the 2020 International Conference on Multimedia Retrieval, ACM Digital library, New York, NY, USA, 2020-10-26. (Conference or Workshop Paper published in Proceedings)
|
|
Daniela Flüeli, MARG: Automatic Visualization of a Data Science Notebook's Narrative: Further Development of a Prototype, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Bachelor's Thesis)
The high flexibility of computational notebooks with respect to code organization and execution supports the generally non-linear and iterative way in which data scientists work, which is why notebooks are a tool they use frequently. However, the same flexibility makes many notebooks difficult to comprehend.
This bachelor thesis presents the Jupyter extension MARG 2.0, a visualization plugin which aims to improve notebooks' comprehensibility. It offers the user an interactive and dynamic tree diagram that visualizes the workflow structure of the notebook cells and allows them to keep track of their exploration. The tree shows additional information for the individual cells, such as their position in the linear cell sequence, their place in the workflow, the type of data science activity performed in them, their execution numbers, and the rationale and intent of their code. The visualization facilitates navigating and orientating oneself within a notebook during and after development. The additional information can be entered and modified directly by the user via the MARG user interface, whereupon the tree diagram is updated dynamically. MARG also includes a dashboard that can be used to analyze the development of a computational notebook. |
|
Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, Markus Christen, Abraham Bernstein, Implementations in Machine Ethics: A Survey, In: ArXiv.org, No. 07573, 2020. (Working Paper)
Increasingly complex and autonomous systems require machine ethics to maximize the benefits and minimize the risks to society arising from the new technology. It is challenging to decide which type of ethical theory to employ and how to implement it effectively. This survey provides a threefold contribution. First, it introduces a trimorphic taxonomy to analyze machine ethics implementations with respect to their object (ethical theories), as well as their nontechnical and technical aspects. Second, an exhaustive selection and description of relevant works is presented. Third, applying the new taxonomy to the selected works, dominant research patterns, and lessons for the field are identified, and future directions for research are suggested. |
|
Daniele Dell'Aglio, Abraham Bernstein, Differentially private stream processing for the semantic web, In: The Web Conference 2020, ACM, New York, NY, USA, 2020-09-20. (Conference or Workshop Paper published in Proceedings)
Data often contains sensitive information, which poses a major obstacle to publishing it. Some suggest obfuscating the data or releasing only some data statistics. These approaches have, however, been shown to provide insufficient safeguards against de-anonymisation. Recently, differential privacy (DP), an approach that injects noise into the query answers to provide statistical privacy guarantees, has emerged as a solution to release sensitive data. This study investigates how to continuously release privacy-preserving histograms (or distributions) from online streams of sensitive data by combining DP and semantic web technologies. We focus on distributions, as they are the basis for many analytic applications. Specifically, we propose SihlQL, a query language that processes RDF streams in a privacy-preserving fashion. SihlQL builds on top of SPARQL and the w-event DP framework. We show how some peculiarities of w-event privacy constrain the expressiveness of SihlQL queries. Addressing these constraints, we propose an extension of w-event privacy that provides answers to a larger class of queries while preserving their privacy. To evaluate SihlQL, we implemented a prototype engine that compiles queries to Apache Flink topologies and studied its privacy properties using real-world data from an IPTV provider and an online e-commerce web site. |
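The basic DP histogram release that underlies such systems can be sketched with the classic Laplace mechanism. This is illustrative background only, not the SihlQL/w-event machinery described in the paper:

```python
import numpy as np

def dp_histogram(counts, epsilon, rng=None):
    """Release a differentially private histogram via the Laplace
    mechanism (textbook sketch). Adding or removing one individual
    changes a single bin count by 1, so the L1 sensitivity is 1 and
    Laplace noise with scale 1/epsilon yields epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    return np.asarray(counts, dtype=float) + noise
```

In a streaming (w-event) setting, the privacy budget additionally has to be allocated across the releases within each window of w timestamps, which is where the constraints discussed above arise.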
|
David Lay, Knowledge Graph Driven Text Generation Using Transformers, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
Understanding the semantics and interpreting the information inside a knowledge graph is challenging for an untrained user. To ease access to this knowledge, we investigate how natural-language-like sentences can be generated from a sequence of knowledge graph entities and the relations between them. Whereas early work is based on template-like architectures or specialized encoder-decoder architectures, this work focuses on the use of Transformers and large pretrained language models. To deal with real-world knowledge graphs and text across many different domains, we incorporate the T-REx dataset, which aligns Wikidata entities and relations with Wikipedia articles. We compare the performance of baseline models and fine-tuned large pretrained language models on the task of generating Wikipedia-like sentences. Additionally, we show the impact of using an input sequence of Wikidata IDs over an input sequence of the corresponding labels. By training over 60 different model configurations, we perform an exhaustive parameter search to investigate our models. Results suggest that fine-tuning a pretrained language model outperforms the trained baseline model with respect to generating natural-language-like sentences. Furthermore, we show that training using entity IDs instead of their respective labels requires task-specific adaptations with which the proposed models have difficulties. |
|
Santiago Cepeda, Mining Data Management Tasks in Computational Notebooks: an Empirical Analysis, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
The aim of this thesis is to further our understanding of how data scientists work, specifically with regard to data management tasks. The motivation behind this goal is the prevalent gap in empirical evidence showcasing concrete data management tasks in data science and the role they play in relation to the entire data science process. The main focus has been narrowed down to data cleaning and data integration tasks within data management. This goal was achieved by labelling, mining, and applying statistical tests to real-world data science notebooks. A keyword labelling system was created in the process, which was able to identify and label multiple types of cells within notebooks. The end result is three annotated datasets, one for each notebook type identified during this thesis: simple descriptive, descriptive mining, and predictive mining notebooks.
Based on the empirical analysis, it can be concluded that, on average, there are 6.56 data cleaning tasks and 5.38 data integration tasks per notebook across all notebook types. Furthermore, on average, between 5.7 and 6.9 files are imported inside a notebook. The results also indicate that data cleaning amounts on average to between 10.18% and 10.98% of an entire notebook, depending on the notebook type; for data integration tasks it is between 9.55% and 11.31%. This research also backs Krishnan et al.'s (2016) claim that data cleaning is a non-linear and iterative process. Moreover, this thesis has shown that data integration is a non-linear and iterative process as well.
References
Krishnan, S., Haas, D., Franklin, M. J., and Wu, E. (2016). Towards reliable interactive data cleaning: A user survey and recommendations. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, pages 1-5. |
|
Silvan Heller, Loris Sauter, Heiko Schuldt, Luca Rossetto, Multi-Stage Queries and Temporal Scoring in Vitrivr, In: 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, 2020-08-06. (Conference or Workshop Paper published in Proceedings)
The increase in multimedia data brings many challenges for retrieval systems, not only in terms of storage and processing requirements but also with respect to query formulation and retrieval models. Querying approaches which work well up to a certain size of a multimedia collection might start to decrease in performance when applied to larger volumes of data. In this paper, we present two extensions made to the retrieval model of the open-source content-based multimedia retrieval stack vitrivr which enable a user to formulate more precise queries which can be evaluated in a staged manner, thereby improving the result quality without sacrificing the system’s overall flexibility. Our retrieval model has shown its scalability on V3C1, a video collection encompassing approx. 1000 hours of video. |
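The staged evaluation idea can be sketched as follows. This is an illustrative simplification with assumed names; it shows only the progressive pruning of candidates, while vitrivr's actual retrieval model also covers scoring and temporal aspects omitted here:

```python
def staged_query(collection, stages):
    """Evaluate a multi-stage query (sketch, not the actual vitrivr API):
    each stage scores only the candidates that survived the previous one,
    so cheap or highly selective stages prune the collection before
    expensive feature comparisons run.

    `stages` is a list of (score_fn, threshold) pairs applied in order.
    """
    candidates = list(collection)
    for score_fn, threshold in stages:
        candidates = [c for c in candidates if score_fn(c) >= threshold]
    return candidates
```

Ordering stages from cheap to expensive keeps the cost of later stages proportional to the already-narrowed candidate set rather than the full collection, which is what preserves performance as collections grow.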
|