Conference Publications

Welcome to the Conference Papers section of MindGarage’s Publications page. Our Deep Learning lab has presented several papers at top-tier conferences worldwide, showcasing our latest research in the field of Artificial Intelligence, Machine Learning, and Neural Networks. Browse through our collection of conference papers below to learn more about our research findings and methodologies. Click on any paper to view its abstract, as well as links to the full paper and related resources. If you’re interested in collaborating with us or learning more about our research projects, don’t hesitate to get in touch using the contact form.


  • Khan et al. (2023). Learning Attention Propagation for Compositional Zero-Shot Learning. WACV. Compositional zero-shot learning aims to recognize unseen compositions of seen visual primitives of object classes and their states. While all primitives (states and objects) are observable during training in some combination, their complex interaction makes this task especially hard. For example, wet changes the visual appearance of a dog very differently from a bicycle. Furthermore, we argue that relationships between compositions go beyond shared states or objects. A cluttered office can contain a busy table; even though these compositions don't share a state or object, the presence of a busy table can guide the presence of a cluttered office. We propose a novel method called Compositional Attention Propagated Embedding (CAPE) as a solution. The key intuition to our method is that a rich dependency structure exists between compositions arising from complex interactions of primitives in addition to other dependencies between compositions. CAPE learns to identify this structure and propagates knowledge between them to learn class embedding for all seen and unseen compositions. In the challenging generalized compositional zero-shot setting, we show that our method outperforms previous baselines to set a new state-of-the-art on three publicly available benchmarks.
  • Hashmi et al. (2023). BoxMask: Revisiting Bounding Box Supervision for Video Object Detection. WACV. We present a new, simple yet effective approach to uplift video object detection. We observe that prior works operate on instance-level feature aggregation that imminently neglects the refined pixel-level representation, resulting in confusion among objects sharing similar appearance or motion characteristics. To address this limitation, we pro-pose BoxMask, which effectively learns discriminative representations by incorporating class-aware pixel-level information. We simply consider bounding box-level annotations as a coarse mask for each object to supervise our method. The proposed module can be effortlessly integrated into any region-based detector to boost detection. Extensive experiments on ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvement when we plug our BoxMask module into numerous recent state-of-the-art methods. The code will be available at https://github. com/khurramHashmi/BoxMask.


  • Pontes-Filho et al. (2018). Bidirectional Learning for Robust Neural Networks. IJCNN. A multilayer perceptron can behave as a generative classifier by applying bidirectional learning (BL). It consists of training an undirected neural network to map input to output and vice-versa; therefore it can produce a classifier in one direction, and a generator in the opposite direction for the same data. In this paper, two novel learning techniques are introduced which use BL for improving robustness to white noise static and adversarial examples. The first method is bidirectional propagation of errors, which the error propagation occurs in backward and forward directions. Motivated by the fact that its generative model receives as input a constant vector per class, we introduce as a second method the hybrid adversarial networks (HAN). Its generative model receives a random vector as input and its training is based on generative adversarial networks (GAN). To assess the performance of BL, we perform experiments using several architectures with fully and convolutional layers, with and without bias. Experimental results show that both methods improve robustness to white noise static and adversarial examples, but have different behaviour depending on the architecture and task, being more beneficial to use the one or the other. Nevertheless, HAN using a convolutional architecture with batch normalization presents outstanding robustness, reaching state-of-the-art accuracy on adversarial examples of hand-written digits.
  • Kölsch et al. (2018). Recognizing Challenging Handwritten Annotations with Fully Convolutional Networks. ICFHR. This paper introduces a very challenging dataset of historic German documents and evaluates Fully Convolutional Neural Network (FCNN) based methods to locate handwritten annotations of any kind in these documents. The handwritten annotations can appear in form of underlines and text by using various writing instruments, e.g., the use of pencils makes the data more challenging. We train and evaluate various end-to-end semantic segmentation approaches and report the results. The task is to classify the pixels of documents into two classes: background and handwritten annotation. The best model achieves a mean Intersection over Union (IoU) score of 95.6% on the test documents of the presented dataset. We also present a comparison of different strategies used for data augmentation and training on our presented dataset. For evaluation, we use the Layout Analysis Evaluator for the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts.


  • Afzal et al. (2017). Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification. ICDAR. We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half. Existing approaches, such as the DeepDoc-Classifier, apply standard Convolutional Network architectures with transfer learning from the object recognition domain. The contribution of the paper is threefold: First, it investigates recently introduced very deep neural network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from real images). Second, it proposes transfer learning from a huge set of document images, i.e. 400; 000 documents. Third, it analyzes the impact of the amount of training data (document images) and other parameters to the classification abilities. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP dataset. We achieve an accuracy of 91:13% for the Tobacco-3482 dataset while earlier approaches reach only 77:6%. Thus, a relative error reduction of more than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90:97% is achieved, corresponding to a relative error reduction of 11:5%.
  • Kölsch et al. (2017). Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines. ICDAR. This paper presents an approach for real-time training and testing for document image classification. In production environments, it is crucial to perform accurate and (time-)efficient training. Existing deep learning approaches for classifying documents do not meet these requirements, as they require much time for training and fine-tuning the deep architectures. Motivated from Computer Vision, we propose a two-stage approach. The first stage trains a deep network that works as feature extractor and in the second stage, Extreme Learning Machines (ELMs) are used for classification. The proposed approach outperforms all previously reported structural and deep learning based methods with a final accuracy of 83.24 % on Tobacco-3482 dataset, leading to a relative error reduction of 25% when compared to a previous Convolutional Neural Network (CNN) based approach (DeepDocClassifier). More importantly, the training time of the ELM is only 1.176 seconds and the overall prediction time for 2, 482 images is 3.066 seconds. As such, this novel approach makes deep learning-based document classification suitable for large-scale real-time applications.
  • Kölsch et al. (2017). Multilevel Context Representation for Improving Object Recognition. ICDAR. In this work, we propose the combined usage of low- and high-level blocks of convolutional neural networks (CNNs) for improving object recognition. While recent research focused on either propagating the context from all layers, e.g. ResNet, (including the very low-level layers) or having multiple loss layers (e.g. GoogLeNet), the importance of the features close to the higher layers is ignored. This paper postulates that the use of context closer to the high-level layers provides the scale and translation invariance and works better than using the top layer only. In particular, we extend AlexNet and GoogLeNet by additional connections in the top n layers. In order to demonstrate the effectiveness of the proposed approach, we evaluated it on the standard ImageNet task. The relative reduction of the classification error is around 1-2% without affecting the computational cost. Furthermore, we show that this approach is orthogonal to typical test data augmentation techniques, as recently introduced by Szegedy et al. (leading to a runtime reduction of 144 during test time).
  • Singh et al. (2017). Transforming Sensor Data to the Image Domain for Deep Learning - an Application to Footstep Detection. IJCNN. Convolutional Neural Networks (CNNs) have become the state-of-the-art in various computer vision tasks, but they are still premature for most sensor data, especially in pervasive and wearable computing. A major reason for this is the limited amount of annotated training data. In this paper we propose the idea of leveraging the discriminative power of pre-trained deep CNNs on 2-dimensional sensor data by transforming the sensor modality to the visual domain. By three proposed strategies, 2D sensor output is converted into pressure distribution imageries. Then we utilize a pre-trained CNN for transfer learning on the converted imagery data. We evaluate our method on a gait dataset of floor surface pressure mapping. We obtain a classification accuracy of 87.66%, which outperforms the conventional machine learning methods by over 10%.


At MindGarage, we believe that creativity and innovation are essential for advancing the field of Artificial Intelligence. That's why we provide an open and unconstrained environment for highly motivated students to explore the possibilities of Deep Learning. We encourage freedom of thought and creativity in tackling challenging problems, and we're always on the lookout for talented individuals to join our team. If you're passionate about AI and want to contribute to groundbreaking research in Deep Learning, we invite you to learn more about our lab and our projects.


Gottlieb-Daimler-Str. 48 (48-462),
67663 Kaiserslautern

Copyright © 2023 RPTU. All rights reserved.

Contact | Imprint | Privacy Policy