The Deep Learning Lab at RPTU

Recent Publications

  • Khan et al. (2023). Learning Attention Propagation for Compositional Zero-Shot Learning. WACV. Compositional zero-shot learning aims to recognize unseen compositions of seen visual primitives of object classes and their states. While all primitives (states and objects) are observable during training in some combination, their complex interaction makes this task especially hard. For example, wet changes the visual appearance of a dog very differently from a bicycle. Furthermore, we argue that relationships between compositions go beyond shared states or objects. A cluttered office can contain a busy table; even though these compositions don't share a state or object, the presence of a busy table can signal the presence of a cluttered office. We propose a novel method called Compositional Attention Propagated Embedding (CAPE) as a solution. The key intuition behind our method is that a rich dependency structure exists between compositions, arising from the complex interaction of primitives as well as other inter-composition dependencies. CAPE learns to identify this structure and propagates knowledge between compositions to learn class embeddings for all seen and unseen compositions. In the challenging generalized compositional zero-shot setting, we show that our method outperforms previous baselines and sets a new state of the art on three publicly available benchmarks.
  • Hashmi et al. (2023). BoxMask: Revisiting Bounding Box Supervision for Video Object Detection. WACV. We present a new, simple yet effective approach to improve video object detection. We observe that prior works operate on instance-level feature aggregation that inherently neglects the refined pixel-level representation, resulting in confusion among objects sharing similar appearance or motion characteristics. To address this limitation, we propose BoxMask, which effectively learns discriminative representations by incorporating class-aware pixel-level information. We simply consider bounding box-level annotations as a coarse mask for each object to supervise our method. The proposed module can be effortlessly integrated into any region-based detector to boost detection. Extensive experiments on the ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvement when we plug our BoxMask module into numerous recent state-of-the-art methods. The code will be available at https://github.com/khurramHashmi/BoxMask.
  • Hussain et al. (2023). Development of Cost-Effective and Easily Replicable Robust Weeding Machine—Premiering Precision Agriculture in Pakistan. Machines. Weed management has become a highly labor-intensive activity, which is the reason for decreased yields and high costs. Moreover, the lack of skilled labor and of weed-resistant herbicides severely impacts the agriculture sector and food production, hence increasing the need for automation in agriculture. The use of agricultural robots will help ensure higher yields and proactive control of the crops. This study proposes a laser-based weeding vehicle with a unique mechanical body that is adjustable relative to the field structure and a Robot Operating System (ROS)-based robust control system; the design is customizable, cost-effective, and easily replicable. Hence, an autonomous mobile agricultural robot with a 20 W laser has been developed for the precise removal of weed plants. Testing of the assembled robot was conducted in the agro living lab. The field trials demonstrated that the robot takes approximately 23.7 h at a linear velocity of 0.07 m/s to weed a one-acre plot. This includes 5 s of laser exposure to kill each weed plant. By comparison, traditional weeding techniques are highly labor-intensive and take several days to cover an acre. The data presented herein reflect that implementing this technology could become an excellent approach to removing unwanted plants from agricultural fields. This solution is relatively cost-efficient and provides an alternative to expensive human labor in the face of increasing labor wages.
  • Afzal et al. (2022). DeHyFoNet: Deformable Hybrid Network for Formula Detection in Scanned Document Images. This work presents an approach for detecting mathematical formulas in scanned document images. The proposed approach is end-to-end trainable. Since many OCR engines cannot reliably handle formulas, it is essential to isolate them to obtain clean text for information extraction from the document. Our proposed pipeline comprises a hybrid task cascade network with deformable convolutions and a ResNeXt-101 backbone. Both modifications improve detection. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets and achieve an overall accuracy of 96% on the ICDAR-2017 POD dataset, corresponding to an overall error reduction of 13%. Furthermore, the results on the Marmot dataset improve for both isolated and embedded formulas: we achieve an accuracy of 98.78% for isolated formulas and 90.21% for embedded formulas, yielding error reduction rates of 43% and 17.9%, respectively.
  • Kanchi et al. (2022). EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data. Applied Sciences. Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches for document classification, known as image-based and multimodal approaches. Image-based document classification approaches rely solely on the inherent visual cues of the document images. In contrast, the multimodal approach co-learns the visual and textual features, and it has proved to be more effective. Nonetheless, these approaches require a huge amount of data. This paper presents a novel approach for document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The hierarchical attention network in the textual stream uses dynamic word embeddings through fine-tuned BERT and incorporates both word-level and sentence-level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the network on Tobacco-3482 from scratch and still outperform the state of the art, obtaining an accuracy of 90.3%. This corresponds to a relative error reduction rate of 7.9%.
  • Hashmi et al. (2022). Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments. Sensors. In recent years, due to advancements in machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvements in deep learning, traditional approaches such as sliding windows and manual feature selection have been replaced with deep learning techniques. However, object detection algorithms, like any other vision task, struggle in low light, challenging weather, and crowded scenes. Such conditions are termed a challenging environment. This paper exploits pixel-level information to improve detection under such conditions. To this end, we build on the recently proposed hybrid task cascade network, which combines detection and segmentation heads collaboratively at different cascade levels. We evaluate the proposed methods on three challenging datasets, ExDark, CURE-TSD, and RESIDE, and achieve mAPs of 0.71, 0.52, and 0.43, respectively. Our experimental results assert the efficacy of the proposed approach.
  • Kallempudi et al. (2022). Toward Semi-Supervised Graphical Object Detection in Document Images. Future Internet. Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models necessitate a substantial amount of labeled data for training. This paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images to address this limitation. Our method is based on the recently proposed Soft Teacher mechanism and examines the effect of small fractions of labeled data on the classification and localization of graphical objects. On both the PubLayNet and IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, with a similar total mAP to the Faster R-CNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.
  • Naik et al. (2022). Investigating Attention Mechanism for Page Object Detection in Document Images. Applied Sciences. Page object detection in scanned document images is a complex task due to varying document layouts and diverse page objects. In the past, traditional methods such as Optical Character Recognition (OCR)-based techniques have been employed to extract textual information. However, these methods fail to comprehend complex page objects such as tables and figures. This paper addresses the localization and classification of graphical objects that visually summarize vital information in documents. Furthermore, this work examines the benefit of incorporating attention mechanisms in different object detection networks to perform page object detection on scanned document images. The models are built with Detectron2, a PyTorch-based framework. The proposed pipelines can be optimized end-to-end and are exhaustively evaluated on publicly available datasets such as DocBank, PubLayNet, and IIIT-AR-13K. The achieved results reflect the effectiveness of incorporating the attention mechanism for page object detection in documents.
  • Khan et al. (2022). Three-Dimensional Reconstruction from a Single RGB Image Using Deep Learning: A Review. J. Imaging. Performing 3D reconstruction from a single 2D input is a challenging problem that is trending in the literature. Until recently, it was an ill-posed optimization problem, but with the advent of learning-based methods, the performance of 3D reconstruction has significantly improved. Infinitely many different 3D objects can be projected onto the same 2D plane, which makes the reconstruction task very difficult. It is even more difficult for objects with complex deformations or no textures. This paper serves as a review of recent literature on 3D reconstruction from a single view, with a focus on deep learning methods from 2018 to 2021. Due to the lack of standard datasets or 3D shape representation methods, it is hard to compare all reviewed methods directly. However, this paper reviews different approaches for reconstructing 3D shapes as depth maps, surface normals, point clouds, and meshes, along with the various loss functions and metrics used to train and evaluate these methods.
  • Minouei et al. (2022). Continual Learning for Table Detection in Document Images. Applied Sciences. The growing amount of data demands methods that can gradually learn from new samples. However, it is not trivial to continually train a network. Retraining a network with new data usually results in a phenomenon called “catastrophic forgetting”. In a nutshell, the performance of the model on the previous data drops by learning from the new instances. This paper explores this issue in the table detection problem. While there are multiple datasets and sophisticated methods for table detection, the utilization of continual learning techniques in this domain has not been studied. We employed an effective technique called experience replay and performed extensive experiments on several datasets to investigate the effects of catastrophic forgetting. The results show that our proposed approach mitigates the performance drop by 15 percent. To the best of our knowledge, this is the first time that continual learning techniques have been adopted for table detection, and we hope this stands as a baseline for future research.
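The core idea behind BoxMask above, a bounding box treated as a coarse instance mask, can be illustrated with a small sketch. Note that the function below is our own illustration, not code from the paper, which integrates this supervision into a region-based detector:

```python
import numpy as np

def boxes_to_coarse_masks(boxes, height, width):
    """Rasterize bounding boxes into per-object binary masks.

    Each box (x1, y1, x2, y2) becomes a mask that is 1 inside the box
    and 0 elsewhere -- a coarse stand-in for a true instance mask.
    """
    masks = np.zeros((len(boxes), height, width), dtype=np.uint8)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # Clip to the image bounds and round to integer pixel coordinates.
        x1, y1 = max(0, int(round(x1))), max(0, int(round(y1)))
        x2, y2 = min(width, int(round(x2))), min(height, int(round(y2)))
        masks[i, y1:y2, x1:x2] = 1
    return masks

# Example: two boxes on a 64x64 image.
masks = boxes_to_coarse_masks([(10, 10, 30, 40), (0, 0, 64, 64)], 64, 64)
```

Such coarse masks can then supervise a segmentation head alongside the usual box regression and classification losses.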
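The Soft Teacher mechanism used in the semi-supervised work above follows a general pattern: a teacher model (a slowly updated copy of the student) predicts on unlabeled images, and only high-confidence detections are kept as pseudo labels for the student. The sketch below is a schematic illustration of that pattern; the names, threshold, and momentum are ours, not values from the paper:

```python
def filter_pseudo_labels(predictions, score_thresh=0.9):
    """Keep only high-confidence teacher detections as pseudo boxes."""
    return [p for p in predictions if p["score"] >= score_thresh]

def ema_update(teacher_params, student_params, momentum=0.999):
    """Exponential moving average: the teacher slowly tracks the student."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# A detection above the threshold becomes a pseudo label;
# the low-confidence one is discarded.
teacher_preds = [
    {"box": (4, 4, 20, 20), "label": "Table", "score": 0.97},
    {"box": (30, 8, 50, 18), "label": "Figure", "score": 0.42},
]
pseudo = filter_pseudo_labels(teacher_preds)

# After each student update step, the teacher weights drift toward the
# student: 0.9 * 1.0 + 0.1 * 0.0 = 0.9.
teacher = ema_update([1.0], [0.0], momentum=0.9)
```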
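Experience replay, the technique employed in the continual-learning paper above, keeps a small bounded buffer of past examples and mixes them into batches while training on new data. A minimal sketch using reservoir sampling (our illustration; the paper does not prescribe this exact buffer policy):

```python
import random

class ReplayBuffer:
    """Bounded buffer that retains a uniform sample of everything seen."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling: each example seen so far is retained
        # with equal probability capacity / seen.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

# Fill the buffer from a previous dataset, then draw old examples to
# mix into each new training batch.
buf = ReplayBuffer(capacity=100)
for old_example in range(1000):  # stand-in for the previous dataset
    buf.add(old_example)
replayed = buf.sample(16)
```

Mixing `replayed` examples into new batches is what counteracts catastrophic forgetting: the network keeps seeing (a sample of) the old distribution.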
To see more publications, please visit our publications page.

Project/Seminar Topics in WS 2022/2023

We are offering the following project (8 CP) and seminar (4 CP) topics in WS 2022/2023.

Visualization Recommender

Do any of these topics interest you? If you are interested in working with us, please write us an email. You can find more details on this page, including a list of projects and seminars our students have completed in the past. If you have your own topic idea in the field of Deep Learning, you can also write us about that!

Research Projects

Our team is also working on a number of research and programming projects. If you are interested in contributing to any of these projects as a developer or a research assistant, please contact us or drop by our office.

Active Projects

Completed Projects

Our History

The MindGarage originated as a movement of students, teachers, and friends interested in Deep Learning.

Winter 2013
Marcus Liwicki becomes a W3 substitute professor at the University of Kaiserslautern (after Prof. Breuel's departure) and delivers lectures on Artificial Neural Networks and Data Mining.
Summer 2015
The substitute professorship ends as all PhD students finish their work, and Marcus concentrates on his position as "Maître Assistant" at the University of Fribourg. As no lecturer offers a course on neural networks, Marcus guides the students through a self-study course on the topic. He also continues supervising motivated Master students and mentoring them for their future careers.
Summer 2016
Motivated by the students' wish for a course on the most recent Deep Learning technologies, and strongly supported by Werner Weiss and his company Insiders Technologies GmbH, Marcus decides to establish a new lecture on Very Deep Learning. The lecture covers bleeding-edge technologies as well as in-depth investigation and analysis of neural networks and other Deep Learning strategies.
Autumn 2016
To let students do meaningful exercises, Insiders offers high-end GPU computers, and the University of Kaiserslautern provides Marcus with three rooms that can be used for the exercises as well as for project and Master students supervised by Marcus. We call this independent "lab" the MindGarage.
The Future
We (team and supporters) strongly believe that the MindGarage is just at its starting point and will become the perfect place for motivated students to discover and explore deep learning, evolve their ideas, investigate novel research directions, and bring the newest technologies closer to industrial application!

Frequently Asked Questions

What is the MindGarage?

The MindGarage is a lab for deep learning activities under the guidance of Marcus Liwicki. It is connected to the teaching activities of Marcus in the area of deep learning, including the lecture "Very Deep Learning", his supervised projects and theses, and individual studies.

What is the purpose of the MindGarage?

The purpose of the MindGarage is to provide a platform for students and researchers to conduct research in the area of deep learning. The MindGarage is also a place where students can meet and discuss ideas.

How can I become a part of the MindGarage team?

If you are a student at RPTU Kaiserslautern, it is quite simple: take the Very Deep Learning lecture offered at the university and tell us you want to join our team. Our doors are open to highly motivated students who want to do cutting-edge deep learning research.

How can I join MindGarage if I am not a student of RPTU Kaiserslautern?

Send us your resume and a cover letter describing your motivation for joining the team, and we will get in touch with you.


At MindGarage, we believe that creativity and innovation are essential for advancing the field of Artificial Intelligence. That's why we provide an open and unconstrained environment for highly motivated students to explore the possibilities of Deep Learning. We encourage freedom of thought and creativity in tackling challenging problems, and we're always on the lookout for talented individuals to join our team. If you're passionate about AI and want to contribute to groundbreaking research in Deep Learning, we invite you to learn more about our lab and our projects.


Gottlieb-Daimler-Str. 48 (48-462),
67663 Kaiserslautern

Copyright © 2023 RPTU. All rights reserved.

Contact | Imprint | Privacy Policy