2023
2
Abstract
Predicting pedestrian motion is essential for developing socially-aware robots that interact in crowded environments. While the natural visual perspective for a social interaction setting is an egocentric view, the majority of existing work in trajectory prediction has been conducted purely in top-down trajectory space. To support first-person view trajectory prediction research, we present T2FPV, a method for constructing high-fidelity first-person view (FPV) datasets given a real-world, top-down trajectory dataset; we showcase our approach on the ETH/UCY pedestrian dataset to generate the egocentric visual data of all interacting pedestrians, creating the T2FPV-ETH dataset. In this setting, FPV-specific errors arise due to imperfect detection and tracking, occlusions, and field-of-view (FOV) limitations of the camera. To address these errors, we propose CoFE, a module that refines the imputation of missing data end-to-end with trajectory forecasting algorithms. Our method reduces the impact of such FPV errors on downstream prediction performance, decreasing displacement error by more than 10% on average. To facilitate research engagement, we release our T2FPV-ETH dataset and software tools.
T2FPV: Dataset and Method for Correcting First-Person View Errors in Pedestrian Trajectory Prediction
2023
Benjamin Stoler, Meghdeep Jana, Soonmin Hwang, Jean Oh
International Conference on Intelligent Robots and Systems (IROS)
Conference
Trajectory Prediction
First-Person View
Simulation
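The CoFE module described above learns to impute missing FPV detections jointly with the downstream forecaster. A rough, hypothetical sketch of that end-to-end pattern in PyTorch (this is not the paper's CoFE architecture; the layers, tensor shapes, and loss are placeholders):

```python
# Hypothetical sketch of end-to-end imputation + forecasting in the spirit of
# CoFE; NOT the paper's architecture. Layers, shapes, and loss are placeholders.
import torch
import torch.nn as nn

class Imputer(nn.Module):
    """Refines a gap-filled observed track; observed points are kept fixed."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, obs, mask):
        # obs:  (B, T, 2) track with gaps pre-filled (e.g., by interpolation)
        # mask: (B, T, 1) with 1 where a detection was actually observed
        h, _ = self.rnn(obs)
        refined = obs + self.head(h)               # residual correction
        return mask * obs + (1 - mask) * refined   # only rewrite the gaps

class Forecaster(nn.Module):
    """Encodes the refined observation and regresses future positions."""
    def __init__(self, hidden=64, t_fut=12):
        super().__init__()
        self.t_fut = t_fut
        self.rnn = nn.GRU(2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * t_fut)

    def forward(self, obs):
        _, h = self.rnn(obs)                       # h: (1, B, hidden)
        return self.head(h[-1]).view(-1, self.t_fut, 2)

imputer, forecaster = Imputer(), Forecaster()
opt = torch.optim.Adam(list(imputer.parameters()) + list(forecaster.parameters()))
obs = torch.randn(8, 8, 2)                         # dummy observed tracks
mask = torch.randint(0, 2, (8, 8, 1)).float()      # dummy visibility mask
fut_gt = torch.randn(8, 12, 2)                     # dummy ground-truth futures
opt.zero_grad()
loss = (forecaster(imputer(obs, mask)) - fut_gt).norm(dim=-1).mean()  # ADE-style
loss.backward()                                    # gradients reach the imputer
opt.step()
```

Because the forecast loss backpropagates through the imputer, the imputation is shaped by what helps prediction, which is the end-to-end refinement idea the abstract describes.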
Abstract
Despite the increasing popularity of LiDAR sensors, perception algorithms using 3D LiDAR data struggle with the ‘sensor-bias problem’: the performance of perception algorithms drops significantly when an unseen LiDAR sensor specification is used at test time, due to the domain discrepancy. This paper presents a fast and flexible LiDAR augmentation method for the semantic segmentation task, called ‘LiDomAug’. It aggregates raw LiDAR scans and creates a LiDAR scan of any configuration, accounting for dynamic distortion and occlusion, resulting in instant domain augmentation. Our on-demand augmentation module runs at 330 FPS, so it can be seamlessly integrated into the data loader of a learning framework. In our experiments, learning-based approaches aided by the proposed LiDomAug are less affected by the sensor-bias issue and achieve new state-of-the-art domain adaptation performance on the SemanticKITTI and nuScenes datasets without the use of target-domain data. We also present a sensor-agnostic model that works faithfully across various LiDAR configurations.
Instant Domain Augmentation for LiDAR Semantic Segmentation
2023
Kwonyoung Ryu*, Soonmin Hwang*, Jaesik Park
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference
Sensor-Bias Problem
Point Cloud Semantic Segmentation
Efficient Data Augmentation
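The core of the re-rendering step described above can be sketched as a spherical projection of the aggregated cloud into a target sensor's range image, with a z-buffer for occlusion. A minimal numpy sketch, assuming a hypothetical 32-beam target configuration and omitting the paper's dynamic-distortion modeling:

```python
# Minimal sketch of re-rendering an aggregated point cloud as the range image
# of a target LiDAR configuration; beam count and FOV below are hypothetical,
# and the paper's dynamic-distortion modeling is omitted.
import numpy as np

def render_range_image(points, n_beams=32, fov_up=10.0, fov_down=-30.0, width=1024):
    """points: (N, 3) aggregated cloud in the target sensor's frame."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])           # azimuth in [-pi, pi]
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-9))  # elevation angle
    up, down = np.radians(fov_up), np.radians(fov_down)
    u = (((yaw + np.pi) / (2 * np.pi)) * width).astype(int) % width
    v = np.clip(((up - pitch) / (up - down) * n_beams).astype(int), 0, n_beams - 1)
    img = np.full((n_beams, width), np.inf)
    np.minimum.at(img, (v, u), r)     # z-buffer: nearest return wins (occlusion)
    img[np.isinf(img)] = 0.0          # pixels with no return
    return img

# Example: render a random cloud into the hypothetical 32-beam scan.
print(render_range_image(np.random.randn(10000, 3) * 10).shape)  # (32, 1024)
```

Varying `n_beams`, the FOV, and `width` per batch is what turns this projection into an on-the-fly domain augmentation over sensor configurations.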
2022
1
Abstract
Recently, transformers have been widely adopted for various computer vision tasks and show promising results due to their ability to effectively encode long-range spatial dependencies in an image. However, very few studies on adopting transformers for self-supervised depth estimation have been conducted. When replacing the CNN architecture with a transformer in self-supervised learning of depth, we encounter several problems, such as a problematic multi-scale photometric loss function when used with transformers and an insufficient ability to capture local details. In this letter, we propose an attention-based decoder module, Pixel-Wise Skip Attention (PWSA), to enhance fine details in feature maps while keeping the global context from transformers. In addition, we propose utilizing a self-distillation loss together with a single-scale photometric loss to alleviate the instability of transformer training by providing correct training signals. We demonstrate that the proposed model performs accurate predictions on large objects and thin structures that require global context and local details. Our model achieves state-of-the-art performance among self-supervised monocular depth estimation methods on the KITTI and DDAD benchmarks.
TransDSSL: Transformer Based Depth Estimation via Self-Supervised Learning
2022
Daechan Han, Jeongmin Shin, Namil Kim, Soonmin Hwang, Yukyung Choi
IEEE Robotics and Automation Letters (RA-L)
Journal
Transformer
Monocular Depth Estimation
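The single-scale photometric loss mentioned above is, across the self-supervised depth literature, typically an SSIM + L1 mix between the warped source view and the target view. A generic PyTorch version (standard Monodepth-style weighting; not necessarily the paper's exact implementation):

```python
# Generic single-scale photometric loss (SSIM + L1) as commonly used in
# self-supervised depth estimation; weighting follows the usual 0.85 mix.
import torch
import torch.nn.functional as F

def ssim_dist(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    # 3x3 mean filters approximate the local statistics used by SSIM
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sig_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sig_x + sig_y + C2)
    return torch.clamp((1 - num / den) / 2, 0, 1)   # 0 means identical patches

def photometric_loss(warped, target, alpha=0.85):
    """Single-scale photometric loss between a warped source and the target."""
    l1 = (warped - target).abs().mean(1, keepdim=True)
    return (alpha * ssim_dist(warped, target).mean(1, keepdim=True)
            + (1 - alpha) * l1).mean()

# Toy usage on random (B, 3, H, W) images.
print(photometric_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)))
```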
2019
1
Abstract
Despite recent advances in machine learning, it is still challenging to realize real-time and accurate detection in images. The recently proposed StairNet detector (Sanghyun et al. WACV 2018), one of the strongest one-stage detectors, tackles this issue by using an SSD in conjunction with a top-down enrichment module. However, the StairNet approach misses the finer localization information that can be obtained from the lower layers and lacks a feature selection mechanism, which can lead to suboptimal features during the merging step. In this paper, we propose what is termed the gated bidirectional feature pyramid network (GBFPN), a simple and effective architecture that provides a significant improvement over the baseline model, StairNet. The overall network is composed of three parts: a bottom-up pathway, a top-down pathway, and a gating module. Given the multi-scale feature pyramid of a deep convolutional network, the two separate pathways introduce both finer localization cues and high-level semantics. In each pathway, the gating module dynamically re-weights the features before the combining step, transmitting only the informative features. Placing GBFPN on top of the basic one-stage detector SSD, our method shows state-of-the-art results.
Gated Bidirectional Feature Pyramid Network for Accurate One-shot Detection
2019
Sanghyun Woo, Soonmin Hwang, Ho-Deok Jang, In So Kweon
Machine Vision and Applications (MVA)
Journal
Object Detection
One-stage Object Detector
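To illustrate the gating idea above (dynamically re-weighting features before the combining step), here is a toy PyTorch merge block using channel-wise sigmoid gates; the paper's actual gating design may differ:

```python
# Toy gated merge of a lateral feature with a top-down feature; channel-wise
# sigmoid gates re-weight each stream before summation. Illustrative only,
# not the GBFPN module as published.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMerge(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate_lat = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_top = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, lateral, top_down):
        # Upsample the coarser top-down map to the lateral resolution, then
        # let each gate decide how much of its stream to transmit.
        top_down = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        return self.gate_lat(lateral) * lateral + self.gate_top(top_down) * top_down

m = GatedMerge(256)
out = m(torch.rand(1, 256, 64, 64), torch.rand(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 256, 64, 64])
```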
2018
4
Abstract
Robustness is one of the desired properties of many computer vision algorithms. Most existing efforts toward robustness have focused on naturally occurring changes, e.g., day/night and various weather conditions. However, real-world challenges also arise from accidental situations that are not anticipated at training time. In this paper, we address a practical multispectral fusion issue: unexpected image contamination in day and night conditions. Based on our observation that changing only a few parameters in the fusion part is enough to achieve good performance in such conditions, we propose a fault-tolerant training strategy for normal and abnormal conditions in multispectral pedestrian detection. Through extensive experiments on the KAIST multispectral benchmarks, the proposed method significantly reduces the performance degradation under unseen contamination. Furthermore, our model shows performance comparable with state-of-the-art methods in normal conditions.
Pedestrian Detection in the Wild: A Fault Tolerant Approach
2018
Soonmin Hwang, Namil Kim, Yukyung Choi, In So Kweon
Preprint
Multispectral
Pedestrian Detection
Fault Tolerant
Abstract
This paper presents an all-day dataset of paired multispectral 2D vision (RGB-thermal and RGB stereo) and 3D LiDAR (Velodyne 32E) data collected in campus and urban environments. In total, we captured 50 km of synchronized multi-sensor sequences at 25 Hz using a fully aligned visible and thermal device, high-resolution stereo visible cameras, and a high-accuracy GPS/IMU inertial navigation system. The dataset therefore contains multimodal data under various illumination conditions (day, night, sunset, and sunrise), which are of particular interest in autonomous driving-assistance tasks such as localization (place recognition, 6D SLAM), moving object detection (pedestrian or car), and scene understanding (drivable region). In this paper, we describe our dataset, including the recording platform, the data format, and software for MATLAB and C++ demonstrating how to load and use the data.
KAIST Multispectral Recognition Dataset in Day and Night
2018
Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyunghwan An, In So Kweon
IEEE Transactions on Intelligent Transportation Systems (T-ITS)
Journal
Multispectral
Aligned Visible and Thermal
LIDAR
GPS/IMU
Benchmark
Abstract
For real-world understanding, it is essential to perceive in all-day conditions, including cases that are not suitable for RGB sensors, especially at night. To move beyond these limitations, we propose a multispectral solution: depth estimation from an illumination-invariant thermal sensor without an additional depth sensor. Based on an analysis of multispectral properties and their relevance to depth prediction, we propose an efficient and novel multi-task framework, the Multispectral Transfer Network (MTN), to estimate a depth image from a single thermal image. By exploiting geometric priors and chromaticity, our model can generate a pixel-wise depth image in an unsupervised manner. Moreover, we propose a new type of multi-task module, the Interleaver, as a way to incorporate chromaticity and the fine details of skip connections into the depth estimation framework without sharing feature layers. Lastly, we describe a technical approach for training stably, covering a large disparity range, and extending thermal imagery to data-driven methods for all-day conditions. In experiments, we demonstrate better performance and generalization ability in depth estimation on our proposed multispectral stereo dataset, which includes various driving conditions.
Multispectral Transfer Network: Unsupervised Depth Estimation for All-day Vision
2018
Namil Kim*, Yukyung Choi*, Soonmin Hwang, In So Kweon
AAAI Conference on Artificial Intelligence (AAAI)
Conference
Multispectral Transfer Network
Depth Estimation
Thermal Image
Single Image
Abstract
One-stage object detectors such as SSD or YOLO already show promising accuracy with a small memory footprint and fast speed. However, it is widely recognized that one-stage detectors have difficulty in detecting small objects, while they are competitive with two-stage methods on large objects. In this paper, we investigate how to alleviate this problem starting from the SSD framework. Due to its pyramidal design, the lower layer that is responsible for small objects lacks strong semantics (e.g., contextual information). We address this problem by introducing a feature combining module that spreads out the strong semantics in a top-down manner. Our final StairNet detector unifies the multi-scale representations and semantic distribution effectively. Experiments on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets demonstrate that StairNet significantly alleviates the weakness of SSD and outperforms other state-of-the-art one-stage detectors.
StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection
2018
Sanghyun Woo, Soonmin Hwang, In So Kweon
IEEE Winter Conference on Applications of Computer Vision (WACV)
Conference
Object Detection
One-stage Object Detector
2016
3
Abstract
With the advent of commodity autonomous platforms, recognition under extreme conditions such as night or erratic illumination is increasingly required. This need has motivated approaches using multimodal sensors, which can be complementary to each other. A thermal camera provides a rich source of temperature information that is less affected by changing illumination or background clutter. However, existing thermal cameras have relatively lower resolution than RGB cameras, which makes it difficult to fully utilize their information in recognition tasks. To mitigate this, we aim to enhance the low-resolution thermal image, guided by an extensive analysis of existing approaches. To this end, we introduce Thermal Image Enhancement using a Convolutional Neural Network (CNN), called TEN, which directly learns an end-to-end mapping from a single low-resolution image to the desired high-resolution image. In addition, we examine various image domains to find the best representation for thermal enhancement. Overall, we propose the first CNN-based thermal image enhancement method guided by RGB data. We provide extensive experiments designed to evaluate image quality and performance on several object recognition tasks such as pedestrian detection, visual odometry, and image registration.
Thermal Image Enhancement using Convolutional Neural Network
2016
Yukyung Choi*, Namil Kim*, Soonmin Hwang*, In So Kweon
International Conference on Intelligent Robots and Systems (IROS)
Conference
Thermal Image Enhancement
Super Resolution
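The end-to-end low-to-high-resolution mapping described above follows the SRCNN recipe. A sketch assuming a bicubic-upsampled, single-channel thermal input; the filter sizes are illustrative, not the paper's configuration:

```python
# SRCNN-style sketch for thermal enhancement; the input is assumed to be a
# bicubic-upsampled single-channel thermal image, and the filter sizes are
# illustrative rather than the paper's exact configuration.
import torch
import torch.nn as nn

class TENSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),   # patch extraction
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),  # non-linear mapping
            nn.Conv2d(32, 1, 5, padding=2))                          # reconstruction

    def forward(self, lr_upsampled):
        return self.net(lr_upsampled)

print(TENSketch()(torch.rand(1, 1, 120, 160)).shape)  # torch.Size([1, 1, 120, 160])
```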
Abstract
For many robotics and intelligent-vehicle applications, detection and tracking of multiple objects (DATMO) is one of the most important components. However, most DATMO approaches are difficult to apply to real-world applications due to their high computational complexity. In this paper, we propose an efficient DATMO framework that fully employs the complementary information from a color camera and a 3D LIDAR. For high efficiency, we present a segmentation scheme that uses both 2D and 3D information and produces accurate segments very quickly. In our experiments, we show that our framework achieves a faster speed (∼4 Hz) than the state-of-the-art methods reported on the KITTI benchmark (>1 Hz).
Fast Multiple Objects Detection and Tracking Fusing Color Camera and 3D LIDAR for Intelligent Vehicles
2016
Soonmin Hwang*, Namil Kim*, Yukyung Choi, Seokju Lee, In So Kweon
International Conference on Ubiquitous Robots and Ambient Intelligence (URAI)
Conference
Object Detection
Tracking
Camera-LiDAR Fusion
Abstract
Drivable region detection is challenging since various road types, occlusions, and poor illumination conditions have to be considered in an outdoor environment, particularly at night. In the past decade, many efforts have been made to solve these problems; however, most existing methods are designed for visible-light cameras, which are inherently inefficient under low-light conditions. In this paper, we present a drivable region detection algorithm designed for thermal-infrared cameras in order to overcome the aforementioned problems. The novelty of the proposed method lies in the utilization of online road initialization with a highly scene-adaptive sampling mask. Furthermore, our prior road information extraction is tailored to enforce temporal consistency across a series of images. We also present a large set of experiments in various scenarios (on-road, off-road, and cluttered road). A total of about 6000 manually annotated images are made available on our website for the research community. Using this dataset, we compare our method against multiple state-of-the-art approaches, including convolutional neural network (CNN) based methods, to emphasize the robustness of our approach in challenging situations.
Thermal-Infrared based Drivable Region Detection
2016
Jae Shin Yoon, Kibaek Park, Namil Kim, Soonmin Hwang, Yukyung Choi, Francois Rameau, In So Kweon
IEEE Intelligent Vehicles Symposium (IV)
Conference
Thermal Image
Drivable Region
2015
5
Abstract
In this paper, we introduce a low-cost multi-camera synchronization approach. Our system is low-cost to make, easy to handle, and convenient to use. The proposed system can be employed with various single- and multi-spectral cameras, and with any device that supports an external trigger. As a result, our system shows good performance compared with hand-eye synchronization, and we also show that the synchronized images are suitable for use in ADAS systems.
Low-Cost Synchronization for Multispectral Cameras
2015
Soonmin Hwang, Yukyung Choi, Namil Kim, Kibaek Park, Jae Shin Yoon, In So Kweon
International Conference on Ubiquitous Robots and Ambient Intelligence (URAI)
Conference
Multispectral
Synchronization
Abstract
In this paper, we introduce a novel calibration pattern board for visible and thermal camera calibration. Our pattern board is easy to make, handy to move, and efficient to heat, and it preserves a uniform thermal radiance for a long time. The proposed method can be employed in single- and multi-spectral camera systems, and also in beam-splitter or stereo camera systems. As a result, our method shows good performance compared with previous works, and we also show that the calibrated system is suitable for use in ADAS systems.
Geometrical Calibration of Multispectral Cameras
2015
Namil Kim, Yukyung Choi, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, In So Kweon
International Conference on Ubiquitous Robots and Ambient Intelligence (URAI)
Conference
Multispectral
Calibration
Abstract
As people become more interested in paintings, various user-interactive search systems have been presented in recent times. Many systems encourage users to search for paintings using prior knowledge of paintings. We identify a limitation of existing methods in how well the query is represented by the user, and propose a simple yet effective way to search for a painting by exploiting color to express human visual memory. To achieve our goal, we suggest color clustering based on human color perception, and hierarchical metric learning to accommodate the locality of colors. With user-interactive drawing using the learned colors, the user completes an abstract image to resemble the visual memory. We show that our system is easy to use, fast to process, accurate in search, and fully extensible to cover deviations among users.
Artrieval: Painting Retrieval Without Expert Knowledge
2015
Namil Kim, Yukyung Choi, Soonmin Hwang, In So Kweon
IEEE International Conference on Image Processing (ICIP)
Conference
Painting Retrieval
Interactive Search
Color Clustering
Metric Learning
Abstract
With the increasing interest in pedestrian detection, pedestrian datasets have also been the subject of research in the past decades. However, most existing datasets focus on color channels, while a thermal channel is helpful for detection even in dark environments. With this in mind, we propose a multispectral pedestrian dataset which provides well-aligned color-thermal image pairs captured by special beam-splitter-based hardware. The color-thermal dataset is as large as previous color-based datasets and provides dense annotations, including temporal correspondences. With this dataset, we introduce multispectral ACF, an extension of aggregated channel features (ACF) that simultaneously handles color-thermal image pairs. Multispectral ACF reduces the average miss rate of ACF by 15% and achieves another breakthrough in the pedestrian detection task.
Multispectral Pedestrian Detection: Benchmark Dataset and Baselines
2015
Soonmin Hwang, Jaesik Park, Namil Kim, Yukyung Choi, In So Kweon
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference
Pedestrian Detection
Multispectral
Benchmark
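Aggregated channel features pool simple per-pixel channels (LUV color, gradient magnitude, and gradient histograms in the original ACF); the multispectral extension stacks channels from both modalities. A loose numpy illustration of the stacking idea only, using raw RGB planes and gradient magnitude instead of the full ACF channel set:

```python
# Loose illustration of stacking channel features from both modalities, as in
# multispectral ACF; real ACF uses LUV + gradient histograms, while here only
# RGB planes and gradient magnitude are used to show the stacking idea.
import numpy as np

def grad_mag(img):
    gy, gx = np.gradient(img.astype(np.float32))
    return np.sqrt(gx * gx + gy * gy)

def multispectral_channels(rgb, thermal):
    """rgb: (H, W, 3) uint8, thermal: (H, W) uint8 -> (6, H, W) channel stack."""
    chans = [rgb[..., 0], rgb[..., 1], rgb[..., 2],
             grad_mag(rgb.mean(axis=2)),          # color gradient magnitude
             thermal, grad_mag(thermal)]          # thermal intensity + gradients
    return np.stack([np.asarray(c, np.float32) for c in chans])

rgb = np.random.randint(0, 256, (480, 640, 3), np.uint8)
thermal = np.random.randint(0, 256, (480, 640), np.uint8)
print(multispectral_channels(rgb, thermal).shape)  # (6, 480, 640)
```

In the actual pipeline these channels would then be block-summed and fed to a boosted-tree detector; the thermal channels are what keep the features informative at night.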
Abstract
This paper introduces an all-day dataset captured on the KAIST campus for use in mobile robotics, autonomous driving, and recognition research. In total, we captured 42 km of sequences at 15∼100 Hz using multiple sensor modalities: fully aligned visible and thermal devices, high-resolution stereo visible cameras, and a high-accuracy GPS/IMU inertial navigation system. Despite the specific capture scenario, we provide the first aligned visible/thermal all-day dataset, including various illumination conditions: day, night, sunset, and sunrise. With this dataset, we introduce a multispectral loop detector as a baseline. We release all calibrated and synchronized data, and hope it supports a variety of state-of-the-art computer vision and robotics algorithms.
All-Day Visual Place Recognition: Benchmark Dataset and Baselines
2015
Yukyung Choi, Namil Kim, Kibaek Park, Soonmin Hwang, Jae Shin Yoon, Kyunghwan An, In So Kweon
Conference
Place Recognition
Multispectral
Benchmark
2014
1
Abstract
Most current pedestrian detectors have pursued a high detection rate without carefully considering sample distributions. In this paper, we argue that the following characteristics must be considered: 1) large intra-class variation of pedestrians (multi-modality), and 2) data imbalance between positives and negatives. Pedestrian detection can be regarded as a needle-in-a-haystack problem (rare class detection). Inspired by a rare-class detection technique, we propose a two-phase classifier integrating an existing baseline detector and a hard negative expert, conquering recall and precision separately. The main idea behind the hard negative expert is to reduce the sample space to be learned, so that informative decision boundaries can be learned effectively. The multi-modality problem is handled by a simple variant of LDA-based random forests serving as the hard negative expert. We integrate the two models optimally via learned integration rules. By virtue of the two-phase structure, our method achieves competitive performance with only a little additional computation. Our approach achieves a 38.44% mean miss rate on the reasonable setting of the Caltech Pedestrian Benchmark.
A Two Phase Approach for Pedestrian Detection
2014
Soonmin Hwang, Tae-hyun Oh, In So Kweon
Asian Conference on Computer Vision Workshops (ACCVw-IVVT)
Conference
Pedestrian Detection
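The two-phase structure above (a recall-oriented baseline followed by a precision-oriented hard negative expert on the survivors) can be sketched as follows; the scorers, thresholds, and score integration are placeholders for the learned components in the paper:

```python
# Sketch of the two-phase cascade; the scorers, thresholds, and the score
# integration below are placeholders for the learned components in the paper.
import numpy as np

def two_phase_scores(windows, baseline, expert, t1=-1.0, w=0.5):
    """windows: (N, D) candidate features; baseline/expert map (M, D) -> (M,)."""
    s1 = baseline(windows)
    final = np.full(s1.shape, -np.inf)
    keep = s1 > t1                          # phase 1: permissive, recall-oriented
    if keep.any():
        s2 = expert(windows[keep])          # phase 2: hard-negative expert
        final[keep] = w * s1[keep] + (1 - w) * s2   # placeholder integration rule
    return final

# Toy usage with random linear scorers on 5 candidate windows.
rng = np.random.default_rng(0)
win = rng.normal(size=(5, 16))
print(two_phase_scores(win, lambda x: x @ rng.normal(size=16),
                       lambda x: x @ rng.normal(size=16)))
```

Because the expert only sees candidates that survive phase 1, it learns over a much smaller, harder sample space, which is the efficiency argument the abstract makes.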
2013
1
Abstract
Vocabulary tree based place recognition is widely used in topological localization, and various applications of it have been proposed during the past decade. However, the bag-of-words representations from a vocabulary tree trained on fixed training data are hard to keep optimal in dynamic environments. To solve this problem, the adaptive vocabulary tree has been proposed. However, there has been no comparison considering the adaptive property against the conventional vocabulary tree. This paper provides a performance evaluation of the vocabulary tree and the adaptive vocabulary tree in dynamic scenes.
Evaluation of Vocabulary Trees for Localization in Robot Applications
2013
Soonmin Hwang, Chaehoon Park, Yukyung Choi, Donggeun Yoo, In So Kweon
International Conference on Control, Automation and Systems (ICCAS)
Conference
Vocabulary Tree
Place Recognition
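A vocabulary tree quantizes a descriptor by greedily descending hierarchical k-means centers. A minimal sketch of the lookup step; tree construction and the adaptive updates evaluated in the paper are omitted:

```python
# Minimal sketch of descriptor quantization in a vocabulary tree; building the
# tree (hierarchical k-means) and the adaptive updates the paper evaluates
# are omitted.
import numpy as np

class Node:
    def __init__(self, centers, children=None):
        self.centers = centers        # (k, d) cluster centers at this level
        self.children = children      # list of k child Nodes, or None at a leaf

def quantize(root, desc):
    """Greedily descend to the nearest center per level; the leaf path is the
    visual word used in the bag-of-words representation."""
    node, path = root, []
    while node is not None:
        i = int(np.argmin(np.linalg.norm(node.centers - desc, axis=1)))
        path.append(i)
        node = node.children[i] if node.children else None
    return tuple(path)

# Toy 2-level tree with branching factor 2 over 4-D descriptors.
leaf = lambda: Node(np.random.randn(2, 4))
tree = Node(np.random.randn(2, 4), [leaf(), leaf()])
print(quantize(tree, np.random.randn(4)))  # e.g., (1, 0)
```

An adaptive variant would additionally update the centers (and word weights) as new descriptors arrive, which is the property the paper's evaluation targets.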