Authors:
(1) Vinicius Yu Okubo received his B.S. in electrical engineering from the University of São Paulo in 2022 and is currently pursuing his M.S. in electrical engineering at the University of São Paulo;
(2) Kotaro Shimizu received his B.S. degree in Physics from Waseda University, Japan, in 2019 and his M.S. degree in Physics from the University of Tokyo, Japan, in 2021. He has been pursuing his Ph.D. in Physics at the University of Tokyo since 2021 as a JSPS research fellow for young scientists;
(3) B.S. Shivaram received his B.S. degree in Physics, Chemistry and Mathematics from Bangalore University, India, in 1977, his M.S. degree in Physics from the Indian Institute of Technology, Madras, India, in 1979, and his Ph.D. in experimental condensed matter physics from Northwestern University, Evanston, Illinois, in 1984;
(4) Hae Yong Kim received the B.S. and M.S. degrees (with distinctions) in computer science and the Ph.D. degree in electrical engineering from the Universidade de São Paulo (USP), Brazil, in 1988, 1992 and 1997, respectively.
In materials science, characterizing faults in periodic structures is vital for understanding material properties. To characterize magnetic labyrinthine patterns, it is necessary to accurately identify junctions and terminals, and a single image often features over a thousand closely packed defects. This study introduces a new technique called TM-CNN (Template Matching - Convolutional Neural Network), designed to detect a multitude of small objects in images, such as defects in magnetic labyrinthine patterns. TM-CNN was used to identify these structures in 444 experimental images, and the results were explored to deepen the understanding of magnetic materials. The technique employs a two-stage detection approach that combines template matching, used for initial detection, with a convolutional neural network, used to eliminate incorrect identifications. Training a CNN classifier requires a large number of annotated training images, and this difficulty prevents the use of CNNs in many practical applications. TM-CNN significantly reduces the manual workload of creating training images by generating most of the annotations automatically and leaving only a small number of corrections to human reviewers. In testing, TM-CNN achieved an F1 score of 0.988, far outperforming traditional template matching and CNN-based object detection algorithms.
INDEX TERMS Computer vision, convolutional neural networks, deep learning, magnetic labyrinthine patterns, material science, object detection, template matching.
In materials, a wide range of periodic structures are seen accompanied by defects within their spatial arrangement. Some magnetic materials exhibit stripe orders, wherein the orientation of electron spins, i.e., the magnetic moments, shows periodic alterations [1], as well as labyrinthine structures, characterized by magnetic arrangements resulting from the propagation of stripes in various directions throughout space [2]–[4]. The two images in Fig. 1 are representative examples of labyrinthine patterns of magnetic domains in Bismuth-doped Yttrium Iron Garnet (Bi:YIG) films at zero field; the intensity represents the out-of-plane component of the magnetic moments. These labyrinthine patterns may present discernible characteristics that are difficult to quantify. As shown in Fig. 1a, which we label as the “quenched” state, the borders between dark and bright domains are sinuous and do not run parallel to one another. In contrast, the “annealed” state, shown in Fig. 1b, consists of regions with nearly parallel domains. This state exhibits roughly equal widths of dark and bright domains, and the areas occupied by them are also approximately equal for any sampled region. Therefore, the stripes in the annealed state show greater spatial coherence.
Within these magnetic structures, defects take the form of interruptions in the stripes, known as “terminals”, and points where multiple stripes conjoin, referred to as “junctions” (Fig. 2). The presence of defects not only serves as a crucial metric for quantifying the deviation of a structure from a perfectly periodic one, but has also attracted considerable attention in recent years due to its implications for physical phenomena arising from the nontrivial geometric properties associated with magnetic structures [5]. Thus, in condensed matter physics, experimental identification of the number and positions of such defects plays an important role in characterizing material properties.
Accurately counting and differentiating genuine structures from misidentifications is crucial to a quantitative physical understanding of the origins and evolution of these patterns. Manual annotation of defects is infeasible at this scale: for instance, we used 444 images containing 641,649 structures [6]. Furthermore, manual annotation relies on subjective interpretation of junctions and terminals, which could lead to counting inconsistencies. To address these issues, an automated process is required. Object detectors, that is, algorithms for finding objects in an image, are well suited to this task. They can be broadly categorized into classical and deep learning-based methods.
Classical object detection methods span a variety of techniques to extract and process features from the image. For instance, template matching [7] is a technique employed to find a pre-defined template within a larger image. This is achieved by scanning the entire image and calculating the correlation between the template and the scanned region. The Viola-Jones object detection algorithm [8] employs multiple template matchings using Haar-like features, each of them serving as a weak detector. These features are combined through a boosting strategy to form a strong detector. The Histogram of Oriented Gradients (HOG) [9] represents another approach to extracting useful features: it divides the image into cells and calculates a gradient histogram for each. This extracted information can be used by machine learning algorithms, such as support vector machines, to identify detections.
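The scanning-and-correlation step of template matching can be illustrated with a minimal sketch. It assumes normalized cross-correlation as the correlation measure, which is one common choice and not necessarily the one used in [7] or in this work; the function name `ncc_map` and the list-of-lists image representation are hypothetical and chosen only to keep the example dependency-free.

```python
import math


def ncc_map(image, template):
    """Slide `template` over `image` and return a map of normalized
    cross-correlation scores in [-1, 1], one per valid top-left position.

    Both arguments are 2D lists of numbers (an assumption for illustration).
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])

    # Zero-mean version of the template and its norm, computed once.
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))

    scores = []
    for y in range(ih - th + 1):
        row_scores = []
        for x in range(iw - tw + 1):
            # Zero-mean version of the window currently under the template.
            w_vals = [image[y + dy][x + dx]
                      for dy in range(th) for dx in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))

            denom = t_norm * w_norm
            # Flat windows have zero variance; score them as 0 by convention.
            score = (sum(a * b for a, b in zip(t_dev, w_dev)) / denom
                     if denom > 0 else 0.0)
            row_scores.append(score)
        scores.append(row_scores)
    return scores
```

A window that matches the template up to brightness and contrast scores close to 1, so detections are usually taken as the peaks of this map above a chosen threshold.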
In the last decade, deep learning approaches have surpassed traditional machine learning techniques in multiple image processing tasks [10], [11]. For object detection, Girshick et al. introduced R-CNN [12], one of the pioneering detection techniques rooted in deep learning. R-CNN operates as a classification-based model: multiple regions are extracted from an image and each is classified independently. This straightforward method led to significant improvements in detection, achieving new state-of-the-art results on the Pascal VOC dataset [13] compared to earlier techniques like Haar features and HOG. A distinguishing feature of R-CNN is its region proposal step. Directly processing every conceivable region of varying sizes and positions in an image is computationally impractical. Hence, this step selects a reduced set of regions from the original image, each of which is classified individually by the CNN model. Further developments brought by Faster R-CNN [14] have improved both accuracy and speed by integrating the region proposal into the model.
Redmon et al. introduced YOLO [15], a deep-learning detection approach modeled as a regression task. This method partitions the image into a grid, and each cell has its own set of outputs. The grid cell in which an object is centered is responsible for identifying the position, dimensions and class of that object. A standout benefit of this approach is its efficiency: YOLO processes the image in a single pass, in contrast with R-CNN-based models, which are divided into region proposal and classification steps. This significantly reduces inference time, enabling real-time video detection. However, YOLO was not able to achieve the precision and recall rates of Faster R-CNN when tested on the Pascal VOC dataset. Comparisons in small object detection settings have also shown Faster R-CNN to outperform YOLO [16].
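The grid assignment described above can be sketched in a few lines. This is only an illustration of how an object's center maps to a responsible cell, not YOLO's actual code; the function name `responsible_cell` is hypothetical, and the default grid size of 7 is an assumption taken from the original YOLO configuration.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Return (row, col, x_offset, y_offset) for an object centered at
    pixel (cx, cy) in an image of size img_w x img_h.

    (row, col) is the grid cell containing the center; the offsets give the
    center's position inside that cell, each in [0, 1).
    """
    gx = cx / img_w * grid  # center in grid units along x
    gy = cy / img_h * grid  # center in grid units along y

    # Clamp so that a center lying exactly on the right or bottom border
    # still falls in the last cell.
    col = min(int(gx), grid - 1)
    row = min(int(gy), grid - 1)
    return row, col, gx - col, gy - row
```

Under this scheme each cell produces a fixed number of predictions, so a cell that contains the centers of many small objects cannot report all of them.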
Modern digital microscopy allows the easy acquisition of high-resolution experimental images, which contain intricate stripe patterns and a large number of defects, sometimes numbering in the thousands. This makes it difficult to use deep learning, as creating an accurately annotated dataset would be a laborious process. Furthermore, neither YOLO nor Faster R-CNN was designed to detect thousands of small objects. Benchmarks with both methods show that performance degrades when detecting small objects, with Faster R-CNN maintaining a slight advantage over YOLO [16].
Correlation-based granulometry is a technique that can be used to detect a large number of small objects in an image. It was proposed by Maruta et al. [17], [18] to analyze the distribution of square and circular pores in the macroporous silicon layer in scanning electron microscope images. This technique is called “granulometry” because the objective of the original application was to obtain a histogram of pore distribution as a function of size. It was later used by Araújo et al. [19] to detect individual bean grains, analyze each grain, and calculate the quality of the bean batch. This technique performs multiple template matchings to achieve robustness against angle and shape variations, and then performs non-maximum suppression to avoid finding the same object multiple times. However, this technique could not be applied directly to our problem, because defects in labyrinthine magnetic structures vary greatly in shape and cannot be accurately detected using template matching alone.
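The combination of multiple template matchings with non-maximum suppression can be sketched as follows. The sketch assumes that each template has already produced a score map of the same size, merges them by taking the per-pixel maximum, and then applies a greedy, distance-based suppression. The function names, the merging rule and the `min_dist` parameter are illustrative assumptions, not details taken from [17]–[19].

```python
def combine_score_maps(score_maps):
    """Merge same-sized 2D score maps (one per template) by keeping, at each
    pixel, the best score obtained by any template."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*score_maps)]


def nms_peaks(score_map, threshold, min_dist):
    """Greedy non-maximum suppression on a 2D score map.

    Returns a list of (score, y, x) tuples, strongest first, such that every
    kept peak scores at least `threshold` and no two kept peaks are closer
    than `min_dist` pixels.
    """
    candidates = [(s, y, x)
                  for y, row in enumerate(score_map)
                  for x, s in enumerate(row)
                  if s >= threshold]
    kept = []
    # Visit candidates from strongest to weakest; a candidate survives only
    # if it is far enough from every peak that has already been kept.
    for s, y, x in sorted(candidates, reverse=True):
        if all((y - ky) ** 2 + (x - kx) ** 2 >= min_dist ** 2
               for _, ky, kx in kept):
            kept.append((s, y, x))
    return kept
```

For example, with two one-row score maps `[[0.0, 0.9, 0.8, 0.0, 0.0, 0.0]]` and `[[0.0, 0.1, 0.1, 0.0, 0.0, 0.7]]`, a threshold of 0.5 and `min_dist=2`, the neighboring responses at columns 1 and 2 collapse into the single stronger peak at column 1, while the isolated peak at column 5 is kept.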
To address these challenges, we developed a new two-stage technique that we named Template Matching - Convolutional Neural Network (TM-CNN), which integrates correlation-based granulometry with a CNN classifier. First, a series of template matchings detects potential candidates for junctions and terminals. By setting a low threshold, we cast a broad net for candidate detection, minimizing false negatives at the cost of more false positives. These candidates are filtered by non-maximum suppression to avoid multiple detections of the same defect. Second, a CNN classifier filters out the false positives.
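The two-stage idea can be outlined with a simplified sketch; it is not the TM-CNN implementation. It assumes a score map with the same size as the image, omits non-maximum suppression for brevity, and replaces the CNN with an arbitrary callable `classify` that returns `True` for patches to keep. All names and the patch size are hypothetical.

```python
def crop_patch(image, y, x, half):
    """Return the (2*half+1) x (2*half+1) patch centered on (y, x),
    padding with zeros where the patch extends beyond the image."""
    h, w = len(image), len(image[0])
    return [[image[j][i] if 0 <= j < h and 0 <= i < w else 0
             for i in range(x - half, x + half + 1)]
            for j in range(y - half, y + half + 1)]


def two_stage_detect(image, score_map, threshold, classify, half=1):
    """Stage 1: keep every position whose template-matching score reaches a
    deliberately low `threshold`, favoring recall over precision.
    Stage 2: keep only candidates whose surrounding patch is accepted by
    `classify`, a stand-in for the CNN classifier."""
    candidates = [(y, x)
                  for y, row in enumerate(score_map)
                  for x, s in enumerate(row)
                  if s >= threshold]
    return [(y, x) for y, x in candidates
            if classify(crop_patch(image, y, x, half))]
```

The point of the design is that the first stage is allowed to be imprecise: a missed defect can never be recovered later, whereas a spurious candidate only costs one extra classifier evaluation.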
We demonstrate that this method substantially outperforms template matching alone, while streamlining the image annotation process and reducing the computational burden typically associated with deep learning detection techniques. We also show that the TM-CNN technique outperforms Faster R-CNN, achieving a significantly higher F1 score in the detection of junctions and terminals.
The remainder of this work is organized as follows. Section II presents existing techniques for detecting junctions and terminals, differentiating classical and deep learning approaches. Section III describes the dataset used and the proposed TM-CNN technique in detail. Section IV presents experimental results from the detection of defects in magnetic labyrinthine patterns and discusses their significance in understanding physical phenomena. Finally, Section V concludes the article by reflecting on the merits of our work.