Basically, MemuConvert is an image processing method aimed at reducing the manual effort required in the object detection labeling process.
Why I Developed the Method?
I participated in a competition as part of a team whose task was to detect cars in drone footage. Initially, we attempted this with YOLOv3 and its pre-existing dataset. However, this approach proved unsuccessful because the dataset was inadequate. As a result, we realized that we needed to train an AI model from scratch.
In object detection, the process typically involves three steps:
- Data collection,
- Data labeling,
- Training stage.
Our main challenge arose from the significant amount of labeled data required. We explored three potential solutions:
- To train an AI model using existing AI systems.
- To utilize websites such as scale.ai, or to employ existing algorithms. However, we didn’t have access to another AI system for training, and we were reluctant to invest money, since we aimed to make a difference in the competition without significant financial resources.
- To develop a new method.
Therefore, we focused on researching algorithms to solve the problem. However, we ran into the same issue: the process still relied heavily on manpower. Each image required a new threshold to be defined, which proved time-consuming. Consequently, I had to constrain the threshold selection to streamline the process.
There are two main types of data that can be extracted from images: colors and shapes. However, each type has its own challenges. Shape-based algorithms, for example, may struggle to accurately identify objects such as faces or cars because these objects are composed of multiple elements. We define a “car” by the presence of a body and tires, but shape algorithms often fail to identify the object as a whole because they focus on individual components. Additionally, shape algorithms may generate false positives, detecting irrelevant objects such as artificial structures like a skyline.
As a result, it becomes necessary to rely more on color-based methods for object detection. However, many images contain undesired pixels in the desired colors, making the process more challenging. The new method for object detection should take these factors into account and develop techniques that effectively handle the complexities associated with colors and eliminate false detections.
In scenario A1, where the building and taxi share the same color palette, a color-based algorithm might mistakenly treat them as the same object due to their similar color representation. Therefore, those extra colors had to be removed first.
- To get a color from the original image for drawing bounding boxes.
- To divide the image into rectangles.
- Sponge Test
- To find the dominant color of each rectangle.
- To rebuild each rectangle with its own dominant color.
- To create a new image from those rectangles.
- To transform the image into black and white. (The desired color becomes white; undesired colors become black.)
- To blur the image to get rid of noise.
- To draw the bounding box.
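Taken together, the steps above can be sketched roughly as follows. This is a minimal illustrative sketch in NumPy, not the actual implementation: the function name `memu_convert`, the parameters `k` and `tol`, and the use of the block mean as the dominant color are all my assumptions, and the blur step is omitted for brevity.

```python
import numpy as np

def memu_convert(img, target, k=8, tol=60):
    """Sketch of the pipeline: rectangles -> dominant colors ->
    black/white mask -> bounding box (blur step omitted)."""
    h, w, _ = img.shape
    bh, bw = max(h // k, 1), max(w // k, 1)
    mosaic = img.copy()
    # Rebuild each rectangle with its dominant color (approximated by the mean)
    for y in range(0, h, bh):
        for x in range(0, w, bw):
            block = img[y:y + bh, x:x + bw]
            mosaic[y:y + bh, x:x + bw] = block.reshape(-1, 3).mean(axis=0)
    # Black and white: white where a rectangle's color is close to the target
    dist = np.linalg.norm(mosaic.astype(int) - np.array(target), axis=-1)
    mask = (dist < tol).astype(np.uint8) * 255
    # Bounding box around the white pixels
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()

# Hypothetical example: a red square on a black background
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[16:32, 16:32] = (200, 0, 0)
box = memu_convert(img, target=(200, 0, 0))
```

Note that the single `tol` threshold plays the role of the one fixed threshold shared by all images.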
The sponge test is a physical description of the process I performed on an image. Imagine a sponge with a single red dot: when you squeeze the sponge, the redness disappears. This physical analogy reveals the dominant color, giving us a way to reconstruct the object from its dominant color.
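The "squeeze" can be modeled in several ways; one simple stand-in, assumed here for illustration, is to average every pixel in a rectangle so that a lone off-color dot vanishes into the dominant color (`dominant_color` is my name, not the original one):

```python
import numpy as np

def dominant_color(block: np.ndarray) -> np.ndarray:
    """'Squeeze the sponge': collapse a rectangle of pixels into one color."""
    return block.reshape(-1, block.shape[-1]).mean(axis=0).astype(np.uint8)

# A mostly-blue 4x4 RGB block with a single red "dot"
block = np.zeros((4, 4, 3), dtype=np.uint8)
block[..., 2] = 200            # blue everywhere
block[0, 0] = (255, 0, 0)      # the red dot
color = dominant_color(block)  # the dot barely shifts the result
```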
Kimage: the number of rectangles an image is divided into for the sponge test.
Kxcolor: the dominant color of a specific rectangle.
Initially, I set Kimage to a fixed value of 1024. However, this fixed value did not produce satisfactory results on low-quality images. Consequently, I set Kimage equal to the width of the image (x) in order to accommodate different image sizes.
However, some images are not suited to equal rectangles because of their content or composition, so some rectangles end up overlapping each other. To address this, I applied a blur to the image to remove these unwanted overlapping rectangles and enhance the overall clarity and quality of the image.
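One way the blur suppresses stray rectangles can be sketched with a naive box blur on the black-and-white mask; `box_blur` and the 5x5 neighborhood are illustrative choices of mine, not the original implementation:

```python
import numpy as np

def box_blur(mask: np.ndarray, k: int = 5) -> np.ndarray:
    """Naive box blur: average each pixel over a k x k neighborhood."""
    padded = np.pad(mask.astype(float), k // 2, mode="edge")
    out = np.zeros_like(mask, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (k * k)

# A mask with a large object and a single stray white pixel of noise
mask = np.zeros((32, 32), dtype=np.uint8)
mask[8:24, 8:24] = 255   # the object
mask[2, 2] = 255         # stray noise
clean = (box_blur(mask) > 127).astype(np.uint8) * 255
```

After blurring and re-thresholding, the isolated pixel is averaged away while the large white region survives (shrunk slightly at its border).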
Note: All of the images have the same threshold! This is the idea.
The method provides a valuable advantage in training object detection AI quickly since it eliminates the need for users to manually adjust the threshold. This makes it a versatile algorithm capable of labeling all images using a specific threshold. However, due to its generality, there is a trade-off in precision. The algorithm may not label all images with absolute accuracy, as some images may require different thresholds or more specific techniques to achieve precise labeling.