Hailo8 Application – ReID

1. Introduction

ReID (Re-Identification) is a computer vision technique generally treated as a subtask of image retrieval. The goal is to continuously recognize the same target across different cameras, times, and locations by combining recognition algorithms with object tracking, storing captured images in a gallery for subsequent matching. Common applications include security monitoring, law enforcement surveillance, pedestrian flow analysis, and vehicle flow analysis.

The difference between ReID and Object Detection is that, taking humans as an example, Object Detection can only identify “what people or objects are present in the image,” while Person ReID further addresses “whether the same person can be recognized when appearing again in another camera or at a different time.” This solves the problem of time-consuming manual searches and cases where facial recognition systems fail to capture the face.

In real-world ReID applications, many environmental factors can affect accuracy—different cameras have varying resolutions, focus, and angles; clothing color, occlusions, lighting, and shadows can all influence ReID performance. Therefore, an effective cross-camera tracking algorithm is critical.

2. Models Used

YOLO

The purpose of Object Detection is to classify and group objects based on their characteristics and assign labels to each category. In ReID, its function is to detect the target object in an image and obtain its bounding box and coordinates for subsequent feature extraction and comparison. Since ReID focuses on identity recognition and cross-camera tracking, the quality of the front-end detection directly impacts the accuracy of back-end recognition.

YOLOv5s is adopted as the Object Detection model due to its advantages:

  • Lightweight and High Performance: YOLOv5s uses a compact backbone structure that maintains detection accuracy while reducing computation and improving inference speed. This makes it suitable for real-time applications. As shown in the figure, AIEH2000 running YOLOv5s achieves 393.29 Frames/Sec.


  • Note: the yolov5s model from the Hailo Model Zoo is used here to detect objects in images; other YOLO models from the Hailo Model Zoo can be substituted.

RepVGG

After front-end detection, the main task of the ReID model is to extract features of the detected person as input and output a feature vector representing that person’s unique appearance. Considering real-time requirements and limited computing resources, the RepVGG model from the Hailo Model Zoo can be used as the backbone for ReID.

The key advantages of RepVGG are as follows:

  • High Inference Speed: During training, RepVGG employs a multi-branch structure to improve representation capability. However, during inference, through Structural Re-parameterization, it compresses into a single-path convolutional network, greatly reducing computation and accelerating inference — ideal for real-time ReID systems. For example, running RepVGG on AIEH2000 achieves 5199.33 Frames/Sec.


  • Reduced Hardware Requirements: Since the inference structure consists solely of convolutions, RepVGG is particularly suited for deployment on embedded systems with Hailo AI Accelerator Series, taking full advantage of hardware acceleration to deliver high efficiency in low-power environments.
  • Balanced Accuracy and Efficiency: Despite structural simplification, RepVGG retains rich feature representation capabilities learned during training, ensuring good accuracy and discriminative ability in ReID tasks.

In summary, RepVGG’s advantages in ReID tasks lie not only in accuracy but also in inference speed and hardware adaptability, meeting the needs of real-time and cross-camera tracking.
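The structural re-parameterization described above relies on convolution being linear: the parallel 3x3, 1x1, and identity branches of a training-time RepVGG block can be summed into a single 3x3 kernel for inference. The following is a minimal numpy sketch of that weight fusion (BatchNorm folding and biases are omitted for brevity; it is an illustration of the idea, not the Hailo Model Zoo implementation):

```python
import numpy as np

def fuse_repvgg_branches(w3x3, w1x1):
    """Fuse a RepVGG block's 3x3, 1x1, and identity branches into one
    3x3 kernel. Weight layout: (out_ch, in_ch, kH, kW)."""
    out_ch, in_ch = w3x3.shape[:2]
    assert in_ch == out_ch, "identity branch needs matching channel counts"
    # Pad the 1x1 kernel to 3x3 by centering it.
    w1x1_padded = np.pad(w1x1, ((0, 0), (0, 0), (1, 1), (1, 1)))
    # The identity branch is a 3x3 kernel with a 1 at the center of
    # each channel's own position.
    w_id = np.zeros_like(w3x3)
    for c in range(out_ch):
        w_id[c, c, 1, 1] = 1.0
    # Convolution is linear, so parallel branches sum into one kernel.
    return w3x3 + w1x1_padded + w_id
```

Because the fused kernel computes exactly the sum the three branches would have produced, inference accuracy is unchanged while only one convolution remains to execute.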

3. Implementation

Development Platform

Hardware: Dell XE4 / Intel 12th Gen (Alder Lake) Core i7-12700
OS: Ubuntu 22.04, kernel 6.8
Hailo AI Accelerator / HailoRT: SUNIX AIEH2000 (single Hailo8 AI inference processor), HailoRT 4.20.0
API: OpenCV 4.8.0 / XTensor Stack 0.24.7 / XTL 0.7.7

Development Process

  • Object Detection:

    Perform letterbox pre-processing on the input image to maintain its aspect ratio when resized to model input dimensions while avoiding distortion.

    Execute the YOLOv5s model inference using HailoRT to obtain the feature outputs required for the detection task.

    Read the model’s raw tensor output data from HailoRT as input for further parsing.

    Perform post-processing on model output, including BBox decoding and NMS, to remove redundant boxes and retain the most representative detections.

    Final Output: Return processed BBoxes containing target positions and confidence scores for subsequent ReID module input.

  • ReID:

    Resize and crop the input image to match the RepVGG model’s input requirements.

    Run RepVGG inference through HailoRT to extract deep feature representations of the detected person.

    Read raw feature vectors from HailoRT as input for identity recognition.

    Apply post-processing such as normalization or embedding transformation to ensure feature vector comparability and consistency.

    Call the Gallery Management Function to compare processed feature vectors with the database, outputting a matching ID or creating a new gallery entry.
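The letterbox pre-processing step above can be sketched as follows. This is a dependency-free illustration rather than the deployed code: 640x640 is an assumed YOLOv5s input size, 114 is the conventional YOLO padding value, and a nearest-neighbour resize stands in for cv2.resize to keep the sketch self-contained:

```python
import numpy as np

def letterbox(image, target=(640, 640), pad_value=114):
    """Resize to fit `target` (height, width) while preserving aspect
    ratio, padding the remainder so the image is not distorted."""
    h, w = image.shape[:2]
    scale = min(target[0] / h, target[1] / w)
    nh = max(1, int(round(h * scale)))
    nw = max(1, int(round(w * scale)))
    # Nearest-neighbour resize via index lookup (stand-in for cv2.resize).
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    # Paint the resized image onto a padded canvas, centered.
    canvas = np.full(target + image.shape[2:], pad_value, dtype=image.dtype)
    top = (target[0] - nh) // 2
    left = (target[1] - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    # scale and offsets are needed to map detected boxes back to the
    # original image coordinates during post-processing.
    return canvas, scale, (top, left)
```

The returned scale and offsets are reused in the BBox-decoding step to translate detections from model input coordinates back to the source frame.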

Process Flow and Description

  1. Image Input and Pre-processing: The system receives continuous image streams from cameras, performs Load Image and Rescale/Letterbox preprocessing, and converts them into formats suitable for YOLO model input.
  2. Object Detection: The preprocessed image is input to the YOLO model to produce BBoxes. Through BBox decoding and NMS, overlapping boxes are removed, retaining the most accurate results.
  3. Person Cropping and ReID Pre-processing: The detected bounding boxes are used to crop target persons from images, followed by Crop & Rescale.
  4. Feature Extraction and ReID Post-processing: Input to the ReID model to generate normalized feature vectors ensuring stable comparability in the embedding space.
  5. Identity Matching and Database Update: The feature vector is compared within the gallery to determine whether it belongs to an existing identity or a new one.
  6. Final Output: The system outputs BBox and ID, completing the detection and cross-camera re-identification process.
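Steps 4 and 5 above (normalized feature vectors compared against the gallery) can be sketched as below. The class name, the in-memory list storage, and the 0.6 cosine-similarity threshold are illustrative assumptions, not the actual Gallery Management Function:

```python
import numpy as np

class Gallery:
    """Minimal sketch of identity matching: L2-normalized embeddings
    compared by cosine similarity against stored gallery entries."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.embeddings = []  # one normalized vector per known identity

    def match_or_enroll(self, feature):
        """Return the matching identity ID, or enroll a new one."""
        # Normalize so the dot product equals cosine similarity.
        vec = feature / (np.linalg.norm(feature) + 1e-12)
        if self.embeddings:
            sims = np.stack(self.embeddings) @ vec
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return best          # existing identity re-identified
        self.embeddings.append(vec)  # unseen appearance: new gallery entry
        return len(self.embeddings) - 1
```

In a production system the gallery would typically also average or refresh each identity's embedding over time; the threshold trades off false merges against duplicate identities and must be tuned on the target cameras.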

Execution Results

The execution results, as shown below, demonstrate that the system can accurately identify the same person under different camera angles and perspectives. Although the two images come from different cameras with varying poses and viewpoints, the ReID module successfully matches them to the same identity. This verifies the effectiveness of the Detection + ReID integrated system under cross-view and cross-camera conditions, proving its practical applicability in smart surveillance and cross-scene identity tracking.
