YOLOv5

Human Detection + Counting

This exercise follows the notebook code that Roboflow publishes on GitHub.

import torch
import supervision as sv


< The supervision library >

A Python library that makes computer vision tasks easier.

• Detection visualization: draws bounding boxes, labels, masks, and so on over images or video frames

box_annotator = sv.BoxAnnotator()
frame = box_annotator.annotate(scene=frame, detections=detections)

• Zone monitoring: define a specific zone and detect objects entering or leaving it

zone = sv.PolygonZone(polygon=polygon)
zone_annotator = sv.PolygonZoneAnnotator(zone=zone)

• Object tracking: integrates with tracking algorithms such as ByteTrack to follow objects across frames

tracker = sv.ByteTrack()
detections = tracker.update_with_detections(detections)

• Model output conversion: converts the output of models such as YOLOv5 and YOLOv8 into one standardized format

detections = sv.Detections.from_yolov8(results)
detections = sv.Detections.from_yolov5(results)

• Video processing: processes a video file and saves the result

sv.process_video(source_path='input.mp4',
                 target_path='output.mp4',
                 callback=process_frame)

Key advantages

Summary

Defining the model

# Load the YOLOv5 model through PyTorch Hub
# 'ultralytics/yolov5': the Ultralytics YOLOv5 GitHub repository
model = torch.hub.load('ultralytics/yolov5', 'yolov5x6')

torch.hub.load: downloads and loads a pretrained model from PyTorch Hub.

Detecting and filtering objects in a single frame

from supervision.assets import VideoAssets # sample videos shipped with supervision (assumed import path)

# Extract a video frame
generator = sv.get_video_frames_generator(VideoAssets.MARKET_SQUARE.value) # generator that yields frames from the video
################# To use your own video instead ####################
# video_path = "path/to/your/video"
# generator = sv.get_video_frames_generator(video_path)
####################################################################
iterator = iter(generator) # get an iterator (a generator is already its own iterator)
frame = next(iterator) # take the first frame

# detect: run object detection
results = model(frame, size=1280) # inference size 1280: the frame is scaled so its longer side is 1280 px
detections = sv.Detections.from_yolov5(results) # convert the YOLOv5 output into supervision's standard format

# annotate
# set up the visualization tools and apply them
box_annotator = sv.BoundingBoxAnnotator(thickness=4)
label_annotator = sv.LabelAnnotator(text_thickness=4, text_scale=2)
frame = box_annotator.annotate(scene=frame, detections=detections)
frame = label_annotator.annotate(scene=frame, detections=detections)

%matplotlib inline
sv.plot_image(frame, (16, 16))

Here VideoAssets.MARKET_SQUARE.value is the path of a sample video that the supervision library provides. To use your own video, pass its file path here instead.

< iter() and next() >

Both are built-in Python functions for iteration.

iter()

• Creates an iterator

• Takes an iterable object and returns an iterator over it

• The iterator gives access to one element at a time

next()

• Fetches the next element from an iterator

• Returns the next value the iterator holds

• Raises a StopIteration exception when there are no more values
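The behavior described above can be tried without any video at all. The sketch below uses a hypothetical frame_numbers generator as a stand-in for a video frame generator (it is not part of supervision) and shows both the StopIteration case and the optional default argument of next().

```python
def frame_numbers(total):
    """Hypothetical stand-in for a video frame generator: yields 0 .. total - 1."""
    for index in range(total):
        yield index


generator = frame_numbers(3)
iterator = iter(generator)  # a generator is its own iterator, so this is the same object

first = next(iterator)
second = next(iterator)
third = next(iterator)

try:
    next(iterator)  # nothing left, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

# next() also accepts a default that is returned instead of raising
fallback = next(iterator, None)
```

In the notebook only the first frame is needed, so a single next() call is enough and the exhaustion case never comes up.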

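→ Why call sv.Detections.from_yolov5?

Every open-source object detection model returns its results in its own format. To make them easier to work with, the results need to be converted into one standardized format, and that is the job supervision does here. After conversion, the detections expose the following fields:

• xyxy: bounding box coordinates (x1, y1, x2, y2)

• confidence: detection confidence

• class_id: object class ID

Because every model's output ends up in this same format, supervision's visualization tools can be applied to all of them, which is why the conversion step is needed.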
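To make the three fields concrete, here is a minimal NumPy sketch of the kind of split the conversion performs. It assumes YOLOv5 prediction rows are laid out as x1, y1, x2, y2, confidence, class; the numbers are made up, and this is an illustration rather than supervision's actual implementation.

```python
import numpy as np

# Made-up predictions in the assumed YOLOv5 row layout:
# x1, y1, x2, y2, confidence, class
raw = np.array([
    [100.0, 200.0, 180.0, 420.0, 0.91, 0.0],  # person
    [300.0, 250.0, 360.0, 400.0, 0.42, 0.0],  # person, low confidence
    [500.0, 600.0, 700.0, 720.0, 0.77, 2.0],  # car
])

xyxy = raw[:, :4]                 # (N, 4) box corners
confidence = raw[:, 4]            # (N,) scores
class_id = raw[:, 5].astype(int)  # (N,) integer class IDs
```

Keeping the fields as parallel arrays of length N is what makes the boolean-mask filtering used later in this walkthrough possible.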

Adding the following filter line to the same code keeps only the objects whose class is person (class_id 0) and whose confidence is above 0.5.

# extract video frame
generator = sv.get_video_frames_generator(VideoAssets.MARKET_SQUARE.value)
iterator = iter(generator)
frame = next(iterator)

# detect
results = model(frame, size=1280)
detections = sv.Detections.from_yolov5(results)
detections = detections[(detections.class_id == 0) & (detections.confidence > 0.5)]

# annotate
box_annotator = sv.BoundingBoxAnnotator(thickness=4)
label_annotator = sv.LabelAnnotator(text_thickness=4, text_scale=2)
frame = box_annotator.annotate(scene=frame, detections=detections)
frame = label_annotator.annotate(scene=frame, detections=detections)

%matplotlib inline
sv.plot_image(frame, (16, 16))

Calling len(detections) then gives the number of people that were counted!
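The filter is an ordinary NumPy boolean mask. The sketch below reproduces the same condition on made-up class_id and confidence arrays, without supervision, to show how the count comes out.

```python
import numpy as np

# Made-up values for four detections
class_id = np.array([0, 0, 2, 0])
confidence = np.array([0.91, 0.42, 0.77, 0.66])

# Each comparison needs its own parentheses because & binds tighter than == and >
keep = (class_id == 0) & (confidence > 0.5)

person_count = int(keep.sum())                # number of True entries
kept_indices = np.flatnonzero(keep).tolist()  # positions of the detections that survive
```

Indexing a Detections object with such a mask keeps only the selected rows, so its length equals the number of True entries.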

Object detection within a polygon zone

import numpy as np
import supervision as sv

# initiate polygon zone: define the region of interest
polygon = np.array([
    [0, 0],
    [1080 - 5, 0],
    [1080 - 5, 1300 - 5],
    [0, 1300 - 5]
])
video_info = sv.VideoInfo.from_video_path(VideoAssets.MARKET_SQUARE.value) # load the video file's metadata
zone = sv.PolygonZone(polygon=polygon) # create the zone object

# extract video frame
generator = sv.get_video_frames_generator(VideoAssets.MARKET_SQUARE.value)
iterator = iter(generator)
frame = next(iterator)

# detect
results = model(frame, size=1280)
detections = sv.Detections.from_yolov5(results)
mask = zone.trigger(detections=detections)
detections = detections[(detections.class_id == 0) & (detections.confidence > 0.5) & mask]

# annotate
box_annotator = sv.BoundingBoxAnnotator(thickness=4)
label_annotator = sv.LabelAnnotator(text_thickness=4, text_scale=2)
frame = box_annotator.annotate(scene=frame, detections=detections)
frame = label_annotator.annotate(scene=frame, detections=detections)
frame = sv.draw_polygon(scene=frame, polygon=polygon, color=sv.Color.ROBOFLOW, thickness=6)

%matplotlib inline
sv.plot_image(frame, (16, 16))

sv.VideoInfo.from_video_path(VideoAssets.MARKET_SQUARE.value)
VideoInfo(width=2160, height=3840, fps=60, total_frames=474)
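zone.trigger returns one boolean per detection saying whether it falls inside the zone. The following pure-Python sketch illustrates the idea with a standard ray-casting test. It assumes the point tested is the bottom center of each box, and it is not how supervision implements the check; point_in_polygon, bottom_center, and the boxes are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' for every edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bottom_center(box):
    """Anchor point of an (x1, y1, x2, y2) box: the middle of its bottom edge."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)


# Same zone as above: [0, 0], [1080 - 5, 0], [1080 - 5, 1300 - 5], [0, 1300 - 5]
polygon = [(0, 0), (1075, 0), (1075, 1295), (0, 1295)]
boxes = [
    (100, 200, 180, 420),      # anchor (140, 420): inside the zone
    (1000, 1200, 1100, 1400),  # anchor (1050, 1400): below the zone
    (1500, 100, 1600, 300),    # anchor (1550, 300): right of the zone
]
mask = [point_in_polygon(*bottom_center(box), polygon) for box in boxes]
```

Under this assumption the second box is excluded even though its top edge lies inside the zone, because only the anchor point is tested.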

Object detection and counting across multiple zones

colors = sv.ColorPalette.DEFAULT

polygons = [
    np.array([
        [0, 0],
        [1080 - 5, 0],
        [1080 - 5, 1300 - 5],
        [0, 1300 - 5]
    ], np.int32),
    np.array([
        [1080 + 5, 0],
        [2160, 0],
        [2160, 1300 - 5],
        [1080 + 5, 1300 - 5]
    ], np.int32),
    np.array([
        [0, 1300 + 5],
        [1080 - 5, 1300 + 5],
        [1080 - 5, 3840],
        [0, 3840]
    ], np.int32),
    np.array([
        [1080 + 5, 1300 + 5],
        [2160, 1300 + 5],
        [2160, 3840],
        [1080 + 5, 3840]
    ], np.int32)
]
video_info = sv.VideoInfo.from_video_path(VideoAssets.MARKET_SQUARE.value)

zones = [sv.PolygonZone(polygon=polygon) for polygon in polygons]
zone_annotators = [
    sv.PolygonZoneAnnotator(
        zone=zone,
        color=colors.by_idx(index),
        thickness=4,
        text_thickness=8,
        text_scale=4
    )
    for index, zone
    in enumerate(zones)
]
box_annotators = [
    sv.BoundingBoxAnnotator(
        color=colors.by_idx(index),
        thickness=4,
    )
    for index
    in range(len(polygons))
]

# extract video frame
generator = sv.get_video_frames_generator(VideoAssets.MARKET_SQUARE.value)
iterator = iter(generator)
frame = next(iterator)

# detect
results = model(frame, size=1280)
detections = sv.Detections.from_yolov5(results)
detections = detections[(detections.class_id == 0) & (detections.confidence > 0.5)]

# there are several zones, so apply each zone's mask in turn
for zone, zone_annotator, box_annotator in zip(zones, zone_annotators, box_annotators):
    mask = zone.trigger(detections=detections)
    detections_filtered = detections[mask]
    frame = box_annotator.annotate(scene=frame, detections=detections_filtered)
    frame = zone_annotator.annotate(scene=frame)

%matplotlib inline
sv.plot_image(frame, (16, 16))
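One detail of the four polygons above is worth spelling out: neighboring zones are separated by a 10-pixel strip (1080 - 5 to 1080 + 5 horizontally, 1300 - 5 to 1300 + 5 vertically), so the per-zone counts do not have to add up to the total number of detections. The sketch below shows this with plain rectangle tests on made-up anchor points; in_rect is a hypothetical helper, and it assumes a detection is assigned to a zone by a single anchor point.

```python
# The four zones above as (x_min, y_min, x_max, y_max) rectangles
zones = [
    (0, 0, 1075, 1295),        # top left
    (1085, 0, 2160, 1295),     # top right
    (0, 1305, 1075, 3840),     # bottom left
    (1085, 1305, 2160, 3840),  # bottom right
]

# Made-up anchor points, one per detection
anchors = [(140, 420), (1500, 900), (1080, 2000), (300, 3000), (2000, 3500), (1600, 200)]


def in_rect(point, rect):
    """True when the point lies inside the rectangle, borders included."""
    x, y = point
    x_min, y_min, x_max, y_max = rect
    return x_min <= x <= x_max and y_min <= y <= y_max


counts = [sum(in_rect(point, zone) for point in anchors) for zone in zones]
unassigned = len(anchors) - sum(counts)  # anchors that fall in the gap between zones
```

Here the anchor at x = 1080 sits in the vertical strip between the left and right zones, so it is counted by none of them.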

Per-zone object detection on a video

colors = sv.ColorPalette.DEFAULT
polygons = [
    np.array([
        [540,  985 ],
        [1620, 985 ],
        [2160, 1920],
        [1620, 2855],
        [540,  2855],
        [0,    1920]
    ], np.int32),
    np.array([
        [0,    1920],
        [540,  985 ],
        [0,    0   ]
    ], np.int32),
    np.array([
        [1620, 985 ],
        [2160, 1920],
        [2160,    0]
    ], np.int32),
    np.array([
        [540,  985 ],
        [0,    0   ],
        [2160, 0   ],
        [1620, 985 ]
    ], np.int32),
    np.array([
        [0,    1920],
        [0,    3840],
        [540,  2855]
    ], np.int32),
    np.array([
        [2160, 1920],
        [1620, 2855],
        [2160, 3840]
    ], np.int32),
    np.array([
        [1620, 2855],
        [540,  2855],
        [0,    3840],
        [2160, 3840]
    ], np.int32)
]

video_info = sv.VideoInfo.from_video_path(VideoAssets.MARKET_SQUARE.value)

zones = [sv.PolygonZone(polygon=polygon) for polygon in polygons]
zone_annotators = [
    sv.PolygonZoneAnnotator(
        zone=zone,
        color=colors.by_idx(index),
        thickness=6,
        text_thickness=8,
        text_scale=4
    )
    for index, zone
    in enumerate(zones)
]
box_annotators = [
    sv.BoundingBoxAnnotator(
        color=colors.by_idx(index),
        thickness=4,
    )
    for index
    in range(len(polygons))
]

def process_frame(frame: np.ndarray, i) -> np.ndarray:
    # detect
    results = model(frame, size=1280)
    detections = sv.Detections.from_yolov5(results)
    detections = detections[(detections.class_id == 0) & (detections.confidence > 0.5)]

    for zone, zone_annotator, box_annotator in zip(zones, zone_annotators, box_annotators):
        mask = zone.trigger(detections=detections)
        detections_filtered = detections[mask]
        frame = box_annotator.annotate(scene=frame, detections=detections_filtered)
        frame = zone_annotator.annotate(scene=frame)

    return frame




import os
from base64 import b64encode
from IPython.display import HTML

HOME = os.getcwd() # assumption: HOME is the notebook's working directory

sv.process_video(source_path=VideoAssets.MARKET_SQUARE.value, # input video path
                 target_path=f"{HOME}/market-square-result.mp4", # output video path
                 callback=process_frame) # called once for every frame


# Helper that displays a video
def show_video(video_path):
    # Read the video file in binary mode
    with open(video_path, 'rb') as video_file:
        mp4 = video_file.read()
    # Encode the video bytes as base64
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

    # Build an HTML video player
    return HTML(f'''
    <video width=800 controls>
        <source src="{data_url}" type="video/mp4">
    </video>
    ''')


#from IPython import display
# display.clear_output()


# Show the processed video in the notebook or Colab (it can be saved from the player)
show_video(f'{HOME}/market-square-result.mp4')
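The callback passed to sv.process_video receives two arguments, the frame and its index, which is why process_frame above is declared with a second parameter i even though it never uses it. The sketch below is a conceptual stand-in for that loop, using strings in place of frames; process_frames and tag_frame are hypothetical names, and the real function reads frames from the source video and writes the results to the target file.

```python
def process_frames(frames, callback):
    """Conceptual stand-in: call callback(frame, index) for every frame and collect the results."""
    return [callback(frame, index) for index, frame in enumerate(frames)]


def tag_frame(frame, index):
    """Toy callback with the same (frame, index) signature as process_frame."""
    return f"{frame}#{index}"


result = process_frames(["a", "b", "c"], tag_frame)
```

The index is useful when the callback should behave differently over time, for example running the detector only on every n-th frame.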