This side learning project explores how MediaPipe can be used to improve athletic performance through pose-estimation analysis. The code below has been adapted from several open-source projects and edited for use and understanding.

Colab Files

Pose Colab #1

Pose Colab #2

Introduction to MediaPipe

Computer vision can aid in limb and joint detection in both images and videos. The open-source computer vision library OpenCV helps process images and extract the necessary features. MediaPipe is a pipeline that provides solutions for face detection, pose estimation, object detection, box tracking, and motion detection. Frames must be converted from OpenCV's BGR format to RGB before processing. Pose marks landmark coordinates on the image/video that ultimately separate the subject from its background. MediaPipe first finds the Region of Interest (ROI) before marking landmarks within it.

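As a minimal sketch of that flow on a single image (the file path here is a placeholder), MediaPipe Pose can be called directly:

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

img = cv2.imread('athlete.jpg')  # placeholder path
imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; MediaPipe expects RGB

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(imgRGB)

if results.pose_landmarks:  # None if no person/ROI was found
    mp_draw.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

The rest of this project wraps the same calls in a reusable class.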

Exploring the Code

First, we must import our Python libraries:

# Install MediaPipe before importing it
!pip install mediapipe==0.8.8

# Mount Google Drive to access data
from google.colab import drive
drive.mount('/content/drive')

import cv2
import math
import time
import mediapipe as mp
from google.colab.patches import cv2_imshow

We must then initialize the MediaPipe Pose object. Each argument is defined below:

  • static_image_mode: Decides whether to treat the input as a video stream or as a batch of unrelated images. If set to True, MediaPipe runs the ROI detector on every image. Since our video input does not need to re-localize the subject on every frame, we set it to False. If the frames had unrelated ROIs, we might set this to True.
  • model_complexity: Sets the complexity of the landmark model (0, 1, or 2). Landmark accuracy, along with latency, increases with model complexity. The default is 1.
  • smooth_landmarks: Decides whether to filter landmarks across frames to reduce jitter.
  • enable_segmentation: Decides whether to predict a segmentation mask in addition to the landmarks.
  • smooth_segmentation: Decides whether to filter the segmentation mask across frames.
  • min_detection_confidence: The minimum confidence value ([0.0, 1.0]) from the person-detection model for the detection to be considered successful.
  • min_tracking_confidence: The minimum confidence value ([0.0, 1.0]) from the tracking model for the pose landmarks to be considered tracked successfully; if tracking fails, detection is invoked again on the next frame. Higher values increase robustness at the cost of higher latency. These arguments map directly onto the solution API, as shown in the sketch after this list.
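For reference, a minimal sketch of the same arguments passed straight to mp.solutions.pose.Pose, with its defaults spelled out:

pose = mp.solutions.pose.Pose(static_image_mode=False,
                              model_complexity=1,
                              smooth_landmarks=True,
                              enable_segmentation=False,
                              smooth_segmentation=True,
                              min_detection_confidence=0.5,
                              min_tracking_confidence=0.5)

The project wraps this call in a small helper class: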
class poseDetector():
    def __init__(self,
                 static_image_mode=False,
                 model_complexity=1,
                 smooth_landmarks=True,
                 enable_segmentation=False,
                 smooth_segmentation=True,
                 min_detection_confidence=0.5,
                 min_tracking_confidence=0.5):

        self.static_image_mode = static_image_mode
        self.model_complexity = model_complexity
        self.smooth_landmarks = smooth_landmarks
        self.enable_segmentation = enable_segmentation
        self.smooth_segmentation = smooth_segmentation
        self.min_detection_confidence = min_detection_confidence
        self.min_tracking_confidence = min_tracking_confidence

        # Drawing utility for rendering landmarks and their connections
        self.mpDraw = mp.solutions.drawing_utils
        # The pipeline first locates the person/pose region of interest (ROI)
        # within the frame using a detector; the tracker then predicts the pose
        # landmarks and segmentation mask within the ROI-cropped frame.
        self.mpPose = mp.solutions.pose
        self.pose = self.mpPose.Pose(self.static_image_mode,
                                     self.model_complexity,
                                     self.smooth_landmarks,
                                     self.enable_segmentation,
                                     self.smooth_segmentation,
                                     self.min_detection_confidence,
                                     self.min_tracking_confidence)
    
    def findPose(self, img, draw=True):
        # MediaPipe expects RGB input, while OpenCV frames are BGR
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.pose.process(imgRGB)
        if self.results.pose_landmarks:
            if draw:
                self.mpDraw.draw_landmarks(img, self.results.pose_landmarks,
                                           self.mpPose.POSE_CONNECTIONS)
        return img

    def findPosition(self, img, draw=True):
        # Convert each normalized landmark to pixel coordinates
        self.lmList = []
        if self.results.pose_landmarks:
            for id, lm in enumerate(self.results.pose_landmarks.landmark):
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                self.lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 0), cv2.FILLED)
        return self.lmList
    # MEASURE ANGLE
    def findAngle(self, img, p1, p2, p3, draw=True):
        # Get the pixel coordinates of the three landmarks
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        x3, y3 = self.lmList[p3][1:]

        # Angle at p2 between the rays p2->p1 and p2->p3, normalized to [0, 360)
        angle = math.degrees(math.atan2(y3 - y2, x3 - x2) -
                             math.atan2(y1 - y2, x1 - x2))
        if angle < 0:
            angle += 360

        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 3)
            cv2.line(img, (x3, y3), (x2, y2), (255, 255, 255), 3)
            cv2.circle(img, (x1, y1), 10, (0, 0, 255), cv2.FILLED)
            cv2.circle(img, (x1, y1), 15, (0, 0, 255), 2)
            cv2.circle(img, (x2, y2), 10, (0, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), 15, (0, 0, 255), 2)
            cv2.circle(img, (x3, y3), 10, (0, 0, 255), cv2.FILLED)
            cv2.circle(img, (x3, y3), 15, (0, 0, 255), 2)
            cv2.putText(img, str(int(angle)), (x2 - 50, y2 + 50),
                        cv2.FONT_HERSHEY_PLAIN, 2, (0, 0, 255), 2)
        return angle
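With the class defined, here is a minimal usage sketch for a video loop; the video path is a placeholder, and landmark indices 11, 13, and 15 are MediaPipe's left shoulder, elbow, and wrist, so findAngle here returns the left elbow angle on each frame:

detector = poseDetector()
cap = cv2.VideoCapture('/content/drive/MyDrive/video.mp4')  # placeholder path

while True:
    success, img = cap.read()
    if not success:
        break
    img = detector.findPose(img)
    lmList = detector.findPosition(img, draw=False)
    if lmList:
        elbowAngle = detector.findAngle(img, 11, 13, 15)  # shoulder-elbow-wrist
    cv2_imshow(img)

cap.release()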
    
