This side learning project explores how MediaPipe can be used to improve athletic performance through pose-estimation analysis. The code below has been adapted from several open-source projects and edited for use and understanding.

Colab Files

Pose Colab #1

Pose Colab #2

Introduction to MediaPipe

Computer vision can aid in limb and joint detection in both images and videos. The open-source computer vision library OpenCV helps process images and extract the necessary features. MediaPipe is a pipeline that provides solutions for face detection, pose estimation, object detection, box tracking, and motion detection. Frames must be converted from OpenCV's BGR format to RGB before processing. Pose marks landmark coordinates on the image/video that ultimately separate the subject from its background. MediaPipe first finds the Region of Interest (ROI) before marking landmarks within it.

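As a minimal sketch of that flow on a single image (the file path here is a placeholder), MediaPipe Pose can be called directly:

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

img = cv2.imread('athlete.jpg')  # placeholder path
imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; MediaPipe expects RGB

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(imgRGB)

if results.pose_landmarks:  # None if no person/ROI was found
    mp_draw.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

The rest of this project wraps the same calls in a reusable class.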

Exploring the Code

First, we must import our Python libraries:

# Install MediaPipe before importing it
!pip install mediapipe==0.8.8

# Mount Google Drive to access data
from google.colab import drive
drive.mount('/content/drive')

import cv2
import math
import time
import mediapipe as mp
from google.colab.patches import cv2_imshow

We must then initialize the MediaPipe Pose object. Each argument is defined below:

  • static_image_mode: Decides whether to treat the input as a video stream or as a batch of unrelated images. If set to True, MediaPipe runs the ROI detector on every image. Since our video input does not need to re-localize the subject on every frame, we set it to False. If the frames had unrelated ROIs, we might set this to True.
  • model_complexity: Sets the complexity of the landmark model (0, 1, or 2). Landmark accuracy, along with latency, increases with model complexity. The default is 1.
  • smooth_landmarks: Decides whether to filter landmarks across frames to reduce jitter.
  • enable_segmentation: Decides whether to predict a segmentation mask in addition to the landmarks.
  • smooth_segmentation: Decides whether to filter the segmentation mask across frames.
  • min_detection_confidence: The minimum confidence value ([0.0, 1.0]) from the person-detection model for the detection to be considered successful.
  • min_tracking_confidence: The minimum confidence value ([0.0, 1.0]) from the tracking model for the pose landmarks to be considered tracked successfully; if tracking fails, detection is invoked again on the next frame. Higher values increase robustness at the cost of higher latency. These arguments map directly onto the solution API, as shown in the sketch after this list.
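For reference, a minimal sketch of the same arguments passed straight to mp.solutions.pose.Pose, with its defaults spelled out:

pose = mp.solutions.pose.Pose(static_image_mode=False,
                              model_complexity=1,
                              smooth_landmarks=True,
                              enable_segmentation=False,
                              smooth_segmentation=True,
                              min_detection_confidence=0.5,
                              min_tracking_confidence=0.5)

The project wraps this call in a small helper class: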
class poseDetector():
    def __init__(self,
                 static_image_mode=False,
                 model_complexity=1,
                 smooth_landmarks=True,
                 enable_segmentation=False,
                 smooth_segmentation=True,
                 min_detection_confidence=0.5,
                 min_tracking_confidence=0.5):

        self.static_image_mode = static_image_mode
        self.model_complexity = model_complexity
        self.smooth_landmarks = smooth_landmarks
        self.enable_segmentation = enable_segmentation
        self.smooth_segmentation = smooth_segmentation
        self.min_detection_confidence = min_detection_confidence
        self.min_tracking_confidence = min_tracking_confidence

        # Drawing utility for rendering landmarks and their connections
        self.mpDraw = mp.solutions.drawing_utils
        # The pipeline first locates the person/pose region of interest (ROI)
        # within the frame using a detector; the tracker then predicts the pose
        # landmarks and segmentation mask within the ROI-cropped frame.
        self.mpPose = mp.solutions.pose
        self.pose = self.mpPose.Pose(self.static_image_mode,
                                     self.model_complexity,
                                     self.smooth_landmarks,
                                     self.enable_segmentation,
                                     self.smooth_segmentation,
                                     self.min_detection_confidence,
                                     self.min_tracking_confidence)
    
    def findPose(self, img, draw=True):
        # MediaPipe expects RGB input, while OpenCV frames are BGR
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.pose.process(imgRGB)
        if self.results.pose_landmarks:
            if draw:
                self.mpDraw.draw_landmarks(img, self.results.pose_landmarks,
                                           self.mpPose.POSE_CONNECTIONS)
        return img

    def findPosition(self, img, draw=True):
        # Convert each normalized landmark to pixel coordinates
        self.lmList = []
        if self.results.pose_landmarks:
            for id, lm in enumerate(self.results.pose_landmarks.landmark):
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                self.lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 0), cv2.FILLED)
        return self.lmList
    # MEASURE ANGLE
    def findAngle(self, img, p1, p2, p3, draw=True):
        # Get the pixel coordinates of the three landmarks
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        x3, y3 = self.lmList[p3][1:]

        # Angle at p2 between the rays p2->p1 and p2->p3, normalized to [0, 360)
        angle = math.degrees(math.atan2(y3 - y2, x3 - x2) -
                             math.atan2(y1 - y2, x1 - x2))
        if angle < 0:
            angle += 360

        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 3)
            cv2.line(img, (x3, y3), (x2, y2), (255, 255, 255), 3)
            cv2.circle(img, (x1, y1), 10, (0, 0, 255), cv2.FILLED)
            cv2.circle(img, (x1, y1), 15, (0, 0, 255), 2)
            cv2.circle(img, (x2, y2), 10, (0, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), 15, (0, 0, 255), 2)
            cv2.circle(img, (x3, y3), 10, (0, 0, 255), cv2.FILLED)
            cv2.circle(img, (x3, y3), 15, (0, 0, 255), 2)
            cv2.putText(img, str(int(angle)), (x2 - 50, y2 + 50),
                        cv2.FONT_HERSHEY_PLAIN, 2, (0, 0, 255), 2)
        return angle
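With the class defined, here is a minimal usage sketch for a video loop; the video path is a placeholder, and landmark indices 11, 13, and 15 are MediaPipe's left shoulder, elbow, and wrist, so findAngle here returns the left elbow angle on each frame:

detector = poseDetector()
cap = cv2.VideoCapture('/content/drive/MyDrive/video.mp4')  # placeholder path

while True:
    success, img = cap.read()
    if not success:
        break
    img = detector.findPose(img)
    lmList = detector.findPosition(img, draw=False)
    if lmList:
        elbowAngle = detector.findAngle(img, 11, 13, 15)  # shoulder-elbow-wrist
    cv2_imshow(img)

cap.release()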
    
