This capstone project will attempt to complete Phase 1 of a wardrobe recommendation algorithm. In Phase 1, we will classify images of fashion items by gender, item category, type of item, season worn, color, year, and usage. Because each image carries several labels at once, we need a multi-label classifier rather than a binary one. A convolutional neural network (CNN) will then be used to predict the labels.

Introducing the Data

To build a prototype model, I have found a Fashion Product Image Dataset, which can be downloaded from Kaggle. Due to time and memory constraints, we will be using the smaller version of the dataset (280 MB instead of 15 GB; the images are reduced in size).

from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np 
import pandas as pd 
import os 

PATH = "/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/"
print(os.listdir(PATH))

#create df
# on_bad_lines='skip' (pandas 1.3+) replaces the deprecated error_bad_lines=False
df = pd.read_csv(PATH + "styles.csv", nrows=20000, on_bad_lines='skip')
df['image'] = df['id'].astype(str) + ".jpg"
df = df.reset_index(drop=True)
df.head(10)

This dataset is a collection of 44,000 products, each described by eight categories: gender, masterCategory, subCategory, articleType, baseColour, season, year, and usage. Within each category, an image is assigned one specific label. Below are the unique labels within each category in a sample of 20,000 images:

target = ['gender', 'masterCategory', 'subCategory', 'articleType',
       'baseColour', 'season', 'year', 'usage']
for col in target:
    print(col)
    print(df[col].unique())
    print('-------------------------')

For this project, the input variable will be the images, while the output variable (target) will be the categories above. Because we have more than one target category, we need a system in which a single image can receive several labels. A multi-class formulation assumes exactly one label per image; here an image carries one label from each category (for example Men, Apparel, and Summer together), so the labels are not mutually exclusive and the problem must be formulated as multi-label instead.

Below is an example of what a multi label/multi class neural network may look like:

Computer Vision & FAST Methods

In this project, we will be using computer vision to categorize and detect products. The following images are example images from the dataset:

from matplotlib import pyplot as plt
import cv2

# Show nine sample images (rows 1 through 9 of the dataframe)
for i in range(1, 10):
    thisId = str(df.id.iloc[i])

    imageName = '/kaggle/input/fashion-product-images-small/myntradataset/images/' + thisId + '.jpg'
    image = cv2.imread(imageName)
    # OpenCV loads images as BGR; matplotlib expects RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    plt.imshow(image)
    plt.title(f'Image {thisId}')
    plt.show()

One way a machine can detect products is corner detection with the features from accelerated segment test (FAST). For each candidate pixel, FAST examines a circle of sixteen pixels around it and marks the candidate as a corner when a sufficiently long contiguous arc of those pixels is either all brighter than the candidate plus a threshold or all darker than the candidate minus that threshold.
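To make the segment test concrete, here is a minimal pure-Python sketch of the check for a single pixel. It is not OpenCV's implementation: the function name `is_fast_corner`, the default arc length `n=9` (the common FAST-9 variant), and the toy images are assumptions for illustration, and the threshold of 50 simply mirrors the value passed to the detector in this post.

```python
# Offsets (dx, dy) of the 16 pixels on a radius-3 circle, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(gray, x, y, threshold=50, n=9):
    """Segment test: True if n contiguous circle pixels are all brighter than
    center + threshold or all darker than center - threshold."""
    center = gray[y][x]
    ring = [gray[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # 1 looks for a brighter arc, -1 for a darker arc
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        # Walk the ring twice so arcs that wrap around the start are counted.
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


# Toy 9x9 grayscale images (hypothetical data, not from the dataset).
flat = [[100] * 9 for _ in range(9)]
corner = [[200 if (x >= 4 and y >= 4) else 0 for x in range(9)] for y in range(9)]
edge = [[200 if x >= 4 else 0 for x in range(9)] for y in range(9)]

print(is_fast_corner(corner, 4, 4))  # tip of a bright square
print(is_fast_corner(edge, 4, 4))    # point on a straight edge
print(is_fast_corner(flat, 4, 4))    # uniform region
```

The tip of the square has eleven contiguous darker ring pixels and passes the test, while the straight edge has only seven and the flat region has none, so neither is reported as a corner.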

We can apply a FAST algorithm to the example images to visualize this method:

from matplotlib import pyplot as plt
import cv2

# FAST detector with an intensity threshold of 50
fast = cv2.FastFeatureDetector_create(50)

for i in range(1, 10):
    thisId = str(df.id.iloc[i])

    imageName = '/kaggle/input/fashion-product-images-small/myntradataset/images/' + thisId + '.jpg'
    image = cv2.imread(imageName)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Detect keypoints (non-maximum suppression is enabled by default)
    kp = fast.detect(image, None)
    print("Total Keypoints with nonmaxSuppression: {}".format(len(kp)))

    # Draw the keypoints in red on a copy of the image
    fast_image = cv2.drawKeypoints(image, kp, None, color=(255, 0, 0))
    plt.imshow(fast_image)
    plt.title('FAST Detector')
    plt.show()

Data Visualizations

Let us explore frequency of labels in each category:

Master Category

In the following graph, we can see that apparel is the most frequent label (~9,000 images), with accessories coming in second (~5,000). Apparel accounts for slightly less than half of the sampled group.

plt.figure(figsize=(7,20))
df.masterCategory.value_counts().sort_values().plot(kind='barh')

Article type

In the chart below, the most frequent labels are tshirts, shirts, casual shoes, watches, kurtas, tops, and handbags. There are almost 3,000 tshirts; each of the other article types listed has between 1,000 and 1,500 images.

plt.figure(figsize=(7,20))
df.articleType.value_counts().sort_values().plot(kind='barh')

SubCategory

The chart shows that topwear and shoes are the most frequent labels, both having a frequency between 4,000 and 6,000.

plt.figure(figsize=(7,20))
df.subCategory.value_counts().sort_values().plot(kind='barh')

Season

In the graph below, we see that a little less than half of the items are summer items (~10,000). The second most frequent season label is fall, with about 5,000 images.

plt.figure(figsize=(7,20))
df.season.value_counts().sort_values().plot(kind='barh')

Year

Below we can see that the most frequent year labels are 2012, 2011, 2016, and 2017, in order from most to least frequent. About 7,000 items are labeled 2012, with 2011 a close second at ~6,000.

plt.figure(figsize=(7,20))
df.year.value_counts().sort_values().plot(kind='barh')

Usage

The bar chart is heavily skewed, showing that the majority of images are for casual usage. In fact, out of the 20,000 images, about 15,000 are casual fashionwear.

plt.figure(figsize=(7,20))
df.usage.value_counts().sort_values().plot(kind='barh')

Gender

This graph shows that the most frequent gender label is Men, with Women a close second. Each has about 8,000-9,000 images.

plt.figure(figsize=(7,20))
df.gender.value_counts().sort_values().plot(kind='barh')

Color

This graph shows that the most frequent color labels are black, white, blue, brown, gray, and red. Black dominates all other colors, with a frequency of about 4,000.

plt.figure(figsize=(7,20))
df.baseColour.value_counts().sort_values().plot(kind='barh')

From the charts, we can see that several categories are heavily skewed in frequency, which leaves our model susceptible to bias. For example, black dominates the color category. If we introduce a fluorescent green item (the least frequent color), the probability that the model will label it correctly is much lower than for a black item, because the model sees far fewer training examples of rare labels than of common ones. The color, article type, and subcategory categories will be most affected by this uneven data. Other categories, such as usage and year, may show similar effects.
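One way to put a number on this skew is to compute inverse-frequency weights per label. The sketch below uses the common "balanced" heuristic, n_samples / (n_classes * count); the helper name `balanced_class_weights` and the toy colour list are assumptions for illustration, not values from the dataset, and the sketch does not show how such weights would be fed into the multi-label model.

```python
from collections import Counter


def balanced_class_weights(values):
    """Weight each label by n_samples / (n_classes * count), so rare labels
    receive proportionally larger weights."""
    counts = Counter(values)
    n_samples = len(values)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy colour column (hypothetical counts).
colours = ["Black"] * 8 + ["White"] * 3 + ["Fluorescent Green"] * 1
weights = balanced_class_weights(colours)

for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

In this toy column the rare colour ends up with a weight eight times larger than the dominant one, which gives a rough sense of how under-represented it is.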

Data Preprocessing

Before creating a model, the data will need to be preprocessed. In previous sections, we have downloaded the dataset, created a dataframe, and established target categories. We must now process the image dataset. We must be able to read all images, reset them to a particular size, and insert them into a data list.

from tensorflow.keras.preprocessing.image import img_to_array
import cv2

#Create Data List
data = []

#Resizing Numbers
IX = 80
IY = 60

#List holding invalid images
invalid_ids = []

#Read and Resize Images

for name in df.id:
    try:
        image = cv2.imread('/kaggle/input/fashion-product-images-small/myntradataset/images/'+str(name)+'.jpg')
        # cv2.imread returns None for a missing file, which makes cv2.resize raise
        image = cv2.resize(image, (IX, IY))  # dsize is (width, height)
        image = img_to_array(image)
        data.append(image)
    except Exception:
        # Images for certain ids are missing, so they are not added to the dataset
        invalid_ids.append(name)

Now that we have a list of resized images, we move on to the target labels. We go through each row and collect its labels into a list, so that each image's label combination is kept together.

labels = []

for index, row in df.iterrows():
    # skip rows whose image could not be read
    if row['id'] in invalid_ids:
        continue

    tags = []

    # go through each column in the specified row and add to list
    for col in target:
        tags.append(row[col])

    # append the sublist to the labels list
    labels.append(tags)

We can convert both the data and labels into numpy arrays:

import numpy as np

data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)

print(labels)

Our Labels Array will now look as follows:

When working with categorical data, we usually need to encode the categories as numbers. One-hot encoding assigns a binary vector to each categorical value. On the first attempt, the entire dataframe was one-hot encoded, which expanded it to over 222 columns. Since only the target variables need encoding, we can use a label binarizer from scikit-learn instead. LabelBinarizer handles a single label per sample; because each image here carries several labels, we use MultiLabelBinarizer, which produces one binary vector per image with a 1 for every label that applies.

#binary vectors for each row
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(labels)

print(mlb.classes_)
print(labels[0])
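To illustrate what this multi-hot encoding looks like, here is a small hand-rolled sketch of the same idea in pure Python. It is not scikit-learn's implementation; the helper names `fit_classes` and `multi_hot` and the two example rows are assumptions made for illustration.

```python
def fit_classes(rows):
    """Collect the sorted set of all labels seen in any row."""
    return sorted({label for row in rows for label in row})


def multi_hot(rows, classes):
    """Encode each row as a binary vector with a 1 for every label it contains."""
    index = {label: i for i, label in enumerate(classes)}
    encoded = []
    for row in rows:
        vector = [0] * len(classes)
        for label in row:
            vector[index[label]] = 1
        encoded.append(vector)
    return encoded


# Two toy label rows (hypothetical, shortened to three categories each).
rows = [["Men", "Apparel", "Summer"], ["Women", "Footwear", "Summer"]]
classes = fit_classes(rows)
vectors = multi_hot(rows, classes)

print(classes)
print(vectors[0])
print(vectors[1])
```

Each vector contains several ones, which is what distinguishes this target from a one-hot, single-class target. One design consequence worth keeping in mind: all columns share a single vocabulary, so if two columns happened to use the same label string, they would be merged into one class.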

Models

For this classification problem, we will be using a convolutional neural network. We must first split the data into training and test sets, using a 70:30 split. A typical CNN design begins with feature extraction and finishes with classification. Feature extraction is performed by alternating convolution layers with subsampling (pooling) layers.
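As a minimal sketch of the feature-extraction step described above (a convolution followed by a downsizing step), the pure-Python functions below apply one small filter and one 2x2 max-pooling pass to a toy image. The function names, the kernel, and the image are illustrative assumptions; unlike the models in this post, the sketch uses no padding and a single filter.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (no kernel flipping,
    as in most deep learning libraries) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 block."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Toy 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to left-to-right increases in brightness

feature_map = conv2d_valid(image, kernel)  # 4x4
pooled = max_pool_2x2(feature_map)         # 2x2

print(feature_map)
print(pooled)
```

The filter responds only where the brightness changes, and pooling halves each spatial dimension while keeping that response.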

Let us split the train and test set:

from sklearn.model_selection import train_test_split

# splitting data into testing and training set 
(x_train, x_test, y_train, y_test) = train_test_split(data,labels, test_size=0.3, random_state=2021)

Let us load the necessary libraries:

from numpy import mean
from numpy import std
import tensorflow as tf
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.optimizers import SGD, Adam

We must also define the input shape of the images for the neural network: height, width, and number of color channels. A channel count of 1 corresponds to grayscale images and 3 to color images.

inputShape = (IY, IX, 3) # height, width, channels

Model 1

Before we attempt a model, let us consider possible factors. Since we are making a multi-label classification algorithm we need:

An input layer that matches the shape and dimensions of the images
An output layer with the same number of neurons as the number of labels
A sigmoid activation function in the output layer (softmax is for multi-class problems)
A binary cross-entropy loss function
Dropout to discard neurons

It is important to note that the sigmoid activation function predicts the probability of the image belonging to each specific label, so the output will be a vector of numbers, one per class.
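The difference between sigmoid and softmax outputs can be seen in a small numeric sketch. The functions below are plain-Python versions written for illustration (the logits and the clipping constant `eps` are assumptions), not the Keras implementations.

```python
import math


def sigmoid(z):
    """Squash one logit into an independent probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn logits into probabilities that compete and sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of the per-label binary cross-entropy terms."""
    losses = []
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)


logits = [2.0, 2.0, -3.0]  # toy output-layer values for three labels
sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print([round(p, 3) for p in sig])
print([round(p, 3) for p in soft])
print(binary_cross_entropy([1, 1, 0], sig))
```

With sigmoid, the first two labels can both exceed 0.5 at the same time. With softmax they split the probability mass and neither reaches 0.5, which is why softmax suits problems with exactly one label per image.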

When I first started this project, I did not plan out my CNN’s architecture. I reduced the architecture to one output neuron and had a prediction accuracy of 98%. I did not realize that the sigmoid function returned probabilities, and thus did not think there was anything wrong with the model until I started printing/checking the algorithm and doing the writeup. My new algorithm covers each bullet point above and does place multiple labels on images.

def define_model():
    model = tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(64, (3, 3), activation='relu',padding="same", input_shape=inputShape),
      tf.keras.layers.MaxPooling2D(2, 2), #downsize
      tf.keras.layers.Dropout(0.25),   # randomly drop activations to reduce overfitting

      tf.keras.layers.Dense(64,  activation='relu'),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(len(mlb.classes_), activation='sigmoid')
    ])

    model.compile(optimizer = 'adam',
              loss = 'binary_crossentropy')
    return model

# build the model before using it
model = define_model()
model.summary()

Let us plot the loss function over 100 epochs:

# steps_per_epoch=1 means each epoch trains on a single batch of 5 images
history = model.fit(x_train, y_train, epochs=100, steps_per_epoch=1, batch_size=5)
%matplotlib inline

import matplotlib.pyplot as plt

loss = history.history['loss']
epochs = range(len(loss))  # one loss value per epoch

plt.plot(epochs, loss, 'r', label="Training Loss")
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

Because our model returns probabilities, we must convert the output to binary values that resemble the encoded labels. We can do this by rounding (a threshold of 0.5) and then transforming the binary vectors back into labels.

yhat = model.predict(x_train)
yhat = yhat.round()
true_test_labels = mlb.inverse_transform(y_train)
pred_test_labels = mlb.inverse_transform(yhat)

correct = 0
wrong = 0

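We then need a measure of accuracy in order to evaluate the model. The loop below checks whether the first two true labels of each image appear among the predicted labels: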

for i in range(len(y_train)):

    true_labels = list(true_test_labels[i])

    pred_labels = list(pred_test_labels[i])

    label1 = true_labels[0]
    label2 = true_labels[1]

    if label1 in pred_labels:
        correct+=1
    else:
        wrong+=1

    if label2 in pred_labels:
        correct+=1
    else:
        wrong+=1    

print('correct: ', correct)
print('missing/wrong: ', wrong)
print('Accuracy: ',correct/(correct+wrong))   

Model 1’s accuracy on the training set is 72.3%, which shows that there is room for improvement in the model.
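The loop above only looks at the first two true labels of each image. A reusable variant that checks every true label could look like the sketch below. The function name `label_hit_rate` and the toy label tuples are assumptions for illustration; the inputs have the same form as the output of `mlb.inverse_transform`.

```python
def label_hit_rate(true_label_sets, pred_label_sets):
    """Fraction of true labels that also appear in the prediction
    for the same image."""
    correct = 0
    total = 0
    for true_labels, pred_labels in zip(true_label_sets, pred_label_sets):
        predicted = set(pred_labels)
        for label in true_labels:
            total += 1
            if label in predicted:
                correct += 1
    return correct / total if total else 0.0


# Toy label tuples (hypothetical examples).
true_sets = [("Apparel", "Men", "Summer"), ("Footwear", "Women", "Winter")]
pred_sets = [("Apparel", "Men"), ("Footwear", "Men", "Winter")]

score = label_hit_rate(true_sets, pred_sets)
print(f"Hit rate: {score:.3f}")
```

Note that this measure only counts missed labels; it does not penalize extra labels that the model predicts but that are not true, so it is best read alongside a measure that does.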

Predictions on the Testing Set

To check how well the model generalizes, we now predict the labels for the held-out testing set:

yhat = model.predict(x_test)
yhat = yhat.round()
true_test_labels = mlb.inverse_transform(y_test)
pred_test_labels = mlb.inverse_transform(yhat)

correct = 0
wrong = 0
# Evaluating the predictions of the model
for i in range(len(y_test)):
    true_labels = list(true_test_labels[i])
    pred_labels = list(pred_test_labels[i])
    label1 = true_labels[0]
    label2 = true_labels[1]

    if label1 in pred_labels:
        correct+=1
    else:
        wrong+=1

    if label2 in pred_labels:
        correct+=1
    else:
        wrong+=1    

print('correct: ', correct)
print('missing/wrong: ', wrong)
print('Accuracy: ',correct/(correct+wrong))   

The testing set’s accuracy is slightly lower than the training set’s, which suggests the model may be slightly overfit.

Let us now compare this model with other hyperparameters:

Model 2

Let us try to improve the architecture of our neural network and see whether that increases the accuracy. Below we have increased the dropout rate and reduced the number of neurons.

def define_model():
    model = tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(64, (3, 3), activation='relu',padding="same", input_shape=inputShape),
      tf.keras.layers.MaxPooling2D(2, 2), #downsize
      tf.keras.layers.Dropout(0.5),   # randomly drop activations to reduce overfitting

      tf.keras.layers.Dense(32,  activation='relu'),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(len(mlb.classes_), activation='sigmoid')
    ])

    model.compile(optimizer = 'adam',
              loss = 'binary_crossentropy')
    return model

# build a fresh model so training does not continue from Model 1
model = define_model()
history = model.fit(x_train, y_train, epochs=100, steps_per_epoch=15, batch_size=32)

Training Accuracy:

Testing Accuracy:

Our Training and Testing accuracy scored higher with this model.

Model 3

Because we are feeding only certain kinds of pictures into the model, we can perform data augmentation on our dataset to train the model to recognize items at different angles and positions. Data augmentation rotates, crops, zooms, brightens, or flips the existing images to create additional training examples. It can be done manually or algorithmically and should help improve the accuracy. My attempt is below:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',padding="same", input_shape=inputShape),
    tf.keras.layers.MaxPooling2D(2, 2), #downsize
    tf.keras.layers.Dropout(0.5),   # randomly drop activations to reduce overfitting

    tf.keras.layers.Dense(32,  activation='relu'),    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(mlb.classes_), activation='sigmoid')
])

from tensorflow.keras.preprocessing.image import ImageDataGenerator
data_augmentation = True
batch_size=1
epochs=1
optimizer = 'adam'

model.compile(optimizer = optimizer,
              loss = 'binary_crossentropy')


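# x_train and x_test were already scaled to [0, 1] when the data array was built,
# so they are only cast here and not divided by 255 a second time
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')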

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        zca_epsilon=1e-06,  # epsilon for ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        # randomly shift images horizontally (fraction of total width)
        width_shift_range=0.1,
        # randomly shift images vertically (fraction of total height)
        height_shift_range=0.1,
        shear_range=0.,  # set range for random shear
        zoom_range=0.,  # set range for random zoom
        channel_shift_range=0.,  # set range for random channel shifts
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        cval=0.,  # value used for fill_mode = "constant"
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False,  # randomly flip images
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)
    
    # Fit the model on the batches generated by datagen.flow().
    # model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.
    model.fit(datagen.flow(x_train, y_train,
                           batch_size=batch_size),
              epochs=epochs,
              validation_data=(x_test, y_test))
                        
yhat = model.predict(x_train)
yhat = yhat.round()
true_test_labels = mlb.inverse_transform(y_train)
pred_test_labels = mlb.inverse_transform(yhat)

correct = 0
wrong = 0
# Evaluating the predictions of the model on the training set
# (the loop length must match the set that was predicted above)
for i in range(len(y_train)):
    true_labels = list(true_test_labels[i])
    pred_labels = list(pred_test_labels[i])
    label1 = true_labels[0]
    label2 = true_labels[1]

    if label1 in pred_labels:
        correct+=1
    else:
        wrong+=1

    if label2 in pred_labels:
        correct+=1
    else:
        wrong+=1    

print('correct: ', correct)
print('missing/wrong: ', wrong)
print('Accuracy: ',correct/(correct+wrong))                           

Unfortunately, it seems that I may have applied this incorrectly, as it severely hurt the model's performance. Due to time constraints, I will continue working on this in my free time.
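For intuition about what the generator above does to each image, the two transformations it enables (`horizontal_flip=True` and a width shift with `fill_mode='nearest'`) can be sketched on a tiny grid of pixel values. This is a simplified pure-Python illustration with hypothetical helper names, not the Keras implementation, which picks the flip and the shift amount at random for every batch.

```python
def horizontal_flip(image):
    """Mirror every row left to right."""
    return [list(reversed(row)) for row in image]


def shift_right(image, pixels):
    """Shift columns right by `pixels` (0 <= pixels < width), filling the gap
    with the nearest edge value, as fill_mode='nearest' does."""
    if pixels == 0:
        return [list(row) for row in image]
    return [[row[0]] * pixels + list(row[:len(row) - pixels]) for row in image]


# Toy 2x4 single-channel image (hypothetical pixel values).
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]

flipped = horizontal_flip(image)
shifted = shift_right(image, 1)

print(flipped)
print(shifted)
```

The labels of an augmented image stay the same as those of the original, since a flipped or shifted tshirt is still a tshirt.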

Conclusion

From the results, it is clear that the model still needs a lot of work. Labels that are hard to predict from small images, such as the year of the item or the season in which it is worn, could perhaps be removed. Another major weakness in the preprocessing is the class imbalance of the dataset. In the future, it would be worth creating synthetic data for the model or adding further images to the dataset. Early stopping or a higher dropout rate may also help regularize the neural network.
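The early-stopping idea mentioned above can be sketched as a simple patience rule on the validation loss. The function name, the `patience` and `min_delta` parameters, and the toy loss values are assumptions for illustration. The training runs in this post do not record a validation loss, so one would have to be added first.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop: the first
    epoch where the loss has failed to improve `patience` times in a row."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: training runs to the end


# Toy validation losses (hypothetical): improvement stalls after epoch 2.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stop_at = early_stopping_epoch(losses, patience=3)
print(f"Training would stop after epoch index {stop_at}")
```

In this toy run, training stops at epoch index 5 and never sees the later improvement at index 6, which is the usual trade-off when choosing the patience value.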

In the future, I would also like to turn the accuracy calculation into an actual Python function that can be used as a custom metric, so that overfitting can be tracked over epochs. I would also like to modify the algorithm below so that it prompts the user for an image:

import numpy as np
from google.colab import files
from keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():
 
  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(150, 150))
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)

  images = np.vstack([x])
  classes = model.predict(images, batch_size=10)
  print(classes[0])
  if classes[0]>0.5:
    print(fn + " is a dog")
  else:
    print(fn + " is a cat")

This algorithm was taken from the Google Colab TensorFlow tutorials on YouTube. It prompts the user for an image and predicts its classification. The code above is for a binary classification model, but it may be adaptable to a multi-label model.

Lastly, it would be interesting to see whether this model could be completely reimagined. The dataset is organized hierarchically: master categories contain subcategories, which in turn contain article types. For example, one neural network could assign an image to a master category (accessories, apparel, footwear, and so on). Depending on the category chosen, the image could then be passed to a second neural network that assigns a subcategory. The question would then be whether this is more computationally expensive and whether it has greater overall error. Perhaps there is a way to design a more complex neural network in which optimization constraints are stronger. I will be looking further at current machine learning algorithms such as the Pin2Vec Pinterest algorithm and the Image Similarity Batch Pipeline (image below). These algorithms may help future phases of the project.
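The two-stage idea above can be sketched as a small routing function. Everything here is hypothetical: the stand-in "models" are plain Python functions on a toy dictionary rather than trained networks, and the accuracy figures are made-up inputs used only to show how errors would compound if the two stages erred independently.

```python
def classify_hierarchically(image, master_model, sub_models):
    """Route an image through a master-category model, then through the
    sub-model that belongs to the predicted master category."""
    master = master_model(image)
    sub = sub_models[master](image)
    return master, sub


def combined_accuracy(master_acc, sub_acc):
    """Both stages must be right, so (assuming independent errors)
    their accuracies multiply."""
    return master_acc * sub_acc


# Hypothetical stand-ins for trained networks.
def toy_master(image):
    return "Footwear" if image["has_sole"] else "Apparel"


toy_sub_models = {
    "Footwear": lambda image: "Shoes",
    "Apparel": lambda image: "Topwear",
}

result = classify_hierarchically({"has_sole": True}, toy_master, toy_sub_models)
overall = combined_accuracy(0.9, 0.8)

print(result)
print(round(overall, 2))
```

Under that independence assumption, a 90% accurate first stage and an 80% accurate second stage give about 72% overall, which illustrates the concern about greater overall error.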

Multi-Label Classification of Fashion Products