July 20, 2020
Final Project
Migraines have been named by the WHO as one of the ten most disabling conditions in the world. Of the 10-15% of people worldwide who experience migraines, 90% lose normal functionality; that's about 630 million people around the world! A couple of years ago, I myself began to experience migraines. These were not just chronic headaches; I would experience distorted vision, neck stiffness, visual auras, sensitivity to light and sound, mood swings, nausea, and pain…lots of pain. It didn't help when, out of the blue, I began to experience chronic anxiety. Although the two seem quite different in nature, studies have found that 50% of those with migraines experience anxiety disorders.

With so many statistics out there and the WHO's recognition of migraines as a taxing condition, one would think there is tons of research surrounding them. However, as I searched for potential causes, I found little information. In fact, the medical community argues to this day about the causes of migraines, and although a correlation between anxiety and migraines has been established, scientists still do not know much about the nature of this relationship. Thus, this is an excellent field in which to apply machine learning!

When I first explored this idea, I wanted to create a CNN using known biomarkers of migraines and anxiety to search further for an explanation. However, due to the limitations of existing research, I may need to simplify this idea. Biomarkers for both migraines and anxiety are still widely debated, and the few machine learning models that have been applied to the field have failed to provide significant findings. Thus, depending on the amount of data I find, I plan to create a CNN that tests possible biomarkers, with image-detection features (for MRI and other imaging), crossed feature columns, and an analysis of contributing features (violin plots and predicted-probability bar plots).

Currently, I am compiling a long list of possible biomarkers for both generalized anxiety disorder (GAD) and migraines. I will be looking into genetic biomarkers, structural abnormalities, neuroendocrinology, and other biochemical changes. Although it is common to look for a single biomarker that differentiates one condition from another, scientists have recently theorized that the key to understanding migraines is a combination of biomarkers. Although I am still searching for data and my independent variables, I did find that one of the first machine learning algorithms applied to this field used imaging data from the Brain Genomics Superstruct Project. The only possibility I have completely discarded is EEG data, because migraines are episodic and do not show up on an EEG unless the patient is tested during an attack.
Cats and Dogs
- Question 1 Which optimizer have you selected, and how might it compare to other possible choices? (have a look at this site)
- We selected the RMSprop optimizer from the TensorFlow Keras library. Before we go into RMSprop, let's look at optimizers in general. Optimizers are algorithms that minimize loss and improve training speed by tackling massive optimization problems. Calculus tells us that such problems can be solved by looking at the gradient of the loss function and searching for its minima. The optimizer first checks which direction it must move in to find a minimum value. Then it must decide the size of the step it will take in that direction, otherwise known as the learning rate. A minimum has a gradient of zero. (A minimal sketch of this loop appears after the next paragraph.)
- However, our optimizer may run into some challenges with this method. If there is more than one local minimum, it may get stuck at the wrong one. A second problem may occur if the loss function has saddle points: a saddle point also has a gradient of zero, but it is a maximum along some directions and a minimum along others. Some optimizers reduce these errors by introducing randomness. But what happens when pathological curvature becomes a problem for us? Pathological curvature occurs when the surface of a function is irregular, such as having ravines. To account for curvature, second-derivative information is brought into the conversation.
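To make the basic loop concrete, here is a minimal sketch of plain gradient descent on a toy one-dimensional loss (the function, starting point, and learning rate here are ours, chosen purely for illustration):

```python
# Plain gradient descent on a toy loss f(w) = (w - 3)^2,
# whose minimum sits at w = 3 and whose gradient is 2 * (w - 3).
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0              # starting point
learning_rate = 0.1  # size of each step

for step in range(50):
    w -= learning_rate * gradient(w)  # step against the gradient

print(w)  # approaches 3.0, where the gradient is zero
```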
- So how does this all relate to RMSprop? RMSprop stands for Root Mean Square Propagation, and it adjusts the learning rate automatically for each parameter. It does this through a few scary-looking equations that use the gradient at time t for each parameter, the initial learning rate, and an exponential average of the squares of the gradients (written out below).
- The first equation computes the exponential average of the square of the gradient, done separately for each parameter; this weighs the recent gradient updates against the past ones. The second equation determines the step size from that exponential average, which helps avoid bouncing between the two walls of a ravine. The third equation applies the update. In simpler terms, RMSprop will shrink the steps toward the minimum if they are too large.
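In the standard presentation of RMSprop (with $g_t$ the gradient at time $t$, $\eta$ the initial learning rate, $\rho$ the decay rate, and $\epsilon$ a small stability constant), those three equations read:

$$v_t = \rho\, v_{t-1} + (1 - \rho)\, g_t^2$$

$$\Delta w_t = -\frac{\eta}{\sqrt{v_t + \epsilon}}\, g_t$$

$$w_{t+1} = w_t + \Delta w_t$$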
- So why did we stop using Adam for this one? The best optimizer for a given problem usually depends on finding effective hyperparameters, and Adam is sensitive to these, which makes it harder to tune. Second, Adam's use of momentum benefits some types of problems but hurts others. A sketch of configuring RMSprop in Keras follows.
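As a reference point, this is a minimal sketch of how RMSprop can be configured in Keras; the values shown are the library defaults, not necessarily the ones from our actual run:

```python
import tensorflow as tf

# rho is the decay rate of the exponential average of squared gradients
# (the v_t equation above); epsilon guards against division by zero.
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=0.001,  # eta, the initial learning rate
    rho=0.9,              # decay rate for the squared-gradient average
    epsilon=1e-07,        # numerical-stability constant
)
```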
- Question 2 Describe your selected loss function and its implementation. How is it effectively penalizing bad predictions?
- We are using the binary_crossentropy loss function. This implies that there are two labels, one of which we are trying to predict for each example. The function works with probabilities: the probability of being label A is 1.0 while the probability of being label B is 0.0. Although it may seem counterintuitive to say a label has a probability of 0.0, it simply means that label B is the absence of label A. The loss function evaluates how good or bad a prediction is. By using logistic regression to classify our points, we get the probability of label A for any x. Comparing each predicted probability to the true label using the negative log, then averaging the losses over all points, gives a single binary cross-entropy/log loss, written out below.
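In standard notation (with $y_i \in \{0, 1\}$ the true label of the $i$-th of $N$ points and $p_i$ its predicted probability of being label A), the loss just described is:

$$\text{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \big[\, y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \,\big]$$

This is where bad predictions get penalized: a confident wrong prediction, say $p_i$ near 0 when $y_i = 1$, makes $-\log(p_i)$ blow up, so the model pays a steep price for being confidently wrong.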
- Question 3 What is the purpose of the metric= argument in your model.compile() function?
- A metric also judges the performance of the model; however, unlike the loss, its results are not used to train the model. The metrics= argument of model.compile() takes a list of such metrics, as in the sketch below.
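A minimal sketch, reusing the optimizer from Question 1 and assuming model is an already-built Keras model:

```python
# Metrics are reported during training and evaluation but, unlike the
# loss, they do not influence the weight updates.
model.compile(
    optimizer=optimizer,         # the RMSprop instance from above
    loss="binary_crossentropy",  # trains the model
    metrics=["accuracy"],        # only monitors it
)
```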
- Question 4 Plot the accuracy and loss results for both the training and test datasets. Include these in your response. Assess the model and describe how good you think it performed.
- The first graph depicts training and validation accuracy, while the second shows their respective losses. In the first graph, the training accuracy (red line) increases with each epoch while the validation accuracy plateaus. In the second graph, the training loss decreases steadily, but the validation loss begins to rise slightly around the seventh epoch. Thus, we can determine that after this point the model is beginning to overfit. (A sketch of how such plots can be produced follows.)
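For reference, here is a minimal sketch of plotting these curves from the History object that model.fit() returns; the variable name history is an assumption, and depending on the TensorFlow version the keys may be "acc"/"val_acc" rather than "accuracy"/"val_accuracy":

```python
import matplotlib.pyplot as plt

# The History object from model.fit() stores one value per epoch
# for each quantity tracked during training.
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)

plt.figure()
plt.plot(epochs, acc, "r", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()

plt.figure()
plt.plot(epochs, loss, "r", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()

plt.show()
```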
- Question 5 Use the model to predict 3 dog images and 3 cat images. Upload your images and the predictions. How did your model perform in practice? Do you have any ideas of how to improve the model's performance?
- Image 1 was labeled as a dog, Image 2 as a dog, Image 3 as a cat, Image 4 as a cat, Image 5 as a cat, and Image 6 as a dog. (A sketch of running a single image through the model follows.)
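This is a minimal sketch of running one image through the trained model; the file name, the 150x150 input size, and the assumption that a sigmoid output above 0.5 means "dog" all depend on how the training data was actually set up:

```python
import numpy as np
import tensorflow as tf

# Load one image at the size the network was trained on and scale
# pixel values to [0, 1], matching the training preprocessing.
img = tf.keras.preprocessing.image.load_img("my_dog.jpg", target_size=(150, 150))
x = tf.keras.preprocessing.image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)  # the model expects a batch dimension

# With binary cross-entropy and a sigmoid output, the prediction is
# the probability of the positive class (assumed here to be dog).
prob = model.predict(x)[0][0]
print("dog" if prob > 0.5 else "cat")
```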