Annette-s-Responses


July 21, 2020

Premade estimators

  1. How did you split the labels from the training set? What was the name of the labels dataset?
    • In order to split the labels from the training set, we used the pop method. The model uses the species labels, which are Setosa, Versicolor, and Virginica.
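A minimal sketch of the split. The column name "Species" follows the TensorFlow Iris tutorial; a plain dict stands in for the pandas DataFrame, since dict.pop behaves analogously to DataFrame.pop here (it removes the column and returns it):

```python
# Toy stand-in for the Iris training DataFrame: a dict of column -> values.
# In the actual tutorial this is a pandas DataFrame and pop() removes a column.
train = {
    "SepalLength": [5.1, 7.0, 6.3],
    "SepalWidth":  [3.5, 3.2, 3.3],
    "Species":     ["Setosa", "Versicolor", "Virginica"],
}

# pop() removes the label column and returns it, leaving only the features.
train_y = train.pop("Species")

print(train_y)   # → ['Setosa', 'Versicolor', 'Virginica']
print(sorted(train))  # only the feature columns remain
```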
  2. List 5 different estimators from tf.estimator and include the base command as you would write it in a script
    • DNNClassifier:
      • For deep models that perform multi-class classification

          tf.estimator.DNNClassifier(
              feature_columns=my_feature_columns,
              hidden_units=[30, 10],
              n_classes=3)
        
    • DNNLinearCombinedClassifier:
      • For wide and deep models

          tf.estimator.DNNLinearCombinedClassifier(
              model_dir=None, linear_feature_columns=None, linear_optimizer='Ftrl',
              dnn_feature_columns=None, dnn_optimizer='Adagrad', dnn_hidden_units=None,
              dnn_activation_fn=tf.nn.relu, dnn_dropout=None, n_classes=2, weight_column=None,
              label_vocabulary=None, config=None, warm_start_from=None,
              loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, batch_norm=False,
              linear_sparse_combiner='sum'
          )
        
    • LinearClassifier:
      • For classifiers based on linear models

        tf.estimator.LinearClassifier(
            feature_columns, model_dir=None, n_classes=2, weight_column=None,
            label_vocabulary=None, optimizer='Ftrl', config=None, warm_start_from=None,
            loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, sparse_combiner='sum'
        )

    • LinearRegressor:
      • Estimator for linear regression problems

        tf.estimator.LinearRegressor(
            feature_columns, model_dir=None, label_dimension=1, weight_column=None,
            optimizer='Ftrl', config=None, warm_start_from=None,
            loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, sparse_combiner='sum'
        )

    • DNNRegressor:
      • A regressor for DNN models

        tf.estimator.DNNRegressor(
            hidden_units, feature_columns, model_dir=None, label_dimension=1,
            weight_column=None, optimizer='Adagrad', activation_fn=tf.nn.relu,
            dropout=None, config=None, warm_start_from=None,
            loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, batch_norm=False
        )

  3. What are the purposes of input functions and of defining feature columns?
    • Input functions: return a Dataset object that yields the data as (features, label) pairs, where features is a dictionary mapping feature-column names to the tensors containing the feature data, and label is the label for each training batch. In other words, input functions pass the input data to the model.
    • Feature columns: specifications for how the model should interpret the data in the features dictionary. The feature columns describe the features you want the model to use.
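A minimal sketch of the (features, label) structure an input function produces. All names here are illustrative, and a plain Python generator stands in for the tf.data.Dataset that a real estimator input function returns:

```python
def input_fn(features, labels, batch_size=2):
    """Sketch of an estimator input function: yields (features_dict, labels)
    batches, mimicking the (features, label) pairs a tf.data.Dataset emits."""
    n = len(labels)
    for start in range(0, n, batch_size):
        end = start + batch_size
        # features stays a dict of column name -> values, as estimators expect.
        batch_features = {name: col[start:end] for name, col in features.items()}
        yield batch_features, labels[start:end]

features = {"SepalLength": [5.1, 7.0, 6.3, 4.9],
            "SepalWidth":  [3.5, 3.2, 3.3, 3.0]}
labels = [0, 1, 2, 0]

batches = list(input_fn(features, labels))
for batch_x, batch_y in batches:
    print(batch_x, batch_y)
```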
  4. Describe the command classifier.train() in detail. What is the classifier and how did you define it? Which nested function (and how have you defined it) are you applying to the training and test datasets?
    • The classifier.train() command trains the model. Its arguments include the training input function and the number of training steps. The first part of the command, classifier, refers to the estimator we are using; in this example, that is the DNNClassifier. If it were the LinearRegressor estimator instead, the command might look like linear_regressor.train(). The classifier is defined earlier in the script, when we instantiate the estimator: for example, classifier = tf.estimator.DNNClassifier(). Thus, this is the part of the code that actually lets us apply the estimator!
    • Our training input function was defined with all the other input functions; however, this one takes the training set features and labels, specifies batch size, whether the data is shuffled, and the number of epochs to iterate over the data.
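The behavior described above (batching, shuffling, and repeating for several epochs) can be sketched in plain Python. The function name and parameters are illustrative, and a generator again stands in for the tf.data.Dataset used with estimators:

```python
import random

def train_input_fn(features, labels, batch_size, shuffle=True, num_epochs=1):
    """Sketch of the training input function described above: optionally
    shuffles the rows, repeats for num_epochs, and yields
    (features_dict, labels) batches."""
    indices = list(range(len(labels)))
    for _ in range(num_epochs):
        if shuffle:
            random.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            batch_features = {k: [v[i] for i in batch_idx]
                              for k, v in features.items()}
            batch_labels = [labels[i] for i in batch_idx]
            yield batch_features, batch_labels

features = {"SepalLength": [5.1, 7.0, 6.3, 4.9]}
labels = [0, 1, 2, 0]

# shuffle disabled here so the output is deterministic
batches = list(train_input_fn(features, labels, batch_size=2,
                              shuffle=False, num_epochs=3))
print(len(batches))  # → 6 (2 batches per epoch × 3 epochs)
```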
  5. Redefine your classifier using the DNNLinearCombinedClassifier() as well as the LinearClassifier(). Retrain your model and compare the results using the three different estimators you instantiated. Rank the three estimators in terms of their performance.
    • LinearClassifier: 0.967
    • DNNLinearCombinedClassifier: 0.733
    • DNNClassifier: 0.533
    • The LinearClassifier performed best, the DNNLinearCombinedClassifier second best, and the DNNClassifier worst.

Build a Linear Model

  1. Using the dftrain dataset, upload an image where you used the seaborn library to produce a sns.pairplot(). Also include a histogram of age using the training set and compare it to the seaborn plot for that same feature (variable). What interpretation can you provide of the data based on this plot?

[Pairplot image] [Age histogram image]