Predict prices using regression with ML.NET

Create a console application

  1. Create a .NET Core Console Application called “TaxiFarePrediction”.
  2. Create a directory named Data in your project to store the data set and model files.
  3. Install the Microsoft.ML NuGet Package:In Solution Explorer, right-click the project and select Manage NuGet Packages. Choose “nuget.org” as the Package source, select the Browse tab, search for Microsoft.ML, select the package in the list, and select the Install button. Select the OK button on the Preview Changesdialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed. Do the same for the Microsoft.ML.FastTree Nuget package.

Prepare and understand the data

  1. Download the taxi-fare-train.csv and the taxi-fare-test.csv data sets and save them to the Datafolder you’ve created at the previous step. We use these data sets to train the machine learning model and then evaluate how accurate the model is. These data sets are originally from the NYC TLC Taxi Trip data set.
  2. In Solution Explorer, right-click each of the *.csv files and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.
  3. Open the taxi-fare-train.csv data set and look at column headers in the first row. Take a look at each of the columns. Understand the data and decide which columns are features and which one is the label.

The label is the column you want to predict. The identified Featuresare the inputs you give the model to predict the Label.

The provided data set contains the following columns:

  • vendor_id: The ID of the taxi vendor is a feature.
  • rate_code: The rate type of the taxi trip is a feature.
  • passenger_count: The number of passengers on the trip is a feature.
  • trip_time_in_secs: The amount of time the trip took. You want to predict the fare of the trip before the trip is completed. At that moment you don’t know how long the trip would take. Thus, the trip time is not a feature and you’ll exclude this column from the model.
  • trip_distance: The distance of the trip is a feature.
  • payment_type: The payment method (cash or credit card) is a feature.
  • fare_amount: The total taxi fare paid is the label.

Create data classes

Create classes for the input data and the predictions:

  1. In Solution Explorer, right-click the project, and then select Add > New Item.
  2. In the Add New Item dialog box, select Class and change the Name field to TaxiTrip.cs. Then, select the Add button.
  3. Add the following using directives to the new file:C#Copyusing Microsoft.ML.Data;

Remove the existing class definition and add the following code, which has two classes TaxiTrip and TaxiTripFarePrediction, to the TaxiTrip.cs file:C#Copy

public class TaxiTrip
{
    [LoadColumn(0)]
    public string VendorId;

    [LoadColumn(1)]
    public string RateCode;

    [LoadColumn(2)]
    public float PassengerCount;

    [LoadColumn(3)]
    public float TripTime;

    [LoadColumn(4)]
    public float TripDistance;

    [LoadColumn(5)]
    public string PaymentType;

    [LoadColumn(6)]
    public float FareAmount;
}

public class TaxiTripFarePrediction
{
    [ColumnName("Score")]
    public float FareAmount;
}

TaxiTrip is the input data class and has definitions for each of the data set columns. Use the LoadColumnAttribute attribute to specify the indices of the source columns in the data set.

The TaxiTripFarePrediction class represents predicted results. It has a single float field, FareAmount, with a Score ColumnNameAttribute attribute applied. In case of the regression task the Score column contains predicted label values.

 Note

Use the float type to represent floating-point values in the input and prediction data classes.

Define data and model paths

Add the following additional using statements to the top of the Program.cs file:C#Copy

using System;
using System.IO;
using Microsoft.ML;

You need to create three fields to hold the paths to the files with data sets and the file to save the model:

  • _trainDataPath contains the path to the file with the data set used to train the model.
  • _testDataPath contains the path to the file with the data set used to evaluate the model.
  • _modelPath contains the path to the file where the trained model is stored.

Add the following code right above the Main method to specify those paths and for the _textLoadervariable:C#Copy

static readonly string _trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

All ML.NET operations start in the MLContext class. Initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It’s similar, conceptually, to DBContext in Entity Framework.

Initialize variables in Main

Replace the Console.WriteLine("Hello World!") line in the Main method with the following code to declare and initialize the mlContext variable:C#Copy

MLContext mlContext = new MLContext(seed: 0);

Add the following as the next line of code in the Main method to call the Train method:C#Copy

var model = Train(mlContext, _trainDataPath);

The Train() method executes the following tasks:

  • Loads the data.
  • Extracts and transforms the data.
  • Trains the model.
  • Returns the model.

The Train method trains the model. Create that method just below Main, using the following code:C#Copy

public static ITransformer Train(MLContext mlContext, string dataPath)
{

}

Load and transform data

ML.NET uses the IDataView class as a flexible, efficient way of describing numeric or text tabular data. IDataView can load either text files or in real time (for example, SQL database or log files). Add the following code as the first line of the Train() method:C#Copy

IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(dataPath, hasHeader: true, separatorChar: ',');

As you want to predict the taxi trip fare, the FareAmount column is the Label that you will predict (the output of the model)Use the CopyColumnsEstimator transformation class to copy FareAmount, and add the following code:C#Copy

var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName:"FareAmount")

The algorithm that trains the model requires numeric features, so you have to transform the categorical data (VendorIdRateCode, and PaymentType) values into numbers (VendorIdEncodedRateCodeEncoded, and PaymentTypeEncoded). To do that, use the OneHotEncodingTransformertransformation class, which assigns different numeric key values to the different values in each of the columns, and add the following code:C#Copy

.Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "VendorIdEncoded", inputColumnName:"VendorId"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "RateCodeEncoded", inputColumnName: "RateCode"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "PaymentTypeEncoded", inputColumnName: "PaymentType"))

The last step in data preparation combines all of the feature columns into the Features column using the mlContext.Transforms.Concatenate transformation class. By default, a learning algorithm processes only features from the Features column. Add the following code:C#Copy

.Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PassengerCount", "TripTime", "TripDistance", "PaymentTypeEncoded"))

Choose a learning algorithm

This problem is about predicting a taxi trip fare in New York City. At first glance, it may seem to depend simply on the distance traveled. However, taxi vendors in New York charge varying amounts for other factors such as additional passengers or paying with a credit card instead of cash. You want to predict the price value, which is a real value, based on the other factors in the dataset. To do that, you choose a regression machine learning task.

Append the FastTreeRegressionTrainer machine learning task to the data transformation definitions by adding the following as the next line of code in Train():C#Copy

.Append(mlContext.Regression.Trainers.FastTree());

Train the model

Fit the model to the training dataview and return the trained model by adding the following line of code in the Train() method:C#Copy

var model = pipeline.Fit(dataView);

The Fit() method trains your model by transforming the dataset and applying the training.

Return the trained model with the following line of code in the Train() method:C#Copy

return model;

Evaluate the model

Next, evaluate your model performance with your test data for quality assurance and validation. Create the Evaluate() method, just after Train(), with the following code:C#Copy

private static void Evaluate(MLContext mlContext, ITransformer model)
{

}

The Evaluate method executes the following tasks:

  • Loads the test dataset.
  • Creates the regression evaluator.
  • Evaluates the model and creates metrics.
  • Displays the metrics.

Add a call to the new method from the Main method, right under the Train method call, using the following code:C#Copy

Evaluate(mlContext, model);

Load the test dataset using the LoadFromTextFile() method. Evaluate the model using this dataset as a quality check by adding the following code in the Evaluate method:C#Copy

IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(_testDataPath, hasHeader: true, separatorChar: ',');

Next, transform the Test data by adding the following code to EvaluateModel():C#Copy

var predictions = model.Transform(dataView);

The Transform() method makes predictions for the test dataset input rows.

The RegressionContext.Evaluate method computes the quality metrics for the PredictionModel using the specified dataset. It returns a RegressionMetrics object that contains the overall metrics computed by regression evaluators.

To display these to determine the quality of the model, you need to get the metrics first. Add the following code as the next line in the Evaluate method:C#Copy

var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");

Once you have the prediction set, the Evaluate() method assesses the model, which compares the predicted values with the actual Labels in the test dataset and returns metrics on how the model is performing.

Add the following code to evaluate the model and produce the evaluation metrics:C#Copy

Console.WriteLine();
Console.WriteLine($"*************************************************");
Console.WriteLine($"*       Model quality metrics evaluation         ");
Console.WriteLine($"*------------------------------------------------");

RSquared is another evaluation metric of the regression models. RSquared takes values between 0 and 1. The closer its value is to 1, the better the model is. Add the following code into the Evaluatemethod to display the RSquared value:C#Copy

Console.WriteLine($"*       RSquared Score:      {metrics.RSquared:0.##}");

RMS is one of the evaluation metrics of the regression model. The lower it is, the better the model is. Add the following code into the Evaluate method to display the RMS value:C#Copy

Console.WriteLine($"*       Root Mean Squared Error:      {metrics.RootMeanSquaredError:#.##}");

Use the model for predictions

Create the TestSinglePrediction method, just after the Evaluate method, using the following code:C#Copy

private static void TestSinglePrediction(MLContext mlContext, ITransformer model)
{

}

The TestSinglePrediction method executes the following tasks:

  • Creates a single comment of test data.
  • Predicts fare amount based on test data.
  • Combines test data and predictions for reporting.
  • Displays the predicted results.

Add a call to the new method from the Main method, right under the Evaluate method call, using the following code:C#Copy

TestSinglePrediction(mlContext, model);

Use the PredictionEngine to predict the fare by adding the following code to TestSinglePrediction():C#Copy

var predictionFunction = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(model);

The PredictionEngine class is a convenience API, which allows you to pass a single instance of data and then perform a prediction on it.

This tutorial uses one test trip within this class. Later you can add other scenarios to experiment with the model. Add a trip to test the trained model’s prediction of cost in the TestSinglePrediction()method by creating an instance of TaxiTrip:C#Copy

var taxiTripSample = new TaxiTrip()
{
    VendorId = "VTS",
    RateCode = "1",
    PassengerCount = 1,
    TripTime = 1140,
    TripDistance = 3.75f,
    PaymentType = "CRD",
    FareAmount = 0 // To predict. Actual/Observed = 15.5
};

Next, predict the fare based on a single instance of the taxi trip data and pass it to the PredictionEngine by adding the following as the next lines of code in the TestSinglePrediction()method:C#Copy

var prediction = predictionFunction.Predict(taxiTripSample);

The Predict() function makes a prediction on a single instance of data.

To display the predicted fare of the specified trip, add the following code into the TestSinglePrediction method:C#Copy

Console.WriteLine($"**********************************************************************");
Console.WriteLine($"Predicted fare: {prediction.FareAmount:0.####}, actual fare: 15.5");
Console.WriteLine($"**********************************************************************");

Run the program to see the predicted taxi fare for your test case.

Congratulations! You’ve now successfully built a machine learning model for predicting taxi trip fares, evaluated its accuracy, and used it to make predictions. You can find the source code for this tutorial at the dotnet/samples GitHub repository.

github code: click here.

One thought on “Predict prices using regression with ML.NET

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s