Kaggle Home Depot competition notes: model validation

In my last post, I discussed how the features of a machine learning model could be represented as simple functions, each extracting a value from an observation. This allowed us to specify a model as a simple list of features/functions, defining what information we want to extract from an observation. Today, I want to go back to where we left off, and talk about model validation. Now that we have a simple way to specify models, how can we go about deciding whether they are any good?
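As a reminder of what "features as functions" can look like, here is a minimal sketch. The `Observation` record, its field names, and the two sample features are assumptions made up for illustration; they are not taken from the competition code.

```fsharp
// Hypothetical observation type; the field names are illustrative only.
type Observation = { SearchTerm: string; ProductTitle: string }

// A feature is a function extracting a float from an observation.
type Feature = Observation -> float

// Feature 1: number of characters in the search term.
let searchLength : Feature =
    fun obs -> float obs.SearchTerm.Length

// Feature 2: 1.0 if the title contains the search term (ignoring case), 0.0 otherwise.
let titleContainsSearch : Feature =
    fun obs ->
        let title = obs.ProductTitle.ToLowerInvariant()
        let term = obs.SearchTerm.ToLowerInvariant()
        if title.Contains(term) then 1.0 else 0.0

// A model specification is then simply a list of features.
let model : Feature list = [ searchLength; titleContainsSearch ]

// Apply every feature to an observation to get its feature vector.
let extract (features: Feature list) (obs: Observation) : float list =
    features |> List.map (fun f -> f obs)

let example = { SearchTerm = "hammer"; ProductTitle = "Steel Claw Hammer" }
let vector = extract model example
// vector = [6.0; 1.0]
```

With this shape, trying out a different model amounts to editing the `model` list, which is what makes comparing models during validation convenient.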

More...

Kaggle Home Depot competition notes: features

Against my better judgment, I ended up getting roped into entering the Kaggle Home Depot Search Relevance machine learning competition. As expected, this has been a huge time sink, and a lot of fun so far. One thing I found interesting is that this time I am working with a team. Having people to discuss ideas with is awesome; it is also an interesting opportunity to observe how others approach problems, and offers a chance to contrast methods and understand better what problem they are trying to address. In that frame, I thought I would try to put together some notes on recurring patterns I seem to repeat when setting myself up for this type of problem.

More...

Converting a DSL to Executable F# Code On-the-Fly, Part 2

In our previous post, we started attacking the following problem: we want our application to take in raw strings, representing code written in our own, custom domain-specific language, and convert them on the fly to F# functions, so that our user can change the behavior of the application at run time. In our particular example, to keep things simple, we are only trying to inject arbitrary functions of the form f(x) = (1 + 2 * x) * 3, that is, functions that take in a float as input, and return a float by combining addition and multiplication.

As a first step, we created an internal representation for our functions, using F# discriminated unions to model functions as nested expressions. This internal DSL gave us a type-safe, general representation for any function we might want to handle. However, we are still left with one problem: what we want now is to convert raw strings into that form. If we manage to do that, we are done: our user can, for instance, write functions in our own language in a text file, and have the application pick up that file and convert it to F# code it can run.
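To make the idea of "functions as nested expressions" concrete, here is a minimal sketch of such a discriminated union, together with an interpreter for it. The names `Expression`, `Variable`, `Constant`, `Add`, `Mul` and `evaluate` are assumptions chosen for this illustration, and may differ from the ones used in the actual posts.

```fsharp
// Hypothetical internal representation: an expression is either the
// input variable x, a constant, or a sum or product of two expressions.
type Expression =
    | Variable
    | Constant of float
    | Add of Expression * Expression
    | Mul of Expression * Expression

// Recursively evaluate an expression for a given value of x.
let rec evaluate (x: float) (expr: Expression) : float =
    match expr with
    | Variable -> x
    | Constant c -> c
    | Add (left, right) -> evaluate x left + evaluate x right
    | Mul (left, right) -> evaluate x left * evaluate x right

// f(x) = (1 + 2 * x) * 3, written as a nested expression.
let f =
    Mul (Add (Constant 1.0, Mul (Constant 2.0, Variable)), Constant 3.0)

// (1 + 2 * 2) * 3 = 15
let result = evaluate 2.0 f
```

Once a function is available in this form, the remaining work is purely about parsing: turning a string such as "(1 + 2 * x) * 3" into a value like `f`.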

More...

Converting a DSL to Executable F# Code On-the-Fly, Part 1

I have had a fun problem to solve for work recently. Suppose you have an application, happily running in production. Imagine that application is computing some result, based on rules. Perhaps you are computing taxes for customers, or the cost of a type of product. Now, your end-user wants the ability to change the application behavior at run time, without having to stop the application. To spice things up a bit, the end-user is not a developer. He is not particularly interested in learning our favorite programming language, and wants to specify that function in a language close to what he speaks, with no tools beyond Notepad available.

To keep it simple, for illustration purposes, let’s imagine that our application is simply taking a number (a float), and computing something, like f(x) = 2.0 * x + 1.0. What we want is to be able to change what function is used, and replace it with any arbitrary function f, like f(x) = x * x + 3.0, or f(x) = 42.0, without modifying the code of the application itself. In this post and the next, I’ll explain how I approached it.

More...

10 Tips for Productive F# Scripting

Scott Hanselman recently had a nice post on C# and F# REPLs, which reminded me of the time I started using F# scripts. Over time, I found out a couple of small tricks, which helped make the experience productive. I found out about them mainly by accident, so I figured, let’s see if I can list them in one place! Some of these are super simple, some probably a bit obscure, but hopefully, one of them at least will make your path towards scripting nirvana an easier one…

Note: these tips are not necessarily ordered by usefulness. For that matter, there might or might not be exactly 10 of them :)

More...