Monday, February 16, 2009 6:20 AM bart

The M Programming Language – Part 1 – Structural Typing

Welcome to the first real post in my new series on the M Programming Language. Last time, we bootstrapped ourselves by looking at the tool spectrum available in the Oslo SDK. If you haven’t read through it yet: The M Programming Language – Part 0 – Intellipad, MrEPL, VS2008 project support and Excel integration. Today, we’ll dive into the type system of M.

 

Nominal or structural?

The first thing to realize about M is its fundamentally different type system, compared to the languages most readers will be familiar with (which I suspect to be OO languages from the curly brace family and such, like C#). The taxonomy of type systems we’re looking at right now differentiates between:

  • Nominal (sometimes referred to as nominative) type systems
  • Structural type systems

Let’s dive into both families a little bit. First, nominal type systems. What’s in a name? Nominal comes from the Latin word ‘nomen’ which stands for name. In other words, the name of a type is relevant for something. That something is “type compatibility”. What makes it possible for you to say that a Giraffe object can be treated as an Animal, for instance in C#? Determining whether this statement is true has to be carried out by analysis of the type hierarchy, which defines relationships between types by their names. For instance, the declaration of the type Giraffe will say: “my base type is (referred to by name) Mammal”. Next, we analyze the Mammal type in a similar way and find that it derives from Animal, allowing us to conclude a Giraffe instance can be treated as an Animal instance. What we’ve been analyzing here are the “is a” relationships, which are declared by names. In other words, compatibility is an explicitly stated fact and no accidental equivalence is possible. For example, instances of the types below cannot be treated as equivalent:

class Point { int X; int Y; }
class Punt { int X; int Y; }

even though they’re structurally equivalent. And that brings us seamlessly to the concept of structural type systems. In a structural type system, all that matters to make decisions about type compatibility or subtype relationships is the structure of the declared types, not the name. Let’s go straight to an example, from M this time:

type Point { X; Y; }
type Punt { X; Y; }

This time, we can say that instances of Point and Punt can be treated as equivalent in the type dimension. Even more, when you ask the system whether a 3D point can be treated as a 2D point, the subtype relationship will confirm that’s the case:

type Point2D { X; Y; }
type Point3D { X; Y; Z; }

So, we didn’t have to say something like “Point3D derives from Point2D and just adds the value Z to its representation”. Clearly, structural typing is more flexible although it has the feel of “compatible by accident” to it. This also means concepts like sealing don’t work in a structural type system. We can create subtypes and supertypes of any given type just by looking at that type’s structure and providing type declarations that are either “more” or “less” than the given type. All that matters in a structural world is the “has a” relationship: as long as an object has an X and Y (with compatible types, i.e. applying the rules of structural subtyping recursively) it will be compatible with the two-member type declarations above.
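To make the contrast concrete outside of M, here is a minimal sketch in Python (not M, and not taken from the Oslo SDK). It assumes Python 3.8 or later and uses `typing.Protocol` as a stand-in for a structural type; the names `HasXY`, `Point`, `Punt` and `Point3D` are purely illustrative. Note that the runtime `isinstance` check on a protocol only verifies that the members exist, not that they have compatible types.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Punt:
    x: int
    y: int


@dataclass
class Point3D:
    x: int
    y: int
    z: int


@runtime_checkable
class HasXY(Protocol):
    """Structural type (illustrative): anything that has members x and y."""
    x: Any
    y: Any


punt = Punt(1, 2)

# Nominal view: the class names differ, so a Punt is not a Point.
nominal_match = isinstance(punt, Point)

# Structural view: only the presence of x and y matters.
values = (Point(1, 2), punt, Point3D(1, 2, 3))
structural_matches = [isinstance(v, HasXY) for v in values]

print(nominal_match)       # False
print(structural_matches)  # [True, True, True]
```

The three-member value passes the two-member check without any declared relationship between the classes, which is the "has a" rule in action.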

Actually, allow me to deviate a bit from my path here and draw a parallel with Windows PowerShell. If you’re familiar with the pipeline processor of PowerShell, you know its role is to flow objects (.NET, WMI, etc.) through different cmdlets that act upon them. In doing so, the pipeline processor needs to determine for each object that passes through the pipeline whether or not it can act as the input of the next cmdlet. For instance, here’s the help for Get-Process:

image

Take a look at the part indicated in green. This is one of the crucial parts of PowerShell that makes it such a powerful and flexible environment. PowerShell isn’t picky about types when determining whether an object is compatible with the cmdlet that acts upon it. As soon as the input object has a property called “ComputerName”, that property can be bound to the ComputerName parameter of the Get-Process cmdlet. This too is all about a “has a” relationship. If PowerShell were to demand nominal compatibility only, you wouldn’t be able to get a list of processes without using exactly the right type of input, which would be far from flexible. Isn’t it nice to be able to use CSV files, output of WMI commands, rich .NET objects, XML-based data, etc. as input to the command, with the only requirement being a ComputerName property?
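The idea of binding by property name can be sketched in a few lines of Python. This is only an illustration of the "has a" principle, not how PowerShell is implemented; its real parameter binder does considerably more. The helper `bind_by_property_name` and the sample objects are hypothetical.

```python
from types import SimpleNamespace


def bind_by_property_name(input_obj, parameter_names):
    """Return {parameter: value} for each parameter the object has a property for."""
    return {
        name: getattr(input_obj, name)
        for name in parameter_names
        if hasattr(input_obj, name)
    }


# Two unrelated kinds of input: a row read from a CSV file and a richer object.
csv_row = SimpleNamespace(ComputerName="srv01", Owner="bart")
rich_obj = SimpleNamespace(ComputerName="srv02", Uptime=42)

for item in (csv_row, rich_obj):
    print(bind_by_property_name(item, ["ComputerName"]))
```

Neither input declares any relationship with the consumer; having a `ComputerName` member is all that is asked of it.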

Back to our main discussion thread though. One more thing about structural typing: don’t confuse it with duck typing, although (agreed) the distinction is a bit blurry. In duck typing, the “has a” relationship is exercised too, but it happens in a dynamic fashion at runtime. Structural typing doesn’t imply a dynamic runtime environment though. However, there’s another distinction that’s more relevant to point out here. Duck typing only cares about the parts of a type that are used by the program. For example, given an object (of who knows what type) you can use duck typing to access its X property without requiring a full-blown interface or base type to be used that makes the “has an X property” requirement explicit (like IHaveX). Similarly you can (optimistically, as violations will only be detected at runtime) access a hypothetical Y property. Notice you never said you wanted to treat the input object as “something compatible with, say, Point2D”. In other words: you’ve not really tried to establish some kind of type relationship.
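For comparison, here is what plain duck typing looks like in Python, again as an illustrative sketch with made-up names. The function never states that its argument should be compatible with any type; it simply uses the members it needs, and a missing member is only discovered when the access actually runs.

```python
from types import SimpleNamespace


def describe(obj):
    # Duck typing: no declared relationship with any type, just use the members.
    return f"({obj.x}, {obj.y})"


print(describe(SimpleNamespace(x=1, y=2)))  # (1, 2)

try:
    describe(SimpleNamespace(x=1))  # no y: only detected at runtime
except AttributeError as err:
    print("runtime failure:", err)
```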

In summary:

  • Structural typing establishes type compatibility based on the structure of a type, i.e. not based on names of types.
  • Structural typing is still statically typed.
  • Structural typing allows more flexibility for treatment of data.

It also helps to think about types as sets of possible values. In such a mindset, structural typing means that two types are considered identical when they describe the same set of possible values.

 

Built-in types

Time to take a look at M in practice. We’ll be using MrEPL for this purpose, so time to spin up Intellipad and start MrEPL. For more information on how to do this, see my previous post: The M Programming Language – Part 0 – Intellipad, MrEPL, VS2008 project support and Excel integration.

First, a short exploration of some of the built-in types. Obviously you’ll expect numbers, strings, dates and times, and such just to work. And luckily that’s the case. A few types are shown below:

image

Notice the use of the “in” keyword to check for types. I’ve shown positive cases, but obviously things like “true in Text” will produce a negative result. Just to prove it doesn’t always print true, I’ve shown how errors are handled at the bottom of the screen :-). Since there are relationships between types, it makes sense to have a “mother of all” type, which is Any. In the above, I’ve been using some abstract types like Number, but concrete types like Integer32, Decimal19 and Double exist too. An exhaustive list of all the types can be found in the documentation in the Oslo SDK. I’ll cover built-in operations that act on values in a subsequent post.

Why “in”? Because types are set-oriented constructs. The mother of all types, Any, can contain any value that’s valid in the language. Subtypes like Number restrict that set of acceptable values; hence, a subtype denotes a subset. The “in” keyword reflects the set-oriented nature, because a type check in this world corresponds to a set membership check.
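The set-oriented reading can be mimicked in Python by modeling a type as a membership predicate. This is a rough sketch of the idea, not of M's implementation; the `TypeSet` class and the three sample types are hypothetical and only loosely mirror M's Any, Number and Text.

```python
from numbers import Number as PyNumber


class TypeSet:
    """A type modeled as the set of values satisfying a predicate."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def __contains__(self, value):
        # A type check is a set membership check.
        return self.predicate(value)


Any_ = TypeSet("Any", lambda v: True)
Number = TypeSet(
    "Number", lambda v: isinstance(v, PyNumber) and not isinstance(v, bool)
)
Text = TypeSet("Text", lambda v: isinstance(v, str))

print(42 in Number)      # True
print(True in Text)      # False
print("hello" in Any_)   # True

# Subtype = subset: over a sample of values, everything in Number is also in Any.
sample = [42, 3.14, "hello", True, None]
print(all(v in Any_ for v in sample if v in Number))  # True
```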

One final thing to pay attention to in this sample is nullability. As you know, in the .NET Framework and the CLR, nullability has been tied to the distinction between value types and reference types, which is why Nullable<T> was introduced in the 2.0 timeframe. M is value-centric and makes nullability an orthogonal concept. Although there’s notation for nullability, similar to C#’s and VB’s, it doesn’t seem to work yet in MrEPL. In a future post, where I’ll be translating things into SQL again, I’ll dive a little deeper into nullability. For the curious, here’s the notation:

Integer?

which is shorthand for the following set-notation:

Integer | { null }
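Python's type hints happen to have the same kind of shorthand, which makes for a handy analogy (an analogy only; it says nothing about how M treats nullability internally). The helper `in_nullable_integer` is a hypothetical membership test for the set "integers plus null".

```python
from typing import Optional, Union

# Optional[int] is shorthand for the union of int and None,
# much like Integer? abbreviates Integer | { null }.
print(Optional[int] == Union[int, None])  # True


def in_nullable_integer(value):
    """Membership test for the set 'integers plus null'."""
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    return value is None or is_integer


print(in_nullable_integer(None))  # True
print(in_nullable_integer(7))     # True
print(in_nullable_integer("7"))   # False
```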

 

Defining types

Next, let’s define a custom type. Staying in the world of points:

image

Notice how the same value is a member of two different, but structurally equivalent, types. Also notice how the three-dimensional point value is accepted to be typed as a two-dimensional one (assuming we’re using the same coordinate letters though). Notice I haven’t specified types for the X and Y members yet: anything (Any) will do. To restrict this, we use the type ascription operator ‘:’ as shown below:

image

Again, think about a type as a set of possible values. Based on this, a subtype can be thought of as a subset of an existing type, and indeed one can declare such a subtype easily in M. In the sample below I’m adding a constraint to the accepted values, restricting Point values to those in the first quadrant:

image
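In set terms, a constraint simply narrows the membership test. The Python sketch below illustrates that idea with dictionaries standing in for M values; the function names are hypothetical, and treating "first quadrant" as strictly positive coordinates is an assumption on my part.

```python
def is_point(value):
    """Point: a record with numeric x and y members (extra members allowed)."""
    return isinstance(value, dict) and all(
        isinstance(value.get(k), (int, float)) for k in ("x", "y")
    )


def is_first_quadrant_point(value):
    """Constrained subtype: a Point whose coordinates are both positive."""
    return is_point(value) and value["x"] > 0 and value["y"] > 0


samples = [{"x": 1, "y": 2}, {"x": -1, "y": 2}, {"x": 3}]

print([is_point(s) for s in samples])                 # [True, True, False]
print([is_first_quadrant_point(s) for s in samples])  # [True, False, False]

# Every member of the constrained type is also a member of the base type.
print(all(is_point(s) for s in samples if is_first_quadrant_point(s)))  # True
```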

I’ll talk about constraints again in later posts. Similarly, a subtype can have additional members as illustrated below:

image

Once more, pay attention to the membership tests and the relationship with structural typing. Let’s go one step further and try to ascribe values to the respective types and see what happens:

image

Nothing should be too surprising here. A few notes though:

  • Member access is carried out using the familiar ‘.’ operator.
  • Ascription can be used safely to make a type less specific (e.g. treating Point3D-compatible values as Point2D) but is not guaranteed to work the other way around.
  • Trying to access an undefined member produces a null value (“undefined”).
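The three notes above can be imitated in Python with dictionaries standing in for M values. This is a loose analogy under stated assumptions: `ascribe` is a hypothetical helper, and raising `TypeError` on a failed ascription is my choice for the sketch, not a statement about what M reports.

```python
def ascribe(value, required_members):
    """Return value if it has every required member, else raise TypeError."""
    missing = [m for m in required_members if m not in value]
    if missing:
        raise TypeError(f"missing members: {missing}")
    return value


POINT2D = ("X", "Y")
POINT3D = ("X", "Y", "Z")

p3 = {"X": 1, "Y": 2, "Z": 3}
p2 = {"X": 1, "Y": 2}

# Less specific: a Point3D-compatible value always passes as a Point2D.
print(ascribe(p3, POINT2D)["X"])  # 1

# More specific: not guaranteed; here Z is missing.
try:
    ascribe(p2, POINT3D)
except TypeError as err:
    print(err)  # missing members: ['Z']

# Undefined member: dict.get yields None instead of raising.
print(p2.get("Z"))  # None
```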

For now, this discussion should suffice for the reader to start declaring types. Next time we’ll take a closer look at other core concepts of “M”: collections and extents.

Comments

# re: The M Programming Language – Part 1 – Structural Typing

Tuesday, February 17, 2009 10:13 PM by Sinix

Hello!

It seems the ability to reference undeclared members is a big issue because it increases the cost of maintainability. There is no way to distinguish a typing error from intended behavior. It increases the cost of verification and makes debugging harder. I'd prefer to get exceptions, not nulls.

Implementing robust code will result in multiple checks for nulls, as it was with @@Error in MS SQL. Bad enough.

As it seems, M is all about simplifying data representation, with the ability to cast some data to any representation, including constraints. Also, as it uses set-based type matching, it could be implemented on top of any RDBMS engine, so it will be efficient enough. However, the ability to dynamically change the structure of an entity will require EAV-based storage, which is inefficient for massive set-based operations.

How do you assess the issues related to using duck typing inside a DSL description language? One of the main requirements of DSLs is to provide the ability to describe specific entities with a restricted set of nouns. How will this work with a set-based type system and with the inability to restrict the definition of new types?

Did I miss something? Correct me, please;)
