Rethinking ar goals

Last winter my brain said, "I want to rewrite the Arc runtime".  I said, "ack! what?!"  :-)   It turned out to be a good stretch for me: hard, very hard, almost too hard for me, but in the end achievable.

Next I thought about what would make ar useful.  I wrote down some goals in the readme such as that ar should be "at least as good as Arc 3.1 at running a production website; thus for example you should be able to run a news.arc site on top of ar if you wanted to."

Then other interests caught my attention, I mostly stopped working on ar, and ar has been languishing.

Recently I've been reading The Personal MBA and this paragraph about goals caught my eye:

For best effect, your Goals should be under your control. Goals like “losing twenty pounds” are soul crushing because they’re not directly under your control—losing weight is a result, not an effort. If your weight randomly moves up a few pounds on a given day, it’s easy to feel defeated, even though you had no choice in the matter. For best results, make your Goals actions that are within your Locus of Control (discussed later), like doing a minimum of thirty minutes of exercise every day and controlling the number of calories you consume.

and I realized that "ar should be good at running a production website" is a result.  It's not directly under my control: ar may have bugs I don't know about that would bite someone trying to run a production website.  What I can do is fix reproducible bugs reported to me.  Whether this results in a runtime ready to run a high traffic news site depends on things outside of my direct control... such as whether anyone tries to run news on ar, and whether they report bugs to me.

So thinking about things that I can do (and achievable in the sense that I might actually take the time to do them), here are some ideas:

  • Clean up messes.  A bunch of things got rewritten to work with the runtime (and it's good that they now work), but a lot of it is still ugly.
  • Remove cruft and move half-finished side projects out into their own code repositories.

Thoughts?

On Modules

By "modules", I mean:

- I can use a macro from another module which expands into a helper function "bar", but there should be some way for me to have my own function named "bar" without conflict.

- Definitions should still be extendable even after being imported; so if I use defrule or extend on a "foo" that I imported from another module, it should be able to redefine the "foo" in the source module.

There are some implications following from these requirements.  "Importing" something means that we need to be creating an alias for a variable, instead of creating a new variable that contains the same value as the source variable.  We need some way of distinguishing identifiers that came from different modules, so that when a macro expands into a "bar" the compiler knows whether that's supposed to be the source bar or the target bar.
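To make the alias-versus-copy distinction concrete, here is a minimal sketch in Python rather than Arc. It assumes each global variable lives in a mutable cell; the names Var, source, copied, and aliased are invented for this illustration and are not part of ar.

```python
class Var:
    """A mutable cell holding a global variable's current value."""
    def __init__(self, value):
        self.value = value

# The source module defines foo.
source = {"foo": Var(lambda: "original foo")}

# Import by copying: the importer gets a new cell with the same value.
copied = {"foo": Var(source["foo"].value)}

# Import by aliasing: the importer's entry is the very same cell.
aliased = {"foo": source["foo"]}

# Redefining foo through the copy leaves the source module untouched.
copied["foo"].value = lambda: "changed through the copy"
after_copy_redefine = source["foo"].value()

# Redefining foo through the alias is visible in the source module.
aliased["foo"].value = lambda: "extended foo"
after_alias_redefine = source["foo"].value()
```

Only the aliasing import satisfies the second requirement above: a redefinition made by the importer reaches the "foo" in the source module.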

One way to do this is to enrich symbols with namespace information.  Thus identifiers passed to the macro would already be marked with the target namespace, and the macro could expand into symbols either in its own namespace or in the target namespace.  This is how Clojure works, at least as I understand it just from skimming the documentation.  (I may be wrong in the details).

Another approach is rather than passing symbols to a macro, instead pass in syntax objects which include information such as the source module, whether the identifier is a lexical variable or a global variable, as well as useful information such as the source line number.  This is Racket's approach.

The two approaches don't actually need to be mutually exclusive.  Scheme has a long tradition of making everything into disjoint types, so in Racket symbols are one type and syntax objects are a different type.  But we could probably enrich symbols with syntax information instead.  If we have syntax information, then we might as well make it available to macros: a macro might find it useful to know whether an identifier is a lexical variable or not.
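As a rough illustration of what "enriching symbols with syntax information" could look like, here is a small Python model. The Sym class, its fields, and the resolve function are assumptions made for this sketch; it does not claim to match how Clojure, Racket, or ar represent identifiers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Sym:
    """A symbol that optionally carries syntax information."""
    name: str
    module: Optional[str] = None   # module the identifier came from, if marked
    lexical: bool = False          # whether it names a lexical variable
    line: Optional[int] = None     # source line number, if known

def resolve(sym, modules, current):
    """Look up a global in the module the symbol is marked with,
    falling back to the module currently being compiled."""
    return modules[sym.module or current][sym.name]

modules = {
    "lib": {"bar": "lib's bar"},
    "user": {"bar": "user's bar"},
}

# A macro from "lib" expands into its own bar; the user's code also says bar.
expansion = [Sym("bar", module="lib", line=12), Sym("bar")]
resolved = [resolve(s, modules, current="user") for s in expansion]
```

Because the marking travels with the symbol, the two identifiers named "bar" no longer conflict, and a macro could also inspect fields such as lexical or line.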

A form like (let x ...) does two things: it allocates a variable, and it's setting up a "mini namespace" that says that an identifier "x" in the body will refer to that variable.  I can imagine we might be able to separate these into more primitive operations, so that perhaps we could explicitly say what we wanted "x" to refer to (which might then be an "import" of a variable from another module).

However, other things are higher priority for me.  I realize that modules are important for a lot of people, but for me personally, if there's a conflict in a macro expanding into a helper function "bar" I just rename the helper function and keep on going... it's not a big deal.  So I'll be putting modules on the back burner for now while I work on other stuff.

Modules Unsolved

In A solution for using macros with namespace modules I thought I had found a solution for using macros and modules. To have a macro expand into a helper function, I suggested having the macro expand into the function value (instead of expanding into a symbol referring to the function value).  But as rocketnia pointed out, this means that redefining or extending the function after the macro has been expanded won't affect the code using the macro: the expanded code will still be referring to the old function.
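The problem rocketnia pointed out can be reproduced in a few lines. This is a Python model, not Arc: a dictionary stands in for the global namespace, and expand_to_symbol and expand_to_value are hypothetical stand-ins for the two ways a macro could refer to its helper.

```python
namespace = {"bar": lambda x: x + 1}

def expand_to_symbol():
    # The expansion refers to bar by name, so bar is looked up on every call.
    return lambda x: namespace["bar"](x)

def expand_to_value():
    # The expansion embeds the function value that bar had at expansion time.
    bar = namespace["bar"]
    return lambda x: bar(x)

by_symbol = expand_to_symbol()
by_value = expand_to_value()

# Redefine bar after both macros have already been expanded.
namespace["bar"] = lambda x: x + 100

symbol_result = by_symbol(1)   # uses the new bar
value_result = by_value(1)     # still uses the old bar
```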

If today I wanted to use a macro "foo" that expanded into a function "bar", but I had something of my own that I wanted to call "bar" myself, I'd simply rename the macro's helper function and give it a globally unique name such as "bar-h7kM2FpZ".  This wouldn't be a completely transparent solution (if I printed the name of the helper function I'd see "bar-h7kM2FpZ"), but I'm not thinking of a situation off hand where that would bother me.  So maybe some kind of simple abbreviation system would be good enough.

Reifying Dependencies

In my previous post about unprotected variation, I explained how this style of programming led to ending up with a larger number of smaller, simpler libraries and functions than what one usually sees with more conventional programming.  Arc is an example of this style: it tries to protect very little, and the resulting code is strikingly simple.

Writing libraries by factoring code into small pieces without worrying about protecting anything has a number of ramifications:

  • Some "libraries" become very small.  For example, some of my published hacks are only one function or macro long.
  • Examples of how to do things become especially important.  If you need to do X and there's some big, monolithic library that does X, the monolithic approach may have its own problems but at least you know which library you should try to get to work.  With the small pieces approach you might be able to do X just by combining the right 4 hacks or small libraries, but knowing which 4 hacks to use (out of the thousands available) may often not be apparent without an example.

  • Tests (and perhaps examples can also serve as a form of unit tests) also become especially useful.  While being able to combine small parts yourself gives you a lot of flexibility, it would also be nice to have a way to easily tell if you've broken anything.

  • While libraries in this style usually don't provide encapsulation themselves, we'll still need some way of encapsulating code.  I haven't done much with this yet, but I'm imagining that encapsulating code is something we'll do to the code in some way.

  • I'm not able to articulate this very well yet, but my intuition is that the traditional method of specifying dependencies between libraries by including the typical "require" or "import" type of statement with the library is inadequate.

My most recent stab at a dependency system that works well with having a large number of small libraries or hacks is the latest version of the hackinator.  It is currently nearly unusable, but working on it gives me some insight into what the issues are so that I can start figuring it out.

 

Unprotected Variation

Protected variation is where you allow a user to configure or customize your library while ensuring that it still works.  Protected variation is often important.  For example, if my microwave doesn't work, whose fault is it?  If the electric company had accidentally hooked up 4kV to my house and the microwave exploded when I plugged it in, it would be the electric company's fault.  If I had opened up the back of the microwave and rewired and it stopped working, it would be my fault.  If I bought a microwave, took it home, plugged it into an outlet of the correct voltage, pressed some buttons on the front, and it stopped working, it would be the manufacturer's fault.

It is critical to know whose fault it is in commercial situations.  A manufacturer may make a 10% profit on the sale of a unit, but having a unit returned costs the manufacturer 100%.  (Perhaps a little less if the unit can be refurbished and resold, and probably more when you count the time involved to process the return, the loss of customer loyalty, etc.)  Any manufacturer who didn't keep tight control on when things could be returned would quickly go out of business.

Since it's so important to know whose fault it is in commercial situations (or, to put it in a more positive way, which party is taking responsibility for making sure that something works), there's a lot of information in professional software development articles and books about how to implement protected variation.  (See The Importance of Being Closed [PDF] for one example).

Protected variation is also important in many non-commercial situations.  Even in a purely open source setting where I'm publishing my software for free with no warranty, I may still want to at least be able to explain in what situations you can expect my software to work.

However, protected variation has costs.  Adding protective layers makes the software more complicated and less flexible.

Imagine that an existing library does A, B, and C.  Along comes someone who wants the library to also do D but not B.

A typical approach if we needed protected variation would be to add a way to configure the library (such as in its constructor) where you could say whether you wanted the library to do A,B,C or A,C,D (and perhaps other alternatives as well, as needed).  This configuration layer adds complexity (I've even seen the occasional library that had more code handling the configuration than actually doing the work) and reduces flexibility.  If I decide I want A and D instead, I have more work to do to add that to the configuration, or to break encapsulation, or to rewrite the library to get at the parts I want.

An alternative when we need variation but we don't need it to be protected is to factor the library into smaller pieces, making A, B, and C available separately.  Now if I want A,B,D or A,D, I can just make use of A and B and combine them with D.  (The trade-off of course is that I need to know what I'm doing.  Since I'm now working with what would normally be considered the "internals" of the library, it's now just that much easier to mess things up).
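The two approaches can be compared side by side in a short sketch. It is written in Python, and the steps are placeholders: step_a through step_d and the mode strings are invented here just to stand in for A, B, C, and D.

```python
def step_a(items):
    return [i.strip() for i in items]      # A: trim whitespace

def step_b(items):
    return [i.lower() for i in items]      # B: lowercase

def step_c(items):
    return sorted(items)                   # C: sort

def step_d(items):
    return [i for i in items if i]         # D: drop empty strings

def library(items, mode="ABC"):
    """Protected variation: only the anticipated combinations are offered."""
    if mode == "ABC":
        return step_c(step_b(step_a(items)))
    if mode == "ACD":
        return step_d(step_c(step_a(items)))
    raise ValueError("unsupported mode: " + mode)

def a_then_d(items):
    """Unprotected variation: combine the exposed pieces directly."""
    return step_d(step_a(items))
```

Asking the configured library for A and D fails until someone adds a new mode, while the factored version only requires knowing which pieces to combine (and accepting the responsibility for combining them correctly).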

As an example, in Arc 3.1 when the web server handles a request, it calls a function called "respond" which first constructs the "req" request object and then figures out whether it's a redirect type of op or a regular op:

I wanted to have another option, to be able to have a completely raw op that didn't have any headers output for it at all.  I could have added this to the conditional, so that the function would now check for regular ops, redirect ops, and for my kind of op.  What I did in http://awwx.ws/srv-misc1 instead was factor out the code which constructs the "req" request object:

and the "respond" function now handles the response, but has had the req object already constructed for it:

At this point I can use "defrule" or "extend" to extend respond to do my kind of op, without having to include my code in respond itself.
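The general idea of extending a function from the outside can be sketched as follows. This is a Python model under stated assumptions, not the Arc source of "respond", "defrule", or "extend": the handlers table, the shape of the req dictionary, and the "raw" flag are all hypothetical.

```python
handlers = {}

def respond(req):
    """Base behaviour: a regular op, with a header line before the body."""
    return "200 OK\n\n" + req["body"]

handlers["respond"] = respond

def extend(name, test, new_body):
    """Rebind `name` to a wrapper that runs new_body when test(req) holds
    and otherwise falls through to the previous definition."""
    old = handlers[name]
    def extended(req):
        return new_body(req) if test(req) else old(req)
    handlers[name] = extended

# Add a raw kind of op without editing respond itself.
extend("respond",
       lambda req: req.get("raw", False),
       lambda req: req["body"])

regular = handlers["respond"]({"body": "hi"})
raw = handlers["respond"]({"body": "hi", "raw": True})
```

The wrapper only needs the already constructed request, which is why factoring the construction of "req" out of "respond" makes this kind of extension possible.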

With protected variation, the tendency is to end up with a smaller number of larger, more complicated libraries or functions.  With unprotected variation, we tend to end up with a larger number of smaller, simpler libraries and functions.

Continued in Reifying Dependencies.

Making radical language changes and having backwards compatibility too

Lisp is described as the programmable programming language: if you have a hard program to write, you can first create a language in which writing that program would be easy, and then you can easily write your program in the language you've created.

Macros are an easy way to program Lisp, but they are limited in the sense that a macro "foo" only transforms the parts of your program that you've said you want transformed by explicitly wrapping that code with "(foo ...)".

A goal of ar is to make Arc even more hackable by allowing you to create arbitrary code transformations simply by loading a library.  For example, optional arguments are implemented in arc.arc by extending the compiler.

A natural concern with radically changing a language is what about backwards compatibility?  The best language for your program might be incompatible with Arc 3.1, but if you also want to use Arc 3.1 libraries such as the web server it would be a pain to have to rewrite them so that they work in your language.

This is just speculation on my part since ar isn't even running news yet... but I suspect this may not turn out to be a big problem in practice.  The "new-arc" function creates an entirely new, fresh copy of the Arc runtime and compiler.  You can go ahead and make whatever changes you want to the compiler in that copy and you won't be affecting the compiler in the original runtime or in any other copies.

(This is different from modules because importing functions from a module is essentially making a shallow copy of those identifiers... you can't go in and redefine things in the original module without affecting other code which uses the module).

While you can have as many copies of the Arc runtime as you want, they do all run within the same Racket memory space, so it's easy to share functions and data between them.  Thus you could have a standard copy of Arc running in one namespace:

and "foo" would be a reference to a function defined in another namespace which could, if you wanted, be a radically different language that the web server itself couldn't run in.
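To illustrate the idea of independent runtimes that still share one memory space, here is a small Python model. It is only a sketch: new_runtime is a hypothetical stand-in for "new-arc", and a table of "rules" stands in for a compiler that can be changed per runtime.

```python
def new_runtime():
    """Build a fresh runtime with its own globals and its own rules."""
    return {"globals": {}, "rules": {"add": lambda a, b: a + b}}

def run(runtime, op, a, b):
    """Evaluate an operation using the given runtime's own rules."""
    return runtime["rules"][op](a, b)

standard = new_runtime()
experimental = new_runtime()

# Change the experimental language; the standard runtime is not affected.
experimental["rules"]["add"] = lambda a, b: str(a) + str(b)

# Define foo in the experimental runtime...
experimental["globals"]["foo"] = lambda a, b: run(experimental, "add", a, b)

# ...and share the function value with the standard runtime.
standard["globals"]["foo"] = experimental["globals"]["foo"]

standard_add = run(standard, "add", 1, 2)
shared_foo = standard["globals"]["foo"](1, 2)
```

The shared function keeps running under the rules of the runtime it was defined in, even when it is called from the standard one.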

 

A solution for using macros with namespace modules

I'm excited.  Using namespaces for modules seems like a natural and simple idea, and it does work well for functions... but not very well for macros.  This morning I had an idea -- and as far as I can tell so far it seems to be working.

First some background.  What is a namespace?  When an expression such as (+ a b) is compiled, the Arc compiler first checks if a symbol is a local lexical variable, and if it isn't, generates code to get the value of the variable in the global variable namespace.
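That lookup rule can be modelled in a few lines of Python. The tuple tags and the function names are invented for this sketch and are not the Arc compiler's actual output.

```python
def compile_ref(sym, lexical_vars):
    """Compile a variable reference: lexical variables are used directly,
    anything else becomes a lookup in the global namespace."""
    if sym in lexical_vars:
        return ("local", sym)
    return ("global-lookup", sym)

def compile_call(expr, lexical_vars):
    """Compile each symbol of a flat call such as (+ a b)."""
    return [compile_ref(sym, lexical_vars) for sym in expr]

# Here a is a lexical variable, while + and b are globals.
compiled = compile_call(["+", "a", "b"], lexical_vars={"a"})
```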

Conceptually, a namespace is just a table with variable names as keys and the variable values as values.  A namespace table can be implemented in different ways such as with a hash table, with an association list, or with a Racket namespace object.  As it happens when compiling Arc code into Racket, using a Racket namespace object for Arc's global variables is the fastest implementation to use (I imagine because the Racket folks have worked hard to optimize the Racket compiler), but in my Arc runtime project you can choose a different implementation for global variables if you want to.  There's nothing magical about a namespace: it's just a data structure mapping global variable names to variable values.
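As a minimal sketch of "just a table", here are two interchangeable implementations in Python: one backed by a hash table and one by an association list. The class and method names are assumptions made for this illustration; they are not ar's functions.

```python
class HashNamespace:
    """Namespace backed by a hash table."""
    def __init__(self):
        self._table = {}

    def get(self, name):
        return self._table[name]

    def set(self, name, value):
        self._table[name] = value

class AlistNamespace:
    """Namespace backed by an association list of (name, value) pairs."""
    def __init__(self):
        self._pairs = []

    def get(self, name):
        for key, value in self._pairs:
            if key == name:
                return value
        raise KeyError(name)

    def set(self, name, value):
        # Drop any older binding, then put the new one at the front.
        self._pairs = [(k, v) for k, v in self._pairs if k != name]
        self._pairs.insert(0, (name, value))

def exercise(ns):
    """Code using a namespace does not care how it is implemented."""
    ns.set("x", 1)
    ns.set("x", 2)
    return ns.get("x")

results = [exercise(HashNamespace()), exercise(AlistNamespace())]
```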

Since a namespace is conceptually just a table, it could actually appear as a table in Arc, and you could get or set the global variable "x" in a namespace using namespace!x.  I haven't implemented this yet (though it wouldn't be very hard), but I do have some functions to get and set namespace values:

What gets interesting is that we can create more than one namespace.  Arc code running in one namespace has a different set of global variables than Arc code running in a different namespace: "map" in another namespace could refer to the standard function or it could be set to something different.

In ar a new Arc namespace can be created with "new-arc".  This creates a namespace populated with Arc's primitive functions and the Arc compiler, but doesn't have arc.arc loaded yet.

We can get arc.arc loaded with aload, which gives us the usual Arc functions and macros such as map:

Once we've fetched a function out of a different namespace, we can use it like any other function:

We can also evaluate code in a different namespace using eval:

Once a function has been created in another namespace, we can "import" it by copying the value into our namespace:

which of course could be made more succinct by writing an "import" macro.

So the natural question to ask is could we use namespaces for modules?  Well, namespaces work great for creating modules for functions. Imagine that we want to import a function foo that calls a helper function bar:

If we aren't going to use bar ourselves, we don't need to import it.  foo calls bar in its own namespace, and so we can import and use foo without needing to import bar.  In fact we can use bar as a global variable in our namespace, and foo still works calling bar in its namespace:

But this idea doesn't work very well for macros.  If I define a macro foo that expands into a helper function bar, just importing foo by itself doesn't work:

The problem here is that the macro foo doesn't expand into the function bar, it expands into the symbol bar.  Thus when I expand foo in my namespace, the resulting expression tries to look up bar in my namespace.
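The failure can be reproduced with a toy evaluator. This is a Python model rather than Arc: strings play the role of symbols, lists play the role of calls, and evaluate, module, and mine are names invented for the sketch.

```python
def evaluate(expr, ns):
    """Tiny evaluator: strings are symbols looked up in ns, lists are calls,
    and anything else evaluates to itself."""
    if isinstance(expr, str):
        return ns[expr]
    if isinstance(expr, list):
        fn = evaluate(expr[0], ns)
        return fn(*[evaluate(arg, ns) for arg in expr[1:]])
    return expr

# The module defines the helper function bar.
module = {"bar": lambda x: x * 2}

def foo_macro(arg):
    """The macro foo expands into a call to the *symbol* bar."""
    return ["bar", arg]

# My namespace imported only foo, so it has no bar.
mine = {}

expansion = foo_macro(21)

try:
    evaluate(expansion, mine)
    failed = False
except KeyError:
    failed = True   # bar is looked up in my namespace and not found

in_module = evaluate(expansion, module)
```

The same expansion works when evaluated in the module's namespace, which shows that the expansion itself is fine; the trouble is only where the symbol gets looked up.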

Of course I could also import bar.  But if I have to import all of foo's prerequisites when all I want is foo, then there's not much point in having modules.  I might as well just load everything into my namespace.

One solution to this problem is to use hygienic macros. The "bar" that the macro expands into would then be a syntax object that remembers where it came from instead of being a plain symbol.  Should we be able to use hygienic macros in Arc if we want to?  Sure.  But I'd also prefer not to have to deal with hygienic macros just to be able to use modules.

Instead of having our foo macro expand into the bar symbol, we could instead have it expand into the literal bar function.  Since function values aren't considered literals by the Arc compiler like numbers or strings are, the function value needs to be quoted:

We can choose to have the Arc compiler treat function values as literals if we want to:

and now we don't need to quote the function value:

Now we can see the tradeoff involved in choosing whether to implement my macro as a hygienic macro or not.  If I had implemented foo as a hygienic macro, I would already have had to go to the trouble of specifying whether bar was supposed to refer to my environment or the caller's environment; but then I could have moved it into my module unchanged.  In the non-hygienic macro, symbols always refer to the caller's environment, so if I want it to refer to something in my module environment instead I have to explicitly say so with a comma.

The final piece of the puzzle is what to do if bar is also a macro.  I can have my macro foo expand into the literal macro value bar by using a comma in the same way, but the Arc compiler checks if something is a macro by looking at a symbol and seeing if the symbol is a global variable that has a macro value.  Having it also check for macro values is easy to add:

Now I can use bar as a helper macro in a macro foo, and I can import foo without having to also import bar.
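The extra check can be sketched in Python. This is only a model of the idea, not the Arc compiler: the Macro class, macro_for, and macroexpand are hypothetical names, and expansion is done only at the head of the expression.

```python
class Macro:
    """Wraps an expander function so macros can be told apart from functions."""
    def __init__(self, expander):
        self.expander = expander

def macro_for(head, ns):
    """Return the macro that `head` denotes, or None.
    A head may be a literal macro value or a symbol bound to a macro."""
    if isinstance(head, Macro):
        return head
    if isinstance(head, str) and isinstance(ns.get(head), Macro):
        return ns[head]
    return None

def macroexpand(expr, ns):
    """Repeatedly expand the head of expr until it is no longer a macro."""
    while isinstance(expr, list) and expr:
        macro = macro_for(expr[0], ns)
        if macro is None:
            break
        expr = macro.expander(*expr[1:])
    return expr

module = {}
module["bar"] = Macro(lambda x: ["+", x, 1])
# foo expands into the macro *value* bar, not into the symbol bar.
module["foo"] = Macro(lambda x: [module["bar"], x])

# My namespace imports only foo.
mine = {"foo": module["foo"]}

expanded = macroexpand(["foo", 5], mine)
```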

 

An experimental step towards Arc style macros in Racket

This implementation allows "mac" to be used in the "racket/load" language environment to define Arc-style macros.  It employs some shenanigans to get around Racket's separation of code into compile-time and run-time phases, so that the macro expansion code can use helper functions defined earlier in the code without having to put them into the compile-time phase (that is, the helper functions can be defined with a plain "define" instead of having to use "define-for-syntax").

This isn't very useful yet because code defined with racket/load can't be imported from another module in Racket, which means that we couldn't define an Arc-style macro in one file and use it in another file. I imagine that it might be possible to export an Arc-style macro from a Racket module with some further shenanigans such as running all the Arc-ish run-time code in Racket's compile-time phase (ha!)... but that would be another step.

Why would being able to write Arc code in a Racket module be interesting?  It would allow for the evolution of code: I could start with writing a macro in the easy way using "mac", and then, if it turned out that the macro is generally useful and I wanted to make use of Racket's features such as hygiene, source code location information, and so on, I could take the extra time and effort to write it as a Racket macro.

Here's an example of defining a "do" macro:

Note that of course I haven't implemented Arc here, just the "mac" form, so the expansion of the macro is written in Racket.

I can now use the Arc-style macro like I would a Racket macro (as long as I'm in the same file... I can't define "do" in one file and use it in another file with this implementation):

And I can have an Arc-style macro expand into a Racket macro and vice versa:

 

Choosing a default open source license

The Winner

The tl;dr summary: the shortest of the widely used open source licenses: the MIT license.

When not to choose a default license

You may, of course, have some reason for choosing a particular open source license (or the public domain, or to keep your code proprietary, etc).  You might, for example, have some goal that is better met by the wording of one license over another.  If so, you can stop reading now ^_^  This post instead looks at the question: what if you actually don't care which license you use?  Is there then some secondary reason to choose one over another?

Why have a default license at all?

Joi Ito writes eloquently about the problem of the endless proliferation of open source licenses.  Using different licenses is like throwing sand into the gears: every time someone encounters a new license they have to stop, figure out what's allowed and what's not, how a judge might interpret it, how their downstream users might be affected, and whether it's compatible or not with all the other licenses that they're trying to use.

So, all else being equal (if you don't have a reason to pick a particular license), choosing a widely used license is advantageous because it minimizes the number of licenses that people have to deal with and combine.

And, there's a caching advantage: the first time someone encounters a license, they need to read it, understand it, and figure out its ramifications.  The next time they run into the same license they've already done that work for the license.

Why not make the public domain be the default?

If someone doesn't actually care which license they use, why use a license at all?  Why not just put their code in the public domain?

Back to caching again: code being put in the public domain is relatively rare, so someone encountering code said to be in the public domain will probably have to figure out what ramifications (if any) this has.  The Creative Commons also claims that few jurisdictions have a process for dedicating works to the public domain easily and reliably, so someone encountering your public domain code also needs to figure out if it was dedicated properly.

Again, this argument is only for if you don't care whether you give your code a license or put it in the public domain... if you have some reason that you want to use the public domain, I'm not arguing with you :-)

So, how to choose which widely used license to make the default?

Given that someone will have to read the license and figure out the ramifications, and again, under the assumption that there isn't an actual reason to prefer one license over another, a shorter license will at least be quicker to read, and will have fewer clauses to have to check for ramifications and compatibility with other licenses.

And, if we're making an arbitrary choice, we might as well choose a simple and arbitrary metric for making that choice. :-)

If shorter is better, why not use an even shorter license?

We're hackers, we like to make stuff, so if a short license is good... why not make it really short like: "do whatever you want with it"?

Well, now we're back to the whole issue of the proliferation of vanity licenses.  I mean, everyone who comes out with a vanity license has some reason why they think their vanity license is better.  And sure, I like short, and I like shorter even better.  But no matter how short it is, someone still has to figure out the ramifications, what it means, and whether it's compatible with other licenses or not.  To take one example, say you leave off the disclaimer of liability.  Now someone has to figure out what, if anything, that means.  Does it mean that you are taking on liability?  Or, are they taking on liability if they pass on your software to someone else?  These may turn out to be not a problem, but the issues have already been figured out for widely used licenses.

And The Winner

I counted the characters of the widely used licenses to see which one was the shortest.  The winner (by a slim margin over the BSD license): the MIT license.

Thus the MIT license is now my default license that I use when I don't care about which license I use, which is why I released the eval.to source code under the MIT license.

 

Endlessly spinning browser page loading indicator bug fixed on eval.to

Kirubakaran fixed the endlessly spinning browser page loading indicator bug: https://github.com/awwx/evalto/pull/1

It turns out that if we make an XMLHttpRequest while the page loading indicator is currently spinning, making the request causes the indicator to continue to spin for the duration of the request.  However, if we initiate the request after the spinner has stopped, the request doesn't start the spinner again.