VermillionAzure

Programming and things

Things They Won’t Teach You In School — August 1, 2017


Disclaimer: at least at my school. And in years of lurking the Internet.

#1 – Libraries (and making them too)

Most schools will NOT teach you libraries. Anything from Qt in C++ to Django in Python is less likely to appear in coursework than an assignment to roll your own. Of course, some fields, like graphics and compilers, will use more standard tools and libraries (OpenGL, ANTLR/yacc/bison).

Now, making your own library? I’ve rarely seen that emphasized in discussion on the Internet, but you will surely write code that somebody else will use. In a way, many people make libraries for each other every day; it’s just not the primary idea.

How do I learn it?

Create your own. Look at others. Documentation is about 50% of using a library, so writing it should be 50% of making a library. Look into your language’s documentation tools, like Doxygen for C++ and Javadoc for Java.

If you think about it, operating systems are like the O.G. libraries. They’re just such a part of the system that they have their own club now.

#2 – Build Systems

I find it strange how people do not cover build systems as a first-class… well… class (double pun). It deserves its own semester-long course, in my opinion. How often do you use a build system that requires you to understand your dependencies and how they fit together? In my short working experience, build systems are everything, from the compilation and customization process all the way up to deployment and support.

How do I learn it?

C++ has no good universal build system. The closest contenders are CMake and Visual Studio. Solutions like Meson are up-and-coming, but often bring larger dependencies like Python.
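For a feel of what CMake looks like, here is a minimal, hypothetical `CMakeLists.txt` for a project with one library and one executable (all names are made up for illustration):

```cmake
cmake_minimum_required(VERSION 3.5)
project(shopping_list CXX)

# A reusable library target with a public include directory...
add_library(shopping src/shopping.cpp)
target_include_directories(shopping PUBLIC include)

# ...and an executable that declares its dependency on the library.
add_executable(shop src/main.cpp)
target_link_libraries(shop PRIVATE shopping)
```

The point is the dependency graph: the `target_link_libraries` line is where you are forced to understand how your pieces fit together.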

Java is probably the best example of having build systems, with Maven and Gradle. It’s easier to integrate with libraries since repositories are well-established, and the JVM is a cross-platform blessing.

Start with a continuous integration tool and actually use it: Travis CI, CircleCI, and Jenkins are popular for open-source.

 

#3 – Rigorous Testing and Validation

Testing and validation may be covered, but people will not usually go into the differences between dumb mocks and spies, or why you should care. Testing is often a big part of the deployment process at larger companies. Additionally, static analysis may be used at more careful organizations doing aerospace and hardware work. The balance between the perfect ideals of correctness and soundness and “just work, damn you!” is something you’re unlikely to get formal training in.

How do I learn it?

Learn the difference between unit and integration tests, and if you’re headed for an OOP shop, maybe they’ll be using something like Google Mock/Google Test or EasyMock/JUnit.

Explore standard debugging tools and some basic static analysis tools. For starters, try Cobertura and FindBugs in Java. If you’re really interested, learn a little more about your language’s type system and the benefits and downsides it has.

#4 – Critical Diagnosis and Debugging

Nobody teaches you how to diagnose a bug definitively, because we’re not there yet. The tools vary too much across industries; heck, even debugging formats are not standardized across platforms, and if they are, they’re only de facto standards.

For all our formal knowledge about static analysis and correctness from years of academic research, it appears that rigor has not bled into the blood of common CS discussion. Somewhere, people at top companies are using FindBugs, and people are not being taught its working, practical use.

How do I learn it?

No idea, but knowledge of algorithms and experience seem to help. Tools can help with diagnosis; tests and static analysis are part of the process.

 

 

 

 

Doing Scheme – The Big Lie — May 18, 2017


This semester, I started and led a team of 10+ people to complete a basic Scheme interpreter with C++ — it’s up on GitHub as Shaka Scheme, and is still missing tail-call optimization, macros, and continuations. It’s also pretty raw, so brace yourself.

People say Lisp is so easy to implement. “So easy.” Sure, the idea that “it’s easy” is not up-front and everywhere, but it’s told in whispers around the interwebs:

It’s just SO easy to write a Lisp interpreter!

Actually, I agree. Lisp, in its original form, was extremely simple — it was designed to manipulate what are basically linked lists and do it dynamically, and do it well.

It’s elegant:

  • It has a simple grammar. It’s context-free, and thus, we can use the various context-free parsing algorithms — the particular ones of note are LL(k) recursive descent (currently used by GCC and Clang/LLVM for parsing C++) and the LR parsers that tools like yacc and bison generate.
  • Homoiconicity is useful. Since everything is a list and all code is basically in the form of S-expressions (lists), everything can be manipulated super easily.
  • Macros are super powerful. Lisp macros work on expressions instead of tokens like C preprocessor macros. This means macros are powerful enough to generate ANY type of code.
  • The implementation is easy. The points above make it pretty clear that Lisp can be very minimal.

So yes. Lisp is easy, and fun.

Scheme is not. And this is “The Big Lie” — Scheme is a Lisp, but Scheme is definitely NOT Lisp.

Why is Scheme harder than Lisp to make work?

Scheme is a modern Lisp dialect, first created at MIT, and now on its seventh iteration with the Revised (to the 7th power) Report on the Algorithmic Language Scheme specification (R7RS).

Even though Scheme has a reputation for being minimalist, its implementation is Lisp on steroids:

  • Macros are not so simple. R7RS Scheme has template macros, which are declarative rather than procedural. In other words, it has its own pattern language that it matches expressions against (similar to Haskell’s pattern matching), and then its own template expression language with certain special semantics.
  • First-class continuations are time machines. They require you to keep an explicit control stack and a way to save and restore its state. This means the evaluation system must be able to express the control stack as an explicit, reified item. No implicit recursion variables: it must ALL be saved.
  • Circular lists are valid in Scheme. We have datum labels (recursive list notation) to thank for that. They get no dedicated section in R7RS, and, like any details on garbage collection or memory management algorithms, the document is almost silent on the matter.

Meta-circular evaluators should go away if you’re doing Scheme. Essentially, you’ll have to start playing with an intermediate form like continuation-passing style (CPS), or reify continuations by exposing them directly in your evaluation system, in order to solve the continuation problem, and you’ll have to start playing with statically-analyzed scopes to deal with macros.

At this point, your evaluator will probably start looking more like a compiler. This is why implementations like Chez and Guile already compile to native code or bytecode.

Are you sure?

There have been ways of getting around the two, but according to R. Kent Dybvig (a prolific writer on Scheme who has been heavily involved with Chez Scheme), meta-circular evaluators have to go:

An iterative approach is necessary because the more straightforward recursive approach of the meta-circular interpreter presented in Chapter 2 cannot properly support continuations or tail calls. … Tail calls cannot be properly supported unless the implementation (at the meta level) supports them properly.

So the “circular” part is sort of problematic: the information required to support continuations should be explicit, not implicit as with recursion, so that it can be saved and restored as data.

So what?

When people say “Lisp,” what comes to mind? Is it Clojure? Scheme? Common Lisp? I now think a lot more about which Lisp people want to talk about, because they all look the same but come out differently in semantics, and sometimes even in syntax.

I was fooled by the misconception that “Lisp interpreters are easy” still applied to modern Lisps (particularly Scheme, given its reputation as “the simple modern Lisp”). I’m pretty sure that when people say “Lisp,” they mean something closer to the original McCarthy Lisp than to modern Scheme or modern Common Lisp.

Food for thought.

But that Ruby one supports continuations…

Ruby already has it with its own callcc.

 

Q: Why is Racket’s parsing system so complicated (Scheme)? — April 20, 2017


A: Circa 2016, Racket’s internal syntax representation system had to be revised because of macros, or “code rewriting” or “code-to-code procedures.” Matthew Flatt explains in his POPL 2016 paper, “Binding as Sets of Scopes,” and in his online notes on the new macro expander system for Racket (formerly PLT Scheme).

Q: But who is Matthew Flatt?

A: Matthew Flatt is the most active contributor to Racket as measured by GitHub’s commit graph, circa 2017. He is a member of PLT, the Racket contributor group, and a faculty member at the University of Utah.

Q: Okay, so what?

A: Racket’s macro system recently got an overhaul: a new syntax system that keeps track of sets of scopes. This makes hygienic macros (e.g. syntax-case) easier to implement. Essentially, every single identifier can be associated with a set of scopes, given that the primitive expressions are indeed the primitive expressions.

In Scheme, even your keywords could be redefined. — April 13, 2017


Over the past few months, I’ve started the shaka-scheme project for my senior project at the University of Hawaii at Manoa, and I wanted to highlight two pieces of code that recently changed our entire course of development:

(define proc1 (lambda ()
  (define define 1)
  (display "define is actually ")
  ; This doesn't display a procedure literal...
  (display define)
  (display " in this scope\n")))
(proc1)

(define proc2 (lambda ()
  (define-syntax define
    (syntax-rules ()
      [(define a b) (display "fooled you\n")]))
  ; This doesn't do the normal define...
  ; And it's not an error.
  (define 1 2)))
(proc2)

In Scheme, you can redefine what are called primitive expression symbols inside of non-global scopes, such as within a lambda expression. If you run this using GNU Guile or Racket, you’ll find that this code runs flawlessly — it prints out:

define is actually 1 in this scope
fooled you

So what?

On the Shaka Scheme project, we decided to go with a tree-based approach — essentially, we would build an AST in-memory during parsing, and then evaluate the AST through tree traversal.

[Figure: a conceptual diagram of our tree-based AST]

This seemed like a relatively simple way of doing things. We would adopt a supposedly more efficient primitive (the IDataNode tree with children stored in a std::vector), and we would also get a nice AST if we ever wanted to debug in the future.

Unfortunately, the design called for strict primitive expression forms and assumed that (define a b) was the only valid form (because why wouldn’t it be?). Of course, that all changed when we realized that (define a b c) could be valid in a certain context.

In that instant, the “fixed” structure of the various AST specifications we were using to represent the parsed forms of Scheme expressions basically broke. In addition, the fixed parsing grammar for the top-level <expression> rule could no longer look ahead for (define and assume that the rest of the input matches the <define> rule, ( define <identifier> <expression> ), as define itself could be a completely different procedure… or even just 1.

The solution

The solution is to simply redo the entire evaluation and parsing specification to make NO ASSUMPTIONS.

Scheme, like most other Lisps, is notorious for being runtime-heavy, in the sense that figuring out the types of things and the bindings of identifiers is hard to do statically without going down the rabbit hole of more rigorous analysis. This rings especially true here, as almost any primitive expression keyword can be a completely different procedure given the right context and redefinitions. Therefore, all parsing must be agnostic to the idea that define means only the usual define, and every single expression is simply a list waiting to be evaluated (unless, of course, you can prove that it does represent the usual define expression in that context).

Second, Scheme lists may not be more elegant when stored as in-memory trees, where the list is represented as a root with its children as the elements. The reason is that car and cdr may require taking slices of an array or vector of children, and keeping track of the sizes of those slices is somewhat more cumbersome than simply using pointer access to check for the existence of the next element. Why keep track of slices or trees if one can simply use the usual, canonical representation of Scheme lists as linked lists?

Third, Scheme R7RS macros become significantly more complicated without the uniform structure of “everything is a list,” as traversal over non-uniform trees makes parsing macro-derived expression types harder to reason about.

For example, parsing (let ((x 1) (y 2)) (+ x y)) (which is a classic macro-implemented or derived expression type, according to R7RS) requires knowledge that

  • let is a macro
  • ((x 1) (y 2)) is not a procedure call, but a list of syntax primitives that will be manipulated by the macro

How is one to decide these things before actually evaluating the value of the let identifier in the current environment? There is no answer without more rigorous static analysis of the scope in question.

Remarks

I personally chose Scheme as “the easy” language to implement as a semester-long “programming language-focused” project in C++. Unfortunately, the extremely dynamic nature of Scheme (let’s not get started on continuations…) means that implementing its finer parts is definitely not a trivial task.

Lisp is an especially interesting case for evaluation, because so little is known at compile time. The extremely flexible semantics of Scheme give its implementation a sort of hardness that surpasses that of something like a MIPS processor simulator.

If you’re still not convinced of the non-toyness of Scheme, why not read some of the latest Scheme standard, R7RS? Scheme is a relatively full-featured language compared to the Lisps of old, and sports the following features:

  • Continuations with call/cc
  • Expression-based macros with the define-syntax/syntax-rules system (standing apart from the R6RS syntax-case macro system and the classic defmacro system)
    • Note that C has token-based macros with its preprocessor. Expression-based macros are more powerful.
  • A full-featured numeric system, with support for true fractional/rational types, inexact/exact number differentiations, and complex numbers.
  • Optional lazy evaluation
  • Records (also known as product types and comparable to C’s struct)
  • Strings, with some optional Unicode support
  • Bytevectors and vectors, which represent fixed-size linear data structures
  • Exceptions
  • Library/Module system

 

C++: Template duck-typing in 30 lines — March 10, 2017


#include <cstddef>
#include <iostream>

struct Duck {
    void quack(std::ostream& out) {
        out << "Quack!" << std::endl;
    }
};

struct Platypus {
    void quack(std::ostream& out) {
        int* i = NULL;
        out << *i << std::endl;
    }
};

template <typename MaybeDuck>
void do_quack(std::ostream& out,
        MaybeDuck duck) {

    duck.quack(out);
}

int main() {
    Duck duck;
    Platypus platypus;

    do_quack(std::cout, duck);
    do_quack(std::cout, platypus);
}

C++ has compile-time duck typing

(See it live on Coliru)

The C++ compiler doesn’t actually do many semantic checks to “figure out” whether the type you give a template is correct.

All it does is kind of “copy-and-paste” the type you give it into the placeholder type, and sees if it compiles correctly.

Here, our two classes will both quack(), but will not necessarily do the same thing. One will print; the other will most likely give a segmentation fault, because it dereferences a NULL pointer.

In other words, C++ does not do special semantic checking on templates unless you do it yourself.

Hence, this program crashes.

So what’s the solution?

There are many.

Recursion is NOT recursion. It’s implicit looping for repetition. — February 26, 2017


I’m tired of seeing THIS as an example of recursion for new programmers.

Suppose you’re talking to a 5 year old. Let’s call him Faive.

  • “What’s recursion, Vermie?”
  • “Okay, Faive. What if I told you that if you want to understand recursion, you first need to understand…”
  • “What is it?!”
  • “-recursion!”
  • “That’s dumb.”
  • “Well, recursion requires using itself.”
  • “But then what is recursion?”
  • “Recursion!”
  • “…Oh! I get it! Recursion is when something loops forever! Thanks, Vermie!”
  • …and then Faive walks away. Faive is now misinformed.

We don’t want to give Faive and other people who are new to recursion a wrong impression.

How to understand recursion

There are 3 aspects:

  • Repetition
  • Continuation
  • Iteration vs. Recursion

Repetition

Let’s use an example: grocery shopping.

You want to buy 3 items: milk, bread, and a knife.

You have a simply buy procedure:

define buy = procedure(item):
   go to store;
   buy item;
   return home;

So if you have the 3 items: milk, bread, knife. So you have a list of procedures to do:

buy(milk);
buy(bread);
buy(knife);

Simply put, you have repeated buy over a set of items: milk, bread, knife. Simple. You have performed repetition in buying 3 different items.

Continuation

Suppose you want a way to express a procedure to take a list of items, and then buy them. Then you might say:

define shopping = procedure(items):
   for each (i) in (items):
       buy(i);

When you would carry out this computation, it might look something like this:

shopping({milk, bread, knife})

which would turn into:

for each (i) in ({milk, bread, knife}):
   buy(i);

and then:

buy(milk); then
buy(bread); then
buy(knife);

At each step, there is a next step: then: buy(next_item). This is what you call a continuation of the current step in a procedure: the next procedure after you’re done with the current one, as well as the context you’re going to do it in (e.g. after I’ve done buy(milk), the continuation is buy(bread), then buy(knife)).

Usually, continuations are handled automatically or “invisibly” by higher-level control features. However, iteration and recursion both use them differently.

Iteration vs. Recursion

There are two primary ways of expressing repetition: iteration and recursion. I also like to call them explicit and implicit repetition (or looping).

Iteration

Iteration tells you:

  • What to start with
  • When to end
  • How to go to the next element from the start

For example, in C, there exists a for loop construct that expresses these three steps explicitly. Here is “semi-C” pseudocode:

for (
   start: item i = first_of(items); // start condition
   end: i is at END of (items);     // end condition
   next: i becomes the next item;   // next step setup
) {
   do something;
}

If we apply this to shopping({milk, bread, knife}), and expand the list of steps, this will become:

start: item i = milk;
buy(milk);
next: i = bread;
buy(bread);
next: i = knife;
buy(knife);
next: i = END;
stop: because end condition met;

Notice: there are “helper” steps to help define the repetition and the flow of execution. You can see that the list of steps is done for each item, and has clear boundaries for the start and end of the procedure.

The pattern of repetition is uniform and set:

  • Before each valid iteration, the end condition must not be met
  • The current list of iteration steps are done
  • The next action to get the next item to be done is performed

Recursion

Recursion gives you:

  • A rule: if some condition, then do something and then perform the continuation.
  • Other rules in terms of a rule.

Recursion’s repetition is implicit because some rules may lead back into the greater list of rules as a part of its flow of execution. Thus, recursion(x) may often make use of some other case of recursion(y).

In a way, recursion abuses the fact that the continuation of a procedure can be the procedure itself, making termination entirely dependent on whether:

  • There exists a rule to stop the computation
  • Other rules will always lead to a rule that stops the computation.

For example, in Scheme (Lisp), it is common to see recursion as a pattern. Here is some pseudocode for recursion:

recursion(x):
   if (condition1(x)) { then do something; then stop; }
   if (condition2(x)) { then do something; then recursion(next(x)); }

Recursion does not give a special place to mark a start condition or a stop condition. It simply consists of a list of rules to decide what to do next. In other words, a recursive procedure defines what to do, and then what should be the continuation.

So, if shopping({milk, bread, knife}) were written in a recursive fashion:

define shopping(items):
   if (first(items) != END) {
       then buy(first(items));
       then continue with: shopping(rest(items))
   }
   if (first(items) == END) {
       then stop: because end condition met;
   }

Notice that there are two helper functions: first and rest. Here, we can think of items as an ordered list instead of just an unordered set of items. We take a look at the front of the list, perform an action, and then either continue to the next item or stop if there are no more items.

Why recursion can be harder and easier

Recursion can be harder because:

  • It requires the designer to understand whether they have covered all the cases
  • It is possible to make the control flow such that the recursion never ends.
  • Identifying the start and end conditions and how to phrase them can be harder.
  • Certain patterns of recursion can be harder to recognize and optimize.

Recursion can be easier because:

  • Recursion is suited for recursive data structures (such as trees and graphs)
  • Some rule-driven algorithms are easier to express
  • It’s harder to get confused when you focus on simple rules driving continuations, versus a larger iterative piece of code that must keep track of many different intermediate variables.

Why iteration can be easier and harder

Iteration can be easier because:

  • People are sometimes more naturally suited to thinking in a step-by-step manner.
  • It’s often easier to figure out the start, end, and continuation parts of the procedure.
  • There is less need to repeatedly go back through a list of rules, as there is when analyzing recursion

Iteration can be harder because:

  • Algorithms that are written step by step are often longer, and thus, probably take longer to read and analyze
  • Turning a recursive data structure into a view that can be used with iteration might take some extra work
  • Refactoring iterative code to handle additional cases may be bulky or burdensome, depending on language.

And so I hope this helps people explain recursion!

Is intuition in programming important to you? — November 9, 2016


Note: Read only the bold statements if you want to skim this quickly.

Do you remember the first time you saw classes? Objects? The word “string?” What about binary trees, regular languages, compilation of languages, and recursion?

I don’t realize it myself a lot of times, but I’ve come a long way in changing my mindset to handle the terminology of programming ideas and make them seem natural. I think others are the same.

Do you remember the day when you didn’t understand anything of programming terminology?

Today, hundreds or thousands of people are in that same situation. Today, many people do not know what a “segmentation fault” is or why their program is crashing. They don’t know why a reference for WinMain@16 is missing from their MinGW program that makes use of wxWidgets when first tangling with Code::Blocks, and they certainly won’t know how monads relate to endofunctors.

Do you remember how you first grasped your first to most recent ideas of programming?

I remember my personal experience: late at night, trying to figure out why my fifth project in C++ was not compiling, helplessly throwing questions at Google and Stack Overflow.

I had no idea what I was doing, and — who’s to blame? — I just didn’t know what I was doing.

I remember that my account was question-banned on Stack Overflow for trying to ask questions as an independent learner, one or two semesters before I would ever encounter my first C++ program in college.

Of course, Stack Overflow has a strict policy on repeating questions that have already been answered, and an unwritten, unspoken policy on downvoting really basic questions — which I quickly learned after the fact.

But after three months of hitting roadblocks in compilation and unable to pass barriers of discipline and knowledge required to go directly to documentation, I found myself on top of a basic GUI system implemented on-top of SDL with Lua scripting on the side.

Learning can be a frustrating and humiliating experience; it can also be thrilling and easy given the right mindset and the right resources. To me, learning is not about wading through descriptions of mental constructs and precise, compiler-like terminology simply to parse the mechanisms by which we arduously and untiringly make and mutate what is.

It’s not about the methods by which we gather information; we do not learn merely by reading documentation, but rather by the methods with which we internalize, understand, and distill such knowledge.

Learning is compressing methods and ideas intuitively so that they can be reconstructed and utilized at will.

Who is aiming to precisely teach “how to compress methods and ideas intuitively so that they can be reconstructed and utilized at will” for basic programming concepts? For searching and understanding basic programming terminology?

Opponents of doing such might give the arguments that “we don’t baby people around” and “people are capable of Googling the most basic of concepts,” but I don’t blame a baby for not being able to walk when all he or she can do is crawl.

If you remember the time you searched “What is the best programming language?” on the Internet, you may remember that the best answer given was no answer at all. People even think that asking such a question is wrong (and “wrong questions” do exist, in a sense).

I tried to write a tutorial myself. I worked on a C++ tutorial blog a few months back, and it’s horrible. It’s a mish-mash of concepts that are out of scope, poorly explained and written, and not focused on compressing knowledge into intuition. There is no notion of audience or aim; just a spewing of bits of knowledge that fails to present a coherent whole.

I now think that it’s not about just simplicity or a short word count, but teaching concepts so that they become intuitive to the receiver.

Many people want to learn programming today. Today, many people are starting at ground zero, with no equipment to learn for themselves. And today, you are the more experienced.

I’m writing this because I want to ask three questions and see what other people think:

  1. Is intuition in programming important for you? How would you teach correct intuition to a beginner?
  2. What concepts were important for you for sustained learning in programming?
  3. What resources do people need to map out programming for themselves and say to themselves: “This is what I know,” “This is what I don’t know,” and, “This is how I’m going to learn this?”