Source

Background: how we got the generics we have


Erasure


p. “Erasure is probably the most broadly and deeply misunderstood concept in Java.”

p. “Erasure is not specific to Java, nor to generics”

p. “This is because as we move down the stack from high-level languages to intermediate representations to native code to hardware, the type abstractions offered by the lower level are almost always simpler and weaker than those at the higher level – and rightly so.”

p. “Erasure is the technique of mapping richer types at one level to less rich types at a lower level”

p. “For example, the Java bytecode set contains instructions for moving integer values between the stack and local variable set (iload, istore)”

p. “But there are no such instructions for bytes, shorts, chars, or booleans – because these types are erased to ints by the compiler, and use the int-movement and arithmetic instructions.”
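Note: the source-level counterpart of this is binary numeric promotion, where arithmetic on narrow integral types is carried out as int. A minimal sketch (the class name NarrowTypesDemo and the helper addBytes are made up for these notes); the bytecode itself could be inspected with javap -c, which is not shown here.

```java
public class NarrowTypesDemo {
    // Adds two bytes and returns the boxed result so its runtime type can be inspected.
    static Object addBytes(byte a, byte b) {
        return a + b; // the sum has static type int, so it is boxed as an Integer
    }

    public static void main(String[] args) {
        Object sum = addBytes((byte) 100, (byte) 100);
        System.out.println(sum.getClass().getSimpleName()); // prints "Integer"
        System.out.println(sum);                            // prints "200", no byte overflow
    }
}
```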

p. “it reduces the complexity of the instruction set, which in turn can improve the efficiency of the runtime.”

p. “Similarly, when compiling C to native code, both signed and unsigned ints are erased into general-purpose registers”

Homogeneous vs heterogeneous translations


p. “There are two common approaches for translating generic types in languages with parametric polymorphism – homogeneous and heterogeneous translation.”

p. “In a homogeneous translation, a generic class Foo<T> is translated into a single artifact, such as Foo.class (and same for generic methods)”

p. “In a heterogeneous translation, each instantiation of a generic type or method (Foo<String>, Foo<Integer>) is treated as a separate entity, and generates separate artifacts.”

p. “The choice between homogeneous and heterogeneous translations involves making the sorts of tradeoffs language designers make all the time.”

Erased generics in Java


p. “Java translates generics using a homogeneous translation.”

p. “a generic type like List<String> is erased to List when generating bytecode, and type variables such as <T extends Object> are erased to the erasure of their bound (in this case, Object).”
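Note: one observable consequence of the homogeneous translation is that differently parameterized lists share a single runtime class. A small sketch, with SharedClassDemo as a name invented for these notes:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedClassDemo {
    // Compares the runtime classes of two differently parameterized lists.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Both are instances of the one ArrayList class; the type argument is not recorded.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```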

p. “type variables are erased to their bounds, generic types are erased to their head (List<String> erases to List)”

p. “At the use site, the same thing happens: references to Box<String> are erased to Box, with a synthetic cast to String inserted at the use site.”

Given Box<String> box = new Box<>("Hello");, the compiler treats String result = box.t(); as if it had been written Object temp = box.t(); String result = (String) temp;. The cast the compiler inserts is called a synthetic cast.
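Note: the erasure of Box and the synthetic cast can be approximated by hand. In this sketch, Box (with the accessor t() used in the note above) is a stand-in for the article's example class, and ErasedBox is a hand-written guess at roughly what the compiler produces, not actual compiler output.

```java
public class ErasureSketch {
    // Generic form, as written in source.
    static class Box<T> {
        private final T t;
        Box(T t) { this.t = t; }
        T t() { return t; }
    }

    // Hand-written approximation of the erased form: T is replaced by its bound, Object.
    static class ErasedBox {
        private final Object t;
        ErasedBox(Object t) { this.t = t; }
        Object t() { return t; }
    }

    // What the programmer writes: no visible cast.
    static String viaGenerics() {
        Box<String> box = new Box<>("Hello");
        return box.t();
    }

    // Roughly what the use site behaves like after erasure: the cast is explicit.
    static String viaErasedForm() {
        ErasedBox box = new ErasedBox("Hello");
        Object temp = box.t();
        return (String) temp;
    }

    public static void main(String[] args) {
        System.out.println(viaGenerics());   // prints "Hello"
        System.out.println(viaErasedForm()); // prints "Hello"
    }
}
```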

Why? What were the alternatives?


p. “we should also ask: were we to reify that type information, what would we expect to do with it, and what are the costs associated with that?”

p. “Reflection. For some, “reified generics” merely means that you can ask a List what it is a list of”

p. “Layout or API specialization. In a language with primitive types or inline classes, it might be nice to flatten the layout of a Pair<int, int> to hold two ints, rather than two references to boxed objects.”

p. “Runtime type checking. When a client attempts to put an Integer in a List<String> (through, say, a raw List reference), which would cause heap pollution, it would be nice to catch this and fail at the point where the heap pollution would be caused, rather than (maybe) detecting it later when it hits a synthetic cast.”
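Note: the "fail at the point of pollution" behaviour can be approximated today, on an opt-in basis, with Collections.checkedList. This is a library wrapper and not reification, so the sketch only illustrates the difference in where the failure surfaces. CheckedListDemo and rawAddSucceeds are names made up for these notes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CheckedListDemo {
    // Tries to add an Integer to a List<String> through a raw reference.
    // Returns true if the add goes through (heap pollution), false if it is rejected.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawAddSucceeds(List<String> target) {
        List raw = target;
        try {
            raw.add(42);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Plain ArrayList: the bad element is accepted silently.
        System.out.println(rawAddSucceeds(new ArrayList<>())); // prints "true"
        // Checked wrapper: the bad element is rejected at the point of insertion.
        System.out.println(rawAddSucceeds(
                Collections.checkedList(new ArrayList<>(), String.class))); // prints "false"
    }
}
```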

p. “To understand how erasure was the sensible and pragmatic choice here, we also have to understand what the goals, priorities and constraints, and alternatives were at the time.”

Goal: Gradual migration compatibility


p. “an ambitious requirement: It must be possible to evolve an existing non-generic class to be generic in a binary-compatible and source-compatible manner.”

p. “Supporting this meant that clients and subclasses of generified classes could choose to generify immediately, later, or never, and could do so independently of what maintainers of other clients or subclasses chose to do.”

p. “Without this requirement, generifying a class would require a “flag day” where all clients and subclasses have to be at least recompiled, if not modified – all at once.”

p. “By making generification a compatible operation, the investment in that code could be retained, rather than invalidated.”

p. “The aversion to “flag days” comes from an essential aspect of Java’s design: Java is separately compiled and dynamically linked.”

p. “you can compile C against one version of D and run with a different version of D on the class path (as long as you don’t make any binary incompatible changes in D).”

p. “The pervasive commitment to dynamic linkage is what allows us to simply drop a new JAR on the class path to update to a new version of a dependency”

p. “At the time generics were introduced into Java, there was already a lot of Java code in the world, and their classfiles were full of references to APIs like java.util.ArrayList.”

p. “One consequence of this choice, though, is that it will be an expected occurrence that a generic class will simultaneously have both generic and non-generic clients or subclasses.”

Heap pollution


p. “Erasing in this manner, and supporting interoperability between generic and non-generic clients, creates the possibility of heap pollution – that what is stored in the box has a runtime type that is not compatible with the compile-time type that was expected.”

p. “Heap pollution can come from when non-generic code uses generic classes, or when we use unchecked casts or raw types to forge a reference to a variable of the wrong generic type.”

p. “The sin in this code is the unchecked cast from Box<?> to Box<Integer>; we have to take the developer at their word that the specified box really is a Box<Integer>.”

p. “the heap pollution is not caught right away; only when we try to use the String that was in the box as an Integer, do we detect that something went wrong.”
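Note: the code the article refers to is not reproduced in these notes. The following is a reconstruction under assumptions (Box, forge and describe are invented names) showing an unchecked cast from Box<?> to Box<Integer> that succeeds silently and only fails later, at the synthetic cast.

```java
public class HeapPollutionDemo {
    static class Box<T> {
        private final T t;
        Box(T t) { this.t = t; }
        T t() { return t; }
    }

    // The unchecked cast: the compiler warns, but nothing is verified at runtime.
    @SuppressWarnings("unchecked")
    static Box<Integer> forge(Box<?> box) {
        return (Box<Integer>) box;
    }

    static String describe() {
        // No failure here, even though the box actually holds a String.
        Box<Integer> polluted = forge(new Box<>("not a number"));
        try {
            Integer value = polluted.t(); // the synthetic cast to Integer fails here
            return "read " + value;
        } catch (ClassCastException e) {
            return "ClassCastException at use site";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "ClassCastException at use site"
    }
}
```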

p. “If a program compiles with no unchecked or raw warnings, the synthetic casts inserted by the compiler will never fail.”

p. “heap pollution can only occur when we are interoperating with non-generic code, or when we lie to the compiler.”

Context: Ecosystem of JVM implementations and languages


p. “The design choices surrounding generics were also influenced by the structure of the ecosystem of JVM implementations and of languages running on the JVM.”

p. “in fact the Java Language and the Java Virtual Machine (JVM) are separate entities, each with their own specification.”

p. “there are over 200 languages that use the JVM as compilation target, some of which have a lot in common with the Java language (e.g., Scala, Kotlin) and others which are very different languages (e.g., JRuby, Jython, Jaskell.)”

p. “Reifying generics would mean that not only would we need to enhance the language to support generics, but also the JVM.”

p. “If, for example, the interpretation of reification included type checking at runtime, would Scala (with its declaration-site generics) be happy to have the JVM enforce Java’s (invariant) generic subtyping rules?”

Erasure was the pragmatic compromise


p. “Runtime costs. A heterogeneous translation entails all sorts of runtime costs: greater static and dynamic footprint, greater class-loading costs, greater JIT costs and code cache pressure, etc.”

p. “Migration compatibility.”

p. “Runtime costs, bonus edition. If reification is interpreted as checking types at runtime (just as stores into Java’s covariant arrays are dynamically checked), this would have a significant runtime impact, as the JVM would have to perform generic subtyping checks at runtime on every field or array element store, using the language’s generic type system.”

p. “JVM ecosystem.”

p. “Delivery pragmatics.”

p. “Language ecosystem.”

p. “Users would have to deal with erasure (and therefore heap pollution) anyway.”

p. “Certain useful idioms would have been inexpressible.”
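Note: the dynamic check on covariant array stores mentioned under "Runtime costs, bonus edition" can be observed directly. A minimal sketch, with ArrayStoreDemo as an invented name; it shows the existing array check only, not what a reified generic check would look like.

```java
public class ArrayStoreDemo {
    // Attempts to store an Integer into a String[] viewed as Object[].
    static boolean storeIsRejected() {
        Object[] objects = new String[1]; // legal: arrays are covariant
        try {
            objects[0] = Integer.valueOf(1); // checked by the JVM on every store
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeIsRejected()); // prints "true"
    }
}
```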

p. “The common misconception that erasure is “a dirty hack” generally stems from a lack of awareness of what the true costs of the alternative would have been, both in engineering effort, time to market, delivery risk, performance, ecosystem impact, and programmer convenience given the large volume of Java code already written and the diverse ecosystem of both JVM implementations and languages running on the JVM.”