While M13 is approaching, we are planning a little ahead. This is a request for feedback on some future changes in Kotlin.
We want to deliver Kotlin 1.0 rather sooner than later, and this makes us postpone some design choices we don’t have enough confidence about. Today let’s discuss data classes.
Introduction
The concept of data classes has proven very useful when it comes to simply storing data. All you need to say is:
data class Foo(val a: A, val b: B)
and you get equals()/hashCode(), toString(), copy() and component functions for free.
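As a minimal sketch of what these generated members do, consider the example below. The class `Point` and its `Int` properties are illustrative stand-ins for `Foo`, `A` and `B`; they are not part of the original post.

```kotlin
// Hypothetical example class; Point stands in for Foo(val a: A, val b: B).
data class Point(val x: Int, val y: Int)

fun main() {
    val p = Point(1, 2)

    // equals()/hashCode(): structural, based on the constructor properties.
    println(p == Point(1, 2))                        // true
    println(p.hashCode() == Point(1, 2).hashCode())  // true

    // toString(): lists the properties by name.
    println(p)                                       // Point(x=1, y=2)

    // copy(): a new instance with some properties replaced.
    println(p.copy(y = 5))                           // Point(x=1, y=5)

    // Component functions: enable multi-declarations.
    val (x, y) = p
    println(x + y)                                   // 3
}
```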
The most common use case works like a charm, but the interaction of data classes with other language features may lead to surprising results.
Issues
For example, what if I want to extend a data class? What if the derived class is also a data class?
open data class Base(val a: A, val b: B)
data class Derived(a: A, b: B, val c: C) : Base(a, b)
Now, how does equals() or copy() work in Derived? All the well-known issues arise at once:
- should an instance of Base be equal to an instance of Derived if they have the same values for a and b?
- what about transitivity of equals()?
- what if I copy an instance of Derived through a reference of type Base?
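The transitivity question can be illustrated with a sketch that uses plain classes and hand-written equals() in place of data classes. The particular equals() strategy shown here is only one plausible choice, picked as an assumption for illustration; it is not a design the post proposes.

```kotlin
// Hypothetical hand-written hierarchy; plain classes stand in for the data classes above.
open class Base(val a: Int, val b: Int) {
    // Naive equals: any Base (including subclasses) with the same a and b is equal.
    override fun equals(other: Any?): Boolean =
        other is Base && a == other.a && b == other.b

    override fun hashCode(): Int = 31 * a + b
}

class Derived(a: Int, b: Int, val c: Int) : Base(a, b) {
    // Compares c when the other object is also a Derived, otherwise falls back to Base.
    override fun equals(other: Any?): Boolean =
        if (other is Derived) super.equals(other) && c == other.c
        else super.equals(other)

    // Note: equal Base and Derived instances get different hash codes here,
    // which is one more contract this naive approach breaks.
    override fun hashCode(): Int = 31 * super.hashCode() + c
}

fun main() {
    val base = Base(1, 2)
    val d1 = Derived(1, 2, 3)
    val d2 = Derived(1, 2, 4)

    println(d1 == base)  // true
    println(base == d2)  // true
    println(d1 == d2)    // false: transitivity is broken
}
```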
And what about component functions that enable multi-declarations? It seems more or less logical that c simply becomes the third component in Derived in this basic case:
val (a, b, c) = Derived(...)
But nothing prevents us from writing something like this:
data class Derived(b: B, a: A, val c: C) : Base(a, b)
Note that the parameter order is reversed: first b, then a. Now it’s not that clear any more. And it may get worse:
data class Derived(val c: C, b: B, a: A) : Base(a, b)
Now c comes first, and the inherited component1(): A is simply a conflict: it is not an override, but such an overload is not legal either.
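The component function conflict can be sketched with hand-written component functions on plain classes. The concrete types `String`, `Int` and `Double` are assumed stand-ins for `A`, `B` and `C`, and only the basic case (c as the third component) is shown as compiling code.

```kotlin
// Hypothetical hand-written component functions; String, Int and Double stand in for A, B and C.
open class Base(val a: String, val b: Int) {
    operator fun component1(): String = a
    operator fun component2(): Int = b
}

// The basic case: c simply becomes the third component.
class Derived(a: String, b: Int, val c: Double) : Base(a, b) {
    operator fun component3(): Double = c

    // If c had to come first, Derived would need
    //     operator fun component1(): Double = c
    // which clashes with the inherited component1(): String
    // (same name and parameters, different return type), so it does not compile.
}

fun main() {
    val (a, b, c) = Derived("x", 1, 2.0)
    println("$a $b $c")  // x 1 2.0
}
```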
These are only some examples; there are many more issues, big and small.
Our strategy
On the one hand, we are not sure whether there is an elegant design for inheritance involving data classes. We have some sketches, but none of them looks promising enough.
On the other hand, we want to finalize the language design now, to be able to ship 1.0.
So, we decided to restrict data classes quite a bit to rule out all the problematic cases in 1.0, so that we can get back to them later and maybe lift some of the restrictions.
Proposed restrictions
We are going to do the following:
- allow data classes to implement interfaces
- forbid data classes from extending other classes
- forbid open data classes (i.e. other classes cannot extend data classes)
- forbid inner data classes (not clear how equals()/hashCode() should treat the outer reference)
- allow local data classes (the closure is not structured, so it’s OK for equals()/hashCode() to ignore it)
- require val/var on all primary constructor parameters for data classes
- require at least one primary constructor parameter for data classes
- allow private primary constructor parameters for data classes
- var’s are as good as val’s in all respects (they participate in equals()/hashCode() etc)
- forbid vararg in primary constructor parameters for data classes
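A short sketch of declarations that stay legal under the list above, with a few of the rejected forms kept as comments. All names (`Shape`, `Circle`, `Account`, `LocalPoint`) are hypothetical and chosen only for illustration.

```kotlin
import kotlin.math.PI

// All names here (Shape, Circle, Account, LocalPoint) are hypothetical.
interface Shape {
    val area: Double
}

// Allowed: a data class implementing an interface, with val/var on every parameter.
data class Circle(val radius: Double) : Shape {
    override val area: Double
        get() = PI * radius * radius
}

// Allowed: a private primary constructor parameter and a var.
// Both take part in equals()/hashCode() like any other component.
data class Account(private val id: Int, var balance: Int)

fun main() {
    // Allowed: a local data class.
    data class LocalPoint(val x: Int, val y: Int)

    println(LocalPoint(1, 2) == LocalPoint(1, 2))  // true
    println(Account(1, 10) == Account(1, 10))      // true
    println(Account(1, 10) == Account(2, 10))      // false: the private id differs
    println(Circle(1.0) == Circle(1.0))            // true

    // Rejected by the compiler under the proposed rules (left as comments):
    //   open data class Base(val a: Int)   // open data classes
    //   data class NoParams()              // no primary constructor parameters
    //   data class Plain(a: Int)           // parameter without val/var
}
```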
Again, some of the restrictions in this list may be lifted later, but for now we don’t want to deal with these cases.
Appendix. Comparing arrays
It’s a long-standing well-known issue on the JVM: equals() works differently for arrays and collections. Collections are compared structurally, while arrays are not: equals() for them simply resorts to referential equality, this === other.
Currently, Kotlin data classes are ill-behaved with respect to this issue:
- if you declare a component to be an array, it will be compared structurally,
- but if it is a multidimensional array (array of arrays), the subarrays will be compared referentially (through equals() on arrays),
- and if the declared type of a component is Any or T, but at runtime it happens to be an array, equals() will be called too.
This behavior is inconsistent, and we decided to fix it following the path of least resistance:
- arrays are always compared using equals(), as all other objects
So, whenever you say
- arr1 == arr2
- arr in setOfArrays
- DataClass(arr1) == DataClass(arr2)
- or anything else along these lines,
you get the arrays compared through equals(), i.e. referentially.
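The rule can be sketched as follows. `Holder` is a hypothetical data class introduced only for this illustration, and `contentEquals` from the Kotlin standard library is shown as the explicit way to ask for a structural comparison.

```kotlin
// Hypothetical holder class used only to illustrate the rule.
data class Holder(val values: IntArray)

fun main() {
    val arr1 = intArrayOf(1, 2, 3)
    val arr2 = intArrayOf(1, 2, 3)

    println(arr1 == arr2)                  // false: different array objects
    println(arr1 in setOf(arr2))           // false: the set relies on equals()
    println(Holder(arr1) == Holder(arr2))  // false: the component is compared via equals()
    println(Holder(arr1) == Holder(arr1))  // true: same array object

    // Structural comparison has to be requested explicitly.
    println(arr1.contentEquals(arr2))      // true
}
```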
We’d love to fix the inconsistency with collections, but the only sane way of fixing it seems to be fixing it in Java first, which is beyond anybody’s power, AFAIK.
Call for feedback
Please share your opinion on the proposed changes. We are more or less sure about arrays, and pretty confident about limitations on data classes too, but it’s always a good idea to double-check with a wider range of use cases.
Thanks for your help!
LGTM!
I’d also suggest looking at AutoValue’s README for its design principles; AutoValue brings immutable value classes to Java with the help of annotation processing.
require val/var on all primary constructor parameters for data classes
Why? Normal parameters can be pretty useful. I’d love this to be kept allowed.
I’m not saying they are not useful. I’m saying that we are not ready to decide on the intricacies that they bring at the moment.
Why can’t we forbid reordering parameters in data classes? It can be checked at compile time.
We can. It’s a tiny part of a possible design. There are too many such parts for us to be confident about arranging them the right way under the time pressure.
One more semi-obvious decision is to just skip the parent constructor arguments in the child constructor when a data class inherits from another.
The thing that frustrates me is that we lose the beautiful Hibernate inheritance features.
“Composition” of data classes is still available, right? So inheritance is not an option, but composition was probably my preferred option anyway. Yes, happy with those restrictions on data classes for my use cases.
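A minimal sketch of the composition approach, using hypothetical classes `Name` and `Employee`: one data class holds another as a component, and the generated equals() and copy() compose naturally.

```kotlin
// Hypothetical classes: composition instead of a Base/Derived hierarchy.
data class Name(val first: String, val last: String)
data class Employee(val name: Name, val salary: Int)

fun main() {
    val e = Employee(Name("Ada", "Lovelace"), 100)

    // equals() of the outer class delegates to equals() of the nested data class.
    println(e == Employee(Name("Ada", "Lovelace"), 100))  // true

    // copy() composes as well.
    val renamed = e.copy(name = e.name.copy(last = "King"))
    println(renamed)  // Employee(name=Name(first=Ada, last=King), salary=100)
}
```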
So with var’s I’d expect the hashCode() value can change. Mutable data classes are very handy so I’m happy there but that is going to catch the uninitiated with use in Set’s etc. No difference to Java here but good documentation on var’s / changing hashCode() values might be good.
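The pitfall with a changing hashCode() can be sketched like this. `Counter` is a hypothetical mutable data class; the behaviour shown is that of `java.util.HashSet`, which `hashSetOf` returns on the JVM.

```kotlin
// Hypothetical mutable data class.
data class Counter(var count: Int)

fun main() {
    val c = Counter(1)
    val set = hashSetOf(c)

    println(c in set)   // true
    c.count = 2         // changes the value returned by hashCode()
    println(c in set)   // false: the element is looked up under its new hash code
    println(set.size)   // 1: the element is still stored in the set
}
```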
My 2c says “fine by me”. For me the need/use of “structural equals for array” is extremely rare (for what it’s worth I don’t remember fussing on this in 17 years of Java coding) so for me this is fine.
Is it a priority to be able to use Kotlin 1.0 easily with JPA and Spring Boot?
I’d say yes, why?
The continuing lack of Serializable in various Kotlin classes typically used in JPA @Id fields is a major roadblock in this kind of basic usage.
We are working on this. Unfortunately, it turned out to be a lot more work than we anticipated, but we’ll finish it by 1.0
How about using Kotlin with JUnit? Not being able to declare public instance fields (correct me if I’m wrong, but I couldn’t find a way) means missing out on one of JUnit’s most important features (@Rule). To work around this, I had to introduce Java base classes for my tests. JUnit isn’t the only library with this requirement; without a way to declare public instance fields, the Java interop story isn’t complete.
This is an important case. We have partly addressed it in M13 (coming soon), and will provide some more support a little later
I’m curious. What Kotlin classes are you using in JPA @Id fields?
See this-SHOULD-work vs. master branches:
https://github.com/mikaelhg/kotlin-spring-boot-data-rest/tree/this-SHOULD-work/src/main/kotlin/io/mikael/app
vs
https://github.com/mikaelhg/kotlin-spring-boot-data-rest/tree/master/src/main/kotlin/io/mikael/app
So I had a quick look and it is not clear to me. The branches seem to be swapping between an @Id of Long and Key; both should work fine, and neither of those types has anything to do with Kotlin per se, so I’m missing your point/issue.
Certainly there are no issues with Kotlin and Ebean ORM (which uses JPA mapping and entity bean enhancement that would be similar to Eclipselink).
It seems that you are expecting to use Kotlin data classes as JPA entity beans which is interesting. As the author of Ebean ORM I’m not going to be recommending that approach to anyone as most commonly it is good practice to use inheritance with entity beans and have a @MappedSuperclass bean with common properties such as @Id, @Version, @WhoCreated, @WhoModified, @WhenCreated, @WhenModified etc.
Data classes implement hashCode()/equals() so that could conflict with JPA vendor enhancement/weaving (when using data classes as @EmbeddedId for example).
Sorry, probably no helpful comments there.
Cheers, Rob.
so basically we should avoid Arrays in data classes or provide our own equals() every time?
I’d say you should avoid arrays everywhere (including your pure Java projects) unless you are doing some low-level optimizations.
And yes, if you want structural equality for arrays anywhere (including pure Java code), you have to provide custom implementations for equals()/hashCode()
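One way such a custom implementation might look is sketched below. `Samples` is a hypothetical class; `contentEquals` and `contentHashCode` are the Kotlin standard library functions for structural array comparison and hashing.

```kotlin
// Hypothetical class with hand-written structural equality for its array property.
class Samples(val values: IntArray) {
    override fun equals(other: Any?): Boolean =
        other is Samples && values.contentEquals(other.values)

    // Must agree with equals(): equal contents give equal hash codes.
    override fun hashCode(): Int = values.contentHashCode()
}

fun main() {
    val a = Samples(intArrayOf(1, 2, 3))
    val b = Samples(intArrayOf(1, 2, 3))

    println(a == b)             // true
    println(a in hashSetOf(b))  // true
}
```

For arrays of arrays, `contentDeepEquals` and `contentDeepHashCode` play the same role.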
For now these restrictions seem logical to me. Better to be restrictive now and expand later than the other way around.
I believe equals (and hashCode) should work something along the lines of this example. So a data class should only equal the exact same type; everything else would be nonsense. But you could use propertiesEquals to test properties of unequal types.
Maybe I’m too big a fan of purity, but allowing mutable properties to participate in equals/hashCode by default sounds like a heresy to me. Such a nice opportunity to shoot oneself in the foot! OTOH silently excluding them from equals/hashCode would probably also be unexpected by the code authors
This is exactly the concern we had, but simply disallowing var’s would be overly restrictive in some use cases. It is a discouraged practice, just like having vars anywhere else, basically.
I too am disturbed by the prospect of mutable vars in a data class.
I was wondering if it would be possible to declare a data class as mutable or immutable and then flag the use of mutable data classes in places where they would cause problems, for example anywhere that depends on a stable hashCode.
I think the best we can do for you is have an opt-in inspection that would warn you on the declaration site that var’s in data classes require careful treatment.
I think the issue here is that we only know this (places where we want stable hashCode) based on knowing how specific implementations (HashMap, HashSet etc) actually work, hence Andrey’s answer of an inspection warning.
+1. I think every one of the data class restrictions is completely reasonable.
I throw up my hands on arrays because that’s a pretty messed up situation to begin with.
I mostly like the data annotation (I know it’s a modifier, but I’m gonna call it like that here) because it allows me to have equals/hashCode/toString methods automatically generated, I don’t care too much about the other functions, though.
I think a good way of making everyone happy would be to allow for various configurations on the data annotation, kind of like project Lombok (https://projectlombok.org/features/index.html) allows you to configure certain annotations and fine-tune what you get in Java (such as the callSuper option of the EqualsAndHashCode annotation https://projectlombok.org/features/EqualsAndHashCode.html).
The only issue with that would be that you could get pretty lengthy data class declarations when specifying multiple options, so it would be definitely great that besides that you also provide a way to create custom data annotations (that would be allowing to annotate your annotations with the data annotation or being able to alias it on a per-module basis), that way you get your own data annotations that behave exactly as you want/prefer. Something like this maybe:
data(sameClassEquals=true, transitive=true) class Foo(val a: A, var b: B)
Regarding the restriction of only being able to use val for data classes, I’m glad you will not be doing that, I definitely like freedom when modeling my classes (such as aggregates/entities when using DDD).
Also, something else I would like to add is that I would certainly love being able to annotate non-primary-constructor properties and get them included in equals/hashCode/toString calculation for the data class. Something like what I commented on this ticket: https://youtrack.jetbrains.com/issue/KT-8466
class A() {
    data val a: Int
        get() = 123*321 //I don't know, do something cooler here instead
}
In this particular edge case (i.e. a class with data annotations only on its properties), though, you would not get component or copy methods generated, but you would get the rest for free (which I would certainly hate having to implement manually).
Sorry, I screwed-up the text formatting but I’m unable to edit the comment.
Fine-grained annotations along the lines of what you are proposing are under consideration for future versions. Thanks!
I think it’s a good idea you are aiming for delivering Kotlin 1.0 rather sooner than later. One more year and Scala gets more and more ground and people waiting for Kotlin might get tired of waiting. Once KT-3029 is implemented real Kotlin life will start :-).
I don’t think KT-3029 will be implemented per se. Most likely, we’ll just forbid protected in interfaces on the JVM
Do you mean it won’t be implemented for Kotlin 1.0 or do you mean it won’t be implemented at all? Thanks.
Since Java does not allow protected in interfaces on the class file level, there’s no way to implement this properly on the JVM. So, unless we find a really clever trick, it won’t happen in any version of Kotlin/JVM.
Traits may have protected vars and methods in Scala. I just tried it out to be sure. I think Ceylon also has that. So they did find some trick. I really hope Kotlin will have protected methods in traits. For modelling purposes this is very important, to make sure the encapsulation of the class extending the trait is not broken. Really hoping it can be done :-).
Protected vars and methods in Scala are compiled down to public methods, which is unfortunately not very “protected”.
@Alexander: Thanks for sharing this. A trait might be in a different package than the class extending it. So methods/vars in a trait have to be public. Painful situation, really ;-).
Everything in an interface must be public (private static members are allowed since Java 8), there’s no way around it on the JVM.
Those data class restrictions seem reasonable. Personally, I don’t think I’ll ever use vars on mine, but I don’t mind you leaving that possibility open.
For beginners, it may be useful to get a compiler warning if they use data classes with var properties as keys in maps or sets.
I love these restrictions, because they make data classes very easy and intuitive to use. For more complex cases, there are still regular classes.
Array handling sounds good.
We’ve got about 60 data classes in our current codebase and all conform to the proposed restrictions. I guess this is because they were created according to the “spirit” of how data classes are intended to be used. The restrictions sound reasonable to me. Looking forward to 1.0!
Final-by-default is the most inconvenient feature of Kotlin to me at the moment. Trying to pair Spring with Kotlin, I often find myself opening classes and methods as I add Spring features to allow Spring to generate proxies as necessary. Final-without-recourse really scares me.
The status quo feels like being engrossed in a great film only to see the sound man suddenly walk into the scene. It is jarring, and shatters the magic of the moment.
Is it possible to make final-by-default a compile-time only constraint? After all, in the case of a Spring RestController class that I need to open for Spring to do its magic, I have no intention of ever inheriting from the type and would be perfectly happy to have the compiler enforce this so long as the final modifier didn’t make it into the bytecodes.
I know I’m late to the party but I was on holiday and only just saw this post.
In my day job I have to use primitive arrays quite often (unfortunately). If data classes compare arrays by reference it would make them a lot less useful for cases like mine. I would have to wrap them in a type that correctly implements hashCode() and equals() or manually write hashCode() and equals() methods for any data classes containing arrays. That would remove a lot of the benefits of using a data class.
I think it would be better for data classes to use Arrays.deepEquals() for comparing arrays. Using reference equality will inevitably lead to subtle bugs. The caller would have to check the types of all the fields in the data class to know how equality works for that particular class.
I understand that arrays aren’t very popular, and for good reason, but when you need them there is nothing else that will do. It would be a shame if data classes were broken WRT arrays when it shouldn’t be hard to support them.
I don’t think it’s a problem that the equals() method of a data class containing an array won’t have the same behaviour as using == on two array references. If you’re using arrays you already need to be aware of the pitfalls and you have the option of using Arrays.deepEquals() to get the correct behaviour. But if reference equality checking is baked into data classes there is no workaround.
I can’t agree. If in a rare case you can’t use a data class and have to resort to manually implementing equals, it’s not very much a problem, IMO.
What are the downsides of supporting arrays properly in data classes?
Inconsistency and unpredictability. It’s better to learn that arrays are compared by identity everywhere, once and for all than debug weird behaviours with special-case behaviours. Scala has been there, and it’s very hard
Allowing data classes to inherit from interfaces, combined with forbidding non-val/var parameters: does it allow you anything more than using tag interfaces on data classes?
The reason why I’m asking is this simple case:
interface Pet {
    val name: String
}
// I'm lost here as I cannot have non-val/var in constructor and cannot override
data class Puppy(name: String) : Pet {
    override val name = name
}
Did I miss something, or can you simply not make “reasonable” use of interfaces?
Oops, thank you. I tried something similar but with wrong syntax. Maybe it is worth noting in data class documentation
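For reference, a sketch of the syntax that satisfies both the interface and the val/var requirement: the interface property is overridden directly in the primary constructor. This is presumably the form the thread refers to, which is an assumption; the name "Rex" is purely illustrative.

```kotlin
interface Pet {
    val name: String
}

// The interface property is overridden directly in the primary constructor,
// so the parameter is still declared with val.
data class Puppy(override val name: String) : Pet

fun main() {
    val pet: Pet = Puppy("Rex")
    println(pet.name)                      // Rex
    println(Puppy("Rex") == Puppy("Rex"))  // true
    println(pet)                           // Puppy(name=Rex)
}
```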