Productivity of UML

Found an old discussion on the MDSN site about a study on the productivity of UML, brought up by the DSM folks. You can see some of the common caveats raised in this comment by MetaCase’s Steve Kelly. Please read his points and come back here.

I actually didn’t notice it was an old thread and replied to it. Call me cheap, but I hate perfectly good arguments going to waste on a dead thread, so I am recycling my original response (now deleted) here as a blog post.

1) Repeat with me: UML is not a graphical language. It has a graphical notation, but others are allowed. Criticism of UML as a whole based on the productivity issues around the graphical notation is cherry-picking or (at best) a misinformed opinion. If you don’t like the default notation, create one (like we did!) to suit your taste (and it will still be UML). The specs are public, and there are good open source implementations of the metamodel that are used by many tools.

2) You don’t need to give up on the semantics of UML to map a modeled class to multiple artifacts. That is just plain OO design mapping to real-world implementation technologies. UML is an OO language first and foremost.
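
As a rough illustration of that point, here is a minimal sketch in Java of one modeled class being mapped to several implementation artifacts. Everything in it is an assumption made for the example: `ModelClass` is a drastically simplified stand-in for a UML class, and `ArtifactGenerator` and its three methods are hypothetical names, not the API of any UML tool. A real generator would walk the UML metamodel instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical, drastically simplified stand-in for a modeled class.
final class ModelClass {
    final String name;
    final Map<String, String> attributes; // attribute name -> type name ("String" or "Integer")

    ModelClass(String name, Map<String, String> attributes) {
        this.name = name;
        this.attributes = new LinkedHashMap<>(attributes); // keep declaration order
    }
}

// Hypothetical generator: one model element in, several artifacts out.
final class ArtifactGenerator {
    // Artifact 1: source for a Java class holding the attributes.
    static String toJavaEntity(ModelClass c) {
        StringBuilder sb = new StringBuilder("class " + c.name + " {\n");
        c.attributes.forEach((attr, type) ->
                sb.append("    ").append(type).append(' ').append(attr).append(";\n"));
        return sb.append("}\n").toString();
    }

    // Artifact 2: a SQL table definition for the same class.
    static String toSqlTable(ModelClass c) {
        StringJoiner cols = new StringJoiner(", ", "CREATE TABLE " + c.name + " (", ");");
        c.attributes.forEach((attr, type) ->
                cols.add(attr + " " + ("Integer".equals(type) ? "INTEGER" : "VARCHAR(255)")));
        return cols.toString();
    }

    // Artifact 3: a REST resource path for the same class.
    static String toRestPath(ModelClass c) {
        return "/" + c.name.toLowerCase(Locale.ROOT) + "s/{id}";
    }
}
```

The model element stays a single class with ordinary OO semantics; only the generator knows that it fans out into an entity, a table and a resource.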

3) There is no need to mix languages: UML has support for both structural and behavioral modeling (since 2002!). Action languages are not (or don’t have to be) “other languages”, but just a textual notation on top of the existing abstract syntax and semantics. That is not a marketing ploy; incorporating elements of the Shlaer-Mellor approach was just a sound strategic decision that made UML much better.

4) Annotations (or stereotypes) are an established (see C#, Java) and cost-effective way of tailoring a general purpose language to one’s needs. Not everything calls for a DSL. Both approaches have pros and cons; one has to pick what is best for the situation at hand.
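
For readers more familiar with Java than with UML profiles, the analogy can be sketched as follows. This is only an illustration under stated assumptions: the `@Entity` annotation, its `table` element, and the `StereotypeReader` class are all hypothetical names defined here for the example (this `@Entity` is not the one from any persistence framework). The annotation plays the role a stereotype would play on a modeled class, and the reader plays the role of a tool that reacts to it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation, playing the role a stereotype plays in a UML profile.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Entity {
    String table() default ""; // comparable to a tagged value on the stereotype
}

@Entity(table = "CUSTOMERS")
class Customer {
    String name;
}

// No annotation: the tool treats this as an ordinary class.
class Invoice {
}

// Hypothetical tool that inspects the tag and adapts its behavior.
final class StereotypeReader {
    static String describe(Class<?> type) {
        Entity e = type.getAnnotation(Entity.class);
        if (e == null) {
            return type.getSimpleName() + ": plain class";
        }
        String table = e.table().isEmpty() ? type.getSimpleName() : e.table();
        return type.getSimpleName() + ": persistent, table " + table;
    }
}
```

The language itself is unchanged; the extra meaning lives in the tag and in the tools that choose to honor it, which is what makes this approach cheap compared with designing a new language.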

5) All the stories of failure or limited success with generating code from UML models that I have heard or read were caused by the decision to ignore behavioral modeling in UML and do partial code generation. That is a losing proposition, no matter the modeling language. Again, just like the notation issue, analyzing UML productivity based exclusively on those narrow cases is at best spreading misinformation. Kudos to MetaCase for promoting full code generation; that is the way to go. But full code generation is not exclusive to DSLs: the Executable UML folk (and other modeling communities) have been doing it successfully for a long time as well.

Can we move away from the pissing contest between modeling approaches? That got old ages ago. There are way more commonalities than differences between DSM and executable modeling with GPLs like UML, productivity gains included. There is room for both approaches, and it would not be wise to limit oneself to one or another.

What is your opinion? Are you still using old school UML and limiting yourself to generating stubs? Why on earth haven’t you moved to the new world of executable models yet?


19 thoughts on “Productivity of UML”

  1. Daniel

    April 8, 2011 at 12:39am

    UML is data-centric.
    Algorithms and data structures are inseparable. Generated code limits the choice of algorithm.

    • rafael.chaves

      April 9, 2011 at 9:42am

      > UML is data-centric.

      @Daniel care to explain, maybe with a concrete example?

      You can specify behaviour in UML. And in the case of generated behaviour, you can drive variation based on the model.

  2. UML Guru

    April 8, 2011 at 1:21am

    I feel embarrassed because I usually tell my customers that UML is a graphical language, but it seems it is only a notation today.
    For me, a language is real when two people can talk to each other in it and understand each other. UML is therefore a language, but since you cannot talk in UML and can only design with it, is UML a language or a notation?

    • rafael.chaves

      April 9, 2011 at 9:49am

      @UML Guru UML is a language, but one that admits multiple notations. Most people will think of the language in terms of its graphical notation. Much of the value of UML, though, is not in the notation but in the semantics and abstract syntax.

      And UML is not only a language for people. It is a language for tools as well. And different people and use cases will have different needs. The clear separation between concrete syntax (the notation) and abstract syntax and semantics is intended to allow sharing much of the value of the language while providing the best notation for the use case at hand.

  3. Damien Cassou

    April 8, 2011 at 1:31am

    During my PhD thesis I created a domain-specific language to let architects describe architectures such as:

    context AccessLogParser as Access { … }
    context IP2Profile as Profile indexed by ip as IPAddress { … }

    context AccessingProfile as IdentifiedAccess {
      interaction {
        when provided AccessLogParser
        get IP2Profile
        always publish
      }
    }

    From such a description, a code generator produces a programming framework which includes the following abstract method declaration:

    abstract IdentifiedAccess onNewAccessLogParser(Access access, PullFromIP2ProfileCallback ip2Profile);

    This can then be implemented, in a subclass, by a Java developer like this:

    @Override
    protected IdentifiedAccess onNewAccessLogParser(Access access, PullFromIP2ProfileCallback ip2Profile) {
      Profile profile = ip2Profile.get(access.getHost_ip());
      return new IdentifiedAccess(access, profile);
    }

    Here, you can see that my approach leverages partial code generation; the developer is really guided by the generated code, i.e., the programming framework contains types dedicated to help the developer in implementing the architecture as described by the architect (e.g., PullFromIP2ProfileCallback which hides an RPC call to a remote object). Also, this approach leverages existing tools (IDEs, Java type checker, Java libraries…). For example, the developer only needs to use parameters of the abstract method and code completion to do his job. Have a look at the slides on my website if you want to know more about my approach.

    With the executable model approach, if I understand it correctly, it would not be possible to guide the developer or leverage existing general-purpose programming language tools. What do you think?

    • rafael.chaves

      April 9, 2011 at 10:08am

      @Damien

      The approach of executable modeling does not force you to model everything. You can still combine generated code with hand-written code.

      Also, whether to model or to code is not a clear cut decision. If the level of abstraction your domain requires is best provided by an ordinary programming language (and their tools), and there is no interest in having the solution be implementation technology agnostic, there is no reason for modeling. If that is the case, by all means solve it in code.

      Sometimes that is not very obvious to assess though. People will confuse “best suited for” with “can do” (all implementation languages can do anything a modeling language would be best suited for) without realizing the consequences.

      Not saying that is your case, but it is a typical reason why executable modeling gets dismissed. For your case, you need to look at several instances of the hand-written code that extends the generated code and ask yourself whether there is a pattern, and whether the concerns that code deals with couldn’t be better dealt with at the model level.

  4. Andreas Leue

    April 8, 2011 at 2:39am

    Rafael, I agree very much with you.

    Just one remark concerning ways of behavioural modelling. In our experience with our EM/OS system, the needs for behavioural modelling are quite different depending on the abstraction layer (CIM/PIM/ASM/PSM…).

    While in a PSM an action language with capabilities to create objects and throw exceptions might be a useful thing, in a PIM this is just not the right abstraction, except for minor use cases like expressing a validation formula (even then, it might be better to have a validation library at hand and refer to named rules).

    Sure, in reality there’s always the point where your abstraction fails and you need a fallback and hack some code into your model, but this is not 80/20, it can be reduced to 99/1.

    More details about our approach can be found e.g. in this thread http://www.modeldrivensoftware.net/forum/topics/can-you-really-use-uml-to

    Andreas

    • rafael.chaves

      April 9, 2011 at 10:19am

      @Andreas Interesting points. But I disagree with “in a PIM this is just not the right abstraction”. Could you illustrate that with an example?

      An action language can (and should) be fully platform independent. I accept that the OO paradigm (even if at a higher level of abstraction) won’t always cut it, and a DSL will be a better solution. But I think that depends on the context (domain/application/users).

  5. Damien Cassou

    April 10, 2011 at 1:54am

    @rafael, you are right, there are some patterns that could be better dealt with at a higher level than code. However, this is not always the case. That’s why we chose this hand-written approach. Now, nothing prevents our users from using their own models to describe parts of their code and generate this missing code. For example, we experimented with Esper (http://esper.codehaus.org/) and provided a small SQL-like DSL to let users describe event processing and get the code generated automatically.

    However, I’m still not comfortable with the distinction between model and code. You advertise that a model is not necessarily graphical. You also say that modeling allows one to be technology agnostic. What exactly do you mean by technology? If I use a programming language with a VM, I’m already platform agnostic (e.g., my Java code can run on Linux, Windows, and MacOS). Is it important that my project can run on either Java or C# (if that is what you mean by technology)?

    • rafael.chaves

      April 11, 2011 at 12:13am

      @Damien

      By technology, I mean the different instances of OS/hardware/runtimes/frameworks/implementation languages/databases/middleware that we have today and that do more or less the same thing as the others in their class, as their predecessors did 5-10-20 years ago, and as their successors will 5-10-20 years from now.

      Why should we accept that a solution for a domain that is not technology centric (as most are not) becomes obsolete just because the technology it was built on is being replaced by newer ones? Or why should we not be able to move the solution across competing technologies (Java->C#, Oracle->MongoDB)? Or why should we have to do a lot of work when we want to make an architectural/design improvement (even if the technology stack was not altered)? Or why should the technology I want the system to run on determine how developers build the solution, or vice versa?

      Re: code vs. model, it is not really clear cut, so it is hard to draw the difference between them based on isolated aspects. Overall, here are traits that IMO make something more a modeling language than an ordinary programming language:

      the level of detail required to specify a solution is less than an ordinary implementation language typically requires
      more conducive to specifying the what instead of the how
      having transformation to other (programming) languages as a primary use case (sure, it is technically possible to execute a model natively, and one could always write a tool that translates between any language A and B)
      supports the concepts from the domain directly (if a DSL) or, if a GPL, supports high level conceptual modeling without requiring language extensions (state machines, associations, signals)

      Those are the signals I could come up with now. There are probably more.

  6. Andreas Leue

    April 11, 2011 at 3:37am

    > > “in a PIM this is just not the right abstraction”
    > Could you illustrate that with an example?

    I can try, but it’s in general hard to prove a non-existence statement with an example, so maybe we can turn the tables: you give me examples of such uses, and then I try to suggest modelling alternatives? I’d really have fun doing this.

    Nevertheless, here’s one example. “Creating an object” – where and when would you want to do this? A typical situation might be: the user wants to create an instance. We solved this as follows. First, there’s a so-called “CreateTransaction”. This is a standard (yet replaceable) building block which is parametrized by a factory. It manages the presentation of the factory, the validation, the db transaction and the presentation of the result, as well as sending various notifications. Sure, the implementation of the operations here requires code. But this CreateTransaction is not part of a user’s business domain model; it is part of a general interaction library instead. So, the second part, in the business domain model, is just to specify the factory and to specify that you want this factory to be assembled with the CreateTransaction – neither of which is expressed as code.
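
    One way to picture the split described above is the minimal Java sketch below. It is not the EM/OS implementation: every name in it (`CreateTransaction`, `Policy`, `BusinessModel`) is hypothetical, the list stands in for a database, and the business-side wiring is written in Java only to keep the example runnable, whereas the comment above says that part is not expressed as code at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical generic building block from an "interaction library".
// It is written once, in code, and knows nothing about the business domain.
final class CreateTransaction<T> {
    private final Function<Map<String, String>, T> factory;
    private final Predicate<T> validator;
    private final List<T> store = new ArrayList<>(); // stands in for the database

    CreateTransaction(Function<Map<String, String>, T> factory, Predicate<T> validator) {
        this.factory = factory;
        this.validator = validator;
    }

    // Build the instance from user input, validate it, "persist" it,
    // and report the outcome (a stand-in for the notifications).
    String run(Map<String, String> input) {
        T candidate = factory.apply(input);
        if (!validator.test(candidate)) {
            return "rejected";
        }
        store.add(candidate);
        return "created";
    }

    int storedCount() {
        return store.size();
    }
}

// Hypothetical business domain class.
final class Policy {
    final String holder;

    Policy(String holder) {
        this.holder = holder;
    }
}

// The only domain-specific part: which factory is assembled with the generic block.
final class BusinessModel {
    static CreateTransaction<Policy> createPolicy() {
        return new CreateTransaction<>(
                in -> new Policy(in.getOrDefault("holder", "")),
                p -> !p.holder.isEmpty());
    }
}
```

    Calling `BusinessModel.createPolicy().run(...)` with a map containing a non-empty `holder` yields "created", while an empty map yields "rejected"; the business side never spells out how creation, validation or storage happen.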

    > An action language can/should be fully platform independent.

    @should: yes, sure
    @can: I’m a bit skeptical about whether this would not end up like Java: starting as THE UNIVERSAL language, and finally being complicated and having competitors (iAlf, Alf#, …).

    But I wouldn’t mind having such a language; my point is more the abstraction level issue.

  7. Andreas Leue

    April 11, 2011 at 7:27am

    Martin Fowler writes:
    > But the platform independent argument has no foundation.

    Well, in this case, I disagree with Martin. His observation is only true with respect to a certain flavour of MDA. Admittedly, this flavour is widespread: use your model as a 1:1 replacement for current 3GL programming.

    And the argument is also true if you limit your scope to just the “action semantics language”, see my comment above.

    The “Hello world” argument is very telling: Hello-world-programs are usually meant to explore the possibilities of the IO-library of a programming language, but the whole concept of an IO-library is misplaced in PIMs.

    To point this out: PIM is not about IO-library independence, but about IO-library agnosticism!

    Real PIMs ™ ;-) are on a different abstraction level. They are not about referring to different technical platforms in a common fashion, they are just not about technical platforms at all. Proof: ask any business domain expert about IO libraries. q.e.d. This shows: it has to be (and is) possible to describe systems without referring to IO libraries.

  8. rafael.chaves

    April 11, 2011 at 8:14am

    @Andreas’ previous comment:

    I am not sure why a CreateObject action cannot result in code that does everything you mentioned in your example.

    Of course, not all CreateObjects represent creating objects in the database, but there are ways of identifying that (stereotypes, conventions, etc).

  9. Andreas Leue

    April 11, 2011 at 9:22am

    It can – but I wouldn’t put one into my PIM. A CreateObject method does not belong to the domain of a business domain expert. It is already “technical” in a sense. A business domain expert does not “call” methods. When using the system, he enters data, presses buttons. When specifying his needs, all he wants to say is “I’d like to have a possibility to create xy, enter data a, b, and c (plus, needless to say, press a button)”. Therefore in a PIM it’s sufficient to say something like: this business domain class shall be creatable (and I’d like to pick from your catalogue of interaction themes the fashionable premium class variant “Sunrise in July”).

  10. rafael.chaves

    April 11, 2011 at 11:07am

    I see your point now. But isn’t that true for classes, associations, operations, attributes etc? I’d say that if those are kosher, why aren’t the actions that manipulate them so?

    • rafael.chaves

      April 11, 2011 at 4:24pm

      Replying to myself – I guess pragmatically, I can see that the structural aspects are more easily digested by a non-technical audience than the behavioral elements.

  11. Andreas Leue

    April 12, 2011 at 6:44am

    Hm. That’s actually a good question.

    At present I have no scientifically proven answer, only observations and educated guesses.

    Our experience is rooted in insurance, banking, logistics and trading business (inventory management). In the projects I’ve seen, the vast majority of operations is “data shuffling”, “CRUD++”, i.e. really just creating, searching, rearranging, combining, filtering, aggregating data.

    The associated operations are (more or less) straightforward, and thereby it’s not necessary to mention them explicitly in the PIM (it’s a quite different story in the PSM). There (in the PIM) the behaviour is implicitly assumed to belong to stereotypes or characterized by properties, so there’s simply no need to mention them explicitly. Even more, this behaviour cannot be captured simply as methods on the model class, since this model class maps to 10 or 20 associated technical classes, and the methods belong there, not to the PIM class. E.g. “create” is not a method on the PIM class, it is a method on the factory class in the PSM or below. The PIM will only contain method “snippets”, like validation formulas etc.

    I’d say (just a guess) about 78% of behaviour can be captured that way (or simply derived from structural properties, like a tight bidirectional 1:+ composition association).

    The rest falls into basically two categories. The first category is “advanced data shuffling”. By that, I mean things like multistep transactions (“wizards”), where things are selected, filtered, temporarily stored, rearranged etc., plus a bit of calculation. But not really much. Concerning this category, our system is under development. We’re constantly *decreasing* it, looking for ways to better express it, since it “feels wrong” to describe it programmatically. Or at least to factor out technical aspects, like exception handling, since yes, things can go wrong, but business users don’t throw exceptions – only interpreters and VMs do that. So in the PIM I just want to express assertions, conditions, invariants. The model interpreter/compiler has to take care of what to do if these fail (and there are many possibilities, after all).

    Therefore, yes, at present a procedural language is still needed, but I wouldn’t seek to improve it, but to eliminate it instead. And yes, I wouldn’t mind having a good and general action language in UML, but I wouldn’t say it too loud, since it opens the floodgates of coding, and by doing that we will never reach the “next generation language”.

    Finally, in all projects there’s a third category which will always require programming. These are core algorithms, like a simulation or optimisation or financial math. But this category is not a problem to me; I don’t see the need to model it in UML, probably the data it operates on, but not the algorithm itself.

Comments are closed.