How do you avoid the dreaded "this is not what we asked for" and ensure customer satisfaction when building a new system?

In this webinar, Dino Esposito demonstrates a top-down methodology, sometimes mistaken for plain common sense and often boldly ignored, called UX-Driven Design (UXDD).

UXDD means coming to a visual agreement with customers by using wireframing tools to iterate on sketches of the new system before building it. Then, rather than building the system up from the data model, you proceed in a top-down fashion. The resulting system may be slow or even inefficient, but it will never be the “wrong” system! In addition, UXDD brings clarity to a few of today’s most popular patterns that are sometimes difficult to understand, such as CQRS and Event Sourcing.

Watch the webinar and learn:

  • An introduction to Wireframing tools
  • Proven ways to save on post-first deployment costs
  • Insights into better customer relationships
  • Better focus on development without 'analysis paralysis'

Building Better Architecture with UX-Driven Design on Vimeo.

You can find the slide deck here: https://www.slideshare.net/sharpcrafters/building-better-architecture-with-uxdriven-design

Video Content

  1. Need for Two Architect Roles (10:12)
  2. UX-Driven Design in Three Steps (19:02)
  3. UXDD Summary (24:35)
  4. Related Terminology (32:33)
  5. Three Levels of Prototyping (37:05)
  6. Bottom-Up vs. Top-Down Approach (44:27)
  7. Q&A (53:46)

Webinar Transcript

Dino:

Hi, everybody. I'm here today to share a few ideas about how to take advantage of a set of techniques that I collectively call UX-Driven Design, to help us build better software architectures. The first issue that every developer and every architect runs into in any software project is making sense of estimates and of requirements. We are all very familiar with statements like the one you can see here, taken from an interesting book called 'The No Estimates Book', which attempts to promote a slightly different, alternative approach to making estimations around software projects. The quote says, "A good software project must, like a house, start on the strong foundation of good architecture and good requirements," and I guess everyone would agree with that. But the problem is that in software, nobody asks you to simply build a house. You know how it works.

The customer, or maybe even worse, the marketing people, will approach you and try to convince you that all the customer needs is something small and very simple, very basic, nothing really serious to be worried about. But soon it grows slightly bigger and, more importantly, requires a more solid foundation; this is what we call a hut. But then the hut is not enough, because these days it has to be mobile. It has to move around: across platforms, across devices, across whatever. So from the hut we move up to the caravan. You know how it works: in the end they ask us to build a castle. The problem is, we never know from the beginning where we are expected to go. All that we get from customers is words, and as architects, we must make sense of those words. But words are of different types.

They express different concepts as far as requirements are concerned. There are conscious requirements, where users express everything that is relevant because they know all the details. But then there are unconscious requirements as well, in which a few details are reckoned so obvious that users don't mention them; they just omit them, even though they are fundamental for the architect. And finally, there are dreams. Everyone who has been a developer or an architect at some point knows about users' dreams: those things that they wish to have, and could even have, but never mention. In the end, if we want to make more financial sense of software projects, we must, in my humble opinion, find a way to improve the software development process. In other words, we need a better way to learn. Now, which way?

Here are a couple of funny laws, funny because they express a concept that, as architects and developers, we are pretty familiar with. But the Mr. Humphrey mentioned here is a scientist, so this is not a joke; it is funny because of the effect it may have on us, but it is serious. Humphrey's Law says that the user of the software won't know what she wants until she sees the software. I'm sure that everyone here is smiling: oh, I know that. The lemma that goes with Humphrey's Law is Wegner's Lemma: an interactive system can never be fully specified, nor can it ever be fully tested. We may smile and laugh at these things, but at the end of the day they are absolute pieces of truth, so we must cope with these two pillars and build our new way of learning about software development taking these two facts into account.

There is another fact we have to think about, one that drives us straight towards the foundation of this webinar: if you wait until the last minute to complete the user interface, at that point it only takes a minute. This is the foundation of the message, the foundation of this UX-Driven Design approach. We never spend enough time building and, more importantly, thinking about the user interface. The main concern here is not so much the user interface intended as a collection of graphical things (artifacts, colors, styles, CSS, this or that), but the definition of the interactions expected between any user of the system and the system itself. In a way, the concept that I like to associate with the popular idea of the user interface here is close to the idea of the use case diagrams you may know from your past exposure to the UML modeling language.

It's fundamental to be clear about which interactions should take place between users and the system. You know that in software, as everywhere, many great ideas have first been sketched out on paper napkins. Paper napkins, or just paper and a pen, still play a role even in days in which everything is done through a computer, because they are extremely intuitive ways to jot down ideas. In software, jotting down ideas is great and still effective, but to paraphrase a popular sentence of Linus Torvalds, "Talk is cheap, just show me the product," and this is what customers want. How can we show customers the product, given that talk is cheap? How can we do that in a way that is cheap for us as well? If we could find a way to show customers, cheaply or at least in a financially sustainable way, the kind of software we are going to write and the kind of interface we are going to produce, we would gain significantly in terms of the immediacy of the message.

If there are missed points in our understanding of the system, there is a significantly higher chance that those missing points will be caught in good time, before they compromise the project or make development more expensive. If we focus on producing screens, wireframes, and sketches of the user interface, whether on paper or not, we learn about the system in a way that is strictly focused on the tasks and actions the user has to take with our system. We learn a lot more about the processes we are then called to implement, and we learn about those processes not so much in terms of the physical workflow we have to implement in the backend, but primarily from the perspective of the impact those processes have on the way users work with the application.

We write software essentially to help users do things with computers; we are not necessarily expected to change the processes, the way in which users do their own things. We have to learn those processes and then simply map them to software artifacts we can work with. In other words, the front end and the backend of a software system must match. If I had to find an abstract reason why many software projects are not financially sustainable these days, it's that the front end and the backend don't match the way they should.

Need for Two Architect Roles

Let me confess that I really love quotes. Here is another one, this time taken from The Matrix, the popular movie. The speaker is Morpheus, who at some point says, "Remember, all I'm offering is the truth. Nothing more." That's what I'm trying to do now. The piece of truth I want to share with you today is that we ideally need two architect roles around our team. I said roles, and I said the word 'architect.' The first architect role is essentially the software architect, the classical, canonical software architect, who faces the painful truth of user requirements. The other one is the UX architect, the new entry, who is expected to face the blissful simplicity of the user interface. The software architect collects business requirements with the purpose of building the best possible domain layer, whereas the UX architect collects usability requirements to build the best possible user experience for the presentation layer. We are talking about two different architect roles focusing on different layers of a software system: the domain layer, the backend where business logic and business rules are implemented, and the presentation layer, the face of the system shown to users, which we want to be as smooth, as pleasant, and as comfortable as possible. These two architects will work together.

Tony: Excuse me, I have a question here.

Dino: Yeah, sure.

Tony: You are talking about two roles here, is it possible that these two roles are actually represented by one person?

Dino:

It is possible, definitely. Not coincidentally, I titled the slide "two architect roles." A role is a type of work that must be done around a software project. It can definitely be the same person with enough skills, or it could be two different people. I'm not saying that every team should hire a new person, but every team should have a person who is perfectly sensitive to the needs of usability as well as the needs of the business rules. So yes, to answer your question.

Tony: Okay, thank you.

Dino:

You’re welcome. The UX architect, a new role. Let's find out more about the responsibilities: what is the job associated with this additional architect role in a software project? Primarily, the UX architect is responsible for something we can call the architecture of the information: the way in which information is distributed across the screens and dispatched to the various personas. A persona, in the jargon of UX, is essentially a type of user and of user-machine interaction. How do we distribute information, how do we organize information, and how do we dispatch information to users, using the kind of interaction that users love? It's about asking users: how do you want to do this task? What is your natural way of performing it, and what kind of information do you need at every step of the procedure that represents the process?

This is apparently something completely obvious, but in practice it has very little to do with how we do things today. Today, we focus too much, at the beginning of our work, on what could be the ideal approach or the cooler technology we would like to use in a project, so we focus on how to persist and lay out our data, the data that represents and configures the state of the system. As software architects, we typically ignore until the last minute the architecture of the information that users really require, the architecture that will make the interaction between users and machines as pleasant as it should be. How can we verify the goodness of any idea, of any type of user interface we may offer?

There are usability reviews, and usability reviews are a new entry. There is unit testing, which is good for validating the behavior of the code, but usability reviews are essentially the unit testing of the usability of the presentation layer. I'm not talking, or not necessarily and not only, about automated UI tests. Those can help, but this is something that happens at a higher level of abstraction. It's about learning from users at work with our prototypes: whether they love them, whether they like them, whether they feel comfortable, whether they find them simple. When you send an email asking users for feedback and the answer you receive is "it's so simple," then you have done a great job, regardless of whether it was easy or not to make it that simple. Making it simple for users is the goal; it is the absolute measure of performance.

It is the measure of performance for good software architects these days. So what does it mean to evaluate the usability of a software system? Essentially, you do that by looking at users while they work with the system, even recording or filming them and learning from their body language. There is a little bit of cognitive science here, some of those principles. It may be you, or a team of experts, extracting the real feedback; the way in which you interpret the body language could be just reading through emails, or looking at users' faces if you can see them, or hiring a separate team of people who can do the analysis for you. Another important and practical approach you can take is monitoring the timing of operations, or at least of the operations you reckon important in the software.

This is something you can even do via software, by placing tools around your sensitive calls that log the start and the end of operations and then report those times to some remote database for further analysis. In some cases it's just a visual kind of analysis; in other cases it could mean delegating the task to a separate team of experts, with psychologists and experts in the cognitive sciences. In yet other cases, it can just be extra software: logging and profiling code that measures not the performance of the software itself, but the time it takes to perform certain visual operations.
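As a concrete illustration of that last point, here is a minimal C# sketch of such instrumentation. It is not from Dino's material; the type and operation names are hypothetical, and the reporting callback stands in for whatever log file, telemetry endpoint, or remote database you actually use.

```csharp
using System;
using System.Diagnostics;

// Wrap a "sensitive" UI-level operation, time it, and hand the measurement
// to whatever store you use for later analysis.
public sealed class OperationTimer : IDisposable
{
    private readonly string _operationName;
    private readonly Action<string, TimeSpan> _report;
    private readonly Stopwatch _watch = Stopwatch.StartNew();

    public OperationTimer(string operationName, Action<string, TimeSpan> report)
    {
        _operationName = operationName;
        _report = report;
    }

    public void Dispose()
    {
        _watch.Stop();
        // Ship the timing to a log, a telemetry endpoint, or a remote database.
        _report(_operationName, _watch.Elapsed);
    }
}

public static class Example
{
    public static void Main()
    {
        using (new OperationTimer("CheckoutScreen.Submit",
            (name, elapsed) => Console.WriteLine($"{name} took {elapsed.TotalMilliseconds} ms")))
        {
            // ... the visual operation being measured ...
        }
    }
}
```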

UX-Driven Design in Three Steps

These concepts altogether form something I like to call UX-Driven Design, or UXDD for short. In three steps, UXDD can be summarized as follows. Step number one: create screens as users love them. Iterate on this process until you reach the stage in which users really tell you they love the screens you have.

Second step: once you have finalized those screens, you know at that point, with an acceptable degree of certainty, what goes in and out of each screen you are expected to have. For every piece of user interface you present to users, because you have a sketch of that screen, you know exactly the data flow in and out: what the user has to type or enter in that screen, and what the user is expected to see after the operation started from there has completed. You know exactly the input and the output of the workflow you have to trigger on the backend to serve the UI. At that point, you have the screen; you trigger a workflow that takes just the input data users have entered into the screen, and at the end the workflow has to generate the next screen, presenting exactly the information you learned from the screens you have. Once you have this point of contact between the UI and the topmost part of the backend, whatever lies underneath that level is the pure backend.
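To make that second step concrete, here is a minimal sketch, not taken from the webinar, of such a point of contact in C#: the screen's input model goes in, the workflow runs, and the view model for the next screen comes out. All type and member names (BookingInputModel, BookingWorkflow, and so on) are hypothetical.

```csharp
using System;

public sealed class BookingInputModel          // exactly what the user typed into the screen
{
    public string RoomId { get; set; }
    public DateTime CheckIn { get; set; }
    public DateTime CheckOut { get; set; }
}

public sealed class BookingViewModel           // exactly what the next screen must display
{
    public string ConfirmationNumber { get; set; }
    public decimal TotalPrice { get; set; }
}

public sealed class BookingWorkflow
{
    public BookingViewModel Book(BookingInputModel input)
    {
        // Whatever lies underneath this call is "pure backend":
        // domain logic, persistence, external services.
        var confirmation = ReserveRoom(input.RoomId, input.CheckIn, input.CheckOut);

        return new BookingViewModel
        {
            ConfirmationNumber = confirmation.Number,
            TotalPrice = confirmation.Price
        };
    }

    private (string Number, decimal Price) ReserveRoom(string roomId, DateTime from, DateTime to)
        => ("ABC123", 100m); // placeholder for the real domain logic
}
```

The screen's sketch fixes the shape of both BookingInputModel and BookingViewModel before any backend code exists; the backend is then written to fit them.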

 

The third step just consists of attaching some code and some business logic to those workflows; it's about filling the workflows out with logic, code, and action. These three steps ideally don't take place sequentially, and not necessarily in a single cycle. Ideally, I envision UXDD split in two parts: there is a sequence of sprints done initially where you iterate, using the classic agile feedback loop, to learn and rework the screens. At that stage, in the first step of UXDD, you hardly write any code. You typically work with sketches and with tools that can help you deal with digital sketches; there is a good and still growing number of new tools emerging in this area, tools for wireframing, and I will be mentioning a few of them later on. The breakpoint that I put on the screen here, after "create screens as users love them," is the point at which you can sign off, whatever that means, with your users. From there we start coding, and we code exactly to that set of screens.

Sign-off means everything and nothing. The term sign-off in this context can be intended as the signature put on some binding contract if you are, for example, a software consulting company. But it could even be just a verbal agreement with customers, if the customers are, for example, internal customers and you are essentially a corporate developer. So I see a logical breakpoint between the first step and the other two, while the actual, concrete implementation of this breakpoint may differ from project to project and from team to team. After that, the second half of the UXDD methodology is about writing code: essentially coding the backend and also coding the mechanics you need on the user interface. The second part is just coding in the regular agile feedback loop. The sign-off is the guarantee that you have better chances than ever to deliver, at the end of the day, software that is really close to the real expectations.

UXDD Summary

In summary, UXDD can be described as a two-phase waterfall methodology: two phases, because there is a logical breakpoint between the initial wireframe analysis and the actual coding phase. The cost is the extra wireframing step, which is receiving a lot more attention today. Apparently we spend more, but what we spend more on is essentially a low-cost activity, because we use wireframes and no code; the design and analysis of the front end is essentially a low-cost thing. In return for this small extra step, we are all set for a straight implementation of the backend, and we have great chances to make it right the first time. In short: slightly longer than the classic bottom-up architecture approach, but, on the upside, nearly no post-deployment costs. Yes, no post-deployment costs, and this is the real point that could make the difference.

Tony: I have one more question here about this.

Dino: Absolutely, sure. Yeah, sure.

Tony: You say there are no post-deployment costs. Do you mean just the costs of fixing issues which were not caught during development, or is it also about support, which usually doesn't have anything to do with bad design or something like that?

Dino: I think it's both things. Primarily, the cost, in my opinion and in my experience (because, by the way, I'm practicing this every day), is the cost of fixing things because they don't let users work exactly in the way they need. It's about work taken out of support, and also work done fixing small, little things in the UI: just add this piece of information here, can I have the list prefilled, or can this information slide in at a certain point? It's both about support and about quick maintenance, the type of quick maintenance you typically do right after deploying.

Tony: Okay, right.

Dino:

You're welcome. I think it was a couple of years ago that I first saw the picture that is going to appear on screen in a few seconds, which was tweeted by a European MSDN account. I don't know if it was MSDN Italy, France, or Germany, I don't even remember, but it was one MSDN account that presented this picture to say: this is how you see your system, as a tidy, ordered sequence of layers. The data; then, on top of the data, the business logic; finally, on top of the business logic, the presentation. You see, there is the classic bottom-up way of development: data, business, presentation. This is how we, as smart software architects, think software should be done, and this is instead how users see our system: a layer of interface and, underneath that, under the covers, under the surface, just magic. The funny thing is that when this tweet was made, the sense of the tweet itself was just that of a joke.

We are so smart that we know how to design systems, and look at how dumb users can be, because all they are interested in is the user interface; all the effort we put into the system is, to their eyes, just black magic. But that's the key point: that's exactly how users see our system, yet that's not how we actually build the system. Let's take another perspective on the same point. Imagine we talk about carrots. If you are a developer, a carrot is for the most part under the soil: every carrot has a huge taproot and just small leaves above the soil. But if you are a user or a designer, it's quite the other way around: the leaves are not simply leaves, they are an actual tree, and the taproot is whatever else you don't see because it's under the soil. But the real world is neither the world as developers see it nor the world as users and designers see it. The real world shows that a carrot has a significantly long taproot and significant leaves and, more importantly, that the two things grow together.

At the end of the day, UX-Driven Design is about mimicking, mirroring, how a carrot's leaves and taproot grow in the real world. This leads us straight to formulating the definition of user experience. User experience is nothing fancy; it is simply the experience that users go through when they interact with the application, and we want the experience they go through to be as pleasant as possible. Otherwise we have failed, and failing here essentially means that our company has to spend more time, more money, more effort, more resources on fixing those things, and experience shows that companies lose good money on that extra effort. There is frustration among the developers involved, and there is frustration on the users' end as well, because they paid for something they can hardly use and have to adapt themselves to the thing we delivered. Nobody is happy, so it's a lose-lose kind of thing. Paying a lot more attention to user experience has far better chances of being a win-win kind of thing: we learn more, we do better, and everyone is happy.

Related Terminology

When it comes to user experience and the way we can learn our way through it, there are a few terms, a few words and expressions, that look related and are often used interchangeably, but that have specific meanings. The first term I want to call your attention to is sketch. A sketch is defined as a freehand drawing which is primarily done with the purpose of jotting down ideas; it's the classic thing we do on the paper napkin of some cafeteria. Wireframe is another term, which essentially identifies a more precise sketch that contains more information: precisely, it extends the concept of a sketch with information about the layout, the navigation, and the content we are going to present in each and every screen. The third term is mockup, which is nothing more than a wireframe where a lot of the UI's graphical details have been specified.

A mockup is a wireframe with a sample UI attached. This said, these are the real meanings, and these are the three levels of wireframing activity you can find. But be aware that, depending on the context, these three terms may be used interchangeably to mean something that in the end is very close to the idea of a wireframe. Of these three terms, the central one, strictly related to UXDD and in general to learning about processes through the UI, is wireframe. Three other related terms exist. A proof of concept typically identifies a small exercise with code, done to verify the truthfulness or just the viability of an assumption. Another pretty common scenario for a proof of concept is getting yourself started and familiar with a new technology, just to see if it could work and if it could be used to implement certain features of your system. Prototype is yet another term, but a prototype is not actually a small exercise.

It's a fake system that simulates the behavior of the real system to be built. When you build a prototype, you are miles away from the real system, even though the goodness of a prototype lies in making the fake thing look like the real one. If you have built a good prototype, the sign that proves it is when users say it's half done, or even already done. Indeed, the hardest part when you have a good prototype in your hands is telling users that it's just a fake thing and the real one is miles away. The pilot is instead a production-ready system; the difference between the pilot and the real system is essentially not in the functionality, but in the fact that it's tested and put in action against a subset of the intended audience or the intended data. Anyway, proof of concept, prototype, and pilot are again terms that, depending on the jargon spoken in each company, can be used interchangeably. But again, the central term, the most important thing, is prototype, and prototype ties up nicely with the concept of wireframing.

Three Levels of Prototyping

In the end, UXDD identifies three levels of prototyping. Basic understanding of functions, which is typically done via sketches: very simple, very basic wireframes. Basic prototyping, which is when you essentially go deeper and rework sketches into something that is much closer to the real sequence of steps in the process; you get a lot more granular when you move from sketches to wireframes. And then, sometimes, this is not enough. Not always, not necessarily, can you go and talk to users just about wireframes. Wireframes are nice; they are PDF files, PowerPoint kinds of things. But there is no action: even the most detailed wireframe is essentially a storyboard that users can view as a sequence of hyperlinked screens. There is no action, no fake data, nothing there that gives the sense of the real application. The third level is advanced prototyping, and advanced prototyping is done with code. So we are back to square one, back to the issue that it is expensive to do that.

Tony: I actually have a question here.

Dino: Yeah?

Tony:

You say that we need to create prototypes during the UXDD process. Where are the actual savings here?

Dino:

The saving, essentially... If you write the prototype because it's required, because users ask you to show something that looks like the real thing, and this happens very often, not always but very often, then the challenge is not using deep code to build the prototype; ideally, it is building the prototype without even creating a Visual Studio project. This opens up an entire new world of tools that allow you to reach the level of prototypes starting from wireframes. The trick, the lever that makes UXDD really affordable, is using a different type of tool side by side with the classic Visual Studio or IntelliJ, or whatever kind of framework you use to write your software: having a dual set of tools that work together, where the wireframe is created at some point with one tool. If that tool is particularly smart, you can derive the prototype right from there; if you're unlucky, or if the project is particularly large, then okay, prototyping means writing code. But anyway, experience proves that any effort you put in up front tends to be paid off by the savings at the end of the project. And the work you do in advance is much easier to quote to customers, which is not a secondary point.

Tony: Okay, thank you.

Dino: You’re welcome. A few products can help you get some preliminary work done around wireframing and quick prototypes. The entry-level tool in this area is BALSAMIQ, a relatively cheap piece of software that allows you to work in a PowerPoint style. You drag and drop artifacts, graphical shapes, onto a drawing surface; you compose and group those things together, and the most you can do is link some of those elements to another view so that you can simulate a storyboard. The effectiveness of BALSAMIQ is that you can save those things as a PDF file, taking advantage of the internal hyperlinking feature of PDF files. All you send to users is a PDF file, and you can have users navigate through the screens, getting an idea of the way in which they will be doing the real thing. Nothing is real here: there is no data, nothing is live, it's entirely static, but it gives an idea. And if you need more, there are three other tools that I'm aware of; there are probably other tools in the industry out there as well.

AXURE, UXPIN, JUSTINMIND. These three tools do nearly the same thing, and they do more than BALSAMIQ; they are also more expensive than BALSAMIQ in terms of licenses and prices. I'm particularly familiar with AXURE, and with AXURE you can create wireframes in much the same way you do in BALSAMIQ, but you can easily turn those things into code, into working prototypes that are delivered as HTML, JavaScript, and CSS websites. From the tool itself, you can upload and share the prototype via a dedicated cloud for users to play with. The functions that the tool makes available allow you to have canned data, to pre-fill screens with random data automatically, and even to add some quick actions, essentially using JavaScript code. You don't have to be a developer to use AXURE, and usually AXURE is not used by developers. Yet with AXURE you can produce something that, in terms of colors, layout, navigation, and real user interface, looks like the real application.

It can even be the starting point, once approved, for building the real facade, the real presentation, the real front end for the real system.

Bottom-Up vs. Top-Down Approach

To finish off, I want to show you the difference between the bottom-up and the top-down approach, so that you see once more, from a different perspective, what is ideally and hopefully beneficial about UX-Driven Design. The bottom-up approach that everyone is familiar with, I guess, starts from requirements; then, based on the requirements, we create a data model. On top of the data model, we build the business logic, and then we just stick some user interface on top of the business logic. It works, so where is the problem? For many years, the problem didn't exist. The problem started existing in recent years when, probably following the introduction of the iPhone, the entire mass of users out there started changing their mindset about the way in which they interact with software. Users rightly started feeling the need to have data at hand.

"Information at your fingertips," Bill Gates used to say many years ago, but in my opinion that concept had never really turned into reality. What is the problem when the user interface is stuck on top of the business logic in a bottom-up manner? The user interface does not necessarily match the interface, the bits, coming out of the business logic, so it's a matter of facing a possible, even likely, model mismatch. When this happens, when the format and shape of the data you work with in the backend is significantly different from the data expected to be used and consumed in the user interface, you need to have, at some point and in some way, adapters. Adapters must be written, tested, and debugged; that's a cost, and those adapters cannot necessarily retrieve the data at the aggregation level required by the user interface.

When this happens, you know what the next point is? The next point is that you have to spoil the purity, the beauty, of your model, leaving holes here and there for some code to reach through the layers and retrieve the data: the queries, the inner joins, the aggregations that weren't planned right from the bottom, because building from the bottom is never as optimal as doing it the other way around. Top-down, we have requirements and, let's say, we start right from the user interface, according to the UXDD idea. Then, once we know exactly what comes in and out of each screen we are going to have, we know all the possible ways for the front end to interact with the system. We can create the business logic, the topmost part of the backend, and give it a model that is cut to fit perfectly with the bits coming out of the user interface. At that point, the data model is just a detail, because we create the data model.

We persist the data in the way that suits our needs. We can do that in total freedom, regardless of how we are retrieving data; we can even choose the framework, the technology, the persistence technology that suits us best. The business logic and the data model together are a single element; they are the piece of black magic. In this way, we are back to the ideal vision of the software that users have: just an interface, and whatever sits under the surface, under the covers of the user interface, is black magic. You can have this black-magic kind of thing only if you go top-down, and UX-Driven Design offers you a set of practices, techniques, and suggestions on how to achieve that. To summarize, there are some other things that fit nicely into this big design, if only you take it a little bit further.

User interface: a collection of screens with pins to connect to the backend, and an application layer and business logic built to fit the user interface. Screens and workflows: in the interaction it's clear that the input model flows out of the user interface and becomes the input for the application layer, and the view model becomes the output of the application layer and the input of the next screens we present to users. We have here four different types of data models: there is the persistence model, how we save data; there is the domain model, how we deal with data in terms of business processes; there is the view model, how we render data out; and there is the input model, how we receive data from the user interface. The domain layer and the infrastructure layer live underneath the application layer.

The user experience we deliver is just what users want, and we can happily and comfortably build a backend that is cut to fit, to support just the user experience that users want. There is one more benefit here: CQRS. When you learn about a system in terms of tasks, every task tends to be either a command that alters the state of the system or a query that reports the current state of the system. In this context, CQRS makes total sense and is a nice fit. CQRS recommends that you split your backend into two distinct layers, two distinct vertical stacks: the command stack and the query stack. The nice thing is that, because they are separated and because whatever way you choose to persist and read data is transparent to the front end of the system, there is no reason not to use the technology that fits best. It could be SQL, NoSQL, events, in-memory caches, whatever. For example, it can be an event store for the command stack and a read model, maybe a SQL relational read model, for the query stack.
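As a rough illustration of that split, here is a minimal C# sketch, not from the webinar, of a command stack and a query stack exposed as separate interfaces. All names are hypothetical; the point is only that the write side and the read side can evolve, and persist data, independently.

```csharp
using System;

// The command stack: operations that alter the state of the system.
public interface ICommandStack
{
    void Handle(PlaceOrderCommand command);
}

// The query stack: operations that report the current state of the system.
public interface IQueryStack
{
    OrderSummaryView GetOrderSummary(Guid orderId);
}

public sealed class PlaceOrderCommand
{
    public Guid OrderId { get; set; }
    public string ProductCode { get; set; }
    public int Quantity { get; set; }
}

public sealed class OrderSummaryView
{
    public Guid OrderId { get; set; }
    public string Status { get; set; }
    public decimal Total { get; set; }
}
```

Behind ICommandStack you might append events to an event store, while behind IQueryStack you might read a denormalized relational view; the front end never notices the difference.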

In the end, UX-Driven Design just helps us save costs, especially post-deployment maintenance costs. It also helps us learn, through processes, about the domain, which in turn makes us able to suggest new features and grow our business. And it connects very well and very clearly with a couple of buzzwords that are very popular these days: CQRS and event sourcing. All this said, the best way to save money on software projects is learning as much as possible about the domain and about user expectations. I really don't know if someone has already said this sentence; probably yes, probably no, I don't know and I don't even care, but it represents an absolute piece of truth. Okay, that's all from my end. I will be happy to take a few questions from you, if there are any.

I just want to thank PostSharp for organizing this webinar and for giving me the chance to talk about UX-Driven Design, so thank you to the people at PostSharp, which, by the way, is a cool piece of software that can help in some way to write software in a cost-effective manner. For those of you interested in a follow-up or in some other resource that goes beyond the boundaries of this webinar, I have a couple of Pluralsight courses. One is available today and is in the top hundred of Pluralsight courses: Modern Software Architecture, where I talk about the domain model, CQRS, and event sourcing. There is another one, more specific to UX-Driven Design, which is expected to ship pretty soon. I was hoping to have it available by the end of the year, but it will probably slip to January; it's a matter of weeks, anyway, until UX-Driven Software Design is available in the Pluralsight library. What else? If you don't mind, follow me on Twitter, and thank you very much for your time.

Q&A

Q: Maybe the issue is to stop trying to give the customer what she wants and together, as architects, we should give the customer what she needs. Then, work to persuade her why that's better.

A: Sometimes this works, and sometimes not. Yes, understanding what the customer needs is the ultimate purpose of the architect; however, what we believe the customer needs is not necessarily what the customer really needs. It depends on how deep our knowledge is and how well we manage to understand what the customer really needs. Persuade the customer that something is better? You can do that, but I have recent experience from this very week in which I'm trying to convince a customer that a different architecture of information will help her work more effectively. But she's coming from a mindset that has led her and the whole team to expect data to be laid out in a certain way; no matter the confusion that can result on screen, that's the way they love it. There's no way to persuade these people. So if you can, definitely do it; otherwise, I don't know. It usually depends.

Q: I'm not sure the front end and backend should match; surely the rationale for each is very different. The UX is contextual and should be tuned to the needs of the now, whereas the backend should be focused on stability, extensibility, and future-proofing the data. The front end should be specific and the backend generic.

A: In a way, yes. This fits nicely with some of the concepts borrowed from domain-driven design, in particular the layered architecture and the fact that the domain logic, the part of the business logic that never changes and is not dependent on processes, should be isolated from persistence and from processes. In this context, the business rules go into an isolated layer, which in domain-driven design is called the domain logic. That is persisted to typical databases, but it's also injected into something called the application layer, which is the point of contact, the facade, the proxy between the front end, the presentation, and the domain logic. Outside the domain logic, which is static in the sense that it's shared and common and can be optimized for whatever you need (business, performance, reliability, and so on), it's all about the processes, and the processes should be taken out of the logic and made to measure, to coordinate the needs of the backend with the expectations of users. The application layer plays a key role here, because it's the facade between the front end and the backend.

Q: What happens in the middle of a backend sprint if you realize you are missing a screen? Should you have it designed within the sprint or wait for the next one?

A: It depends on how relevant the screen you're missing is. Probably, yes, I would try to get that fixed in the sprint you're in, in the second phase, without waiting for completion. The purpose is delivering what they need as soon as possible, so if a screen is missing I would try to violate, in a way, the sequentiality of the two waterfall steps and have the screen added, with all the details, in the context of the development sprint. Unless there are contractual issues: if you signed off and there's a contract that financially binds you to deliver something, then, if you miss a screen, it's a matter for legal, for lawyers or experts, to kick in and see how the contract has to be rewritten to let you do that extra work.

Q: If it's agile, then how come we can say there is no post-deployment cost, or no post-deployment phase at all, in this process?

A: That was more or less the promise of agile: if you talk to customers during the entire cycle, then you deliver what they need at the end of the final sprint. But actually this is not what most people are experiencing, so there are post-deployment costs. If someone doesn't experience that, okay, good, great, fantastic. But in my experience that's not exactly the case, and this is the major source of extra costs in the course of a project.

Q: Is UXDD applicable for data driven apps such as most enterprise apps, like in banks?

A: It’s mostly about your interaction with users, whether consumers or corporate or whatever. I’ve seen UXDD-like approaches applied in a large (very large) pharma company. The missing link was that they simply split the design of the UX and the design of the code into two steps, but the design of the code was done bottom-up. In general, I see benefits everywhere, mostly because it helps you take a task-focused approach to the way you learn about the domain and processes. That is particularly relevant, I guess, in enterprise business.

Q: What is the best system development process to use with UXDD - e.g. DSDM?

A: I see UXDD as two agile steps, and agile means what it usually means to you. But in some organizations distinct steps are problematic. In that case, I tend to see the first UXDD step (wireframes) as a sort of sprint zero: a sprint zero that is possibly repeated, not for each sprint, but more than once.

Q: How best to handle a situation where the cost to implement a screen is very high? What if there is a way to implement it at a cheaper cost but with a sacrifice in usability?

A: As long as it keeps customers happy, any UX is fine. But users increasingly appreciate a UX that is super-simple, easy to understand, tailor-made to their processes, and friendly.

Q: I've heard a lot from managers that the UI can be generated from data. So the data is essential, and for hardcore enterprise scenarios it is tabular data. There are a number of solutions for that approach: devs model the data and the domain and then get the UI for free, from the managers' perspective. Any advice on how to convince those managers?

A: Unfortunately data is less and less simply tabular. And users are less and less willing to accept whatever UI a system can autogenerate. That’s precisely the mindset UXDD attempts to address. The simple answer (and most concrete) is that you save money after deployment. Money in terms of support work, fixing bugs, fixing logical bugs. If you don’t run into these problems, you’re probably OK as is.

Q: How are multiple UX designs handled for a single data model? I mean, if I have different UX for different devices rendering the same data, how is that handled in UXDD?

A: Multiple front ends (like Web and mobile) talk to possibly different application layers, and the application layer is the topmost part of the backend. Under the application layer you find the reusable domain logic. Different UX may refer to different processes that orchestrate common pieces of business logic in different ways; different front ends match possibly different application layers. (The application layer is one of the layers of the DDD layered architecture, the place that orchestrates use cases.)
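For illustration only, here is a small C# sketch of that layering under assumed names: two application services, one per front end, orchestrating the same shared domain service and shaping different view models for their respective UIs.

```csharp
// Reusable domain logic, shared by every front end.
public sealed class CatalogDomainService
{
    public decimal GetPrice(string productId) => 42m; // placeholder for real business rules
}

// Application layer for the Web front end: richer view model.
public sealed class WebCatalogService
{
    private readonly CatalogDomainService _domain;
    public WebCatalogService(CatalogDomainService domain) => _domain = domain;

    public WebProductViewModel GetProduct(string id) =>
        new WebProductViewModel { Id = id, Price = _domain.GetPrice(id), Description = "full text..." };
}

// Application layer for the mobile front end: leaner payload, same domain logic.
public sealed class MobileCatalogService
{
    private readonly CatalogDomainService _domain;
    public MobileCatalogService(CatalogDomainService domain) => _domain = domain;

    public MobileProductViewModel GetProduct(string id) =>
        new MobileProductViewModel { Id = id, Price = _domain.GetPrice(id) };
}

public sealed class WebProductViewModel { public string Id; public decimal Price; public string Description; }
public sealed class MobileProductViewModel { public string Id; public decimal Price; }
```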

Q: Can you please give an example of working with the top down UXDD approach and adjusting the business logic requirements. What is an example of adjusting the Business Logic to fit the User Interface? For example, if we learn that what users need is out of scope with project scope/timeline/resources, how can we affect the Business Logic?

A: The key fact I see behind UXDD is simply that everything starts from the top and only once you’re satisfied with the UX you start arranging the backend. How can you effectively design the BLL if you’re not sure about what users want? BLL is just the next step after UX or the part of UX that fits your scope/timeline/resources.

Q: When does performance tuning come into place? During development, or at late stages of development when you have the UI defined completely?

A: UI and the backend share input and output and defining those things is the primary purpose of UXDD. At that point UI and backend can be optimized separately.

Q: Wasn't UML meant for this purpose? And now MS is removing it?

A: UXDD screens are essentially implementations of use cases, and use-case diagrams are a foundation of UML. But in my opinion UML is too much and too cumbersome: too many diagrams and too much formalism. Still, yes, there is some reasonable conceptual overlap.

Q: What's an example of an Event Store?

A: An event store is a database where you store the state of the system in the form of events (this is event sourcing, or having events as the data source of the application). Technically, an event store can be a NoSQL store (Mongo, DocumentDb, RavenDB) but it can be a JSON-friendly store (SQL Server 2016) or specific products like EventStore. Learning about Event Sourcing is helpful.
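To make the idea tangible, here is a minimal in-memory event-sourcing sketch in C#. It is illustrative only: hypothetical account events and a dictionary standing in for a real store such as EventStore or a NoSQL database. State is appended as events, and the current value is rebuilt by replaying them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class AccountEvent
{
    public DateTime OccurredOn { get; } = DateTime.UtcNow;
}
public sealed class Deposited : AccountEvent { public decimal Amount { get; set; } }
public sealed class Withdrawn : AccountEvent { public decimal Amount { get; set; } }

public sealed class InMemoryEventStore
{
    private readonly Dictionary<Guid, List<AccountEvent>> _streams = new Dictionary<Guid, List<AccountEvent>>();

    public void Append(Guid streamId, AccountEvent @event)
    {
        if (!_streams.TryGetValue(streamId, out var stream))
            _streams[streamId] = stream = new List<AccountEvent>();
        stream.Add(@event);                        // events are only ever appended, never updated
    }

    public IReadOnlyList<AccountEvent> ReadStream(Guid streamId) =>
        _streams.TryGetValue(streamId, out var stream) ? stream : new List<AccountEvent>();
}

public static class Projection
{
    // Replay the stream to compute the current balance (a read model could cache this).
    public static decimal CurrentBalance(IEnumerable<AccountEvent> events) =>
        events.Sum(e => e is Deposited d ? d.Amount : e is Withdrawn w ? -w.Amount : 0m);
}
```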

Q: What do you think about the "Advanced Prototyping" level with the following workflow (e.g., we are working with ASP.NET MVC):

  • 1) First we implement the prototype in ASP.NET MVC views with faked ViewModels.
  • We use fake ViewModel repositories, which we will later replace with real ones.
  • These are not repositories of domain objects, but repositories of view models.
  • At that time there are no domain objects and no logic.
  • 2) After some iterations we have a working prototype, and then we implement the domain objects and logic, and connect the domain model with the view model.

A: What you describe is what I have done, no more than a couple of times. Today ad hoc tools exist (e.g., Axure) that can do most of the work for you and generate HTML front ends with fake data, in a PowerPoint-style way, with a bit of JavaScript knowledge; it's work non-developers can do. At the same time, these tools can create only storyboards, sequences of hyperlinked pages. If you need a bit of logic and realistic data in the prototype, then it's a true prototype, and what you suggest is probably the least expensive way.
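A minimal sketch of that workflow, assuming ASP.NET MVC and entirely hypothetical names: the controller depends only on a view-model repository interface, so a fake implementation with canned data can power the prototype and later be swapped for a real, domain-backed one without touching the controller or the views.

```csharp
public sealed class ProductViewModel
{
    public string Name { get; set; }
    public string Description { get; set; }
}

public interface IProductViewModelRepository
{
    ProductViewModel GetProduct(string id);
}

// Prototype phase: no domain objects, no logic, just canned view models.
public sealed class FakeProductViewModelRepository : IProductViewModelRepository
{
    public ProductViewModel GetProduct(string id) =>
        new ProductViewModel { Name = "Sample product", Description = "Canned description for the prototype." };
}

// The controller and views stay the same when a real, domain-backed
// repository is registered in place of the fake one.
public sealed class ProductController : System.Web.Mvc.Controller
{
    private readonly IProductViewModelRepository _repository;
    public ProductController(IProductViewModelRepository repository) => _repository = repository;

    public System.Web.Mvc.ActionResult Details(string id) => View(_repository.GetProduct(id));
}
```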

 

About the speaker, Dino Esposito

Dino Esposito

Since 2003, Dino has been the voice of Microsoft Press to Web developers and the author of many popular books on ASP.NET and software architecture, including "Architecting Applications for the Enterprise", "Modern Web Development", and the upcoming "Programming ASP.NET Core". When not training on-site or online for Pluralsight, Dino serves as the CTO of Crionet, a fast-growing firm that specializes in Web and mobile solutions for the world of professional sports.

 

Localization is crucial for reaching a global audience; however, it's often an afterthought for most developers and non-trivial to implement. Traditionally, game developers have outsourced this task due to its time-consuming nature.

But it doesn’t have to be this way.

Yan Cui will show you a simple technique his team used at GameSys which allowed them to localize an entire story-driven, episodic MMORPG (with over 5000 items and 1500 quests) in under an hour of work and 50 lines of code, with the help of PostSharp.

Watch the webinar and learn:

  • The common practices of localization
  • The challenges and problems with these common practices
  • How to rethink the localization problem as an automatable implementation pattern
  • Pattern automation using PostSharp

Solving localization challenges with design pattern automation on Vimeo.

You can find the slide deck here: http://www.slideshare.net/sharpcrafters/solving-localization-challenges-with-design-pattern-automation

Video Content

  1. Six Sins of the Traditional Approach to Localization (6:20)
  2. Automating Patterns with PostSharp (16:08)
  3. Q&A (20:50)

Webinar Transcript

Hi, everyone. Good evening to those of you who are in the UK. My name is Yan Cui, and I often go by the online alias of The Burning Monk because I'm a massive fan of this '90s rock band called Rage Against the Machine. 

I'm actually joined here by Alex from PostSharp as well. 

Alex: Hi, everyone. 

Yan:

And before we start, a quick bit of housekeeping. If you have any questions, feel free to enter them in the questions box in the GoToWebinar control panel you've got. We'll try to answer as many of them as we can at the end of the session, and anything we can't cover, we will try to get back to you guys via email later on. 

We're going to be talking about some of the work that I did while I was working for a gaming company called Gamesys in London. This was up until October last year, and one of the games I worked on was an MMORPG, a massively multiplayer online RPG, called Here Be Monsters. One interesting thing about Here Be Monsters is that it has lots of content, and when it came time to localize the whole game, we had some interesting challenges that we wanted to find a novel way of solving. Just from this simple screen, you can see there are a couple of pieces of text: the name of the character, the dialogue they say, as well as a UI control here that says "out of bait." All of these need to be localized, across the entire game. And as I mentioned earlier, this game is full of content; in fact, in terms of text, we have more text than the first three Harry Potter books combined. And there are many different screens in the game, one of which is what we call the almanac. You can think of this as an in-game Wikipedia of some sort, where you can find information about different items or monsters in the game. Here is an example of the almanac page for Santa's Gnome, which is only available during Christmas.

So anyway, there's a bunch of information about the monster itself: a name, a description, a type, et cetera. Those all need to be localized, as well as all the UI elements: the labels for baits, the text on the buttons, et cetera. So even for a very simple screen like this, there are actually a lot of different places where you need to apply localization.

A few years back, Atlus, who make very popular and also very niche RPG games, did a post explaining why localization is such a painful process that can sometimes take four to six months, touching on the many different aspects involved in the localization process. And you can see from this list that, by their estimation, programming alone takes between one and one and a half months with the traditional approach to localization. With that approach, each of your client platforms will ingest a gettext file, which contains a bunch of localizations in a very plain text format like this: you've got, basically, key-value pairs of what the original text is and what the localized text should be.

Alex:

Yan, so what is this PO file format? Is this the standard for localization, or do you have some tools for that?

Yan:

Yep. So the gettext file is the industry standard for localization. Beyond that, I don't know of any standard tooling for translators; the translation agency we were working with had internal tools to help their translators work more effectively with the gettext file format. And for different languages, there are also libraries available for you to consume those gettext files. We'll look at one for .NET later on.

Alex: Okay. Yeah, thanks. 

Yan:

Once you've consumed those translation files, you'll then need to substitute all the text that you have with the localized versions of those texts. You've got buttons that display some text. This is just example code, not taken from our real code base, but it gives you an idea of where you need to apply localization: to labels and buttons and so on, as well as to your domain objects. Where you've got a domain object that represents a monster, the names and descriptions, et cetera, will need to be localized.
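Here is a minimal sketch of that traditional substitution step, not taken from the Gamesys code base: the key-value pairs loaded from the gettext file are applied, by hand, to every view model and UI control that carries user-facing text. The names and the loading mechanism are illustrative; a real project would use a gettext library for .NET to parse the PO file.

```csharp
using System.Collections.Generic;

public sealed class Translations
{
    private readonly IDictionary<string, string> _entries;
    public Translations(IDictionary<string, string> entries) => _entries = entries;

    // Fall back to the original text when no translation exists.
    public string Localize(string original) =>
        _entries.TryGetValue(original, out var translated) ? translated : original;
}

public sealed class MonsterViewModel
{
    public string Name { get; set; }
    public string Description { get; set; }
}

public static class LocalizationPass
{
    // This is the kind of repetitive code that has to be written, and rewritten,
    // for every screen and every domain type on every client platform.
    public static void Apply(Translations t, MonsterViewModel monster)
    {
        monster.Name = t.Localize(monster.Name);
        monster.Description = t.Localize(monster.Description);
        // ... plus every label, button caption, tooltip, and so on.
    }
}
```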

 

Once you've done that, you do your data binding. Then, assuming you haven't lost anything while localizing, this screen shows you all the information about Santa's Gnome. I think this is in Brazilian Portuguese; I can't read any of it, so I don't know how accurate the translations are, but at least you can see the localization has been applied in all the different places. So you pat yourself on the back for a job well done, probably go get a beer with your colleagues, and then you realize: oh wait, what happens if we make some changes or add new, additional types to our domain? You're gonna have to keep doing this work over and over, for each platform that we support.

And then, to rub salt into the wound, look at how much time Atlus reckons we tend to spend on QA. The reason it takes so much time is that there's a massive scope they need to test. Not only do they have to spot-check and make sure the translations are not drastically wrong, but there are also loads of bugs that can creep in during the client integration work: maybe someone missed a particular screen, or some button was left in English instead. And you've gotta do this for every single platform, and because you're doing releases so frequently, or at least I hope you are, that means you have to put in repeated effort to test localization whenever you make any kind of change on the client, which broadens the scope of your regular testing and puts a lot more pressure on your QA teams.

 

Six Sins of the Traditional Approach to Localization

Here's pretty much a laundry list of the problems that I tend to find with the traditional approach to localization. There's a lot of up-front development effort, and the team keeps doing more work as you introduce more domain types and extend your game. It's also hard to test, and it's prone to regressions. You see, it's normal to feel doom and gloom whenever localization is mentioned in the company, because it's a pain: once it's there, it just doesn't go away. Which is why, when it came time for us to implement localization in our game, we decided to think outside the box and see whether or not we could do it better, in a way that would be easier and more maintainable for our team.

To give you a bit of background on what our pipeline looked like at the time: we built a custom CMS, a content management system, that we internally call TNT. It's really just a very thin layer on top of Git where all the game design data, about the monsters, the different quests, the locations, is stored as JSON files, and we built in some integration with the git-flow branching strategy so that we could apply the same branching strategy we were already using for developers and get our testers to do the same thing.

Alex:

By the way, why did you choose to build a custom CMS instead of using something else? And what was the benefit of Git and git-flow in this case?

Yan:

Right. We decided to build a custom CMS because we also wanted to bake into it some basic validations that apply to our particular domain. And the reason we have a thin layer on top of Git is because we want to have source control for all our game data. We do that for all our source code, and the game data is really part of the source code for your game, which can't exist without the things that make up the content of the game.

And git-flow is just a way to allow our game designers to work in tandem with each other and have a well-understood process for how to merge things and how to release things when they get merged back to master. That way, when you look at the master branch, you know you're looking at exactly what has to be deployed to production, and so on.

Well, we had a team of game designers working on different branches of stories. One person may be working on a storyline that's coming out next week, whilst another person may be working on a storyline that's coming out in a month's time, and you want them to be able to work in parallel without stepping on each other's toes. Git flow comes into play as part of the mechanism for allowing them to do that.

Does that answer your question?

Alex:

Yeah. That seems to work well, yes. Good idea. Thanks. 

Yan:

Cool. 

So inside TNT, you have some very simple UI controls so that the game designers can do cherry-picks, as well as merges of different branches. Once they're happy with the game design work they've done, they can then publish to a particular environment, so that they can test it out in that environment and see whether the quest is interesting, or whether all the mechanics they were hoping for are in place.

Right, so at this point the custom CMS packages up all the JSON files and sends them to a publisher service, which performs deeper validation against the game rules. For example, if you've got a high-level item, then a low-level quest shouldn't be able to give out that item as a reward. We also do quite a bit of pre-computation, and transform the original JSON into formats more suitable for consumption by the different client platforms.
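To make that kind of rule concrete, a deeper-validation check might look roughly like the sketch below. This is an illustrative reconstruction only - the GameSpec, Quest and Item types are hypothetical stand-ins for the real TNT model, not the actual code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical publisher-side rule: a quest should not reward an item
// of a higher level than the quest itself.
public class RewardLevelRule
{
    public IEnumerable<string> Validate(GameSpec spec)
    {
        foreach (var quest in spec.Quests)
        foreach (var itemId in quest.RewardItemIds)
        {
            var item = spec.Items.Single(i => i.Id == itemId);
            if (item.RequiredLevel > quest.Level)
            {
                yield return $"Quest '{quest.Id}' (level {quest.Level}) rewards " +
                             $"item '{item.Id}' which requires level {item.RequiredLevel}.";
            }
        }
    }
}

// Minimal stand-in model so the sketch compiles.
public class GameSpec
{
    public List<Quest> Quests { get; set; } = new List<Quest>();
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Quest
{
    public string Id { get; set; }
    public int Level { get; set; }
    public List<string> RewardItemIds { get; set; } = new List<string>();
}

public class Item
{
    public string Id { get; set; }
    public int RequiredLevel { get; set; }
}
```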

Now, all of that then gets pushed to S3, and from TNT, as a publisher, I press a button and I can see everything that has happened. As you can see from the logs, we do this an awful lot, which is also one of the reasons why we decided to invest the effort into building tooling such as the CMS, so that we can continuously iterate on our game design, not just on our code.

Once the publisher has done its job, it writes the game specs, as we call them, into S3, into versioned folders. As a game designer, I just press a button to publish my work, and I get an email with a link at the top so that I can click it and load up the web version of the game with just the changes from my branch. That way, if someone else is using the same test environment to test their changes, I won't be stepping on their toes.

You may notice that there's a link here to run an economy report. This refers to some other piece of work we've done to use a graph database to help us understand how different aspects of the game connect with each other. So an item could be used in a recipe to make another item, which can then be used to catch a monster, who then drops loot, which can then be used in another quest, and so on and so forth. Our domain is very highly connected, and even small changes, like upping the price of water, can have a huge amount of knock-on effect that goes through the entire economy of the game. So we use the graph database to automate a lot of that validation and auto-balancing.

Further down the email you can see the results of the game rule validations, which we report back to the game designer. And at this point, all the specs are ready. The server, the Flash client, as well as the iPad client will be able to consume that data in their different formats, and you'll be able to load up the game and test out the changes.

Alex:

Another question here; so why do you need to produce different formats for all those different platforms? Is that a requirement? 

Yan:

Right. So for example, the server application doesn't really care about the name of a monster, or the description of a monster. Stripping those helps to reduce the size of the file and how long it takes to load it, as well as the memory footprint of the server application. So we strip that client-only information out of the server spec. We also precalculate a bunch of secret values - coefficients and things like that - into the server spec, but don't make them available in the client spec.

Of course, the client specs are public, so if those values were in there, anyone who's a bit more tech savvy would be able to download the spec, work out its format, and discover the secret values we've embedded in our domain. They'd be able to cheat in the game, essentially.

Alex: Uh-huh, okay. 

Yan:

And also the Flash client, because it's all web based, prefers to load the whole thing as one big zip, whereas the iPad client prefers to have smaller file sizes, but more of them -

Alex: Okay, that makes sense. 

Yan:

So at this point, we thought, “well, if we do localization, what about bundling it into our publishing process, so that by the time all the files have been generated they're already localized for the client?” We wouldn't have to do some of the things we saw earlier, where you have to apply localization to your domain objects all the time. You can then publish your localized versions of the game specs to language-specific folders. Notice that, as I mentioned earlier, the server doesn't care about most of the text that gets localized, so we don't actually need to apply that same processing to the server spec.

So with that, you remove the duplicated effort you'd need on each of the client platforms. At the same time, you reduce the number of things that can change: they're changed automatically with each release, because there's an automated process for doing it, so there are fewer things to test. We don't need to touch all those things, but we still have this problem of having to spend a large amount of effort up front. All the things that you were doing on the client before now have to be done by something else - in this case, the server team, which has to, like I said before, ingest a gettext file to know all the translations, check the domain objects for string fields or properties that need to be localized, apply the localizations when transforming those domain objects into DTOs, and then do the same thing again for multiple languages if you're localizing for different targets.
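Concretely, the per-language publish step being described might look something like this sketch. The LocalizationContext and the aspect that does the actual string swapping are shown in the next section; the helper names, DTO type and file layout here are illustrative assumptions, not the real pipeline code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SpecPublisher
{
    // Localize once, server side, for every target language.
    public static void Publish(IEnumerable<Monster> monsters, IEnumerable<string> languages)
    {
        foreach (var language in languages)
        {
            // Load the translators' PO file for this language
            // (in the real pipeline this is parsed with a gettext library).
            var translations = LoadTranslations(Path.Combine("po", language + ".po"));

            using (new LocalizationContext(translations))
            {
                // Every string set on a DTO inside this block is translated by the aspect.
                var dtos = monsters
                    .Select(m => new MonsterVO { Name = m.Name, Description = m.Description })
                    .ToList();

                WriteSpec(Path.Combine("specs", language, "monsters.json"), dtos);
            }
        }
    }

    static IDictionary<string, string> LoadTranslations(string poFile)
    {
        // Placeholder: parse the PO file into original -> translated pairs.
        return new Dictionary<string, string>();
    }

    static void WriteSpec(string path, IEnumerable<MonsterVO> dtos)
    {
        // Placeholder: serialize the localized DTOs to the language-specific spec folder.
    }
}

// Illustrative types; the real domain and DTO classes are richer than this.
public class Monster   { public string Name { get; set; } public string Description { get; set; } }
public class MonsterVO { public string Name { get; set; } public string Description { get; set; } }
```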

Automating Patterns with PostSharp 

But notice that steps two and three are really just a repetitive pattern that can be automated, to help future-proof yourself against changes: as you add more domain objects, you should get localization for free.

And in .NET, you can consume a gettext file and get translations from it using the SecondLanguage package. And obviously, because we're here, I'm going to be talking about the implementation patterns and how to automate them with PostSharp.

So for those of you who are not familiar with PostSharp, you can write different aspects, which then apply post-compilation modifications to your code, so that you can bake additional logic and behavior into it. Here, what I've got is a very simple aspect which is applied only to fields or properties of type string, so that when you call a setter on those properties or fields, this bit of code will run. As part of that, we check the localization context object to see whether or not we are inside a localization context. If not, we just move on.
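As a rough reconstruction of the kind of aspect being described: the PostSharp parts (LocationInterceptionAspect, CompileTimeValidate, OnSetValue) are standard PostSharp APIs, while LocalizationContext and its Translate method are assumptions, sketched a little further down.

```csharp
using System;
using PostSharp.Aspects;
using PostSharp.Reflection;

// Intercepts setters on string fields/properties and, when a localization
// context is active, writes the translated value instead of the original.
[Serializable]
public sealed class LocalizeStringAttribute : LocationInterceptionAspect
{
    // Only attach the aspect to locations of type string; everything else is skipped.
    public override bool CompileTimeValidate(LocationInfo locationInfo)
    {
        return locationInfo.LocationType == typeof(string);
    }

    public override void OnSetValue(LocationInterceptionArgs args)
    {
        var context = LocalizationContext.Current;   // assumed ambient context, see below
        if (context != null && args.Value is string original)
        {
            // Proceed as if the setter had been called with the localized string.
            args.Value = context.Translate(original);
        }

        args.ProceedSetValue();
    }
}
```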

Alex:

So I can see here that actually localization context is doing all the translation work apparently. How do you set it up, or how do you initialize it? Because here, you see just that you call translate. 

Yan:

Yep. So as I mentioned, we just use the gettext translation file. Imagine the custom CMS, TNT, calls the service with a big zip file; the publisher will then unpackage that, and inside that package you will find those PO files. For each of those files, the publisher loads it with SecondLanguage and then creates a localization context. And within that context, it transforms the domain objects created from those JSONs into DTOs.

So when the DTO transformation is happening, and you're creating new DTO objects and setting the string values on their fields and properties, this code kicks in. And because it's called inside a localization context, the context contains the information that we loaded from the gettext file. So the next line, this guy, all it's doing is checking the gettext data: “do we have a match for the string you're trying to localize?” If there is, then we use that localized string instead. So what we're doing here is proceeding with the setter as if it had been called with the localized string instead of the original string.
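A minimal sketch of what such an ambient context might look like. This is an assumption about its shape rather than the actual implementation - the real one is populated from the gettext PO file (for example via the SecondLanguage package) rather than from a hand-built dictionary.

```csharp
using System;
using System.Collections.Generic;

// While a context is active on the current thread, the aspect above
// translates every string written to a localized field or property.
public sealed class LocalizationContext : IDisposable
{
    [ThreadStatic] private static LocalizationContext current;

    private readonly IDictionary<string, string> translations;

    public LocalizationContext(IDictionary<string, string> translations)
    {
        this.translations = translations;
        current = this;
    }

    public static LocalizationContext Current => current;

    public string Translate(string original)
    {
        // Use the localized string if the gettext data has a match; otherwise keep the original.
        return translations.TryGetValue(original, out var localized) ? localized : original;
    }

    public void Dispose() => current = null;
}
```

Used together, a DTO created inside `using (new LocalizationContext(ptBrTranslations)) { ... }` comes out with its strings already translated, which is the behavior described here.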

Does that make sense?

Alex: Yeah, that's perfect. 

Yan:

So with this, we can then just multicast the aspect onto all of our DTO types, which by convention have the suffix VO, for legacy reasons. And this one line of code, plus the 30 lines we just saw, pretty much covers over 90% of the localization work we had to do. As we create new domain objects and new types, those types will be localized automatically without us having to do additional work.
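The one line in question is PostSharp attribute multicasting at the assembly level; the namespace and the *VO wildcard below are assumptions about how the DTOs are organized.

```csharp
// Apply the localization aspect to every string field/property of every DTO type
// whose name ends in "VO" (namespace shown is a placeholder).
[assembly: LocalizeString(AttributeTargetTypes = "MyGame.Dtos.*VO")]
```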

So with that, we can eliminate the whole up-front development cost, because the whole thing took me less than an hour to implement. And because we're multicasting the attribute onto all DTO types, any time we add a new DTO type in the future, it will be localized automatically by default.

Again, you have more automation, so there's less chance for regressions to creep in, because people are not changing things and constantly having to implement new things by hand. You can still have regressions, but in my experience anyway, it's far less likely. Since we implemented localization this way, we actually haven't had any localization-related regressions or bugs at all, which is pretty cool for not a lot of work. The combined effect of all of these changes is far less pressure on your QA team to test the changes that you're making to the game - new quest lines, new storylines, as well as UI changes, server changes, and localization - so they can better focus their time and effort on testing the things that have actually changed and are likely to cause problems.

Q&A

“Okay,” you may ask, “but how do I exclude DTO types from the localization process?” Fortunately, there's a built-in mechanism for doing that, where you can just set an attribute property on particular types to exclude them. In this case, I know the leaderboard player DTO only has IDs - the division and profile IDs - and the name of the user, none of which should be localized. Therefore, we can simply exclude this type from the whole localization process.
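With PostSharp multicasting, that exclusion is typically done with the AttributeExclude property; the LeaderboardPlayerVO below is a stand-in for the DTO described, not the real class.

```csharp
// Opt this DTO out of the multicast localization aspect - nothing here is translatable.
[LocalizeString(AttributeExclude = true)]
public class LeaderboardPlayerVO
{
    public string DivisionId { get; set; }
    public string ProfileId { get; set; }
    public string UserName { get; set; }
}
```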

Then you may ask, “well, but then where do you get these gettext files from?”, which is a great question. As I mentioned earlier, we actually store those gettext files as part of TNT, so that when we kick off a publish from there, the localization files are included as well. And to get those files into TNT, there's a page in the tool where the game designer can go in and, once they're happy with all the content, say, “okay, now let's localize all the new quest lines that we've just created.”

And there's a button they click which takes the existing localization file, because we don't want to re-localize the same text if it hasn't changed. We actually use comments to put a unique identifier on each piece of text, so that we can tell when a particular dialogue, name, description or whatever has changed, and reset that entry in the gettext file. The new gettext file is then sent over to the translators, who, with their tools, can pick out the new strings they need to translate. They only charge us for the new strings they have to translate, not for everything else that we send them.

Once they send back the translated PO file, we upload it into TNT, and when we do the next publish, you'll have all the localization for the new content. When we release new content, there is a bit of time where the English version is ahead of the Brazilian Portuguese version. So if the player is up to date with the latest quest, then chances are they'll end up playing a bit of the game in English instead of the translated version.

 

With that, that's everything I've got, and thank you very much for listening. 

 

About the speaker, Yan Cui

Yan Cui

Yan Cui is a Server Architect Developer at Yubl and a regular speaker at code camps and conferences around the world, including NDC, QCon, Code Mesh and Build Stuff. Day-to-day, Yan works primarily in a mixture of C# and F#, but has also built some components in Erlang. He's a passionate coder and takes great pride in writing clean, well-structured code.
Yan's blog.

 

When addressing website performance issues, developers typically jump to conclusions, focusing on the perceived causes rather than uncovering the real causes through research.

Mitchel Sellers will show you how to approach website performance issues with a level of consistency that ensures they're properly identified and resolved so you'll avoid jumping to conclusions in the future.

Watch the webinar to learn:

  • What aspects of website performance are commonly overlooked

  • What metrics & standards are needed to validate performance

  • What tools & tips are helpful in assisting with diagnostics

  • Where you can find additional resources and learning

Applying a methodical approach to website performance on Vimeo.

You can find the slide deck here: http://www.slideshare.net/sharpcrafters/applying-a-methodical-approach-to-website-performance

Video Content

  1. Why Do We Care About Performance? (3:20)
  2. What Indicates Successful Performance (6:06)
  3. Quick Fixes & Tools (27:27)
  4. Diagnosing Issues (37:10)
  5. Applying Load (40:49)
  6. Adopting Change (50:30)
  7. Q&A (52:14)

 

Webinar Transcript

Mitchel Sellers:

Hello, everybody. My name's Mitchel Sellers and I'm here with Gael from PostSharp.

Gael:Hi.

Mitchel:

And we're here to talk about website performance, looking at taking a little bit of a unique look at performance optimization to get a better understanding of tips and tricks to be able to quickly diagnose and address problems using a methodical process. As I mentioned, my name's Mitchel Sellers. I run a consulting company here in the Des Moines, Iowa area.

My contact information's here. I'll put it up at the very end for anybody that has questions afterwards. During the session today, feel free to raise questions using the question functionality and we will try to be able to address as many questions as possible throughout the webinar today, however if we're not able to get to your question today, we will ensure that we are able to get those questions answered and we'll post them along with the recording link once we are complete with the webinar and the recording is available.

Before we get into all of the specifics around the talk today, I did want to briefly mention that I will be talking about various third party tools and components. The things that I am talking about today are all items that I have found beneficial in all of the time working with our customers. We'll be sure to try to mention as many alternative solutions and not necessarily tie it to any one particular tool other than the ones that have worked well for us in the past. With that said, we're not being compensated by any of the tools or providers that we're going to discuss today.

If you have questions about tooling and things afterwards, please again feel free to reach out to me after the session today and we'll make sure that we get that addressed for you. What are we going to talk about? Well, we're going to talk a little bit about how and why we care about performance. Why is it something that is top of mind and why should it be top of mind in our organizations, but then we'll transition a little bit into what are identifiers of successful performance. I'm trying to get around and away from some of the common things that will limit developers.

We will start after we have an understanding of what we're trying to achieve. We'll start with an understanding of how web pages work. Although fundamental, there's a point to the reason why we start here and then work down. Then what we'll do is we'll start working through the diagnostic process and a methodical way to look at items, address the changes and continue moving forward as necessary to get our performance improvements. We'll finish up with a discussion around load testing and a little bit around adopting change within your particular organization to help make things better, make performance a first class citizen rather than that thing that you have to do whenever things are going horribly wrong.

Why Do We Care About Performance?

One of the biggest things that we see is that sometimes we have a little bit of a disconnect between developers and the marketing people or the business people - anybody that is in charge of deciding how important performance is to our organization. One of the things that we always like to cover is: why do we care? One of the biggest reasons, as we look at trends in technology, is that for publicly facing websites, Google and other search engines are starting to take performance into consideration in search engine placement. We don't necessarily have exact standards or exact thresholds for what is good versus what is bad, but we have some educated guesses that we can make.

We want to make sure that we're going to have our site performing as good as possible to help optimize search engine optimization aspects. The other reason that it is important to take a look at performance is user perception. There's a lot of various studies that have been completed over the years that link user perception of quality and or security to the performance of a website. If something takes too long or takes longer than they're expecting, it starts to either A, distract them and get them to think, "Oh, you know what, I'll go find somebody else," but then they start asking questions about the organization and should I really trust this business to perform whatever it is that we're looking for.

One of the last things that is very important with the differences in user trends within the last 24 months or so is device traffic. We have a lot of customers. We work with a lot of individuals where their traffic percentages for mobile is as high as 60 to 80% depending on their audience. What that means is we're no longer targeting people with a good desktop internet connection and large, wide-screen monitors. We are actually working with people that have various network abilities, anything from still being attached to that high speed home internet, but they could be all the way down to a low end cellular edge network that isn't fast.

We have to make sure that we take into consideration what happens with throttled connections, page sizes, all of those things that we used to really place an emphasis on ten years ago; they become front and center with what we're doing today.

What Indicates Successful Performance 

With this said, one of the things we find in a lot of cases is that nobody can put a tangible definition on a site that performs well. That's the hardest thing. Most of the time, when we work with development groups, with marketing people, or with anyone really, to take an existing site and resolve a performance issue, the initial contact is, "Our website performance is horrible."

Although it gives us an idea of where their mind is at, it does not necessarily give us a quantitative answer because if we know it's horrible and we want to make it better, how can we verify that we really made it better? This is where we have to get into the process of deciding what makes sense for our organization. It's not an exact science and it's not necessarily the same for every organization. Almost all of the attendees on the webinar today, I'm going to guess that each of you will have slightly different answers to this indication and it's important to understand what makes sense for your organization.

There are a few metrics we can utilize to help us shape this opinion. Google provides tools to analyze page speed and will give you scores around how your site does in relation to performance. If your site responds in more than 250 milliseconds, you start to receive warnings from Google's page speed tools along the lines of: your website is performing slowly; improve performance to increase your score. That starts at 250 milliseconds, and there's another threshold that appears to be somewhere around the 500 millisecond range.

We can take another look and start looking at user dissatisfaction surveys. Usability studies have been completed by all different organizations and a fairly consistent trend across those studies that we've researched show that user dissatisfaction starts in the two to three second mark for total page load and what we're talking about here from a metric would be a click of a link, a typing in of a URL, and the complete page being rendered and interactive to the visiting user.

We can also look at abandonment studies. There are various companies that track abandonment in ecommerce, and we can see that abandonment rates start to increase by as much as 25 to 30% after six seconds. This gives us a general idea that we want initial page loads to be really fast to keep Google happy, and we want the total page load to be fairly fast to keep our users happy. There are exceptions to these rules, things such as login, checkout, et cetera, where the user threshold for acceptable performance is going to be slightly different, but it's all about finding a starting point and something that we can all utilize to communicate effectively: this is what we're looking for in a performant website.

Gael:

Mitchel, you mentioned that Google will not be happy. What does it mean? What would Google do with my website if the page is too slow?

Mitchel:

Right now there is no concrete answer as to what ramifications are brought on a website if you exceed any of these warning levels within the Google page speed tools. It's a lot like your other Google best practices and things. We know that we need to do them. We know that if we do them, we will have a better option. We'll have better placement, but we don't necessarily know the concrete if we do this, it's a 10% hit, or if we don't meet this threshold, we're no longer going to be in the top five or anything like that. We don't have anything other than knowing that it is a factor that plays into your search rank.

Gael:Okay, thanks.

Mitchel:

Which really gives us a few options. Google looks at individual page assets, so they look at how long it takes your HTML to come back. Another way to look at this that may make sense within an organization is to focus not necessarily on true raw metrics per se, but focus on the user experience. This is something that is much harder to analyze, but it allows you to utilize some different programming practices to be able to optimize your website, to make things work well. Examples of this would be things like Expedia.com, Kayak.com, the travel sites.

Oftentimes, at various events, I've asked people how long they think Expedia.com or Kayak.com takes to execute a search, whether you're searching for hotels, whether you're searching for airfare, anything in between. The general answer is that people think it takes only a few seconds when in all reality, from when you click on search to when the full results are available and all of the work is completed, it's oftentimes 20 to 35 seconds before the whole operations been completed.

However, these sites employ tactics to keep the user engaged, whether it is going from when you click that search to showing a, "Hey we're working hard to find you the best deal," and then taking you to another page, which is the way that Expedia handles it, basically distracts the user during the process that takes some time. Or Kayak will utilize techniques to simply show you the results as they come in. You get an empty results page, we're working on this for you, and then the items start to come in.

These are ways to handle situations where we can't optimize for whatever reason. Case of Expedia or Kayak, we have to ensure that we have appropriate availability of our resources. If we have flights available, we have to make sure that we still have those seats. We can't necessarily cache it. This is going to take some time. It's a great way to be able to still allow the user experience to be acceptable, but we then optimize in other manners.

The last thing from an indication perspective that is fairly absolute across to everybody is we can focus on reducing the number of requests needed to render a website. That is the one metric that in all circumstances, if we reduce the number of things that we need to load, we will improve the performance. It is a metric that regardless of if that is a target for your organization, we really recommend that you track it and the reason for that is as we step into understanding how web pages work, we'll see that the impact here can be fairly exponential. Regardless of if you're a developer, a designer, a front end person, a back end person or even a business user, it's important to understand the high level process of how a web page works before we start actually trying to optimize it.

The reason for this is that we have oftentimes encountered situations where users are trying to optimize something based on an assumption that they have. It has to be the database. It has to be our web code. It may not even be that. It may simply be your server. It may be user supplied content or something else. When we look at a webpage, the process is fairly logical and top down in terms of how the web browsers will render a page. We start out by requesting our HTML document. That is the initial request that the user has initiated. The server, after doing whatever processing it needs to do, will return an HTML document. The web browser then processes that document and it looks for individual assets that it needs to load.

This will be your CSS, your JavaScript, any additional resource, including images, that needs to be downloaded to be able to properly render. Those items are all cataloged and they start to download. With certain development practices, we may then have a repeated chain: if we have an HTML document that links to a CSS file, and once that CSS file is downloaded we find out it references another CSS file, that chain continues. That's when we start to see some of our processing take additional time. The other aspect that's important to understand here is that we are faced with limitations in our web browsers, so that we can only request between four and ten items per domain at the same time.

Most modern web browsers from the last year or so will get you all the way to that ten item limit, but what that means is that if we have an HTML document that then references 30 JavaScript and CSS files to render a page, we have at least three full sets of round trip processing utilizing those ten concurrent requests, which will push out our page load. If we could take those 30 and get them down to ten resources, we could go out and grab one batch of responses and be ready to render our pages, and that's something that will really help with overall performance.
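As one concrete way to get from 30 files down to a handful in a classic ASP.NET application, the System.Web.Optimization bundling API combines and minifies scripts and styles at runtime; the bundle names and file names below are placeholders.

```csharp
using System.Web.Optimization;

public static class BundleConfig
{
    // Collapse many script/style requests into a few bundled, minified ones.
    public static void RegisterBundles(BundleCollection bundles)
    {
        bundles.Add(new ScriptBundle("~/bundles/site").Include(
            "~/Scripts/jquery-{version}.js",   // placeholder file names
            "~/Scripts/site.js"));

        bundles.Add(new StyleBundle("~/Content/css").Include(
            "~/Content/normalize.css",
            "~/Content/site.css"));

        // Minify and combine even outside of release builds.
        BundleTable.EnableOptimizations = true;
    }
}
```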

To help illustrate this, I have taken some screenshots of the network view of a local news station's website. This was taken from their website about a month ago or maybe a little bit more. They've since redesigned their website, so it's not quite as extreme as it is in this example, but what we see here is I have two screenshots showing the HTTPS requests to load this web page. In all reality, there are 18 screenshots, if I were to show you the entire timeline view to load their home page. In the end, a total of 317 HTTP requests were necessary to render this web page. Now, obviously this is not something that we want to be doing with our webpages, but it's something that allows us to see what happens as we work to optimize.

We can see in this trend here that we start out by retrieving the main KCCI website, which then redirects us to the WWW. This case, user typed KCCI.com and we do a redirect. We can see here that it took 117 milliseconds to get to the redirect and then 133 milliseconds to actually get the HTML document. We've already consumed 250 milliseconds just retrieving the HTML document. Then we start to see the groups of information being processed. We can see here that we have a first group of resources up through this point being processed. We then see the green bars indicate the resources that are waiting to be able to complete. We'll see that these get pushed out, so the additional resources along the way, keep getting pushed out as the rest of the items are working.

Then the timeline expands and expands. We can see here, even by our second section, we're downloading a bunch of small images, things maybe even as small as 600 byes, two bytes, those types of things and they push out our timeline. We get all the way out here and it keeps going. You can get these types of views for your own websites, regardless of which toolset you want to use by utilizing the web developer tools in your browser of choice. On Windows machines, it's the F12 key, brings up the developer tools and you get a network tab. That network tab will show you this exact same timeline here, so you'll be able to see this in your own applications.

What we want to do is reduce this. If we cut out half of the items that we're loading, we improve performance just by reducing the number of requests, without having to do anything else specific to optimize. Moving forward from this, let's take a look at what a content management system driven website in a fairly default configuration ends up looking like.

In this case, we see that we have a much longer initial page response time here, at 233 milliseconds, but we have a far smaller number of HTTP requests and the blue line here indicates the time at which the webpage was visible to the requesting user. It's at that point in time that sure, there may be additional processing, but that's not typically stopping somebody from being able to see the content that they want to be able to see.

As we look at optimization, this is one of those things that we want to keep a close eye on to make sure that we have improvement: metrics, and how we can validate them quickly to be able to compare and contrast. There are a couple of key metrics around your HTTP requests that will help give you an idea of whether this is something you want to do. We can look at the total page load. We compared the KCCI site that we talked about, we compared an out of the box content management system solution, and then we took that same content management solution and simply minified our images and minified and bundled together our CSS and JavaScript. What we were able to do is see improvements in total page size and in the total number of HTTP requests.

KCCI, we're sitting at an almost five and a half megabyte download. This could cause problems for certain users because if I'm on a mobile network, that's a slightly larger download, so we definitely would want to look at ways to potentially minimize that. Same thing goes here with the out of the box content management system solution where we're at almost two megabytes. The same content, just with properly sized and compressed images in CSS, we were able to drop that down all the way to 817 kilobytes, making it so that we have less information to transfer over the wire.

Same thing goes from an HTTP request perspective. 26 is way better than 59 and 59 is still ridiculously better than 317. We see a fairly decent trend. The one thing that's a little bit misleading is sometimes this page load speed, if you utilize third party tools, that number may not always be exactly what you want it to be because it factors in things such as network latency and some of the tools even add an arbitrary factor to the number. Well, at this point-

Gael:

I'm wondering what is the process, what did you actually do to go from column one to column three. How do I know what I need to do or what is step one, two, three?

Mitchel:

What we use in a lot of these cases is one of two tools or sometimes both to help us identify more quickly some of the things. There's Google Page Speed Insights. Sorry, they renamed it and the Google Page Speed Insights tool is one that allows us to point it at any website and it will give us a score out of 100. That score out of 100 is going to then look at things such as how many HTTP requests do you have? Are your images properly sized? Are your images properly compressed? They talk a little bit about bundling and minification of JavaScript files as well.

What we can do with that tool specifically is it gives us a number we can use to help identify some of those quick hit common problems. It goes as far as if you have images that are not properly scaled or compressed for the web, they actually give you a link that you can click on and you can download properly scaled and sized images. Then you can simply replace the images on your own site with those optimized ones that come from Google.

Gael:Okay, that's cool.

Mitchel:

It's a fantastic tool, but one of the things that we find is that its score is not necessarily indicative, so one of the things that developers are often led to believe is if there's a score out of 100, I have to get 100. With the Google Page Speed Insights tool, it's not exactly that clear cut, and the reasoning behind that, just looking at these three examples, is that it gave a very slow news site a score of 90 out of 100, and the reason for that is that they checked all of those little boxes. They checked the boxes to keep Google happy, but maybe they didn't actually keep their users happy.

We typically cross check this tool with another tool provided by Yahoo called YSlow. This tool takes a slightly different approach to scoring, but it looks at similar metrics and similar things with your website. The KCCI site received an E of 58%. Our CMS that wasn't horrible was 70, but after slight modifications on our side, we were able to get it all the way up to a 92% or an A. We find that a lot of times you want to utilize these two different tools or two other tools of your choice to help aggregate and come to an overall conclusion to what may need to change.

Furthermore, with Google's Page Speed Insights, anybody utilizing Google Analytics will actually never be able to get above a score of 98 because the Google Analytics JavaScript violates one of the Page Speed Insights' rules. I call that out to people just because I don't want to see people get tunnel vision and lunge forward all the way to, "Oh, we have to do this because this is what, we have to get 100."

Quick Fixes & Tools

To help with some of this, one of the things we often use is a site called GTmetrix.com. It is a free tool. It is actually the tool that was utilized to grab all of the metrics in that table that I just showed, and what's nice about it is you can generate a PDF report. You can even do some comparison and contrasting between your results before and after a test, which makes it a lot easier.

What does this give us? At this point, we've looked at how a webpage is structured. We've looked at what it takes to load a webpage and one of the things that we get from this is a lot of quick fixes that we may be able to improve our websites dramatically. Images that aren't compressed. JavaScript files that aren't necessarily combined. These two add to our content quite greatly. Images that aren't compressed or images that aren't properly sized are going to bloat your page load. That's why we may have a five megabyte page instead of a two.

A couple quick examples of this would be in the responsive era, we'd have a known maximum of how big our images are going to be. Uploading anything larger than that size is simply a waste of server space and it's a waste of bandwidth because we don't need to work with it. We worked with a customer that had the initial argument of, "Our website performance is horrible." We went to look at their site and we agreed. It was. The pages were loading incredibly slow and it was consistent across all of the pages. What we found is that for the overall look and feel and design of the website, the design firm never prepared web ready imagery. The homepage was somewhere along the lines of 75 to 80 megabytes in size to download and interior pages were 30 to 40.

Well, the reason why we start here is that this would have been a lot harder to catch on the server side, because it's not anything wrong with the coding, it's not anything wrong with the database, and our web server looked really good performance-wise. When we start here, we can catch those kinds of things. This one is really important in terms of image sizes if you allow user uploaded content into your web properties. The other thing that we can look at is static images, hidden images or hidden HTML elements; for those of you who are ASP.NET developers, older web forms projects that are utilizing ViewState can often add some overhead. The other big thing that is pointed out by PageSpeed Insights as well as by YSlow is a lack of static file caching.

There's a common misconception with ASP.NET projects that static files will automatically be cached. That's not necessarily the case, so you may need to tweak what gets returned as the cache header value - all things that both of these tools will point out and direct you on how to fix. The goal here is to take care of the low hanging fruit, take care of the things that may be stressing an environment in a way we don't know, and see what we can do. We've found, typically with websites that have never had a focused approach to making these kinds of changes, that we've had a 40 to 80% improvement over time from these changes alone. For those of you who don't have public facing websites, GTmetrix.com and PageSpeed Insights will not work for you because they do have to have access.

Otherwise, they don't have access to get to the website. Yahoo YSlow is available for anybody to use because it's a browser based plugin as well, so you'll be able to utilize that on internal networks. One of the things that we get a lot of times is okay, we've optimized, but we still have a lot of static files and it's adding a lot of activity to our server. Another quick fix would be to implement a content delivery network. A content delivery network's purpose is to reduce the load on your server and to serve content more locally to the user if at all possible.

The situations that we're looking for here would be things where maybe we can take the content, load it to a CDN and when we have a customer visiting our website from California, they can get the content from California. When they are in Virginia, maybe the content delivery network has something closer to them regionally. These types of systems would definitely help. It's a little bit of a moving the cheese kind of game where we've moved potential problems to leverage something else. The difference is the content delivery networks are designed to provide incredibly high throughput for static content.

They're a great way to start improving things. I'll talk a little bit more about a couple implementations specifics in just a second. Another thing that often gets overlooked is we've reduced our JavaScript files. We've reduced our CSS. We've got everything good, but now we have these pesky images where we have 25 little images that are being used as part of our design. Most commonly we see them in things such as Facebook, Twitter, LinkedIn, Instagram. All of them as little one kilobyte images.

Some people opt to use font libraries to be able to handle that type of thing, but one of the things we don't want to lose sight of is that we can utilize an image sprite - the simple combination of multiple images into one larger image, using CSS to display the appropriate portion - which is definitely an option to help minimize the number of HTTP requests. That does require developer implementation, but it's something that can make things work fairly easily.

If we've made these changes and we want to go and roll out a content delivery network solution, we have two options primarily. First, we have pull-type CDNs, which don't require us to make a bunch of changes to our application. Service providers such as CloudFlare and Incapsula offer this, and I believe that Amazon also has support to do this. Basically you point your DNS to the CDN provider and then the CDN provider knows where your web server is. Basically they become the middle man to your application.

What's nice about it is you can make the change pretty quickly, simple DNS propagation and they usually give you on/off buttons, so you can turn on the CDN, everything is good. If you notice a problem, you can turn it off until you're able to make changes to resolve things. They're great, however, unusual situations typically will arise at least at some point in your application because you have introduced something, so if it accidentally thinks something is static content when it's not, you may need to add rules to change behaviors. The implementation benefits are massive. We'll go through an example here with some experience I have with Incapsula that makes things easy.

The selling point here though is you can make this change in an afternoon, barring any unusual situations. Another option would be to integrate a more traditional CDN, which basically involves, rather than serving static content directly from your web server, storing that content on a CDN. Rather than linking to an images folder inside of your application, you would link to an images directory on a CDN or something of that nature. It's definitely more granular, easier in some ways to manage, but it requires manual configuration. You have to actually put the content where you want that content to be, and that's the only way that it'll work out as expected.

Diagnosing Issues 

We've talked about quick fixes. We've talked about things such as images and tools to validate and quantify things, but that just didn't get us enough. We need to now dig into the server side. To dig into the server side, we need to make sure that prior to us starting anything, we want to make sure that we're monitoring the health of our environments. What we'll want to make sure of is that at all times, whenever possible, we want to have information on our web server and database server, CPU and memory usage. We want to know how many web transactions are actually happening, how long are they taking, what kind of SQL activity is going on?

On top of that, we want to know exactly what our users are doing in our app if at all possible. Regardless of if you're diagnosing a problem today or if you're just simply supporting a web property that you want to make sure that you're keeping an eye on performance overall, you want to make sure that you are collecting these metrics and the reason why this is so important is sometimes you may not be able to jump on the problem a second that it comes up, but by having metrics, we'll be able to go back in a point of time and understand exactly what has happened, what was changed, what was going on yesterday at 2:00 PM in the afternoon when the website was slow?

How you go about doing that, there's a number of different tools out there. I personally am a fan of NewRelic, but we've worked with CopperEgg. We've worked with Application Insights and many other vendors. The key is to collect this information as much as humanly possible to make sure that you really truly have a picture of what's going on with your environment. Most of these tools operate either with minimal or almost no overhead to your server. We typically see a half percent to one percent of server capacity being consumed by these monitoring tools.

Gael:

Mitchel, so how long do you recommend to keep data? How much history should we keep?

Mitchel:

My personal recommendation is to keep as much history as humanly possible, understanding that that's not exactly a practical answer. Really what I like to see is at minimum 30 days, but ideally, six to 12 months is preferred, simply because sometimes you have things that only happen once a year, or twice a year. We encounter a lot of situations such as: annual registration for our members happens on October 25th and our website crashes every October 25th, but we're fine the other 364 days of the year.

In situations like those where having some historical information to be able to track definitely makes things a little bit easier because we can try to recreate what happened and then validate that, "Oh, well, we recreated this and now we're not using as much CPU. We're not using as much memory as we do our testing."

Gael:Okay, thanks.

Applying Load

With this, we now know that we maybe need to start taking a look at our website under load. One developer clicking around on your dev site just isn't recreating the problems that we're seeing in production, so how do we go about doing this? Well, what we need to do is take all of the information that we were talking about earlier and apply it to the way that we do load testing, and what we mean by that is that the load examples need to not be static content examples. In other words, we can't go after just our HTML file.

We have to make sure that whatever our load test does recreates real traffic behaviors, including making all of the HTTP requests, waiting between times where it goes from one page to the next, making sure that everything that the user's browser does, we do in our load test. We also need to make sure that we're being realistic. If we know that we get 200 people logging in over a five hour window and that's the most we ever see, we don't necessarily want to test our system to 500 people logged in within ten minutes, because we aren't necessarily going to be able to be successful in that regardless. The other key, and this is the hardest part for most organizations, is that our hardware environments need to be similar.

If we're testing against a production load, we need to be utilizing production grade hardware. There are certain scenarios where we can scale, but it doesn't always work that way and the general recommendation is that we want to make sure that we really are working in a similar environment. What should happen when we do a load test?

When we do a load test, we should see that everything kind of continues in a linear fashion. If we do a load test and we add more users, the number of requests that we process should go up. The number of, the amount of bandwidth that we use should go up and those really go up in lockstep with each other because more users equals more traffic and more traffic equals more bandwidth. What tools can we utilize for this? Well, there's a large number of tools out there. If you Google load testing tools, you're going to come up with a whole bunch of them.

In my situation, I found that LoadStorm.com is a great tool. The reason why I like it is it does what browsers do. You actually record the script that you want to run using Google Chrome, Internet Explorer, et cetera, and the other thing is it gives me some data center options, so I can choose with LoadStorm that I want my traffic to come from the central part of the US, I want it to come from Europe, or I want it to split between the two. The benefit here is that some of these load testing tools will send you European traffic when you have no European traffic at all, so your numbers are going to look a little bit different because of internet latency.

Really what you need to do is just make sure that whatever tool you're utilizing matches your expectations of what the user should be doing and what your users are doing. When we do load testing, we also want to track some additional metrics. In addition to all of the things that we should be doing, our load testing tool or our server monitoring tool should be giving us some additional things. How many requests per second were we issuing? How many concurrent users were on the site? This is a metric that's very important to understand. A concurrent user may not mean the two users clicking on a button at the same time. A concurrent user is typically referred to as a user that's visited the site whose session is still active, so they could be as much as ten to 20 minutes apart.

We want to make sure we understand that metric so we can get a gauge as to what's going on with our users. We want to look at the average server response time and we want to look at that from the client side. We want to look at the percentage of failed requests and we want to look at how many requests are being queued. The goal here in terms of what number do you have to hit or how do you hit it is going to depend on your environment. Each server configuration, each application is going to be capable of performing to a different level than something else.

Now, with that in mind, one of the things that we do want to make sure of is that our load testing tools, as part of being realistic need to be realistically located. If we have our data center in our office, doing a load test from our office to that data center is not necessarily going to give us a realistic example because we factor out the entire public internet. Keep that in mind as you're configuring things. With this, we got about ten minutes left. We'll get through a quick scenario here. Then we can take some time for a couple quick questions.

One of our success stories was working with a customer that had six virtual web servers with 18 gigs of RAM apiece and a SQL Server cluster with 64 gigs of RAM and 32 cores. Load tests resulted basically in failures at six to seven hundred concurrent users, or about 100 users per server, and we had a goal: we needed to reach 75,000 concurrent users and we needed to do it in two weeks. We did exactly what we've been talking about here today. We started with optimizing images and CSS, minified things, bundled things, we implemented a CDN, and then we have a before and after load test scenario.

Really, the actual numbers don't matter, so I've even taken the scales off of these images. The differences is the behavior that we see. In both of these charts, the top one is a before. The bottom one is an after. The green and purple lines are requests per second and throughput. The light blue line that goes up in a solid line and then levels out, that's the number of concurrent users and the blue line that is completely bouncing around in the top chart is maximum page response time and then the yellow one that's a little bit hard to see is average page response time.

What we saw with the initial test in this situation was that the maximum response time was bouncing around quite a bit. As load was added, we'd see a spike, it stabilized. Then we'd add more and it would spike. Our responses weren't necessarily where we wanted them to be and it just wasn't doing things in quite as linear or smooth of a manner. In the bottom here, we get a much more realistic example of what we'd like to see and that is as our users go up, throughput and bandwidth go up and they continue to exceed. They should never dip below that line of users. As they dip below this line of users in the top example, that's where we start to see that the server's being distressed and not able to handle the load because we have more users, so it should always going further and further away from that line.

We can see at the very bottom, we had average response time was great, error rates were totally acceptable, all the way until this magical failure at 53 minutes. The reason why I bring this up and why we utilize this as an example is to showcase why metrics are so important. In this case, we're doing a load test with 75,000 concurrent users. We are using over 21 gigabytes of bandwidth going in and out of the data center and all of a sudden, things start to fail.

Well, we have monitoring on each web server. We have monitoring on the database server. Everything is saying they're healthy. We only have seven minutes left of this load test to really be able to identify a root cause. We're able to remote desktop into the server and validate that we can browse the website from the server. In the end, we were able to find that we had a network failure of a load balancer. It had been overloaded and overheated and stopped serving traffic.

With the metrics, we were able to confirm that at this load point right before failure, we were at 30% CPU usage on our web nodes which meant we had tons of room to go yet. We were at 20% on the database server, which meant we had tons of room to grow yet. The problem was is that our pipe just wasn't able to deliver the users to us, which allowed us to be able to determine what makes sense as our next step. Now, we got lucky in this case. It was a completely implemented solution and we had to make a bunch of changes. 

Adopting Change 

As we start working in organizations, I encourage people to try to promote a proactive approach to performance in your applications.

Bring it up as a metric for any new project. Set a standard that your new projects will have X page load time or you're going to try to keep your HTTP requests under a certain number. If you can start there and validate it through your development process, it'll be a lot easier to make things work rather than having to go back and make a bunch of changes later.
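One lightweight way to bake such a standard into the development process is an automated test that fails when a key page exceeds the agreed budget. The URL and the two-second budget below are placeholders, and this sketch only times the initial HTML response rather than the full browser render.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;
using Xunit;

public class PerformanceBudgetTests
{
    [Fact]
    public async Task HomePage_StaysWithinPageLoadBudget()
    {
        var budget = TimeSpan.FromSeconds(2);                        // agreed budget (placeholder)
        using var client = new HttpClient();

        var stopwatch = Stopwatch.StartNew();
        var response = await client.GetAsync("https://staging.example.com/");  // placeholder URL
        await response.Content.ReadAsStringAsync();                  // include body download time
        stopwatch.Stop();

        response.EnsureSuccessStatusCode();
        Assert.True(stopwatch.Elapsed < budget,
            $"Home page took {stopwatch.Elapsed.TotalMilliseconds:F0} ms; budget is {budget.TotalMilliseconds:F0} ms.");
    }
}
```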

If you do have to work in that reactive environment, it's important to remember the scientific method that we learned in elementary and middle school, which is change one thing, validate that that change really did something and then revalidate what you're going to do next. If you change three things, you're going to run into situations where it's not as easy to resolve an issue because you don't know what actually fixed it. It was one of those ten things we changed on Friday. Which of those ten things was really the one to make it work?

The whole goal here is if you start with a process, if you start with a purpose, you should be able to get through these situations in a manner that makes it easier to get the complete job taken care of. I'll put my contact information back up for everybody and Gael, do we have a couple questions? We've got about five minutes or so here.

 

Q&A

Q: Will GTmetrix or similar tools work within an authenticated area of a website?

A: With regards to authenticated traffic and validating performance, there are a couple of options. Google Page Speed Insights is really only for public facing stuff. YSlow, as a browser plugin, will work on any page - authenticated, unauthenticated, internal to your network, public facing, it does not matter. That's probably going to be your best choice of tooling for anything that requires a login, along with the browser's developer tools themselves to get the client-side activity and the timeline view.

Q: What is a normal/acceptable request per second (RPS) for a CMS site?

A: The acceptable number really depends on the hardware. From my experience working with the DotNetNuke content management system on an Azure A1 server, we start to see failures at about 102 - 105 requests per second. However, it really varies and will depend on what your application is doing (number of requests, number of static files it has, etc.). While there's not necessarily a gold standard, if you see anything under 100, it's typically a sign of an application configuration or other issue rather than a sign that you do not have large enough hardware.

Q: When would you start getting tools like MiniProfiler from Stack Overflow or similar tools?

A: Typically what happens is the progression here is at this point in time, what we're trying to do is factor out other environmental issues, so from here, this is where we would typically jump into okay, our HTTP requests look good. Our file sizes look good. Our content looks good. Now we know that we have high CPU usage on the web server and the database server isn't busy. That's where I'm going to then want to dive in with MiniProfiler or any of the other profiling tools and start to look. If I notice that the web server looks good, but my database server is spiking, that's where we're going to want to start looking more at SQL Profiler and other things.

The goal here is to take the guesswork out of the puzzle - am I making IIS work too hard because it's serving static content and everything else? - and focus on getting that optimized first, then dive in and really get our hands dirty with our .NET code or whatever server-side code we're working with.

 

 

 

About the speaker, Mitchel Sellers

Mitchel Sellers

Mitchel Sellers is a Microsoft C# MVP, ASPInsider and CEO of IowaComputerGurus Inc, focused on custom app solutions built upon the Microsoft .NET Technology stack with an emphasis on web technologies. Mitchel is a prolific public speaker, presenting topics at user groups and conferences globally, and the author of "Professional DotNetNuke Module Programming" and co-author of "Visual Studio 2010 & .NET 4.0 Six-in-One".
Mitchel's blog.