Ethereal Computer Architecture: September 2013

Sunday 29 September 2013

My Definition of Enterprise Architecture

Someone recently asked me to define what an Enterprise Architect is. I'm usually pretty good with definitions, but this time I was stuck. I went to Google and found out I wasn't the only one.

After some thought, here is my attempt at a definition:

Enterprise architecture is a practice concerned with:

Business - technology alignment
Disciplined Innovation (innovation where it is needed)
Disciplined delivery (not re-inventing the wheel; consistency with past efforts; repeatability)
Proactive solutions (proposing solutions when systems no longer support needed business capabilities)

I hope to have time to elaborate more on this a bit later.

Thursday 19 September 2013

Using Enterprise Architecture at a Media Company (part two, Zachman framework)

As mentioned in part one, the Zachman framework is a taxonomy for organizing architecture artifacts. In this blog post, we will discuss how we can use the Zachman framework to guide our thinking about what needs to change in the architecture of the Sphere (our imaginary media company) to accommodate a metered paywall.

If you are not familiar with the Zachman framework, the Wikipedia entry is a good place to start. This article also provides an interesting take on Zachman. Please note that all the information about the Zachman framework used below was either taken from publicly available sources or from discussions with other enterprise architects. The information about the Sphere's metered paywall system is based on an actual implemented system, with some simplifications.

Please note that this blog entry is a bit of a work in progress. I'm hoping to improve it a bit and perhaps add some diagrams. Hopefully it is not too long for people to read.

Perspectives, Fundamental Questions and Paywalls

The Zachman framework is conceptually a grid, whose cells represent types of architectural artifacts (e.g. written documentation, diagrams, models, etc). You don't need to create artifacts to fill in all the cells if it is not useful. However, I find it is helpful to think about all the cells to try and figure out if we are forgetting some implication of a business requirement on our architecture.

The rows in the grid are the perspectives from which the architecture is viewed, or alternatively the stakeholders involved in getting something planned and built. The generic names for the perspectives are: planner, owner, designer, builder, subcontractor, and enterprise. It is Zachman's assertion that these perspectives/stakeholders exist no matter if you are architecting a company, product, building or a software system. Typically, for software systems, we use the following more specific terms: scope, business model, system model, technology model, detailed implementation, functioning enterprise. It's a bit counter-intuitive, but the final row ("functioning enterprise") represents the completed product or software system and therefore does not contain any architectural artifacts.

The columns in the grid represent fundamental questions that need to be answered for each perspective/stakeholder. The columns are often labelled as what, how, where, who, when, why. Again, for enterprise architecture projects there is a more specific labeling (which, in my opinion, doesn't completely make sense for all perspectives): data, function, network, people, time and motivation. I think it is useful to remember both sets of labels for the fundamental questions, as sometimes the label from one set is more intuitive than the label from the other for a given perspective.

In the sections below, we will run through each of the cells, starting with the top row and considering the columns in the order: what, how, where, who, when, why. Ordering the columns this way is done only to make this blog entry easier to understand. I am not trying to break the Zachman rule that "columns have no order".
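As a rough illustration (this is not part of any official Zachman tooling), the grid described above can be sketched in Python as a dictionary keyed by perspective and question. The row and column labels come from this post; the function names and the idea of storing artifact names as plain strings are assumptions made only for this example.

```python
# Row and column labels as given in this post: (generic name, software-specific name).
PERSPECTIVES = [
    ("planner", "scope"),
    ("owner", "business model"),
    ("designer", "system model"),
    ("builder", "technology model"),
    ("subcontractor", "detailed implementation"),
    ("enterprise", "functioning enterprise"),
]
QUESTIONS = [
    ("what", "data"),
    ("how", "function"),
    ("where", "network"),
    ("who", "people"),
    ("when", "time"),
    ("why", "motivation"),
]


def empty_grid():
    """Build a grid keyed by (perspective, question); each cell holds a list of artifact names."""
    return {(p, q): [] for p, _ in PERSPECTIVES for q, _ in QUESTIONS}


def unfilled_cells(grid):
    """Return cells with no artifacts, skipping the final row (the finished system itself)."""
    return [cell for cell, artifacts in grid.items()
            if not artifacts and cell[0] != "enterprise"]


grid = empty_grid()
grid[("planner", "what")].append("list of entities known to the company")
grid[("planner", "how")].append("list of business processes")
print(len(grid), len(unfilled_cells(grid)))  # 36 28
```

Listing the unfilled cells is the code equivalent of "thinking about all the cells": an empty cell is not an error, just a prompt to ask whether something was forgotten.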

The Planner or Scope Perspective

Let's start by considering how our paywall requirement affects the architectural artifacts in the first or top row of the Zachman framework. The top row represents the planner or scope perspective and answers the fundamental questions from the point of view of a business planner or a project manager during project pre-planning. This row might also be useful in agile methodologies to orient the agile team prior to the first sprint. I guess it could also be argued that, in agile, this knowledge would be had by the product owner, or perhaps the leader of all the product owners.

The "What" or "Data" fundamental question

The first cell in the top row is the "what" or "data" cell. Let's assume that the artifact in this cell is the list of entities known to the company. The Sphere has not previously had digital subscriptions to its website. Therefore, we need to add "digital subscriber" to the list of entities known to the company.

The "How" or "Function" fundamental question

The second cell in the top row is the "how" or "function" cell. Let's assume that the artifact in this cell is a list of business processes. We are going to need new business processes to manage the digital subscriptions and do billing. We also need to know some sort of business rule that determines when non-digital subscribers will be blocked from seeing an article.

The "Where" or "Network" fundamental question

The third cell in the top row is the "where" or "network" cell. Let's assume that the artifact in this cell is a list of places that the Sphere does business. The new paywall probably doesn't change this.

The "Who" or "People" fundamental question

The fourth cell in the top row is the "who" or "people" cell. Let's assume the artifact in this cell is a list of the organization units of the company and the company's business partners. The Sphere has an internal development team to write the paywall software. The Sphere's existing credit card billing service can handle billing the new digital subscriptions, although this should be confirmed. The Sphere's SAP Team within the IT department has also confirmed they can handle the billing through SAP. However, it appears that the Sphere's call centre is not adequately staffed to handle complaints from digital subscribers. Therefore, we must consider whether we are going to expand the call centre or outsource customer care for the new digital subscribers. After the business planner consults the executive sponsor for the paywall project, he/she informs the enterprise architect that a new organization needs to be added to the artifact in this cell: a call centre outsourcer.

The "When" or "Time" fundamental question

The fifth cell in the top row is the "when" or "time" fundamental question. Let's assume that the artifact in this cell is a list of all the cycles (or repeated processes and recurring deadlines) in the Sphere's business. There are two basic cycles implied by business processes defined in the "how" cell: the cycle that controls how frequently non-digital subscribers who get blocked from seeing articles are unblocked and the cycle that controls how often digital subscribers will be billed. Let's discuss the second cycle a bit further. The business planner consults with the SAP architect (because SAP is used for billing at the Sphere) and discovers that the existing print subscriber billing cycle could be used for digital subscriptions but that a print subscriber would need a separate digital subscription if subscribed to both print and digital products. The planner consults the executive sponsor and the executive sponsor is concerned that this will create problems offering bundled subscriptions (i.e. paying a single, discounted, price for both digital and print subscriptions). After much discussion with the SAP architect, they decide a trade-off can be made that combines both subscriptions into one, but which slightly reduces the amount of revenue that the Sphere will collect from bundled subscriptions. The executive sponsor indicates this is OK in the short term, and so the enterprise architect notes in the artifact that the existing print billing cycle will be used initially, but that another billing cycle may need to be added in the future. Although the actual architectural artifact didn't change much, the process of considering the "Time" fundamental question during planning raised an important issue that was resolved before the project got underway.

The "Why" or "Motivation" fundamental question

The sixth cell in the top row is the "why" or "motivation" fundamental question. Let's assume that the artifact in this cell is some sort of list of general business strategies. Adding a metered paywall is a substantial change in business strategy. Something along the lines of the initial few paragraphs of the first part of this series should be added to the list of general business strategies to explain why the Sphere is building a metered paywall.

The Owner or Business Model Perspective

Let's now move on to the second row in the Zachman framework, which represents the "owner", "business owner" or "business model" perspective. Zachman and others have used a construction analogy for this perspective, comparing it to that of the owner of a building being designed. The building owner cares about things such as which way the windows are facing, how the building is partitioned into rooms, etc, but does not necessarily care about where the support columns are or where the water pipes are run. In the same way, a (business or product) owner of a software project does care about what the software does, but not necessarily about whether it uses an Oracle or SQL Server database. In my opinion, people who discuss the Zachman framework often think of a business analyst in the owner role, because the architecture artifacts tend to be things analogous to high level entity-relationship diagrams or data flow diagrams. Also in my opinion (which may not be the opinion of many other practitioners, to be fair), the owner is often a business or product owner who wants to see mockups and sometimes market research rather than data flow diagrams. However, if the owner is, in fact, a business analyst, then high level data-flow diagrams and entity-relationship diagrams may be the correct approach.

The "What" or "Data" fundamental question

We return to the first column in the Zachman framework, but this time for the "owner" or "business model" perspective. In the planner perspective, we dealt with this same fundamental question by adding a new type of entity called a digital subscriber. Assuming the owner is a business analyst, the artifact in this cell may be a document describing entities and some of their high level attributes and perhaps their relationship to other entities. In this case, we probably want to add a digital subscriber entity to the document as well as some of the attributes (information) that the owner has decided should be collected when a digital subscriber signs up. In the modelling exercise that this document is based on, using the Zachman framework resulted in a spirited discussion of how much information should be gathered for a digital subscriber. It was useful to have this discussion early in the design process. The business owner also had a marketing research firm produce profiles of imaginary digital subscribers. In my opinion, these could also be considered architectural artifacts that would fit into the "what" or "data" cell in the owner perspective.

The "How" or "Function" fundamental question

The second cell in the owner or business model perspective's row is for artifacts which define business processes from a high level business perspective. Sometimes the artifacts are something very similar to high level data flow diagrams, showing business processes and how they accept inputs from and pass outputs to each other. If we use this approach, we will need to add details about the processes we defined for the cell above this one in the planner row. These processes would probably include signing up a subscriber, modifying the information for an existing subscriber, cancelling a subscription, performing periodic billing for a subscriber and cancelling a billing. In my opinion, an artifact consisting of UML use cases or agile user stories might work just as well and might be easier for some business owners to deal with.

The "Where" or "Network" fundamental question

The third cell in the owner or business model perspective answers the fundamental question "where" and is usually concerned with the locations a company operates from and the logistics between those locations. Because the business owner has decided that the customer care team for digital subscribers will be outsourced, it is probably wise to add the outsourced customer care team to this artifact. We will assume for now that the logistics will consist of a dedicated network connection between the headquarters of the Sphere and the outsourcer.

The "Who" or "People" fundamental question

The fourth cell in the owner or business model perspective answers the fundamental question "who" and has artifacts which show the interactions between the people involved in the system. The people are usually grouped somehow, possibly into departments or other organizational units, and workflows are shown between the groups. For our metered paywall, digital subscribers will interact with customer care agents, so digital subscribers and customer care agents will need to be added to the artifact, as will the basic workflows that occur between them (modifying a subscription, creating a subscription, stopping a subscription).

The "When" or "Time" fundamental question

The fifth cell in the owner or business model perspective answers the question "when" and has artifacts which show the cycles and (therefore implicitly) the critical recurring deadlines for the company. When we looked at the "when" fundamental question in the planner perspective (the cell immediately on top of the one we are dealing with now) we determined that we would use the print subscriber billing cycle for digital subscribers. The other cycle that needs to be considered has to do with non-digital subscribers viewing digital content. The metered paywall will block them (and ask them to subscribe) after they have viewed a certain number of articles in a given time period. The business owner needed to decide at this point what this time period would be. The business owner actually threw us a curve ball and said there should be different types of articles, some of which could be viewed without restriction by non-digital subscribers, others that would be subject to a limit and still others that would never be viewable by non-digital subscribers. It was good we discovered this! We had to go back to the "what" cell and add some attributes for articles and go back to the "how" cell and modify our article viewing procedure a bit. Obviously, it would have been better to catch these when we were considering the "how" and "what" fundamental questions, but we still caught them relatively early. Realistically, I think that modifying or creating the artifacts for a single perspective is a bit of an iterative process in which, like in this case, a discovery while working on one cell may affect a cell that you have already been working on.

The "Why" or "Motivation" fundamental question

The final cell in the owner or business model perspective answers the question "why" or explains the motivation from the business owner perspective. The artifact for this cell is often a business plan. Business plans vary in their content, but usually describe what is to be done, why it needs to be done (usually to reduce costs or drive revenue or both) as well as some basic targets along with any necessary strategies to meet these targets. In the case of the metered paywall, most of this information had been prepared prior to us starting the Zachman process and we were mostly able to use existing documents/e-mails to create a business plan, which can be added to the set of business plans that we maintain for this artifact.

The Designer or Logical Model Perspective

If you've done any data modelling or solutions architecture, the perspectives we've just covered may have been what you considered as "requirements" and may have been gathered by a business analyst or part of the knowledge of the product owner. The designer or logical model perspective is typically the point at which architects and data modelers get involved with a project and many (possibly all) of the artifacts in this perspective will be familiar to them.

The "What" or "Data" fundamental question

The "what" or "data" cell in the logical model perspective contains at least one artifact which describes entities and their relationships. Not surprisingly, an entity-relationship diagram might very well be used here. I like to use class diagrams (from UML), but mostly leave out the methods in this cell. This is because we have a tool for easily drawing class diagrams and we hoped it would save us some time when we got to the next cell and the next perspective (but I'm getting ahead of myself here and sort of breaking the Zachman rule that an artifact can only go into one cell). We created classes for digital subscriber and article group (which would define both the set of article groups and the maximum number of articles in each group that a non-digital subscriber could access). We added an attribute to the article class which defined which article group an article was in. We added a class that counts the number of articles seen by a subscriber in a particular article group. We also created an SAP billing document class to hold any billing information that might need to be passed to SAP. We finished by adding a few easy relationships: an article has a many to one relationship with an article group, a digital subscriber has a one to many relationship with an SAP billing document, etc.
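To make the class diagram a little more concrete, here is a minimal Python sketch of the classes and relationships described above, with the methods left out as in the diagram. The attribute names (such as free_article_limit and reader_id) and the sample values are hypothetical; the real project's class diagram is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArticleGroup:
    name: str
    free_article_limit: int  # max articles in this group a non-digital subscriber may access


@dataclass
class Article:
    article_id: str
    group: ArticleGroup  # many articles belong to one article group


@dataclass
class SapBillingDocument:
    document_id: str  # stands in for whatever billing data is passed to SAP


@dataclass
class DigitalSubscriber:
    subscriber_id: str
    # one digital subscriber has many SAP billing documents
    billing_documents: List[SapBillingDocument] = field(default_factory=list)


@dataclass
class ArticleViewCounter:
    """Counts the articles one reader has seen in one article group."""
    reader_id: str
    group: ArticleGroup
    seen: int = 0


# Hypothetical sample data, only to show how the relationships fit together.
metered = ArticleGroup("metered", free_article_limit=10)
story = Article("a-1", metered)
subscriber = DigitalSubscriber("s-1")
subscriber.billing_documents.append(SapBillingDocument("doc-1"))
```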

The "How" or "Function" fundamental question

The "how" or "function" cell in the logical model perspective contains at least one artifact which describes the user visible functionality of our systems. I've seen people use essentially block diagrams of the components of an application with flows indicating the information that a user can access in each component, as well as sometimes the information that the components pass between themselves. Because we have already started to think in terms of classes, a UML sequence diagram can, in my opinion, be an acceptable artifact for this cell. Sequence diagrams in UML show how user tasks or business processes are executed by classes (which correspond somewhat to Zachman's idea of application components). The sequence diagram ideally can borrow from the cell above it, which contains business processes, and use these as the processes it is illustrating. In our case, this meant that we had to do a sequence diagram for the subscription sign up process and the process that happens when a user (either digital subscriber or not) views an article. The sequence diagram can also borrow from the "what" cell in the same row and use the classes defined there as the things in the sequence diagram that make method calls to perform a business process. Realistically, when you start building the sequence diagrams, you may find that you are missing classes, and that is exactly what happened to us (we realized we needed a class that counts the number of articles accessed in an article group by a particular user, and then we added it to the class diagram in the "what" cell).

The "Where" or "Network" fundamental question

The "where" or "network" cell in the logical model perspective contains at least one diagram that shows how distributed system components (if any) communicate by drawing lines between them. For example, a web server will often communicate with an application server, which may in turn communicate with a database. In our case, we had to add a link to SAP to retrieve billing information. This raised PCI concerns (because our SAP system stores credit card information and must follow PCI requirements) and so we ended up moving some of the billing functionality inside our PCI web server environment and including it on the diagram. We also realized that our existing web cache servers were not going to allow us to block non-digital subscribers from seeing articles, so we had to modify our distributed system diagram to show the use of a third party CDN (Akamai) that had the required capability.

The "Who" or "People" fundamental question

The "who" or "people" cell in the logical model perspective contains at least one artifact that gives a high level (or architectural) view of the user interface. Zachman states that this artifact should model roles which are connected to deliverables. In my experience most user interface/usability professionals don't really know what this means. For the metered paywall project, we used wireframes (simple UI mockups with minimal design detail) and basic representations or mockups of the interactions that our two types of users (digital subscriber and non-digital subscriber) would have with the system (which can be done as a series of PowerPoint slides if you want). For example, at one point we (or more correctly the product manager and UI team) developed a very simple set of PowerPoint slides showing the rough series of web pages that a user sees to sign up as a subscriber. Similarly, we used the initial mockups of the web page that a non-digital subscriber gets when they try to access an article that would cause them to exceed their free article quota. These artifacts are a lot more visual (and perhaps more concrete) than what Zachman seems to have intended, but they are easier for usability professionals to create and work with.

The "When" or "Time" fundamental question

The "When" or "Time" cell in the logical model perspective contains artifacts that describe the way the business cycles uncovered in the corresponding cell in the business owner perspective will be mapped to the cycles of the Sphere's systems. We first determined that, because of the requirements in the previous perspective, we would need to use the print subscriber billing cycle for digital subscribers. This cycle is largely within an SAP ERP system and therefore we engaged the SAP architect to help us with the logical model. We discovered that we would need to define the typical billing frequency, that is, the time that elapses between billings if a subscriber does not temporarily suspend their newspaper. We also discovered that, because of the requirement to bundle digital and print subscriptions, subscribers with bundled subscriptions would essentially extend their billing date if they suspended their newspaper. Many people thought this was not completely desirable. However, after much discussion, we decided that the alternatives were too costly. Therefore, we added this cycle to our logical cycles artifact (a spreadsheet), noting that it was tied to the print subscription cycle and that it would be affected by newspaper suspensions when a customer had a bundled subscription (in Zachman terms the cycle is controlled by the receipt of recurring payment event and the event generated when the subscriber's total payment is amortized). We also flagged this as something that should be revisited later. The other cycle that was uncovered in the previous perspective was the period for which a non-digital subscriber would be blocked from seeing some articles after they exceeded their allowance of free articles. As discovered in the previous perspective, the allowance should reset back to the maximum at the beginning of each month. Therefore, we added a cycle to our logical cycles artifact to reset the number of free articles at the beginning of each month (we can also say that the reset is triggered by a "beginning of month" event).
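The monthly reset cycle can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class and method names are hypothetical, and the sketch resets lazily (the first view in a new calendar month triggers the reset) rather than running a scheduled job, which may not match how the real system did it.

```python
from datetime import date


class MonthlyAllowance:
    """Tracks free article views and resets them when a new calendar month starts."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.period = None  # (year, month) of the most recent view attempt

    def _roll_over(self, today):
        period = (today.year, today.month)
        if period != self.period:  # stands in for the "beginning of month" event
            self.period = period
            self.used = 0

    def try_view(self, today):
        """Record a view and return True, or return False if the allowance is used up."""
        self._roll_over(today)
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


allowance = MonthlyAllowance(limit=2)
print(allowance.try_view(date(2013, 9, 28)))  # True
print(allowance.try_view(date(2013, 9, 29)))  # True
print(allowance.try_view(date(2013, 9, 30)))  # False, allowance used up
print(allowance.try_view(date(2013, 10, 1)))  # True, new month resets the count
```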

The "Why" or "Motivation" fundamental question

The "Why" or "Motivation" cell in the logical model contains business rules which can be implemented in a rules engine or possibly code (but we are getting ahead of ourselves). Formal specifications for business rules do exist, but we generally just try to use concise English statements in the artifact (generally a spreadsheet) for this cell.

Here is a sample of some of the business rules that we discovered by reviewing the business case and getting necessary clarifications from the product owner:

Each article in the system will be assigned one of three colours: green, yellow or red.
Only digital subscribers can view green, yellow and red articles without restriction.
Non-digital subscribers cannot view red articles.
Non-digital subscribers can view green articles without restriction.
Non-digital subscribers can view only n unique yellow articles per month, where n should be a configurable threshold.
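Rules this concise translate almost directly into code. The function below is a minimal Python sketch of the rules listed above, not the code used in the actual project. The function name and parameters are hypothetical, and treating a repeat view of an already counted yellow article as free is my own reading of the word "unique".

```python
def can_view(colour, is_digital_subscriber, yellow_seen_this_month, yellow_limit,
             already_seen=False):
    """Apply the colour rules to a single article request and return True if allowed."""
    if colour not in ("green", "yellow", "red"):
        raise ValueError(f"unknown article colour: {colour!r}")
    if is_digital_subscriber:
        return True   # digital subscribers see every colour without restriction
    if colour == "green":
        return True   # green articles are open to everyone
    if colour == "red":
        return False  # red articles are for digital subscribers only
    # Yellow: n unique articles per month, so a repeat view does not use up quota.
    return already_seen or yellow_seen_this_month < yellow_limit


print(can_view("yellow", False, yellow_seen_this_month=4, yellow_limit=5))  # True
print(can_view("yellow", False, yellow_seen_this_month=5, yellow_limit=5))  # False
print(can_view("red", False, yellow_seen_this_month=0, yellow_limit=5))     # False
```

Keeping yellow_limit as a parameter reflects the requirement that n be a configurable threshold rather than a hard-coded number.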

The Builder or Technology Model Perspective

I've always had some trouble distinguishing this perspective from the logical model perspective above it and the detailed implementation model perspective beneath it. I think part of the problem is that we often skip parts of the technology model perspective when we do actual projects because it is easier to think about something either as part of the logical model or detailed implementation. The trick I use to try and figure out this perspective is to go back to Zachman's construction (of a building) analogy. The technology perspective corresponds to the builder's perspective. A builder needs to know the materials that will be used but not necessarily the details of exactly how they will be fit together to make a building. Extending this to technology projects, the technical leads need to know what technologies will be used, for example, Java classes (possibly with methods defined) and Oracle tables, but not the exact algorithms or the data types and indexes of the Oracle database.

The "What" or "Data" fundamental question

I have a bit of a bias towards UML, so I like to use a class diagram as the artifact for this, but a detailed ERD might be more appropriate if you are a purist. When I use a class diagram, I generally only include the classes that are actually persisted (that is, saved into some sort of database or file) entities, as well as how they will be persisted (e.g. file, Oracle, Solr, Cassandra, HBase, MongoDB, etc). Since the metered paywall is part of a larger web system, the classes or entities for it can be added to the class diagram or ERD for the whole web system (assuming it exists!).

The "How" or "Function" fundamental question

Again, I like to use a class diagram as the artifact for this. However, it should include all known classes and their relationships (cardinality and inheritance), and the classes should be annotated with the language in which they will be written, a description of what the class does and the methods and attributes of the class to the extent that these can be known at this point. Again, since the metered paywall is part of a larger system, its classes would normally be added to the class diagram for that larger system.

The "Where" or "Network" fundamental question

The existing system that we are adding the metered paywall to has a somewhat complex distributed architecture, for which there is an existing network diagram. As previously discussed, the metered paywall will require a new content distribution network (Akamai), which we will need to add to the network diagram. We will need to pass information to Akamai about whether the user is a digital subscriber, and, if not, how many yellow and red articles they have so far viewed this month. We will also need to tell Akamai what type of article (green, yellow, red) they are viewing. These flows were noted on the network diagram (for the larger web system of which the metered paywall is a part), with additional annotations that encrypted cookies will be used to communicate whether the user is a digital subscriber as well as the number of yellow and red articles that they have viewed so far this month. The encryption algorithm will be triple DES and another flow will be added to the network diagram to indicate how the key is exchanged. The colour or group to which an article belongs will be sent using Akamai's edge side includes. This will also be indicated on the network diagram.
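To give a feel for what travels in the cookie, here is a small Python sketch of packing and unpacking the three values mentioned above. The field names and the JSON-plus-base64 layout are invented for this example and are not the actual cookie format. The sketch also deliberately leaves out the encryption step (triple DES in the real system), because that needs a cryptography library and proper key handling.

```python
import base64
import json


def encode_paywall_cookie(is_digital_subscriber, yellow_viewed, red_viewed):
    """Serialize the three values into a text-safe string. NOT encrypted in this sketch."""
    payload = {
        "sub": bool(is_digital_subscriber),
        "yellow": int(yellow_viewed),
        "red": int(red_viewed),
    }
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    # In the real system the bytes would be encrypted at this point,
    # before being encoded for use as a cookie value.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_paywall_cookie(value):
    """Reverse encode_paywall_cookie (the real system would decrypt first)."""
    raw = base64.urlsafe_b64decode(value.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


cookie = encode_paywall_cookie(False, yellow_viewed=3, red_viewed=0)
print(decode_paywall_cookie(cookie))
```

Without encryption a reader could simply edit the counts, which is why the encrypted cookie and the key exchange flow both appear on the network diagram.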

We also need to add a network flow between our SAP environment and the PCI environment, which will be used to serve web pages that allow a user to subscribe. We augmented this link with a note that communication will be by sending encrypted files in order to make PCI compliance easier.

The "Who" or "People" fundamental question

We used the detailed screen mockups done by the design group for this artifact. For convenience, these can be sequenced in PowerPoint presentations to show how the system works for various types of users in various scenarios. For example, PowerPoint presentations were built to show the subscription process as well as the user experience when a non-digital subscriber exceeds their monthly quota of yellow articles. I also like to include some web architecture guidelines (such as how to use jQuery, JavaScript, the need for CSS, etc) along with the mockups to round out the set of artifacts for this cell.

The "When" or "Time" fundamental question

The artifact for this cell is a spreadsheet listing the cycles of the larger content delivery system, with some information on what components will be used to implement them. After consulting with the SAP functional analysts, we decided that the digital subscriber billing cycle will be done as part of the subscription monitoring process in SAP. After consulting with the web architects, we decided that resetting the monthly quota of yellow articles for non-digital subscribers will be done as part of the Akamai edge side include logic. These decisions were noted in the spreadsheet.

The "Why" or "Motivation" fundamental question

The artifact for this cell is a spreadsheet that contains business rules along with information on how they will be implemented. We decided not to embed a rules engine in the code for the metered paywall. Once this decision was made, the only way to implement the rules was to modify code in one of four places: JSP or Java on the application server, JavaScript in the front end, or the Akamai edge side includes. We decided to implement the rules that I discussed in the system model perspective by using edge side includes and Java in the application server. It needs to be done this way because sometimes articles that expire from the Akamai cache (edge side includes work on Akamai) get fetched from the application server.

The Subcontractor or Implementation Model Perspective

The artifacts in this perspective are essentially very detailed models or descriptions of what is to be implemented. In my experience, many of these models may not actually be stored anywhere -- they might exist on a whiteboard for a short period of time, they may be rough drawings on scraps of paper or they may just be conversations between members of an (often agile) team. In some cases, I think that some of the artifacts in the implementation model perspective never leave a technologist's mind. In the metered paywall project, these models were mostly done on whiteboards or in conversations, as I mentioned above. However, I will discuss ways in which they could have been recorded (and may have been in some cases).

The "What" or "Data" fundamental question

Some of the data that was persisted in the metered paywall project was stored in Oracle, while other data elements were in Cassandra (a NoSQL database). The creation scripts for the Cassandra keyspaces and column families and the Oracle tables are actually a good artifact for this cell, in my opinion. They get preserved as part of the code deployment process, so they remain available after the project is done.

The "How" or "Function" fundamental question

As an architect, I would have preferred if the UML models developed in previous perspectives for this fundamental question had been augmented with comments and possibly pseudo code for the methods and used to produce code. In reality, there were no artifacts produced for this question as the agile methodology used by the development teams favours working code over documentation. I don't think that Zachman envisioned the effects of agile development on his framework :-).

The "Where" or "Network" fundamental question

The primary architectural issue for this question is the connection to Akamai (our content delivery network) and the information that needs to be passed via cookies and contained in Akamai's edge side includes and configuration so that only digital subscribers have unrestricted access to yellow or red articles. Because some of this work required collaboration between the Sphere's developers and Akamai, network diagrams and detailed written descriptions of cookie formats had to be produced, even though the in-house teams were using an agile methodology. In some ways, our relationship with Akamai was essentially what I believe Zachman envisioned when he created the subcontractor perspective.

The "Who" or "People" fundamental question

In the implementation model perspective, this fundamental question intuitively should be something like a specification for the user interface widgets as this follows nicely from the previous perspective (at least in my opinion). Many user interface technologies are built by responding to user events (this includes web technologies and perhaps surprisingly, SAP screens) and creating a diagram or document that shows user interface widgets and how user interface events are processed would seem to be a rational choice of artifact for this fundamental question. In an agile project, like the metered paywall described in this blog entry, it is very likely that these things are discussed but not written down (due to the agile preference for working code and face-to-face discussion over documentation).

I have read Zachman discussions that argue that security architecture artifacts should be placed in this cell. To me, this doesn't quite seem intuitively correct; however, security is important and it definitely needs to go somewhere. In the metered paywall project, we dealt with security in the logical and implementation perspectives and in the "network" fundamental question in this perspective. In systems like SAP ERP and Netweaver (which have extensive role-based security), it is possible to separate out the security configuration and include security artifacts as the answer to this fundamental question. In SAP, for example, we typically have a spreadsheet that lists users and the roles that are assigned to them and then another spreadsheet that lists roles and describes the authorization objects. In SAP this is definitely an implementation level document that is often completed only a short time before a system goes live (and sometimes not until after go-live, unfortunately). Therefore, I think that using this cell for security artifacts can make sense, but it requires a fairly sophisticated security subsystem that you can configure separately from everything else. This is not always present in web projects and wasn't part of the metered paywall system that this document describes.

The "When" or "Time" fundamental question

The artifact for this fundamental question in most representations of the Zachman framework that I have seen is a fairly low level (almost assembly language) specification of how events/periodic processing should be implemented. The metered paywall project is a combined web and SAP project and so it does not need to deal with concepts at that low a level. Earlier in our analysis, we decided that the billing cycle for digital subscribers would be tied to that of print subscribers and implemented on our SAP system. The implementation model artifacts for doing this in SAP are very well defined: an initiated change control ticket with the necessary SAP configuration objects defined. The required code changes will later be attached to this ticket and the ticket will be retained indefinitely, making it a very suitable artifact.

At the start of every month, we also need to reset to zero the count of yellow articles that non-digital subscribers have viewed, allowing them to view as many yellow articles as the threshold permits in the new month. We decided to do this by modifying java and having Akamai modify configuration and edge side include code. Again, because we involved Akamai, we produced some documents that outlined the rules in a sort of pseudo code that explained how code should be written to detect when a user counter cookie was from a previous month and then reset the count in the cookie. This pseudo code could serve as an artifact for this cell.
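The pseudo code we shared with Akamai is not reproduced here, but the month rollover check it described can be sketched roughly as follows. This is a minimal Python sketch, not the actual java or edge side include code. The cookie layout (a `YYYY-MM|count` string), the function names and the threshold of 10 are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical threshold: free yellow articles allowed per month.
YELLOW_THRESHOLD = 10


def parse_counter_cookie(value):
    """Split a hypothetical 'YYYY-MM|count' cookie into (month, count)."""
    month, _, count = value.partition("|")
    return month, int(count)


def record_yellow_view(cookie_value, today, threshold=YELLOW_THRESHOLD):
    """Return (allowed, new_cookie_value) for one yellow article request."""
    current_month = today.strftime("%Y-%m")
    try:
        month, count = parse_counter_cookie(cookie_value)
    except (AttributeError, ValueError):
        # Missing or malformed cookie: start a fresh counter.
        month, count = current_month, 0
    if month != current_month:
        # Cookie is from a previous month, so reset the count to zero.
        month, count = current_month, 0
    if count >= threshold:
        # Threshold reached: deny the view and leave the counter unchanged.
        return False, f"{month}|{count}"
    return True, f"{month}|{count + 1}"
```

A visitor who hit the limit in August would be let through again on the first request in September, because the stale month in the cookie triggers the reset before the threshold is checked.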

The "Why" or "Motivation" fundamental question

As with the "How" fundamental question, the agile development process that we use means that there were very few recorded artifacts for this fundamental question. The business rules that we identified in the previous perspectives were simply implemented by developers by modifying the necessary java, javascript, or edge side include code. It would be possible to create some pseudo code showing how the classes, javascript and edge side include code were modified and this could be the artifact for this fundamental question.

Some closing thoughts

This post was a lot longer than I thought it would be when I started. The Zachman framework can produce a lot of documentation, and I guess even trying to describe it at a high level (as I have done above) can take quite a few words. Overall, it probably does save some time by forcing you to think of things up front. However, it can be hard to justify using it, because it adds overhead to the beginning of a project and produces little in terms of demonstrable results. When starting a project, I like to at least mentally run through the perspectives and fundamental questions. Even if there is not time to properly produce each model, I find that it is a useful tool for thinking about a project.

The Zachman framework has been criticized for not really defining a process for enterprise architecture. In the next post in this series, I will talk about my attempts to take ideas from TOGAF to create a process. I also experimented a bit with lean Enterprise Architecture methods, and hope to produce a blog post on that as well.

Tuesday 17 September 2013

Using Enterprise Architecture at a Media Company (part one)

This post gives a fairly small example of how Enterprise Architecture can be used in a media company. This is based on actual work, but, as they say on TV, "the names have been changed to protect the innocent". In order to make the size of this post (and the other parts) manageable, I'm only going to take a single aspect of the Enterprise Architecture for a company which I will refer to as "the Sphere". I'm also only going to deal with it on a fairly high level, in order to make the example clearer, although hopefully it will be straightforward to see how it could be made more detailed.

The Problem

The Sphere runs a newspaper and a website, which has content roughly similar to the newspaper. People are no longer paying for the newspaper because they can read all articles on the website, which generates revenue from digital advertising. Digital advertising, however, only brings in a small percentage (15%) of the revenue necessary to support the company's costs; ads in the newspaper have traditionally brought in about 70% of the Sphere's revenue, with newspaper subscription fees bringing in the remaining 15%. The Sphere's print advertisers have realized that people aren't reading newspapers as much as they once did, and are therefore shifting their advertising dollars elsewhere, seriously impacting the largest source of revenue at the Sphere.

A metered paywall is special software code that allows website visitors to see a certain number of articles without paying, but requires visitors to pay to see additional articles.

The Sphere hopes that, by introducing a metered paywall to their website, they will encourage people to continue to buy the newspaper (thus making it more appealing to print advertisers), and get revenue from a new source: subscriptions to the website that allow visitors to see as many articles as they want.

What's an Enterprise Architect to do?

Enterprise architects need to make sure that a company's technology is aligned with its business strategy. Once the decision makers at the Sphere decide to add a metered paywall to the website, the enterprise architect must take this information and determine how to adapt the company's technology to it. It is possible that existing technical architectures may need to change and/or new components or technologies have to be architected and built. Ideally, we do this in a systematic and disciplined way.

Usually, this involves examining and manipulating (that is, changing or adding to) existing architecture "artifacts", which are usually diagrams, written documents, or models constructed using UML or some other methodology. Since the artifacts represent the company's technology, this gives the enterprise architect a way to think about what needs to be done and hopefully not forget anything. The enterprise architect can then begin discussions with business stakeholders, other enterprise architects, solution architects, development managers and developers to decide what must be done. In my opinion, an enterprise architect does not necessarily solve problems, but instead uncovers and frames them (and possibly has a recommended solution or two in mind).

It often helps if there is a sort of architectural change control process that specifies how the above happens, so that it doesn't happen in an ad hoc way every time business strategy changes.

It can be helpful to use two fairly well known approaches to deal with the artifacts and create a sort of architectural change process. The somewhat inappropriately named "Zachman Framework" is actually a taxonomy for organizing architectural artifacts, making sure they are well defined (that is, don't overlap), complete and that business requirements align with the resulting architectural designs (and the technologies that get built). The TOGAF framework is a sort of strategy for building an architectural change process, which might or might not use the Zachman Framework.

We will look at how we can use the Zachman Framework to handle the Sphere's need for a metered paywall in the next part of this series. We will look at how we can add the TOGAF framework to provide a sort of architectural change process in the third part.

Continue on to part two.

Sunday 15 September 2013

Cassandra as distributed cache?

NoSQL was born as a sort of reaction to the architectural design pattern in which you put a cache (such as Ehcache Enterprise, Redis, or memcached) in front of a relational database in order to better scale the database. One of the basic rationales for NoSQL is that, if a cache is sufficient to handle most of your database queries, then you don't really need a relational database. NoSQL then goes one step further and says that, if you can live without some of the relational database features, then you can trade them off for other useful capabilities like replication.

At the moment, the company which I work for is having trouble with the caching solution we are using in front of our relational database. I don't want to name the solution we are using, because we are not using it properly and the problems we are having are therefore more of our own making. However, we are looking at moving much of our infrastructure into an IAAS cloud solution (possibly Amazon AWS, Google Compute Engine or Rackspace). Our existing caching solution is not well suited to multi-datacentre deployment (which is probably one of the big advantages to using IAAS), so we need to look for something else.

Cassandra is really well suited to this type of cloud deployment for a number of reasons. The Cassandra data model can easily support a key-value store (we will talk more about the Cassandra data model later) and it is possible to put time-to-live (ttl) values on Cassandra columns, which means we can have cached values automatically expire. One big advantage of Cassandra over some key-value stores is that it can flexibly shard and replicate the data to multiple nodes and multiple data centres.

The multi data centre support is very useful. Cloud providers generally allow you to deploy to n data centres, where n is larger than two. You can get really good fault tolerance by dividing your infrastructure into n separate and autonomous units (that I like to call "pods"), putting each one into a separate data centre and then doing load balancing between them (most IAAS providers give you a way to do the load balancing fairly painlessly). This is a pretty powerful idea because you can potentially run on cheaper, smaller cloud instances and you don't need to effectively double your infrastructure like you often do when you deploy to two data centres. Assuming you have n pods, you can probably size your instances so that your applications can run using (n-2) pods. Assuming you can get n > 6, you will likely spend less than you would by spreading your infrastructure over 2 data centres which requires that you have enough infrastructure in each data centre to run in the absence of the other data centre.
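To make the sizing argument concrete, here is a small back-of-the-envelope calculation in Python. It rests on my own simplifying assumptions rather than on measurements: cost is proportional to total provisioned capacity, load is spread evenly, and all pods are the same size. The function name is made up for illustration.

```python
def provisioned_capacity(n_pods, tolerated_failures, peak_load=1.0):
    """Total capacity needed so that the surviving pods can carry peak_load."""
    surviving = n_pods - tolerated_failures
    if surviving < 1:
        raise ValueError("at least one pod must survive")
    # Each pod is sized to carry an equal share of the load among survivors.
    return n_pods * peak_load / surviving


# Classic two data centre setup: each one must carry the full load alone.
two_dc = provisioned_capacity(2, 1)

# Pods sized so that the application still runs on (n - 2) of them.
for n in (4, 6, 8):
    print(n, provisioned_capacity(n, 2), two_dc)
```

Under these assumptions, the pod layout needs the same total capacity as two data centres when n = 4 and needs less as n grows. At n = 6 it needs 1.5 times peak capacity instead of 2 times, while surviving the loss of two pods.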

As hinted earlier, Cassandra has the concept of data centres, and makes it easy to put at least one complete copy of your data in each. My thinking is that each pod should be configured as a single Cassandra data centre. I'm not sure whether it makes sense to have more than one copy of the cached data in each Cassandra pod, because if you have six pods, you will potentially have six copies of your data, which is plenty. Assuming there is reasonable connectivity between the pods, a Cassandra node failure will cause at least some of the data to be fetched from a different pod, which may be ok.

When cached data is updated in Cassandra, it will be replicated within a few milliseconds to the other pods. There is a risk of nodes in other pods getting stale cached data, which needs to be considered. Typically, I suspect that we will want to make user sessions somewhat sticky to the pod that they initially connect to, which should lower the risk a bit.

Another issue I can see, based on my organization's use of distributed database caches, is that we will sometimes need to invalidate a cache (remove all its entries). I can think of quite a few Cassandra data models that would allow you to invalidate a particular cache, but perhaps it is simpler if we keep each cache in its own column family. We could then drop and recreate the column family to clear or invalidate the cache. I guess we could also just truncate the column family, but my experience with the nodetool truncate command is that it does not work really well on multi-node clusters (it works pretty well on single-node clusters, but I am sure most people don't have those in production).

Most distributed caches also allow you to place an upper limit on the number of items in a cache. This is generally done to conserve memory. In Cassandra, the cache can spill to disk, so memory is less of a concern. However, it might still be desirable to have a limit on the cache size. One way to do this is to have a row (called an "all_keys" row, probably using the row key "all_keys") in each cache's column family whose column keys are a time stamp (representing cache insertion time) concatenated with the cache key for each entry in the cache. These columns would have the same time to live (ttl) as the cached data. We could also define a counter column in each cache's column family which would keep track of the current number of elements in the cache. When this counter exceeds a certain value, we could have a daemon delete the oldest entries from the cache's column family. These could be determined by doing a column slice on the all_keys row. Having the "all_keys" row would allow us to invalidate the cache by doing a column slice to get all the cache keys and then deleting all the rows, instead of dropping and recreating the column family.
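The bookkeeping described above can be sketched in a few lines of Python. This is an in-memory stand-in only, with plain dictionaries playing the role of the cache's column family and its "all_keys" row; it makes no Cassandra calls. The class and method names are hypothetical. The sketch also makes two simplifications of my own: the length of the index stands in for the counter column, and a key that is written again has its old index column removed so that each key is indexed once.

```python
import time


class BoundedCache:
    """In-memory stand-in for one cache column family plus its all_keys row."""

    def __init__(self, max_entries, ttl_seconds, clock=time.time):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.rows = {}      # cache key -> (value, expiry time, insertion time)
        self.all_keys = {}  # (insertion time, cache key) -> expiry time

    def put(self, key, value):
        now = self.clock()
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index column so each key is indexed once.
            self.all_keys.pop((old[2], key), None)
        expires = now + self.ttl_seconds
        self.rows[key] = (value, expires, now)
        # The index column gets the same ttl as the cached data.
        self.all_keys[(now, key)] = expires

    def get(self, key):
        entry = self.rows.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # missing, or past its ttl
        return entry[0]

    def _purge_expired(self):
        """Mimic ttl expiry: drop rows and index columns past their ttl."""
        now = self.clock()
        for column in [c for c, exp in self.all_keys.items() if exp <= now]:
            del self.all_keys[column]
            self.rows.pop(column[1], None)

    def evict_oldest(self):
        """What the daemon would do; returns the number of entries removed."""
        self._purge_expired()
        index = sorted(self.all_keys)  # column slice, oldest first
        excess = max(len(index) - self.max_entries, 0)
        for column in index[:excess]:
            del self.all_keys[column]
            self.rows.pop(column[1], None)
        return excess

    def invalidate(self):
        """Clear the cache by walking all_keys instead of dropping the table."""
        for column in list(self.all_keys):
            del self.all_keys[column]
            self.rows.pop(column[1], None)
```

Because the index keys start with the insertion time, sorting them gives the oldest entries first, which is the property the column slice on the "all_keys" row relies on. In a real cluster, the read before write in `put` and the non-atomic counter would need more thought than this sketch gives them.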