This is the book you need if you wish to develop an application for the quickly evolving, complex rules of some enterprise. Though it has some outdated parts, probably every serious programmer should read it. The book is about patterns for building enterprise applications: those ordered by specific users, intended to run for a long time, with development time as the main constraint. The authors compiled many years of wisdom from top professional developers. They need no introduction, I guess (esp. Martin Fowler).
The book is large and divided into two parts. The first gives a tutorial on applying the patterns. The second is a reference.
Below are my very short notes on some key chapters.
The authors advise iterative development: deliver to the end user as soon as you have something.
Architecture definition (see Key words).
How EA differs from other kinds of software: complex data and business rules; but it is easier in that there is no complex multi-threading or hardware integration.
EA is characterized by persistent data, a lot of data (usually in relational DBs), concurrent data access, many UI screens, integration with other EAs, conceptual dissonance, and complex business "illogic". An EA can also be relatively small (though still important).
There are different kinds. Choose appropriate architecture for each, knowing the differences. Also choose appropriate tools.
Example 1. A B2C retail web site. A highly loaded application with many users. It should be scalable (by adding hardware) and perform well.
Example 2. Complex leasing system. Complex and arbitrary business logic.
Example 3. An expense tracking system. Simple logic, a small number of users, but it should evolve quickly.
Consider getting application up and running before dealing with performance issues.
Apply performance architecture tips carefully.
Ensure optimizations actually improve performance and don't just make the code harder to understand: test before and after.
Retest all performance optimizations after each environment (re)configuration.
Usually it is better to build for scalability, as adding a resource is cheaper than optimizing the system: it is cheaper to buy a new server than to hire a new programmer.
Understand the problems that patterns solve, so that when you come across one, a pattern can be found. Know what problem a pattern solves and how it does it.
Tweak a pattern so that it is better suited for your solution. Don’t apply it blindly.
Many patterns are connected to each other.
Use patterns to communicate ideas.
Structure of the patterns. … Be careful with the examples: they are not macros to paste in as-is.
Know that these are not all the patterns for EA.
Have fun applying these patterns.
The most common pattern. Layers are opaque: layer 3 does not know about layer 1, and layer 1 does not know about layer 2. A lower layer knows nothing about a higher layer. (This is not always so; see the three principal layers below.)
Client-server — two layered system.
All layers can run on one machine.
Presentation layer — handles interaction between the user and the software. Responsible for handling input and interpreting it for the domain and data source layers.
Data source — any other kind of software: DBs, transaction services, etc.
Domain logic — the logic particular to the problem the software solves: calculations, validations, etc.
Always try to separate into these three layers, at every scale from subroutines up to packages.
This can be thought of as a core surrounded by interfaces. Data source is quite similar to presentation in this respect, but Fowler separates the two.
Domain and data source should never depend on presentation.
If you cannot tell where the domain logic is, try introducing another presentation package, say a command-line client next to the Web one: whatever would have to be duplicated is pointing at domain logic. The same test works for replacing the database with XML.
Example of domain logic mixed into presentation: colouring a product red if its sales grew by 10%. This rule was put into the interface.
(Is this obsolete? Now it is mostly Web and mobile.)
Three patterns here: transaction script, table module, domain model.
Transaction Script is a procedure per business action. It can call subroutines. Useful for simple applications; for more complicated ones a Domain Model is to be used.
Try to use a Domain Model everywhere if you're used to it.
Table Module is in the middle between Domain Model and Transaction Script. It organizes its logic around a Record Set, a table in a relational database. Handling complex business logic gets complicated in this case, but it works easily with GUIs (MS .NET).
Use a Domain Model for complex logic. Use Transaction Script (or Table Module) for simple, non-evolving logic.
Ask an expert for advice on which to use for your problem.
Consider Table Module if many of your tools support Record Set.
Consider mixing patterns for the domain logic.
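A minimal sketch of the contrast in Python (my own hypothetical invoice example, not from the book): Transaction Script keeps the whole action in one procedure, while the Domain Model puts behavior on the objects.

```python
def calculate_invoice_total(items, tax_rate):
    """Transaction Script: the whole business action in one procedure."""
    subtotal = sum(price * qty for price, qty in items)
    return subtotal * (1 + tax_rate)

class LineItem:
    def __init__(self, price, qty):
        self.price, self.qty = price, qty
    def total(self):
        return self.price * self.qty

class Invoice:
    """Domain Model: the same rule, but behavior lives on the objects."""
    def __init__(self, tax_rate):
        self.tax_rate, self.items = tax_rate, []
    def add(self, item):
        self.items.append(item)
    def total(self):
        return sum(i.total() for i in self.items) * (1 + self.tax_rate)
```

For one rule the script is shorter; the object version pays off once many rules start interacting.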
It is like an API for the application. It sits on top of a Domain Model or Table Module. Use it for security and transactions (rollback, commit, etc.). It can be as simple as a Facade, or complex enough that almost all business logic lives there and the Domain Model stays simple.
Some controller-entity approach(?) sits in the middle between those extremes.
Consider not using a separate architectural Service Layer at first, and introducing it when the app needs it, though there are exceptions.
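A sketch of a thin Service Layer in Python (the `Transaction` stand-in and the order example are my own assumptions): the service only sets the transaction boundary and delegates the actual rule to the domain object.

```python
class Transaction:
    """Stand-in for a real transaction manager (assumption for this sketch)."""
    def __init__(self):
        self.committed = False
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        # commit on success, roll back (committed stays False) on error
        self.committed = exc_type is None
        return False

class Order:
    def __init__(self):
        self.lines = []
    def add_line(self, sku, qty):
        if qty <= 0:
            raise ValueError("quantity must be positive")  # domain rule
        self.lines.append((sku, qty))

class OrderService:
    """Thin Service Layer: transaction boundary here, logic in the domain."""
    def __init__(self):
        self.last_tx = None
    def add_line(self, order, sku, qty):
        self.last_tx = Transaction()
        with self.last_tx:
            order.add_line(sku, qty)
```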
Make the mapping choice as early as possible, as it is hard to change later. (This seems less true now with powerful O/R mapping tools.)
Separate the DB access means (SQL) from the domain logic. It is better to use the application development language than SQL: 1) many developers don't know SQL well; 2) each SQL implementation has its specifics. At the DB level, though, it is better to use SQL so that DB administrators can tweak it.
- Gateway. An object holding all the code for database interaction (SQL).
- Row Data Gateway. A class per table row. For a simple Domain Model.
- Table Data Gateway. To be used with the Record Set pattern. A class per whole table in the DB. Can wrap stored procedures. Can also be used with a Domain Model.
- Active Record. A simple pattern: the object holds domain logic and knows how to persist itself to the DB.
- Data Mapper. A separate object that maps a Domain Model to the DB, so the Domain Model can be tested and evolved separately. It knows how to map and handles complex mapping scenarios. For complex solutions.
Don't mix these patterns too much.
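A toy Active Record in Python with an in-memory SQLite table (the `Person` class and schema are my own illustration): the object carries its data and knows how to insert and find itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

class Person:
    """Active Record: data, a bit of domain logic, and persistence in one class."""
    def __init__(self, name, id_=None):
        self.id, self.name = id_, name

    def insert(self):
        cur = conn.execute("INSERT INTO person (name) VALUES (?)", (self.name,))
        self.id = cur.lastrowid  # pick up the DB-generated key
        return self.id

    @staticmethod
    def find(id_):
        row = conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (id_,)).fetchone()
        return Person(row[1], row[0]) if row else None
```

The coupling of logic and SQL in one class is exactly why this pattern suits only schemas close to the Domain Model.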
Object-oriented databases are a risk for a business project; at the time of writing they are not well developed.
Consider buying an O/R tool for a Domain Model, as developing a new one is a complex task.
Consider not using an O/R tool for simple solutions.
This is how you load and save objects to the database. The first sub-problem: some objects may need the IDs of newly created objects, and we need to keep track of what was modified in order to persist it (change tracking). The other is concurrency: an object being read should be locked so that other processes don't modify it.
Unit of Work solves the two problems above. It controls the mapping to the DB (a controller).
Identity Map is a way to keep rows loaded from the DB so that no row is loaded twice. It can also cache them (a secondary aim).
Lazy Load aims to load related objects only when the process needs them, so the whole object graph is not loaded from the DB at once.
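Minimal sketches of the first two patterns in Python (the shapes are my own; a real Unit of Work would also generate SQL and handle deletes):

```python
class IdentityMap:
    """Identity Map: each ID goes through 'loader' at most once,
    so one DB row never becomes two in-memory objects."""
    def __init__(self, loader):
        self._loader, self._map = loader, {}
    def get(self, id_):
        if id_ not in self._map:
            self._map[id_] = self._loader(id_)
        return self._map[id_]

class UnitOfWork:
    """Unit of Work: records what is new or modified during a business
    action; commit() hands the whole batch over in one go."""
    def __init__(self):
        self._new, self._dirty = [], []
    def register_new(self, obj):
        self._new.append(obj)
    def register_dirty(self, obj):
        if obj not in self._dirty:
            self._dirty.append(obj)
    def commit(self):
        work = ([("insert", obj) for obj in self._new]
                + [("update", obj) for obj in self._dirty])
        self._new, self._dirty = [], []
        return work
```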
On ways of querying the database: complex query methods, or "finders".
Performance problems: never issue one query per row to get a set of rows from a table; fetch the set in a single query.
(and other advice)
Problems. In OO, relations are references (managed or not); in a relational DB they are foreign keys. This is the representation problem.
Also, in OO one field can hold multiple references (an array); a relational DB cannot have a multivalued field. This is the reversal problem.
The following patterns are used as solutions.
Identity Map — a lookup table from ID to objects; the object keeps the ID of its DB row.
Foreign Key Mapping — getting referenced objects; the same principle as Identity Map, but for the referenced object.
Association Table Mapping — handles many-to-many relationships.
Consider not using ordered collections (lists, arrays) in mapping, since loading from the DB requires ordering, and so does persisting.
Small objects (money, date) and Value Objects are better mapped as Embedded Values: one or more fields in the parent table (the table that owns the value).
Consider a Serialized LOB (Large Object) for storing highly interconnected objects, so that many round trips to the database are avoided.
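An Embedded Value sketch in Python (a hypothetical `Money` on a product row): the value object becomes two columns of the owning table instead of a table of its own.

```python
class Money:
    """Value Object: no identity of its own, just amount and currency."""
    def __init__(self, amount, currency):
        self.amount, self.currency = amount, currency

def product_to_row(name, price):
    """Embedded Value: Money flattens into two columns of the product row."""
    return {"name": name,
            "price_amount": price.amount,
            "price_currency": price.currency}

def row_to_product(row):
    """Reassemble the Money object when the row is loaded."""
    return row["name"], Money(row["price_amount"], row["price_currency"])
```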
Three patterns here: Single Table Inheritance (the whole hierarchy in one table; bad when many columns stay empty), Concrete Table Inheritance (a table per concrete class; bad when the base class is modified), Class Table Inheritance (a table per class; the simplest, but bad for performance as it requires many joins).
Begin with Single Table Inheritance. Mix if needed.
It is possible to use the three patterns for multiple inheritance.
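A Single Table Inheritance sketch in Python with SQLite (the Footballer/Cricketer hierarchy is a toy assumption): one table for the whole hierarchy, a `type` discriminator column, and NULLs in the columns a subclass doesn't use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player ("
             "id INTEGER PRIMARY KEY, type TEXT, name TEXT, "
             "club TEXT, batting_avg REAL)")  # each row fills only its columns

class Footballer:
    def __init__(self, name, club):
        self.name, self.club = name, club

class Cricketer:
    def __init__(self, name, batting_avg):
        self.name, self.batting_avg = name, batting_avg

def save(p):
    if isinstance(p, Footballer):
        conn.execute("INSERT INTO player (type, name, club) "
                     "VALUES ('footballer', ?, ?)", (p.name, p.club))
    else:
        conn.execute("INSERT INTO player (type, name, batting_avg) "
                     "VALUES ('cricketer', ?, ?)", (p.name, p.batting_avg))

def load_all():
    rows = conn.execute(
        "SELECT type, name, club, batting_avg FROM player ORDER BY id")
    # the discriminator decides which class to instantiate
    return [Footballer(name, club) if t == "footballer" else Cricketer(name, avg)
            for t, name, club, avg in rows]
```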
The simplest way: design the DB schema and a simple domain, then choose Transaction Script or Table Module.
With a Domain Model, be sure not to design it the way the DB schema is designed; the DB schema is only for persistence. If the DB schema is close to the Domain Model, consider Active Record.
Iterate the DB schema together with the domain (don't let it lag more than six weeks behind [?]), so that performance issues can be handled as early as possible.
The same advice: in the simple case choose Row Data Gateway or Table Data Gateway; in the complex case build up the Domain Model and then map it to the DB.
(Advice for the case of two or more data sources (DB and messages [?]): a Gateway can help.)
On using Metadata Mapping, a reference file where all the mapping is kept.
(From that you can derive the read and write code, automatically generate ad hoc joins, do all of the SQL, enforce the multiplicity of relationships, and even do fancy things like computing write orders under referential integrity. This is why commercial O/R mapping tools tend to use metadata.)
A Query Object (cf. LINQ) can build SQL based on the Metadata Mapping; no need to know SQL.
A Repository completely hides the DB by using Query Objects.
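A toy Query Object in Python (my own minimal shape, nothing like a full LINQ): criteria objects compile into parameterized SQL, so calling code never writes SQL itself.

```python
class Criterion:
    """One condition; compiles to a parameterized SQL fragment."""
    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value
    def fragment(self):
        return f"{self.field} {self.op} ?"

class Query:
    """Query Object: collects criteria and emits SQL plus parameters."""
    def __init__(self, table):
        self.table, self.criteria = table, []
    def where(self, field, op, value):
        self.criteria.append(Criterion(field, op, value))
        return self  # allow chaining
    def sql(self):
        if not self.criteria:
            return f"SELECT * FROM {self.table}", []
        clauses = " AND ".join(c.fragment() for c in self.criteria)
        params = [c.value for c in self.criteria]
        return f"SELECT * FROM {self.table} WHERE {clauses}", params
```

A Repository would then take `Query` objects and return domain objects, keeping even this SQL out of sight.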
(Introduction to database connection object.)
Use connection pool if available.
Close = returning a connection to the pool. Open = getting one from the pool.
Close connection as soon as you are done.
Common practice: have a connection object per DB command. Pass it to the command and then close it.
To pass the connection, use a Registry that is thread-scoped and thread-safe, so the connection is not tangled into parameter lists.
Closing a connection: the first option is to close it explicitly, but then we need to remember to do it (in C#, `using` solves this?). The second is to close via garbage collection, but then connections stay alive for some time.
A good choice is to tie the transaction to the connection.
Pooling should allow reading immutable data from within an opened connection using another connection.
Remember the concurrency problem when data is read and written in separate transactions (see the concurrency part of the book). (There is ALWAYS a problem with `select *`; there are some problems with indices that I didn't understand.)
(Something about testing reads/writes to handle the indices problem.)
A good choice is not to use dynamic SQL (string concatenation, etc.).
Do batch querying if allowed.
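A sketch of the "close = return to pool" idea in Python (the toy pool is my own assumption): a context manager hands a connection out and puts it back automatically, so forgetting to close is impossible.

```python
import sqlite3
from contextlib import contextmanager

class Pool:
    """Toy connection pool: one pre-opened connection; 'closing' returns it."""
    def __init__(self):
        self._free = [sqlite3.connect(":memory:")]
        self.in_use = 0

    @contextmanager
    def connection(self):
        conn = self._free.pop()        # "open" = take from the pool
        self.in_use += 1
        try:
            yield conn
        finally:
            self.in_use -= 1
            self._free.append(conn)    # "close" = give back to the pool
```

This is the same shape as C#'s `using` or Python's `with`: release is guaranteed even when the command throws.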
(Seems mostly outdated. History and reasons behind web interface usage.)
The URL is passed to a web server, which hands processing over to a program. Two main forms of structuring the program: 1) a script; 2) a server page. The response is a string and has to be written to some stream, which is awkward; so for the response a server page is better: ASP, JSP, PHP.
(Reasons for dividing the processing of request and response between a script and a server page, respectively. This connects to the MVC pattern.)
Model View Controller. (Description of how MVC works.) (Does it pass processing back to the web server after the controller is done? Why?) The main reason behind MVC is to separate the Model from the View: the ability to add views and change the model.
The other controller is the Application Controller, which directs what screens should appear after which. It mediates between the presentation and model layers and can be reused across different presentations. Consider using it when the machine, not the user, should control the screen flow (a cash withdrawal?).
(No reasons behind the three patterns so far.)
Main three patterns here: Transform View, Template View, Two Step View.
Choose between Transform View and Template View, then decide whether Two Step View applies on top of the chosen one.
(Transform View transforms the domain data into HTML, handling domain elements one by one to form the HTML?)
(Template View feeds the domain data into a template, i.e. HTML with special placeholders, and produces the HTML.)
Two Step View — …
Template Views like ASP, JSP, and PHP tempt you to put whole programming statements into the view. Consider extracting those statements into helper classes (ME: ASP.NET code-behind?) so they don't clutter the view.
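A Template View in miniature, using Python's stdlib `string.Template` as the placeholder syntax (my own illustration): the template holds the markup, the model fills the holes.

```python
from string import Template

def render(template, model):
    """Template View: substitute domain values into placeholder slots.
    Keeping 'render' dumb pushes any real logic into helper code."""
    return Template(template).substitute(model)
```

Usage: `render("<h1>Hello, $name!</h1>", {"name": "Ann"})` yields the finished HTML string.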
Transform View: something like XSLT can be used to transform XML into the view.
One-step view — one view per screen. Two Step View introduces a second stage to produce the view, using a logical view object; this is useful for screens that share a similar design.
(ASP.NET MVC — Template View + Two Step View. Umbraco uses Transform View.)
Two patterns for input controller: Page Controller and Front Controller.
Page Controller — one controller per page (ASP.NET WebForms?). It can be a server page as well as a controller, i.e. it can form the output itself, or this can be split. At some point we may find a Page Controller handling every action on the page (a button click or such) and serving not a single page but many.
Front Controller continues the separation of request handling from processing: there is only one controller, which decides who is responsible for each request.
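A minimal Front Controller in Python (the route table is my own illustration): a single handler inspects every request and dispatches to whoever is responsible.

```python
class FrontController:
    """One entry point for all requests; commands are registered per path."""
    def __init__(self):
        self.routes = {}

    def route(self, path, handler):
        self.routes[path] = handler

    def handle(self, path):
        handler = self.routes.get(path)
        # the single controller also owns cross-cutting concerns
        # (auth, logging, error pages) in one place
        return handler() if handler else "404 Not Found"
```

A Page Controller, by contrast, would be one such `handle` per page, with the cross-cutting parts duplicated or inherited.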
(What’s difference between Front controller and Application controller?)
This problem arises whenever two computations (OS processes or threads) share the same data.
Concurrency is a hard problem: there are many situations nobody thought of, and it can hardly be tested automatically.
Many enterprise applications solve the problem with transactions. But transactions don't solve everything: when data spans multiple transactions, the offline concurrency problem appears.
Application server concurrency is the problem of multiple threads dealing with the same data on the server when scaling an application up or out.
This chapter introduces to issues of the concurrency on the examples of source code control systems.
Lost update — an update is lost because it is overwritten by one that took longer (started before it and finished after): Read1 - Read2 - Change2 - Change1 (overwrites Change2).
Inconsistent read — reading parts of a system while someone modifies them: 1 reads A - 2 changes A and B - 1 reads B (now 1 holds inconsistent data: the old A with the new B).
Trade-off: concurrency (liveness) vs. correctness. These problems break the correctness of the system because two users work with the same data at the same time (concurrency). We can solve this by denying concurrency, increasing correctness but reducing liveness.
Consider whether we can simply tolerate the concurrency issues, so that no new problems arise.
The most essential ones are Request and Session.
The client sends a request; the server processes it and sends back a response. The client can send other requests while the server processes the first one.
A session is a long-running series of requests: log in, queries, a business transaction, …, log out.
Thread — an execution within a process; threads share data (unless isolated) and are lightweight. Process — heavyweight, with isolation and a big execution context.
Thoughts on handling requests. Ideally one process per request, but that takes many resources. The tendency is to associate one process with one request at a time but not to kill the process after the request is done. (In ASP.NET: one thread per request.)
Transaction — requests that should be treated as one request (see below). A system transaction serves the application's needs (e.g. from the app to the DB). A business transaction serves the user's needs: from the user to the application.
Two solutions: isolation and immutability.
Isolation — protect a thing from being written by others, e.g. file locks, or per-process memory in an OS.
Don't use methods that spread concurrency problems. Create isolation areas and do as much of the programming there as possible.
Identify immutable data and share it widely, so concurrency mechanisms don't take processing time or produce errors.
Pessimistic concurrency: a shared resource is locked by the first one to read it. Optimistic: changes cannot be applied to an already-modified resource (like in Git, where a push onto changed history needs a merge).
Both kinds of lock are about conflict prevention: pessimistic prevents it on read, optimistic on write.
Choosing between them: consider how many conflicts there are (if few, optimistic), and how important the data is (if losing a change is tolerable, optimistic, since automatic merging is almost always impossible).
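An Optimistic Offline Lock sketch in Python (the version-column scheme is the standard implementation; the toy in-memory store is my own): a write carrying a stale version number is rejected instead of silently overwriting.

```python
class VersionConflict(Exception):
    pass

class Store:
    """Toy record store for an Optimistic Offline Lock: rows carry a version."""
    def __init__(self):
        self.rows = {}  # id -> (version, data)

    def read(self, id_):
        return self.rows[id_]  # caller remembers the version it saw

    def write(self, id_, seen_version, data):
        version, _ = self.rows[id_]
        if version != seen_version:
            # someone committed after our read: reject instead of overwriting
            raise VersionConflict(
                f"row {id_}: saw v{seen_version}, now v{version}")
        self.rows[id_] = (version + 1, data)
```

Note how a lost update from the previous section becomes impossible: the slower writer's stale version is detected on commit.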
Deadlock — each of two (or more) processes needs to edit data locked by the other.
A transaction is a bounded set of actions (withdrawing cash from an ATM, buying a beer, etc.). It starts and ends with resources in a consistent state. It should be all-or-nothing.
Properties of a transaction — ACID: atomicity (all or nothing); consistency (it starts from a consistent state and leaves resources consistent at the end); isolation (its results are visible to other transactions only after commit); durability (a committed transaction must survive any crash).
Transactional Resources or “database”
This is any resource that uses transactions to control concurrency: a printer, an ATM, a message queue. All of these can be called a "database".
Try not to create long transactions, i.e. ones that span multiple requests. Instead use request transactions: one or more transactions per request.
Try a late transaction (open it as late as possible), which helps with long transactions but does not help against inconsistent reads.
Avoid global locking, as with some "object" table in the DB.
From most isolated to the least:
Serializable — the strongest isolation: transactions execute as if put into a series. It guarantees a correct answer: a transaction is ordered before or after another, so the answer can differ from run to run, but it is always correct. The transaction holds locks on the whole range it touched (if it once queried a table, it holds a range lock on the whole table, hence no phantoms).
Repeatable read — allows the phantom anomaly (rows appearing in a repeated range query that weren't there before). The transaction holds read locks on the rows it has read, so another transaction can still insert a new row, hence phantoms.
Read committed — allows the non-repeatable read anomaly. The transaction releases its read lock right after the read.
Read uncommitted — allows dirty reads from uncommitted transactions. The transaction takes no read or write locks; it reads everything.
System transaction (ST) — similar to DB transaction.
Business transaction (BT) — meaningful to a user; e.g. in a bank, the user opens a bill, checks it, and pays it or decides not to.
A BT is usually bigger than an ST. A long transaction covering a whole BT (BT = ST) is not practical, as it leads to bottlenecks in the DB. So a BT is broken into a number of STs, and hence the ACID properties of the BT suffer. This is called offline concurrency.
Atomicity and durability of a BT are easily achieved by committing at the end. Consistency is not, since isolation is sometimes broken; thus problems like inconsistent reads and invalid updates appear.
Usually multiple BTs exist within one user session. There can be many sessions per BT, but that is not advisable.
It is strongly advisable to match BT to ST where possible, at the cost of scalability. The patterns below are for cases where that cannot be done.
Optimistic Offline Lock — lock on commit.
Pessimistic Offline Lock — lock on read (harder to implement?).
Also there are lock-management patterns: Coarse-Grained Lock and Implicit Lock.
(Not important now as there are good thread-per-request frameworks?)
About handling concurrency on the server for user requests, not the business transaction world.
“Avoid the need for explicit handling of synchronization and locks as much as possible.”
- process-per-request: too many resources, but concrete isolation;
- (AK: a middle way — thread-per-request with isolation between threads (ASP.NET, Node.js, etc.?))
- thread-per-request: no isolation between threads?
The author advises process-per-request for an inexperienced team. (AK: still valid now? ASP.NET? etc.)
Create objects per request; this improves scalability (ASP.NET). Avoid global objects. Avoid Singletons. Use a Registry.
STATE — data that is not yet ready to be committed (persisted) to the DB.
STATELESS SERVER — no state is stored between requests.
A stateless server is useful as it does not require many resources. (And more discussion of this and other points.)
A stateless server suits many idle users whose transactions require state. [?]
State is required, as many transactions are stateful.
Session state — something not persisted; one state per business transaction; the state is not shared with other transactions.
Session state can at times be inconsistent or invalid; compare with a transaction, which is consistent. The state must be consistent at the end.
Note the difference between cached data and state (e.g. a person's policy is state; their zip code is not).
**Ways to Store Session State**
- Client Session State (CSS) — objects are mapped to some storage on the client (URL, cookie, or something else).
- Server Session State (SSS) — objects are kept in memory, serialized to a DB or to the file system.
- Database Session State (DSS) — objects are mapped to tables and fields (much like usual persistence).
CSS drawbacks:
- not stored on the server;
- if large, it takes time to transmit;
- transmits data not used for presentation;
- vulnerable in security terms.
DSS drawback: state data needs to be isolated from record data.
SSS drawback: the state needs to be moved around if clustering is used.
SERVER AFFINITY — only one machine handles a given session in the Server Session State pattern.
DSS can be used with many idle clients and little state (a retail system). SSS suits large state and not many users (insurance).
CSS suits many users who don't say when they are cancelling the session (B2C).
DSS survives connection and server crashes. CSS may survive them. SSS cannot.
Development effort: easiest — SSS; hardest can be CSS; DSS is in the middle.
Fowler prefers SSS, sometimes CSS. He doesn't like DSS.
Distribution of the Objects. Pitfalls of it.
Objects are put on different nodes (processes or machines) in order to gain scalability and performance, but this fails, the author says. DISTRIBUTION OF OBJECTS is a way to make them location-transparent and let them communicate with each other.
The pitfall of distribution is that it makes objects harder to access: a LOCAL INTERFACE (an in-process procedure call) is cheap to call compared to a REMOTE INTERFACE (a procedure on another machine, say).
COARSE-GRAINED interfaces are for remote use, FINE-GRAINED interfaces for local use: for a remote call you aim to do a lot in a single call.
- many network calls;
- hard to maintain.
Don't distribute objects. Use CLUSTERING (several copies of the same application).
Build clusters and eliminate as much distribution as possible. Where distribution will remain:
- Client-server divide;
- Database and application server;
- Web server and Application Server;
- Vendor differences;
- Possible spread of application server (Avoid! Sell your grandma not to do it)
You should still consider seriously three technical practices: continuous integration [Fowler CI], test driven development [Beck TDD], and refactoring [Fowler Refactoring].
This chapter repeats the previous ones.
Think for yourself. The author doesn't know your project or your problem.
Choose from the three: Table Module, Transaction Script, Domain Model.
Consider these factors:
- Complexity of domain logic.
- Difficulty of the connection with a DB.
Transaction Script fits simple logic well, e.g. a catalogue with a shopping basket. It is intuitive.
Domain Model is for complex and big solutions. To learn it well you should learn OOP. The second difficulty is the need for O/R mapping.
Table Module stands in the middle of the two. It maps to the DB out of the box.
Keep the separation for Transaction Script even though it seems well suited to contain DB logic. Here choose between Row Data Gateway and Table Data Gateway (DataAdapter).
Use with Table Data Gateway and Record Set.
For the simple case — Active Record, or, if decoupling is desired, Table Data Gateway or Row Data Gateway.
For more complex cases — an O/R Mapper. (Now, 15.08.2014 Fri 13:21, it is widely used: EF and others.)
Choose between a rich client and an HTML browser. (Not a live question now, 15.08.2014 Fri 13:23.)
For an HTML browser, choose MVC, then choose patterns for controllers and views. Complex navigation and UI lead to a Front Controller.
(Unclear about choosing between views.)
Consider running the lower layers in the same process. If that is not possible, wrap everything in Data Transfer Objects and send them to a Remote Facade.
(Skipping some Java and .NET)
- Tied to a vendor;
- Runs in another process and requires remote calls (except for Oracle);
- Runs faster; use for optimization.
Construct the system as if there were no web services, then use them as Remote Facades. Don't use web services to build a distributed-objects design.
(Comparison to other layerings.)
(see the book)
Architecture — a subjective term pointing at something important: some division into parts, or decisions that must be made early because they are hard to change.
Business logic — many irrational things that change over time.
Capacity — the maximum load a system can bear (or the point of reaching that maximum).
Conceptual dissonance — different understandings of the same terms by different users.
Concurrent — happening at the same time; competing.
Concurrent access — simultaneous access by two or more users to some data.
Efficiency — performance divided by hardware. E.g. tps / CPU number.
Enterprise application (EA) - …
Latency — the minimum time required to get any response at all (depends largely on the wire).
Layer — a part of a system that has a link to only one other layer.
Load sensitivity (degradation) — how quickly a system degrades under different loads.
Load — the current stress a system is under, e.g. 0.5 seconds for 10 users.
Mouthful — a word or phrase that is hard to pronounce.
Pattern — a solution to a repetitive problem, a description of this problem and solution.
Performance — is either response time or throughput.
Persistent data — data that persists for a long time, probably longer than the software that created it.
Relational database (RDB) —
Response time — how long it takes to process any request and send response back.
Responsiveness — how long it takes to acknowledge a request (less or equal to response time).
Scalability — how much adding new resources affects throughput. Scaling vertically (up) — adding resources to one server. Scaling horizontally (out) — adding more servers.
Table (general meaning) — a term covering table/view/query/stored procedure (which encapsulates a query): tabular data.
Throughput — how much work can be done per unit of time, e.g. bytes per second or transactions per second (TPS) (for consistent transactions).
Tier — physical part of a system resembling a layer.