Splitting User Stories. Part 1: Incremental Development of Complex Algorithms.

By Alex Yakyma

Splitting user stories into smaller stories is an essential skill for every agile team. Good general approaches to it are described in a post by Bill Wake, and an overview of reasons for splitting stories is given by Bob Hartman. Splitting stories is also outlined as part of a bigger discipline of creating quality user stories in chapter 6 of Dean Leffingwell's book Agile Software Requirements. To express it in one sentence: an agile team cannot be successful without this skill. If stories are too big, the outcome of an iteration carries high risk and high variability; and if they are so big that they don't even fit into an iteration and we can't do anything about it, then this is not incremental development of user value at all, and we are back to spawning large amounts of uncontrollable work in progress that inevitably leads to mediocre results.

This post is the first in a series on splitting user stories. It is an attempt to provide a deeper analysis of how to ensure fine granularity in delivering user value. Below is the full list of areas we will cover in this blog, divided into “classes” of stories:

Splitting User Stories. Part 1: Incremental Development of Complex Algorithms.

Splitting User Stories. Part 2: Evolving Rich UI.

Splitting User Stories. Part 3: Incremental Architecture.

Splitting User Stories. Part 4: Simple Steps in Implementing Complex Scenarios.

Splitting User Stories. Part 5: Evolving APIs, Protocols, Interfaces.

Before we dive deep into our first class of user stories, it is worth emphasizing the importance of this skill, especially for large-scale development:

Splitting user stories is a necessary skill for every agile team. But it becomes critical at large enterprise scale, where only small increments of user value, delivered by each individual agile team, give the whole program a chance at continuous synchronization.

This post describes 14 ways to split stories that involve algorithmic complexity or require processing large amounts of data.

The methods are grouped into two sub-groups: 1) splitting stories by processing type, and 2) splitting stories by data structure. Each method includes a note on whether splitting stories this way affects story functionality (usage scenarios), system qualities (usually performance), or both. Examples should help illustrate how each method works in practice. Each method is briefly referenced in the format: First Cut – Second Cut.

Before we start considering these methods of splitting stories, it is important to stress that this is not an attempt to build an exhaustive catalog, but rather to give the reader a few insights that can be a helpful starting point for teams in their real project environments.

Group 1: By processing type.

1. In-memory processing – Disk-based processing (Affects: functionality and system qualities).
This approach can be applied effectively when we need to process a relatively large amount of data and this involves saving some intermediate or final data to disk for further reference or processing. Instead of taking such a big story all at once, our first step would be in-memory processing only, performing just the basic function. Example:

Story: As an accountant I can upload all quarter invoices so I can see totals for a quarter and use it for reporting purposes.

First cut: As an accountant I can upload all quarter invoices so I can see totals.

Second cut: As an accountant I can upload all quarter invoices so I can use it for reporting purposes in the future.

Obviously Cut 1 assumes that we simply calculate all the necessary totals in memory and show them to the user, while Cut 2 assumes saving different kinds of data (both during and after processing) to disk.
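A minimal sketch of the two cuts may clarify the split. All names here (`quarter_total`, the invoice dictionaries, the file format) are illustrative assumptions, not part of the story itself:

```python
import json

def quarter_total(invoices):
    """Cut 1: sum invoice amounts purely in memory and return the total."""
    return sum(inv["amount"] for inv in invoices)

def quarter_total_and_store(invoices, path):
    """Cut 2: same total, but also persists the uploaded data to disk
    so future reporting stories can build on it."""
    with open(path, "w") as f:
        json.dump(invoices, f)
    return quarter_total(invoices)
```

Cut 1 is a complete, demonstrable slice of user value on its own; Cut 2 only adds the persistence concern on top of it.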

2. Disk-based processing – Cached processing (Affects: system qualities).
This type of story splitting is somewhat the reverse of #1 above, but it is used in different circumstances. A typical use is a look-up operation. The first cut lets the system query the data source every time the information is needed for further processing. Cut two is the implementation of a caching mechanism that keeps the most frequently requested data items in RAM.

Usually, following this method does not change any user scenarios; it affects only system qualities, most often performance.
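A sketch of how the two cuts might look for a look-up operation, assuming the data source is anything dictionary-like (the class and function names are invented for illustration):

```python
def fetch_from_source(key, source):
    """Cut 1: query the data source on every single look-up."""
    return source[key]

class CachedLookup:
    """Cut 2: keep already-requested items in memory (a simple unbounded cache)."""
    def __init__(self, source):
        self.source = source
        self.cache = {}

    def get(self, key):
        if key not in self.cache:           # cache miss: go to the source once
            self.cache[key] = self.source[key]
        return self.cache[key]              # cache hit: no source access at all
```

A real Cut 2 would add an eviction policy (LRU, size limits), but even this shape shows that the user scenario is untouched; only the access path changes.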

3. One-by-one processing – Batch processing (Affects: system qualities).
This method works well when processing a large number of entities that will in turn be saved to disk or sent over a network protocol (or when processing each item requires a disk or network read operation at some point). In such a case, Cut one is a simple implementation that saves each item to disk individually, and Cut two implements the batch processing model.
Cut 2 usually improves performance and allows for performance tuning by varying the batch size.
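The difference between the two cuts can be sketched as follows, where `write` stands in for any expensive disk or network operation (the function names and the default batch size are assumptions for the example):

```python
def save_one_by_one(items, write):
    """Cut 1: one expensive write operation per item."""
    for item in items:
        write([item])

def save_in_batches(items, write, batch_size=100):
    """Cut 2: group items so each write operation handles a whole batch.
    The batch_size parameter becomes the tuning knob mentioned above."""
    for i in range(0, len(items), batch_size):
        write(items[i:i + batch_size])
```

With 250 items and a batch size of 100, Cut 1 performs 250 write operations while Cut 2 performs only 3.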

4. Standard algorithm – Custom algorithm (Affects: functionality and system qualities).
Let’s start with an example:
Story: As an online consumer I can see products ordered by relevance which reflects my personal preferences.

The first step will be Cut 1: As an online consumer I can see products ordered by relevance based on different priorities of product attributes. This story assumes that we apply the well-known lexicographical order to the product list. For example, if the user orders the list by price most of the time and sometimes by editor rating, then for the initial page load the system applies a two-dimensional lexicographical ordering where “price” is the primary and “editor rating” the secondary parameter.

And Cut 2: As an online consumer I can see products ordered by organic relevance which reflects my preferences. In this case we may want to research a more sophisticated way of ordering results, based on models more complex than the lexicographical order.

Note that sometimes the difference between standard and custom algorithms does not impact the functionality but only the speed of processing or other system qualities.
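Cut 1 really is a one-liner in most languages, which is exactly what makes it a good first slice. A sketch, with invented product data (negating the rating makes higher ratings sort first while price still sorts ascending):

```python
products = [
    {"name": "A", "price": 20, "rating": 5},
    {"name": "B", "price": 10, "rating": 3},
    {"name": "C", "price": 10, "rating": 4},
]

# Two-dimensional lexicographical ordering:
# "price" is the primary key (ascending), "rating" the secondary (descending).
ordered = sorted(products, key=lambda p: (p["price"], -p["rating"]))
```

Cut 2 would replace the key function with whatever custom relevance model the research produces; the surrounding code need not change at all.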

5. Synchronous – Asynchronous (Affects: functionality and system qualities).
Example where functionality is affected:

Story: As an admin I can run a data backup process.

First cut: As an admin I launch the data backup process and wait until it completes to see the status. (Synchronous processing)

Second cut: As an admin I can launch the data backup process asynchronously, so I can switch to other tasks and get a system notification when the backup job is over.

Functionality remains unchanged when the “actor” is not a live user but a system (or part of the system); in that case asynchronous operations with input/output streams simply provide a faster and more scalable way of implementing the same scenarios.
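One way to sketch the two cuts is with a background thread; the `backup` stub and the callback-based notification are assumptions chosen for brevity, not a prescription for how the real job should run:

```python
import threading
import time

def backup():
    """Stand-in for a long-running backup job."""
    time.sleep(0.05)
    return "done"

def run_backup_sync():
    """Cut 1: the caller blocks until the job completes and sees the status."""
    return backup()

def run_backup_async(on_complete):
    """Cut 2: the job runs in the background; on_complete plays the role of
    the system notification. The caller is free to do other work meanwhile."""
    def job():
        on_complete(backup())
    t = threading.Thread(target=job)
    t.start()
    return t
```

Note that Cut 1 is not throwaway work: Cut 2 wraps the very same `backup` routine, which is the hallmark of a good story split.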

6. Real-time processing – Pre-processed data (Affects: system qualities).
A few generic examples:

Example 1. The first cut in implementing a set of financial reports could be simply implementing them separately, based on complex and time-consuming SQL queries (each report building its data set in real time every time it runs). The second cut would be building a shared OLAP cube (a pre-processed data structure and data set) and then running each report on top of it, with incomparably faster access to the data.

Example 2. The simplest way (Cut 1) to implement any kind of search is to scan through the whole set of elements in real time, comparing each against the search criteria. The second cut is building a search index (a pre-processed data set).
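Example 2 in miniature, assuming documents are plain strings and the search criterion is an exact word match (both simplifications for the sake of the sketch):

```python
def search_scan(docs, term):
    """Cut 1: real-time scan of the whole collection on every query."""
    return [i for i, doc in enumerate(docs) if term in doc.split()]

def build_index(docs):
    """Cut 2, step one: pre-process the collection into an inverted index."""
    index = {}
    for i, doc in enumerate(docs):
        for word in doc.split():
            index.setdefault(word, []).append(i)
    return index

def search_index(index, term):
    """Cut 2, step two: each query is now a single dictionary look-up."""
    return index.get(term, [])
```

Both cuts satisfy the same user scenario; the index only changes how fast (and how scalably) the answer arrives.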

7. Pre-processed data – Real-time processing (Affects: system qualities).
Sometimes, to get results, we first need to pre-process the data; only then can we calculate something on top of it. Later, though, we may re-architect the algorithms so that end-to-end real-time processing becomes possible.

8. Omitted steps – All steps (Affects: functionality and system qualities).
Story: As a third party application I can perform my query on the least loaded server.

First cut: As a third party app I can perform my query. Here the part of the algorithm responsible for evaluating server loads and selecting the least loaded one is a stub: a simple randomized selection is used instead.

Second cut: As a third party app I can be directed to the least loaded server.

9. Approximation – Exact processing (Affects: functionality and system qualities).
Some algorithms may use certain approximations and still deliver valuable results.


Story: As an admin I can see the exact number of DB calls during a day.

First cut: As an admin I can see the approximate number of DB calls, based on the number of users during a day. In this case we simply multiply the number of users for the day by the average number of DB queries per user.

Second cut: …the exact number… . In this case we will have to log all queries and calculate their exact amount.
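The contrast between the two cuts is small enough to sketch directly; the average of 12 queries per user is a made-up figure standing in for a value measured elsewhere:

```python
AVG_QUERIES_PER_USER = 12  # illustrative average, measured elsewhere

def approximate_db_calls(user_count):
    """Cut 1: estimate the daily call count from the daily user count."""
    return user_count * AVG_QUERIES_PER_USER

query_log = []

def log_query(sql):
    """Cut 2, part one: record every query as it happens."""
    query_log.append(sql)

def exact_db_calls():
    """Cut 2, part two: the exact count is just the size of the log."""
    return len(query_log)
```

Cut 1 needs no changes to the query path at all, which is why it is so much cheaper; Cut 2 pays for exactness with instrumentation across the whole codebase.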

Group 2: By data structure.

10. Flat data structure – Complex data structure. (Affects: functionality and system qualities)
Example: Instead of applying the lexicographical order from item 4 above to the product attributes, we simply assume they are all of equal importance (a flat structure) and take their sum as the overall relevance for Cut 1. In Cut 2 we implement the more sophisticated algorithm we need.

11. Fewer dimensions – More dimensions (Affects: functionality and system qualities).
Example: As an admin I can see a report of online consumers and their favorite items.

First cut: ...favorite items identified based on one main parameter.

Second cut: ...favorite items identified based on a whole set of factors.

12. Single data type – Various data types (Affects: functionality and system qualities).

Story: As an admin I can upload an XLS report file into the system DB.

First cut: As an admin I can upload an XLS report and have all entries processed as plain text data.

Second cut: …text, numerical, currency, and boolean data types.

ETL (extract, transform and load) is a big area where this method of splitting stories is highly applicable.
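At the cell level, the split might look like the sketch below. The recognition rules (a leading `$` means currency, `true`/`false` means boolean) are invented for illustration; a real ETL pipeline would take them from the spreadsheet's own cell metadata:

```python
def parse_cell_v1(raw):
    """Cut 1: every cell is treated as plain text."""
    return str(raw)

def parse_cell_v2(raw):
    """Cut 2: recognize booleans, currency, and numbers; fall back to text."""
    text = str(raw).strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if text.startswith("$"):
        return float(text[1:])
    try:
        return float(text)
    except ValueError:
        return text
```

The upload scenario, the file format, and the DB schema can all stay fixed between the cuts; only the transform step deepens.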

13. Independent data – Linked data (Affects: functionality).
Linked objects appear everywhere: HTML documents pointing to each other, expense reports from the same employee, TCP packets that are parts of the same SOAP call, products that are often bought together, and so on.

Story: As a user I can see the most relevant documents at the top.

The first cut implements a relevancy algorithm based on the individual properties of a document, while Cut 2 “boosts” the relevance of documents referenced by other documents.

14. Trimmed data set – All data (Affects: functionality and system qualities).
Sometimes relatively simple algorithms work well on a limited number of elements. This can be a good start, with still-sufficient quality and depth of results. The natural next step is to improve the algorithm and extend it to the whole set of entities.


Story: As an accountant I can see the trend curve of monthly expenses.

In the first cut of this story we build the trend from a couple of randomly selected data sets, while in the second cut we let the system analyze a considerable amount of data.

