Story Slicing Workshops

I remotely facilitated this story slicing workshop created by Henrik Kniberg and Alistair Cockburn for two of the teams in my unit recently. They named it "Elephant Carpaccio" to give people the mental image of breaking down a big feature into very thin, vertical slices. Joep Schuurkes brought it to my attention when he facilitated the workshop a couple times in 2021, leaving behind not only these very helpful blog posts about how to run it but also the Miro board he used to do so.

My elephant encounter

The Setup

The 2.5 hour workshop as Joep ran it included three conversations:

  • a conversation about why we might split stories
  • a description of today's feature we'll be building
  • a short brainstorm about what would be a good (small enough) first slice of this feature

These three parts would fill the first 45 minutes. The rest of the workshop would be smaller groups tackling each of these tasks in bigger chunks of time:

  • breaking down the problem into between 10 and 20 stories
  • actually building the first few stories as sliced
  • a reflection and debrief on the whole workshop

The First Group

In my first running of the workshop, I was able to see a few things I didn't expect in the story breakdown:

  1. American state abbreviations: The problem lists different values of sales tax for AL, TX, and three other abbreviations Americans would typically know. Participants wanted to talk about the states using their names, but didn't know which abbreviation belonged to which state. I filled them in on the table I created.
  2. Sales tax vs. VAT: Another American vs. European participant thing! The answer to "how much does the thing cost?" will be different if you're adding the tax to the price, or assuming it will be included in the total. This wasn't important to the solving of the problem, so I let this difference persist.
  3. First things first: the calculating of the price was laid out very clearly as the first problem to solve. One persistent participant really had their heart set on data input validations and a particular user interface. It took several tries from their teammates and ultimately my nudging to encourage them to think about the problem from the most important pieces first.
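
To make the sales tax vs. VAT difference from item 2 concrete, here's a minimal sketch. The 8% rate and the function names are made up for illustration; they don't come from the workshop materials.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def total_with_sales_tax(net_price: Decimal, rate: Decimal) -> Decimal:
    """American style: the tax is added on top of the listed price."""
    return (net_price * (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)

def tax_inside_vat_price(gross_price: Decimal, rate: Decimal) -> Decimal:
    """European style: the listed price already includes the tax."""
    net = gross_price / (1 + rate)
    return (gross_price - net).quantize(CENT, rounding=ROUND_HALF_UP)

rate = Decimal("0.08")  # illustrative rate, not one from the exercise
print(total_with_sales_tax(Decimal("100.00"), rate))  # 108.00
print(tax_inside_vat_price(Decimal("100.00"), rate))  # 7.41
```

The same "100.00" on the price tag means the customer pays 108.00 in one world and 100.00 in the other, which is why the groups gave different answers to "how much does the thing cost?"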

I changed the instruction from taking a demo/screenshot every 8 minutes to making one after each slice. This helped raise awareness of the difference between how they were breaking down the work and how they were actually picking up the work. They might get through two or three slices in one go, only to have to go back and demo each scenario individually.

A few insights came through more clearly in the debrief than during the exercise:

  • It was easier to see progress and celebrate it when the work was broken down into smaller pieces.
  • The team understood the concept a lot better now, but there was still a big delta between how small they could slice a story, and what would make sense when the burden of code review and testing would typically take a few days.
  • A thinly sliced story is not the same as an MVP.

The Second Group

My second group benefitted from slightly clearer instructions. But they had all three of the insights the first group uncovered even before they started slicing or building any stories. They got hung up on other intricacies of the problem:

  1. Money: The American dollars in the problem statement used a . to separate the integer part of a money calculation from its decimal component. Many of the participants were used to using , for that purpose, so they needed to consciously second-guess every calculation output.
  2. Saving: Both teams were using IDEs they weren't used to, which they hadn't set up for auto-saving. Most often an unexpected result prompted the question "Did we forget to save?"
  3. Slices as defined: They got the idea of how to slice the stories. But when they got to building the solution, they had to have an "Are we really going to do it like this?" conversation.
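
The separator confusion from item 1 is easy to reproduce. This is a small sketch, assuming a hypothetical format_money helper of my own naming (it isn't part of the exercise), that prints the same amount both ways:

```python
def format_money(amount: float, decimal_sep: str = ".", thousands_sep: str = ",") -> str:
    """Format an amount with two decimals using the given separators."""
    us_style = f"{amount:,.2f}"  # Python's default grouping, e.g. 1,234.50
    # Swap via a placeholder so the two separators don't overwrite each other.
    return (us_style.replace(",", "\0")
                    .replace(".", decimal_sep)
                    .replace("\0", thousands_sep))

print(format_money(1234.5))            # 1,234.50
print(format_money(1234.5, ",", "."))  # 1.234,50
```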

Just breaking down the stories wasn't where the learning happened for this team. It was the building that crystallized the gap between being able to define small pieces of work, and what it's like to actually build like that.

They got farther in the building than my first group had, so they were able to see how adding one feature impacted the already existing ones. The big insight from this group's debrief was how one small addition affects all the existing features, and the kind of time needed to do it right and test it thoroughly.

I'd love to run this workshop in other settings, but needing a shared programming language and the ability to work in a strong-style pair is too big a barrier to entry for me to submit this as a workshop to a conference.

How have you gotten strangers started working together on code? How have you taught (or been taught) about slicing user stories into smaller, vertical slices?

What Would You Say You Do Here?

For the first time in years, I was with a group of people who:

  1. weren't at a conference
  2. didn't understand what my work looks like, and more excitingly
  3. were interested in hearing about it!

I hadn't been asked "what kind of work do you do?" since my role changed a year ago, so I wasn't prepared with a thorough-enough short answer for the person asking. They'd founded a digital agency a few months earlier and were used to farming out work to developers. They didn't quite understand where a "tester" might fit into their process.

What I said in the moment

"I make sure that stuff works, and the thing that was built is what you wanted. Sometimes that means I'm looking at the website, sometimes that means I'm writing code to verify stuff."

I got scolded later for giving such a simplistic view of my skill set, the industry, and particularly the depth and breadth of my current role.

What I could have said

"Everybody needs an editor. Testers help improve things in the product and process. They're there to collect information, sift through what's relevant, and advocate for what's important. This can look like making sure we agree how big a chunk we're biting off, gathering standards and expectations to compare them to what's been built, or troubleshooting issues customers are having."

"I've been doing this well enough for long enough that I'm not just doing this for one team, I'm doing this for my whole department, all seven teams. I'm in a position to see obstacles coming farther down the road, and I have the skills to pivot more quickly when unexpected hurdles catch us by surprise. I ask curious questions to understand what might be missing, and help eliminate the work that's distracting us from our focus."

It's close to a structure I'd want for this kind of explanation, with three layers of information, in case the person asking lost interest or we got interrupted.

How would you describe your role? Do you spend more time with examples from your day-to-day, or do you find that people outside the industry connect more to what you're saying when you keep it abstract?

Friends of Good Software - September 2022

The Friends of Good Software (FroGS Conf) had its sixth edition on Thursday, 8 September. I was eager to have it on a Thursday myself to have a real weekend, and include people who could never attend on a Saturday, but it had me a bit nervous about the number of attendees. Luckily we held strong by my measures of success: active participants, a full screen of faces, enough concurrent sessions that people were jockeying for different spots, and a retro full of requests to do it again. We will. :)

I ran the morning marketplace to determine our sessions. And I had the energy to attend a session in each of our five slots.

Heather Reid - Logs, usage stats, and how you use them

A FroGS newcomer but not at all a stranger, Heather Reid proposed a session building on her recent blog post about what the "highly-requested" part of "highly-requested feature" really means. She wanted to hear how other product teams were approaching data-driven decision-making.

Along with a list of tools, we identified the hardest part about using data: the shift in mindset required. It's a journey to go from decisions based on your experience to decisions based on your customers' experiences. Identifying the baseline and framing things as small experiments can help get the ball rolling.

Sanne Visser - Continued improvements to my planning system

I'm always interested to hear how Sanne's very elaborate planning system grows and compares to the simpler set of annual directions and weekly tasks I have.

Sanne uploaded some photos of her physical notebooks, gave examples of what she's striving towards, and gave us a realistic look into what pieces and goals fall by the wayside when shit hits the fan. Sanne was kinder to Samuel Nitsche than he was to himself as he confessed he had no planning system at all. "What gets done now without a system?" she asked. Sanne's looking to shift her thinking from goals to habits in the coming months, focusing on outcomes rather than outputs.

Career trajectory - two separate sessions

I will refrain from publishing the personal details of my two fellow testers who are both at different "what do I want from my job?" points in their careers. I'm delighted to see that along with ensemble programming and database migration questions, FroGS has become a good place for support and advice during a time of reflection and contemplation. These topics bubbled up in both sessions:

You create your own luck

Our careers may seem like 98% luck. But continuing to connect our skills and values to the work we do puts us in more and better situations to keep building our careers. We can value the individuals on a team more than the mission of a company, or vice versa. Finding what we want more of is a years-long and perhaps continuous process.

You are more than your career

There is more to life than software, duh! We can make tradeoffs in our work to support the way we want to live our lives. That may mean rejecting the "hustle culture" of wanting that next promotion, and recognizing that staying as an individual contributor, getting satisfaction (or money) from a side-gig, or cutting down on working hours to pursue a passion are all valid choices.

Sanne Visser - Choosing not to access the system under test

After an intriguing experience report at LLEWT a few months ago, I was curious for an update from Sanne. I did not literally say "You are putting yourself in incredible pain" but someone did.

Sanne made an intentional choice to not be the person fixing things, to solve the underlying problem instead, by not getting the required training or hardware necessary to perform testing herself. "I have been so frustrated with myself very, very frequently," she confessed. Some data setup would have been so much easier if she could just access the system. But Sanne's colleague who joined the session agreed it would have been too much on top of what Sanne's already responsible for.

Sanne's taken on shortening cycle time as her main goal. From the "multi-headed dragon" of problems she notices, she's been getting the team to vote on which experiments to try. Her improvements to stability, predictability, and planning have already made an impact. In the end, she gave herself permission to identify exit criteria for this experiment.

Thanks to everyone who attended FroGS Conf, and especially everyone who took notes. It allows participants who joined a different session to still share in some of the takeaways. And as noted by Heather, it helps people in the session who've missed a particular word or point, and builds the feeling of a collaborative community working together.

Based on the retro and who was able to join us, it appears that some people can only do weekends and some can only do weekdays. It's likely that we'll implement Sanne's suggestion of switching off between those options. Starting an hour later (10am instead of 9am Central European Time) worked better for us in the middle of Europe, and much better for our friends in the UK, Ireland, and Portugal. We'll keep that innovation.

Shoutout to the NS for going on strike and ruining the day we wanted to get together for drinks! And apologies to our co-organizer Cirilo Wortel for scheduling this event on a weekday during what turned out to be your busy time leading up to a big release. The rest of the organizers (Sanne Visser, Huib Schoots, Joep Schuurkes and myself) have our own retro later this week. We'll see how we can incorporate the rest of the feedback from the retro into our next editions.


Sometime in 2019, this article listing all the alternatives to Google products came to me over the wires. Ok it was probably Twitter. In a world of increasing surveillance, data mining, targeted advertising, and cookie pop-ups, I made it my mission to get off of Google products completely. Here's what I was able to do, why I went with the alternatives I did, and which Google products I'm still stuck on three years later.

What I was able to switch

Search Engine

This was the most straightforward one to switch. I'd already seen several colleagues turn to Duck Duck Go. I went through each of my browsers on my work machine, personal machine, and iPhone to point there instead.

In the first few weeks, if a search didn't return the perfect result in the first three listings, I'd find myself turning back to Google. Now I only end up on Google when I'm literally looking at a page that gives me zero results and I want to make sure the whole internet has nothing on the topic.


This too was a straightforward one for me. The hosting service I use for my website, Dreamhost, came recommended by my friend Sahar Baharloo and already included email as part of its services. I'd set up an address there in 2011, but finally started switching the logins for my accounts to be connected to it. Switching all my accounts (and discovering which truly could not be switched) was the biggest part of this whole endeavor.

A couple friends recommended the privacy and security of ProtonMail. The email addresses I wanted had already been claimed, and their calendar feature wasn't available at the time. If you're looking to switch, try Proton first.

Web browser

I'd set my default to Firefox at work already. When everybody else uses Chrome, you catch more bugs in Firefox.

The article made me discover Brave browser, which I started using for personal stuff. I settled on Brave for my desktop machines and the Duck Duck Go browser for iOS. I'm not entirely sure why I chose those; I think it was just a successful first experiment.


I was able to switch to Authy for almost everything. I believe I chose it because it was listed first alphabetically in the article. The LastPass account I use at work won't let me use anything except Google Authenticator for reasons I cannot explain, so I do still have that app on my phone with that one account.

File hosting

I got everything I'd created off of Google Drive: deleting most of it, moving some of it to my personal machine, and putting a few precious things into Dropbox and Dropmark where I already had accounts.


Using my work (Outlook) calendar for during-the-weekday events and my personal physical calendar notebook worked great before the pandemic. Now I've got a mix of in-person things and video calls as part of my personal schedule.

All of the suggested digital calendar alternatives cost money and came tied to an email address, which I didn't want to switch again. I end up using a combination of archived emails for video chat links and writing on paper when the appointment is. It doesn't feel "optimized" or "automatic" in any way, but the physical act of having to flip the pages in my notebook and write the event down helps me not to forget it.


I don't have the need to host video myself. I switched the video links on my website to point to instead of for my conference talks. I do find myself still using YouTube for exercise videos (thanks pandemic!) or when someone shares something on Twitter.


This is one of the ones I was most excited to discover. DeepL and Linguee, built by the same company, are for full-text translations and single-word dictionary lookups respectively. The quality of the translations is SO MUCH BETTER than Google Translate. The thought of no longer sending the sensitive information I receive in Dutch (tax letters, doctor results, immigration exams) to Google either through my Gmail or Google Translate feels great.

DeepL on desktop has a keyboard shortcut integration, so hitting Command + C twice (instead of the once you'd use for copying) opens the application and pastes what you've selected into the translator. Looking up individual words in Linguee gives you Wikipedia examples where the word is used in a sentence, so you can also see if it's part of a colloquial phrase or which preposition it's used with. Thanks to my friend Marine Boudeau for originally pointing me to these.


I added Clicky analytics to my website. This is another one I tried and stuck with because it was first in the list. I don't pay them, so the data's forgotten after 30 days and I have to login every couple of months to keep the account alive. I try not to think about how unpopular my website is honestly, but when something blows up on Twitter, I like being able to see all the different countries my website visitors are coming from.


I had been using Google Fonts on my website. Switching to Font Squirrel required choosing new fonts to use and hosting the fonts myself (a.k.a. putting them in a folder and using relative links instead of absolute ones). This was probably the most trivial thing to switch over.

What I wasn't using in the first place

I wasn't downloading any video games from Google Play, using the Android OS for my phone, instant messaging with GChat/Hangouts, or using Google Domains for hosting. Nor shall I be!

What I haven't switched

There are a few things I haven't switched, either because it's too much trouble when trying to live in a society with other people, or because I haven't given the alternatives a fair shake.


This is the big one. I spent a few weeks trying HereWeGo as an alternative to Google Maps. It was so bad that I decided to use it instead as an exploratory testing exercise. I need bike directions combined with landmarks, and I haven't found another map that combines them as well as Google does. Please tweet me what you're using instead if you've gotten used to something else. I'd be very interested to try again.

Docs, Sheets, Forms

People will read a document you send as a Dropbox Paper document, as an attachment, or in some other uncollaborative format. But convincing someone they need to set up an account at a different service just because you don't want to use Google is a step beyond what societal conventions will allow at this point.

Typeform makes more beautiful forms, but viewing the responses still puts you back in a Google Sheet. Some submissions will only accept Google Docs. These are not fights I can win, so I've stopped fighting.

Particular calendar features

If I want to schedule a call with my friends, where they can edit the invitation, I'm stuck sending a Google Calendar invitation from my Gmail. Accepting a Google Calendar invitation sent to in the browser where I'm logged in to my Gmail gives me a 400 error. A calendar notebook plus saved emails has worked surprisingly well for the relatively low volume of personal appointments I have.


Google Play is where, with my American credit card, I can rent movies that aren't available on Netflix. It feels better than giving money to Amazon, but I also haven't looked that hard for other options of how to rent individual titles without a subscription.

That's my Google situation.

That doesn't absolve me of all the other corporations stalking me and ruining the world. I've quit Facebook but not Instagram. I've limited my Amazon purchases to Christmas gifts to family members I couldn't find another way to ship, and moved off Goodreads to the vastly superior recommendations and statistics of The StoryGraph. But the websites I get paid to work on are hosted through AWS. I'm still tied into Apple for hardware, Photos, my desktop email client, Preview, and Keynote. I'm mooching off a shared Netflix account until Netflix finally puts the kibosh on that. I've had to lower my expectations for my ability to escape these companies and remain an online professional.

What I can do is afford to pay for email and web hosting. If my translation and analytics services stopped having free options, I'd likely pay for those too. Something I didn't expect in moving off Google products: it feels good to pay their competitors so they can survive. Every little bit helps.

From API Challenges to a Playwright Cookbook

Soon after Maaret Pyhäjärvi and Alex Schladebeck began their endeavor to practice testing APIs using the API Challenges from Alan Richardson (aka The Evil Tester), they looped me into their periodic practice sessions. Why? To make Past Elizabeth jealous, presumably.

API Testing Challenges

We gathered for an hour every few weeks to work through the challenges. The tools we were using (pytest, the Python requests library, and PyCharm) were like home for Maaret and me. I'd been writing in a framework with these tools for my job for a few years already.

I wasn't the only one. These tools were free to use and available for a number of years already. What the three of us combined couldn't figure out by trial-and-error, reading the error message, reading the darn description of what we were supposed to do again, or relying on patterns from previous exercises, we were able to Google. With one notable exception of course, as we are testers after all:

It may not seem like you'd need three people to do the work that one person could do. But I assure you, having extra pairs of eyes to catch a typo, remember whether we were expecting it to pass or fail this time, see immediately that it's a whitespace issue making PyCharm angry, crack a joke, or help decide whether to keep going in the same direction makes the work go more smoothly.

More than once, we'd end a session a few minutes early because we were stuck and lost, only to come back a couple weeks later with fresh eyes, able to understand where we were stuck and what to do about it. After several months meeting infrequently, we got through all of the API Testing Challenges!

Then we were what? We like learning together, but we'd achieved our goal.

Starting out with Playwright

After a bit of brainstorming, we landed on a skill Alex and I were both still building: UI automation. Naturally, Maaret was way ahead of us, and pointed us towards the Playwright framework and a practice site from Thomas Sundberg of all the greatest hits: radio buttons, drop-downs, alerts, you name it.

Our experience with UIs, DOMs, automation, Selenium, and exploration helped us, but didn't prevent every pickle we got ourselves into with Playwright. Though their documentation will tell you a lot of what you need to know (if you've correctly selected Python instead of Java or Node.js at the top), our desperation kept exceeding our patience. We escalated to the Playwright champion Andrew Knight and the Playwright community Slack channel.

Several times, it wasn't only the code that needed changing, but our perception of how Playwright wanted to interact with the website. These are a few I remember:

  1. an API response from a browser context can't be collected from a page context
  2. setting different contexts for a page and an alert on that page
  3. having that alert knowledge not help us when we also had to fill in a prompt
  4. expecting something in the DOM to tell us when an item in a drop-down was checked

For the first three, wrapping our heads around a different way of thinking got us through the problem. For the last one, we lowered our expectations about what we could check. (Pun intended.)
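
For the alert and prompt items, the different way of thinking was roughly this: the dialog isn't an element on the page, it's an event you listen for. Here's a minimal sketch of that idea using Playwright's sync API for Python. It is not the code from our sessions; handle_dialog, open_page_with_dialogs, and the "hello" default are names and values I've made up for illustration.

```python
def handle_dialog(dialog, prompt_text="hello"):
    """Accept any dialog; a prompt also gets some text typed into it."""
    if dialog.type == "prompt":
        dialog.accept(prompt_text)
    else:
        dialog.accept()

def open_page_with_dialogs(url):
    # Imported here so the handler above can be read (and tried) without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Register the listener BEFORE the action that opens the dialog.
        page.on("dialog", handle_dialog)
        page.goto(url)
        # ... click whatever opens the alert or prompt on your page ...
        browser.close()
```

Knowing how to accept an alert only gets you halfway with a prompt, because the text has to be passed in as part of accepting it.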

Playwright Cookbook

We've tested what we can and should test on our first practice site. In upgrading to a more challenging one, we realized that we'd benefit from the knowledge our past selves gained. And that you could too.

We've published our progress on GitHub as the Playwright Cookbook. It's a Python repository of what we found that worked for different UI situations. It's one step beyond the Python documentation on the Playwright website: it lets you compare an actual page to a test where we were able to select the element.

Fun was had by all

Trying to quickly get something done with a new UI automation tool had been my white whale, something I knew was annoying enough that I wouldn't know how to get unstuck. Working in an ensemble meant either (1) the knowledge we needed was in the room and just had to be shared, or (2) two brilliant, successful ladies known for their testing prowess also didn't have a clue what was happening. Either way, it made things better and achievable.

I am notoriously opposed to fun. But this has been fun.

What's next

What is next for us? We know we want to:

Have we reflected on what's valuable and not valuable to test on an API? Will we share more about this beyond this blog post? A conference talk or workshop? A Twitch stream?? Only time will tell. For now, enjoy the GitHub repo. :)

Amateur Professional Career Coach

People come to me for career advice. It's been everybody -- colleagues, former colleagues, testers from the community, friends I know outside of software, younger family members -- everybody. I am not completely sure why. I have a job I enjoy, but I'm not a professional coach. I seem to be an amateur coach of professionals.

I do have (and am happy to share) strong opinions about what people should do when they describe particular situations they're in. I can immediately tell them what I would do. But figuring out what they should do is much more useful. So in response to tough questions, I ask them tough questions back.

Why is this so hard?

People come to me with tricky situations with their current roles. For some, venting is enough. But I might ask someone who seems exhausted, sick of it, checked-out, or seems stuck for too long:

  • How much longer could you let things stay the way they are? Another six weeks? Another six months?
  • It sounds like you're having {X} trouble with {Y} person. Have you told them this directly?
  • I don't have the skills to mentor you in {Z}. Do you know someone who does? Or where could you find someone like that?
  • Would aligning expectations help?
  • Is saying "no" an option here? (I've been called the "No Coach" for this.)

What should I do next?

I don't know what you should do next, or even if you should change what you're doing now! But here's what I will ask you about so you can decide:

  • What do you like about what you do now?
  • What parts of your current job do you want to stay the same?
  • What do you avoid or dread? What keeps you in bed in the morning?
  • What do people come to you for help with?
  • What is there no hope of changing in your current situation?

I know Esther Derby gave a webinar in March of 2021 describing the tipping point between whether you can reconcile your needs and values with your employer, or whether you should leave. But unfortunately both the webinar and her name for this zone are lost to me.

What do you think of my CV?

I've written both for the Ministry of Testing and on my own blog about resumes and how they relate to the interview. TL;DR: Tell me about the impact of what you've done, and give me some indication of how fluent vs. on the shelf the skill is for you. Other things I end up asking people:

  • I know you {did this other thing} or {have this other skill} too. Don't you want to brag about that?
  • It sounds like your skills would be a great match for {this kind of job}. Is that the kind of role you're applying for?
  • If a recruiter were trying to find someone like you on LinkedIn, what keywords would they search for?

Some of the feedback I've received after recent resume reviews:

  • "Thanks again for your feedback on my CV, it was INCREDIBLY useful and very gentle at that."
  • "You are a great feedback provider."
  • "Elizabeth was so good in helping me with my resume!!"

I'm curious who you've gone to for career advice. Were they in your industry? What made you seek them over other people for advice or wisdom? What question or piece of advice has changed the way you look at your current job or for a new one?

The Power of Separating Setup and Teardown From Your Tests

This week, I was trying to find an explanation for my colleagues about when it's better to separate the setup and teardown of your tests from the test code itself. I was hoping that pytest's own documentation would have a recommendation, since our test code for this particular repository is written in Python with pytest as a test runner. Pytest does explain many features of fixtures, and what different test output can look like, but not the power of combining them. That's what I'd like to explain here.

An example

I can't show you the code I was looking at from work, so here is a relatively trivial and useless example I was able to contrive in an hour. (Sidebar: I once tested an address field that truncated the leading zeroes of post codes, so though this test may be trivial, testing that the post code made it to the database intact can provide value.)

There's an API called Zippopotamus that can either: 1. take a city and state, and return you details about matching places; or 2. take a post code, and return you details about matching places.

I've got two tests below, both trying to accomplish the same thing: see if all the post codes returned for the city of Waterville in the state of Maine also include Waterville in their results.

  • Setup: get list of post codes for Waterville, Maine
  • Test: for each post code, check that Waterville is in the list of matching places
import requests
import pytest

zippopotamus_url = ""

# Setup as a fixture: pytest runs this before any test that asks for post_codes
@pytest.fixture
def post_codes():
    response = requests.get(f'{zippopotamus_url}/me/waterville')
    assert response.status_code == 200
    places = response.json()['places']
    post_codes = [place['post code'] for place in places]
    return post_codes

class TestZippopotamus:

    def test_setup_included_waterville_maine_included_in_each_post_code(self):
        response = requests.get(f'{zippopotamus_url}/me/waterville')
        assert response.status_code == 200
        places = response.json()['places']
        post_codes = [place['post code'] for place in places]
        for post_code in post_codes:
            response = requests.get(f'{zippopotamus_url}/{post_code}')
            assert response.status_code == 200
            places = response.json()['places']
            assert any(place['place name'] == 'Waterville' for place in places)

    def test_setup_separated_waterville_maine_included_in_each_post_code(self, post_codes):
        for post_code in post_codes:
            response = requests.get(f'{zippopotamus_url}/{post_code}')
            assert response.status_code == 200
            places = response.json()['places']
            assert any(place['place name'] == 'Waterville' for place in places)

The first test shows the setup included in the test. The second test has the setup separated from the test. It appears in the fixture called post_codes.

(venv) ez@EZ-mini blog-examples % pytest                     
========================== test session starts ===========================
platform darwin -- Python 3.10.1, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/ez/blog-examples
collected 2 items

..                                                                 [100%]

=========================== 2 passed in 1.46s ============================

When you run these tests, they both pass. One test is a little longer, which you may find easier to follow than navigating around in the code, or harder to follow because there's code that's more about data collection than what we want to test. I find it yucky (a technical term) to have more than one thing called request or response in a single test, but these are all personal preferences.

Now imagine instead of waterville in the API requests, I've gone on auto-pilot and typed whatever in the setup for the tests. Here's what pytest gives us as the output.

(venv) ez@EZ-mini blog-examples % pytest
========================== test session starts ===========================
platform darwin -- Python 3.10.1, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/ez/blog-examples
collected 2 items

FE                                                                 [100%]

================================= ERRORS =================================
_ ERROR at setup of TestZippopotamus.test_setup_separated_waterville_maine_included_in_each_post_code _

    def post_codes():
        response = requests.get(f'{zippopotamus_url}/me/whatever')
>       assert response.status_code == 200
E       assert 404 == 200
E        +  where 404 = <Response [404]>.status_code AssertionError
================================ FAILURES ================================
_ TestZippopotamus.test_setup_included_waterville_maine_included_in_each_post_code _

self = <test_error_vs_failure_pytest.TestZippopotamus object at 0x101f4c160>

    def test_setup_included_waterville_maine_included_in_each_post_code(self):
        response = requests.get(f'{zippopotamus_url}/me/whatever')
>       assert response.status_code == 200
E       assert 404 == 200
E        +  where 404 = <Response [404]>.status_code AssertionError
======================== short test summary info =========================
======================= 1 failed, 1 error in 0.71s =======================

Neither test passes. They both get mad at the same spot, where they're checking that they got the post codes for "Whatever, Maine" and found that, oh wait no, they haven't been able to do that.

But one test fails and one test errors: The test with the setup included fails. The test with the setup in the fixture errors. This difference is why I prefer to separate my setup (and teardown, which behaves the same way) from my test code.

The power of separating setup and teardown from your tests

  1. More of the test code is about what's being tested, instead of being about how you get to the right place.

  2. Pytest will give you an error when code fails in the setup or teardown, and a failure when the code inside the test fails.

  3. If you're reusing setup or teardown, you'll only have to fix an issue in the code in one spot.

  4. If you're running a bunch of tests with shared setup or teardown in a pipeline, it'll be easier to diagnose when something outside what you're trying to test has gone awry.
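
Here's a minimal sketch of points 1 and 2 with teardown added, since the examples above only show setup. It assumes a made-up in-memory `fake_server` so it runs without a network connection; `fake_server`, `create_record`, `delete_record`, and the `record` fixture are all hypothetical names, not part of the Zippopotamus example.

```python
import pytest

# Hypothetical in-memory stand-in for an external service (an assumption,
# used so this sketch does not depend on a real API being reachable).
fake_server = {'records': []}


def create_record(name):
    """Setup helper: add a record to the fake server and return it."""
    record = {'name': name}
    fake_server['records'].append(record)
    return record


def delete_record(record):
    """Teardown helper: remove the record from the fake server."""
    fake_server['records'].remove(record)


@pytest.fixture
def record():
    # Setup: anything that goes wrong before the yield is reported by
    # pytest as an ERROR at setup, not as a test failure.
    created = create_record('Waterville')
    yield created
    # Teardown: runs after the test body; anything that goes wrong here
    # is reported as an ERROR at teardown.
    delete_record(created)


def test_record_has_expected_name(record):
    # Only the code in the test body can produce a FAILURE.
    assert record['name'] == 'Waterville'
```

Everything before the `yield` is setup and everything after it is teardown, so the test body is left with just the one assertion it cares about.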

Reasons to keep the setup and teardown with your tests

  1. You are early enough in the development process that the setup and teardown don't need to be used anywhere else yet. You can extract them when they do, but for now, it's a little faster to read with everything in one place.

  2. If you don't have your IDE set up correctly, PyCharm may not let you Ctrl + click through the code to follow the fixture code. (Here's how to set up PyCharm to recognize pytest fixtures.)

  3. If you don't trust someone reading or debugging the test (other colleagues, future you, or possibly even other colleagues after you've moved to a different team) to be able to follow the code through to the fixtures. Or no one else is looking at the code!

What have I missed?

What other reasons are there? What do you tend to do for your team when your code is shared? What do you tend to do for yourself when you only have your future self to help out? How would you have written this Python code differently? Which articles do you point to when you're explaining a separation of concerns?

The Llandegfan Exploratory Workshop on Testing

An adventurous band of brave souls gathered in the northwest of Wales on the week of a transit strike in the United Kingdom. The topic: whole team testing. The conclusion: even the experts have trouble doing it well.

The peer conference format was apt for exploring mostly failure. Brief experience reports proved ample fodder for in-depth discussions of the circumstances and reflections on possible alternatives. It's better to reflect on your less-than-successful work with your troubleshooting-inclined peers than it is with your colleagues.

Ash: When "Whole Team Testing" becomes "Testing for the Whole Team"

First up was Ash Winter with a story of culture clash between Ash and the teams he helped guide in their testing (cough did all the testing for cough). Ash discovered over the course of his six-month contract that getting everyone to nod along to his suggestions of having unit tests, API integration tests, front-end tests, limited end-to-end tests, and exploratory tests was completely different from agreeing on what those were or building the habits on the teams to make them happen. Saying the words "sensible journeys" and "meaningfully testable" wasn't meaningful at all.

As a white man who looked the part, Ash found it easy to get invited to the right meetings and be seen as the authority. (How wonderful to be able to have a group all share in how outrageous this is compared to the experience other people have!) Ash was seen as an authority for all testing decisions, so teams looked to him rather than thinking for themselves.

Upon reflection, Ash acknowledged he would have done better to slow down and understand the expectations of the project before jumping in with prescriptions from his consulting playbook. The teams needed to know what habits to build day-to-day instead of receiving what must have sounded like prophecies from the future.

Sanne: Problem Preference

In listening to a book-that-could-have-been-a-blog-post, Sanne came across the question: "How have you chosen the kinds of problem you pick up?" It made her think about her preference for focusing on team habits and communication so she could bring underlying issues to the surface. She's got a predisposition to be proactive and will run at a problem a hundred different ways if you let her.

On her new assignment, Sanne wants to let the team do the work instead of trying to do it all herself. So she's taking a radical step: she doesn't have access to the test environment. Her goal is to leave a legacy behind at the companies she works for, but it's too soon at her current assignment to evaluate how that will pan out.

Yours Truly: This Diagram Asked More Questions Than It Answered

I told the story of this blog post, with an addendum: I made a similar diagram for a different product that came in handy on the project I'm currently jumping into.

It was a great delight to hear my peers admire the middle of my three diagrams, the one deemed unprofessional and literally laughed at by my colleagues. Sometimes the complexity of the model reveals more about the complexity of the situation than a clean, organized model does.

I don't have any notes from what I said or what discussion occurred afterwards. Perhaps another participant's blog post will cover that bit in the coming weeks.

Duncan: Quality Centered Delivery

Duncan showed a truly dazzling amount of data extracted and anonymized from his five teams' JIRA stats. In so doing, he was able to prove to the teams (after wading through their nit-picks and exceptions) that a huge proportion of their time was spent idle: questions in need of an answer, code in need of a review, customers with no one to hear their feedback. Duncan deliberately dubbed this "idle" time to keep the focus on how the work was flowing rather than on optimizing for engineer busyness.

To shrink idle time, developers, testers, and the PM started working in an ensemble. Idle times dropped dramatically. The team kept a Slack call open all day for collaboration. One fateful day, the too-busy subject matter expert and too-busy client dropped into the call. Wait time plummeted to zero. The story of this particular success proliferated through the organization thanks to the praise from an influential developer on the team: development was fun again.

Duncan's was the one success story of the peer conference, though he was quick to point out that things could have changed after he left the assignment.

Vernon: How could I, Vernon, "The Quality Coach" Richards, make communication mistakes?!

It was a delight to get into the nitty-gritty details of a story that Vernon conflated and glossed over a bit in his keynote at Agile Testing Days in 2021. And to see the relationship repaired and strengthened in real-time with a colleague who witnessed what went down. (I'm just here for the gossip, clearly.)

A colleague asked a tester to create a release plan for the team by themselves. As the tester's manager, Vernon thought this was an outrageous way to "collaborate". Without spending time to understand the colleague's context, beginning from a place of unconditional positive regard (as coaches are meant to), or verifying his approach with his own boss, Vernon went on the warpath against this "bully".

Remarkably, escalation and accusation did not solve the problem at hand: the tester didn't have the skills to build a test plan. Nor did Vernon's outrage address the real problem: there wasn't alignment at the organization about what the biggest fire was. Vernon wishes now that he'd protected his 1-on-1 time with his direct reports, and empowered them to address the situation rather than doing it for them.

In summary, it is not easy, straightforward, or simple to get a whole team to test.

Our lunch walk with a view of Snowdonia

A note about the surroundings for this gathering: spectacular. It was a 13-hour journey of four trains, one bus, and one bike to get back home, but it was worth it to be transported to views of Snowdonia National Park, a small town where the Welsh language holds a stronger footing than I expected, and a small group willing to make the same trek to geek out.

Many thanks to Chris Chant, Alison Mure, and Joep Schuurkes for making this conference possible, well-facilitated, and parent-friendly. Many thanks to my fellow participants: Ash Winter, Sanne Visser, Duncan Nisbet, Vernon Richards, Gwen Diagram, and Jason Dixon for being my peers. And B. Mure for listening well enough to capture some of the goofy things I said.

I look forward to making the trek again in the future.

From Crafting Project to Critical Infrastructure

Just for me

Three years ago, I had a shit laptop. My company makes a Windows desktop software product that allows you to build your own applications. Mac users working with the software could open it on their Windows virtual machine in Parallels. When I did that, my company's software crashed, Parallels crashed, and then my whole Mac crashed. My job was to create app builds, run them, and test them. Due to my shit laptop, I couldn't do that locally.

Luckily, our app was also hosted in our public cloud. Through the cloud UI, you could make a build, see which build was on which of your environments, and deploy a new build. But the UI was...not an ideal workflow for me. It was slow to load and required several steps of clicking and waiting for a minute or two - just long enough to get distracted thinking about something else. A deploy process that might optimally take ~8 minutes took ~15 minutes as my mind wandered and the UI didn't update immediately.

I needed a one-step process to deploy, with updates frequent enough to hold my attention. I decided to abandon the UI for the API.

I wrote a Python script that took command-line input and printed output to the console as the steps of the process progressed. I used my two crafting days that month to break down the problem, set up the whole repository, and get the code to a state where it built and deployed an app to an environment.

A code review from Joep Schuurkes moved the code from a long list of functions to different classes corresponding to the API endpoints I was calling. I think the commands were limited to --build and --deploy. To make sure the refactor was successful, I'd scroll up in my Terminal history and run those two commands again. Crafting days on subsequent months brought a bit more error-handling to account for mistypes on my side or failures/timeouts from the APIs.
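
To give a feel for that shape, here's a rough sketch of a command-line script with one class per API endpoint. It is an illustration only, not the real script: the class names `BuildsEndpoint` and `EnvironmentsEndpoint`, their methods, and the arguments each flag takes are all assumptions, and the API calls are replaced with print statements.

```python
import argparse


class BuildsEndpoint:
    """Hypothetical wrapper around a 'builds' API endpoint."""

    def create(self, branch):
        # A real version would call the API and poll until the build is done.
        print(f'Building {branch}...')
        return f'build-of-{branch}'


class EnvironmentsEndpoint:
    """Hypothetical wrapper around an 'environments' API endpoint."""

    def deploy(self, environment):
        # A real version would call the API and report progress as it goes.
        print(f'Deploying to {environment}...')


def parse_args(argv=None):
    """Parse the two flags mentioned in the post: --build and --deploy."""
    parser = argparse.ArgumentParser(description='Build and deploy an app.')
    parser.add_argument('--build', metavar='BRANCH',
                        help='branch to build (assumed argument)')
    parser.add_argument('--deploy', metavar='ENVIRONMENT',
                        help='environment to deploy to (assumed argument)')
    return parser.parse_args(argv)


def main(argv=None):
    """Run the requested steps in order and return their names."""
    args = parse_args(argv)
    steps = []
    if args.build:
        BuildsEndpoint().create(args.build)
        steps.append('build')
    if args.deploy:
        EnvironmentsEndpoint().deploy(args.deploy)
        steps.append('deploy')
    return steps
```

Calling `main(['--build', 'my-branch', '--deploy', 'test'])` runs both steps and prints a line for each, which is roughly the "updates frequent enough to hold my attention" idea in miniature.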

At this point, it was a solid tool that saved me about a half-hour per day. I presented it to the developers on my team, offering them access to the repository so they too could benefit from this time-savings.

They were deeply unimpressed. They didn't have shit laptops, they had Windows laptops, they didn't have to run Parallels, they weren't constantly switching between branches and needing actual builds of the application to test. To them, this script was relatively useless. That was fine by me! The time and frustration the script saved me was more than worth the effort to build it. I used it several times a day myself, and got to use it as an example in the "Whole Team Approach to Continuous Delivery" workshop I paired with Lisa Crispin on. That was more than enough.

Slide from the workshop

Pipelines emerge

Six months later, a developer on my team got excited to set up a pipeline for our application. They wanted to run static code analysis on a build of our application, and run our functional tests against a deployed application running in a deployed environment. They copy + pasted my code as a starting point for the build and deploy, copy + pasted the static code analysis scans from another unit, and connected the two in a pipeline that provided value to the wider team. Developers weren't great at running tests on their feature branches on their machines; now we had a pipeline that would do it for them.

Other teams saw our pipeline and discovered my deployment script in the process. Rather than copy + pasting the code as my teammate did, they pinned their pipelines to the most recent version of the code on the master branch.

With more users and use cases, fellow colleagues were eager to also use their two crafting days per month to add the features they needed. I'd receive pull requests of things I didn't need for a context I didn't have, or feature requests I used my limited crafting time to fulfill. Without a style guide, a linter, tests, or a set scope, it was hard to turn away pull requests weeks or months in the making that people were eager to see included in the master branch. I merged them to keep everyone unblocked. As the code grew to serve every individual need, I lost interest in supporting what had originally been my darling pet project.

Still Valuable?

Two years after the original two-day crafting project, my role shifted from serving one team and one application to thinking about quality for the seven engineering teams in my unit. No longer did I need to deploy the application to a hosted environment. At the same time, my old team shifted where the repository was located, and the APIs I'd been calling in my script wouldn't do a lot of what they used to.

I got to explore what it meant to be the Quality Lead for my unit, and nobody I served needed this script. I left the list of improvements I'd brainstormed for it languishing at the bottom of my personal Trello board. I didn't get any requests from other departments to use or update it.

Still Valuable!

Nine months later, the spark got reignited! A fork of the deployment script got presented in another unit, complete with a UI on top of it. Someone on my old project discovered my script, and decided to add a feature to upload builds from the new repository location to make it useful again. They shared the code for a review after just a few hours of effort.

I had a chance to think through what parts of the repository were reusable for this use-case, which parts would be better copy + pasted for better readability, and got the merge request to a place where it fit in with the existing code style before anyone's heart and soul had been poured into it.

With the script now bloated to eight different actions, I decided to start writing tests for it. I didn't need the tests to make sure the existing code worked; everyone using it in their pipelines was enough to prove that. Tests will allow for future refactoring of the code and updating the version of the API I'm calling.

The first test I added confirmed that the new functionality did what the code submitter expected it to do, gave me a way to change individual parameters faster, and gave me the confidence and excitement I'd been missing.

I'm just getting going on tests for the rest of the existing code, but I'm looking forward to it!

Why do I tell you this story? Well, here's what I think when I look back at the evolution of this code base:

  • write tests, even before you really need them
  • set up a linter and coding guidelines before you give anyone else access to your repo
  • if you want to be precious about your code, tell people to fork instead of submitting merge requests
  • if you want the code to be in its most findable place and shareable state, you'll have to invest the time to collaborate with people on their changes
  • good things come to those who wait :)

Talk About Your Test Strategy

I was invited to join a team's debate this week about what environment to point our third-party security testers towards for their upcoming penetration test. I asked what I thought was both an obvious question and something worth discussing with the team:

"Do we want them to identify security risks, or are we just checking boxes here?"

A combination of stunned silence and nervous giggling (muted over Zoom) ran through the team. "We don't talk about that out loud," the team lead told me.

But that's exactly what I'm there to help uncover as the Quality Lead for this team and the others in our unit: how deep or shallow should our testing be? If our testing uncovers issues, are we interested in mitigating them? If not, why are we testing?

A Test Strategy in Five W's

This conversation took me back to a few years ago. I was working on a product in a phase before production-level quality that we dubbed "demo-driven development" in retrospect. We were showing off a combination of PowerPoint slides and small pieces of the product in order to gain more funding. A person interested in testing but with too large a scope to pay attention to my team in particular asked me for a test strategy.

But the demos kept changing. What was important this week wouldn't be important the next. There wasn't a lot of exploratory testing being performed or automated tests being written. All my time was occupied in figuring out what had already been promised, what we were trying to sell, and filling in the gaps between those with a very specific path our product owner would follow during a demo, down to browser and screen resolution.

I asked the person who wanted the test strategy document what they were going to do with it, what it might be used for. They sent me the enormous table where a link to my test strategy would be added, and clearly never looked at or noticed again.

Could I document in an official test strategy document for my team that I wasn't doing much testing? It turns out, yes.

I outlined the document with the five w's: who, what, when, where, and why. The whole document looked something like this. I don't think you even needed to scroll to read it.

  • Who: Our stakeholders are the people we're selling to, our product owner, and our team, in that order.
  • What: We're testing one particular happy path in Firefox (our product owner's default browser).
  • When: Due to the volatile nature of our product's priorities, our minimal testing has been concentrated after user stories are completed.
  • Where: We're running the application locally for demos. We haven't had a chance to set everything up we'd need to have in a hosted environment.
  • Why: We test to ensure that the one happy path is demonstrable to a customer in a demo, and to provide our product owner with the work-arounds for the gaps in our product.

I sent it to the person, expecting them to get back to me and tell me I couldn't do testing like this. Or at least, I couldn't write it down. But they never read the document! They thanked me, linked it in their table, and went on their merry way.

A Test Strategy in Stakeholders and Risks

I liked the way I shaped my test strategy around the very specific set of stakeholders and their risks in the five w's strategy. I wanted to bring this same connection to the teams I support when I started as Quality Lead for my unit. I ran a test strategy workshop for each of them to identify their stakeholders, talk about the risks that matter to them, and see how their team activities mitigated those risks. I got to this Miro board template after a few rounds.

  1. List the software the team is responsible for. (Our teams typically have legacy products they're maintaining in addition to the new things their roadmap focuses on.)
  2. Mind map the stakeholders for these products.
  3. Add stickies next to the stakeholders' names with their possible risks and concerns.
  4. Review the types of testing activities (things like exploratory testing, reviewing the production logs, static code analysis, etc.) for comprehension and completeness.
  5. Move each testing activity onto the impact (it's important vs. it's not important) and priority (we do this vs. we don't do this) quadrants.
  6. Vote on stickies that landed in an unexpected spot.
  7. Talk about the most-voted stickies in order, and identify action points with owners from there.

Part of this workshop was to show the teams that not every piece of testing is something that matters to the stakeholders. I didn't expect them to do every possible kind of testing imaginable. But I did want them to all understand and agree on what kinds of testing they were and weren't doing. I got them talking about it out loud.

A Test Strategy Derived from a Vision

Believe it or not, a one-time workshop was not enough to get everyone to identify and build the perfect test strategy! As the teams grew and the workshop faded from memory, I got questions about the test strategy for the teams. I heard about goals of "bug-free software" and was asked about "what best practices to follow" to get there.

As fun as it would be to pontificate about how there is no such thing as bug-free software, and there are no best practices outside of the obvious domain, that doesn't help people know what to do. So I wrote a "Quality Vision" document. (Pro tip: use a noun you wouldn't use for anything else so it's easy to pull it up by typing "vision" in your browser bar.) The Quality Vision for the unit places trust in the expertise of the teams to choose their own ways forward. It has things like:

  • Is our product at the right level of quality to release right now? This is a constant conversation between the development team and your product owner. Think about the risks and concerns of the customers you're targeting.
  • Data Security: We're not to use production/customer data for development purposes outside support incidents. Here's a link to a more in-depth document from our Security team.
  • Reliability: Here's a link to the document of what we promise our customers in our service-level agreement.

It's not going to tell you what the right answer is for your team right now, but it'll give you some things to point to when you're discussing quality with your team.

Because after all:

Quality is value to some person who matters at a particular point in time.

For the penetration test, the team lead quickly followed their "We don't talk about that out loud" comment with a "why not both?" jest. Why can't we both check the boxes for the authorities, and uncover valuable information that we want to act on?

Indeed, that's where we landed. We decided to point the security team to the production environment because that would reveal the best information. If that setup takes too long for the team, we'll point them to the test environment instead. But regardless: we'll tell our bosses and our product owner what we're doing and why. We'll talk about our test strategy out loud.

How have you started a conversation about quality with your team? When have you decided not to test something? What have you not tested and also not discussed?

Photo by Patrick Fore on Unsplash