Oncall shift should be Tuesday to Tuesday
185 points by RyeCombinator 5 days ago | 214 comments
  • julianeon 2 days ago |
    He makes a sound argument.
  • applecrazy 2 days ago |
    > Most places take after hours paging pretty seriously.

    LOL i wish

    • tail_exchange 2 days ago |
      My team has a meeting to hand off the on-call to the next person, and we discuss all pages we got during the week. Primarily two things: whether the page was for a good reason or not (good: our on-call person had something actionable to fix. bad: non-actionable pages, pages because someone else's system was broken, false alarms, etc), and also whether there is something we can do so we never get paged for this again. I find it very effective at reducing pages.
    • kdazzle 2 days ago |
      Lol yeah. My old team had oncall pages in the middle of the night pretty often where nothing was actually the matter. My manager was only nominally on call. In the handoff meetings every week he was basically just like “that sucks”.
  • N8works 2 days ago |
    I never understood why companies didn't simply leverage 24x7 internet MSPs.

    They are able to staff 24x7 by spreading the cost over multiple customers and working through the process of making your application manageable by a 3rd party is super beneficial.

    Most of these companies will also do performance monitoring and analysis as well.

    They see issues and optimization opportunities across multiple applications and know more than a single team that has only built one.

    • 0_____0 2 days ago |
      Are you speaking from personal experience having worked with one? What was the feedback between application management back to engineering like?
    • danpalmer 2 days ago |
      That works well for generic IT systems and running the desktop/laptop fleets, but doesn’t work at all for running the software a company builds.

      We typically split our teams, so we have ~16 people split across two time zones so that our shifts are just 12 hours during the day. It works well, but it is expensive, so we support a lot of services (or a small number of very high priority services) as a result.

    • RaoulP 2 days ago |
      I hadn't heard of Managed Service Providers before, but you make a good case for them.

      I'm finding surprisingly little discussion on HN regarding the costs/benefits of MSPs. Or rather, under which conditions (such as company size) they make sense.

      Any big players or companies you would recommend?

    • sgarland a day ago |
      If an MSP can effectively manage your company’s product, then your problems are simple enough to have automated detection and recovery.
  • losteric 2 days ago |
    I have occasionally convinced teams to adopt both oncall and sprint cycles aligned with Tuesday [1] - the dev teams all loved it. Management was a harder sell, but by and large were happier with the extra days to communicate results/get metrics before their own Friday deadlines.

    [1] also Wednesday/Thursdays. Wednesdays were my favorite in good working environments, it felt like running a successful marathon, but it was more prone to falling apart due to short-term thinking.

    • superfrank 2 days ago |
      I'm curious why Tues to Tues for sprints was a hard sell to management?
      • dijksterhuis 2 days ago |
        likewise, I'd be interested to hear more about that situation
    • andrewaylett a day ago |
      We have our sprints start on Tuesdays, and our in-hours on-call also runs Tuesday to Monday. Out of hours on-call starts at 5pm Monday.
  • alexwasserman 2 days ago |
    I’ve always been partial to Friday night through Friday night.

    You start off over the weekend, when you have energy and can survive the two days alone. Ideally no Friday releases so the transition is calm, but as the writer says the batches might fail.

    You spend the week fixing whatever breaks. You’re cleanly off the Monday to Monday sprint, just doing on-call/ops.

    You finish Friday evening and immediately get Friday night and the weekend to recover when you need it most.

    • taberiand 2 days ago |
      That was exactly my reasoning too when I set up our on call roster as Friday to Friday, though for us Saturday is the busiest day in terms of customer activity, so it was a no-brainer.
    • superfrank 2 days ago |
      This post was discussed somewhere else and I saw someone say that their work does Friday noon to Friday noon and their work gives the outgoing on call the rest of Friday off. I feel like that's even better because 1) it recognizes the hard work that the outgoing on call put in 2) it gives the incoming on call a few hours to get up to speed while they still have the support of the other engineers on the team.
    • wging 2 days ago |
      Maybe I'm taking you too literally, but I wouldn't want to have a handoff sync-up (or any meeting, really) on a Friday night, nor push it earlier, since significant things could then happen between the sync-up and the actual handoff of responsibility. Friday-to-Friday does sound good.

      One thing I really liked in a previous job was a split daytime-vs-nighttime rotation. It was well worth a little annoyance to set up in our tools. One week you'd be the 'daytime' oncall for business hours (something like 9-5 Mon-Fri, though we might have tweaked those hours a bit; it might have been 10-6 or something). The next you'd be on call for the complementary time (5-9, weekends). You were on call for the same total amount of time, just smeared over two different weeks. It ended up being less of a burden to optimize your schedule for a reasonable response time, but operational work still got done. And in practice awareness of operational issues was not too hard to maintain between the two members of the split.

      (I think the best thing, if you can swing it, is probably a follow-the-sun rotation where there are three teams distributed 8 hours apart around the globe, and they trade off 8-hour workday shifts. But a lot of uncommon things probably have to be true of your organization for that idea to even be on the radar.)
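
A rough sketch of the split daytime/nighttime rotation described above, assuming a 9-5 Mon-Fri daytime window (the comment notes it may have been 10-6). The names `is_daytime` and `split_hours` are hypothetical; the point is that a week splits into 40 daytime hours and 128 complementary hours, and alternating roles across two weeks gives each person the same 168-hour total.

```python
from datetime import datetime, timedelta

# Assumed daytime window; adjust if the business-hours shift is e.g. 10-6.
DAY_START, DAY_END = 9, 17

def is_daytime(when: datetime) -> bool:
    """True during business hours: Mon-Fri, DAY_START <= hour < DAY_END."""
    return when.weekday() < 5 and DAY_START <= when.hour < DAY_END

def split_hours(week_start: datetime) -> tuple[int, int]:
    """Count (daytime, nighttime) hours in the 168 hours after an hour-aligned week_start."""
    day = sum(is_daytime(week_start + timedelta(hours=h)) for h in range(168))
    return day, 168 - day

day, night = split_hours(datetime(2024, 1, 1))
print(day, night)  # 40 128; doing one week of each totals 168 hours per person
```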

      • andrewaylett a day ago |
        We split in-hours and out-of-hours, and I wouldn't want it any other way. It's especially good if a late night incident keeps you from sleep, because you can take the time back the next morning and let someone else pick up the pieces :).
  • ljoshua 2 days ago |
    My team does Wednesday to Wednesday for many of the same reasons mentioned in the article, and it works great. We switch at 11am and hold a hand-off meeting at that time, and invite the whole team.

    Hand-off meetings with the whole team work really well (in my opinion!) when you have a relatively small team--we have 9 FT teammates. Often someone else may have been delegated the page or bug that arose and can discuss how they handled it, or someone who wasn't involved may have insight for how to handle a situation better the next time. Since we're all going to be on rotation at least once during a quarter, it's great to know what happened in case a similar page pops up later.

    Finally, we also fill out a running Doc before/during the meeting with links to the pages/bugs, along with short descriptions of how they were handled. This forms a great living memory of how to deal with incidents, and is also often the birthplace of new playbooks for handling new types of incidents.
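
A minimal sketch of how a Wednesday-to-Wednesday rotation with an 11am switch could be computed, for anyone wiring this into tooling by hand. The functions `shift_start` and `on_call`, the roster names, and the `epoch` (a reference handoff at which `roster[0]` starts) are all hypothetical; real paging tools have their own schedule configuration.

```python
from datetime import datetime, timedelta

HANDOFF_WEEKDAY = 2  # datetime.weekday(): Monday=0, so 2 is Wednesday
HANDOFF_HOUR = 11    # 11am local time

def shift_start(when: datetime) -> datetime:
    """Return the most recent Wednesday 11:00 handoff at or before `when`."""
    candidate = when.replace(hour=HANDOFF_HOUR, minute=0, second=0, microsecond=0)
    # Step back to the handoff weekday.
    candidate -= timedelta(days=(candidate.weekday() - HANDOFF_WEEKDAY) % 7)
    if candidate > when:
        # It's Wednesday but before 11:00, so the current shift began a week earlier.
        candidate -= timedelta(days=7)
    return candidate

def on_call(when: datetime, roster: list[str], epoch: datetime) -> str:
    """Pick the roster member whose week contains `when` (hypothetical epoch-based rotation)."""
    weeks = (shift_start(when) - epoch) // timedelta(weeks=1)
    return roster[weeks % len(roster)]

epoch = datetime(2024, 1, 3, 11)  # a Wednesday 11:00 handoff
print(on_call(datetime(2024, 1, 9, 23), ["ana", "ben", "caz"], epoch))  # ana (Tuesday night, same shift)
```

Anchoring on a fixed handoff instant keeps the "whose week is it" question a pure function of the clock, which makes swaps easy to layer on top as overrides.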

    • l8nite 2 days ago |
      Same here. Except we do a two week rotation, and it aligns with our sprints. The active on-call engineer doesn’t have any assigned sprint work and focuses their effort on fixing bugs or cleaning up the backlog when they’re not actively triaging an incident.
  • thuanao 2 days ago |
    Is anyone getting compensated for being on-call? If you are paged and work outside of business hours, do you receive additional compensation?
    • tdeck 2 days ago |
      At Google we used to get paid an oncall bonus which was calculated at something like 1/3 of your prorated salary for the non-working hours you were oncall (IIRC), up to some limit per quarter. For my team a week of oncall per quarter would max it out and net you a bonus of a few thousand dollars.
      • thaumasiotes 2 days ago |
        > up to some limit per quarter. For my team a week of oncall per quarter would max it out

        That reminds me of Amazon's abysmally bad employee discount, which was "10% off anything on the site, up to $100 / year".

      • kevinventullo a day ago |
        Google still does this. Roughly speaking, hitting the limit in a quarter means you have <= 5 people on the rotation.
    • themenomen 2 days ago |
      Yes. And not only for responding to a page, but also for being on standby outside working hours.
    • losteric 2 days ago |
      On my teams, if someone got paged off-hours they would just work less the day after the event. imo it should just be part of the regular salary/work expectations, incentivizing keeping oncall low
      • rk06 2 days ago |
        No, it should be compensated, so Management prioritises fixing issues, instead of adding new bugs
        • Buttons840 2 days ago |
          I've wished for a tech workers union for this reason. I don't care about pay, let the union say nothing about pay.

          But let's align incentives. Any time spent fixing issues on-call is compensated 4-to-1. Workers may accrue compensation time, and any compensation time in excess of 20 hours is paid 10-to-1 when the employee leaves. The idea here isn't for workers to accrue and cash out comp time, but instead to give an incentive to the organization to ensure workers use their comp time.

          Let's align incentives, what's hard on the worker should be hard on the owners and management.

          • cess11 a day ago |
            All you need is three people that agree on one realistic change at your workplace and you have a union. From there you start having regular meetings and plan a strategy for pushing this single issue.

            When that's done, chill for a while, do some recruiting and education in the workplace and think about what the next realistic change ought to be.

          • erik_seaberg a day ago |
            We can pay for oncall availability, but rewarding outages and slow recoveries is a dangerous incentive.
      • tkzed49 2 days ago |
        Where I work, this would have no impact on the amount of tasks shoved into the pipeline by product and leadership.
        • kelnos 2 days ago |
          Perhaps not, but at least the oncall person will be compensated for the crap they have to put up with.
      • kelnos 2 days ago |
        Gross, no. This just allows management to ignore problems and push development teams to do feature work, even when everything is on fire and the oncall person is getting paged multiple times per day.

        Oncall should be compensated, always. The oncall person should get a flat rate just for being on standby, and should also receive a per-page payout, and that amount should be larger if the page happens outside regular business hours.

        Then management will actually realize there's a cost to pushing features and pulling in deadlines at the expense of robust engineering practices. Or they can decide they are fine with that, and paying the oncall person is a cost of doing business the way they want to.

        I've seen too many instances where issues that come up during oncall never get fixed, and just page and page and page.

        I will never again work at a company where oncall is "just a part of the job". I value my own time too much.

        • sgarland a day ago |
          > Or they can decide they are fine with that, and paying the oncall person is a cost of doing business the way they want to.

          I was going to say, this would almost certainly be the outcome. Companies have no problem throwing millions at AWS, DataDog, etc. They certainly aren’t going to blink at an employee making a couple hundred bucks extra per day.

    • ackermann-m-n 2 days ago |
      In France compensation is mandatory, either as salary or as rest. In addition, the labor code stipulates a mandatory daily rest of 11 contiguous hours, even during the weekend, plus an extra 24 contiguous hours of rest during the weekend. Hours of intervention are considered work.

      In my company we get approximately 800€ for every week of on-call and each hour of intervention is also compensated with salary.

      From my point of view this should be high enough for the company to be willing to focus on on-call issues. After years of being on-call I must admit the salary is comfortable but it doesn't cover the pain and constraints of being on-call: being kinda "stuck" at home basically, lots of consequences on private life etc.

    • waetsch 2 days ago |
      Yes, 350€ (before taxes of course) per week. No additional compensation for responding/ working on incidents.

      I'd be interested to know what response times others have.

      Mine is 15 minutes. So I have to respond and be in an incident call within 15 minutes.

    • sunaookami 2 days ago |
      I was on-call on a Saturday at the start of November and the prod issue took nearly 10 hours. No extra compensation from my company (not required by law in my country for Saturdays, but come on!). A week later I had to do it Monday night again, still no extra compensation... I can work less to make up these extra hours but... when?
      • pnutjam a day ago |
        ah, the ever popular evaporating comp time. If companies prefer that, I tell people to be sure to take their comp time ASAP. No 50 or 60 hour week heroes.
    • danielbarla 2 days ago |
      In my previous job, the company followed the country's laws pretty much to the letter. Simply being on call gave some small fraction of my "hourly rate" (despite being a full time employee), which actually does add up, since e.g. on a weekend you are accumulating 48 hours of such on call time (while on weekdays it's only 16 per day, as your actual 8 hours worked is not counted twice).

      If there was an actual incident, you'd get paid as if that was overtime worked, and it depended on when it occurred (e.g. weekends and public holidays carried a higher than normal multiplier). There were also limits on how much rest you'd need to be guaranteed, etc.

      On average, our actual incidents were relatively infrequent, and the pay out mostly depended on the size of the team, which dictated how often you got rotated in. It worked out to something like +10% salary though.
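
The standby-hour arithmetic above can be sketched as follows, assuming an 8-hour workday and a full week of on-call. The standby fraction of the hourly rate is not given in the comment ("some small fraction"), so `standby_fraction` below is a hypothetical parameter.

```python
WORKDAY_HOURS = 8  # weekday hours already paid as normal work, so not counted twice

def standby_hours_per_week(weekdays: int = 5, weekend_days: int = 2) -> int:
    """Hours that accrue standby pay in a full on-call week."""
    return weekdays * (24 - WORKDAY_HOURS) + weekend_days * 24

def standby_pay(hourly_rate: float, standby_fraction: float) -> float:
    """Weekly standby pay; standby_fraction is a hypothetical stand-in for the legal rate."""
    return standby_hours_per_week() * hourly_rate * standby_fraction

print(standby_hours_per_week())  # 128 = 5 * 16 + 48
```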

    • phito 2 days ago |
      Getting paid twice as much, and taxed less. But it almost never happens, and I prefer it this way :)
    • mrweasel 2 days ago |
      In a previous job yes. 7200DKK per week for on-call. Hours were between 17:00 and 08:00 on weekdays and all of Saturday and Sunday, running Friday to Friday. From 8:00 to 17:00 the normal service desk would handle incidents.

      If you got paged you'd get 150% of your hourly pay, per started hour. So if you got paged at 22:00 and again at 3:00, that's three hours of pay, regardless of each issue only taking 5 or 10 minutes to fix.

      That's roughly $1000/€950 per week of on-call, plus the hours. You'd have four/five of these per year and you could pick up an extra month of pay per year with the standby pay, plus the hours and maybe pick up another day here and there.

      Holidays were normally distributed on a volunteer basis (you'd still get paid, but you opted in to those days). So maybe I'd be home on New Years, but out on Christmas, so I'd offer to cover Christmas, while another colleague might care more about being able to drink on New Years.

      Originally we were almost 50 people handling on-call, so you'd have one week per year, but that's not sustainable: you forget how to handle common issues or how to fill out incident reports and hand off correctly.

      The most stupid on-call schedule I ever did was midnight to midnight, every other day... It was only me and my boss. That was incredibly stupid, because you couldn't go out on one day, and the day where you could go out, you had to be careful about having one drink too many.
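
A small sketch of the "150% per started hour" callout rule described above. It assumes each page is billed separately and rounded up to whole hours, which matches the 22:00 + 3:00 example (two started hours at 1.5x is three hours of pay); the comment doesn't say how two pages inside the same hour would be handled. `billable_hours` is a hypothetical name.

```python
import math
from datetime import timedelta

CALLOUT_MULTIPLIER = 1.5  # 150% of hourly pay

def billable_hours(durations: list[timedelta]) -> float:
    """Hours of base pay owed: each page rounds up to whole started hours, then x1.5."""
    started = sum(max(1, math.ceil(d / timedelta(hours=1))) for d in durations)
    return started * CALLOUT_MULTIPLIER

# Pages at 22:00 (5 min) and 3:00 (10 min): 2 started hours -> 3.0 hours of pay.
print(billable_hours([timedelta(minutes=5), timedelta(minutes=10)]))
```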

    • strken 2 days ago |
      At my last job I got time off in lieu for actual hours worked + about $4/hour for being on-call.

      I really wish we'd gotten paid for hours worked rather than TOIL, not for personal preference but because it would have aligned the company's incentives better. We might actually have fixed some of the problems if not doing so cost the business a tangible sum of money.

      Still, it was better than working for free.

      • tiahura a day ago |
        That's the way it was 20-30 years ago. If you were on call, it was informally understood that you were going to be rolling in late in the morning, and if something big happened, you'd be missing a day or two afterwards.

        Worked well until E&Y came in and "fixed" things with a strategic plan.

    • smasty 2 days ago |
      Yes, a flat daily fee during the week, and double that during the weekend or public holidays. Comes to ~600€ per week. If you actually get paged, you automatically get time off for the amount of hours you spent dealing with the incident.
    • oncallthrow a day ago |
      At mine: compensated for being on-call, but it's a pittance (in the 100-200 per week range, which is nowhere near being worth it). We get TOIL for time spent responding to incidents.
    • pnutjam a day ago |
      Last job I was compensated a flat $450 for being on call 1 week, on top of my salary in the 140K range. Everybody received the same on-call pay and I'm pretty sure salaries were similar but not identical.
    • alienchow a day ago |
      Google paid SREs 67% their hourly rate for tier 1 oncall outside business hours, regardless of whether they were paged. So 12h shifts on weekends were a full day's pay. Convertible to Off-in-Lieu. I had so many off days. Not sure if it's still the case after those layoffs.

      Anyway, I prefer Mon - Thu, Fri - Sun shifts.
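
The "12h weekend shift is a full day's pay" claim checks out if "67%" is read as two thirds of the hourly rate: 12 × 2/3 = 8 hours. A minimal sketch, where modelling the rate as exactly 2/3 is an assumption (at a literal 0.67 it comes to 8.04 hours):

```python
from fractions import Fraction

OFF_HOURS_RATE = Fraction(2, 3)  # "67%" modelled as exactly 2/3 (assumption)

def standby_pay_hours(shift_hours: int) -> Fraction:
    """Equivalent hours of base pay earned for an off-hours shift, paged or not."""
    return shift_hours * OFF_HOURS_RATE

print(standby_pay_hours(12))  # 8, i.e. one 8-hour day's pay
```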

      • skuzye a day ago |
        Still the case. It’s a good system IMO. My team is low-toil and low-page; it’s basically free money/time off
        • alienchow 20 hours ago |
          I agree, it incentivises SREs to reduce pager toil.
      • ndjdjddjsjj 20 hours ago |
        Damn stuff like this motivates me to start a union
    • andrewaylett a day ago |
      Not that way around, no -- paying extra when paged creates a perverse incentive. We're paid to make ourselves available, and encouraged to take any time we actually spend working out of hours back in lieu.

      My team have put a lot of effort into only rarely being paged: a normal week on-call won't have any out-of-hours activity at all.

    • looperhacks a day ago |
      We're being compensated for being available (more on weekends and holidays) and get additional compensation for every 30 minutes of incident response. Two incidents in one night? Twice the money. Additionally, we are required to work less the next day (we're also required by law to have at least eight hours of free time between two work days. So if you get an incident seven hours after you got off work? Congrats, you now have to wait eight hours before you can start working. This is of course very annoying, so nobody does that)

      I'm not even sure if doing on-call duty without compensation is legal in my country.

      In the past, there were some cases of "fake" incidents, but the amount of documentation makes sure that the company is able to crack down on this.
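
The minimum-rest rule above boils down to "the next workday can't start until the rest period after your last work (including incident work) has elapsed". A sketch, with `earliest_start` as a hypothetical helper; the 8-hour default follows this comment, and the French 11-hour daily rest mentioned upthread can be passed in instead.

```python
from datetime import datetime, timedelta

def earliest_start(scheduled: datetime, last_work_end: datetime,
                   min_rest: timedelta = timedelta(hours=8)) -> datetime:
    """Earliest time the next workday may begin, given when work last ended."""
    return max(scheduled, last_work_end + min_rest)

# Off work at 18:00, paged at 01:00, incident done at 02:00, normally start at 09:00:
print(earliest_start(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 2)))  # 2024-01-02 10:00:00
```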

    • icedchai a day ago |
      I worked at one place that paid an extra $500/week for on-call. That was a flat fee.
    • lightning19 13 hours ago |
      Until reading the replies to this, I didn't realize how bad we have it in South Africa. I was on call 24x7 for 2 years and was not paid anything for it.

      Big ecommerce companies and even global brands like AWS and BMW require 24x7 on call without any compensation.

  • AaronM 2 days ago |
    We are on-call for 48hrs at a time, about once every 12 days or so, one day as backup, and one as primary. It's nice because it doesn't interrupt your week too much. The downside is that complex issues might require extra work while not on-call.
  • Charon77 2 days ago |
    My company does this
  • nikolay 2 days ago |
    Ours starts at 5 PM on Tuesday and I think it's great.
  • polack 2 days ago |
    We do Thursday to Thursday and then you get Friday off after completed on-call. Being on-call gives you no extra pay by itself, but if you get paged off hours and need to work you get paid 150 to 200% of your normal hourly wage depending on what time of day you need to work.

    Best on-call I’ve had.

    • mrweasel 2 days ago |
      That's good to hear. We're currently redesigning our on-call and plan to make it Thursday to Thursday, and then Friday off.
    • einichi a day ago |
      No pay for being on call by itself is still poor, particularly when it comes to swapping rotations between team members to provide flexibility amongst each other.

      You’re making yourself available 24/7. That has a non-trivial lifestyle impact which I’ve always thought deserves more than is typically rewarded.

      • zgeor a day ago |
        Not to mention that there is an incentive to keep having oncall pages, because that's how you get paid. Or to not participate at all. On the other hand, with a flat payment, there is a big incentive to prevent issues, avoid (or at least reduce) out-of-hours incidents, and participate in the rota.
      • sokoloff a day ago |
        As long as the on-call coverage is as specified at the time of hiring, this is just a difference in form of payment.

        If I receive 100 total units of compensation, I'd way rather get 100 units of base pay (and 0 on-call pay) than 90 units of base pay and 10 units of specific on-call pay. (What if the company eliminates on-call? What if I get injured and my insurance only covers base pay? Severance is usually based only on base pay; I would not be paid on-call while I'm on PTO or other paid leave, annual raise percentages typically apply to base pay, etc...)

        • dangus a day ago |
          How can the on-call coverage be specified at hiring? Can the company guarantee that my team will never shrink or that the page rate won't increase?

          What will financially encourage my company to stop paging me overnight if there isn't a labor cost to the company every time an on-call incident occurs?

          > What if I get injured and my insurance only covers base pay?

          Insurance payouts can be easily based on wages that include reported commissions, tips, and overtime. They can very easily be based on an average of past actual wages paid in the last handful of months at the company.

          > Severance is usually based only on base pay

          Severance is a completely optional practice that is based entirely on what the company wants to do. I would argue that severance is more accurately based on "The lowest safe number to pay to this particular employee to make sure their termination does not become a legal risk."

          > I would not be paid on-call while I'm on PTO or other paid leave

          But also, PTO days and on-call days don't intersect. If you took time off during an on-call shift you would be trading it with a team member, so you would never lose that extra wage.

          Example: I'm taking a week off, it's during my scheduled on-call shift. I would normally get paid my on-call hours but I didn't this week. But when I get back from my vacation, I'm picking up an extra on-call shift because my team member covered my shift when I was on vacation.

          Now, I'm taking a week off, but it's not during my on-call shift. I wouldn't have been paid on-call hours this week anyway. When I get back from my vacation, I am going on my normally scheduled on-call shift.

          I personally have never felt compensated dynamically enough for on-call schedules. Most corporate jobs seem to pay for a sliver of the life disruption, maybe paying for half my phone and Internet bill or something like that. They all say that the on-call is baked into the compensation, but I'm not so sure.

          • hawaiianbrah a day ago |
            > If you took time off during an on-call shift you would be trading it with a team member, so you would never lose that extra wage.

            I think this is true in _most cases_, but is not a given. I myself have encountered scenarios where it isn’t true: switching with someone much later in the rotation, only to then end up having to switch again for instance. You could envision a nefarious teammate weaseling out of their fair share with sneaky switches like this, too, though paying for it would maybe incentivize them not to!

            • dangus a day ago |
              Of course it wouldn’t be hard to figure out a rough average on-call amount to pay during PTO
              • sokoloff 14 minutes ago |
                At what point along this continuum does it just become "base salary" rather than "pay specifically for being on-call"?
          • hawaiianbrah a day ago |
            Germany (among other countries) has laws around this. My company pays I think 200 euro a day that someone is on call, so my German reports end up making a decent amount in months they have their on call shifts, especially felt when the team is smaller and rotations more frequent!
          • ddingus a day ago |
            >"Severance is a completely optional practice that is based entirely on what the company wants to do. I would argue that severance is more accurately based on "The lowest safe number to pay to this particular employee to make sure their termination does not become a legal risk."

            Almost right! I see it as an extension of what I call the basic rules, "I am as nice to you as you are to me", and "I care exactly as much as you do."

            That does, in some cases, expand severance a little beyond the cold risk calculation. If the severance is going to someone who helped the company make it, then helping make sure they make it to their next gig is part of the equation.

            Not everyone boils it all down that far, but a whole lot of us do!

            Which makes your comment solid, and mine a quibble, but one I consider worthy of some discussion.

          • 8note a day ago |
            > PTO days and on-call days don't intersect.

            If you have any national holidays, somebody still ends up being on-call for that holiday. I've been on-call for almost every US holiday this year.

      • larsrc a day ago |
        +1! You can't travel very much, you can't go hiking or biking in places without cell coverage, your whatever thing you are busy with gets interrupted, you can get woken up in the middle of the night, etc etc. That deserves some compensation.
      • jxf a day ago |
        In OP's case it sounds like they do get compensated with the day off, which is PTO. It's not a trade everyone would make but an extra day off into a long weekend is one I would have taken earlier in my career.
    • makeitdouble a day ago |
      The extra day off is probably equivalent to getting paid? Last time on-call was part of my job, I think the pay increase for standby would have amounted to roughly 8h as well (actual interventions were also 150% for regular nights and Saturday, 200% for Saturday night and Sunday)

      Lugging around a laptop and the on-call phone when going anywhere, checking every now and then when the phone was not with you for a while (e.g. pool, gym etc), making sure you don't go places with no signal was enough of a PITA that knowing we were paid every hour of that had a nice psychological effect.

    • benhurmarcel a day ago |
      > Being on-call gives you no extra pay by itself

      If I’m not paid to be reachable, nobody gets to complain when I don’t pick up the phone though.

  • siliconc0w 2 days ago |
    We do daily shifts with a follow the sun rotation, makes it easier to handle persistent commitments and ensures a bad week doesn't all land on the same person.
    • tgma 2 days ago |
      Daily might be okay for more ops/SRE types, but it is hell for a primarily dev team. Can't focus on building shit.
      • crossroadsguy 2 days ago |
        In some cases it might help. Because then on-call becomes a “natural” thing. It’s not something someone dreads as in “god, that week is coming”. Also, it spreads the fuck-ups and peaceful times better.
        • tgma a day ago |
          Increasing hand-offs by 7x is sub-optimal and interferes with folks wanting to take continuous vacation time. Again, I can see for ops teams that could be true, but very much disagree that on-call should be a "natural" thing for dev teams in the first place. It can be a necessary evil that should be minimized (the personalities of people who like firefighting and of those who like quiet development are quite distinct, and there are people who actually like the former.) On the latter point, I think that benefit is very much a mirage. If there's a flaw in the system causing a "bad week," it actually might be easier for the first person who gets the hang of it to deal with it than to try handing off and teaching the next one in the rotation.
    • crossroadsguy 2 days ago |
      This! Whenever someone talks about on-call, this aspect of the rotation gets swept under the carpet. Whenever I interview I always ask whether they have an on-call system (they must if there are servers and apps involved) and if they do, whether they have follow the sun.

      Most don’t even like the question. For them such questions are red flags or the candidate is not “motivated enough”. Rarely, some even have a follow-the-sun policy. They might have one in their HQ, true for a lot of US/EU firms, but their offices in a developing country like India - it’s always something on the lines of “oh, engineers here take full ownership; they are the owners”.

      Also, in my experience, a 2-3 day rotation with follow the sun is best, and week-long or longer is worst.

      Then there are companies where it could be forever on-call with no follow the sun - e.g. Amazon, Uber (in India at least). That’s another world altogether.

      • martin-t 2 days ago |
        "If they're the owners, do they get all the profit?" When you know you're not gonna work somewhere, might as well have fun raising eyebrows.

        Cooperatives really should be more common.

        • Seattle3503 2 days ago |
          I mean it sounds clever, but how do you have engineers with no expertise in a system handling calls for it? I've been at places that follow the sun, and you frequently have no idea what to do during an incident, because the person who owns the system is offline. But at least you can sit on a useless incident call during work hours instead of completing your own tickets I suppose.
          • Braini a day ago |
            There should be SOPs in place for each "expected" issue so people know what to do. It's not like you (should) start debugging and deploying stuff in the middle of your on-call shift anyway. It's not 100% reliable, but in the normal case this should be fine.
          • achierius a day ago |
            The point is that the "ownership" used to rhetorically justify the labor of such engineers is not "ownership" at all: it's missing the literal most important aspect, that is the ability to profit proportionally as a project brings in revenue. Literally everything else is included -- the intensity of work, the singular focus, the care and devotion, the expected level of initiative -- but not the part that would most benefit the engineer.

            It's just rhetorical trickery.

    • chx 2 days ago |
      Not only that but if you have a multiple continental team then no one needs to be woken by an emergency meow. (My PagerDuty is set to a meow sound. So we practice meow driven development: I don't want to hear my phone meowing piteously.) Say, you have someone on the US west coast who can do 10am-10pm while someone else in continental Europe, nine hours ahead, can do 7am-7pm.
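
A quick way to check that two follow-the-sun shifts tile the day is to map both into UTC hours. This sketch assumes US west coast at UTC-8 and continental Europe at UTC+1 (standard time, 9 hours apart); the two regions change clocks on different dates, so for a few weeks a year the gap is 8 hours and the shifts would overlap or leave a hole. `utc_hours` is a hypothetical helper.

```python
def utc_hours(start_local: int, end_local: int, utc_offset: int) -> set[int]:
    """UTC hours (0-23) covered by a local shift [start_local, end_local)."""
    length = (end_local - start_local) % 24
    return {(start_local - utc_offset + h) % 24 for h in range(length)}

west_coast = utc_hours(10, 22, -8)  # 10am-10pm Pacific -> 18:00-06:00 UTC
europe = utc_hours(7, 19, +1)       # 7am-7pm CET -> 06:00-18:00 UTC

print(west_coast | europe == set(range(24)), west_coast & europe)  # True set()
```
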
  • bckr 2 days ago |
    > But websites need to be up 24/7, cron jobs need to run on the weekend and backend servers need to be up to support both

    Tech entrepreneurs should give more weight to choosing markets that don’t require this

    • vineyardmike 2 days ago |
      It used to be somewhat common for websites/services to go down for a few minutes every so often for maintenance/migrations/etc.

      Tech entrepreneurs should give no weight to this. The market seems to support engineers doing on-call rotations, and a service that can’t tolerate any downtime is (theoretically) a service that is worth a lot to a lot of people, which is perfect for monetizing.

      Tech entrepreneurs should stop giving excessive “nines” of availability. Even 99% is probably enough for most customers to never notice, and significantly easier to engineer than 99.999….

      • portaouflop a day ago |
        The issue is less giving out excessive SLAs; it’s more that even a tiny-ass startup thinks they need high availability, four nines, and scale to billions, when in reality almost no one actually needs it. But those are cool engineering problems, so we’d rather work on them than on building a business.
      • bckr a day ago |
        > and a service that can’t tolerate any downtime is (theoretically) a service that is worth a lot to a lot of people- which is perfect for monetizing.

        The connection makes sense but one must not think in this order. One must think “people will pay for this” and then consider “does this need to be highly available?”

        If you have more than one road to choose from, and one of them doesn’t require high availability, then give that one some bonus points for that.

      • SoftTalker a day ago |
        It's still common. Nobody really cares if a site is offline for a few minutes. You try again later (or not, but so what). Heck, nobody cares if they are offline for half the day, it gets fixed and at the end of it it's just a post-mortem for the nerds to read and a shrug and life goes on for everyone else. People vastly overestimate the importance of anything that is on the public internet. None of it is life-critical (if it is, it certainly should not depend on an internet connection or a web server being up).
        • coffeefirst 18 hours ago |
          Correct. Things are breaking all the time. If you’re not a hospital or air traffic control, nobody is going to die if your website goes down.

          There’s a time and a place for heroics, but we go to it for shit that doesn’t really matter, or worse, allow the culture of heroics to cover up the real problems that are much harder to fix.

          • erik_seaberg 2 hours ago |
            Imagine Walmart telling all their customers "Come back tomorrow, maybe, we can't manage to keep any of our stores open. At least nobody is going to die."
    • matsemann a day ago |
      An internal service we relied on when working at Norway's biggest government agency had opening hours. If you called it outside 8-17 it wouldn't reply, heh.
      • bckr a day ago |
        Government is a good one. Healthcare is another. Anything geography-locked.

        I’m not saying no one should create a highly available web service. I am saying that this is one of those things that techies assume, and shouldn’t, because it’s a huge plus to hiring, business and engineering simplification, and morale if you can define away non-business-hour problems.

    • kqr a day ago |
      I agree it's worth considering, but allowing problems on e.g. weekends sets up weird incentives for management everywhere I've been.

      For example, they may not want to fix quality issues as long as their consequences can be pushed to the weekend. Or they may start to demand people work weekends to do maintenance.

      Or -- worst of all -- they realise they can avoid deployments entirely on weekdays, and then do these big bang deployments on weekends.

      This makes engineers' lives miserable but looks like rational optimisation to management.

  • thebigspacefuck 2 days ago |
    On a past team I set up on-call to be:

    - Mon/Tue - Wed/Thu - Fri - Sat/Sun

    The original reason for this schedule was that on-call was paid by days per quarter in a tiered system, so this guaranteed that all members got the 5% on-call pay for 10 days/quarter rather than one person hitting 9 days and dropping to 3%. But I stand by this as a better on-call rotation.

    The number of people does need to not divide evenly into the number of shifts so that the days rotate; if you run into this, you can combine Fri into Sat/Sun or break Sat/Sun apart. It’s a bit complex to set up, but the mental impact of on-call is greatly reduced, and if you need a week for vacation you can much more easily find someone to cover your shift for a couple of days in a nearby week rather than ending up with 2 weeks back to back 6 weeks from now. And if you pull a weekend you get the week off, rather than losing your weekend to on-call and going into a work week still on-call.
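The block rotation described above can be sketched in a few lines of Python. This is an illustration, not the commenter's actual setup: the team names are made up, and it simply hands out the four blocks (Mon/Tue, Wed/Thu, Fri, Sat/Sun) round-robin, carrying the rotation over from one week to the next, then totals each person's days for a 13-week quarter.

```python
from collections import Counter
from itertools import cycle

# One week of shift blocks and their length in days, as in the comment.
BLOCKS = [("Mon/Tue", 2), ("Wed/Thu", 2), ("Fri", 1), ("Sat/Sun", 2)]
DAYS = dict(BLOCKS)


def rotate(people: list, weeks: int) -> list:
    """Hand out blocks round-robin, carrying the rotation across weeks.

    Returns (week, block, person) tuples.
    """
    who = cycle(people)
    return [(week, block, next(who)) for week in range(weeks) for block, _ in BLOCKS]


def days_per_person(schedule: list) -> Counter:
    """Total on-call days per person over the schedule."""
    totals = Counter()
    for _week, block, person in schedule:
        totals[person] += DAYS[block]
    return totals


team = ["ana", "ben", "cho", "dev", "eli"]  # hypothetical 5-person team
quarter = rotate(team, weeks=13)

print([b for _w, b, p in quarter if p == "ana"][:4])  # ana's first four blocks
print(days_per_person(quarter))
```

With five people, `ana` drifts through every block, and everyone lands between 17 and 19 days in the quarter, comfortably above a 10-day tier. With a team of four, the same code gives her Mon/Tue every single week, which is the divisibility problem the comment warns about; with six, she only alternates between Mon/Tue and Fri.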

    • bigiain 2 days ago |
      Any company that makes it an employee's responsibility to find "someone to cover" their on call time while they're on vacation is a company worth quitting.

      I'm pretty sure that'd be illegal here in .au

      On call coverage while an employee is on vacation is a management problem, not an employee problem.

      • glitchcrab 2 days ago |
        Could not agree more, any company I've worked at with an on-call rotation has always ensured that staff are not scheduled when they have holiday booked. The only time an employee needed to find their own cover is if something unexpected came up during their on-call period and they needed a few hours out (like an emergency visit to the doctor with a child etc).

        At my current job we have an automated scheduler which uses our gcal to ensure that it never schedules if people have an AFK entry. It also schedules fairly based on how long since the person was last on-call, not putting them on on a weekend if they were on last weekend etc (we do 24hr shifts).
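For anyone wanting to build something similar, here is a hypothetical Python sketch of the rules mentioned: skip people who are away, prefer whoever has gone longest without a shift, and avoid putting someone on a weekend right after they did the previous one. It is not the in-house tool (which isn't public), and it takes AFK days as a plain dict of dates rather than reading them from a calendar.

```python
from datetime import date, timedelta


def week_monday(d: date) -> date:
    """Monday of the week containing d; identifies which weekend a day is in."""
    return d - timedelta(days=d.weekday())


def build_rota(people: list, start: date, days: int, afk: dict) -> dict:
    """Greedy 24h-shift scheduler (hypothetical sketch).

    afk maps a name to a set of dates that person is away.
    """
    last_shift = {p: date.min for p in people}  # date.min = never on call yet
    last_weekend = {p: None for p in people}    # Monday of their last weekend shift
    rota = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        available = [p for p in people if day not in afk.get(p, set())]
        weekend = day.weekday() >= 5
        if weekend:
            prev = week_monday(day) - timedelta(days=7)
            rested = [p for p in available if last_weekend[p] != prev]
            available = rested or available  # relax the rule rather than leave a gap
        if not available:
            raise ValueError(f"nobody available on {day}")
        # Fairness: whoever has gone longest without a shift goes next.
        pick = min(available, key=lambda p: last_shift[p])
        rota[day] = pick
        last_shift[pick] = day
        if weekend:
            last_weekend[pick] = week_monday(day)
    return rota


afk = {"ben": {date(2024, 6, 3)}}  # hypothetical AFK entry: ben is out Monday
rota = build_rota(["ana", "ben", "cho"], date(2024, 6, 1), 14, afk)
for day, person in rota.items():
    print(day.strftime("%a %d %b"), person)
```

One quirk worth knowing: with only three people, the weekend rule pushes both days of the second weekend onto the same person (here `cho`), so a real tool would probably want a per-weekend cap as well.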

        • jeduardo 2 days ago |
          Are you using an in-house scheduler or is this a feature of a particular tool?
          • glitchcrab 10 hours ago |
            No, this is an in-house tool. I would share the repo, but for some reason it's private (not sure why, there's nothing confidential in it).
        • groestl 2 days ago |
          > The only time an employee needed to find their own cover is if something unexpected came up during their on-call period and they needed a few hours out (like an emergency visit to the doctor with a child etc).

          That's exactly the time where "finding your own cover" is the most stressful.

          • glitchcrab 10 hours ago |
            I do see your point, but I think some context about where I work helps here. It's a very chilled company and all teams are very self-organising. In the case of me needing to find cover, I would just drop a message in our on-call slack channel and someone will almost always pick it up (and do the necessary stuff like adding an override in the opsgenie rotation). If nobody happens to see it (unlikely) then I would just let the alerts escalate to the next person who would be happy to pick it up because they know that people don't let alerts escalate without a good reason. Everyone cares about their colleagues so they want to help if they can (it's also quite a small company).
      • thebigspacefuck 8 hours ago |
        Most companies are like this in my experience. You have some default rotation going out to forever and as part of planning vacation you check whether the time you are requesting is when you're scheduled for on-call and if so ask if someone else is available to swap on-call. If you can't find someone to cover, you raise to your manager and either they'll cover it or find someone else to cover.
  • zeroonetwothree 2 days ago |
    In my 20ish years I’ve done every possible day for oncall schedules. I would say each have pros/cons but overall I found it to be a minor difference.

    Mon-Mon is nice because it’s a logical time to start something fresh at the start of the week. Tuesday is good for the reasons in the post, Wednesday is similar. Thursday is nice because after you’re done you can relax on Friday. Friday-Friday is less common but can be nice because you get the satisfaction of being done on the last day of the week.

    • natebc a day ago |
      Where I work we do Friday 8AM -> Friday 8AM. We changed to that from Monday->Monday a few years ago. Feedback has been positive. Coming off of on-call on Monday morning was just a major bummer.
  • pmayrgundter 2 days ago |
    This is a strong positive imhe:

    "- Step 1: handling it

    - Step 2: making sure it doesn’t happen again

    So when a major issue happens over the weekend only Step 1 happens during the weekend. Step 2 involves following up with other teams, creating new alarms and updating the runbook. And all that usually happen during the week. The oncall is going to spend at minimum their Monday doing that so it’s better if the schedule reflects that."

  • Simon_ORourke 2 days ago |
    100% agree, especially when you have to deal with distributed teams in the UK with all their "bank holidays" which all seem to land on Mondays.
  • throwaway240403 2 days ago |
    Each person on my team has a day of the week they own, and then we have a rotation for weekends, and negotiate holiday/PTO trades. I guess it really only maps correctly for a 5-person team.

    We previously had a week long rotation, and some folks were initially skeptical of the idea to change, saying they were worried they'd feel like they were "oncall all the time". But, they agreed to try it for a month. That was a bit over a year ago now, and no complaints.

    I think it ends up being a lower stress configuration, because it just becomes part of your normal expected work-week routine, and generally isn't as mentally draining. It does make end of year PTO/holiday time a bit more complex to work out, but so far my team has been okay with that tradeoff.
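A day-of-week ownership scheme like this is simple to express in code. A minimal Python sketch, assuming a hypothetical five-person roster, a weekend rotation counted from an arbitrary reference Saturday, and holiday/PTO trades recorded as per-date overrides:

```python
from datetime import date

# Hypothetical roster: index 0 owns Mondays, 1 owns Tuesdays, ... 4 owns Fridays.
OWNERS = ["ana", "ben", "cho", "dev", "eli"]
FIRST_WEEKEND = date(2024, 1, 6)  # a Saturday; the weekend rotation counts from here


def on_call(day: date, overrides=None) -> str:
    """Who is on call on `day` under day ownership plus a weekend rotation."""
    if overrides and day in overrides:           # negotiated PTO/holiday trade
        return overrides[day]
    if day.weekday() < 5:                        # Mon-Fri: the day's owner
        return OWNERS[day.weekday()]
    weekends_since = (day - FIRST_WEEKEND).days // 7
    return OWNERS[weekends_since % len(OWNERS)]  # Sat and Sun go to one person


trades = {date(2024, 1, 9): "dev"}  # ben swaps a Tuesday with dev
print(on_call(date(2024, 1, 8)), on_call(date(2024, 1, 9), trades))  # ana dev
```

Fixed ownership also means whoever owns Mondays carries every Monday bank holiday, so trades like the one above end up doing real balancing work.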

    • hnlmorg a day ago |
      How do you work around bank holidays, which in some countries almost always fall on the same weekday? Does the person who has Mondays just deal with not having a longer weekend like everyone else?

      What about the person who has Friday? Do they never go out on a Friday evening?

      Sounds a nice idea in theory but not all week days are equally inconvenient.

  • coding123 2 days ago |
    It makes it slightly harder to go on vacation, I think a lot of people won't like that.
  • canergl 2 days ago |
    wrong, oncall shifts should not even exist
    • guessmyname 2 days ago |
      I agree with the sentiment, but on-call support is an unavoidable necessity given the critical nature of many systems that underpin modern society.

      When we talk about on-call, we’re not referring to systems like Netflix streaming a major fight for 65 million users, but rather essential infrastructure like healthcare systems, nuclear power plants, military operations, financial markets, and the vast array of SCADA (Supervisory Control and Data Acquisition) systems that monitor and control industrial processes.

      These systems are crucial to our safety, economy, and everyday lives, and downtime or failure is not an option.

      Before Apple, I worked at Microsoft in the Azure team, where I logged over 2,016 hours of on-call support each year. This involved six 24/7 on-call rotations, each lasting two weeks, with responsibilities alternating between primary and secondary support. While there were certainly tough moments and challenging issues during those rotations, they also provided valuable learning experiences and helped me develop problem-solving skills under pressure.

      On-call support is a necessary evil.

      • harimau777 a day ago |
        Then the people who have to work on call should be compensated accordingly. In my experience they are not.
      • smitelli a day ago |
        I’ve always felt it should be split among a geographically distributed team where support hours follow the sun. It really sucks to be awake at 3am, alone, groggy and unsupported and responsible for saving the world.

        If the company/product isn’t large enough to be distributed, is it really important that it have a 10 minute time-to-acknowledge?

      • sed_zeppelin a day ago |
        I was once paged over thirty times in a span of 24 hours while working for a website that, in the grand scheme of things, could unilaterally improve life in the United States by shutting itself down.
      • azemetre a day ago |
        Are you willing to discuss how MSFT compensated you for on call?
  • MisterBastahrd a day ago |
    Here's my on call schedule: never, and don't ask. It's my responsibility to do my job when I am scheduled, and it's management's responsibility to staff properly. If we can't agree, then we can't have a business relationship.
    • sed_zeppelin a day ago |
      I wish more devs had the gumption to refuse it.

      I can understand on-call hours if you're a literal firefighter or paramedic who saves lives. I understand that, as a building superintendent, every once in a long while you have to run out and fix a burst pipe before property is destroyed. I don't understand why some of these tech companies have on-call responsibilities like there was some hazard to life or property.

      They need five nines of availability to make sure they don't lose one cent of potential ad revenue? Good luck with that, I guess, but I'll be over here actually sleeping through the night.

  • sgarland a day ago |
    I’ve also done Wednesday to Wednesday, though Tuesday seems better mentally if only because there is one less day after the new week starts.

    What is much better, though, is splitting the week into a 4/3 or 5/2 split, with a primary and backup on-call. Primary takes the weekdays, then switches with Backup for the weekend. You’re still sharp and aware of any current issues should the need arise, but the odds of a weekend page are (hopefully) lower, so you can relax a bit.

    This of course requires enough people to have a reasonable rotation; 6 at a minimum, but 8 is better.
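One way to encode the primary/backup weekday-weekend swap, as a hypothetical Python sketch. It assumes a roster where each week's backup becomes the next week's primary, which is only one of several reasonable ways to pair people.

```python
def responders(team: list, week: int, weekday: int) -> tuple:
    """Return (first responder, escalation) for a day.

    weekday: 0 = Monday ... 6 = Sunday. Assumed pairing: this week's backup
    is next week's primary, so the roster rolls forward by one each week.
    """
    primary = team[week % len(team)]
    backup = team[(week + 1) % len(team)]
    if weekday >= 5:            # Sat/Sun: the pair swaps roles
        return backup, primary
    return primary, backup      # Mon-Fri: primary takes the pages


team = ["ana", "ben", "cho", "dev", "eli", "fay"]  # 6 people, the suggested minimum
print(responders(team, 0, 2))  # ('ana', 'ben') on a Wednesday
print(responders(team, 0, 6))  # ('ben', 'ana') on the Sunday
```

With six people, each person is first responder for one weekend in six, and in this pairing that weekend lands right before their own weekday week as primary.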

  • nvarsj a day ago |
    We started a split shift for a really busy oncall and it works out really well. It's Th->Tu, Tu->Th. So basically weekend+2 working days vs 3 working days.

    Expectation is you are 100% oncall during the working day, so it works out pretty well between weekend vs non-weekend shifts.

    I much prefer the shorter shifts to a full week. A full week on-call usually means delaying important project work, etc. for a full week.
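Which working days land in which half depends on exactly where the handoffs fall. A small Python sketch, assuming (this is not stated in the comment) handoffs at Tuesday 09:00 and Thursday 17:00:

```python
from datetime import datetime, timedelta

# Assumed handoff points, in minutes since Monday 00:00.
TUE_HANDOFF = 1 * 1440 + 9 * 60    # Tuesday 09:00
THU_HANDOFF = 3 * 1440 + 17 * 60   # Thursday 17:00


def shift_for(ts: datetime) -> str:
    """Return which of the two shifts covers the timestamp."""
    minute_of_week = ts.weekday() * 1440 + ts.hour * 60 + ts.minute
    if TUE_HANDOFF <= minute_of_week < THU_HANDOFF:
        return "Tu->Th"
    return "Th->Tu"


def working_days(shift: str) -> list:
    """Weekdays (sampled at noon) that fall inside the given shift."""
    monday = datetime(2024, 1, 1)  # any Monday works
    names = ["Mon", "Tue", "Wed", "Thu", "Fri"]
    return [names[i] for i in range(5)
            if shift_for(monday + timedelta(days=i, hours=12)) == shift]


print(working_days("Th->Tu"))  # ['Mon', 'Fri'] plus the weekend
print(working_days("Tu->Th"))  # ['Tue', 'Wed', 'Thu']
```

Move the Thursday handoff to 09:00 and Thursday daytime flips into the weekend shift, turning it into 3 working days plus a weekend versus 2, so it's worth pinning the handoff hours down explicitly.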

  • harimau777 a day ago |
    On call should be reasonably compensated. IMHO all other discussions of on call should take place after that is resolved. Instead, developers are expected to work unreasonable hours and are then fired when they start to burn out.
    • metaltyphoon a day ago |
      Seriously don’t understand why devs want to do free work.
      • s1artibartfast a day ago |
        If that is what you think they want, you are seriously confused.
        • metaltyphoon a day ago |
          Please enlighten me as to how on call is not free work when companies do not pay for this outside of your regular salary.
          • s1artibartfast 20 hours ago |
            1) Salary can include tasks like this.

            2) You said they want to. They don't. If you offer the same pay for a job with and without it, exactly nobody would choose the job with extra on-call duties.

            The obvious part you are missing is that people do it because they are paid to do it, and they like money.

            • metaltyphoon 11 hours ago |
              > Salary can include tasks like this

              So let's say that it just magically happens that when YOU are on call, stuff breaks all the time, but when it's your coworkers it doesn’t. You are all paid the same; does it seem fair to you now?

              Unless it’s written on paper that a salaried worker will be getting X extra per hour, you are just working for free. The definition of a salaried worker in the US is having 40hrs of total work time averaged throughout a year.

              • s1artibartfast 8 hours ago |
                >The definition of a salaried worker in the US is having 40hrs of total work time averaged throughout a year.

                I think we got to the heart of things. This is absolutely not true! Not legally, and not in practice. There are overtime-exempt salaried positions and non-exempt positions [1]. An exempt salaried position pays more than $685/week and means you do "the role" however your employer defines it. That can be 40 hours, 80 hours, or whatever they choose. It can require you to live on-site for the whole year.

                [1] https://www.dol.gov/agencies/whd/fact-sheets/17a-overtime

  • dmazin a day ago |
    A few months ago we switched from Tuesday-through-Monday to Monday-through-Sunday and on call stress decreased.

    After a weekend of on call, it sucks to have yet another day of on call on Monday. This overpowered all other reasons (most of them listed in the blog post) for us.

  • looperhacks a day ago |
    I'm not sure if my team is a crazy exception, but here's how our on-call works: We're usually on from Monday-Monday (with exceptions if, say, we don't have time on Wednesday or something), but every team decides the time on their own. During work hours, every team member is responsible for responding to alerts (but usually only the on-call engineers carry their company-provided phones, and they are the first to respond).

    Outside work-hours? Most alarms (if they happen) are due to bad alarm configurations. Because nothing ever happens. There was one alert this month, and it was because a randomly generated ID contained the string "ERROR" and was logged due to a warning.

    I know that my company isn't the "biggest" (only a few hundred requests per minute) and traffic amount is mostly correlated to usual business hours in my country, so there's just not much happening at night (but never zero traffic). Still, I'm always surprised that other companies seem to have really stressful on-call shifts, because the most annoying part to me is having to carry my laptop if I leave my home for more than 20 minutes.
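The "ERROR" inside a random ID is a classic substring-matching false alarm. A minimal Python sketch, assuming a hypothetical `<timestamp> <LEVEL> <message>` log format, contrasting a naive substring check with matching only the level field:

```python
import re

# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s")


def naive_alert(line: str) -> bool:
    """Fires on the substring anywhere in the line, IDs included."""
    return "ERROR" in line


def level_alert(line: str) -> bool:
    """Fires only when the parsed log level itself is ERROR."""
    m = LINE_RE.match(line)
    return m is not None and m.group("level") == "ERROR"


warn = "2024-05-01T03:12:00Z WARN retrying request id=k9ERRORq2"
err = "2024-05-01T03:13:00Z ERROR database connection refused"
print(naive_alert(warn), level_alert(warn))  # True False
print(naive_alert(err), level_alert(err))    # True True
```

Structured logs (e.g. JSON with a separate level field) remove the ambiguity entirely, since the alert can key on the field instead of the raw text.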

  • ludwigvan a day ago |
    Mon-Mon has the advantage that it is a single week and you are done. Tue-Tue means it bleeds into the second week.
  • sed_zeppelin a day ago |
    I just want to jump in as a minority voice here. In case anybody is reading the other comments and feeling... alienated.

    I refuse to accept on-call duties, full stop. If a job posting expects it, I don't apply. If a hiring manager says they have it, I do not accept the offer. If management starts talking about maybe implementing it, I protest. If it becomes enacted, I resign.

    There is absolutely no situation in which I will ever participate in another on-call shift. I've been there, I've done it, now that chapter of my life is closed. Find some younger kid, pay them better than you paid me for the miserable intrusion on their life. I'm done.

    Just wanted to be the voice who says what, hopefully, some of the more seasoned and battle-scarred readers here are thinking.

    • convolvatron a day ago |
      on call is like hiring civil and structural engineers to build you a bridge over a canyon, and then when they show up to do a site inspection you just push them in. eventually maybe you'll be able to cross.
      • ddingus a day ago |
        I love your comment on this. Perfection.

        Once, while traveling in an RV for some work related marketing thing, the discussion turned to the lack of fuel economy...

        The RV might perform better if the engine powered the RV by blowing fuel right out the tail pipe. Horrible efficiency, terrible for the planet, and all the negatives, packed right into a quick expression.

        Your comment is on point. Solid and I just felt like sharing my appreciation for the morbid fun it contains.

        Nice work. Worth a healthy chuckle. Thanks.

      • stackskipton a day ago |
        As SRE, strongly disagree. On Call is like hiring civil and structural engineers then holding them responsible when their poor bridge collapses under the weight of all the traffic.

        Sometimes, yes, Devs get called out for stuff outside their control, like infrastructure failing. However, at my job, we just had two devs quit over on call, and guess what: their service was one of the worst offenders for "Oops, we pushed a bug to production."

        • convolvatron a day ago |
          firstly, on call means supporting the entire service, which in general is not something I built.

          secondly, many if not most of the issues that arise are part of some infrastructure automation or third party service or database. expecting me to be fluent in all of those to be useful in the hot seat is a pretty substantial investment and qualifies me to be an SRE on top of my other duties

          thirdly, one major reason why my code might fail in production is that it wasn't sufficiently tested, probably because the service as a whole is basically untestable, and even if it were, building test and test infrastructure is likely not at all valued. in many places just filling in that hole would take a year.

          onto the fourth: the story is supposed to be that by operating the service, I'll be incentivized to fix automation and come up with solutions to make it more robust. I actually know how to do this, and every week I'm on call is time that I _don't_ spend doing this. furthermore, getting permission to do so is often like pulling teeth. sounds complicated. sure that would be nice, look at that when you have time in the indefinite future.

          so what this often looks like from a development perspective is that I'm being paid to be a developer, I was judged based on my ability to be a developer, but at the end of the day I'm not building the service. I _am_ the service.

          • stackskipton a day ago |
            If you are on call for infrastructure, then I could understand not wanting to be on call. If I'm there, I'm on call for infrastructure as SRE.

            I get all the political reasons that your code may not work. However, refusing to be on call doesn't fix any of those reasons; it's just ignoring work. Flip side, as SRE, I ask if Devs are on call. If they are not, I don't take the job, because there is zero incentive for them to fix anything vs churn out 5 features, chuck it over the fence and be like "Ops problem now".

          • CoffeeOnWrite a day ago |
            > but at the end of the day I'm not building the service. I _am_ the service.

            I agree. For me though, it gives me pride to own my services and be fully accountable to the business, especially as part of a team with whom I build comradery, and of course our value to the business justifies our good compensation. It only works because we are empowered to make decisions that keep our on calls sustainable.

        • badgersnake a day ago |
          Give them the time and budget to build it like a bridge then. Oh wait, your competitors beat you to market by several years.

          Making people work 24/7 is not conducive to good anything, thus on call is a terrible way to do things.

          • stackskipton a day ago |
            On call shouldn't have 24/7 responsibilities. For example, I'm on call and took a call this morning due to a MySQL server running out of space. One Terraform change later, it's no longer out of space and I'm back to my day. I'll take the 30 minutes it took me to resolve it out of my normal time elsewhere.

            If on call balloons your 40 hours to 70 hours, yes, you have an issue. That's not normal and you should consider changing jobs.

            • Retric a day ago |
              Waking someone up is going to cost far more than the time spent fixing the issue.
              • stackskipton a day ago |
                I wasn’t woken up. If someone is woken up, it’s expected they take more time off.
            • badgersnake a day ago |
              If you are expected to carry your laptop and answer the phone you are working. Whether you get paged or not is irrelevant.

              The corporate gaslighting is strong with this one.

            • bdangubic a day ago |
              you should be looking for another job… immediately… terraforming on YOUR time is no way to live…
      • acchow a day ago |
        In software, building bug-free software is almost never the goal. There is a constant juggling act of tradeoffs between time, requirements, and tech debt.
    • tengbretson a day ago |
      This attitude will keep you off the pager rotation, but get used to building meaningless projects or having your expertise relative to the average developer seen as a liability to the org rather than an asset.
      • gedy a day ago |
        I disagree, not the OP but there is a time and place in a career for firefighting, similar to military.

        I always take responsibility for my own work, even after hours fixes, etc. But active on-call orgs usually are just reaping tech debt that others sowed. Sorry not going to rally for that.

    • corytheboyd a day ago |
      Nobody is rooting for on-call, but yeah I put up with it because I am a young stupid idiot thirty-something who needs to make a lot of money now so that the rest of my life can be Nice Enough. I’d love to be able to cherry-pick jobs like this too, but I am not there yet.

      Not trying to dunk on you, I’m honestly glad you get to do this, it must make your life considerably better.

    • 0xbadcafebee a day ago |
      I am 40 yrs old. I get paid a shit-ton of money (just around $200K) to do this stupid tech work job. I work 40 hours a week, I get benefits, flex time, plus I work remote.

      If I'm getting paged for a legitimate issue that is related to something I built or maintain, then, yes, I am going to respond on-call. Because it's a fucking privilege to get paid this much money to sit on my ass and type into a screen.

      If I'm getting paged repeatedly, or for an issue that isn't my responsibility, then I will get pissed off, and yell and scream until I'm no longer on-call (or they fix the issue, whichever comes first). But I am grateful to be able to have this life. I can spend an hour or two after hours to fix my shit that broke.

      • majormajor a day ago |
        An on-call rotation without sufficient influence over the roadmap and planning to be able to fix persistent problems, so they don't repeatedly cause the same issues over and over and over, is toxic. And it's gonna kill the team's overall productivity, so it's not good for management either. Congrats, you're paying SWE salaries for an ops team that would traditionally cost you less.

        In a more healthy situation an on-call rotation is the price of being able to move quickly, get stuff out the door, and have compensation that reflects that the company isn't paying a whole team of extra people to stare at dashboards 24/7 just for the rare situations that things break after-hours.

        Gigs with low-overhead + customers that don't expect 24/7 operations are kinda the real sweet-spot dev compensation + role-wise, but ... pretty rare.

        • 0xbadcafebee 15 hours ago |
          Well, I have two thoughts about that:

          1) gigs without 24/7 operations are rare, because there is no good reason for a tech product not to be 24/7. it's not costing extra electricity to keep the lights on overnight, nor more staff. there are a bunch of these gigs (my last gig had no customers for 2+ years) but you shouldn't expect them, because part of the reason we're paid so much money is we're expected to deliver "continuous value". most devs would agree with this, because they all want to be able to deploy continuously, whenever they want. (which is a terrible idea, but it is the status quo.) furthermore, if you're doing your job right (and so is Ops), supporting a 24/7 product should not result in on-call pages, because nothing should be breaking outside regular business hours. if it is breaking outside regular hours, somebody sucks at their job. and Ops' job is pretty simple, so...

          2) you do have lots of control over the roadmap, planning, etc. but nobody is going to walk up to you and say "hey we were just thinking of maybe doing this in the roadmap, is that okay with you?" you have to get involved, early, and consistently. you have to show you're not going to rock the boat, but that you will have good suggestions, and can show they will turn into better outcomes. you have to play a little politics, a little product ownership, and also an engineering role, in order to influence what the business decides to do. as you get more senior this gets easier because people will defer to you more, but even an extremely likeable junior can influence the roadmap.

          on the off-chance that you're just trapped in engineering hell, with hostile management, a terrible product, and a completely apathetic and terrified staff, quit immediately. this isn't normal and you shouldn't think "oh, I'm trapped here." people don't stay in abusive relationships because there's no other choice, they stay because they've justified their own abuse.

      • tbihl a day ago |
        You haven't lived until you've spent a whole weekend at work rushing to fix a production-limiting issue because the boss doesn't know, though you do, about the other division's production-limiting issue which cannot, under any wildly optimistic circumstance, get done in the next two weeks.

        Oh, and that weekend is the weekend before Christmas.

        • 0xbadcafebee 20 hours ago |
          Oh, I have so, so many on-call stories. The one of "these other people are making our lives miserable" is hard to deal with, but there are paths you can take to get them to work on it. Sometimes it's just not feasible (or is risky) to get them to take more ownership in the short-term. So it's really important to do your own job to establish all the potential failure paths, and set up lines of ownership, make sure your dependencies have their shit together (performance testing, trend analysis, alerts, limits, runbooks, etc) so that when they do inevitably fail you can push back.

          I have never been at a job where on-call was done as well as it could be, and most were/are pretty bad in general. But I could always get changes made to on-call, so that when shit started rolling down hill, it didn't hit me.

    • deathanatos a day ago |
      You expect to not be responsible for what happens to the software you put into production?

      (… and I'd like to avoid distracting arguments that amount to "my company does on-call badly" — yeah, those problems do exist and we should strive to fix them. But if I'm to not categorize the argument here as the baby with the bathwater, then we need something to replace on-call with. Prod goes down on a Saturday afternoon; are you going to tell management "tough cookies" until Monday?)

      • rufus_foreman a day ago |
        >> You expect to not be responsible for what happens to the software you put into production?

        I'm responsible for the software I put into production from 9 AM to 5 PM for about 200 days a year. At 3 AM, I am responsible for taking care of myself by getting a good night's sleep.

        If you need 24 hour coverage, taking into account vacations and weekends, you need 5 or 6 people.
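The 5-or-6 figure checks out as rough arithmetic. A back-of-the-envelope Python sketch, assuming one person covering at a time, 8-hour days, and no overlap for handoffs:

```python
import math

COVERAGE_HOURS = 24 * 365  # hours per year that someone must be on shift


def people_needed(days_worked_per_year: int, hours_per_day: int = 8) -> int:
    """Headcount to staff round-the-clock coverage, one person at a time."""
    return math.ceil(COVERAGE_HOURS / (days_worked_per_year * hours_per_day))


print(people_needed(260))  # 5: 52 weeks x 5 days, no leave at all (8760/2080 ~ 4.2)
print(people_needed(200))  # 6: ~200 days a year, as above (8760/1600 ~ 5.5)
```

That ignores sick days and handover overlap, so treat it as a floor rather than a staffing plan.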

        • hn_go_brrrrr a day ago |
          "you need 5-6 people" is moving the goalposts. The root comment said nothing about minimum team size.
          • mikedelfino a day ago |
            If the company has enough people in the team, someone just works the night shifts or on scheduled weekends. No one needs to be on-call because there would be someone taking care of it already.
            • nosefurhairdo a day ago |
              Is the argument here that every software team should have engineers whose normal working hours have 24/7/365 coverage?
              • SuperNinKenDo a day ago |
                If you expect your team to provide 24/7/365 assurance, then it's hard to see how that isn't a perfectly reasonable idea. The only counter to it is that keeping people on call shifts financial cost off the business in the form of psychological cost to its employees. Not very convincing.
                • SpicyLemonZest 16 hours ago |
                  Would you take the night shift? Everyone I've seen promote this idea seems to expect that they'll be the lucky ones who get to keep a normal schedule. If you have a service that needs 24/7 uptime, and you transition from an oncall model to a shift model, at least 2 out of every 3 engineers on the team are going to have to change shifts or quit. If the entire industry shifts, high-availability software would simply join the ranks of fields like nursing or manufacturing where many people have no realistic option to work normal hours.
                  • andreasmetsala 14 hours ago |
                    The sane way to solve that problem is to hire people in different time zones to get coverage. Some still need to do weekends but even those are not the same in every country (e.g. Israel).
      • al_borland a day ago |
        My boss recently started an on-call rotation for us. None of the code I have written is customer facing. If everything I wrote breaks at 5:01pm on Friday, external customers will feel 0 impact if I wait to fix it until I show up again on Monday. Worst case, someone internal has to wait to work on something they’ve probably been putting off for months anyway. There are other things they can work on. If it was a constant problem, I’d get it, but a rare instance can be forgiven when no outside impact is felt.

        I am responsible for my code, but we need to be realistic about the impact. Not all outages are created equal.

        I used to work nights watching over the hardware, operating systems, and applications running on it. We’d do upgrades and break/fix stuff. Some things were worth waking someone up for, but a lot of things weren’t. We’d do what we could to fix it on our own, but for a non-prod environment, if we couldn’t, it could wait until morning. This idea seems to be lost on people now. I get that 100% uptime of 100% of the systems would be nice, but not at the expense of your employees’ sanity.

        I haven’t actually been called yet with the new rotation, but any week I’m on-call I’m a bit on edge. In the past I had some pretty horrible on-call experiences that pushed me close to quitting, which I won’t get into, so I’m preparing for the worst. I worked my ass off to get into a position where I didn’t need to be on-call and put in my time working nights so other people could sleep. Being back on-call feels like a demotion.

        • Retric a day ago |
          It is a demotion.
      • ggeorgovassilis a day ago |
        > You expect to not be responsible for what happens to the software you put into production?

        First: IT seems to be rather the exception; most professions have no on-call. E.g., even if my car mechanic screws up a service job, they'll have me bring the car back into the garage during their normal working hours, no matter how or where I'm stranded in the middle of the night.

        A second comment: I'll be responsible for anything I have created in my own way. The reality of software development is that we implement functional requirements we've been given with which we disagree, we implement non-functional requirements which don't achieve the goal, we are made to use frameworks and tools we're not familiar with, on a short timeline, a low budget and inadequate infrastructure and we're supposed to take responsibility for code our co-workers wrote.

        • mikeocool a day ago |
          > IT seems to be rather the exception

          I think there’s actually a fair number of jobs where some level of this is expected.

          Doctors are one obvious example — they have on call responsibilities often more onerous than IT, and depending on the situation don’t always receive additional compensation for it.

          If you manage people who work different hours from you, in a lot of jobs it’s not uncommon to be called in if shit hits the fan when you’re not working (for example if you’re a hotel manager, to just name one).

          I’ve found that any good lawyer I’ve worked with will answer my calls and help me work through things at basically any time of day (their firm might be billing me for the time, but that doesn’t necessarily directly translate to their comp).

          Lots of reporters are expected to cover news that breaks on their beat, no matter when it happens.

          • quicklime a day ago |
            > Doctors are one obvious example — they have on call responsibilities often more onerous than IT, and depending on the situation don’t always receive additional compensation for it.

            My doctor (primary care physician) doesn’t work outside of business hours. In an emergency the recorded message says to call an ambulance and go to the emergency department at the hospital, which is staffed by a different set of people.

            So it seems they do have at least some separation of the oncall aspect?

            Lawyers are another story, there’s a lot of things wrong with that profession and we shouldn’t be trying to copy them.

            • mikeocool a day ago |
              If you go to most hospitals at 2AM and need a specialist of some kind (say a specific type of surgeon), there’s going to be someone in that specialty on call who's going to get paged to wake up, come in, and see you.

              Even in family practice, it’s not uncommon to be able to get a call back from the on call doctor at the practice on weekends or off hours — if you’ve got a situation that maybe doesn’t warrant the ER, but you’re not sure if it can wait until Monday.

              • bsder a day ago |
                > If you go to most hospitals at 2AM and need a specialist of some kind (say a specific type of surgeon), there’s going to be someone in that specialty on call who's going to get paged to wake up, come in, and see you.

                Only if you're dying.

                Come in late Friday and you're going to be sitting in a bed until Monday even if your gall bladder is about to explode.

                • mikeocool 20 hours ago |
                  Sorry, you’re right. Doctors have it way easier than software engineers.
                  • bsder 20 hours ago |
                    Sarcasm simply serves to undermine any valid points that you have.

                    The point was that "on call" is specifically confined as an expectation only to certain types of doctors or under very urgent circumstances.

                    In addition, doctors have extra special dysfunctions like "too many hours in a shift".

                    However, many of these are because doctors have also been fighting various efforts to train more of them, which would enable distributing the required extra labor across more people.

                    • doubleg72 17 hours ago |
                      Funny, my wife is primary care yet does on call via answering service. You clearly don’t know what you’re talking about.
                  • quicklime 20 hours ago |
                    I definitely don’t think they have it easier. They work hard and the stakes are much higher.

                    But what you’re talking about is a person whose job it is to be oncall. It’s the equivalent of an SRE, rather than a SWE. They’re not doing it because they believe in “you build it, you run it” or anything like that.

                • chipsa 17 hours ago |
                  I went in to a hospital at just after midnight, and had my gall bladder out by noon. No, the surgeon wasn’t called in early, but the radiologist who diagnosed the gall bladder was.
              • happymellon 15 hours ago |
                This is wrong on so many levels.

                No they don't.

                I know plenty of people who have had to sit around for 8+ hours because the particular type of doctor is not available. The on call only really applies if you're bleeding out.

                In my 20+ years of development and support, there has only been one time that I was paged due to an actual catastrophic failure. Most pages are because shitty "SREs" want monitoring on everything, even if it's stuff that I have no control over.

          • shafyy a day ago |
            > Doctors are one obvious example — they have on call responsibilities often more onerous than IT, and depending on the situation don’t always receive additional compensation for it.

            I mean.... On call doctors literally save lives. Most on-call software engineers don't. So.

            • bloppe a day ago |
              But think of the shareholders!
        • bloppe a day ago |
          Doctors, firemen and cops are obvious examples, but I've called plumbers at 2am because of a burst pipe flooding the basement. I've called locksmiths well past closing time due to lockout. I've called landlords at all hours for apartment emergencies. Society needs on-callers of all kinds. It's not surprising that some people are vociferously against holding the pager, and I sincerely wish those people success in avoiding it. But someone will always have to step up and they should be appropriately rewarded for it (I've been on-call and was considered lucky to have gotten overtime for it, which I think is strange because it's just a well-aligned incentive structure that any smart company should have)
      • Retric a day ago |
        I don’t put things in production, the company does. And it’s the company’s responsibility to deal with problems that show up.

        24/7 coverage is expensive, and mandating that someone is on call 24/7 doesn’t actually provide it.

        • joshuamorton a day ago |
          This is "companies do on-call badly".

          For the purposes of this exercise presume that our theoretical on-call process is no worse than Google's SRE structure: You are on-call for a 12 hour shift that is more or less aligned with your waking hours, and you are compensated extra for the time you are on-call outside of normal working hours, whether or not you are called in. You are on-call at most one week per month, on average, and usually less.

          • tharkun__ 20 hours ago |

                You are on-call for a 12 hour shift that is more or less aligned with your waking hours
            
            I suppose if you're Google they can theoretically make it so it's more aligned with your waking hours? Do they do it? Most companies don't or can't. I.e. it's _less_ aligned.

                you are compensated extra for the time you are on-call outside of normal working hours, whether or not you are called in
            
            How much? Way too many on-call processes pay nothing but a few dollars for this, just to be able to say "see, we do pay for this, even when you're not called!". That is nowhere near enough for the toll that being on-call takes on how you go about your day: always on edge, always awaiting the call or alert that requires you to drop whatever you are currently doing, and prevented from actually doing or starting certain things.

            You haven't even mentioned the expected reaction and resolution time and that alone can make a huge difference.

                You are on-call at most one week per month, on average, and usually less.
            
            Great, only one week out of four /s That's crazy if you ask me. Going back to preventing you from going about your day in a normal way. There's no "doing on-call well" in how you describe it.
            • ksmith14 7 hours ago |
              Google staffs SRE teams as either 8 in one location/TZ or two geographically distributed teams of 6 -- often some pairwise combination of U.S., Europe, and Australia to accommodate reasonable on-call shifts.

              The on-call compensation varies depending on what tier of service they're offering. Tier 1 (5-minute response time) pays 2/3 of your effective hourly pay for on-call time outside of local business hours, and tier 2 (30-minute response time) pays 1/3. Or time off in lieu.

      • dheera a day ago |
        I am capable of writing very good software, testing it, and putting it into production, but I am not capable of being responsible for what happens at 3am on a Sunday. Whether that deal is okay is up to you. I'm okay if you don't want to pick me. There are other jobs I can get. I write good software though.

        If the customer is awake at 3am on a Sunday, it's the customer's problem that they were awake at 3am on a Sunday. If it's a social network, I frankly couldn't care; the customer should go to bed. If it's going to be deployed in the emergency room, fine, we should care, but YOU, management, should find people who are actually willing to take that shift (for extra money, or are based in other time zones).

      • uuddlrlrbaba a day ago |
        I expect to be very responsible during my working hours.
      • chairmansteve a day ago |
        Only a dysfunctional company would rely on the programmer who wrote the code.
      • RandomThoughts3 a day ago |
        I have never worked for a company where people building the software and people supporting it when it is critical were the same. The idea is weird to me.

        Plus, any large enough company should have teams spread across time zones, eliminating the need for on-call if it’s correctly managed.

      • deprecative 20 hours ago |
        Why do I give a single shit about the software having an issue at 2am? I don't own the company. I don't care. If they care they can hire night shift triage.
      • Spooky23 9 hours ago |
        “Devops” traded less bureaucracy for more accountability.

        Have a generalist ops team that is staffed 24x7, or has paid on call as part of the job. They get run books to respond to whatever goes on.

        I’ve set this up twice. The first time, we had a team in the Philippines that would cover overnights.

        They could start and rollback deployments and do most stuff via the runbook they were provided. Most callouts (5% of escalations) to product teams were due to bad or missing documentation.

        The US based team did similar work, just during the day. Both could escalate quality issues for the product team to fix.

        The other model was all US, on-call based. We used junior and low-skill folks, who had rotating on-call. They were paid 20% of hourly rate for standby pay and had a minimum pay threshold when they got called. All of that hit the cost center of the offending product or service, so there was both a financial incentive to not get calls, and a human incentive as the engineers didn’t want to get called for escalations. Again, documentation is key.

    • dheera a day ago |
      I 100% fully agree with you.

      I have survived 2 cardiac arrests (almost died) during high-stress times. I've been stable for a few years now, but only after I enacted VERY HARD boundaries around work/life and never cut down on sleep for any reason (among other health-first changes I made). I have a significant increase in cardiac arrhythmias any time I don't sleep enough.

      I consider myself at this point as having a disability that prevents me from overworking, and I absolutely need my employers to respect that and accommodate that.

      I can work normal hours, and that's my offer. If you want to pay me less, that's okay, but I'm not doing on-call unless it's business hours only.

      If customers give a shit about uptime at 2am then it's management's responsibility to find people in other time zones to deal with it, or pay extra for people who are willing to sacrifice and risk their health for a customer (I won't take that deal though).

    • nosefurhairdo a day ago |
      I've been the on call engineer on my team for 75+% of the last year (most of my team is contractors, new hire not onboarded to on call rotation yet, etc.).

      It's not an issue because we don't break prod. I also feel I'm well compensated. When there have been issues at inconvenient hours, my manager has encouraged me to take it easy after resolving the incident. We've also prioritized improving our integration tests and addressing other issues noted during root cause analysis (RCA), which I suspect is why we haven't had any incidents in recent memory.

      If on call duties are this frustrating, I'd argue it's team/organizational dysfunction that is the real problem, and bad on call shifts are just one of the symptoms.

      Ultimately, somebody needs to be available to fix a production incident. One person suffering from on call duties is better than thousands of paying customers suffering from broken software.

      • alemanek a day ago |
        On call even if you aren’t actually called is still a burden. No drinking or other impairing substances. Need to be available and ready to help on weekends. So unable to disconnect and go on a hike or some other activity without internet and your laptop.

        75% on call even if I was never called would be profoundly unhealthy for me. So I wouldn’t dismiss the toll of just being available 24/7.

        EDIT: I forgot to mention I am on an on call rotation, but it is 1 week on and 7 weeks off. So, not too horrible.

        • nosefurhairdo a day ago |
          This is a good point. I am fortunate to have a good manager who recognizes the unfair burden. I have missed a pagerduty notification before, which my manager dealt with. The incident did not appear to affect my subsequent performance review, as evidenced by top of band compensation.

          I would expect stricter accountability with a more reasonable on-call schedule.

          • darkwater 14 hours ago |
            Congrats, you have a very good work ethic but a very poor personal health ethic.

            I was like you, and probably still am deep down, and I will fall into the same trap again. But please, try flipping your point of view here: instead of seeing your manager as generous and your company as good for overlooking the time you failed to answer while on duty, think about how you are being exploited by covering 75% of on-call duty alone, and about the money the company didn't lose just because of you. And how much of that money you got.

    • renewiltord a day ago |
      I think the useful thing here is to mention the trade-off. Practically all my SWE friends making $1m+ are instant responders whether denoted so or not.
      • throwaway2037 8 hours ago |
        I know HN doesn't like jokey replies, but you wrote: "all my SWE friends making $1m+", where "friends" is plural. You have multiple friends who earn more than 1M USD total comp per year? Man, HN is getting crazier by the day.
    • darthrupert 15 hours ago |
      I had on-call jobs at one point in my career. It roughly doubled my salary, and I put most of that into assets that rose about 20,000% in 10 years. So that was nice.

      The job can be excruciating though. Don't do it without proper compensation.

    • srhtftw 3 hours ago |
      At last, a voice of reason amid the vulgar crowd.

      I've held several management positions where I've carried a pager. At one employer I helped keep trading databases operational in 7 time-zones on 3 continents. At another I helped fix a backup issue on Christmas eve. Helping those customers was a core part of my responsibility. I fully understood that and took great pride in it.

      But as a developer I too will never accept another on-call rotation.

      Companies which assign on-call duties to developers make the mistake of ignoring that development, management and operations are different kinds of work which require different environments and skill-sets. Other engineering tasks include testing, documentation, training, and maintenance. At small startups the founders and early employees may do some or all of these, but that becomes impractical at larger established businesses.

      Engineers should learn and do all these things in the course of their career but not all at the same time unless quality isn't a concern.

      My experience at a unicorn a few years ago convinced me that companies which assign developers on-call rotation either don't understand or don't care about the quality or sustainability of their business. In that company senior management was replaced by folks from Google and Facebook shortly after I joined. I was moved into a team where I had no role in the design, development or deployment of its services. I had no say in the hiring or firing of the so-called engineers who rushed failing services into place past a wholly ineffective QA department.

      I should have seen the writing on the wall when I began to be pressured by managers and recruiters to rubber-stamp candidates who couldn't pass our coding tests but had spent lots of time on-call. The company's priorities slowly became clearer to me as they grew evermore desperate to live up to their promises. Ultimately I suffered an ischemic attack from the stress of this environment and left the company to focus on my health.

      Oh and the company? It let go of most of its engineers a year later and was eventually acquired by a competitor for a few hundred million after having raised over a billion dollars.

  • bradleyjg a day ago |
    I’ve had to schedule on calls for a team and this would make it much harder to make a schedule. Every week’s vacation now means two weeks that person can’t be scheduled for on-call.
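    To make the vacation point concrete, here is a small sketch (the helper name and the dates are hypothetical) that lists which weekly shifts a vacation overlaps, treating each shift as running from its start day up to, but not including, the same weekday one week later. Under that assumption, a Monday-to-Friday vacation fits inside one Monday-to-Monday shift but touches two Tuesday-to-Tuesday shifts.

```python
from datetime import date, timedelta


def overlapping_shifts(vac_start: date, vac_end: date, shift_weekday: int) -> list[date]:
    """Return the start dates of weekly on-call shifts that overlap a vacation.

    The vacation covers vac_start..vac_end inclusive. shift_weekday uses
    date.weekday() numbering (Monday=0 ... Sunday=6), and each shift is
    assumed to cover [start, start + 7 days).
    """
    # Latest shift start on or before the first vacation day.
    offset = (vac_start.weekday() - shift_weekday) % 7
    start = vac_start - timedelta(days=offset)
    starts = []
    # Every shift that begins no later than the last vacation day overlaps it.
    while start <= vac_end:
        starts.append(start)
        start += timedelta(days=7)
    return starts


# Hypothetical vacation: Monday 2025-01-06 through Friday 2025-01-10.
vacation = (date(2025, 1, 6), date(2025, 1, 10))
print(len(overlapping_shifts(*vacation, shift_weekday=0)))  # Monday handoff: 1
print(len(overlapping_shifts(*vacation, shift_weekday=1)))  # Tuesday handoff: 2
```

    The sketch ignores swaps of partial weeks, which some teams use to soften exactly this cost.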
  • corytheboyd a day ago |
    I’ve been in so many meetings endlessly searching for the Best Day for on-call shifts to start/end, and feel I have heard it all at this point, every day of the week seems to have pros and cons. Current team just does Monday, because that’s when the week usually starts, and I really don’t want to talk about it anymore lol. If it conflicts with one of the abysmally few US holidays, adjust accordingly, you’re a bunch of smart clever adults, you’ll figure it out.
  • dpcx a day ago |
    We used to do two week on-call rotations (to follow along with sprint/release cadence). At the beginning of '23, one of our devs asked to be full-time on-call, both because they had sleep issues and so would regularly be awake at 2 am, but also because they didn't like that the other team members wouldn't follow the documented processes.

    They don't get paid extra, but they seem to be very happy with the setup.

  • mnahkies a day ago |
    The point about holidays resonates with me. In our setup you get paid extra for being on-call on a public holiday, but given we do Mon-Mon shifts, in practice that means two people can't take advantage of a long weekend and only one of them gets extra compensation for it.

    Different people deal with being on-call differently, but personally I don't do what I normally would when on-call, whether that's long motorbike rides or hiking etc., because it's not practical to guarantee cell coverage and also the threat of a page ruins the experience. A "day off" whilst on-call isn't equal to a day off.

  • seusscat a day ago |
    I hate on-call shifts, but if they must exist, I like the way my team handles them. We have split day and night shifts. 7-18 day shift, and 18-07 night shift. All non-work hours compensated with standby at 10% of hourly pay. Any pages outside of work hours earn you an additional 150% in base pay. Each page guarantees a minimum of 3 hours of pay even if you spent only 5 mins on it.

    And since in my country you must have at least 11 hours between shifts, if you get paged at night, you get PTO for the next 11 hours on top.
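    As a rough illustration of how that scheme adds up, here is a sketch that reads the description above as: standby pays 10% of the hourly rate for every non-work hour on call, and each page pays 150% of the hourly rate for the time spent, with a 3-hour minimum. The function name and the example rate are hypothetical, the 11-hour PTO is not modeled, and whether standby and page pay stack (assumed here) may differ from the actual policy.

```python
def on_call_pay(hourly_rate: float, standby_hours: float, page_hours: list[float]) -> float:
    """Estimate extra pay for one on-call period under the scheme described above.

    Assumptions: standby pay is 10% of the hourly rate per non-work hour,
    each page pays 150% of the hourly rate, every page is billed for at
    least 3 hours, and standby and page pay are added together.
    """
    standby_pay = standby_hours * 0.10 * hourly_rate
    page_pay = sum(max(hours, 3.0) * 1.5 * hourly_rate for hours in page_hours)
    return standby_pay + page_pay


# Hypothetical night shift: 18:00-07:00 on standby (13 hours) with one 5-minute page.
print(on_call_pay(hourly_rate=40.0, standby_hours=13, page_hours=[5 / 60]))
```

    With those numbers the total comes to roughly 232 in whatever currency applies: about 52 for standby and 180 for the single 5-minute page, which is why the 3-hour minimum makes even trivial pages expensive.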

    • BHSPitMonkey a day ago |
      I like the idea of added compensation based on hours covered as it incentivizes the business to avoid very small rotation sizes, but paying extra per page seems like a perverse incentive favoring instability.
      • lolinder a day ago |
        It depends on who has the largest amount of influence on how noisy the on call is.

        If engineers have blanket control to define what is important enough to get interrupted and to prioritize fixing frequent offenders, then sure, it's a perverse incentive.

        If, on the other hand, engineering doesn't have very much control over the roadmap and/or isn't allowed to make their own judgment calls about what really matters for pages, then the arrangement that OP describes makes a ton of sense: it gets pages onto the budget as a separate line item, which is a good way to get the people who are really in charge on board with investing in permanent fixes.

        • seusscat a day ago |
          It also becomes a good deterrent against useless requests. You get pinged on Slack at 10pm? Just ask them to file a ticket with a page-worthy severity. When it's not nearly that important, even external managers will hesitate to do that, since they'd need to explain whether the ticket was worth 150% base pay for 3 hours plus the extra PTO the next day.

          Significantly reduces the number of pages.

      • seusscat a day ago |
        Ehh.. Only pages between 18:00 and 09:00 count for extra pay. Which means it affects your free / personal time. Where I am, the people care a lot about work not intruding on their personal time, so the perverse incentives are reduced.
    • ndjdjddjsjj 20 hours ago |
      With that setup I'd almost cheer when I get paged. As long as that time off doesn't become "why you now behind on X"
  • davidjfelix a day ago |
    At a previous employer we had a pair on call for each of: front end, back end, and infra. We had on-call lasting from Monday midday - Friday midday. Handing off to a "weekend on-call" from the same pool of people from Friday midday to Monday midday. Weekend on-call paid 100 per day, weekday on-call paid 50 per day. You were generally expected to take normal time "off" (but still on call) if paged off hours. Many people would still work if it was just a blip (rare).

    I thought this was a pretty good system and despite the cycles being shorter, we had enough engineers to fill a rotation pretty well so that at most you were on call once a month, alternating months between weekend and weekday on-call cycles.

    I still do not enjoy being forced into on call and wish I could opt in. We traded weeks a lot, but with smaller rotations or really finicky paging it's awful. I still have a sinking feeling in my gut when I hear the work phone ringtone from somebody else's phone in public, and Murphy's law definitely applies to being on call: you always get paged the minute after your beer gets delivered at a restaurant.

  • bleuarff a day ago |
    Here in France, we have strict laws, and on-call MUST be paid in some form or another. When we were bought by a US company, mgmt tried to set up on-call shifts for us (we had never needed them in the 10 years prior), until they learnt of labor laws and went "fuck it, you're on call mon-fri, 10am - 6pm". I'm forty, have a family, and no amount of money could justify not being able to shut off my phone at night, or not being able to go for a walk on weekends because of "uptime". I've never been so glad of French worker protections.
    • pb7 7 hours ago |
      Sounds great. What's your salary?
  • regularfry a day ago |
    > Highly scrum/agile focused people have brought up that sprints start on Monday and that starting the oncall on Tuesday makes sprint planning harder

    Wait, what? Don't run your sprints Monday to Monday either. That's been the eventual conclusion on all scrum teams I've been on.

  • phamilton4 a day ago |
    When did on-call become so accepted and demanded from employers? Currently I am "Release Captain" for a week: So I have to setup any releases and manage all the related tasks, do automated/manual testing of the release, release (enabling toggles and any config changes). Then Backup to secondary and primary for a week: About once or twice I am asked to help with tickets. Then for 14 days we alternate primary / secondary. Thursday to Thursday is our deal. Every ~40 days I am in one of the above. It's absolutely miserable.

    I have never had this much time spent doing non-development related tasks. For 4 weeks every 1.5 months I can't have a life at all. This just screams to me that we are forcing broken/incomplete software out the gate and building huge piles of technical debt that will never get the focus. I remember a time when I would start at 9am and end at 6pm every day and never heard a peep about production issues unless the support engineers couldn't figure it out, which maybe happened twice a year. To make matters worse, most things are not allowed to be touched in production, at the risk of being fired for making changes. So if you want to "fix" any data or call xyz service you need high-ranking approval. It's like being tortured!

    • rr808 a day ago |
      > When did on-call become so accepted and demanded from employers?

      As a 50-something-year-old software engineer: it's always been like this. I'm kinda shocked at how reluctant the new generation is to support the systems. Sure, we'd all prefer strict 9-5 hours, but most companies rely on software to stay in business and you need experts available in case things go wrong.

      • szszrk a day ago |
        If you need experts 24/7, you should have had shifts that cover that timeframe.

        Oncall opens up so many avenues for abuse, don't even ask me how I know. Saying that rejecting Oncall is denying support for your system is bollocks.

        I'm happy that younger engineers mostly laugh at that concept and leave. It's one of the few lessons they are teaching us (the old pricks), especially in the self-care and respect space.

  • gwillz a day ago |
    I work in a small team; we have ~30 active clients and another ~100 or so that are low maintenance or dormant.

    We don't have an on-call rotation but I desperately want one. Because if no one is on-call - then all of us are on-call. Any one of us could be called at any time if one of our larger commerce projects falls over.

    To me on-call is a necessary burden that means when I'm not on duty I am completely free to ignore my phone.

    I'll certainly feel more positive about helping outside of the 9-5. I do like to be helpful, but perhaps that'll wear off like some kind of honeymoon period, juxtaposed to my current situation.

    I'm always looking for more positives in such a system because I want it to work. Tuesday to Tuesday sounds great. Other comments here highlight the difference between critical fixes and patch it laters. Any other insights are welcome.

  • chungus a day ago |
    When I just started out, I used to take everybody's on-call service because it paid well. Also had the advantage that I learned the ropes quicker. Now-a-days I'm very happy to not be awoken at 3am, even if 1 weekend incident would amount to a couple grand.
  • renewiltord a day ago |
    I always do Wednesday to Wednesday. I think I’d rather fix something than having it hanging over my head. That said, I prefer systems that don’t require constant attendance. But when I was young, I reveled in that. You get to have a lot of fun firefighting and using each chance to make sure the fire can’t be lit next time. Thoroughly enjoyable and eventually the fires reduce.

    I still think of each error as a possibility of improvement to asymptotic zero error and I prefer working with people like that.

    Others prefer other systems and I think that it’s fine for each of these groups to select for members appropriately aligned.

  • dillydogg 11 hours ago |
    I'm not IT, but the hospital I work for has the resident physicians on call Tuesday to Tuesday and the attending physicians Monday to Monday. I have found it to be a good system, with the weekends being covered by people who have a good idea of their patients.