Once a Maintainer

Identifying unmaintained open source packages at scale

Allison Pike — Thu, 20 Nov 2025 16:13:47 GMT

Open source software often comes out of a developer solving their own problem and giving the solution away to the community. Sometimes this surfaces as a one-time code dump, but more commonly the original developer sticks around to maintain their work, adding features and fixing bugs. Eventually the original developer may no longer be interested in continuing to maintain a package, at which point it is either taken over by other contributors or abandoned.

Detecting abandoned packages is important because an abandoned package may not receive security fixes. It also may not receive compatibility patches for new versions of the underlying language or other packages. Some packages are explicitly abandoned by their authors, but many more enter a silent state where it’s unclear whether the maintainer is still around or not.

Infield detects abandoned dependencies using a combination of factors, including release cadence, commit history, maintainer comments, and community behavior. We’ve trained a model to combine these inputs to score the abandonment level of a dependency in order to predict dependencies that might be at risk. Our customers can then take action on these dependencies within their repos.

Dependency dashboard inside Infield

Here’s what we consider in determining a package’s abandonment potential.

Release cadence

Historical release cadence can predict the next “expected release” for a package. For instance, if a package historically averaged one month between releases, but it’s been a year since the last release, that package might be abandoned. Conversely, if a package is released annually, and it’s been a year, that’s less likely to be an abandoned package. We combine release cadence and absolute release staleness with the following formula:

where

X = average days between releases
Y = days since last release
T = absolute time scale (constant)
p,q = weight parameters (constant)
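
As a rough illustration only, here is a minimal Python sketch of one way the quantities above could be combined. The functional form (a weighted sum of Y/X and Y/T), the function name release_staleness, and the default values for T, p, and q are all assumptions made for this example and should not be read as Infield's actual formula.

```python
from datetime import date


def release_staleness(release_dates, today, T=365.0, p=0.5, q=0.5):
    """Hypothetical staleness score built from the quantities defined above.

    X = average days between releases
    Y = days since last release
    T = absolute time scale (assumed here to be one year)
    p, q = weights (assumed here to be equal)
    """
    dates = sorted(release_dates)
    if len(dates) < 2:
        raise ValueError("need at least two releases to estimate a cadence")

    # X: mean gap between consecutive releases, floored at one day so that
    # several releases on the same day cannot cause a division by zero.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    X = max(sum(gaps) / len(gaps), 1.0)

    # Y: how long the package has gone without a release.
    Y = (today - dates[-1]).days

    # Assumed combination: Y/X measures how overdue the next release is
    # relative to the package's own habits, Y/T measures absolute staleness.
    return p * (Y / X) + q * (Y / T)


monthly_then_silent = [date(2024, 1, 1), date(2024, 1, 31), date(2024, 3, 1)]
annual = [date(2022, 3, 1), date(2023, 3, 1), date(2024, 3, 1)]
as_of = date(2025, 3, 1)

print(release_staleness(monthly_then_silent, as_of))
print(release_staleness(annual, as_of))
```

Under these assumed defaults, the package that used to ship monthly and then went quiet for a year scores much higher than the package that ships annually, which matches the intuition described above.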

Commit history

Some packages are in “maintenance mode” where the original maintainer might have left, but collaborators with repo access are still contributing. In this case we want to consider the freshness of new commits to the repo in addition to official releases.

Maintainer comments

The clearest indication of an abandoned package is official communication by the maintainers. We detect this in a few ways:

Maintainer marks a repo as archived or deletes it
Maintainer responds to a GitHub issue or open pull request noting that the package is no longer maintained. We use language models to detect these.
Maintainer fails to respond to any GitHub issues or any pull requests

Community behavior

We can detect abandoned packages in part by looking at how the community responds. For example, if we find a fork of a package with fresher commit history than the upstream source, that’s an indication that the community is taking over maintenance of this package.
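
The fork comparison can be sketched in a few lines of Python. This is a simplified illustration, not Infield's implementation: the RepoSnapshot type, the fresher_forks function, and the 30-day minimum lead are hypothetical, and the snapshot data is assumed to have been collected already from the hosting platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RepoSnapshot:
    """Hypothetical record of a repository's most recent commit date."""
    full_name: str
    last_commit: date


def fresher_forks(upstream, forks, min_lead_days=30):
    """Return forks whose latest commit is at least min_lead_days newer
    than the upstream's latest commit, freshest first."""
    leads = [(fork, (fork.last_commit - upstream.last_commit).days) for fork in forks]
    ahead = [(fork, lead) for fork, lead in leads if lead >= min_lead_days]
    ahead.sort(key=lambda pair: pair[1], reverse=True)
    return [fork for fork, _ in ahead]


upstream = RepoSnapshot("original/pkg", date(2023, 1, 10))
forks = [
    RepoSnapshot("alice/pkg", date(2022, 12, 1)),  # behind upstream
    RepoSnapshot("bob/pkg", date(2024, 6, 1)),     # well ahead of upstream
    RepoSnapshot("carol/pkg", date(2023, 2, 1)),   # ahead, but under the threshold
]

for fork in fresher_forks(upstream, forks):
    print(fork.full_name)
```

The minimum lead is there so that a fork with one stray commit shortly after the upstream's last activity is not treated as a community takeover. The right threshold would depend on the ecosystem.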

We combine these factors using a machine learning model. We labeled hundreds of packages ourselves and now use the resulting model to predict “abandonment likelihood” as well as a true/false “abandoned” label.
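
To make the two outputs concrete, here is a minimal Python sketch of a scorer that returns both a likelihood and a true/false label. It uses a plain logistic function with hand-picked weights as a stand-in. The feature names, weights, bias, and threshold are all invented for illustration, and the post does not say what kind of model Infield actually trained.

```python
import math

# Hypothetical weights, chosen by hand for illustration. A real model would
# learn these from labeled packages.
WEIGHTS = {
    "release_staleness": 0.6,        # e.g. a cadence-based staleness score
    "years_since_last_commit": 1.2,  # commit freshness
    "archived": 4.0,                 # 1.0 if the repo is archived, else 0.0
    "fresher_fork": 1.0,             # 1.0 if a fork has newer commits, else 0.0
}
BIAS = -3.0


def abandonment_likelihood(features):
    """Map a dict of feature values to a likelihood between 0 and 1.

    Features missing from the dict are treated as 0.0. Keys that are not in
    WEIGHTS are ignored.
    """
    z = BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_abandoned(features, threshold=0.5):
    """Turn the likelihood into a true/false label at an assumed threshold."""
    return abandonment_likelihood(features) >= threshold


active_package = {"release_staleness": 0.0, "years_since_last_commit": 0.0}
archived_package = {"archived": 1.0}

print(abandonment_likelihood(active_package), is_abandoned(active_package))
print(abandonment_likelihood(archived_package), is_abandoned(archived_package))
```

Keeping the likelihood and the label separate lets a caller rank dependencies by risk while still getting a simple flag for alerting.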

If you’re interested in reading more about this problem, we suggest this academic paper coming out of China.

If you want to use Infield to track and manage your own open source dependencies, you can easily get started for free on our Get Started page. We currently support Ruby, Python, and JavaScript packages, with more languages coming soon.

Once a Maintainer: William Woodruff

Allison Pike — Wed, 21 May 2025 15:25:03 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to William Woodruff, contributor to Homebrew and PyPI and creator of several open source tools including zizmor, a static analysis security tool for GitHub Actions. William is currently Engineering Director at Trail of Bits, a security research and consulting firm in New York.

Once a Maintainer is written by the team at Infield, a platform for managing open source software upgrades.

How did you get into coding?

I don’t actually have an academic background in software, really. My degree is in Philosophy. I got into it as a hobby when I was a high schooler. I had a computer and I wanted to run different software on it. And I found Homebrew back when it was like a build from source package management ecosystem. I had a lot of fun compiling things on my machine and installing a bunch of random languages, which I never really used. So I began to contribute to Homebrew and make patches for the formulae that it was using. And then when I went to college, I had nothing to do one summer, and I applied to Google Summer of Code. I did that for two years with Homebrew and I think it took me from sort of a passive interest to really actively doing software engineering.

You mentioned installing a bunch of different languages at first - when did Ruby enter the picture for you?

Ruby was the first full-fledged language I think I wrote software in. I’d written some Perl and Python before, but those were like little scripts.

After graduating college, did you go right into software development?

The company I work for, Trail of Bits, they hired me right out of college. And pretty much since then I’ve done open source. I’ve done a bit of proprietary stuff, but most of my career has been in open source.

Awesome. So what would you say the structure is like, working on open source from within the umbrella of a company?

Yeah, as I’m sure you’re familiar with, the incentive structure for open source is very complicated. Like if you're a big company, the incentive is to extract value from open source, but not necessarily invest in it unless it's a direct selling point. But Trail of Bits is a pretty small company. One of the nice things is that the incentive structure gets inverted a bit at that smaller size. Because we're a consultancy, we can actually sell our expertise to larger companies who do need to invest in open source. And so I look after an entire team that does pretty much full time open source engineering. It's me and about five others who are actual day engineers as well as a couple of project managers working on Homebrew, but also RubyGems. A little bit of standards work there, and a bit of standards work in the Rust ecosystem. And also one of our really big areas is the Python ecosystem. So we do a lot of Python security engineering.

What was the inspiration behind zizmor?

Zizmor is a side project of mine. It started because for about the last five years, I've worked on the Python Package Index. I'm not a maintainer of it, but I've contributed a lot to it professionally. And so I built up all these security features with the help of a bunch of really fantastic people, the actual maintainers of the project. And as part of that, I noticed this trend that's happening in open source, which I think is ultimately for the good, but has some significant downsides, and that is this push towards putting things into CI/CD platforms. So things like GitHub Actions are alluring because you no longer have to have all this development state locally. You can sort of push it and compartmentalize it into a platform. But the downside to that is it’s a black box. You sort of throw your code and your build steps into it and pray you get functional and integral build products out of it. And so, you know, I believe pretty strongly that a core part of being a security engineer is not just trying to get people to do the secure thing, but meeting people where they are. People are going to use GitHub Actions and GitLab’s CI/CD, whether or not I think the fundamentals are secure. And so the question is how to make those things as secure as possible.

So similarly with the PyPI stuff, we built this feature called “Trusted Publishing”, which basically allows credential-less authentication between GitHub Actions, GitLab and PyPI. You don't need long lived API tokens anymore. And that got me thinking like, well, how much do I actually really trust the security of the average GitHub Actions workflow? And I started looking into ways to statically analyze GitHub Actions workflows and action definitions to see whether or not my assumption that this was a more secure by default posture was well founded. And I don't know. I think the results are mixed. I think the answer is that you can write very secure GitHub Actions workflows, but by default GitHub Actions exposes a really large number of footguns that have led to some very high profile breaches in the last couple of months.

What are the kinds of risks you’re worried about? Like I as a package maintainer, I'm trying to build a new version of my package. It's being done on GitHub Actions and the resulting build is not what I expect it to be because I installed some wrong plug-in and it's malicious or something? Or someone else was able to publish a version of my package to PyPI because I didn't set up my GitHub Actions permissions properly and they were able to intercept it? That sort of thing?

I think both of those are major concerns of mine. I am especially concerned about the case where I think everything is integral. I think I've produced a hermetic build within GitHub Actions, but in reality there's a cache poisoning vector, or I think only I can trigger this workflow when in reality anybody who submits a pull request can trigger it with elevated privileges. So I'm really worried about that case where everything looks like it's going perfectly. Even if you use all these modern supply chain standards, things like SLSA and Sigstore, which are supposed to give you attestations and strong evidence of a place of origin. But if the origin itself is compromised, then these attestations are only an attestation of malicious activity. They don't give you the protection that many people assume they give you. And so I'm really worried about that type of vector.

I’m sure you get asked this question a lot, but as a security engineer, on behalf of your customers what do you prioritize? Because there’s so much that goes into cybersecurity, what do you start with to get your house in order before moving onto the next level of problems?

Yeah I mean certainly with PyPI we’re now on year 5 or 6 of this collaboration. And we started with, as you said, get your house in order steps. The most basic one was that five years ago, PyPI did not have API tokens. You would authenticate to PyPI with a username and password pair. So if you were say Google or Amazon, you had employees who had the keys to the kingdom for your entire namespace on the index or your entire account rather on the index. And this violates a principle of least privilege, right? It violates longevity guarantees around tokens. Tokens should ideally have mandatory expirations. If they can't, at the very least they should be identifiable and traceable and not entirely random. They shouldn't be user controlled credentials. And so the very first thing we did was add API tokens. So you know, they have global controls. That was a really basic baseline thing.

Then the question from there is, well, we know for a fact that users will still normalize deviance, they will still create global scope tokens that don't expire. So how do we give users a default path that doesn't encourage them to create non-expiring global tokens? And the answer to that was Trusted Publishing. And it had two factors so that people can't just log into the same credential. Then let's add this self-expiring, self-scoping mechanism. And then finally, now that we have this self-expiring self-scoping mechanism without identity control, you've got to do attestations by default. So that's where we currently are. And I guess that gets back to what I was saying earlier, I really love open source. I think it's amazing what we've built on platforms like GitHub and GitLab, but I am very worried about this semi-autonomous machine that we've built that runs at all hours of the day. I'm really worried that one day someone's going to find something that no one else has thought of yet. And we'll basically have a new version, a much, much worse version of the xz attack or Heartbleed, one of these Internet-breaking attacks.

Have there been any contributions to zizmor that made you think oh, I never would have thought of that? Something that really surprised you?

Definitely. There've been a couple. I mean, I've had a couple of really fantastic contributors come in and submit audits that I had either not thought of or I was not thinking about structurally the way they thought about them. Like I was like, oh, you just sort of scan for this pattern and hopefully you'll catch the really bad things. But they built up the right machinery to detect, you know, malleability in the pattern and they rigorously thought through the problem whereas I had only a vague sense of what the check needed. That's been really nice. And also people have been filing issues for new audits. You know, people just want new features. That's natural. But I've been trying to burn those down.

Do you have a formal roadmap for the project or is it more organic? How much would you say it kind of lives in your head versus managed by the community?

Definitely it mostly lives in my mind at this point. I mean I track everything with GitHub issues and I have milestones for things I want to accomplish, but big picture things I'm not tracking anywhere but in my own way at the moment. It’s still only a six-month-old project. So it's mostly just me sort of feeling through where it needs to go. And eventually what a 2.0 release will look like because that'll be where I can begin to break things and try new directions.

What are some other open source projects that you think are really interesting right now, or people in open source that are doing something interesting to you?

There are a lot. I do a lot of work as part of my day job with PyPI and the Python Package Index maintainers. I think Warehouse itself, the backend of PyPI, is a really fascinating codebase and it doesn’t get the kind of attention it deserves. It’s a really under-appreciated codebase given that it’s a monolith that controls the world’s largest by volume packaging ecosystem. And they do that with an almost entirely volunteer staff and the shoestring resources of a nonprofit foundation with a few grants. And it’s my opinion as someone who’s contributed to it, who isn’t a maintainer but who’s really read through a lot of the codebase, that it’s a really well architected and tested codebase that’s held up under a lot of unpredictable stresses over the years, like ways it had to evolve very rapidly or had to grow an entirely new feature surface which could not be predicted. I’m sure the maintainers there could talk much more intelligently than I could about those pressures. So that's people like Mike Fiedler and Ee Durbin and Dustin Ingram. And then I know you’ve already talked to Mike McQuaid, who I consider a great mentor. He’s one of the first people who got me really into open source on a more serious level than just sending patches every once in a while.

I have one more question since it seems like you have a breadth of experience across ecosystems. And different open source ecosystems to me have different cultures, like what the JavaScript community will create packages for versus what the Rust community creates packages for, etc. Do you have any thoughts on the way that different ecosystems are doing things from a security perspective?

Definitely. The difference between them can be very legible at times. I would say that six years ago, if you had asked me that, I would have said that Python was pretty far behind in terms of its security practices. I believe at that point RubyGems and npm already had API tokens. And I believe npm had already enabled two-factor authentication at that point. So I would say that Python back then was trailing the pack. And these days, I would say that Python is towards the head of the pack because it pushed so hard on these newer ideas similar to Trusted Publishing.

Is there a community consensus on what it could look like if we didn't have the baggage?

I think especially for Python, there’s a big wish list of things that could be different if only we had known 20 years ago to set aside conceptual space or standard space for this. I know especially with Python, a huge desire the community has, which is very hard to solve technically, is namespaces. npm had a pretty decent amount of community pain when they did it, but it paid off long term. Now there's been some discussion around Python standards for adding namespaces to PyPI and other indices, but there is some packaging that is so old that it's a really significant lift. That's one thing. Lock files are another thing that is really conspicuously missing from Python, unfortunately, in my opinion. And now there's PEP 751, which is the lock file standard. And I'm really hoping that sees more adoption over the next months and years.

To suggest a maintainer, write to Allison at allison@infield.ai.

Once a Maintainer: Ed Waisanen and Nate Papes

Allison Pike — Wed, 19 Feb 2025 14:28:58 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Ed Waisanen and Nate Papes of the Gala project, an open source research and education platform out of the University of Michigan. Ed oversees the ongoing development of the Gala platform and advises users on instructional design, multimedia production and use, and the development of interactive data tools. Nate is a developer for Atomic Object, a software agency based in Ann Arbor, and works closely with the Gala team.

Ed, let’s talk a bit about your personal background and how you came to be involved with Gala.

Ed: Sure. I was a Master’s student at the University of Michigan, at what was then called the School of Natural Resources and Environment. I was studying environmental policy but also informatics, and I got involved with this radio show called “It’s Hot in Here” on the student-run radio station, WCBN. It was a lot of PhD students and faculty, and also folks in the community who were working on the environment and sustainability. And out of that came this focus on case-based education. Essentially, using case studies as a way to get at these intersecting issues. They drew some inspiration from the business school and other disciplines where case studies are traditionally used. Rebecca Hardin, the faculty member who was holding down the radio station, got involved in writing an internal grant to create like 50 case studies that would become known as the Michigan Sustainability Cases.

The initial plan was to make some sort of WordPress plugin that would house these case studies. Because traditionally, you know, a case study would be some sort of PDF that’s just a written document that gets passed around. But it didn’t feel impactful and current and like they were accomplishing their goal. And around this time a student named Cameron Bothner, who’s now at Shopify, was very involved in the radio station and studying linguistics and computer science. He turned out to be a very gifted Rails developer. Pearl Zhu Zeng-Yoders was another early contributor. So the idea was to build some sort of tool that would be multimedia-rich and bring the kind of stuff that was happening at the radio station into these case studies. Every case study would have a podcast episode associated with it. And that was sort of where I came in, because I had experience as a podcast producer. For example let’s say we have several conversations for a case study with somebody in charge of managing water quality somewhere. Let’s use that audio and tie it together to the writeup.

So we had this initial grant for the Michigan Sustainability Cases, which was supposed to last five years or so. That was winding down, and Cameron had left to go to Shopify. We were very lucky that the original developer for the project was very capable but also very passionate. And that’s when we approached Atomic Object and Nate to come in.

Nate, can you talk a little bit about your experience with Rails and how you came to be involved with the Gala project?

Nate: So I went to Central Michigan University, and did computer science all the way through. I knew that’s what I wanted to do from a very early age. And I had a campus job where we helped the recreational staff, mostly with Python apps. That was my introduction to understanding anything about the web.

Then my first professional job out of college was at a Rails shop. They made online wellness campaigns - helping companies drive their employees to do wellness-related activities, like getting in 10k steps. I didn’t really want to work for a big company and it was a short drive, so I was like OK, I’ll do an internship there and if you want to hire me, cool. And I ended up really liking it and liking Rails a lot. I was like, why would anybody not choose this? I learned so much and got to work with really senior people.

Then Covid happened, and I needed a change, so I started doing my own consulting. And I ended up getting recruited at Atomic Object, which is an agency based in the Midwest. And one day the Managing Partner was like hey, we know you like Rails. We have a little treat for you. And he showed me Gala and said, “Would you want to work on this?” And I was amazed. I looked at the code and I was like oh my gosh, this is actually like really good code. I would love to work on this.

What did the growth of Gala look like over this time period?

Ed: At the beginning it was this very structured thing where students would apply to get funding for a case study, and we would essentially give them a mini-grant. They’d create their case study and I’d interface with the student teams and we’d work with Cameron to put them up. But over time we started asking for more robust tools to do the sort of authoring we wanted to do. And at a certain point, I don’t know who said it, but it was like, “Can we just make this available for anyone to create something?” And once we did that, people kept finding us and coming up with something cool to do.

One that I’d like to highlight is the OCELOTS. They’re a group of tropical ecologists that use Gala for teaching tropical biology and conservation. It’s a research coordination network, which brings together instructors to teach them how to create modules that are similar to the goals of the Sustainability Cases, i.e. more multimedia-rich, more engaging and interesting, but still open educational resources. So now we have this relationship with several of these groups where they're doing their thing and we're kind of embedded with them and can say now Gala needs to be able to do this or that.

Is there any kind of top down roadmap? How do you manage feature requests from the community?

Ed: I think Nate bounces with excitement on this sort of thing. We don’t have a roadmap per se. We have started tracking issues a lot better recently. I would say it’s one of our goals because all of this is sort of managed in my head, weighing our needs versus where we have knowledgeable people, and our funding, and I’m always trying to synthesize what’s the best focus that will serve multiple needs at once.

This is the thing with open source, right. Do you have an estimate of how much of your time you’re spending on Gala, per week or per month?

Nate: The number one goal originally was just to keep Gala online. So one day a week I’d work on Gala for just whatever improvements needed to be made, bug fixes or whatever. And then we were behind on Heroku updates and stuff, so I started doing more work there, and eventually started doing some feature work. As I got more into it I really got into the vision, the promise of Gala and the people involved.

Ed: I balance my time with instructional design support and documentation and managing how we actually package these things up to engage more students. We have CS students that come through who work on either Gala itself or on tools that integrate with Gala, like data tools and things like that. Recently we’ve also gotten some UX design students. And they get pretty energized about it. I find we can get them pretty excited about education and they like having worked on an open source project. Many of them have used it in the classroom.

Where do you see Gala going from here? That could be in terms of features, users, or use cases, or it could also be in terms of contributors. In other words, what do you hope for Gala?

Nate: From my personal perspective, I’d like to see Gala grow at a sustainable pace. I’d like to see more people be aware of it. Really smart people are writing really good content and the platform is good in and of itself. I think it was ahead of its time in a way, like how you can add really rich media to the learning modules.

Ed: As you know, we’re moving our base institution to Notre Dame. And one of my hopes is that because we’ve always had people from a bunch of universities clicking around the site, you wouldn’t immediately say this is a University of Michigan product. I’d like to formalize some of the governance, both for transparency and so that we’re set up for more people to get involved. So students can come in and learn and also contribute. The idea is you're a learner, but you're also an author, and you're also maybe an instructor - bringing that kind of ethos to the infrastructure that's running the thing.

Another thing is we've been engaging with SEEKCommons, which is another NSF-funded network of folks who are big on open science and open metadata. We've also got some folks from Wikidata engaged. So my hope is to become more interoperable, nicely indexed, and lean on these other open source projects to get the stuff we want to do done.

To suggest a maintainer or learn more about managing your open source upgrades with Infield, write to Allison at allison@infield.ai.

Once a Maintainer: Santiago Pastorino

Allison Pike — Tue, 26 Nov 2024 15:13:37 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Santiago Pastorino, contributor to the Rust compiler team and alumnus of the Rails Core team. Santiago is also a cofounder of a software development consultancy called Wyeworks, and he spoke to us from Uruguay.

How did you first get into programming?

I live in Montevideo, Uruguay. And I would say that like most of the people back in the day who started when I did, we got into programming through formal study. I went to university to do computer science. It was less of a hobby like it may be for many people now. I studied structured programming, algorithms, object oriented programming, and things like that.

And what was your first job out of school?

I started working in some local companies as a consultant doing mostly Java. But my friend and I, we wanted to see if we could do things a little bit differently. And we were getting bored of Java. So we started our own consultancy. And these were the days when DHH and Ruby on Rails were really starting to rise, with the demo building the blog in 15 minutes and all that. My friend was already doing some Ruby, I was more into Python. But we had some people that wanted to do some prototypes in Rails. And we started to follow people on Twitter who were advocating for things like agile, scrum, TDD. And we found that really interesting.

This was actually a huge shift in the way that we worked. We used to meet with clients, have maybe a document, spend months working on something and then find out afterward that everything was wrong. But since we started our own agency, we could try out these new ways of doing things that we found interesting. That was also the beginning of my contributions to open source.

Can you talk a little bit about the progression from your first contribution to Rails to becoming a Core team member? Do you remember what your first contribution was?

I don’t remember exactly, but there was a client that needed a certain feature. And I ended up building something crazy that required me to go into how ActiveRecord worked. So that gave me a taste for the framework. Then I saw one of the Rails Core members say that they were going to have a weekend hackathon sometime soon. And I thought, why not? They included some gamification, you earned points or something. So I ended up sending 3 or 4 simple pull requests, maybe fixing a warning or adding a test case, something simple.

A lot of people think that when you start working in open source you must work on some super complex project, and the first thing you do is a total rewrite, and it’s not true. For me personally I have a really local understanding of what I’m doing and I don’t need to know the whole thing. These are really complex codebases, and it’s really hard to know what’s going on everywhere. I know my little piece of it.

When did you start getting more into Rust?

So when I was doing Rails around 2014-2015, I started to notice that people were talking about Rust. I started reading more about it, following people that talked about it, and went to this Rust conference in the US without even doing anything in it beforehand.

In my personal work experience, we were also starting to shift again the way that things were done, and I started to become less interested in the web development side of things. This is without judgment, but there were a lot of changes, mainly on the JavaScript side of things, where it seemed like every day there was something new and that was sort of unsustainable to my brain. It was like today we use A, but tomorrow B, then we use C, and not really for any reason. In my opinion it was changing for the sake of changing. Which is fine, but it was not for me.

So I started to get into more lower level programming. Rust kind of aims to cover the space of systems programming, but with added safety guarantees. So that was an interesting thing to me. I started looking more into their solution for safety, and it was very principled and elegant. And so I started attending more events, and eventually I decided that it was too complex a language to just be working with on the side while I was still doing web development, only an hour a week or something. So I decided OK, let’s get a project. And the project ended up being the Rust compiler.

Ruby has this mission statement or is organized around the principle of developer happiness. What would you say is the motto for Rust?

Yes. It’s about safety, and about performance. It’s also about having stability in the language. You upgrade your software to the latest Rust version and it’s not going to break. Something that I’ve seen that is kind of new to me is that when you as a developer go to propose a change to the Rust compiler, you can request what’s called a Crater run. And what that does is basically we fetch all the libraries that exist, all the Rust libraries in the ecosystem. And we build them and run them and test them against the new version of the compiler. So it’s a way to check with every release that we’re not breaking existing codebases. Of course we don’t have private codebases to check. But at the very least we consistently check that we are not regressing.

I read the blog post that your team put out a few weeks ago about the reorganization of the compiler team. How do you, as a group of people, think about the roadmap for the project? How do you move the project forward?

In open source I think these are really interesting questions - what is the roadmap, what is the vision. Because you know, this is not a company where people are getting paid to do some specific thing. There are people that are paid to work on Rust, but by different entities and each entity has different interests, right? So it’s kind of hard to come together and say this is the roadmap, and these volunteers must adhere to the roadmap. In general, different contributors tend to contribute based on their interests.

But there are teams inside Rust that have different responsibilities. The lang team for example, in order to make important changes to the language you need to follow their RFC process where you write a formal document, and there is a vote. There is a community team, a library team, an infrastructure team, and then there is the Rust Foundation, which is kind of the legal representation of the project. Each team runs in their own way.

It sounds like it’s primarily bottom-up in that it’s driven by the contributors and their interests, but there are teams responsible for different functional areas that have their own processes.

Lately it’s been sort of half year based. We introduced a concept of project goals for the first half of next year, and we decided together between the volunteers and the team members the kinds of things we want to focus on. And then each goal needs an owner who is responsible for it. They are the main force behind it, and then it needs a little plan, and the people to implement it. If other teams are affected they need to say yes, we agree. So it’s more bottom-up, as you say, versus how a company would do it.

What would you say is your personal focus for the next six months on the compiler team?

I have been working on a couple of things. First, we have a project goal that is about building a proper async programming story. And that involves a lot of different things. It wasn’t long ago that async functions inside traits were implemented in the compiler. Before that you needed to use a crate for it, which was basically a proc macro that generated code, right? It wasn't able to generate the most performant code because you needed to box the stuff and that required heap allocation and things like that. So recently the concept of async functions in traits was introduced. We still don’t support dynamic dispatch for those. So at some point we are going to add support for that. We are building a new crate that will allow dynamic dispatch for asynchronous functions inside traits.
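As a rough illustration of the point about async functions in traits, here is a minimal sketch, assuming Rust 1.75 or newer. The names `Fetch`, `InMemory`, `load`, and `block_on` are hypothetical and exist only for this example; the tiny `block_on` helper stands in for a real async runtime so the snippet has no external dependencies.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait for illustration: an `async fn` declared directly in a trait.
trait Fetch {
    async fn fetch(&self, key: &str) -> String;
}

// Hypothetical implementor that answers immediately.
struct InMemory;

impl Fetch for InMemory {
    async fn fetch(&self, key: &str) -> String {
        format!("value-for-{key}")
    }
}

// Static dispatch: `F` is a concrete type, so the returned future does not need to be boxed.
// Taking `&dyn Fetch` instead is rejected by the compiler today, because a trait
// with an `async fn` cannot currently be used as a trait object.
async fn load<F: Fetch>(source: &F, key: &str) -> String {
    source.fetch(key).await
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor (an assumption of this sketch): polls the future until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let value = block_on(load(&InMemory, "config"));
    println!("{value}");
}
```

The generic `load` function is the statically dispatched form that works now; the dynamic-dispatch form described in the interview is the part still being built.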

There is another project goal about ergonomic ref counting. If you work in particular with a lot of Rc or Arc data types and threads and things like that, you need to resort to a lot of clones to get a new reference count. There's a person showing a use case where they need to clone like 20 variables or things like that. So we’re looking at ways to make this simpler and more user friendly.
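To make the cloning complaint concrete, here is a small sketch of what sharing two `Arc` values with a few threads looks like today. The function `run_workers` and the `counter` and `log` variables are hypothetical, chosen only for illustration; the real use cases mentioned above involve many more shared values.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: spawns `n` threads that all share two reference-counted values.
fn run_workers(n: usize) -> (usize, Vec<String>) {
    let counter = Arc::new(Mutex::new(0usize));
    let log = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..n)
        .map(|id| {
            // Every thread needs its own handle, so each shared value is cloned
            // by hand before the `move` closure takes ownership of the clone.
            let counter = Arc::clone(&counter);
            let log = Arc::clone(&log);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
                log.lock().unwrap().push(format!("worker {id} done"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    let mut lines: Vec<String> = log.lock().unwrap().clone();
    lines.sort(); // threads finish in any order, so sort for a stable result
    (total, lines)
}

fn main() {
    let (total, lines) = run_workers(3);
    println!("{total} workers finished: {lines:?}");
}
```

With two shared values the pair of `Arc::clone` lines is tolerable; with twenty, the block of clones before each closure is the boilerplate the project goal is trying to reduce.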

To suggest a maintainer, write to allison@infield.ai


Once a Maintainer: André Luis Cardoso Jr.

Allison Pike — Fri, 18 Oct 2024 13:57:24 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to André Luis Cardoso Jr., maintainer of the runtime developer console and IRB alternative Pry. André Luis is a Principal Software Engineer at RD Station, and he spoke with us from Brazil.

Once a Maintainer is written by the team at Infield, a platform for managing open source upgrades.

How did you get into software development?

I think when I was 14 or so I got my first PC, and I discovered that there were things like programming languages, operating systems, and I just felt like that was what I wanted to do. I remember reading tutorials and books on programming, C, C++, JavaScript, even though I didn’t know English very well. So it was hard. Most of the stuff I didn’t understand but I knew it was what I wanted to do for the rest of my life. So I decided to do computer science when I got to university.

How did you find the transition from self study and figuring things out on your own to a more formal computer science program?

I thought it was pretty boring. When I joined university, I already knew a lot from the books and tutorials. But they also gave me a foundation that I didn’t have in algorithms, structures, simple stuff that I hadn’t read much about. Then in my second year I started a job as a software engineer for the university. It was in Java, writing software for digital learning for students at the university, e-learning essentially.

What was your first exposure to the Rails ecosystem?

When I was at university, I started reading about other languages. I knew Java because that was what they were teaching, then I moved to Python, Ruby, C, Lisp, and Haskell. I had a lot of free time, so I got every book or material I could get my hands on. I really wanted to understand how each of the languages worked. Also back in 2007-2008, the Rails community in Brazil was getting a lot of traction. I remember Rails Summit Latin America was a huge, huge event in Brazil. That conference was a life changer for me. I saw people from GitHub, from popular open source libraries. I remember David Chelimsky from RSpec speaking at that conference and my decision at that point was, I needed to find a Rails job. I thought here in my city, most of the jobs are Java. So I sent emails to a lot of places and eventually got an offer from a startup in the US, and that was my first Rails job.

Do you remember your first open source contribution?

Yeah, I remember making a pull request for the Thor library. When you run your Rails server or any command line on Rails, you are actually using Thor. I remember I made a pull request for a simple refactor. It was just renaming a few methods, and the person that reviewed my pull request was José Valim, who created Elixir but was very active in Rails at the time. So I was really happy when he merged my request.

After that first PR, how did you start to think about contributing to open source in terms of your time?

I think mainly for me, I like to help. I like to be useful, and give back to the community whenever I can. So when I see an opportunity to give back, I jump in. But as a software engineer, I believe that you must understand how things are working under the hood, not just be a user of some dependency. You need to understand how it works. And I like to read code. So for me it’s like exercise. When I make a contribution or when I read the code from a dependency that we use, I’m learning and that improves me as a developer and as a person.

At my current job, we have a huge Rails application and we’ve been upgrading Rails for about 10 years. We had a lot of dependencies that broke and we had to make fixes or forks and stuff like that. And that improved me a lot as a software engineer. Whenever I can I try to make pull requests. Right now I see that there are a lot of projects that are not well maintained, but we need more people helping. Most people just make an issue and say “fix this for me.”

We’ve heard about the “graying of open source” and the fact that the average maintainer has been at it for more than 10 years. Why do you think that is, and how do we get a new generation of contributors?

At least where I work right now, I try to teach the younger engineers that you need to understand how things work under the hood. And it's not about just learning how the dependency works, but to be a senior engineer, you need to understand more like how the database works. How does Redis work, which is an awesome C codebase to read. And I believe we need to teach the younger generation to contribute more.

For Pry, which is what I'm working on right now, the original creator is not working on it anymore. He's not even working in Ruby at the moment. So he moved on to other stuff that he likes and he passed down the torch to me. I was making a few pull requests and he said, “hey, do you want commit rights?” And that was it. He gave me access and let me do my thing. I think that comes with a lot of responsibility. There aren’t many people that I can ask for help. So we need to ask the younger generation and show them how to help.

There's a lot of simple stuff that people can do, documentation fixes or typos or whatever that people can make some pull requests and fixes for. The example I gave you earlier, my first pull request, it was just renaming some methods and I was really nervous. People should know that even making tiny fixes, that means a lot.

How did you overcome those nerves you felt with that first pull request? We hear a lot from people that they want to contribute, but they are worried they’ll make a mistake and waste the maintainer’s time or get criticized harshly.

At first, I think what made me comfortable was following the project before learning the code, getting a sense of the structure of the code, reading pull requests for newer commits, how the maintainer was reviewing them, reviewing the docs, and that gave me a sense of what was happening. And second, I believe that you should contribute to something that you use every day. No one is better than you at saying what is right or wrong if you’re using that tool daily.

With Pry, I use it every day whenever I’m debugging or jumping around code fixing things. And I noticed the project was not really active. There were a lot of open issues and pull requests, because the maintainer wasn’t really working on it anymore. And I wanted to make it still work with newer Ruby versions. I know Pry is acting a little weird with Ruby 3.3 for example, so we’re working on that.

What are some other projects that you've taken inspiration from or you're looking at right now that you think are interesting or pushing Ruby forward?

The main project that I'm looking at right now is Prism, the new parser that's helping me to remove some code from Pry as well. I think that it's really nice. It's from Kevin Newton.

I'm also following Reline closely. Reline is an implementation of Readline in pure Ruby. Readline is the base of Pry and IRB. They are making a lot of improvements, like the new IRB features with multi-line editing and auto-completion with some drop downs that made me think I can bring some of that to Pry as well.

To suggest a maintainer, write to Allison at allison@infield.ai.

Once a Maintainer: Nate Berkopec

Allison Pike — Fri, 27 Sep 2024 14:38:56 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Nate Berkopec, maintainer of the Ruby web server Puma and expert on Rails performance. Nate lives in Tokyo where he runs Speedshop, a Rails performance consultancy.

Once a Maintainer is written by the team at Infield, a platform for managing open source upgrades.

How did you become a software developer?

I was going to school in New York, and I kind of knew I wanted to be involved in tech startups. I was just interested in the whole scene, but I didn’t really know how to get involved. So I was just going to meetups and stuff like that in college. And I had a professor basically advise me that if I wanted to be in the tech world, the easiest way to do that was to become a programmer. This was in 2011, which in retrospect was probably the easiest time in the last 20 years to be a junior programmer and get a job. So I did Michael Hartl’s Rails tutorial. I self-studied, and then worked for a friend’s startup, and then got a job as basically the apprentice programmer at another startup, and it sort of snowballed from there. My first and last love was Ruby.

Did you have a sense at the time, say 2011-2013, that Rails was the right choice? What was the feeling that led you in that direction?

I think Rails was clearly already the framework of choice for the makers of the world, the people who were just out to build stuff and get stuff done. I think there were also two specific things: the first was a blog post by a guy named Nate Westheimer. I don’t remember the name of it but he basically described the concept of going into the sweat lodge for a week and learning to code. He was also non-technical and learned to program with the Rails tutorial. I just remember that blog post really speaking to me.

The other thing that was in the water was Why’s Poignant Guide to Ruby, and I didn’t get it. I couldn’t follow what was going on. It didn’t teach me Ruby. But I remember thinking like, if these are the kind of people that write Ruby, then I want to be in that community. Those two little pieces of culture were really important for getting me into Ruby.

Were you going to meetups in NY around this time as well?

I remember meeting Spencer (Fry), he’s been kind of a pillar in the Rails community in NY for more than 10 years now and he’s built two businesses with Rails. And Graham Lawler’s Ultralight Startups, that was a big meetup back in the day.

What was your first contribution to open source?

My first ever open source contribution was to Rails, and it was just in that spirit of “I'm out here getting stuff done, and this thing is broken - it's in my way. So let's fix it.” It was this weird thing with emails. I didn't know this at the time, but most of us are aware there's a plain text and an HTML part to an email. The thing is, the order of those parts can matter. So in Gmail, it will display the first part that it can recognize and read. Which I guess is not true of all email clients. So I was looking at emails from our site in Gmail, and they were coming out in plain text, and then in Outlook they were coming out in HTML. And I was like, what’s going on here?

It turned out that when you did a format block in ActionMailer, so you would write format.html, then format.text, that was also setting the order of these MIME parts in the email, which kind of makes sense if you think about how Rails probably implements this. But to me it was so unexpected. I was like first of all, I didn't know this even mattered, right? Who would have any idea this would be important? So my pull request to Rails was to basically always put the HTML part first. And that was my first ever open source contribution.

What was the feedback from the Core team? Were they like, no one’s ever brought this up before, or obviously yes, we should do this?

Pretty much, and I think that’s the case for a lot of things in Rails specifically, which is that you come into the project for the first time thinking that everyone’s thought about all of this very deeply. And you, in the trenches, come across a weird corner case that no one’s really thought about before. And so there was a bit of back and forth but generally they were like yeah, merge it in. I think the bar for a new feature or something like that is very different than the bar for changing the behavior of some obscure part of ActionMailer.

How did you end up getting involved in Puma?

So for the next three years or so after that first contribution I was working freelance, and I started writing about performance online and eventually wrote my book, The Complete Guide to Rails Performance. And that kind of catapulted me into being the Rails Performance Guy online. I was going to RubyConf and RailsConf a lot, speaking. At the time, Evan Phoenix was at Ruby Central and running those conferences. So I got to know Evan and Richard Schneeman from Heroku. And at one of those conferences, sometime in 2016, Evan just sort of grabbed Richard and me in the hallway and said, “Do you guys want to maintain Puma?” And we were like, “yeah, sure.” So we did.

What inspired you to write the book on performance to begin with? Or why was that your focus?

So this was now on the down slope of peak Rails, right? This is 2015, and the start of peak JavaScript, we'll call it that. And I was frustrated because I was seeing a lot of people start to think, oh, Rails doesn't scale. Shopify was not huge yet. GitHub wasn't massive scale yet. And so we didn't have a Rails community Top 20 website in the world kind of success story. We really just had LinkedIn rewriting to Node, and we had Twitter rewriting to Scala. So two of the big Ruby unicorns were rewrites.

I was really frustrated by this argument because it didn’t make sense to me. And mostly I was like well, I really like Ruby. I want to keep writing it. So I started to learn more about performance and writing about it and trying to change the perception. And the blog posts I was writing about it were getting tons of views. So I was like ok, I guess I’ll write a book.

Did you get the sense that people were kind of waiting for someone to step into that role? Because many Rails community members have said things exactly like you're saying, they love Ruby so much, they love the projects they’ve worked on, and the Rails doesn't scale stuff maybe bothered them but they didn't quite know what to do about it. So do you feel like when you stepped into that role there was a rush of support around you?

Yeah, yeah, yeah. Performance is important, but it's not the core value proposition of any business. You have to do something fast. You can't just be fast. And so the community was mature enough at this point that there were a lot of apps running and a lot of people that were starting to hit growth pains for the first time. They're thinking like, oh man, we're going to have to end up having to rewrite this thing in a different language. That sounds awful. So I wasn't the only one feeling the anxiety for sure.

Taking it back to Puma specifically, how do you guys run it as a project? Like how do you manage all of the contributions from a people perspective?

I think Puma's actually not that high volume of a project because it does one thing and it's not that complicated of a thing, being a web server. We don't have a mission like Rails, to continually expand what we're doing, right. Puma does a thing and we want to do it really well. So that helps keep the complexity of it down in general.

How Puma works is there are three core maintainers now, myself, Greg, and Patrik. And basically each of us has our own little thing that we like to do and we just run around doing that thing. So Greg really likes to improve our CI, and Patrik is mostly running around responding to issues. He’s very responsive and more of the hey, can you reproduce this, do we need to move this to a discussion, etc. And then I just sort of fill in gaps. I tend to manage the pull request backlog. So merging or hey, this needs to change before we can merge. The other part of it is that I see it as sort of my mission with Puma to grow and find new contributors. I think the most effective thing I can do to help the health of the project is to just find more people rather than to you know, get in there and cowboy code myself because I don’t have that much time.

Does anything stand out to you - could be a feature request, or an issue - as the craziest or most unexpected thing you’ve seen as a Puma maintainer?

I think one thing that was unexpected was this feature that we call reforking, that was contributed by a startup founder (Will Jordan) who wanted it for his own app. Basically the idea is that it changes when Puma spawns new processes and which processes it spawns from. Because to spawn a process in Linux, you’re always copying another process. Instead of booting from the master process, this feature boots from what we would traditionally call a child process, the process actually responding to requests. And the idea was that it would reduce memory usage because you wouldn’t need a master process anymore that was just sitting there not really doing anything. That was such an interesting idea, I’ve never thought of that before. Then Jean took that idea and wrote an entire other web server that was basically Unicorn plus this refork feature and he called it Pitchfork. So that was cool because it was clearly such an interesting idea that it inspired other people to do their own stuff with it. I think that was the coolest feature we ever got out of the blue.

I guess that brings up an interesting thing about Puma as a project, which is that it's an open source project where there are other direct competing or replacement projects that you could use instead. Like in the Ruby ecosystem you've got Puma, Unicorn, other web servers that all kind of do the same thing. How do you think about working on a project of this type, in terms of what could be seen as competition?

Yeah, I don’t see it as competition I think. I think my job is to basically be a steward of this codebase that already exists, and to make it as good as it can be. And I don’t really care if it gets more or less usage as a result. That’s why I contribute to Puma. I don’t really like the framing of competition in open source, because we’re all doing this for free. What are we competing for? But it is true that it’s a pluggable part of the ecosystem, a web server, and I think it’s cool to look at what other people are doing and to look at those ideas.

The best time is when another web server does something and we go “yeah, we would never do that.” I think the person who is pushing the most new ideas in web servers right now is Samuel Williams and Falcon, and Falcon does two things that Puma will never do - one is fibers. We just aren’t interested in doing fiber-based concurrency, and Falcon does it really well. So we don’t have to do it. The other is HTTP/2. Falcon is HTTP/2 native, and I don’t really see the use case for it for Puma. I don’t think it’s on the feature tracker. The only reason HTTP/2 would be useful in Puma would be slightly better development performance. So it hasn’t been high on the list. It’s cool that we can have multiple projects and make different things a priority. Like the whole reason why Jean wrote Pitchfork was because he didn’t want a threaded web server. Shopify has said we’re not doing threads - we’re doing process based concurrency, and that’s it. So we’re not interested in Puma, we’re going to do our own thing with this Pitchfork server. I think that’s great.

To suggest a maintainer, write to Allison at allison@infield.ai.

Once a Maintainer: Adrià Mercader

Allison Pike — Mon, 16 Sep 2024 18:19:46 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Adrià Mercader, maintainer of CKAN, an open source data management system powering the data portals of governments and corporations around the world, including the US government’s portal, data.gov. Adrià spoke with us from Spain.

Once a Maintainer is written by the team at Infield, a platform for managing open source upgrades.

How did you get into software development?

I have a degree in biology, and once I finished my degree I did some coursework that required Excel macros, very basic commands. Everybody hated it, but I didn’t mind it. And then I did a masters in Geographic Information Systems, which had a lot of cartography and spatial resolution and some programming. I was semi-good at it, at least for the standard that they expected there. And from there I just quickly found a job in the geospatial field, but it had a lot of programming components and I started learning more and more.

Was that first job in the software engineering department or in another area?

Yeah. It was a very small company here in Barcelona. It basically provided GIS services, and I was hired to work as a GIS analyst, but really quickly, after just a couple of weeks, they put me on the programming side. We were only like 15 people. It's not like there were departments or anything. But I started learning from people that were there and had good mentors that pointed me in the right direction. But I never set out like OK, I need to be a programmer. It was more organically copying others and going step by step.

I was really lucky that at that first company I was exposed to open source from such an early stage of my career. That’s where I got familiar with the consumer side of open source software. I wasn’t yet a maintainer or contributing, but I started getting into the community and learning how things worked.

My first contributions were these very small projects that I open sourced myself while working at this company, like small Firefox extensions. I remember that I was creating mailing lists and documentation and everything for those projects, even though of course nobody was looking at them. They were super small and just useful for what I was doing in a super specific niche. But at least it gave me an initial sense of what would be involved running a successful open source project.

When did you start working on CKAN?

At the time I was living in the UK, and then I moved countries to go back to Spain. In that time I took a few months just to learn new things. And I was really lucky that the Open Knowledge Foundation, which is the organization that created CKAN, posted a job around this time that just happened to align with what I was learning. This was around 2011. So I was incredibly lucky that they hired me then. At the time, CKAN was open source in the sense that the code was public, but it didn’t have a community around it, or a tech team or anything like that.

It was also around this time that the UK started looking for a government data portal. It was sort of the initial phase of the open data movement. And we got the UK, and then others followed. At first there were no processes, no community or standards around open data. But in the next two years or so a bunch of other governments started exploring open data, and the Open Knowledge Foundation was sort of a trailblazer in that area. So soon after it picked up data.gov, the US government’s open data portal, and it really picked up after that. That’s when we formed an official tech team, and I was already there so it was a natural move for me.

From a people perspective, how does the tech team split up duties? Are there different areas of responsibility?

I would say there’s no single person for every part of the code, but there are parts of the code that someone is more familiar with because they’ve worked with it for longer. It’s not formalized in any way, but we all have our things we are closer with - you’re the search guy or you’re the database guy. It’s kind of organic historical knowledge.

How do you handle roll outs of new versions with breaking changes and things like that? We’ve talked to some projects where there's kind of a formal cadence of releases say once a year, or others who want to release as infrequently as possible. How do you guys think about that?

Yeah, that's one of the major things that we’ve been discussing forever. The main issue, like most open source projects, is resourcing. It's really difficult to predict. There’s no formal distribution of resourcing as of now. People just contribute whatever time they happen to have at that specific moment. Ideally we want to do more frequent releases, maybe a major one every year because obviously that makes upgrading easier. But the reality is that we've probably been more like a year and a half, even two years between major releases. But that's something we really want to address.

As of just this month we have version 2.11 and I would say that the changes between releases have reduced a lot. We’ve made a lot of effort to make the upgrades as painless as possible. I hope in the future every half a year or so people can rely on a new version or we can make even like semi-automated ones where we just ship bug fixes every month or something like that. Another thing is that our users, who are mostly governments, they often can’t devote a lot of development time to keeping an eye on our releases or upgrading. So if they have a site that’s working, we don’t want to break it. Obviously this is a balance because we need the project to move forward. We need to introduce changes, and to patch vulnerabilities.

We’re a small project. We’ve changed as developers, we’ve changed as a project. There are a lot more tools available for maintainers than 10 years ago. We try to keep up and use whatever we can to make releases more stable and better for our users.

You mentioned that CKAN is primarily used by governments but also commercial enterprises. On the data side, are there any features that have been released more for one type of user or the other? Or that surprised you as being requested by both?

That’s a really good question. Obviously at the beginning our main target users were governments, at the national or lower level. And we didn’t target the enterprise directly, it was more like people just organically started using it. One of our tenets is that CKAN is really flexible and customizable. The core functionality is essentially a catalog, and you can build stuff on top of it and integrate it with other systems and platforms. I think that’s helped with adoption in other fields. There are very basic building blocks for a data management system, and it will take a while to plug them together and build your stuff on top of it. But then it’s incredibly flexible.

The successful projects are ones that are seeking the final technical part just to publish their data. But they have done the work beforehand to come up with an internal data management policy, train their people, and make sure the data is clean. CKAN has been around a long time at this point. It’s well known, at least in the data space we operate in. I don’t think it’s that common anymore that people come into it without having done their homework beforehand.

What’s your roadmap look like for the next year or so?

CKAN is in a pretty stable place right now. I think the next big thing will be the next major version, so CKAN 3.0. It’s probably going to be a refreshed front end. Right now it’s a bit clunky. It’s based on Jinja templates, which is fine, but the way they structure it makes it difficult for people to extend and make their own thing. It also looks quite dated, which is being addressed with the new design. Another thing we’ve identified, through a survey of users, is friction with search. Historically we’ve used Apache Solr for search, which works really well, but it can be a pain to maintain and people want to use a more out of the box solution like Elasticsearch or even just Postgres because for most sites, they’re not massive and Postgres would work just fine. There’s also a lot of work on the data store, which is the data repository itself for CKAN, and how to make it play nice with other tooling like data lakes, big data, etc.

There are also extensions in CKAN that are widely used. They’re not part of CKAN core, but we maintain them nonetheless because a lot of people use them. I’ve maintained a couple related to metadata standards. For example there’s this standard called DCAT for presenting the metadata of a catalog, and there’s legislation in Europe and the US that government portals need to present the metadata in this particular format. We want to make it as easy as possible for sites to comply.

To suggest a maintainer, email Allison at allison@infield.ai.

Once a Maintainer: Bryan Housel

Allison Pike — Fri, 30 Aug 2024 13:31:16 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Bryan Housel, maintainer of several widely used packages in the OpenStreetMap ecosystem, including Rapid Editor and OSM Lab. Bryan is an avid runner, which led to his interest in mapping the world around him. He spoke to us from New Jersey.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you get into software development?

I guess I’ve always been into problem solving. As a kid I played computer games like a lot of kids, and it seemed like going to study something that I already liked would be a good idea. I went to Drexel University in Philadelphia for engineering. And one thing that Drexel does that’s really cool is they make everybody go on these internships. So I got to do several internships while I was still completing my undergrad degree. And you work at real companies writing code, and that’s how I got started, working for some of those companies that I interned with.

Was the first line of code you wrote at college, or had you dabbled before that?

We had a Commodore 64 when I was a kid. And actually my parents had bought one that was broken. It was putting random characters up on the screen. And so when I was a little kid, probably 3rd or 4th grade or something, I figured out how to get those random characters to show up on the screen as a prank. My parents would get mad and think they had to return it to the store again.

So I was growing up in the 80’s with this Commodore. You could get magazines and books that had listings of code and you could go and type them into the computer and see it actually work. And I did that a few times. There was this brief bulletin board scene where if you had a modem, you could call into other people's computers, leave messages and play games with people. It was actually a lot more fun than the Internet we have now in a lot of ways, because nobody was making any money off of it. It was all just like bored kids chatting and playing games.

If you were running a board, you could actually modify the source code and have your bulletin board do different things. So that was really the first programming I did, hacking up my bulletin board. And once I hit like 10th grade, 11th grade, I realized hey, I have an aptitude for this thing and I really like doing it. I could go to school for it and, you know, turn it into a career.

What was your first exposure to open source professionally?

I would say in the 90’s, there was this kind of shared source thing. And Linux was sort of taking off. That was the beginnings of open source as a movement or a community. And I remember it wasn't so much that I was doing that kind of work directly, but the companies I worked for definitely benefited from it. Some of it involved graphics, some of it involved databases, some of it was just a server and e-mail and Linux and whatever. So I wasn’t directly working on open source, but I think if you were working professionally at that time, you were kind of aware of it.

My first company was a company involving computer graphics. There was a technology back then called VRML. It was sort of like this 3D graphics thing that never really took off. But I worked on that for a bit. I also worked on database software for lawyers and law firms and then became a consultant for it. That was closed source and like all things in tech you realize that eventually things will shut down, things will kind of go away. I realized that I'm not going to be able to be a consultant on this software forever. So I kind of turned to some of my other interests.

I sort of fell in with mapping because I'm a runner. And I would go do runs, you know out in the woods or on trails or things. I think I was using Strava or the Garmin tools and realized that some of the maps that I was using were out of date. And that threw me into Open Street Map. At first I just wanted to see my running routes show up on a map that was correct. But it also sort of dovetailed with my interest in programming, and so I got into maintaining the editor for Open Street Map around 2013. It’s a piece of software called iD. In 2012, iD launched and it was just this really cool piece of software. I think I read about it on Hacker News or someplace and I just had to dig in and see how it worked because I was just fascinated, and I ended up submitting some contributions that grew over time, and then I eventually took on a maintainer role.

The Open Street Map ecosystem is pretty big. Can you talk a little bit about the ecosystem from a maintainer's perspective? Are you pretty siloed in the piece of it that you work on?

Yeah, it's a big community and it's very enthusiastic. I agree it is kind of siloed. I don't think that people working on the different pieces of software are necessarily having monthly meetings or anything. We do have some communication channels that the community uses. And it's a particularly challenging kind of software to work on, I think. Maybe that's what drew me to it.

Almost every app that you use probably has a map in it. And they have this saying that “spatial is special”. If your e-mail software crashes, you're just like, oh, well, I'll just restart it. But if you're navigating somewhere and your map tries to take you down a street that's not there, or you're out in the world and you just feel like, it's lying to me - it hits us at a different level. So I think people have a higher expectation of their location software than they do of other software and get more passionate about it in some ways. You know, the first thing that people do when they get on Open Street Map is they zoom in on their house, right? They want to see like, oh, there's my house and there's my street. And then almost immediately they're like, this is wrong. And they want to fix it.

Open Street Map is really supposed to be a map of people's local knowledge. Obviously nobody knows more about your house than you do. So there's this idea that Google's going to make a map, and they're going to drive all around with their cars and map the world and spend several billion dollars, and it could be a pretty good map, but there will still be mistakes. What's cool about Open Street Map is if you're driving down your street and you know this business is closed, you can do something about it. You can walk around your town, you can survey, and Open Street Map encourages people to do this. Make the map your own.

How do you think about mapping data and how it’s used? I noticed that you have a canonical data structure so you want everyone to input McDonald’s in the same way, for example. Why is that important?

So the other thing about mapping is that there’s a big mapping business and companies that are making money all have an interest in the map being correct. So Meta uses mapping for things like the marketplace. Even though mapping is not their main business, it’s important. Or Uber’s business. Obviously mapping is very important to Uber, but they are not selling the map. So there's an incentive around having people collaborate on a shared map that is the best map that it can be, which requires the data to be correct. And maps are never done - the streets are closing, businesses are changing.

Even as a maintainer you can’t know all the ways that your map is going to get used or who it's going to be used by. Like I started out because I wanted my trails in the woods to look right. We have mapping conferences where we'll go and see what other people are using the map for. And some of the stuff is just amazing.

There's an organization in Tanzania that is mapping remote places where people are victims of violence or this terrible thing called female genital mutilation. They've actually gathered data about how as the map has improved, violence has been going down because they can get people out of situations. And that's the thing where I had no idea that mapping would be used for important things like that. The Humanitarian Open Street Map Team, or HOT, they’re doing mapping in places that are affected by natural disasters, to go and help people. There’s so much going on that as a maintainer I like to think about - it’s not just helping Uber.

How do you think about from a technical perspective what’s next?

That’s a great question. So the map is always going to be wrong in certain ways. It is always going to lack detail or be incomplete. Those are things we are trying to improve all the time. If you go back even 10 or 15 years, the best aerial imagery was all pixelated, blocky and blurry. And every year it’s getting better. It kind of surprises people when they look at their house how clear it is. And so as the imagery improves, we can use machine learning on that to detect things in the imagery. Parking spaces, building footprints, even the roads. We’re developing new ways to detect this information. And there’s a whole industry around taking images over time and telling how the world is changing. So that’s exciting.

Maybe 15 years ago, people weren't really thinking about mapping the sidewalks and the pedestrian accessibility features. You know, it's called Open Street Map, but it's more than just the streets. There are at least a couple different teams right now working on this problem. One of them is up at the Taskar Center at the University of Washington where they actually have an app for wheelchair accessibility. They can make detailed maps around Seattle of where it's safe to take a wheelchair. Like some of the curb ramps are good and some are not. So if you're in a wheelchair, it might make more sense to cross the street in some places instead of others. Or at a street crossing, is it marked or not marked and are the signals set up with a button or something that a blind person can use? So you know we're just collecting more and more data. It's almost like every year the state-of-the-art improves, so we’re expanding what's even possible.

You said that you originally got into mapping because of your interest in trail running. Are there any particularities about mapping the woods that are different than mapping the streets?

A lot of people have a hobby or an exercise that they like to do and they want the map to support that activity. I don’t know if you remember Pokémon Go, which got really popular a little while ago. So it came out that they were using Open Street Map as the base map for that game. And that presents some interesting challenges because people were mapping fake rivers or fake beaches or whatever to catch river Pokémon outside of their house. We didn’t realize at the time that anyone was going to do that. So we had to develop essentially counter-vandalism, to have people review the map really well. And this is a totally unsolved problem in places where no one’s really looking at the map very closely.

So it’s kind of like people are taking artistic license with the real world?

It does feel like that at times. It’s just being used for so many different things. Like it came out a while ago that Tesla was using Open Street Map parking lots to make its Smart Summon work. That’s the feature where you can have your parked car come drive up to the front of the building to get you. After that got announced we got this influx of data from people mapping parking lots in excruciating detail.

And it can be very political. We’ve got people who are mapping on both sides of the Ukraine conflict.

If people are interested in getting involved in open mapping, how would you suggest they start?

If you’re waiting to get involved in open source, I would say please look at mapping and into Open Street Map. There are so many unsolved problems that are really interesting. And as maintainers we really need people who are willing to just show up and do the small chores every day. It’s a certain kind of person who wants to do that sort of thing. But if you like gardening, maybe you enjoy weeding. And it’s the same with open source. The weeding is the work. The software I’m working on now is called Rapid, and it’s the map editor for Open Street Map. We could definitely use help there.

And anyone can go out into the world and contribute to the map. Write down what you see. You can take pictures and share those pictures with the world. Or help out with the humanitarian work. There’s always work to be done.

To suggest a maintainer, send a note to Allison at allison@infield.ai.

Check out Infield’s new diagnostic tool to get a health report on the state of your app’s dependencies and how to upgrade.

Once a Maintainer: Sean Law

Allison Pike — Tue, 20 Aug 2024 17:52:58 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Sean Law, creator and core maintainer of STUMPY, a powerful and scalable Python library for time series data analysis. Sean and team recently released STUMPY 1.13.0, which includes an easier to use matrix profile data structure, NumPy 2.0 support, pyproject.toml adoption, Python 3.12 support, and improved documentation and testing. Sean is currently Principal Data Scientist at a Fortune 500 finance firm.

What was your first exposure to open source software?

I started off in the early 2000s, working as a scientist and then eventually going on to grad school. But I happened to get lucky and worked in a field called computational chemistry. If you've ever heard of the protein folding problem or a company called DeepMind, that was the area of expertise that I was in. So even though I'm a biochemist by training I'm actually more of a computer person. We were working on computer simulations of how DNA, proteins, and RNA interact with each other. It was 2013-ish when our academic grandfather won the Nobel Prize in chemistry for producing the earliest biomolecular simulations. What people refer to as data science today, I kind of half jokingly say that we just call it “doing science”. Certainly data has to be a part of it, and it's finding the right tools or sometimes making up your own tools to tackle the task at hand.

Originally I programmed a lot in Fortran and then Perl, and eventually C++ and Python. I would argue that for scientific computing, Python was not really the tool of choice until like the late 2000s, around 2010. One of the toughest parts was package management. Until package management tools like those developed by Anaconda, and then later conda-forge, made things much, much easier, people didn't really look at Python, especially in the data space.

Did you foresee writing any open source software yourself at the time? Or was it more happenstance?

The short answer is “no”. I don’t think anyone goes into it expecting that they’re going to do open source. When I started doing software development especially in academia, it was more of a means to an end. You didn’t write code in anticipation that someone was going to use it. You wrote code to do the quick and dirty analysis without any unit testing, but it works. And then maybe a year later you come back to it because a reviewer of your paper asks you a question and you need to go back and look at it and you've probably forgotten what you did in the first place.

But then around 2010 when GitHub started to become more popular they made the developer/code versioning experience much nicer. I think it was around 2011 when I even considered taking some of my academic code and open sourcing it, putting it on places like GitHub or even PyPI so that other people could consume it. But in those days you weren’t thinking that a lot of people in the world were actually going to use it. You’d be happy if twenty years later, 10 people downloaded it. So as part of my postdoctoral training I decided, as a sort of companion to publishing the paper, to also provide some code to reproduce the work. And I wanted to contribute back to the scientific community.

And that was also in time series data analysis?

No, the work back then was the earliest stages of applying a more novel or sophisticated approach to predicting protein secondary structure using machine learning. Also I built a package for analyzing simulation data. But again, very limited usage and adoption. This was right around the time that I decided to leave academia and move into industry. And so my pursuit of it was never really about people using it, but that hopefully it could inspire somebody else’s work as a starting point more than anything.

Great, now let’s talk about STUMPY. First, do you pronounce it “stum-pie” or stumpy, like the word?

I’ve learned from Travis Oliphant (NumPy) that we shouldn’t spend time arguing about these things, so I try not to correct people but personally, because the word stumpy is in the dictionary I naturally gravitate toward that. But some people pronounce it “stum-pie” and that doesn’t bother me.

How did STUMPY come to be?

So at the end of 2016, a pair of research labs at the University of California, Riverside and the University of New Mexico published two back-to-back papers detailing this concept called a matrix profile that, once you can compute it, will allow you to perform a variety of time series data mining tasks. And then within the papers, they presented a couple of algorithmic as well as algebraic improvements that allow them to generate a matrix profile really, really quickly. And being somebody who had worked with time series data for a very long time, I was very skeptical at first. And because of the nature of my work I spent time reading through those papers and validating whether it was snake oil or whether there was something there. It took me about a month to develop a basic implementation based on some pseudocode. And my first few attempts, I was like, oh, something's funky here. I’m not getting the right result.

If I implemented it naively, I knew what the answer should be. It's pretty straightforward. But to implement the high performance version of it, there were some oddities. So eventually I went back to the original authors and was like, hey, I'm getting stuck on this part. And they were like oh yeah, there's an error in our pseudocode. An off by 1 error (but their actual MATLAB implementation was correct). Once that was clarified, then everything became unlocked. This published research was indeed much faster than what a naive person would have implemented. And so I thought there's probably something there. And then I tried it on slightly larger data sets and things started to slow down.
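To make the "naive" baseline concrete, here is a minimal brute-force sketch of a matrix profile in plain Python. It is not STUMPY's implementation and not the pseudocode from the papers: the function names are made up for illustration, and the exclusion zone width of `ceil(m / 4)` is an assumption. It only shows what is being computed: for every window of length `m`, the z-normalized Euclidean distance to its nearest non-overlapping neighbour.

```python
import math


def znormalize(window):
    """Scale a window to zero mean and unit standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        # Constant window: no shape to compare, so map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Brute-force matrix profile, roughly O(n^2 * m) time.

    For every length-m window, return the z-normalized Euclidean distance
    to its nearest neighbour elsewhere in the series, plus that
    neighbour's start index.
    """
    n_windows = len(series) - m + 1
    if m < 2 or n_windows < 2:
        raise ValueError("need m >= 2 and at least two windows")
    windows = [znormalize(series[i:i + m]) for i in range(n_windows)]
    # Skip windows that overlap the query too much; otherwise the nearest
    # neighbour would trivially be the window itself or a one-step shift.
    exclusion = math.ceil(m / 4)  # assumed width, chosen for illustration
    profile = [math.inf] * n_windows
    indices = [-1] * n_windows
    for i in range(n_windows):
        for j in range(n_windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.dist(windows[i], windows[j])
            if dist < profile[i]:
                profile[i] = dist
                indices[i] = j
    return profile, indices


# The shape (1, 3, 2) is planted twice, starting at positions 0 and 7.
series = [1.0, 3.0, 2.0, 0.0, 5.0, 6.0, 9.0, 1.0, 3.0, 2.0, 12.0, 13.0]
profile, indices = naive_matrix_profile(series, m=3)
print(profile[0], indices[0])
```

The two planted copies find each other with a distance of zero, which is the kind of repeated pattern a matrix profile surfaces. The nested loop is also why this version slows down quickly on larger data sets.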

So I tried to leverage some of the PyData set of tools, in particular Numba, which is a just in time compiler that takes your Python code and compiles it to much more performant and faster machine code. And after trying it out, it seemed to scale very nicely. At that point I knew there was something there. There are some significantly tricky bits to implementing the performance version but I thought as a society, we shouldn't be reinventing the wheel, right? And that's when I decided to open source it.

So it was around April or May of 2019 when we open sourced STUMPY. And at that point it was more of a nice thing to do without any expectation that people were going to use it let alone contribute to it.

How did you think about those first couple of contributions or those first couple of issues as they came in, from a human perspective?

I think there are probably different camps. But I think for myself having seen a lot of successful packages being open sourced, I had a reasonably good idea of what I wanted to aspire towards. Before we open sourced it we made sure that we had 100% code coverage, which is a high mark to maintain. We opened it in 2019 and five years later, we still have 100% code coverage. So everything that we add and that we've added since the beginning is heavily tested. In fact, we probably have more unit tests than we have functions.

From day one, if I'm the single person maintaining this, I needed to do everything I could to make my life easier. Just having unit tests run every time that there are contributions certainly has saved myself and everybody a lot of headache and this decision has served us very well. Now that also has its challenges of making the bar to contributing a little bit more challenging, if people don't have experience writing tests. But that's where the human side comes into play. We have to have a willingness to serve our community and guide contributors who want to learn how to write unit tests. We need to remember that people genuinely want to contribute, and they have taken that first step, and that first step is really hard. And when you recognize that, then it's humbling to realize that somebody's willing to spend their time on your project.

For example, very early on STUMPY didn't even have a workflow or CI/CD for testing as new commits and PRs came in. So I created an issue for it, because that world was completely new to me. Then a few days later somebody said, “Hey, I'd like to help you.” It was somebody I didn't know that was in Australia. And when I was sleeping, they were working on it. And very quickly the magic of open source allowed us to have a regular pipeline for automated testing. And then when they were done, they handed over the keys, and they vanished. And I was just like, wow, this is a side of open source that people don't get to see. It inspires me to know that good people like that exist in this world, and it keeps me going as a maintainer.

Do you have a core team of people that manage the roadmap, or are you still primarily doing it all yourself?

Today, I'd say that at least 50% is by myself. Earlier on we had a contributor from Germany who was pretty active and then they stopped contributing once they graduated from school. But more recently we added a new core maintainer, Nima Sarajpoor, and together we've been thinking about how to improve the performance of STUMPY. This really requires a deeper understanding of how the package itself is designed and so it's really mainly two people running STUMPY. But again, because of some of the proactive approaches we've taken and a lot of the automation that we've done beyond adding features and improving what currently exists, there's surprisingly not a ton that we need to do.

In terms of focus for the next year or two, would you say that improving performance is where the focus is?

Yes, I think that’s always the case for us because that’s what STUMPY is. It computes something that if you did it in a brute force way would take forever and with some better algorithms and compilers we’ve gained a lot of benefit. Even knocking 20% off the computation can drastically improve the user experience. I think it was Leland McInnes, who created packages like HDBSCAN and UMAP, who likes to remind us that as a data scientist, there are different bins of time, right? There are tasks that take as long as going to grab a cup of coffee, where you come back and it's done, tasks that take as long as going to grab lunch, and tasks that take as long as going to bed. What we're all trying to do is to always move the process up to the earlier levels to eventually get to interactive time scales, where you hit enter to execute some command and it's just done so that you don't have to spend time context switching. And that's what we're always striving for. In one of our recent commits we actually did improve our CPU computations by about 15 to 20% which is very, very rare when you're talking about code that is already performant. In fact, we think there's more juice to be squeezed from this orange.

How do you track that? Like can you say we’re twice as fast as we were in 2019 when we came out? I know that that's a complicated question because it might be fast on hundreds of machines or fast on a single machine or fast on a CPU, fast on a GPU, there's a matrix of the places where people want to run your library.

What we can't do is rerun it on the hardware that we originally tested on. All we can really do is run it on multiple different types of hardware and operating systems using the previous version of the code and then the changed version of the code, but also try to be a bit more meticulous in terms of identifying the precise line or lines of code that had the most impact. As a reformed scientist, I'm usually very, very skeptical when people claim that there's a 90% improvement in this machine learning model - like sure, right. Trust but verify is sort of the name of the game.
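As a rough illustration of that "previous version versus changed version" comparison, here is a small sketch using Python's standard `timeit` module. This is not STUMPY's benchmarking setup: `compare_versions` and the two `sum_of_squares_*` functions are hypothetical stand-ins, and the repeat counts are arbitrary. The point is only that both versions run on the same input, on the same machine, and are checked for agreement before any speedup is reported.

```python
import timeit


def compare_versions(old_fn, new_fn, make_input, repeats=5, number=3):
    """Time two implementations on the same input, on the same machine.

    Returns (old_seconds, new_seconds, speedup), taking the best of
    `repeats` runs to reduce noise from other processes.
    """
    data = make_input()
    # Check correctness first: a faster wrong answer is not an improvement.
    if old_fn(data) != new_fn(data):
        raise AssertionError("implementations disagree")
    old_t = min(timeit.repeat(lambda: old_fn(data), repeat=repeats, number=number))
    new_t = min(timeit.repeat(lambda: new_fn(data), repeat=repeats, number=number))
    return old_t, new_t, old_t / new_t


# Hypothetical stand-ins for the "previous" and "changed" code paths.
def sum_of_squares_old(values):
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_new(values):
    return sum(v * v for v in values)


old_t, new_t, speedup = compare_versions(
    sum_of_squares_old, sum_of_squares_new, lambda: list(range(10_000))
)
print(f"old={old_t:.4f}s new={new_t:.4f}s speedup={speedup:.2f}x")
```

The measured speedup will differ between machines and runs, which is why repeating the comparison across several hardware and operating system combinations matters more than any single number.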

So we loosely look at performance, but at the same time, what matters more than performance from an open source standpoint is usability - the user interface, the API, how simple, how familiar it is. We often think about STUMPY as aspiring to be what NumPy is to numerical computing. What is important about open source too, is fighting the urge to be everything for everyone and to realize that if I build the software in such a way so that it is modular and composable, then people can build on top of it like they did with NumPy and SciPy or even pandas for that matter.

Have you gotten any input from the community where you thought wow, I couldn't have imagined that someone would have taken it and done this with my package?

Yes. I would have never imagined that people were using STUMPY at CERN, the Large Hadron Collider. That surprised me. When we open sourced this package, we also published a very short article in the Journal of Open Source Software, JOSS. Mostly so that we could get a sense of what people were using the software for, at least in the academic world. When people cite that paper, we get a glimpse into this. We'll see people using it for looking at energy/electricity usage, applications in particle physics, and even people using our package for sports analytics. There continue to be a lot of fascinating opportunities and I think we can safely attribute this to the fact that we have created a package that is general purpose and isn't hard coded for any particular field.

Once a Maintainer: Sofie Van Landeghem

Allison Pike — Fri, 28 Jun 2024 13:04:03 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Sofie Van Landeghem, core maintainer of the advanced NLP library spaCy. Sofie has been working in machine learning and NLP since 2006, and has worked on practical use cases in the pharmaceutical and food industries. She spoke with us from her home in Belgium.

How did you get into software development?

It was a long time ago. I had always loved mathematics in high school, and I studied computer science at university. I guess I have quite a logical brain. So I loved doing mathematics, but I didn’t want to just stay in the theoretical. When I learned about computer science, I thought this is something I can apply maths to. After my degree, I did a masters in software engineering and my masters thesis was around data integration and text mining. So I went into data science from there and have been in the field basically ever since.

What was it about data science in particular that led you to want to pursue graduate studies?

It was this feeling of, well, I know all this theory and algorithms and stuff, but now I can actually apply it to a domain. At the time of my masters, this was the biomedical domain. This was years before big data was even invented as a term. So that was very compelling to me.

My thesis was about developing novel machine learning algorithms to process biomedical text. And throughout my PhD and then a little bit in my postdoc work as well, I collaborated with a research team in Finland where we ran machine learning algorithms on biomedical texts, millions of them, and stored the results in a database so that users could query it. All of this is outdated now. At the time it was technically challenging to run text mining on such a large scale, and not many people were doing this yet.

What have you observed over the last 10 years or so as NLP and AI have exploded, coming from an academic background?

I mean, we've been on a crazy ride in the NLP field, right? So back in the day when I was doing my masters and then my PhD, I was using support vector machines and manual feature engineering. You know, this was even before we had word2vec, before we had transformers. The progress has been crazy. And I have been in an interesting position because I left academia around the time that transformers started coming up.

I joined Johnson & Johnson, the pharmaceutical company. And we really had to work on very basic, almost low hanging fruit at the time, introducing text mining to business processes that hadn't been using it at all. So there was quite a bit of a disconnect between all of the new research that was coming out and you know, the practical things that we could or could not do in this sort of setting of a large company. Having to think about privacy, and you know, transformers often needed GPUs - sometimes you couldn’t afford this if you need to have the results quickly. And we still have that today with LLMs. You know, they are awesome. And I think everybody has been amazed by the progress that we've been able to make with LLMs. But at the same time, I'm still very much thinking about everyday use cases and how people can use these in production and whether this will solve an actual business case. There's often still a disconnect between the two.

How did you first get involved in open source?

It started pretty small. Back in the day when I was at university we would try and open source the code behind the research papers that we would publish. This was in Java back then. So that tells you a little bit about how old I am. And then when I was working in industry, in the years after that, I wasn't able to do a lot of open source work, but then as the Python ecosystem grew I started using a lot of these Python libraries like spaCy. It was one of the tools we were using in one of my previous jobs when I was working at a startup. That’s how I first got into the field. And then I got to know Matt Honnibal and Ines Montani, the founders of Explosion, the company behind spaCy, and started collaborating with them, and that's how I got more involved as a maintainer.

How do you think about the roadmap for spaCy? For example we spoke with a maintainer from the NumPy team, which is a huge project and quite formally run, I would say. We spoke to Ralf Gommers about NumPy 2.0, which was just released a few days ago. Whereas we’ve spoken with other projects where the roadmap is quite individual driven, like what does this person feel like working on this year? So I’m curious where you think spaCy falls on that spectrum.

I think for us it is more individually driven. The founders of Explosion, Matt and Ines, have a vision of what the library should be. And I think we've stuck to that vision which has been good because it helps us to keep the library more stable and users know what to expect. Other than that, we do have a never ending task list internally where, you know, whenever somebody thinks of a feature, small or big, they add it to the board. So the question is what to prioritize, right? Sometimes we have a consulting project that might require building out some functionality, or somebody wants to maybe do a blog post or a tutorial.

I wouldn't say that there's a grand plan, like we'll do exactly this in three months’ time, but we often have work going on on different branches. So we always have master that we're using for quick fixes and small things that can just go in the next bug fix release. And we have development where the larger things, the things that may break other people’s code are kept. And then we have a v4 branch as well, where the really major features are being added so that we know if we release from that branch, then we have to bump to v4.

What are the features you’re working on implementing right now?

We’re working on the NumPy fix right now (to support NumPy 2.0). That should be out relatively soon. We also made a plugin which is called spacy-llm. This was something that we couldn't really plan for last year. We wanted to make sure that you can also work with LLMs in a spaCy pipeline, so we created spacy-llm as a sort of optional additional plugin. We're working on a major refactor and corresponding documentation and getting that polished up so that we can release the v1 version. And then the other one is actually getting v4 out - I think we started working on v4 two or three years ago already, but it required an update of thinc as well, our underlying machine learning library.

I think sometimes we all just want to keep on pushing new features, but we have to make sure that at some point we wrap up, publish what we have, and continue. This is the main reason why v4 hasn't been released more quickly, because we always think of something new to add. And at some point you just have to say, you know, it doesn't have to be a huge release every time. Let's just make sure that the community has what we've already created and continue from there.

Why do you think that Python gets such a bad rap in terms of its dependency management? It’s interesting to me how often people say dependency management in Python is just absolutely hellish. What’s your take on that? Why is it?

I'm pretty sure I've made the statement myself in the past few years. It’s always difficult, right? In theory, there should be a proper way to do this and you know, minor versions shouldn't really break things. So you should be able to pin it to the next version that shouldn't break things. I think in reality, though, it’s just messy. We've definitely had cases where we would publish a release that we would assume was not breaking at all and we wouldn't document any breaking changes. And then it turns out that some sort of usage by a few users is in fact broken by something that we published. Often you don't know all of the different ways your code is being used or its interdependence with other libraries.

If every maintainer could promise every time that when they do a small bug fix release it wouldn't actually break anything, then we would all be able to pin it correctly. But sometimes when this does happen, when an external library for instance publishes something that is breaking that you didn't expect, then you might become more careful with this external dependency and pin it more strictly. Which then means that in a month's time users will be complaining that they're locked into this version. So there's no ideal way of dealing with that, right? Either you're too strict and you're locking people in or you're too lenient and your software might break tomorrow if they publish a breaking change. This is true for all open source libraries or all Python packages really.
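
The trade-off described above can be made concrete with a small sketch. It is not how pip or any real resolver works; the helper names (`parse`, `allowed_exact`, `allowed_compatible`) are hypothetical, and it assumes plain dotted numeric versions such as `1.26.4`. A strict pin accepts only one release, while a lenient range accepts anything within the same major version at or above a minimum.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version like '1.26.4' into (1, 26, 4)."""
    return tuple(int(part) for part in version.split("."))


def allowed_exact(candidate: str, pinned: str) -> bool:
    """Strict pin: only the exact pinned release is accepted."""
    return parse(candidate) == parse(pinned)


def allowed_compatible(candidate: str, minimum: str) -> bool:
    """Lenient range: same major version, at or above the minimum."""
    cand, low = parse(candidate), parse(minimum)
    return cand[0] == low[0] and cand >= low


# The strict pin locks users out of a later bug fix release...
print(allowed_exact("1.26.5", "1.26.4"))
# ...while the lenient range lets in a minor release that might break something.
print(allowed_compatible("1.27.0", "1.26.4"))
```

The strict check is safe but locks users in; the lenient check keeps users unblocked but relies on every upstream minor release being non-breaking.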

So tell me about Typer.

So Typer is being maintained by Sebastián Ramírez, or as people know him, tiangolo on GitHub. He's the creator of FastAPI as well. And basically Sebastián used to work at Explosion as well some years ago. That's where I got to know him. I used to send him cat pictures. He also lives in Berlin. He's just an all around awesome guy basically. When he was sort of leaving the company to make more time for developing FastAPI and friends, as he calls it, he made me promise to keep sending him cat pictures, and that's mostly how we stayed in touch.

Typer is just a small little library that makes your Python functions into CLI commands using type hints that you add there with your functions. I've always enjoyed adding type hints to Python. You know, me coming from a Java background, a heavily typed compiled language, I sort of enjoy having the types in Python again and being able to run type checkers. So just seeing that he couldn't give the love and attention to Typer that it needed, Sebastián asked me whether I could get involved a little bit with the maintenance there as well. I’ve been just cleaning up some of the user contributions, getting them in good shape and making them up to date with master, making sure the tests pass, that they're all adhering to the standards that I know Sebastián wants so that he can review them more easily and get them over the line more quickly. That’s my role.
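
To illustrate the general idea of turning a type-hinted function into a CLI command, here is a minimal sketch using only the standard library. It is not Typer's implementation or API; `build_parser`, `run`, and `greet` are hypothetical names, and the sketch assumes simple annotations such as `str` and `int`.

```python
import argparse
import inspect


def build_parser(func):
    """Build an argparse parser from a function's signature and type hints."""
    parser = argparse.ArgumentParser(prog=func.__name__)
    for name, param in inspect.signature(func).parameters.items():
        # Fall back to str when a parameter has no annotation.
        has_hint = param.annotation is not inspect.Parameter.empty
        arg_type = param.annotation if has_hint else str
        if param.default is inspect.Parameter.empty:
            # No default value: treat it as a required positional argument.
            parser.add_argument(name, type=arg_type)
        else:
            # Has a default value: expose it as an optional --flag.
            parser.add_argument(f"--{name}", type=arg_type, default=param.default)
    return parser


def run(func, argv):
    """Parse argv according to func's signature, then call func with the result."""
    args = build_parser(func).parse_args(argv)
    return func(**vars(args))


def greet(name: str, count: int = 1) -> str:
    """Return a greeting repeated count times."""
    return " ".join(f"Hello {name}!" for _ in range(count))


if __name__ == "__main__":
    print(run(greet, ["Ada", "--count", "2"]))
```

Because the type hint on `count` is `int`, the string `"2"` from the command line is converted before `greet` is called, which is the convenience the type-hint approach is after.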

To suggest a maintainer, send a note to Allison at allison@infield.ai.

Infield is hiring full stack engineers!

Once a Maintainer: Mike McQuaid

Allison Pike — Tue, 18 Jun 2024 13:58:15 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Mike McQuaid, project leader and longest tenured maintainer of Homebrew, a package manager for macOS and Linux used by tens of millions of developers worldwide. After ten years at GitHub, Mike is now CTO of Workbrew, a startup for managing a fleet of machines running Homebrew. Mike spoke with us from Edinburgh, Scotland.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

What was your first exposure to computers and eventually writing software?

As a kid my first exposure was at my school. We had BBC Micros, which is what we grew up with in the UK. This was in the early 90’s. And my dad noticed that pretty much any task, no matter how boring, if it involved computers I would happily do it. He was in finance and back before you could get online stock prices, every Saturday he would get the Financial Times and there would be the big broadsheet with the stock prices, and I had some little spreadsheet program and I would basically just go through and enter all the stock prices into his spreadsheet and spend like two hours on a Saturday afternoon doing that. And for me that was delightful. And my dad was like, OK, what's going on with this kid?

And then I just evolved my own interest through PC gaming and fiddling around trying to get the games to work on my computer. Eventually I went off to university and did a computer science and business degree and got a job as a software engineer. This was 17 years ago.

What was the environment like at your first job?

My first job was for BT, British Telecom. It’s sort of similar to AT&T if you’re in the States. And this was back in the days when it wasn’t that AI was going to possibly destroy your job, it was offshoring. Why would you be British and interested in writing software, because in two years all the jobs will be in India because they’re just as good for half the money, etc. etc. That was my first experience at a massive company.

I’d been dabbling with open source for a few years, and I learned pretty quickly that the open source way of doing things - it goes down well in the open source community, but at your real job you get pulled into your manager’s office, like what have you done? So that didn’t work super well with me. After that I left and was the first employee at a startup called Mendeley in London. I’d been hacking on KDE, the Linux desktop environment, for a few years at that point. Hired a bunch of people from the KDE community there, stayed there about a year, did some consulting around Qt for KDAB, a Swedish company. And then I kind of got this sense that maybe the cross-platform desktop app world was not future-proof, you know, starting to smell things. I did a bit of research. I wanted to work for a company using Ruby on Rails based in the Bay Area and pretty quickly came upon GitHub. I applied, and they said “Not now, you’re too fresh.” So I went to work for this company AllTrails. And then several applications later, I got a job at GitHub where I worked for ten years.

I guess that’s a nice segue back to the open source stuff since I left there last year and started my own company with some former GitHub people called Workbrew, and we’re building product solutions for bigger companies around Homebrew.

I’m having this Zoom from a KDE desktop. So I’m curious how did you get into working on it?

So I did Google Summer of Code the summer after I graduated, which was in 2007. Back then you could have journals, and you could post journals to your blog. And I basically built that integration so you could post journals to your blog. I built blogging support for WordPress and a couple of other blogs as my Summer of Code project. And then I ended up sticking around and being involved with like bits and pieces on the KDE side for, you know, maybe a couple of years. And then I moved to Mac and I was doing some KDE porting to Mac and then I sort of gave up on it as a lost cause.

I remember a period in the middle there where it was like you're gonna run all of your Linux apps everywhere - I think I tried to install Amarok or something on Windows, and it sort of worked. Not the way the world ended up going.

No, it never worked quite as nicely when not on Linux, unfortunately.

What was your first exposure to Ruby?

During university I had started dabbling with Linux and using various Linux distros, submitting patches to things and modifying things on my system or whatever. I’d seen a tiny amount of Ruby but mostly I got into KDE and playing with lower level Linux bits and pieces. Ruby didn’t come until later on when I was at the consultancy and I heard about this project called Homebrew from a friend of a friend in London. And that was the first serious Ruby that I wrote in 2009 or so, and I’ve been using it for the majority of my career ever since.

We hear a lot about how intimidated people are to get involved with open source. So when you heard about Homebrew, what made you get over that hump and get involved?

Hopefully without sounding too arrogant, I never felt particularly intimidated to get involved with open source projects. I think particularly in the Linux era and the pre-GitHub era, the barriers to entry were so high that it felt like if you managed to get through those barriers people were generally like, we're going to be your friend now. And if you're in my city, I'll take you out for dinner. I’d been to a few open source conferences by this point and everyone was very friendly and welcoming and accepting.

So with Homebrew, one of the guys I hired at Mendeley was friends with this other guy who worked at another startup in London. And he was the guy that created Homebrew, Max Howell. I think having that connection and in the first year or so, being involved with Homebrew, meeting up and getting beers with Max and talking about our thoughts, it felt easy. What sucked me into Homebrew initially was just this idea of scratching your own itch - I’m using this, I need this, and I’m going to add a new thing because it’s easier if it’s in here and then everyone else can use it. And then over time it becomes less about me building for myself and more about, what does everyone who uses this project need and how can I help them?

How do you manage the roadmap? We talk to some maintainers where the projects, especially the huge ones, are very formally run. Others are still very informal. How do you think about that?

Yeah, I would say we’re on the moderately anarchistic end in terms of feature roadmaps and stuff like that. At least until you get to open source projects run entirely by corporations, maintainers have no ability to make anyone work on anything they don’t want to. I'm the longest running maintainer, I've been the project leader. But if I say like, hey Bob, I want you to ship this thing by the end of the month, then Bob can just be like, nah. And I don't really have any ability to do anything beyond that.

So in some ways it was quite similar when I was a principal engineer at GitHub, because when you're in those roles, you have no direct reports and so you don’t have twenty engineers to do what you tell them, but at the same time, you have an awful lot of cultural influence. So instead of it being like, I will tell you what to do, it becomes like, hey, I've had an idea, who wants to join me on this journey? And there's some people within Homebrew particularly who would like to be doing more Homebrew stuff. So I guess a big change for me in the last few years is that I try to push as much of my personal Homebrew to-do list into issues that are tagged ‘help wanted’. Sometimes those are done by me, and sometimes those are done by random people in the community. There are five open issues in the main Homebrew repo right now and all of them are things that five years ago I would have maybe just had in my Apple Notes somewhere. But I learned that there are a lot of people that might go “hey, that’s a good idea” and jump in and get involved.

Do you guys have any kind of regular meetings among the core maintainer team?

Once a year we try and get as many maintainers as possible to meet together in person for our AGM. We try and collocate that with FOSDEM, which is in Brussels every year in early February, to try to make a bit of a weekend of it. We're trying to do more events now. We're doing a hackathon focused on performance and security next month. And historically we’ve had some vaguely regular Zoom calls, but it’s hard to sort out the timing for those. Most of our private communication is happening in Slack. That's where we have the conversation about what do you think we should be doing? Or, there’s this issue right now. Could someone jump on and help with that?

So this list of issues here, I see this one about Sorbet. How does that conversation, like “we should add type checking to Homebrew”, get shepherded through?

That's a pretty good example actually. You inadvertently stumbled on one of the more amusing but contentious ones. All of our governance documentation is public but if you’re not a Homebrew maintainer, it can be a bit of a struggle to get through it all. We have the project leadership committee, which is essentially managing the money and the governance structure. And historically that was also maintainers. But now two of the five people in that are not maintainers and never have been. Then we have the technical steering committee, which is managing roadmaps and deciding these types of things. And then we have a project leader, which is like an elected position as well, which is me. And I'm the only person who's ever been the project leader so far.

But the way it works in reality is that the technical steering committee exists to help resolve conflicts in technical direction I’ve been unable to resolve myself with the other maintainers. And that's what happened with the Sorbet stuff. Interestingly, a few years ago when it was initially proposed, I was not a fan of the idea. Now, however, I'm actually a big proponent of Sorbet. I think we should double down on it and use it even more.

What convinced you?

Well sometimes the way it can go with open source projects is that someone gets an idea, that one person is very enthusiastic. And people get excited and say yeah we want to get involved, great. But what can happen in the worst case is that none of the people who say they’re going to step up actually step up, and the person who pushed it to begin with got bored and wandered off, and now you’re left with these problems. And then it becomes someone else’s problem, often mine, to clean up the mess. So I thought that was the way it was going to go with Sorbet. But it turned out that over time actually more people got interested in it, and more people got involved. And personally over at Workbrew we started using it to solve a bunch of problems, and at GitHub a bit as well, and so now having used it to solve problems I’m like ok, this is better than I thought it was.

So in the 15+ years that you’ve been working in open source, do you feel that the way that open source projects are run has changed? What have you noticed in terms of open source culture over the last 15 years?

I think the really big thing in the last 15 years is where and how open source is happening. So 15 years ago I'm not sure whether Node.js existed - certainly npm if it existed was pretty minor. Whereas now most engineers out there are writing JavaScript, right? If you're a new engineer coming out of a bootcamp or learning a new language, it would be a strange choice for you not to learn JavaScript and do that on the front end and the back end and then I'm full stack and yadda, yadda, yadda. So I think that has been interesting just because the JavaScript ecosystem has its own culture and way of doing things. You have millions of dependencies and often those are quite small and maintained by like a single person, rather than fewer, big chunky open source projects. Like KDE is essentially a big umbrella project that probably has, I don't know how many active contributors it has, but it wouldn't surprise me if it's hundreds or thousands or whatever. And it's carved into a wide variety of pieces. Whereas I feel like open source in general has become a bit more decentralized and there's a lot more small projects of one or two people here and there.

Things have also moved away from the Linux world where it used to be. Like when I was at university, open source and using Linux were almost one to one, right? If you were a big C# Microsoft stack Windows type person, it would be relatively unusual for you to use any open source at all. But obviously over time, with GitHub and Microsoft acquiring them and Microsoft themselves getting a lot more into open source, it's like everything is open source now in some respect but what open source means has also changed. There used to be the assumption that open source projects are community run and maybe they're loosely affiliated with a company or whatever. In the earlier days of GitHub, if you looked at the repos with the most stars or whatever, Homebrew was often up there. Whereas now it's all like VS Code and Next.js. Almost all the projects that are up there are ones that have major corporate backing basically. That makes open source a very different world and makes it harder to be an indie, volunteer-run project.

Are there any other open source projects that you’re impressed by or watching?

Ruby on Rails always continues to impress me. I'm using it again at the Workbrew startup that I'm building, and it’s the first time I've built a completely greenfield app from scratch. We’ve got a couple apps I guess, but one’s in Rails and one’s in Go and you know, Go is really good at what it does, but writing it doesn't make me happy in the same way Ruby does. There's been such an amount of time and effort and care and love that's gone into making it very pleasant for developers to use.


Once a Maintainer: Okura Masafumi

Allison Pike — Fri, 17 May 2024 15:16:20 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Okura Masafumi, a software developer based in Tokyo and lead organizer of the Kaigi on Rails conference, this October 25-26 in person and online. Okura has contributed to and created several open source projects including alba, a JSON serializer for Ruby, and as a longstanding user of Vim, translated the book Mastering Vim into Japanese.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you become a software developer?

My major in university was economics. I was not interested in computers at all at the time. But I live in Japan, and when I was about 20 years old I went to Stanford University to learn English as part of this program, and there were some activities outside of classes where we went to Palo Alto. There were some guys there that did stuff with computers and they were trying to start a company. And this made me think I could do something similar back in Japan.

Do you remember what company it was?

No, they were really just starting. They were in a garage, literally.

So it looked, well not easy, but not that difficult you know. They were just sitting in front of computers and it looked so fun. So when I got back to Japan, a friend of mine suggested that I do an internship where I would build iPhone apps. This was 2009, 2010. The iPhone 3GS had just come out. I thought, oh, this is wonderful, this is what I want to build my own program for. At the time I had no interest in desktop apps. And six months later I successfully launched my first app in the App Store. So my first real world programming experience started with the iPhone.

How did you learn to build the app? Were you self-taught?

Mostly self-taught. As I said I did an internship, but they didn't really talk to me all that much. The app was a location reminder type app, which is now built into iPhones. When I reached a subway station it notified me to do something, that kind of thing.

Did you have any users?

I’m afraid not. Maybe 10-ish users, but the primary goal was to publish the app itself, not marketing. So I was satisfied.

What was your first exposure to open source?

When I look at my pull request history data in GitHub, my first contribution to open source was 2012, which is around when I switched from iPhone development to Rails. At the time I had just started to use Vim because for iPhone app development, we must use Xcode. But for Rails development, we were able to choose editors. And my editor of choice was Vim. I still use Vim. There were many plugins for Vim at the time, there still are, and my first contribution was a feature request to a snippet plugin. The snippet in Vim has placeholders. And I wanted to skip one. So if there were 3 placeholders I sometimes wanted to skip, for example, the second placeholder. I wanted that feature.

The funny thing is that after I submitted an English description for my feature request the author replied to me “Please reply in Japanese if you’re uncomfortable in English” because he was Japanese. So I switched to Japanese.

How did you find the human element of collaborating with people that you had never met before? Because something that we hear a lot as a barrier to people contributing to open source is that they feel scared or intimidated just by the prospect of putting themselves out there and saying hey, this is a feature I want or this is a bug that I found. What gave you the confidence to write that first issue?

So one of the first Rails issues I wrote was for the spring gem. And it turned out that I was wrong. I actually went ahead and closed my own issue. But what I learned is that there were other people who ran into the same thing, and they found my issue and it actually helped them a lot. So just because my contribution was not perfect because I raised an invalid issue, it turned out to be helpful to people debugging their own code.

How did you end up eventually becoming involved in conferences and organizing Kaigi on Rails?

The first Kaigi on Rails event was in 2020. We in Japan had a similar event called Rails DM (Developer Meetup), but it ended in 2019. I was assistant staff for that conference, and so I communicated with DHH so he could join remotely, and I invited Jeremy Daer from Rails Core who physically came to Japan for that event. So I knew that if we had a Rails-specific event in Japan, people would come. I was also an organizer for VimConf. So I was confident by then that I could be the founder of a conference. I thought of several friends who might be interested in helping, and talked to them one by one. We ended up with a staff of eight.

When Covid hit we needed to decide to stop planning entirely or just go. And we decided to go, to make it a virtual conference. Which made some issues irrelevant, like physical venues or food. But we still needed to think about keynote speakers and gathering proposals. It was eventually successful partly because we have the Japanese Ruby community, the community behind RubyKaigi, and their experience.

Last year, 2023, was the first year we had Kaigi on Rails in person. We’ve had three virtual conferences, and one hybrid.

It’s much easier to have people join virtually obviously, but you miss out on the spontaneous interactions that can happen in person, where people run into each other getting coffee and things like that. How was the transition to in-person?

I personally preferred it in person. As an organizer, we were lucky because we had three virtual conference experiences before the in-person one. So we knew how to organize talks, how to get sponsors, etc. And the only experience we lacked was the in-person part - throwing a party, for example.

Which is much more expensive, I imagine, too.

Yes. But the sponsors also pay more. And we’d built up a brand by then.

When is Kaigi on Rails 2024?

October 25-26, hybrid in person and virtual.

Finally, who’s doing something in open source right now that you think is really interesting?

I’m a Neovim user now, and there is a guy called Folke who is a Neovim plugin author. He has created lots of plugins and works on them simultaneously, and he’s one of the people I’d love to meet in person because he’s so productive. He also sometimes goes on vacation for a few weeks, and during that time doesn’t do any open source. And so I wanted to ask him about that.

How he takes a real vacation?

Yeah! A vacation from open source. I’d love to talk to him about that.

To get in touch, find Infield on Twitter @infieldai or write to Allison at allison@infield.ai.

Once a Maintainer: Rafael França

Allison Pike — Thu, 02 May 2024 13:56:08 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Rafael França, member of the Rails Core team since 2012 and contributor with the most commits to the framework. Rafael is a Principal Engineer at Shopify where he leads the team responsible for the Ruby and Rails ecosystem within the company.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you come to be on the Rails Core team?

I guess I always wanted to be part of Rails Core, at least as soon as I knew what the Rails Core team actually was. But in reality, how it came to be is that one of the Rails Core team members was José Valim - he was the first Brazilian person I knew that was working on open source projects as big as Rails.

Of course later I learned that there are more, but at the time he was the only person that looked like me that was on a team that important, and I wanted to work with him. One year later I was moving to São Paulo to work with José, and I learned a lot. I started maintaining a few open source projects like devise and simple_form. A year later I decided that I would try to help with Rails. I was basically mimicking his steps, doing what he was doing.

This was pre-Shopify, or were you already there?

This was five years pre-Shopify, so 2012. I was doing consulting, and working with José and others like Carlos Antonio who is also Rails Core. That’s how I started, mostly mimicking José, doing what he was doing. Then I found my own things, the things I like in Rails, and to me that’s helping others and unblocking people, growing new contributors, etc.

Then I came to Shopify as the first person responsible for directing open source. At Shopify I had a lot of different responsibilities. The first one was to make sure that Rails works for Shopify, the company. But now it’s a lot of things - YJIT, parsers, LSPs, everything in the Ruby ecosystem we try to touch. I still kept my role at Rails Core. As a person, that’s my role. As an employee of Shopify my role is the entire Ruby ecosystem, not just Rails.

So Rails 7.1 came out last October. How soon after that release does the planning start for the next version?

That’s a good question. And I don’t know if anyone knows this, but we never have a plan, and it’s no problem. The first time that we actually exposed a plan was this year, I believe David (DHH) decided to expose a plan [last] December. That plan is there, and people actually thought that we started that plan in December. But the things that he created as public milestones have been there for years. Those are things where we say hey, we would love to build something like this, and we use Basecamp to track what those ideas are. It accumulates, and some day people say hey I’m going to work on this, and we do.

One of the reasons why we never made the plan public is that it adds the pressure that we need to finish, and in private we never have to finish the thing. We’ll say “I want this for Rails 7,” for example. But then life goes on, you change interests, and you do other things and forget about that thing, and you don’t feel bad because you never promised anyone that we would do it.

But this year, last October, during Rails World we had our first ever meeting of Rails Core. We’d never had a meeting. Our first meeting was last year. And one of the things we discussed in that meeting was maybe we should have a regular cadence for releases - we try to, but sometimes we miss. Our original plan was to release once a year, but if you pay attention, 7.1 took way more than one year. So one of the decisions we made was let’s try to follow a regular cadence. And the cadence is now going to be six months.

The next release is going to be next month - 7.2, not 8. And the second decision was maybe we should make our plan public. Just to let people know what direction we’re going in, in case they want to steer in the same direction or even change their own plans or if they’re building something similar maybe that can help us. And that’s why David decided to open the milestones. All those ideas were there for years - there are things I put there three or four years ago, and now it’s public for Rails 8.

For a big release like 8 or 7, we try to see how things actually evolve on the internet. The goal for Rails 8 is simplifying, having fewer moving pieces. So we are planning to remove things that we believe add confusion - like today the assets pipeline has Sprockets, Propshaft and jsbundling. We don’t need all of that. So we use a plan to decide what we want to achieve and then someone will go ok, I’m interested in this area and I’m going to build this. It’s very individual. But we agree as a team with the idea. So like Xavier Noria, years ago, said he’d write the auto-loader, and he wrote Zeitwerk. But that was his personal plan, it was not a team plan.

So after the meeting last year, we are going to try to keep the milestones public as we are doing now with Rails 8. The interesting thing though is that most of the things we added to the Rails 8 milestones are coming on 7.2. Because we are not trying to plan specific releases, we are trying to plan big ideas. Those big ideas make sense as a big release, but it does not mean that we need to wait until October, we can release things as we go, and the things we are finishing now are going to be in 7.2, and the things that are not are going to be in 8.0.

How do you distinguish between those two things? So if you have one over-arching roadmap, and individuals are kind of picking things off the roadmap that they are working on, how do you decide what goes into the major release versus let’s just put this in 7.2?

There are two main decision points. One is, how big the change is. So when we talk about simplifying the assets pipeline as we are talking about, this needs to be a big change. If we had finished it this month, I would say 8.0 would be released in April, not in October, because for us that’s a big change in the direction of the framework. I think the same thing happened with 7.0 when David introduced Hotwire. People were like ok, maybe I shouldn’t be doing more in JavaScript-land like using React, using webpack, etc., and Rails introduced Turbo/Hotwire with its vision of simplifying the web. It could have been 6.2. Like if you look at the release history, it was always 4.1, 4.2, 5.0, 5.1, 5.2, 6.0. But 6.2 was never a thing. We decided we have this new thing, it’s very big. It’s not going to be a small change for people, it’s not incremental. It’s a large change in the vision of the framework.

8.0 for us is going to be the same. Like I said, the goal is to simplify things. The simplification is large enough that it requires a milestone, a celebration of something. That’s why we are going to say it’s 8.0.

So it’s more about impact.

Yes. David usually calls that, because he has the larger picture in his head. In the past we also used the major versions to drop Ruby support. But now we are trying to change that because otherwise we are going to be supporting versions of Ruby that Rails Core or Ruby Core has not supported for years. Like today, Rails 7.1 supports Ruby 2.7, that’s not supported at all by Ruby Core. So now we are trying to change this as well, because removing support for old Ruby versions was a reason to bump major versions. Now it’s not anymore.

How do you balance items on the roadmap that are maybe from the Core team themselves versus from the community, versus from your work at Shopify? You have all these different stakeholders who use Rails, in some capacity. How do you balance across those different communities?

So just to reiterate, it’s mostly individual driven. What I care about are some of the same things that David cares about or Xavier cares about. Like there were a lot of changes in authentication to make it easier to build your own authentication. Those were community driven. I said something about it on a podcast, but it was not me that built it. But I was championing it because I think the idea is great. Or Xavier, he cares about auto-loading. So if people have an idea about how to push this forward, he will be open to it.

There are other things where we prefer not to get input from the community - the assets pipeline is one of them. Because everyone has an opinion. And if you try to satisfy everyone you would end up with the same mess we have today. There are so many choices that you can pick and choose and none of them work as great as the default worked years ago. But sometimes in the community there are a lot of small ideas that together make a large vision, and I care about those. One of those is people are trying to extract things from ActiveRecord to ActiveModel. Individuals that don’t work together are doing this. And so I’m trying to look at the overarching story around that, like why are we doing this? What are the reasons someone might extract things right now or in the future? Rails right now is very mature. It’s harder to change. Like I could not just say “the way you do this is wrong, let’s do it completely differently.” And for community members, it’s hard to think about backwards compatibility and stability. They want to change faster. And that’s why you see them building things on the more experimental side, building the libraries from Rails but outside of the framework.

I wish we had a better way to start to integrate some of those ideas back in Rails. Some of them are happening. So encryption is one of them. For years we had libraries that do data encryption. Now it's baked into Rails. If we build it ourselves we can build a different developer experience, we can integrate with different modules, while with a library you have to deal with different integration points. I try not to make Shopify dictate the decisions we make at Rails Core. There are things the team works on that we treat the same as any other contributor - they have to open a PR, a member of Rails Core has to review, and then we merge.

So besides which features make it in, there’s an element of how much breaking-ness can we introduce. As an outsider it seems like as the framework has gotten more mature it’s gotten more backwards compatible, more deprecation warnings, a lot of work has gone into making that smoother. How does the team think about helping existing applications stay up to date going forward?

That’s a really good question. I remember when I was doing work similar to what you guys do - it was really hard. The worst time was from Rails 2.3 to 3.0, it was almost impossible to do. I don't know if there are any apps on 2.3 today, but if there are they are in real trouble. So I said no more, now I actually have power to make things happen. I don’t ever want to see that happen again. And one of the things I shouldn’t have said about us trying to change the pace of release from one year to every six months - I don’t want companies every six months to be having to upgrade Rails, removing deprecations, dealing with a lot of code. Like if it was only bumping the version, I would be OK with companies doing that every six months because it’s not a lot of work, but removing the deprecated code or fixing breaking changes - I don't want that every six months. So I'm trying to think about how we can increase the pace without increasing the burden on the community.

There are some ideas I have, like not removing deprecations every release, removing every year only. I don’t know if it’d be the first half or second half of the year, but once a year. So we’d have two versions of Rails with the same deprecations. Things like that. It would allow people to upgrade but they don’t need to do any major breaking changes or rewrite parts of code, etc.

So you’re able to get features out throughout the year without waiting an entire annual cycle, but people can take those features without breaking their code.

Yes, exactly. I care a lot about this problem space, and Shopify is the same, and we automated that kind of work. There is a tool at Shopify that helps more than 300 services be upgraded from one version of Rails to another. And the tool helps you open PRs, tells you what you need to do, etc. And we do this every year. We are very good at this. I don’t know that any company gets close. But even for Shopify, I’m not willing to say, hey we have tooling, I want you to upgrade every six months. I’m still on the fence. If we, even with automation, are not comfortable doing it every six months then I don’t want the community to feel like they need to do it every six months. It should be easy to upgrade Rails. It should not be hard. I’m not saying we don’t need expertise, but it should not be as hard as that 2.3 to 3.0 upgrade was. It took GitHub like three years to upgrade from 2.3 to 3.0.

Zeitwerk is probably a recent introduction of a major breaking change that was still painful handling the upgrade, but there was tooling shipped as part of the release and deprecations that made it much more tractable and much safer to do than, for example, Strong Parameters in the past where it was much harder to know if you’d done it correctly.

Yeah, so what I say is, you should need to do work. But the work needs to be directed, and paced and let’s say even official sometimes. Like before, you could call form_for and pass an object, now you call form_with. So we discuss. Should we deprecate form_for? Maybe, yes, it’s simpler for Rails to have one API. But I’m not sure if we should require people to change this code. There’s no reason to.

So if Rails does go to a six month cadence on release, does that affect the maintenance commitment to older Rails versions? How are you thinking through that?

Yeah, this is another thing I’m thinking about. One of the reasons we want that cadence is that it’s hard to predict how long your version is maintained. Because if we had a calendar, I would say ok yes - it’s maintained today, next year it’s not. But because we have no strict cadence, you don’t know. We could release tomorrow or in two years. So we want to help people plan. But that’s another problem, because every minor release we do, that means the old one is not supported anymore. Before, you would get at least one year, or more. So one of the ideas I had around this was maybe we should have a yearly maintenance policy, no matter what version you’re on. So 7.1, you have one year, 7.2, that should come next month, you have one year. That would mean we’d be maintaining at least two versions of Rails at once, but that’s our problem, not the problem of the community.

To me, the right window is one year for bug fixes, two years for security releases. I think that’s enough. That’s less than other ecosystems do, like if you go to Java, Oracle still supports Java that is 12 years old. I don’t want that, because I think that makes the community lag behind. So I want to balance pushing the community forward, bringing them with us, and still supporting them. I think a yearly window for bug fixes, and two year window for security fixes is enough.
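As a rough illustration of the windows described above, here is a minimal Python sketch. It assumes the windows are counted from each version's release date, which the interview does not specify, and the function names and the example release date are hypothetical. It is not how Rails Core tracks support.

```python
from datetime import date

# Window lengths as described in the interview. Counting them from the
# release date is an assumption made for this sketch.
BUG_FIX_YEARS = 1
SECURITY_FIX_YEARS = 2


def add_years(day: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def support_status(released: date, today: date) -> str:
    """Classify a release as 'bug fixes', 'security only', or 'unsupported'.

    'bug fixes' means the release is still inside the first window, where it
    would receive both bug fixes and security fixes.
    """
    if today < add_years(released, BUG_FIX_YEARS):
        return "bug fixes"
    if today < add_years(released, SECURITY_FIX_YEARS):
        return "security only"
    return "unsupported"


# Hypothetical release date, for illustration only.
released = date(2024, 1, 15)
print(support_status(released, date(2024, 6, 1)))
print(support_status(released, date(2025, 6, 1)))
print(support_status(released, date(2026, 6, 1)))
```

Under a policy like this, a team can work out when a version leaves each window from its release date alone, without waiting to see when the next release lands.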

How much time do you spend thinking about the other frameworks and how they're doing things?

I don't spend a lot of time looking at other frameworks. I spend a lot of time looking at other languages. Because my larger job at Shopify is to be responsible for Ruby. I have a very good friendship with José Valim still. He created the language called Elixir that has a framework called Phoenix. I chat with him all the time, we discuss different approaches. I know what Phoenix is doing. He knows what Rails does because he was also a Rails Core member. So that one I'm very close to. I pay attention to Hanami, that's Ruby. I read every single blog post they release. I follow those people around. I work a lot with Rust right now, I pay attention to what they get right, what I hate about the language.

JavaScript for me, not in terms of frameworks but in terms of tooling - how easy it is to work with JavaScript in your editor is one of the reasons why at Shopify I created the team to work on LSPs, code editors, parsers, type checkers because I believe that thing is good and we should bring it to Ruby and Rails as well.

Tooling especially is what Ruby is lacking right now. Matz three years ago, maybe two years ago, he made a keynote saying Ruby 4 is going to be about tooling. And that’s what Shopify is building Ruby 4 to be. We are working on a new parser because we believe the parser should be the same across all the tooling, so we could have better code mods. Better RuboCop - it’s great but it’s slow, at least for our codebase. Typing is a hard one because I don’t want to push typing to the community, but there is a lot of value in typing especially in large codebases. So the storytelling around typing should be better than it is today - not put typing everywhere, but there is a specific case where you need typing.

So my inspiration from other languages is mostly about the tooling. I want to see what tools they are building so we can build the same in Ruby or even better.

OK, last question: was there ever anything on the Rails roadmap that was a real moonshot, something that was thought about but never done, that you wish you did?

There is one, but I’m not sure if I should talk about it. One of the struggles I see a lot of applications have is what to do when your company grows. Like say when your team gets to be more than 20 people - how do you deal with that code base? So one of the things we did at Shopify, with 8,000 engineers, that I think should be part of Rails, but I’m not comfortable yet putting out into the public is to be able to build applications as if it were Lego sets. So you break your app into pieces. A lot of people have tried to do this, like Shopify released a library years ago called Packwerk that helps you to do it, but we never told you how we actually did it, using a feature in Rails that is already there.

The feature is there, but people don’t use it for that purpose. Rails engines could be used to create small applications, and you could use them to build bigger ones. Usually people use engines to share code, but this is not for code sharing, it’s for code organization. The feature is there, people can use it already - that’s how we’re doing it at Shopify. But we could make it easier, because the way we make engines right now makes it hard - it requires you to have a Gemfile, gemspec, it actually generates engines to be shared, and that’s not the goal. So the new feature is actually generating less code, not more code. But I’m afraid of sharing this yet because I don’t want people to think that as soon as you create an app you should have this. It’s a tool to solve a problem, but you need to have that problem first. If you use the tool too soon, you’ll be in trouble. That’s why I never shared it outside of Shopify talks, blog posts, etc. I don’t know yet how to document the problem space well enough to tell you “don’t do this yet if you don’t need to.”

Fascinating. We’ve definitely seen engines gone wrong for modularization inside of smaller teams that they regret doing. But also the idea of how do you scale a Rails app, not just the code but as the engineering team itself grows how do you support it, has been a question the whole time.

So to me, it’s a sharp knife. But it’s a sharp knife you can’t pull out. So that’s the problem I’m struggling with. It’s built, the code is there, but I haven’t released it. The documentation, to tell the story properly is the hard part. Because as soon as you add it to your app and you realize you don’t need it anymore, you’re stuck. To remove it is a lot of work. So that’s the hard part. It’s one of the ideas I wish I would release, because it would help Shopify a lot, and we’re probably going to have a lot of big companies in the same situation as us. But it’s hard to sell in the right way. Because people will do what they do with Google - oh Google does this, so we should do it. But you’re not Google.

Right. But in the meantime, you've got, you know, people adopting dozens of micro services or something, and maybe they shouldn't have done that.

Yes. So the other side of the coin is that this could be a response to microservices because it allows you to deploy the same app in different ways. You can say I have this monolith, composed of many Lego pieces. But here I want to make a man, and here I want to make a car. You can use the same Lego pieces to make two different things. Google actually wrote a paper last year about this. But we need to get the story right. I would say that’s what Rails is, in part. The code is the least of the thing. Rails is storytelling, it’s leadership, it’s showing that we can do it differently. The code is there in support of this.

To get in touch, find Infield on Twitter @infieldai or write to Allison at allison@infield.ai.

Once a Maintainer: Armin Ronacher

Allison Pike — Thu, 18 Apr 2024 19:13:16 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Armin Ronacher, creator of the Flask framework and founder of the Pocoo team, a group of open source developers working on several widely used Python projects. Armin is a regular speaker at various developer conferences and currently works as a Principal Architect for Sentry.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades. Armin spoke with us from Austria.

How did you get into open source? Do you remember your first contribution?

Actually, it took me a lot longer to contribute than to write my own open source software. In 2004 when Ubuntu released Warty Warthog, there was quite a bit of excitement about that Linux distribution. There was a bit of momentum, and I joined a bunch of people and decided it would be a good time to create a community for the German speaking population in Germany, Austria and Switzerland. There we used PHP, VB and I think Dokuwiki. I was also learning Python at the time. It was essentially a community website for an open source project.

Through that I started building things, utilities that I could use to switch this thing over to Python. Basically the idea was to take the PHP code and replace it with Python code. And so I think in 2004-2005 I started my first open source libraries just out of personal interest, and because I noticed that it was kind of hard to do web development in Python compared to how easy it was for PHP. And so that's how I got into it gradually. But a lot of it was my own libraries. I think it took probably a year and a half, two years more until I contributed to other people's stuff.

When you first released Flask, did you know right away that it was going to be popular? Did it hit right away or did it kind of take a while?

So the pre-story there is that I wrote the libraries that Flask was based on quite a bit earlier. I couldn't tell you by how much, but the precursors of those libraries are probably like three years older than Flask was. And there was at the time quite a bit of sentiment that you had to do these single Python, no dependency libraries, monolithic, everything in one place kind of thing. And so I made an April Fool’s joke where I bundled all my libraries into a zip file and just made some marketing for it. And that was obviously a joke. But the joke was surprisingly popular.

What I learned through that joke is that marketing actually matters quite a bit. With the libraries I did before, I didn't even try to do that. With this one I actually tried to make a proper website, proper documentation. And it felt somewhat clear right from the start that there was a certain need or an interest from people. Now, it was not as measurable as open source projects today with like GitHub stars and everything. But for me it was measurable by the activity that we had in the IRC channel. Compared to all the questions that came in before, which were once in a while, all of a sudden there was like a constant flurry of questions and answers and it felt like there was more activity behind it right from the start.

How quickly did you bring in other people to start helping maintain it?

Before Flask I sort of informally worked with a guy called Georg Brandl and we started this Pocoo website where we basically collected a bunch of open source libraries that we created together. It was primarily there as a way of creating a community, which was entirely based on IRC. It wasn't really based on the idea of getting more people in, more on the idea that you're not in this alone. If there are questions, then there's another person you can ask. It wasn’t a very clear work arrangement or anything like that, but it was one where multiple people were always joining or leaving or doing something. Some of those projects always had people technically capable of committing to it, but there was no real structure around it. And that actually never changed for Flask until eventually David Lord started contributing to it, and he's sort of the face of the project nowadays and he has a much more organized approach.

I want to talk about a blog post you wrote a few years ago about PyPI naming your project a “critical” package - your post is called Congratulations: We Now Have Opinions on Your Open Source Contributions. I'm curious now that it's been a few years, do you feel any differently than you did in the post, or how have your thoughts evolved on the package indices?

It didn’t become much clearer. So that particular [post] was just like hey, we're setting a bar here that's not a bar for everybody, but a bar for let’s call it popular people. If you want to publish your critical package now this extra rule applies to you. Which is not a particularly impactful rule, it was just two factor authentication, but it was still a rule that didn't apply to everybody. And it basically meant the index all of a sudden sets something in place that wasn't there originally. And the problem is that the open source package kind of turns from someone's toy into this critical piece of infrastructure. And I think we have never really figured out what the relationship of an index to the open source contribution is.

I don't feel particularly strongly about two factor authentication. But I do kind of feel that we have never really addressed the fundamentals of how this sort of stuff is supposed to work, and the unclear relationship of the indices to the packages I think is one part of it. It's not the most impactful one. But yeah, it's kind of interesting.

Do you feel like any other ecosystems are doing it in a better way, different way, worse way?

I mean the extreme version of this is do you even need a package index? You have Go which doesn't really have one because you can directly depend on GitHub, but then GitHub is sort of hard coded into the language. I'm not entirely sure if decentralization actually is the solution for this problem, but there are all these different approaches and some of them are maybe better than others in certain aspects.

The fundamental problem is we got very willing to depend on a whole bunch of stuff, on external dependencies, and we haven't really established well defined rules about what it means to depend on something. And so that has created a bit of a mess. And in an attempt to solve this mess, a bunch of people ran to the package indices like “you've got to figure this out now.” So for instance, left-pad was an example where people went to the package index and said “Undo this”. And they were basically like well, the maintainer clearly went rogue. So we're going to override his decision to unpublish this package. And the question is like, is the index supposed to do that or not? The index can do whatever they want, right? Because they're just republishing under the license that the creator used. And so you can't really take it away.

So yeah, I think all the indices are more or less in the same position. People depend on packages, so the packages kind of have to stay there. But then that always turns political and most package indices don't have the capacity to deal with the politics too much. In theory, if you want to run a package index there should be contracts in place and rules and regulations and everything, but it’s mostly made up and they have quite a bit of flexibility in how to interpret their own rules. I don't think any index is doing it particularly better than others. npm probably has the most well defined rules today of how the index is supposed to work.

Is the npm registry run by Microsoft now?

I think it's technically run by GitHub, which is part of Microsoft, but I'm not entirely sure how this works. It doesn't have Microsoft branding or anything, so they are probably keeping it somewhat intentionally separate.

I think the core problem is just that there's a big difference in practice from publishing something on an index versus putting it on your own website. And it looks very much the same, but it really isn't. And that's kind of not disclosed. And it's not well understood what the differences actually are, so some people only learn like years later.

Do you mean, for example, that once I publish to an index I've given license for that index to keep that code forever and make it available to people if they want to? Or what sorts of distinctions are you thinking of?

One is that the index clearly has rights that it sort of gains - the terms of service of using the index apply and then you have to adhere to it. But in a very specific way, it also takes away some agency. I'm not saying anyone should ever be doing this, but there was a very well known Ruby programmer called why the lucky stiff. And he decided one day that he’s just done. He's going to remove all his content on the Internet. And that sort of removal of your stuff has lasting implications, and it really only works if you're in full control over where you're hosting. It’s probably not a good idea, but this was a freedom he had, to take it away. If you're on a package index, you really don't have that freedom. And that is not so much because being on an index takes away a right, but because people literally depend on the package being installable on the index.

So here's an example, non-malicious. When we publish an update to Sentry CLI, which is one of the tools that we have, and there's a bug in it, within minutes there will be a bug report and tons of people will say “hey, this broke.” And it's not because they all suddenly upgraded at this very moment, but because the CI systems run all the time and they run on the latest version. So if something is not working, then you have the mass effect of everybody being broken all at once. In the time before public package indices, it was pretty common that your website wasn't perfectly available. The website might be down every once in a while. You might be like OK, I can't download this right now. So you would mirror it, you would vendor it, you would do a whole bunch of stuff to keep this code around for yourself because the original source wasn't all that reliable to begin with. With package indices, you don't have to do that because the index is there to serve it up for you. But it also means that now this act of unpublication is gone. So the way you interact with an index is just fundamentally different.

I made this joke that curl cannot be unpublished because everybody has their own version of curl. I remember when I worked on a game console, and we didn't even use the official curl version. We used a thing that looked like curl but it was actually written by someone else. It just emulated the curl interface.

There are also package managers at the operating system level that Ubuntu runs or Debian runs. I get the feeling that they are governed more strictly than the language package managers are historically. Is there any difference?

Well, I can tell you my version of this. The first time I published an open source library, I think it was a Python web library, and there was no PyPI. Or maybe there was PyPI, but nobody used it. So I went to SourceForge, and if you wanted to create a project on SourceForge, it took seven business days and you had to go through a manual review process. I'm pretty sure I had to sign a letter with pretty terrifying legal text to read through. It was a relatively daunting process compared to ‘npm publish’, which is what we do now. And I don't think that's a good thing. After I had the package there, eventually someone from Debian started putting it into their Debian archive. People don't do this anymore. But I remember for many years I had conversations with Debian people where they told me I cannot do such and such because the license prohibits it, or I have to do such and such because otherwise they can't do their job. For instance, I wasn't allowed to republish a tag on git because they would pin against the tag and if something changed, they got an alert.

I think that eventually it may go back to something like that. But we're definitely in a completely different world now. It's kind of scary in a way how we do software development now because we have a lot of dependencies, but we don't trust anyone. Everybody has complete rights to screw you over. In practice it doesn't happen often. People are good most of the time, but security-wise it's pretty terrifying compared to how we used to do this like ten years ago.

And ten years ago there was a lot more emphasis on GPL licenses and having the expectation that if anybody used your code that they were going to contribute back to the community. That's changed a lot too. Do you have an opinion on what combination of index, registry, server systems, governance, open source licensing decisions or other kind of decisions make for the healthiest open source community?

I don't know what the right answer to this is. I'm very skeptical of GPL in general. One thing is that the only way in which you can actually have a GPL enforced is if you're willing to go to court. And I think that has been demonstrated multiple times throughout the years that the most successful GPL projects have enforcement alongside them. And that’s never something that I would want to do. A license is only as good as the enforcement. If the license is BSD or Apache you have to do something that is really extreme for enforcement to ever come into play. Typically it's just like, people forgot to put the license file into some copyright screen. You're not going to go to court for that.

But that also means that if you want to do something beyond open source, like you actually want to make money, GPL kind of always had a way for you to make money. In the historic world that doesn't really work for a more permissive license like BSD or Apache. Then it becomes really complicated, because my belief on how you'd combine open source and commercial things is that it is no longer open source. For example the license that we use at Sentry is the FSL license, where for two years it has an exclusivity period and then on a rolling two years it turns into Apache, or there's a second version that turns into MIT. That’s sort of my and David’s way of squaring this. But it’s also very loaded because it’s violating this belief for people of what open source should be. But if you don't actually have a way to earn money, then it kind of sucks. There’s just burnout and a whole bunch of other stuff that doesn't really work. So there has to be a way to commercialize something along the way without giving up on the benefits of open source entirely. I don't know what the perfect solution is, but that has been what I feel are some of the most reasonable approaches.

Lastly, it’s been about a year since you released Rye. What led you to create it?

Multiple things sort of led to it. The original version was never published. The reason I don't publish all the things that I sort of hack on in my free time is because I feel like I can no longer put something on GitHub without having some sort of responsibility for it in one way or another.

I wrote it because I had a need to use multiple Python versions and I didn't like compiling them. I don't actually know what finally made me change my mind on publishing it. My best guess is that multiple things aligned. For one, there were people in the Python community raising money for Python tooling in one form or another. And so I felt like, ah, people are going to go and build a package manager or really a project manager/package manager kind of thing. And I just felt like, hey, since I already have it, let me polish this up and show what it could be and maybe somebody wants to build it. There is an article from someone who wrote if you don't have the desire to actually build something, just publish the idea. And so there was a little bit of that, let me just show you the idea of what it could be.

To suggest a maintainer, write to Allison at allison@infield.ai.

Once a Maintainer: Jeremy Smith

Allison Pike — Fri, 05 Apr 2024 13:38:09 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Jeremy Smith, co-host of the IndieRails podcast, organizer of the BlueRidge Ruby conference, and enthusiastic member of the Ruby and Rails communities.

Once a Maintainer is written by the team at Infield, a platform that helps companies upgrade their open source software without breaking things.

How did you get into programming?

So I distinctly remember seeing my first website when I was a teenager, probably 1995-96, something like that. I had dial up access to my dad's university Internet service because he was getting his PhD, so he had it as part of his program. I remember dialing in and logging on to the university website. It was the first website I'd ever seen. And I was immediately like, in love. But I thought, I've got to learn how to do that. I have to learn how to make these. And I spent the rest of high school learning how to build websites.

Then I went to college, and I was actually in a systems analysis program which is like computer science without the electrical engineering. And I got my first D on a midterm in a data structures class, and I thought, I can't be a programmer. I don’t know how to do this. It was a lot of C++ and obscure problems that felt very foreign and different from the websites I was building. So I dropped out of that program.

I ended up backing into an interdisciplinary studies program which essentially let me make my own major, where I took some computer science, some writing, design, media production. That has been beneficial in some ways to me and not in others.

How did you get that first job building websites for other people?

It was the summer after I graduated, I don’t know really, I made contact with someone at the environmental, health and safety department. At the time, it was easier. If you were anyone who knew anything about computers, people would be like, yeah, we’ve got computer stuff for you to do. And it was all sort of lumped together under IT. Handling printers and setting up people’s desktops and building websites, it was all together. So I did that for a bit.

Then my wife and I got married, and my parents moved to south Florida. And my dad found this company that sold palm trees. It was totally Florida. He said they have an opening for a web developer. And so we moved down there and I worked for this company. Essentially they dealt with a certain kind of palm tree called a date palm. It’s like a nicer, higher end palm tree. And there was enough money in this for them to hire a few web developers to build out what’s essentially an app for landscapers and people to browse through the inventory of palm trees and select them. And then he would deliver them, install them and maintain them. So these really rich people on Palm Beach Island would go to this website and pick out 12 foot palm trees. And I designed the app and I also ended up doing a bunch of other design work for them, making trade show banners and things like that. I kept working for small companies for a long time, because I’ve always wanted a high level of ownership and being able to get into all the details of a thing.

What was your first exposure to Rails?

I think it was in 2008 or thereabouts. I was working for a small nonprofit and we were building applications in ASP Classic. And I knew that we had a lot of product development ahead of us and I needed to pick a new framework. And at that time I'd done a bunch of research and the ones I knew were Rails, Django, and I think it was CodeIgniter in PHP. That was from what I could tell the best PHP framework at the time.

And I evaluated them on a number of criteria, including sort of the look and feel, the framework itself, the number of libraries, how substantial the libraries were for each, how large the community was. But what especially impressed me about Rails was how large the community was in comparison to the others, how vocal they were, how many resources there were in comparison to the others. And so I knew that if I picked Rails, I felt really confident that I was going to have enough resources to tap into to help me whenever I got stuck.

Sometimes people call that a community of practice. The idea that it's not just for fun, it's not just for association, but people are on a journey of learning together sharing their practices. I think the programming world is like that in general, but even more so I felt that with Rails specifically, and still do.

How has that feeling evolved over time?

By 2015-2016, I was really getting worried about the Rails ecosystem. There was still plenty for me to learn from, but it was a feeling more like what's the longevity of this? I knew a bunch of people were leaving, or at least talking about leaving to go to Clojure. There was sort of a wave to Clojure and then a wave to Elixir. And that was a little nerve wracking for me, because I’m a more product focused developer. So I’m depending on the foundation underneath me to stay stable. And if I see that foundation shifting, it made me a bit nervous.

We see this a lot in the Rails community, like the “Is Rails dead?” post every few months or so. What pulled you out of that feeling?

Well during that period, one person that really helped me was Jason Charnes. He didn’t help me directly, but the way he talked about Rails, the way he turned that “Rails is dead” into a joke and almost like, who cares? This is what I love. This is where I belong. I'm a ride or die here, you know. And then I got to a place in my life where my kids were getting older, and I actually had some discretionary time again. So I decided suddenly that I was going to start investing a lot more of my time into Ruby and Rails.

At this point I'm freelance consulting, I'm completely invested in Rails. So it aligns really well with the work that I'm doing. I've learned so much from the Ruby and Rails community, I felt an obligation to do something to help others, to pay it back. And there was a sense of maybe looking around and thinking there was an opportunity to step into the community and take care of it.

Is this what led to the Indie Rails podcast?

Yeah. For a while, I had the idea that it’d be fun to do a podcast. And then on Twitter, my co-host Jess Brown reached out to me. He said, hey, I've got this IndieRails domain and I'm wondering what it could be. Maybe it should be like a mastermind group or something like that. And I was like, this should be a podcast. Definitely. He wasn’t so sure at first but that was a thing where I had always tried to do everything on my own, and I wasn’t great at collaborating. And I know if it wasn’t for Jess or Paul our editor, I wouldn’t have made it to a year. Knowing that someone has my back and has an expectation for me has been really, really valuable. It’s really opened my eyes to the fact that I probably should have done much more collaboration earlier in my life and I should look for more ways to do that in the future.

Who has been a guest on your podcast or what kind of topic did you cover that you think back on as like, wow, this was so cool. I'm so glad I got to have this conversation.

This is tough. This is tough to narrow down. There’s a quote that’s something like the impact that people have on other people is not so much who they are, but what they represent. Like, what does Andy Croll represent to me? What does Allan Branch represent to me? We’ve got this diverse collection of people that are doing really interesting things in different ways. Whether it's going deep into building their own library, going deep into database performance, going deep into building their own business with Rails or building community, all these different facets, but all together in this community.

When I was trying to organize BlueRidge Ruby last year, Andy was someone who gave me a lot of time and helped me think through a lot of the conference stuff. He was incredibly encouraging to me, and sent me a handwritten note beforehand saying “you’ve got this, you’re going to do great.” It was incredibly touching. And so Andy represents to me the kind of person I want to be in the Rails community.

Who are some people you’re following right now that are doing something really interesting?

One person is Bhumi Shah. She is doing a fantastic job of writing just deep, thoughtful pieces on Ruby, and I've learned a lot from her and her newsletter in the past year. Her newsletter One Ruby Question is really good.

Matt Swanson is a pretty well known Rails dev and CTO of Arrows, and he writes a lot on Twitter. His blog Boring Rails is one of the best resources I've found for doing Rails in the style that I end up doing a lot with my client work, which is usually on small teams. I've learned a lot from Matt.

There's a guy that people aren't talking about enough who I think is doing an amazing job of writing deep dives in Rails right now, his name is Akshay Mohite. He’s very consistent and his writing on some Rails topics is fantastic.

The last person who's very well known is Vladimir Dementyev. He works for Evil Martians. He's a prolific open source contributor, writes a lot of Ruby gems, is the maintainer of AnyCable, and his book Layered Design for Ruby on Rails is my favorite Rails book at this point. That'd be my top recommendation for people getting started learning Rails.

To suggest a maintainer, write to Allison at allison@infield.ai.

To learn more about keeping your open source software up to date using Infield, write to us at founders@infield.ai.

Once a Maintainer: Ralf Gommers

Allison Pike — Fri, 15 Mar 2024 13:58:39 GMT

Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.

This week we’re talking to Ralf Gommers, Co-Director of Quansight Labs and leading contributor to NumPy, the fundamental package for scientific computing in Python, as well as SciPy, meson-python, and the Array API Standard. NumPy published the first pre-release version of their upcoming 2.0 release in public beta this week. This is the first new major version of NumPy in 16 years.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades. Ralf spoke to us from Norway.

How did you get into software engineering?

To begin, I trained as an experimental physicist. During my degree I had one course in Pascal and that was about it. So really I'm self-taught from doing research, learning Python for data analysis, MATLAB first and then LabVIEW and C for writing control code for experiments, things like that. When I started my PhD I decided you know, I have three or four years, I’m going to do it right and go straight for everything open source.

So I started with Linux and Vim and Python all at the same time, and I had a very rough month. This was before NumPy existed. I had to join the mailing lists to figure out what was going on, because nothing worked with each other. It was very immature at that point, zero documentation. So I just stayed subscribed and I kind of gradually figured out how this whole open source thing worked. And then after a few years when I was actually a decent programmer and had a few things of my own to share, I had a break after my PhD and I thought why not try and get started on this? I started with smaller contributions, some documentation, some features that I contributed to scikit-image. And at that point the release manager of NumPy quit. He wrote to the mailing list like, I'm sorry, I'm too busy, I quit, does anyone want a job? And for five days nobody wanted it.

This was early 2010. NumPy was also very infrequently released at that point. So after five days I'm like, well, I don't really know what I'm doing, but you know, if nobody else wants to do it, I'll give it a try. And then I had to do things like, you know, build Windows binaries for releases on Linux via Wine with undocumented scripts. The whole thing was a big learning experience. But, you know, it remained interesting and it's a very friendly community. So I stayed. And then after six months the guy came back and he's like, well, what about SciPy? I did SciPy too. Are you going to do SciPy? And I said ok, I'll do SciPy too. And I've been one of the leads for NumPy and SciPy since then.

In my experience, people who become an open source maintainer, especially of a large, widely used project, they have a certain mindset and they don't mind doing the dirty work. They like helping others and hopefully they learn something and have some fun in the process. But they tend to be the people that don't like saying no and they like to be helpful. It works like that for me too.

How much time per week or per month did you devote to these projects?

I'd say I spent 10 years doing it as a volunteer, and probably spent, I don't know, 10 to 15 hours a week on average outside of a pretty busy job. And then in 2019, it got to the point where AI had really taken off. And SciPy was big when I started, but that was hundreds of thousands of users, and now it's 20 million. It got to the point that it was really not doable as a volunteer in the evenings. I'm either going to make it my job or at some point I’m going to quit.

So at that point I went to the SciPy conference for the first time and I met Travis Oliphant, who was the original author of NumPy. I knew him reasonably well. But when we talked in person, he said I'm just starting a new company, what do you think about joining me? It's a consulting company where 3/4 is consulting around the PyData space and then 1/4 is the labs department, which is directly contributing back and employing maintainers. So I started leading Quansight Labs, and half of my time is basically management, getting funding, being on some projects, and the other half I still get to contribute, but now it's part of my job.

Can you speak to the differences in the way that the academic or government world uses open source software versus the commercial side? This seems to come up a lot in the Python ecosystem especially, curious about your take on it.

That's a good question. So, industry is a very broad term. I think one thing you have in academia is that everything is custom. It wasn’t take a thing in pandas, run a scikit-learn thing over it, and then I'm done. It was really thinking about data structures and what you want to do and building your own code usually from the ground up. But academia is usually something you do by yourself or in a small group.

There’s also not just how people use the project but also how they interact with open source. I often hear that there's an overrepresentation of academia and smaller individual users, hobbyists. What we find is people in industry who have large deployments and things like that, they never really show. They come and maybe talk to me now because I'm working in a consulting company. But they have the type of request that you never see on an open source project issue tracker. It's more like this whole module is wrong or you know, we've already rewritten it and we found that everything here in your project is suboptimal. And you can end up with months or even years worth of work.

How do you think about that for such a long running project? Stability versus you know, maybe I would go back and change some things about how this works?

Yeah. I think that the lower you go into the stack, the more stable it has to be by necessity. You have more users and every change has more impact. For NumPy specifically, there was an extra constraint in that NumPy isn’t only a Python package, but it has a very large C API, and so everything is kind of built on top and all the binaries depend on each other.

So actually right now, over the coming weeks, we’re splitting off NumPy 2.0 and doing the first release candidates, and that's the first time in 16 years that we’re breaking API compatibility. The effort to keep that for 16 years, including all the things we don't really want and exposing too many internals that we don't want people to use and all that kind of stuff, it's just the cost we've had to pay. I've been a co-author on some of the proposal documents, we call them NumPy Enhancement Proposals. And there's also now a SciPy version of that where we define support windows for a set of Python versions and NumPy versions and some of the other key libraries.

How does that work on the research side? If someone wrote a paper 15 years ago with some NumPy code, should someone else be able to run that code now?

I would say no. It's really hard to get people to create environments where they know what versions they used to begin with. But I think that's the only correct way of doing it. We try to be careful to make sure that if something used to work, and now it gives an error, that's a lot better than if it used to work and now it still works but it gives you a different answer. But you can't keep stability at the individual API level for that long.

Other than that, I think in industry what does happen a lot is that they deploy applications and they use a certain Python version, certain NumPy version, the environment gets frozen and then it has to run for five years or something like that. In extreme cases, maybe even ten years, but that's not really relevant to the development of the project because they know how to lock their environment and it's actually not changing. So it doesn't matter if you release new versions.
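The point about knowing which versions were used can be made concrete with a small sketch. It is only an illustration using Python's standard `importlib.metadata` module; the helper names `snapshot_versions` and `format_pins` are hypothetical and not part of NumPy, SciPy, or any tool mentioned in the interview.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_pins(versions):
    """Render a snapshot as requirements-style lines, skipping packages that are missing."""
    return [
        f"{name}=={version}"
        for name, version in sorted(versions.items())
        if version is not None
    ]
```

Saving the output of `format_pins(snapshot_versions(["numpy", "scipy"]))` next to an analysis is one simple way to record the environment that produced a result.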

With the release of 2.0, what do you see as the focus? What’s the direction you’re taking the package?

Roadmaps in open source are really challenging, especially in such a diverse project. You have the people who really care about static typing, or people who care about performance, usability, etc.

From my perspective, it was an opportunity to really rethink what the Python API looks like because that's still what 99% of our users use. And I think the big change that we're landing there is that first of all, we made it smaller and easier to understand, introducing a very clear split between what's public and private.

The other big change that I've been working on for a few years is to introduce the Python Array API standard, which basically is like the core 150 functions or so that make up an array library. All the things that made it hard for the NumPy API to run on GPU, for example, have now been fixed. And they were fixed by the design of that standard. So with my SciPy maintainer hat on, NumPy is never going to run on GPU, right? But there's also PyTorch, there's JAX, there's all these libraries that are newer and way faster and I want SciPy and higher level libraries to make it as easy as possible to run on GPU or to use PyTorch or JAX's automatic differentiation and things like that. For me that’s probably going to continue to be the main theme for the next few years.
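As a rough illustration of what writing against the standard means, here is a minimal sketch of a function that never names a specific array library. `__array_namespace__` is the hook defined by the Array API standard; `ToyArray` and `toy_namespace` are hypothetical stand-ins invented here so the example runs with only the standard library, and they implement just the two functions the sketch needs.

```python
import math
from types import SimpleNamespace


def rms(x):
    """Root mean square, written only against the array's own namespace."""
    xp = x.__array_namespace__()  # hook defined by the Array API standard
    return xp.sqrt(xp.mean(x * x))


class ToyArray:
    """Hypothetical 1-D array type, only here to make the sketch runnable."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __mul__(self, other):
        # Elementwise multiplication of two arrays of the same length.
        return ToyArray(a * b for a, b in zip(self.values, other.values))

    def __array_namespace__(self, api_version=None):
        return toy_namespace


def _mean(x):
    return ToyArray([sum(x.values) / len(x.values)])


def _sqrt(x):
    return ToyArray([math.sqrt(v) for v in x.values])


# Hypothetical namespace exposing only the functions used by rms().
toy_namespace = SimpleNamespace(mean=_mean, sqrt=_sqrt)

result = rms(ToyArray([3, 4]))
```

Because `rms` only asks the input for its namespace, the same function body would be expected to work with any array type that implements the standard, which is the property that lets higher level libraries stay agnostic about where the computation runs.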

To suggest a maintainer, write to Allison at allison@infield.ai.

To learn more about keeping your open source software up to date using Infield, write to hello@infield.ai.

Once a Maintainer: David Wobrock

Allison Pike — Tue, 20 Feb 2024 15:38:22 GMT

Welcome to Once a Maintainer, where each week we interview an open source maintainer and tell their story.

This week we’re talking to David Wobrock, a Senior Software Engineer at Back Market. David is a maintainer of the Python framework Django and a contributor to many projects in the Python ecosystem. He lives in Berlin.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you get into programming?

How far back should we go? My parents are both university professors, and they do research on meteorology. So they always had computers around to do cloud modeling and stuff. My brother and I always wanted to play around, to tinker, to try things out. I remember that they installed games on the computer that we had when I was a child, and they wrote the commands to type in the terminal in order to launch the games on Post-It notes. I had no idea what it meant, but I just typed these characters and it worked. I always had computers near me.

When I started studying at university, I really fell in love with the entire field, both the theoretical aspects, the mathematical parts like linear algebra for databases and also the technical parts.

And have you always worked as a software developer since you graduated from university?

Yeah, I have. I tried the academic path of doing a master's thesis researching how development and computer science applied to different fields, but it didn’t stick. I think I like the private sector where you have a bit more impact on the day-to-day life of people through software. But yeah, I've always been a software engineer.

How did you first get into open source?

I started at university a little bit. My first contributions were during my studies when I was very C# focused. I think a professor taught us that, so it stuck with me. And I remember my first major feature, something that was not a typo or a link or whatever, my first major contribution was a Python plugin in Visual Studio - not Visual Studio Code, the big IDE Visual Studio. I don't know why I was writing Python in Visual Studio. What came upon me was like hmm, this integration is not working as I expected, I would like some auto completion features that I don't have, and so I contributed to that.

I’ve always admired open source maintainers. Those people were both incredibly smart, being able to write software that's used by hundreds, even thousands, millions of people and putting your work out there saying, OK, I think it works. I really admire the dedication, to spend some of your free time to give back to the community, to say hey look, I did this on my free time maybe it can be helpful to someone. They’re sort of a role model for me.

How did you get involved in Django specifically?

At university, we had to do some group projects where we could freely choose the technology that we used. And Python was the language to learn, right? That's where the industry was going for most jobs. So me and my fellow students, we thought OK, we have to learn Python. We want to make a web server. What can we use? The first Google search basically popped out Django. FastAPI didn't exist at the time, and Flask was less batteries included. So we were like well, everyone recommends Django, let's try it out.

Then when I started looking for jobs, I didn't specifically look for Django jobs, but they were what I qualified for. So I applied and I got one of those. And then, the more you play with the framework, the more you know the internals a little bit. You start diving into the source code. You start finding bugs. Sometimes we're like, that shouldn't work like this. Why doesn't it work like this? Then you're like, OK, I think I can add that. And then you think, hey, there’s no issue open, let's give it a try. That's how I got into Django - step by step let's say.

Is there a Django core team? How does the maintenance work?

So there are two fellows who are paid to maintain Django who accept contributions, triage tickets, they do a ton of things. Coding is probably a very small part of that job. And then there's the Django Software Foundation with the board that handles the finances. Then there's a technical board making technical decisions, and a security board that handles critical issues or security issues that could be in the framework. So it's quite a big organization in the end. I’m a tiny person in all of this.

How would you describe working with those people and working on Django versus how you do your work in your day job?

It’s very different in a way, because there are so many different interests at stake. The contributor wants their change merged, right? The maintainer wants the software to make sense, to answer the specific problem it was designed for. In Django, since it's batteries included, there are many people who want to add things that could be third party packages and they say hey, it's going to be even more batteries, but there's a limit to that. So there’s an interesting discussion of what should be inside the framework and what should be a third party package. These types of discussions are very interesting.

But in your company, you're all focused basically on making money for your company or pleasing your customers which is still working towards the same goal. Open source tends to take more time of course because it’s people’s free time. In a work environment things take a day, half a day because you can sit together, pair program and say let's do this together. You might have Slack or Microsoft Teams. In open source there is a back and forth and everything is much more asynchronous. For an open technical question or a new feature we’ll say let's ask the technical board. We'll give them maybe two weeks to discuss and come back with a solution or a suggestion. The communication is quite different. It’s also more relaxed in a way.

How do you dive into a code base that somebody else has written and many people have contributed to over the many years?

Talking to a person will always be ideal. I think that's the easiest way. But, you know, open source projects that are 10 or 15 years old or even older, it's going to be hard to find the contributor. What do I do? I probably dive into the code and try to understand who calls what. And then have a little notebook where I write things down and draw these big diagrams of arrows and boxes in all directions, but I can start making sense of it. And then it's really about trying out something. Maybe having a breakpoint somewhere, or launching a unit test and seeing which code paths you get into.

If you think back to when you first got into open source, to now feeling comfortable unraveling a large, established project - how do you get comfortable doing that?

Debugging skills are definitely useful there. The more you’ve had some weird, hairy bugs, the more you understand how to follow your stack trace and understand where you're going. I’m not sure it has to do with seniority. It's just getting comfortable with the code base. Once you understand the models, the different files, modules that are in the code base, it's basically just experience in this code base. When you start fixing a bug that affects one function and one file, that's one thing, right? You can write tests, you can test your function, it works. That's a really good start. And from there you start doing changes that imply two files, maybe two models that work together, and then step by step it's getting bigger and bigger.

Are there any other open source projects that you've been into lately that you think are really interesting?

There are many, many interesting people and interesting projects. Honestly, everyone in the Django community, it's amazing how they work, how efficient they are, how on point. Every review is always inspiring.

In the Python community there is Ruff, a linter and formatter. It's mind blowing how fast they had adoption on their tool and it's written in Rust, so it's faster than all the others. It solves a problem that you didn't know was there, and suddenly everyone wants to use it and it's amazing how fast adoption grew. It's a great project.

Personally I'd like to dive a little bit into the PostgreSQL community. Maintaining an open source database sounds really interesting. It's such a massive project that the governance around it must be quite interesting, and I’d love to get a better sense of how they do it.

To suggest a maintainer, write to Allison at allison@infield.ai.

If you’re interested in learning more about how Infield can help your team keep its open source dependencies up to date, write to hello@infield.ai.

Once a Maintainer: Will McGugan

Allison Pike — Fri, 09 Feb 2024 14:24:00 GMT

Welcome to Once a Maintainer, where each week we interview an open source maintainer and tell their story.

This week we’re talking to Will McGugan, founder of Textualize and creator of rich, textual, and pyfilesystem2. Will is based in Edinburgh and primarily works in Python.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you get into programming?

My first exposure to computers was way back in the 80’s when my mother bought me a ZX Spectrum 48K. It was like a little home computer back in those days, and that was it. I was kind of hooked. I think I was nine or something. And yeah, I just discovered an aptitude for computers and a desire to tinker. I studied computer science at uni, but I dropped out after two years and got a job making video games because that was kind of my end goal.

I worked for a few games companies in Scotland. Two of them went bust, so tragically the games never got released. I worked for Evolution Studios, they make PlayStation games. I was working on PlayStation 3 games, like one of the first PS3 games, MotorStorm. So yeah, I've worked on varieties of bits of games and that just led to a career down the line.

Did you teach yourself graphics programming? How did you learn to work on games?

Yeah, I used to learn from books which we don’t see much nowadays. I had stacks and stacks of books on graphics technology and video games and I just devoured them. That’s basically how I learned.

Did you have any kind of community around you in doing this?

There wasn't really, because I grew up in a small town in northeast Scotland, and we all had computers, ZX Spectrums, Commodores, etc. and most people just used them to play games. So it was not much of a community. Later on I started subscribing to magazines where you'd get a floppy disk and it would have kind of like a website type thing and some other game demos and you could send your work there and it would be distributed amongst other people in the UK, but that was pre-Internet. So I didn't actually meet anyone in that community.

How did you eventually find your way into the open source community?

Open source was a natural progression of the kind of things I was working on. You know, I'd build things and when the Internet came about, it felt natural to want to share your work. So I guess that was open source before I'd even heard the term open source. I used to put my code on my website. This was before GitHub and Google Code. And then Google Code came along and I put my work there and it started being used by a number of people, and you start to get feedback. So that’s how I got into open source, sharing my work and using the work of others.

Are there any projects that you remember from that time using and thinking, oh this is really amplifying what I can do by many times. Does anything stick out in your mind from that time?

Yeah, there were some libraries I used a lot. There's a library called wxWidgets. It's a cross-platform GUI library, so you could write something which worked on Mac and Windows and Linux. And there's quite a big community around that, people building various widgets and things and they would share them. And there was also a Python wrapper for that called wxPython. I used that quite extensively.

So what led to eventually creating your own open source project? I mean, it sounds like you were building things all along, but what made you take the leap?

Yeah, it kind of ramped up over the years. I put several libraries out there, first on Google Code and then on GitHub and then some of them started to get traction. It went crazy when I published rich. Rich is a Python library for nice formatting in the terminal, and there seemed to be a real need for that. When I released that, it just became massively popular, it just kind of took off in the Python community and now everyone's using it. So that's how I became well known in a very certain kind of niche.

There were lots of libraries out there which did part of what rich does, but they didn't work well together. So if you had say a library that does tables, and a library that does wrapped text, you couldn't put the wrapped text inside the table. And I really wanted to do that. So I built a system where it was composable and you could combine these things together. Nothing else did that at the time. Maybe you could trace it back to being in my small, small town and having no one to share these things with growing up, but that was definitely part of my motivation.
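The composability idea, wrapped text living inside a table cell, can be sketched with the standard library alone. This is not rich's API or its implementation, just a minimal illustration of the concept; the function names `wrap_cell`, `render_row`, and `render_table` are hypothetical.

```python
import textwrap
from itertools import zip_longest


def wrap_cell(text, width):
    """Wrap text to a fixed width and return the resulting lines."""
    return textwrap.wrap(text, width=width) or [""]


def render_row(cells, widths):
    """Render one table row whose cells may each span several wrapped lines."""
    wrapped = [wrap_cell(cell, width) for cell, width in zip(cells, widths)]
    lines = []
    # Shorter cells are padded with blank lines so every column stays aligned.
    for parts in zip_longest(*wrapped, fillvalue=""):
        padded = [part.ljust(width) for part, width in zip(parts, widths)]
        lines.append("| " + " | ".join(padded) + " |")
    return lines


def render_table(rows, widths):
    """Render all rows between horizontal borders and return a single string."""
    border = "+" + "+".join("-" * (width + 2) for width in widths) + "+"
    out = [border]
    for row in rows:
        out.extend(render_row(row, widths))
        out.append(border)
    return "\n".join(out)
```

Calling `print(render_table([["rich", "tables and wrapped text"]], [6, 10]))` draws a one-row table whose second cell wraps over several lines. The table only ever sees a list of lines per cell, which is what lets the wrapping step and the table step be combined.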

How did rich end up resulting in Textualize?

Yeah, so it was just over four years ago, actually. This was just before the pandemic, like literally a couple of months. And I was actually in Wuhan at the time because my wife’s family is there. So really crazy timing.

So I was working on rich, which is basically static content in the terminal, but people were starting to want to build applications with it. And I resisted for a while because, you know, I wanted to reclaim some spare time. But then I saw what people were building with it and they were actually attempting more full screen applications, and I thought, wow. This is so much more promising compared to libraries we used to use.

Is this like curses?

It's exactly like that. Curses was how people wrote these libraries. But they haven't moved on much since the 90’s.

It’s very hard to use. You have to be very motivated to build things with it. And I experimented with Textualize to see if I could build some kind of application library which did a better job. And I proved to myself after a few months that yeah, it can be a lot easier. And that was how it started.

So the idea is we built this application platform with which people can build these libraries very quickly, faster than web apps, and you can distribute them in the terminal, but you can also turn them into web apps if you’d like. That’s what Textualize is based on.

How do you manage that human side of the community that you've built?

It is a lot of work. It's mostly asynchronous. In the Textualize team, we've got four developers, but there's lots of contributors and lots of people posting issues, so we try to respond to those people. A lot of it is misunderstanding the docs, and we just point them in the right direction, but that kind of interaction is very important.

What we're working on at the moment is what you might describe as issue-based development. The issues can tell you where people are having trouble with your API, so you can focus your efforts there and refine it. It can also tell you what features they're missing, so you can build those features. And it means that you're always just ahead of what people need it for. So you can build the things that people need now rather than build the things people need in a year or two.

I think that's the strength of open source. If it's closed source, you don't have as many eyes on the code and you can't refine it as well because you've only got a handful of people looking at it for a year. When you release it, that's when you get feedback. But with open source, you get that feedback from the community and you know straight away.

Have there been any contributions from the community that just struck you as interesting or impressed you?

Yeah, with something that runs in the terminal you’d think that you’d do particularly terminal things, but not always. We had someone build a piano emulator in Textual. I've never thought someone would build that in the terminal, but it's quite cool. They've got all the keys and everything and you can click each one and it plays the MIDI music. It’s fun watching creative people take what you’ve built and do something really for themselves.

What are some other projects that you're using or you've come across that you think are really interesting right now?

There's pydantic, which got funded recently, that’s quite an interesting project. It's a data validation library that uses Python typing, so it's very easy to express in Python form. It's very natural.

There's also a project called MkDocs that's really excellent. It basically builds docs and it can extract signatures in your code and it looks really nice. It's very navigable. That's a really terrific project and he's doing amazing work.

To suggest a maintainer, write to Allison at allison@infield.ai.

If you’re interested in learning more about how Infield can help your team keep its open source dependencies up to date, write to hello@infield.ai.

Once a Maintainer: Stephen Ierodiaconou

Allison Pike — Fri, 02 Feb 2024 14:49:30 GMT

Welcome to Once a Maintainer, where each week we interview an open source maintainer and tell their story.

This week we’re talking to Stephen Ierodiaconou, a freelance Ruby and JavaScript developer who has created several recent Ruby gems like vident and typed_operation.

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you get into programming?

I started out interested in electronics. I was building small circuits and stuff like that. At some point, me and some friends, we started going to the IT department computers, which back then were either DOS or Windows 95, and basically started messing around in QBasic. I mean, initially we're just, like, building stuff with, you know, print and choose-your-own-adventure text-based games, right? But then we started getting some PC list magazines with listings at the back, copying those, and it got us more interested in microcontrollers and programming them, so one thing sort of led to another.

By the time I was in my late teens, I got more into software, and I progressed through Visual Basic, a bit of C and C++. And of course I wanted to build a game. That kind of got me hooked and I got quite interested in the mix of hardware and software, CPU design and stuff like that. I even designed a tiny 4-bit CPU which was a bit rubbish, but it was interesting.

Was this an interest of your parents, or do you know where this interest came from at such an early age?

When I was in my early teens I was given one of those little science kits for kids. And to be honest, it's very simple. You just throw some wires together on some circuits that are already there. But the experience of actually seeing something working, it's magical. And then the combination with software was that it suddenly became so much more malleable, right? Because you build your circuit, that's fine. I see that there's a timer, it makes an LED flash, fine, but there's not much more you can do with it until you start building another circuit. But with software, suddenly you can completely change what you were doing. And that means that the experience becomes so much more engaging.

When I went to university I decided to study electrical engineering because I still was really interested in electronics and I wanted to kind of stay in that realm. I elected to do a lot of the units that were sort of computer architecture, VLSI, microelectronics and that sort of thing. And then I went into academia and studied signal processing. But I never formally studied computer science.

Did you have any friends locally or community locally that you were talking through those problems with? Or were you using the Internet to share with anyone?

No, initially it was just me in a bubble. Well, I had one friend in school who was interested in the same stuff making little games and things. But I think actually the first thing that pushed me into open source was the fact that I didn't really have anybody else to work with on things, and I wanted to. I discovered that you could just go on SourceForge or the other alternative at the time and find like a million projects. And that really inspired me into actually taking those steps because I needed to have somebody else to learn off, right? I was essentially teaching myself through some books and the Internet.

Do you remember any open source projects at the time that you learned from?

One of the first projects where I saw a contribution opportunity and became part of the community was a project called Stendhal, it's a game written in Java. At the time, I didn't know any Java, so I didn't really know how I could contribute with code, but I joined the IRC channel. I just sort of chatted to people and eventually I realized that, you know, I could help with drawing up maps for the game world. I could triage tickets. So I was helping triage bug tickets and stuff on SourceForge, write documentation, proofread stuff, whatever. And I realized there were a lot of different ways that I could help out that weren’t writing code. In fact, I don't think I ever wrote code for that project. But it was fun and I felt like part of the group and made friends.

It sounds like open source and the open source community was really part of your journey to being a professional software developer, where a lot of people come at it from the other direction. They go into some company, and maybe a mentor or someone at the company says, hey, we use open source, and they start contributing. And your story is instead this very organic kind of thing where open source has just been part of your blood for a very long time. So that's really interesting.

That was definitely part of it, looking for a community of people as well as working on problems I found interesting. I did go off the open source radar for a long time after I got a real job, which is not uncommon. My first full time job after academia was in startups and it was really startup style working late nights and everything. And with a long stint of years without contributing it really knocks you out of the space. If you have no cadence at all, if you don't try to at least do something every now and then, it feels like it's going to be way too hard to get back into it again. And that's what happened to me.

Really last year kicked it off again for me. I really like reading other people’s code and investigating issues and things like that. And being a freelancer, which I’ve been for a while now, I thought it would also be a great way to show off your work. It’s sort of like an online CV. As Hacktoberfest was coming up I saw a tweet or something where Richard Schneeman talked about his new book which is called How to Open Source. And I just thought, well, sounds perfect, I'll just get that. And it turned out to be great. It really helped me get back into it.

He has this framework to apply that he calls COIL. And I really took it word for word and applied it to a few projects during Hacktoberfest, to find an opportunity to contribute, to actually do the contribution, to get it merged. And that got the ball rolling for me again. I had a goal in my head as well, which was to try and get a contribution into Ruby the programming language itself, which I managed to do by the end of that year. So anyway, big thanks to Richard for his book.

I was going to ask, where does Ruby come into this? Because it didn't sound like you were programming in Ruby, you know, 10 years ago.

Yeah no, fair enough. My entry into the Ruby world came just because I got a job where they were built on Ruby. I’d never really seen or used Ruby or Rails before. And I think in the beginning it was a little bit of a love/hate relationship. My previous job was working with JavaScript and the server side, so it was kind of like jumping into a different realm.

When I went back to freelancing after that job, I actually went back to JavaScript and I worked for a while doing TypeScript. But then a project came up that was Ruby again and I took it and, I don't know what happened, but it clicked for me this time, something I can’t quite put my finger on. I guess in a way Ruby as a programming language is designed with this in mind, right? It's to try and make you as a developer feel something about it that is nice to work with, productive, and makes you feel happy, right? Since then I've only been writing Ruby basically. I like the community as well. And I thought, you know, why don't I just dedicate myself to this only from now on?

How do you think about the community for your own gems that you put out there?

Well to be honest my projects don’t have contributions from the community, they're just very small projects that I've released myself in the last year. But I would love to get more people involved simply because it’s fun to bounce ideas off of other people and talk about problems.

One thing I’ve found about making your own gems is that you can take an application that you’ve already written, take a piece of the code, and make a gem out of it. Essentially you've already built something, you've already got code there, and as long as of course in my case I get permission from the client or whatever, but extracting a piece of code from an existing application and turning it into a gem is very easy because it's already written. That’s what I did. I sort of just threw them out into the wild. And once you do that you get inspired to improve it, partly because there’s this imposter syndrome which comes when you put something on the Internet and you know people can read it.

Who are some other people in the community you think are doing really interesting work?

One of the things I’ve enjoyed recently is Joel Drapper, the maintainer of Phlex, he’s been working on some gems that he’s experimenting with all the time, trying out new things and iterating on them. He’s got a Discord and it’s nice to chat with other people on it.

Also Maxime Chevalier-Boisvert and the whole YJIT team is doing some really amazing stuff. I think what they're doing is really cool because they're actually technically implementing a complex system inside Ruby, improving Ruby for the whole of the community.

The other project I’ve been interested in recently is Marco Roth’s TypeFusion project, which is basically sort of runtime type analysis. Or rather just gathering types at runtime from a running application about say a particular gem or piece of code and then pushing it up to gem.sh such that you're gathering statistics about the types that are being used in a particular gem. And from there you could, you know, generate RBS files, which I think is really interesting because one of the big problems of adoption for something like RBS is obviously having to write the RBS in the first place. So I think it's a really cool idea.

To suggest a maintainer, write to Allison at allison@infield.ai.

If you’re interested in learning more about how Infield can help your team keep its open source dependencies up to date, write to hello@infield.ai.

Once a Maintainer: Robert Mosolgo

Allison Pike — Fri, 26 Jan 2024 14:31:48 GMT

Welcome to Once a Maintainer, where each week we interview an open source maintainer and tell their story. We’re back with a great lineup for 2024!

This week we’re talking to Robert Mosolgo, creator of the graphql-ruby gem and prolific open source maintainer, linguist, and dairy farmer. Robert is a former Senior Platform Engineer at GitHub and was the first maintainer of the react-rails gem outside of Facebook (now Meta).

Once a Maintainer is written by the team at Infield, a platform for managing open source dependency upgrades.

How did you become a software developer?

I think it starts in, like, 10th grade with my TI-83 calculator, when I realized that I could create games using the digits and numerals. You know, I read somewhere recently that in the past, I think it was in a military prison, people would teach each other languages, passing strings with knots tied in them. It's like, man, when you're in 10th grade science class and you have nothing else to do, programming your calculator seems like a great idea.

I didn't code at all in college. I studied Chinese and linguistics, but I graduated and none of my job plans panned out, so I ended up kind of pushing papers in a professor's office, working on research projects with him. And that's when the opportunity to start writing code again came up as I was smashing CSVs together and learning about relational databases. A buddy of mine said you would really love Ruby on Rails, you should really look it up. And I told him no. And he suggested it again. And I told him no again.

Why did you say no?

Because I had a thousand-line PHP file that was doing a great job. Why would I learn something else?

Right.

But by the end of six months or something, I knew that I just wanted to write Ruby all the time. So I ended up switching jobs just to write code and that was that.

What time period was this?

Oh, probably 2010-2011. I did Michael Hartl’s Rails Tutorial that had just been updated for Rails 3, and I think in the Rails 2 to Rails 3 shift Merb and Rails became one and the same and so there was like a gathering of momentum. I still think his tutorial is great.

Awesome. So what was the first job that you got?

The company is called Planning Center, and the short description is enterprise software for churches. So if you're scheduling musicians for Sunday church services and you need a bass player, a drummer, three singers and somebody to shake a tambourine, and you've got 20 volunteers, but you can't schedule so and so at the same time as so and so, and you can't schedule [this person] on the same Sunday that they're doing the sound board and you put all these rules in and it gives you a schedule, sends everybody emails. That was that. Now it does a lot more. Every year that passes I realize more and more what an amazing company it was and the owner, also the head developer there was just doing great work and it was a magical time for sure.

Wow. So transitioning from the professional side to your open source work, or maybe they’re related for you - how did you first get exposed to open source software, not just as a consumer, but as a contributor?

In that job at Planning Center, we used JavaScript, and the JavaScript framework wars were just ending as React stomped everything else out. We had one guy who was still going for Ember.js, but we went through a company-wide project replacing everything with React. And there was a React-Rails gem at the time that I think had been put together by someone in Facebook just so they could say there is a Rails integration, but it wasn't Rails-native. It was just someone’s job to say there was a Rails integration. So I was young and energetic and we started using that gem a lot, and it was almost like a coup. I just started answering unanswered questions on the issues and submitting patches for bugs that were reported. The guy was like, I guess I should just let you take this over now? Could you? And I was like, sure, thank you.

That's really interesting, because a bunch of the people that we've interviewed have said that it was really intimidating to even dip their toe into contributing. What gave you the confidence to participate in the conversation?

I was young. That was a good confidence builder. And, now that I think back there was a first stop. So the JavaScript framework that our company was using at the time was called Batman.JS. It was from Shopify. There are no traces of it on the Internet anymore, but it became clear by the time that I joined Planning Center that Shopify was not actually using Batman.JS, that they were not answering questions. But we had made a big company investment in it and so, looking for a way to make my mark on the company, I devoted myself to becoming the Batman.JS expert in house. And I think at a certain point I was the worldwide Batman.JS expert. So I became the expert in a dead open source project.

But I did learn a ton because it was this full stack front end framework in the sense that it had a router and models and a network layer and views, etc. It was a great learning experience.

So how did you get into GraphQL?

Planning Center sent us to conferences, and they sent us to the first ReactConf at Facebook headquarters. And gosh, we were so eager to go to that conference that we - this is a long story.

They were selling tickets through some kind of ticketing website that we'd never heard of, and we realized that they weren't selling tickets until, you know, 9 am on such and such a day. But we realized it was just a client-side implementation keeping you from submitting the form, and we were like, I think if you just open the console and submit the form it will go through. So we did that. We paid for our tickets, but we got in before it was actually open.

There were a lot of interesting ideas at that conference. But at some point during one of the intermissions, they get up and say “Is Robert Mosolgo here? Would you stand up for a minute?” They caught me for getting my ticket too soon and he said, “We wanted to thank you for taking over the React-Rails gem” and you know, all of your contributions.

It was at that conference that Dan Schafer, and he had a co-presenter whose name is escaping me, described GraphQL and its usage in Facebook for the first time. If you were to go back and look at that conference, you would see the language is different than it is now, but my eyes lit up. I have a background in linguistics and I worked in Ruby on Rails, and I could just see how these things could be really useful. And so I went home from the conference and I pushed an empty gem to the GraphQL name on rubygems.org and decided I was going to do the thing.

Can you talk a little more on the linguistics side? What hit you as Rails and GraphQL, there’s something interesting at the intersection there?

I had recently read the book Ruby Under a Microscope and it's just such a fantastic book, and I loved thinking about how programming languages do their magic. And so you know, having the practical experience to recognize the upsides of using GraphQL, especially when you're fetching data for React and seeing that what we needed was a language implementation, I thought it would be fun.

Language implementation meaning writing the actual interpreter for the GraphQL.

Yeah, like that idea of translating this new kind of programming syntax into the query side. To me computer languages are way easier than human languages, so it made it seem like fun for sure.

How did you find, after creating the gem, being on the other side of it as a maintainer? How did you find interacting with the contributions that you received from the community?

Certainly, mostly good. Especially in those early years. So many people were so hyped up, excited, trying things out, making suggestions. I think at this point, a bit like Ruby on Rails, people take the functionality for granted and they expect it to do what it does and understandably there's a little less wiggle room for imperfections or shortcomings. And so there's a lot more of, you know, I downloaded this and I expected it to work. And for some reason it didn't.

But the other thing is, there are plenty of companies who have made deep enough investments in graphql-ruby that there are a handful of contributors out there who come out of nowhere with a really great patch. There's nothing like waking up to the notification that, you know, somebody's opened a pull request and you read the description, and you can’t think of how you would have done that. This is great. Just click merge and you're done. And I really love that.

Sometimes there's more back and forth. Somebody has a question and got halfway, and then I get to actually be useful. Point them where they need to go for the other half of the way, and I really enjoy that kind of deeper technical conversation. I've been doing a lot of that especially with folks at Shopify this year, working on performance and it’s been a lot of fun.

So you needed to write a parser for the GraphQL gem. As someone with a non-traditional academic CS background, how did you learn how to do that? That’s a hard thing to do independently.

My first parser was written with a Ruby gem called Parslet, and it has parsley-themed documentation and it's super easy. You describe the parser in Ruby and it just makes Ruby objects that parse code. It's not really a parser generator in the more heavy metal sense, but it worked great and it was good enough to get started.

I guess my point about a background in linguistics is that parsing is probably the most familiar thing. If you've taken linguistics courses where the assignment is like, hey, here's a paragraph or here's five sentences in a language that you've never seen before, where are the boundaries between subject, object, and verb? And you've got to recognize the pattern and break it apart into pieces and put it back together. And it's very fun and very similar to parsing a computer language. So bridging the gap from that rusty school knowledge to the Parslet documentation and all of its little vegetables was a bit of work. But it came naturally enough, and it's been and remains a really fun area for nerding out.

There are just so many ways to optimize and improve the parser. Last year, you know, after Parslet, I wrote it in Ruby's own parser generator library called racc. It's a spin on yacc, and it has a similar arcane format for describing how the language is constructed. And it worked fine. But people were always complaining that it wasn't fast enough, too much memory. So this summer, all right, fine. I started from the Ruby grammar. I wrote a yacc grammar that generates a C parser, and it still makes AST nodes in Ruby, but it's 10 times less memory and 10 times faster. But recently, a few months ago, Aaron Patterson wrote a plain Ruby parser that's faster than the C parser once YJIT is warmed up. So now the ball is in my court to get the takeaways from his work and put them into real life production. So I'm looking forward to giving that a try.
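For readers who have not written a parser by hand, here is a minimal sketch of what a plain Ruby parser can look like. It is not the graphql-ruby parser or Aaron Patterson's; the toy grammar (nested field names inside braces) and the method names `parse_selection` and `parse_query` are assumptions made up for illustration. It uses only `StringScanner` from Ruby's standard library.

```ruby
require "strscan"

# Hypothetical toy grammar, loosely shaped like a GraphQL selection set:
#   selection := "{" (name selection?)* "}"
# Returns a nested Hash: field name => sub-selection Hash, or nil for a leaf.
def parse_selection(scanner)
  scanner.skip(/\s*/)
  scanner.scan(/\{/) or raise ArgumentError, "expected '{' at #{scanner.pos}"
  fields = {}
  loop do
    scanner.skip(/\s*/)
    if scanner.scan(/\}/)
      return fields
    elsif (name = scanner.scan(/[A-Za-z_]\w*/))
      scanner.skip(/\s*/)
      # Peek for a nested selection without consuming the brace.
      fields[name] = scanner.check(/\{/) ? parse_selection(scanner) : nil
    else
      raise ArgumentError, "unexpected input at #{scanner.pos}"
    end
  end
end

def parse_query(source)
  parse_selection(StringScanner.new(source))
end

tree = parse_query("{ user { name email } }")
```

Here `tree` is a Hash with one key, `"user"`, whose value is another Hash holding the leaf fields `"name"` and `"email"`. A real GraphQL parser also handles arguments, fragments, directives and much more.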

How do you balance the open source work that you do with your day job?

I don't have a day job, which is sweet. Seven years ago it became clear that GraphQL was gonna be a thing, and companies were downloading the gem and starting to use it. And I thought, there's a good place for me to formalize my relationship with them and a good way for me to make some money on all this work that I'm doing, if I do just like Mike Perham has with Sidekiq and Sidekiq Pro and Sidekiq Enterprise.

I don't own a private island like Mike does. I don't think he owns a private island, but I think Sidekiq makes a lot more money than GraphQL Pro, but it pays my bills and I don't have to do that balance. Besides that, my family just moved to a hobby farm this year, and I got to milk cows for the first time, which has been like a five year dream for me. And now I've practiced with other cheese makers and helped out at other people's farms, and so now my cows will make cheese which is sweet.

What other projects are you using now that you think are interesting, or you want people to know about?

I would say probably the newest thing that is on my mind, but it's not that new anymore, is the Fiber API in Ruby. It’s been around for a long time and it's been getting better more and more recently. Samuel Williams is the main hero champion of it right now and he's got a gem called async. Once every three months, I work as hard as I can to make async and graphql-ruby work together, and I have not quite gotten there yet. I keep getting thread deadlocks and I keep getting core dumps from crashing Ruby. But the possibility there for really easy, true parallel concurrency in the Ruby code you already have is really promising.
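As a rough illustration of the core Fiber API mentioned here, the sketch below has two fibers take turns under a toy round-robin loop. It uses only `Fiber.new`, `Fiber.yield` and `Fiber#resume` from core Ruby; the `worker` lambda and the scheduling loop are assumptions for illustration and are not how the async gem or graphql-ruby work internally. Fibers hand off control cooperatively within a single thread.

```ruby
log = []

# Hypothetical helper: builds a fiber that logs a few steps,
# pausing itself after each one.
worker = lambda do |name, steps|
  Fiber.new do
    steps.times do |i|
      log << "#{name} step #{i + 1}"
      Fiber.yield # hand control back to whoever called resume
    end
    :done # value returned by the final resume
  end
end

fibers = [worker.call("a", 2), worker.call("b", 2)]

# Toy round-robin scheduler: resume each fiber in turn and
# drop it once its block has finished and returned :done.
until fibers.empty?
  fibers.reject! { |fiber| fiber.resume == :done }
end
```

After the loop, `log` holds the steps interleaved: first each fiber's step 1, then each fiber's step 2. Libraries like async build on the same primitive but add an event loop so that fibers pause on I/O rather than on explicit `Fiber.yield` calls.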

And I'd say in general, the performance folks at Shopify who are working on YJIT, I'm rooting for them, because I love writing Ruby and I don't want to have to write another language just because Ruby isn't fast enough. And it's such cool work, the ways they're optimizing it. So those are probably the two things most on my mind right now.

Ruby itself is a hard-to-parse language. I guess it's got so much syntax to it. Has it gone in the other direction for you at all? Have you gone back to linguistics groups and tried to take anything you’ve done back to that domain?

I have. I've just started what I think of as a 10 year journey to learn Biblical Hebrew, which is kind of crazy. In studying the translation from the Hebrew Bible to your off the shelf English Bible I’ve discovered a deep, rich world of the historical interpretation of the text. So it's not really related to programming. I'm a year in and I can sound things out and I know some words, but it's very much a slow burn and it's probably the hardest language I've ever studied.

To suggest a maintainer, write to Allison at allison@infield.ai.

If you’re interested in learning more about how Infield can help your team keep its open source dependencies up to date, even with breaking changes, write to hello@infield.ai or check out our website.