Welcome to Once a Maintainer, where we interview open source maintainers and tell their story.
This week we’re talking to Adrià Mercader, maintainer of CKAN, an open source data management system powering the data portals of governments and corporations around the world, including the US government’s portal, data.gov. Adrià spoke with us from Spain.
Once a Maintainer is written by the team at Infield, a platform for managing open source upgrades.
How did you get into software development?
I have a degree in biology, and once I finished my degree I did some coursework that required Excel macros, very basic commands. Everybody hated it, but I didn’t mind it. And then I did a masters in Geographic Information Systems, which had a lot of cartography and spatial resolution and some programming. I was semi-good at it, at least for the standard that they expected there. And from there I just quickly found a job in the geospatial field, but it had a lot of programming components and I started learning more and more.
Was that first job in the software engineering department or in another area?
Yeah. It was a very small company here in Barcelona. It basically provided GIS services, and I was hired to work as a GIS analyst, but really quickly, after just a couple of weeks, they put me on the programming side. We were only like 15 people. It's not like there were departments or anything. But I started learning from people that were there and had good mentors that pointed me in the right direction. But I never set out like OK, I need to be a programmer. It was more organically copying others and going step by step.
I was really lucky that at that first company I was exposed to open source from such an early stage of my career. That’s where I got familiar with the consumer side of open source software. I wasn’t yet a maintainer or contributing, but I started getting into the community and learning how things worked.
My first contributions were these very small projects that I open sourced myself while working at this company, like small Firefox extensions. I remember that I was creating mailing lists and documentation and everything for those projects, even though of course nobody was looking at them. They were super small and just useful for what I was doing in a super specific niche. But at least it gave me an initial sense of what would be involved in running a successful open source project.
When did you start working on CKAN?
At the time I was living in the UK, and then I moved countries to go back to Spain. In that time I took a few months just to learn new things. And I was really lucky that the Open Knowledge Foundation, which is the organization that created CKAN, posted a job around this time that just happened to align with what I was learning. This was around 2011. So I was incredibly lucky that they hired me then. At the time, CKAN was open source in the sense that the code was public, but it didn’t have a community around it, or a tech team or anything like that.
It was also around this time that the UK started looking for a government data portal. It was sort of the initial phase of the open data movement. And we got the UK, and then others followed. At first there were no processes, no community or standards around open data. But in the next two years or so a bunch of other governments started exploring open data, and the Open Knowledge Foundation was sort of a trailblazer in that area. So soon after that we picked up data.gov, the US government’s open data portal, and things really took off from there. That’s when we formed an official tech team, and I was already there so it was a natural move for me.
From a people perspective, how does the tech team split up duties? Are there different areas of responsibility?
I would say there’s no single person for every part of the code, but there are parts of the code that someone is more familiar with because they’ve worked with it for longer. It’s not formalized in any way, but we all have the things we’re closest to: you’re the search guy or you’re the database guy. It’s kind of organic historical knowledge.
How do you handle rollouts of new versions with breaking changes and things like that? We’ve talked to some projects where there’s kind of a formal cadence of releases, say once a year, and others who want to release as infrequently as possible. How do you guys think about that?
Yeah, that’s one of the major things that we’ve been discussing forever. The main issue, as with most open source projects, is resourcing. It’s really difficult to predict. There’s no formal allocation of resourcing as of now. People just contribute whatever time they happen to have at that specific moment. Ideally we want to do more frequent releases, maybe a major one every year, because obviously that makes upgrading easier. But the reality is that we’ve probably been more like a year and a half, even two years between major releases. That’s something we really want to address.
As of just this month we have version 2.11, and I would say the amount of change between releases has come down a lot. We’ve made a lot of effort to make the upgrades as painless as possible. I hope that in the future people can rely on a new version every half a year or so, or that we can even do semi-automated releases where we just ship bug fixes every month or something like that. Another thing is that our users, who are mostly governments, often can’t devote a lot of development time to keeping an eye on our releases or upgrading. So if they have a site that’s working, we don’t want to break it. Obviously this is a balance because we need the project to move forward. We need to introduce changes, and to patch vulnerabilities.
We’re a small project. We’ve changed as developers, we’ve changed as a project. There are a lot more tools available for maintainers than 10 years ago. We try to keep up and use whatever we can to make releases more stable and better for our users.
You mentioned that CKAN is primarily used by governments but also commercial enterprises. On the data side, are there any features that have been released more for one type of user or the other? Or that surprised you as being requested by both?
That’s a really good question. Obviously at the beginning our main target users were governments, at the national level or lower. And we didn’t target the enterprise directly; it was more that people just organically started using it. One of our tenets is that CKAN is really flexible and customizable. The core functionality is essentially a catalog, and you can build stuff on top of it and integrate it with other systems and platforms. I think that’s helped with adoption in other fields. It gives you the basic building blocks for a data management system, and it takes some work to plug them together and build your stuff on top of it. But then it’s incredibly flexible.
The successful projects are the ones where the technical part of publishing their data is just the final step. They have done the work beforehand to come up with an internal data management policy, train their people, and make sure the data is clean. CKAN has been around a long time at this point. It’s well known, at least in the data space we operate in. I don’t think it’s that common anymore that people come into it without having done their homework beforehand.
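For readers who haven’t used CKAN: the catalog Adrià describes is exposed through a JSON web API (CKAN’s Action API), which is the usual way other systems integrate with it. Here’s a minimal sketch in Python using the requests library; the URL is just the public demo site, and any CKAN portal works the same way.

```python
import requests

# Base URL is illustrative; any public CKAN portal exposes the same API.
CKAN_URL = "https://demo.ckan.org"

# Full-text search over the catalog via CKAN's Action API.
resp = requests.get(
    f"{CKAN_URL}/api/3/action/package_search",
    params={"q": "water quality", "rows": 5},
    timeout=10,
)
resp.raise_for_status()
result = resp.json()["result"]

print(f"{result['count']} matching datasets")
for dataset in result["results"]:
    # Each dataset record carries its metadata plus a list of resources (files, APIs).
    print(dataset["name"], "-", len(dataset.get("resources", [])), "resources")
```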
What’s your roadmap look like for the next year or so?
CKAN is in a pretty stable place right now. I think the next big thing will be the next major version, CKAN 3.0. It will probably include a refreshed front end. Right now it’s a bit clunky. It’s based on Jinja templates, which is fine, but the way they’re structured makes it difficult for people to extend them and make their own thing. It also looks quite dated, which is being addressed with the new design. Another thing we’ve identified, through a survey of users, is friction with search. Historically we’ve used Apache Solr for search, which works really well, but it can be a pain to maintain, and people want a more out-of-the-box solution like Elasticsearch or even just Postgres, because most sites aren’t massive and Postgres would work just fine. There’s also a lot of work on the data store, which is the data repository itself for CKAN, and how to make it play nicely with other tooling like data lakes, big data, etc.
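To illustrate why plain Postgres could cover search for smaller portals: Postgres ships with full-text search built in, so a query like the sketch below is often all a modest catalog needs. The table and column names here are hypothetical, not CKAN’s actual schema.

```python
import psycopg2

# Hypothetical schema: a `dataset` table with `title` and `notes` columns.
# This is only an illustration that Postgres's built-in full-text search
# covers the basics without running a separate Solr cluster.
conn = psycopg2.connect("dbname=catalog")
query = "air quality"
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT title,
               ts_rank(to_tsvector('english', title || ' ' || notes),
                       plainto_tsquery('english', %s)) AS rank
        FROM dataset
        WHERE to_tsvector('english', title || ' ' || notes)
              @@ plainto_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT 10;
        """,
        (query, query),
    )
    for title, rank in cur.fetchall():
        print(f"{rank:.3f}  {title}")
conn.close()
```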
There are also extensions to CKAN that are widely used. They’re not part of CKAN core, but we maintain them nonetheless because a lot of people use them. I’ve maintained a couple related to metadata standards. For example there’s a standard called DCAT for presenting the metadata of a catalog, and there’s legislation in Europe and the US requiring government portals to present their metadata in that particular format. We want to make it as easy as possible for sites to comply.
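For context, DCAT is a small RDF vocabulary, and a catalog record boils down to a handful of standard fields. Here’s a minimal sketch of one dataset record serialized as JSON-LD; the values are invented, and which fields are mandatory depends on the profile a government requires (for example DCAT-AP in Europe or DCAT-US).

```python
import json

# Illustrative DCAT dataset record as JSON-LD. All values are invented,
# and the mandatory fields depend on the applicable profile
# (e.g. DCAT-AP in Europe, DCAT-US in the United States).
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Air quality measurements 2024",
    "dct:description": "Hourly readings from city monitoring stations.",
    "dct:publisher": {
        "@type": "foaf:Organization",
        "foaf:name": "Example City Council",
    },
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": "https://data.example.org/air-quality-2024.csv",
            "dcat:mediaType": "text/csv",
        }
    ],
}

print(json.dumps(dataset, indent=2))
```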
To suggest a maintainer, email Allison at allison@infield.ai.