Natacha Rault: Taking a feminist approach to Wikipedia’s gender gap

Video by the Wikimedia Foundation, CC BY-SA 4.0. You can also view it on Vimeo or YouTube.

An engaged feminist, Natacha Rault has used her understanding of psychology to attract more female participants to Wikipedia.

A French-British Wikipedian raised in Geneva, Switzerland, Rault browsed Wikipedia frequently during her maternity leave, but never thought about contributing to it. “I was getting bored at home,” Rault recalls. “I found the encyclopedia on the internet and discovered a wealth of different subjects to explore. … I never clicked the ‘edit’ button until much later.”

That only happened when she learned about the website’s need for contributors like her.

Various surveys have found that between eight and nine out of every ten editors are men. This gender gap affects the quality of Wikipedia, and many people, including Rault, have wanted to take practical steps to change it.

“It is important to have more women participating in Wikipedia because the male perspective is often skewed a certain way to only cover certain subjects,” says Rault. “When you have a majority of men contributing to Wikipedia, you have more football articles and more articles on Pokemon, but you won’t have a lot about design, for example, a subject that would be considered ‘feminine.’ And then you have nearly nothing concerning feminism.”

In response to what she read about the gender gap, Rault created her account on Wikipedia in 2012. She has since made nearly 10,000 edits, most of them to women’s biographies.

In September 2015, the Fondation Emilie Gourd, an active feminist group in Switzerland, also wanted to respond to Wikipedia’s gender gap. They asked Rault to help coordinate a conference to raise awareness about women’s participation in Wikipedia, but she didn’t think one-way communication was the best approach.

“We’re going to have 200 people coming, learning about the subject, applauding, and then going home,” Rault explains. “We haven’t advanced … towards a solution.”

In addition to the conference, Rault suggested using workshops to teach women how to contribute to Wikipedia. She focused the workshops on creating Wikipedia articles about notable women, and the majority of attendees were women themselves. Rault was able to offer customized support for the workshop attendees based on her understanding of their specific needs. “We can look at the way women and men react differently to the ‘edit’ button. Men, for example, tend to be less afraid of making mistakes. So I encouraged women to write and not to be afraid of making mistakes.”

Rault’s efforts over the last five years have helped many people understand how the female perspective improves content quality on Wikipedia, and have made positive moves toward closing the gap.

“It’s really nice to see women, especially those who hesitated at first, smile when they have published their first article.”

Photo by Ruby Mizrahi / Wikimedia Foundation, CC BY-SA 3.0.

Interview by Ruby Mizrahi, Interviewer
Profile by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

Published at Fri, 01 Sep 2017 17:46:29 +0000

The great Wikipedia bot-pocalypse: Challenging an established narrative

Image by Leozeng, CC0.

A study published in February in PLOS ONE by researchers at the Oxford Internet Institute examined the way bots behave on Wikipedia. The researchers concluded that “bots on Wikipedia behave and interact […] unpredictably and […] inefficiently” and “disagreements likely arise from the bottom-up organization […] without a formal mechanism for coordination with other bot owners”.

The authors assume that a “bot fight” takes place when bots revert actions by other bots, and the media soon picked up on this trope, as you can see by some of the headlines:

  • Study reveals bot-on-bot editing wars raging on Wikipedia’s pages (The Guardian)
  • Automated Wikipedia Edit-Bots Have Been Fighting Each Other For A Decade (Huffington Post)
  • Investigation Reveals That Wikipedia’s Bots Are in a Silent, Never-Ending War With Each Other (Science Alert)
  • Battle of the Bots: ‘Main Reason for Conflicts is Lack of Central Supervision’ (Sputnik)
  • The BBC even had a special segment on Newsnight, where they claimed that “Petty disputes [between bots] escalate into all out wars that can last years.” (BBC, YouTube video)
  • Internet Bots Fight Each Other Because They’re All Too Human (Wired)

These headlines assume that bots reverting edits made by other bots constitutes a conflict or a ‘fight’ in which bots get into edit wars with each other over the content of articles. This assumption sounds reasonable, but it does not hold for the overwhelming majority of bot-bot reverts.

In this post, I’ll explain how these bots work on Wikipedia and push back on the notion that the majority of bots that “revert other bots” are “fighting.” (And I’ll detail a few examples where bot-bot fights did occur, but were limited in their scope because of the strong bot governance which exists on the platform.)

How Bots Work

Bots currently perform a variety of automated tasks on Wikipedia to help the encyclopedia run smoothly. There are over 2,100 bots in use on the English Wikipedia, and they do everything from leaving messages for users to reverting vandalism when it occurs.

In many instances where bots revert the changes made by other bots, they are not in conflict with one another. Instead, they’re collaborating to maintain links between wikis or keep redirects clean, often because human Wikipedians have changed the content of an article.

Not Really a Fight

Stuart Geiger, an ethnographer and post-doctoral scholar at the UC-Berkeley Institute for Data Science, has been studying and publishing about the governance of bots in Wikipedia for many years. We have been working to replicate the PLOS ONE study from many angles, and we are looking into cases the authors identified as bot-bot conflict. When we dive deep into these cases, we are finding that only a tiny portion of bot-bot reverts are actually conflict.

“I think it is important to distinguish between two kinds of conflict around bots,” says Dr. Geiger. “The kind of conflict people imagined when the Even Good Bots Fight paper was published is when bots get into edit wars with each other over the content of articles, because they are programmed with opposing directives. This certainly happens, but it is relatively rare in Wikipedia, and it typically gets noticed and fixed pretty quickly. The Bot Approvals Groups and the bot policies generally do a good job at keeping bot developers in communication with each other and the broader Wikipedia/Wikimedia community.”

“The second kind of bot conflict, which I find far more interesting, is when human Wikipedians debate with each other about what bots ought to do,” Geiger continues.  “These can get as contentious as any kind of conflict in Wikipedia, but these debates are typically just that: debates on talk pages. It is ultimately a good thing that Wikipedians actively debate what kind of automation they want in Wikipedia. Every major user-generated content platform is using automation behind the scenes, but Wikipedia is one of the only platforms where these debates and decisions are made publicly, backed by community-authored policies and guidelines. And in the rare cases when bot developers have gone beyond their scope of approval from the Bot Approval Group, the community tends to notice pretty quickly.”

The case of the Addbot-pocalypse

In March of 2013, a bot named Addbot reverted 146,614 contributions other bots had made to English Wikipedia. The bot was designed to remove the old style of “interlanguage links” in order to pave the way for a new way of capturing the cross-language relationships between articles in Wikidata. For example, the article for Robot on English Wikipedia is linked to the article Roboter on German Wikipedia. Before the decision was made to move these links to Wikidata, dozens of other bots were used to keep the link graph up to date. But after the move to Wikidata’s central repository, this automated work, and the links those bots had created, were no longer necessary. So Addbot removed all traces of interlanguage links from all 293 Wikipedia languages and paved the way for maintaining them in Wikidata.

On the surface, Addbot’s activities amount to the single greatest bot-on-bot revert event in the history of Wikipedia. Yet this was not an example of on-wiki bots fighting with one another; it was an example of bots collaborating with one another to maintain the encyclopedia. The reverts in this instance were not an ongoing “bot war” with bots jockeying for the last edit; they were simply a case of one bot removing links that were no longer needed.


A common example of non-conflict: double-redirect fixing bots. In June of 2008, The Transhumanist changed the “-” in the title “Japan-United States relations” to a “–” (an en dash), in line with the Wikipedia Manual of Style. This set up the initial redirect that made sure links to the old title would still find the article. A year later, a different editor renamed the article to “Japan – United States relations” (adding spaces around the en dash while citing the relevant portion of the style guide). This created a double redirect, which broke navigation from pages across Wikipedia that linked to the old title. So Xqbot got to work and cleaned up the double redirects (as depicted in the image above). Seven months later, a third Wikipedian renamed the article again, this time removing the spaces around the en dash based on a change in the style guideline. This time, DarknessBot detected the double redirect and cleaned it up by removing the spaces in the redirect links as well. From a superficial point of view, DarknessBot reverted Xqbot. But when we look at the whole story, it’s clear that this isn’t conflict, but rather an example of bots collaborating to keep the redirect graph clean.
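As a toy model of the double-redirect cleanup described above (a simplified sketch; the real bots operate through the MediaWiki API, and the function name here is invented for illustration):

```python
def fix_double_redirects(redirects):
    """Collapse redirect chains so every source points at a final target.

    `redirects` maps a page title to the title it redirects to.
    """
    fixed = {}
    for source, target in redirects.items():
        seen = {source}
        # Follow the chain until we reach a page that is not itself a
        # redirect (guarding against cycles with `seen`).
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        fixed[source] = target
    return fixed

# The renames from the story above, expressed as a redirect map:
redirects = {
    "Japan-United States relations": "Japan – United States relations",
    "Japan – United States relations": "Japan–United States relations",
}
print(fix_double_redirects(redirects))
```

After the fix, both old titles point directly at the final one, so no reader ever lands on a redirect-to-a-redirect.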


In fact, Addbot’s reversion of hundreds of thousands of other bot edits was a productive activity and the outcome of the well-oiled bot governance system on Wikipedia. Addbot is one of the best examples of a well-coordinated bot operation in an open online space:

  • The bot’s source code and operational status are well-documented
  • It is operated in line with the various bot policies that different language versions can set for themselves
  • It is run by a known maintainer, and
  • It is performing a massive task that would be a waste of human attention.

We contacted Adam Shorland, the developer of Addbot, to talk about his experiences creating a bot that collaborated with other bots on the platform.

Tell me a little bit about your experience as a bot maintainer.  How many bots have you built and what do they do?

So, I have only really ever built one bot, and that is addbot. It was built in PHP and has been through several iterations.

Although I have only ever run one bot it has had many tasks over the years, and the biggest of those was the removal of interwiki links from Wikimedia projects after the introduction of sitelinks provided by Wikidata.

The code for the bot first started using a single simple class for interacting with MediaWiki, I believe created by ClueBot creator Cobi. Since then I have built a still-work-in-progress set of libraries called addwiki, also in PHP, which I use for the newer bot tasks.

Using PHP for interacting with MediaWiki makes sense really, especially as MediaWiki is written in PHP and is slowly being split into libraries. Wikibase (which runs Wikidata) is already there, and I had to write minimal code to interact with it, even with its complex data model.

I should probably also mention I am a BAG (Bot Approvals Group) member on the English Wikipedia, however I haven’t been active in a while.

How did you end up taking on the task of migrating interlanguage links to Wikidata?  

So, one thing to note here is that addbot did not move interwiki links to Wikidata; it simply removed them from other wiki projects when Wikidata was already providing them.

The removal task alone was much simpler than the move task, which allowed for extra speed and amazing accuracy in edits.

I don’t remember exactly how I started the task. I remember it not being long after I first found out about Wikidata. Someone I had worked with before, legoktm, was also partially working on the task at the beginning.

As I looked around I didn’t see anyone willing to complete the task as a whole, across all wikis, and do it quickly, so I decided to jump in.

Having one bot / one version of code to do all sites also made sense as the task, no matter the wiki project, is essentially exactly the same. Having one script allowed the code to be refined using all of the feedback from the first few projects that it ran on.

How does it feel to have the most conflicty bot — according to the definitions used in “Even Good Bots Fight”?  Did you see Addbot’s actions as conflicty?

It’s quite interesting, as the bot did exactly what it needed to do, and exactly what the community had agreed to, no conflict on that level. Although even a few weeks into the removals users were still turning up on the addbot talk page asking what it was doing removing links, but there was a lovely FAQ page for just that reason!

As for conflicts in edits, toward the start of the removals some edit conflicts and small wars did actually happen with old interwiki bots that people had left running. In these cases addbot would go along and remove interwiki links and old interwiki bots would come along and re-add them, however all communities ended up blocking these old bots to make way for the new way of Wikidata sitelinks.

Other than this early case I wouldn’t really say addbot was conflicty at all, although I can see how people looking at the data in certain ways could come to this conclusion.

What’s your experience with Wikipedia/Wikidata’s bot governance (e.g. BAG and bot policies)?  Does it seem to you that they are effective or should there be some change to how bot activities are managed?

So, as I said above, I am a member of the BAG on enwiki, and I was also an admin on Wikidata, able to approve bots. However, I am now a very inactive BAG member and no longer an admin on Wikidata, also due to inactivity; I simply spend too much time working and programming now.

I feel that some level of control and checking definitely makes sense. I remember my first bot request, which I now appreciate was a bad idea, and it was declined by BAG. However, if BAG had not been there I would likely have ended up running it, at least for a bit. So they are definitely effective.

BAG on enwiki is probably the most thorough process, but probably also one of the slowest, other than wikis with very small numbers of users to approve bots.

If I had to pick a new process for bot approvals I wouldn’t really know where to start. But in my opinion, with bots, if you follow the basic guidelines about bots even without any approval you’re not going to cause any / much damage before someone notices you may be doing something wrong or bad in some way.

Again this is probably different on smaller wikis and I imagine a new user coming along with a bot could probably stir up the whole site before anyone noticed.

Adam Shorland. Photo by Jan Apel/WMDE, CC BY-SA 4.0.

The researchers who wrote the Oxford paper called for a supervising body to manage bot activities (like WP:BAG) and a set of policies to govern their behavior (like WP:Bot_policy), both of which already exist. As Adam says, cases of true bot-bot conflict have been short-lived and have caused little damage because these governance mechanisms have been largely effective despite their voluntary, distributed nature.

I don’t mean to paint a picture of perfection in Wikipedia.  After all, bot fights do happen (see a list maintained by Wikipedians), but they are rare and short-lived due to the effectiveness of Wikipedia’s bot governance mechanisms.  For example:

  • In November of 2010, SmackBot and Yobot had a slow-motion edit war (a few edits over the course of a week) disagreeing about some white-space on the article about Human hair growth.
  • In September 2009, RFC bot edit-warred with itself for 6 hours about whether or not to include a “moveheader” template on the discussion pages for the articles about White-bellied Parrot and Nanday Parakeet.
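One way to see why raw revert counts overstate conflict is to separate reciprocal reverts from one-way cleanup. The sketch below is a toy heuristic of my own, not the paper’s method or anything Wikipedia runs; the bot names in the sample log mix real names from this post with an invented one:

```python
def reciprocal_revert_pairs(reverts):
    """Find bot pairs that reverted each other on the same page.

    `reverts` is an iterable of (page, reverting_bot, reverted_bot)
    tuples. One-way reverts (routine cleanup) are ignored; only
    reciprocal reverts, the pattern that actually resembles a fight,
    are returned.
    """
    seen = set(reverts)
    fights = set()
    for page, a, b in seen:
        if (page, b, a) in seen:  # the other bot reverted back
            fights.add((page,) + tuple(sorted((a, b))))
    return fights

log = [
    ("Human hair growth", "SmackBot", "Yobot"),
    ("Human hair growth", "Yobot", "SmackBot"),  # reciprocal: fight-like
    ("Robot", "Addbot", "OldInterwikiBot"),      # one-way: cleanup
]
print(reciprocal_revert_pairs(log))
# -> {('Human hair growth', 'SmackBot', 'Yobot')}
```

Under this stricter definition, Addbot’s 146,614 one-way reverts would contribute nothing to the “fight” count.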

The researchers who published the Oxford paper didn’t tell this story because they categorized all bot-bot reverts across Wikipedia as conflict. But the more interesting story is this: Wikipedia is a key example of how to manage bot activities effectively. A few short-lived bot edit wars and effective governance mechanisms may not be as exciting as the idea that robots are duking it out in Wikipedia. However, from my perspective as a researcher of socio-technical systems like Wikipedia, the real nature of bot fights involves a fascinating discussion of generally effective, distributed governance in a relevant field site, and such discussions push the science of socio-technical systems forward. It’s hard to make headlines with “Bot fights in Wikipedia found to be rare and short-lived”.

Aaron Halfaker, Principal Research Scientist
Wikimedia Foundation

Infographic by Aaron Halfaker, CC BY-SA 4.0. Geiger and Halfaker’s paper on these sorts of bot interactions will be available shortly.

Published at Wed, 30 Aug 2017 19:05:05 +0000

Wikipedia, search, and the “Цкщтп” keyboard


Sometimes, though, people are searching for things that really look like gibberish. It’s hard to see how a query like hjkhjkhjkhjkhjkhjkhjkhjkhjk is supposed to lead to any kind of useful result. Similarly for %%%%%%%-.-**&@@ or mmmmmmmmmmmmmmmmmmnnm. Did the searcher fall asleep while typing or set a coffee mug down wrong? Should we blame the cat? It’s hard to say, although blaming the cat is generally a good fallback strategy.

On the other hand, there are also queries like fhbcnjntkm and ‘qatktdf ,fiyz and zgjybz. These are all real queries that I encountered, all from Russian Wikipedia, and at first glance they certainly look like gibberish. However, they are actually hiding their secrets from you, me, and Wikipedia’s search engine.

Of keyboards and alphabets

There are lots of people in the world who speak more than one language. For those of us who haven’t ventured very far outside our native alphabet, it’s probably easier to keep “typing” and “using a language” as mostly unentangled skill sets. If I’m typing in French, I’m not going to switch to a French keyboard. I tried it once, and it did not go well—Where is the a? Why is the q there, and who would do such a thing? Do I use z or w for “undo”? Aaaa!
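The earlier “gibberish” queries stop looking like gibberish once each QWERTY key is mapped to the letter the same physical key produces under the Russian ЙЦУКЕН layout. Here is a minimal sketch of that mapping, an illustration only, not the logic Wikipedia’s search actually runs:

```python
# Each key on a US QWERTY keyboard, and the letter the same physical
# key produces under the Russian ЙЦУКЕН layout:
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "йцукенгшщзхъфывапролджэячсмитьбю"
TO_CYRILLIC = str.maketrans(QWERTY, JCUKEN)

def qwerty_to_cyrillic(text):
    """Reinterpret Latin 'gibberish' as if typed on a Russian layout."""
    return text.lower().translate(TO_CYRILLIC)

print(qwerty_to_cyrillic("fhbcnjntkm"))  # -> аристотель ("Aristotle")
print(qwerty_to_cyrillic("zgjybz"))      # -> япония ("Japan")
print(qwerty_to_cyrillic("wrong"))       # -> цкщтп (the "Цкщтп" of the title)
```

So a searcher on Russian Wikipedia who forgot to switch layouts produces exactly the kind of query that looks like keyboard mashing but is perfectly well-formed in the other alphabet.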

to indicate where the search terms should be put into an API call or URL. It looks like someone forgot to replace the placeholder.

Published at Mon, 28 Aug 2017 18:10:06 +0000

Vitor Mazuco and the fandom that drives his Wikipedia editing

Photo by Matthew Roth/Wikimedia Foundation, CC BY-SA 3.0.

His contributions to the Portuguese Wikipedia are impressive: to date, Vitor Mazuco has made over 270,000 edits and created more than 2,400 articles. Of those, 34 have been recognized by fellow editors for their quality, and in turn, most of those are about the Canadian singer and songwriter Avril Lavigne.

Mazuco made his first edit on Wikipedia to Lavigne’s article at the age of 14 and dedicated quite a bit of time in the following years to improving Wikipedia’s content about her.

“I remember that in my first months of [contributing], I wanted to expand and improve all Avril Lavigne articles,” said Mazuco, 23, an instructor in computing, networking, and other tech-related areas. He faced many challenges, though, because “I did not know the rules … of Wikipedia. I did not want to read [sources about her], and I was only 14 … imagine the size of my immaturity.”

This juvenile thinking and failure to follow central policies extended to Wikimedia Commons, the free media repository, where he was banned from editing for two years.

Along with his detailed upkeep of all pages about and related to Lavigne, especially those which have the “featured” and “good” quality markers, Mazuco contributes to other music articles, works to combat vandalism on the site, and leads several offline initiatives as part of the Wikipedia Education Program and the Wikimedia Brazil User Group.

“I show how the teacher can help [their] students with … Wikipedia inside the classrooms, and break the prejudice that our country has about [Wikipedia’s] reliability,” the native Brazilian explained. “It’s a job I’m very proud of.”

Mazuco believes that student editing will be the center of his attention for the next few years. He believes in everyone’s right to access “free and quality knowledge, regardless of their race, religion, status, or anything.” He explains:

“[Wikipedia] is a place, where I learn many things, every day. I meet new people in the meetings, at the conferences, I improve my skills, spelling, knowledge, about [any] subject. I am very curious person. I always want to know about something.”

Interview by Ruby Mizrahi, Interviewer
Profile by Michelle Fitzhugh-Craig, Wordsmith, Communications
Wikimedia Foundation

Published at Wed, 23 Aug 2017 18:35:07 +0000

Pulling ‘Puppet’ strings on Discovery’s Dashboard framework

Photo by Herzi Pinki, CC BY-SA 3.0.

In April of this year, the Wikimedia Foundation’s Discovery Analysis team began migrating the setup for the Discovery Dashboards from Vagrant and a shell script to use a configuration language and framework called Puppet. Puppet is a technology used by the Wikimedia Foundation to manage machine configurations almost everywhere—from data centers to continuous integration infrastructure and analytics clusters. We decided to make the switch because the previous setup created unnecessary overhead and made the server difficult to maintain.

Under the guidance of our awesome embedded technical operations engineer, Guillaume Lederrey, we took it upon ourselves to learn Puppet, and learn Puppet we did.

In this post, I’ll talk a little about Discovery Dashboards, a set of dashboards used by teams like Search Platform and Wikidata Query Service to track various metrics. Then, I’ll describe the technologies involved: the programming language R, Shiny (a web application framework), and Vagrant (software that allows us to build and maintain portable virtual software development environments). Next, I’ll properly introduce Puppet and share our experience of learning it. Finally, the post concludes with an explanation of how the new configuration utilizes the r_lang and shiny_server modules, so that readers may use them in their own environments.

Discovery Dashboards

Our dashboards enable us and our communities to track various teams’ key performance indicators (KPIs) and other service and product usage metrics:

  • Search Metrics dashboard includes metrics such as the zero results rate (the percentage of searches that don’t yield results), engagement with search results, search API usage, and a breakdown of traffic to Wikimedia projects from searches made on Wikipedia.
  • Portal dashboard shows how many pageviews the portal gets on a daily basis (which is separate from how pageviews are tracked in general), breakdowns of traffic by browser and location, and which sections and languages visitors click on.
  • Wikidata Query Service (WDQS) dashboard shows the volume of WDQS homepage visits and requests to the SPARQL and LDF endpoints.
  • Wikimedia Maps dashboard allows the user to see the volume of tiles requested from the Kartotherian map tile server, broken down by style, zoom level, etc.
  • External Referral Metrics dashboard breaks down our pageviews by referrer (source), such as “internal” (e.g. when you go from one Wikipedia article to another) and “external” (e.g. when you click on a Wikipedia article from a Google search results page). It also breaks down our search engine-referred traffic by search engine.

All of the dashboards’ source code is also available in full under the MIT license and all of the datasets are available publicly, including the scripts and queries we use to generate them. The dashboards are based on a web application framework called Shiny, which enables us to develop them in the statistical software and programming language R.


For a very long time, a lot of the focus of R has been on data-related tasks (such as wrangling and visualizing), statistical modeling, machine learning, and simulation. After Shiny was released in 2012, it became possible to write web applications using nothing but R. These days we have packages for:

  • Writing reproducible reports and academic articles with R Markdown
  • Including interactive visualizations in documents and Shiny apps via htmlwidgets
  • Running an HTTP server so you could have an R-powered API with plumber
  • Writing a whole book with bookdown, creating a website with a blog via blogdown, and creating interactive tutorials through learnr

We built our dashboards with R and Shiny. We added interfaces for dynamically filtering and subsetting data, for applying scale transformations, and for smoothing the data using the language and tools we already use on a daily basis as part of our job as data analysts. Anything you can do in R, you can make available to the user.

You can include the code for forecasting, clustering, and model diagnostics in the same file where you’re defining the buttons that trigger those things. Shiny applications can be hosted on a hosting service or hosted yourself using the Shiny Server software, which is what we do because we have the hosting resources, thanks to the Wikimedia Cloud Services team. We host the applications, which were previously managed through Vagrant, on Wikimedia Labs.


Vagrant is a tool for building and managing virtual machine (VM) environments and is used in combination with providers such as VirtualBox and VMware. Our previous configuration, which used Vagrant, involved launching an instance (a virtual machine) on Wikimedia Labs and creating a Vagrant container that would then run Ubuntu and the Shiny Server software. This was the initial solution from when our first dashboard (the search metrics one) was just a prototype, a proof of concept for tracking and keeping a historical record of the team’s KPIs, but it created an extra operating system (OS) virtualization layer. We realized we could reduce the amount of overhead by switching to a different solution.

Over time, we started to run into some technical issues and the configuration made it difficult for others to help us. We also started to have security concerns because updating installed packages involved logging into the machines and manually performing the upgrade procedure. Even deploying new versions of the dashboards was a hassle. The answer was simple: Puppet. In one swoop, we could run the Shiny Server software directly on the Labs instance, we could make it easy for Ops to debug and repair our codebase if there are system administration-type problems, and we could give Ops control over the OS and essential configurations.


Photo by Nevit Dilmen, CC BY-SA 3.0.

We’ve actually written about Puppet and “Puppetization” of Wikimedia a few times before. Ryan Lane wrote about our Puppet repository when our Technical Operations (“Ops”) team made it public. In her summary of the New Orleans Hackathon 2011, Sumana Harihareswara wrote about our Ops team Puppetizing the caching proxy Varnish. Sumana also wrote a very thorough post about the Puppetization of our data centers.

What Puppet is

Luke Kanies provides the following succinct description of Puppet:

[It] is a tool for configuring and maintaining your computers; in its simple configuration language, you explain to [it] how you want your machines configured, and it changes them as needed to match your specification. As you change that specification over time—such as with package updates, new users, or configuration updates—Puppet will automatically update your machines to match. If they are already configured as desired, then [it] does nothing. (Excerpt from The Architecture of Open Source Applications, Vol. 2, released under the Creative Commons Attribution license.)

Depending on your library of modules, your Puppet configuration can have specifications such as a clone of a Git repository set to stay up-to-date or a cron job registered to a specific user. Suppose you have a package that needs to be built from source and links to a library like GSL or libxml2 but cannot download and install those libraries itself. When declaring that package, you can give Puppet a list of dependencies (of any resource type) that need to exist first, and Puppet takes care of making those dependencies available.
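As a hedged sketch of what such a dependency declaration can look like, the manifest below installs a system library before building an R package that links against it. The resource titles, paths, and commands are hypothetical, not taken from Wikimedia’s actual Puppet repository:

```puppet
# Install the system library first; Puppet's 'require' metaparameter
# guarantees it exists before the R package is built against it.
package { 'libxml2-dev':
  ensure => present,
}

exec { 'install-r-xml2':
  command => '/usr/bin/Rscript -e "install.packages(\'xml2\')"',
  unless  => '/usr/bin/Rscript -e "library(xml2)"',
  require => Package['libxml2-dev'],
}
```

The `unless` guard keeps the run idempotent: if the package already loads, Puppet does nothing, exactly as the description above promises.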

Learning Puppet

When we decided to switch to a Puppet-based configuration, we did not want to put the burden of migration on our embedded Ops engineer; instead, we saw an opportunity to learn an incredibly useful technology. Learning Puppet would mean that we would continue to have complete control over our dashboards, and when we needed to change something, we would have the knowledge to do it ourselves. So we asked Guillaume to be our guide and teacher: we would do the bulk of the Puppetization, and he would introduce us to Puppet, review our code, and show us how to test the patches.

Guillaume created some starter files for us to begin with and set up Vagrant to use the Puppet provisioner. Having this setup enabled us to test locally with Vagrant: we could write the Puppet code responsible for installing an R package and run `vagrant provision` to see whether it actually worked. At various milestones, we would upload our work for review, and Guillaume would leave thorough feedback and criticism. Eventually, we were ready to work with Ops’ Puppet repository, and we moved on to submitting our changes as patches there.

In addition to the official Puppet documentation, the following resources were especially useful in learning the new technology and, in some ways, the new philosophy:

Something that helped me learn how to write Puppet code was using a lint checker in my text editor. A linter (or “lint”) is a utility that reads your code and checks the syntax against a set of language-specific rules in order to find code that might lead to errors (such as a missing comma between function arguments) or stylistic issues (such as lines that exceed a maximum character length). For example, our Ops team has a style guide, in addition to the official Puppet style guide, that I could have kept open on the side, but I found that as a beginner it was less mental overhead to have a utility perform the syntax checking in the background.


You declare what your machine should have and do via resources—e.g. a user, a file, or an exec (execution of a command)—and once you have your configuration full of resource declarations, you can set a machine to be an instance of that particular configuration, and Puppet will take care of making that machine look and behave like you declared it should. Similar to functions and classes in programming languages, if a resource type you want to use does not exist yet, you can just create a new one.
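As a sketch of what creating a new resource type looks like, here is a hypothetical “defined type” for a dashboard application. Every name and parameter below is illustrative (and `vcsrepo` again assumes the puppetlabs-vcsrepo module); it is not taken from our actual modules.

```puppet
# A custom resource type: each instance clones one dashboard's
# repository and keeps it on a given branch.
define dashboard::app (
  String $source,
  String $branch = 'master',
) {
  vcsrepo { "/srv/shiny/${title}":
    ensure   => latest,
    provider => git,
    source   => $source,
    revision => $branch,
  }
}

# Declaring an instance of the new type, just like a built-in resource:
dashboard::app { 'search-metrics':
  source => 'https://example.org/search-metrics.git',
}
```

Once defined, the type can be declared as many times as needed, each declaration producing its own set of underlying resources.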

In our case, we had to define what it means to be a Shiny server, which includes running RStudio’s Shiny Server software and having R packages available. So we had to write the logic for installing R packages from the Comprehensive R Archive Network (CRAN), Gerrit, and GitHub. The result was the shiny_server module, which is available for anyone to use as part of our open source Puppet code repository. If you’re learning Puppet, we hope the following breakdown of our configuration may be of help.

At the highest level, we have two roles: a discovery::dashboards role (which utilizes the discovery_dashboards::production profile) and a discovery::beta_dashboards role (which utilizes the discovery_dashboards::development profile). You can refer to this article in Puppet’s documentation to get a better understanding of differences between profiles and roles.

This diagram shows how one might use roles and profiles to configure their company’s computers in a reproducible, automated way. A node may only have one role, but that role may have multiple profiles. Adding or removing software in a profile will propagate to any roles that use that profile and to any computers that are instances of those roles.
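In Puppet terms, that layering can be sketched roughly as follows. The class names here are simplified stand-ins for the actual discovery and discovery_dashboards modules, not copies of them:

```puppet
# A profile wraps one piece of technology...
class profile::dashboards::base {
  include ::shiny_server
}

# ...other profiles can build on it (e.g. pin a branch per environment)...
class profile::dashboards::production {
  include ::profile::dashboards::base
  # clone the 'master' branch of each dashboard here
}

# ...and a role composes profiles into a complete machine description.
class role::dashboards {
  include ::profile::dashboards::production
}
```

A node is then assigned exactly one role, and everything that role pulls in via its profiles gets applied to the machine.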

The two dashboard profiles are where we clone the git repositories of our dashboards, the only difference being which remote branch is used. Specifically, the “development” profile pulls from the “develop” branch of each dashboard, which we use for testing out code refactors, new features, and new metrics. In contrast, the “production” profile pulls from the “master” branch—which is the stable version that we update once we’re satisfied with how the “develop” branch looks. It’s a common software engineering practice and is a simpler version of the branching model described by Vincent Driessen.

Both profiles include the discovery_dashboards::base profile, which is where we actually bring in the shiny_server module, copy the Discovery Dashboards HTML homepage, and list which R packages to install specifically for our dashboards. The shiny_server initialization file configures users, directories, and services and installs the Ubuntu packages. While the Linux packages are installed using existing code (require_package, rather than Puppet’s built-in package resource), we had to create the r_lang module for setting up the R computing environment (via its own initialization file). That module provides resources for installing R packages from different sources, such as CRAN and Git repositories (r_lang::cran, r_lang::git, and r_lang::github), and it also includes a script for updating the library of installed R packages.

Because of the way we structured it, our team and other teams within the Foundation can write new profiles and roles that utilize shiny_server to serve other Shiny applications and even interactive reports written in RMarkdown that include Shiny elements.
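A profile written by another team might look something like the sketch below. The shiny_server module and the r_lang resource types are the ones described above, but the class name, package choices, repository, and parameter details here are all assumptions for illustration; check the module documentation for the actual interfaces.

```puppet
# Hypothetical profile serving another team's Shiny application.
class profile::my_team_dashboards {
  include ::shiny_server

  # Install R packages from CRAN and GitHub using the r_lang
  # resource types (exact parameters are in the module docs).
  r_lang::cran { 'ggplot2': }
  r_lang::github { 'rstudio/shinydashboard': }

  # Clone the team's app into the directory Shiny Server serves from
  # (git::clone is a defined type in the Wikimedia Puppet repository).
  git::clone { 'my-team/metrics-app':
    ensure    => latest,
    directory => '/srv/shiny-server/metrics-app',
  }
}
```

Assigning that profile to a role, and the role to a node, would be enough to stand up a new dashboard server.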

Final remarks

The alternative, jocular title for this post was “I AM BECOME OPS…AND SO CAN YOU!!!” Obviously, writing Puppet code barely scratches the surface of Ops’ work and skillsets, but hopefully this post has at least helped demystify that particular aspect. I also don’t mean to say it’s remotely practical to step outside your role and job description to learn a brand new and (kind of) unrelated technology, because it’s not. It happened to make a lot of sense for us and we were very fortunate to be supported in this endeavor.

This project has made our job somewhat easier because we no longer have to do much of the manual work we did before. If we need to replace a dashboard server, we just launch a new instance, assign it the role we wrote, and Puppet takes care of everything. We are also working with the Release Engineering team to add continuous integration for our internal R packages, an endeavor that uses the r_lang module we wrote for this project. Furthermore, learning Puppet has empowered us to make (small) changes when we need to (such as making new software libraries available on our analytics cluster), rather than assigning them to someone else and waiting for our turn in their to-do queue.

Lastly, on behalf of the Discovery Analysis team, I would like to give a special thanks to our former data analyst Oliver Keyes for creating the dashboards, to Search Platform’s Ops Engineer Guillaume Lederrey for being an exceptional teacher and guide, and to Deb Tankersley, Chelsy Xie, and Melody Kramer for their invaluable input on this post.

Mikhail Popov, Data Analyst
Wikimedia Foundation

Published at Mon, 21 Aug 2017 15:15:02 +0000

[Wikipedia] Granville Beynon


Sir William John Granville Beynon, CBE, FRS (24 May 1914 in Dunvant – 11 March 1996 in Aberystwyth) was a Welsh physicist. He collaborated with Sir Edward Victor Appleton, who had detected the terrestrial ionosphere.


[Wikipedia] Puka Parina


Puka Parina (Aymara puka colored, Quechua puka red, Aymara parina flamingo, “colored flamingo” or “red flamingo”, hispanicized spelling Pucaparina) is a mountain in the Willkanuta mountain range in the Andes of Peru, about 4,800 metres (15,748 ft) high. It lies in the Puno Region, Melgar Province, Nuñoa District.


[Wikipedia] Golden Lane Estate


The Golden Lane Estate is a 1950s council housing complex in the City of London. It was built on the northern edge of the City, in an area devastated by bombing during World War II.


[Wikipedia] Ivesia jaegeri


Ivesia jaegeri is an uncommon species of flowering plant in the rose family, known by the common names Jaeger’s mousetail and Jaeger’s ivesia.
It is native to the Mojave Desert in southwestern Nevada, and it is also known from two occurrences nearby in California. It grows in cracks and crevices in the limestone cliffs and slopes of the desert mountains.