Reaktiv Blog

Introducing Locomotive

When it comes to batch processing WordPress data, there aren’t a ton of options available. Yet we often find ourselves needing to query large amounts of data and perform simple, repetitive tasks on it.

This might mean moving unapproved comments into an approved status, converting users from one role to another, or migrating posts from one content structure to another. These are cases where doing the job by hand would take some serious work.

Think about any time you’ve wanted to pull a large number of posts and perform a simple action on them. Or maybe you want to manipulate users, comments, terms, or any of the other built-in WordPress object types. Writing a single function that performs a query and makes the data changes works…to a point.

But we often work with sites with thousands of posts. One query would be too much to load at once, so the query needs to be broken up. Then you have to set the right offset, catch errors, and perform the steps asynchronously, and pretty soon the complexity skyrockets.

In the last few years, WP-CLI has helped a lot with these kinds of tasks. But there are still hosting environments that don’t have WP-CLI enabled, or don’t allow SSH access. We have clients where that’s the case. We’ve also had occasions to make these kinds of repetitive tasks client-facing, where the command line is just not an option.

That’s why we built Locomotive.

Introducing Locomotive

Locomotive is a batch processing library and plugin that untangles the complexity of querying and making changes to large datasets. Locomotive automatically breaks these queries into a number of steps, performing smaller queries one at a time, then running each individual result through a single callback function. Along the way, it catches and logs errors and keeps track of which objects have been processed. Locomotive provides the foundation to easily create and run your own custom batch processing scripts.

Everything you register with Locomotive is available in the WordPress admin. So once your query and action are set up, it’s simply a matter of going to the Locomotive menu and running your task with the click of a button. It can be installed in just about any hosting environment, readily available to clients.

How Locomotive Works

To use Locomotive, you need to register your batch process by calling the register_batch_process function from either your theme’s functions.php file or a separate plugin. Each process has two parts: your query, and the callback function to run over the results.

When you register a process, you define the parameters of your data query, using all of the arguments available to you in default WordPress queries. Then you set a simple callback function. As Locomotive runs your query, it chunks the data into individual steps, iterates through each chunk, and passes each individual object to the callback function, where you can process it however you want. The results stream live as they are processed.
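To make the chunk-and-callback idea concrete, here is a minimal sketch in plain PHP. It is not Locomotive’s actual implementation: the function name run_in_steps is made up for illustration, and an in-memory array stands in for the results of a WordPress query.

```php
/**
 * Hypothetical helper: process $items in steps of $per_step, passing each
 * item to $callback and collecting errors instead of stopping on them.
 */
function run_in_steps( array $items, $per_step, callable $callback ) {
  $report = array( 'steps' => 0, 'processed' => 0, 'errors' => array() );

  // array_chunk() stands in for running one smaller query per step.
  foreach ( array_chunk( $items, $per_step ) as $step_items ) {
    $report['steps']++;

    foreach ( $step_items as $item ) {
      try {
        call_user_func( $callback, $item );
        $report['processed']++;
      } catch ( Exception $e ) {
        // Record the failure and keep going with the next item.
        $report['errors'][] = $e->getMessage();
      }
    }
  }

  return $report;
}

// Usage: five fake items, two per step; item 3 fails on purpose.
$report = run_in_steps( array( 1, 2, 3, 4, 5 ), 2, function ( $item ) {
  if ( 3 === $item ) {
    throw new Exception( 'Could not process item 3' );
  }
} );
```

With five items and two per step, this runs three steps, processes four items, and records one error. The point is only that a failure in one object does not stop the rest of the batch.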

Where We’re At

Locomotive is in early beta. We are running this in production, but cannot commit to backwards compatibility until we reach 1.0. In the coming months, we’ll be continuing to test edge cases and consider other features that are on our roadmap.

Feedback Time!

We’re looking for testers and curious developers to let us know what they think. All of the code and documentation is up on GitHub, so you can download and install it from there. Pull requests and issues are welcome.

If you have a feature idea or use case, please post it in issues. We’re still fleshing out the documentation, and any feedback, tips, or comments would be very much appreciated.


We’ve included a short tutorial below to get started with your first Locomotive process.

Registering Your First Process

How about an example?

Let’s say you are working on a publisher’s site with over 10,000 posts. In the past, whenever they have wanted to mark a post as “Classic,” they added the custom field classic_post and set it to true. But they’ve decided to build out an entire section of Classic posts and wish to migrate these posts to a newly created category called “Classic”.

Locomotive makes this easy.

The first step is to register your process using the locomotive_init hook and the register_batch_process function added by the plugin.

First things first, let’s give your process a name. That’s how you’ll find it in the admin later.

function register_my_batch_processes() {
  // The settings are passed to register_batch_process() as a single array.
  register_batch_process( array(
    'name'     => 'Convert Classic Posts',
    'type'     => '',
    'args'     => array(),
    'callback' => '',
  ) );
}
add_action( 'locomotive_init', 'register_my_batch_processes' );

Next, we need to define our type. Locomotive collects data by building on top of existing WordPress query classes. Each type maps to a WordPress query class:

WordPress Class     Type
WP_Query            post
WP_User_Query       user
WP_Term_Query       term
WP_Comment_Query    comment
WP_Site_Query       site (multisite only)

In this instance, we would want posts, so we pass post as our type.

function register_my_batch_processes() {
  register_batch_process( array(
    'name'     => 'Convert Classic Posts',
    'type'     => 'post', // Query posts through WP_Query.
    'args'     => array(),
    'callback' => '',
  ) );
}
add_action( 'locomotive_init', 'register_my_batch_processes' );

args is used to pass other query parameters to scope out the dataset. Any parameter that works in WP_Query, or other corresponding query classes, will work here. For our example, we want to grab every published post that has a custom field of classic_post:

function register_my_batch_processes() {
  register_batch_process( array(
    'name'     => 'Convert Classic Posts',
    'type'     => 'post',
    'args'     => array(
      // Published posts that have the classic_post custom field.
      'post_type'   => 'post',
      'post_status' => 'publish',
      'meta_key'    => 'classic_post',
    ),
    'callback' => '',
  ) );
}
add_action( 'locomotive_init', 'register_my_batch_processes' );

Pretty simple. Locomotive will grab all posts with the proper custom field and process them through your callback function. Don’t worry about pagination: Locomotive comes with some smart defaults for chunking up the data. However, if you do want to define how much data to process in each step, you can use the posts_per_page argument.

The final step is to define your callback function. The function will be passed a single object at a time, in this case each individual post object. We need to use that function to add the post to the category “Classic”. Let’s create that function and pass its name to register_batch_process.
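As a sketch of that last point, here is the same registration with an explicit step size. The value 25 is an arbitrary choice for illustration, not a recommendation from the plugin, and the settings are passed as a single array.

```php
function register_my_batch_processes() {
  register_batch_process( array(
    'name'     => 'Convert Classic Posts',
    'type'     => 'post',
    'args'     => array(
      'post_type'      => 'post',
      'post_status'    => 'publish',
      'meta_key'       => 'classic_post',
      // Assumed value for illustration: process 25 posts per step.
      'posts_per_page' => 25,
    ),
    'callback' => '',
  ) );
}
add_action( 'locomotive_init', 'register_my_batch_processes' );
```

A smaller number means more steps with less work in each one, which may help on hosts with tight memory or execution time limits.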

function register_my_batch_processes() {
  register_batch_process( array(
    'name'     => 'Convert Classic Posts',
    'type'     => 'post',
    'args'     => array(
      'post_type'   => 'post',
      'post_status' => 'publish',
      'meta_key'    => 'classic_post',
    ),
    'callback' => 'add_post_to_classic_category',
  ) );
}
add_action( 'locomotive_init', 'register_my_batch_processes' );

// Runs once per post. The final true appends the category
// instead of replacing the post's existing categories.
function add_post_to_classic_category( $post_object ) {
  wp_set_object_terms( $post_object->ID, 'classic', 'category', true );
}

One function later and we’re done! The only thing left to do is run the process.
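The same pattern should carry over to the other types listed earlier. As a rough sketch, here is what converting users from one role to another might look like. The function names and the roles are made up for illustration, and the callback assumes Locomotive hands it a WP_User object for the user type, which this post does not confirm.

```php
function register_role_batch_process() {
  register_batch_process( array(
    'name'     => 'Convert Subscribers to Contributors',
    'type'     => 'user',
    'args'     => array(
      // 'role' is a standard WP_User_Query argument.
      'role' => 'subscriber',
    ),
    'callback' => 'convert_subscriber_to_contributor',
  ) );
}
add_action( 'locomotive_init', 'register_role_batch_process' );

// Assumption: $user is a WP_User object, as WP_User_Query returns by default.
function convert_subscriber_to_contributor( $user ) {
  $user->set_role( 'contributor' );
}
```

Because this callback changes the very field the query filters on, it is worth running on a staging copy first to confirm that every user gets picked up.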

Running Your Process

Locomotive keeps a list of batch processes in the WP admin Tools menu. Navigate to Tools->Batches and you will see a list of the processes you just registered. To start one up, just select it and click “Run Batch.”

As the process runs, a progress bar will update as each step is completed. If, for any reason, some of the data is not processed, you will see a “Status: Failed” error and an error log that lists out these object IDs and error messages.
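If you want a failure inside your callback to show up in that log, one option is to throw an exception when something goes wrong. The sketch below is a hypothetical variant of the earlier callback. It assumes Locomotive records an exception thrown from the callback as a failure for that object, so check the documentation on GitHub before relying on it.

```php
// Hypothetical variant of the callback that reports failures explicitly.
function add_post_to_classic_category_or_fail( $post_object ) {
  $result = wp_set_object_terms( $post_object->ID, 'classic', 'category', true );

  // wp_set_object_terms() returns a WP_Error when the term cannot be set.
  if ( is_wp_error( $result ) ) {
    // Assumption: Locomotive logs an exception thrown here as a failed object.
    throw new Exception( $result->get_error_message() );
  }
}
```

To try it, pass 'add_post_to_classic_category_or_fail' as the callback when registering the process.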

Locomotive also uses metadata to keep track of any objects that have already been processed, and it automatically skips those on later runs. If you want to reset everything and process all of your data again from scratch, you can select a batch process and click “Reset Batch”.

Comments

  1. We use a similar batch processing system (that’s pretty tightly coupled to our plugin, but here’s the documentation for comparison: https://github.com/eventespresso/event-espresso-core/blob/master/docs/M–Batch-Jobs-System/batch-jobs-library-overview.md).
    One of the batch jobs we use it for is generating CSV files of output. In order to do this, it helps to also know when the job is started (so we can write the column headers for the CSV file), and when the job is finished (so after we send the generated file to the user in the JS, we have a chance to delete the now-unused file). So you might find it helpful to add callbacks for when a job is first started and another one for when processing is all done.
    But this looks really easy to use and reuse which is a very nice contribution to the WP community. Thanks!

  2. Have had a play with the plugin and it looks awesome thanks 🙂

    Thought you should know you have a typo in the example above: the array isn’t properly passed to the register function

    Regards,
    Joey

  3. It’s nice to have an interface to batch processing. The code will be probably not much different from what I am always doing: retrieve the records and loop them. But – do not need to “vardump” the progress.
    BTW, your GIF shows only the success. How do you plan to show failures?

Written by:

Josh Eaton is Partner and Lead Facial Hair Cultivator at Reaktiv. He ships beautiful code facilitated by Sriracha, an intense calling to be your WordPress hero, and coffee.