Perform bulk operations with the GraphQL Admin API

With the GraphQL Admin API, you can use bulk operations to asynchronously fetch data in bulk. The API is designed to reduce complexity when dealing with pagination of large volumes of data. You can bulk query any connection field that's defined by the GraphQL Admin API schema.

Instead of manually paginating results and managing a client-side throttle, you can run a bulk query operation. Shopify's infrastructure does the hard work of executing your query, and then provides you with a URL where you can download all of the data.

The GraphQL Admin API supports querying a single top-level field, and then selecting the fields that you want returned. You can also nest connections, such as variants on products.

Apps are limited to running a single bulk operation at a time per shop. When the operation is complete, the results are delivered in the form of a JSONL file that Shopify makes available at a URL.

Note

Bulk operations are only available through the GraphQL Admin API. You can't perform bulk operations with the Storefront API.


  • You can run only one bulk operation of each type (bulkOperationRunMutation or bulkOperationRunQuery) at a time per shop.

  • The bulk query operation has to complete within 10 days. After that it will be stopped and marked as failed.

When your query runs into this limit, consider reducing the query complexity and depth.


The complete flow for running bulk queries is covered later, but below are some small code snippets that you can use to get started quickly.

Step 1. Submit a query

Run a bulkOperationRunQuery mutation and specify what information you want from Shopify.

The following mutation queries the products connection and returns each product's ID and title.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  bulkOperationRunQuery(
    query: """
    {
      products {
        edges {
          node {
            id
            title
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}

JSON response

{
  "data": {
    "bulkOperationRunQuery": {
      "bulkOperation": {
        "id": "gid:\/\/shopify\/BulkOperation\/720918",
        "status": "CREATED"
      },
      "userErrors": []
    }
  },
  ...
}
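The response above can be checked in code before you move on to waiting for the operation. The following Ruby snippet is a minimal sketch, not an official client: the helper name bulk_operation_id is hypothetical, and it assumes you already have the raw JSON response body as a string.

```ruby
require 'json'

# Hypothetical helper: returns the bulk operation's ID from a
# bulkOperationRunQuery response body, or raises if Shopify
# reported any userErrors.
def bulk_operation_id(response_body)
  result = JSON.parse(response_body).dig('data', 'bulkOperationRunQuery')
  raise 'Unexpected response shape' if result.nil?

  errors = result['userErrors'] || []
  unless errors.empty?
    messages = errors.map { |error| error['message'] }.join('; ')
    raise "Bulk operation was not created: #{messages}"
  end

  result.dig('bulkOperation', 'id')
end
```

Keeping the returned ID lets you look up this specific operation later with the node field.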

Step 2. Wait for the operation to finish

To retrieve data, you need to wait for the operation to finish. You can determine when a bulk operation has finished by using a webhook or by polling the operation's status.

Tip

Subscribing to the webhook topic is recommended over polling as it limits the number of redundant API calls.

Option A. Subscribe to the bulk_operations/finish webhook topic

Note

Using webhooks with bulk operations is only available in Admin API version 2021-10 and higher.

You can use the webhookSubscriptionCreate mutation to subscribe to the bulk_operations/finish webhook topic. You then receive a webhook when any operation finishes, that is, when it has completed, failed, or been canceled.

For full setup instructions, refer to Configuring webhooks.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  webhookSubscriptionCreate(
    topic: BULK_OPERATIONS_FINISH
    webhookSubscription: {
      format: JSON,
      uri: "https://12345.ngrok.io/"
    }
  ) {
    userErrors {
      field
      message
    }
    webhookSubscription {
      id
    }
  }
}

JSON response

{
  "data": {
    "webhookSubscriptionCreate": {
      "userErrors": [],
      "webhookSubscription": {
        "id": "gid://shopify/WebhookSubscription/4567"
      }
    }
  },
  "extensions": {
    "cost": {
      "requestedQueryCost": 10,
      "actualQueryCost": 10,
      "throttleStatus": {
        "maximumAvailable": 1000,
        "currentlyAvailable": 990,
        "restoreRate": 50
      }
    }
  }
}

After you've subscribed to the webhook topic, Shopify sends a POST request to the specified URL any time a bulk operation on the store (both queries and mutations) finishes.

Example webhook payload

{
  "admin_graphql_api_id": "gid://shopify/BulkOperation/720918",
  "completed_at": "2024-08-29T17:23:25Z",
  "created_at": "2024-08-29T17:16:35Z",
  "error_code": null,
  "status": "completed",
  "type": "query"
}

You must now retrieve the bulk operation's data URL by using the node field and passing the admin_graphql_api_id value from the webhook payload as its id:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

query {
  node(id: "gid://shopify/BulkOperation/720918") {
    ... on BulkOperation {
      url
      partialDataUrl
    }
  }
}

JSON response

{
  "data": {
    "node": {
      "url": "https:\/\/storage.googleapis.com\/shopify\/dyfkl3g72empyyoenvmtidlm9o4g?<params />",
      "partialDataUrl": null
    }
  },
  "extensions": {
    "cost": {
      "requestedQueryCost": 1,
      "actualQueryCost": 1,
      "throttleStatus": {
        "maximumAvailable": 1000,
        "currentlyAvailable": 999,
        "restoreRate": 50
      }
    }
  }
}
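The hand-off from the webhook payload to the node query can be sketched in Ruby as follows. This is an illustrative sketch under stated assumptions: the helper name node_query_body is hypothetical, the operation ID is passed as a GraphQL variable rather than inlined, and sending the resulting body to the API is left to your own HTTP client.

```ruby
require 'json'

# Hypothetical helper: builds the JSON request body for the follow-up
# `node` query from a bulk_operations/finish webhook payload.
def node_query_body(webhook_body)
  payload = JSON.parse(webhook_body)
  id = payload['admin_graphql_api_id']
  raise ArgumentError, 'payload has no admin_graphql_api_id' if id.nil?

  query = <<~GRAPHQL
    query($id: ID!) {
      node(id: $id) {
        ... on BulkOperation {
          url
          partialDataUrl
        }
      }
    }
  GRAPHQL

  # Standard GraphQL-over-HTTP body: the document plus its variables.
  JSON.generate({ query: query, variables: { id: id } })
end
```

Passing the ID as a variable avoids building the query by string concatenation.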

For more information on how webhooks work, refer to Webhooks.

Note

Webhook delivery isn't always guaranteed, so you might still need to poll for the operation's status to check when it's finished.

Option B. Poll your operation's status

While the operation is running, you can poll to see its progress using the currentBulkOperation field. The objectCount field increments to indicate the operation's progress, and the status field returns whether the operation is completed.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

query {
  currentBulkOperation {
    id
    status
    errorCode
    createdAt
    completedAt
    objectCount
    fileSize
    url
    partialDataUrl
  }
}

JSON response

{
  "data": {
    "currentBulkOperation": {
      "id": "gid:\/\/shopify\/BulkOperation\/720918",
      "status": "COMPLETED",
      "errorCode": null,
      "createdAt": "2024-08-29T17:16:35Z",
      "completedAt": "2024-08-29T17:23:25Z",
      "objectCount": "57",
      "fileSize": "358",
      "url": "https:\/\/storage.googleapis.com\/shopify\/dyfkl3g72empyyoenvmtidlm9o4g?<params />",
      "partialDataUrl": null
    }
  },
  ...
}
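A polling loop around this query might look like the following Ruby sketch. It is not a definitive implementation: the method names are hypothetical, the list of terminal statuses is an assumption based on the BulkOperationStatus reference, and fetch stands for any callable of your own that runs the currentBulkOperation query and returns the parsed operation as a hash.

```ruby
# Assumed terminal statuses, after which the operation no longer changes.
def terminal_status?(status)
  %w[COMPLETED FAILED CANCELED EXPIRED].include?(status)
end

# Calls `fetch` until the operation reaches a terminal status, sleeping
# `interval` seconds between attempts. `sleeper` is injectable so the
# loop can be exercised without real delays.
def wait_for_bulk_operation(fetch, interval: 60, max_attempts: 120, sleeper: method(:sleep))
  max_attempts.times do
    operation = fetch.call
    return operation if operation && terminal_status?(operation['status'])

    sleeper.call(interval)
  end
  raise 'Bulk operation did not finish within the polling budget'
end
```

The attempt limit keeps a stalled operation from blocking your job indefinitely.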

Step 3. Retrieve your data

When an operation is completed, a JSONL output file is available for download at the URL specified in the url field. If the query produced no results, then the url field will return null.

See Download result data for more details on the files we return and JSONL file format for how to parse them.


Below is the high-level workflow for creating a bulk query:

  1. Identify a potential bulk operation.

    You can use a new or existing query, but it should potentially return a lot of data. Connection-based queries work best.

  2. Test the query by using the Shopify GraphiQL app.

  3. Write a new mutation document for bulkOperationRunQuery.

  4. Include the query as the value for the query argument in the mutation.

  5. Run the mutation.

  6. Wait for the bulk operation to finish by either:

    1. Subscribing to a webhook topic that sends a webhook payload when the operation is finished.
    2. Polling the bulk operation until the status field shows that the operation is no longer running.

    You can check the operation's progress using the objectCount field in currentBulkOperation.

  7. Download the JSONL file at the URL provided in the url field.

Identify a potential bulk query

Identify a new or existing query that could return a lot of data and would benefit from being a bulk operation. Queries that use pagination to get all pages of results are the most common candidates.

The example query below retrieves some basic information from a store's first 50 products that were created on or after January 1, 2024 and before May 1, 2024.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

{
  products(query: "created_at:>=2024-01-01 AND created_at:<2024-05-01", first: 50) {
    edges {
      cursor
      node {
        id
        createdAt
        updatedAt
        title
        handle
        descriptionHtml
        productType
        options {
          name
          position
          values
        }
        priceRange {
          minVariantPrice {
            amount
            currencyCode
          }
          maxVariantPrice {
            amount
            currencyCode
          }
        }
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}

Tip

Use the Shopify GraphiQL app to run this query against your development store. The query used in a bulk operation requires the same permissions as it would when run as a normal query, so it's important to run the query first and ensure it succeeds without any access denied errors.

Write a bulk operation

To turn the query above into a bulk query, use the bulkOperationRunQuery mutation. It's easiest to begin with a skeleton mutation without the query value:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  bulkOperationRunQuery(
    query:"""
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}

  • The triple quotes (""") define a multi-line string in GraphQL.
  • The bulk operation's ID is returned so you can poll the operation.
  • The userErrors field is returned to retrieve any error messages.

Paste the original sample query into the mutation, and then make a couple of minor optional changes:

  • The first argument is optional and ignored if present, so it can be removed.
  • The cursor and pageInfo fields are also optional and ignored if present, so they can be removed.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  bulkOperationRunQuery(
    query:"""
    {
      products(query: "created_at:>=2024-01-01 AND created_at:<2024-05-01") {
        edges {
          node {
            id
            createdAt
            updatedAt
            title
            handle
            descriptionHtml
            productType
            options {
              name
              position
              values
            }
            priceRange {
              minVariantPrice {
                amount
                currencyCode
              }
              maxVariantPrice {
                amount
                currencyCode
              }
            }
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}
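If you build the mutation programmatically, the wrapping step can be sketched in Ruby as shown below. The helper name bulk_mutation_body is hypothetical, and the sketch assumes that the inner query contains no triple quotes, because it is embedded in a GraphQL block string.

```ruby
require 'json'

# Hypothetical helper: wraps a GraphQL query in a bulkOperationRunQuery
# mutation and returns the JSON request body to POST.
def bulk_mutation_body(query)
  raise ArgumentError, 'query must not contain """' if query.include?('"""')

  mutation = <<~GRAPHQL
    mutation {
      bulkOperationRunQuery(
        query: """
        #{query.strip}
        """
      ) {
        bulkOperation { id status }
        userErrors { field message }
      }
    }
  GRAPHQL

  JSON.generate({ query: mutation })
end
```

Regular double quotes inside the inner query, such as those around a search filter, are safe inside the block string and are escaped for JSON by JSON.generate.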

If the mutation is successful, then the response looks similar to the example below:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

JSON response

{
  "data": {
    "bulkOperationRunQuery": {
      "bulkOperation": {
        "id": "gid:\/\/shopify\/BulkOperation\/1",
        "status": "CREATED"
      },
      "userErrors": []
    }
  },
  ...
}

Wait for the bulk operation to finish

To retrieve data, you need to wait for the operation to finish. You can determine when a bulk operation has finished by using a webhook or by polling the operation's status.

Option A. Use the bulk_operations/finish webhook topic

Use the webhookSubscriptionCreate mutation to subscribe to the bulk_operations/finish webhook topic. For full setup instructions, refer to Configuring webhooks.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  webhookSubscriptionCreate(
    topic: BULK_OPERATIONS_FINISH
    webhookSubscription: {
      format: JSON,
      uri: "https://12345.ngrok.io/"
    }
  ) {
    userErrors {
      field
      message
    }
    webhookSubscription {
      id
    }
  }
}

After you've subscribed, you'll receive a webhook any time a bulk operation on the store (both queries and mutations) finishes, that is, when it completes, fails, or is canceled. Refer to the GraphQL Admin API reference for details on the webhook payload.

Once you receive the webhook, you must retrieve the bulk operation's data URL by querying the node field and passing in the ID given by admin_graphql_api_id in the webhook payload:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

query {
  node(id: "gid://shopify/BulkOperation/1") {
    ... on BulkOperation {
      url
      partialDataUrl
    }
  }
}

Option B. Poll a running bulk operation

Another way to determine when the bulk operation has finished is to query the currentBulkOperation field:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

{
  currentBulkOperation {
    id
    status
    errorCode
    createdAt
    completedAt
    objectCount
    fileSize
    url
    partialDataUrl
  }
}

The field returns the latest bulk operation created (regardless of its status) for the authenticated app and shop. If you want to look up a specific operation by ID, then you can use the node field:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

{
  node(id: "gid://shopify/BulkOperation/1") {
    ... on BulkOperation {
      id
      status
      errorCode
      createdAt
      completedAt
      objectCount
      fileSize
      url
      partialDataUrl
    }
  }
}

You can adjust your polling intervals based on the amount of data that you expect. For example, if you're currently making pagination queries manually and it takes one hour to fetch all product data, then that can serve as a rough estimate for the bulk operation time. In this situation, a polling interval of one minute would probably be better than every 10 seconds.
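One way to turn that estimate into a number is sketched below in Ruby. The helper name, the 1/60 ratio, and the 10-second and 5-minute bounds are illustrative assumptions chosen to match the example above, not Shopify recommendations.

```ruby
# Hypothetical heuristic: poll about 60 times over the expected run time,
# but never more often than `min` or less often than `max` seconds.
def polling_interval(expected_seconds, min: 10, max: 300)
  (expected_seconds / 60.0).round.clamp(min, max)
end
```

With these assumptions, an expected run time of one hour (3600 seconds) gives a 60-second interval, and very short or very long estimates are clamped to the bounds.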

To learn about the other possible operation statuses, refer to the BulkOperationStatus reference.

Check an operation's progress

Although polling is useful for checking whether an operation is complete, you can also use it to check the operation's progress by using the objectCount field. This field provides you with a running total of all the objects processed by your bulk operation. You can use the object count to validate your assumptions about how much data should be returned.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

{
  currentBulkOperation {
    status
    objectCount
    url
  }
}

For example, if you're trying to query all products created in a single month and the object count exceeds your expected number, then it might be a sign that your query conditions are wrong. In that case, you might want to cancel your current operation and run a new one with a different query.
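That check can be expressed as a small guard. The following Ruby sketch uses a hypothetical helper name and assumes that objectCount arrives as a string, as in the sample JSON response earlier on this page.

```ruby
# Hypothetical guard: true when the running total already exceeds the
# number of objects you expected, which suggests a wrong query filter.
def object_count_exceeded?(operation, expected_max)
  # objectCount is serialized as a string, so convert it before comparing.
  Integer(operation['objectCount']) > expected_max
end
```

When the guard returns true, you can cancel the operation and resubmit it with a corrected query.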


Download result data

Result data is available only after an operation has finished running.

If an operation successfully completes, then the url field contains a URL where you can download the data. If an operation fails but some data was retrieved before the failure occurred, then a partially complete output file is available at the URL specified in the partialDataUrl field. In either case, the returned URLs are signed (authenticated) and expire after one week.
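The following Ruby sketch shows one way to pick the right URL and stream the file to disk. It is a sketch under stated assumptions: both method names are hypothetical, operation is the parsed BulkOperation hash from a polling or node query, and only Ruby's standard net/http library is used.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: `url` for a completed operation, `partialDataUrl`
# for a failed one, and nil otherwise (or when there is no data).
def result_url(operation)
  case operation['status']
  when 'COMPLETED' then operation['url']
  when 'FAILED' then operation['partialDataUrl']
  end
end

# Streams the response body to `destination` in chunks, so that the
# whole file is never held in memory.
def download_results(url, destination)
  File.open(destination, 'wb') do |file|
    Net::HTTP.get_response(URI(url)) do |response|
      response.value # raises unless the response status is 2xx
      response.read_body { |chunk| file.write(chunk) }
    end
  end
end
```

Because the signed URLs expire, download the file soon after the operation finishes rather than storing the URL for later use.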

Now that you've downloaded the data, it's time to parse it according to the JSONL format.


The JSONL data format

Normal (non-bulk) GraphQL responses are JSON. The response structure mirrors the query structure, which results in a single JSON object with many nested objects. Most standard JSON parsers require the entire string or file to be read into memory, which can cause issues when the responses are large.

Since bulk operations are specifically designed to fetch large datasets, we've chosen the JSON Lines (JSONL) format for the response data so that clients have more flexibility in how they consume the data. JSONL is similar to JSON, but each line is its own valid JSON object. To avoid issues with memory consumption, the file can be parsed one line at a time by using file streaming functionality, which most languages have.

Each line in the file is a node object returned in a connection. If a node has a nested connection, then each child node is extracted into its own object on the next line. For example, a bulk operation might use the following query to retrieve a list of products and their nested variants:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

{
  products {
    edges {
      node {
        id
        variants {
          edges {
            node {
              id
              title
            }
          }
        }
      }
    }
  }
}

In the JSONL results, each product object is followed by each of its variant objects on a new line. The order of each connection type is preserved and all nested connections appear after their parents in the file. Because connections are no longer nested in the response data structure, the bulk operation result automatically includes the __parentId field, which is a reference to an object's parent. This field doesn't exist in the API schema, so you can't explicitly query it.

{"id":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/ProductVariant/19435458986123","title":"52","__parentId":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/ProductVariant/19435458986040","title":"70","__parentId":"gid://shopify/Product/1921569226808"}
{"id":"gid://shopify/Product/1921569259576"}
{"id":"gid://shopify/ProductVariant/19435459018808","title":"34","__parentId":"gid://shopify/Product/1921569259576"}
{"id":"gid://shopify/Product/1921569292344"}
{"id":"gid://shopify/ProductVariant/19435459051576","title":"Default Title","__parentId":"gid://shopify/Product/1921569292344"}
{"id":"gid://shopify/Product/1921569325112"}
{"id":"gid://shopify/ProductVariant/19435459084344","title":"36","__parentId":"gid://shopify/Product/1921569325112"}
{"id":"gid://shopify/Product/1921569357880"}
{"id":"gid://shopify/ProductVariant/19435459117112","title":"47","__parentId":"gid://shopify/Product/1921569357880"}

Most programming languages can read a file one line at a time, which avoids loading the entire file into memory. Take advantage of this when you process JSONL data files.

Here's a simple example in Ruby that contrasts an efficient and an inefficient way of loading and parsing a JSONL file. In both snippets, file is the path to the downloaded JSONL file:

require 'json'

# Efficient: reads the file a single line at a time
File.open(file) do |f|
  f.each_line do |line|
    JSON.parse(line)
  end
end

# Inefficient: reads the entire file into memory
jsonl = File.read(file)

jsonl.each_line do |line|
  JSON.parse(line)
end

To demonstrate the difference using a 100MB JSONL file, the efficient version would consume only 2.5MB of memory, while the inefficient version would consume 100MB (equal to the file size).
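Line-by-line parsing also combines well with the __parentId field. The Ruby sketch below regroups each product with its variants while streaming. It is a minimal sketch with a hypothetical method name, and it assumes a single level of nesting and that children always follow their parent, as described above.

```ruby
require 'json'

# Streams JSONL from any IO object and yields each parent record together
# with the child records that follow it. Only one parent and its children
# are held in memory at a time.
def each_parent_with_children(io)
  return enum_for(:each_parent_with_children, io) unless block_given?

  parent = nil
  children = []
  io.each_line do |line|
    next if line.strip.empty?

    record = JSON.parse(line)
    if record.key?('__parentId')
      children << record
    else
      # A record without __parentId starts a new group.
      yield parent, children if parent
      parent = record
      children = []
    end
  end
  yield parent, children if parent
end
```

For example, File.open(file) { |f| each_parent_with_children(f) { |product, variants| ... } } processes one product at a time without loading the whole file.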


Bulk operations can fail for any of the reasons that a regular GraphQL query would fail, such as not having permission to query a field. For this reason, we encourage you to run the query normally first to make sure that it works. You'll get much better error feedback than when a query fails within a bulk operation.

When a bulk operation fails, some data might be available to download, the status field returns FAILED, and the errorCode field includes one of the following codes:

  • ACCESS_DENIED: there are missing access scopes. Run the query normally (outside of a bulk operation) to get more details on which field is causing the issue.
  • INTERNAL_SERVER_ERROR: something went wrong on our server and we've been notified of the error. These errors might be intermittent, so you can try submitting the query again.
  • TIMEOUT: one or more query timeouts occurred during execution. Try removing some fields from your query so that it can run successfully. These timeouts might be intermittent, so you can try submitting the query again.

Tip

Querying resources using a range search might time out or return an error if the collection of resources is sufficiently large and the search field is different from the specified (or default) sort key for the connection that you're querying. If your query is slow or returns an error, then try specifying a sort key that matches the field used in the search. For example, query: "created_at:>2024-05-01", sortKey: CREATED_AT.

To learn about the other possible operation error codes, refer to the BulkOperationErrorCode reference.

If bulk operations have stalled, then they might be canceled by Shopify. After bulk operations are canceled, a status of CANCELED is returned. You can retry canceled bulk operations by submitting the query again.

Note

When using the bulk_operations/finish webhook, the error_code and status fields in the webhook payload will be lowercase. For example, failed instead of FAILED.
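Because the same codes arrive in different cases depending on the source, it can help to normalize them before acting on them. The Ruby sketch below uses hypothetical helper names, and its retry rule only encodes the guidance above: INTERNAL_SERVER_ERROR and TIMEOUT might be intermittent, while ACCESS_DENIED needs a scope change first.

```ruby
# Normalizes a status or error code so that webhook values ("failed")
# compare equal to GraphQL values ("FAILED").
def normalize_code(value)
  value.nil? ? nil : value.to_s.upcase
end

# Hypothetical rule: resubmitting the same query is only worthwhile for
# errors that are described above as possibly intermittent.
def retry_worthwhile?(error_code)
  %w[INTERNAL_SERVER_ERROR TIMEOUT].include?(normalize_code(error_code))
end
```

You can apply the same normalization to the status field before comparing it with values such as FAILED or CANCELED.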


Canceling an operation

To cancel an in-progress bulk operation, use the bulkOperationCancel mutation with the operation ID.

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  bulkOperationCancel(id: "gid://shopify/BulkOperation/1") {
    bulkOperation {
      status
    }
    userErrors {
      field
      message
    }
  }
}

You can run only one bulk operation of each type (bulkOperationRunMutation or bulkOperationRunQuery) at a time per shop. This limit is in place because operations are asynchronous and long-running. To run a subsequent bulk operation for a shop, you need to either cancel the running operation or wait for it to finish.
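Before submitting a new bulk query, you can check whether the previous one is still active. The Ruby sketch below uses a hypothetical helper name and assumes that CREATED, RUNNING, and CANCELING are the statuses that still occupy the slot, based on the BulkOperationStatus reference. The current_operation argument is the parsed currentBulkOperation hash, or nil if the app has never run a bulk operation on the shop.

```ruby
# Hypothetical guard: true when no bulk operation is occupying the
# per-shop slot for this app.
def can_start_bulk_query?(current_operation)
  return true if current_operation.nil?

  !%w[CREATED RUNNING CANCELING].include?(current_operation['status'])
end
```

If the guard returns false, either wait for the running operation to finish or cancel it with bulkOperationCancel.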

How bulk operations fit within the Admin API rate limits

Bulk operations are initiated by you, the API consumer, by supplying a query string within the bulkOperationRunQuery mutation. Shopify then executes that query string asynchronously as a bulk operation.

This distinction between the bulkOperationRunQuery mutation and the bulk query string itself determines how rate limits apply as well: any GraphQL requests made by you count as normal API requests and are subject to rate limits, while the bulk operation query execution is not.

In the following example, you would be charged the cost of the mutation request (as with any other mutation), but not for the query for product titles that you want Shopify to run as a bulk operation:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL mutation

mutation {
  bulkOperationRunQuery(
    query: """
    {
      products {
        edges {
          node {
            title
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
    }
  }
}

Since you're only making low-cost requests for creating operations, polling their status, or canceling them, bulk operations are a very efficient way to query data compared to standard pagination queries.
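The requests that you do make still report their cost in the extensions.cost field shown in the JSON responses earlier on this page. As a rough sketch, the following Ruby helper (a hypothetical name) estimates how long to wait before a request of a given cost fits within the available budget. It assumes that restoreRate is expressed in cost points per second.

```ruby
# Seconds to wait until `needed` cost points are available, based on the
# throttleStatus hash from a response's extensions.cost field.
def seconds_until_available(throttle_status, needed)
  shortfall = needed - throttle_status['currentlyAvailable']
  return 0 if shortfall <= 0

  (shortfall.to_f / throttle_status['restoreRate']).ceil
end
```

With the sample values above (990 points available, restore rate of 50), a request that costs 10 points needs no wait.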


Operation restrictions

A bulk operation query needs to include a connection. If your query doesn't use a connection, then it should be executed as a normal synchronous GraphQL query.

Bulk operations have some additional restrictions:

  • Maximum of five total connections in the query.
  • Connections must implement the Node interface.
  • The top-level node and nodes fields can't be used.
  • Maximum of two levels deep for nested connections. For example, the following is invalid because there are three levels of nested connections:

POST https://{shop}.myshopify.com/admin/api/{api_version}/graphql.json

GraphQL query

{
  products {
    edges {
      node {
        id
        variants { # nested level 1
          edges {
            node {
              id
              images { # nested level 2
                edges {
                  node {
                    id
                    metafields { # nested level 3 (invalid)
                      edges {
                        node {
                          value
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The bulkOperationRunQuery mutation will validate the supplied queries and provide errors by using the userErrors field.
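If you want a rough local check before submitting, the Ruby sketch below counts connections and nesting levels in a query string. It is a crude heuristic, not a GraphQL parser: the method names are hypothetical, and it assumes that every connection is written with edges { ... }, that the query uses no fragments, and that comments contain no double quotes. Shopify's own validation through userErrors remains authoritative.

```ruby
# Counts `edges { ... }` selections and how deeply they nest.
def connection_stats(query)
  # Drop string literals and # comments so their contents are not scanned.
  cleaned = query.gsub(/"(?:\\.|[^"\\])*"/, '""').gsub(/#.*$/, '')
  stack = []      # one flag per open brace: does it belong to `edges`?
  previous = nil
  total = 0
  max_depth = 0

  cleaned.scan(/[{}]|[A-Za-z_]\w*/) do |token|
    case token
    when '{'
      is_edges = previous == 'edges'
      stack.push(is_edges)
      if is_edges
        total += 1
        max_depth = [max_depth, stack.count(true)].max
      end
    when '}'
      stack.pop
    end
    previous = token
  end

  # The top-level connection is not counted as a nested level.
  { total: total, nested_levels: [max_depth - 1, 0].max }
end

# Applies the limits listed above: at least one connection, at most five
# in total, and at most two nested levels.
def within_bulk_limits?(query)
  stats = connection_stats(query)
  stats[:total].between?(1, 5) && stats[:nested_levels] <= 2
end
```

For the invalid example above, the sketch reports three nested levels, so within_bulk_limits? returns false.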

It's hard to provide exhaustive examples of what's allowed and what isn't given the flexibility of GraphQL queries, so try some and see what works and what doesn't. If you find useful queries which aren't yet supported, then let us know on the .dev Community so we can collect common use cases.
