Java - SolrJ deleteById does not delete data in Solr?

2024-03-16 21:00:05

I have a Solr collection with 6 shards, one per year from 2019 to 2024. I use this call to delete some documents from the collection:

invoke(() -> solrClient().deleteById(collectionName, ids ));

But this does not actually delete the documents with the corresponding IDs, even after waiting a day. However, the code below works and deletes the documents instantly.

invoke(() -> solrClient().deleteById(collectionName, ids, 1000));
try {
    solrClient().commit(collectionName);
} catch (SolrServerException | IOException e) {
    throw new RuntimeException(e);
}

Can someone please explain what is going on here, and what the significance is of the commitWithinMs value, which I set to 1000? I'm not sure whether I should keep it at 1000 ms or increase it.

I'm using Solr version 8.9

I tried passing 1000 as the commitWithinMs parameter to deleteById and committing right afterwards, and it worked. But I thought Solr does autocommit, and I can see the autocommit time set in solrconfig.xml:

<autoCommit>
    <!-- in ms, our setting is 10 min -->
    <maxTime>600000</maxTime>
    <maxDocs>100000</maxDocs>
    <openSearcher>false</openSearcher>
</autoCommit>

Also, just passing commitWithinMs is not sufficient; I have to commit explicitly right after I invoke the deleteById method.

Solution:

In Apache Solr, adding, updating, or deleting documents involves two main steps: sending the changes to Solr, and then committing them. Commits do two separate jobs. A hard commit flushes the changes to disk, while the changes become searchable only when a new searcher is opened, which happens on a soft commit or on a hard commit with openSearcher set to true. Committing is expensive in terms of I/O and cache warming, and doing it too frequently can hurt Solr's performance. This is where commitWithinMs and auto-commit come into play, and understanding them helps you balance data visibility against system performance.
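To make the difference between flushing a change and making it searchable concrete, here is a toy in-memory model in Java. It is not Solr code and does not use SolrJ: the class ToyIndex and all of its method names are hypothetical, and it assumes that a hard commit with openSearcher=false flushes pending deletes without refreshing the snapshot that searches read.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (NOT Solr code): separates "flushed to disk" from "visible to searches". */
final class ToyIndex {
    private final Set<String> pendingDeletes = new HashSet<>(); // accepted, not yet flushed
    private final Set<String> durableDocs = new HashSet<>();    // documents as flushed "on disk"
    private Set<String> searcherView;                           // snapshot that searches read

    ToyIndex(Set<String> initialDocs) {
        durableDocs.addAll(initialDocs);
        searcherView = new HashSet<>(initialDocs);
    }

    /** Step 1: the delete is received but neither flushed nor visible yet. */
    void deleteById(String id) {
        pendingDeletes.add(id);
    }

    /** Hard commit: always flushes; refreshes the search snapshot only if openSearcher is true. */
    void hardCommit(boolean openSearcher) {
        durableDocs.removeAll(pendingDeletes);
        pendingDeletes.clear();
        if (openSearcher) {
            searcherView = new HashSet<>(durableDocs);
        }
    }

    /** Soft commit: refreshes the search snapshot without flushing pending changes. */
    void softCommit() {
        Set<String> view = new HashSet<>(durableDocs);
        view.removeAll(pendingDeletes);
        searcherView = view;
    }

    boolean isSearchable(String id) {
        return searcherView.contains(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(Set.of("doc-1", "doc-2"));
        index.deleteById("doc-1");
        index.hardCommit(false); // models autoCommit with openSearcher=false
        System.out.println(index.isSearchable("doc-1")); // true: still returned by searches
        index.hardCommit(true);  // models an explicit commit that opens a searcher
        System.out.println(index.isSearchable("doc-1")); // false
    }
}
```

In this model the deleted document keeps showing up after any number of hardCommit(false) calls, which mirrors the behaviour described in the question.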

Understanding commitWithinMs

The commitWithinMs parameter specifies that the changes (in your case, deletions) should be committed to the index within the given number of milliseconds. When you call deleteById(collectionName, ids, 1000), you are asking Solr to commit these deletions within 1000 milliseconds (1 second) of receiving them. This tells Solr to make the changes visible soon, without forcing an immediate commit on every request.

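Note that commitWithinMs is an upper-bound target rather than an immediate commit: Solr schedules a commit to happen within roughly the given time, and the exact moment can vary with server load. By default, the commit triggered by commitWithin is a soft commit, which makes changes visible without flushing them to disk; this default can be changed in solrconfig.xml.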

Auto-Commit Feature

Solr's auto-commit feature commits changes automatically once certain conditions are met, such as a maximum time since the first uncommitted update (maxTime) or a number of pending updates (maxDocs). In your solrconfig.xml, a hard auto-commit triggers after 10 minutes (600000 milliseconds) or after 100,000 pending updates, whichever comes first. Because openSearcher is false, these commits flush changes to disk but do not open a new searcher, so they do not by themselves make deletions visible to searches. Auto-commit still helps performance by batching many changes into a single commit; for visibility without manual commits you would also need autoSoftCommit, or openSearcher set to true.
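Note that the autoCommit block in the question sets openSearcher to false, so those hard commits do not refresh what searches return. One common way to get visibility without explicit commits is a soft auto-commit next to the existing hard one. The fragment below is only a sketch: autoSoftCommit is a standard solrconfig.xml element placed alongside autoCommit, but the 5000 ms interval is an assumed value for illustration, not something taken from the question.

```xml
<!-- Sketch only: the 5000 ms interval is an assumption; tune it for your workload -->
<autoSoftCommit>
    <maxTime>5000</maxTime>
</autoSoftCommit>
```

A shorter interval makes deletions visible sooner at the cost of opening searchers more often.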

Why Explicit Commits are Still Needed

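Even with commitWithinMs and auto-commit configured, there are scenarios where you might want to commit explicitly. For instance, if certain changes must be searchable immediately, waiting for the next commit cycle might not be acceptable. In your case there is an additional factor: your autoCommit block sets openSearcher to false, so the 10-minute hard commits never open a new searcher, and deletions sent with deleteById(collectionName, ids) stay invisible to searches until some other commit opens one. An explicit commit(collectionName) opens a new searcher by default, which is why the deletions show up as soon as you call it.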

Explicitly committing after deletions ensures that the changes are made visible immediately, but it should be used judiciously to avoid performance issues.
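One way to keep explicit commits cheap is to send all deletions first and commit once at the end, rather than committing after every call. The sketch below shows that pattern. DeleteTarget and BatchDeleter are hypothetical names introduced for illustration; the interface only mirrors the shape of the two calls already used in the question, deleteById(collectionName, ids) and commit(collectionName), so that the logic can run without a Solr server. It assumes a real adapter would wrap checked exceptions in RuntimeException, as the question's code does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two client calls used in the question. */
interface DeleteTarget {
    void deleteById(String collection, List<String> ids);
    void commit(String collection);
}

/** Sketch: send deletes in batches, then commit a single time. */
final class BatchDeleter {
    private BatchDeleter() {}

    /** Returns the number of delete batches sent; commits once if anything was sent. */
    static int deleteThenCommitOnce(DeleteTarget target, String collection,
                                    List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            // Copy the slice so the target never sees a view of the caller's list.
            target.deleteById(collection, new ArrayList<>(ids.subList(from, to)));
            batches++;
        }
        if (batches > 0) {
            target.commit(collection); // one commit for all batches
        }
        return batches;
    }
}
```

With SolrJ, an adapter implementing DeleteTarget would delegate to solrClient().deleteById(collectionName, ids) and solrClient().commit(collectionName) and rethrow SolrServerException and IOException as unchecked exceptions.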

Recommendations

  1. Use commitWithinMs Judiciously: While specifying a commitWithinMs value can help ensure that your deletions are committed in a timely manner, relying solely on this without understanding the implications on performance can be problematic. It's a helpful parameter for operations where you have flexibility on exactly when the changes become visible but want to suggest a timeframe.

  2. Understand Your Application's Requirements: If immediate visibility of changes is crucial for your application, then following up deletions with an explicit commit, as you've found to work, is necessary. However, if your application can tolerate a short delay, relying on commitWithinMs or a soft auto-commit instead of committing after every request could improve overall performance.

  3. Tune Auto-Commit Settings: Consider your application's specific needs and adjust the auto-commit settings in solrconfig.xml accordingly. If your update volume is high and updates are frequent, you might want to adjust maxTime and maxDocs, and decide whether visibility should come from openSearcher or from a separate autoSoftCommit block, to get a good balance between visibility of changes and performance.

  4. Monitoring Performance: Pay attention to how these settings impact your Solr cluster's performance. Adjusting these settings might require some trial and error to find the optimal balance for your specific use case.

In conclusion, the use of commitWithinMs and explicit commits should be tailored to the needs of your application, keeping in mind the trade-offs between immediate data visibility and the performance impact of frequent commits.
