Drupal
April 5, 2024

Managing large record sizes in Drupal and Algolia Integration


Imagine a large homepage on our site. When the text from its different fields is combined into 'aggregated_text_field', the total character count exceeds 10,000. There is then a good chance that the homepage won’t be indexed in Algolia, because the free plan has a 10 KB size limit per record. Even if you switch to a paid plan, it's not a good idea to keep so much text in one record. Algolia recommends splitting large records into smaller ones.
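
To make the limit concrete, here is a minimal sketch of a size check we could run on a record before indexing. It assumes that a record's size is roughly the UTF-8 byte length of its JSON form and that 10 KB means 10,000 bytes. Both are simplifications, and `MAX_RECORD_BYTES`, `recordSizeInBytes` and `needsSplitting` are names made up for this example.

```javascript
// Assumption: 10 KB is treated as 10,000 bytes; check your plan's exact limit.
const MAX_RECORD_BYTES = 10000;

// Approximate a record's size as the UTF-8 byte length of its JSON form.
function recordSizeInBytes(record) {
  return new TextEncoder().encode(JSON.stringify(record)).length;
}

// True when the record is larger than the given limit.
function needsSplitting(record, maxBytes = MAX_RECORD_BYTES) {
  return recordSizeInBytes(record) > maxBytes;
}

// A homepage-like record whose aggregated text alone is 12,000 characters.
const homepage = {
  objectID: 'entity:node/1:en', // hypothetical ID
  aggregated_text_field: 'a'.repeat(12000),
};
console.log(needsSplitting(homepage)); // true
```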

Thankfully, this is already being discussed in the issue queue of the Search API Algolia module, and a patch is available. Let’s apply the patch first and see what it does.

Before we start:

  • For those new to this series, reading both part 1 and part 2 is mandatory. Part 1 discusses important concepts related to Algolia and Algolia Drupal integration. Part 2 explains the different approaches for structuring the content from the Drupal backend before indexing.
  • Make sure you have the same Search API index configuration (same fields and machine names) as created in part 1 and part 2.
  • Make sure that you have indexed all content from the ‘demo_umami’ profile with those configurations.
  • Make sure that you have completed all the necessary Algolia dashboard configurations described in parts 1 and 2.
  • Make sure you have a copy of the code from this repository.

The patch we applied adds a Search API processor that splits large records. To configure the processor:

  • After applying the patch, go to ‘admin/config/search/search-api/index/demo_umami/processors’.
  • You will see a new processor named “Algolia item splitter”.
[Image: Algolia item splitter]

  • Enable the processor. Scroll down to its configuration section and enable it only for the “Aggregated text field” field. Set ‘Maximum characters’ to ‘500’.
[Image: configuration for Algolia aggregated text field]
  • The processor checks the text in ‘aggregated_text_field’.
  • If the text is longer than 500 characters, a new record is created and the remaining text is stored in it, again with the 500-character limit. Large records are thus split into multiple smaller records.
  • The patch also adds a new attribute, “parent_record”, to all records.
  • The original record has ‘self’ as the value of this field, and split records have ‘node_id:language_code’ as the value.
  • We can use this field to distinguish original records from split records.
  • Reindex the content after applying the patch and check the Algolia dashboard. You will notice that the record count has increased because records were split.
[Image: records splitting on Algolia dashboard]
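
The splitting behavior described above can be sketched in a few lines. This is not the patch's code (the patch is a PHP processor); it is only an illustration of the idea in JavaScript. The function name `splitRecord` and the objectID scheme for splits are assumptions made for this example, while ‘parent_record’, ‘self’ and the ‘node_id:language_code’ value come from the description above.

```javascript
// Split one record into several records whose `field` holds at most maxChars characters.
// parentRef is the value stored in `parent_record` on the splits, e.g. '12:en'.
function splitRecord(record, field, maxChars, parentRef) {
  const text = record[field] || '';
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  if (chunks.length === 0) chunks.push(''); // always keep the original record

  return chunks.map((chunk, index) => ({
    ...record, // every split keeps the shared attributes, such as search_api_id
    [field]: chunk,
    // Hypothetical objectID scheme for splits; the patch may name them differently.
    objectID: index === 0 ? record.objectID : `${record.objectID}:split_${index}`,
    parent_record: index === 0 ? 'self' : parentRef,
  }));
}

const article = {
  objectID: 'entity:node/12:en',
  search_api_id: 'entity:node/12:en',
  aggregated_text_field: 'x'.repeat(1200),
};
const parts = splitRecord(article, 'aggregated_text_field', 500, '12:en');
console.log(parts.map((p) => p.aggregated_text_field.length)); // [ 500, 500, 200 ]
```

A real splitter would probably cut on word or sentence boundaries rather than at a fixed character offset, so that search terms are not broken in the middle.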

The consequence of splitting records

By splitting the records, we have created multiple records for the same content. If you visit the search page now, you will notice that the results are duplicated.

[Image: duplicate search results]
  • Each duplicated item is a split of the original record, but we want to show only one result per node on the search page.
  • To fix the duplication, we should configure ‘attributeForDistinct’ in the Algolia dashboard.
  • Assume that a node was split into 5 records. All 5 records should share an attribute with the same value so that Algolia understands they are splits of the same record.
  • The Search API module adds a ‘search_api_id’ attribute to all records. All records of a node (the original and its splits) have the same value in the ‘search_api_id’ field. Setting this field as ‘attributeForDistinct’ lets Algolia display only one item from all those records.
  • We can also use ‘Title’ as ‘attributeForDistinct’, as long as no two nodes share the same title.
  • Go to “Deduplication and Grouping” in the Algolia dashboard.
  • Set ‘Distinct’ to true.
  • Add “search_api_id” as “Attribute for Distinct”.
  • Save.
[Image: Deduplication and Grouping in the Algolia dashboard]
  • Go to the Algolia dashboard and search for “World Chocolate Day”.
  • You will see only one result, with objectID ‘entity:node/12:en’ (12 is the nid of the article and en is the language code). This is the original record of the article “Dairy-free and delicious milk chocolate”.
  • The article “Dairy-free and delicious milk chocolate” also briefly mentions “almonds and hazelnuts” in the final paragraph.
  • So search for ‘almonds and hazelnuts’ in the dashboard. You will still see only one record in the results, and this time it will be a split of the original record.
[Image: split of the original record in dashboard]

[Image: dashboard result record n split]
  • Visit the search page we created and search for the same keywords.
  • You will see the same behavior: there is no duplication, and only one record per node is displayed in the results. Yay! We fixed the duplication.
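
To see why the results collapse to one item per node, here is a small local simulation of what distinct does: for each value of the distinct attribute, only the best-ranked hit is kept. The settings object uses the two real setting names behind the dashboard form, ‘attributeForDistinct’ and ‘distinct’. The `dedupeHits` function and the sample hits are made up for illustration and only approximate the engine's behavior.

```javascript
// The two index settings that the dashboard steps above configure.
const distinctSettings = { attributeForDistinct: 'search_api_id', distinct: true };

// Keep only the first (best-ranked) hit for each value of `attribute`.
// Hits that lack the attribute are simply kept in this sketch.
function dedupeHits(rankedHits, attribute) {
  const seen = new Set();
  return rankedHits.filter((hit) => {
    const key = hit[attribute];
    if (key === undefined) return true;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Hypothetical ranked hits: a split ranks first because it contains the query text.
const rankedHits = [
  { objectID: 'entity:node/12:en:split_2', search_api_id: 'entity:node/12:en' },
  { objectID: 'entity:node/12:en', search_api_id: 'entity:node/12:en' },
  { objectID: 'entity:node/7:en', search_api_id: 'entity:node/7:en' },
];
const visibleHits = dedupeHits(rankedHits, distinctSettings.attributeForDistinct);
console.log(visibleHits.map((hit) => hit.objectID));
```

This mirrors the ‘almonds and hazelnuts’ search above: the surviving hit for a node can be a split rather than the original record, because the split is the one that matched best.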

A few more essential dashboard configurations

You have to do another crucial configuration in the dashboard. Since records are split, deleting a node should also delete all the records associated with that node, including the split records. The same should happen when a node is updated. The patch we applied handles this, but for it to work the ‘parent_record’ field must be configured as a filter in the Algolia dashboard.

[Image: essential dashboard configurations]

Now, if a node is updated or deleted, its splits are also updated or deleted 🎉.
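
Why does ‘parent_record’ need to be a filter? A plausible reading is that the splits of a node are found by filtering on that attribute, and an attribute can only be used in filters once it is declared for faceting. The sketch below is a guess at that shape, not the patch's actual code. `splitsFilter` and `removeNodeRecords` are hypothetical helpers, and the second one is a local stand-in for a delete-by-filter request.

```javascript
// Assumption: declaring the attribute as filter-only is enough for this use case.
const filterSettings = { attributesForFaceting: ['filterOnly(parent_record)'] };

// Build a filter expression that matches every split of one node.
// The value is quoted because it contains a colon.
function splitsFilter(parentRef) {
  return `parent_record:"${parentRef}"`;
}

// Local stand-in for the cleanup: drop the original record and all of its splits.
function removeNodeRecords(records, originalObjectID, parentRef) {
  return records.filter(
    (record) => record.objectID !== originalObjectID && record.parent_record !== parentRef
  );
}

console.log(splitsFilter('12:en')); // parent_record:"12:en"
```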

We have to make one more change. Visit the search page we created and take a look at the results.

[Image: node and splits updation and deletion]

The description is cropped in each result card because Algolia returns split records even when there is no search term. This can be fixed by updating the JavaScript to add a filter ‘parent_record = self’ when there is no search query.

  • Remove the configure widget we added earlier in this series and add the following code.
[Image: updated code]

  • The change is self-explanatory. If there is no search query, a filter that shows only parent records is added along with the language filter.
  • Visit the search page again.
[Image: search page]
  • The search page now shows only the original records when there is no search query.
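
For readers following along without the repository, the logic described above might look roughly like this. It is a sketch under assumptions, not the code from the series: the language attribute name `langcode` and both function names are hypothetical, and we assume the InstantSearch.js `searchFunction` option, which receives a helper object, is used to adjust the request.

```javascript
// Build the `filters` string for a search request.
// Assumption: `langcode` is the language attribute; use the name from your own index.
function buildFilters(query, language) {
  const languageFilter = `langcode:${language}`;
  const hasQuery = typeof query === 'string' && query.trim() !== '';
  // Without a query, restrict results to original records only.
  return hasQuery ? languageFilter : `${languageFilter} AND parent_record:self`;
}

// Returns a function with the shape of an InstantSearch.js `searchFunction`:
// it sets the filters on the helper and then triggers the search.
function makeSearchFunction(language) {
  return (helper) => {
    helper
      .setQueryParameter('filters', buildFilters(helper.state.query, language))
      .search();
  };
}

console.log(buildFilters('', 'en')); // langcode:en AND parent_record:self
console.log(buildFilters('almonds', 'en')); // langcode:en
```

The returned function would be passed as the `searchFunction` option when creating the InstantSearch instance. Keeping `buildFilters` separate makes the rule easy to test without a browser.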

We have now successfully built a fast and responsive search experience for the Umami profile using Algolia. Let’s recap what we have learned in this blog series.

  • Explored fundamental concepts and dashboard configurations of Algolia.
  • Understood multiple approaches for structuring the site’s content before indexing.
  • Built custom Search API processors for optimizing the content.
  • Designed an intuitive and ‘language aware’ search interface using instantsearch.js.
  • Understood how to create custom widgets using instantsearch.js.
  • Learned about splitting records for content rich websites.
  • Understood the concepts related to deduplication and grouping in Algolia.

As mentioned earlier in part 1 of this series, ‘search’ has become an integral part of all modern websites. Whenever users see a search bar on your website, they expect it to be as intelligent as modern search engines. Algolia delivers this with its AI-powered capabilities and flexible APIs. I hope this blog series has given you a clear path to follow for integrating Algolia and Drupal. The complete code is available in the git repository.

Written by
Editor