This document will present the big picture of how data is indexed and searched in SES.
SES replicates the Shopware bundle structure to a certain degree:
- ESIndexingBundle: Services for indexing data
- SearchBundle: General components which are not necessarily bound to ElasticSearch, such as Facet and FacetResult objects
- SearchBundleES: Components specific to ES, such as FacetHandler
Services specific to SES can be found in EnterpriseSearchBundle:
- AlternativeTerm: Performs an additional ES search in order to find alternative search terms for the given search term
- Dictionary: Provides language-specific dictionaries used during indexing
- Explain: Enables the ES "Explain" functionality for the preview search in Shopware's backend
- HistoryBoosting: Defines the amount of boost for certain fields
- ImportExport: Import/export functionality for settings
- SearchConfig: Representation of the backend search configuration (e.g. relevance, boosting…)
- Session: Wrapper for the Shopware session that is easier to inject and test
- SynonymSearch: Performs searches for synonyms
Indexing describes the process of making data from Shopware available in ElasticSearch and keeping it up to date.
SES adds various content pages to the Shopware search. For that reason, it provides additional indexers for blogs,
shopping worlds, categories, static pages, manufacturers and synonyms. All of these services can be found in
SwagEnterpriseSearch\Bundle\ESIndexingBundle. The main entry point of each of these components is the so-called DataIndexer,
which is registered in the DI container with the tag shopware_elastic_search.data_indexer. It either indexes all entities
of a given type (e.g. blogs) in its populate method, for full updates, or only specific entities of that type in its
index method, for partial updates.
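The split between full and partial updates can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: BlogDataIndexerSketch, its in-memory data and its method signatures are hypothetical and do not reproduce the real SES or Shopware interfaces; only the roles of populate, index and createQuery follow the description above.

```php
<?php
// Hypothetical sketch, NOT the real SES/Shopware DataIndexer interface:
// blogs live in an in-memory array and "indexing" copies them into $indexed.
final class BlogDataIndexerSketch
{
    /** @var array<int, array<string, mixed>> blog rows keyed by ID */
    private array $blogs;

    /** @var array<int, array<string, mixed>> documents that would be sent to ElasticSearch */
    public array $indexed = [];

    public function __construct(array $blogs)
    {
        $this->blogs = $blogs;
    }

    // Full update: read every affected ID, then index all of them.
    public function populate(): void
    {
        $this->index($this->createQuery());
    }

    // Partial update: index only the given IDs; unknown IDs are skipped in this sketch.
    public function index(array $ids): void
    {
        foreach ($ids as $id) {
            if (isset($this->blogs[$id])) {
                $this->indexed[$id] = $this->blogs[$id];
            }
        }
    }

    // Reads all IDs affected by a full index.
    private function createQuery(): array
    {
        return array_keys($this->blogs);
    }
}
```

In this sketch, populate simply delegates to index with every ID, so full and partial updates share one code path.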
Usually, every DataIndexer has a method called createQuery, which reads all affected IDs for the full index. The
Provider service is then used to read the actual data for each entity, e.g. "name", "author" and "content" for a blog.
At this point, every Provider needs to make sure that all information relevant to the frontend is indexed into ElasticSearch,
so that no additional queries are needed in the frontend to fetch e.g. URLs, images etc.
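A Provider's job of producing frontend-ready documents might look like the following sketch. The class name, the row layout and the /blog/{id} URL scheme are hypothetical. The point it illustrates is that URLs and image paths are computed at indexing time and stored in the document, so the frontend can render hits without extra queries.

```php
<?php
// Hypothetical Provider sketch: row layout and URL scheme are assumptions.
// Each raw blog row becomes a self-contained ElasticSearch document.
final class BlogProviderSketch
{
    private string $baseUrl;

    public function __construct(string $baseUrl)
    {
        $this->baseUrl = rtrim($baseUrl, '/');
    }

    /** @return array<int, array<string, mixed>> documents keyed by ID */
    public function get(array $rows): array
    {
        $documents = [];
        foreach ($rows as $row) {
            $documents[$row['id']] = [
                'id' => $row['id'],
                'name' => $row['title'],
                'author' => $row['author'],
                // Strip markup so only searchable text ends up in the index.
                'content' => strip_tags($row['content']),
                // Precomputed frontend data: no lookups needed at search time.
                'url' => $this->baseUrl . '/blog/' . $row['id'],
                'image' => $row['image'],
            ];
        }
        return $documents;
    }
}
```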
Every component also has a Mapping service, registered with the DI tag shopware_elastic_search.mapping. It provides
the ElasticSearch mapping data, such as "id is an integer" or "description is an English text field".
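The two example rules above translate into an ElasticSearch mapping roughly as sketched below. The class name and field list are hypothetical; the array structure follows the ElasticSearch mapping format ("properties" with a "type" per field) and assumes ElasticSearch 5 or later, where full-text fields use the "text" type. "english" is one of ElasticSearch's built-in language analyzers.

```php
<?php
// Hypothetical Mapping sketch for blog documents (assumes ElasticSearch 5+ syntax).
final class BlogMappingSketch
{
    public function get(): array
    {
        return [
            'properties' => [
                // "id is an integer"
                'id' => ['type' => 'integer'],
                // Full-text fields analyzed with the built-in English analyzer
                // (stemming, English stop words).
                'name' => ['type' => 'text', 'analyzer' => 'english'],
                'description' => ['type' => 'text', 'analyzer' => 'english'],
            ],
        ];
    }
}
```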
The so-called SuggestionBuilder of each component is responsible for providing the search terms each entity will match
and for providing the suggestions shown as "search term suggestions" in the ajax search. Usually, the SuggestionBuilder
makes use of the SuggestionStringExploder service, which splits compound words into individual words based
on dictionaries.
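Dictionary-based decompounding can be illustrated with the following sketch. It is not the SuggestionStringExploder implementation, which this document does not show: it is a simple recursive search that tries the shortest dictionary prefix first and backtracks until the whole word is covered by dictionary words. The class name and the sample dictionary are hypothetical.

```php
<?php
// Illustrative dictionary-based compound splitter (NOT the SES algorithm).
// Byte-based string functions are used: fine for exact dictionary lookups,
// but strtolower() only lowercases ASCII letters.
final class CompoundSplitterSketch
{
    /** @var array<string, true> lowercased dictionary words */
    private array $dictionary;

    public function __construct(array $words)
    {
        $this->dictionary = array_fill_keys(array_map('strtolower', $words), true);
    }

    // Returns the word plus its parts, or just the word if it cannot be fully
    // split into at least two dictionary words.
    public function explode(string $word): array
    {
        $word = strtolower($word);
        $parts = $this->split($word);
        if ($parts === null || count($parts) < 2) {
            return [$word];
        }
        return array_values(array_unique(array_merge([$word], $parts)));
    }

    // Recursively splits $rest into dictionary words; returns null if impossible.
    private function split(string $rest): ?array
    {
        if ($rest === '') {
            return [];
        }
        for ($i = 1, $length = strlen($rest); $i <= $length; $i++) {
            $head = substr($rest, 0, $i);
            if (!isset($this->dictionary[$head])) {
                continue;
            }
            $tail = $this->split(substr($rest, $i));
            if ($tail !== null) {
                return array_merge([$head], $tail);
            }
        }
        return null;
    }
}
```

For example, with the dictionary sonne, sonnen and brille, "Sonnenbrille" yields sonnenbrille, sonnen and brille: the prefix "sonne" is tried first but leaves the unsplittable rest "nbrille", so the search backtracks to "sonnen". The backtracking is exponential in the worst case, so a production implementation would likely need memoization.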
By default, Shopware provides commands such as sw:es:index:populate (full index) and sw:es:backlog:sync (partial update,
usually run by a cronjob). These automatically apply to SES as well. Full indexes are handled by the populate method
of the DataIndexer services; partial updates are handled by the synchronize method of the Synchronizer services.
Shopware passes all current indexing backlog entries to these services, and each service extracts the IDs of the entries
it is responsible for. The BlogSynchronizer, for example, only handles backlog entries of the type blog. The
extracted IDs are then passed to the index method of the DataIndexer.
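The filtering step of a Synchronizer can be sketched as follows. Backlog entries are modeled as plain arrays with an "event" type and a "payload" holding the entity ID. This layout, the class name and the callback that stands in for the DataIndexer's index method are all assumptions, not the real Shopware backlog objects or SES signatures.

```php
<?php
// Hypothetical Synchronizer sketch: backlog layout and callback are assumptions.
final class BlogSynchronizerSketch
{
    private const HANDLED_TYPE = 'blog';

    /** @var callable receives the extracted IDs (stands in for DataIndexer::index) */
    private $indexCallback;

    public function __construct(callable $indexCallback)
    {
        $this->indexCallback = $indexCallback;
    }

    public function synchronize(array $backlogs): void
    {
        $ids = [];
        foreach ($backlogs as $backlog) {
            // Ignore entries that belong to other content types.
            if ($backlog['event'] === self::HANDLED_TYPE) {
                $ids[] = $backlog['payload']['id'];
            }
        }
        // The same entity may have changed several times; index it once.
        $ids = array_values(array_unique($ids));
        if ($ids !== []) {
            ($this->indexCallback)($ids);
        }
    }
}
```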
In order to recognize which entities need to be re-indexed, the class SwagEnterpriseSearch\Subscriber\ORMBacklogSubscriber
subscribes to all lifecycle events of the handled content types (blogs, categories, etc.) and writes the corresponding
backlog entries for them.
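The subscriber's behavior can be approximated by the sketch below. It assumes the subscriber reacts to Doctrine's post-write lifecycle events (postPersist, postUpdate, postRemove); the content type keys, the method signature and the in-memory backlog are hypothetical, since the real subscriber hooks into the ORM rather than receiving plain strings.

```php
<?php
// Hypothetical subscriber sketch: event names follow Doctrine's lifecycle events,
// everything else (type keys, signature, in-memory backlog) is an assumption.
final class BacklogSubscriberSketch
{
    private const HANDLED_EVENTS = ['postPersist', 'postUpdate', 'postRemove'];
    private const HANDLED_TYPES = ['blog', 'category', 'shopping_world', 'static_page', 'manufacturer'];

    /** @var array<int, array{event: string, payload: array{id: int}}> */
    public array $backlog = [];

    public function onLifecycleEvent(string $eventName, string $entityType, int $id): void
    {
        if (!in_array($eventName, self::HANDLED_EVENTS, true)) {
            return;
        }
        if (!in_array($entityType, self::HANDLED_TYPES, true)) {
            return;
        }
        // Record the change; the next sync re-indexes this entity.
        $this->backlog[] = ['event' => $entityType, 'payload' => ['id' => $id]];
    }
}
```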
If immediate indexing is enabled, SES indexes changed entities right away instead of waiting for the cronjob to run.
It is enabled with the immediate_index option in the es section of Shopware's config.php:
<?php
return [
    'db' => [
        // ...
    ],
    'es' => [
        'immediate_index' => true,
        // ...
    ],
];
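The effect of the flag can be sketched as a simple branch: read immediate_index from a config array shaped like the one shown above, then either index right away or queue a backlog entry for the cronjob. The function name and both callbacks are hypothetical stand-ins for SES services. Whether SES additionally writes a backlog entry when indexing immediately is not stated here, so the sketch does not assume it.

```php
<?php
// Hypothetical sketch of the immediate_index decision; callbacks are assumptions.
function handleEntityChange(array $config, int $id, callable $indexNow, callable $writeBacklog): string
{
    // A missing key falls back to the cronjob-driven backlog.
    $immediate = (bool) ($config['es']['immediate_index'] ?? false);
    if ($immediate) {
        $indexNow([$id]);
        return 'indexed';
    }
    $writeBacklog($id);
    return 'queued';
}
```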
The following section describes how SES extends Shopware to make the content search available and how SES applies the search configuration.
All additional information provided by SES (such as suggestions and content pages) is added by SwagEnterpriseSearch\Bundle\SearchBundle\SuggestionFacet
and its handler SwagEnterpriseSearch\Bundle\SearchBundleES\SuggestionHandler. The handler adds
the corresponding suggestion queries to the main ElasticSearch query in its handle method and hydrates the
results into SwagEnterpriseSearch\Bundle\SearchBundle\SuggestionFacetResult. As a result, SuggestionFacetResult
contains all non-product search results, such as search suggestions, blogs, manufacturers, categories, shopping worlds
and static pages.
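The hydration step can be pictured as grouping raw hits by content type, as in the sketch below. The "_source" key matches the shape of an ElasticSearch hit, but the "type" field inside it and the SuggestionResultSketch class are assumptions and do not reproduce SuggestionFacetResult.

```php
<?php
// Hypothetical hydration sketch: hit layout and result class are assumptions.
final class SuggestionResultSketch
{
    /** @var array<string, array<int, array<string, mixed>>> documents grouped by type */
    public array $groups = [];

    public static function hydrate(array $hits): self
    {
        $result = new self();
        foreach ($hits as $hit) {
            $source = $hit['_source'];
            // One bucket per content type: blog, manufacturer, category, ...
            $result->groups[$source['type']][] = $source;
        }
        return $result;
    }

    public function get(string $type): array
    {
        return $this->groups[$type] ?? [];
    }
}
```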
The backend configuration of the search (such as relevance fields and boostings) is applied by SwagEnterpriseSearch\Bundle\SearchBundleES\SearchQueryBuilder.
This service decorates the default shopware_search_es.search_term_query_builder service and also adds the "auto suggest"
and "history boosting" functionality.
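The decorator pattern behind SearchQueryBuilder can be sketched as follows. The interface, class names and boost values are hypothetical; only the query structure follows the ElasticSearch query DSL (bool, multi_match, and match with a "boost" parameter). The decorator keeps the inner query as a required clause and adds optional, boosted clauses that only influence scoring.

```php
<?php
// Hypothetical decorator sketch; names and boosts are assumptions.
interface TermQueryBuilderSketch
{
    public function buildQuery(string $term): array;
}

// Stands in for the default search term query builder.
final class DefaultTermQueryBuilderSketch implements TermQueryBuilderSketch
{
    public function buildQuery(string $term): array
    {
        return ['multi_match' => ['query' => $term, 'fields' => ['name', 'keywords']]];
    }
}

// Decorator: wraps the inner builder and adds boosts from a backend-like config.
final class BoostingTermQueryBuilderSketch implements TermQueryBuilderSketch
{
    private TermQueryBuilderSketch $inner;
    /** @var array<string, float> field => boost */
    private array $boosts;

    public function __construct(TermQueryBuilderSketch $inner, array $boosts)
    {
        $this->inner = $inner;
        $this->boosts = $boosts;
    }

    public function buildQuery(string $term): array
    {
        $should = [];
        foreach ($this->boosts as $field => $boost) {
            $should[] = ['match' => [$field => ['query' => $term, 'boost' => $boost]]];
        }
        return ['bool' => [
            // The original query still decides which documents match...
            'must' => [$this->inner->buildQuery($term)],
            // ...while the optional clauses only raise the score of better hits.
            'should' => $should,
        ]];
    }
}
```

In a Symfony DI container, such a decorator is typically registered with the decorates attribute pointing at the original service ID, so every consumer of the original service transparently receives the decorated one.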
Roughly speaking, the SuggestionFacet and its handler are responsible for the suggest search (including content suggestions),
while the SearchQueryBuilder extends the default product search with the SES features and configuration.
The main entry point of the ajax search is SwagEnterpriseSearch/Controllers/Widgets/Suggest.php. It adds the SuggestionFacet
to the Criteria object and then triggers a search using Shopware\Bundle\SearchBundle\ProductNumberSearchInterface::search.
The rest is handled by SearchQueryBuilder and SuggestionHandler as described above.
Since quick responses are key for "search as you type" functionality, the ajax search disables the template engine and prints
a JSON representation of the search result directly. The actual rendering of the results in the search overlay is performed
by the JavaScript stack.
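The ajax flow (add the facet, search, emit JSON without templates) is sketched below. CriteriaSketch, the "suggestion" facet marker and the search callback are hypothetical stand-ins for Shopware's Criteria, the SuggestionFacet and ProductNumberSearchInterface::search. The function returns the JSON string instead of printing it so that it is easy to test.

```php
<?php
// Hypothetical ajax search sketch; all names are stand-ins, not Shopware classes.
final class CriteriaSketch
{
    public string $term;
    /** @var string[] */
    public array $facets = [];

    public function __construct(string $term)
    {
        $this->term = $term;
    }
}

function suggestAction(string $term, callable $search): string
{
    $criteria = new CriteriaSketch($term);
    $criteria->facets[] = 'suggestion'; // stands in for adding the SuggestionFacet
    $result = $search($criteria);       // stands in for ProductNumberSearchInterface::search
    // No template rendering: encode directly, the browser-side JavaScript renders it.
    return json_encode($result, JSON_THROW_ON_ERROR);
}
```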
The search result page generally follows the same pattern: The SuggestionFacet is added by
SwagEnterpriseSearch\Bundle\SearchBundle\CriteriaRequestHandler, so all content hits are also available through the SuggestionFacetResult.
Additionally, it adds a SwagEnterpriseSearch\Bundle\SearchBundle\SynonymFacet to the Criteria object. The corresponding
SwagEnterpriseSearch\Bundle\SearchBundleES\SynonymHandler then adds a SwagEnterpriseSearch\Bundle\SearchBundle\SynonymFacetResult
to the result if a matching synonym group for the current search term is found. The SynonymFacetResult is then used
to display shopping worlds or product streams for the current search, if configured.
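Finding a matching synonym group can be illustrated with a case-insensitive lookup. The group layout (a "terms" list plus optional keys such as productStreamId) and the function name are assumptions for illustration. The real handler performs this lookup against ElasticSearch rather than a PHP array.

```php
<?php
// Hypothetical synonym group lookup; group layout and function name are assumptions.
function findSynonymGroup(string $term, array $groups): ?array
{
    $needle = strtolower(trim($term));
    foreach ($groups as $group) {
        foreach ($group['terms'] as $synonym) {
            if (strtolower($synonym) === $needle) {
                // First matching group wins in this sketch.
                return $group;
            }
        }
    }
    return null;
}
```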
Note, however, that SES replaces the default search controller with SwagEnterpriseSearch/Controllers/Frontend/Search.php.
This is necessary because configured SynonymGroups might replace the entire search result with a product stream or redirect the
user to another page if a RedirectURL is defined. For that reason, SES looks up a matching SynonymGroup
beforehand and only triggers the default search described above if neither a redirect nor a product stream is configured
for that SynonymGroup.
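The controller's decision can be summarized in a small function. The group keys (redirectUrl, productStreamId) and the function name are hypothetical. The text does not say which option wins if both a redirect and a product stream are configured, so this sketch assumes the redirect takes precedence.

```php
<?php
// Hypothetical decision sketch; keys and precedence (redirect first) are assumptions.
function resolveSearchAction(?array $synonymGroup): array
{
    if ($synonymGroup !== null && !empty($synonymGroup['redirectUrl'])) {
        return ['action' => 'redirect', 'target' => $synonymGroup['redirectUrl']];
    }
    if ($synonymGroup !== null && !empty($synonymGroup['productStreamId'])) {
        return ['action' => 'productStream', 'target' => $synonymGroup['productStreamId']];
    }
    // No matching group, or nothing configured: run the regular search.
    return ['action' => 'defaultSearch', 'target' => null];
}
```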
When working with ElasticSearch, some concerns are usually handled while indexing and others during the actual search. Roughly speaking, you would rather have slow indexing than a slow search. For that reason, compound words, n-grams and synonyms are dealt with while indexing. Concerns such as relevance, boosting, auto suggest and history boosting are applied while searching, as described above.
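Why n-grams belong to indexing can be seen from a small example. Each word is expanded into its edge n-grams (prefixes) once, when the document is indexed, so that a "search as you type" query for a partial word becomes a cheap exact term lookup. ElasticSearch offers this through its built-in edge_ngram token filter. The function below only mimics that idea; its name and default gram sizes are illustrative assumptions.

```php
<?php
// Illustrative edge n-gram expansion (mimics ElasticSearch's edge_ngram filter idea).
// Byte-based: intended for ASCII input in this sketch.
function edgeNgrams(string $word, int $minGram = 2, int $maxGram = 10): array
{
    $word = strtolower($word);
    $grams = [];
    $limit = min($maxGram, strlen($word));
    for ($length = $minGram; $length <= $limit; $length++) {
        // Each prefix of the allowed length becomes its own indexed term.
        $grams[] = substr($word, 0, $length);
    }
    return $grams;
}
```

For example, edgeNgrams('Shirt') produces sh, shi, shir and shirt, so a user who has typed "shi" already hits an indexed term. The cost is a larger index and slower indexing, which is the trade-off described above.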