This page describes how we are organizing our extension point framework. It is a work in progress.
To facilitate the use of Sando as a research framework, we have created a set of extension points that expose the most commonly studied sub-problems of code search: splitting, parsing, indexing, querying, and search result ranking. Each extension
point (EP) is detailed below. See
for a diagram.
Splitter EP:
The Splitter EP allows researchers interested in program identifier splitting to replace Sando’s default splitting algorithm. This allows them to investigate how to split more challenging identifiers such as FILEMANAGER and the effect of improved splitting on overall
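As a rough illustration of what a Splitter EP implementation might look like, the sketch below defines a hypothetical `IWordSplitter` contract and a dictionary-assisted splitter that can break up a same-case identifier such as FILEMANAGER. The interface name, its method signature, and the word list are assumptions made for this example, not Sando's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical contract; the real Sando splitter interface may differ.
public interface IWordSplitter
{
    IReadOnlyList<string> Split(string identifier);
}

// Splits on camelCase boundaries and underscores first, then uses a small
// dictionary to break up same-case runs such as FILEMANAGER.
public sealed class DictionarySplitter : IWordSplitter
{
    private readonly HashSet<string> _words;

    public DictionarySplitter(IEnumerable<string> knownWords)
    {
        _words = new HashSet<string>(knownWords, StringComparer.OrdinalIgnoreCase);
    }

    public IReadOnlyList<string> Split(string identifier)
    {
        var result = new List<string>();
        // Insert a space at lower-to-upper transitions, then split on separators.
        string spaced = Regex.Replace(identifier, "(?<=[a-z0-9])(?=[A-Z])", " ");
        foreach (string token in spaced.Split(new[] { ' ', '_' }, StringSplitOptions.RemoveEmptyEntries))
        {
            result.AddRange(SplitByDictionary(token));
        }
        return result;
    }

    // Greedy longest-prefix matching. If the greedy match cannot cover the
    // whole token, the token is returned unchanged.
    private IEnumerable<string> SplitByDictionary(string token)
    {
        var parts = new List<string>();
        int start = 0;
        while (start < token.Length)
        {
            int end = token.Length;
            while (end > start && !_words.Contains(token.Substring(start, end - start)))
            {
                end--;
            }
            if (end == start)
            {
                return new[] { token }; // no known word starts at this position
            }
            parts.Add(token.Substring(start, end - start));
            start = end;
        }
        return parts;
    }
}
```

With a word list containing "file" and "manager", `Split("FILEMANAGER")` yields the two parts FILE and MANAGER, while tokens that the dictionary does not cover are passed through as they are.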
Parser EP:
The Parser EP allows researchers to control how program elements, such as a method, are parsed, ultimately affecting how they are indexed and searched. Researchers interested in alternative parsing strategies (e.g., parsing using NLPA) can change the default parser to tag all identifiers with their part-of-speech, possibly leading to increased precision
This EP allows researchers to add new program elements, or variations on existing program elements, into the framework. This EP will enable, among other things, the addition of fields to existing program elements and the creation of
different families of program elements. Note that most users of this extension point will also have to implement the Parser EP in order to parse these program elements.
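One way to picture "adding fields to existing program elements" is a small class hierarchy in which a derived element contributes one extra indexed field. The base class, the `GetIndexFields` method, and the field names below are hypothetical and only serve the illustration; Sando's real program element types may be organized differently.

```csharp
using System.Collections.Generic;

// Hypothetical base type for anything that can be indexed and searched.
public abstract class ProgramElement
{
    public string Name { get; }

    protected ProgramElement(string name) { Name = name; }

    // Fields that an indexer would store for this element.
    public virtual IDictionary<string, string> GetIndexFields()
    {
        return new Dictionary<string, string> { { "Name", Name } };
    }
}

// Stand-in for an existing program element.
public class MethodElement : ProgramElement
{
    public string Body { get; }

    public MethodElement(string name, string body) : base(name) { Body = body; }

    public override IDictionary<string, string> GetIndexFields()
    {
        var fields = base.GetIndexFields();
        fields["Body"] = Body;
        return fields;
    }
}

// A variation on an existing element: a method that also carries its leading comment.
public class CommentedMethodElement : MethodElement
{
    public string Comment { get; }

    public CommentedMethodElement(string name, string body, string comment)
        : base(name, body)
    {
        Comment = comment;
    }

    public override IDictionary<string, string> GetIndexFields()
    {
        var fields = base.GetIndexFields();
        fields["Comment"] = Comment; // the newly added field
        return fields;
    }
}
```

A parser written for the Parser EP would then be responsible for producing `CommentedMethodElement` instances, which is why the two extension points usually go together.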
Query EP:
The Query EP gives users access to two interfaces: (1) query weights for each field in the indexed documents, and (2) the query string itself. By using the first interface, researchers can increase the importance of a given field in the search results, for example by weighting the method parameters more heavily than the method body. The second interface allows a researcher to rewrite the user’s query string, for instance to augment queries with part-of-speech information, possibly leading to more precise results.
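The two Query EP interfaces could be sketched roughly as follows. The interface names, the field names ("Name", "Parameters", "Body"), the weight values, and the tiny hand-written tag lexicon are all assumptions for illustration; a real rewriter would call an actual part-of-speech tagger instead of a lookup table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface 1: per-field query weights.
public interface IQueryWeightsSupplier
{
    IDictionary<string, float> GetQueryWeights();
}

// Hypothetical interface 2: access to the query string itself.
public interface IQueryRewriter
{
    string RewriteQuery(string query);
}

// Weights method parameters more heavily than the method body.
public sealed class ParameterHeavyWeights : IQueryWeightsSupplier
{
    public IDictionary<string, float> GetQueryWeights()
    {
        return new Dictionary<string, float>
        {
            { "Name", 3.0f },
            { "Parameters", 2.0f },
            { "Body", 1.0f }
        };
    }
}

// Appends a part-of-speech tag to every query term found in a toy lexicon.
public sealed class PartOfSpeechRewriter : IQueryRewriter
{
    // Stand-in for a real tagger; the entries are illustrative only.
    private static readonly Dictionary<string, string> Tags =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "open", "VB" }, { "close", "VB" }, { "file", "NN" }, { "manager", "NN" }
        };

    public string RewriteQuery(string query)
    {
        var terms = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Terms without a known tag are left untouched.
        return string.Join(" ", terms.Select(t => Tags.TryGetValue(t, out string tag) ? t + "/" + tag : t));
    }
}
```

Keeping the two concerns in separate interfaces means a researcher can experiment with field weights without touching query rewriting, and vice versa.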
Search Engine EP:
The Search Engine EP allows researchers to re-order the top 100 text-based search results according to any arbitrary scheme. For example, a researcher could re-rank results according to how connected they are in the call graph, or implement Posynov’s activation spreading approach.
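A minimal sketch of the call-graph idea, assuming a hypothetical `IResultsReorderer` contract and a call graph supplied as a caller-to-callees map: each result is ranked by how many call edges connect it to the other results, and ties keep their original text-based order. None of these type names come from Sando itself.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: a program element name plus its text-based score.
public sealed class SearchResult
{
    public string Name { get; }
    public double Score { get; }

    public SearchResult(string name, double score) { Name = name; Score = score; }
}

// Hypothetical contract for re-ordering the text-based results.
public interface IResultsReorderer
{
    IList<SearchResult> Reorder(IList<SearchResult> textResults);
}

public sealed class CallGraphReorderer : IResultsReorderer
{
    private readonly IDictionary<string, ISet<string>> _calls; // caller -> callees

    public CallGraphReorderer(IDictionary<string, ISet<string>> calls) { _calls = calls; }

    public IList<SearchResult> Reorder(IList<SearchResult> textResults)
    {
        var names = new HashSet<string>(textResults.Select(r => r.Name));
        // OrderByDescending is a stable sort, so ties keep their text-based order.
        return textResults.OrderByDescending(r => Connections(r.Name, names)).ToList();
    }

    // Counts call edges, in either direction, between this result and other results.
    private int Connections(string name, HashSet<string> names)
    {
        int outgoing = _calls.TryGetValue(name, out var callees)
            ? callees.Count(c => c != name && names.Contains(c))
            : 0;
        int incoming = _calls.Count(kv =>
            kv.Key != name && names.Contains(kv.Key) && kv.Value.Contains(name));
        return outgoing + incoming;
    }
}
```

Given text results A, B, C and a single call edge from B to C, the re-ordered list is B, C, A: B and C each have one connection inside the result set, and A has none.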
- Create a new project, SandoExtentionContracts, which will contain the interfaces and base classes (contracts) for all the classes that we want to open to extension.
- Create the interfaces in the new project (or move existing ones, such as ParserInterface, into it).
- Create an ExtensionPointsRepository class, which will contain methods for registering extension point implementations (such as RegisterParserImplementation) and for retrieving them. This class should also ensure that the default implementation is used when no custom one is provided. The class will be a singleton.
- During initialization, call the registration method for every extension-enabled type defined in our plugin code.
- Create an XML extension configuration file with a simple Interface -> Implementation mapping, something like:
<contract name="Sando.ExtensionContracts.ParserInterface" supportedFileExtensions=".h, .cpp">
<implementation name="Sando.Parser.CustomParser" libraryFile="CustomParser.dll"/>
</contract>
- Implement reading of this file in ExtensionPointsRepository.
- For every entry found in the configuration file, load the assembly from the DLL (Assembly.LoadFile method), look for all the known types (Assembly.GetTypes method), and call RegisterImplementation with the appropriate parameters.
- At this point we have a working plugin with configuration, but without any third-party components.
Adding custom extensions
- Download the SandoExtentionContracts library and reference it in your project.
- Create a custom implementation of the extension contract(s) you want to replace.
- Place your DLL inside the pluginDirectoryPath directory.
- Change the configuration file to use your custom implementation of the given interface instead of the default one.
- The ExtensionPointsRepository should do the rest.
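To make the steps above more concrete, here is one possible shape for the ExtensionPointsRepository: a singleton that falls back to default implementations and can register custom ones from the XML configuration. The method signatures, the generic `Get<TContract>` accessor, and the assumption that every custom implementation has a public parameterless constructor are choices made for this sketch, not a description of the finished class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

public sealed class ExtensionPointsRepository
{
    private static readonly Lazy<ExtensionPointsRepository> LazyInstance =
        new Lazy<ExtensionPointsRepository>(() => new ExtensionPointsRepository());

    // Single shared instance (singleton).
    public static ExtensionPointsRepository Instance => LazyInstance.Value;

    private readonly Dictionary<Type, object> _defaults = new Dictionary<Type, object>();
    private readonly Dictionary<Type, object> _custom = new Dictionary<Type, object>();

    private ExtensionPointsRepository() { }

    // Called by the plugin itself during initialization.
    public void RegisterDefault<TContract>(TContract implementation) where TContract : class
    {
        _defaults[typeof(TContract)] = implementation;
    }

    // Called for every custom implementation found via the configuration file.
    public void RegisterImplementation(Type contract, object implementation)
    {
        if (!contract.IsInstanceOfType(implementation))
        {
            throw new ArgumentException(implementation.GetType() + " does not implement " + contract);
        }
        _custom[contract] = implementation;
    }

    // Returns the custom implementation if one was registered, otherwise the default.
    public TContract Get<TContract>() where TContract : class
    {
        object found;
        if (_custom.TryGetValue(typeof(TContract), out found) ||
            _defaults.TryGetValue(typeof(TContract), out found))
        {
            return (TContract)found;
        }
        throw new InvalidOperationException("No implementation registered for " + typeof(TContract));
    }

    // Reads <contract>/<implementation> entries and registers what it finds.
    // Assumes the name and libraryFile attributes are present on each entry.
    public void LoadConfiguration(string configXml, string pluginDirectoryPath, IEnumerable<Type> knownContracts)
    {
        foreach (XElement contractNode in XDocument.Parse(configXml).Descendants("contract"))
        {
            string contractName = (string)contractNode.Attribute("name");
            Type contract = knownContracts.FirstOrDefault(t => t.FullName == contractName);
            XElement implNode = contractNode.Element("implementation");
            if (contract == null || implNode == null) continue;

            string dllPath = Path.Combine(pluginDirectoryPath, (string)implNode.Attribute("libraryFile"));
            // Assembly.LoadFile requires an absolute path.
            Assembly assembly = Assembly.LoadFile(Path.GetFullPath(dllPath));
            string implName = (string)implNode.Attribute("name");
            Type implType = assembly.GetTypes().FirstOrDefault(t => t.FullName == implName);
            if (implType != null)
            {
                // Assumes a public parameterless constructor.
                RegisterImplementation(contract, Activator.CreateInstance(implType));
            }
        }
    }
}
```

Keeping defaults and custom registrations in separate dictionaries makes the fallback rule explicit: removing an entry from the configuration file is enough to return to the default implementation on the next start.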