Extension framework

Apr 5, 2012 at 10:57 PM
Edited Apr 5, 2012 at 11:08 PM

I've decided to create a discussion for the extension framework, as there are more formatting options here and I can write something more readable.

I'd like to make a document, reviewed by at least 3 of us, before starting the implementation.

Here is my idea - leave as many thoughts/ comments here as you can so that we design something really useful.

Interface contracts

  1. Create a new project SandoExtentionContracts, which will contain interfaces, base classes (contracts) for all the classes for which we want to enable the extension framework.
  2. Create interfaces (or move the existing ones - like the ParserInterface) in the new project.

ExtensionManager

  1. Create an ExtensionManager class, which will have the generic method RegisterImplementation<T>(typeof(T)) - an example of usage can be RegisterImplementation<ParserInterface>(typeof(SrcMLParser)). This class probably can be created as a singleton.
  2. Call the method for all the extension-enabled types defined in our plugin code during the initialization.
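A minimal sketch of what such a manager could look like, just to make the idea concrete. The empty ParserInterface and SrcMLParser types are stand-ins, and GetImplementation is a name I made up for the lookup side - none of this is existing Sando code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a contract and its default implementation.
public interface ParserInterface { }
public class SrcMLParser : ParserInterface { }

public sealed class ExtensionManager
{
    // Lazy<T> gives a thread-safe singleton without explicit locking.
    private static readonly Lazy<ExtensionManager> instance =
        new Lazy<ExtensionManager>(() => new ExtensionManager());

    // Maps a contract type to the implementation type registered for it.
    private readonly Dictionary<Type, Type> implementations = new Dictionary<Type, Type>();

    private ExtensionManager() { }

    public static ExtensionManager Instance { get { return instance.Value; } }

    public void RegisterImplementation<T>(Type implementation)
    {
        if (implementation == null)
            throw new ArgumentNullException("implementation");
        if (!typeof(T).IsAssignableFrom(implementation))
            throw new ArgumentException(implementation.FullName + " does not implement " + typeof(T).FullName);
        // A later registration replaces an earlier one, so custom extensions can override defaults.
        implementations[typeof(T)] = implementation;
    }

    // Assumed lookup method: creates a new instance through the parameterless constructor.
    public T GetImplementation<T>()
    {
        Type implementation;
        if (!implementations.TryGetValue(typeof(T), out implementation))
            throw new InvalidOperationException("Nothing registered for " + typeof(T).FullName);
        return (T)Activator.CreateInstance(implementation);
    }
}
```

Usage would be ExtensionManager.Instance.RegisterImplementation<ParserInterface>(typeof(SrcMLParser)) during plugin initialization, followed by GetImplementation<ParserInterface>() wherever a parser is needed.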

Configuration file

  1. Create an XML extension configuration file with a simple matching: Interface -> Implementation - something like:
    <configuration pluginDirectoryPath="./Plugins">
        <contract name="Sando.ExtensionContracts.ParserInterface" language="C++">
            <implementation name="Sando.Parser.CustomParser" libraryFile="CustomParser.dll"/>
        </contract>
    </configuration>
  2. Implement reading information from this file by ExtensionManager
  3. For every entry found in the configuration file, load the Assembly from the dll (Assembly.LoadFile method), look for all the known types (Assembly.GetTypes) and call the RegisterImplementation with the appropriate parameters. 
  4. At this step we have a working plugin with configuration, yet without any 3rd-party components.
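The reading and loading steps above could be sketched roughly as follows. The ExtensionEntry and ExtensionConfigurationReader names are invented for illustration; only the XML shape and the Assembly.LoadFile / Assembly.GetTypes calls come from the proposal. Error handling (missing attributes, unreadable dlls) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical holder for one contract -> implementation mapping.
public class ExtensionEntry
{
    public string Contract;
    public string Language;
    public string Implementation;
    public string LibraryPath;
}

public static class ExtensionConfigurationReader
{
    // Parses the configuration XML into one entry per <implementation> element.
    public static List<ExtensionEntry> Read(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        // Assumption: fall back to the current directory when the attribute is missing.
        string pluginDirectory = (string)root.Attribute("pluginDirectoryPath") ?? ".";
        return (from contract in root.Elements("contract")
                from implementation in contract.Elements("implementation")
                select new ExtensionEntry
                {
                    Contract = (string)contract.Attribute("name"),
                    Language = (string)contract.Attribute("language"),
                    Implementation = (string)implementation.Attribute("name"),
                    LibraryPath = Path.Combine(pluginDirectory, (string)implementation.Attribute("libraryFile"))
                }).ToList();
    }

    // Loads the library and looks up the implementation type by its full name;
    // returns null when the dll does not contain such a type.
    public static Type LoadImplementationType(ExtensionEntry entry)
    {
        // Assembly.LoadFile requires an absolute path.
        Assembly assembly = Assembly.LoadFile(Path.GetFullPath(entry.LibraryPath));
        return assembly.GetTypes().FirstOrDefault(t => t.FullName == entry.Implementation);
    }
}
```

The type returned by LoadImplementationType is what would then be passed on to RegisterImplementation.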

Adding custom extensions

  1. Download SandoExtentionContracts library and reference it in your project.
  2. Create a custom implementation for each extension contract you want to replace.
  3. Place your dll inside the pluginDirectoryPath directory.
  4. Change the configuration file to use a custom implementation of the given interface instead of the default one.
  5. The ExtensionManager should do the rest

 

 

Questions

  1. What can be customized? I think different parsers return different types of data - it can be XML, a collection of custom classes, etc. - so we need to assume that the real returned data is a collection of ProgramElement subclasses. That means that adding a new ProgramElement subclass requires adding a new parser, right?
  2. So - is it possible to add a field to a ProgramElement subclass, without implementing a new subclass? How?
  3. So - is it possible to add a new ProgramElement subclass that will be indexed, without changing the parser? How?
  4. Why do we need to enable custom word splitters? Does it also mean choosing the fields on which the splitter will do some operations? Does it mean enabling the user to choose whether he also wants to store the original value for each field?

That's it for now - I'll update it whenever a new question/idea comes to my mind or yours - please write everything that you can think of.

Apr 6, 2012 at 3:28 PM

First off, I think your idea makes great sense to me. Now about the questions.

>1. What can be customized? I think different parser returns different types of data - it can be an xml, collection of custom classes etc, so we need to
>assume that the real returned data is the collection of ProgramElement subclasses - that means that adding a new ProgramElement subclass requires
>adding a new parser, right?
>2. So - is it possible to add a field to a ProgramElement subclass, without implementing a new subclass? How?
>3. So - is it possible to add a new ProgramElement subclass that will be indexed, without changing the parser? How

Contributed parsers should be dependent on the ProgramElements we specify, but not the opposite. Adding a new ProgramElement shouldn't interfere with the existing parsers. Also, one should be able to define a parser that only produces some program elements (e.g. only CommentElements) and provides incomplete program element data (e.g. no access types on the methods).

I think we should be very conservative about making changes to our ProgramElements, as we risk making our tool too complex. But if we do make changes, we should pay close attention that the addition should be backwards compatible, so that existing parsers should keep working. 

Another option here is to allow an extension point for specifying the subtypes of ProgramElement. In my mind, this is too much flexibility.

Apr 6, 2012 at 6:57 PM
Edited Apr 6, 2012 at 6:57 PM

One more thought: we may need to loosen up some of the contracts in the ProgramElements. For example, this code:

enum { IDD = IDD_PLAYMP3_DIALOG };

is legal C++, but I cannot create an EnumProgramElement with an empty name.

-Kosta

Apr 9, 2012 at 6:31 PM

Hi Bartek,

First of all, your scheme above works great in terms of setting up the ExtensionManager and reading extensions from XML.  Let's go with that approach.

Kosta and I just met today in person so that we could flesh out exactly what extension points we should implement.  I think this question is what is causing most of the confusion.  So, take a look at: http://sando.codeplex.com/wikipage?title=Extension%20Point%20Framework and see what you think.  We can discuss it here and make updates there. 

Apr 10, 2012 at 8:33 PM

I'm glad to hear you like my approach, however I'd like to clarify several things - I think it will be better for us to enable registering custom ProgramElement subclasses - I can't think of any reason not to do that. Moreover, I don't know how we will be able to add custom fields without adding a new ProgramElement subclass...
For me - the only way to do that is to store a dictionary of strings (named customFields), which will be added to the indexed document after the "normal" fields.  However - to add a field to this dictionary you will have to somehow call a custom method (providing parameters - like Body for SummarizedBody field) which will calculate the field value and return it, along with the field name. 
It can be possible if we use a new class, implemented by the users or if we introduce an expression language, which will be used to define the formula for the field calculation, but (I may not see the simplest solution) for me it seems that the custom ProgramElement objects will be the easiest to test, to implement and to inject into our plugin.

If we really want to let the users decide which existing fields or ProgramElement subclasses they want to use, we should move it to the plugin configuration screen and add an if statement before the field usage (before adding to the index document and reading from the index document), which will use the configuration flags as a condition.
This way, however, we will lose the contract checking (or we will have to create a special default value for each field, which will mean "indexing disabled")

To summarize - I think we can provide the way to omit the ProgramElement subclasses or/and some of the fields during the indexing using the configuration screen, but if you want to add anything or change the way the field or class works, you have to implement a custom subclass.

 

Tell me if I misunderstood/missed something.

Apr 10, 2012 at 9:16 PM

OK, so my understanding of lordlothar's most recent comments is that he is wondering how we should implement the following extension point:

http://sando.codeplex.com/wikipage?title=Extension%20Point%20Framework > Indexer EP

This extension point is concerned with adding fields to an existing program element.  His main thoughts are as follows (if I understand correctly):

  1. The easiest way to implement the addition of new fields is to allow clients to subclass ProgramElement or ProgramElement subclasses.
  2. He is not sure why we would want to let users alter existing fields and feels that this would add unnecessary complexity.

For point #1 I agree, the easiest way to implement an extension point that allows users to add fields to a ProgramElement (e.g., adding a "summary" field to MethodElements) is to allow users to contribute subclasses.  My only followup question is whether he was thinking we should allow subclasses of all of the ProgramElement types (e.g., MethodElement, ClassElement, etc.) or not.  

For point #2 I also agree, the complexity of allowing users to alter existing fields is not worth the benefit.  We should only allow them to add fields.

Lordlothar,

Let me know if I've understood your comments correctly (I was honestly a bit confused) and let's discuss any of the other extension points if you have any questions for those.  Not sure if you saw this document as well (http://www.codeplex.com/Download?ProjectName=sando&DownloadId=365745) but it may help visualize what we were thinking in terms of extension points.

Apr 11, 2012 at 2:31 AM

I think lordlothar is suggesting the replacement of the Indexer EP with an extension point allowing the registration of custom PEs. I think this could be another reasonable way to go. One advantage that it has is that all EPs will only depend on PEs, which is nice for consistency. Creating a summarized method body would involve subclassing MethodElement (to e.g. SummarizedMethodElement) and using one of the Parser EPs to either generate (by parsing) new SummarizedMethodElements, or to alter existing (pre-parsed) MethodElements into SummarizedMethodElements.

Am I correct?

The plugin configuration method of disabling PEs seems not necessary to me. I think that this can be done via the Parser EP by catching all ProgramElements of a particular type and removing them. Of course, that isn't exactly end-user friendly, but we don't anticipate the end-user adding PEs either.

Apr 11, 2012 at 9:05 AM

I'm sorry I made this unclear - I'll try to explain my point of view once again. Let me use user use-cases for that and describe the answers (as I'm thinking about them).

  1. As a user I would like to be able to select fields/PEs I'd like to index.
    This can be done (if we need it) by the configuration screen (checkboxes) - I agree this setting is not necessary, unless we want it for non-developers.
  2. As a user I would like to be able to add a new Splitter/Parser/... - this is clear for all of us - no discussion needed.
  3. As a user I would like to be able to add a new field to the index - this is the most important point.
    This must be done by adding a new PE subclass and adding a new parser if needed (I think this will work like Kosta said).
    Example 1 - the SummarizedMethodElement can contain a logic to parse the Body field and create a new SummarizedBody field (the parsing can be done inside the SummarizedBody getter or the SummarizedMethodElement constructor - user decision) - no new parser needed.
    Example 2 - OperatorCounterElement can contain a dictionary of key-value pairs, where the key is an operator ("=", "+", ...) and the value is the number of operators of the given type that exist in the file/class/... - this will also require adding a parser, because our parsers remove the operators from the code.
    This means the Indexer EP will change into ProgramElement EP
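To make Example 1 concrete, here is a rough sketch of a SummarizedMethodElement that derives its extra field from Body in the getter. The ProgramElement and MethodElement classes below are simplified stand-ins (the real ones carry more fields), and "collapse whitespace and truncate to 40 characters" is an arbitrary placeholder for whatever summarization a user would really implement.

```csharp
using System;

// Hypothetical, simplified stand-ins for the real Sando classes.
public abstract class ProgramElement
{
    public string Name { get; set; }
}

public class MethodElement : ProgramElement
{
    public string Body { get; set; }
}

public class SummarizedMethodElement : MethodElement
{
    private const int MaxSummaryLength = 40; // arbitrary limit chosen for the sketch

    // Computed from Body on demand, so no parser has to supply it.
    public string SummarizedBody
    {
        get
        {
            if (string.IsNullOrEmpty(Body)) return string.Empty;
            // Passing a null separator array splits on any whitespace; rejoin with single spaces.
            string collapsed = string.Join(" ",
                Body.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
            return collapsed.Length <= MaxSummaryLength
                ? collapsed
                : collapsed.Substring(0, MaxSummaryLength);
        }
    }
}
```

Something still has to create SummarizedMethodElement instances instead of plain MethodElements; this sketch only covers the field calculation.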

One comment on the Query EP - I think the query weights should be done by the configuration (no need for the extension point). 
The custom string queries can be also enabled by the configuration - we will have to (based on the setting) create a SearchCriteria object from the screen data or pass the custom query (the search screen will be different for each option - simple input for the second, advanced search for the first) - also no need for EP here in my opinion.

And finally - one question about the SearchEngine EP - this one will be added after List<CodeSearchResult, score> is returned from the search?

Apr 11, 2012 at 2:49 PM

OK, I think we're getting very close to a mutual understanding.

>As a user I would like to be able to select fields/PEs I'd like to index. This can be done (if we need it) by the configuration screen (checkboxes) - I agree this setting is not necessary, unless we want that for a non-developers.

This is the right approach and a good idea but IMO is not as high a priority as the extension points we have proposed.  After the extension points are complete we should implement this.

> As a user I would like to be able to add a new field to the index - this is the most important point.

Agreed!  ;)

I've updated http://sando.codeplex.com/wikipage?title=Extension%20Point%20Framework, changing the Indexer EP into the ProgramElement EP and rewriting its description.  Basically, the ProgramElement EP allows you to add new subclasses of ProgramElements (or MethodElements or ClassElements...), thus allowing you to add fields.  However, users of the ProgramElement EP only specify the subclass here and must also implement the Parser EP in order to specify how the custom ProgramElement EP is parsed.  Otherwise, the default parsing scheme will be used and the added fields will remain empty.  Does this make sense?

> One comment on the Query EP - I think the query weights should be done by the configuration (no need for the extension point). 

Yes, let's still call it the Query EP (for consistency's sake) but let's provide the ability to alter this via configuration.

> The custom string queries can be also enabled by the configuration... 

I don't understand how this could be done via configuration.  I had the idea that we may do some heavy lifting during this phase.  For example, given an input string we may perform natural language analysis to try and determine which words are likely nouns and which are likely verbs.  Thus, we will need an extension point where we can contribute a QueryRewriter that takes a string and outputs an altered string.  Please note that this phase ONLY alters the text of the string (e.g., "play file" -> "playVerb fileNoun").  It is completed *before* the weights and field information are included in the final query string by Sando, as a preprocessing of the query.
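A toy version of such a rewriter, to pin down the contract (string in, altered string out). IQueryRewriter and RewriteQuery are names assumed for this sketch, and the tiny hard-coded word lists stand in for real natural language analysis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed contract shape: takes the user's keywords, returns an altered string.
public interface IQueryRewriter
{
    string RewriteQuery(string query);
}

// Toy rewriter: tags words from a small lexicon instead of doing real part-of-speech analysis.
public class LexiconQueryRewriter : IQueryRewriter
{
    private static readonly HashSet<string> Verbs =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "play", "open", "save" };
    private static readonly HashSet<string> Nouns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "file", "song", "dialog" };

    public string RewriteQuery(string query)
    {
        string[] words = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => Tag(w)));
    }

    // Appends a part-of-speech suffix to known words; unknown words pass through unchanged.
    private static string Tag(string word)
    {
        if (Verbs.Contains(word)) return word + "Verb";
        if (Nouns.Contains(word)) return word + "Noun";
        return word;
    }
}
```

With this lexicon, RewriteQuery("play file") gives "playVerb fileNoun"; weights and field information would still be added afterwards by Sando.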

> And finally - one question about the SearchEngine EP - this one will be added after List<CodeSearchResult, score> is returned from the search?

Yes.  The point of this EP is so that researchers or developers can rerank the search results as they see fit.  For instance, they could alphabetize the returned results or they can do more complicated re-ranking schemes.  This EP will require a configuration option so that the indexer and search engine return a configurable amount of results (i.e., the top 100 instead of just the top 10, as it is now).

Hope this makes sense...

Apr 11, 2012 at 3:05 PM

I have only one issue in what is being agreed upon:

> One comment on the Query EP - I think the query weights should be done by the configuration (no need for the extension point). 

I think adding query weights as an extension point would be useful to developers extending Sando. I think we all agree that query weights are important, and I think that they are more likely to be tweaked by extension developers than end-users (I'm assuming the end-user will be the target audience for the configuration screen). We can provide the ability to do both, but I really think developers need this ability.

Same thing about the query rewriting EP. 

Apr 11, 2012 at 3:23 PM
Edited Apr 11, 2012 at 7:59 PM

 

> However, users of the ProgramElement EP only specify the subclass here and must also implement the Parser EP in order to specify how the custom ProgramElement EP is parsed. Otherwise, the default parsing scheme will be used and the added fields will remain empty.

I'm not sure I understand what you're saying - in my opinion the user CAN simply subclass the PE (or one of its subclasses) if the data from the parser contains enough information (for the SummarizedBody field you only need the Body field - no need for a custom parser), or MUST implement a custom parser if he needs data that is omitted during parsing (not returned in any field by the Parse method).

There is one issue with the first approach (maybe that's what you were trying to tell me, but I wasn't able to understand) - without a custom parser, there is no way to add a custom PE to the collection that is indexed - I can see several solutions for that, but none of them is an easy one, so can we assume for now, that a user MUST add a custom parser with a new PE, otherwise the PE won't be used?

 

> I don't understand how this could be done via configuration. I had the idea that we may do some heavy lifting during this phase. For example, given an input string we may perform natural language analysis to try and determine which words are likely nouns and which are likely verbs. Thus, we will need an extension point where we can contribute a QueryRewriter that takes a string and outputs an altered string. Please note that this phase ONLY alters the text of the string (e.g., "play file" -> "playVerb fileNoun"). It is completed *before* the weights and field information are included in the final query string by Sando, as a preprocessing of the query.

Ok - so the input for this point is a string from the user (keywords), while the output is again the query, but with some additional information for the Lucene index search? Or is the input from the user (keywords) and the output only for the user (logged, but not used in the search process)? Please - you need to describe it more briefly for me...

 

>Yes. The point of this EP is so that researchers or developers can rerank the search results as they see fit. For instance, they could alphabetize the returned results or they can do more complicated re-ranking schemes. This EP will require a configuration option so that the indexer and search engine return a configurable amount of results (i.e., the top 100 instead of just the top 10, as it is now).

So won't it be better if we return the results from the search as IQueryable to enable direct filtering/sorting/querying using LINQ, without unneeded memory/performance issues related to the operations on the lists?

The interface for this EP can look like this:
IQueryable<PE_with_score_and_other_info_class> ReRankResults (IQueryable<PE_with_score_and_other_info_class> results)

 

 

> I think adding query weights as an extension point would be useful to developers extending Sando. I think we all agree that query weights are important, and I think that they are more likely to be tweaked by extension developers than end-users (I'm assuming the end-user will be the target audience for the configuration screen). We can provide the ability to do both, but I really think developers need this ability.

You've convinced me - let's leave that for the developers also.
One question - for which part of the query should we enable custom weights? I think it should be done for the usage types (Sando.Indexer.Documents.SandoField enum) - if you agree, I'll try to think about the interface method for this operation

Apr 11, 2012 at 9:45 PM

> There is one issue with the first approach (maybe that's what you were trying to tell me, but I wasn't able to understand) - without a custom parser, there is no way to add a custom PE to the collection that is indexed.

Yes, this is what I was trying to say...  that unless we also make them implement a custom parser, we'd have to add another extension point to make sure the new field was populated (and later indexed).  I also thought that any other solution might be too complicated, which is why I thought we should force them to implement a custom parser as well.  Note that the custom parser can just be a subclass of an existing parser that "parses" a new field.  In the running example, a SummarizedBody field would be created by the new parser.

> Ok - so the input for this point is a string from the user (keywords), while the output is again the query, but with some additional information for the Lucene index search?

Yes.  Input is a string (keywords) and the output is a string (the keywords, altered or added to) that is again the query but, as you say, with some additional information for the Lucene index search.

> So won't it be better if we return the results from the search as IQuerable to enable direct filtering/sorting/querying... 

Hahah, this is why you are the dev lead ;)  I think you are right, this kind of interface is better.  Here's a use case so you can verify for yourself whether your approach enables it.  I might want to rerank search results based on how connected they are to other search results.  So, I will build a call graph for the entire program, and then use this call graph to increase the score of results that are connected to other items in the result set.  If your approach can support this then we should use your approach.

> One question - for which part of the query should we enable custom weights? I think it should be done for the usage types (Sando.Indexer.Documents.SandoField enum) - if you agree, I'll try to think about the interface method for this operation

Yes, we should enable custom weights for all of the members of the SandoField enum.  We should start with this.  We may also (in the future) want to enable weights for custom fields (i.e, fields added by the ProgramElement EP).

Apr 11, 2012 at 10:01 PM
Edited Apr 11, 2012 at 10:02 PM

> I might want to rerank search results based on how connected they are to other search results.

Let me use pseudo code for that, so that I'm sure what you wanted to say:

IQueryable<SearchResult> searchResults = getResults();
CallGraph callGraph = CallGraphBuilder.BuildCallGraph(searchResults);
foreach (var s in searchResults) s.Score += callGraph.GetResultWeight(s);
searchResults = searchResults.OrderByDescending(s => s.Score);

In general - IQueryable is the same as a list and can be easily converted to a list, but the main advantage is that all the sorting, filtering or/and ordering operations are converted to a single (sometimes complex, but still only one) LINQ expression which is executed over the data set only once - that's why I thought we can do that this way - we have two tasks in the backlog for filtering and sorting the results.
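For reference, a self-contained version of the same idea that actually compiles. SearchResult, CallGraph and ConnectednessReRanker are invented for this sketch, and "add 1 to the score for every other result this one is connected to" is just a placeholder weighting. Note that it materializes the input once, because the bonus for one result depends on the whole result set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: a name plus the score returned by the search.
public class SearchResult
{
    public string Name;
    public double Score;
}

// Hypothetical call graph storing caller -> callees edges.
public class CallGraph
{
    private readonly Dictionary<string, HashSet<string>> edges =
        new Dictionary<string, HashSet<string>>();

    public void AddCall(string caller, string callee)
    {
        HashSet<string> callees;
        if (!edges.TryGetValue(caller, out callees))
        {
            callees = new HashSet<string>();
            edges[caller] = callees;
        }
        callees.Add(callee);
    }

    // Two elements are connected when either one calls the other directly.
    public bool AreConnected(string a, string b)
    {
        return Calls(a, b) || Calls(b, a);
    }

    private bool Calls(string caller, string callee)
    {
        HashSet<string> callees;
        return edges.TryGetValue(caller, out callees) && callees.Contains(callee);
    }
}

public static class ConnectednessReRanker
{
    public static IQueryable<SearchResult> ReRankResults(IQueryable<SearchResult> results, CallGraph callGraph)
    {
        // Materialize once so every result can be compared against all the others.
        List<SearchResult> all = results.ToList();
        return all
            .Select(r => new SearchResult
            {
                Name = r.Name,
                // Placeholder weighting: +1 for each other result connected to this one.
                Score = r.Score + all.Count(o => o.Name != r.Name && callGraph.AreConnected(r.Name, o.Name))
            })
            .OrderByDescending(r => r.Score)
            .AsQueryable();
    }
}
```

The method returns new SearchResult objects instead of mutating the input, so the original scores stay available to any other re-ranker in the chain.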

 

> We may also (in the future) want to enable weights for custom fields (i.e, fields added by the ProgramElement EP).

Good point! I need to think about a more generic way to enable it than just using the SandoField enum, so that we can reuse the code in the future.

Apr 12, 2012 at 3:01 AM

> Let me use pseudo code for that, so that I'm sure what you wanted to say:

The pseudo code you posted is basically what I was expecting so your suggested solution looks good.

FYI, when you start working on this extension point system please feel free to make issues for me to complete.  Since the extension point system is more of an implementation concern I think you should take the lead on it.  Feel free to add issues and I'll do my best to complete them.  Otherwise I'll plan to contribute by writing lots of tests and example extension points to test the extension point system.

Apr 12, 2012 at 8:52 AM

Dave, I've started implementing the ExtensionManager class in the Core project (I've already created the extension project and moved some of the classes and interfaces there). I've decided that I won't use a generic RegisterImplementation<T> method as I was planning, because some of the EPs will need additional arguments (like a language for the parser) - I'll just create a single method for every EP. I'm also planning to implement reading the configuration file, along with an appropriate design of the XML structure for this file.

I also have to create a class with default contract implementations, so that we can use our own class when the user didn't provide one or his class cannot be used (an exception when creating a new instance, etc.).
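The fallback could look roughly like this. ExtensionInstantiator and CreateOrDefault are names made up for the sketch, and the splitter types are simplified stand-ins; the point is only that any failure to instantiate the custom class ends with the default being used.

```csharp
using System;

// Hypothetical, simplified contract and implementations.
public interface IWordSplitter { }
public class DefaultWordSplitter : IWordSplitter { }

// A custom implementation whose constructor fails, to exercise the fallback.
public class BrokenWordSplitter : IWordSplitter
{
    public BrokenWordSplitter()
    {
        throw new InvalidOperationException("cannot initialize");
    }
}

public static class ExtensionInstantiator
{
    // Returns an instance of customType when it can be created, otherwise the supplied default.
    public static T CreateOrDefault<T>(Type customType, T defaultImplementation) where T : class
    {
        if (customType == null || !typeof(T).IsAssignableFrom(customType))
            return defaultImplementation;
        try
        {
            // Uses the parameterless constructor; throws if it is missing or if it throws itself.
            return (T)Activator.CreateInstance(customType);
        }
        catch (Exception)
        {
            // A real implementation would log the failure before falling back.
            return defaultImplementation;
        }
    }
}
```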

When it's finished, I'll try to inject it into the application so that we use the instance of this class in all the places where we need an object from extensions. I'll also create all the issues that need to be fixed and the ones for the testing.

I think it will be a good idea to create a new component for those (like Extensions), and both of us will work on the issues from this component - what do you think? We can move the existing issues to this component as well.

Anyway - I think I'll commit something useful today so that you can start coding.

 

Please, look at my commits from time to time, tell me when you find any issues and let me know when you have a comment on this project

Apr 12, 2012 at 6:54 PM
Edited Apr 12, 2012 at 7:31 PM

One question - how are we able to decide which parser should be used for the given file?

I think we should replace "Language" setting with something like "SupportedFileExtensions", which will be the list of supported file extensions for the given parser, for example ".h, .cpp" - do you agree?
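A sketch of how parser lookup by file extension might work. ParserRegistry, RegisterParser and GetParser are hypothetical names, and ParserInterface is reduced to an empty stand-in here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical, empty stand-ins for the parser contract and one implementation.
public interface ParserInterface { }
public class CppParser : ParserInterface { }

public class ParserRegistry
{
    // Case-insensitive so that ".CPP" and ".cpp" resolve to the same parser.
    private readonly Dictionary<string, ParserInterface> parsers =
        new Dictionary<string, ParserInterface>(StringComparer.OrdinalIgnoreCase);

    public void RegisterParser(IEnumerable<string> supportedFileExtensions, ParserInterface parser)
    {
        foreach (string extension in supportedFileExtensions)
            parsers[Normalize(extension)] = parser; // the last registration for an extension wins
    }

    // Returns the parser registered for the file's extension, or null when there is none.
    public ParserInterface GetParser(string filePath)
    {
        ParserInterface parser;
        return parsers.TryGetValue(Path.GetExtension(filePath), out parser) ? parser : null;
    }

    // Accepts "cpp", ".cpp" and " .cpp" alike.
    private static string Normalize(string extension)
    {
        extension = extension.Trim();
        return extension.StartsWith(".", StringComparison.Ordinal) ? extension : "." + extension;
    }
}
```

A configuration value such as ".h, .cpp" could be passed in as ".h, .cpp".Split(','), since the extensions are trimmed on registration.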

 

Dave - I also need your help with the plugin folder - are we able to find out where our plugin files are stored and create extension configuration file there during installation? I need to know the location of the extension points configuration file and it must be in a folder that can be easily found by the user, but better not something like C:\Sando - I hope we can create this file inside a plugin directory - can we?

Apr 12, 2012 at 8:35 PM

> I also need your help with the plugin folder - are we able to find out where our plugin files are stored and create extension configuration file there during installation?

Yes, I think we can use the following APIs to find the plugin files. See below from http://stackoverflow.com/questions/8762062/finding-the-home-directory-for-a-visual-studio-2010-extension.

--

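    // The service is requested by its service type, SVsExtensionManager,
    // and returned through the IVsExtensionManager interface.
    static IVsExtensionManager GetExtensionManager()
    {
        return myPackage.GetService(typeof(SVsExtensionManager)) as IVsExtensionManager;
    }

    // Looks up the installed extension by the ID from its vsixmanifest.
    static IInstalledExtension GetExtension(string identifier)
    {
        return GetExtensionManager().GetInstalledExtension(identifier);
    }

    // Returns the directory the extension was installed into.
    static string GetExtensionDirectory(string identifier)
    {
        return GetExtension(identifier).InstallPath;
    }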

The string identifier is whatever you put in the "ID" field of your extension's source.extension.vsixmanifest file. It defaults to the package GUID.

--

However, this will give you a directory path that looks like:

"C:\Windows\Microsoft.Net\assembly\GAC_MSIL\myCompany.myExtension\v4.0_1.0.0.0__936015a19c3638eb\myCompany.myExtension.dll"

which is not too pretty.  I think we could still use it, however, and just provide a preference dialog for Sando where users can look up this file and/or browse to use another file.  For instance, we can have UI that looks like:

Extensions File: [ <path to file> ] [Browse..]

in our configuration options.

Apr 12, 2012 at 8:37 PM

> One question - how are we able to decide which parser should be used for the given file?

Right now this is hacked... I think we check for .h, .cpp, .cxx, and .cs.  We need to fix this.

> I think we should replace "Language" setting with something like "SupportedFileExtensions", which will be the list of supported file extensions for the given parser, for example ".h, .cpp" - do you agree?

Yes, we need to fix this to work similar to how you specify. I've created http://sando.codeplex.com/workitem/101 to track this.

Apr 12, 2012 at 9:15 PM

I've just committed first versions of the extension framework classes.

Dave - I like your idea about moving the extension configuration file path to the Sando configuration - now I wonder how I can get this value from the configuration? :) I think the plugin settings will be saved inside the VS settings file, am I right?

Apr 12, 2012 at 9:21 PM

> now I wonder how am I able to get this value from configuration?

I think I'm misunderstanding you but can't we get the initial path by using this API?

GetExtension(identifier).InstallPath;

Let me know if/how I've misunderstood so I can give a better answer ;)


Apr 12, 2012 at 9:27 PM

Ok - that seems fine for me :)

Apr 12, 2012 at 9:38 PM

> I've just committed first versions of the extension framework classes.

I just checked those files out and they look good.  Tomorrow (if I get the time) I will try to use these classes to contribute our existing parsers to Sando.  

Great start!

Apr 13, 2012 at 10:45 PM
Edited Apr 13, 2012 at 11:41 PM

Guys - I've just committed code that uses the new extension classes for getting the right parser. I have a couple of comments/questions - FYI

Dave - I've used the UIPackage.Initialize method to add the code for registering default parser implementations for the plugin - is it the right place? As I understand it, this method is called on plugin load, so when you run more than one VS instance, it will be called only the first time - am I right?

Kosta - I've removed the parameterized parser constructors. They were useless in my opinion, as you do all the initialization in the default constructor, but I mainly did that because I may be forced to create an instance of one of these parsers using reflection, so I'll be using only the default constructor anyway. Let me know if you're ok with this change - I've updated all related unit tests and they are all passing, so I believe those constructors were not necessary.

I've also added a method to TestUtils class, which I use to initialize default extension point classes for unit tests - you need to use this method every time you use the parser in your test, otherwise test files won't be parsed and you will get no results

 

One additional info - WordSplitter class is now also available as a default extension point for IWordSplitter contract. Get/Register methods have been added to the ExtensionPointsRepository

 

Let me know if you have any questions/comments

Apr 14, 2012 at 2:23 AM

Removing the parameterized constructors is probably a good idea - I too was contemplating it earlier. I'm glad things seem to be coming along nicely.

Apr 14, 2012 at 2:32 AM

> Dave - I've used UIPackage.Initialize method to add a code for registering default parser implementations for the plugin - is it the right place? As I understand - this method is called on plugin load, so that when you run more than one VS instance, it will be called only at the first time - am I right?

This is exactly where I would do it.  This method is only called once as far as I know, even if you have multiple instances of VS.

Apr 14, 2012 at 6:43 PM
Edited Apr 15, 2012 at 9:46 PM

I've implemented dynamic loading of custom parsers and a custom word splitter - see ExtensionPointsConfigurationAnalyzerTest for a live example of usage.

Errors, informational messages, etc. from the whole extension point registration process are now logged to a log file (I'm planning to create this file in the plugin directory).

 

Update #1

Two additional comments - currently the extension points repository will return null in all the tests if the custom extension point configuration is invalid (like a typo in the class name) - I'll change that when I have the test library, so that the defaults are always used when the custom ones can't be.

The second comment - I've been thinking about the custom ProgramElement subclasses and in my opinion we should join them in the configuration with the parser (which is required), so that instead of:

<contract name="Sando.ExtensionContracts.ParserInterface">
      <implementation name="Sando.Parser.CustomParser" .../>
</contract>
<contract name="Sando.ExtensionContracts.ProgramElement">
      <implementation name="Sando.Parser.CustomMethodElement" .../>
</contract>

we will have:

<contract name="Sando.ExtensionContracts.ParserInterface">
      <implementation name="Sando.Parser.CustomParser" .../>
      <programElements>
             <contract name="Sando.ExtensionContracts.ProgramElement">
                   <implementation name="Sando.Parser.CustomMethodElement" .../>
             </contract>
      </programElements>
</contract>

It can simplify validation a lot, so that for example I'll know that I should remove the parser configuration if the program element configuration is invalid (or there was an exception during type loading) - any objections?
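A rough sketch of the validation rule with the nested layout, assuming the elided attributes are not needed for the check. The class names and the canLoadType callback are hypothetical; the callback stands in for the real assembly/type loading step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for one validated parser entry.
public sealed class ParserEntrySketch
{
    public string ParserType { get; set; }
    public List<string> ProgramElementTypes { get; set; }
}

public static class NestedConfigurationSketch
{
    private const string ParserContract = "Sando.ExtensionContracts.ParserInterface";

    // Keeps a parser entry only if the parser AND all of its nested
    // program elements can be loaded; one bad element drops the whole entry.
    public static List<ParserEntrySketch> ReadUsableParsers(string xml, Func<string, bool> canLoadType)
    {
        var usable = new List<ParserEntrySketch>();
        XElement root = XElement.Parse(xml);
        foreach (XElement contract in root.Elements("contract"))
        {
            if ((string)contract.Attribute("name") != ParserContract)
            {
                continue;
            }

            // Direct child only, so nested program element implementations are not picked up here.
            string parserType = (string)contract.Elements("implementation")
                .Attributes("name").FirstOrDefault();

            List<string> elementTypes = contract.Elements("programElements")
                .Elements("contract").Elements("implementation")
                .Select(e => (string)e.Attribute("name")).ToList();

            bool parserOk = parserType != null && canLoadType(parserType);
            bool elementsOk = elementTypes.All(t => t != null && canLoadType(t));
            if (parserOk && elementsOk)
            {
                usable.Add(new ParserEntrySketch { ParserType = parserType, ProgramElementTypes = elementTypes });
            }
        }
        return usable;
    }
}
```

Because the program elements sit inside the parser's contract node, the "remove the parser if its program element is invalid" decision needs no cross-referencing between separate contract entries.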

 

Update #2

One more question - given a custom parser with a custom ProgramElement, how can I index this PE? Currently indexing is done by two classes - SandoDocumentFactory and ProgramElementReader - using a SandoDocument subclass, which won't be implemented for custom PEs. Should I add logic to index them by reflection, using all public properties of the custom PE (whose values will always be converted using ToSandoSearchable and ToSandoDisplayable)?

Oh - and one more - is there any reason to register PEs in repository? I can't see any need for direct reference, so in my opinion loading an assembly with a custom PE so that a custom parser can create an instance of it is enough - right?

 

Update #3

I've created a test extension points library. I've also decided to change the return type of the parser's Parse method from an array to a list - most of the time people use collections instead of arrays, so I believe it's better to provide List<T> as the return type in our extension API.

 

Update #4

I can proudly show the working example of loading custom parser that uses custom PE, registering it, getting it using repository and calling the Parse method, which returns valid results - see FindAndRegisterValidExtensionPoints_RegistersUsableCustomParser unit test for details :)

 

Update #5

We have working extension points for parsers (with ProgramElement subclasses), a word splitter, a results reorderer and query weights supplier - it was quite a productive weekend ;)

Apr 16, 2012 at 3:54 PM

> Errors, information messages, etc. from the whole extension points registration process are now logged to a log file (I'm planning to create this file in the plugin directory).

Thank goodness, we needed to do this!   

Apr 16, 2012 at 3:57 PM

> It can simplify validation a lot, so that for example I'll know that I should remove the parser configuration if the program element configuration is invalid (or there was an exception during type loading) - any objections?

No, I think combining the extensions like this sounds good, as we previously discussed, they will have to implement both in most (all?) cases anyway.  

 

Apr 16, 2012 at 4:04 PM

> One more question - given a custom parser with a custom ProgramElement, how can I index this PE? Currently indexing is done by two classes - SandoDocumentFactory and ProgramElementReader - using a SandoDocument subclass, which won't be implemented for custom PEs. Should I add logic to index them by reflection, using all public properties of the custom PE (whose values will always be converted using ToSandoSearchable and ToSandoDisplayable)?

Yes, we will need to implement a CustomProgramElementDocument that will wrap ProgramElements of unknown types and use reflection to AddDocumentFields as well as ReadProgramElementFromDocument.  
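As a starting point for the reflection part, something along these lines might work. This is only a sketch: ReflectionFieldExtractorSketch and CustomMethodElementSketch are made-up names, and plain invariant-culture string conversion stands in for the ToSandoSearchable/ToSandoDisplayable conversions, whose signatures I'm not assuming here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

public static class ReflectionFieldExtractorSketch
{
    // Collects every readable public instance property (inherited ones included)
    // as a name -> string value pair, ready to be added as document fields.
    public static Dictionary<string, string> ExtractFields(object programElement)
    {
        var fields = new Dictionary<string, string>();
        PropertyInfo[] properties = programElement.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (PropertyInfo property in properties)
        {
            // Skip write-only properties and indexers; they have no single value to store.
            if (!property.CanRead || property.GetIndexParameters().Length > 0)
            {
                continue;
            }
            object value = property.GetValue(programElement, null);
            fields[property.Name] = Convert.ToString(value, CultureInfo.InvariantCulture) ?? string.Empty;
        }
        return fields;
    }
}

// Hypothetical custom program element of a type unknown to the indexer.
public class CustomMethodElementSketch
{
    public string Name { get; set; }
    public int DefinitionLineNumber { get; set; }
}
```

Reading an element back from a document would be the reverse walk: create the instance, then call SetValue for each stored field after converting the string to the property's type.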

Apr 16, 2012 at 4:05 PM

> Oh - and one more - is there any reason to register PEs in repository? I can't see any need for direct reference, so in my opinion loading an assembly with a custom PE so that a custom parser can create an instance of it is enough - right?

Yup, I agree.

Apr 16, 2012 at 4:08 PM

 

> We have working extension points for parsers (with ProgramElement subclasses), a word splitter, a results reorderer and query weights supplier - it was quite a productive weekend ;)

 

HOLY COW!  Are you serious??  This was an extremely productive weekend!!  I can't say anything else but WOW.  

Apr 23, 2012 at 6:47 PM

I've just committed the extension point for the query rewriter, but I have no idea where I should use it - the SimpleSearchCriteria class has a sorted set SearchTerms, of course initialized after the splitter call - should I call the rewriter just before the splitter?

Dave - can you look at that and tell me which class/line is in your opinion the best place to use that EP?

Apr 24, 2012 at 2:01 AM

I guess you got your computer back up and running... great!  Nice work on adding the new extension point.

> Dave - can you look at that and tell me which class/line is in your opinion the best place to use that EP?

Yes, I just committed my changes as to where I would put this. Please feel free to modify after reading this.  I added it to SearchManager.GetCriteria.  Note that I also removed the use of the splitter extension point here.  We don't want to split the initial query, we only want to split things as they are indexed, IMO.  The intuition is this: people create names like "openFile", "SearchManager", etc. which need to be split when indexed, however, when people create a search query they usually write "open file", "search files", etc. (i.e., they have already split manually).  If we split for them again we may mess up things that the query rewriter may want to do.  

For instance, the query rewriter may want to add part-of-speech information (POS) to each word (like this: "openFile" -> "openVerb fileNoun").  If we run the splitter on that afterwards it will split the POSs off.  Note that I'm not totally opposed to splitting the query (as we had before) I just want to make sure that this POS case is supported.  Let me know what you think and/or if you need more explanation.
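To illustrate the POS concern, here is a toy camel-case splitter. It is a hypothetical stand-in, not the real indexing splitter, which may behave differently (for example around casing).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CamelCaseSplitterSketch
{
    // Splits identifiers on case boundaries: lowercase runs with an optional
    // leading capital, all-caps runs, and digit runs each become one word.
    public static List<string> Split(string text)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(text, "[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+"))
        {
            words.Add(match.Value);
        }
        return words;
    }
}
```

Split("openFile") gives "open", "File", which is what we want at indexing time. Split("openVerb fileNoun") gives "open", "Verb", "file", "Noun", so the POS tags are detached from their words if the splitter runs after the rewriter.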

Apr 24, 2012 at 8:30 AM
Edited Apr 24, 2012 at 8:31 AM

Ok - but how are we going to search for the terms without splitting them?

In my opinion the splitter was required to change the query "open file" into "Body:open AND Body:file" - without that step, we will have "Body:"open file"" (or even "Body:open file", which will cause the parser exception - we are not adding any quotes right now), which won't work as we want it to - do you know what I mean?

 

P.S. And yes - I'm using a replacement for my power supply until I get back the one that stopped working :)

Apr 24, 2012 at 7:19 PM

> In my opinion the splitter was required to change the query "open file" into "Body:open AND Body:file" - without that step, we will have "Body:"open file"" (or even "Body:open file", which will cause the parser exception - we are not adding any quotes right now), which won't work as we want it to - do you know what I mean?

Yes, I thought this might be an issue.  Thus I still split the query string using something like this: searchString.Split(' ').ToList();  So, the query still works, it just doesn't use the same, possibly more complex splitter that the indexer uses. Does this make sense? I have tested Sando with these changes and it seems to work as expected.

 

Apr 24, 2012 at 7:35 PM

Dave - now that I can see the code, I have an additional comment. The method that was used there was not the one from the contract interface; it was an additional static one, used just to extract the search terms (I thought this class would be the best place for it). The main idea behind adding it was to ensure proper behavior when a search query contains quotes (for example, when you want to search for "open file" as an exact, whole expression). There is no advanced logic there, unlike in the contract method. Please take another look.
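For reference, the kind of quote-aware extraction I mean looks roughly like this. It is a simplified sketch with a made-up class name, not the actual static method from the code base.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SearchTermExtractorSketch
{
    // Splits on spaces, except inside double quotes, where the whole
    // quoted expression is kept as a single term (without the quotes).
    // An unbalanced opening quote treats the rest of the query as one phrase.
    public static List<string> ExtractSearchTerms(string query)
    {
        var terms = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false;
        foreach (char c in query)
        {
            if (c == '"')
            {
                // A quote always ends the term collected so far and toggles phrase mode.
                Flush(terms, current);
                insideQuotes = !insideQuotes;
            }
            else if (c == ' ' && !insideQuotes)
            {
                Flush(terms, current);
            }
            else
            {
                current.Append(c);
            }
        }
        Flush(terms, current);
        return terms;
    }

    private static void Flush(List<string> terms, StringBuilder current)
    {
        string term = current.ToString().Trim();
        if (term.Length > 0)
        {
            terms.Add(term);
        }
        current.Length = 0;
    }
}
```

So the query open file yields two terms, while "open file" manager yields the phrase open file plus the single term manager.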

Apr 24, 2012 at 7:41 PM

Ah, I have reverted to using the static call... I thought it was the extension point call... my mistake!

Apr 28, 2012 at 2:51 PM
Edited Apr 28, 2012 at 3:27 PM

I've created a new documentation page for the configuration file - it should probably be expanded, so let's call it a draft - Dave - feel free to change it if you have any comments.

I've also created a configuration page for Sando, which can be displayed from the Tools->Options->Sando VS menu. Currently it contains only the Extension points directory setting (which is saved in the vssettings file, so it persists across sessions), but it will be expanded in the future to cover all the required user settings.

 

Update:

I've changed the UIPackage to read the EP configuration from the configuration file - Dave - please take a look at the code inside this class - in my opinion the GetExtensionPointsConfiguration method and the ExtensionDirectoryProvider class can be removed - to register your custom EPs you need to create a configuration file, place that file in the EP directory (with your dlls) and select this directory from the Sando settings page

Apr 28, 2012 at 6:00 PM

I'm trying to use the extension points you guys have constructed for a toy example. It really is very well done - much better than I could have done myself.

I do have a couple of very naive questions/suggestions:

1. Can we provide a dummy configuration file somewhere? It may be easier for users to have an example file they can edit, rather than constructing a brand new one.

1.5 A related thing: Are the configuration file settings intended for the end users or the extension developers? I think end users will not care about configuring anything, and developers would probably want to give the end users a Sando version that is preconfigured with their extensions. Do we allow this kind of usage? Or am I thinking about this incorrectly?

2. In my extension, I would like to give different weights to different words in the query. At the moment it looks to me like the provided functionality (QueryWeightsSupplier) is to assign weights per element type (e.g. name, body, etc.). Is there any way to assign weights based on some other characteristic?

Apr 29, 2012 at 11:56 AM
Edited Apr 29, 2012 at 11:56 AM
kostata wrote:

> I'm trying to use the extension points you guys have constructed for a toy example. It really is very well done - much better than I could have done myself.

> I do have a couple of very naive questions/suggestions:

> 1. Can we provide a dummy configuration file somewhere? It may be easier for users to have an example file they can edit, rather than constructing a brand new one.

> 1.5 A related thing: Are the configuration file settings intended for the end users or the extension developers? I think end users will not care about configuring anything, and developers would probably want to give the end users a Sando version that is preconfigured with their extensions. Do we allow this kind of usage? Or am I thinking about this incorrectly?

> 2. In my extension, I would like to give different weights to different words in the query. At the moment it looks to me like the provided functionality (QueryWeightsSupplier) is to assign weights per element type (e.g. name, body, etc.). Is there any way to assign weights based on some other characteristic?

1. The full example file is on the extension points configuration file page, within the documentation - in my opinion that page will also contain the description of the file and will be the start point for the developers.

1.5 Good point and I really don't know what the answer should be - we may have the default value set for this configuration option for the normal users - maybe within the folder that ExtensionDirectoryProvider class is exposing.

2. This is an advanced scenario, because you want to not only set the weights per usage type (like body or name) but also per a single search term. It's not possible right now and it will be hard to do, because how will you know what the search terms are to apply your weights? You may want to write your own query rewriter and based on its results, select the appropriate weights, but all we can do is to provide the search terms list to the query weights supplier and create the second method, which will return the Dictionary<usageType,weight> for every search term.
We should discuss whether this is a scenario that we want to cover and whether it will be used by the end users.

Additionally, I'm not sure whether this is a task for the query rewriter rather than the query weights supplier. A boost factor of 1 is never added to the query, whether the usage type is missing from the keys or its weight is set to 1, so you can write your own query weights supplier that returns an empty dictionary (no boost factors added) and your own query rewriter that applies the weights based on your own characteristics. This solution has one main advantage - the user has full control over what he wants to do with the query. If we instead add a method to the query weights supplier, the user will have to place the logic for analyzing the search terms there, which seems to me like the query rewriter's task.
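A sketch of what such a rewriter could do, assuming the boosts are expressed with Lucene's term^boost query syntax and that the rest of the pipeline leaves that suffix intact (I haven't verified the second part). The class and method names are hypothetical and are not the contract interface.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class PerTermBoostRewriterSketch
{
    // Appends "^weight" to every single-word term that has a weight other than 1;
    // terms without an entry, or with weight 1, are passed through unchanged.
    public static string Rewrite(IEnumerable<string> searchTerms, IDictionary<string, float> termWeights)
    {
        var parts = new List<string>();
        foreach (string term in searchTerms)
        {
            float weight;
            if (termWeights.TryGetValue(term, out weight) && weight != 1f)
            {
                parts.Add(term + "^" + weight.ToString(CultureInfo.InvariantCulture));
            }
            else
            {
                parts.Add(term);
            }
        }
        return string.Join(" ", parts.ToArray());
    }
}
```

For the terms open and file with a weight of 2 on open, this produces open^2 file. Where the per-term weights come from (POS tagging, a dictionary, anything else) stays entirely in the extension author's hands.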

Dave - what do you think about that?

Apr 30, 2012 at 8:58 PM
kostata wrote:

> I'm trying to use the extension points you guys have constructed for a toy example. It really is very well done - much better than I could have done myself.

> I do have a couple of very naive questions/suggestions:

> 1. Can we provide a dummy configuration file somewhere? It may be easier for users to have an example file they can edit, rather than constructing a brand new one.

> 1.5 A related thing: Are the configuration file settings intended for the end users or the extension developers? I think end users will not care about configuring anything, and developers would probably want to give the end users a Sando version that is preconfigured with their extensions. Do we allow this kind of usage? Or am I thinking about this incorrectly?

> 2. In my extension, I would like to give different weights to different words in the query. At the moment it looks to me like the provided functionality (QueryWeightsSupplier) is to assign weights per element type (e.g. name, body, etc.). Is there any way to assign weights based on some other characteristic?

1. I think that the example file on the documentation page is good.  I also think that we should replace the first part of the UIPackage.RegisterExtensionPoints method with an actual configuration file.  This would load the default implementations (which is now being done in code) as well as provide an example to developers.  (http://sando.codeplex.com/workitem/121)

 

1.5 The configuration file settings are NOT intended for end users.  Only for extension developers and/or researchers.  These extension points are a good way for other people to provide us with new parsers (e.g., a parser for a new language) without having to actually contribute to our code base.  We can keep our code separate until we feel their contribution is of a sufficient quality, and then we can consider incorporating it directly into Sando.

2. Can you please explain why you would want to do this?  I can't imagine a scenario in which this is terribly useful... for instance, you usually don't specify weights of your search terms in Google and yet queries seem to work well.  If you give a concrete scenario of where this is useful it would help.

Apr 30, 2012 at 9:36 PM

1. Ok.

1.5 I'm talking about the configuration page (in Tools->Options->Sando). But this concern is satisfied by your answer to (1).

2. Let's say I want to weigh verb-direct object pairs higher than the other words in Sando. That is, I don't want the other parts of speech to be worthless, just to weigh relatively less than the V-DO. How would I do that in Sando? I was under the impression that there is no way for a Sando extender to tweak the weights when indexing (which is what Google may do).

May 1, 2012 at 7:48 PM

> 2. Let's say I want to weigh verb-direct object pairs higher than the other words in Sando. That is, I don't want the other parts of speech to be worthless, just to weigh relatively less than the V-DO. How would I do that in Sando? I was under the impression that there is no way for a Sando extender to tweak the weights when indexing (which is what Google may do).

One way to do this would be to create a custom Verb field and a custom DO field when parsing.  For instance, when parsing a method identify all of the verbs and put them in the Verb field.  Then, in the query weights supplier increase the weight of the verb field to be higher.  This would effectively do what you were talking about above. Does this work for you?
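Roughly, the effect on the generated query would be something like the sketch below. The helper is hypothetical and only shows how a per-field weight dictionary could turn into Lucene-style boosts; it assumes the custom parser has already filled a Verb field, and it is not how the real query builder is written.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class FieldWeightedClauseSketch
{
    // Builds one clause per field for a single term, adding "^weight" only
    // for fields whose weight is present and different from 1.
    public static string BuildClause(string term, IList<string> fields, IDictionary<string, float> fieldWeights)
    {
        var parts = new List<string>();
        foreach (string field in fields)
        {
            string part = field + ":" + term;
            float weight;
            if (fieldWeights.TryGetValue(field, out weight) && weight != 1f)
            {
                part += "^" + weight.ToString(CultureInfo.InvariantCulture);
            }
            parts.Add(part);
        }
        return "(" + string.Join(" OR ", parts.ToArray()) + ")";
    }
}
```

With fields Body and Verb and a weight of 3 on Verb, the term open becomes (Body:open OR Verb:open^3), so a match in the Verb field counts for more without touching the index.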

> I was under the impression that there is no way for a Sando extender to tweak the weights when indexing (which is what Google may do).

Yes, there is no way to tweak the weights when *indexing* but you can tweak the weights when *querying*.  We specifically chose to tweak at the querying point because if you tweak at the indexing point then every time you change the weights Sando would have to reindex the given project.  Does this make sense?

All of that being said, if you have a compelling reason to create a new extension point that we can't address by the existing EPs then we should create it.

May 2, 2012 at 4:02 AM

> One way to do this would be to create a custom Verb field and a custom DO field when parsing.  For instance, when parsing a method identify all of the verbs and put them in the Verb field.  Then, in the query weights supplier increase the weight of the verb field to be higher.  This would effectively do what you were talking about above. Does this work for you?

I mentioned what I wanted to do to you offline, but a couple of things stick out here. One is that this seems a whole lot of work for something that could easily be done by modifying the actual query weights on a word by word basis. The other is that query weights supplier is a misleading name; perhaps, field weights supplier would make more sense.