Lucene.Net usage

Coordinator
Jan 24, 2012 at 7:18 AM

"I had to reference some of the Lucene dlls and realized that they are not in the LIBS folder, only in the Indexer project under Lucene. I'm not sure where the right place is for these dlls, as I imagine you might want to keep them private to the Indexer project. I guess we need to either put them in LIBS or make it so other projects (e.g., UI) don't need to depend on Lucene dlls. 

Anyway, no rush on this, just wanted to track this issue."

 

I've decided to create a discussion for this, since it seems to be an important issue and it would probably be a good idea to talk it through.

In my opinion, it would be best to hide the usage of Lucene from other projects, even the search engine, so that we can easily switch to a different indexing/searching library in the future. However, I don't know if that's possible, and I'm not sure if we want to be flexible in that way.

I'm not sure exactly how Lucene search works, and I don't know if we can hide it behind some API functions without losing performance or functionality. I hope you guys can comment on that. We can keep Lucene tightly coupled if there is a reason for it, in which case I will move its libraries to the Libs folder, or we can hide it behind an API. Let me know what you think.
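
To make the "hide it behind an API" option concrete, here is a minimal, language-neutral sketch written in Python rather than the project's own language. All names (CodeIndex, SearchHit, InMemoryIndex) are hypothetical and not taken from the code base; the in-memory backend only stands in for a Lucene-backed one.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SearchHit:
    """One result, expressed without any indexing-library types."""
    file_path: str
    element_name: str


class CodeIndex(ABC):
    """Hypothetical facade: the only type other projects (e.g. UI) would see."""

    @abstractmethod
    def add_document(self, file_path: str, element_name: str, text: str) -> None:
        ...

    @abstractmethod
    def search(self, term: str) -> List[SearchHit]:
        ...


class InMemoryIndex(CodeIndex):
    """Stand-in backend; a Lucene-backed class would implement the same methods."""

    def __init__(self) -> None:
        self._docs: List[Tuple[SearchHit, str]] = []

    def add_document(self, file_path: str, element_name: str, text: str) -> None:
        # Store lower-cased text so the search below is case-insensitive.
        self._docs.append((SearchHit(file_path, element_name), text.lower()))

    def search(self, term: str) -> List[SearchHit]:
        needle = term.lower()
        return [hit for hit, text in self._docs if needle in text]
```

With this shape, callers depend only on CodeIndex and SearchHit, so swapping the backend would not touch them. Whether that costs performance or functionality with a real index is exactly the open question above.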

 

Coordinator
Jan 24, 2012 at 1:06 PM
lordlothar wrote:

In my opinion, it would be best to hide the usage of Lucene from other projects, even the search engine, so that we can easily switch to a different indexing/searching library in the future. However, I don't know if that's possible, and I'm not sure if we want to be flexible in that way.

I agree with this, it'd be best if we can hide the dependency on Lucene.  Of course, this goal will be dependent on whether it is possible, but I think it probably is.  

Coordinator
Jan 24, 2012 at 9:10 PM

OK guys, I'll try to implement an API for SearchEngine.

The DocumentIndexer class can now add documents to Lucene (see the UnitTest project for details), so I can try to add searching capabilities.

Before I do that, I need your help defining all the options we're going to offer the user in the search engine.

  • search within comments (true/false)
  • filter by access type (using the AccessLevel enum, with a checkbox to turn each value on or off)
  • filter by program element type (a new enum with the names of the ProgramElement subclasses, with checkboxes as in the previous point)
  • show only program element definitions (search within program element names - true/false)
  • exact mode (true/false - true means that if you search for "stories" you won't get "story" in the results; false means that we will use a language analyzer to find all the similar words, a whole word family, like "story", "stories" etc.)
  • "not" option (return all results except the ones found by running the query)
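
The options above could be collected into one criteria object, roughly as sketched below in Python. This is only an illustration under assumptions: the AccessLevel members, the ElementType enum, the Element record and the stem() helper are all invented here (stem() is a toy stand-in for a real language analyzer), and the "definitions only" option is left out because this toy model has no element bodies to exclude.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class AccessLevel(Enum):  # members are assumed, not taken from the project
    PUBLIC = "public"
    PROTECTED = "protected"
    INTERNAL = "internal"
    PRIVATE = "private"


class ElementType(Enum):  # hypothetical stand-in for ProgramElement subclasses
    CLASS = "class"
    METHOD = "method"
    FIELD = "field"


@dataclass(frozen=True)
class Element:
    name: str
    comment: str
    access: AccessLevel
    kind: ElementType


@dataclass(frozen=True)
class SearchCriteria:
    term: str
    search_comments: bool = False
    access_levels: FrozenSet[AccessLevel] = frozenset(AccessLevel)
    element_types: FrozenSet[ElementType] = frozenset(ElementType)
    exact: bool = True
    negate: bool = False


def stem(word: str) -> str:
    """Toy normalizer: maps 'stories' and 'story' to the same form."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def matches(element: Element, criteria: SearchCriteria) -> bool:
    texts = [element.name]
    if criteria.search_comments:
        texts.append(element.comment)
    words = [w.lower() for t in texts for w in re.findall(r"[A-Za-z]+", t)]
    if criteria.exact:
        found = criteria.term.lower() in words
    else:
        found = stem(criteria.term) in {stem(w) for w in words}
    in_scope = (element.access in criteria.access_levels
                and element.kind in criteria.element_types)
    hit = found and in_scope
    # "not" option: return everything the plain query would not return.
    return hit != criteria.negate
```

Identifiers are treated as single words here; splitting camel-case names such as GetStories would be a separate step.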

I believe we would also like to be able to define multiple search conditions, like "search for 'story', only within definitions and not for private program elements". This is what I wish the VS search engine had, and I know it would be very useful for the user. I think we can achieve it by defining one search condition at a time and adding it (possibly with brackets) to the already defined conditions, so that we can build complex expressions. It may be hard, but it is definitely worth trying.
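
One possible way to build such expressions is a small condition type that can be combined with and/or/not, with ordinary parentheses acting as the brackets. The Python sketch below is only an illustration; Condition, name_contains and access_is are hypothetical names, and documents are plain dictionaries rather than real index documents.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Doc = Dict[str, str]


@dataclass(frozen=True)
class Condition:
    """A single search condition that can be combined with others."""
    test: Callable[[Doc], bool]

    def __call__(self, doc: Doc) -> bool:
        return self.test(doc)

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(lambda d: self.test(d) and other.test(d))

    def __or__(self, other: "Condition") -> "Condition":
        return Condition(lambda d: self.test(d) or other.test(d))

    def __invert__(self) -> "Condition":
        return Condition(lambda d: not self.test(d))


def name_contains(term: str) -> Condition:
    # Matches when the element name contains the term (case-insensitive).
    return Condition(lambda d: term.lower() in d["name"].lower())


def access_is(level: str) -> Condition:
    return Condition(lambda d: d["access"] == level)


# "search for 'story', only within definitions and not for private elements"
query = name_contains("story") & ~access_is("private")
```

Each condition is defined on its own and then attached to the ones already built, which matches the "one condition at a time" idea. A real implementation would translate the tree into a query for the index instead of filtering documents in memory.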

 

These are my thoughts about the search engine. Please correct any of the functionality I mentioned if you think it should behave differently or shouldn't be here at all. I also hope you guys will add any options I missed, so that we have the full scope of the planned search capabilities and I can start implementing the APIs.

 

Of course, all the options will probably have completely different (more meaningful) names in the UI. I'm trying to focus on the functionality at this moment, not on the naming conventions.

Jan 24, 2012 at 10:11 PM
Edited Jan 25, 2012 at 4:25 AM

We had some discussions on what search requests developers would like to issue to a code search engine. Some of these thoughts found their way into this wiki page: http://wiki.eclipse.org/Recommenders/CodeSearch Not all queries are (extremely) useful, but maybe they will give you some interesting ideas about what you could index.

Coordinator
Jan 25, 2012 at 6:51 AM

Thanks! This one will help a lot. I'll try to compare all the search cases mentioned there with ours and see which are missing. Thanks again!

Jan 25, 2012 at 7:09 AM
lordlothar wrote:

In my opinion, it would be best to hide the usage of Lucene from other projects, even the search engine, so that we can easily switch to a different indexing/searching library in the future. However, I don't know if that's possible, and I'm not sure if we want to be flexible in that way.

I'm not sure exactly how Lucene search works, and I don't know if we can hide it behind some API functions without losing performance or functionality. I hope you guys can comment on that. We can keep Lucene tightly coupled if there is a reason for it, in which case I will move its libraries to the Libs folder, or we can hide it behind an API. Let me know what you think.

 

I agree with this. I checked in something basic on search; let me think about how we can hide the Lucene dependency.

Coordinator
Jan 25, 2012 at 2:53 PM
lordlothar wrote:

Before I do that, I need your help defining all the options we're going to offer the user in the search engine.

  • search within comments (true/false)
  • filter by access type (using the AccessLevel enum, with a checkbox to turn each value on or off)
  • filter by program element type (a new enum with the names of the ProgramElement subclasses, with checkboxes as in the previous point)
  • show only program element definitions (search within program element names - true/false)
  • exact mode (true/false - true means that if you search for "stories" you won't get "story" in the results; false means that we will use a language analyzer to find all the similar words, a whole word family, like "story", "stories" etc.)
  • "not" option (return all results except the ones found by running the query)

This list of options looks great!  I'm having trouble thinking of any additional items to add....

Coordinator
Jan 25, 2012 at 5:41 PM

There is one more thing I hope we will be able to implement: a "search in" option (btw, Marcel, thanks again for the link; I came up with this idea thanks to the wiki page you mentioned). I'd like us to be able to limit where the search runs not only at the project level (as in VS now), but also at the directory level. I think it would be really great if the user could click "search in" and see the directory tree, probably with a checkbox next to each directory name, and choose all the folders where the search should run.

It can look like:

Solution
    Project1
        Directory1
        Directory2
        Directory3
    Project2
        Directory4
        Directory5
        Directory6

and when you check the checkboxes next to Directory1, Directory3 and Directory6, the plugin will search only the files whose relative path contains the path to any of these directories.

My life would sometimes be much easier if I had such an option in the current VS. What do you think, guys?
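
A rough Python sketch of that check, assuming each file is identified by a solution-relative path and the selected directories are given the same way (the function name and the example paths are made up):

```python
from pathlib import PureWindowsPath
from typing import Iterable, Tuple


def _parts(path: str) -> Tuple[str, ...]:
    # Split into path components, ignoring case as Windows paths do.
    return tuple(p.lower() for p in PureWindowsPath(path).parts)


def in_selected_dirs(relative_file: str, selected: Iterable[str]) -> bool:
    """True if the file lives in (or below) any of the checked directories."""
    file_parts = _parts(relative_file)
    for directory in selected:
        dir_parts = _parts(directory)
        if (len(file_parts) > len(dir_parts)
                and file_parts[:len(dir_parts)] == dir_parts):
            return True
    return False


selected = [r"Project1\Directory1", r"Project1\Directory3", r"Project2\Directory6"]
print(in_selected_dirs(r"Project1\Directory1\Story.cs", selected))
```

Comparing whole path components rather than raw substrings avoids accidentally matching Directory10 when only Directory1 is checked.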

Coordinator
Jan 25, 2012 at 5:44 PM
lordlothar wrote:

There is one more thing I hope we will be able to implement - "search in" option (btw - Marcel - thanks again for the link ...

I strongly second this idea.  It would be fantastic to right click on a folder, project, etc and be able to search in that...great idea.

Coordinator
Jan 25, 2012 at 7:41 PM

One more thing: VS lets you choose target file types when searching. I think we can easily implement this functionality, but to do that I wonder: would it be better to store the file extension in a separate Lucene document field (and, of course, remove it from the FileName field at the same time)?

Coordinator
Jan 25, 2012 at 7:45 PM
lordlothar wrote:

One more thing: VS lets you choose target file types when searching. I think we can easily implement this functionality, but to do that I wonder: would it be better to store the file extension in a separate Lucene document field (and, of course, remove it from the FileName field at the same time)?

Forget it. It's better when the file name contains the extension, because then we can search for files not only by extension but also by part of the name, e.g. *Manager.cs. I'm adding this option, but without the FileName changes.
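
To illustrate why keeping the extension in the file name is convenient, here is a small Python sketch using shell-style wildcards from the standard library. It only shows the matching semantics on plain strings and is not how the index itself would evaluate the pattern; file_name_matches is a made-up helper.

```python
from fnmatch import fnmatchcase


def file_name_matches(file_name: str, pattern: str) -> bool:
    """Case-insensitive wildcard match on a file name including its extension."""
    return fnmatchcase(file_name.lower(), pattern.lower())


# One pattern syntax covers both "by extension" and "by part of the name".
print(file_name_matches("DocumentManager.cs", "*.cs"))
print(file_name_matches("DocumentManager.cs", "*Manager.cs"))
```

With the extension in a separate field, the second pattern would have to be split into two conditions on two fields.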

Coordinator
Feb 1, 2012 at 8:11 PM

Guys - is there any reason why we are storing the file name and the file path in different fields?

I don't know how to implement the "file in locations" search criterion when I need to concatenate two fields to get the full path with the file name. Can I store it in one field? If a user wants to search within the files described as "Model\*Logic.cs", I won't be able to build the appropriate query efficiently, because the first part (say, "C:\Projects\MVCTest\Model\") will be in the path field and the second (say, "DocumentLogic.cs") in the file name field.

I can't think of any benefits from using two fields for that - let me know if I'm wrong, otherwise I'm going to join these fields into one named "FullFilePath".
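
As a sketch of what the single field would allow, the Python below joins the two parts into one value and matches a pattern like "Model\*Logic.cs" against the tail of it. The helper names are hypothetical, and the matching is done on plain strings here, not through the index.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath


def full_file_path(directory: str, file_name: str) -> str:
    """Value for the proposed single FullFilePath field."""
    return str(PureWindowsPath(directory) / file_name)


def matches_location(full_path: str, pattern: str) -> bool:
    """True if the pattern matches the trailing part of the full path."""
    # Normalize separators and case so the comparison is uniform.
    normalized_path = full_path.replace("\\", "/").lower()
    normalized_pattern = pattern.replace("\\", "/").lower()
    # Anchor at a separator so "Model" does not match "DataModel".
    return fnmatchcase(normalized_path, "*/" + normalized_pattern)


full = full_file_path(r"C:\Projects\MVCTest\Model", "DocumentLogic.cs")
print(full, matches_location(full, r"Model\*Logic.cs"))
```

Note that in this sketch "*" also matches path separators, so "Model\*Logic.cs" would also accept files in subdirectories of Model; whether that is the desired behavior is a design choice.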

Coordinator
Feb 1, 2012 at 8:26 PM
lordlothar wrote:

Guys - is there any reason why we are storing the file name and the file path in different fields?

...otherwise I'm going to join these fields into one named "FullFilePath".

Yeah, I saw that the other day and thought it was a bit weird too.  I'm all for joining them into one field.  Good idea!