Monday, February 1, 2016

Couchbase Full Text (CBFT) for Content Management

Full Text Search (FTS) is a core capability of content management systems: it lets users search both the content and the metadata associated with it. In a previous post, I discussed a new, fully scalable architecture for content management using Apache Chemistry with a Couchbase repository for metadata (and possibly blobs). Today, I would like to discuss how to integrate FTS into this architecture in a scalable way, without the need for yet another tier (Elasticsearch, Solr, LucidWorks).

In 2015, Couchbase announced the development of CBFT, which stands for Couchbase Full Text search and is currently in developer preview. CBFT is a simple, integrated, distributed full text server that aims to cover about 80% of the full text features most applications need. You can find more information on CBFT here: http://connect15.couchbase.com/agenda/sneak-peek-cbft-full-text-search-couchbase/

In this article, I start investigating how to integrate CBFT into the Apache Chemistry CMIS server for full text search on metadata.

  • Setup
To install Couchbase, follow the documentation here.
Create a bucket called cmismeta. This bucket contains the metadata of each piece of content (folder, file).
To install Apache Chemistry using Couchbase repository, follow the documentation here.
To install CBFT, follow the documentation here.

  • Create a CBFT index
Start CBFT on a local node: cbft -s http://localhost:8091
Point your web browser to cbft's web admin UI: http://localhost:8095



On the Indexes listing page, click on the New Index button.
Create an index called cmis-fts on bucket cmismeta.

  • Test your index
To test your index, you need some content in the cmismeta bucket. You can either use the Apache Chemistry workbench to create content (folders, files), which stores the associated metadata in the cmismeta bucket, or add a few simple test documents directly and remove them afterwards.

In this example, I already have a bunch of files added to the Content Management Couchbase repository.

Open the Query tab and enter a query using the Bleve query string syntax.
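As a rough illustration of what such a query can look like, here is a minimal sketch of a helper that assembles Bleve query strings. It relies on the operators described in the Bleve query string documentation (field scoping with `field:term`, `+` for required clauses, `-` for excluded clauses, double quotes for phrases, `^` for boosting). The class name `BleveQueryStrings` and the field name `name` are assumptions made for this example; they are not part of CBFT or of the CMIS Couchbase repository.

```java
// Hypothetical helper (not part of CBFT): builds Bleve query strings.
final class BleveQueryStrings {

    private BleveQueryStrings() {
    }

    /** Restricts a term to one field, e.g. name:report. */
    static String field(String field, String term) {
        return field + ":" + term;
    }

    /** Marks a clause as required (must match). */
    static String required(String clause) {
        return "+" + clause;
    }

    /** Marks a clause as excluded (must not match). */
    static String excluded(String clause) {
        return "-" + clause;
    }

    /** Wraps text in double quotes for a phrase match.
     *  Embedded double quotes are not handled in this sketch. */
    static String phrase(String text) {
        return "\"" + text + "\"";
    }

    /** Raises the relative weight of a clause, e.g. report^5. */
    static String boosted(String clause, int boost) {
        return clause + "^" + boost;
    }

    /** Joins clauses with spaces into one query string. */
    static String all(String... clauses) {
        return String.join(" ", clauses);
    }
}
```

For example, `BleveQueryStrings.all(required(field("name", "report")), excluded("draft"), phrase("annual budget"))` yields `+name:report -draft "annual budget"`, which can be pasted into the query box.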


  • CMIS Apache Chemistry project


First, you need to activate the full text query capability in the CMIS Couchbase repository class.


public class CouchbaseRepository {
    private RepositoryInfo createRepositoryInfo(CmisVersion cmisVersion) {
        // set repo infos
        RepositoryInfoImpl repositoryInfo = new RepositoryInfoImpl();
        repositoryInfo.setCmisVersionSupported(cmisVersion.value());
        ...
        // set repo capabilities
        RepositoryCapabilitiesImpl capabilities = new RepositoryCapabilitiesImpl();
        // advertise that only full text (CONTAINS) queries are supported
        capabilities.setCapabilityQuery(CapabilityQuery.FULLTEXTONLY);
        ...
        repositoryInfo.setCapabilities(capabilities);
        return repositoryInfo;
    }
}
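
With the FULLTEXTONLY capability, CMIS clients are expected to filter only with the CONTAINS() predicate, for example `SELECT * FROM cmis:document WHERE CONTAINS('annual report')`. The text inside CONTAINS() is what would be handed to the full text engine. The sketch below shows one possible way to pull that text out of a query statement with a regular expression. The class `ContainsExtractor` is hypothetical and deliberately simplified; a real server would more likely rely on a proper CMIS query parser, and this is not necessarily how the Couchbase repository does it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the text of a CONTAINS() predicate.
final class ContainsExtractor {

    // Matches CONTAINS('text') or CONTAINS(qualifier, 'text'), case-insensitively.
    // Group 1 is the quoted text; a backslash escapes the following character.
    private static final Pattern CONTAINS = Pattern.compile(
            "CONTAINS\\s*\\(\\s*(?:[\\w:]+\\s*,\\s*)?'((?:[^'\\\\]|\\\\.)*)'\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    private ContainsExtractor() {
    }

    /** Returns the full text expression of the first CONTAINS() found, if any. */
    static Optional<String> fullTextExpression(String statement) {
        Matcher m = CONTAINS.matcher(statement);
        if (!m.find()) {
            return Optional.empty();
        }
        // Turn the escaped quote \' back into a plain quote.
        return Optional.of(m.group(1).replace("\\'", "'"));
    }
}
```

The returned string could then be passed to whatever component runs the full text search; statements without a CONTAINS() predicate yield an empty Optional and can be rejected.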

To query the CBFT index, we use its REST API through a Jersey client.

First, add the dependency to the Maven pom file.

        <dependency>
            <groupId>com.sun.jersey</groupId>
            <artifactId>jersey-client</artifactId>
            <version>1.8</version>
        </dependency>

Then create a new CBFT service class. This service needs the CBFT location and the index name. It provides a simple query method that returns a list of keys referring to documents in the cmismeta bucket in Couchbase.

package org.apache.chemistry.opencmis.couchbase;

import java.util.ArrayList;
import java.util.List;

import com.couchbase.client.java.document.json.JsonArray;
import com.couchbase.client.java.document.json.JsonObject;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;


public class CBFTService {

    private final String cbftLocation;
    private final String indexid;
    private final Client client;

    public CBFTService(String location, String indexid) {
        this.cbftLocation = location;
        this.indexid = indexid;
        this.client = Client.create();
    }

    /**
     * Search the CBFT index.
     *
     * @param query the query to search
     * @return list of keys matching the query
     */
    public List<String> query(String query) {
        List<String> results = new ArrayList<String>();

        // CBFT listens on port 8095 in this setup
        WebResource webResource = client.resource(
                "http://" + this.cbftLocation + ":8095/api/index/" + indexid + "/query");

        // Build the request body with JsonObject rather than string
        // concatenation, so that quotes or backslashes in the query
        // cannot break the JSON document.
        JsonObject input = JsonObject.create()
                .put("q", query)
                .put("indexName", indexid)
                .put("size", 10)
                .put("from", 0)
                .put("explain", true)
                .put("highlight", JsonObject.empty())
                .put("query", JsonObject.create()
                        .put("boost", 1)
                        .put("query", query))
                .put("fields", JsonArray.from("*"))
                .put("ctl", JsonObject.create()
                        .put("consistency", JsonObject.create()
                                .put("level", "")
                                .put("vectors", JsonObject.empty()))
                        .put("timeout", 0));

        ClientResponse response = webResource.type("application/json")
                .post(ClientResponse.class, input.toString());

        if (response.getStatus() != 200) {
            throw new RuntimeException("Failed : HTTP error code : "
                    + response.getStatus());
        }

        String output = response.getEntity(String.class);
        JsonObject content = JsonObject.fromJson(output);

        // Each hit carries the key of the matching document in its "id" field
        JsonArray hits = content.getArray("hits");
        if (hits != null) {
            for (int i = 0; i < hits.size(); i++) {
                results.add(hits.getObject(i).getString("id"));
            }
        }

        return results;
    }
}
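
The service builds its endpoint URL by concatenation with a fixed port. As a small, optional variation, the sketch below shows how the same URL could be assembled with `java.net.URI`, which percent-encodes characters that are not legal in a path (such as a space in an index name) and makes the port a parameter. The class `CbftEndpoint` is hypothetical; only the `/api/index/{index}/query` path comes from the service code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds the CBFT query endpoint for an index.
final class CbftEndpoint {

    private CbftEndpoint() {
    }

    /** Returns http://host:port/api/index/{indexName}/query. */
    static URI queryUri(String host, int port, String indexName) {
        try {
            // The multi-argument constructor quotes illegal path characters.
            return new URI("http", null, host, port,
                    "/api/index/" + indexName + "/query", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid CBFT endpoint", e);
        }
    }
}
```

With the values used in this article, `CbftEndpoint.queryUri("localhost", 8095, "cmis-fts")` gives `http://localhost:8095/api/index/cmis-fts/query`, and its `toString()` can be passed to `client.resource(...)`.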

You can now use the workbench to query the content management server through the CBFT capability, then click on a result to see the associated content.

