This time, I'll try adding, updating, searching, and deleting documents with curl. I'll assume Docker and vespa-cli are already set up.
First, prepare the application we'll use. Create a directory of your choice and add the following files to it:
- services.xml
- schemas/doc.sd
Create services.xml with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0" xmlns:deploy="vespa" xmlns:preprocess="properties">
  <container id="default" version="1.0">
    <search></search>
    <document-api></document-api>
    <nodes>
      <node hostalias="node1"></node>
    </nodes>
  </container>
  <content id="mind" version="1.0">
    <min-redundancy>1</min-redundancy>
    <documents>
      <document type="doc" mode="index"/>
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0" />
    </nodes>
  </content>
</services>
schemas/doc.sd looks like this:
schema doc {
  document doc {
    field doc_id type string {
      indexing: summary | attribute
      attribute: fast-search
    }
    field title type string {
      indexing: index | summary
      index: enable-bm25
    }
    field content type string {
      indexing: index | summary
      index: enable-bm25
    }
  }
  fieldset default {
    fields: title, content
  }
  rank-profile default {
    first-phase {
      expression: nativeRank(title, content)
    }
  }
}
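It's a simple schema with three fields: doc_id, title, and content. doc_id is stored as an attribute with fast-search. title and content are indexed with "index: enable-bm25", which makes the bm25 rank feature available for them. Note that the default rank profile above still ranks with nativeRank.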
Once both files are ready, start Vespa.
$ docker run --detach --name vespa --hostname vespa-container --publish 8080:8080 --publish 19071:19071 vespaengine/vespa
Once it is up, run the deployment from the directory that contains the two files.
$ vespa deploy --wait 300
Waiting up to 5m0s for deploy API…
Uploading application package… done
Success: Deployed . with session ID 2
Waiting up to 5m0s for deployment to converge…
Waiting up to 5m0s for cluster discovery…
Waiting up to 5m0s for container default…
The application is deployed with output along these lines.
Now let's add some documents. I had ChatGPT write three blog-style articles, so we'll register those:
$ curl -X POST "http://localhost:8080/document/v1/fess/doc/docid/blog-1" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "doc_id": "blog-1",
      "title": "Exploring the Beauty of Nature",
      "content": "Nature has always been a source of solace and inspiration for many. From the majestic mountains to the serene beaches, nature offers a retreat from the hustle and bustle of everyday life..."
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-1","id":"id:fess:doc::blog-1"}
$ curl -X POST "http://localhost:8080/document/v1/fess/doc/docid/blog-2" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "doc_id": "blog-2",
      "title": "The Journey of Personal Growth",
      "content": "Personal growth is a continuous journey. It involves understanding oneself, setting meaningful goals, and pushing for constant improvement. While the path may be challenging, the rewards are truly significant."
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-2","id":"id:fess:doc::blog-2"}
$ curl -X POST "http://localhost:8080/document/v1/fess/doc/docid/blog-3" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "doc_id": "blog-3",
      "title": "The Future of Technology and Innovation",
      "content": "The rapid pace of technological advancement is shaping our future. From artificial intelligence to renewable energy solutions, innovative ideas are at the forefront of creating a sustainable and interconnected world..."
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-3","id":"id:fess:doc::blog-3"}
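Each document is registered with an ID such as id:fess:doc::blog-1. The /document/v1 API uses paths of the form /document/v1/[namespace]/[document type]/…. Here the namespace is fess, and the document type is doc because doc.sd declares "document doc". As far as I can tell the namespace can be chosen freely, so I went with fess (in the hope of someday loading a Fess index into Vespa…).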
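The relationship between the request path and the resulting document ID can be written out as a small Python sketch using only the standard library. The helper names (document_path, document_id, build_put_request) are hypothetical, and BASE_URL assumes the localhost:8080 port mapping from the docker run command. The sketch also URL-encodes the ID part of the path, as a precaution for IDs containing reserved characters.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Vespa's container port 8080 is published on localhost (see docker run above).
BASE_URL = "http://localhost:8080"


def document_path(namespace: str, doctype: str, user_id: str) -> str:
    """/document/v1/<namespace>/<document type>/docid/<id>, with the id URL-encoded."""
    return f"/document/v1/{namespace}/{doctype}/docid/{urllib.parse.quote(user_id, safe='')}"


def document_id(namespace: str, doctype: str, user_id: str) -> str:
    """The ID reported for that path: id:<namespace>:<document type>::<id>."""
    return f"id:{namespace}:{doctype}::{user_id}"


def build_put_request(namespace: str, doctype: str, user_id: str, fields: dict) -> urllib.request.Request:
    """A POST request that stores a whole document, mirroring the curl calls above."""
    body = json.dumps({"fields": fields}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + document_path(namespace, doctype, user_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running instance, passing the request to urllib.request.urlopen should return the same pathId/id JSON that curl printed.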
Let's fetch one of the registered documents by its ID.
$ curl -X GET "http://localhost:8080/document/v1/fess/doc/docid/blog-1" | jq .
{
  "pathId": "/document/v1/fess/doc/docid/blog-1",
  "id": "id:fess:doc::blog-1",
  "fields": {
    "doc_id": "blog-1",
    "content": "Nature has always been a source of solace and inspiration for many. From the majestic mountains to the serene beaches, nature offers a retreat from the hustle and bustle of everyday life...",
    "title": "Exploring the Beauty of Nature"
  }
}
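Here is the same lookup as a Python sketch, under stated assumptions. The function name get_document is hypothetical. It returns only the "fields" object and treats HTTP 404 as "no such document". The opener parameter defaults to urllib.request.urlopen and can be swapped out, so the function can be exercised without a running Vespa.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: Vespa's container port 8080 is published on localhost.
BASE_URL = "http://localhost:8080"


def get_document(namespace, doctype, user_id, opener=urllib.request.urlopen):
    """Return the "fields" of one document, or None if the GET answers 404."""
    path = f"/document/v1/{namespace}/{doctype}/docid/{urllib.parse.quote(user_id, safe='')}"
    try:
        with opener(BASE_URL + path, timeout=10) as resp:
            return json.load(resp)["fields"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the document does not exist
            return None
        raise  # any other HTTP error is unexpected here
```

For example, get_document("fess", "doc", "blog-1")["title"] would give "Exploring the Beauty of Nature" for the data registered above.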
Next, fetching all documents is done with what Vespa calls a visit, so let's try that. It works much like Scroll in Elasticsearch.
$ curl http://localhost:8080/document/v1/fess/doc/docid | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [
    {
      "id": "id:fess:doc::blog-3",
      "fields": {
        "doc_id": "blog-3",
        "content": "The rapid pace of technological advancement is shaping our future. From artificial intelligence to renewable energy solutions, innovative ideas are at the forefront of creating a sustainable and interconnected world...",
        "title": "The Future of Technology and Innovation"
      }
    }
  ],
  "documentCount": 1,
  "continuation": "AAAACAAAAAAAAAAUAAAAAAAAABMAAAAAAAABAAAAAAEgAAAAAAAAyAAAAAAAAAAA"
}
The response includes a continuation token. Adding it to the request as a parameter fetches the next documents:
$ curl "http://localhost:8080/document/v1/fess/doc/docid?continuation=AAAACAAAAAAAAAAUAAAAAAAAABMAAAAAAAABAAAAAAEgAAAAAAAAyAAAAAAAAAAA" | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [
    {
      "id": "id:fess:doc::blog-1",
      "fields": {
        "doc_id": "blog-1",
        "content": "Nature has always been a source of solace and inspiration for many. From the majestic mountains to the serene beaches, nature offers a retreat from the hustle and bustle of everyday life...",
        "title": "Exploring the Beauty of Nature"
      }
    }
  ],
  "documentCount": 1,
  "continuation": "AAAACAAAAAAAAABEAAAAAAAAAEMAAAAAAAABAAAAAAEgAAAAAAAAwgAAAAAAAAAA"
}
$ curl "http://localhost:8080/document/v1/fess/doc/docid?continuation=AAAACAAAAAAAAABEAAAAAAAAAEMAAAAAAAABAAAAAAEgAAAAAAAAwgAAAAAAAAAA" | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [
    {
      "id": "id:fess:doc::blog-2",
      "fields": {
        "doc_id": "blog-2",
        "content": "Personal growth is a continuous journey. It involves understanding oneself, setting meaningful goals, and pushing for constant improvement. While the path may be challenging, the rewards are truly significant.",
        "title": "The Journey of Personal Growth"
      }
    }
  ],
  "documentCount": 1,
  "continuation": "AAAACAAAAAAAAACtAAAAAAAAAKwAAAAAAAABAAAAAAEgAAAAAAAANQAAAAAAAAAA"
}
$ curl "http://localhost:8080/document/v1/fess/doc/docid?continuation=AAAACAAAAAAAAACtAAAAAAAAAKwAAAAAAAABAAAAAAEgAAAAAAAANQAAAAAAAAAA" | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [],
  "documentCount": 0
}
Once every document has been fetched, the response no longer contains a continuation, and the visit is finished.
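The visit loop can be sketched in Python: keep requesting pages and passing back the returned continuation until a response arrives without one. This matches the behavior seen above. The names fetch_page and visit_all are hypothetical. visit_all takes the page-fetching function as an argument, so it can be tried with canned pages. fetch_page URL-encodes the token in case it ever contains characters that are not URL-safe.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Vespa's container port 8080 is published on localhost.
BASE_URL = "http://localhost:8080"


def fetch_page(namespace, doctype, continuation=None):
    """One visit request: GET /document/v1/<namespace>/<doctype>/docid[?continuation=...]."""
    url = f"{BASE_URL}/document/v1/{namespace}/{doctype}/docid"
    if continuation:
        url += "?" + urllib.parse.urlencode({"continuation": continuation})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def visit_all(fetch):
    """Yield every document; fetch(continuation) must return one decoded JSON page."""
    continuation = None
    while True:
        page = fetch(continuation)
        yield from page.get("documents", [])
        continuation = page.get("continuation")
        if not continuation:  # no token means the visit is complete
            break
```

Usage against the running instance would be something like: for doc in visit_all(lambda c: fetch_page("fess", "doc", c)): print(doc["id"]). An empty page that still carries a continuation is simply followed, so the loop only stops on a missing token.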
Next, let's search. We'll look for documents whose title contains "journey", writing the query in YQL.
$ curl -X POST "http://localhost:8080/search/" \
  -H "Content-Type: application/json" \
  -d '{ "yql": "select * from sources doc where title contains \"journey\";" }' | jq .
{
  "root": {
    "id": "toplevel",
    "relevance": 1,
    "fields": {
      "totalCount": 1
    },
    "coverage": {
      "coverage": 100,
      "documents": 3,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:fess:doc::blog-2",
        "relevance": 0.16343879032006284,
        "source": "mind",
        "fields": {
          "sddocname": "doc",
          "documentid": "id:fess:doc::blog-2",
          "doc_id": "blog-2",
          "title": "The Journey of Personal Growth",
          "content": "Personal growth is a continuous journey. It involves understanding oneself, setting meaningful goals, and pushing for constant improvement. While the path may be challenging, the rewards are truly significant."
        }
      }
    ]
  }
}
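The search call can also be sketched in Python. One function builds the same POST body as the curl command, and another pulls the hits out of root.children in the response shown above. The function names are hypothetical. YQL escaping is deliberately not attempted, so the builder conservatively rejects terms containing double quotes or backslashes; that restriction is this sketch's assumption, not a Vespa rule.

```python
import json
import urllib.request

# Assumption: Vespa's container port 8080 is published on localhost.
BASE_URL = "http://localhost:8080"


def build_search_request(field, term, doctype="doc"):
    """POST /search/ with a YQL 'contains' query on a single field."""
    if '"' in term or "\\" in term:
        # Conservative: this sketch does not implement YQL string escaping.
        raise ValueError("term must not contain double quotes or backslashes")
    yql = f'select * from sources {doctype} where {field} contains "{term}";'
    return urllib.request.Request(
        BASE_URL + "/search/",
        data=json.dumps({"yql": yql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_hits(result):
    """(document id, relevance, title) for each hit under root.children."""
    return [
        (hit["id"], hit["relevance"], hit["fields"].get("title"))
        for hit in result["root"].get("children", [])
    ]
```

The .get("children", []) guard is there because a query with no matches may leave children out of the response.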
To update a document, we change the title of blog-2 as follows:
$ curl -X PUT "http://localhost:8080/document/v1/fess/doc/docid/blog-2" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "title": {
        "assign": "The Trip of Personal Growth"
      }
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-2","id":"id:fess:doc::blog-2"}
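The update above is a partial update: only the fields named in the body are assigned, and the rest of blog-2 is left as it was. A minimal Python sketch that builds the same PUT request from a plain {field: new value} mapping might look like this. The function name build_assign_update is hypothetical, and only the "assign" operation used above is covered.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Vespa's container port 8080 is published on localhost.
BASE_URL = "http://localhost:8080"


def build_assign_update(namespace, doctype, user_id, assignments):
    """PUT request that assigns new values to the listed fields only."""
    body = {"fields": {name: {"assign": value} for name, value in assignments.items()}}
    path = f"/document/v1/{namespace}/{doctype}/docid/{urllib.parse.quote(user_id, safe='')}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```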
Last, let's try deleting. We'll delete blog-1.
$ curl -X DELETE "http://localhost:8080/document/v1/fess/doc/docid/blog-1"
{"pathId":"/document/v1/fess/doc/docid/blog-1","id":"id:fess:doc::blog-1"}
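For completeness, the delete as a Python sketch. It is a DELETE on the same document path, with no request body. The function name build_delete_request is hypothetical. As with curl, the response should echo the pathId and id of the removed document.

```python
import urllib.parse
import urllib.request

# Assumption: Vespa's container port 8080 is published on localhost.
BASE_URL = "http://localhost:8080"


def build_delete_request(namespace, doctype, user_id):
    """DELETE /document/v1/<namespace>/<doctype>/docid/<id>; no body is sent."""
    path = f"/document/v1/{namespace}/{doctype}/docid/{urllib.parse.quote(user_id, safe='')}"
    return urllib.request.Request(BASE_URL + path, method="DELETE")
```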
That covers the basic operations.
Finally, stop Vespa with
$ docker stop vespa
and we're done.
Next, I'd like to experiment with various rank-profile settings.