In this post, I'll use curl to add, update, search, and delete documents. I'll assume Docker and vespa-cli are already set up.
First, prepare the application for this walkthrough. Create a directory of your choice and add the following files:
- services.xml
- schemas/doc.sd
Create services.xml with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0" xmlns:deploy="vespa" xmlns:preprocess="properties">
  <container id="default" version="1.0">
    <search></search>
    <document-api></document-api>
    <nodes>
      <node hostalias="node1"></node>
    </nodes>
  </container>
  <content id="mind" version="1.0">
    <min-redundancy>1</min-redundancy>
    <documents>
      <document type="doc" mode="index"/>
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0" />
    </nodes>
  </content>
</services>
schemas/doc.sd looks like this:
schema doc {
    document doc {
        field doc_id type string {
            indexing: summary | attribute
            attribute: fast-search
        }
        field title type string {
            indexing: index | summary
            index: enable-bm25
        }
        field content type string {
            indexing: index | summary
            index: enable-bm25
        }
    }
    fieldset default {
        fields: title, content
    }
    rank-profile default {
        first-phase {
            expression: nativeRank(title, content)
        }
    }
}
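It's a simple schema. It defines the fields doc_id, title, and content. The `index: enable-bm25` setting on title and content makes the bm25 rank feature available for those fields. The default rank-profile above still ranks with nativeRank, though. bm25 only takes effect once a rank expression references it, e.g. bm25(title).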
Once both files are ready, start Vespa.
$ docker run --detach --name vespa --hostname vespa-container --publish 8080:8080 --publish 19071:19071 vespaengine/vespa
Once it's running, deploy the application from the directory where you created the two files.
$ vespa deploy --wait 300
Waiting up to 5m0s for deploy API…
Uploading application package… done
Success: Deployed . with session ID 2
Waiting up to 5m0s for deployment to converge…
Waiting up to 5m0s for cluster discovery…
Waiting up to 5m0s for container default…
The deployment produces output along these lines.
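As the output shows, `vespa deploy --wait` already waits for the container. If you script the later steps separately, though, a small polling helper can confirm that the container answers before you start feeding. The sketch below assumes the container's health endpoint is http://localhost:8080/state/v1/health, on the port published by the docker run command above. The function name wait_for_health and its retry defaults are hypothetical choices, not part of vespa-cli.

```shell
#!/usr/bin/env bash
# Poll a health URL until it answers with a non-error HTTP status, or give up.
# Assumption: the Vespa container exposes /state/v1/health on port 8080.
wait_for_health() {
  local url="${1:-http://localhost:8080/state/v1/health}"
  local tries="${2:-60}"      # number of attempts
  local interval="${3:-5}"    # seconds to wait between attempts
  local i
  for ((i = 1; i <= tries; i++)); do
    # -s: no progress meter; -f: non-zero exit status on HTTP errors (>= 400)
    if curl -sf -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    # do not sleep after the final failed attempt
    if (( i < tries )); then
      sleep "$interval"
    fi
  done
  echo "not ready after $tries attempts: $url" >&2
  return 1
}
```

For example, `wait_for_health && echo "ok to feed"` blocks until the container responds, or fails after about five minutes with the defaults.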
Let's add some documents right away. I had ChatGPT write three blog-style articles, so I'll register these:
$ curl -X POST "http://localhost:8080/document/v1/fess/doc/docid/blog-1" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "doc_id": "blog-1",
      "title": "Exploring the Beauty of Nature",
      "content": "Nature has always been a source of solace and inspiration for many. From the majestic mountains to the serene beaches, nature offers a retreat from the hustle and bustle of everyday life..."
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-1","id":"id:fess:doc::blog-1"}
$ curl -X POST "http://localhost:8080/document/v1/fess/doc/docid/blog-2" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "doc_id": "blog-2",
      "title": "The Journey of Personal Growth",
      "content": "Personal growth is a continuous journey. It involves understanding oneself, setting meaningful goals, and pushing for constant improvement. While the path may be challenging, the rewards are truly significant."
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-2","id":"id:fess:doc::blog-2"}
$ curl -X POST "http://localhost:8080/document/v1/fess/doc/docid/blog-3" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "doc_id": "blog-3",
      "title": "The Future of Technology and Innovation",
      "content": "The rapid pace of technological advancement is shaping our future. From artificial intelligence to renewable energy solutions, innovative ideas are at the forefront of creating a sustainable and interconnected world..."
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-3","id":"id:fess:doc::blog-3"}
Each document is registered with an ID like id:fess:doc::blog-1. Paths in the /document/v1 API take the form /document/v1/[namespace]/[document type]/…. Here the namespace is fess, and the document type is doc because doc.sd declares it as doc. As far as I can tell, the namespace can be anything. I chose fess in the hope of someday putting a Fess index into Vespa…
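The path and ID conventions above can be captured in a few shell helpers. This is a minimal sketch, and docapi_path, vespa_doc_id, and feed_doc are hypothetical names. feed_doc hard-codes the fess namespace and doc type used in this post. It relies on jq, which this post already uses, to build the JSON body so that quotes in titles are escaped. It does not URL-encode the ID, so it assumes simple IDs like blog-1.

```shell
#!/usr/bin/env bash
VESPA_URL="${VESPA_URL:-http://localhost:8080}"

# /document/v1/<namespace>/<document-type>/docid/<user-specified-id>
docapi_path() {
  local namespace="$1" doctype="$2" id="$3"
  printf '/document/v1/%s/%s/docid/%s\n' "$namespace" "$doctype" "$id"
}

# The ID Vespa reports back: id:<namespace>:<document-type>::<user-specified-id>
vespa_doc_id() {
  local namespace="$1" doctype="$2" id="$3"
  printf 'id:%s:%s::%s\n' "$namespace" "$doctype" "$id"
}

# POST one document into namespace "fess", type "doc" (hypothetical helper).
# jq -n --arg builds the body so special characters are JSON-escaped.
feed_doc() {
  local id="$1" title="$2" content="$3"
  jq -n --arg id "$id" --arg title "$title" --arg content "$content" \
    '{fields: {doc_id: $id, title: $title, content: $content}}' |
    curl -s -X POST "$VESPA_URL$(docapi_path fess doc "$id")" \
      -H "Content-Type: application/json" --data-binary @-
}
```

For example, `feed_doc blog-4 "A Test Title" "Some test content."` would post a fourth document. Its ID would be what `vespa_doc_id fess doc blog-4` prints.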
Let's fetch one of the registered documents by its ID.
$ curl -s -X GET "http://localhost:8080/document/v1/fess/doc/docid/blog-1" | jq .
{
  "pathId": "/document/v1/fess/doc/docid/blog-1",
  "id": "id:fess:doc::blog-1",
  "fields": {
    "doc_id": "blog-1",
    "content": "Nature has always been a source of solace and inspiration for many. From the majestic mountains to the serene beaches, nature offers a retreat from the hustle and bustle of everyday life...",
    "title": "Exploring the Beauty of Nature"
  }
}
Next, to fetch all documents, Vespa provides an operation called Visit. It works somewhat like Scroll in Elasticsearch.
$ curl -s http://localhost:8080/document/v1/fess/doc/docid | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [
    {
      "id": "id:fess:doc::blog-3",
      "fields": {
        "doc_id": "blog-3",
        "content": "The rapid pace of technological advancement is shaping our future. From artificial intelligence to renewable energy solutions, innovative ideas are at the forefront of creating a sustainable and interconnected world...",
        "title": "The Future of Technology and Innovation"
      }
    }
  ],
  "documentCount": 1,
  "continuation": "AAAACAAAAAAAAAAUAAAAAAAAABMAAAAAAAABAAAAAAEgAAAAAAAAyAAAAAAAAAAA"
}
The response includes a continuation token. Add it as a request parameter to fetch the next documents.
$ curl -s "http://localhost:8080/document/v1/fess/doc/docid?continuation=AAAACAAAAAAAAAAUAAAAAAAAABMAAAAAAAABAAAAAAEgAAAAAAAAyAAAAAAAAAAA" | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [
    {
      "id": "id:fess:doc::blog-1",
      "fields": {
        "doc_id": "blog-1",
        "content": "Nature has always been a source of solace and inspiration for many. From the majestic mountains to the serene beaches, nature offers a retreat from the hustle and bustle of everyday life...",
        "title": "Exploring the Beauty of Nature"
      }
    }
  ],
  "documentCount": 1,
  "continuation": "AAAACAAAAAAAAABEAAAAAAAAAEMAAAAAAAABAAAAAAEgAAAAAAAAwgAAAAAAAAAA"
}
$ curl -s "http://localhost:8080/document/v1/fess/doc/docid?continuation=AAAACAAAAAAAAABEAAAAAAAAAEMAAAAAAAABAAAAAAEgAAAAAAAAwgAAAAAAAAAA" | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [
    {
      "id": "id:fess:doc::blog-2",
      "fields": {
        "doc_id": "blog-2",
        "content": "Personal growth is a continuous journey. It involves understanding oneself, setting meaningful goals, and pushing for constant improvement. While the path may be challenging, the rewards are truly significant.",
        "title": "The Journey of Personal Growth"
      }
    }
  ],
  "documentCount": 1,
  "continuation": "AAAACAAAAAAAAACtAAAAAAAAAKwAAAAAAAABAAAAAAEgAAAAAAAANQAAAAAAAAAA"
}
$ curl -s "http://localhost:8080/document/v1/fess/doc/docid?continuation=AAAACAAAAAAAAACtAAAAAAAAAKwAAAAAAAABAAAAAAEgAAAAAAAANQAAAAAAAAAA" | jq .
{
  "pathId": "/document/v1/fess/doc/docid",
  "documents": [],
  "documentCount": 0
}
Once every document has been visited, the response no longer contains a continuation, and the iteration ends.
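The manual continuation loop above can be automated. In this sketch, visit_all is a hypothetical name, and the endpoint is the same one used above. The function prints each raw JSON page on its own line and stops when a response carries no continuation. It extracts the token with a bash regex instead of jq to stay dependency-free. That assumes the token contains no escaped quotes, and the tokens above are plain base64-like strings. It passes the token with `curl --get --data-urlencode`, so characters such as + or / would be URL-encoded.

```shell
#!/usr/bin/env bash
VESPA_URL="${VESPA_URL:-http://localhost:8080}"

# Follow "continuation" tokens until a response no longer contains one.
# Prints every raw JSON page on its own line.
visit_all() {
  local path="${1:-/document/v1/fess/doc/docid}"
  local token="" page
  local re='"continuation"[[:space:]]*:[[:space:]]*"([^"]*)"'
  while :; do
    if [[ -z "$token" ]]; then
      page=$(curl -s "$VESPA_URL$path") || return 1
    else
      # --get + --data-urlencode appends ?continuation=<url-encoded token>
      page=$(curl -s --get "$VESPA_URL$path" --data-urlencode "continuation=$token") || return 1
    fi
    printf '%s\n' "$page"
    if [[ $page =~ $re ]]; then
      token="${BASH_REMATCH[1]}"
    else
      break   # no continuation: every document has been visited
    fi
  done
}
```

For example, `visit_all | jq -r '.documents[].id'` lists every document ID, because jq reads the pages as a stream of JSON values. The /document/v1 API also documents a wantedDocumentCount parameter for asking for more documents per response. It is best-effort, so the loop is still needed.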
Next, let's search. We'll look for documents whose title contains journey, using a YQL query.
$ curl -s -X POST "http://localhost:8080/search/" -H "Content-Type: application/json" -d '{
  "yql": "select * from sources doc where title contains \"journey\";"
}' | jq .
{
  "root": {
    "id": "toplevel",
    "relevance": 1,
    "fields": {
      "totalCount": 1
    },
    "coverage": {
      "coverage": 100,
      "documents": 3,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:fess:doc::blog-2",
        "relevance": 0.16343879032006284,
        "source": "mind",
        "fields": {
          "sddocname": "doc",
          "documentid": "id:fess:doc::blog-2",
          "doc_id": "blog-2",
          "title": "The Journey of Personal Growth",
          "content": "Personal growth is a continuous journey. It involves understanding oneself, setting meaningful goals, and pushing for constant improvement. While the path may be challenging, the rewards are truly significant."
        }
      }
    ]
  }
}
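The same query can also be sent as a GET request, with curl handling the URL-encoding of the YQL. In this sketch the function names are hypothetical. search_yql sends an arbitrary YQL string along with a hits limit. search_text uses userQuery() with the query parameter. With no explicit field, the query terms go to the fieldset named default, which doc.sd defines as title and content.

```shell
#!/usr/bin/env bash
VESPA_URL="${VESPA_URL:-http://localhost:8080}"

# Send a YQL query as GET; --get turns each --data-urlencode into an encoded URL parameter.
search_yql() {
  local yql="$1" hits="${2:-10}"
  curl -s --get "$VESPA_URL/search/" \
    --data-urlencode "yql=$yql" \
    --data-urlencode "hits=$hits"
}

# Free-text search: userQuery() parses the "query" parameter; terms without a
# field prefix are searched in the "default" fieldset (title, content in doc.sd).
search_text() {
  local text="$1"
  curl -s --get "$VESPA_URL/search/" \
    --data-urlencode "yql=select * from sources doc where userQuery()" \
    --data-urlencode "query=$text"
}
```

For example, `search_yql 'select * from sources doc where title contains "journey"' 5 | jq '.root.children[].fields.title'` repeats the query above. `search_text journey | jq '.root.fields.totalCount'` searches both fields at once.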
To update a document, send a PUT with a partial update. Here we assign a new title to blog-2:
$ curl -X PUT "http://localhost:8080/document/v1/fess/doc/docid/blog-2" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": {
      "title": {
        "assign": "The Trip of Personal Growth"
      }
    }
  }'
{"pathId":"/document/v1/fess/doc/docid/blog-2","id":"id:fess:doc::blog-2"}
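A reusable version of this partial update might look like the sketch below. assign_field and json_string are hypothetical helpers. The escaper handles only backslashes, double quotes, newlines, carriage returns, and tabs; other control characters would need jq --arg or similar. The field name is inserted as-is, so it should be a plain schema field name like title.

```shell
#!/usr/bin/env bash
VESPA_URL="${VESPA_URL:-http://localhost:8080}"

# Minimal JSON string escaper (backslash, double quote, \n, \r, \t only).
json_string() {
  local s="$1"
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '"%s"' "$s"
}

# PUT a partial update that assigns a new value to one field of fess/doc/<id>.
assign_field() {
  local id="$1" field="$2" value="$3"
  local body
  body=$(printf '{"fields":{"%s":{"assign":%s}}}' "$field" "$(json_string "$value")")
  curl -s -X PUT "$VESPA_URL/document/v1/fess/doc/docid/$id" \
    -H "Content-Type: application/json" \
    --data-binary "$body"
}
```

For example, `assign_field blog-2 title 'The Trip of Personal Growth'` sends a request equivalent to the curl command above. The /document/v1 API also accepts `create=true` on a PUT, which creates the document if it does not exist yet. That option is not used here.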
Last, let's try deleting. We'll delete blog-1.
$ curl -X DELETE "http://localhost:8080/document/v1/fess/doc/docid/blog-1"
{"pathId":"/document/v1/fess/doc/docid/blog-1","id":"id:fess:doc::blog-1"}
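To confirm the delete, you can check the HTTP status of a GET for the same ID. A missing document should come back as 404 Not Found, while an existing one returns 200. In this sketch, doc_status and doc_exists are hypothetical helper names. They use curl's `-w '%{http_code}'` to print only the status code.

```shell
#!/usr/bin/env bash
VESPA_URL="${VESPA_URL:-http://localhost:8080}"

# Print only the HTTP status code of GET /document/v1/fess/doc/docid/<id>.
doc_status() {
  local id="$1"
  curl -s -o /dev/null -w '%{http_code}' \
    "$VESPA_URL/document/v1/fess/doc/docid/$id"
}

# Exit status 0 only if the document exists (HTTP 200).
doc_exists() {
  [ "$(doc_status "$1")" = "200" ]
}
```

After the delete above, `doc_exists blog-1 || echo "blog-1 is gone"` should print the message, while `doc_exists blog-2` should still succeed.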
That covers the basic operations.
To wrap up, stop Vespa with
$ docker stop vespa
and we're done.
Next, I'd like to experiment with various rank-profiles.