Pinecone
This is the implementation of the Pinecone for MindsDB.
Pinecone is a vector database which is fully-managed, developer-friendly, and easily scalable.
Prerequisites
Before proceeding, ensure the following prerequisites are met:
- Install MindsDB locally via Docker or use MindsDB Cloud.
- To connect Pinecone to MindsDB, install the required dependencies following this instruction.
- Install or ensure access to Pinecone.
Implementation
This handler uses pinecone-client
python library connect to a pinecone environment.
The required arguments to establish a connection are:
api_key
: the API key that can be found in your pinecone accountenvironment
: the environment name corresponding to theapi_key
These optional arguments are used with CREATE
statements:
dimension
: dimensions of the vectors to be stored in the index (default=8)metric
: distance metric to be used for similarity search (default=‘cosine’)pods
: number of pods for the index to use, including replicas (default=1)replicas
: the number of replicas. replicas duplicate your index. they provide higher availability and throughput (default=1)pod_type
: the type of pod to use, refer to pinecone documentation (default=‘p1’)
Limitations
-
DROP TABLE
support - Support for namespaces
- Display score/distance
- Support for creating/reading sparse values
-
content
column is not supported since it does not exist in Pinecone
Usage
In order to make use of this handler and connect to an environment, use the following syntax:
CREATE DATABASE pinecone_dev
WITH ENGINE = "pinecone",
PARAMETERS = {
"api_key": "...",
"environment": "..."
};
You can query pinecone indexes (temp
in the following examples) based on id
or search_vector
, but not both:
SELECT * from pinecone_dev.temp
WHERE id = "abc"
LIMIT 1
SELECT * from pinecone_dev.temp
WHERE search_vector = "[1,2,3,4,5,6,7,8]"
If you are using subqueries, make sure that the result is only a single row since the use of multiple search vectors is not allowed
SELECT * from pinecone_database.temp
WHERE search_vector = (
SELECT embeddings FROM sqlitetesterdb.test WHERE id = 10
)
Optionally, you can filter based on metadata too:
SELECT * from pinecone_dev.temp
WHERE id = "abc" AND temp.metadata.hello < 100
You can delete records using id
or metadata
like so:
DELETE FROM pinecone_dev.temp
WHERE id = "abc"
Note that deletion through metadata is not supported in starter tier
DELETE FROM pinecone_dev.temp
WHERE temp.metadata.tbd = true
You can insert data into a new collection like so:
CREATE TABLE pinecone_dev.temp (
SELECT * FROM mysql_demo_db.temp LIMIT 10);
To update records, you can use insert statement. When there is a conflicting ID in pinecone index, the record is updated with new values. It might take a bit to see it reflected.
INSERT INTO pinecone_test.testtable (id,content,metadata,embeddings)
VALUES (
'id1', 'this is a test', '{"test": "test"}', '[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]'
);