Here's how the txtai docs define a basic Embeddings instance:

```python
from txtai import Embeddings

embeddings = Embeddings(path="sentence-transformers/nli-mpnet-base-v2")
```
As you can see, it supports Hugging Face models by default. This is fine for many cases, but what if you would rather use a service like OpenAI or Cohere?
I dug around the documentation, the issues, the examples, and even the source code for a good while before I found a working approach. Let me save you the trouble. First, the imports:
```python
from os import getenv
from typing import List

import numpy as np
from dotenv import load_dotenv
from langchain_openai import AzureOpenAIEmbeddings
from txtai import Embeddings

load_dotenv()
```
Next, define your Azure OpenAI embeddings client:
```python
openai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment=getenv("EMBEDDINGS_NAME"),
    api_key=getenv("OPENAI_API_KEY"),
    azure_endpoint=getenv("AZURE_OPENAI_ENDPOINT"),
)
```
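This assumes a `.env` file that defines the three variables read above. The values below are placeholders, not real settings; substitute your own deployment name, key, and endpoint:

```
# .env — placeholder values, replace with your own Azure OpenAI settings
EMBEDDINGS_NAME=my-embeddings-deployment
OPENAI_API_KEY=<your-azure-openai-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
```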
and a function to generate embeddings:
```python
def get_openai_embeddings(texts: List[str]) -> np.ndarray:
    # txtai expects a 2D float32 array back: one vector per input text
    results = openai_embeddings.embed_documents(texts=texts)
    return np.array(results, dtype=np.float32)
```
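If you want to verify the wiring without an API key, any function with the same contract works. Here's a hypothetical stand-in (a toy hashed bag-of-words, not a real embedding model) that you could pass as the transform instead:

```python
import numpy as np
from typing import List

def fake_transform(texts: List[str], dims: int = 64) -> np.ndarray:
    # Toy stand-in with the same contract as get_openai_embeddings:
    # a list of strings in, a 2D float32 array out, one row per text.
    vectors = np.zeros((len(texts), dims), dtype=np.float32)
    for row, text in enumerate(texts):
        for token in text.lower().split():
            vectors[row, hash(token) % dims] += 1.0
    # L2-normalize so dot products behave like cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return (vectors / np.maximum(norms, 1e-9)).astype(np.float32)
```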
Now, define the Embeddings object:
```python
embeddings = Embeddings(
    {
        "transform": get_openai_embeddings,
        "backend": "numpy",
        "content": True,
    }
)
```
As you can see, we are passing a custom transform function here: txtai calls it with a list of texts and expects a numpy array of vectors back, one per text.
Let's test this out:
```python
data = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "The National Park Service warns against sacrificing slower friends in a bear attack",
    "Maine man wins $1M from $25 lottery ticket",
    "Make huge profits without work, earn up to $100,000 a day",
]
```
```python
embeddings.index(data)
print(embeddings.search("feel good story", 1))
```

which prints

```python
[{'id': '4', 'text': 'Maine man wins $1M from $25 lottery ticket', 'score': 0.764844536781311}]
```
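The score reported here is, in the typical setup, cosine similarity between the query vector and the stored document vector. A quick numpy sketch of what that number means (this is an illustration of the metric, not txtai's internal code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the two L2-normalized vectors
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Identical directions score 1.0; orthogonal vectors score 0.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # → 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # → 0.0
```

So a score of ~0.76 means the query and the lottery headline point in broadly the same direction in embedding space.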