Utilities for Langchain

Johnsnowlabs provides the following components, which can be used inside the Langchain framework as agent tools and pipeline components for scalable pre-processing and embedding on Spark clusters. With them you can build easily scalable, production-grade LLM and RAG applications. See the Langchain with Johnsnowlabs Tutorial Notebook.

JohnSnowLabsLangChainCharSplitter

Pre-process your documents in a scalable fashion in Langchain, based on Spark NLP's DocumentCharacterTextSplitter. All of its parameters are supported.

from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

# Load the raw text file as LangChain Documents
loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()

# Create a pre-processor connected to the Spark cluster
processor = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,      # characters shared by consecutive chunks
    chunk_size=20,        # maximum chunk length in characters
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],  # separators, tried in order
    trim_whitespace=True,
)
# Split the documents, distributed on the Spark cluster
pre_processed_docs = processor.split_documents(documents)
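To make the roles of split_patterns and chunk_size more concrete, here is a simplified, pure-Python sketch of recursive character splitting: try the separators in order, pack pieces greedily into chunks of at most chunk_size characters, and fall back to the next separator for any piece that is still too long. This is only an illustration of the general idea, not Spark NLP's implementation. The helper toy_recursive_split is hypothetical, it ignores chunk_overlap and keep_seperators, and it drops separators at chunk edges.

```python
def toy_recursive_split(text: str, patterns: list[str], chunk_size: int) -> list[str]:
    """Hypothetical helper: split text into chunks of at most chunk_size
    characters, trying the separators in `patterns` in order.
    Simplified: no chunk_overlap, separators at chunk edges are dropped,
    and surrounding whitespace is trimmed."""
    if len(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    sep = patterns[0] if patterns else ""
    rest = patterns[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep packing pieces together
            continue
        if current:
            chunks.append(current)  # flush the chunk that is full
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with the next separator.
            chunks.extend(toy_recursive_split(piece, rest, chunk_size))
            current = ""
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


text = "The quick brown fox jumps over the lazy dog"
print(toy_recursive_split(text, ["\n\n", "\n", " ", ""], chunk_size=20))
# ['The quick brown fox', 'jumps over the lazy', 'dog']
```

Because chunk_size counts characters, a value of 20 yields fragments of only a few words, which is consistent with the very short retrieved chunks (such as 'miles east of') in the agent output below. For real retrieval you will likely want a much larger chunk_size.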

JohnSnowLabsLangChainEmbedder

Scalable embedding computation with any sentence embedding model from John Snow Labs. You must provide the NLU reference of a sentence embedding model to load it. On localhost environments, you can start a Spark session by setting hardware_target to one of cpu, gpu, apple_silicon, or aarch. For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.

# Create an embedder connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder('en.embed_sentence.bert_base_uncased', hardware_target='cpu')

# Compute embeddings distributed on the cluster and index them in FAISS
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Wrap the retriever as a tool the agent can call
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
  retriever,
  "search_state_of_union",
  "Searches and returns documents regarding the state-of-the-union."
)


# Create an LLM agent that uses the tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']

>>>
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
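Conceptually, FAISS.from_documents stores one embedding per chunk, and the retriever embeds the query with the same embedder and returns the closest chunks. The pure-Python sketch below is a brute-force version of that lookup. It assumes Euclidean (L2) distance and k=4, which to our knowledge are the defaults of LangChain's FAISS store and its retriever (four Documents appear in the output above). The helper names and the 3-dimensional vectors are hypothetical stand-ins for real sentence embeddings.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Brute-force search: return the k chunk texts whose vectors are closest to query_vec."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Hypothetical toy vectors standing in for real sentence embeddings.
index = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.2, 0.8, 0.1]),
    ("doc-c", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))  # ['doc-a', 'doc-b']
```

Sorting every vector per query is fine for a toy index; FAISS performs the same kind of nearest-neighbour search in optimized native code so that it scales to large document collections.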
