How to Set Up Ollama to Run DeepSeek R1 Locally for RAG

8 Mins
Pravin Prajapati · 26 Feb 2025

The remarkable capabilities and open-source availability of DeepSeek-R1, created by the Chinese AI startup DeepSeek, have caused quite a stir in the AI community. The model is drawing attention for its innovative architecture, cost-efficient design, and performance that rivals top models from OpenAI. DeepSeek-R1 has also been recognized for its efficiency gains, reportedly being up to 50 times cheaper to run than many U.S. AI models, which has significant implications for the global AI race.

Many organizations have already deployed DeepSeek-R1 with the help of Indian Python developers. The model's open-source nature allows developers to freely access, customize, and deploy it locally, further enhancing its appeal. Newer versions of the model, such as R1 1776, aim to improve its ability to answer any question objectively and factually.

In this blog, we'll walk through installing DeepSeek-R1 on your machine with Ollama, then explore its usage and potential in AI development.

Run AI Models Locally with Ollama

Ollama is a tool that allows you to run large language models (LLMs) directly on your computer. Unlike cloud-based AI services, Ollama works offline, giving you complete control over the models you use. You can download and run different AI models without an internet connection.

Example: If you want to run DeepSeek R1 model on your machine, you can use this command:

ollama run deepseek-r1:1.5b

This will download and launch the model so you can start using it.

Why use Ollama?
  • Free: No need to pay for cloud-based AI services.
  • Private: Your data stays on your machine, ensuring better security.
  • Fast: Running models locally means lower response times.
  • Works offline: You don't need an internet connection to use AI.

LangChain – Connect AI to Real-World Applications

LangChain is a powerful tool that helps developers build AI-powered applications. It allows large language models (LLMs) to connect with real-world data, such as documents, APIs, and databases.

Why use LangChain?
  • It helps AI interact with data beyond just answering simple questions.
  • You can use it to build chatbots, automate document processing, or improve search systems.
  • It makes it easy to integrate AI into various applications.
For example, suppose you want to build an intelligent assistant that can search through company reports, summarize information, and answer questions based on accurate data. LangChain makes this possible by linking an AI model with the reports, as in the sketch below.
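As a minimal sketch (assuming the langchain and langchain-community packages plus a pulled deepseek-r1:1.5b model, all installed later in this guide; the report text is made up for illustration), connecting a locally served model to a prompt looks like this:

from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate

# Wrap the locally served DeepSeek R1 model as a LangChain LLM
llm = Ollama(model="deepseek-r1:1.5b")

# Define a simple prompt template with a single {report} variable
prompt = PromptTemplate.from_template("Summarize this report excerpt:\n{report}")

# Pipe the prompt into the model and run it on a sample string
chain = prompt | llm
print(chain.invoke({"report": "Q3 revenue grew 12% year over year, driven by exports."}))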

RAG (Retrieval-Augmented Generation)

RAG, or Retrieval-Augmented Generation, is a technique that improves how AI answers questions. Instead of relying only on what it has learned, RAG allows the AI to fetch information from external sources like PDFs, websites, or databases before giving a response.

Why use RAG?
  • More accurate answers: AI refers to actual documents instead of making up responses.
  • Reduces hallucinations: Helps prevent AI from giving incorrect or misleading information.
  • Great for research: AI can look up relevant information before responding.

Imagine you have a collection of legal documents and need AI to answer questions based on them. A basic AI model might struggle because it wasn't explicitly trained on those documents. However, with RAG, the AI can retrieve relevant legal texts before generating an answer, making its responses much more reliable.
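To make the pattern concrete, here is a toy sketch (assuming the ollama Python package, covered later in this guide; the two-clause "corpus" and keyword retriever are made up for illustration, whereas a real system would use vector similarity search):

import ollama

# A tiny made-up document collection
corpus = [
    "Clause 4.2: The tenant must give 30 days' written notice before vacating.",
    "Clause 7.1: Late rent incurs a 5% penalty after a 10-day grace period.",
]

def retrieve(question, docs):
    # Naive keyword-overlap retrieval, purely for illustration
    return max(docs, key=lambda d: sum(w in d.lower() for w in question.lower().split()))

question = "How much notice does a tenant have to give?"
context = retrieve(question, corpus)

# Generation step: pass the retrieved context to the model along with the question
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
)
print(response["message"]["content"])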

DeepSeek R1

DeepSeek R1 is an advanced AI model developed by the Chinese AI company DeepSeek. It is designed to be good at reasoning, problem-solving, and retrieving factual information. Many people compare it to some of OpenAI's best models.

Why use DeepSeek R1?
  • Strong logical reasoning: Good at solving problems and answering complex questions.
  • Open-source: Free to use and modify.
  • Works well with RAG: Can retrieve and process external data for better accuracy.
  • It runs locally with Ollama: There is no need for an internet connection.

If you need an AI model that can think logically and provide reliable answers, DeepSeek R1 is a great option, especially when combined with RAG and LangChain.

Why Run DeepSeek-R1 Locally?

Running DeepSeek-R1 on your own machine offers complete control over the model without depending on external servers. Here's why it's a smart choice:

  • Enhanced Privacy and Security: Keep your data entirely on your device, ensuring maximum confidentiality.
  • Uninterrupted Access: No rate limits, service downtime, or third-party restrictions.
  • Optimized Performance: Enjoy faster response times by eliminating API latency.
  • Full Customization: Adjust parameters, fine-tune prompts, and seamlessly integrate the model into your workflow.
  • Cost Savings: Skip expensive API fees and use the model for free.
  • Offline Functionality: Once downloaded, the model runs without an internet connection.

By running DeepSeek-R1 locally, you gain greater control, security, and efficiency, all while cutting costs and boosting performance.

How to Set Up DeepSeek-R1 Locally with Ollama?

Ollama makes it easy to run large language models (LLMs) on your machine by handling model downloads, quantization, and execution for you.

Step 1: Install Ollama

Ollama supports macOS, Linux, and Windows. Follow these steps to install it:

  • Visit the official Ollama download page. (https://ollama.com/download)
  • Select your operating system (macOS, Linux, Windows)
  • Click the Download button.
  • Follow the installation instructions for your system.
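After installation, you can confirm Ollama is available from your terminal (the version number you see will differ):

ollama --version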

Running DeepSeek R1 on Ollama

Once Ollama is installed, you can run DeepSeek R1 models.

Pull & Run DeepSeek R1 Model:

To download the DeepSeek R1 1.5B parameter model, run:

ollama pull deepseek-r1:1.5b

After downloading, you can start an interactive session with the model using:

ollama run deepseek-r1:1.5b

Ollama supports multiple DeepSeek R1 models, ranging from 1.5B to 671B parameters. The 671B model is the original DeepSeek R1, while smaller versions are distilled models based on Qwen and Llama architectures.

Choosing a Model Size

If your hardware cannot handle the full 671B model, you can run a smaller version by replacing X with the desired parameter size (1.5b, 7b, 8b, 14b, 32b, 70b, 671b):

ollama run deepseek-r1:Xb

This flexibility allows you to leverage DeepSeek R1's capabilities even without high-end hardware.
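At any point, you can check which models you have already downloaded with:

ollama list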

Setting Up a RAG System Using Streamlit

Now that DeepSeek R1 is running, let's integrate it into a retrieval-augmented generation (RAG) system using Streamlit.

Before setting up the RAG system, ensure you have the following installed:

  • Python
  • Conda environment (Recommended for package management)
  • Required Python packages

To install the necessary dependencies, run:

pip install -U langchain langchain-community langchain-experimental
pip install streamlit
pip install pdfplumber
pip install sentence-transformers
pip install faiss-cpu
pip install ollama

Once installed, you can build the RAG system with Streamlit and DeepSeek R1.

Running the RAG System

With the dependencies installed, set up the project and run the app to see it in action.

1. Clone or Create the Project

First, create a new project directory and navigate into it:

mkdir rag-system && cd rag-system

2. Create a Python Script

Create a new Python file named app.py and paste the following Streamlit-based script:

import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

# Streamlit UI
st.title("📄 RAG System with DeepSeek R1 & Ollama")

uploaded_file = st.file_uploader("Upload your PDF file here", type="pdf")

if uploaded_file:
    # Save the uploaded PDF to a temporary file so the loader can read it
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

    # Load the PDF into LangChain document objects
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()

    # Embeddings model used for both semantic chunking and indexing
    embedder = HuggingFaceEmbeddings()

    # Split the document into semantically coherent chunks
    text_splitter = SemanticChunker(embedder)
    documents = text_splitter.split_documents(docs)

    # Index the chunks in a FAISS vector store and expose a top-3 retriever
    vector = FAISS.from_documents(documents, embedder)
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})

    # Point LangChain at the locally running DeepSeek R1 model
    llm = Ollama(model="deepseek-r1:1.5b")

    prompt = """
    Use the following context to answer the question.
    Context: {context}
    Question: {question}
    Answer:
    """

    QA_PROMPT = PromptTemplate.from_template(prompt)

    # Chain: stuff the retrieved chunks into the prompt, then query the LLM
    llm_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
    combine_documents_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="context")

    qa = RetrievalQA(combine_documents_chain=combine_documents_chain, retriever=retriever)

    user_input = st.text_input("Ask a question about your document:")

    if user_input:
        response = qa(user_input)["result"]
        st.write("**Response:**")
        st.write(response)

3. Run the RAG System

Once your script is ready, launch the Streamlit app using:

streamlit run app.py
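
By default, Streamlit serves the app at http://localhost:8501 and opens it automatically in your browser.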

Now, you can upload a PDF document and interact with DeepSeek R1 to retrieve answers based on your document's content.

Other Methods to Run DeepSeek R1 Locally

DeepSeek-R1 can be used directly on your local machine, providing flexibility for different use cases. Here's how to start, whether you prefer interacting with the model through the command line, integrating it into applications via an API, or using it in a Python environment.

Step 1: Running Inference via Command Line Interface

Once the model is downloaded, you can interact with DeepSeek-R1 directly in the terminal by running the following command.

ollama run deepseek-r1

This command initializes the model, allowing you to enter queries and receive responses within the terminal.
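To exit the interactive session, type /bye.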

Step 2: Accessing DeepSeek-R1 via API

For application integration, you can use the Ollama API with a simple curl command. This method enables seamless interaction with DeepSeek-R1 over a local server.

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve 25 * 25" }],
  "stream": false
}'

The curl command-line tool, preinstalled on most Linux systems and available on macOS and Windows, lets you make HTTP requests directly from the terminal. This makes it an excellent option for testing and integrating APIs.
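If the server is running, the API replies with a JSON object. Trimmed to its key fields (with the model's answer omitted), the response looks roughly like this; exact fields vary by Ollama version:

{
  "model": "deepseek-r1",
  "message": { "role": "assistant", "content": "..." },
  "done": true
}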

Step 3: Accessing DeepSeek-R1 via Python

Ollama can be integrated into Python environments, making it ideal for AI-driven applications. First, install the Ollama Python package using the following command.

pip install ollama

Once installed, you can use the following Python script to interact with DeepSeek-R1.


import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Explain Newton's second law of motion"},
    ],
)

print(response["message"]["content"])

The ollama.chat function processes user input as a conversational exchange and returns a response generated by DeepSeek-R1. This approach allows seamless AI-powered interactions within Python applications.
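The package also supports streaming, so you can print tokens as they are generated rather than waiting for the full reply; a small sketch with the same model:

import ollama

# stream=True returns an iterator of partial response chunks
stream = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Newton's second law of motion"}],
    stream=True,
)

for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)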

With these steps, you can effectively use DeepSeek-R1 locally through the command line, API, or Python. This flexibility ensures you can easily integrate advanced AI capabilities into your workflow.

Essence

You have successfully set up Ollama and DeepSeek R1, enabling you to build AI-powered RAG applications with local LLMs. This setup gives you complete control over your AI workflows, ensuring privacy, efficiency, and flexibility while running models on your machine.

Now take it a step further: experiment with different models, fine-tune your workflows, and explore new possibilities in retrieval-augmented generation. Try uploading PDFs and asking questions dynamically to see your system in action!

Need expert guidance or custom AI solutions? Contact Elightwalk Technology for professional development services and AI-driven solutions tailored to your needs.

FAQs Related to DeepSeek-R1 and Ollama

How do I install Ollama to run DeepSeek R1 locally?

Download the installer for macOS, Linux, or Windows from the official Ollama download page, run it, then pull and start the model with ollama run deepseek-r1:1.5b.

What are the benefits of running DeepSeek R1 locally with Ollama?

You keep your data private, avoid rate limits and API fees, get lower latency, retain full customization, and can work offline once the model is downloaded.

Can I use DeepSeek R1 with LangChain for Retrieval-Augmented Generation (RAG)?

Yes. LangChain's Ollama integration lets you combine DeepSeek R1 with document loaders, embeddings, and a FAISS vector store, as shown in the Streamlit example above.

What hardware is required to run DeepSeek R1 locally?

It depends on the model size: distilled variants from 1.5B to 70B parameters run on progressively more capable consumer hardware, while the full 671B model requires high-end hardware.

How do I integrate DeepSeek R1 into a Python project?

Install the package with pip install ollama, then call ollama.chat(model="deepseek-r1", messages=[...]) as shown in Step 3 above.

Pravin Prajapati
Full Stack Developer

Expert in frontend and backend development, combining creativity with sharp technical knowledge. Passionate about keeping up with industry trends, he implements cutting-edge technologies, showcasing strong problem-solving skills and attention to detail in crafting innovative solutions.
