# Installation Guide

This guide will help you install and set up the Document-Based Question Answering System.
## Prerequisites

- **Python 3.10+**: The system requires Python 3.10 or higher
- **8GB+ RAM**: For local LLM models and embedding generation
- **5GB+ disk space**: For models and indexes
- **Git**: For cloning the repository
## System Requirements

- **Operating System**: Windows, macOS, or Linux
- **Memory**: Minimum 8GB RAM (16GB+ recommended for large models)
- **Storage**: 5GB+ free disk space
- **Network**: Internet connection for initial model downloads
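The Python-version and disk-space requirements above can be sanity-checked with a short stdlib-only script before installing anything (an illustrative sketch; `preflight_check` is not part of the project, and checking RAM would need a third-party package such as `psutil` from the dependency list, so it is omitted here):

```python
import sys
import shutil

def preflight_check(min_python=(3, 10), min_free_gb=5):
    """Return a list of requirement problems found on this machine."""
    issues = []
    if sys.version_info < min_python:
        issues.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    # Free disk space in the current directory, in GB
    free_gb = shutil.disk_usage(".").free / 1024**3
    if free_gb < min_free_gb:
        issues.append(f"{free_gb:.1f} GB free disk, {min_free_gb} GB+ needed")
    return issues

if __name__ == "__main__":
    problems = preflight_check()
    print("OK" if not problems else "\n".join(problems))
```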
## Installation Steps

### 1. Clone the Repository

```bash
git clone <repository-url>
cd ai-engineer-code-challenge
```
### 2. Create a Virtual Environment (Recommended)

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
### 3. Install Dependencies

```bash
pip install -r requirements.txt
```
### 4. Download Models (Optional)

```bash
# Create a models directory
mkdir models

# Download a GGUF model for llama-cpp (optional)
# wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf -O models/mistral-7b-instruct.gguf
```
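A downloaded model can be sanity-checked without loading it: GGUF files begin with the four ASCII bytes `GGUF`. A small helper (the name `looks_like_gguf` is illustrative, not a project function):

```python
def looks_like_gguf(path):
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# e.g. looks_like_gguf("models/mistral-7b-instruct.gguf")
```

This catches the common failure mode where a download was interrupted or an HTML error page was saved in place of the model.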
### 5. Verify Installation

```bash
# Test the installation
python main.py --help
```
## Installation Options

### Standard Installation

The standard installation includes all core dependencies:

```bash
pip install -r requirements.txt
```
### Minimal Installation

For a minimal installation (without optional dependencies):

```bash
pip install python-dotenv PyYAML PyMuPDF sentence-transformers faiss-cpu \
    numpy transformers torch accelerate pytest pytest-cov pytest-mock \
    ruff black structlog tqdm psutil
```

Note: `argparse` ships with the Python standard library and does not need to be installed separately.
### GPU Support

For GPU acceleration (optional):

```bash
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install FAISS with GPU support
pip install faiss-gpu
```
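After installing the CUDA build, you can confirm that PyTorch actually sees a GPU with a small check that degrades gracefully when `torch` is missing (an illustrative sketch; `gpu_status` is not a project function):

```python
def gpu_status():
    """Describe whether CUDA acceleration is usable on this machine."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch installed, running on CPU only"

print(gpu_status())
```

If this reports CPU only despite a CUDA wheel being installed, the usual culprits are a missing NVIDIA driver or a driver older than the wheel's CUDA version.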
### Development Installation

For development with documentation tooling:

```bash
pip install -r requirements.txt
pip install sphinx sphinx-rtd-theme sphinx-autodoc-typehints myst-parser
```
## Configuration

### Create a Configuration File

The system uses `config.yaml` for configuration. A sample configuration is provided:

```yaml
# PDF Processing
pdf:
  engine: "pymupdf"
  chunk_size: 1000
  chunk_overlap: 200

# Embedding Model
embedding:
  model_name: "all-MiniLM-L6-v2"
  similarity_threshold: 0.7
  top_k: 5

# LLM Configuration
llm:
  backend: "transformers"
  model_path: "microsoft/DialoGPT-medium"
  temperature: 0.2
  max_tokens: 1024
```
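To make the `chunk_size`/`chunk_overlap` settings concrete: consecutive chunks share `chunk_overlap` characters, so each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. A hedged sketch of that behavior (not the system's actual splitter, which may split on token or sentence boundaries):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping fixed-size chunks."""
    step = chunk_size - chunk_overlap  # each chunk starts 800 chars later
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2500-character document yields chunks starting at 0, 800, 1600, 2400
print(len(chunk_text("x" * 2500)))  # → 4
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.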
### Set Environment Variables (Optional)

Create a `.env` file for sensitive configuration:

```bash
# OpenAI API (if using the OpenAI backend)
OPENAI_API_KEY=your_api_key_here

# Custom model paths
LLM_MODEL_PATH=./models/custom-model.gguf
EMBEDDING_MODEL_PATH=./models/custom-embedding
```
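In the project this file is presumably read via `python-dotenv` (it appears in the dependency list; the usual entry point is `from dotenv import load_dotenv`). As an illustration of the file format itself, here is a minimal stdlib-only parser:

```python
import os

def load_env_file(path=".env"):
    """Load KEY=value pairs into os.environ, skipping comments and blanks.
    Existing environment variables are not overwritten."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Ignore blank lines, comments, and lines without an '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```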
## Troubleshooting

### Common Installation Issues

#### Memory Issues

If you encounter memory issues during installation:

```bash
# Skip pip's cache to reduce memory and disk pressure
pip install --no-cache-dir -r requirements.txt
```
#### Compilation Issues

For compilation issues with `llama-cpp-python`:

```bash
# Build with OpenBLAS support
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```
#### CUDA Issues

If you run into CUDA issues, fall back to the CPU-only build:

```bash
# Install the CPU-only version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
#### Permission Issues

For permission issues on Linux/macOS:

```bash
# Install into the user site-packages
pip install --user -r requirements.txt
```
## Verification

After installation, verify the setup:

```bash
# Test basic functionality
python main.py --help

# Test configuration loading
python -c "import yaml; yaml.safe_load(open('config.yaml'))"

# Test imports
python -c "from src.ingest import DocumentIngester; print('✓ Imports working')"
```
## Next Steps

After a successful installation:

1. **Read the Quick Start Guide**: quickstart
2. **Configure the System**: configuration
3. **Try the Examples**: user_guide/examples

For more detailed information, see the user_guide/index.