
Sharif OCW Scrapy Downloader


One-week MVP sprint board for the Sharif OCW Scrapy Downloader.



🚀 Getting Started

📥 Clone the Repository

First, clone the repository to your local machine:

```bash
git clone https://github.com/Mdevpro78/sharif-ocw-scrapy-downloader
cd sharif-ocw-scrapy-downloader
```

⚡️ Quick Installation

Install the required dependencies using one of the following methods:

Installation Options

```bash
# Option 1: install with pip
# (use requirements.dev.txt instead for development dependencies)
pip install -r requirements.txt
```

```bash
# Option 2: sync all dependencies with uv
make uv_sync_all
```
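Before installing, it can help to confirm the active interpreter. Note that the 3.8+ floor below is an assumption, not stated in these docs; `pyproject.toml` is the authoritative source for the supported Python range:

```python
# Sanity-check the interpreter before installing dependencies.
# The 3.8+ minimum is an assumed baseline -- check pyproject.toml
# for the project's actual constraint.
import sys

assert sys.version_info >= (3, 8), f"Python too old: {sys.version}"
print("interpreter OK:", sys.version.split()[0])
```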

🏃‍♂️ Running the Crawler

Method 1: Using Makefile Commands

The project includes several convenient Makefile targets for common operations:

Available Makefile Commands

```bash
# Set up development environment
make setup_dev

# Run pre-commit hooks
make uv_pre_commit

# Clean cache and temporary files
make clean
```

Method 2: Direct Execution

To download course content, use the following command:

```bash
uv run python sharif_ocw_downloader/runner.py \
  --course-id=course_id \
  --max-concurrent-downloads=2 \
  --output-path="../test_download_dir"
```

Command Parameters

- `--course-id=course_id`: the course ID to download (replace `course_id` with the desired ID)
- `--max-concurrent-downloads=2`: the maximum number of concurrent downloads
- `--output-path="../test_download_dir"`: the output directory for downloaded files
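A minimal sketch of how `runner.py` might wire these flags together with `argparse`. The flag names come from the command above; the parser structure, help strings, and defaults are assumptions, not the project's actual code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Sharif OCW course downloader")
    parser.add_argument("--course-id", required=True,
                        help="ID of the course to download")
    parser.add_argument("--max-concurrent-downloads", type=int, default=2,
                        help="upper bound on simultaneous downloads")
    parser.add_argument("--output-path", default="./downloads",
                        help="directory where course files are written")
    return parser


# Parse the documented example invocation (course ID is a placeholder value).
args = build_parser().parse_args([
    "--course-id=1234",
    "--max-concurrent-downloads=2",
    "--output-path=../test_download_dir",
])
print(args.course_id, args.max_concurrent_downloads, args.output_path)
```

Note that `argparse` converts `--course-id` to the attribute `args.course_id`, so dashed flag names map cleanly onto Python identifiers.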

📁 Project Layout

Directory Structure

```text
sharif-ocw-scrapy-downloader/
├── .github/                    # GitHub workflows and configurations
├── docs/                       # Documentation files
│   ├── index.md                # Main documentation page
│   ├── contributing.md         # Contribution guidelines
│   └── static/                 # Static assets for documentation
├── src/                        # Source code
│   └── sharif_ocw_downloader/  # Main package
│       ├── spiders/            # Scrapy spiders
│       ├── config.py           # Configuration management
│       ├── items.py            # Scrapy items
│       ├── middlewares.py      # Scrapy middlewares
│       ├── pipelines.py        # Scrapy pipelines
│       ├── runner.py           # Main runner script
│       └── settings.py         # Scrapy settings
├── scripts/                    # Utility scripts
├── Dockerfile                  # Docker configuration
├── docker-compose.yml          # Docker Compose configuration
├── Makefile                    # Build and development commands
├── pyproject.toml              # Project configuration
└── requirements.txt            # Python dependencies
```
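As one illustration of how these pieces fit together: a Scrapy pipeline in `pipelines.py` is just a class exposing `process_item` (Scrapy duck-types it, so no `scrapy` import is needed for the class itself). The class name, item field, and bookkeeping below are hypothetical, not the project's actual pipeline:

```python
# Hypothetical pipeline sketch; Scrapy calls process_item once per yielded item.
class DownloadLoggingPipeline:
    def open_spider(self, spider):
        self.seen = []  # URLs processed during this crawl

    def process_item(self, item, spider):
        # item is dict-like; "file_url" is an assumed field name
        self.seen.append(item.get("file_url"))
        return item  # always return the item so later pipelines receive it


# Standalone usage (outside Scrapy) to show the call shape:
pipeline = DownloadLoggingPipeline()
pipeline.open_spider(spider=None)
result = pipeline.process_item({"file_url": "https://example.org/lecture1.mp4"},
                               spider=None)
```

In the real project the pipeline would also be registered under `ITEM_PIPELINES` in `settings.py`, which is how Scrapy discovers it.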

👥 Contributing

Ways to Contribute

We welcome all contributions! Choose your path:

- 🐛 [Report Bugs](#bug-reports)
- 💡 [Suggest Features](#feature-requests)
- 🔧 [Submit Code](#pull-requests)

Bug Reports

Bug Report Template

```markdown
**Description:**
A clear, concise description of the issue.

**Steps to Reproduce:**
1. Step one
2. Step two

**Current Result:**
What actually happens

**Expected Result:**
What you expected to happen

**Environment:**
- OS: [e.g., Windows 11]
- Python: [e.g., 3.11]
- Version: [e.g., 1.0.0]
```

Feature Requests

Feature Request Template

```markdown
**Problem:** What problem does this feature solve?

**Solution:** Describe your proposed solution.

**Alternatives:** What alternatives have you considered?
```

Pull Requests

Quick Start

```bash
# Setup
git clone https://github.com/username/repo
cd repo
uv sync

# Development
git checkout -b feature/name
# Make changes, then:
git commit -m "feat: add amazing feature"
git push origin feature/name
```

Guidelines:

  1. ✅ Follow code style
  2. 📝 Update docs

Ready to Contribute

The project follows standard GitHub workflows. For detailed guidelines, see our Contribution Guide.


Built with ❤️ using MkDocs and Material for MkDocs