
Sharif OCW Scrapy Downloader


One-week MVP sprint board for the Sharif OCW Scrapy Downloader.



🚀 Getting Started

📥 Clone the Repository

First, clone the repository to your local machine:

```bash
git clone https://github.com/Mdevpro78/sharif-ocw-scrapy-downloader
cd sharif-ocw-scrapy-downloader
```

⚡️ Quick Installation

Install the required dependencies using one of the following methods:

Installation Options

```bash
# Option 1: install with pip
# (use requirements.dev.txt instead for development dependencies)
pip install -r requirements.txt
```

```bash
# Option 2: sync all dependencies with uv
make uv_sync_all
```
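Before installing, it can help to confirm the active interpreter. Note that the 3.8+ floor below is an assumption, not stated in these docs; `pyproject.toml` is the authoritative source for the supported Python range:

```python
# Sanity-check the interpreter before installing dependencies.
# The 3.8+ minimum is an assumed baseline -- check pyproject.toml
# for the project's actual constraint.
import sys

assert sys.version_info >= (3, 8), f"Python too old: {sys.version}"
print("interpreter OK:", sys.version.split()[0])
```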

🏃‍♂️ Running the Crawler

Method 1: Using Makefile Commands

The project includes several convenient Makefile targets for common operations:

Available Makefile Commands

```bash
# Set up development environment
make setup_dev

# Run pre-commit hooks
make uv_pre_commit

# Clean cache and temporary files
make clean
```

Method 2: Direct Execution

To download course content, use the following command:

```bash
uv run python sharif_ocw_downloader/runner.py \
  --course-id=course_id \
  --max-concurrent-downloads=2 \
  --output-path="../test_download_dir"
```

Command Parameters

- `--course-id=course_id`: the course ID to download (replace `course_id` with the desired ID)
- `--max-concurrent-downloads=2`: the maximum number of concurrent downloads
- `--output-path="../test_download_dir"`: the output directory for downloaded files
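A minimal sketch of how `runner.py` might wire these flags together with `argparse`. The flag names come from the command above; the parser structure, help strings, and defaults are assumptions, not the project's actual code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Sharif OCW course downloader")
    parser.add_argument("--course-id", required=True,
                        help="ID of the course to download")
    parser.add_argument("--max-concurrent-downloads", type=int, default=2,
                        help="upper bound on simultaneous downloads")
    parser.add_argument("--output-path", default="./downloads",
                        help="directory where course files are written")
    return parser


# Parse the documented example invocation (course ID is a placeholder value).
args = build_parser().parse_args([
    "--course-id=1234",
    "--max-concurrent-downloads=2",
    "--output-path=../test_download_dir",
])
print(args.course_id, args.max_concurrent_downloads, args.output_path)
```

Note that `argparse` converts `--course-id` to the attribute `args.course_id`, so dashed flag names map cleanly onto Python identifiers.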

📁 Project Layout

Directory Structure

```text
sharif-ocw-scrapy-downloader/
├── .github/                    # GitHub workflows and configurations
├── docs/                       # Documentation files
│   ├── index.md                # Main documentation page
│   ├── contributing.md         # Contribution guidelines
│   └── static/                 # Static assets for documentation
├── src/                        # Source code
│   └── sharif_ocw_downloader/  # Main package
│       ├── spiders/            # Scrapy spiders
│       ├── config.py           # Configuration management
│       ├── items.py            # Scrapy items
│       ├── middlewares.py      # Scrapy middlewares
│       ├── pipelines.py        # Scrapy pipelines
│       ├── runner.py           # Main runner script
│       └── settings.py         # Scrapy settings
├── scripts/                    # Utility scripts
├── Dockerfile                  # Docker configuration
├── docker-compose.yml          # Docker Compose configuration
├── Makefile                    # Build and development commands
├── pyproject.toml              # Project configuration
└── requirements.txt            # Python dependencies
```
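As one illustration of how these pieces fit together: a Scrapy pipeline in `pipelines.py` is just a class exposing `process_item` (Scrapy duck-types it, so no `scrapy` import is needed for the class itself). The class name, item field, and bookkeeping below are hypothetical, not the project's actual pipeline:

```python
# Hypothetical pipeline sketch; Scrapy calls process_item once per yielded item.
class DownloadLoggingPipeline:
    def open_spider(self, spider):
        self.seen = []  # URLs processed during this crawl

    def process_item(self, item, spider):
        # item is dict-like; "file_url" is an assumed field name
        self.seen.append(item.get("file_url"))
        return item  # always return the item so later pipelines receive it


# Standalone usage (outside Scrapy) to show the call shape:
pipeline = DownloadLoggingPipeline()
pipeline.open_spider(spider=None)
result = pipeline.process_item({"file_url": "https://example.org/lecture1.mp4"},
                               spider=None)
```

In the real project the pipeline would also be registered under `ITEM_PIPELINES` in `settings.py`, which is how Scrapy discovers it.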

👥 Contributing

Ways to Contribute

We welcome all contributions! Choose your path:

- 🐛 [Report Bugs](#bug-reports)
- 💡 [Suggest Features](#feature-requests)
- 🔧 [Submit Code](#pull-requests)

Bug Reports

Bug Report Template

```markdown
**Description:**
A clear, concise description of the issue.

**Steps to Reproduce:**
1. Step one
2. Step two

**Current Result:**
What actually happens

**Expected Result:**
What you expected to happen

**Environment:**
- OS: [e.g., Windows 11]
- Python: [e.g., 3.11]
- Version: [e.g., 1.0.0]
```

Feature Requests

Feature Request Template

```markdown
**Problem:** What problem does this feature solve?

**Solution:** Describe your proposed solution.

**Alternatives:** What alternatives have you considered?
```

Pull Requests

Quick Start

```bash
# Setup
git clone https://github.com/username/repo
cd repo
uv sync

# Development
git checkout -b feature/name
# Make changes, then:
git commit -m "feat: add amazing feature"
git push origin feature/name
```

Guidelines:

  1. ✅ Follow code style
  2. 📝 Update docs

Ready to Contribute

The project follows standard GitHub workflows. For detailed guidelines, see our Contribution Guide.


Built with ❤️ using MkDocs and Material for MkDocs