AutoCatalog is a Python-based automation tool that scans, monitors, and logs paginated digital library listings. By automating A-to-Z category crawling and title verification, it removes the need to inspect large catalogs by hand, helping librarians, digital archivists, and QA teams confirm content availability and keep their catalogs healthy.
Whether you oversee a university library's e-resources or manage a digital archive, AutoCatalog helps detect broken links, loading failures, and missing entries, and backs every result with detailed logs and screenshots.
- Crawls category-wise listings (A–Z, 0–9)
- Supports deep pagination (100+ pages)
- Verifies individual titles listed in an Excel file
- Retries pages that time out or fail to load
- Captures screenshots of error pages
- Logs the status of each title or page
- Saves results to Excel for easy review and reporting
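The A–Z, 0–9 crawl order and deep pagination can be pictured as a simple generator of listing URLs. This is a minimal sketch under assumptions, not AutoCatalog's code: the URL template, `category_keys`, and `page_urls` are hypothetical, and a fixed `max_pages` stands in for however the real crawler decides a category has no further pages (for example, by checking for a "next" link).

```python
import string
from typing import Iterator

# Hypothetical URL template; the real catalog's URL scheme is not documented here.
BASE_URL = "https://library.example.org/catalog?category={category}&page={page}"


def category_keys() -> list[str]:
    """Return the A-Z and 0-9 category keys in crawl order."""
    return list(string.ascii_uppercase) + list(string.digits)


def page_urls(category: str, max_pages: int) -> Iterator[str]:
    """Yield listing URLs for pages 1..max_pages of one category.

    Assumption: a fixed page limit; a real crawler would more likely stop
    when the listing has no "next page" link.
    """
    for page in range(1, max_pages + 1):
        yield BASE_URL.format(category=category, page=page)


if __name__ == "__main__":
    print(len(category_keys()))  # 36
    print(next(page_urls("A", 120)))  # https://library.example.org/catalog?category=A&page=1
```

Using a generator keeps memory flat even for categories with hundreds of pages, since each URL is produced only when the crawler is ready to visit it.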
- Python 3.10+
- Selenium WebDriver
- Pandas
- Headless Chrome
Scans the paginated digital catalog from A–Z and logs the status of each page:
`python catalog_crawler.py`
- Pages that fail to load are retried.
- Pages that still fail are logged, and a screenshot of each is saved to `/screenshots`.
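The retry-and-screenshot behavior above could look roughly like the following minimal sketch. It is not AutoCatalog's actual implementation: `load_with_retry`, its parameters, and the exponential backoff are illustrative assumptions, and `load` stands in for whatever page-loading call the crawler uses.

```python
import logging
import time
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("catalog_crawler")


def load_with_retry(
    load: Callable[[str], None],
    url: str,
    retries: int = 3,
    delay: float = 2.0,
    on_failure: Optional[Callable[[str], None]] = None,
) -> bool:
    """Try load(url) up to `retries` times and return True on success.

    `load` is a placeholder for something like driver.get(url) plus a wait
    for the listing to appear; it must raise an exception on failure.
    """
    for attempt in range(1, retries + 1):
        try:
            load(url)
            log.info("OK %s (attempt %d)", url, attempt)
            return True
        except Exception as exc:  # e.g. a Selenium TimeoutException
            log.warning("Attempt %d/%d failed for %s: %s", attempt, retries, url, exc)
            if attempt < retries:
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                time.sleep(delay * 2 ** (attempt - 1))
    log.error("Giving up on %s", url)
    if on_failure is not None:
        on_failure(url)  # e.g. save a screenshot of the failing page
    return False
```

With Selenium, one option is to pass `driver.get` as `load` (after calling `driver.set_page_load_timeout(...)` so slow pages raise `TimeoutException`) and a callback that calls `driver.save_screenshot(...)` with a filename derived from the URL as `on_failure`. Since `driver.get` does not raise on HTTP error pages, a real check would likely also wait for an expected element with `WebDriverWait`.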
Searches for each book listed in `AtoZeBooks.xlsx` and verifies that it is available:
`python title_checker.py`
- Each book title is searched individually.
- The result for each title (e.g. Success, Link Not Found) is saved to an Excel file.
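The per-title check can be sketched as a loop that searches each title and records one status row. This is an illustration, not the tool's implementation: `check_titles`, the `search` callback, and the `Error` status are hypothetical; only the `Success` and `Link Not Found` labels come from this README.

```python
from typing import Callable, Iterable

# "Success" and "Link Not Found" appear in the README; "Error" is an assumed catch-all.
SUCCESS = "Success"
LINK_NOT_FOUND = "Link Not Found"
ERROR = "Error"


def check_titles(
    titles: Iterable[str],
    search: Callable[[str], bool],
) -> list[dict[str, str]]:
    """Search each title once and return one status row per title.

    `search` is a placeholder for the real lookup (e.g. typing the title
    into the catalog's search box); it returns True if a link is found.
    """
    rows = []
    for title in titles:
        title = str(title).strip()
        if not title:
            continue  # skip blank spreadsheet cells
        try:
            status = SUCCESS if search(title) else LINK_NOT_FOUND
        except Exception as exc:
            # Record the failure instead of aborting the whole run.
            status = f"{ERROR}: {exc}"
        rows.append({"Title": title, "Status": status})
    return rows
```

To connect it to the spreadsheet, one option is pandas: read the titles with `pd.read_excel("AtoZeBooks.xlsx")["Title"].dropna()` (the `Title` column name is an assumption) and write the results with `pd.DataFrame(rows).to_excel("results.xlsx", index=False)`, which needs an Excel writer engine such as `openpyxl`.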
- Excel file with a detailed status for each book or page
- Screenshots of pages that failed to load
- Console logs showing live status and retries