Creating a Persistent Queue for Site Crawling with Python

Published: 03 February 2020
on channel: Mike Levin, SEO in NYC

Coding your own crawler is almost a rite of passage in using Python for SEO, but housekeeping (keeping track of which URLs have and haven't been visited) is one of the more difficult things to conceptualize and implement. In this video, I create pickled unvisited and visited lists, updating the pickled object in storage with every read and write so that even if the (small) crawl is interrupted, it can pick up where it left off. I haven't implemented the resume capability yet, but the code is ready for it.
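
The code from the video isn't reproduced here, so what follows is only a minimal sketch of the idea, not the actual implementation. It assumes a single pickle file holding both lists. The names `save_state`, `load_state`, `crawl`, and `get_links` are hypothetical, and `get_links` stands in for whatever fetches a page and extracts its links. The sketch also includes a simple resume step, which the video itself does not implement yet.

```python
import pickle
from collections import deque
from pathlib import Path


def save_state(path, unvisited, visited):
    """Write both collections to one pickle file (hypothetical layout)."""
    with open(path, "wb") as f:
        pickle.dump({"unvisited": list(unvisited), "visited": sorted(visited)}, f)


def load_state(path, seed_urls):
    """Load saved state if the file exists, otherwise start from the seeds."""
    if Path(path).exists():
        with open(path, "rb") as f:
            state = pickle.load(f)
        return deque(state["unvisited"]), set(state["visited"])
    return deque(seed_urls), set()


def crawl(path, seed_urls, get_links, max_pages=100):
    """Breadth-first crawl that re-pickles its state after every page.

    get_links is a placeholder callable: URL in, list of URLs out.
    """
    unvisited, visited = load_state(path, seed_urls)
    while unvisited and len(visited) < max_pages:
        url = unvisited.popleft()
        if url in visited:
            continue
        # If this call fails, the file on disk still lists url as unvisited,
        # so the next run will retry it.
        links = get_links(url)
        visited.add(url)
        for link in links:
            if link not in visited and link not in unvisited:
                unvisited.append(link)
        save_state(path, unvisited, visited)
    return visited
```

Rewriting the whole file after every page is wasteful for a large crawl, but for a small one it keeps the logic easy to follow: calling `crawl` again with the same path continues from the last saved state.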