“Looking for a job is a full-time job” - Anonymous Parents
The enigma of job searching is captured by the quote above, which is parroted by many who counsel people seeking new opportunities, and I agree with the sentiment. However, I disagree with how that time is ultimately spent. Currently, job searching is akin to shopping online for shoes: one of the largest costs is the time an individual spends browsing for opportunities rather than improving their prospects of landing the ideal role. This post is an overview of how I automated my job search.
Motivated by the chance to practice my skills and by my looming graduation in March of 2021, I decided not to spend my days browsing job listings for suitable roles to start my career in Business Analytics, only to then tailor an application for each one. Instead, I would use Python to automate my job search, freeing up time and energy to focus on actualizing my career. Because there is no adequate RSS feed or API for job listings that would ensure I capture all the best opportunities, building this data pipeline was also a great way to learn, hands-on, how to build a data lake and a value-driven pipeline.
As I looked at companies and job search platforms to devise my approach, I found that it would be best to scrape larger corporations' career pages directly, along with Indeed and LinkedIn using filters. For instance, as of March 2021, Amazon averaged about 8,000 to 10,000 job openings, with 700+ of them directly related to business/data analytics, and these would be underrepresented on third-party sites.
The data was collected in two increments, each with its own approach. First, web scraping with Selenium gathered an initial large set of job posts to build the data set from. Second, an update step uses email RSS from Indeed and LinkedIn to gather their recent suggested jobs.
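The update step can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the pipeline's actual code: it assumes the "email RSS" arrives as HTML job-alert emails, that job links contain a "/jobs/" path segment, and the names `LinkCollector` and `job_links_from_alert` are hypothetical.

```python
from email import message_from_string
from html.parser import HTMLParser
from typing import List, Set


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def job_links_from_alert(raw_email: str, known: Set[str]) -> List[str]:
    """Return job links from an alert email that are not yet in the data set.

    `known` holds links already collected (for example by the initial
    scrape) and is updated in place so repeated alerts add nothing new.
    """
    message = message_from_string(raw_email)
    collector = LinkCollector()
    for part in message.walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "utf-8"
            html = part.get_payload(decode=True).decode(charset, errors="replace")
            collector.feed(html)

    fresh = []
    for link in collector.links:
        # Assumption: job links contain "/jobs/"; other links are skipped.
        if "/jobs/" in link and link not in known:
            known.add(link)
            fresh.append(link)
    return fresh
```

Passing the set of links from the initial scrape as `known` keeps the weekly update from re-adding posts that are already in the data set.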
These posts were then parsed with spaCy, the NLP package I used to interpret job postings for keywords and grammar patterns describing skills and duties. Using spaCy's rule-based Matcher, I wrote patterns to handle the odd, command-style jargon and incomplete sentences used in job posts.
Each pattern triggered a boolean flag as well as a per-category count, so posts could be scored by the number of matches for each pattern type (i.e., programming languages/tools and analytics). To understand the tenure required for individual matches within a post, the years pattern also used noun chunking and sentence detection to determine the subject of each years-of-experience requirement.
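To make the scoring idea concrete, here is a minimal sketch that swaps spaCy for plain regular expressions so it runs with the standard library only. It is a stand-in, not the matcher described above: the keyword lists, the `score_post` and `years_requirements` names, and the "experience with/in/using" heuristic for finding the subject are all assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists, one regex per pattern type.
PATTERNS = {
    "tools": re.compile(r"\b(python|sql|tableau|excel)\b", re.IGNORECASE),
    "analytics": re.compile(
        r"\b(regression|forecasting|a/b testing|dashboards?)\b", re.IGNORECASE
    ),
}

# "3 years", "5+ years", "1 year"
YEARS = re.compile(r"(\d+)\+?\s*years?", re.IGNORECASE)
# Crude replacement for noun chunking: the phrase after "experience with/in/using".
SUBJECT = re.compile(r"experience\s+(?:with|in|using)\s+([^.,;]+)", re.IGNORECASE)


def score_post(text):
    """Return (flags, counts): a boolean and a match count per pattern type."""
    counts = Counter()
    for category, pattern in PATTERNS.items():
        counts[category] = len(pattern.findall(text))
    flags = {category: counts[category] > 0 for category in PATTERNS}
    return flags, counts


def years_requirements(text):
    """Return (years, subject) pairs, one per sentence that mentions years."""
    results = []
    # Simple sentence detection: split after ., ! or ? and on line breaks.
    for sentence in re.split(r"(?<=[.!?])\s+|\n+", text):
        years = YEARS.search(sentence)
        if not years:
            continue
        subject = SUBJECT.search(sentence)
        results.append(
            (int(years.group(1)), subject.group(1).strip() if subject else None)
        )
    return results
```

Sorting posts by `sum(counts.values())`, or by a single category's count, gives a ranking of the posts with the most matches.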
The resulting functions now let me run my data aggregation once a week and spend my time actively researching the top suggested roles for fit, cutting out the hours previously spent reviewing hundreds of job posts every week. As the saying goes, “Looking for a job is a full-time job,” but at least the discovery process shouldn’t be the primary focus.
Thanks to https://www.vecteezy.com/vector-art/174193-online-job-searching for the great image.