I scraped the Indeed job list: here's how I did it

I wanted to scrape Indeed for all the job listings in my area. Let me tell you, scraping is a dirty job (copyright-wise, and I don't recommend doing it without looking into the legal side first). In the end I came up with a decent solution, although I'd still consider it only a first step: I basically threw a bunch of HTML at React's dangerouslySetInnerHTML API and that was it. You'll see what I mean.

Basically, I installed Puppeteer on a Heroku server and let Node.js (Express) do the work for me. I was then able to fetch the data from my Gatsby front end just as you would from any RESTful API.

Here is the code I hosted on Heroku:

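const puppeteer = require('puppeteer')
const express = require('express')
const cors = require('cors')

const app = express()

app.use(cors()) // with no options, cors() accepts requests from any origin

// Hypothetical: the Indeed search URL was not included in the original, so it is read from an env variable
const SEARCH_URL = process.env.SEARCH_URL

app.get('/', async function (req, res) {

  async function scrape() {
    let browser
    try {
      browser = await puppeteer.launch({
        headless: true, // set to false to watch the browser open, which makes debugging easier
        args: ['--no-sandbox', '--disable-setuid-sandbox'], // lets Chrome start on Heroku
      })

      const page = await browser.newPage()

      await page.goto(SEARCH_URL)

      // await page.waitForSelector('footer') didn't really work

      const data = await page.evaluate(() => {
        const jobCards = document.querySelector('#mosaic-provider-jobcards')
        // The returned object was cut off in the original; returning the container's HTML under
        // a hypothetical `html` key matches how the front end renders it with dangerouslySetInnerHTML
        return {
          html: jobCards ? jobCards.innerHTML : '',
        }
      })

      await page.close()
      return data
    } catch (error) {
      console.error(error) // log the failure; the route then responds with results: null
      return null
    } finally {
      if (browser) await browser.close() // always shut Chrome down, even after an error
    }
  }

  const results = await scrape()

  res.send({ results })
})

const PORT = process.env.PORT || 5000

app.listen(PORT, function () {
  console.log(`Running on port ${PORT}`) // report the port actually in use, not a hardcoded 5000
})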
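The commented-out `waitForSelector('footer')` call waits for the page footer, which is not the element the scraper actually reads. One alternative, sketched here under assumptions and not tested against Indeed, is to wait for `#mosaic-provider-jobcards` itself with a timeout and then read its markup with Puppeteer's `page.$eval`. The helper name `readJobCards` and the 10-second default timeout are hypothetical choices; the Puppeteer page is passed in, so the helper could be called inside the scraping function right after `page.goto(...)`.

```javascript
// Hypothetical helper: wait for the job-card container to appear, then return its inner HTML.
// `page` is a Puppeteer Page; the default selector is the one used in the scraper above.
async function readJobCards(page, selector = '#mosaic-provider-jobcards', timeout = 10000) {
  try {
    // Resolves once a matching element is in the DOM; rejects if none appears within `timeout` ms.
    await page.waitForSelector(selector, { timeout })
  } catch (error) {
    console.error(`Job cards did not appear within ${timeout} ms:`, error.message)
    return null
  }
  // $eval runs the callback in the page context on the first element matching the selector.
  return page.$eval(selector, (el) => el.innerHTML)
}
```

Waiting on the element you are about to read, rather than on an unrelated landmark like the footer, ties the wait condition directly to the data you need; returning `null` on timeout mirrors how the route above reports a failed scrape.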

Easy, right? One more thing you need to know: your Heroku buildpacks (look under Settings) must provide these additional packages



Now, if everything goes well and your app is up and running, you should be able to call the home route (/) and retrieve the data.
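On the front end, calling the home route is an ordinary `fetch`. The sketch below assumes the server responds with `{ results }`, as the Express route does, and that `results` carries the scraped markup in a field named `html`; that field name is hypothetical, since the original return value is cut off. `fetchJobCardsHtml` is likewise a hypothetical name, and `apiUrl` would be your Heroku app's address. The `fetchImpl` parameter only exists so the function can be exercised without a network; by default it uses the global `fetch` available in browsers and Node 18+.

```javascript
// Hypothetical front-end helper: fetch the scraped job cards from the Express server.
// Assumes the response body looks like { results: { html: '...' } }; `html` is an assumed field name.
async function fetchJobCardsHtml(apiUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(apiUrl)
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  const { results } = await response.json()
  // `results` may be null or missing when scraping failed, so fall back to an empty string.
  return results && results.html ? results.html : ''
}
```

In a Gatsby (React) component, the returned string can then be rendered the way described at the start, e.g. `<div dangerouslySetInnerHTML={{ __html: html }} />`. React does not sanitize that markup, so it is only as trustworthy as the page it was scraped from.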

I could certainly have implemented a pagination feature or even a query search component, but I have neither the money nor the time. We'll see how it goes. Bye for now!

