We decided to scrape the Irish electrical grid's public real-time dashboard to help create awareness around how Ireland is a leading country in wind power generation.
What were the results?
Our Twitter account @IrishEnergyBot now has 2,000 followers receiving a daily report on how much wind generation there was on the Irish electric grid in the last 24 hours. Over the past ~18 months wind has met ~33% of Irish electrical demand on average. On windy days it regularly goes as high as 75%! We're #2 in the world. Only Denmark has more wind power.
Why did you choose Browserless for automation?
@IrishEnergyBot scrapes its data from a free, public dashboard provided by Ireland's electrical grid operator. Because the dashboard loads data dynamically after the initial page load, a modern browser with JavaScript is required.
Thanks to Browserless I can keep my puppeteer script in a simple, low-maintenance serverless environment. The connection is fast and reliable and since I need just a few minutes of browser time each month, usage-based pricing works out great.
Browserless is an essential component of @IrishEnergyBot that I just never have to worry about.
import * as _ from "underscore";
import puppeteer = require("puppeteer");
const TIMEOUT_MS = 10000;
// or "roi" or "ni".
const REGION = "all";
(async () => {
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://chrome.browserless.io?token=${process.env.BROWSERLESS_TOKEN}`,
});
try {
const scrapedData = await scrape(await browser.newPage());
console.log(JSON.stringify(scrapedData, undefined, 2));
} finally {
await browser.close();
}
})();
async function scrape(page: puppeteer.Page) {
// data frequently fails to load: empirically, if it hasn't loaded in the
// first ~10s then we may as well fail.
async function impatientGoto(url: string) {
await page.goto(url, {
waitUntil: "networkidle2",
timeout: TIMEOUT_MS,
});
}
async function impatientWaitForSelector(selector: string) {
await page.waitForSelector(selector, {
timeout: TIMEOUT_MS,
});
}
// figures are contained in various divs, all with the class .stat-box. there
// isn't a good way to find the ones we want without inspecting their text
// content. this function extracts the number from the "stat box" under the
// specified parent containing the specified phrase.
async function extractStatBoxFigure(parent: string, keyPhrase: string) {
const selector = `${parent} .stat-box`;
await impatientWaitForSelector(selector);
const statBoxesTextContents = await page.$$eval(selector, (elements) => {
return elements.map((element) => {
return element.textContent || "";
});
});
const matchingStatBox = _.find(
statBoxesTextContents,
(s) => s.toLowerCase().indexOf(keyPhrase) >= 0
);
if (!matchingStatBox) {
throw new Error(`no stat box found containing "${keyPhrase}"`);
}
return extractFirstNumber(matchingStatBox);
}
impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/demand`);
const demand_mw = await extractStatBoxFigure("#demand", "system demand");
impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/generation`);
const gen_mw = await extractStatBoxFigure("#generation", "system generation");
impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/wind`);
const wind_mw = await extractStatBoxFigure("#wind", "wind generation");
return { gen_mw, demand_mw, wind_mw };
}
// extracts the first integer from a (potentially messy) blob of text, e.g.:
// " LATEST SYSTEM GENERATION 4,994 MW " -> 4994
function extractFirstNumber(s: string) {
// remove commas, e.g. 4,800 -> 4800
const withoutCommas = s.replace(/,/g, "");
// https://stackoverflow.com/questions/8441915/tokenizing-strings-using-regular-expression-in-javascript
const tokens = withoutCommas.match(/[^\s]+/g) || [];
const firstNumber = _.find(
tokens.map((t) => parseInt(t, 10)),
(i) => !isNaN(i)
);
if (!firstNumber) {
throw new Error("no number found");
}
return firstNumber;
}
Sign up for a free account and get an API key. You have 6 hours of usage for free! After that, you can pay as you go, and only pay per second that you use!
If you’ve already tested our service and want a dedicated machine for your requests, you might be interested in signing up for a dedicated account, this works best if your doing screencasting or have a heavy load of requests since you won’t be sharing resources.
If you’re using one of our hosted services; be that usage-based or capacity-based, just connect to our WebSocket securely with your token to start web scraping!