Time was very limited, and there was no budget for external tools. But a host with Docker already installed and running was at hand, and it had some spare resources.
After implementing the first prototype, it became clear that exposing data from WordPress via GraphQL, even with manual additions, is nice and has its uses.
But having to battle timeouts right from the start made little sense.
Also, having to send regular requests (pulls) to get the data, even if only to learn that nothing new has arrived, is a waste of resources.
Plus, I personally dislike such an approach.
So it was time for some more research. It's WordPress, but still I couldn't imagine that I was the only one needing a push approach, ideally with the ability to filter by the post types and actions I want to be notified about.
After some searching, I found WP Webhooks from Ironikus and tried it out. Even the free version allowed me to set up a webhook that calls another URL when a post is created.
WP Webhooks adds webhooks (as its name implies).
Webhooks are often used for generic integrations. A webhook is basically a notification pattern: when a specific trigger fires, an HTTP request is sent to a configured URL.
In my case, the trigger is the creation of a new post with a specific post type.
As you might have guessed, that's fantastic for me, because I prefer a push (which is what a notification pattern is).
So what did I do?
Well, at a high level I implemented a small API (think receiver and adapter) that receives this notification, extracts the required data, and saves it into a database.
This allows the main application to retrieve the data by simply querying the database.
Details Prototype 2
To be more specific, the HTTP call arrives at the API (think microservice) on a specified URL.
It contains a JSON-formatted string, which can easily be converted into a JSON object.
Extracting the required information is then a breeze, as is saving it into the database.
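To make the parse, extract, and save steps more concrete, here is a minimal sketch in TypeScript. It is not the actual service. The payload fields (`post_id`, `post.post_title`, `post.post_type`, `post.post_date`, `post.guid`), the `ArticleStore` interface, the in-memory store, and the returned status codes are all assumptions for illustration. The fields WP Webhooks really sends may look different, and a real service would write to a database instead of an object in memory.

```typescript
// Hypothetical payload shape; the fields WP Webhooks actually sends may differ.
interface WebhookPayload {
  post_id: number;
  post: { post_title: string; post_type: string; post_date: string; guid: string };
}

// The row the main application would read later (assumed schema).
interface ArticleRow {
  id: number;
  title: string;
  url: string;
  publishedAt: string;
}

// Minimal storage abstraction; a real service would wrap a database client.
interface ArticleStore {
  save(row: ArticleRow): void;
}

// In-memory stand-in for the database, only for illustration.
class InMemoryStore implements ArticleStore {
  readonly rows: Record<number, ArticleRow> = {};
  save(row: ArticleRow): void {
    // Keyed by post id, so a repeated delivery overwrites instead of duplicating.
    this.rows[row.id] = row;
  }
}

// Runtime check that the parsed JSON has the fields we rely on.
function isWebhookPayload(value: unknown): value is WebhookPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const post = v["post"];
  if (typeof v["post_id"] !== "number") return false;
  if (typeof post !== "object" || post === null) return false;
  const p = post as Record<string, unknown>;
  return ["post_title", "post_type", "post_date", "guid"].every(
    (key) => typeof p[key] === "string"
  );
}

// Receiver and adapter in one function: returns the HTTP status to answer with.
function handleWebhook(rawBody: string, wantedType: string, store: ArticleStore): number {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch (err) {
    return 400; // body was not valid JSON
  }
  if (!isWebhookPayload(parsed)) return 422; // JSON, but not the expected shape
  if (parsed.post.post_type !== wantedType) return 204; // not a type we care about
  store.save({
    id: parsed.post_id,
    title: parsed.post.post_title,
    url: parsed.post.guid,
    publishedAt: parsed.post.post_date,
  });
  return 201; // stored
}
```

An HTTP framework of your choice would pass the raw request body to `handleWebhook` and respond with the returned status code. Keeping the logic in a plain function means it can be tested without starting a server.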
Result (for now)
The result of the experiment is that I invested less time and money than it would have taken to either write x scrapers to gather the data or continuously pay for proprietary tools to ease the scraping.
Additionally, I learned how to leverage WordPress in a data-processing capacity and how to think outside the box of a perfect setup.
Reflection (as of now)
Spending money might have made this faster, but it was not an option.
Additionally, I was curious whether I could make do without spending the money and without having to write and maintain scrapers for so many different source formats.
As of now, the system works. A headless WordPress with an RSS/scraping plugin gathers the information.
WP Webhooks notifies a microservice when something new is found.
The microservice extracts and converts the data as required and saves it into the database.
NextJS, as the actual front end, queries the database and does the server-side rendering of the aggregated content.
The WordPress installation is hosted inside a Docker container on a host that belongs to another project.
NextJS is hosted on Vercel (vercel.com), the company that created NextJS.
The database is currently an Amazon RDS instance, with access allowed from the NextJS site.
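For the read side, a rough sketch of how the front end could fetch the newest entries might look like this. Everything named here is an assumption: the `articles` table and its columns, the `Db` interface (a stand-in shaped like common PostgreSQL clients), the `$1` placeholder style, and the limit bounds. The actual schema and database engine behind the RDS instance may differ.

```typescript
// Hypothetical row shape of an assumed `articles` table.
interface Article {
  id: number;
  title: string;
  url: string;
  published_at: string;
}

// Minimal stand-in for a database client; not a real library type.
interface Db {
  query(text: string, values: unknown[]): Promise<{ rows: Article[] }>;
}

interface SqlQuery {
  text: string;
  values: number[];
}

// Builds a parameterized query so the limit is never concatenated into the SQL.
function buildLatestArticlesQuery(limit: number): SqlQuery {
  // Clamp to 1..100; fall back to 10 if the input is not a finite number.
  const safeLimit = isFinite(limit)
    ? Math.min(100, Math.max(1, Math.floor(limit)))
    : 10;
  return {
    text:
      "SELECT id, title, url, published_at FROM articles " +
      "ORDER BY published_at DESC LIMIT $1",
    values: [safeLimit],
  };
}

// Runs the query and returns plain rows that a page can render.
function getLatestArticles(db: Db, limit: number): Promise<Article[]> {
  const q = buildLatestArticlesQuery(limit);
  return db.query(q.text, q.values).then((result) => result.rows);
}
```

In NextJS, a function like `getLatestArticles` could be called from `getServerSideProps`, so the query runs on the server and the page is rendered with the aggregated content already in place.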
Let me know how you liked the article.