Summary: Develop an open-source approach to finding Swagger and OpenAPI definitions on the open web: crawling web pages for API definitions, validating them, and then consuming and indexing them as part of an ongoing search. The goal is a simple way for developers to discover existing APIs by finding documentation, repositories, and other common traces of a running API.
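The validation step mentioned above can start with a cheap heuristic before any full schema validation. A minimal sketch (the function name and checks are my own illustration, not part of the project spec): a crawled document is a candidate if it parses as JSON and carries the version and `paths` fields that Swagger 2.0 and OpenAPI 3.x definitions require.

```python
import json

def looks_like_openapi(text):
    """Heuristic first-pass filter for crawled pages: does this document
    look like a Swagger/OpenAPI definition? A real pipeline would follow
    up with a strict schema validation; this just cuts the noise."""
    try:
        doc = json.loads(text)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    # OpenAPI 3.x uses an "openapi" version field; Swagger 2.0 uses "swagger".
    # Both require a "paths" object.
    return ("openapi" in doc or "swagger" in doc) and "paths" in doc

print(looks_like_openapi('{"openapi": "3.0.0", "paths": {}}'))  # True
print(looks_like_openapi('{"hello": "world"}'))                  # False
```

A YAML-aware version would also try a YAML parser, since many definitions are published as YAML rather than JSON.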
Skills: Knowledge of the web and how to crawl web pages and follow URLs, or how to utilize an existing solution like Common Crawl.
Expected Outcomes: Provide a simple open-source API that abstracts away the complexity of searching the web for specific terms, helping identify APIs in a sea of web pages. The interface should set in motion an asynchronous search of the web, or of a corpus of web content, looking for APIs. Users can initiate a search and then return regularly to see results build up over time, with results aggregated for easy retrieval via a simple API.
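The initiate-then-poll interaction described above can be sketched in a few lines. This is an in-process illustration under my own assumptions (class and method names are invented; a real service would expose these operations over HTTP and do real crawling where the simulated step is):

```python
import threading
import time
import uuid

class SearchIndex:
    """Sketch of the proposed interface: a client starts a search, gets
    back an id immediately, and polls for results as they accumulate."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}
        self._threads = {}

    def start_search(self, seed_urls):
        """Kick off an asynchronous search and return its id right away."""
        search_id = str(uuid.uuid4())
        with self._lock:
            self._results[search_id] = []
        thread = threading.Thread(target=self._crawl, args=(search_id, seed_urls))
        self._threads[search_id] = thread
        thread.start()
        return search_id

    def _crawl(self, search_id, seed_urls):
        # Placeholder for real crawling and validation of each page.
        for url in seed_urls:
            time.sleep(0.01)  # simulate network latency
            with self._lock:
                self._results[search_id].append({"source": url})

    def get_results(self, search_id):
        """Poll: return whatever has been found so far."""
        with self._lock:
            return list(self._results.get(search_id, []))

    def wait(self, search_id):
        """Block until a search finishes (handy for tests and scripts)."""
        self._threads[search_id].join()

# Usage: start a search, come back later for aggregated results.
index = SearchIndex()
sid = index.start_search(["https://example.com/a", "https://example.com/b"])
index.wait(sid)
print(len(index.get_results(sid)))  # 2
```

The key design point is that `start_search` returns immediately, so the client is never blocked while the (potentially very long) crawl runs.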
I have some doubts about this, and I am not the only one: a few other people have the same or similar doubts.
Here I’m quoting two threads from the project’s GitHub description page.
Comparing this with API marketplaces: the proposed solution aims to help developers find APIs that may not be available on existing marketplaces like RapidAPI, by crawling the web for Swagger and OpenAPI definitions, indexing them, and providing access through a simple API interface. This could make it easier for developers to discover and use APIs that are not part of any marketplace but are relevant to their specific use case.
The idea of crawling the web to find all the Swagger and OpenAPI definitions out there sounds like a Herculean task. Can you tell me more about how we plan to make it happen? Are we talking about building an army of web crawlers independently, or do you have something else in mind?
Nevertheless, exciting stuff!
There are two ways to get OpenAPI definitions from the open web:
1. Crawling the web with different self-made crawlers (spiders)
2. Using the Common Crawl dataset (Common Crawl updates its dataset every month)
For both approaches we need to define a list of sites [e.g. apis.guru, github.com, and other sites where there is a good chance of finding OpenAPI definitions].
We could also use the whole Common Crawl dataset to look for OpenAPI definitions [without defining a list of sites], but this dataset is huge (around 300 TB), and scraping OpenAPI definitions from it and storing them to build a search engine would be very computationally expensive, in my opinion.
Is there any workaround for this?
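One workaround commonly used for the scale problem raised in this thread is to query Common Crawl's per-crawl URL index instead of scanning the raw archive: the index can be asked for URLs matching patterns that typically host definitions (e.g. paths ending in swagger.json), and only those records need fetching. A minimal sketch; the crawl label and URL pattern below are illustrative assumptions, and `fetch_candidates` performs a real network call:

```python
import urllib.parse
import urllib.request

# Common Crawl publishes a queryable URL index per crawl, so candidate
# pages can be located without downloading the raw multi-hundred-TB
# archive. The crawl label here is illustrative; pick a current one.
CDX_ENDPOINT = "https://index.commoncrawl.org/CC-MAIN-2023-06-index"

def build_index_query(url_pattern, page=0):
    """Build an index query URL for a (possibly wildcard) URL pattern."""
    params = urllib.parse.urlencode({
        "url": url_pattern,
        "output": "json",
        "page": page,
    })
    return f"{CDX_ENDPOINT}?{params}"

def fetch_candidates(url_pattern):
    """Fetch one page of matching index records (network call)."""
    with urllib.request.urlopen(build_index_query(url_pattern)) as resp:
        return resp.read().decode("utf-8").splitlines()

# Example: build a query for swagger.json files under a domain.
query = build_index_query("example.com/*/swagger.json")
```

Each returned record points at an offset in a WARC file, so only the matching page bodies need downloading and validating, not the whole crawl.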
If I got it right, the assignment asks us to find Swagger and OpenAPI definitions and list them in a frontend web application. Would that application work like a search engine for Swagger and OpenAPI definitions?
Can you please help us resolve these doubts? The proposal submission deadline gets closer day by day, and I need to prepare in advance.