GitHub - cyucelen/walker: Seamlessly fetch paginated data from any source. Simple and high performance API scraping included!

walker

Walker simplifies fetching paginated data from any data source. You can configure the start position and the number of documents to fetch to suit your needs. Walker also supports parallel processing, so pages can be fetched concurrently instead of one at a time.

The real purpose of the library is to provide a solution for walking through the pagination of API endpoints. With NewApiWalker, you can fetch data from any paginated API endpoint and process it concurrently. You can also create your own custom walker to fit your specific use case.

Features

  • Provides a walker for paginating through API endpoints. This is for scraping an API, if such a term exists.
  • Cursor and offset pagination strategies.
  • Fetching and processing data concurrently without any effort.
  • Total fetch count limiting
  • Rate limiting

Examples

Basic Usage

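package main

import (
	"fmt"

	"github.com/cyucelen/walker"
)

// source is called once per page and returns the documents for that page.
func source(start, fetchCount int) ([]int, error) {
	return []int{start, fetchCount}, nil
}

// sink receives each result returned by source; call stop to end the walk.
func sink(result []int, stop func()) error {
	fmt.Println(result)
	return nil
}

func main() {
	walker.New(source, sink).Walk()
}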

Output:

[0 10]
[1 10]
[4 10]
[2 10]
[3 10]
[5 10]
[8 10]
[9 10]
[7 10]
[6 10]
...
to Infinity
  • The source function receives start as the page number and fetchCount as the number of documents. Use these values to fetch data from your source.
  • The sink function receives the result returned from source and a stop function. You can save the results in this function and, depending on them, call stop to stop sourcing any further pages. Otherwise the walk continues forever unless a limit is provided.
  • Beware that order is not guaranteed, since the source and sink functions are called concurrently.
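
The behavior described in these bullets can be sketched with the standard library alone. The code below is not walker's implementation: walk, newSource, collector and collectAll are hypothetical names used for illustration. It shows why a sink that stores results needs a lock, how stop ends the walk, and why results must be sorted afterwards if order matters.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// newSource returns a source over `total` fake documents (0..total-1).
// Pages past the end come back empty.
func newSource(total int) func(start, fetchCount int) ([]int, error) {
	return func(start, fetchCount int) ([]int, error) {
		var docs []int
		for i := start * fetchCount; i < (start+1)*fetchCount && i < total; i++ {
			docs = append(docs, i)
		}
		return docs, nil
	}
}

// collector gathers documents from sink calls that may run concurrently.
type collector struct {
	mu   sync.Mutex
	docs []int
}

// sink stops the walk on the first empty page and stores everything else.
func (c *collector) sink(result []int, stop func()) error {
	if len(result) == 0 {
		stop()
		return nil
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.docs = append(c.docs, result...)
	return nil
}

// walk is a simplified stand-in for a walker: workers claim page numbers
// from a shared counter and run source then sink until stop is called.
func walk(source func(int, int) ([]int, error), sink func([]int, func()) error, workers, batchSize int) error {
	var (
		next     atomic.Int64
		stopped  atomic.Bool
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	stop := func() { stopped.Store(true) }

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for !stopped.Load() {
				page := int(next.Add(1) - 1)
				result, err := source(page, batchSize)
				if err == nil {
					err = sink(result, stop)
				}
				if err != nil {
					once.Do(func() { firstErr = err })
					stop()
					return
				}
			}
		}()
	}
	wg.Wait()
	return firstErr
}

// collectAll walks the fake source and returns the documents in sorted order.
func collectAll(total, workers, batchSize int) ([]int, error) {
	c := &collector{}
	if err := walk(newSource(total), c.sink, workers, batchSize); err != nil {
		return nil, err
	}
	sort.Ints(c.docs) // arrival order is not deterministic
	return c.docs, nil
}

func main() {
	docs, err := collectAll(25, 4, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(docs), docs[0], docs[len(docs)-1]) // prints: 25 0 24
}
```

Because a few workers may already be fetching pages when stop is called, a sink written this way should tolerate extra empty pages arriving after the first one.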

Walking through the pagination of API endpoints

Fetching all the breweries from Open Brewery DB:

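package main

import (
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/cyucelen/walker"
)

func buildRequest(start, fetchCount int) (*http.Request, error) {
	url := fmt.Sprintf("https://api.openbrewerydb.org/breweries?page=%d&per_page=%d", start, fetchCount)
	return http.NewRequest(http.MethodGet, url, http.NoBody)
}

func sink(res *http.Response, stop func()) error {
	defer res.Body.Close()

	var payload []map[string]any
	if err := json.NewDecoder(res.Body).Decode(&payload); err != nil {
		return err
	}

	// An empty page means there is nothing left to fetch.
	if len(payload) == 0 {
		stop()
		return nil
	}

	// saveBreweries is not defined in this snippet; supply your own persistence function.
	return saveBreweries(payload)
}

func main() {
	walker.NewApiWalker(http.DefaultClient, buildRequest, sink).Walk()
}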

To create an API walker, you only need to provide:

  • A RequestBuilder function that creates the HTTP request from the provided values
  • A sink function that processes the HTTP response

Check the examples for more use cases.
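
A request builder and sink pair like the one above can be tried out locally before pointing it at a real API. The sketch below uses only the standard library and does not call walker. Assumptions for illustration: the fake server, its /items path and zero-based page parameter, and the names newFakeAPI, requestBuilder, collectingSink and fetchAll are all hypothetical. fetchAll is a sequential stand-in for the walker, so this sink needs no locking; under a concurrent walker, the append would have to be guarded by a mutex.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// newFakeAPI starts a local server that serves `total` items, paginated with
// zero-based ?page= and ?per_page= query parameters.
func newFakeAPI(total int) *httptest.Server {
	handler := func(w http.ResponseWriter, r *http.Request) {
		page, _ := strconv.Atoi(r.URL.Query().Get("page"))
		perPage, _ := strconv.Atoi(r.URL.Query().Get("per_page"))
		items := []map[string]any{} // encodes as [] rather than null when empty
		for i := page * perPage; i >= 0 && i < (page+1)*perPage && i < total; i++ {
			items = append(items, map[string]any{"id": i})
		}
		json.NewEncoder(w).Encode(items)
	}
	return httptest.NewServer(http.HandlerFunc(handler))
}

// requestBuilder returns a function with the same shape as buildRequest above,
// bound to the given base URL.
func requestBuilder(baseURL string) func(start, fetchCount int) (*http.Request, error) {
	return func(start, fetchCount int) (*http.Request, error) {
		url := fmt.Sprintf("%s/items?page=%d&per_page=%d", baseURL, start, fetchCount)
		return http.NewRequest(http.MethodGet, url, http.NoBody)
	}
}

// collectingSink returns a sink that appends decoded items to *out and calls
// stop on the first empty page.
func collectingSink(out *[]map[string]any) func(*http.Response, func()) error {
	return func(res *http.Response, stop func()) error {
		defer res.Body.Close()
		var payload []map[string]any
		if err := json.NewDecoder(res.Body).Decode(&payload); err != nil {
			return err
		}
		if len(payload) == 0 {
			stop()
			return nil
		}
		*out = append(*out, payload...)
		return nil
	}
}

// fetchAll requests pages one by one until the sink calls stop or fails.
func fetchAll(client *http.Client, build func(int, int) (*http.Request, error), sink func(*http.Response, func()) error, perPage int) error {
	stopped := false
	stop := func() { stopped = true }
	for page := 0; !stopped; page++ {
		req, err := build(page, perPage)
		if err != nil {
			return err
		}
		res, err := client.Do(req)
		if err != nil {
			return err
		}
		if err := sink(res, stop); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	api := newFakeAPI(25)
	defer api.Close()

	var items []map[string]any
	if err := fetchAll(api.Client(), requestBuilder(api.URL), collectingSink(&items), 10); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(items)) // prints: 25
}
```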

Configuration

| Option | Description | Default | Available Values |
|---|---|---|---|
| WithPagination | Defines the pagination strategy | walker.OffsetPagination{} | walker.OffsetPagination{}, walker.CursorPagination{} |
| WithMaxBatchSize | Defines the batch size, that is, the document count passed to source on each call | 10 | int |
| WithParallelism | Defines the number of workers that run the provided source | runtime.NumCPU() | int |
| WithLimiter | Defines the total document count after which the walk stops | walker.InfiniteLimiter() | walker.InfiniteLimiter(), walker.ConstantLimiter(int) |
| WithRateLimit | Defines the rate limit as a count per duration | unlimited | (int, time.Duration) |
| WithContext | Defines the context | context.Background() | context.Context |
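
To make the "count per duration" idea behind WithRateLimit concrete, here is a minimal standard-library sketch of such a limiter. It is not walker's implementation and does not use walker's API: rateLimiter, newRateLimiter and timeCalls are hypothetical names, and spacing calls evenly with a ticker is just one possible design.

```go
package main

import (
	"fmt"
	"time"
)

// rateLimiter spaces calls to Wait so that roughly `count` of them pass per
// `per` duration. A nil ticker means unlimited.
type rateLimiter struct {
	ticker *time.Ticker
}

// newRateLimiter takes the same (int, time.Duration) pair as the table above.
// Non-positive arguments, or an interval that rounds to zero, mean unlimited.
func newRateLimiter(count int, per time.Duration) *rateLimiter {
	if count <= 0 || per <= 0 {
		return &rateLimiter{}
	}
	interval := per / time.Duration(count)
	if interval <= 0 {
		return &rateLimiter{}
	}
	return &rateLimiter{ticker: time.NewTicker(interval)}
}

// Wait blocks until the next slot is available.
func (r *rateLimiter) Wait() {
	if r.ticker != nil {
		<-r.ticker.C
	}
}

// Stop releases the underlying ticker.
func (r *rateLimiter) Stop() {
	if r.ticker != nil {
		r.ticker.Stop()
	}
}

// timeCalls reports how long `calls` rate-limited operations take.
func timeCalls(count int, per time.Duration, calls int) time.Duration {
	limiter := newRateLimiter(count, per)
	defer limiter.Stop()

	begin := time.Now()
	for i := 0; i < calls; i++ {
		limiter.Wait() // a real worker would call source here
	}
	return time.Since(begin)
}

func main() {
	// 5 operations at a rate of 5 per 100ms should take about 100ms.
	elapsed := timeCalls(5, 100*time.Millisecond, 5)
	fmt.Println(elapsed >= 80*time.Millisecond)
}
```

A single limiter like this can be shared by several workers, since receiving from the ticker channel is safe from multiple goroutines.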

Contribution

Contributions that make walker better and more feature-rich are welcome. Feel free to contribute your use case!
