
Kotlin Android WebScraper Snippets

Learn how to programmatically scrape resources from the web using these examples.

1. Use NiceHttp

NiceHttp is a small and simple Android OkHttp wrapper that eases scraping. Its author describes it as mostly for personal use.

NiceHttp Example Tutorial

Featuring:

  • Document scraping using jsoup
  • JSON parsing using Jackson
  • Easy functions akin to Python's requests library

Getting started

Step 1: Setup

In build.gradle repositories:

maven { url 'https://jitpack.io' }

In app/build.gradle dependencies:

implementation 'com.github.Blatzar:NiceHttp:+'
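
If the project uses the Gradle Kotlin DSL (build.gradle.kts) rather than Groovy, the same setup might look like the sketch below. The file layout is an assumption; the repository URL and the dependency coordinates are copied from the snippets above. In newer projects the repositories block may instead live in settings.gradle.kts.

```kotlin
// build.gradle.kts (project level), assuming repositories are declared here
repositories {
    google()
    mavenCentral()
    maven { url = uri("https://jitpack.io") }
}

// app/build.gradle.kts
dependencies {
    // "+" resolves to the latest published version;
    // pin a specific version if you need reproducible builds
    implementation("com.github.Blatzar:NiceHttp:+")
}
```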

Step 2: Scraping a document

val requests = Requests()
val doc = requests.get("https://github.com/Blatzar/NiceHttp").document
// Using CSS selectors to get the about text
println(doc.select("p.f4.my-3").text())
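
The document value above is queried with jsoup-style CSS selectors, so the selector logic can be tried offline before pointing it at a live page. The sketch below parses an inline HTML string with jsoup directly. The markup and the helper name aboutText are made up for illustration and are not taken from GitHub's real page.

```kotlin
import org.jsoup.Jsoup

// Hypothetical helper: returns the text of the first element matching the
// selector used above, or null when nothing matches.
fun aboutText(html: String): String? =
    Jsoup.parse(html).selectFirst("p.f4.my-3")?.text()

fun main() {
    // Made-up markup that mimics the structure the selector expects.
    val html = """
        <html><body>
          <p class="f4 my-3">A small OkHttp wrapper</p>
          <a href="/Blatzar/NiceHttp">repo</a>
        </body></html>
    """.trimIndent()

    println(aboutText(html))               // text of the matching paragraph
    println(aboutText("<p>no match</p>"))  // null, because the selector finds nothing
}
```

Returning null for a missing element makes it easier to notice when a site changes its markup and the selector silently stops matching.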

Step 3: Parsing JSON

data class GithubJson(
    val description: String,
    val html_url: String,
    val stargazers_count: Int,
    val private: Boolean
)

// Implement your own response parser with the library of your choice; this one uses Jackson.

val parser = object : ResponseParser {
    val mapper: ObjectMapper = jacksonObjectMapper().configure(
        DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES,
        false
    )

    override fun <T : Any> parse(text: String, kClass: KClass<T>): T {
        return mapper.readValue(text, kClass.java)
    }

    override fun <T : Any> parseSafe(text: String, kClass: KClass<T>): T? {
        return try {
            mapper.readValue(text, kClass.java)
        } catch (e: Exception) {
            null
        }
    }

    override fun writeValueAsString(obj: Any): String {
        return mapper.writeValueAsString(obj)
    }
}

val requests = Requests(responseParser = parser)

val json = requests.get("https://api.github.com/repos/blatzar/nicehttp").parsed<GithubJson>()
println(json.description)
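
To see what disabling FAIL_ON_UNKNOWN_PROPERTIES changes, the parser logic can be exercised without any network call. The sketch below feeds a hand-written JSON string (not a real GitHub API response) to the same Jackson configuration. The names RepoInfo and parseRepoOrNull, and the sample values, are invented for illustration.

```kotlin
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper

// Hypothetical data class with the same fields as GithubJson above.
data class RepoInfo(
    val description: String,
    val html_url: String,
    val stargazers_count: Int,
    val private: Boolean
)

// Same configuration as the parser above: unknown JSON fields are ignored.
val mapper: ObjectMapper = jacksonObjectMapper().configure(
    DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES,
    false
)

// Mirrors parseSafe: returns null instead of throwing on bad input.
fun parseRepoOrNull(text: String): RepoInfo? =
    try {
        mapper.readValue(text, RepoInfo::class.java)
    } catch (e: Exception) {
        null
    }

fun main() {
    // Invented sample; "forks" is not declared in RepoInfo and is ignored.
    val sample = """
        {"description": "demo", "html_url": "https://example.com",
         "stargazers_count": 3, "private": false, "forks": 1}
    """.trimIndent()

    println(parseRepoOrNull(sample)?.description)  // demo
    println(parseRepoOrNull("not json"))           // null
}
```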

Step 4: Using cache

According to the library's author, caching does not currently work properly, and the cause is unknown.

// Build an OkHttpClient with a disk cache and pass it to Requests
val cache = Cache(
    File(cacheDir, "http_cache"),
    50L * 1024L * 1024L // 50 MiB
)

val okHttpClient = OkHttpClient.Builder()
    .cache(cache)
    .build()

val cacheClient = Requests(okHttpClient)
cacheClient.get("...", cacheTime = 1, cacheUnit = TimeUnit.HOURS)
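
Since the wrapper's caching is reported as unreliable, it may help to check what plain OkHttp does with the same Cache object. The sketch below uses only OkHttp's own API (Cache, CacheControl, Response.cacheResponse) and does not go through NiceHttp. The temporary directory and the function names are assumptions for a JVM test run; on Android you would keep using cacheDir.

```kotlin
import okhttp3.Cache
import okhttp3.CacheControl
import okhttp3.OkHttpClient
import okhttp3.Request
import java.io.File
import java.nio.file.Files
import java.util.concurrent.TimeUnit

// Hypothetical helper: builds a client backed by a 50 MiB disk cache.
fun cachingClient(directory: File): OkHttpClient =
    OkHttpClient.Builder()
        .cache(Cache(directory, 50L * 1024L * 1024L))
        .build()

// Hypothetical helper: a GET request that accepts cached responses up to one hour old.
fun cachedGet(url: String): Request =
    Request.Builder()
        .url(url)
        .cacheControl(CacheControl.Builder().maxAge(1, TimeUnit.HOURS).build())
        .build()

// Blocking call: run it off the main thread on Android.
// Returns true when OkHttp consulted a cached entry for this response
// (either a direct hit or a revalidated one).
fun servedFromCache(client: OkHttpClient, url: String): Boolean =
    client.newCall(cachedGet(url)).execute().use { response ->
        response.body?.string() // the body must be read fully for the entry to be stored
        response.cacheResponse != null
    }

fun main() {
    val client = cachingClient(Files.createTempDirectory("http_cache").toFile())
    val url = "https://github.com/Blatzar/NiceHttp"
    println("first call used cache: ${servedFromCache(client, url)}")
    println("second call used cache: ${servedFromCache(client, url)}")
}
```

Whether the second call is actually served from the cache depends on the caching headers the server sends, so the output is not guaranteed either way.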

Example

Here is a simple example:

1. asyncMap.kt

Here is the full code for our asyncMap.kt file:

package com.lagradost.nicehttp.example

import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking

fun <A, B> List<A>.asyncMap(f: suspend (A) -> B): List<B> = runBlocking {
    map { async { f(it) } }.map { it.await() }
}
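
The asyncMap helper above relies on runBlocking, which blocks the calling thread until every element has been processed. Where a suspending caller is available, a non-blocking variant can be sketched with coroutineScope and awaitAll from kotlinx.coroutines. The name asyncMapSuspending is hypothetical and not part of the example project.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical suspending variant: starts one coroutine per element and
// suspends (rather than blocks) until all results are ready.
// Results keep the order of the input list.
suspend fun <A, B> List<A>.asyncMapSuspending(f: suspend (A) -> B): List<B> =
    coroutineScope {
        map { async { f(it) } }.awaitAll()
    }

fun main() = runBlocking {
    // delay() stands in for a network request.
    val squares = listOf(1, 2, 3).asyncMapSuspending { n ->
        delay(100L * (4 - n)) // later elements finish first
        n * n
    }
    println(squares) // [1, 4, 9]
}
```

Inside a coroutine such as lifecycleScope.launch, this variant can be called directly without wrapping it in runBlocking.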

2. MainActivity.kt

Here is the full code for our MainActivity.kt file:

package com.lagradost.nicehttp.example

import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import com.lagradost.nicehttp.Requests
import com.lagradost.nicehttp.ResponseParser
import kotlinx.coroutines.launch
import kotlin.reflect.KClass

data class GithubJson(
    @JsonProperty("description") val description: String,
    @JsonProperty("html_url") val html_url: String,
    @JsonProperty("stargazers_count") val stargazers_count: Int,
    @JsonProperty("private") val private: Boolean
)

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        lifecycleScope.launch {

            /**
             * Implement your own JSON parsing so that request.parsed<T>() can be used.
             */
            val parser = object : ResponseParser {
                val mapper: ObjectMapper = jacksonObjectMapper().configure(
                    DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES,
                    false
                )

                override fun <T : Any> parse(text: String, kClass: KClass<T>): T {
                    return mapper.readValue(text, kClass.java)
                }

                override fun <T : Any> parseSafe(text: String, kClass: KClass<T>): T? {
                    return try {
                        mapper.readValue(text, kClass.java)
                    } catch (e: Exception) {
                        null
                    }
                }

                override fun writeValueAsString(obj: Any): String {
                    return mapper.writeValueAsString(obj)
                }
            }

            val requests = Requests(responseParser = parser)

            // Example for query selector
            val doc = requests.get("https://github.com/Blatzar/NiceHttp").document
            println("Selector description: ${doc.select("p.f4.my-3").text()}")

            // Example for json Parser
            val json =
                requests.get("https://api.github.com/repos/blatzar/nicehttp").parsed<GithubJson>()
            println("JSON description: ${json.description}")

            // Example for Async-ed Requests
            (0..3).toList().asyncMap {
                println("Entered Async")
                println("Response ::: " + requests.get("https://github.com/").code)
                println("Exit Async")
            }
        }
    }
}

3. activity_main.xml

Here is the full code for our activity_main.xml file:

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <TextView
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Hello World!"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>


2. Use Android-Web-Scraper

Android Web Scraper is a simple library for Android web automation. It lets you perform web tasks in the background to fetch website data programmatically.


Android-Web-Scraper Example Tutorial


Step 1: Setup

In app/build.gradle dependencies:

implementation 'com.daandtu:android-web-scraper:1.0.1'

Add the internet permission to AndroidManifest.xml:

<uses-permission android:name="android.permission.INTERNET"/>

Step 2: Usage

Initialisation:

WebScraper webScraper = new WebScraper(this);
webScraper.setUserAgentToDesktop(true); //default: false
webScraper.setLoadImages(true); //default: false
webScraper.loadURL("https://www.github.com/");

If you want to see the browser automation in action:

layout.addView(webScraper.getView());

Interact with webpage elements:

Element el1 = webScraper.findElementByXpath("//*[@id=\"search\"]"); // inner quotes must be escaped
Element el2 = webScraper.findElementByName("img",3);
el1.setText("Android");
el2.click();
Element el3 = webScraper.findElementById("result");
String result = el3.getValue();
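
XPath expressions such as the one above contain double quotes, which must be escaped inside a regular string literal. When calling the library from Kotlin, a raw string avoids the escaping. The helper xpathById below is a hypothetical convenience and not part of the library.

```kotlin
// Hypothetical helper: builds an XPath that matches any element with the given id.
// Assumes the id itself contains no double quotes.
fun xpathById(id: String): String = """//*[@id="$id"]"""

fun main() {
    val escaped = "//*[@id=\"search\"]"  // escaped quotes in a regular string
    val raw = xpathById("search")         // raw string, no escaping needed

    println(raw)             // //*[@id="search"]
    println(escaped == raw)  // true
}
```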

Set up an OnPageLoadedListener:

webScraper.setOnPageLoadedListener(new WebScraper.onPageLoadedListener() {
    @Override
    public void loaded(String URL) {
        // TODO: handle the loaded page
    }
});

Other methods:

Bitmap screenshot = webScraper.takeScreenshot(); // Be careful with large webpages, or use the two-argument overload:
Bitmap screenshot2 = webScraper.takeScreenshot(500,MAX);

String title = webScraper.getWebsiteTitle();

String html = webScraper.getHtml();

webScraper.clearHistory();
webScraper.clearCache();
webScraper.clearCookies();
webScraper.clearAll(); //Clear history, cache and cookies

webScraper.reload();


Full Example

Here is a simple example:

1. WebScraperExample.kt

Here is the full code for our WebScraperExample.kt file:

package com.daandtu.webscrapertest

import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import android.widget.LinearLayout
import com.daandtu.webscaper.WebScraper

class WebScraperExample : AppCompatActivity() {

    private lateinit var webScraper: WebScraper

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_web_scraper_example)
        val parentLayout = findViewById<LinearLayout>(R.id.test_layout)

        // Assign the lateinit property instead of shadowing it with a local variable
        webScraper = WebScraper(this)
        // getView() is the accessor shown in the usage section for displaying the browser
        parentLayout.addView(webScraper.view)
    }
}

2. activity_web_scraper_example.xml

Here is the full code for our activity_web_scraper_example.xml file:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:id="@+id/test_layout"
    tools:context=".WebScraperExample"
    android:orientation="vertical">

</LinearLayout>
