Can't split a String containing a JavaScript array into separate links using .split

Issue

I’m trying to get an ArrayList of mp3 links from a JavaScript array found on this page:

Link

The array looks like this:

var audioPlaylist = new Playlist("1", [
{name:"The 4D Doodler", free:true, 
mp3:"http://www.archive.org/download/short_scifi_001_0711/
4ddoodler_waldeyer_edm_64kb.mp3"},
{name:"Bread Overhead", free:true, 
mp3:"http://www.archive.org/download/short_scifi_001_0711/
bread_overhead_leiber_ms_64kb.mp3"},
{name:"Image of the Gods", free:true, 
mp3:"http://www.archive.org/download/short_scifi_001_0711/
imageofthegods_nourse_jk_64kb.mp3"},

…and so on

I’m trying to break it into strings using .split. Here is my AsyncTask class:

public class FillBook extends AsyncTask<Void, Void, List<String>> {

//site URL to be passed into constructor
private String link;
private String imgLink;
private String title;
String description;
private List<String> tmpChapters = new ArrayList<>();
private List<SingleBook> books = new ArrayList<>();

public FillBook(String link, String imgLink, String title) {

    this.link = link;
    this.imgLink = imgLink;
    this.title = title;
}

@Override
protected List<String> doInBackground(Void... params) {

    //parsed doc will be stored in this field
    Document doc = null;

    //elements holding the book description
    Elements mLines;

    try {
        //connect to the site
        doc = Jsoup.connect(link).get();

    } catch (IOException | RuntimeException e) {
        e.printStackTrace();
    }
    if (doc != null) {

        // getting all elements with class name "book-description"
        mLines = doc.getElementsByClass("book-description");

        //reading the description text (the last matching element wins)
        for (Element mLine : mLines) {
            description = mLine.text();
        }

        String arr = "";
        String html = doc.body().html();
        if (html.contains("var audioPlaylist = new Playlist(\"1\", ["))
            arr = html.split("var audioPlaylist = new Playlist\\(\"1\", \\[")[1];
        if (arr.contains("]"))
            arr = arr.split("\\]")[0];
        //-----------------------------------------
        if (arr.contains("},{")) {
            for (String mLine2 : arr.split("\\},\\{")) {
                if (mLine2.contains("mp3:\""))
                    tmpChapters.add(mLine2.split("mp3:\"")[1].split("\"")[0]);
            }
        } else if (arr.contains("mp3:\""))
            tmpChapters.add(arr.split("mp3:\"")[1].split("\"")[0]);
    } else
        System.out.println("ERROR");


    return tmpChapters;

}

protected void onPostExecute(List<String> tmpChapters) {
    super.onPostExecute(tmpChapters);
    Toast.makeText(BookActivity.this, "size "+ tmpChapters.size(), Toast.LENGTH_SHORT).show();

    if (tmpChapters.size() > 0) {
        try {
            Picasso.get().load(imgLink).into(bookCover);
            nameAndAuthor.setText(title);
            bookDescription.setText(description);
            for (int i = 0; i < tmpChapters.size(); i++) {
                books.add(new SingleBook(tmpChapters.get(i)));
            }
            if (listChapters.getAdapter() != null) {
                adapter.clear();
                adapter.addAll(books);
            } else {
                adapter = new CustomAdaterChapters(BookActivity.this,
                        R.layout.book_chapters_listview_item, books);
                listChapters.setAdapter(adapter);

            }

        } catch (RuntimeException e) {
            e.printStackTrace();
        }

    } else Toast.makeText(BookActivity.this, "NETWORK ERROR", Toast.LENGTH_LONG).show();

}
}

I have a problem with the splitting part. In onPostExecute I added a Toast to check the size of the list, which should be about 43, but it shows only 1: just the first of the 43 links. The splitting code is not mine; a coder from another forum helped me with it, and it used to work, but it no longer does. I am a novice and can’t find the mistake. Everything looks fine to me, but it’s not working. Please help me correct the mistake.

P.S. I added two logs, and it turns out that the code before the separator line (//-----) is correct. At that point the array has been extracted as this:

{name:"Chapter 01", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_01_twain_64kb.mp3"},
{name:"Chapter 02", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_02_twain_64kb.mp3"},
{name:"Chapter 03", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_03_twain_64kb.mp3"},
{name:"Chapter 04", free:true, 
mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/
huckleberry_finn_04_twain_64kb.mp3"},

So the mistake must be somewhere after that point.

Solution

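The problem is that your ‘arr’ contains newlines, so consecutive entries are separated by "}," followed by a line break and "{", not by "},{". The arr.contains("},{") check therefore fails, the code falls through to the else branch, and only the first link is added.
Remove the newlines by adding this line before the check, and the split will work: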

        //-----------------------------------------
        arr = arr.replaceAll("\n", "");
        if (arr.contains("},{")) {
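To see the effect in isolation, here is a minimal standalone sketch of the same splitting logic. The class name SplitDemo, the helper extractLinks, and the example.com URLs are made up for illustration and are not part of the original code. As an extra assumption, it also strips "\r" in case the page uses Windows line endings.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {

    // Hypothetical helper that mirrors the splitting logic from the question.
    static List<String> extractLinks(String arr, boolean stripNewlines) {
        if (stripNewlines) {
            // Remove \r and \n so that "},<newline>{" becomes "},{"
            arr = arr.replaceAll("[\\r\\n]", "");
        }
        List<String> links = new ArrayList<>();
        // If the pattern never matches, split returns the whole string as one entry
        for (String entry : arr.split("\\},\\{")) {
            if (entry.contains("mp3:\"")) {
                links.add(entry.split("mp3:\"")[1].split("\"")[0]);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample data in the same shape as the playlist array
        String arr = "{name:\"Chapter 01\", free:true, mp3:\"http://example.com/01.mp3\"},\n"
                   + "{name:\"Chapter 02\", free:true, mp3:\"http://example.com/02.mp3\"},\n"
                   + "{name:\"Chapter 03\", free:true, mp3:\"http://example.com/03.mp3\"}";

        System.out.println(extractLinks(arr, false).size()); // prints 1
        System.out.println(extractLinks(arr, true).size());  // prints 3
    }
}
```

Without the cleanup, the whole array is treated as a single entry, which matches the "size 1" Toast described in the question.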

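Alternatively, have you considered using Gson for this? The test below pulls the array literal out of the page with a regex and lets Gson parse it into objects: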

@Test
public void testGson() throws IOException {

    Document doc = Jsoup.connect("http://www.loyalbooks.com/book/adventures-of-huckleberry-finn-by-mark-twain").get();

    String regex = "new Playlist.*?(\\[.*?\\])";
    String string = doc.html();

    Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);
    Matcher matcher = pattern.matcher(string);
    if (matcher.find() && matcher.groupCount() == 1) {
        String json = matcher.group(1);
        System.out.println(json);

        Gson gson = new Gson();
        PlaylistElement[] playlist = gson.fromJson(json, PlaylistElement[].class);
        System.out.println(playlist.length);

    } else {
        System.out.println("No match found");
    }

}


private static class PlaylistElement {
    private String name;
    private boolean free;
    private String mp3;
}
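If you would rather not add a dependency, a third option is to match the mp3 values directly with java.util.regex instead of splitting at all. This is only a sketch under assumptions: the class name Mp3LinkExtractor, the method extractMp3Links, and the sample URLs are hypothetical, and it assumes that any whitespace inside a quoted URL is a line-wrapping artifact that can safely be dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Mp3LinkExtractor {

    // Matches mp3:"..." and captures everything between the quotes.
    // A negated character class such as [^"] also matches line breaks.
    private static final Pattern MP3 = Pattern.compile("mp3:\\s*\"([^\"]*)\"");

    // Hypothetical helper: returns every mp3 value found in the given source text.
    static List<String> extractMp3Links(String playlistSource) {
        List<String> links = new ArrayList<>();
        Matcher matcher = MP3.matcher(playlistSource);
        while (matcher.find()) {
            // Assumption: whitespace inside the URL is a wrapping artifact, so drop it
            links.add(matcher.group(1).replaceAll("\\s+", ""));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample with line breaks between entries and inside the URLs
        String source = "{name:\"Chapter 01\", free:true, \nmp3:\"http://example.com/book/\nchapter_01.mp3\"},\n"
                      + "{name:\"Chapter 02\", free:true, \nmp3:\"http://example.com/book/\nchapter_02.mp3\"},";

        for (String link : extractMp3Links(source)) {
            System.out.println(link);
        }
    }
}
```

Because the pattern looks only at the mp3 fields, it does not depend on how the entries are separated, so line breaks or extra spaces between them no longer matter.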

Answered By – Luk

Answer Checked By – Terry (FlutterFixes Volunteer)
