
How to create a custom tube video grabber for KVS



KVS provides an API that uses the youtube-dl server library to scrape videos from other tube sites.

You can implement your own grabber class in PHP and upload it into KVS. Here is how this can be done. The example features a fully working custom youtube grabber (KVS also has a built-in grabber for youtube, by the way).

NOTE: using the youtube-dl API is not strictly required; it is also possible to create a completely custom grabber with your own code.

 

Implementing grabber class using youtube-dl API

Create CustomGrabberYoutube.php with the following code (also attached here as a text file):

<?php

// when you change classname, change it at the very bottom as well in this line:
// $grabber = new CustomGrabberYoutube();
class CustomGrabberYoutube extends KvsGrabberVideoYDL
{
   // ===============================================================================================================
   // infrastructure methods
   // ===============================================================================================================

   public function get_grabber_id()
   {
       //prefix your grabber ID with "custom_"
       return "custom_videos_youtube";
   }

   public function get_grabber_name()
   {
       // name displayed in admin panel
       return "youtube.com";
   }

   public function get_grabber_version()
   {
       // this is required for grabbers that are autoupdated from KVS
       return "1";
   }

   public function get_grabber_domain()
   {
       // domain name, KVS will check this to find out if this grabber is suitable for the given URL
       return "youtube.com";
   }

   public function get_supported_url_patterns()
   {
       // returns list of regexp patterns that describe video URLs, for youtube this pattern will match
       // https://www.youtube.com/watch?v=htOroIbxiFY
       return array("/https?:\/\/(www\.)?youtube\.com\/watch.*/i");
   }

   public function can_grab_description()
   {
       // return true if your grabber is going to provide description for each video
       return false;
   }

   public function can_grab_categories()
   {
       // return true if your grabber is going to provide categories for each video
       return false;
   }

   public function can_grab_tags()
   {
       // return true if your grabber is going to provide tags for each video
       return false;
   }

   public function can_grab_models()
   {
       // return true if your grabber is going to provide models for each video
       return false;
   }

   public function can_grab_content_source()
   {
       // return true if your grabber is going to provide content source for each video
       return false;
   }

   public function can_grab_date()
   {
       // return true if your grabber is going to provide date for each video
       return false;
   }

   public function can_grab_rating()
   {
       // return true if your grabber is going to provide rating for each video
       return false;
   }

   public function can_grab_views()
   {
       // return true if your grabber is going to provide views for each video
       return false;
   }


   public function can_grab_video_files()
   {
       // this should be true for youtube-dl
       return true;
   }

   public function get_supported_qualities()
   {
       // list of supported video qualities, should match what youtube-dl returns in its info under formats
       // run this command:
       // youtube-dl --dump-json "https://www.youtube.com/watch?v=PhDXRCLsqz4" > test.json
       // and open test.json in Firefox, find "formats" array and look into the available formats
       // youtube has too many formats, KVS only supports formats with "ext"="mp4"
       // you can list them here and you will be able to select from them in grabber settings
       return array('360p', '720p');
   }

   public function get_downloadable_video_format()
   {
       // for youtube-dl grabber KVS only supports mp4 formats
       return 'mp4';
   }

   public function can_grab_lists()
   {
       // return true if you want to allow this grabber to grab lists and thus be used on autopilot
       // if true, you will also need to implement grab_list() method - see below
       return false;
   }

   // ===============================================================================================================
   // parsing methods - modify if you need to parse lists or add additional info
   // ===============================================================================================================

   public function grab_list($list_url, $limit)
   {
       // this method is used to grab lists of videos from the given list URL
       // $limit parameter means the number of videos to grab (including pagination)
       // if $limit == 0, then you just need to find all videos on the given URL, no need to care about pagination

       $result = new KvsGrabberListResult();

       // $page_content here is the HTML code of the given page
       $page_content = $this->load_page($list_url);

       // parse $page_content and add all video URLs to the result
       // consider pagination if needed
       // you can use $this->load_page($list_url) method to get HTML from any URL
       $result->add_content_page("https://youtube.com/video1");
       $result->add_content_page("https://youtube.com/video2");
       $result->add_content_page("https://youtube.com/video3");

       return $result;
   }

   protected function grab_video_data_impl($page_url, $tmp_dir)
   {
       // by default the base class will populate these fields (if provided by youtube-dl):
       // - title
       // - MP4 video files for the qualities listed in get_supported_qualities() function
       // - description (should be enabled in can_grab_description() function)
       // - date (should be enabled in can_grab_date() function)
       // - tags (should be enabled in can_grab_tags() function)
       // - categories (should be enabled in can_grab_categories() function)

       $result = parent::grab_video_data_impl($page_url, $tmp_dir);
       if ($result->get_error_code() > 0)
       {
           return $result;
       }

       // do any custom grabbing here for additional fields, which are not supported by youtube-dl
       // $page_content here is the HTML code of the given video page
       //$page_content = $this->load_page($page_url);

       // parse HTML code and set additional data into $result, e.g. data which is not provided by youtube-dl
       //$result->set_rating(85);
       //$result->set_votes(10);
       //$result->set_views(123874);
       //$result->set_content_source("Content Source Name");
       //$result->add_model("Model 1");
       //$result->add_model("Model 2");

       return $result;
   }
}

$grabber = new CustomGrabberYoutube();
KvsGrabberFactory::register_grabber_class(get_class($grabber));
return $grabber;

The code has comments where needed. Basically, youtube-dl provides the main video info, such as title, description, tags, categories, date and files. If this is enough for you, you only need to modify the methods at the top, grouped under the infrastructure methods section. These methods integrate the grabber into KVS, so you should change them as described in their comments.

You should also change the grabber class name in 2 places (top and bottom) and make sure that it is unique and has Custom in its name (to avoid conflicts with any grabbers we add in the future).

If you want to implement list parsing or add additional info, you should modify the parsing methods as explained in the code.
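
As a rough illustration of the list-parsing step, the sketch below pulls watch-page links out of an HTML string with preg_match_all(). The helper name extract_video_urls(), the link pattern and the sample HTML are assumptions made for illustration and are not part of the KVS API; adjust the pattern to the markup of the site you are grabbing.

```php
<?php

// Hypothetical helper (not part of KVS): collects unique watch-page URLs from HTML.
// $limit == 0 means "no limit", mirroring the $limit convention described for grab_list().
function extract_video_urls(string $page_content, string $base_url, int $limit = 0): array
{
    $urls = array();

    // assumed link format: href="/watch?v=VIDEO_ID"
    if (preg_match_all('/href="(\/watch\?v=[A-Za-z0-9_-]+)"/i', $page_content, $matches))
    {
        foreach ($matches[1] as $path)
        {
            $url = rtrim($base_url, '/') . $path;
            if (!in_array($url, $urls, true))
            {
                $urls[] = $url;
            }
            if ($limit > 0 && count($urls) >= $limit)
            {
                break;
            }
        }
    }

    return $urls;
}

// made-up sample markup with one duplicated link
$sample_html = '<a href="/watch?v=aaa111">One</a> <a href="/watch?v=bbb222">Two</a> <a href="/watch?v=aaa111">One again</a>';

print_r(extract_video_urls($sample_html, 'https://www.youtube.com'));
```

Inside grab_list() you would call such a helper on the result of $this->load_page($list_url) and pass every returned URL to $result->add_content_page().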

 

Implementing grabber class without youtube-dl

Here is an example grabber class that does not use youtube-dl. Replace the demo values with your custom parsing logic:

<?php

// when you change classname, change it at the very bottom as well in this line:
// $grabber = new CustomGrabberYoutube();
class CustomGrabberYoutube extends KvsGrabberVideo
{
   // ===============================================================================================================
   // infrastructure methods
   // ===============================================================================================================

   public function get_grabber_id()
   {
       //prefix your grabber ID with "custom_"
       return "custom_videos_youtube";
   }

   public function get_grabber_name()
   {
       // name displayed in admin panel
       return "youtube.com";
   }

   public function get_grabber_version()
   {
       // this is required for grabbers that are autoupdated from KVS
       return "1";
   }

   public function get_grabber_domain()
   {
       // domain name, KVS will check this to find out if this grabber is suitable for the given URL
       return "youtube.com";
   }

   public function get_supported_url_patterns()
   {
       // returns list of regexp patterns that describe video URLs, for youtube this pattern will match
       // https://www.youtube.com/watch?v=htOroIbxiFY
       return array("/https?:\/\/(www\.)?youtube\.com\/watch.*/i");
   }

   public function can_grab_description()
   {
       // return true if your grabber is going to provide description for each video
       return true;
   }

   public function can_grab_categories()
   {
       // return true if your grabber is going to provide categories for each video
       return true;
   }

   public function can_grab_tags()
   {
       // return true if your grabber is going to provide tags for each video
       return true;
   }

   public function can_grab_models()
   {
       // return true if your grabber is going to provide models for each video
       return true;
   }

   public function can_grab_content_source()
   {
       // return true if your grabber is going to provide content source for each video
       return true;
   }

   public function can_grab_date()
   {
       // return true if your grabber is going to provide date for each video
       return true;
   }

   public function can_grab_rating()
   {
       // return true if your grabber is going to provide rating for each video
       return true;
   }

   public function can_grab_views()
   {
       // return true if your grabber is going to provide views for each video
       return true;
   }

   public function can_grab_video_files()
   {
       // return true if your grabber is going to provide video files for each video
       return true;
   }

   public function can_grab_video_embed()
   {
       // return true if your grabber is going to provide embed code for each video
       return true;
   }

   public function can_grab_video_duration()
   {
       // return true if your grabber is going to provide duration for each video
       return true;
   }

   public function can_grab_video_screenshot()
   {
       // return true if your grabber is going to provide screenshot for each video
       return true;
   }

   public function get_supported_qualities()
   {
       // list of supported video qualities that your grabber provides
       return array('360p', '720p');
   }

   public function get_downloadable_video_format()
   {
       // only grabbers that return MP4 files are supported
       return 'mp4';
   }

   public function can_grab_lists()
   {
       // return true if you want to allow this grabber to grab lists and thus be used on autopilot
       // if true, you will also need to implement grab_list() method - see below
       return false;
   }

   // ===============================================================================================================
   // parsing methods
   // ===============================================================================================================

   public function grab_list($list_url, $limit)
   {
       // this method is used to grab lists of videos from the given list URL
       // $limit parameter means the number of videos to grab (including pagination)
       // if $limit == 0, then you just need to find all videos on the given URL, no need to care about pagination

       $result = new KvsGrabberListResult();

       // $page_content here is the HTML code of the given page
       $page_content = $this->load_page($list_url);

       // parse $page_content and add all video URLs to the result
       // consider pagination if needed
       // you can use $this->load_page($list_url) method to get HTML from any URL
       $result->add_content_page("https://youtube.com/video1");
       $result->add_content_page("https://youtube.com/video2");
       $result->add_content_page("https://youtube.com/video3");

       return $result;
   }

   protected function grab_video_data_impl($page_url, $tmp_dir)
   {
       $result = new KvsGrabberVideoInfo();

       // $page_code here is the HTML code of the given video page
       $page_code = $this->load_page($page_url);
       if (!$page_code)
       {
           $result->log_error(KvsGrabberVideoInfo::ERROR_CODE_PAGE_UNAVAILABLE, "Page can't be loaded: $page_url");
           return $result;
       }

       // parse HTML code and set data into $result
       // replace with your parsing logic

       $result->set_canonical($page_url);
       $result->set_title("Demo title");
       $result->set_description("Demo description long description long description long description long description.");

       $result->set_screenshot("http://www.localhost.com/test/test.jpg");
       $result->set_duration(30);
       $result->set_date(time());

       $result->set_views(1526);
       $result->set_rating(87);
       $result->set_votes(11);

       $result->set_embed("<div>embed code</div>");

       $result->add_category("Category 1");
       $result->add_category("Category 2");
       $result->add_category("Category 3");

       $result->add_tag("Tag 1");
       $result->add_tag("Tag 2");
       $result->add_tag("Tag 3");

       $result->add_model("Model 1");
       $result->add_model("Model 2");
       $result->add_model("Model 3");

       $result->set_content_source("Content Source 1");

       $result->add_video_file("360p", "http://www.localhost.com/test/test_360p.mp4");
       $result->add_video_file("720p", "http://www.localhost.com/test/test_720p.mp4");

       $result->add_custom_field(1, "Custom1");
       $result->add_custom_field(3, "Custom3");

       return $result;
   }
}

$grabber = new CustomGrabberYoutube();
KvsGrabberFactory::register_grabber_class(get_class($grabber));
return $grabber;

 

Testing grabber class

Put the grabber class file into your project root folder, then create a test_grabber.php file in the same folder with the following code:

<?php

header('Content-Type: text/plain; charset=utf-8');
ini_set('display_errors', 1);
error_reporting(E_ERROR | E_PARSE | E_COMPILE_ERROR);

require_once('admin/plugins/grabbers/classes/KvsGrabber.php');

$grabber = require_once('CustomGrabberYoutube.php');
$grabber->init(new KvsGrabberSettings(), "");
if ($grabber instanceof KvsGrabberVideoYDL)
{
   $grabber->set_ydl_path('/usr/bin/yt-dlp'); // make sure this path is correct in your system
}
print_r($grabber->grab_video_data('https://www.youtube.com/watch?v=htOroIbxiFY', 'tmp'));

Modify this code to use your class name and specify your demo URL.

Then run it in a browser:

http://domain.com/test_grabber.php

If everything is fine, you should see the dumped info for the scraped video.

 

Installing grabber into KVS

Go to Plugins -> Grabbers in the admin panel and upload your grabber class file into the Custom grabber field. After saving the form, you will see your grabber installed and marked in red. You need to open this grabber's settings and select Content mode = Download. Also enable the needed fields under Data.

NOTE: If you don't see any fields under Data, then your grabber class doesn't return true from the can_grab_xxx() methods.

If you want to update the grabber class, simply upload it again. It is recommended to increment the version in the get_grabber_version() method so that you can be sure which version KVS is using.

 

Finding the list of supported video files to grab

If you don't know which formats the source site provides (usually a subset of: 240p, 360p, 480p, 720p, 1080p), you can check that with youtube-dl:

youtube-dl --dump-json "https://www.youtube.com/watch?v=PhDXRCLsqz4" > test.json

This should generate a test.json file, which can be opened in Firefox to show the JSON structure.

Find a node called formats; it should be a list with items describing each supported format.

KVS can only import formats with ext = mp4. You can list them in the get_supported_qualities() method using XXXp notation, e.g. 360p, 720p.
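
If you prefer not to read the JSON by eye, the same check can be sketched in PHP. The helper name list_mp4_qualities() and the trimmed sample JSON below are assumptions for illustration; the sketch relies only on the formats list and on the ext and height fields that youtube-dl writes for each format.

```php
<?php

// Hypothetical helper (not part of KVS): lists "XXXp" labels for mp4 formats in youtube-dl JSON.
function list_mp4_qualities(string $json): array
{
    $info = json_decode($json, true);
    if (!is_array($info) || !isset($info['formats']) || !is_array($info['formats']))
    {
        return array();
    }

    $heights = array();
    foreach ($info['formats'] as $format)
    {
        // keep only entries with ext = mp4 and a known video height
        if (($format['ext'] ?? '') === 'mp4' && !empty($format['height']))
        {
            $heights[(int) $format['height']] = true;
        }
    }

    // sort by height so the labels come out in ascending order
    ksort($heights);

    $qualities = array();
    foreach (array_keys($heights) as $height)
    {
        $qualities[] = $height . 'p';
    }

    return $qualities;
}

// trimmed, made-up sample; real youtube-dl output contains many more fields
$sample_json = '{"title":"Demo","formats":[{"ext":"webm","height":360},{"ext":"mp4","height":720},{"ext":"mp4","height":360},{"ext":"m4a","height":null},{"ext":"mp4","height":720}]}';

print_r(list_mp4_qualities($sample_json));
```

For a real dump you would pass file_get_contents('test.json') instead of the sample string, then copy the labels you need into get_supported_qualities().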

Here is a sample screenshot for youtube:

[Screenshot: youtube_dl_formats.png]

Attachment: CustomGrabberYoutube.txt


  • 4 months later...

Here is sample code for an album grabber:

<?php

class KvsGrabberAlbumCustomSample extends KvsGrabberAlbum
{
   public function get_grabber_id()
   {
       return "albums_custom_sample";
   }

   public function get_grabber_name()
   {
       return "Sample custom grabber";
   }

   public function get_grabber_version()
   {
       return "1";
   }

   public function get_grabber_domain()
   {
       return "domain1.com";
   }

   public function get_supported_url_patterns()
   {
       return array("/https?:\/\/(www\.)?domain1\.com\/.*/i");
   }

   public function can_grab_description()
   {
       return true;
   }

   public function can_grab_categories()
   {
       return true;
   }

   public function can_grab_tags()
   {
       return true;
   }

   public function can_grab_models()
   {
       return true;
   }

   public function can_grab_content_source()
   {
       return true;
   }

   public function can_grab_rating()
   {
       return true;
   }

   public function can_grab_views()
   {
       return true;
   }

   public function can_grab_date()
   {
       return true;
   }

   public function can_grab_lists()
   {
       return true;
   }

   public function grab_list($list_url, $limit)
   {
       $result = new KvsGrabberListResult();
       $result->add_content_page("http://domain1.com/album1/");
       $result->add_content_page("http://domain1.com/album2/");
       return $result;
   }

   protected function grab_album_data_impl($page_url, $tmp_dir)
   {
       $result = new KvsGrabberAlbumInfo();

       $page_code = $this->load_page($page_url);
       if (!$page_code)
       {
           $result->log_error(KvsGrabberAlbumInfo::ERROR_CODE_PAGE_UNAVAILABLE, "Page can't be loaded: $page_url");
           return $result;
       }

       $result->set_canonical($page_url);
       $result->set_title("Demo title");
       $result->set_description("Demo description long description long description long description long description.");
       $result->set_date(time());

       $result->set_views(1526);
       $result->set_rating(87); //0-100%
       $result->set_votes(11);

       $result->add_category("Category 1");
       $result->add_category("Category 2");
       $result->add_category("Category 3");

       $result->add_tag("tag 1");
       $result->add_tag("tag 2");
       $result->add_tag("tag 3");

       $result->add_model("Model 1");
       $result->add_model("Model 2");
       $result->add_model("Model 3");

       $result->set_content_source("Content Source 1");

       $result->add_image_file("http://www.domain1.com/test/test.jpg?v=1");
       $result->add_image_file("http://www.domain1.com/test/test.jpg?v=2");

       return $result;
   }
}

$grabber = new KvsGrabberAlbumCustomSample();
KvsGrabberFactory::register_grabber_class(get_class($grabber));
return $grabber;
 

  • 5 months later...

Fatal error: Call to a member function is_import_categories_as_tags() on a non-object in /home/admin/web/xxxxxxxx/public_html/admin/plugins/grabbers/classes/KvsGrabber.php on line 2842

 

We updated the test code in the original post to fix this issue. The new grabber API is coded differently.

 

Hello, how do I make it so that, given a URL, the grabber detects all the videos and all the albums on that URL?

 

Content URLs on the page should be detected automatically based on what you provide in this function:

 

public function get_supported_url_patterns()
{
    return array("/regexp here/i");
}
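
Before uploading a grabber, it can help to check a candidate pattern against a few sample URLs. The sketch below is a standalone check with preg_match(); the helper name url_matches_patterns() and the sample URLs are assumptions for illustration, and how KVS applies the patterns internally is not shown in this thread.

```php
<?php

// Hypothetical helper (not part of KVS): true if $url matches at least one pattern.
function url_matches_patterns(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern)
    {
        if (preg_match($pattern, $url) === 1)
        {
            return true;
        }
    }

    return false;
}

// same pattern as in the youtube example from the first post
$patterns = array("/https?:\/\/(www\.)?youtube\.com\/watch.*/i");

// made-up sample URLs: two video pages and one non-video page
$samples = array(
    'https://www.youtube.com/watch?v=htOroIbxiFY',
    'http://youtube.com/watch?v=htOroIbxiFY',
    'https://www.youtube.com/channel/demo',
);

foreach ($samples as $url)
{
    echo $url . ' => ' . (url_matches_patterns($url, $patterns) ? 'match' : 'no match') . "\n";
}
```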


  • 1 year later...

Hello

I added my class 'CustomGrabberRedporn' to KVS (KVS v5.5.0) and activated it.

I also uploaded test_grabber.php to my server and ran it. I get the following error:

<br />
<b>Fatal error</b>:  require_once(): Failed opening required 'CustomGrabberRedporn.php' (include_path='.:/usr/share/php') in <b>.......cc/test_grabber.php</b> on line <b>9</b><br />

Can you help me?


  • 1 year later...

Grabbers in KVS have 2 ways of parsing:

1) Grab an individual video URL (using the grab_video_data_impl($page_url, $tmp_dir) function in the grabber PHP code).

2) Grab a list URL, which produces a list of URLs that are passed into the function from #1 (using the grab_list($list_url, $limit) function in the grabber PHP code).

When you submit a URL to a grabber, KVS checks whether it is an individual URL or not. If the URL is considered individual, it is passed to the grab_video_data_impl() function to grab video details from it; otherwise the URL is considered a list URL and is passed to the grab_list() function to get a list of individual URLs from it. The detection is based on whether the provided URL matches one of the regexps returned from the get_supported_url_patterns() function.

So you should code accordingly. For example, you can define the individual video URL pattern so that it matches only URLs with some #hash at the end. Say the video page is:

https://www.kvs-demo.com/videos/69/300-spartans/

And the detected videos on this page are these:

https://www.kvs-demo.com/videos/69/300-spartans/#video1

https://www.kvs-demo.com/videos/69/300-spartans/#video2

Then you should code the get_supported_url_patterns() function so that it returns a pattern with #videoN at the end; in this case the first URL will be passed to the grab_list() function. This function should parse the page and return 2 sub-URLs with #video1 and #video2 at the end. Finally, these sub-URLs will be passed to the grab_video_data_impl() function, which does the actual parsing and returns video details. Based on the #hash passed in the URL, it can tell whether it should return title 1 or title 2.
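
The #hash approach can be sketched as follows. The pattern, the helper names get_video_index() and build_sub_urls(), and the URL layout are assumptions based on the kvs-demo.com example above, not KVS API; the same pattern string is what get_supported_url_patterns() would return in this scenario.

```php
<?php

// assumed pattern for individual video URLs: the page URL followed by #videoN
$video_pattern = '/^https?:\/\/(www\.)?kvs-demo\.com\/videos\/\d+\/[^\/]+\/#video(?P<index>\d+)$/i';

// Hypothetical helper (not part of KVS): returns N from "#videoN", or 0 if the URL
// does not match the individual video pattern (so it would be treated as a list URL).
function get_video_index(string $url, string $pattern): int
{
    if (preg_match($pattern, $url, $match) === 1)
    {
        return (int) $match['index'];
    }

    return 0;
}

// Hypothetical helper (not part of KVS): builds the sub-URLs that grab_list() would return.
function build_sub_urls(string $page_url, int $video_count): array
{
    $urls = array();
    for ($i = 1; $i <= $video_count; $i++)
    {
        $urls[] = $page_url . '#video' . $i;
    }

    return $urls;
}

$page_url = 'https://www.kvs-demo.com/videos/69/300-spartans/';

// the page URL itself has no #videoN suffix, so it does not match the pattern
var_dump(get_video_index($page_url, $video_pattern));

// each sub-URL matches, and the index says which video on the page to parse
foreach (build_sub_urls($page_url, 2) as $sub_url)
{
    echo $sub_url . ' => video ' . get_video_index($sub_url, $video_pattern) . "\n";
}
```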

 


  • 4 months later...

Is there a way to duplicate a grabber? For example, the pornhub one. I would like to have different settings for the autograbber (certain URLs to be imported more frequently). I thought it might be possible to create a pornhub.com_1 version, but I can't find the actual files that go with the pornhub grabber.


Not possible, because in KVS grabbers are tied to a domain. When you submit URLs, KVS uses the grabber domain name to find out which grabber should be used. If there are 2 grabbers for the same domain, there is no guarantee which grabber will be used. It may be that, due to filename sorting, KVS would always use the same grabber for the "pornhub.com" domain, but either way you can't control which grabber is used in which case; it will always use the same one.


On 5/12/2017 at 3:18 AM, Tech Support said:

Testing grabber class

Put grabber class file to your project root folder. Also create test_grabber.php file in the same folder with the following code:

Please could you clarify the project root folder within this example KVS install:

/var/www/fastuser/data/www/exampletube.com/admin/plugins/grabbers


10 hours ago, Jim said:

Does purchase of the extra open source code de-encrypt all the grabber files? 

KVS grabber files are not encrypted, but if you mean the base class admin/plugins/grabbers/classes/KvsGrabber.php, then yes, purchasing the open source code option will provide source code for all PHP files.


  • 3 weeks later...
  • 3 months later...
On 5/12/2017 at 2:48 PM, Tech Support said:
public function get_supported_url_patterns()
   {
       // returns list of regexp patterns that describe video URLs, for youtube this pattern will match
       // https://www.youtube.com/watch?v=htOroIbxiFY
       return array("/https?:\/\/(www\.)?youtube\.com\/watch.*/i");
   }

How do I create the URL pattern for the target website?
