Ted

Meet ted! Your new way of downloading tv shows from the web!

Add your favourite tv shows to ted and ted will automatically download torrents of new episodes!

FS#313 - Get a new torrent sniffer

Attached to Project: Ted
Opened by Roel (roel) - Monday, 08 February 2010, 21:25 GMT+2
Task Type: Bug Report
Category: Backend / Core → Parser
Status: Assigned
Assigned To: Joshua (josh)
Operating System: All
Severity: High
Priority: Urgent
Reported Version: Development
Due in Version: 0.98
Due Date: Undecided
Percent Complete: 0%
Votes: 0
Private: No

Details

To retrieve the seeder/leecher count of a torrent, we use torrentsniffer, a really old and dead Java torrent library.

It has some issues:

- It does not do DHT, so it does not count seeders that are not listed on trackers
- It only checks one tracker, although the torrent can have more

In practice, this means that ted mostly cannot determine the seeder/leecher count of a torrent and thus blocks it from downloading.

We should get a new torrent library (or implement our own?) that can do all the things mentioned above so ted can better judge the quality of torrents.

I found some open source libraries and, although development on them has stopped, they might be useful for us to try out. I contacted one of the developers of such a library to see if he is interested in helping us integrate his code into ted. He responded positively. See his project page: http://sourceforge.net/projects/yaircc/

Comment by Joshua (josh) - Tuesday, 09 February 2010, 05:55 GMT+2
OK, I had a look at torrentsniffer. It doesn't implement the "announce-list" element that many torrents now have; it only handles "announce". I will try to hack in some handling for "announce-list" and see if that improves things. If that is the only issue, it will save us the trouble of implementing a new sniffer.

Here is some light reading if you're interested:
http://wiki.theory.org/BitTorrentSpecification
http://bittorrent.org/beps/bep_0003.html
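
For reference, a minimal sketch of how the tracker tiers could be collected from a torrent that has "announce", "announce-list", or both. It assumes the torrent metadata has already been bdecoded into a Map with plain Java Strings and Lists; the class and method names are made up for illustration and are not torrentsniffer's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of torrentsniffer or Vuze.
final class TrackerTiers {
    private TrackerTiers() {}

    /**
     * Returns tracker URLs grouped by tier. "announce-list" (a list of
     * lists of URLs) wins when it is present and usable; otherwise the
     * single "announce" URL becomes the only tier.
     * Assumption: meta is the bdecoded top-level dictionary.
     */
    static List<List<String>> fromMetadata(Map<String, Object> meta) {
        List<List<String>> tiers = new ArrayList<>();
        Object list = meta.get("announce-list");
        if (list instanceof List<?>) {
            for (Object tier : (List<?>) list) {
                if (!(tier instanceof List<?>)) {
                    continue; // skip malformed tiers
                }
                List<String> urls = new ArrayList<>();
                for (Object url : (List<?>) tier) {
                    if (url instanceof String) {
                        urls.add((String) url);
                    }
                }
                if (!urls.isEmpty()) {
                    tiers.add(urls);
                }
            }
        }
        // Fall back to the classic single tracker.
        if (tiers.isEmpty() && meta.get("announce") instanceof String) {
            List<String> single = new ArrayList<>();
            single.add((String) meta.get("announce"));
            tiers.add(single);
        }
        return tiers;
    }
}
```

A torrent with only "announce" then simply yields one tier with one URL, so the rest of the code does not need to care which key was used.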
Comment by Roel (roel) - Tuesday, 09 February 2010, 22:19 GMT+2
Okay, Joost pointed that out to me a while ago. Please give it a try!
Comment by Joshua (josh) - Wednesday, 10 February 2010, 02:33 GMT+2
OK, I had a look at adding "announce-list" to torrentsniffer and it looks doable with a little work (the specs aren't exactly clear; in fact they are flat out wrong in the proposal):
http://bittorrent.org/beps/bep_0012.html

I have quickly modified and cut down a version of the torrent class from Vuze, which does implement "announce-list", so we can use that to implement multiple trackers. It's kind of complicated, but it would make saving/storing torrents easier than it currently is.


DHT, that's complicated. If I read the docs correctly (which I probably didn't), DHT is proposed to work in 2 ways. Only one of them would really work for us, the other not so much.
1) When a tracker returns a peer, it also returns that peer's DHT port, which allows getting more peers from DHT. This wouldn't really work for us, because we would need to implement tracker communication within ted and it would be slow (we have to wait for peers from the trackers). Vuze uses this method because it's a torrent client: it uses trackers anyway and keeps a DHT table based on those peers.
2) The torrent file contains a small list of DHT nodes and we can quickly work out peers using them. But a lot of torrents don't include this, and Vuze doesn't use this method at the moment, at least as far as I can tell.
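
A rough sketch of what reading those DHT nodes from the torrent file could look like. As far as I understand BEP 5, a trackerless torrent carries a top-level "nodes" key holding a list of [host, port] pairs. The sketch assumes the metadata is already bdecoded into a Map, with integers as some Number type; the class name is hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class DhtBootstrapNodes {
    private DhtBootstrapNodes() {}

    /**
     * Extracts the [host, port] pairs stored under "nodes".
     * Malformed entries are skipped. Addresses are left unresolved
     * so that no DNS lookup happens while parsing.
     */
    static List<InetSocketAddress> fromMetadata(Map<String, Object> meta) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        Object raw = meta.get("nodes");
        if (!(raw instanceof List<?>)) {
            return nodes; // torrent has no bootstrap nodes
        }
        for (Object entry : (List<?>) raw) {
            if (!(entry instanceof List<?>)) {
                continue;
            }
            List<?> pair = (List<?>) entry;
            if (pair.size() != 2
                    || !(pair.get(0) instanceof String)
                    || !(pair.get(1) instanceof Number)) {
                continue;
            }
            int port = ((Number) pair.get(1)).intValue();
            if (port < 1 || port > 65535) {
                continue; // not a usable port
            }
            nodes.add(InetSocketAddress.createUnresolved((String) pair.get(0), port));
        }
        return nodes;
    }
}
```

An empty result would mean the torrent gives us nothing to bootstrap from, so option 2 does not apply to it.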



Comment by Roel (roel) - Wednesday, 10 February 2010, 10:28 GMT+2
Do you have a clue how many torrents use DHT? Option 1 would be too slow for ted, as you already indicated. We want to search quickly through the torrents.

I would be interested to see the modified torrent class, as soon as you have it working somewhat :)

Another thing we should do is persuade torrent sites to include the seeder/leecher count in their RSS feeds, as some of them already do. That eliminates the need for ted to communicate with trackers/peers altogether.
Comment by Joshua (josh) - Thursday, 11 February 2010, 03:01 GMT+2
I don't know how many torrents are using DHT, and even less how many use ONLY DHT/PEX. I just randomly check a bunch of torrents to get a rough idea.

As for persuading torrent sites to include the seeders/leechers in RSS, it comes back to this:
https://ted.nu/forum/viewtopic.php?t=2086&sid=44359ed7abd910f206826f23a140a2ae

It doesn't seem to have moved since last year though...

I have attached a patch file with the changes I have made to torrentsniffer to support multiple trackers.

I will upload the Vuze torrent class when I actually get it to work properly; it is a little harder to get it to scrape than I originally thought.
Comment by Jofo (JoFo) - Wednesday, 17 February 2010, 16:45 GMT+2
Josh, if you think this multiple tracker fix of yours works, please just check it into SVN yourself, since you also have write access. That is the best way to test it (I always use the latest nightly build). If this works I would like to know ASAP, because I definitely want to merge this fix into 0.92: it will improve the performance of ted a lot!
Comment by Joshua (josh) - Monday, 22 February 2010, 03:30 GMT+2
Added to SVN now. I'm still not 100% sure about the scraping though: some docs say to keep trying to scrape all trackers until you get one response, using tiers for precedence; others say to try to scrape 1 tracker in each tier. I have set it up to keep scraping until we get one response for now.
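
To make the "keep scraping until we get one response" behaviour concrete, here is a small sketch, not the code that went into SVN. It walks the tiers in order and stops at the first tracker that answers. The Counts class and the scrapeOne function are placeholders I made up; the real scrape call would do the network work.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Illustrative only; names are hypothetical.
final class FirstResponseScraper {
    private FirstResponseScraper() {}

    /** Seeder/leecher counts reported by one tracker. */
    static final class Counts {
        final int seeders;
        final int leechers;

        Counts(int seeders, int leechers) {
            this.seeders = seeders;
            this.leechers = leechers;
        }
    }

    /**
     * Tries every tracker, tier by tier, and returns the first
     * successful scrape. A tracker that throws or returns an empty
     * result counts as "no response" and the next one is tried.
     */
    static Optional<Counts> scrape(List<List<String>> tiers,
                                   Function<String, Optional<Counts>> scrapeOne) {
        for (List<String> tier : tiers) {
            for (String url : tier) {
                Optional<Counts> result;
                try {
                    result = scrapeOne.apply(url);
                } catch (RuntimeException e) {
                    continue; // tracker down or bad reply, try the next one
                }
                if (result != null && result.isPresent()) {
                    return result;
                }
            }
        }
        return Optional.empty(); // nobody answered
    }
}
```

Switching to the other reading of the docs (one tracker per tier) would only change the inner loop, so it should be cheap to try both.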

Also added the functionality for bootstrapping nodes, for use with DHT.

I think we can add a form of DHT scraping. It won't be proper DHT, just a quick check to see how many seeders we can reach in X seconds, but this would require a change to the way we parse torrents.

We also need to look into adding scraping for UDP trackers; there seem to be more and more trackers out there that implement this.
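
For the UDP trackers, a sketch of the two request packets involved, based on my reading of BEP 15 (UDP Tracker Protocol): a 16-byte connect request, then a scrape request that carries the connection id the tracker sent back. Only the packet layout is shown. Sending, timeouts and retries are left out, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Packet layout sketch only; no sockets involved.
final class UdpTrackerPackets {
    private UdpTrackerPackets() {}

    /** Magic constant that identifies the protocol in a connect request. */
    static final long PROTOCOL_ID = 0x41727101980L;
    static final int ACTION_CONNECT = 0;
    static final int ACTION_SCRAPE = 2;

    /** 64-bit protocol id, 32-bit action, 32-bit transaction id (big-endian). */
    static byte[] connectRequest(int transactionId) {
        return ByteBuffer.allocate(16)
                .putLong(PROTOCOL_ID)
                .putInt(ACTION_CONNECT)
                .putInt(transactionId)
                .array();
    }

    /** 64-bit connection id, 32-bit action, 32-bit transaction id, 20-byte info hash. */
    static byte[] scrapeRequest(long connectionId, int transactionId, byte[] infoHash) {
        if (infoHash.length != 20) {
            throw new IllegalArgumentException("info hash must be 20 bytes");
        }
        return ByteBuffer.allocate(16 + 20)
                .putLong(connectionId)
                .putInt(ACTION_SCRAPE)
                .putInt(transactionId)
                .put(infoHash)
                .array();
    }
}
```

ByteBuffer is big-endian by default, which matches the network byte order the protocol uses, so no explicit order() call is needed.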
Comment by Roel (roel) - Tuesday, 23 February 2010, 21:01 GMT+2
I tested it briefly and it seems to work; I no longer see the tracker error messages that I used to see (Bencoding Exception). However, for one Top Gear torrent I get this exception:
Error - Feb 23, 2010 @ 8:58 PM: java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.vertor.com/index.php?mod=download&id=1780585
Exception

But it seems that this torrent is blocked on the server, so it is a valid exception.

Josh: do you have any ideas how we could test your implementation?


Comment by Roel (roel) - Tuesday, 23 February 2010, 21:02 GMT+2
Also: if this helps fix some of the problems ted has with finding torrents, I would like to merge your solution into 0.92 once it is finished.
Comment by Joshua (josh) - Tuesday, 23 February 2010, 22:30 GMT+2
I will look into writing some JUnit tests, but I'm not sure about the correct way to test network-based things yet.

The only thing that needs to be tested is scraping, as I just extended the base torrent class.
So I guess as long as we get 1 good scrape, it's working properly.
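
One part of scraping that can be tested without any network is how the scrape URL is derived from the announce URL. A sketch of the usual convention as described in the BitTorrent specification wiki: if the text after the last '/' starts with "announce", replace that word with "scrape"; otherwise the tracker is treated as not supporting scrape. The class name is hypothetical and this is not the torrentsniffer code.

```java
import java.util.Optional;

// Pure string logic, so a unit test needs no tracker.
final class ScrapeUrls {
    private ScrapeUrls() {}

    private static final String ANNOUNCE = "announce";

    /**
     * Returns the scrape URL for an announce URL, or empty when the
     * convention does not apply to this URL.
     */
    static Optional<String> fromAnnounce(String announceUrl) {
        int slash = announceUrl.lastIndexOf('/');
        if (slash < 0 || !announceUrl.startsWith(ANNOUNCE, slash + 1)) {
            return Optional.empty();
        }
        String head = announceUrl.substring(0, slash + 1);
        String tail = announceUrl.substring(slash + 1 + ANNOUNCE.length());
        return Optional.of(head + "scrape" + tail);
    }
}
```

The actual network scrape could then be hidden behind an interface and replaced by a fake in the JUnit tests, which keeps the tests independent of whether a tracker happens to be up.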
