Hi, I am not sure this is the right place to ask, but it seems the most appropriate one.
I am running a classic mass download job with the trio and asks libraries. As expected, I launch trio.run from the main thread. In the main function I create a nursery and call its .start_soon method for every URL, and a second function performs the actual download.
Now I want to use tqdm to monitor the progress and I am using this trio instrument:
class TrioProgress(trio.abc.Instrument):
    def __init__(self, total, notebook_mode=False, **kwargs):
        if notebook_mode:
            from tqdm.notebook import tqdm
        else:
            from tqdm import tqdm
        self.tqdm = tqdm(total=total, desc="Downloaded: [ 0 ] / Links ", **kwargs)

    def task_exited(self, task):
        if task.custom_sleep_data == 0:
            self.tqdm.update(7)
        if task.custom_sleep_data == 1:
            self.tqdm.update(7)
            self.tqdm.desc = self.tqdm.desc.split(":")[0] + ": [ " + str( int(self.tqdm.desc.split(":")[1].split(" ")[2]) + 1 ) + " ] / Links "
            self.tqdm.refresh()
Let's ignore the details and focus on the main job of the progress bar, i.e. to tick once for every processed URL. I thought the second function was the place to add the lines that set task.custom_sleep_data:
async def request_image(datas, start_sampleid):
    tmp_data = []
    import asks
    asks.init("trio")
    session = asks.Session(connections=64)
    session.headers = {
        "User-Agent": "Googlebot-Image",
        "Accept-Language": "en-US",
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://www.google.com/",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    }

    async def _request(data, sample_id):
        url, alt_text, license = data
        # Added for the progress instrument: tag the current task.
        task = trio.lowlevel.current_task()
        task.custom_sleep_data = None
        try:
            proces = process_img_content(
                await session.get(url, timeout=5, connection_timeout=40), alt_text, license, sample_id
            )
            if proces is not None:
                tmp_data.append(proces)
                # Added for the progress instrument: mark a successful download.
                task.custom_sleep_data = 1
        except Exception:
            return
However, when I count the ticks, they do not match the size of my URL list. So the progress bar is not answering the basic question: "how long until it finishes?"
Experimenting with one tick at every exit from the second function, which seemed the intuitive way, I noticed there are about 2.5 to 3 times more ticks than expected. Depending on the actual URL list, this factor can go up to 7, as in the example above.
I would like to understand what is happening and, if possible, find a way to properly count finished download tasks (successful or unsuccessful). I was able to count the successful ones correctly by confirming the actual download, but all the others remain unaccounted for.