Frequently Asked Questions
What is the goal of Volpex?
The state of the art in parallel computing on volunteer nodes is generally limited to applications with “master-slave” or “bag-of-tasks” parallelism, in which each task does its own portion of the job without communicating with other tasks. The current bag-of-tasks model on BOINC works well. However, many scientific applications are not bag-of-tasks, because they naturally require communication between tasks; they therefore cannot run on BOINC without significant modifications. Our project aims to remove that limitation, enabling BOINC to work with a larger class of scientific applications so that it can support many more interesting projects that will benefit the world in the future.

How do volunteer computers work together in a parallel job? Is there any security threat when they communicate with each other?
A parallel job requires a fixed number of processes to work together. When a parallel job is created, our BOINC server begins recruiting volunteer hosts. Once a sufficient number of hosts has been recruited, all recruited hosts begin their computation simultaneously. The hosts communicate with each other by reading and writing data objects on our central server; thus, there is no particular security threat.

Why does my BOINC client often say that the Volpex project does not have any tasks available? Do you have any need for my machine?
This generally means only that your client made its request at a time when hosts were not being recruited. Our server stops recruiting hosts once enough hosts have been recruited for a job's execution. The execution can take from minutes to hours. A new recruitment period begins when a new job is ready.

Why is the task's status shown as "Waiting to run (Scheduler wait)" on my BOINC client?
A parallel job starts simultaneously on a group of hosts, once enough hosts have been recruited.
During this recruitment period, the recruited hosts do not execute the Volpex task; instead, they temporarily exit, relinquishing the CPU to other tasks. This is the scenario in which the above message is displayed in the BOINC client's GUI. Some hosts are designated as spare nodes for the computation; for these, the wait can last longer, as they wait to replace a failed node.

Why does my BOINC client say "Task finishes but output files absent"? Is this a failure?
As stated above, during the recruitment period the recruited hosts may temporarily exit, relinquishing the CPU to other tasks. The BOINC client incorrectly treats this as a possible error and displays this message, but the message does not indicate a real problem. This issue has been reported to the BOINC developers, and we expect that a fix will be incorporated in a future version of the BOINC client.

Why does my task end with an error status? Is something wrong?
An error status does not necessarily mean that something is wrong in the Volpex project. A task can be intentionally terminated by the Volpex server before it reaches the end, because the parallel job has already finished or your task is no longer needed for the overall parallel computation. However, if you are in doubt, please do not hesitate to tell us on the forum.

Why does the CPU seem to be idle when my task runs? Shouldn’t the CPU be relinquished to other projects' tasks?
In a parallel job, a process may have to wait for data from another process before it can continue its computation. During this wait, a task can stay idle. However, if the process were swapped out for other tasks, it would not be able to respond to communication from other processes in a timely manner, which would negatively affect the execution of the overall parallel job. This behavior is inherent in any parallel computing system.

Why does my task fail right away, with stderr saying "Unable to find result name...."?
Our project requires the BOINC client to be able to report its task's name for identification purposes. BOINC clients older than version 6.12 do not have this ability. Therefore, if you are using a BOINC client older than version 6.12, please upgrade.

Why does my task fail right away, with stderr saying "Child client failed to connect"?
The volunteer hosts communicate with each other by reading and writing data objects on our central server. Therefore, our application requires TCP connections to our central data server at IP address 126.96.36.199, on ports 9999 and 9316, for reading and writing data objects. If you see the "Child client failed to connect" error message, most likely your proxy/firewall does not allow these TCP connections to be made. Please have your proxy/firewall reconfigured.

Why is the task's deadline so short compared to the task's estimated computation time?
The deadline is set that way so that if a computer cannot start working on the task right away (because the computer is busy or other urgent workunits are running), the workunit is cancelled. This basically means that a Volpex task needs to run at high priority.

Why do Volpex tasks need to run at high priority?
Running a coordinated parallel job requires processes to be reachable at all times to respond to communication requests from other processes. If a single process becomes unavailable, a computation involving hundreds of processes may come to a halt, as all other processes wait for a communication response from the missing process. Therefore, Volpex tasks need to run at high priority so that they are available without interruption during job execution.

How is credit granted in Volpex?
In our project, credit is not granted when the task succeeds; instead, it is granted periodically, in proportion to the amount of time your computer spends on the task. The task does not have to finish with a success status in order to earn credit.
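As a rough illustration of time-proportional credit, the Python sketch below adds up credit once per reporting period, regardless of how the task eventually ends. The rate `CREDIT_PER_HOUR` and the function name `credit_for_periods` are hypothetical and chosen only for this example; the actual rate, period length, and mechanism used by the Volpex server are not stated in this FAQ.

```python
# Hypothetical rate, chosen only for illustration; the real rate is not given in this FAQ.
CREDIT_PER_HOUR = 10.0


def credit_for_periods(period_seconds, rate_per_hour=CREDIT_PER_HOUR):
    """Sum the credit for each reporting period a host spent on a task.

    Credit for a period is proportional to its length in time. The task's
    final status (success or error) plays no role in the calculation.
    """
    return sum(rate_per_hour * seconds / 3600.0 for seconds in period_seconds)


# A task that ran for three 30-minute periods and then ended with an error
# status still earns credit for the 1.5 hours it spent on the computation.
earned = credit_for_periods([1800, 1800, 1800])
print(earned)  # prints 15.0
```

Under these assumed numbers, a host earns the same credit for 1.5 hours of work whether or not its task finishes with a success status.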

I found a bug / I am having a problem running Volpex. Where should I report it?
Please create a topic in this forum. The administrator team and other volunteers are always ready to help!
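Before posting about a "Child client failed to connect" error, it may help to check whether your machine can open the TCP connections that the application needs. The following is a minimal Python sketch and not part of the Volpex software: the function names `can_connect` and `check_ports` are hypothetical, and the host and port values are simply the ones quoted in this FAQ. It only tests whether a TCP connection can be opened; it does not exchange any Volpex data.

```python
import socket

# Values quoted in this FAQ; adjust them if the project announces a change.
VOLPEX_HOST = "126.96.36.199"
VOLPEX_PORTS = (9999, 9316)


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_ports(host, ports, timeout=5.0):
    """Map each port to True or False depending on whether it is reachable."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `check_ports(VOLPEX_HOST, VOLPEX_PORTS)` returns a dictionary with one entry per port. A `False` entry suggests that a proxy or firewall between your machine and the server may be blocking that port, and including the result in your forum post may help others diagnose the problem.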