Channel: Cadence Community
Viewing all articles
Browse latest Browse all 3331

ocnxlRun never returns as some splits are hung forever


I am using an OCEAN script to launch ADE XL simulations on Netbatch (a pool of machines). After setting up the testbench, I call ocnxlRun to launch the simulation, and several hundred splits are launched in parallel. Because of machine or other issues, one of the splits sometimes never returns, so the entire job waits forever and we never get the results back. As a workaround, I typically ssh to the machine running the hung split and kill the virtuoso process; the split then restarts from scratch and the job finishes. We lose significant time on this, and lately we are seeing more than one hung split.

Do you have any suggestions on how to deal with this situation? Would setting runtimeout to, say, 2 hours direct the master process to kill any split that has been running for 2 hours and restart it? Any other recommendations?

This is the job setup I am using: 

ocnxlJobSetup( '(
        "ADEXL_NB_POOL" "pool_name"
        "ADEXL_NB_QSLOT" "qslot_name"
        "blockemail" "1"
        "configuretimeout" "-1"
        "distributionmethod" "NB interface (free)"
        "lingertimeout" "30"
        "maxjobs" "240"
        "name" "Netbatch"
        "preemptivestart" "1"
        "reconfigureimmediately" "1"
        "runtimeout" "-1"
        "showerrorwhenretrying" "0"
        "showoutputlogerror" "1"
        "startmaxjobsimmed" "1"
        "starttimeout" "432000"
) )
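
For reference, the variant I have in mind changes only runtimeout. This is an untested sketch: I am assuming the value is given in seconds (so 2 hours = 7200), and I do not know whether exceeding it makes the master process kill and restart the split, which is exactly what I am asking about.

ocnxlJobSetup( '(
        "ADEXL_NB_POOL" "pool_name"
        "ADEXL_NB_QSLOT" "qslot_name"
        "blockemail" "1"
        "configuretimeout" "-1"
        "distributionmethod" "NB interface (free)"
        "lingertimeout" "30"
        "maxjobs" "240"
        "name" "Netbatch"
        "preemptivestart" "1"
        "reconfigureimmediately" "1"
        ; assumption: value is in seconds, 7200 = 2 hours (was "-1")
        "runtimeout" "7200"
        "showerrorwhenretrying" "0"
        "showoutputlogerror" "1"
        "startmaxjobsimmed" "1"
        "starttimeout" "432000"
) )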

 

