Blender Git Log

Blender Git "temp-cycles-opencl-staging" branch commits.

June 8, 2017, 09:35 (GMT)
Cycles: Adjust split kernel tile updating logic to make rendering a bit faster

This makes tiles update less frequently, which puts more samples in
each batch and makes rendering faster. This helps a bit with the
slowdown seen from D2703.

I don't really like tiles not updating as much; it feels much less
responsive. Maybe there's another way to go about it?

Timings by nirved: https://hastebin.com/ifanihewum.css
June 8, 2017, 09:35 (GMT)
Cycles: Pass all buffers to each kernel call for OpenCL

Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.

Unfortunately there is a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse; I've
seen up to 50% slowdown. For the latter case I think adjusting
the tile updating logic can help, but I'm not sure what that would
look like yet (it would be a change of just a few lines, however).
June 8, 2017, 09:19 (GMT)
Cycles: Faster split branched path tracing by sharing samples with inactive threads

Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means there are fewer samples
for the GPU to work on in parallel and rendering is slower. As there is less
work overall, there are also more inactive threads during rendering with BPT.
This patch makes use of those inactive threads to render branched samples in
parallel with other samples.

Each thread that is preparing for a branched sample will attempt to find an
inactive thread, and if one is found, the state for the sample is copied to
that thread. Potentially, if there are enough inactive threads, hundreds of
branched samples could be generated from the same originating thread and run
in parallel, giving large speedups.

Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
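
The sharing step described in this commit can be illustrated with a minimal host-side C++ sketch. It is not the actual Cycles kernel code: the names PathState, InactiveQueue, dequeue_inactive and share_branched_sample are hypothetical, and std::atomic stands in for the OpenCL atomics a real kernel would use. It only shows the idea of claiming an inactive thread slot atomically and copying the sample state into it.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical per-thread path state; the real kernel state is much larger.
struct PathState {
    int sample;        // which branched sample this thread is working on
    float throughput;  // stand-in for the rest of the copied state
    bool active;       // whether the thread currently has work
};

// Hypothetical queue of thread indices that have no work left.
struct InactiveQueue {
    std::vector<int> slots;     // indices of inactive threads
    std::atomic<int> count{0};  // number of entries not yet claimed
};

// Claim one inactive thread index, or return -1 if none are left.
// The compare-exchange loop ensures two callers never claim the same entry.
inline int dequeue_inactive(InactiveQueue &q)
{
    int n = q.count.load();
    while (n > 0) {
        // On failure, n is reloaded with the current count and we retry.
        if (q.count.compare_exchange_weak(n, n - 1)) {
            return q.slots[n - 1];
        }
    }
    return -1;
}

// Try to hand a branched sample from thread `src` to an inactive thread.
// Returns the index of the thread that took the work, or -1 if none was free
// (in which case the caller would have to process the sample itself).
inline int share_branched_sample(std::vector<PathState> &states,
                                 InactiveQueue &q, int src, int branch_sample)
{
    assert(src >= 0 && src < static_cast<int>(states.size()));
    int dst = dequeue_inactive(q);
    if (dst < 0) {
        return -1;
    }
    states[dst] = states[src];          // copy the originating thread's state
    states[dst].sample = branch_sample; // assign the branched sample
    states[dst].active = true;          // the thread now has work
    return dst;
}
```

In this sketch the originating thread can call share_branched_sample repeatedly, once per branched sample, until the queue runs dry.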
June 8, 2017, 09:19 (GMT)
Cycles: Add function to accumulate samples with atomics for split kernel

Samples run in parallel need a safe way to accumulate their results
with the results of other threads.
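
One common way to get this kind of safe accumulation is an atomic float add built from a compare-and-swap loop. The sketch below shows that general technique in host-side C++ with std::atomic. The function name atomic_add_float is hypothetical, and this is not the function added by the commit.

```cpp
#include <atomic>

// Hypothetical helper: atomically add `value` to `*target`.
// Floats have no native fetch_add before C++20, so retry with compare-exchange.
inline void atomic_add_float(std::atomic<float> *target, float value)
{
    float expected = target->load(std::memory_order_relaxed);
    // On failure, `expected` is refreshed with the current value, so the
    // next attempt adds `value` to whatever another thread just wrote.
    while (!target->compare_exchange_weak(expected, expected + value,
                                          std::memory_order_relaxed)) {
    }
}
```

Each thread would call the helper once per buffer component it contributes to, so concurrent contributions to the same pixel are not lost.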
June 8, 2017, 09:19 (GMT)
Cycles: Add function to dequeue a ray
June 8, 2017, 09:19 (GMT)
Cycles: Add atomic decrement functions to util_atomic.h
June 8, 2017, 09:19 (GMT)
Cycles: Add kernel to enqueue inactive rays

The queue will be used to reuse inactive threads and keep the GPU
busier.
June 8, 2017, 09:19 (GMT)
Cycles: Blacklist unsupported OpenCL devices

Due to various driver issues with AMD GCN 1 cards, we can no longer support
these GPUs. This patch makes them unavailable for selection for Cycles rendering.

GCN 2 and higher cards are still supported. Please use the most recent
drivers available to ensure proper functionality.

See here for a list to check which GPUs are supported:
https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
By: Miika Hämäläinen. Last updated: 07.11.2014 14:18. MiikaH:n Sivut a.k.a. MiikaHweb | 2003-2021