Journal Articles
Apan Qasem and Joshua Magee.
Improving TLB performance on current chip multiprocessor
architectures through demand-driven superpaging.
Software Practice and Experience (SPE), 2012.
• BibTeX
Santosh Sarangkar and Apan Qasem.
MATS: A model-driven adaptive tuning system for parallel workloads.
Journal of Parallel and Cloud Computing (JPCC), 1(2), 2012.
• BibTeX
Apan Qasem.
High-level language extensions for fast execution of
pipeline-parallelized code on current chip multi-processor systems.
International Journal of Programming Languages and Applications
(IJPLA) [In Press], 2(3), 2012.
• BibTeX
Apan Qasem.
Architectural considerations for compiler-guided unroll-and-jam of
CUDA kernels.
American Journal of Computer Architecture, 1(2), 2012.
• BibTeX
Apan Qasem.
Autotuning strategies for reducing synchronization costs in
multithreaded kernels.
Journal of Systems and Software, 2(4), 2012.
• BibTeX
Hammad Rashid, Clara Novoa, Mark McKenney, and Apan Qasem.
Efficient parallel solutions to the integral knapsack problem on
current chip-multiprocessor systems.
International Journal of Parallel, Emergent and Distributed
Systems (IJPEDS), 27(1):19-44, 2012.
• BibTeX
Apan Qasem and Ken Kennedy.
Model-guided empirical tuning of loop fusion.
International Journal of High Performance Systems Architecture
(IJHPSA), 1(3):183-198, 2008.
• BibTeX
Apan Qasem, Ken Kennedy, and John M. Mellor-Crummey.
Automatic tuning of whole applications using direct search and a
performance-based transformation system.
The Journal of Supercomputing, 36(2):183-196, 2006.
• BibTeX
Conference and Workshop Papers
Apan Qasem, Michael Jason Cade, and Dan Tamir.
Improved energy efficiency for multithreaded kernels through
model-based autotuning.
In Proceedings of the 2012 IEEE Green Technology Conference
(GTC12), 2012.
• BibTeX
Swapneela Unkule, Christopher Shaltz, and Apan Qasem.
Automatic restructuring of GPU kernels for exploiting inter-thread
data locality.
In Proc. Int'l. Conf. on Compiler Construction (CC12), pages
21-40, 2012.
• BibTeX
Apan Qasem.
Efficient execution of time-step computations with pipelined
parallelism and inter-thread data locality optimizations.
In Proceedings of the 2012 PPOPP International Workshop on
Programming Models and Applications for Multicores and Manycores (PMAM12),
pages 27-35, 2012.
• BibTeX
Apan Qasem and Dan Tamir.
Memory performance diagnosis through feedback synthesis.
In Proceedings of the Workshop on Feedback-Directed Compiler
Optimization for Multi-Core Architectures (COMA12, a HiPEAC workshop), 2012.
• BibTeX
Faizur Rahman, Qing Yi, and Apan Qasem.
Understanding stencil code performance on multicore architectures.
In Conf. Computing Frontiers (CF11), page 30, 2011.
• BibTeX
Swapneela Unkule and Apan Qasem.
Register pressure aware code transformations on GPU.
In 24th International Conference on High Performance Computing,
Networking, Storage and Analysis - Companion Volume (SC11), pages 19-20,
2011.
• BibTeX
Clara Novoa, Apan Qasem, Hammad Rashid, and Mark McKenney.
Dynamic programming solutions for the integral knapsack problem on
multicore architectures, (extended abstract).
In 11th INFORMS Computing Society Conference, (ICS11), 2011.
• BibTeX
Santosh Sarangkar and Apan Qasem.
Intelligent feedback for fast and effective autotuning, (extended
poster abstract).
In 23rd International Conference on High Performance Computing,
Networking, Storage and Analysis - Companion Volume (SC10), 2010.
• BibTeX
Qing Yi, Jichi Guo, and Apan Qasem.
Evaluating the role of optimization-specific search heuristics in
effective autotuning (short paper).
In 23rd International Workshop on Languages and Compilers for
Parallel Computing (LCPC10), 2010.
• BibTeX
Apan Qasem.
Locality-conscious superpaging for improved TLB behavior of stencil
computations.
In Proceedings of the 2010 International Conference on High
Performance Computing Systems (HPCS10), 2010.
• BibTeX
Qing Yi, Santosh Sarangkar, and Apan Qasem.
Improving autotuning efficiency and portability through feedback
diagnostics.
In Proceedings of the Fifth International Workshop on Automatic
Performance Tuning (iWAPT10), 2010.
• BibTeX
Hammad Rashid, Clara Novoa, and Apan Qasem.
An evaluation of parallel knapsack algorithms on multicore
architectures.
In Proceedings of the 2010 International Conference on
Scientific Computing (CSC10), pages 230-235, 2010.
• BibTeX
Santosh Sarangkar and Apan Qasem.
Restructuring parallel loops to curb false sharing on multicore
architectures.
In 24th IEEE International Symposium on Parallel and Distributed
Processing (IPDPS Workshops), pages 1-7, 2010.
• BibTeX
Apan Qasem, Jichi Guo, Faizur Rahman, and Qing Yi.
Exposing tunable parameters in multi-threaded numerical code.
In Network and Parallel Computing, IFIP International
Conference, (NPC10), pages 46-60, 2010.
• BibTeX
Joshua Magee and Apan Qasem.
A case for compiler-driven superpage allocation.
In Proceedings of the 47th Annual Southeast Regional Conference,
(ACMSE09), 2009.
• BibTeX
Michael Jason Cade and Apan Qasem.
Balancing locality and parallelism on shared-cache multi-core
systems.
In 11th IEEE International Conference on High Performance
Computing and Communications (HPCC09), pages 188-195, 2009.
• BibTeX
Qing Yi and Apan Qasem.
Exploring the optimization space of dense linear algebra kernels.
In 21st International Workshop on Languages and Compilers for
Parallel Computing (LCPC08), pages 343-355, 2008.
• BibTeX
Apan Qasem.
Evaluating an early-stop criterion and a statistical pruning strategy
of the optimization search space.
In Proceedings of the International Conference on Parallel and
Distributed Processing Techniques and Applications (PDPTA), pages 506-510,
2008.
• BibTeX
Apan Qasem and Ken Kennedy.
Pruning the optimization search space using architecture-aware cost
models.
In Proceedings of the First Workshop on Statistical and Machine
Learning Approaches Applied to Architecture and Compilation (SMART07), 2007.
• BibTeX
Apan Qasem and Ken Kennedy.
Profitable loop fusion and tiling using model-driven empirical
search.
In Proceedings of the 20th Annual International Conference on
Supercomputing (ICS), pages 249-258, 2006.
• BibTeX
Apan Qasem and Ken Kennedy.
A cache-conscious profitability model for empirical tuning of loop
fusion.
In 18th International Workshop on Languages and Compilers for
Parallel Computing, (LCPC), pages 106-120, 2005.
• BibTeX
Apan Qasem, Ken Kennedy, and John Mellor-Crummey.
Automatic tuning of whole applications using direct search and a
performance-based transformation system.
In Proceedings of the Los Alamos Computer Science Institute 5th
Annual Symposium (LACSI04), 2004.
• BibTeX
Robert Fowler, John Mellor-Crummey, Guohua Jin, and Apan Qasem.
A source-to-source loop transformation tool (extended poster
abstract).
In Proceedings of the Los Alamos Computer Science Institute 3rd
Annual Symposium (LACSI02), 2002.
• BibTeX
Apan Qasem, David B. Whalley, Xin Yuan, and Robert van Engelen.
Using a swap instruction to coalesce loads and stores.
In 7th International Euro-Par Conference Parallel Processing,
(EuroPar01), pages 235-240, 2001.
• BibTeX
Automatically created from self.bib at Sun Mar 24 09:57:51 2013 by yab2web.
Technical Reports
- Apan Qasem and Ken Kennedy, Evaluating a Model for Cache Conflict Miss Prediction, Technical Report CS-TR05-457, Dept. of Computer Science, Rice University, Jul 2005.
- Apan Qasem, Guohua Jin and John Mellor-Crummey, Improving Performance with Integrated Program Transformations, Technical Report CS-TR03-419, Dept. of Computer Science, Rice University, Oct 2003.
Theses
- Hammad Rashid, Parallel Knapsack Algorithms on Multicore Architectures, Master's Thesis, (Advisor: Apan Qasem), Texas State University, May 2010.
- Joshua A. Magee, Automated Compiler Driven Superpage Allocation and its Applications, Master's Thesis, (Advisor: Apan Qasem), Texas State University, Dec 2008.
- Michael Jason Cade, Balancing Data Locality and Parallelism for Improved Application Performance on Multi-core Platforms, Master's Thesis, (Advisor: Apan Qasem), Texas State University, Dec 2008.
- Apan Qasem, Automatic Tuning of Scientific Applications, Ph.D. Dissertation, Rice University, Jul 2007.
- Apan Qasem, Using a Swap Instruction to Reduce Memory Accesses in Applications, Master's Thesis, Florida State University, May 2001.