Month: March 2015

C++ code optimization via gprof and gcov

CS assignment ghostwriting

Due date: April 2, in class

Read about the linpack benchmark at

http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html

Download the (slightly modified) file linpack.c from

http://www.cas.mcmaster.ca/~nedialk/COURSES/3f03/private/linpack.c

When running this benchmark on a 2.66GHz Dual-Core Intel Xeon, I obtain this table (MFLOPS is million floating-point operations per second):

n      max MFLOPS   compiler flags
100    329.813      gcc
100    330.914      -O0
100    931.276      -O1
100    930.425      -O2
100    1276.764     -O3
1000   410.306      gcc
1000   408.111      -O0
1000   1208.931     -O1
1000   1220.399     -O2
1000   1619.045     -O3
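To make the MFLOPS column concrete, here is a minimal sketch of how such a figure is usually derived from a timing. It assumes the standard LINPACK operation count of 2n^3/3 + 2n^2 for one n-by-n solve; the slightly modified linpack.c may count differently, and the function names are illustrative only.

```c
#include <assert.h>

/* Nominal floating-point operation count for solving one n-by-n system:
   2n^3/3 for the LU factorization plus 2n^2 for the triangular solves.
   (Assumption: the standard LINPACK count, not read from linpack.c.) */
double linpack_ops(int n)
{
    double dn = (double)n;
    return (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
}

/* Convert a measured solve time in seconds into MFLOPS. */
double mflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return linpack_ops(n) / (1.0e6 * seconds);
}
```

Under this count, n = 100 is about 686,667 operations, so a rate near 330 MFLOPS corresponds to roughly 2 ms per solve.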

Problem 1 (6 points) Study the compile options of the gcc compiler, in particular the optimization flags and the flags related to SSE instructions.

Produce the same table for each of penguin, mills, and your own computer. Keep the same flags as above, but you can also add other flags. Try to obtain as high a MFLOPS count as you can. Submit the three tables and a discussion of your results.

Note. You can obtain information about your system

MIPS processor datapath design, Verilog ghostwriting

EECE 3324
Computer Architecture and Organization
Final Project
MIPS Architecture Implementation
Due on Apr. 14th (M) 11:59pm
Basic project: single-cycle MIPS architecture implementation (worth 25% of the total course points)
1. Overview
For the EECE3324 project, you will implement the standard single-cycle MIPS architecture in Verilog. You are given a memory Verilog file which contains both text (program instructions) and data. You should write a processor Verilog file which contains all the modules for the processor datapath and controller. The processor module interacts with the memory module. You should write your own testbench file to simulate the processor and memory. Finally, you should calculate the CPI of the provided program from the Verilog simulator.
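As a small illustration of the final step, CPI is the number of clock cycles divided by the number of instructions completed. The sketch below assumes the testbench keeps two counters; both names are hypothetical and not part of the assignment.

```c
#include <assert.h>

/* Cycles per instruction from two counters a testbench could maintain.
   Both counter names are hypothetical; the assignment does not define them. */
double cycles_per_instruction(unsigned long clock_cycles,
                              unsigned long instructions_completed)
{
    assert(instructions_completed > 0);
    return (double)clock_cycles / (double)instructions_completed;
}
```

In an ideal single-cycle design every instruction completes in one clock, so a result other than 1 usually points to reset or idle cycles being included in the cycle count.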
I recommend using ModelSim as the HDL simulator. An instruction file on ModelSim installation and usage has been posted on BB. You can also use others, like ISE simulator, vcs, etc., if you are familiar with them. However, you have to let the TA and me

histogram equalization, Verilog ghostwriting

ECE 464/520: Project Technical Requirements
You are to produce a Histogram Equalization Unit for image processing. A general description of what a histogram equalization unit does can be found on Wikipedia, amongst other sources,
http://en.wikipedia.org/wiki/Histogram_equalization
You will be processing a series of small (640 x 480 pixel) images. The images will contain 32-bit unsigned pixels representing gray scale images. A basic description of the algorithm is found below (this is extracted from a requirements document in one of our research projects).
Change (Feb 6, 2014). The data supplied to you actually has a dynamic range of only 8 bits per pixel. So that you can all take advantage of this, you only need to produce a histogram where the value of each pixel is sorted into L = 2^8 buckets, not 2^16 buckets.
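The project's own algorithm description is not reproduced here, so the following is only a software reference sketch. It assumes the common formulation from the Wikipedia article, h(v) = round((cdf(v) - cdf_min) / (N - cdf_min) * (L - 1)), with L = 256; the function name and the identity mapping for a single-valued image are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEVELS 256  /* L = 2^8 buckets */

/* Equalize `count` pixels in place. Each pixel occupies 32 bits but is
   assumed to hold a value in 0..255 (8-bit dynamic range). */
void equalize(uint32_t *pixels, size_t count)
{
    uint64_t hist[LEVELS] = {0};
    uint32_t map[LEVELS];
    uint64_t cdf = 0, cdf_min = 0;
    size_t i;
    int v;

    /* Pass 1: build the histogram. */
    for (i = 0; i < count; i++) {
        assert(pixels[i] < LEVELS);
        hist[pixels[i]]++;
    }

    /* cdf_min is the CDF at the lowest gray level that actually occurs. */
    for (v = 0; v < LEVELS && cdf_min == 0; v++)
        cdf_min = hist[v];

    /* Build the mapping table with integer round-to-nearest. */
    for (v = 0; v < LEVELS; v++) {
        cdf += hist[v];
        if (count == cdf_min || cdf < cdf_min) {
            map[v] = (uint32_t)v;  /* single-valued image or unused level */
        } else {
            uint64_t den = count - cdf_min;
            map[v] = (uint32_t)(((cdf - cdf_min) * (LEVELS - 1) + den / 2) / den);
        }
    }

    /* Pass 2: remap every pixel. */
    for (i = 0; i < count; i++)
        pixels[i] = map[pixels[i]];
}
```

The two passes over the image plus one pass over the 256-entry table mirror the stages a hardware unit would likely need, which may help when estimating cycles per image.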

 

You have to design a unit that maximizes the number of images that can be processed per unit area. You thus need to report how long (in seconds) it takes to process an image, and what is the cel

FIFO design, Verilog ghostwriting

hw5

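Design approach for a synchronous FIFO:

1. Basic principle of a synchronous FIFO:

A FIFO (First In First Out) is a storage device that delivers data in the order it arrived. It behaves like a one-way pipe: data enters at one end in a fixed direction and leaves at the other end in the same order, so the first item in is always the first item out. FIFOs are widely used as data buffers.

The basic unit of a FIFO is the register. As a storage device, a FIFO's capacity is determined by the number of storage registers defined inside it. The design in this exercise is a synchronous FIFO (the input and output sides run at the same clock frequency) with asynchronous reset. Its capacity is 4×4; it is kept this small mainly because the board has very few registers, although 4×8 or 4×16 would also work. It outputs two status signals, full and empty, for use by downstream logic.

2. Design analysis and framework of our group's synchronous FIFO:

Overall architecture of the synchronous FIFO:

Explanation: in the figure above, the interior of the largest rectangle is the synchronous FIFO being designed, which consists of the FIFO controller and a RAM. The FIFO controller receives the external read and write control signals (read_n, write_n), the reset signal (reset_n), and the clock signal (clock). On each rising clock edge it uses the counter signal returned from the RAM to decide whether a read or write may proceed and to compute the read and write pointers, and it passes the results to the RAM as the signals mwrite_n, mread_n, wr_pointer, and rd_pointer so that the RAM performs the corresponding read or write. The counter signal is the number of entries stored in the RAM that have not yet been read. The whole synchronous FIFO has eight external data signal lines (including buses) and five internal data signal lines.

Detailed design of the synchronous FIFO:

The logic first checks reset: if the reset button is pressed, the FIFO is reset. It then checks whether the read signal is asserted. If it is not, it checks whether write is asserted; if write is asserted and the memory is not full, it performs a write by first generating the correct write pointer and then writing the input data into the corresponding RAM location. A read is handled the same way. When read and write are both asserted, counter is left unchanged, the read and write pointers are generated directly, and both operations are performed; the empty and full conditions must of course be checked before reading or writing.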
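The control flow described above can be sketched as a small behavioral model. This is not the group's Verilog; it is a reference model under stated assumptions: depth 4 with 4-bit entries, active-low read_n and write_n, and, when both are asserted while the FIFO is empty or full, only the legal operation is carried out. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEPTH 4  /* number of entries; each entry is 4 bits wide */

typedef struct {
    uint8_t  ram[DEPTH];
    unsigned wr_pointer;
    unsigned rd_pointer;
    unsigned counter;     /* entries written but not yet read */
} fifo_t;

void fifo_reset(fifo_t *f)
{
    for (unsigned k = 0; k < DEPTH; k++)
        f->ram[k] = 0;
    f->wr_pointer = f->rd_pointer = f->counter = 0;
}

bool fifo_full(const fifo_t *f)  { return f->counter == DEPTH; }
bool fifo_empty(const fifo_t *f) { return f->counter == 0; }

/* Models one rising clock edge. Returns true and stores the value in
   *data_out when a read actually took place. */
bool fifo_clock(fifo_t *f, bool read_n, bool write_n,
                uint8_t data_in, uint8_t *data_out)
{
    assert(data_out != 0);
    /* Decide both operations from the state before this edge. */
    bool do_read  = !read_n  && !fifo_empty(f);
    bool do_write = !write_n && !fifo_full(f);

    if (do_read) {
        *data_out = f->ram[f->rd_pointer];
        f->rd_pointer = (f->rd_pointer + 1) % DEPTH;
    }
    if (do_write) {
        f->ram[f->wr_pointer] = data_in & 0xF;
        f->wr_pointer = (f->wr_pointer + 1) % DEPTH;
    }
    /* A simultaneous read and write leaves counter unchanged. */
    f->counter = f->counter + (do_write ? 1u : 0u) - (do_read ? 1u : 0u);
    return do_read;
}
```

A testbench could drive the same stimulus into this model and the Verilog design and compare counter, full, and empty after every clock edge.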

Arbiter design, partial code comments, Verilog ghostwriting

module top_to_bottom(
    input  wire        clk,
    input  wire        rst_n,
    input  wire [15:0] req, // request inputs, one bit per node, 0 to 15
    output reg  [15:0] grt  // grant outputs
);

reg [4:0]  i;
reg [15:0] grt_tmp; // intermediate value that feeds the nonblocking assignment;
                    // note that no register is inferred for it

always @ (*) begin
    grt_tmp = 16'b0; // default assignment; do not assign this under posedge clk,
                     // or a register would be inferred
    for (i = 0; i < 5'd16; i = i + 1) begin
        if (req[i] == 1'b1) begin
            grt_tmp[i] = 1'b1; // a requesting node is granted in top-to-bottom priority order
            i = 5'd15;         // once a grant is issued, exit the loop by forcing i to its last value
        end
    end
end

always @ (posedge clk) begin // registered grant output
    if (!rst_n)
        grt <= 16'b0;
    else
        grt <= grt_tmp;
end

endmodule
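For checking the combinational part of top_to_bottom in simulation, a software reference model can be handy. The sketch below mirrors the for loop (grant the lowest-numbered requesting bit) and adds an equivalent bit trick; both function names are hypothetical, and the clocked register stage is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: scan from bit 0 upward and grant the first requester. */
uint16_t top_to_bottom_grant(uint16_t req)
{
    uint16_t grant = 0;
    for (int i = 0; i < 16; i++) {
        if (req & (1u << i)) {
            grant = (uint16_t)(1u << i);
            break;  /* plays the role of forcing i to 15 in the Verilog loop */
        }
    }
    assert((grant & (grant - 1)) == 0);  /* at most one grant bit is set */
    return grant;
}

/* Bit-trick form: req AND its two's complement isolates the lowest set bit. */
uint16_t lowest_set_bit(uint16_t req)
{
    return (uint16_t)(req & (~req + 1u));
}
```

The second form corresponds to the expression req & (~req + 1), a common loop-free way to describe a fixed-priority grant.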

 

 

module round_robin(
    input wire        clk,
    input wire        rst_n,
    input wire [15:0] req,
    output reg [15:0] grt
);

reg [3:0]  cnt; // a counter that acts as a pointer: after a node is granted, it points to
                // the adjacent next node; this must persist across clock edges, so we need some registers!
reg [3:0]  i;
reg [15:0] grt_tmp; // note that i and grt_tmp are not registers after synthesis, even though they are declared reg

always @ (*) begin
    i = cnt;
    grt_tmp = 16'b0;
    while (rst_

Cache design, Verilog ghostwriting

In this assignment, you will design a generic memory block and use it to perform various tasks.

 

Part 1: Memory/cache design.

A physical cache block is made up of memory cells which are associated in rows and columns. Each row corresponds to a cache-line, which may contain any size of data. These lines are then stacked in column form. In general, each cache line consists of 3 major parts, the ID tag, the data, and the state (or status) of each line. The state bits will be ignored for this assignment (just use ID tag and data).

When a request comes in for a line, the cache lines are searched concurrently to determine if a line matches the requested ID tag. If the tag matches, then the bits in the data portion of the cache line are sent out of the system. (Note: this assignment is a scaled down version of a traditional fully-associative cache, not including any sense amplifiers and other support circuitry).
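The lookup described above can be sketched in software as follows. Hardware compares all tags concurrently, whereas this model checks them one by one, which gives the same result when tags are unique. The field widths, type name, and function name are assumptions for illustration; state bits are left out, as the assignment allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cache line: ID tag plus data (state bits ignored). */
typedef struct {
    uint32_t tag;
    uint32_t data;
} cache_line_t;

/* Returns true on a hit and copies the line's data to *data_out.
   On a miss, *data_out is left untouched. */
bool cache_lookup(const cache_line_t *lines, size_t num_lines,
                  uint32_t tag, uint32_t *data_out)
{
    assert(lines != NULL && data_out != NULL);
    for (size_t i = 0; i < num_lines; i++) {
        if (lines[i].tag == tag) {
            *data_out = lines[i].data;
            return true;
        }
    }
    return false;
}
```

Passing num_lines as a parameter stands in for a size that the hardware version would fix at compile time.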

Your job is to design such a cache, with a compile-time vari

IrDA design, partial description, Verilog ghostwriting

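Documentation for serial communication based on the IrDA protocol

  1. Serial communication and the UART protocol

UART (Universal Asynchronous Receiver/Transmitter) is a serial data transmission protocol. UART allows full-duplex communication over a serial link and is widely used in data communication and control systems.

Basic UART communication needs only two signal lines (RxD and TxD) for full-duplex data transfer. TxD is the UART transmit side and is an output; RxD is the UART receive side and is an input. The basic characteristic of UART is that the signal line has two states, distinguished as logic 1 (high level) and logic 0 (low level). For example, when the transmitter is idle, the data line stays at logic high. The transmitter begins a data frame by sending a start bit, which drives the line to logic 0 and tells the receiver that a transfer is about to begin. The data bits follow, usually 8 bits forming one byte (5, 6, or 7 bits are also possible), least significant bit (LSB) first and most significant bit (MSB) last. A parity bit is sent next; it is used to check the data bits for transmission errors and is usually odd or even parity. The stop bit comes last, marks the end of the transfer, and corresponds to logic 1.

Figure 1: UART protocol format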
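The frame layout described above can be written out bit by bit. The sketch below assumes 8 data bits, one parity bit, and one stop bit (11 line levels in total); the function name and the odd_parity flag are illustrative and do not come from the original code file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_BITS 11  /* 1 start + 8 data + 1 parity + 1 stop */

/* Fill bits[0..10] with the line levels in transmission order. */
void uart_frame(uint8_t byte, bool odd_parity, uint8_t bits[FRAME_BITS])
{
    unsigned ones = 0;

    assert(bits != 0);
    bits[0] = 0;                                  /* start bit: logic 0 */
    for (int i = 0; i < 8; i++) {
        bits[1 + i] = (uint8_t)((byte >> i) & 1u); /* data, LSB first */
        ones += bits[1 + i];
    }
    /* Even parity makes the count of 1s in data plus parity even;
       odd parity makes that count odd. */
    bits[9]  = (uint8_t)((ones & 1u) ^ (odd_parity ? 1u : 0u));
    bits[10] = 1;                                 /* stop bit: logic 1 */
}
```

For the byte 0x41 with even parity this gives the line sequence 0 1 0 0 0 0 0 1 0 0 1, which can be compared against the transmitter output in a waveform viewer.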

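(Key point) A Verilog implementation of UART is divided into three main parts: the baud rate generator module, the transmit module, and the receive module. The original code file is organized around these same three modules, except that what could have been done by instantiating one baud rate generator module twice is instead split into two separate modules, txbaud and rxbaud. To meet the requirements:

Figure 2: Requirements

  1. Baud Rate

We need to set the baud rate to 57600, which requires changes in txbaud and rxbaud:

First we need to understand how a baud rate generator works. A baud rate generator is really just a frequency divider: from the given system clock frequency (the crystal clock) and the required baud rate, one can compute the baud rate division factor, and the computed baud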

subtractor, Verilog ghostwriting

  1. Write structural Verilog code for a 2-bit subtractor. The circuit has two 2-bit inputs A and B and outputs S-sum (2 bits) and B-borrow. Write a test bench simulating all possible input combinations and verify your design by checking proper output values and delays. To get the gate-level design of the circuit, create the truth table (and show it in the write-up), then simplify it using Karnaugh maps. Note that you should assume two's complement for negative numbers.

 

  2. Using the truth table, create a set of user-defined primitives to accomplish the same 2-bit subtractor.

 

  3. Design an 8-to-1 MUX in Verilog. Each input to the MUX is 16 bits wide. You have to write two different modules implementing a MUX:

1) Using a CASE statement and

2) Using IF-ELSE statements.

Verify each module by writing a test bench and simulating selection of each input line.

 

  4. Write and simulate/test a behavi

ALU design, Verilog ghostwriting

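Following the functional and timing requirements in "4ALU设计方案" (the ALU design plan) and "ALU Projec 图表更改.pdf" (the figure and table changes), complete the following tasks:

  • Complete the HDL design of the ALU and provide its HDL code. For variable names, follow Table 1 and Table 2.
  • Describe the functional simulation strategy; provide the HDL code used to test the ALU; provide the functional simulation results.
  • Perform logic synthesis of the ALU; provide the design constraints you applied, as a .tcl script, along with the corresponding synthesis results.
  • Perform layout synthesis of the ALU (optional); run LVS and DRC on the ALU (optional); run post-layout simulation of the ALU (optional).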

Carry-lookahead adder, Verilog ghostwriting

In this project you are asked to design and implement a 32-bit CLA adder circuit by making use of a fast parallel prefix circuit. The CLA circuit should have 32 inputs and 33 outputs, where the 33rd output is the carry-out. The carry-in is set to zero.

The design may be implemented and synthesized in the design tool of your choice. You may use HDL tools or schematic entry for design entry. For simulation and synthesis you are allowed to use Synopsys, Mentor Graphics, or Xilinx FPGA toolkits.

You should simulate your design for functional correctness by providing proper inputs. After verifying the implementation, synthesize it by selecting an appropriate target FPGA (or ASIC) model. Extract circuit size and critical path delay information from the post-synthesis report.

Also design and implement a 32-bit carry-propagate adder. Compare it to your CLA design in terms of circuit size and delay.
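A bit-level software model can serve as a golden reference for both adders. The project does not say which prefix network to use; the sketch below assumes a Kogge-Stone-style network (five levels for 32 bits) with carry-in fixed at zero, next to a bit-serial ripple model of a carry-propagate adder. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t sum;
    unsigned carry_out;
} add_result_t;

/* Parallel-prefix addition: combine generate/propagate signals over
   spans of 1, 2, 4, 8, and 16 bits. */
add_result_t prefix_add32(uint32_t a, uint32_t b)
{
    uint32_t p0 = a ^ b;  /* per-bit propagate */
    uint32_t g  = a & b;  /* per-bit generate  */
    uint32_t p  = p0;
    add_result_t r;

    for (unsigned d = 1; d < 32; d <<= 1) {
        g |= p & (g << d);  /* group generate over a span twice as wide */
        p &= p << d;        /* group propagate over the same span       */
    }
    r.sum = p0 ^ (g << 1);  /* carry into bit i is group generate of bits i-1..0 */
    r.carry_out = (g >> 31) & 1u;
    return r;
}

/* Carry-propagate reference: one full adder per bit, carry rippling upward. */
add_result_t ripple_add32(uint32_t a, uint32_t b)
{
    add_result_t r = {0, 0};
    unsigned carry = 0;

    for (int i = 0; i < 32; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        r.sum |= (uint32_t)(ai ^ bi ^ carry) << i;
        carry = (ai & bi) | (carry & (ai ^ bi));
    }
    r.carry_out = carry;
    return r;
}

/* Helper for test vectors: both models must agree on sum and carry-out. */
void check_adders(uint32_t a, uint32_t b)
{
    add_result_t x = prefix_add32(a, b), y = ripple_add32(a, b);
    assert(x.sum == y.sum && x.carry_out == y.carry_out);
}
```

The loop in prefix_add32 runs five times while ripple_add32 runs 32 times, which loosely reflects the logarithmic versus linear logic depth the size and delay comparison is meant to expose.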

You should turn in a printout of the schematic of your circuit, simulation results showing functional correctness, an