
What if the speed of memcpy is too slow?

2025-01-16 Update From: SLTechnology News&Howtos shulou


Shulou(Shulou.com)06/01 Report--

Today we look at what to do when memcpy is too slow. Many readers may not know this topic well, so this article summarizes the main points; we hope you find it useful.

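  memcpy is a standard C/C++ function with the prototype void *memcpy(void *dest, const void *src, size_t n). It copies n bytes, starting at the address src points to, into the memory starting at the address dest points to. The source and destination regions must not overlap.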
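As a minimal sketch of the prototype above, the following wraps memcpy in a small helper. The name copy_block is hypothetical and only serves the illustration; the behavior shown (n bytes copied, dest returned) is that of the standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst.
 * memcpy requires non-overlapping regions; use memmove if they may overlap. */
static void *copy_block(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);   /* memcpy returns its dest argument */
}
```

A caller would pass the destination first, for example copy_block(dst, src, sizeof src) for two arrays of equal size.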

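  NEON is a 128-bit SIMD (Single Instruction, Multiple Data) extension for ARM Cortex-A series processors. A single NEON instruction operates on several data elements at once: a 64-bit D register holds eight 8-bit, four 16-bit, two 32-bit, or one 64-bit element, and a 128-bit Q register holds twice as many. This is the property that can be used to speed up memory copies.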

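  Under normal circumstances memcpy is fast enough. But when copying a large block of memory becomes a bottleneck for some reason, NEON is worth considering. For example, I ran into a timing problem when using glMapBufferRange to map a PBO from GPU memory into CPU memory: copying 921600 bytes took 30 ms. After switching to NEON, the copy took 4 ms, about 7.5 times faster. In fact, NEON instructions on ARM can substantially improve data-parallel performance in general, not only memory copies. Google's open-source libyuv also uses NEON instructions internally to process data in parallel.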
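The figures above can be turned into throughput numbers, and a similar measurement can be repeated on your own device. The sketch below is only an assumption about how one might measure it: the helper names bytes_per_second and time_memcpy_ms are hypothetical, clock() reports CPU time rather than wall-clock time, and an optimizing compiler may shorten the timed loop, so treat results as rough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: throughput in bytes per second for a copy of
 * `bytes` bytes that took `ms` milliseconds. */
static double bytes_per_second(size_t bytes, double ms)
{
    assert(ms > 0.0);
    return (double)bytes * 1000.0 / ms;
}

/* Hypothetical helper: average CPU time in milliseconds of one memcpy
 * of n bytes, averaged over `reps` repetitions. */
static double time_memcpy_ms(void *dst, const void *src, size_t n, int reps)
{
    assert(reps > 0);
    clock_t t0 = clock();
    for (int i = 0; i < reps; ++i)
        memcpy(dst, src, n);
    clock_t t1 = clock();
    return 1000.0 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC / (double)reps;
}
```

With the numbers quoted in the text, 921600 bytes in 30 ms is 30.72 million bytes per second, and the same copy in 4 ms is 230.4 million bytes per second.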

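Using NEON instructions

/* __ARM__ is assumed to be defined by the build; GCC and Clang themselves
 * predefine __arm__ on 32-bit ARM. */
#ifdef __ARM__
/* Copies sz bytes in 64-byte chunks through NEON registers d0-d7.
 * Caution: sz is rounded UP to a multiple of 64, so both buffers must be
 * padded to that size or the copy runs past their ends. The loop body
 * runs once before sz is tested, so sz == 0 still copies 64 bytes. */
static void neon_memcpy(volatile unsigned char *dst, volatile unsigned char *src, int sz)
{
    if (sz & 63)
        sz = (sz & -64) + 64;   /* round up to the next multiple of 64 */
    asm volatile (
        "NEONCopyPLD:                 \n"
        "    VLDM %[src]!,{d0-d7}     \n"   /* load 64 bytes, advance src */
        "    VSTM %[dst]!,{d0-d7}     \n"   /* store 64 bytes, advance dst */
        "    SUBS %[sz],%[sz],#0x40   \n"   /* sz -= 64, sets the flags */
        "    BGT NEONCopyPLD          \n"   /* loop while sz > 0 */
        : [dst]"+r"(dst), [src]"+r"(src), [sz]"+r"(sz)
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "cc", "memory");
}
#endif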
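The inline-assembly version rounds the size up to a multiple of 64, which only works when both buffers are padded. As an alternative sketch (not the author's method), the same 64-bytes-per-iteration idea can be written with the intrinsics from arm_neon.h, leaving the remaining tail to plain memcpy so nothing is written past the requested length. The name neon_memcpy_safe and the macro HAVE_NEON_INTRINSICS are hypothetical; without NEON the function simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_INTRINSICS 1   /* hypothetical macro for this sketch */
#endif

/* Hypothetical variant: copy exactly n bytes, 64 at a time where possible. */
static void neon_memcpy_safe(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    assert(d != NULL && s != NULL);

#ifdef HAVE_NEON_INTRINSICS
    while (n >= 64) {
        /* Four 128-bit loads followed by four 128-bit stores. */
        uint8x16_t v0 = vld1q_u8(s);
        uint8x16_t v1 = vld1q_u8(s + 16);
        uint8x16_t v2 = vld1q_u8(s + 32);
        uint8x16_t v3 = vld1q_u8(s + 48);
        vst1q_u8(d, v0);
        vst1q_u8(d + 16, v1);
        vst1q_u8(d + 32, v2);
        vst1q_u8(d + 48, v3);
        s += 64;
        d += 64;
        n -= 64;
    }
#endif
    /* Tail of fewer than 64 bytes (or the whole copy when NEON is absent). */
    if (n > 0)
        memcpy(d, s, n);
}
```

Whether this is faster than the platform memcpy depends on the device and C library, so it is worth measuring before adopting it.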

  Not every ARMv7 CPU supports NEON, so the cpufeatures library is added here to check for NEON support at run time. The correct usage is shown below.

#ifdef __ARM__
    /* android_getCpuFamily() and android_getCpuFeatures() are declared in <cpu-features.h> */
    if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
        (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0) {
        // NEON is supported
        neon_memcpy(destBuffer, src, length);
    } else {
        memcpy(destBuffer, src, length);
    }
#else
    // other architectures use memcpy
    memcpy(destBuffer, src, length);
#endif

Enabling NEON in Android.mk

# Add NEON support for the ARM architecture
ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
LOCAL_CFLAGS := -D__cpusplus -g -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a8 -DHAVE_NEON=1
endif

# Enable NEON for both architectures (x86 supports it indirectly by translating NEON to SSE)
ifeq ($(TARGET_ARCH_ABI),$(filter $(TARGET_ARCH_ABI), armeabi-v7a x86))
LOCAL_ARM_NEON := true
endif

LOCAL_STATIC_LIBRARIES := cpufeatures

include $(BUILD_SHARED_LIBRARY)
$(call import-module,android/cpufeatures)

Enabling NEON in CMake

# Pull in the cpufeatures module
include_directories(${ANDROID_NDK}/sources/android/cpufeatures)

if (${ANDROID_ABI} STREQUAL "armeabi-v7a")
    set_property(SOURCE ${SOURCES} APPEND_STRING PROPERTY COMPILE_FLAGS " -mfpu=neon")
    add_definitions("-DHAVE_NEON=1")
elseif (${ANDROID_ABI} STREQUAL "x86")
    set_property(SOURCE ${SOURCES} APPEND_STRING PROPERTY COMPILE_FLAGS
        " -mssse3 -Wno-unknown-attributes \
          -Wno-deprecated-declarations \
          -Wno-constant-conversion \
          -Wno-static-in-inline")
    add_definitions(-DHAVE_NEON_X86=1 -DHAVE_NEON=1)
endif ()

add_library(yourLibrary SHARED
    ${ANDROID_NDK}/sources/android/cpufeatures/cpu-features.c)

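  In fact, ARM is not the only architecture with SIMD support: x86 has it too (SSE), and Android provides NEON_2_SSE.h for x86. x86 does not execute NEON instructions directly; this header maps the NEON intrinsics to SSE instructions so that the same NEON API is available.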
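For comparison, here is what a 128-bit copy loop looks like when written directly against the x86 SSE2 intrinsics from emmintrin.h. This is only an illustration of the x86 counterpart of a NEON load/store pair, not a description of how NEON_2_SSE.h is implemented; the function name sse2_memcpy is hypothetical, and the code falls back to memcpy when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: copy n bytes, 16 at a time where SSE2 is available. */
static void sse2_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    assert(d != NULL && s != NULL);

#if defined(__SSE2__)
    while (n >= 16) {
        /* Unaligned 128-bit load and store, comparable in role to
         * NEON's vld1q_u8 / vst1q_u8. */
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    /* Remaining bytes (or the whole copy without SSE2). */
    if (n > 0)
        memcpy(d, s, n);
}
```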

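Having read the above, do you have a better idea of what to do when memcpy is too slow? To learn more about this or related topics, follow the industry news channel. Thank you for your support.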
