Autonomous Machines & Society.

2022-08-25

Scan This Picture And Index The Whole Video/Document/Ppt/Textbook!

non-max suppression for combining similar bounding boxes

the lib:

from imutils.object_detection import non_max_suppression
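For reference, here is a minimal plain-Python sketch of greedy non-max suppression. It assumes boxes are `(x1, y1, x2, y2)` tuples with one score each and uses intersection-over-union as the overlap measure. `iou`, `nms` and the 0.3 threshold are illustrative choices, not imutils' implementation, whose overlap measure and defaults may differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 nearly duplicates box 0
```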

basically greek letters

Maybe you can cover a large range of additional symbols just by enabling the system to search in Greek?

It could also search among math symbols, i.e. do math OCR.
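One possible way to make Greek letters and math symbols searchable is to match them by their Unicode names, which the standard `unicodedata` module provides, so a query such as "greek" or "alpha" finds α. This is a sketch under the assumption that the OCR output is already Unicode text; LaTeX commands such as `\alpha` would need a separate mapping, and `search_symbols` is a hypothetical helper.

```python
import unicodedata


def symbol_name(ch):
    """Lowercase Unicode name, e.g. 'greek small letter alpha' ('' if unnamed)."""
    return unicodedata.name(ch, "").lower()


def search_symbols(text, query):
    """Symbols in `text` whose Unicode name contains every word of `query`."""
    words = query.lower().split()
    hits = []
    for ch in text:
        name = symbol_name(ch)
        if name and all(w in name for w in words) and ch not in hits:
            hits.append(ch)
    return hits


formula = "∑ α β + ∫ γ dx ≤ π"
print(search_symbols(formula, "greek"))     # ['α', 'β', 'γ', 'π']
print(search_symbols(formula, "integral"))  # ['∫']
```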

kind reminders

When building Python C++ libraries without Xcode, make the Command Line Tools header files available like this.

In order to have this header found during the build:

/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/Headers/Python.h
we need to create this symlink:

ln -s /Library/Developer/CommandLineTools /Applications/Xcode.app/Contents/Developer

search by image instead of cranking latex out

image search libraries

chargrid OCR: character-level optical character segmentation
image match: used for copyright violation detection, based on the pHash algorithm
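To illustrate the idea behind perceptual hashing, here is a minimal pure-Python sketch of a pHash-style fingerprint: take the low-frequency block of a 2D DCT, drop the DC term and threshold the rest at the median. It assumes the image is already downscaled to a 32×32 grayscale list of rows (loading and resizing are left to an imaging library). It is not the image-match library's code, and `phash`/`hamming` are hypothetical names.

```python
import math
import random


def phash(gray, hash_size=8):
    """pHash-style fingerprint of a square grayscale image given as a list of rows."""
    n = len(gray)
    # cos_table[k][x] = cos((2x + 1) k pi / 2n), the DCT-II basis functions.
    cos_table = [[math.cos((2 * x + 1) * k * math.pi / (2 * n)) for x in range(n)]
                 for k in range(hash_size)]
    coeffs = []
    for u in range(hash_size):          # only the low-frequency hash_size x hash_size block
        for v in range(hash_size):
            coeffs.append(sum(gray[x][y] * cos_table[u][x] * cos_table[v][y]
                              for x in range(n) for y in range(n)))
    ac = coeffs[1:]                     # drop the DC term (overall brightness)
    median = sorted(ac)[len(ac) // 2]
    return [1 if c > median else 0 for c in ac]


def hamming(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicate images."""
    return sum(a != b for a, b in zip(h1, h2))


rng = random.Random(0)
img = [[rng.randrange(200) for _ in range(32)] for _ in range(32)]
brighter = [[p + 20 for p in row] for row in img]   # same picture, uniformly brighter
rng2 = random.Random(1)
other = [[rng2.randrange(200) for _ in range(32)] for _ in range(32)]

print(hamming(phash(img), phash(brighter)))  # 0: only the dropped DC term changes
print(hamming(phash(img), phash(other)))     # much larger for an unrelated image
```

Comparing hashes by Hamming distance is what makes the scheme tolerant of small edits such as brightness changes.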

get the latex out

first, detect the location of math formulas

scanning single shot detector (SSD) for math formulas
dataset for math symbol detection
detecting different parts of a document with YOLOv3
math expression detection

next, find a tool for picture-to-LaTeX conversion

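A newly open-sourced Python tool, Pix2Text (P2T), aims to be an open-source Python alternative to Mathpix. It can currently recognize math formulas in screenshots and convert them to LaTeX, and it can also recognize Chinese and English text in images. Online demo: https://huggingface.co/spaces/breezedeus/pix2text. GitHub: https://github.com/breezedeus/pix2text, Gitee: https://gitee.com/breezedeus/pix2text.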
attention-based math OCR
GUI for image2latex
im2latex-tensorflow
im2markup, using Torch and supporting NVIDIA GPUs only; the model for LaTeX conversion can be found here
deep learning picture-to-LaTeX
pix2tex
using pix2tex
recognizing Chinese formulas and handwritten formulas requires further training
 Chinese handwriting, PyTorch version

mask the LaTeX areas and get the conventional text out

easyocr with pytorch support
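A minimal sketch of the masking step: paint every detected formula box with the background color so a conventional OCR engine (e.g. easyocr) only sees plain text. It assumes a grayscale page stored as a list of rows and `(x1, y1, x2, y2)` boxes with exclusive right/bottom edges, as a formula detector might return; `mask_regions` is a hypithetical-free plain helper of our own, not part of any OCR library.

```python
def mask_regions(gray, boxes, fill=255):
    """Copy of a grayscale image (list of rows) with each (x1, y1, x2, y2) box set to `fill`."""
    height, width = len(gray), len(gray[0])
    masked = [row[:] for row in gray]
    for x1, y1, x2, y2 in boxes:
        # Clamp to the image so slightly-off detections do not raise IndexError.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                masked[y][x] = fill
    return masked


page = [[0] * 6 for _ in range(4)]             # tiny all-black "page"
formula_boxes = [(1, 1, 3, 3), (4, 0, 10, 2)]  # the second box runs past the right edge
for row in mask_regions(page, formula_boxes):
    print(row)
```

With NumPy the inner loops would collapse into a slice assignment, but the plain version keeps the sketch dependency-free.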

search the formula

formula search based on sympy
can we render LaTeX to an image with sympy?
LaTeX search engine
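As a rough illustration of how a LaTeX search engine could work, here is a sketch of a tiny inverted index over LaTeX tokens, ranking stored formulas by Jaccard similarity of their token sets. The tokenizer, the ranking and the `LatexIndex` class are assumptions made for illustration, not how any particular engine works; a sympy-based normalization step could replace the plain tokenization.

```python
import re

# Commands like \frac, single letters/digits, or any other non-space symbol (braces ignored).
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]|[^\sA-Za-z0-9{}]")


def latex_tokens(src):
    """Split LaTeX source into commands, single letters/digits and operator symbols."""
    return TOKEN.findall(src)


class LatexIndex:
    """Inverted index: token -> ids of the formulas that contain it."""

    def __init__(self):
        self.formulas = []
        self.postings = {}

    def add(self, src):
        fid = len(self.formulas)
        self.formulas.append(src)
        for tok in set(latex_tokens(src)):
            self.postings.setdefault(tok, set()).add(fid)
        return fid

    def search(self, query, top_k=3):
        q = set(latex_tokens(query))
        candidates = set().union(*(self.postings.get(t, set()) for t in q))
        scored = []
        for fid in candidates:
            toks = set(latex_tokens(self.formulas[fid]))
            scored.append((len(q & toks) / len(q | toks), fid))  # Jaccard similarity
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(self.formulas[fid], round(score, 3)) for score, fid in scored[:top_k]]


index = LatexIndex()
for src in [r"E = mc^2", r"\frac{a}{b} + \frac{c}{d}", r"e^{i\pi} + 1 = 0"]:
    index.add(src)
print(index.search(r"mc^2"))
# [('E = mc^2', 0.667), ('\\frac{a}{b} + \\frac{c}{d}', 0.111), ('e^{i\\pi} + 1 = 0', 0.091)]
```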

2022-08-23

Random Thoughts On The Patterns Of Human Development And Human Needs

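People need to take in a massive amount of information before they can achieve anything in certain fields; this relationship ranges from strong correlation to weak correlation.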

2022-08-23

Recursive Search, Heuristic Search

collaborative filtering, recommendation engine

Neo4j tutorial on building a recommendation engine
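A minimal sketch of user-based collaborative filtering in plain Python, without a graph database: users are compared by cosine similarity of their ratings, and items a user has not rated are scored by the similarity-weighted ratings of other users. The ratings data and the function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"matrix": 5, "alien": 4, "up": 1},
    "bob": {"matrix": 4, "alien": 5, "heat": 4},
    "cat": {"up": 5, "coco": 4, "matrix": 1},
}


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing items count as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_k=2):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    seen = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, theirs)
        for item, rating in theirs.items():
            if item not in seen and sim > 0:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(recommend("ann", ratings))  # ['heat', 'coco']
```

"heat" wins for ann because bob, who rates like ann, liked it; "coco" comes from cat, whose tastes overlap far less.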

random search libraries

spotify randomizer

heuristic search libraries

Twitch chat scraper and a meme-prediction heuristic

how to find trending topics or videos?

You can track the same set of videos and plot their historical stats, or use the official ‘trending’ API to find out.
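A sketch of the "track the same set of videos" approach: given view-count snapshots that you collect yourself, rank videos by recent growth (views gained per hour) rather than by total views. The snapshot format and the function names are assumptions; official "trending" rankings use their own, unpublished criteria.

```python
def growth_rates(history):
    """history: {video_id: [(hour, views), ...]} sorted by time -> views gained per hour recently."""
    rates = {}
    for vid, snaps in history.items():
        if len(snaps) < 2:
            continue
        (t0, v0), (t1, v1) = snaps[-2], snaps[-1]
        if t1 > t0:
            rates[vid] = (v1 - v0) / (t1 - t0)
    return rates


def trending(history, top_k=2):
    rates = growth_rates(history)
    return sorted(rates, key=rates.get, reverse=True)[:top_k]


history = {  # hypothetical snapshots collected every 6 hours
    "a": [(0, 1000), (6, 1600), (12, 2200)],
    "b": [(0, 50000), (6, 50300), (12, 50500)],
    "c": [(0, 10), (6, 400), (12, 2800)],
}
print(trending(history))  # ['c', 'a']: "b" has the most views but grows slowest
```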

find ‘random’ videos on a certain topic:

search for playlists and collect recommendations

apply this to video-feed APIs, or to an official API such as Giphy's

heuristic search and graph search: an introduction

Use a heuristic recursive search: apply random parameters, find related keywords, apply filters and update the weights.
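The loop described above can be sketched as a best-first search over a keyword graph. Everything here is an assumption for illustration: `RELATED` stands in for whatever source of related keywords you use (search results, playlist recommendations, an API), `score` is your interest heuristic, `noise` plays the role of the random parameters, and `blocked` is the filter.

```python
import heapq
import random

# Hypothetical "related keywords" graph.
RELATED = {
    "elden ring": ["souls-like", "boss fight", "老头环"],
    "souls-like": ["dark souls", "difficulty"],
    "boss fight": ["malenia", "speedrun"],
    "老头环": ["malenia", "meme"],
    "malenia": ["meme"],
}


def heuristic_search(start, score, max_expansions=10, noise=0.1, blocked=(), seed=0):
    """Best-first expansion of related keywords; returns keywords in visiting order."""
    rng = random.Random(seed)
    weights = {start: score(start)}
    frontier = [(-weights[start], start)]  # heapq is a min-heap, so weights are negated
    visited = []
    while frontier and len(visited) < max_expansions:
        _, keyword = heapq.heappop(frontier)
        if keyword in visited:
            continue
        visited.append(keyword)
        for nxt in RELATED.get(keyword, []):
            if nxt in blocked or nxt in visited:
                continue
            # Weight update: own heuristic + half the parent's weight + random noise.
            weight = score(nxt) + 0.5 * weights[keyword] + rng.uniform(0, noise)
            if weight > weights.get(nxt, float("-inf")):
                weights[nxt] = weight
                heapq.heappush(frontier, (-weight, nxt))
    return visited


interest = {"malenia": 1.0, "meme": 0.9}


def score(keyword):
    return interest.get(keyword, 0.1)


print(heuristic_search("elden ring", score, max_expansions=3, noise=0.0))
# ['elden ring', 'boss fight', 'malenia']
```

With `noise=0.0` ties are broken alphabetically by the heap; a positive `noise` makes repeated runs wander into different branches.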

topic modeling using gensim

BERTopic tutorial: BERTopic can predict the topics of new documents and compute topic similarity

pip3 install bertopic

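You can set a target in advance; whether or not a particular search reaches it, reward the search process that did succeed, e.g. discovering the correspondence between 老头环 (literally "old man ring", a Chinese nickname) and Elden Ring.

Are there any related tools? What are they called?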

recursive text search engine

Heuristic Text Search Engine

free pdf: Heuristic and Systematic Use of Search Engines

It's like WebGPT, which has a paper on arXiv (PDF).

OpenAI's alignment research aims to make artificial general intelligence (AGI) aligned with human values and able to follow human intent.

There's also a fake-news detector that runs inside the web browser.

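Search for a term, pick whatever looks interesting in the results, and search for that next.

Record your search process, i.e. how you gather information and find connections, and then hand it to an AI for offline training.

Likewise, you can record how you create and organize the structure of your content and hand that to an AI for offline training; this applies to the [template based content generator](./pyjom schedules.md).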
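A sketch of how a recorded search session could be turned into offline training records, combining the recording idea with the earlier idea of rewarding the search process that reached a preset target. The record schema and the discounted reward are assumptions, not an existing tool's format.

```python
import json


def label_trajectory(queries, target, discount=0.9):
    """Turn one recorded search session into offline-training records.

    queries: search terms in the order they were typed.
    target:  goal chosen in advance (e.g. "elden ring").
    If the session reached the target, each step up to the success gets a reward
    discounted by its distance from that success; otherwise every reward is 0.
    """
    success = target in queries
    end = queries.index(target) if success else len(queries) - 1
    records = []
    for step in range(end + 1):
        reward = discount ** (end - step) if success else 0.0
        records.append({
            "step": step,
            "query": queries[step],
            "next_query": queries[step + 1] if step < end else None,
            "reward": round(reward, 4),
        })
    return records


session = ["老头环", "老头环 英文名", "elden ring"]
for record in label_trajectory(session, target="elden ring"):
    print(json.dumps(record, ensure_ascii=False))  # one JSON line per step
```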

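2022-08-22

Generating 3D Models From Images With The 3D-Moments Tool

2D to 3D: generating 3D models from images

Generate a 3D video from a few similar pictures.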

https://3d-moments.github.io

2022-08-22

Hardware Simulator

PySpice uses ngspice and Xyce as backends and can simulate MOS transistors, JFETs, diodes and more.

2022-08-22

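Continuous Intervals, Discrete Intervals: Getting Discrete Intervals From Discrete Data, Intersection, Union And Complement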

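Discrete intervals can be obtained by checking boundary conditions: for example, open an interval once the probabilities of the last n consecutive samples exceed some threshold (with some tolerance), and close it once the last n samples fall below some value. Alternatively, smooth the data first with a convolution such as a Gaussian blur.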
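Both options can be sketched in plain Python: a normalized Gaussian kernel for smoothing (edges clamped, an assumption) and an extractor that opens an interval after `n` consecutive samples at or above `threshold` and closes it after `n` consecutive samples below it, so shorter dips are tolerated. The names, the threshold and `n` are hypothetical choices.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma=1.0):
    """Convolve with a Gaussian, clamping indices at the edges of the signal."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    return [sum(w * signal[min(max(i + j - r, 0), n - 1)] for j, w in enumerate(kernel))
            for i in range(n)]


def extract_intervals(probs, threshold=0.5, n=2):
    """(start, end) index pairs, end inclusive, using n-sample open/close conditions."""
    intervals, start, last_high, run, inside = [], None, None, 0, False
    for i, p in enumerate(probs):
        high = p >= threshold
        if not inside:
            run = run + 1 if high else 0
            if run == n:
                inside, start, last_high, run = True, i - n + 1, i, 0
        elif high:
            last_high, run = i, 0
        else:
            run += 1
            if run == n:
                intervals.append((start, last_high))
                inside, run = False, 0
    if inside:
        intervals.append((start, last_high))
    return intervals


probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1, 0.7, 0.8]
print(extract_intervals(probs))               # [(1, 5), (9, 10)]: the dip at index 3 is tolerated
print(extract_intervals(smooth(probs, 1.0)))  # the same extraction on the blurred signal
```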

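Intersection, union and complement of discrete intervals can be converted into the same operations on continuous intervals, which is simpler and saves effort.

If you need to perform the operations below, consider a third-party library such as Wolfram, SWI-Prolog's clpr, or sympy.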

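For intersection, union and complement of continuous intervals: first sort the endpoints, define what happens at the start and end points, then select the relevant sub-intervals step by step until the end, and output the combined result.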
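The sort-then-sweep procedure can be sketched without sympy. This assumes closed intervals given as `(start, end)` tuples, with the intervals inside each single list not overlapping each other; open versus closed endpoints are ignored, and `sweep`/`complement` are hypothetical helper names. The sample data matches the sympy code later in this post.

```python
def sweep(interval_lists, mode="union"):
    """Combine closed (start, end) intervals from several lists by sweeping sorted endpoints.

    mode="union": keep points covered by at least one list.
    mode="intersection": keep points covered by every list.
    """
    events = []
    for intervals in interval_lists:
        for start, end in intervals:
            events.append((start, 0))  # 0 = enter; sorts before an exit at the same point
            events.append((end, 1))    # 1 = exit
    events.sort()
    need = 1 if mode == "union" else len(interval_lists)
    result, depth, open_at = [], 0, None
    for point, kind in events:
        if kind == 0:
            depth += 1
            if depth == need:          # coverage just reached the required depth
                open_at = point
        else:
            if depth == need:          # coverage is about to drop below it
                result.append((open_at, point))
            depth -= 1
    return result


def complement(intervals, lo, hi):
    """Gaps inside [lo, hi] not covered by a sorted, merged interval list."""
    gaps, cursor = [], lo
    for start, end in intervals:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


a = [(0, 1), (2, 3)]
b = [(0.5, 1.5), (1.6, 2.5)]
union = sweep([a, b])
print(union)                          # [(0, 1.5), (1.6, 3)]
print(sweep([a, b], "intersection"))  # [(0.5, 1), (2, 2.5)]
print(complement(union, 0, 3))        # [(1.5, 1.6)]
```

Because enter events sort before exit events at the same coordinate, touching intervals such as `[0, 1]` and `[1, 2]` merge in a union instead of leaving a zero-length gap.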

combining similar/nearby bounding boxes and suppressing near-duplicate bounding boxes over a short time

see here

You can merge a group of boxes, then analyze them over time with an object tracker and tween (interpolate) between them.
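A minimal sketch of the merge-then-tween idea: merge a group of nearby boxes into their enclosing box, then fill the frames between keyframes by linear interpolation, the simplest form of tweening. The tracker itself is not included; keyframes are assumed to be `{frame_index: (x1, y1, x2, y2)}`, and the function names are hypothetical.

```python
def merge_boxes(boxes):
    """Smallest (x1, y1, x2, y2) box enclosing a group of boxes."""
    x1s, y1s, x2s, y2s = zip(*boxes)
    return (min(x1s), min(y1s), max(x2s), max(y2s))


def lerp_box(box_a, box_b, t):
    """Linear interpolation between two boxes: t=0 gives box_a, t=1 gives box_b."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))


def tween_track(keyframes):
    """Fill every frame between keyframes {frame_index: box} (needs at least one keyframe)."""
    frames = sorted(keyframes)
    filled = {}
    for f0, f1 in zip(frames, frames[1:]):
        for f in range(f0, f1):
            filled[f] = lerp_box(keyframes[f0], keyframes[f1], (f - f0) / (f1 - f0))
    filled[frames[-1]] = tuple(keyframes[frames[-1]])
    return filled


group = [(0, 0, 10, 10), (5, 5, 20, 20)]      # nearby boxes detected at frame 0
keyframes = {0: merge_boxes(group), 4: (40, 0, 60, 20)}
track = tween_track(keyframes)
print(track[2])  # (20.0, 0.0, 40.0, 20.0), halfway between the two keyframes
```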

Discrete Interval Set Union Solvers

You may want to filter out short intervals. Mind the left-open/right-open (lopen/ropen) intervals produced by intersection or difference operations.

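You may also want to quantize these intervals, i.e. snap them to the nearest possible points. Should that use some sampling rate, or none at all? The idea is simply to perform the corresponding operations on the discrete points that belong to each interval; how the intervals are divided, and how discrete points are assigned to different intervals, is entirely the job of other logic. Generally, intervals of the same category should not intersect, but let's deal with that later. How should the result be used: put everything into one list, or just pick the smallest one?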
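The filtering and quantizing steps can be sketched as small helpers. Assumptions: intervals are `(start, end)` tuples, `step` is the sampling period (e.g. 1/fps for video frames), and points are assigned with closed bounds; all three function names are hypothetical.

```python
def quantize(intervals, step):
    """Snap (start, end) pairs to the nearest multiple of `step` (e.g. 1 / fps).

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [(round(s / step) * step, round(e / step) * step) for s, e in intervals]


def drop_short(intervals, min_length):
    """Remove intervals shorter than `min_length`, including ones collapsed by quantizing."""
    return [(s, e) for s, e in intervals if e - s >= min_length]


def assign_points(points, intervals):
    """Map each interval index to the discrete points inside it (closed bounds)."""
    return {i: [p for p in points if s <= p <= e] for i, (s, e) in enumerate(intervals)}


raw = [(0.2, 1.3), (2.4, 2.6), (3.1, 4.9)]
snapped = quantize(raw, 0.5)
print(snapped)                                    # [(0.0, 1.5), (2.5, 2.5), (3.0, 5.0)]
kept = drop_short(snapped, 0.5)
print(kept)                                       # [(0.0, 1.5), (3.0, 5.0)]
print(assign_points([0.5, 1.0, 2.0, 4.5], kept))  # {0: [0.5, 1.0], 1: [4.5]}
```

Quantizing before filtering matters: the short `(2.4, 2.6)` interval collapses to a single point and is then dropped.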

category with different groups -> subcategories

first the sample set:

import sympy
# make sure every subset is ordered.
mSet = [(1.0,1.1,1.2),(2.4,2.5,2.6)]
mSet2 = [(0.9,1.05,1.15),(2.45,2.55,2.65,2.75)]
# convert to intervals first: keep the first and last point of each ordered subset.
mSetIntervals = [(x[0],x[-1]) for x in mSet]
mSet2Intervals = [(x[0],x[-1]) for x in mSet2]
# additional check: these intervals cannot overlap!
def checkOverlap(intervalTupleList):
    unionInterval = sympy.S.EmptySet # shall be empty here.
    for start, end in intervalTupleList:
        newInterval = sympy.Interval(start,end)
        # overlap means the new interval meets the union of the previous ones.
        isOverlapped = (unionInterval.intersect(newInterval) != sympy.S.EmptySet)
        if isOverlapped:
            print("INTERVAL", newInterval, "OVERLAPPED!")
            return isOverlapped
        unionInterval += newInterval
    return False
assert not checkOverlap(mSetIntervals)
assert not checkOverlap(mSet2Intervals)

then pool and sort all the boundaries of converted intervals:

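mSetIntervalBoundaries = [b for interval in mSetIntervals for b in interval]
mSet2IntervalBoundaries = [b for interval in mSet2Intervals for b in interval]
mPoints = mSetIntervalBoundaries + mSet2IntervalBoundaries
mPoints = list(set(mPoints))
mPoints.sort()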

with sympy

# all the same

with less sympy

# all the same

Continuous Interval Set Union Solvers

You must be able to explicitly identify the group index within each category. Maybe you can just handle it with all-new subcategories?

less exponential solution here?

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# basically the same example.
# assume no overlapping here.
import sympy
def unionToTupleList(myUnion):
    unionBoundaries = list(myUnion.boundary)
    unionBoundaries.sort()
    leftBoundaries = unionBoundaries[::2]
    rightBoundaries = unionBoundaries[1::2]
    return list(zip(leftBoundaries, rightBoundaries))
def tupleSetToUncertain(mSet):
    mUncertain = None
    for start, end in mSet:
        if mUncertain is None:
            mUncertain = sympy.Interval(start,end)
        else:
            mUncertain += sympy.Interval(start,end)
    typeUncertain = type(mUncertain)
    return mUncertain, typeUncertain
def mergeOverlappedInIntervalTupleList(intervalTupleList):
    mUncertain, _ = tupleSetToUncertain(intervalTupleList)
    mUncertainBoundaryList = list(mUncertain.boundary)
    mUncertainBoundaryList.sort()
    mergedIntervalTupleList = list(zip(mUncertainBoundaryList[::2], mUncertainBoundaryList[1::2]))
    return mergedIntervalTupleList
mSet = mergeOverlappedInIntervalTupleList([(0,1), (2,3)])
mSet2 = mergeOverlappedInIntervalTupleList([(0.5,1.5),(1.6,2.5)])
print("MSET", mSet)
print("MSET2", mSet2)
mSetCandidates = [mSet, mSet2]
mSetUnified = [x for y in mSetCandidates for x in y]
leftBoundaryList = set([x[0] for x in mSetUnified])
rightBoundaryList = set([x[1] for x in mSetUnified])
# they may freaking overlap.
# if want nearby-merge strategy, simply just expand all intervals, merge them with union and shrink the individual intervals inside union respectively.
markers = {"enter":{k:[] for k in leftBoundaryList}, "exit":{k:[] for k in rightBoundaryList}}
for index, mSetCandidate in enumerate(mSetCandidates):
    leftBoundaryListOfCandidate = [x[0] for x in mSetCandidate]
    rightBoundaryListOfCandidate = [x[1] for x in mSetCandidate]
    for leftBoundaryOfCandidate in leftBoundaryListOfCandidate:
        markers["enter"][leftBoundaryOfCandidate].append(index) # remap this thing!
    for rightBoundaryOfCandidate in rightBoundaryListOfCandidate:
        markers["exit"][rightBoundaryOfCandidate].append(index) # remap this thing!
# now, iterate through the boundaries of mSetUnified.
unifiedBoundaryList = leftBoundaryList.union(rightBoundaryList) # call me a set instead of a list please? now we must sort this thing
unifiedBoundaryList = list(unifiedBoundaryList)
unifiedBoundaryList.sort()
unifiedBoundaryMarks = {}
finalMappings = {}
# print("MARKERS", markers)
# breakpoint()
for index, boundary in enumerate(unifiedBoundaryList):
    previousMark = unifiedBoundaryMarks.get(index-1, [])
    enterList = markers["enter"].get(boundary,[])
    exitList = markers["exit"].get(boundary,[])
    currentMark = set(previousMark + enterList).difference(set(exitList))
    currentMark = list(currentMark)
    unifiedBoundaryMarks.update({index:currentMark})
    # now, handle the change? or not?
    # let's just deal those empty ones, shall we?
    if previousMark == []: # inside it is empty range.
        # elif currentMark == []:
        if index == 0: continue # just the start, no need to note this down.
        else:
            finalMappings.update({"empty":finalMappings.get("empty",[])+[(unifiedBoundaryList[index-1], boundary)]})
        # the end of previous mark! this interval belongs to previousMark
    else:
        key = previousMark.copy()
        key.sort()
        key = tuple(key)
        finalMappings.update({key:finalMappings.get(key,[])+[(unifiedBoundaryList[index-1], boundary)]})
        # also the end of previous mark! belongs to previousMark.
### NOW THE FINAL OUTPUT ###
finalCats = {}
for key, value in finalMappings.items():
    # value is an array containing subInterval tuples.
    value = mergeOverlappedInIntervalTupleList(value)
    finalCats.update({key: value})
print("______________FINAL CATS______________")
print(finalCats)

sympy solution

sympy seems to support both discrete and continuous intervals, but will that save any damn time anyway? I'm afraid not. Maybe there's a way!

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import functools
import sympy
def unionToTupleList(myUnion):
    #  seriously wrong. this will fuck up.
    unionBoundaries = list(myUnion.boundary)
    unionBoundaries.sort()
    leftBoundaries = unionBoundaries[::2]
    rightBoundaries = unionBoundaries[1::2]
    return list(zip(leftBoundaries, rightBoundaries))
def tupleSetToUncertain(mSet):
    mUncertain = None
    for start, end in mSet:
        if mUncertain is None:
            mUncertain = sympy.Interval(start,end)
        else:
            mUncertain += sympy.Interval(start,end)
    typeUncertain = type(mUncertain)
    return mUncertain, typeUncertain
# borrowed from above code.
def mergeOverlappedInIntervalTupleList(intervalTupleList):
    mUncertain, _ = tupleSetToUncertain(intervalTupleList)
    mUncertainBoundaryList = list(mUncertain.boundary)
    mUncertainBoundaryList.sort()
    #  print(mUncertain)
    #  print(mUncertainBoundaryList)
    mergedIntervalTupleList = list(zip(mUncertainBoundaryList[::2], mUncertainBoundaryList[1::2]))
    # print(mergedIntervalTupleList)
    return mergedIntervalTupleList
mSet = [(0,1), (2,3)]
mUncertain, typeUncertain = tupleSetToUncertain(mSet)
unrolledMSet = list(mUncertain.boundary)
# can be either sympy.sets.sets.Interval or sympy.sets.sets.Union
mSet2 = [(0.5,1.5),(1.6,2.5)]
mUncertain2, typeUncertain2 = tupleSetToUncertain(mSet2)
unrolledMSet2 = list(mUncertain2.boundary)
print("MSET", mSet)
print("MSET2", mSet2)
############################################################
# mSet2 and mUncertain2 are defined above, so everything below is runnable.
def checkCommon(subInterval, masterInterval):
    return subInterval == sympy.Intersection(subInterval, masterInterval)
mUncertains = [mUncertain, mUncertain2]
subIntervals = list(set(unrolledMSet2 + unrolledMSet))
subIntervals.sort()
subIntervals = zip(subIntervals[:-1], subIntervals[1:])
subIntervals = list(subIntervals)
#  breakpoint()
# for subIntervals, it's still not real interval but tuple at above line.
reversedCats = {}
subIntervalUnion = functools.reduce(lambda a,b: a+b, mUncertains)
for subIntervalIndex, (start, end) in enumerate(subIntervals):
    subIntervalCandidate = sympy.Interval(start, end)
    reverseIndex = [] # stays empty for gaps that belong to no set.
    for index, uncertainCandidate in enumerate(mUncertains):
        if checkCommon(subIntervalCandidate, uncertainCandidate):
            reverseIndex.append(index) # this is the index of the in-common set of the original set list
    reversedCats.update({subIntervalIndex:reverseIndex}) # need to sort and index? or not to sort because this is already done?
normalCats = {}
for k,v in reversedCats.items():
    v.sort()
    v = tuple(v)
    normalCats.update({v:normalCats.get(v, [])+[k]})
# we only get interval, not the actual union period!
# how to get interval elements out of union structure for hell sake?
finalCats = {}
for k,v in normalCats.items():
    # now k is the original set index list, representing belonging of the below union.
    #  print(subIntervals)
    #  print(index)
    #  print(v)
    #  breakpoint()
    mFinalUnionCandidate = [subIntervals[index] for index in v]
    ## REPLACED ##
    # mFinalUnionCandidate, _ = tupleSetToUncertain(mFinalUnionCandidate)
    ##### union to tuple list, could be replaced #####
    #mFinalUnionCandidateBoundaryList = list(mFinalUnionCandidate.boundary)
    #left_bounds, right_bounds = mFinalUnionCandidateBoundaryList[0::2],mFinalUnionCandidateBoundaryList[1::2] # check it dammit! not sure how to step the list properly?
    #mFinalIntervalListCandidate = list(zip(left_bounds, right_bounds))
    # mFinalIntervalListCandidate = unionToTupleList(mFinalUnionCandidate)
    ##### union to tuple list, could be replaced #####
    ## REPLACED ##
    # print("M_FINAL_UNION_CANDIDATE",mFinalUnionCandidate)
    mFinalIntervalListCandidate = mergeOverlappedInIntervalTupleList(mFinalUnionCandidate)
    # print("M_FINAL_INTERVAL_LIST_CANDIDATE", mFinalIntervalListCandidate)
    # breakpoint()
    finalCats.update({k:mFinalIntervalListCandidate.copy()})
# this whole calculation could just be exponential. goddamn it?
# before that, we need to get the "empty" out. but is that really necessary? i think it is, as an important feature.
#  subIntervalsStart, subIntervalsEnd = subIntervals[0][0], subIntervals[-1][-1]
#
#  relativeCompleteInterval = sympy.Interval(subIntervalsStart, subIntervalsEnd)
#
# subIntervalUnion
#  emptyIntervalUnion = relativeCompleteInterval - subIntervalUnion # really uncertain if it is just a union or not.
#  emptyIntervalTupleList = unionToTupleList(emptyIntervalUnion)
#
#  finalCats.update({"empty":emptyIntervalTupleList})
# sub-intervals covered by no set end up under the key (); rename it, if there are any gaps at all.
if () in finalCats:
    finalCats["empty"] = finalCats.pop(())
print("_____FINAL CATS_____")
print(finalCats)

2022-08-20

OCR Tools

tesseract

tesseract.js: supports 100+ languages, but the language still has to be specified in advance

chineseocr: supports arbitrary text orientation and robust handwriting recognition

a lightweight Chinese OCR model

an efficient Python OCR library based on tr

paddleocr

easyocr

pearocr: client-side, browser-based OCR

2022-08-19

Macbook Air Usage Notes

notes on macbook air

This damn thing sucks in every aspect. I am getting tired of it.

the body position of using this thing

Raise my legs on several pillows, rest this thing on my hips and lean back against a triangular pile of pillows.

Still not ideal, but pretty close. To keep the laptop from slipping down my hips, I need to fill the gap between the laptop and my belly with clothes, and support my arms with some toys.

2022-08-18

A Good/Bad Proposal On V2Ray

Clash has a relay config option which works like proxychains.

Suggestion: run multiple V2Ray clients/servers that talk to each other, but reach the network through only one single outbound, maybe like the onion router (Tor).

2022-08-18

backup schedules

offline backup

schedule a backup of pyjom on alpharetta to disk every 12 hours
set a reminder to run a Time Machine backup every week (next scheduled backup: Thu Aug 18)

online backup

send a system-wide notification if the aliyun disk token expires, broadcasting how to reacquire it
schedule a backup of pyjom on alpharetta to cloud disks every 12 hours
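A sketch of the checks behind these schedules: which tasks are due given their last run, and whether the cloud-disk token needs a warning. The task names, intervals and the one-day margin are assumptions; actually triggering the checks (cron, launchd) and sending the system-wide notification are not shown.

```python
from datetime import datetime, timedelta

SCHEDULES = {  # hypothetical task name -> interval
    "pyjom disk backup": timedelta(hours=12),
    "pyjom cloud backup": timedelta(hours=12),
    "time machine reminder": timedelta(weeks=1),
}


def due_tasks(last_run, now):
    """Names of tasks whose interval has elapsed since their last run (never-run tasks are due)."""
    return sorted(name for name, every in SCHEDULES.items()
                  if name not in last_run or now - last_run[name] >= every)


def token_warning(expires_at, now, margin=timedelta(days=1)):
    """Message to broadcast when the cloud-disk token has expired or is about to, else None."""
    if now >= expires_at:
        return "token expired: reacquire it and update the backup config"
    if expires_at - now <= margin:
        return "token expires within %s: reacquire it soon" % margin
    return None


now = datetime(2022, 8, 18, 12, 0)
last_run = {
    "pyjom disk backup": now - timedelta(hours=13),
    "pyjom cloud backup": now - timedelta(hours=2),
    "time machine reminder": datetime(2022, 8, 11, 12, 0),
}
print(due_tasks(last_run, now))  # ['pyjom disk backup', 'time machine reminder']
print(token_warning(datetime(2022, 8, 19, 6, 0), now))
```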

Blog of James Brown
