BigSnarf blog

Infosec FTW

Monthly Archives: November 2017


MNIST dataset:

class AlexNet(nn.Module):
def __init__(self, num_classes=1000):
super(AlexNet, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(64, 192, kernel_size=5, padding=2),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(192, 384, kernel_size=3, padding=1),
nn.Conv2d(384, 256, kernel_size=3, padding=1),
nn.Conv2d(256, 256, kernel_size=3, padding=1),
nn.MaxPool2d(kernel_size=3, stride=2),
self.classifier = nn.Sequential(
nn.Linear(256 * 6 * 6, 4096),
nn.Linear(4096, 4096),
nn.Linear(4096, num_classes),
def forward(self, x):
x = self.features(x)
x = x.view(x.size(0), 256 * 6 * 6)
x = self.classifier(x)
return x

view raw


hosted with ❤ by GitHub

Click to access 1710.07035.pdf


Holder for future CapsNet work

Semantic Segmentation using Adversarial Networks

Click to access 1611.08408.pdf



[1] A. Arnab, S. Jayasumana, S. Zheng, and P. Torr. Higher order conditional random fields in deep neural networks. In ECCV, 2016.

[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.

[3] G. Csurka, D. Larlus, and F. Perronnin. What is a good evaluation measure for semantic segmentation? In BMVC, 2013.

[4] E. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 2015.

[5] A. Dosovitskiy, J. Springenberg, and T. Brox. Learning to generate chairs with convolutional neural networks. In CVPR, 2015.

[6] M. Everingham, S. Ali Eslami, L. van Gool, C. Williams, J. Winn, and A. Zisserman. The PASCALvisual object classes challenge: A retrospective. IJCV, 111(1):98–136, 2015.

[7] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. PAMI, 35(8):1915–1929, 2013.

[8] J. Gauthier. Conditional generative adversarial nets for convolutional face generation. Unpublished, .

[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.

[10] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV, 2009.

[11] D. Grangier, L. Bottou, and R. Collobert. Deep convolutional networks for scene parsing. In ICML Deep Learning Workshop, 2009.

[12] B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In ICCV, 2011.

[13] P. Kohli, L. Ladický, and P. Torr. Robust higher order potentials for enforcing label consistency. IJCV, 82(3):302–324, 2009.

[14] P. Krähenbühl and V. Koltun. Parameter learning and convergent inference for dense random fields. In ICML, 2013.

[15] G. Lin, C. Shen, A. van den Hengel, and I. Reid. Efficient piecewise training of deep structured models for semantic segmentation. In CVPR, 2016.

[16] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.

[17] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 26(5):530–549, 2004.

[18] M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction beyond mean square error. In ICLR, 2016.

[19] M. Mirza and S. Osindero. Conditional generative adversarial nets. In NIPS deep learning workshop, 2014.

[20] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, 2015.

[21] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.

[22] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.

[23] P. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene labeling. In ICML, 2014.

[24] P. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollár. Learning to refine object segments. In ECCV, 2016.

[25] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.

[26] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. In ICML, 2016.

[27] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention, 2015.

[28] S. Roweis, L. Saul, and G. Hinton. Global coordination of local linear models. In NIPS, 2002.

[29] S. Saxena and J. Verbeek. Convolutional neural fabrics. In NIPS, 2016.

[30] A. Schwing and R. Urtasun. Fully connected deep structured networks. Arxiv preprint, 2015.

[31] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.

[33] D. Tarlow and R. Zemel. Structured output learning with high order loss functions. In AISTATS, 2012.

[34] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.

[35] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.

[36] Y. Zhou, X. Hu, and B. Zhang. Interlinked convolutional neural networks for face parsing. In International Symposium on Neural Networks, 2015.

Blur affects on Neural Network preception

Screen Shot 2017-11-22 at 10.45.29 AM

Image quality is an important practical challenge that is often overlooked in the design of machine vision systems.

Commonly, machine vision systems are trained and tested on high quality image datasets, yet in practical applications the input images can not be assumed to be of high quality. Recently, deep neural networks have obtained state-of-the-art performance on many machine vision tasks.

In this paper we provide an evaluation of 4 state-of-the-art deep neural network models for image classification under quality distortions. We consider five types of quality distortions: blur, noise, contrast, JPEG, and JPEG2000 compression.

We show that the existing networks are susceptible to these quality distortions, particularly to blur and noise. These results enable future work in developing deep neural networks that are more invariant to quality distortions.

Click to access 1604.04004.pdf

Click to access 1612.01227.pdf






Screen Shot 2017-11-22 at 11.55.20 AM

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird’s eye view projection.

Screen Shot 2017-11-22 at 11.29.31 AM

In this work, we remove the need of manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single stage, end-to-end trainable deep network.

Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to a RPN to generate detections.

Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based on only LiDAR.

Apple version:




Reality gap in robotic vision

The difficulty of transferring simulated experience into the real world is often called the “reality gap.” The reality gap is a subtle but important discrepancy between reality and simulation that prevents simulated robotic experience from directly enabling effective real-world performance.

Visual perception often constitutes the widest part of the reality gap: while simulated images continue to improve in fidelity, the peculiar and pathological regularities of synthetic pictures, and the wide, unpredictable diversity of real-world images, makes bridging the reality gap particularly difficult when the robot must use vision to perceive the world, as is the case for example in many manipulation tasks.


Context aware threat hunting AI

3d travel and navigation planning, concept

Morning aware, location, aware, application aware, pattern aware (high dimension coincidence)

Basic items like checking user logins of ID and password with context aware ML will help SOC analysts.

Every morning 10,000 employees login into their workstations. They enter the building and come into each office area each morning using their RFID badges. They sit down at specific desktops and login. Laptop users will hit specific WiFi access points.

On Monday morning some forget their passwords or had password change the week before. These users can get buckets into risky behavior for the failures. Most will enter their routine of getting their coffee and come back to their workstations. Users will check their email and open their calendars. Users will check slack.  Mostly predictable behaviors.

All of these behaviors are easily logged and can be eliminated as threat vectors quite easily. Add video analysis and facial recognition, chat behaviour, and response analysis for both email and slack, and you can be pretty confident the right person is using the right resources.

I haven’t discussed IP addresses or ports. How about asking the user if you are really unsure? Slack message, confirmation from peers?

Robots need to hear and listen

Vision stuff TODO reading

NLP Neural Networks

CapsuleNet – CapsNet – Capsule Network



“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules. When multiple predictions agree, a higher level capsule becomes active. We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. To achieve these results we use an iterative routing-by-agreement mechanism: A lower-level capsule prefers to send its output to higher level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule.”

Dynamic routing between capsules

View at

Matrix capsules with EM routing

View at

View at